[ceph-users] Fwd: ceph-objectstore-tool remove-clone-metadata. How to use?
Hi community,

Ten months ago we discovered an issue after removing a cache tier from a healthy cluster and started an email thread about it; as a result, a new bug was created on the tracker by Samuel Just: http://tracker.ceph.com/issues/12738

Since then I have been waiting for a good moment to upgrade (after the fix was backported to 0.94.7), and yesterday I upgraded my production cluster. Of the 28 scrub errors, only 5 remain, so I need to fix them with the ceph-objectstore-tool remove-clone-metadata subcommand. I tried to do it, but without any real results... Can you please give me advice on what I'm doing wrong?

My workflow was the following:

1. Identify the problem PGs:
   ceph health detail | grep inco | grep -v HEALTH | cut -d " " -f 2
2. Start a repair on them, to collect information about the errors in the logs:
   ceph pg repair <pgid>

After this, for example, I got the following records in the logs:

2016-07-20 00:32:10.650061 osd.56 10.12.2.5:6800/1985741 25 : cluster [INF] 2.c4 repair starts
2016-07-20 00:33:06.405136 osd.56 10.12.2.5:6800/1985741 26 : cluster [ERR] repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0307/14d
2016-07-20 00:33:06.405323 osd.56 10.12.2.5:6800/1985741 27 : cluster [ERR] repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0307/138
2016-07-20 00:33:06.405385 osd.56 10.12.2.5:6800/1985741 28 : cluster [INF] repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir 1 missing clone(s)
2016-07-20 00:40:42.457657 osd.56 10.12.2.5:6800/1985741 29 : cluster [ERR] 2.c4 repair 2 errors, 0 fixed

So I tried to fix it with the following commands:

stop ceph-osd id=56
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ --journal-path /var/lib/ceph/osd/ceph-56/journal rbd_data.e846e25a70bf7.0307 remove-clone-metadata 138
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ --journal-path /var/lib/ceph/osd/ceph-56/journal rbd_data.e846e25a70bf7.0307 remove-clone-metadata 14d
start ceph-osd id=56

The strange thing is that after running these commands I did not get the message I expected (according to the sources):

cout << "Removal of clone " << cloneid << " complete" << std::endl;
cout << "Use pg repair after OSD restarted to correct stat information" << std::endl;

I got silence instead (no output after the command, and each command took about 30-35 minutes to execute). Of course I started a pg repair again after these actions, but the result is the same: the errors are still there.

So possibly I misunderstand the input format for ceph-objectstore-tool... Please help with this. :)

Thank you in advance!
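For readers hitting the same wall: the tool usually wants the exact object spec inside the affected PG rather than a bare image prefix. The sketch below only shows that invocation pattern; it is not the confirmed resolution of this thread, the object names in the log excerpt above are truncated by the archive (so the grep pattern and the JSON spec are placeholders), and whether remove-clone-metadata parses the clone id as decimal or hex should be verified against your build.

stop ceph-osd id=56
# list the objects in the inconsistent PG and find the exact spec of the affected head/snapdir object
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56 \
    --journal-path /var/lib/ceph/osd/ceph-56/journal \
    --pgid 2.c4 --op list | grep e846e25a70bf7
# pass the full JSON object spec printed above back in as the object argument
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56 \
    --journal-path /var/lib/ceph/osd/ceph-56/journal \
    '<json-object-spec-from-op-list>' remove-clone-metadata <cloneid>
start ceph-osd id=56

The same operation would normally have to be repeated on every OSD in the PG's acting set (the scrub output later in this archive lists [56,10,42] for pg 2.c4) before running ceph pg repair again.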
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
Wido, also minor issue with 0,2.0 java-rados We still catch: -storage/ae1b6e5f-f5f4-4abe-aee3-084f2fe71876 2015-11-02 11:41:14,958 WARN [cloud.agent.Agent] (agentRequest-Handler-4:null) Caught: java.lang.NegativeArraySizeException at com.ceph.rbd.RbdImage.snapList(Unknown Source) at com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.deletePhysicalDisk(LibvirtStorageAdaptor.java:854) at com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.deletePhysicalDisk(LibvirtStoragePool.java:175) at com.cloud.hypervisor.kvm.storage.KVMStorageProcessor.deleteVolume(KVMStorageProcessor.java:1206) at com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:124) at com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:57) at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1385) at com.cloud.agent.Agent.processRequest(Agent.java:503) at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808) at com.cloud.utils.nio.Task.run(Task.java:84) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Even with updated lib: root@ix1-c7-2:/usr/share/cloudstack-agent/lib# ls /usr/share/cloudstack-agent/lib | grep rados rados-0.2.0.jar 2015-11-03 11:01 GMT+02:00 Voloshanenko Igor <igor.voloshane...@gmail.com>: > Wido, it's the main issue. No records at all... > > > So, from last time: > > > 2015-11-02 11:40:33,204 DEBUG [kvm.resource.LibvirtComputingResource] > (agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep Mem:|awk > '{print $2}' > 2015-11-02 11:40:33,207 DEBUG [kvm.resource.LibvirtComputingResource] > (agentRequest-Handler-2:null) Execution is successful. > 2015-11-02 11:40:35,316 DEBUG [cloud.agent.Agent] > (agentRequest-Handler-4:null) Processing command: > com.cloud.agent.api.GetVmStatsCommand > 2015-11-02 11:40:35,867 INFO [cloud.agent.AgentShell] (main:null) Agent > started > 2015-11-02 11:40:35,868 INFO [cloud.agent.AgentShell] (main:null) > Implementation Version is 4.5.1 > > So, almost alsways it's exception after RbdUnprotect then in approx . 20 > minutes - crash.. > Almost all the time - it's happen after GetVmStatsCommand or Disks > stats... Possible that evil hiden into UpadteDiskInfo method... but i can;t > find any bad code there ((( > > 2015-11-03 10:40 GMT+02:00 Wido den Hollander <w...@42on.com>: > >> >> >> On 03-11-15 01:54, Voloshanenko Igor wrote: >> > Thank you, Jason! >> > >> > Any advice, for troubleshooting >> > >> > I'm looking in code, and right now don;t see any bad things :( >> > >> >> Can you run the CloudStack Agent in DEBUG mode and then see after which >> lines in the logs it crashes? >> >> Wido >> >> > 2015-11-03 1:32 GMT+02:00 Jason Dillaman <dilla...@redhat.com >> > <mailto:dilla...@redhat.com>>: >> > >> > Most likely not going to be related to 13045 since you aren't >> > actively exporting an image diff. The most likely problem is that >> > the RADOS IO context is being closed prior to closing the RBD image. 
>> > >> > -- >> > >> > Jason Dillaman >> > >> > >> > - Original Message - >> > >> > > From: "Voloshanenko Igor" <igor.voloshane...@gmail.com >> > <mailto:igor.voloshane...@gmail.com>> >> > > To: "Ceph Users" <ceph-users@lists.ceph.com >> > <mailto:ceph-users@lists.ceph.com>> >> > > Sent: Thursday, October 29, 2015 5:27:17 PM >> > > Subject: Re: [ceph-users] Cloudstack agent crashed JVM with >> > exception in >> > > librbd >> > >> > > From all we analyzed - look like - it's this issue >> > > http://tracker.ceph.com/issues/13045 >> > >> > > PR: https://github.com/ceph/ceph/pull/6097 >> > >> > > Can anyone help us to confirm this? :) >> > >> > > 2015-10-29 23:13 GMT+02:00 Voloshanenko Igor < >> > igor.voloshane...@gmail.com <mailto:igor.voloshane...@gmail.com> > >> > > : >> > >> > > > Additional trace: >> > > >> > >> > > > #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at >> > > > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 >> > > >
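A quick sanity check in this situation (a hedged suggestion, not something confirmed in the thread) is to verify which native librbd/librados build and which rados-java jar the agent process is actually loading, since the NegativeArraySizeException comes from the Java binding while the JVM crash itself is inside librbd:

# packaged librbd/librados versions on the hypervisor
dpkg -l | grep -E 'librbd|librados'
# the jar shipped with the agent (path taken from the message above)
ls -l /usr/share/cloudstack-agent/lib | grep -i rados
# the library named in the crash frame
ls -l /usr/lib/librbd.so.1*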
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
Yes, we recompiled ACS too Also we delete all snapshots... but we can do it for a while... New snapshot created each days.. And the main issue - agent crash, not exception itself... Each RBD operations which cause exception in 20-30 minutes cause agent crash... 2015-11-03 11:09 GMT+02:00 Wido den Hollander <w...@42on.com>: > > > On 03-11-15 10:04, Voloshanenko Igor wrote: > > Wido, also minor issue with 0,2.0 java-rados > > > > Did you also re-compile CloudStack against the new rados-java? I still > think it's related to when the Agent starts cleaning up and there are > snapshots which need to be unprotected. > > In the meantime you might want to remove any existing RBD snapshots > using the RBD commands from Ceph, that might solve the problem. > > Wido > > > We still catch: > > > > -storage/ae1b6e5f-f5f4-4abe-aee3-084f2fe71876 > > 2015-11-02 11:41:14,958 WARN [cloud.agent.Agent] > > (agentRequest-Handler-4:null) Caught: > > java.lang.NegativeArraySizeException > > at com.ceph.rbd.RbdImage.snapList(Unknown Source) > > at > > > com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.deletePhysicalDisk(LibvirtStorageAdaptor.java:854) > > at > > > com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.deletePhysicalDisk(LibvirtStoragePool.java:175) > > at > > > com.cloud.hypervisor.kvm.storage.KVMStorageProcessor.deleteVolume(KVMStorageProcessor.java:1206) > > at > > > com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:124) > > at > > > com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:57) > > at > > > com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1385) > > at com.cloud.agent.Agent.processRequest(Agent.java:503) > > at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808) > > at com.cloud.utils.nio.Task.run(Task.java:84) > > at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) > > > > Even with updated lib: > > > > root@ix1-c7-2:/usr/share/cloudstack-agent/lib# ls > > /usr/share/cloudstack-agent/lib | grep rados > > rados-0.2.0.jar > > > > 2015-11-03 11:01 GMT+02:00 Voloshanenko Igor > > <igor.voloshane...@gmail.com <mailto:igor.voloshane...@gmail.com>>: > > > > Wido, it's the main issue. No records at all... > > > > > > So, from last time: > > > > > > 2015-11-02 11:40:33,204 DEBUG > > [kvm.resource.LibvirtComputingResource] > > (agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep > > Mem:|awk '{print $2}' > > 2015-11-02 11:40:33,207 DEBUG > > [kvm.resource.LibvirtComputingResource] > > (agentRequest-Handler-2:null) Execution is successful. > > 2015-11-02 11:40:35,316 DEBUG [cloud.agent.Agent] > > (agentRequest-Handler-4:null) Processing command: > > com.cloud.agent.api.GetVmStatsCommand > > 2015-11-02 11:40:35,867 INFO [cloud.agent.AgentShell] (main:null) > > Agent started > > 2015-11-02 11:40:35,868 INFO [cloud.agent.AgentShell] (main:null) > > Implementation Version is 4.5.1 > > > > So, almost alsways it's exception after RbdUnprotect then in approx > > . 20 minutes - crash.. > > Almost all the time - it's happen after GetVmStatsCommand or Disks > > stats... Possible that evil hiden into UpadteDiskInfo method... 
but > > i can;t find any bad code there ((( > > > > 2015-11-03 10:40 GMT+02:00 Wido den Hollander <w...@42on.com > > <mailto:w...@42on.com>>: > > > > > > > > On 03-11-15 01:54, Voloshanenko Igor wrote: > > > Thank you, Jason! > > > > > > Any advice, for troubleshooting > > > > > > I'm looking in code, and right now don;t see any bad things :( > > > > > > > Can you run the CloudStack Agent in DEBUG mode and then see > > after which > > lines in the logs it crashes? > > > > Wido > > > > > 2015-11-03 1:32 GMT+02:00 Jason Dillaman <dilla...@redhat.com > <mailto:dilla...@redhat.com> > > > <mailto:dilla...@redhat.com <mailto:dilla...@redhat.com>&g
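Wido's suggestion above, removing the leftover RBD snapshots with Ceph's own tooling, would look roughly like the following. The pool and image names are taken from the log excerpts elsewhere in this thread and are only illustrative:

rbd -p cloudstack ls
rbd -p cloudstack snap ls 71b1e2e9-1985-45ca-9ab6-9e5016b86b7c
# a protected base snapshot can only be removed once it has no children and is unprotected
rbd children cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap
rbd snap unprotect cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap
rbd snap purge cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c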
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
Wido, it's the main issue. No records at all... So, from last time: 2015-11-02 11:40:33,204 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep Mem:|awk '{print $2}' 2015-11-02 11:40:33,207 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Execution is successful. 2015-11-02 11:40:35,316 DEBUG [cloud.agent.Agent] (agentRequest-Handler-4:null) Processing command: com.cloud.agent.api.GetVmStatsCommand 2015-11-02 11:40:35,867 INFO [cloud.agent.AgentShell] (main:null) Agent started 2015-11-02 11:40:35,868 INFO [cloud.agent.AgentShell] (main:null) Implementation Version is 4.5.1 So, almost alsways it's exception after RbdUnprotect then in approx . 20 minutes - crash.. Almost all the time - it's happen after GetVmStatsCommand or Disks stats... Possible that evil hiden into UpadteDiskInfo method... but i can;t find any bad code there ((( 2015-11-03 10:40 GMT+02:00 Wido den Hollander <w...@42on.com>: > > > On 03-11-15 01:54, Voloshanenko Igor wrote: > > Thank you, Jason! > > > > Any advice, for troubleshooting > > > > I'm looking in code, and right now don;t see any bad things :( > > > > Can you run the CloudStack Agent in DEBUG mode and then see after which > lines in the logs it crashes? > > Wido > > > 2015-11-03 1:32 GMT+02:00 Jason Dillaman <dilla...@redhat.com > > <mailto:dilla...@redhat.com>>: > > > > Most likely not going to be related to 13045 since you aren't > > actively exporting an image diff. The most likely problem is that > > the RADOS IO context is being closed prior to closing the RBD image. > > > > -- > > > > Jason Dillaman > > > > > > - Original Message - > > > > > From: "Voloshanenko Igor" <igor.voloshane...@gmail.com > > <mailto:igor.voloshane...@gmail.com>> > > > To: "Ceph Users" <ceph-users@lists.ceph.com > > <mailto:ceph-users@lists.ceph.com>> > > > Sent: Thursday, October 29, 2015 5:27:17 PM > > > Subject: Re: [ceph-users] Cloudstack agent crashed JVM with > > exception in > > > librbd > > > > > From all we analyzed - look like - it's this issue > > > http://tracker.ceph.com/issues/13045 > > > > > PR: https://github.com/ceph/ceph/pull/6097 > > > > > Can anyone help us to confirm this? :) > > > > > 2015-10-29 23:13 GMT+02:00 Voloshanenko Igor < > > igor.voloshane...@gmail.com <mailto:igor.voloshane...@gmail.com> > > > > : > > > > > > Additional trace: > > > > > > > > > #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at > > > > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > > > > > > > #1 0x7f30f98950d8 in __GI_abort () at abort.c:89 > > > > > > > #2 0x7f30f87b36b5 in > > __gnu_cxx::__verbose_terminate_handler() () from > > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > > > > > > > #3 0x7f30f87b1836 in ?? 
() from > > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > > > > > > > #4 0x7f30f87b1863 in std::terminate() () from > > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > > > > > > > #5 0x7f30f87b1aa2 in __cxa_throw () from > > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > > > > > > > #6 0x7f2fddb50778 in ceph::__ceph_assert_fail > > > > (assertion=assertion@entry=0x7f2fdddeca05 "sub < > m_subsys.size()", > > > > > > > file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", > > line=line@entry=62, > > > > > > > func=func@entry=0x7f2fdddedba0 > > > > > > > <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool > > > > ceph::log::SubsystemMap::should_gather(unsigned int, int)") at > > > > common/assert.cc:77 > > > > > > > #7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather > > > > (level=, sub=, this= out>) > > > > > > > at ./log/SubsystemMap.h:62 > > > > > > > #8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather > > > > (this=, sub=, level= out>) > > > > > > > at ./log/SubsystemMap.h:61 > > > > > > &
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
Dear all, can anybody help? 2015-10-30 10:37 GMT+02:00 Voloshanenko Igor <igor.voloshane...@gmail.com>: > It's pain, but not... :( > We already used your updated lib in dev env... :( > > 2015-10-30 10:06 GMT+02:00 Wido den Hollander <w...@42on.com>: > >> >> >> On 29-10-15 16:38, Voloshanenko Igor wrote: >> > Hi Wido and all community. >> > >> > We catched very idiotic issue on our Cloudstack installation, which >> > related to ceph and possible to java-rados lib. >> > >> >> I think you ran into this one: >> https://issues.apache.org/jira/browse/CLOUDSTACK-8879 >> >> Cleaning up RBD snapshots for volumes didn't go well and caused the JVM >> to crash. >> >> Wido >> >> > So, we have constantly agent crashed (which cause very big problem for >> > us... ). >> > >> > When agent crashed - it's crash JVM. And no event in logs at all. >> > We enabled crush dump, and after crash we see next picture: >> > >> > #grep -A1 "Problematic frame" < /hs_err_pid30260.log >> > Problematic frame: >> > C [librbd.so.1.0.0+0x5d681] >> > >> > # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core >> > (gdb) bt >> > ... >> > #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather >> > (level=, sub=, this=) >> > at ./log/SubsystemMap.h:62 >> > #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather >> > (this=, sub=, level=) >> > at ./log/SubsystemMap.h:61 >> > #9 0x7f30b9d879be in ObjectCacher::flusher_entry >> > (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527 >> > #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry >> > (this=) at osdc/ObjectCacher.h:374 >> > >> > From ceph code, this part executed when flushing cache object... And we >> > don;t understand why. Becasue we have absolutely different race >> > condition to reproduce it. >> > >> > As cloudstack have not good implementation yet of snapshot lifecycle, >> > sometime, it's happen, that some volumes already marked as EXPUNGED in >> > DB and then cloudstack try to delete bas Volume, before it's try to >> > unprotect it. >> > >> > Sure, unprotecting fail, normal exception returned back (fail because >> > snap has childs... ) >> > >> > 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] >> > (Thread-1304:null) Executing: >> > /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh >> > -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m >> > /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11 >> > 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] >> > (Thread-1304:null) Execution is successful. 
>> > 2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] >> > (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of >> > image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the >> image >> > 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] >> > (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at >> > cephmon.anolim.net:6789 <http://cephmon.anolim.net:6789> >> > 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] >> > (agentRequest-Handler-5:null) Unprotecting snapshot >> > cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap >> > 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor] >> > (agentRequest-Handler-5:null) Failed to delete volume: >> > com.cloud.utils.exception.CloudRuntimeException: >> > com.ceph.rbd.RbdException: Failed to unprotect snapshot >> cloudstack-base-snap >> > 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent] >> > (agentRequest-Handler-5:null) Seq 4-1921583831: { Ans: , MgmtId: >> > 161344838950, via: 4, Ver: v1, Flags: 10, >> > >> [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: >> > com.ceph.rbd.RbdException: Failed to unprotect snapshot >> > cloudstack-base-snap","wait":0}}] } >> > 2015-10-29 09:02:25,722 DEBUG [cloud.agent.Agent] >> > (agentRequest-Handler-2:null) Processing command: >> > com.cloud.agent.api.GetHostStatsCommand >> > 2015-10-29 09:02:25,722 DEBUG [kvm.resource.LibvirtComputingResource] >> > (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
Thank you, Jason! Any advice, for troubleshooting I'm looking in code, and right now don;t see any bad things :( 2015-11-03 1:32 GMT+02:00 Jason Dillaman <dilla...@redhat.com>: > Most likely not going to be related to 13045 since you aren't actively > exporting an image diff. The most likely problem is that the RADOS IO > context is being closed prior to closing the RBD image. > > -- > > Jason Dillaman > > > - Original Message - > > > From: "Voloshanenko Igor" <igor.voloshane...@gmail.com> > > To: "Ceph Users" <ceph-users@lists.ceph.com> > > Sent: Thursday, October 29, 2015 5:27:17 PM > > Subject: Re: [ceph-users] Cloudstack agent crashed JVM with exception in > > librbd > > > From all we analyzed - look like - it's this issue > > http://tracker.ceph.com/issues/13045 > > > PR: https://github.com/ceph/ceph/pull/6097 > > > Can anyone help us to confirm this? :) > > > 2015-10-29 23:13 GMT+02:00 Voloshanenko Igor < > igor.voloshane...@gmail.com > > > : > > > > Additional trace: > > > > > > #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at > > > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > > > > > #1 0x7f30f98950d8 in __GI_abort () at abort.c:89 > > > > > #2 0x7f30f87b36b5 in __gnu_cxx::__verbose_terminate_handler() () > from > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > > > > > #3 0x7f30f87b1836 in ?? () from > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > > > > > #4 0x7f30f87b1863 in std::terminate() () from > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > > > > > #5 0x7f30f87b1aa2 in __cxa_throw () from > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > > > > > #6 0x7f2fddb50778 in ceph::__ceph_assert_fail > > > (assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()", > > > > > file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", line=line@entry > =62, > > > > > func=func@entry=0x7f2fdddedba0 > > > <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> > "bool > > > ceph::log::SubsystemMap::should_gather(unsigned int, int)") at > > > common/assert.cc:77 > > > > > #7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather > > > (level=, sub=, this=) > > > > > at ./log/SubsystemMap.h:62 > > > > > #8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather > > > (this=, sub=, level=) > > > > > at ./log/SubsystemMap.h:61 > > > > > #9 0x7f2fddd879be in ObjectCacher::flusher_entry > (this=0x7f2ff80b27a0) > > > at > > > osdc/ObjectCacher.cc:1527 > > > > > #10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry > > > (this= > > out>) at osdc/ObjectCacher.h:374 > > > > > #11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at > > > pthread_create.c:312 > > > > > #12 0x7f30f995547d in clone () at > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > > > > > 2015-10-29 17:38 GMT+02:00 Voloshanenko Igor < > igor.voloshane...@gmail.com > > > > > > > : > > > > > > > Hi Wido and all community. > > > > > > > > > > We catched very idiotic issue on our Cloudstack installation, which > > > > related > > > > to ceph and possible to java-rados lib. > > > > > > > > > > So, we have constantly agent crashed (which cause very big problem > for > > > > us... > > > > ). > > > > > > > > > > When agent crashed - it's crash JVM. And no event in logs at all. 
> > > > > > > > > We enabled crush dump, and after crash we see next picture: > > > > > > > > > > #grep -A1 "Problematic frame" < /hs_err_pid30260.log > > > > > > > > > Problematic frame: > > > > > > > > > C [librbd.so.1.0.0+0x5d681] > > > > > > > > > > # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core > > > > > > > > > (gdb) bt > > > > > > > > > ... > > > > > > > > > #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather > > > > (level=, sub=, this=) > > > > > > > > > at ./log/SubsystemMap.h:62 > > > > > > > > > #8 0x7f30b9
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
It's pain, but not... :( We already used your updated lib in dev env... :( 2015-10-30 10:06 GMT+02:00 Wido den Hollander <w...@42on.com>: > > > On 29-10-15 16:38, Voloshanenko Igor wrote: > > Hi Wido and all community. > > > > We catched very idiotic issue on our Cloudstack installation, which > > related to ceph and possible to java-rados lib. > > > > I think you ran into this one: > https://issues.apache.org/jira/browse/CLOUDSTACK-8879 > > Cleaning up RBD snapshots for volumes didn't go well and caused the JVM > to crash. > > Wido > > > So, we have constantly agent crashed (which cause very big problem for > > us... ). > > > > When agent crashed - it's crash JVM. And no event in logs at all. > > We enabled crush dump, and after crash we see next picture: > > > > #grep -A1 "Problematic frame" < /hs_err_pid30260.log > > Problematic frame: > > C [librbd.so.1.0.0+0x5d681] > > > > # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core > > (gdb) bt > > ... > > #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather > > (level=, sub=, this=) > > at ./log/SubsystemMap.h:62 > > #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather > > (this=, sub=, level=) > > at ./log/SubsystemMap.h:61 > > #9 0x7f30b9d879be in ObjectCacher::flusher_entry > > (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527 > > #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry > > (this=) at osdc/ObjectCacher.h:374 > > > > From ceph code, this part executed when flushing cache object... And we > > don;t understand why. Becasue we have absolutely different race > > condition to reproduce it. > > > > As cloudstack have not good implementation yet of snapshot lifecycle, > > sometime, it's happen, that some volumes already marked as EXPUNGED in > > DB and then cloudstack try to delete bas Volume, before it's try to > > unprotect it. > > > > Sure, unprotecting fail, normal exception returned back (fail because > > snap has childs... ) > > > > 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] > > (Thread-1304:null) Executing: > > /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh > > -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m > > /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11 > > 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] > > (Thread-1304:null) Execution is successful. 
> > 2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] > > (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of > > image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the > image > > 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] > > (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at > > cephmon.anolim.net:6789 <http://cephmon.anolim.net:6789> > > 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] > > (agentRequest-Handler-5:null) Unprotecting snapshot > > cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap > > 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor] > > (agentRequest-Handler-5:null) Failed to delete volume: > > com.cloud.utils.exception.CloudRuntimeException: > > com.ceph.rbd.RbdException: Failed to unprotect snapshot > cloudstack-base-snap > > 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent] > > (agentRequest-Handler-5:null) Seq 4-1921583831: { Ans: , MgmtId: > > 161344838950, via: 4, Ver: v1, Flags: 10, > > > [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: > > com.ceph.rbd.RbdException: Failed to unprotect snapshot > > cloudstack-base-snap","wait":0}}] } > > 2015-10-29 09:02:25,722 DEBUG [cloud.agent.Agent] > > (agentRequest-Handler-2:null) Processing command: > > com.cloud.agent.api.GetHostStatsCommand > > 2015-10-29 09:02:25,722 DEBUG [kvm.resource.LibvirtComputingResource] > > (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n > > 1| awk -F, '/^[%]*[Cc]pu/{$0=$4; gsub(/[^0-9.,]+/,""); print }'); echo > $idle > > 2015-10-29 09:02:26,249 DEBUG [kvm.resource.LibvirtComputingResource] > > (agentRequest-Handler-2:null) Execution is successful. > > 2015-10-29 09:02:26,250 DEBUG [kvm.resource.LibvirtComputingResource] > > (agentRequest-Handler-2:null) Executing: /bin/bash -c > > freeMem=$(free|grep cache:|awk '{print $4}');echo $freeMem > > 2015-10
[ceph-users] Cloudstack agent crashed JVM with exception in librbd
Hi Wido and all community.

We have hit a very strange issue on our CloudStack installation, which is related to Ceph and possibly to the java-rados lib.

The agent crashes constantly (which is a very big problem for us...). When the agent crashes it takes the whole JVM down, and there is no event in the logs at all. We enabled crash dumps, and after a crash we see the following picture:

#grep -A1 "Problematic frame" < /hs_err_pid30260.log
Problematic frame:
C [librbd.so.1.0.0+0x5d681]

# gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
(gdb) bt
...
#7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather (level=, sub=, this=) at ./log/SubsystemMap.h:62
#8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather (this=, sub=, level=) at ./log/SubsystemMap.h:61
#9 0x7f30b9d879be in ObjectCacher::flusher_entry (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527
#10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry (this=) at osdc/ObjectCacher.h:374

From the Ceph code, this part is executed when flushing a cached object... and we don't understand why, because the scenario that reproduces it for us is a completely different race condition.

As CloudStack does not yet have a good implementation of the snapshot lifecycle, it sometimes happens that a volume is already marked as EXPUNGED in the DB, and CloudStack then tries to delete the base volume before it tries to unprotect it. Of course the unprotect fails, and a normal exception is returned (it fails because the snapshot has children...):

2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11
2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Execution is successful.
2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image
2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at cephmon.anolim.net:6789
2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-5:null) Unprotecting snapshot cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap
2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor] (agentRequest-Handler-5:null) Failed to delete volume: com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: Failed to unprotect snapshot cloudstack-base-snap
2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) Seq 4-1921583831: { Ans: , MgmtId: 161344838950, via: 4, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: Failed to unprotect snapshot cloudstack-base-snap","wait":0}}] }
2015-10-29 09:02:25,722 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Processing command: com.cloud.agent.api.GetHostStatsCommand
2015-10-29 09:02:25,722 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n 1| awk -F, '/^[%]*[Cc]pu/{$0=$4; gsub(/[^0-9.,]+/,""); print }'); echo $idle
2015-10-29 09:02:26,249 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Execution is successful.
2015-10-29 09:02:26,250 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /bin/bash -c freeMem=$(free|grep cache:|awk '{print $4}');echo $freeMem
2015-10-29 09:02:26,254 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Execution is successful.

BUT, about 20 minutes later, the agent crashes... If we remove all the children first and create the conditions for CloudStack to actually delete the volume, everything is OK: no agent crash 20 minutes later... We can't see how this action (the volume delete) is connected with the agent crash... We also don't understand why roughly 20 minutes have to pass before the agent crashes...

In the logs before the crash there is only GetVmStats... and then the agent starts up again...

2015-10-29 09:21:55,143 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 4-1343: { Cmd , MgmtId: -1, via: 4, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingCommand":{"newStates":{},"_hostVmStateReport":{"i-881-1117-VM":{"state":"PowerOn","host":"cs2.anolim.net"},"i-7-106-VM":{"state":"PowerOn","host":"cs2.anolim.net"},"i-1683-1984-VM":{"state":"PowerOn","host":"cs2.anolim.net"},"i-11-504-VM":{"state":"PowerOn","host":"cs2.anolim.net"},"i-325-616-VM":{"state":"PowerOn","host":"cs2.anolim.net"},"i-10-52-VM":{"state":"PowerOn","host":"cs2.anolim.net"},"i-941-1237-VM":{"state":"PowerOn","host":"cs2.anolim.net"}},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":4,"wait":0}}] }
2015-10-29 09:21:55,149 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) Received
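For anyone trying to capture the same kind of dump: the core file referenced above (jsvc.25526.0.core) implies a setup roughly like the one below. The jsvc path and the core_pattern format are assumptions, and the ulimit has to be in effect for the process that actually launches the agent, not just an interactive shell:

# allow core dumps and send them to /var/tmp/cores
ulimit -c unlimited
mkdir -p /var/tmp/cores
echo '/var/tmp/cores/%e.%p.%t.core' > /proc/sys/kernel/core_pattern
# after the next crash, open the core against the jsvc binary that runs the agent
gdb /usr/bin/jsvc /var/tmp/cores/jsvc.25526.0.core
(gdb) thread apply all bt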
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
>From all we analyzed - look like - it's this issue http://tracker.ceph.com/issues/13045 PR: https://github.com/ceph/ceph/pull/6097 Can anyone help us to confirm this? :) 2015-10-29 23:13 GMT+02:00 Voloshanenko Igor <igor.voloshane...@gmail.com>: > Additional trace: > > #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > #1 0x7f30f98950d8 in __GI_abort () at abort.c:89 > #2 0x7f30f87b36b5 in __gnu_cxx::__verbose_terminate_handler() () from > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #3 0x7f30f87b1836 in ?? () from > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #4 0x7f30f87b1863 in std::terminate() () from > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #5 0x7f30f87b1aa2 in __cxa_throw () from > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #6 0x7f2fddb50778 in ceph::__ceph_assert_fail > (assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()", > file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", line=line@entry > =62, > func=func@entry=0x7f2fdddedba0 > <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool > ceph::log::SubsystemMap::should_gather(unsigned int, int)") at > common/assert.cc:77 > #7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather > (level=, sub=, this=) > at ./log/SubsystemMap.h:62 > #8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather > (this=, sub=, level=) > at ./log/SubsystemMap.h:61 > #9 0x7f2fddd879be in ObjectCacher::flusher_entry > (this=0x7f2ff80b27a0) at osdc/ObjectCacher.cc:1527 > #10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry > (this=) at osdc/ObjectCacher.h:374 > #11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at > pthread_create.c:312 > #12 0x7f30f995547d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > 2015-10-29 17:38 GMT+02:00 Voloshanenko Igor <igor.voloshane...@gmail.com> > : > >> Hi Wido and all community. >> >> We catched very idiotic issue on our Cloudstack installation, which >> related to ceph and possible to java-rados lib. >> >> So, we have constantly agent crashed (which cause very big problem for >> us... ). >> >> When agent crashed - it's crash JVM. And no event in logs at all. >> We enabled crush dump, and after crash we see next picture: >> >> #grep -A1 "Problematic frame" < /hs_err_pid30260.log >> Problematic frame: >> C [librbd.so.1.0.0+0x5d681] >> >> # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core >> (gdb) bt >> ... >> #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather >> (level=, sub=, this=) >> at ./log/SubsystemMap.h:62 >> #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather >> (this=, sub=, level=) >> at ./log/SubsystemMap.h:61 >> #9 0x7f30b9d879be in ObjectCacher::flusher_entry >> (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527 >> #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry >> (this=) at osdc/ObjectCacher.h:374 >> >> From ceph code, this part executed when flushing cache object... And we >> don;t understand why. Becasue we have absolutely different race condition >> to reproduce it. >> >> As cloudstack have not good implementation yet of snapshot lifecycle, >> sometime, it's happen, that some volumes already marked as EXPUNGED in DB >> and then cloudstack try to delete bas Volume, before it's try to unprotect >> it. >> >> Sure, unprotecting fail, normal exception returned back (fail because >> snap has childs... 
) >> >> 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] >> (Thread-1304:null) Executing: >> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i >> 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m >> /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11 >> 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] >> (Thread-1304:null) Execution is successful. >> 2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] >> (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of >> image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image >> 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] >> (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at >> cephmon.anolim.net:6789 >> 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] >> (agentRequest-Handler-5:null) Unprotecting snapshot >> cloudstack/71b1e2e9-1985-
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
Additional trace: #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x7f30f98950d8 in __GI_abort () at abort.c:89 #2 0x7f30f87b36b5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #3 0x7f30f87b1836 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #4 0x7f30f87b1863 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #5 0x7f30f87b1aa2 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #6 0x7f2fddb50778 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()", file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", line=line@entry =62, func=func@entry=0x7f2fdddedba0 <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool ceph::log::SubsystemMap::should_gather(unsigned int, int)") at common/assert.cc:77 #7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather (level=, sub=, this=) at ./log/SubsystemMap.h:62 #8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather (this=, sub=, level=) at ./log/SubsystemMap.h:61 #9 0x7f2fddd879be in ObjectCacher::flusher_entry (this=0x7f2ff80b27a0) at osdc/ObjectCacher.cc:1527 #10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry (this=) at osdc/ObjectCacher.h:374 #11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at pthread_create.c:312 #12 0x7f30f995547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 2015-10-29 17:38 GMT+02:00 Voloshanenko Igor <igor.voloshane...@gmail.com>: > Hi Wido and all community. > > We catched very idiotic issue on our Cloudstack installation, which > related to ceph and possible to java-rados lib. > > So, we have constantly agent crashed (which cause very big problem for > us... ). > > When agent crashed - it's crash JVM. And no event in logs at all. > We enabled crush dump, and after crash we see next picture: > > #grep -A1 "Problematic frame" < /hs_err_pid30260.log > Problematic frame: > C [librbd.so.1.0.0+0x5d681] > > # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core > (gdb) bt > ... > #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather > (level=, sub=, this=) > at ./log/SubsystemMap.h:62 > #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather > (this=, sub=, level=) > at ./log/SubsystemMap.h:61 > #9 0x7f30b9d879be in ObjectCacher::flusher_entry > (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527 > #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry > (this=) at osdc/ObjectCacher.h:374 > > From ceph code, this part executed when flushing cache object... And we > don;t understand why. Becasue we have absolutely different race condition > to reproduce it. > > As cloudstack have not good implementation yet of snapshot lifecycle, > sometime, it's happen, that some volumes already marked as EXPUNGED in DB > and then cloudstack try to delete bas Volume, before it's try to unprotect > it. > > Sure, unprotecting fail, normal exception returned back (fail because snap > has childs... ) > > 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] > (Thread-1304:null) Executing: > /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i > 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m > /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11 > 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] > (Thread-1304:null) Execution is successful. 
> 2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of > image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image > 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at > cephmon.anolim.net:6789 > 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-5:null) Unprotecting snapshot > cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap > 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor] > (agentRequest-Handler-5:null) Failed to delete volume: > com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: > Failed to unprotect snapshot cloudstack-base-snap > 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent] > (agentRequest-Handler-5:null) Seq 4-1921583831: { Ans: , MgmtId: > 161344838950, via: 4, Ver: v1, Flags: 10, > [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: > com.ceph.rbd.RbdException: Failed to unprotect snapshot > cloudst
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Great! Yes, the behaviour is exactly as I described, so it looks like that's the root cause. :)
Thank you, Sam, Ilya!

2015-08-21 21:08 GMT+03:00 Samuel Just sj...@redhat.com:

I think I found the bug -- need to whiteout the snapset (or decache it) upon evict.
http://tracker.ceph.com/issues/12748
-Sam

On Fri, Aug 21, 2015 at 8:04 AM, Ilya Dryomov idryo...@gmail.com wrote:
On Fri, Aug 21, 2015 at 5:59 PM, Samuel Just sj...@redhat.com wrote:
Odd, did you happen to capture osd logs?

No, but the reproducer is trivial to cut paste.

Thanks,
Ilya
Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700
To be honest, the Samsung 850 PRO is not a 24/7 series drive... it's more of a desktop(+) series, but anyway, the results from these drives are very, very bad in any scenario you would accept in real life... Possibly the 845 PRO is better, but we don't want to experiment any more... So we chose the S3500 240G. Yes, it's cheaper than the S3700 (about 2x), and not as durable for writes, but we think it's better to replace one SSD per year than to pay double the price now.

2015-08-25 12:59 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

And should I mention that in another CEPH installation we had the samsung 850 pro 128GB and all 6 of the ssds died within a 2 month period - they simply disappeared from the system, so not wear-out... Never again will we buy Samsung :)

On Aug 25, 2015 11:57 AM, Andrija Panic andrija.pa...@gmail.com wrote:

First read please:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

We are getting 200 IOPS in comparison to the Intel S3500's 18,000 IOPS - those are sustained performance numbers, meaning avoiding the drive's cache and running for a longer period of time... Also, if checking with FIO you will get better latencies on the Intel S3500 (the model tested in our case) along with 20x better IOPS results... We observed the original issue as high speed at the beginning of e.g. a file transfer inside a VM, which then halts to zero... We moved journals back to HDDs and performance was acceptable... now we are upgrading to the Intel S3500...

Best

any details on that ?

On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic andrija.pa...@gmail.com wrote:

Make sure you test whatever you decide. We just learned this the hard way with the samsung 850 pro, which is total crap, more than you could imagine...

Andrija

On Aug 25, 2015 11:25 AM, Jan Schermer j...@schermer.cz wrote:

I would recommend the Samsung 845 DC PRO (not EVO, not just PRO). Very cheap, better than the Intel 3610 for sure (and I think it beats even the 3700).

Jan

On 25 Aug 2015, at 11:23, Christopher Kunz chrisl...@de-punkt.de wrote:

On 25.08.15 at 11:18, Götz Reinicke - IT Koordinator wrote:

Hi, most of the time I get the recommendation from resellers to go with the Intel S3700 for the journalling.

Check out the Intel S3610. 3 drive writes per day for 5 years. Plus, it is cheaper than the S3700.

Regards,
--ck

--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczew...@efigence.com
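The point of the linked blog post is that a Ceph journal does small synchronous (O_DSYNC) writes at queue depth 1, which is exactly the pattern many consumer SSDs collapse on. A minimal version of that test looks roughly like this; the device name is a placeholder, and the commands write directly to the drive, so only run them on a disk you can wipe:

# raw dsync write test, as in the blog post
dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync
# roughly equivalent fio job: QD=1 synchronous 4k writes
fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
    --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based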
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Exactly as in our case. Ilya, it is the same for images on our side: the headers are opened from the hot tier.

On Friday, 21 August 2015, Ilya Dryomov wrote:

On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just sj...@redhat.com wrote:
What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific to 'forward' mode, either in the client or on the osd. Why did you have it in that mode?

I think I reproduced this on today's master.

Setup, cache mode is writeback:

$ ./ceph osd pool create foo 12 12
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'foo' created
$ ./ceph osd pool create foo-hot 12 12
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'foo-hot' created
$ ./ceph osd tier add foo foo-hot
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
pool 'foo-hot' is now (or already was) a tier of 'foo'
$ ./ceph osd tier cache-mode foo-hot writeback
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set cache-mode for pool 'foo-hot' to writeback
$ ./ceph osd tier set-overlay foo foo-hot
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
overlay for 'foo' is now (or already was) 'foo-hot'

Create an image:

$ ./rbd create --size 10M --image-format 2 foo/bar
$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mkfs.ext4 /mnt/bar
$ sudo umount /mnt

Create a snapshot, take md5sum:

$ ./rbd snap create foo/bar@snap
$ ./rbd export foo/bar /tmp/foo-1
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-1
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-1
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-1
$ md5sum /tmp/snap-1
83f5d244bb65eb19eddce0dc94bf6dda /tmp/snap-1

Set the cache mode to forward and do a flush, hashes don't match - the snap is empty - we bang on the hot tier and don't get redirected to the cold tier, I suspect:

$ ./ceph osd tier cache-mode foo-hot forward
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
set cache-mode for pool 'foo-hot' to forward
$ ./rados -p foo-hot cache-flush-evict-all
rbd_data.100a6b8b4567.0002
rbd_id.bar
rbd_directory
rbd_header.100a6b8b4567
bar.rbd
rbd_data.100a6b8b4567.0001
rbd_data.100a6b8b4567.
$ ./rados -p foo-hot cache-flush-evict-all
$ ./rbd export foo/bar /tmp/foo-2
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-2
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-2
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-2
$ md5sum /tmp/snap-2
f1c9645dbc14efddc7d8a322685f26eb /tmp/snap-2
$ od /tmp/snap-2
000 00 00 00 00 00 00 00 00
*
5000

Disable the cache tier and we are back to normal:

$ ./ceph osd tier remove-overlay foo
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
there is now (or already was) no overlay for 'foo'
$ ./rbd export foo/bar /tmp/foo-3
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-3
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-3
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-3
$ md5sum /tmp/snap-3
83f5d244bb65eb19eddce0dc94bf6dda /tmp/snap-3

I first reproduced it with the kernel client, rbd export was just to take it out of the equation.

Also, Igor sort of raised a question in his second message: if, after setting the cache mode to forward and doing a flush, I open an image (not a snapshot, so may not be related to the above) for write (e.g. with rbd-fuse), I get an rbd header object in the hot pool, even though it's in forward mode:

$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mount /mnt/bar /media
$ sudo umount /media
$ sudo umount /mnt
$ ./rados -p foo-hot ls
rbd_header.100a6b8b4567
$ ./rados -p foo ls | grep rbd_header
rbd_header.100a6b8b4567

It's been a while since I looked into tiering, is that how it's supposed to work? It looks like it happens because rbd_header op replies don't redirect?

Thanks,
Ilya
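For context, the flush/evict commands in Ilya's reproduction are the same ones the Hammer-era docs describe for taking a writeback tier out of service. The usual sequence is roughly the following (pool names are the test pools from the reproduction above), though this very thread shows that snapshot reads can go wrong while the tier sits in forward mode, so treat it with care:

ceph osd tier cache-mode foo-hot forward
rados -p foo-hot cache-flush-evict-all   # repeat until it returns no objects
rados -p foo-hot ls                      # confirm the hot pool is empty
ceph osd tier remove-overlay foo
ceph osd tier remove foo foo-hot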
Re: [ceph-users] Repair inconsistent pgs..
Not yet. I will create. But according to mail lists and Inktank docs - it's expected behaviour when cache enable 2015-08-20 19:56 GMT+03:00 Samuel Just sj...@redhat.com: Is there a bug for this in the tracker? -Sam On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Issue, that in forward mode, fstrim doesn't work proper, and when we take snapshot - data not proper update in cache layer, and client (ceph) see damaged snap.. As headers requested from cache layer. 2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com: What was the issue? -Sam On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Samuel, we turned off cache layer few hours ago... I will post ceph.log in few minutes For snap - we found issue, was connected with cache tier.. 2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com: Ok, you appear to be using a replicated cache tier in front of a replicated base tier. Please scrub both inconsistent pgs and post the ceph.log from before when you started the scrub until after. Also, what command are you using to take snapshots? -Sam On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi Samuel, we try to fix it in trick way. we check all rbd_data chunks from logs (OSD) which are affected, then query rbd info to compare which rbd consist bad rbd_data, after that we mount this rbd as rbd0, create empty rbd, and DD all info from bad volume to new one. But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos try to out OSD which was lead, but after rebalancing this 2 pgs still have 35 scrub errors... ceph osd getmap -o outfile - attached 2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com: Is the number of inconsistent objects growing? Can you attach the whole ceph.log from the 6 hours before and after the snippet you linked above? Are you using cache/tiering? Can you attach the osdmap (ceph osd getmap -o outfile)? -Sam On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: ceph - 0.94.2 Its happen during rebalancing I thought too, that some OSD miss copy, but looks like all miss... So any advice in which direction i need to go 2015-08-18 14:14 GMT+03:00 Gregory Farnum gfar...@redhat.com: From a quick peek it looks like some of the OSDs are missing clones of objects. I'm not sure how that could happen and I'd expect the pg repair to handle that but if it's not there's probably something wrong; what version of Ceph are you running? Sam, is this something you've seen, a new bug, or some kind of config issue? -Greg On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs in inconsistent state... 
root@temp:~# ceph health detail | grep inc HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] From OSD logs, after recovery attempt: root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done dumped all in format plain instructing pg 2.490 on osd.56 to repair instructing pg 2.c4 on osd.56 to repair /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.522f/141//2 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1
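The "query rbd info to compare which rbd consist bad rbd_data" step mentioned above can be scripted: the rbd_data.<id> prefix in the scrub errors matches the block_name_prefix that rbd info prints for format-2 images. A rough sketch follows; the pool name is not given in the thread, so it is a placeholder, and the prefix is one of the ones from the scrub log above:

# find which image owns the rbd_data objects named in the scrub errors
PREFIX=rbd_data.eb486436f2beb
for img in $(rbd -p <pool> ls); do
    rbd -p <pool> info "$img" | grep -q "$PREFIX" && echo "$img"
done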
Re: [ceph-users] Repair inconsistent pgs..
Inktank: https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf Mail-list: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html 2015-08-20 20:06 GMT+03:00 Samuel Just sj...@redhat.com: Which docs? -Sam On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Not yet. I will create. But according to mail lists and Inktank docs - it's expected behaviour when cache enable 2015-08-20 19:56 GMT+03:00 Samuel Just sj...@redhat.com: Is there a bug for this in the tracker? -Sam On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Issue, that in forward mode, fstrim doesn't work proper, and when we take snapshot - data not proper update in cache layer, and client (ceph) see damaged snap.. As headers requested from cache layer. 2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com: What was the issue? -Sam On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Samuel, we turned off cache layer few hours ago... I will post ceph.log in few minutes For snap - we found issue, was connected with cache tier.. 2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com: Ok, you appear to be using a replicated cache tier in front of a replicated base tier. Please scrub both inconsistent pgs and post the ceph.log from before when you started the scrub until after. Also, what command are you using to take snapshots? -Sam On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi Samuel, we try to fix it in trick way. we check all rbd_data chunks from logs (OSD) which are affected, then query rbd info to compare which rbd consist bad rbd_data, after that we mount this rbd as rbd0, create empty rbd, and DD all info from bad volume to new one. But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos try to out OSD which was lead, but after rebalancing this 2 pgs still have 35 scrub errors... ceph osd getmap -o outfile - attached 2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com: Is the number of inconsistent objects growing? Can you attach the whole ceph.log from the 6 hours before and after the snippet you linked above? Are you using cache/tiering? Can you attach the osdmap (ceph osd getmap -o outfile)? -Sam On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: ceph - 0.94.2 Its happen during rebalancing I thought too, that some OSD miss copy, but looks like all miss... So any advice in which direction i need to go 2015-08-18 14:14 GMT+03:00 Gregory Farnum gfar...@redhat.com: From a quick peek it looks like some of the OSDs are missing clones of objects. I'm not sure how that could happen and I'd expect the pg repair to handle that but if it's not there's probably something wrong; what version of Ceph are you running? Sam, is this something you've seen, a new bug, or some kind of config issue? -Greg On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs in inconsistent state... 
root@temp:~# ceph health detail | grep inc HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] From OSD logs, after recovery attempt: root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done dumped all in format plain instructing pg 2.490 on osd.56 to repair instructing pg 2.c4 on osd.56 to repair /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
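For reference, the per-PG scrub Sam asks for above can be kicked off directly against the two PGs reported here; these are the standard commands (run from a node with the admin keyring, then follow ceph.log or ceph -w for the result):

ceph pg deep-scrub 2.490
ceph pg deep-scrub 2.c4
# watch the cluster log for the scrub/repair lines on those PGs
ceph -w | grep -E '2\.490|2\.c4'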
Re: [ceph-users] Repair inconsistent pgs..
Image? One? We start deleting images only to fix thsi (export/import)m before - 1-4 times per day (when VM destroyed)... 2015-08-21 1:44 GMT+03:00 Samuel Just sj...@redhat.com: Interesting. How often do you delete an image? I'm wondering if whatever this is happened when you deleted these two images. -Sam On Thu, Aug 20, 2015 at 3:42 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Sam, i try to understand which rbd contain this chunks.. but no luck. No rbd images block names started with this... Actually, now that I think about it, you probably didn't remove the images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 2015-08-21 1:36 GMT+03:00 Samuel Just sj...@redhat.com: Actually, now that I think about it, you probably didn't remove the images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but other images (that's why the scrub errors went down briefly, those objects -- which were fine -- went away). You might want to export and reimport those two images into new images, but leave the old ones alone until you can clean up the on disk state (image and snapshots) and clear the scrub errors. You probably don't want to read the snapshots for those images either. Everything else is, I think, harmless. The ceph-objectstore-tool feature would probably not be too hard, actually. Each head/snapdir image has two attrs (possibly stored in leveldb -- that's why you want to modify the ceph-objectstore-tool and use its interfaces rather than mucking about with the files directly) '_' and 'snapset' which contain encoded representations of object_info_t and SnapSet (both can be found in src/osd/osd_types.h). SnapSet has a set of clones and related metadata -- you want to read the SnapSet attr off disk and commit a transaction writing out a new version with that clone removed. I'd start by cloning the repo, starting a vstart cluster locally, and reproducing the issue. Next, get familiar with using ceph-objectstore-tool on the osds in that vstart cluster. A good first change would be creating a ceph-objectstore-tool op that lets you dump json for the object_info_t and SnapSet (both types have format() methods which make that easy) on an object to stdout so you can confirm what's actually there. oftc #ceph-devel or the ceph-devel mailing list would be the right place to ask questions. Otherwise, it'll probably get done in the next few weeks. -Sam On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: thank you Sam! I also noticed this linked errors during scrub... Now all lools like reasonable! So we will wait for bug to be closed. do you need any help on it? I mean i can help with coding/testing/etc... 2015-08-21 0:52 GMT+03:00 Samuel Just sj...@redhat.com: Ah, this is kind of silly. I think you don't have 37 errors, but 2 errors. pg 2.490 object 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing snap 141. 
If you look at the objects after that in the log: 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster [ERR] repair 2.490 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster [ERR] repair 2.490 ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2 The clone from the second line matches the head object from the previous line, and they have the same clone id. I *think* that the first error is real, and the subsequent ones are just scrub being dumb. Same deal with pg 2.c4. I just opened http://tracker.ceph.com/issues/12738. The original problem is that 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both missing a clone. Not sure how that happened, my money is on a cache/tiering evict racing with a snap trim. If you have any logging or relevant information from when that happened, you should open a bug. The 'snapdir' in the two object names indicates that the head object has actually been deleted (which makes sense if you moved the image to a new image and deleted the old one) and is only being kept around since there are live snapshots. I suggest you leave the snapshots for those images alone for the time being -- removing them might cause the osd to crash trying to clean up the wierd on disk state. Other than the leaked space from those two image snapshots and the annoying spurious scrub errors, I think no actual
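A rough way to inspect the '_' and 'snapset' attributes Sam describes, before a dedicated ceph-objectstore-tool op exists: this is only a sketch, assuming the Hammer-era object-command syntax (list-attrs / get-attr) and that ceph-dencoder has these types registered; the OSD id and object name are placeholders, not taken from this cluster.

stop ceph-osd id=56
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56 --journal-path /var/lib/ceph/osd/ceph-56/journal '<object-name>' list-attrs
# dump the raw snapset attr and decode it to JSON
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56 --journal-path /var/lib/ceph/osd/ceph-56/journal '<object-name>' get-attr snapset > snapset.raw
ceph-dencoder type SnapSet import snapset.raw decode dump_json
start ceph-osd id=56

For experimenting with a new op, a throwaway local cluster as Sam suggests would be something like MON=1 OSD=3 ./vstart.sh -n -x from a source checkout.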
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want to map it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap /dev/rbd1 root@test:~# parted /dev/rbd1 print Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Error: /dev/rbd1: unrecognised disk label Even md5 different... root@ix-s2:~# md5sum /dev/rbd0 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 root@ix-s2:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 Ok, now i protect snap and create clone... but same thing... md5 for clone same as for snap,, root@test:~# rbd unmap /dev/rbd1 root@test:~# rbd snap protect cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap root@test:~# rbd clone cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap cold-storage/test-image root@test:~# rbd map cold-storage/test-image /dev/rbd1 root@test:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 but it's broken... 
root@test:~# parted /dev/rbd1 print Error: /dev/rbd1: unrecognised disk label = tech details: root@test:~# ceph -v ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) We have 2 inconstistent pgs, but all images not placed on this pgs... root@test:~# ceph health detail HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] 18 scrub errors root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5 osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) - up ([37,15,14], p37) acting ([37,15,14], p37) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@snap osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3 (2.4a3) - up ([12,23,17], p12) acting ([12,23,17], p12) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4
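One caveat about the 'ceph osd map' checks above: they map the image name (i.e. the rbd_header object) and literal '@snap' strings, not the rbd_data chunk objects that the scrub errors actually name, and those chunks hash to their own PGs. A hedged sketch of the check that would match the scrub errors, with placeholders for the prefix and chunk number taken from the 'expected clone' log lines:

# <prefix> and <chunk> are placeholders for block_name_prefix and chunk offset from the OSD log
ceph osd map cold-storage rbd_data.<prefix>.<chunk>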
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
We switch to forward mode as step to switch cache layer off. Right now we have samsung 850 pro in cache layer (10 ssd, 2 per nodes) and they show 2MB for 4K blocks... 250 IOPS... intead of 18-20K for intel S3500 240G which we choose as replacement.. So with such good disks - cache layer - very big bottleneck for us... 2015-08-21 2:02 GMT+03:00 Samuel Just sj...@redhat.com: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com : root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want to map it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap /dev/rbd1 root@test:~# parted /dev/rbd1 print Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Error: /dev/rbd1: unrecognised disk label Even md5 different... root@ix-s2:~# md5sum /dev/rbd0 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 root@ix-s2:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 Ok, now i protect snap and create clone... but same thing... 
md5 for clone same as for snap,, root@test:~# rbd unmap /dev/rbd1 root@test:~# rbd snap protect cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap root@test:~# rbd clone cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap cold-storage/test-image root@test:~# rbd map cold-storage/test-image /dev/rbd1 root@test:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 but it's broken... root@test:~# parted /dev/rbd1 print Error: /dev/rbd1: unrecognised disk label = tech details: root@test:~# ceph -v ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) We have 2 inconstistent pgs, but all images not placed on this pgs... root@test:~# ceph health detail HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] 18 scrub errors root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5 osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) - up ([37,15,14], p37) acting ([37,15,14], p37) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@snap osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3 (2.4a3) - up ([12,23,17], p12) acting ([12,23,17], p12) root@test:~# ceph osd
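For context, moving a cache tier out of the I/O path and finally removing it is normally a sequence along these lines; a sketch with illustrative pool names ('hot-storage' as cache, 'cold-storage' as base), not necessarily the exact commands used on this cluster:

ceph osd tier cache-mode hot-storage forward
rados -p hot-storage cache-flush-evict-all      # drain remaining objects to the base pool
ceph osd tier remove-overlay cold-storage       # stop redirecting client I/O
ceph osd tier remove cold-storage hot-storage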
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com: But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: WE haven't set values for max_bytes / max_objects.. and all data initially writes only to cache layer and not flushed at all to cold layer. Then we received notification from monitoring that we collect about 750GB in hot pool ) So i changed values for max_object_bytes to be 0,9 of disk size... And then evicting/flushing started... And issue with snapshots arrived 2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: No, when we start draining cache - bad pgs was in place... We have big rebalance (disk by disk - to change journal side on both hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2 pgs inconsistent... In writeback - yes, looks like snapshot works good. but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. 
We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want to map it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap /dev/rbd1 root@test:~# parted /dev/rbd1 print Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Warning: Unable to open /dev/rbd1 read-write (Read-only file system
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
As i we use journal collocation for journal now (because we want to utilize cache layer ((( ) i use ceph-disk to create new OSD (changed journal size on ceph.conf). I don;t prefer manual work)) So create very simple script to update journal size 2015-08-21 2:25 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: Exactly пятница, 21 августа 2015 г. пользователь Samuel Just написал: And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right ( but also was rebalancing cycle 2 day before pgs corrupted) 2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com: Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com wrote: Yeah, I'm trying to confirm that the issues did happen in writeback mode. -Sam On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com: But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: WE haven't set values for max_bytes / max_objects.. and all data initially writes only to cache layer and not flushed at all to cold layer. Then we received notification from monitoring that we collect about 750GB in hot pool ) So i changed values for max_object_bytes to be 0,9 of disk size... And then evicting/flushing started... And issue with snapshots arrived 2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: No, when we start draining cache - bad pgs was in place... We have big rebalance (disk by disk - to change journal side on both hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2 pgs inconsistent... In writeback - yes, looks like snapshot works good. but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. 
sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which
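The 'changed journal size on ceph.conf' step mentioned above presumably amounts to something like the snippet below before re-running ceph-disk, since ceph-disk prepare sizes the collocated journal partition from this option; the value is illustrative only:

[osd]
# journal partition size in MB, picked up by ceph-disk prepare
osd journal size = 10240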
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Will do, Sam! thank in advance for you help! 2015-08-21 2:28 GMT+03:00 Samuel Just sj...@redhat.com: Ok, create a ticket with a timeline and all of this information, I'll try to look into it more tomorrow. -Sam On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Exactly пятница, 21 августа 2015 г. пользователь Samuel Just написал: And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right ( but also was rebalancing cycle 2 day before pgs corrupted) 2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com: Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com wrote: Yeah, I'm trying to confirm that the issues did happen in writeback mode. -Sam On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com: But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: WE haven't set values for max_bytes / max_objects.. and all data initially writes only to cache layer and not flushed at all to cold layer. Then we received notification from monitoring that we collect about 750GB in hot pool ) So i changed values for max_object_bytes to be 0,9 of disk size... And then evicting/flushing started... And issue with snapshots arrived 2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: No, when we start draining cache - bad pgs was in place... We have big rebalance (disk by disk - to change journal side on both hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2 pgs inconsistent... In writeback - yes, looks like snapshot works good. but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com : Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 
2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed
Re: [ceph-users] Repair inconsistent pgs..
Hi Samuel, we try to fix it in trick way. we check all rbd_data chunks from logs (OSD) which are affected, then query rbd info to compare which rbd consist bad rbd_data, after that we mount this rbd as rbd0, create empty rbd, and DD all info from bad volume to new one. But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos try to out OSD which was lead, but after rebalancing this 2 pgs still have 35 scrub errors... ceph osd getmap -o outfile - attached 2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com: Is the number of inconsistent objects growing? Can you attach the whole ceph.log from the 6 hours before and after the snippet you linked above? Are you using cache/tiering? Can you attach the osdmap (ceph osd getmap -o outfile)? -Sam On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: ceph - 0.94.2 Its happen during rebalancing I thought too, that some OSD miss copy, but looks like all miss... So any advice in which direction i need to go 2015-08-18 14:14 GMT+03:00 Gregory Farnum gfar...@redhat.com: From a quick peek it looks like some of the OSDs are missing clones of objects. I'm not sure how that could happen and I'd expect the pg repair to handle that but if it's not there's probably something wrong; what version of Ceph are you running? Sam, is this something you've seen, a new bug, or some kind of config issue? -Greg On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs in inconsistent state... root@temp:~# ceph health detail | grep inc HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] From OSD logs, after recovery attempt: root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done dumped all in format plain instructing pg 2.490 on osd.56 to repair instructing pg 2.c4 on osd.56 to repair /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.522f/141//2 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.032e/141//2 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0807/141//2 /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 
28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2 /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.01d9/141//2 /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors So, how can I solve the "expected clone" situation by hand? Thanks in advance! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
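The "check all rbd_data chunks from logs, then query rbd info" step from the message above can be scripted roughly as below; a sketch that assumes the affected prefixes are pulled from the 'expected clone' lines and compared against each image's block_name_prefix (pool name and log path as in the messages above):

# collect the rbd_data prefixes named in the scrub errors
grep 'expected clone' /var/log/ceph/ceph-osd.56.log | grep -o 'rbd_data\.[0-9a-f]*' | sort -u > bad_prefixes
# find which images own those prefixes
for img in $(rbd ls cold-storage); do
    prefix=$(rbd info cold-storage/${img} | awk '/block_name_prefix/ {print $2}')
    grep -q "${prefix}" bad_prefixes && echo "${img} -> ${prefix}"
done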
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
WE haven't set values for max_bytes / max_objects.. and all data initially writes only to cache layer and not flushed at all to cold layer. Then we received notification from monitoring that we collect about 750GB in hot pool ) So i changed values for max_object_bytes to be 0,9 of disk size... And then evicting/flushing started... And issue with snapshots arrived 2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: No, when we start draining cache - bad pgs was in place... We have big rebalance (disk by disk - to change journal side on both hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2 pgs inconsistent... In writeback - yes, looks like snapshot works good. but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. 
We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want to map it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap /dev/rbd1 root@test:~# parted /dev/rbd1 print Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Error: /dev/rbd1: unrecognised disk label Even md5 different... root@ix-s2:~# md5sum /dev/rbd0 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 root@ix-s2:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 Ok, now i protect snap and create clone... but same thing... md5 for clone same as for snap,, root@test
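On the max_bytes / max_objects point above: with no targets set, the tiering agent has nothing to flush or evict against, so the hot pool simply fills up. The usual knobs look like this; pool name and values are illustrative, not a recommendation for this cluster:

ceph osd pool set hot-storage target_max_bytes 750000000000
ceph osd pool set hot-storage cache_target_dirty_ratio 0.4   # start flushing dirty objects at 40% of target
ceph osd pool set hot-storage cache_target_full_ratio 0.8    # start evicting clean objects at 80% of target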
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
I already kill cache layer, but will try to reproduce on lab 2015-08-21 1:58 GMT+03:00 Samuel Just sj...@redhat.com: Hmm, that might actually be client side. Can you attempt to reproduce with rbd-fuse (different client side implementation from the kernel)? -Sam On Thu, Aug 20, 2015 at 3:56 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want to map it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap /dev/rbd1 root@test:~# parted /dev/rbd1 print Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Error: /dev/rbd1: unrecognised disk label Even md5 different... root@ix-s2:~# md5sum /dev/rbd0 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 root@ix-s2:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 Ok, now i protect snap and create clone... but same thing... md5 for clone same as for snap,, root@test:~# rbd unmap /dev/rbd1 root@test:~# rbd snap protect cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap root@test:~# rbd clone cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap cold-storage/test-image root@test:~# rbd map cold-storage/test-image /dev/rbd1 root@test:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 but it's broken... root@test:~# parted /dev/rbd1 print Error: /dev/rbd1: unrecognised disk label = tech details: root@test:~# ceph -v ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) We have 2 inconstistent pgs, but all images not placed on this pgs... 
root@test:~# ceph health detail HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] 18 scrub errors root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5 osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) - up ([37,15,14], p37) acting ([37,15,14], p37) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@snap osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3 (2.4a3) - up ([12,23,17], p12) acting ([12,23,17], p12) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' - pg 2.9519c2a9 (2.2a9) - up ([12,44,23], p12) acting ([12,44,23], p12) Also we use cache layer, which in current moment - in forward mode... Can you please help me with this.. As my brain stop to understand what is going on... Thank in advance! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
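For the rbd-fuse reproduction Sam suggests, a minimal sketch (mount point is arbitrary; pool and image names taken from the example above). rbd-fuse exposes each image in the pool as a file, so the md5 comparison can be repeated without the kernel RBD client:

mkdir -p /mnt/rbdfuse
rbd-fuse -p cold-storage /mnt/rbdfuse
md5sum /mnt/rbdfuse/0e23c701-401d-4465-b9b4-c02939d57bb5
fusermount -u /mnt/rbdfuse    # unmount when finished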
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
I mean in forward mode - it;s permanent problem - snapshots not working. And for writeback mode after we change max_bytes/object values, it;s around 30 by 70... 70% of time it;s works... 30% - not. Looks like for old images - snapshots works fine (images which already exists before we change values). For any new images - no 2015-08-21 2:21 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com: But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: WE haven't set values for max_bytes / max_objects.. and all data initially writes only to cache layer and not flushed at all to cold layer. Then we received notification from monitoring that we collect about 750GB in hot pool ) So i changed values for max_object_bytes to be 0,9 of disk size... And then evicting/flushing started... And issue with snapshots arrived 2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: No, when we start draining cache - bad pgs was in place... We have big rebalance (disk by disk - to change journal side on both hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2 pgs inconsistent... In writeback - yes, looks like snapshot works good. but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. 
On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Our initial values for journal sizes was enough, but flush time was 5 secs, so we increase journal side to fit flush timeframe min|max for 29/30 seconds. I mean filestore max sync interval = 30 filestore min sync interval = 29 when said flush time 2015-08-21 2:16 GMT+03:00 Samuel Just sj...@redhat.com: Also, what do you mean by change journal side? -Sam On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just sj...@redhat.com wrote: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: No, when we start draining cache - bad pgs was in place... We have big rebalance (disk by disk - to change journal side on both hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2 pgs inconsistent... In writeback - yes, looks like snapshot works good. but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. 
We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want to map it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap /dev/rbd1 root@test:~# parted /dev/rbd1 print Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Error: /dev/rbd1: unrecognised disk label Even md5 different... root@ix-s2:~# md5sum /dev/rbd0 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 root@ix-s2:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 Ok, now i protect snap and create clone... but same thing... md5 for clone same as for snap
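The flush-interval change described above corresponds to ceph.conf settings like the following (values as stated in the message); whether pushing the min/max sync interval to 29/30 seconds is wise is a separate question, since it batches a lot of dirty data per sync:

[osd]
filestore min sync interval = 29
filestore max sync interval = 30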
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Yes, will do. What we see. When cache tier in forward mod, if i did rbd snap create - it's use rbd_header not from cold tier, but from hot-tier, butm this 2 headers not synced And can;t be evicted from hot-storage, as it;s locked by KVM (Qemu). If i kill lock, evict header - all start to work.. But it's unacceptable for production... To kill lock during running VM ((( 2015-08-21 1:51 GMT+03:00 Samuel Just sj...@redhat.com: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want to map it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap /dev/rbd1 root@test:~# parted /dev/rbd1 print Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Error: /dev/rbd1: unrecognised disk label Even md5 different... root@ix-s2:~# md5sum /dev/rbd0 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 root@ix-s2:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 Ok, now i protect snap and create clone... but same thing... md5 for clone same as for snap,, root@test:~# rbd unmap /dev/rbd1 root@test:~# rbd snap protect cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap root@test:~# rbd clone cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap cold-storage/test-image root@test:~# rbd map cold-storage/test-image /dev/rbd1 root@test:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 but it's broken... root@test:~# parted /dev/rbd1 print Error: /dev/rbd1: unrecognised disk label = tech details: root@test:~# ceph -v ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) We have 2 inconstistent pgs, but all images not placed on this pgs... 
root@test:~# ceph health detail HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] 18 scrub errors root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5 osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) - up ([37,15,14], p37) acting ([37,15,14], p37) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@snap osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3 (2.4a3) - up ([12,23,17], p12) acting ([12,23,17], p12) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' - pg 2.9519c2a9 (2.2a9) - up ([12,44,23], p12) acting ([12,44,23], p12) Also we use cache layer, which in current moment - in forward mode... Can you please help me with this.. As my brain stop to understand what is going on... Thank in advance! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
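The "kill lock, evict header" workaround described above maps to commands roughly like these; a sketch only, with the lock id, locker and image id as placeholders, and, as the message says, not something to do under a running VM:

rbd lock list cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
rbd lock remove cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 <lock-id> <locker>
# then try to flush/evict the header object from the hot tier
rados -p hot-storage cache-flush rbd_header.<image-id>
rados -p hot-storage cache-evict rbd_header.<image-id>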
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Attachment blocked, so posting as text...
root@zzz:~# cat update_osd.sh
#!/bin/bash
ID=$1
echo "Process OSD# ${ID}"

# device hosting this OSD, e.g. /dev/sdb1; ${DEV::-1} strips the partition number
DEV=`mount | grep ceph-${ID} | cut -d " " -f 1`
echo "OSD# ${ID} hosted on ${DEV::-1}"

TYPE_RAW=`smartctl -a ${DEV} | grep Rota | cut -d " " -f 6`
if [ "${TYPE_RAW}" == "Solid" ]
then
    TYPE=ssd
elif [ "${TYPE_RAW}" == "7200" ]
then
    TYPE=platter
fi
echo "OSD Type = ${TYPE}"

HOST=`hostname`
echo "Current node hostname: ${HOST}"

echo "Set noout option for CEPH cluster"
ceph osd set noout

echo "Marked OSD# ${ID} out"
ceph osd out ${ID}
echo "Remove OSD# ${ID} from CRUSHMAP"
ceph osd crush remove osd.${ID}
echo "Delete auth for OSD# ${ID}"
ceph auth del osd.${ID}
echo "Stop OSD# ${ID}"
stop ceph-osd id=${ID}
echo "Remove OSD# ${ID} from cluster"
ceph osd rm ${ID}
echo "Unmount OSD# ${ID}"
umount ${DEV}

echo "ZAP ${DEV::-1}"
ceph-disk zap ${DEV::-1}
echo "Create new OSD with ${DEV::-1}"
ceph-disk-prepare ${DEV::-1}
echo "Activate new OSD"
ceph-disk-activate ${DEV}

echo "Dump current CRUSHMAP"
ceph osd getcrushmap -o cm.old
echo "Decompile CRUSHMAP"
crushtool -d cm.old -o cm
echo "Place new OSD in proper place"
sed -i "s/device${ID}/osd.${ID}/" cm
LINE=`cat -n cm | sed -n "/${HOST}-${TYPE} {/,/}/p" | tail -n 1 | awk '{print $1}'`
sed -i "${LINE}i item osd.${ID} weight 1.000" cm
echo "Modify ${HOST} weight into CRUSHMAP"
sed -i "s/item ${HOST}-${TYPE} weight 9.000/item ${HOST}-${TYPE} weight 1.000/" cm
echo "Compile new CRUSHMAP"
crushtool -c cm -o cm.new
echo "Inject new CRUSHMAP"
ceph osd setcrushmap -i cm.new

#echo "Clean..."
#rm -rf cm cm.new

echo "Unset noout option for CEPH cluster"
ceph osd unset noout
echo "OSD recreated... Waiting for rebalancing..."

2015-08-21 2:37 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: As we use journal collocation now (because we want to utilize the cache layer ((( ) i use ceph-disk to create the new OSD (changed journal size in ceph.conf). I don't prefer manual work)) So I created a very simple script to update the journal size 2015-08-21 2:25 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: Exactly On Friday, 21 August 2015, Samuel Just wrote: And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right ( but also was rebalancing cycle 2 day before pgs corrupted) 2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com: Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com wrote: Yeah, I'm trying to confirm that the issues did happen in writeback mode. -Sam On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com: But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We haven't set values for max_bytes / max_objects.. and all data initially writes only to cache layer and not flushed at all to cold layer. Then we received notification from monitoring that we collect about 750GB in hot pool ) So i changed values for max_object_bytes to be 0,9 of disk size... And then evicting/flushing started... And issue with snapshots arrived 2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 
-Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: No, when we start draining cache - bad pgs was in place... We have big rebalance (disk by disk - to change journal side on both hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2 pgs inconsistent... In writeback - yes, looks like snapshot works good. but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com
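A minimal driver for the update_osd.sh script above, assuming the OSDs are recreated strictly one at a time and the cluster is allowed to settle between disks (the OSD id list and the polling interval are only examples):

    for id in 12 13 14; do
        ./update_osd.sh ${id}
        # wait for backfill/recovery to finish before touching the next disk
        until ceph health | grep -q HEALTH_OK; do
            sleep 60
        done
    done

Waiting for HEALTH_OK between disks keeps only one OSD's worth of data in flight at any time, which seems relevant here since the thread ties the scrub errors to a large rebalance.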
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want to map it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap /dev/rbd1 root@test:~# parted /dev/rbd1 print Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Error: /dev/rbd1: unrecognised disk label Even md5 different... root@ix-s2:~# md5sum /dev/rbd0 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 root@ix-s2:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 Ok, now i protect snap and create clone... but same thing... md5 for clone same as for snap,, root@test:~# rbd unmap /dev/rbd1 root@test:~# rbd snap protect cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap root@test:~# rbd clone cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap cold-storage/test-image root@test:~# rbd map cold-storage/test-image /dev/rbd1 root@test:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 but it's broken... root@test:~# parted /dev/rbd1 print Error: /dev/rbd1: unrecognised disk label = tech details: root@test:~# ceph -v ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) We have 2 inconstistent pgs, but all images not placed on this pgs... 
root@test:~# ceph health detail HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] 18 scrub errors root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5 osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) - up ([37,15,14], p37) acting ([37,15,14], p37) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@snap osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3 (2.4a3) - up ([12,23,17], p12) acting ([12,23,17], p12) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' - pg 2.9519c2a9 (2.2a9) - up ([12,44,23], p12) acting ([12,44,23], p12) Also we use cache layer, which in current moment - in forward mode... Can you please help me with this.. As my brain stop to understand what is going on... Thank in advance! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
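One quick check that would isolate the cache tier in the situation above: flush and evict everything from the hot pool, then redo the md5 comparison, so the snapshot reads cannot be served from stale cache objects. A rough sketch, assuming the cache pool is named hot-storage (the pool name and rbd device numbers are placeholders):

    rados -p hot-storage cache-flush-evict-all
    rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
    md5sum /dev/rbd1    # compare against the md5 of the base image again

If the checksums agree after the flush, the stale data lives in the cache tier rather than in the base pool.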
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Right ( but also was rebalancing cycle 2 day before pgs corrupted) 2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com: Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com wrote: Yeah, I'm trying to confirm that the issues did happen in writeback mode. -Sam On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com: But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: WE haven't set values for max_bytes / max_objects.. and all data initially writes only to cache layer and not flushed at all to cold layer. Then we received notification from monitoring that we collect about 750GB in hot pool ) So i changed values for max_object_bytes to be 0,9 of disk size... And then evicting/flushing started... And issue with snapshots arrived 2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: No, when we start draining cache - bad pgs was in place... We have big rebalance (disk by disk - to change journal side on both hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2 pgs inconsistent... In writeback - yes, looks like snapshot works good. but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? 
-Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File
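For reference, the eviction thresholds that were left unset in the exchange above are ordinary per-pool settings on the cache pool; a sketch with placeholder values (the pool name and numbers are examples only, sized roughly for the ~750GB hot pool mentioned):

    ceph osd pool set hot-storage target_max_bytes 750000000000
    ceph osd pool set hot-storage target_max_objects 1000000
    ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
    ceph osd pool set hot-storage cache_target_full_ratio 0.8

With no target_max_bytes/target_max_objects set, the tiering agent has nothing to flush or evict against, which matches the behaviour described above (nothing reached the cold tier until the limits were added).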
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
Exactly пятница, 21 августа 2015 г. пользователь Samuel Just написал: And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor igor.voloshane...@gmail.com javascript:; wrote: Right ( but also was rebalancing cycle 2 day before pgs corrupted) 2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com javascript:;: Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com javascript:; wrote: Yeah, I'm trying to confirm that the issues did happen in writeback mode. -Sam On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor igor.voloshane...@gmail.com javascript:; wrote: Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com javascript:;: But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com javascript:; wrote: WE haven't set values for max_bytes / max_objects.. and all data initially writes only to cache layer and not flushed at all to cold layer. Then we received notification from monitoring that we collect about 750GB in hot pool ) So i changed values for max_object_bytes to be 0,9 of disk size... And then evicting/flushing started... And issue with snapshots arrived 2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com javascript:;: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com javascript:; wrote: No, when we start draining cache - bad pgs was in place... We have big rebalance (disk by disk - to change journal side on both hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2 pgs inconsistent... In writeback - yes, looks like snapshot works good. but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com javascript:;: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com javascript:; wrote: Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com javascript:;: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com javascript:; wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com javascript:; wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 
2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com javascript:;: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com javascript:;: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com javascript:; wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com javascript:; wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want to map it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap /dev/rbd1 root@test:~# parted /dev/rbd1 print Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Error: /dev/rbd1: unrecognised disk label Even md5 different... root@ix-s2:~# md5sum /dev/rbd0 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 root@ix-s2:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 Ok, now i protect snap and create clone... but same thing... md5 for clone same as for snap,, root@test:~# rbd unmap /dev/rbd1 root@test:~# rbd snap protect cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap root@test:~# rbd clone cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap cold-storage/test-image root@test:~# rbd map cold-storage/test-image /dev/rbd1 root@test:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 but it's broken... root@test:~# parted /dev/rbd1 print Error: /dev/rbd1: unrecognised disk label = tech details: root@test:~# ceph -v ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) We have 2 inconstistent pgs, but all images not placed on this pgs... 
root@test:~# ceph health detail HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] 18 scrub errors root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5 osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) - up ([37,15,14], p37) acting ([37,15,14], p37) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@snap osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3 (2.4a3) - up ([12,23,17], p12) acting ([12,23,17], p12) root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' - pg 2.9519c2a9 (2.2a9) - up ([12,44,23], p12) acting ([12,44,23], p12) Also we use cache layer, which in current moment - in forward mode... Can you please help me with this.. As my brain stop to understand what is going on... Thank in advance! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
No, when we start draining cache - bad pgs was in place... We have big rebalance (disk by disk - to change journal side on both hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors and 2 pgs inconsistent... In writeback - yes, looks like snapshot works good. but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific 'forward' mode either in the client or on the osd. Why did you have it in that mode? -Sam On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: We used 4.x branch, as we have very good Samsung 850 pro in production, and they don;t support ncq_trim... And 4,x first branch which include exceptions for this in libsata.c. sure we can backport this 1 line to 3.x branch, but we prefer no to go deeper if packege for new kernel exist. 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesnt support snapshooting per docs...for sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with unexplained situation... All snapshot inside ceph broken... So, as example, we have VM template, as rbd inside ceph. We can map it and mount to check that all ok with it root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0 root@test:~# parted /dev/rbd0 print Model: Unknown (unknown) Disk /dev/rbd0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End SizeType File system Flags 1 1049kB 525MB 524MB primary ext4 boot 2 525MB 10.7GB 10.2GB primary lvm Than i want to create snap, so i do: root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap And now i want to map it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap /dev/rbd1 root@test:~# parted /dev/rbd1 print Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only. Error: /dev/rbd1: unrecognised disk label Even md5 different... 
root@ix-s2:~# md5sum /dev/rbd0 9a47797a07fee3a3d71316e22891d752 /dev/rbd0 root@ix-s2:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 Ok, now i protect snap and create clone... but same thing... md5 for clone same as for snap,, root@test:~# rbd unmap /dev/rbd1 root@test:~# rbd snap protect cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap root@test:~# rbd clone cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap cold-storage/test-image root@test:~# rbd map cold-storage/test-image /dev/rbd1 root@test:~# md5sum /dev/rbd1 e450f50b9ffa0073fae940ee858a43ce /dev/rbd1 but it's broken... root@test:~# parted /dev/rbd1 print Error: /dev/rbd1: unrecognised disk label = tech details: root@test:~# ceph -v ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3) We have 2 inconstistent pgs, but all images not placed on this pgs... root@test:~# ceph health detail HEALTH_ERR 2 pgs inconsistent; 18 scrub
Re: [ceph-users] Repair inconsistent pgs..
thank you Sam! I also noticed this linked errors during scrub... Now all lools like reasonable! So we will wait for bug to be closed. do you need any help on it? I mean i can help with coding/testing/etc... 2015-08-21 0:52 GMT+03:00 Samuel Just sj...@redhat.com: Ah, this is kind of silly. I think you don't have 37 errors, but 2 errors. pg 2.490 object 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing snap 141. If you look at the objects after that in the log: 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster [ERR] repair 2.490 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster [ERR] repair 2.490 ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2 The clone from the second line matches the head object from the previous line, and they have the same clone id. I *think* that the first error is real, and the subsequent ones are just scrub being dumb. Same deal with pg 2.c4. I just opened http://tracker.ceph.com/issues/12738. The original problem is that 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both missing a clone. Not sure how that happened, my money is on a cache/tiering evict racing with a snap trim. If you have any logging or relevant information from when that happened, you should open a bug. The 'snapdir' in the two object names indicates that the head object has actually been deleted (which makes sense if you moved the image to a new image and deleted the old one) and is only being kept around since there are live snapshots. I suggest you leave the snapshots for those images alone for the time being -- removing them might cause the osd to crash trying to clean up the wierd on disk state. Other than the leaked space from those two image snapshots and the annoying spurious scrub errors, I think no actual corruption is going on though. I created a tracker ticket for a feature that would let ceph-objectstore-tool remove the spurious clone from the head/snapdir metadata. Am I right that you haven't actually seen any osd crashes or user visible corruption (except possibly on snapshots of those two images)? -Sam On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Inktank: https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf Mail-list: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html 2015-08-20 20:06 GMT+03:00 Samuel Just sj...@redhat.com: Which docs? -Sam On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Not yet. I will create. But according to mail lists and Inktank docs - it's expected behaviour when cache enable 2015-08-20 19:56 GMT+03:00 Samuel Just sj...@redhat.com: Is there a bug for this in the tracker? -Sam On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Issue, that in forward mode, fstrim doesn't work proper, and when we take snapshot - data not proper update in cache layer, and client (ceph) see damaged snap.. As headers requested from cache layer. 2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com: What was the issue? -Sam On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Samuel, we turned off cache layer few hours ago... 
I will post ceph.log in few minutes For snap - we found issue, was connected with cache tier.. 2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com: Ok, you appear to be using a replicated cache tier in front of a replicated base tier. Please scrub both inconsistent pgs and post the ceph.log from before when you started the scrub until after. Also, what command are you using to take snapshots? -Sam On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi Samuel, we try to fix it in trick way. we check all rbd_data chunks from logs (OSD) which are affected, then query rbd info to compare which rbd consist bad rbd_data, after that we mount this rbd as rbd0, create empty rbd, and DD all info from bad volume to new one. But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos try to out OSD which was lead, but after rebalancing this 2 pgs still have 35 scrub errors... ceph osd getmap -o outfile - attached 2015-08-18 18:48 GMT+03:00 Samuel Just sj
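A sketch of the rbd_data-to-image lookup described above: all of an image's data objects share the block_name_prefix printed by rbd info, so the affected image can be found by scanning the pool (the pool name is taken from this thread; the prefix is one of the two reported by scrub):

    PREFIX=e846e25a70bf7
    for img in $(rbd ls cold-storage); do
        rbd info cold-storage/${img} | grep -q "block_name_prefix: rbd_data.${PREFIX}" && echo "${img}"
    done

If nothing matches, the image that owned the prefix has most likely already been deleted and only its snapshots keep the snapdir objects alive, which is what is concluded elsewhere in this thread.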
Re: [ceph-users] Repair inconsistent pgs..
Samuel, we turned off cache layer few hours ago... I will post ceph.log in few minutes For snap - we found issue, was connected with cache tier.. 2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com: Ok, you appear to be using a replicated cache tier in front of a replicated base tier. Please scrub both inconsistent pgs and post the ceph.log from before when you started the scrub until after. Also, what command are you using to take snapshots? -Sam On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi Samuel, we try to fix it in trick way. we check all rbd_data chunks from logs (OSD) which are affected, then query rbd info to compare which rbd consist bad rbd_data, after that we mount this rbd as rbd0, create empty rbd, and DD all info from bad volume to new one. But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos try to out OSD which was lead, but after rebalancing this 2 pgs still have 35 scrub errors... ceph osd getmap -o outfile - attached 2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com: Is the number of inconsistent objects growing? Can you attach the whole ceph.log from the 6 hours before and after the snippet you linked above? Are you using cache/tiering? Can you attach the osdmap (ceph osd getmap -o outfile)? -Sam On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: ceph - 0.94.2 Its happen during rebalancing I thought too, that some OSD miss copy, but looks like all miss... So any advice in which direction i need to go 2015-08-18 14:14 GMT+03:00 Gregory Farnum gfar...@redhat.com: From a quick peek it looks like some of the OSDs are missing clones of objects. I'm not sure how that could happen and I'd expect the pg repair to handle that but if it's not there's probably something wrong; what version of Ceph are you running? Sam, is this something you've seen, a new bug, or some kind of config issue? -Greg On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs in inconsistent state... 
root@temp:~# ceph health detail | grep inc HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] From OSD logs, after recovery attempt: root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done dumped all in format plain instructing pg 2.490 on osd.56 to repair instructing pg 2.c4 on osd.56 to repair /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.522f/141//2 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.032e/141//2 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0807/141//2 /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2 /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone 28809490
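Given the observation elsewhere in this thread that only the first "expected clone" error per PG is real and the later ones are scrub chaining off the same damaged SnapSet, a small filter over the OSD log cuts the noise down to the objects that actually need attention (the log path and field layout assume the Hammer-era format shown above):

    grep 'expected clone' /var/log/ceph/ceph-osd.56.log \
      | awk '{ for (i = 1; i <= NF; i++) if ($i == "deep-scrub" || $i == "repair") { pg = $(i + 1); break } } !seen[pg]++'

This prints only the first expected clone line per PG, which for pg 2.490 above would leave one line instead of 17.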
Re: [ceph-users] Repair inconsistent pgs..
Issue, that in forward mode, fstrim doesn't work proper, and when we take snapshot - data not proper update in cache layer, and client (ceph) see damaged snap.. As headers requested from cache layer. 2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com: What was the issue? -Sam On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Samuel, we turned off cache layer few hours ago... I will post ceph.log in few minutes For snap - we found issue, was connected with cache tier.. 2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com: Ok, you appear to be using a replicated cache tier in front of a replicated base tier. Please scrub both inconsistent pgs and post the ceph.log from before when you started the scrub until after. Also, what command are you using to take snapshots? -Sam On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi Samuel, we try to fix it in trick way. we check all rbd_data chunks from logs (OSD) which are affected, then query rbd info to compare which rbd consist bad rbd_data, after that we mount this rbd as rbd0, create empty rbd, and DD all info from bad volume to new one. But after that - scrub errors growing... Was 15 errors.. .Now 35... We laos try to out OSD which was lead, but after rebalancing this 2 pgs still have 35 scrub errors... ceph osd getmap -o outfile - attached 2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com: Is the number of inconsistent objects growing? Can you attach the whole ceph.log from the 6 hours before and after the snippet you linked above? Are you using cache/tiering? Can you attach the osdmap (ceph osd getmap -o outfile)? -Sam On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: ceph - 0.94.2 Its happen during rebalancing I thought too, that some OSD miss copy, but looks like all miss... So any advice in which direction i need to go 2015-08-18 14:14 GMT+03:00 Gregory Farnum gfar...@redhat.com: From a quick peek it looks like some of the OSDs are missing clones of objects. I'm not sure how that could happen and I'd expect the pg repair to handle that but if it's not there's probably something wrong; what version of Ceph are you running? Sam, is this something you've seen, a new bug, or some kind of config issue? -Greg On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs in inconsistent state... 
root@temp:~# ceph health detail | grep inc HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] From OSD logs, after recovery attempt: root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done dumped all in format plain instructing pg 2.490 on osd.56 to repair instructing pg 2.c4 on osd.56 to repair /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.522f/141//2 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.032e/141//2 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0807/141
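Since the tier was put into forward mode and later turned off altogether in this thread, it may help to spell out the commands involved; a sketch, where cold-storage is the base pool named in this thread and hot-storage is a placeholder for the cache pool:

    ceph osd tier cache-mode hot-storage writeback     # put the tier back into writeback
    # or, to retire the tier completely: empty it first, then remove the overlay
    rados -p hot-storage cache-flush-evict-all
    ceph osd tier remove-overlay cold-storage

Flushing before removing the overlay matters, otherwise objects that exist only in the hot pool are no longer reachable through the base pool.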
Re: [ceph-users] Repair inconsistent pgs..
Sam, I tried to understand which rbd contains these chunks, but no luck. No rbd image block names start with this... 2015-08-21 1:36 GMT+03:00 Samuel Just sj...@redhat.com: Actually, now that I think about it, you probably didn't remove the images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but other images (that's why the scrub errors went down briefly, those objects -- which were fine -- went away). You might want to export and reimport those two images into new images, but leave the old ones alone until you can clean up the on disk state (image and snapshots) and clear the scrub errors. You probably don't want to read the snapshots for those images either. Everything else is, I think, harmless. The ceph-objectstore-tool feature would probably not be too hard, actually. Each head/snapdir image has two attrs (possibly stored in leveldb -- that's why you want to modify the ceph-objectstore-tool and use its interfaces rather than mucking about with the files directly) '_' and 'snapset' which contain encoded representations of object_info_t and SnapSet (both can be found in src/osd/osd_types.h). SnapSet has a set of clones and related metadata -- you want to read the SnapSet attr off disk and commit a transaction writing out a new version with that clone removed. I'd start by cloning the repo, starting a vstart cluster locally, and reproducing the issue. Next, get familiar with using ceph-objectstore-tool on the osds in that vstart cluster. A good first change would be creating a ceph-objectstore-tool op that lets you dump json for the object_info_t and SnapSet (both types have format() methods which make that easy) on an object to stdout so you can confirm what's actually there. oftc #ceph-devel or the ceph-devel mailing list would be the right place to ask questions. Otherwise, it'll probably get done in the next few weeks. -Sam On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: thank you Sam! I also noticed this linked errors during scrub... Now all lools like reasonable! So we will wait for bug to be closed. do you need any help on it? I mean i can help with coding/testing/etc... 2015-08-21 0:52 GMT+03:00 Samuel Just sj...@redhat.com: Ah, this is kind of silly. I think you don't have 37 errors, but 2 errors. pg 2.490 object 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing snap 141. If you look at the objects after that in the log: 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster [ERR] repair 2.490 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster [ERR] repair 2.490 ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2 The clone from the second line matches the head object from the previous line, and they have the same clone id. I *think* that the first error is real, and the subsequent ones are just scrub being dumb. Same deal with pg 2.c4. I just opened http://tracker.ceph.com/issues/12738. The original problem is that 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both missing a clone.
Not sure how that happened, my money is on a cache/tiering evict racing with a snap trim. If you have any logging or relevant information from when that happened, you should open a bug. The 'snapdir' in the two object names indicates that the head object has actually been deleted (which makes sense if you moved the image to a new image and deleted the old one) and is only being kept around since there are live snapshots. I suggest you leave the snapshots for those images alone for the time being -- removing them might cause the osd to crash trying to clean up the wierd on disk state. Other than the leaked space from those two image snapshots and the annoying spurious scrub errors, I think no actual corruption is going on though. I created a tracker ticket for a feature that would let ceph-objectstore-tool remove the spurious clone from the head/snapdir metadata. Am I right that you haven't actually seen any osd crashes or user visible corruption (except possibly on snapshots of those two images)? -Sam On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Inktank: https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding
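Until the ceph-objectstore-tool feature described above exists, the two attrs can at least be located and backed up before anything is modified. A rough sketch against osd.56, assuming a Hammer-era ceph-objectstore-tool that provides --op list plus the per-object list-attrs/get-attr commands; the object argument is the JSON spec printed by --op list (left as an explicit placeholder here), and the attr values are the binary-encoded object_info_t/SnapSet, so this only saves them rather than decoding them:

    stop ceph-osd id=56
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56 \
        --journal-path /var/lib/ceph/osd/ceph-56/journal \
        --op list --pgid 2.c4 | grep rbd_data.e846e25a70bf7
    # take the JSON spec printed above and save the raw attrs before touching them
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56 \
        --journal-path /var/lib/ceph/osd/ceph-56/journal \
        '<JSON object spec from --op list>' list-attrs
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56 \
        --journal-path /var/lib/ceph/osd/ceph-56/journal \
        '<JSON object spec from --op list>' get-attr snapset > snapset.2.c4.bak
    start ceph-osd id=56

Keeping a copy of the snapset attr leaves a way back if a later remove-clone-metadata run (or a hand-written transaction) makes things worse.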
Re: [ceph-users] Repair inconsistent pgs..
No. This will no help ((( I try to found data, but it's look exist with same time stamp on all osd or missing on all osd ... So, need advice , what I need to do... вторник, 18 августа 2015 г. пользователь Abhishek L написал: Voloshanenko Igor writes: Hi Irek, Please read careful ))) You proposal was the first, i try to do... That's why i asked about help... ( 2015-08-18 8:34 GMT+03:00 Irek Fasikhov malm...@gmail.com javascript:;: Hi, Igor. You need to repair the PG. for i in `ceph pg dump| grep inconsistent | grep -v 'inconsistent+repair' | awk {'print$1'}`;do ceph pg repair $i;done С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 2015-08-18 8:27 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com javascript:;: Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs in inconsistent state... root@temp:~# ceph health detail | grep inc HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] From OSD logs, after recovery attempt: root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done dumped all in format plain instructing pg 2.490 on osd.56 to repair instructing pg 2.c4 on osd.56 to repair /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.522f/141//2 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.032e/141//2 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0807/141//2 /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2 /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.01d9/141//2 /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors So, how i can solve expected clone situation by hand? Thank in advance! I've had an inconsistent pg once, but it was a different sort of an error (some sort of digest mismatch, where the secondary object copies had later timestamps). 
This was fixed by moving the object away and restarting, the osd which got fixed when the osd peered, similar to what was mentioned in Sebastian Han's blog[1]. I'm guessing the same method will solve this error as well, but not completely sure, maybe someone else who has seen this particular error could guide you better. [1]: http://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/ -- Abhishek ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
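For completeness, the blog method referenced above boils down to the following on the acting primary; this is only a sketch, assuming Filestore paths as used elsewhere in this thread, and it is probably not the right tool for the missing-clone errors here (the advice above is to leave those objects alone), but it is what fixed the digest-mismatch case:

    stop ceph-osd id=56
    # object names are escaped on disk, so search by the unique hex part of the name
    find /var/lib/ceph/osd/ceph-56/current/2.490_head/ -name '*eb5f22eb141f2*'
    mv '<path found above>' /root/pg2.490-backup/    # keep the copy, do not delete it
    start ceph-osd id=56
    ceph pg repair 2.490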
[ceph-users] Fwd: Repair inconsistent pgs..
-- Пересылаемое сообщение - От: *Voloshanenko Igor* igor.voloshane...@gmail.com Дата: вторник, 18 августа 2015 г. Тема: Repair inconsistent pgs.. Кому: Irek Fasikhov malm...@gmail.com Some additional inforamtion (Tnx Irek for questions!) Pool values: root@test:~# ceph osd pool get cold-storage size size: 3 root@test:~# ceph osd pool get cold-storage min_size min_size: 2 Broken pgs dump PG_1 # { state: active+clean+inconsistent, snap_trimq: [], epoch: 17541, up: [ 56, 10, 42 ], acting: [ 56, 10, 42 ], actingbackfill: [ 10, 42, 56 ], info: { pgid: 2.c4, last_update: 17541'29153, last_complete: 17541'29153, log_tail: 16746'26095, last_user_version: 401173, last_backfill: MAX, purged_snaps: [1~1,6~1,8~3,11~2,17~2,1f~2,25~1,28~1,2c~5,32~4,37~1,39~7,41~5,47~16,5e~19,cb~1,ce~2,d4~7,dc~1,de~1,e6~4,102~1,105~6,10d~1,119~1,150~1,15d~2,160~3,16d~1,16f~5,178~1,184~2,194~1,1a2~1,1a5~1,1ac~2,1c7~1,1cb~2,1ce~1], history: { epoch_created: 98, last_epoch_started: 17531, last_epoch_clean: 17541, last_epoch_split: 0, same_up_since: 17139, same_interval_since: 17530, same_primary_since: 17530, last_scrub: 17541'29114, last_scrub_stamp: 2015-08-18 07:37:04.567973, last_deep_scrub: 17541'29114, last_deep_scrub_stamp: 2015-08-18 07:37:04.567973, last_clean_scrub_stamp: 2015-08-05 17:23:45.251731 }, stats: { version: 17541'29153, reported_seq: 21552, reported_epoch: 17541, state: active+clean+inconsistent, last_fresh: 2015-08-18 07:48:37.667036, last_change: 2015-08-18 07:37:04.568541, last_active: 2015-08-18 07:48:37.667036, last_peered: 2015-08-18 07:48:37.667036, last_clean: 2015-08-18 07:48:37.667036, last_became_active: 0.00, last_became_peered: 0.00, last_unstale: 2015-08-18 07:48:37.667036, last_undegraded: 2015-08-18 07:48:37.667036, last_fullsized: 2015-08-18 07:48:37.667036, mapping_epoch: 17140, log_start: 16746'26095, ondisk_log_start: 16746'26095, created: 98, last_epoch_clean: 17541, parent: 0.0, parent_split_bits: 0, last_scrub: 17541'29114, last_scrub_stamp: 2015-08-18 07:37:04.567973, last_deep_scrub: 17541'29114, last_deep_scrub_stamp: 2015-08-18 07:37:04.567973, last_clean_scrub_stamp: 2015-08-05 17:23:45.251731, log_size: 3058, ondisk_log_size: 3058, stats_invalid: 0, stat_sum: { num_bytes: 2236608990, num_objects: 307, num_object_clones: 7, num_object_copies: 921, num_objects_missing_on_primary: 0, num_objects_degraded: 0, num_objects_misplaced: 0, num_objects_unfound: 0, num_objects_dirty: 307, num_whiteouts: 0, num_read: 15694, num_read_kb: 401354, num_write: 55720, num_write_kb: 2539827, num_scrub_errors: 1, num_shallow_scrub_errors: 1, num_deep_scrub_errors: 0, num_objects_recovered: 1842, num_bytes_recovered: 13419653940, num_keys_recovered: 36, num_objects_omap: 1, num_objects_hit_set_archive: 0, num_bytes_hit_set_archive: 0 }, up: [ 56, 10, 42 ], acting: [ 56, 10, 42 ], blocked_by: [], up_primary: 56, acting_primary: 56 }, empty: 0, dne: 0, incomplete: 0, last_epoch_started: 17531, hit_set_history: { current_last_update: 0'0, current_last_stamp: 0.00, current_info: { begin: 0.00, end: 0.00, version: 0'0 }, history: [] } }, peer_info: [ { peer: 10, pgid: 2.c4, last_update: 17541'29153, last_complete: 17541'29153, log_tail: 16746'25703, last_user_version: 400914, last_backfill: MAX, purged_snaps: [1~1,6~1,8~3,11~2,17~2,1f~2,25~1,28~1,2c~5,32~4,37~1,39~7,41~5,47~16,5e~19,cb~1,ce~2,d4~7,dc~1,de~1,e6~4,102~1,105~6,10d~1,119~1,150~1,15d~2,160~3,16d~1,16f
Re: [ceph-users] Repair inconsistent pgs..
No. This will no help ((( I try to found data, but it's look exist with same time stamp on all osd or missing on all osd ... So, need advice , what I need to do... вторник, 18 августа 2015 г. пользователь Abhishek L написал: Voloshanenko Igor writes: Hi Irek, Please read careful ))) You proposal was the first, i try to do... That's why i asked about help... ( 2015-08-18 8:34 GMT+03:00 Irek Fasikhov malm...@gmail.com javascript:;: Hi, Igor. You need to repair the PG. for i in `ceph pg dump| grep inconsistent | grep -v 'inconsistent+repair' | awk {'print$1'}`;do ceph pg repair $i;done С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 2015-08-18 8:27 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com javascript:;: Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs in inconsistent state... root@temp:~# ceph health detail | grep inc HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] From OSD logs, after recovery attempt: root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done dumped all in format plain instructing pg 2.490 on osd.56 to repair instructing pg 2.c4 on osd.56 to repair /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.522f/141//2 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.032e/141//2 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0807/141//2 /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2 /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.01d9/141//2 /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors So, how i can solve expected clone situation by hand? Thank in advance! I've had an inconsistent pg once, but it was a different sort of an error (some sort of digest mismatch, where the secondary object copies had later timestamps). 
This was fixed by moving the object away and restarting, the osd which got fixed when the osd peered, similar to what was mentioned in Sebastian Han's blog[1]. I'm guessing the same method will solve this error as well, but not completely sure, maybe someone else who has seen this particular error could guide you better. [1]: http://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/ -- Abhishek ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Repair inconsistent pgs..
Hi Irek, Please read careful ))) You proposal was the first, i try to do... That's why i asked about help... ( 2015-08-18 8:34 GMT+03:00 Irek Fasikhov malm...@gmail.com: Hi, Igor. You need to repair the PG. for i in `ceph pg dump| grep inconsistent | grep -v 'inconsistent+repair' | awk {'print$1'}`;do ceph pg repair $i;done С уважением, Фасихов Ирек Нургаязович Моб.: +79229045757 2015-08-18 8:27 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com: Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs in inconsistent state... root@temp:~# ceph health detail | grep inc HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] From OSD logs, after recovery attempt: root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done dumped all in format plain instructing pg 2.490 on osd.56 to repair instructing pg 2.c4 on osd.56 to repair /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.522f/141//2 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.032e/141//2 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0807/141//2 /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2 /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.01d9/141//2 /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors So, how i can solve expected clone situation by hand? Thank in advance! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Repair inconsistent pgs..
Hi all, at our production cluster, due high rebalancing ((( we have 2 pgs in inconsistent state... root@temp:~# ceph health detail | grep inc HEALTH_ERR 2 pgs inconsistent; 18 scrub errors pg 2.490 is active+clean+inconsistent, acting [56,15,29] pg 2.c4 is active+clean+inconsistent, acting [56,10,42] From OSD logs, after recovery attempt: root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done dumped all in format plain instructing pg 2.490 on osd.56 to repair instructing pg 2.c4 on osd.56 to repair /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.7a65/141//2 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.04da/141//2 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.522f/141//2 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.37b3/141//2 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.032e/141//2 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0807/141//2 /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2 /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.01d9/141//2 /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors So, how i can solve expected clone situation by hand? Thank in advance! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Broken snapshots... CEPH 0.94.2
Hi all, can you please help me with an unexplained situation... All snapshots inside ceph are broken... So, as an example, we have a VM template as an rbd inside ceph. We can map it and mount it to check that all is ok with it:
root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
/dev/rbd0
root@test:~# parted /dev/rbd0 print
Model: Unknown (unknown)
Disk /dev/rbd0: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start   End     Size    Type     File system  Flags
 1      1049kB  525MB   524MB   primary  ext4         boot
 2      525MB   10.7GB  10.2GB  primary  lvm
Then I want to create a snap, so I do:
root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
And now I want to map it:
root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
/dev/rbd1
root@test:~# parted /dev/rbd1 print
Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only.
Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only.
Error: /dev/rbd1: unrecognised disk label
Even the md5 sums differ...
root@ix-s2:~# md5sum /dev/rbd0
9a47797a07fee3a3d71316e22891d752  /dev/rbd0
root@ix-s2:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
Ok, now I protect the snap and create a clone... but same thing... the md5 for the clone is the same as for the snap...
root@test:~# rbd unmap /dev/rbd1
root@test:~# rbd snap protect cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
root@test:~# rbd clone cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap cold-storage/test-image
root@test:~# rbd map cold-storage/test-image
/dev/rbd1
root@test:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
but it's broken...
root@test:~# parted /dev/rbd1 print
Error: /dev/rbd1: unrecognised disk label
=== tech details:
root@test:~# ceph -v
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
We have 2 inconsistent pgs, but these images are not placed on those pgs...
root@test:~# ceph health detail
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
18 scrub errors
root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5
osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up ([37,15,14], p37) acting ([37,15,14], p37)
root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) -> up ([12,23,17], p12) acting ([12,23,17], p12)
root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9 (2.2a9) -> up ([12,44,23], p12) acting ([12,44,23], p12)
Also we use a cache layer, which at the moment is in forward mode...
Can you please help me with this... My brain has stopped understanding what is going on... Thanks in advance!
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
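For reference, the check described above can be wrapped into a small script so the failure is easy to reproduce on demand. A sketch (my own, not from the thread; the snapshot name verify_snap is just an illustrative placeholder, and the image must be idle while the snapshot is taken for the checksums to be comparable):

POOL=cold-storage
IMG=0e23c701-401d-4465-b9b4-c02939d57bb5      # the template image used in the example above
DEV=$(rbd map $POOL/$IMG)
rbd snap create $POOL/$IMG@verify_snap
SNAP_DEV=$(rbd map $POOL/$IMG@verify_snap)
# on a healthy cluster, an idle image and its fresh snapshot should hash identically
md5sum "$DEV" "$SNAP_DEV"
rbd unmap "$SNAP_DEV"
rbd unmap "$DEV"
rbd snap rm $POOL/$IMG@verify_snap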
Re: [ceph-users] CEPH cache layer. Very slow
72 osd, 60 hdd, 12 ssd. Primary workload - rbd, kvm.
On Friday, 14 August 2015, Ben Hines wrote: Nice to hear that you have no SSD failures yet in 10 months. How many OSDs are you running, and what is your primary ceph workload? (RBD, rgw, etc?) -Ben
On Fri, Aug 14, 2015 at 2:23 AM, Межов Игорь Александрович me...@yuterra.ru wrote: Hi! Of course, it isn't cheap at all, but we use Intel DC S3700 200Gb for ceph journals and DC S3700 400Gb in the SSD pool: same hosts, separate root in the crushmap. The SSD pool is not yet in production; the journalling SSDs have been under production load for 10 months. They're in good condition - no faults, no degradation. We specifically took 200Gb SSDs for journals to reduce costs, and we also have a higher than recommended OSD/SSD ratio: 1 SSD per 10-12 OSDs, while 1:3 to 1:6 is recommended. So, as a conclusion - I recommend you get a bigger budget and buy durable and fast SSDs for Ceph. Megov Igor CIO, Yuterra
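As a rough sanity check of that ratio (my own numbers, not from the thread): a DC S3700 200Gb is specified at roughly 365 MB/s of sequential write (datasheet figure, assumed), and every client write passes through the journal, so one such SSD shared by 12 OSDs leaves on the order of 365 / 12 ≈ 30 MB/s of journal bandwidth per OSD before replication overhead, versus roughly 365 / 3 ≈ 120 MB/s per OSD at the recommended 1:3 ratio.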
Re: [ceph-users] CEPH cache layer. Very slow
So, after testing the SSD (I wiped 1 SSD and used it for tests):
root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.1.3
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13 10:46:42 2015
  write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
    clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
     lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
    clat percentiles (usec):
     |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
     | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
     | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
     | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
     | 99.99th=[14912]
    bw (KB /s): min= 1064, max= 1213, per=100.00%, avg=1150.07, stdev=34.31
    lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
  cpu : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued : total=r=0/w=17243/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
  WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s, mint=60001msec, maxt=60001msec
Disk stats (read/write):
  sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%
So, it's painful... the SSD does only 287 iops at 4K... 1.1 MB/s. I tried to change the cache mode:
echo "temporary write through" > /sys/class/scsi_disk/2:0:0:0/cache_type
echo "temporary write through" > /sys/class/scsi_disk/3:0:0:0/cache_type
No luck, still the same poor results. Also I found this article: https://lkml.org/lkml/2013/11/20/264 pointing to an old, very simple patch which disables CMD_FLUSH: https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba Does anybody have better ideas how to improve this? (Or how to disable CMD_FLUSH without recompiling the kernel; I use Ubuntu and kernel 4.0.4 for now - the 4.x branch because the SSD 850 Pro has an issue with NCQ TRIM, and before 4.0.4 this exception was not included in libsata.c.)
2015-08-12 19:17 GMT+03:00 Pieter Koorts pieter.koo...@me.com: Hi Igor, I suspect you have very much the same problem as me. https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html Basically Samsung drives (like many SATA SSD's) are very much hit and miss, so you will need to test them as described here to see if they are any good: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ To give you an idea, my average write performance went from 11MB/s (with Samsung SSD) to 30MB/s (without any SSD). This is a very small cluster. Pieter
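The test Pieter links is essentially the same O_DSYNC write test, repeated with an increasing number of parallel jobs. A sketch of that sweep (my layout; /dev/sda and the job counts are illustrative, and note the test overwrites the target device):

for jobs in 1 2 4 8; do
  # one pass per parallelism level; a journal-worthy SSD should scale its sync-write iops with $jobs
  sudo fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k \
       --numjobs=$jobs --iodepth=1 --runtime=60 --time_based \
       --group_reporting --name=journal-test-$jobs
done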
Re: [ceph-users] CEPH cache layer. Very slow
So, good, but the price for the 845 DC PRO 400 GB is about 2x higher than the intel S3500 240G ((( Any other models? (((
2015-08-13 15:45 GMT+03:00 Jan Schermer j...@schermer.cz: I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO and not just PRO or DC EVO!). Those were very cheap but are out of stock at the moment (here). Faster than Intels, cheaper, and slightly different technology (3D V-NAND) which IMO makes them superior without needing many tricks to do its job. Jan
On 13 Aug 2015, at 14:40, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Tnx, Irek! Will try! But another question to all: which SSDs are good enough for CEPH now? I'm looking into the S3500 240G (I have some S3500 120G which show great results, around 8x better than the Samsung). Could you possibly give advice on other vendors/models at the same or lower price level as the S3500 240G?
2015-08-13 12:11 GMT+03:00 Irek Fasikhov malm...@gmail.com: Hi, Igor. Try rolling in the patch here: http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov P.S. I no longer track changes in this direction (kernel), because we already use the recommended SSDs. Best regards, Irek Fasikhov, mob.: +79229045757
[ceph-users] CEPH cache layer. Very slow
Hi all, we have set up a CEPH cluster with 60 OSDs of 2 different types (5 nodes, 12 disks on each: 10 HDD, 2 SSD). Also we cover this with a custom crushmap with 2 roots:
ID   WEIGHT TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
-100 5.0    root ssd
-102 1.0      host ix-s2-ssd
   2 1.0        osd.2            up 1.0 1.0
   9 1.0        osd.9            up 1.0 1.0
-103 1.0      host ix-s3-ssd
   3 1.0        osd.3            up 1.0 1.0
   7 1.0        osd.7            up 1.0 1.0
-104 1.0      host ix-s5-ssd
   1 1.0        osd.1            up 1.0 1.0
   6 1.0        osd.6            up 1.0 1.0
-105 1.0      host ix-s6-ssd
   4 1.0        osd.4            up 1.0 1.0
   8 1.0        osd.8            up 1.0 1.0
-106 1.0      host ix-s7-ssd
   0 1.0        osd.0            up 1.0 1.0
   5 1.0        osd.5            up 1.0 1.0
  -1 5.0    root platter
  -2 1.0      host ix-s2-platter
  13 1.0        osd.13           up 1.0 1.0
  17 1.0        osd.17           up 1.0 1.0
  21 1.0        osd.21           up 1.0 1.0
  27 1.0        osd.27           up 1.0 1.0
  32 1.0        osd.32           up 1.0 1.0
  37 1.0        osd.37           up 1.0 1.0
  44 1.0        osd.44           up 1.0 1.0
  48 1.0        osd.48           up 1.0 1.0
  55 1.0        osd.55           up 1.0 1.0
  59 1.0        osd.59           up 1.0 1.0
  -3 1.0      host ix-s3-platter
  14 1.0        osd.14           up 1.0 1.0
  18 1.0        osd.18           up 1.0 1.0
  23 1.0        osd.23           up 1.0 1.0
  28 1.0        osd.28           up 1.0 1.0
  33 1.0        osd.33           up 1.0 1.0
  39 1.0        osd.39           up 1.0 1.0
  43 1.0        osd.43           up 1.0 1.0
  47 1.0        osd.47           up 1.0 1.0
  54 1.0        osd.54           up 1.0 1.0
  58 1.0        osd.58           up 1.0 1.0
  -4 1.0      host ix-s5-platter
  11 1.0        osd.11           up 1.0 1.0
  16 1.0        osd.16           up 1.0 1.0
  22 1.0        osd.22           up 1.0 1.0
  26 1.0        osd.26           up 1.0 1.0
  31 1.0        osd.31           up 1.0 1.0
  36 1.0        osd.36           up 1.0 1.0
  41 1.0        osd.41           up 1.0 1.0
  46 1.0        osd.46           up 1.0 1.0
  51 1.0        osd.51           up 1.0 1.0
  56 1.0        osd.56           up 1.0 1.0
  -5 1.0      host ix-s6-platter
  12 1.0        osd.12           up 1.0 1.0
  19 1.0        osd.19           up 1.0 1.0
  24 1.0        osd.24           up 1.0 1.0
  29 1.0        osd.29           up 1.0 1.0
  34 1.0        osd.34           up 1.0 1.0
  38 1.0        osd.38           up 1.0 1.0
  42 1.0        osd.42           up 1.0 1.0
  50 1.0        osd.50           up 1.0 1.0
  53 1.0        osd.53           up 1.0 1.0
  57 1.0        osd.57           up 1.0 1.0
  -6 1.0      host ix-s7-platter
  10 1.0        osd.10           up 1.0 1.0
  15 1.0        osd.15           up 1.0 1.0
  20 1.0        osd.20           up 1.0 1.0
  25 1.0        osd.25           up 1.0 1.0
  30 1.0        osd.30           up 1.0 1.0
  35 1.0        osd.35           up 1.0 1.0
  40 1.0        osd.40           up 1.0 1.0
  45 1.0        osd.45           up 1.0 1.0
  49 1.0        osd.49           up 1.0 1.0
  52 1.0        osd.52           up 1.0 1.0
Then we create 2 pools, 1 on HDD (platters) and 1 on SSD, and put the SSD pool in front of the HDD pool (cache tier). Now we receive very bad performance results from the cluster. Even with rados
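For context, the usual Hammer-era sequence for putting an SSD pool in front of an HDD pool as a cache tier looks roughly like this. This is a sketch, not the poster's exact commands; the pool names cold-storage / ssd-cache and the size values are illustrative only:

ceph osd tier add cold-storage ssd-cache            # attach the SSD pool as a tier of the HDD pool
ceph osd tier cache-mode ssd-cache writeback        # or "forward", as mentioned earlier in the thread
ceph osd tier set-overlay cold-storage ssd-cache    # redirect client I/O through the cache pool
ceph osd pool set ssd-cache hit_set_type bloom      # hit-set tracking is required for flush/evict decisions
ceph osd pool set ssd-cache target_max_bytes 1000000000000    # illustrative ~1 TB cap
ceph osd pool set ssd-cache cache_target_dirty_ratio 0.4
ceph osd pool set ssd-cache cache_target_full_ratio 0.8

Without the hit_set and target/ratio settings the cache pool cannot make sensible flush and eviction decisions, which by itself can produce very poor throughput, so checking those values is usually the first step when a tiered setup is slower than the backing pool alone.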