[ceph-users] Fwd: ceph-objectstore-tool remove-clone-metadata. How to use?

2016-07-20 Thread Voloshanenko Igor
Hi community, 10 months ago we discovered an issue with cluster HEALTH
after removing the cache tier from the cluster, and started an email thread;
as a result, a new bug was created on the tracker by Samuel Just:
http://tracker.ceph.com/issues/12738

Since then I have been looking for a good moment to upgrade (after the fix
was backported to 0.94.7), and yesterday I upgraded my production cluster.

Of the 28 scrub errors, only 5 remain, so I need to fix them with the
ceph-objectstore-tool remove-clone-metadata subcommand.

I tried to do it, but without any real result... Can you please advise me
on what I'm doing wrong?

My workflow was as follows:

1. Identify the problem PGs:  ceph health detail | grep inco | grep -v
HEALTH | cut -d " " -f 2
2. Start a repair for each of them, to collect info about the errors in the
logs:  ceph pg repair <pgid>  (in practice looped, as sketched below)
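
A rough sketch of the loop (assuming the grep/cut in step 1 prints only the
PG ids):

ceph health detail | grep inco | grep -v HEALTH | cut -d " " -f 2 | \
  while read pg; do ceph pg repair "$pg"; done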

After this, for example, I got the following records in the logs:

2016-07-20 00:32:10.650061 osd.56 10.12.2.5:6800/1985741 25 : cluster [INF]
2.c4 repair starts

2016-07-20 00:33:06.405136 osd.56 10.12.2.5:6800/1985741 26 : cluster [ERR]
repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir
expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0307/14d

2016-07-20 00:33:06.405323 osd.56 10.12.2.5:6800/1985741 27 : cluster [ERR]
repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir
expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0307/138

2016-07-20 00:33:06.405385 osd.56 10.12.2.5:6800/1985741 28 : cluster [INF]
repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir 1
missing clone(s)

2016-07-20 00:40:42.457657 osd.56 10.12.2.5:6800/1985741 29 : cluster [ERR]
2.c4 repair 2 errors, 0 fixed

So I tried to fix it with the following commands:

stop ceph-osd id=56
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ \
  --journal-path /var/lib/ceph/osd/ceph-56/journal \
  rbd_data.e846e25a70bf7.0307 remove-clone-metadata 138
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ \
  --journal-path /var/lib/ceph/osd/ceph-56/journal \
  rbd_data.e846e25a70bf7.0307 remove-clone-metadata 14d
start ceph-osd id=56

The strange thing is that after I ran these commands I did not get the
expected messages (according to the sources...):

cout << "Removal of clone " << cloneid << " complete" << std::endl;
cout << "Use pg repair after OSD restarted to correct stat information" <<
std::endl;

I got silence (no output after the command, and each command took about
30-35 minutes to execute...).

Of course, I started pg repair again after these actions... but the result
is the same, the errors still exist...

So possibly I misunderstand the input format for ceph-objectstore-tool...
Please help with this... :)
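
My current guess (not verified) is that the object may need to be passed as
the exact JSON spec that the tool itself prints, rather than as a bare
object name, e.g. listing it first:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ \
  --journal-path /var/lib/ceph/osd/ceph-56/journal \
  --pgid 2.c4 --op list rbd_data.e846e25a70bf7.0307

and then feeding the JSON line it prints back in place of the plain name
(the placeholder below is obviously not literal):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ \
  --journal-path /var/lib/ceph/osd/ceph-56/journal \
  '<JSON object spec from --op list>' remove-clone-metadata 14d

Is that the right direction, or does the clone id need to be given in a
different form (e.g. decimal instead of the hex value from the log)?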

Thank you in advance!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-11-03 Thread Voloshanenko Igor
Wido, there is also a minor issue with 0.2.0 java-rados.

We still catch:

-storage/ae1b6e5f-f5f4-4abe-aee3-084f2fe71876
2015-11-02 11:41:14,958 WARN  [cloud.agent.Agent]
(agentRequest-Handler-4:null) Caught:
java.lang.NegativeArraySizeException
at com.ceph.rbd.RbdImage.snapList(Unknown Source)
at
com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.deletePhysicalDisk(LibvirtStorageAdaptor.java:854)
at
com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.deletePhysicalDisk(LibvirtStoragePool.java:175)
at
com.cloud.hypervisor.kvm.storage.KVMStorageProcessor.deleteVolume(KVMStorageProcessor.java:1206)
at
com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:124)
at
com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:57)
at
com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1385)
at com.cloud.agent.Agent.processRequest(Agent.java:503)
at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808)
at com.cloud.utils.nio.Task.run(Task.java:84)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Even with the updated lib:

root@ix1-c7-2:/usr/share/cloudstack-agent/lib# ls
/usr/share/cloudstack-agent/lib | grep rados
rados-0.2.0.jar

2015-11-03 11:01 GMT+02:00 Voloshanenko Igor <igor.voloshane...@gmail.com>:

> Wido, it's the main issue. No records at all...
>
>
> So, from last time:
>
>
> 2015-11-02 11:40:33,204 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep Mem:|awk
> '{print $2}'
> 2015-11-02 11:40:33,207 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-2:null) Execution is successful.
> 2015-11-02 11:40:35,316 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-4:null) Processing command:
> com.cloud.agent.api.GetVmStatsCommand
> 2015-11-02 11:40:35,867 INFO  [cloud.agent.AgentShell] (main:null) Agent
> started
> 2015-11-02 11:40:35,868 INFO  [cloud.agent.AgentShell] (main:null)
> Implementation Version is 4.5.1
>
> So, almost alsways it's exception after RbdUnprotect then in approx . 20
> minutes - crash..
> Almost all the time - it's happen after GetVmStatsCommand or Disks
> stats... Possible that evil hiden into UpadteDiskInfo method... but i can;t
> find any bad code there (((
>
> 2015-11-03 10:40 GMT+02:00 Wido den Hollander <w...@42on.com>:
>
>>
>>
>> On 03-11-15 01:54, Voloshanenko Igor wrote:
>> > Thank you, Jason!
>> >
>> > Any advice, for troubleshooting
>> >
>> > I'm looking in code, and right now don;t see any bad things :(
>> >
>>
>> Can you run the CloudStack Agent in DEBUG mode and then see after which
>> lines in the logs it crashes?
>>
>> Wido
>>
>> > 2015-11-03 1:32 GMT+02:00 Jason Dillaman <dilla...@redhat.com
>> > <mailto:dilla...@redhat.com>>:
>> >
>> > Most likely not going to be related to 13045 since you aren't
>> > actively exporting an image diff.  The most likely problem is that
>> > the RADOS IO context is being closed prior to closing the RBD image.
>> >
>> > --
>> >
>> > Jason Dillaman
>> >
>> >
>> > - Original Message -
>> >
>> > > From: "Voloshanenko Igor" <igor.voloshane...@gmail.com
>> > <mailto:igor.voloshane...@gmail.com>>
>> > > To: "Ceph Users" <ceph-users@lists.ceph.com
>> > <mailto:ceph-users@lists.ceph.com>>
>> > > Sent: Thursday, October 29, 2015 5:27:17 PM
>> > > Subject: Re: [ceph-users] Cloudstack agent crashed JVM with
>> > exception in
>> > > librbd
>> >
>> > > From all we analyzed - look like - it's this issue
>> > > http://tracker.ceph.com/issues/13045
>> >
>> > > PR: https://github.com/ceph/ceph/pull/6097
>> >
>> > > Can anyone help us to confirm this? :)
>> >
>> > > 2015-10-29 23:13 GMT+02:00 Voloshanenko Igor <
>> > igor.voloshane...@gmail.com <mailto:igor.voloshane...@gmail.com> >
>> > > :
>> >
>> > > > Additional trace:
>> > >
>> >
>> > > > #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at
>> > > > ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> > >
>

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-11-03 Thread Voloshanenko Igor
Yes, we recompiled ACS too.

We also deleted all the snapshots... but we can only do that for a while...

New snapshots are created every day.. And the main issue is the agent crash,
not the exception itself...

Each RBD operation that causes the exception leads to an agent crash within
20-30 minutes...

2015-11-03 11:09 GMT+02:00 Wido den Hollander <w...@42on.com>:

>
>
> On 03-11-15 10:04, Voloshanenko Igor wrote:
> > Wido, also minor issue with 0,2.0 java-rados
> >
>
> Did you also re-compile CloudStack against the new rados-java? I still
> think it's related to when the Agent starts cleaning up and there are
> snapshots which need to be unprotected.
>
> In the meantime you might want to remove any existing RBD snapshots
> using the RBD commands from Ceph, that might solve the problem.
>
> Wido
>
> > We still catch:
> >
> > -storage/ae1b6e5f-f5f4-4abe-aee3-084f2fe71876
> > 2015-11-02 11:41:14,958 WARN  [cloud.agent.Agent]
> > (agentRequest-Handler-4:null) Caught:
> > java.lang.NegativeArraySizeException
> > at com.ceph.rbd.RbdImage.snapList(Unknown Source)
> > at
> >
> com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.deletePhysicalDisk(LibvirtStorageAdaptor.java:854)
> > at
> >
> com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.deletePhysicalDisk(LibvirtStoragePool.java:175)
> > at
> >
> com.cloud.hypervisor.kvm.storage.KVMStorageProcessor.deleteVolume(KVMStorageProcessor.java:1206)
> > at
> >
> com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:124)
> > at
> >
> com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:57)
> > at
> >
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1385)
> > at com.cloud.agent.Agent.processRequest(Agent.java:503)
> > at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808)
> > at com.cloud.utils.nio.Task.run(Task.java:84)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> >
> > Even with updated lib:
> >
> > root@ix1-c7-2:/usr/share/cloudstack-agent/lib# ls
> > /usr/share/cloudstack-agent/lib | grep rados
> > rados-0.2.0.jar
> >
> > 2015-11-03 11:01 GMT+02:00 Voloshanenko Igor
> > <igor.voloshane...@gmail.com <mailto:igor.voloshane...@gmail.com>>:
> >
> > Wido, it's the main issue. No records at all...
> >
> >
> > So, from last time:
> >
> >
> > 2015-11-02 11:40:33,204 DEBUG
> > [kvm.resource.LibvirtComputingResource]
> > (agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep
> > Mem:|awk '{print $2}'
> > 2015-11-02 11:40:33,207 DEBUG
> > [kvm.resource.LibvirtComputingResource]
> > (agentRequest-Handler-2:null) Execution is successful.
> > 2015-11-02 11:40:35,316 DEBUG [cloud.agent.Agent]
> > (agentRequest-Handler-4:null) Processing command:
> > com.cloud.agent.api.GetVmStatsCommand
> > 2015-11-02 11:40:35,867 INFO  [cloud.agent.AgentShell] (main:null)
> > Agent started
> > 2015-11-02 11:40:35,868 INFO  [cloud.agent.AgentShell] (main:null)
> > Implementation Version is 4.5.1
> >
> > So, almost alsways it's exception after RbdUnprotect then in approx
> > . 20 minutes - crash..
> > Almost all the time - it's happen after GetVmStatsCommand or Disks
> > stats... Possible that evil hiden into UpadteDiskInfo method... but
> > i can;t find any bad code there (((
> >
> > 2015-11-03 10:40 GMT+02:00 Wido den Hollander <w...@42on.com
> > <mailto:w...@42on.com>>:
> >
> >
> >
> > On 03-11-15 01:54, Voloshanenko Igor wrote:
> > > Thank you, Jason!
> > >
> > > Any advice, for troubleshooting
> > >
> > > I'm looking in code, and right now don;t see any bad things :(
> > >
> >
> > Can you run the CloudStack Agent in DEBUG mode and then see
> > after which
> > lines in the logs it crashes?
> >
> > Wido
> >
> > > 2015-11-03 1:32 GMT+02:00 Jason Dillaman <dilla...@redhat.com
> <mailto:dilla...@redhat.com>
> > > <mailto:dilla...@redhat.com <mailto:dilla...@redhat.com>&g

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-11-03 Thread Voloshanenko Igor
Wido, that's the main issue. No records at all...


So, from last time:


2015-11-02 11:40:33,204 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep Mem:|awk
'{print $2}'
2015-11-02 11:40:33,207 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Execution is successful.
2015-11-02 11:40:35,316 DEBUG [cloud.agent.Agent]
(agentRequest-Handler-4:null) Processing command:
com.cloud.agent.api.GetVmStatsCommand
2015-11-02 11:40:35,867 INFO  [cloud.agent.AgentShell] (main:null) Agent
started
2015-11-02 11:40:35,868 INFO  [cloud.agent.AgentShell] (main:null)
Implementation Version is 4.5.1

So, almost always it's an exception after RbdUnprotect, then in approx. 20
minutes - a crash..
Almost all the time it happens after GetVmStatsCommand or disk stats...
Possibly the evil is hidden in the UpadteDiskInfo method... but I can't find
any bad code there (((

2015-11-03 10:40 GMT+02:00 Wido den Hollander <w...@42on.com>:

>
>
> On 03-11-15 01:54, Voloshanenko Igor wrote:
> > Thank you, Jason!
> >
> > Any advice, for troubleshooting
> >
> > I'm looking in code, and right now don;t see any bad things :(
> >
>
> Can you run the CloudStack Agent in DEBUG mode and then see after which
> lines in the logs it crashes?
>
> Wido
>
> > 2015-11-03 1:32 GMT+02:00 Jason Dillaman <dilla...@redhat.com
> > <mailto:dilla...@redhat.com>>:
> >
> > Most likely not going to be related to 13045 since you aren't
> > actively exporting an image diff.  The most likely problem is that
> > the RADOS IO context is being closed prior to closing the RBD image.
> >
> > --
> >
> > Jason Dillaman
> >
> >
> > - Original Message -
> >
> > > From: "Voloshanenko Igor" <igor.voloshane...@gmail.com
> > <mailto:igor.voloshane...@gmail.com>>
> > > To: "Ceph Users" <ceph-users@lists.ceph.com
> > <mailto:ceph-users@lists.ceph.com>>
> > > Sent: Thursday, October 29, 2015 5:27:17 PM
> > > Subject: Re: [ceph-users] Cloudstack agent crashed JVM with
> > exception in
> > > librbd
> >
> > > From all we analyzed - look like - it's this issue
> > > http://tracker.ceph.com/issues/13045
> >
> > > PR: https://github.com/ceph/ceph/pull/6097
> >
> > > Can anyone help us to confirm this? :)
> >
> > > 2015-10-29 23:13 GMT+02:00 Voloshanenko Igor <
> > igor.voloshane...@gmail.com <mailto:igor.voloshane...@gmail.com> >
> > > :
> >
> > > > Additional trace:
> > >
> >
> > > > #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at
> > > > ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> > >
> > > > #1 0x7f30f98950d8 in __GI_abort () at abort.c:89
> > >
> > > > #2 0x7f30f87b36b5 in
> > __gnu_cxx::__verbose_terminate_handler() () from
> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > >
> > > > #3 0x7f30f87b1836 in ?? () from
> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > >
> > > > #4 0x7f30f87b1863 in std::terminate() () from
> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > >
> > > > #5 0x7f30f87b1aa2 in __cxa_throw () from
> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > >
> > > > #6 0x7f2fddb50778 in ceph::__ceph_assert_fail
> > > > (assertion=assertion@entry=0x7f2fdddeca05 "sub <
> m_subsys.size()",
> > >
> > > > file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h",
> > line=line@entry=62,
> > >
> > > > func=func@entry=0x7f2fdddedba0
> > > >
> >
>  <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool
> > > > ceph::log::SubsystemMap::should_gather(unsigned int, int)") at
> > > > common/assert.cc:77
> > >
> > > > #7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather
> > > > (level=, sub=, this= out>)
> > >
> > > > at ./log/SubsystemMap.h:62
> > >
> > > > #8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather
> > > > (this=, sub=, level= out>)
> > >
> > > > at ./log/SubsystemMap.h:61
> > >
> > > &

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-11-02 Thread Voloshanenko Igor
Dear all, can anybody help?

2015-10-30 10:37 GMT+02:00 Voloshanenko Igor <igor.voloshane...@gmail.com>:

> It's pain, but not... :(
> We already used your updated lib in dev env... :(
>
> 2015-10-30 10:06 GMT+02:00 Wido den Hollander <w...@42on.com>:
>
>>
>>
>> On 29-10-15 16:38, Voloshanenko Igor wrote:
>> > Hi Wido and all community.
>> >
>> > We catched very idiotic issue on our Cloudstack installation, which
>> > related to ceph and possible to java-rados lib.
>> >
>>
>> I think you ran into this one:
>> https://issues.apache.org/jira/browse/CLOUDSTACK-8879
>>
>> Cleaning up RBD snapshots for volumes didn't go well and caused the JVM
>> to crash.
>>
>> Wido
>>
>> > So, we have constantly agent crashed (which cause very big problem for
>> > us... ).
>> >
>> > When agent crashed - it's crash JVM. And no event in logs at all.
>> > We enabled crush dump, and after crash we see next picture:
>> >
>> > #grep -A1 "Problematic frame" < /hs_err_pid30260.log
>> >  Problematic frame:
>> >  C  [librbd.so.1.0.0+0x5d681]
>> >
>> > # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
>> > (gdb)  bt
>> > ...
>> > #7  0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather
>> > (level=, sub=, this=)
>> > at ./log/SubsystemMap.h:62
>> > #8  0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather
>> > (this=, sub=, level=)
>> > at ./log/SubsystemMap.h:61
>> > #9  0x7f30b9d879be in ObjectCacher::flusher_entry
>> > (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527
>> > #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry
>> > (this=) at osdc/ObjectCacher.h:374
>> >
>> > From ceph code, this part executed when flushing cache object... And we
>> > don;t understand why. Becasue we have absolutely different race
>> > condition to reproduce it.
>> >
>> > As cloudstack have not good implementation yet of snapshot lifecycle,
>> > sometime, it's happen, that some volumes already marked as EXPUNGED in
>> > DB and then cloudstack try to delete bas Volume, before it's try to
>> > unprotect it.
>> >
>> > Sure, unprotecting fail, normal exception returned back (fail because
>> > snap has childs... )
>> >
>> > 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor]
>> > (Thread-1304:null) Executing:
>> > /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh
>> > -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m
>> > /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11
>> > 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor]
>> > (Thread-1304:null) Execution is successful.
>> > 2015-10-29 09:02:19,554 INFO  [kvm.storage.LibvirtStorageAdaptor]
>> > (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of
>> > image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the
>> image
>> > 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor]
>> > (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at
>> > cephmon.anolim.net:6789 <http://cephmon.anolim.net:6789>
>> > 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor]
>> > (agentRequest-Handler-5:null) Unprotecting snapshot
>> > cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap
>> > 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor]
>> > (agentRequest-Handler-5:null) Failed to delete volume:
>> > com.cloud.utils.exception.CloudRuntimeException:
>> > com.ceph.rbd.RbdException: Failed to unprotect snapshot
>> cloudstack-base-snap
>> > 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent]
>> > (agentRequest-Handler-5:null) Seq 4-1921583831:  { Ans: , MgmtId:
>> > 161344838950, via: 4, Ver: v1, Flags: 10,
>> >
>> [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException:
>> > com.ceph.rbd.RbdException: Failed to unprotect snapshot
>> > cloudstack-base-snap","wait":0}}] }
>> > 2015-10-29 09:02:25,722 DEBUG [cloud.agent.Agent]
>> > (agentRequest-Handler-2:null) Processing command:
>> > com.cloud.agent.api.GetHostStatsCommand
>> > 2015-10-29 09:02:25,722 DEBUG [kvm.resource.LibvirtComputingResource]
>> > (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-11-02 Thread Voloshanenko Igor
Thank you, Jason!

Any advice for troubleshooting?

I'm looking at the code, and right now I don't see anything bad :(

2015-11-03 1:32 GMT+02:00 Jason Dillaman <dilla...@redhat.com>:

> Most likely not going to be related to 13045 since you aren't actively
> exporting an image diff.  The most likely problem is that the RADOS IO
> context is being closed prior to closing the RBD image.
>
> --
>
> Jason Dillaman
>
>
> - Original Message -
>
> > From: "Voloshanenko Igor" <igor.voloshane...@gmail.com>
> > To: "Ceph Users" <ceph-users@lists.ceph.com>
> > Sent: Thursday, October 29, 2015 5:27:17 PM
> > Subject: Re: [ceph-users] Cloudstack agent crashed JVM with exception in
> > librbd
>
> > From all we analyzed - look like - it's this issue
> > http://tracker.ceph.com/issues/13045
>
> > PR: https://github.com/ceph/ceph/pull/6097
>
> > Can anyone help us to confirm this? :)
>
> > 2015-10-29 23:13 GMT+02:00 Voloshanenko Igor <
> igor.voloshane...@gmail.com >
> > :
>
> > > Additional trace:
> >
>
> > > #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at
> > > ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> >
> > > #1 0x7f30f98950d8 in __GI_abort () at abort.c:89
> >
> > > #2 0x7f30f87b36b5 in __gnu_cxx::__verbose_terminate_handler() ()
> from
> > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >
> > > #3 0x7f30f87b1836 in ?? () from
> > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >
> > > #4 0x7f30f87b1863 in std::terminate() () from
> > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >
> > > #5 0x7f30f87b1aa2 in __cxa_throw () from
> > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >
> > > #6 0x7f2fddb50778 in ceph::__ceph_assert_fail
> > > (assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()",
> >
> > > file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", line=line@entry
> =62,
> >
> > > func=func@entry=0x7f2fdddedba0
> > > <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__>
> "bool
> > > ceph::log::SubsystemMap::should_gather(unsigned int, int)") at
> > > common/assert.cc:77
> >
> > > #7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather
> > > (level=, sub=, this=)
> >
> > > at ./log/SubsystemMap.h:62
> >
> > > #8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather
> > > (this=, sub=, level=)
> >
> > > at ./log/SubsystemMap.h:61
> >
> > > #9 0x7f2fddd879be in ObjectCacher::flusher_entry
> (this=0x7f2ff80b27a0)
> > > at
> > > osdc/ObjectCacher.cc:1527
> >
> > > #10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry
> > > (this= > > out>) at osdc/ObjectCacher.h:374
> >
> > > #11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at
> > > pthread_create.c:312
> >
> > > #12 0x7f30f995547d in clone () at
> > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
>
> > > 2015-10-29 17:38 GMT+02:00 Voloshanenko Igor <
> igor.voloshane...@gmail.com
> > > >
> > > :
> >
>
> > > > Hi Wido and all community.
> > >
> >
>
> > > > We catched very idiotic issue on our Cloudstack installation, which
> > > > related
> > > > to ceph and possible to java-rados lib.
> > >
> >
>
> > > > So, we have constantly agent crashed (which cause very big problem
> for
> > > > us...
> > > > ).
> > >
> >
>
> > > > When agent crashed - it's crash JVM. And no event in logs at all.
> > >
> >
> > > > We enabled crush dump, and after crash we see next picture:
> > >
> >
>
> > > > #grep -A1 "Problematic frame" < /hs_err_pid30260.log
> > >
> >
> > > > Problematic frame:
> > >
> >
> > > > C [librbd.so.1.0.0+0x5d681]
> > >
> >
>
> > > > # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
> > >
> >
> > > > (gdb) bt
> > >
> >
> > > > ...
> > >
> >
> > > > #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather
> > > > (level=, sub=, this=)
> > >
> >
> > > > at ./log/SubsystemMap.h:62
> > >
> >
> > > > #8 0x7f30b9

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-10-30 Thread Voloshanenko Igor
It's a pain, but no, that's not it... :(
We already used your updated lib in the dev env... :(

2015-10-30 10:06 GMT+02:00 Wido den Hollander <w...@42on.com>:

>
>
> On 29-10-15 16:38, Voloshanenko Igor wrote:
> > Hi Wido and all community.
> >
> > We catched very idiotic issue on our Cloudstack installation, which
> > related to ceph and possible to java-rados lib.
> >
>
> I think you ran into this one:
> https://issues.apache.org/jira/browse/CLOUDSTACK-8879
>
> Cleaning up RBD snapshots for volumes didn't go well and caused the JVM
> to crash.
>
> Wido
>
> > So, we have constantly agent crashed (which cause very big problem for
> > us... ).
> >
> > When agent crashed - it's crash JVM. And no event in logs at all.
> > We enabled crush dump, and after crash we see next picture:
> >
> > #grep -A1 "Problematic frame" < /hs_err_pid30260.log
> >  Problematic frame:
> >  C  [librbd.so.1.0.0+0x5d681]
> >
> > # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
> > (gdb)  bt
> > ...
> > #7  0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather
> > (level=, sub=, this=)
> > at ./log/SubsystemMap.h:62
> > #8  0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather
> > (this=, sub=, level=)
> > at ./log/SubsystemMap.h:61
> > #9  0x7f30b9d879be in ObjectCacher::flusher_entry
> > (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527
> > #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry
> > (this=) at osdc/ObjectCacher.h:374
> >
> > From ceph code, this part executed when flushing cache object... And we
> > don;t understand why. Becasue we have absolutely different race
> > condition to reproduce it.
> >
> > As cloudstack have not good implementation yet of snapshot lifecycle,
> > sometime, it's happen, that some volumes already marked as EXPUNGED in
> > DB and then cloudstack try to delete bas Volume, before it's try to
> > unprotect it.
> >
> > Sure, unprotecting fail, normal exception returned back (fail because
> > snap has childs... )
> >
> > 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor]
> > (Thread-1304:null) Executing:
> > /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh
> > -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m
> > /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11
> > 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor]
> > (Thread-1304:null) Execution is successful.
> > 2015-10-29 09:02:19,554 INFO  [kvm.storage.LibvirtStorageAdaptor]
> > (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of
> > image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the
> image
> > 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor]
> > (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at
> > cephmon.anolim.net:6789 <http://cephmon.anolim.net:6789>
> > 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor]
> > (agentRequest-Handler-5:null) Unprotecting snapshot
> > cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap
> > 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor]
> > (agentRequest-Handler-5:null) Failed to delete volume:
> > com.cloud.utils.exception.CloudRuntimeException:
> > com.ceph.rbd.RbdException: Failed to unprotect snapshot
> cloudstack-base-snap
> > 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent]
> > (agentRequest-Handler-5:null) Seq 4-1921583831:  { Ans: , MgmtId:
> > 161344838950, via: 4, Ver: v1, Flags: 10,
> >
> [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException:
> > com.ceph.rbd.RbdException: Failed to unprotect snapshot
> > cloudstack-base-snap","wait":0}}] }
> > 2015-10-29 09:02:25,722 DEBUG [cloud.agent.Agent]
> > (agentRequest-Handler-2:null) Processing command:
> > com.cloud.agent.api.GetHostStatsCommand
> > 2015-10-29 09:02:25,722 DEBUG [kvm.resource.LibvirtComputingResource]
> > (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n
> > 1| awk -F, '/^[%]*[Cc]pu/{$0=$4; gsub(/[^0-9.,]+/,""); print }'); echo
> $idle
> > 2015-10-29 09:02:26,249 DEBUG [kvm.resource.LibvirtComputingResource]
> > (agentRequest-Handler-2:null) Execution is successful.
> > 2015-10-29 09:02:26,250 DEBUG [kvm.resource.LibvirtComputingResource]
> > (agentRequest-Handler-2:null) Executing: /bin/bash -c
> > freeMem=$(free|grep cache:|awk '{print $4}');echo $freeMem
> > 2015-10

[ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-10-29 Thread Voloshanenko Igor
Hi Wido and all community.

We caught a very idiotic issue on our Cloudstack installation, which is
related to Ceph and possibly to the java-rados lib.

So, the agent constantly crashes (which causes a very big problem for
us...).

When the agent crashes, it crashes the JVM. And there is no event in the
logs at all.
We enabled crash dumps, and after a crash we see the following picture:

#grep -A1 "Problematic frame" < /hs_err_pid30260.log
 Problematic frame:
 C  [librbd.so.1.0.0+0x5d681]

# gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
(gdb)  bt
...
#7  0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather
(level=, sub=, this=)
at ./log/SubsystemMap.h:62
#8  0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather
(this=, sub=, level=)
at ./log/SubsystemMap.h:61
#9  0x7f30b9d879be in ObjectCacher::flusher_entry (this=0x7f2fb4017910)
at osdc/ObjectCacher.cc:1527
#10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry
(this=) at osdc/ObjectCacher.h:374

From the ceph code, this part is executed when flushing a cache object...
And we don't understand why, because the race condition we have to reproduce
it is absolutely different.

As CloudStack does not yet have a good implementation of the snapshot
lifecycle, it sometimes happens that some volumes are already marked as
EXPUNGED in the DB and then CloudStack tries to delete the base volume, and
before that it tries to unprotect it.

Sure, the unprotect fails and a normal exception is returned (it fails
because the snap has children...)

2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor]
(Thread-1304:null) Executing:
/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i
10.44.253.13 -p /var/lib/libvirt/PRIMARY -m
/mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11
2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor]
(Thread-1304:null) Execution is successful.
2015-10-29 09:02:19,554 INFO  [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of
image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image
2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at
cephmon.anolim.net:6789
2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Unprotecting snapshot
cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap
2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor]
(agentRequest-Handler-5:null) Failed to delete volume:
com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException:
Failed to unprotect snapshot cloudstack-base-snap
2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent]
(agentRequest-Handler-5:null) Seq 4-1921583831:  { Ans: , MgmtId:
161344838950, via: 4, Ver: v1, Flags: 10,
[{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException:
com.ceph.rbd.RbdException: Failed to unprotect snapshot
cloudstack-base-snap","wait":0}}] }
2015-10-29 09:02:25,722 DEBUG [cloud.agent.Agent]
(agentRequest-Handler-2:null) Processing command:
com.cloud.agent.api.GetHostStatsCommand
2015-10-29 09:02:25,722 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n 1|
awk -F, '/^[%]*[Cc]pu/{$0=$4; gsub(/[^0-9.,]+/,""); print }'); echo $idle
2015-10-29 09:02:26,249 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Execution is successful.
2015-10-29 09:02:26,250 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Executing: /bin/bash -c freeMem=$(free|grep
cache:|awk '{print $4}');echo $freeMem
2015-10-29 09:02:26,254 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Execution is successful.

BUT, after 20 minutes the agent crashed... If we remove all the children
and create the conditions for CloudStack to delete the volume, everything is
OK - no agent crash within 20 minutes...
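
For reference, removing the children by hand is roughly the standard rbd
sequence (the image and snapshot names are the ones from the log above; the
child image name is just a placeholder):

rbd children cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap
rbd flatten cloudstack/<child-image>   # repeat for each child listed
rbd snap unprotect cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap
rbd snap rm cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap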

We can't connect this action (volume delete) with the agent crash... Also
we don't understand why +-20 minutes have to pass, and only then the agent
crashes...

From the logs, before the crash there is only GetVMStats... And then - the
agent started...

2015-10-29 09:21:55,143 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
Sending ping: Seq 4-1343:  { Cmd , MgmtId: -1, via: 4, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.PingRoutingCommand":{"newStates":{},"_hostVmStateReport":{"i-881-1117-VM":{"state":"PowerOn","host":"
cs2.anolim.net"},"i-7-106-VM":{"state":"PowerOn","host":"cs2.anolim.net
"},"i-1683-1984-VM":{"state":"PowerOn","host":"cs2.anolim.net
"},"i-11-504-VM":{"state":"PowerOn","host":"cs2.anolim.net
"},"i-325-616-VM":{"state":"PowerOn","host":"cs2.anolim.net
"},"i-10-52-VM":{"state":"PowerOn","host":"cs2.anolim.net
"},"i-941-1237-VM":{"state":"PowerOn","host":"cs2.anolim.net"}},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":4,"wait":0}}]
}
2015-10-29 09:21:55,149 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null)
Received 

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-10-29 Thread Voloshanenko Igor
From all we analyzed, it looks like it's this issue:
http://tracker.ceph.com/issues/13045

PR: https://github.com/ceph/ceph/pull/6097

Can anyone help us to confirm this? :)

2015-10-29 23:13 GMT+02:00 Voloshanenko Igor <igor.voloshane...@gmail.com>:

> Additional trace:
>
> #0  0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x7f30f98950d8 in __GI_abort () at abort.c:89
> #2  0x7f30f87b36b5 in __gnu_cxx::__verbose_terminate_handler() () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3  0x7f30f87b1836 in ?? () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4  0x7f30f87b1863 in std::terminate() () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #5  0x7f30f87b1aa2 in __cxa_throw () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #6  0x7f2fddb50778 in ceph::__ceph_assert_fail
> (assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()",
> file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", line=line@entry
> =62,
> func=func@entry=0x7f2fdddedba0
> <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool
> ceph::log::SubsystemMap::should_gather(unsigned int, int)") at
> common/assert.cc:77
> #7  0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather
> (level=, sub=, this=)
> at ./log/SubsystemMap.h:62
> #8  0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather
> (this=, sub=, level=)
> at ./log/SubsystemMap.h:61
> #9  0x7f2fddd879be in ObjectCacher::flusher_entry
> (this=0x7f2ff80b27a0) at osdc/ObjectCacher.cc:1527
> #10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry
> (this=) at osdc/ObjectCacher.h:374
> #11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at
> pthread_create.c:312
> #12 0x7f30f995547d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> 2015-10-29 17:38 GMT+02:00 Voloshanenko Igor <igor.voloshane...@gmail.com>
> :
>
>> Hi Wido and all community.
>>
>> We catched very idiotic issue on our Cloudstack installation, which
>> related to ceph and possible to java-rados lib.
>>
>> So, we have constantly agent crashed (which cause very big problem for
>> us... ).
>>
>> When agent crashed - it's crash JVM. And no event in logs at all.
>> We enabled crush dump, and after crash we see next picture:
>>
>> #grep -A1 "Problematic frame" < /hs_err_pid30260.log
>>  Problematic frame:
>>  C  [librbd.so.1.0.0+0x5d681]
>>
>> # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
>> (gdb)  bt
>> ...
>> #7  0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather
>> (level=, sub=, this=)
>> at ./log/SubsystemMap.h:62
>> #8  0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather
>> (this=, sub=, level=)
>> at ./log/SubsystemMap.h:61
>> #9  0x7f30b9d879be in ObjectCacher::flusher_entry
>> (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527
>> #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry
>> (this=) at osdc/ObjectCacher.h:374
>>
>> From ceph code, this part executed when flushing cache object... And we
>> don;t understand why. Becasue we have absolutely different race condition
>> to reproduce it.
>>
>> As cloudstack have not good implementation yet of snapshot lifecycle,
>> sometime, it's happen, that some volumes already marked as EXPUNGED in DB
>> and then cloudstack try to delete bas Volume, before it's try to unprotect
>> it.
>>
>> Sure, unprotecting fail, normal exception returned back (fail because
>> snap has childs... )
>>
>> 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor]
>> (Thread-1304:null) Executing:
>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i
>> 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m
>> /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11
>> 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor]
>> (Thread-1304:null) Execution is successful.
>> 2015-10-29 09:02:19,554 INFO  [kvm.storage.LibvirtStorageAdaptor]
>> (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of
>> image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image
>> 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor]
>> (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at
>> cephmon.anolim.net:6789
>> 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor]
>> (agentRequest-Handler-5:null) Unprotecting snapshot
>> cloudstack/71b1e2e9-1985-

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-10-29 Thread Voloshanenko Igor
Additional trace:

#0  0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f30f98950d8 in __GI_abort () at abort.c:89
#2  0x7f30f87b36b5 in __gnu_cxx::__verbose_terminate_handler() () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x7f30f87b1836 in ?? () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x7f30f87b1863 in std::terminate() () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x7f30f87b1aa2 in __cxa_throw () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x7f2fddb50778 in ceph::__ceph_assert_fail
(assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()",
file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", line=line@entry
=62,
func=func@entry=0x7f2fdddedba0
<_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool
ceph::log::SubsystemMap::should_gather(unsigned int, int)") at
common/assert.cc:77
#7  0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather
(level=, sub=, this=)
at ./log/SubsystemMap.h:62
#8  0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather
(this=, sub=, level=)
at ./log/SubsystemMap.h:61
#9  0x7f2fddd879be in ObjectCacher::flusher_entry (this=0x7f2ff80b27a0)
at osdc/ObjectCacher.cc:1527
#10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry
(this=) at osdc/ObjectCacher.h:374
#11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at
pthread_create.c:312
#12 0x7f30f995547d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111

2015-10-29 17:38 GMT+02:00 Voloshanenko Igor <igor.voloshane...@gmail.com>:

> Hi Wido and all community.
>
> We catched very idiotic issue on our Cloudstack installation, which
> related to ceph and possible to java-rados lib.
>
> So, we have constantly agent crashed (which cause very big problem for
> us... ).
>
> When agent crashed - it's crash JVM. And no event in logs at all.
> We enabled crush dump, and after crash we see next picture:
>
> #grep -A1 "Problematic frame" < /hs_err_pid30260.log
>  Problematic frame:
>  C  [librbd.so.1.0.0+0x5d681]
>
> # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
> (gdb)  bt
> ...
> #7  0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather
> (level=, sub=, this=)
> at ./log/SubsystemMap.h:62
> #8  0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather
> (this=, sub=, level=)
> at ./log/SubsystemMap.h:61
> #9  0x7f30b9d879be in ObjectCacher::flusher_entry
> (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527
> #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry
> (this=) at osdc/ObjectCacher.h:374
>
> From ceph code, this part executed when flushing cache object... And we
> don;t understand why. Becasue we have absolutely different race condition
> to reproduce it.
>
> As cloudstack have not good implementation yet of snapshot lifecycle,
> sometime, it's happen, that some volumes already marked as EXPUNGED in DB
> and then cloudstack try to delete bas Volume, before it's try to unprotect
> it.
>
> Sure, unprotecting fail, normal exception returned back (fail because snap
> has childs... )
>
> 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor]
> (Thread-1304:null) Executing:
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i
> 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m
> /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11
> 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor]
> (Thread-1304:null) Execution is successful.
> 2015-10-29 09:02:19,554 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of
> image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image
> 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at
> cephmon.anolim.net:6789
> 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-5:null) Unprotecting snapshot
> cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap
> 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor]
> (agentRequest-Handler-5:null) Failed to delete volume:
> com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException:
> Failed to unprotect snapshot cloudstack-base-snap
> 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-5:null) Seq 4-1921583831:  { Ans: , MgmtId:
> 161344838950, via: 4, Ver: v1, Flags: 10,
> [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException:
> com.ceph.rbd.RbdException: Failed to unprotect snapshot
> cloudst

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-26 Thread Voloshanenko Igor
Great!
Yes, the behaviour is exactly as I described. So it looks like that's the
root cause )

Thank you, Sam, Ilya!

2015-08-21 21:08 GMT+03:00 Samuel Just sj...@redhat.com:

 I think I found the bug -- need to whiteout the snapset (or decache
 it) upon evict.

 http://tracker.ceph.com/issues/12748
 -Sam

 On Fri, Aug 21, 2015 at 8:04 AM, Ilya Dryomov idryo...@gmail.com wrote:
  On Fri, Aug 21, 2015 at 5:59 PM, Samuel Just sj...@redhat.com wrote:
  Odd, did you happen to capture osd logs?
 
  No, but the reproducer is trivial to cut  paste.
 
  Thanks,
 
  Ilya

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-25 Thread Voloshanenko Igor
To be honest, the Samsung 850 PRO is not a 24/7 series... it's more of a
desktop+ series, but anyway, the results from these drives are very, very
bad in any scenario acceptable in real life...

Possibly the 845 PRO is better, but we don't want to experiment anymore...
So we chose the S3500 240G. Yes, it's cheaper than the S3700 (about 2x), and
not as durable for writes, but we think it is better to replace 1 SSD per
year than to pay double the price now.
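
For anyone who wants to reproduce such numbers: the journal suitability test
from the blog linked below boils down to single-job, queue-depth-1
synchronous 4k writes straight to the device, roughly like this (the device
path is only an example, and the run overwrites data on that device):

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test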

2015-08-25 12:59 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 And should I mention that in another CEPH installation we had samsung 850
 pro 128GB and all of 6 ssds died in 2 month period - simply disappear from
 the system, so not wear out...

 Never again we buy Samsung :)
 On Aug 25, 2015 11:57 AM, Andrija Panic andrija.pa...@gmail.com wrote:

 First read please:

 http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

 We are getting 200 IOPS in comparison to Intels3500 18.000 iops - those
 are  constant performance numbers, meaning avoiding drives cache and
 running for longer period of time...
 Also if checking with FIO you will get better latencies on intel s3500
 (model tested in our case) along with 20X better IOPS results...

 We observed original issue by having high speed at begining of i.e. file
 transfer inside VM, which than halts to zero... We moved journals back to
 HDDs and performans was acceptable...no we are upgrading to intel S3500...

 Best
 any details on that ?

 On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic
 andrija.pa...@gmail.com wrote:

  Make sure you test what ever you decide. We just learned this the hard
 way
  with samsung 850 pro, which is total crap, more than you could
 imagine...
 
  Andrija
  On Aug 25, 2015 11:25 AM, Jan Schermer j...@schermer.cz wrote:
 
   I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
   Very cheap, better than Intel 3610 for sure (and I think it beats even
   3700).
  
   Jan
  
On 25 Aug 2015, at 11:23, Christopher Kunz chrisl...@de-punkt.de
   wrote:
   
Am 25.08.15 um 11:18 schrieb Götz Reinicke - IT Koordinator:
Hi,
   
most of the times I do get the recommendation from resellers to go
 with
the intel s3700 for the journalling.
   
Check out the Intel s3610. 3 drive writes per day for 5 years.
 Plus, it
is cheaper than S3700.
   
Regards,
   
--ck
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  



 --
 Mariusz Gronczewski, Administrator

 Efigence S. A.
 ul. Wołoska 9a, 02-583 Warszawa
 T: [+48] 22 380 13 13
 F: [+48] 22 380 13 14
 E: mariusz.gronczew...@efigence.com
 mailto:mariusz.gronczew...@efigence.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-21 Thread Voloshanenko Igor
Exactly as in our case.

Ilya, the same for images on our side. The headers are opened from the hot
tier.

On Friday, 21 August 2015, Ilya Dryomov wrote:

 On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just sj...@redhat.com
 javascript:; wrote:
  What's supposed to happen is that the client transparently directs all
  requests to the cache pool rather than the cold pool when there is a
  cache pool.  If the kernel is sending requests to the cold pool,
  that's probably where the bug is.  Odd.  It could also be a bug
  specific 'forward' mode either in the client or on the osd.  Why did
  you have it in that mode?

 I think I reproduced this on today's master.

 Setup, cache mode is writeback:

 $ ./ceph osd pool create foo 12 12
 *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 pool 'foo' created
 $ ./ceph osd pool create foo-hot 12 12
 *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 pool 'foo-hot' created
 $ ./ceph osd tier add foo foo-hot
 *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 pool 'foo-hot' is now (or already was) a tier of 'foo'
 $ ./ceph osd tier cache-mode foo-hot writeback
 *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 set cache-mode for pool 'foo-hot' to writeback
 $ ./ceph osd tier set-overlay foo foo-hot
 *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 overlay for 'foo' is now (or already was) 'foo-hot'

 Create an image:

 $ ./rbd create --size 10M --image-format 2 foo/bar
 $ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
 $ sudo mkfs.ext4 /mnt/bar
 $ sudo umount /mnt

 Create a snapshot, take md5sum:

 $ ./rbd snap create foo/bar@snap
 $ ./rbd export foo/bar /tmp/foo-1
 Exporting image: 100% complete...done.
 $ ./rbd export foo/bar@snap /tmp/snap-1
 Exporting image: 100% complete...done.
 $ md5sum /tmp/foo-1
 83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-1
 $ md5sum /tmp/snap-1
 83f5d244bb65eb19eddce0dc94bf6dda  /tmp/snap-1

 Set the cache mode to forward and do a flush, hashes don't match - the
 snap is empty - we bang on the hot tier and don't get redirected to the
 cold tier, I suspect:

 $ ./ceph osd tier cache-mode foo-hot forward
 *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 set cache-mode for pool 'foo-hot' to forward
 $ ./rados -p foo-hot cache-flush-evict-all
 rbd_data.100a6b8b4567.0002
 rbd_id.bar
 rbd_directory
 rbd_header.100a6b8b4567
 bar.rbd
 rbd_data.100a6b8b4567.0001
 rbd_data.100a6b8b4567.
 $ ./rados -p foo-hot cache-flush-evict-all
 $ ./rbd export foo/bar /tmp/foo-2
 Exporting image: 100% complete...done.
 $ ./rbd export foo/bar@snap /tmp/snap-2
 Exporting image: 100% complete...done.
 $ md5sum /tmp/foo-2
 83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-2
 $ md5sum /tmp/snap-2
 f1c9645dbc14efddc7d8a322685f26eb  /tmp/snap-2
 $ od /tmp/snap-2
 000 00 00 00 00 00 00 00 00
 *
 5000

 Disable the cache tier and we are back to normal:

 $ ./ceph osd tier remove-overlay foo
 *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 there is now (or already was) no overlay for 'foo'
 $ ./rbd export foo/bar /tmp/foo-3
 Exporting image: 100% complete...done.
 $ ./rbd export foo/bar@snap /tmp/snap-3
 Exporting image: 100% complete...done.
 $ md5sum /tmp/foo-3
 83f5d244bb65eb19eddce0dc94bf6dda  /tmp/foo-3
 $ md5sum /tmp/snap-3
 83f5d244bb65eb19eddce0dc94bf6dda  /tmp/snap-3

 I first reproduced it with the kernel client, rbd export was just to
 take it out of the equation.


 Also, Igor sort of raised a question in his second message: if, after
 setting the cache mode to forward and doing a flush, I open an image
 (not a snapshot, so may not be related to the above) for write (e.g.
 with rbd-fuse), I get an rbd header object in the hot pool, even though
 it's in forward mode:

 $ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
 $ sudo mount /mnt/bar /media
 $ sudo umount /media
 $ sudo umount /mnt
 $ ./rados -p foo-hot ls
 rbd_header.100a6b8b4567
 $ ./rados -p foo ls | grep rbd_header
 rbd_header.100a6b8b4567

 It's been a while since I looked into tiering, is that how it's
 supposed to work?  It looks like it happens because rbd_header op
 replies don't redirect?

 Thanks,

 Ilya

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Not yet. I will create one.
But according to the mailing lists and the Inktank docs, it's expected
behaviour when a cache tier is enabled.
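
For context, "turned off cache layer" below means roughly the standard
sequence for detaching a writeback cache tier (pool names here are
placeholders, not our real ones):

ceph osd tier cache-mode <cache-pool> forward
rados -p <cache-pool> cache-flush-evict-all
ceph osd tier remove-overlay <base-pool>
ceph osd tier remove <base-pool> <cache-pool>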

2015-08-20 19:56 GMT+03:00 Samuel Just sj...@redhat.com:

 Is there a bug for this in the tracker?
 -Sam

 On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Issue, that in forward mode, fstrim doesn't work proper, and when we take
  snapshot - data not proper update in cache layer, and client (ceph) see
  damaged snap.. As headers requested from cache layer.
 
  2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com:
 
  What was the issue?
  -Sam
 
  On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Samuel, we turned off cache layer few hours ago...
   I will post ceph.log in few minutes
  
   For snap - we found issue, was connected with cache tier..
  
   2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com:
  
   Ok, you appear to be using a replicated cache tier in front of a
   replicated base tier.  Please scrub both inconsistent pgs and post
 the
   ceph.log from before when you started the scrub until after.  Also,
   what command are you using to take snapshots?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
Hi Samuel, we try to fix it in trick way.
   
we check all rbd_data chunks from logs (OSD) which are affected,
 then
query
rbd info to compare which rbd consist bad rbd_data, after that we
mount
this
rbd as rbd0, create empty rbd, and DD all info from bad volume to
 new
one.
   
But after that - scrub errors growing... Was 15 errors.. .Now 35...
We
laos
try to out OSD which was lead, but after rebalancing this 2 pgs
 still
have
35 scrub errors...
   
ceph osd getmap -o outfile - attached
   
   
2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com:
   
Is the number of inconsistent objects growing?  Can you attach the
whole ceph.log from the 6 hours before and after the snippet you
linked above?  Are you using cache/tiering?  Can you attach the
osdmap
(ceph osd getmap -o outfile)?
-Sam
   
On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
 ceph - 0.94.2
 Its happen during rebalancing

 I thought too, that some OSD miss copy, but looks like all
 miss...
 So any advice in which direction i need to go

 2015-08-18 14:14 GMT+03:00 Gregory Farnum gfar...@redhat.com:

 From a quick peek it looks like some of the OSDs are missing
 clones
 of
 objects. I'm not sure how that could happen and I'd expect the
 pg
 repair to handle that but if it's not there's probably
 something
 wrong; what version of Ceph are you running? Sam, is this
 something
 you've seen, a new bug, or some kind of config issue?
 -Greg

 On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Hi all, at our production cluster, due high rebalancing (((
 we
  have 2
  pgs in
  inconsistent state...
 
  root@temp:~# ceph health detail | grep inc
  HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
  pg 2.490 is active+clean+inconsistent, acting [56,15,29]
  pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
 
  From OSD logs, after recovery attempt:
 
  root@test:~# ceph pg dump | grep -i incons | cut -f 1 |
 while
  read
  i;
  do
  ceph pg repair ${i} ; done
  dumped all in format plain
  instructing pg 2.490 on osd.56 to repair
  instructing pg 2.c4 on osd.56 to repair
 
  /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
  7f94663b3700
  -1
  log_channel(cluster) log [ERR] : deep-scrub 2.490
  f5759490/rbd_data.1631755377d7e.04da/head//2
  expected
  clone
  90c59490/rbd_data.eb486436f2beb.7a65/141//2
  /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
  7f94663b3700
  -1
  log_channel(cluster) log [ERR] : deep-scrub 2.490
  fee49490/rbd_data.12483d3ba0794b.522f/head//2
  expected
  clone
  f5759490/rbd_data.1631755377d7e.04da/141//2
  /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
  7f94663b3700
  -1
  log_channel(cluster) log [ERR] : deep-scrub 2.490
  a9b39490/rbd_data.12483d3ba0794b.37b3/head//2
  expected
  clone
  fee49490/rbd_data.12483d3ba0794b.522f/141//2
  /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
  7f94663b3700
  -1
  log_channel(cluster) log [ERR] : deep-scrub 2.490
  bac19490/rbd_data.1238e82ae8944a.032e/head//2
  expected
  clone
  a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
  /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
  7f94663b3700
  -1

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Inktank:
https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf

Mail-list:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html

2015-08-20 20:06 GMT+03:00 Samuel Just sj...@redhat.com:

 Which docs?
 -Sam

 On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Not yet. I will create.
  But according to mail lists and Inktank docs - it's expected behaviour
 when
  cache enable
 
  2015-08-20 19:56 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Is there a bug for this in the tracker?
  -Sam
 
  On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Issue, that in forward mode, fstrim doesn't work proper, and when we
   take
   snapshot - data not proper update in cache layer, and client (ceph)
 see
   damaged snap.. As headers requested from cache layer.
  
   2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com:
  
   What was the issue?
   -Sam
  
   On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
Samuel, we turned off cache layer few hours ago...
I will post ceph.log in few minutes
   
For snap - we found issue, was connected with cache tier..
   
2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com:
   
Ok, you appear to be using a replicated cache tier in front of a
replicated base tier.  Please scrub both inconsistent pgs and post
the
ceph.log from before when you started the scrub until after.
 Also,
what command are you using to take snapshots?
-Sam
   
On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
 Hi Samuel, we try to fix it in trick way.

 we check all rbd_data chunks from logs (OSD) which are affected,
 then
 query
 rbd info to compare which rbd consist bad rbd_data, after that
 we
 mount
 this
 rbd as rbd0, create empty rbd, and DD all info from bad volume
 to
 new
 one.

 But after that - scrub errors growing... Was 15 errors.. .Now
 35...
 We
 laos
 try to out OSD which was lead, but after rebalancing this 2 pgs
 still
 have
 35 scrub errors...

 ceph osd getmap -o outfile - attached


 2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com:

 Is the number of inconsistent objects growing?  Can you attach
 the
 whole ceph.log from the 6 hours before and after the snippet
 you
 linked above?  Are you using cache/tiering?  Can you attach the
 osdmap
 (ceph osd getmap -o outfile)?
 -Sam

 On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  ceph - 0.94.2
  Its happen during rebalancing
 
  I thought too, that some OSD miss copy, but looks like all
  miss...
  So any advice in which direction i need to go
 
  2015-08-18 14:14 GMT+03:00 Gregory Farnum 
 gfar...@redhat.com:
 
  From a quick peek it looks like some of the OSDs are missing
  clones
  of
  objects. I'm not sure how that could happen and I'd expect
 the
  pg
  repair to handle that but if it's not there's probably
  something
  wrong; what version of Ceph are you running? Sam, is this
  something
  you've seen, a new bug, or some kind of config issue?
  -Greg
 
  On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Hi all, at our production cluster, due high rebalancing
 (((
   we
   have 2
   pgs in
   inconsistent state...
  
   root@temp:~# ceph health detail | grep inc
   HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
   pg 2.490 is active+clean+inconsistent, acting [56,15,29]
   pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
  
   From OSD logs, after recovery attempt:
  
   root@test:~# ceph pg dump | grep -i incons | cut -f 1 |
   while
   read
   i;
   do
   ceph pg repair ${i} ; done
   dumped all in format plain
   instructing pg 2.490 on osd.56 to repair
   instructing pg 2.c4 on osd.56 to repair
  
   /var/log/ceph/ceph-osd.56.log:51:2015-08-18
 07:26:37.035910
   7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   f5759490/rbd_data.1631755377d7e.04da/head//2
   expected
   clone
   90c59490/rbd_data.eb486436f2beb.7a65/141//2
   /var/log/ceph/ceph-osd.56.log:52:2015-08-18
 07:26:37.035960
   7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   fee49490/rbd_data.12483d3ba0794b.522f/head//2
   expected
   clone
   f5759490/rbd_data.1631755377d7e.04da/141//2
   /var/log/ceph/ceph-osd.56.log:53:2015-08-18
 07:26:37.036133
   7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Image? Just one?

We started deleting images only to fix this (export/import); before that it was 1-4
times per day (whenever a VM was destroyed)...
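
For reference, a minimal sketch of such an export/import cycle; the image names
and the temporary path are placeholders, not the real ones:

# export the affected image to a file, re-import it as a fresh image
rbd export cold-storage/old-image /tmp/old-image.raw
rbd import /tmp/old-image.raw cold-storage/new-image
rbd rm cold-storage/old-image     # only after the new image has been verified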



2015-08-21 1:44 GMT+03:00 Samuel Just sj...@redhat.com:

 Interesting.  How often do you delete an image?  I'm wondering if
 whatever this is happened when you deleted these two images.
 -Sam

 On Thu, Aug 20, 2015 at 3:42 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Sam, i try to understand which rbd contain this chunks.. but no luck. No
 rbd
  images block names started with this...
 
  Actually, now that I think about it, you probably didn't remove the
  images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
  and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2
 
 
 
 
  2015-08-21 1:36 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Actually, now that I think about it, you probably didn't remove the
  images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
  and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but
  other images (that's why the scrub errors went down briefly, those
  objects -- which were fine -- went away).  You might want to export
  and reimport those two images into new images, but leave the old ones
  alone until you can clean up the on disk state (image and snapshots)
  and clear the scrub errors.  You probably don't want to read the
  snapshots for those images either.  Everything else is, I think,
  harmless.
 
  The ceph-objectstore-tool feature would probably not be too hard,
  actually.  Each head/snapdir image has two attrs (possibly stored in
  leveldb -- that's why you want to modify the ceph-objectstore-tool and
  use its interfaces rather than mucking about with the files directly)
  '_' and 'snapset' which contain encoded representations of
  object_info_t and SnapSet (both can be found in src/osd/osd_types.h).
  SnapSet has a set of clones and related metadata -- you want to read
  the SnapSet attr off disk and commit a transaction writing out a new
  version with that clone removed.  I'd start by cloning the repo,
  starting a vstart cluster locally, and reproducing the issue.  Next,
  get familiar with using ceph-objectstore-tool on the osds in that
  vstart cluster.  A good first change would be creating a
  ceph-objectstore-tool op that lets you dump json for the object_info_t
  and SnapSet (both types have format() methods which make that easy) on
  an object to stdout so you can confirm what's actually there.  oftc
  #ceph-devel or the ceph-devel mailing list would be the right place to
  ask questions.
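
  As a rough illustration only (it assumes ceph-objectstore-tool's
  list-attrs/get-attr object commands and a ceph-dencoder build with the
  SnapSet type registered; paths and the object name are placeholders),
  inspecting those attrs on a stopped OSD could look like this:

  OSD_PATH=/var/lib/ceph/osd/ceph-56          # placeholder OSD
  PGID=2.c4                                   # placeholder pg
  OBJ=rbd_data.xxxxxxxxxxxx.yyyy              # placeholder head/snapdir object name

  # list the xattrs on the object ('_' holds object_info_t, 'snapset' the SnapSet)
  ceph-objectstore-tool --data-path $OSD_PATH --journal-path $OSD_PATH/journal \
      --pgid $PGID "$OBJ" list-attrs

  # dump the raw snapset attr and decode it to JSON to see which clones it still lists
  ceph-objectstore-tool --data-path $OSD_PATH --journal-path $OSD_PATH/journal \
      --pgid $PGID "$OBJ" get-attr snapset > /tmp/snapset.bin
  ceph-dencoder type SnapSet import /tmp/snapset.bin decode dump_json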
 
  Otherwise, it'll probably get done in the next few weeks.
  -Sam
 
  On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   thank you Sam!
   I also noticed this linked errors during scrub...
  
   Now all lools like reasonable!
  
   So we will wait for bug to be closed.
  
   do you need any help on it?
  
   I mean i can help with coding/testing/etc...
  
   2015-08-21 0:52 GMT+03:00 Samuel Just sj...@redhat.com:
  
   Ah, this is kind of silly.  I think you don't have 37 errors, but 2
   errors.  pg 2.490 object
   3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is
 missing
   snap 141.  If you look at the objects after that in the log:
  
   2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 :
 cluster
   [ERR] repair 2.490
   68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
   clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
   2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 :
 cluster
   [ERR] repair 2.490
   ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
   clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2
  
   The clone from the second line matches the head object from the
   previous line, and they have the same clone id.  I *think* that the
   first error is real, and the subsequent ones are just scrub being
   dumb.  Same deal with pg 2.c4.  I just opened
   http://tracker.ceph.com/issues/12738.
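
    A quick, hedged way to see that chaining in the cluster log (pg id and log
    path are the ones from this thread, adjust as needed):

    # print "<head object>  ->  <clone it expected>" for every repair error of pg 2.490
    grep 'repair 2.490' /var/log/ceph/ceph.log | grep 'expected clone' \
        | sed -e 's/.*repair 2.490 //' -e 's/ expected clone /  ->  /'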
  
   The original problem is that
   3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
   22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
   missing a clone.  Not sure how that happened, my money is on a
   cache/tiering evict racing with a snap trim.  If you have any logging
   or relevant information from when that happened, you should open a
   bug.  The 'snapdir' in the two object names indicates that the head
   object has actually been deleted (which makes sense if you moved the
   image to a new image and deleted the old one) and is only being kept
   around since there are live snapshots.  I suggest you leave the
   snapshots for those images alone for the time being -- removing them
   might cause the osd to crash trying to clean up the wierd on disk
   state.  Other than the leaked space from those two image snapshots
 and
   the annoying spurious scrub errors, I think no actual

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Good joke )

2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com:

 Certainly, don't reproduce this with a cluster you care about :).
 -Sam

 On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote:
  What's supposed to happen is that the client transparently directs all
  requests to the cache pool rather than the cold pool when there is a
  cache pool.  If the kernel is sending requests to the cold pool,
  that's probably where the bug is.  Odd.  It could also be a bug
  specific 'forward' mode either in the client or on the osd.  Why did
  you have it in that mode?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
  We used 4.x branch, as we have very good Samsung 850 pro in
 production,
  and they don;t support ncq_trim...
 
  And 4,x first branch which include exceptions for this in libsata.c.
 
  sure we can backport this 1 line to 3.x branch, but we prefer no to go
  deeper if packege for new kernel exist.
 
  2015-08-21 1:56 GMT+03:00 Voloshanenko Igor 
 igor.voloshane...@gmail.com:
 
  root@test:~# uname -a
  Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22
 UTC
  2015 x86_64 x86_64 x86_64 GNU/Linux
 
  2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Also, can you include the kernel version?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com
 wrote:
   Snapshotting with cache/tiering *is* supposed to work.  Can you
 open a
   bug?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
   andrija.pa...@gmail.com wrote:
   This was related to the caching layer, which doesnt support
   snapshooting per
   docs...for sake of closing the thread.
  
   On 17 August 2015 at 21:15, Voloshanenko Igor
   igor.voloshane...@gmail.com
   wrote:
  
   Hi all, can you please help me with unexplained situation...
  
   All snapshot inside ceph broken...
  
   So, as example, we have VM template, as rbd inside ceph.
   We can map it and mount to check that all ok with it
  
   root@test:~# rbd map
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
   /dev/rbd0
   root@test:~# parted /dev/rbd0 print
   Model: Unknown (unknown)
   Disk /dev/rbd0: 10.7GB
   Sector size (logical/physical): 512B/512B
   Partition Table: msdos
  
   Number  Start   End SizeType File system  Flags
1  1049kB  525MB   524MB   primary  ext4 boot
2  525MB   10.7GB  10.2GB  primary   lvm
  
   Than i want to create snap, so i do:
   root@test:~# rbd snap create
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  
   And now i want to map it:
  
   root@test:~# rbd map
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
   /dev/rbd1
   root@test:~# parted /dev/rbd1 print
   Warning: Unable to open /dev/rbd1 read-write (Read-only file
 system).
   /dev/rbd1 has been opened read-only.
   Warning: Unable to open /dev/rbd1 read-write (Read-only file
 system).
   /dev/rbd1 has been opened read-only.
   Error: /dev/rbd1: unrecognised disk label
  
   Even md5 different...
   root@ix-s2:~# md5sum /dev/rbd0
   9a47797a07fee3a3d71316e22891d752  /dev/rbd0
   root@ix-s2:~# md5sum /dev/rbd1
   e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
  
  
   Ok, now i protect snap and create clone... but same thing...
   md5 for clone same as for snap,,
  
   root@test:~# rbd unmap /dev/rbd1
   root@test:~# rbd snap protect
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
   root@test:~# rbd clone
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
   cold-storage/test-image
   root@test:~# rbd map cold-storage/test-image
   /dev/rbd1
   root@test:~# md5sum /dev/rbd1
   e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
  
    but it's broken...
   root@test:~# parted /dev/rbd1 print
   Error: /dev/rbd1: unrecognised disk label
  
  
   =
  
   tech details:
  
   root@test:~# ceph -v
   ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
  
   We have 2 inconstistent pgs, but all images not placed on this
 pgs...
  
   root@test:~# ceph health detail
   HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
   pg 2.490 is active+clean+inconsistent, acting [56,15,29]
   pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
   18 scrub errors
  
   
  
   root@test:~# ceph osd map cold-storage
   0e23c701-401d-4465-b9b4-c02939d57bb5
   osdmap e16770 pool 'cold-storage' (2) object
   '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770)
 - up
   ([37,15,14], p37) acting ([37,15,14], p37)
   root@test:~# ceph osd map cold-storage
   0e23c701-401d-4465-b9b4-c02939d57bb5@snap
   osdmap e16770 pool 'cold-storage' (2) object
   '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3
 (2.4a3)
   - up
   ([12,23,17], p12) acting ([12,23,17], p12)
   root@test:~# ceph osd map cold-storage
   0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
   osdmap e16770 pool 'cold-storage' (2) object
   '0e23c701-401d-4465-b9b4

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We switched to forward mode as a step towards switching the cache layer off.

Right now we have Samsung 850 Pro in the cache layer (10 SSDs, 2 per node),
and they show 2 MB/s for 4K blocks... 250 IOPS... instead of the 18-20K IOPS of
the Intel S3500 240G which we chose as a replacement..

So with such disks the cache layer is a very big bottleneck for us...
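
For context, a hedged sketch of the usual hammer-era sequence for retiring a
writeback cache tier ("hot" and "cold" are placeholder pool names), which is
why the pool ended up in forward mode first:

ceph osd tier cache-mode hot forward     # stop absorbing new writes into the cache
rados -p hot cache-flush-evict-all       # flush dirty objects, evict clean ones
ceph osd tier remove-overlay cold        # stop redirecting client IO through the cache
ceph osd tier remove cold hot            # detach the cache pool from the base pool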

2015-08-21 2:02 GMT+03:00 Samuel Just sj...@redhat.com:

 What's supposed to happen is that the client transparently directs all
 requests to the cache pool rather than the cold pool when there is a
 cache pool.  If the kernel is sending requests to the cold pool,
 that's probably where the bug is.  Odd.  It could also be a bug
 specific 'forward' mode either in the client or on the osd.  Why did
 you have it in that mode?
 -Sam

 On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  We used 4.x branch, as we have very good Samsung 850 pro in production,
  and they don;t support ncq_trim...
 
  And 4,x first branch which include exceptions for this in libsata.c.
 
  sure we can backport this 1 line to 3.x branch, but we prefer no to go
  deeper if packege for new kernel exist.
 
  2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com
 :
 
  root@test:~# uname -a
  Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22
 UTC
  2015 x86_64 x86_64 x86_64 GNU/Linux
 
  2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Also, can you include the kernel version?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote:
   Snapshotting with cache/tiering *is* supposed to work.  Can you open
 a
   bug?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
   andrija.pa...@gmail.com wrote:
   This was related to the caching layer, which doesnt support
   snapshooting per
   docs...for sake of closing the thread.
  
   On 17 August 2015 at 21:15, Voloshanenko Igor
   igor.voloshane...@gmail.com
   wrote:
  
   Hi all, can you please help me with unexplained situation...
  
   All snapshot inside ceph broken...
  
   So, as example, we have VM template, as rbd inside ceph.
   We can map it and mount to check that all ok with it
  
   root@test:~# rbd map
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
   /dev/rbd0
   root@test:~# parted /dev/rbd0 print
   Model: Unknown (unknown)
   Disk /dev/rbd0: 10.7GB
   Sector size (logical/physical): 512B/512B
   Partition Table: msdos
  
   Number  Start   End SizeType File system  Flags
1  1049kB  525MB   524MB   primary  ext4 boot
2  525MB   10.7GB  10.2GB  primary   lvm
  
   Than i want to create snap, so i do:
   root@test:~# rbd snap create
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  
   And now i want to map it:
  
   root@test:~# rbd map
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
   /dev/rbd1
   root@test:~# parted /dev/rbd1 print
   Warning: Unable to open /dev/rbd1 read-write (Read-only file
 system).
   /dev/rbd1 has been opened read-only.
   Warning: Unable to open /dev/rbd1 read-write (Read-only file
 system).
   /dev/rbd1 has been opened read-only.
   Error: /dev/rbd1: unrecognised disk label
  
   Even md5 different...
   root@ix-s2:~# md5sum /dev/rbd0
   9a47797a07fee3a3d71316e22891d752  /dev/rbd0
   root@ix-s2:~# md5sum /dev/rbd1
   e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
  
  
   Ok, now i protect snap and create clone... but same thing...
   md5 for clone same as for snap,,
  
   root@test:~# rbd unmap /dev/rbd1
   root@test:~# rbd snap protect
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
   root@test:~# rbd clone
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
   cold-storage/test-image
   root@test:~# rbd map cold-storage/test-image
   /dev/rbd1
   root@test:~# md5sum /dev/rbd1
   e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
  
    but it's broken...
   root@test:~# parted /dev/rbd1 print
   Error: /dev/rbd1: unrecognised disk label
  
  
   =
  
   tech details:
  
   root@test:~# ceph -v
   ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
  
   We have 2 inconstistent pgs, but all images not placed on this
 pgs...
  
   root@test:~# ceph health detail
   HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
   pg 2.490 is active+clean+inconsistent, acting [56,15,29]
   pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
   18 scrub errors
  
   
  
   root@test:~# ceph osd map cold-storage
   0e23c701-401d-4465-b9b4-c02939d57bb5
   osdmap e16770 pool 'cold-storage' (2) object
   '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) -
 up
   ([37,15,14], p37) acting ([37,15,14], p37)
   root@test:~# ceph osd map cold-storage
   0e23c701-401d-4465-b9b4-c02939d57bb5@snap
   osdmap e16770 pool 'cold-storage' (2) object
   '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3
 (2.4a3)
   - up
   ([12,23,17], p12) acting ([12,23,17], p12)
   root@test:~# ceph osd

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Right. But issues started...

2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com:

 But that was still in writeback mode, right?
 -Sam

 On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  WE haven't set values for max_bytes / max_objects.. and all data
 initially
  writes only to cache layer and not flushed at all to cold layer.
 
  Then we received notification from monitoring that we collect about
 750GB in
  hot pool ) So i changed values for max_object_bytes to be 0,9 of disk
  size... And then evicting/flushing started...
 
  And issue with snapshots arrived
 
  2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Not sure what you mean by:
 
  but it's stop to work in same moment, when cache layer fulfilled with
  data and evict/flush started...
  -Sam
 
  On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   No, when we start draining cache - bad pgs was in place...
   We have big rebalance (disk by disk - to change journal side on both
   hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors
   and 2
   pgs inconsistent...
  
   In writeback - yes, looks like snapshot works good. but it's stop to
   work in
   same moment, when cache layer fulfilled with data and evict/flush
   started...
  
  
  
   2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com:
  
   So you started draining the cache pool before you saw either the
   inconsistent pgs or the anomalous snap behavior?  (That is, writeback
   mode was working correctly?)
   -Sam
  
   On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
Good joke )
   
2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com:
   
Certainly, don't reproduce this with a cluster you care about :).
-Sam
   
On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com
wrote:
 What's supposed to happen is that the client transparently
 directs
 all
 requests to the cache pool rather than the cold pool when there
 is
 a
 cache pool.  If the kernel is sending requests to the cold pool,
 that's probably where the bug is.  Odd.  It could also be a bug
 specific 'forward' mode either in the client or on the osd.  Why
 did
 you have it in that mode?
 -Sam

 On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
 We used 4.x branch, as we have very good Samsung 850 pro in
 production,
 and they don;t support ncq_trim...

 And 4,x first branch which include exceptions for this in
 libsata.c.

 sure we can backport this 1 line to 3.x branch, but we prefer
 no
 to
 go
 deeper if packege for new kernel exist.

 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
 igor.voloshane...@gmail.com:

 root@test:~# uname -a
 Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
 17:37:22
 UTC
 2015 x86_64 x86_64 x86_64 GNU/Linux

 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:

 Also, can you include the kernel version?
 -Sam

 On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just 
 sj...@redhat.com
 wrote:
  Snapshotting with cache/tiering *is* supposed to work.  Can
  you
  open a
  bug?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
  andrija.pa...@gmail.com wrote:
  This was related to the caching layer, which doesnt
 support
  snapshooting per
  docs...for sake of closing the thread.
 
  On 17 August 2015 at 21:15, Voloshanenko Igor
  igor.voloshane...@gmail.com
  wrote:
 
  Hi all, can you please help me with unexplained
  situation...
 
  All snapshot inside ceph broken...
 
  So, as example, we have VM template, as rbd inside ceph.
  We can map it and mount to check that all ok with it
 
  root@test:~# rbd map
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
  /dev/rbd0
  root@test:~# parted /dev/rbd0 print
  Model: Unknown (unknown)
  Disk /dev/rbd0: 10.7GB
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos
 
  Number  Start   End SizeType File system
 Flags
   1  1049kB  525MB   524MB   primary  ext4
  boot
   2  525MB   10.7GB  10.2GB  primary   lvm
 
  Than i want to create snap, so i do:
  root@test:~# rbd snap create
 
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 
  And now i want to map it:
 
  root@test:~# rbd map
 
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  /dev/rbd1
  root@test:~# parted /dev/rbd1 print
  Warning: Unable to open /dev/rbd1 read-write (Read-only
  file
  system).
  /dev/rbd1 has been opened read-only.
  Warning: Unable to open /dev/rbd1 read-write (Read-only
  file
  system

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
As we use journal collocation now (because we want to utilize the cache
layer ((( ), I use ceph-disk to create the new OSD (with the journal size
changed in ceph.conf). I prefer not to do it by hand ))

So I created a very simple script to update the journal size.
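
A hedged outline of what such a script does per disk (the ID and the device are
placeholders; this is not the actual script):

ID=56; DEV=/dev/sdX                      # placeholders
stop ceph-osd id=$ID
ceph osd out $ID                         # let the cluster rebalance away from it first
ceph osd crush remove osd.$ID
ceph auth del osd.$ID
ceph osd rm $ID
# "osd journal size" must already be raised in ceph.conf before re-preparing
ceph-disk zap $DEV
ceph-disk prepare $DEV                   # collocated journal, sized from ceph.conf
ceph-disk activate ${DEV}1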

2015-08-21 2:25 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

 Exactly

  On Friday, 21 August 2015, Samuel Just wrote:

 And you adjusted the journals by removing the osd, recreating it with
 a larger journal, and reinserting it?
 -Sam

 On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Right ( but also was rebalancing cycle 2 day before pgs corrupted)
 
  2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Specifically, the snap behavior (we already know that the pgs went
  inconsistent while the pool was in writeback mode, right?).
  -Sam
 
  On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com wrote:
   Yeah, I'm trying to confirm that the issues did happen in writeback
   mode.
   -Sam
  
   On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
   Right. But issues started...
  
   2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com:
  
   But that was still in writeback mode, right?
   -Sam
  
   On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
WE haven't set values for max_bytes / max_objects.. and all data
initially
writes only to cache layer and not flushed at all to cold layer.
   
Then we received notification from monitoring that we collect
 about
750GB in
hot pool ) So i changed values for max_object_bytes to be 0,9 of
disk
size... And then evicting/flushing started...
   
And issue with snapshots arrived
   
2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com:
   
Not sure what you mean by:
   
but it's stop to work in same moment, when cache layer fulfilled
with
data and evict/flush started...
-Sam
   
On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
 No, when we start draining cache - bad pgs was in place...
 We have big rebalance (disk by disk - to change journal side
 on
 both
 hot/cold layers).. All was Ok, but after 2 days - arrived
 scrub
 errors
 and 2
 pgs inconsistent...

 In writeback - yes, looks like snapshot works good. but it's
 stop
 to
 work in
 same moment, when cache layer fulfilled with data and
 evict/flush
 started...



 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com:

 So you started draining the cache pool before you saw either
 the
 inconsistent pgs or the anomalous snap behavior?  (That is,
 writeback
 mode was working correctly?)
 -Sam

 On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Good joke )
 
  2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Certainly, don't reproduce this with a cluster you care
 about
  :).
  -Sam
 
  On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just
  sj...@redhat.com
  wrote:
   What's supposed to happen is that the client
 transparently
   directs
   all
   requests to the cache pool rather than the cold pool
 when
   there
   is
   a
   cache pool.  If the kernel is sending requests to the
 cold
   pool,
   that's probably where the bug is.  Odd.  It could also
 be a
   bug
   specific 'forward' mode either in the client or on the
 osd.
   Why
   did
   you have it in that mode?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
   We used 4.x branch, as we have very good Samsung 850
 pro
   in
   production,
   and they don;t support ncq_trim...
  
   And 4,x first branch which include exceptions for this
 in
   libsata.c.
  
   sure we can backport this 1 line to 3.x branch, but we
   prefer
   no
   to
   go
   deeper if packege for new kernel exist.
  
   2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
   igor.voloshane...@gmail.com:
  
   root@test:~# uname -a
   Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun
   May 17
   17:37:22
   UTC
   2015 x86_64 x86_64 x86_64 GNU/Linux
  
   2015-08-21 1:54 GMT+03:00 Samuel Just 
 sj...@redhat.com:
  
   Also, can you include the kernel version?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just
   sj...@redhat.com
   wrote:
Snapshotting with cache/tiering *is* supposed to
 work.
Can
you
open a
bug?
-Sam
   
On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
andrija.pa...@gmail.com wrote:
This was related to the caching layer, which

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Will do, Sam!

Thanks in advance for your help!

2015-08-21 2:28 GMT+03:00 Samuel Just sj...@redhat.com:

 Ok, create a ticket with a timeline and all of this information, I'll
 try to look into it more tomorrow.
 -Sam

 On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Exactly
 
   On Friday, 21 August 2015, Samuel Just wrote:
 
  And you adjusted the journals by removing the osd, recreating it with
  a larger journal, and reinserting it?
  -Sam
 
  On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Right ( but also was rebalancing cycle 2 day before pgs corrupted)
  
   2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com:
  
   Specifically, the snap behavior (we already know that the pgs went
   inconsistent while the pool was in writeback mode, right?).
   -Sam
  
   On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com
 wrote:
Yeah, I'm trying to confirm that the issues did happen in writeback
mode.
-Sam
   
On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
Right. But issues started...
   
2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com:
   
But that was still in writeback mode, right?
-Sam
   
On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
 WE haven't set values for max_bytes / max_objects.. and all
 data
 initially
 writes only to cache layer and not flushed at all to cold
 layer.

 Then we received notification from monitoring that we collect
 about
 750GB in
 hot pool ) So i changed values for max_object_bytes to be 0,9
 of
 disk
 size... And then evicting/flushing started...

 And issue with snapshots arrived

 2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com:

 Not sure what you mean by:

 but it's stop to work in same moment, when cache layer
 fulfilled
 with
 data and evict/flush started...
 -Sam

 On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  No, when we start draining cache - bad pgs was in place...
  We have big rebalance (disk by disk - to change journal side
  on
  both
  hot/cold layers).. All was Ok, but after 2 days - arrived
  scrub
  errors
  and 2
  pgs inconsistent...
 
  In writeback - yes, looks like snapshot works good. but it's
  stop
  to
  work in
  same moment, when cache layer fulfilled with data and
  evict/flush
  started...
 
 
 
  2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com:
 
  So you started draining the cache pool before you saw
 either
  the
  inconsistent pgs or the anomalous snap behavior?  (That is,
  writeback
  mode was working correctly?)
  -Sam
 
  On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Good joke )
  
   2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com
 :
  
   Certainly, don't reproduce this with a cluster you care
   about
   :).
   -Sam
  
   On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just
   sj...@redhat.com
   wrote:
What's supposed to happen is that the client
transparently
directs
all
requests to the cache pool rather than the cold pool
when
there
is
a
cache pool.  If the kernel is sending requests to the
cold
pool,
that's probably where the bug is.  Odd.  It could also
be a
bug
specific 'forward' mode either in the client or on the
osd.
Why
did
you have it in that mode?
-Sam
   
On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
We used 4.x branch, as we have very good Samsung
 850
pro
in
production,
and they don;t support ncq_trim...
   
And 4,x first branch which include exceptions for
 this
in
libsata.c.
   
sure we can backport this 1 line to 3.x branch, but
 we
prefer
no
to
go
deeper if packege for new kernel exist.
   
2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
igor.voloshane...@gmail.com:
   
root@test:~# uname -a
Linux ix-s5 4.0.4-040004-generic #201505171336 SMP
 Sun
May 17
17:37:22
UTC
2015 x86_64 x86_64 x86_64 GNU/Linux
   
2015-08-21 1:54 GMT+03:00 Samuel Just
sj...@redhat.com:
   
Also, can you include the kernel version?
-Sam
   
On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just
sj...@redhat.com
wrote:
 Snapshotting with cache/tiering *is* supposed

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Hi Samuel, we tried to fix it in a hacky way.

We took all the affected rbd_data chunks from the OSD logs, then queried
rbd info to find which rbd image contains the bad rbd_data; after that we
mapped that rbd as rbd0, created an empty rbd, and dd'd everything from the
bad volume to the new one.
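
Roughly, the procedure looked like this (a hedged sketch; the prefix and the
image names below are placeholders):

PREFIX=rbd_data.xxxxxxxxxxxx             # block name prefix taken from the OSD log
# find which image in the pool owns that block_name_prefix
for img in $(rbd ls cold-storage); do
    rbd info cold-storage/$img | grep -q "$PREFIX" && echo "$img"
done
# block-copy the affected image into a fresh one
rbd map cold-storage/bad-image                        # -> /dev/rbd0
rbd create cold-storage/bad-image-copy --size 10240   # MB, match the original
rbd map cold-storage/bad-image-copy                   # -> /dev/rbd1
dd if=/dev/rbd0 of=/dev/rbd1 bs=4M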

But after that the scrub errors kept growing... Was 15 errors... Now 35... We also
tried to out the OSD which was the primary, but after rebalancing these 2 pgs still
have 35 scrub errors...

ceph osd getmap -o outfile - attached


2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com:

 Is the number of inconsistent objects growing?  Can you attach the
 whole ceph.log from the 6 hours before and after the snippet you
 linked above?  Are you using cache/tiering?  Can you attach the osdmap
 (ceph osd getmap -o outfile)?
 -Sam

 On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  ceph - 0.94.2
  Its happen during rebalancing
 
  I thought too, that some OSD miss copy, but looks like all miss...
  So any advice in which direction i need to go
 
  2015-08-18 14:14 GMT+03:00 Gregory Farnum gfar...@redhat.com:
 
  From a quick peek it looks like some of the OSDs are missing clones of
  objects. I'm not sure how that could happen and I'd expect the pg
  repair to handle that but if it's not there's probably something
  wrong; what version of Ceph are you running? Sam, is this something
  you've seen, a new bug, or some kind of config issue?
  -Greg
 
  On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Hi all, at our production cluster, due high rebalancing ((( we have 2
   pgs in
   inconsistent state...
  
   root@temp:~# ceph health detail | grep inc
   HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
   pg 2.490 is active+clean+inconsistent, acting [56,15,29]
   pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
  
   From OSD logs, after recovery attempt:
  
   root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read
 i; do
   ceph pg repair ${i} ; done
   dumped all in format plain
   instructing pg 2.490 on osd.56 to repair
   instructing pg 2.c4 on osd.56 to repair
  
   /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
 7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   f5759490/rbd_data.1631755377d7e.04da/head//2 expected
 clone
   90c59490/rbd_data.eb486436f2beb.7a65/141//2
   /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
 7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected
 clone
   f5759490/rbd_data.1631755377d7e.04da/141//2
   /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
 7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected
 clone
   fee49490/rbd_data.12483d3ba0794b.522f/141//2
   /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
 7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected
 clone
   a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
   /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
 7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected
 clone
   bac19490/rbd_data.1238e82ae8944a.032e/141//2
   /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314
 7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected
 clone
   98519490/rbd_data.123e9c2ae8944a.0807/141//2
   /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363
 7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   28809490/rbd_data.edea7460fe42b.01d9/head//2 expected
 clone
   c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
   /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432
 7f94663b3700
   -1
   log_channel(cluster) log [ERR] : deep-scrub 2.490
   e1509490/rbd_data.1423897545e146.09a6/head//2 expected
 clone
   28809490/rbd_data.edea7460fe42b.01d9/141//2
   /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765
 7f94663b3700
   -1
   log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors
  
   So, how i can solve expected clone situation by hand?
   Thank in advance!
  
  
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
 
 



osdmap
Description: Binary data
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We hadn't set values for max_bytes / max_objects, so all data initially
went only to the cache layer and was not flushed to the cold layer at all.

Then we received a notification from monitoring that we had collected about 750GB
in the hot pool ) So I changed the max bytes value to be 0.9 of the disk
size... And then the evicting/flushing started...

And the issue with snapshots arrived
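
For reference, a hedged example of the hammer-era sizing knobs that had been
left unset ("hot" is a placeholder pool name, the numbers are illustrative only):

ceph osd pool set hot target_max_bytes 750000000000   # absolute cap on cached data
ceph osd pool set hot target_max_objects 1000000
ceph osd pool set hot cache_target_dirty_ratio 0.4    # start flushing dirty objects here
ceph osd pool set hot cache_target_full_ratio 0.8     # start evicting clean objects here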

2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com:

 Not sure what you mean by:

 but it's stop to work in same moment, when cache layer fulfilled with
 data and evict/flush started...
 -Sam

 On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  No, when we start draining cache - bad pgs was in place...
  We have big rebalance (disk by disk - to change journal side on both
  hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors
 and 2
  pgs inconsistent...
 
  In writeback - yes, looks like snapshot works good. but it's stop to
 work in
  same moment, when cache layer fulfilled with data and evict/flush
 started...
 
 
 
  2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com:
 
  So you started draining the cache pool before you saw either the
  inconsistent pgs or the anomalous snap behavior?  (That is, writeback
  mode was working correctly?)
  -Sam
 
  On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Good joke )
  
   2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com:
  
   Certainly, don't reproduce this with a cluster you care about :).
   -Sam
  
   On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com
 wrote:
What's supposed to happen is that the client transparently directs
all
requests to the cache pool rather than the cold pool when there is
 a
cache pool.  If the kernel is sending requests to the cold pool,
that's probably where the bug is.  Odd.  It could also be a bug
specific 'forward' mode either in the client or on the osd.  Why
 did
you have it in that mode?
-Sam
   
On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
We used 4.x branch, as we have very good Samsung 850 pro in
production,
and they don;t support ncq_trim...
   
And 4,x first branch which include exceptions for this in
 libsata.c.
   
sure we can backport this 1 line to 3.x branch, but we prefer no
 to
go
deeper if packege for new kernel exist.
   
2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
igor.voloshane...@gmail.com:
   
root@test:~# uname -a
Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
17:37:22
UTC
2015 x86_64 x86_64 x86_64 GNU/Linux
   
2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:
   
Also, can you include the kernel version?
-Sam
   
On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com
wrote:
 Snapshotting with cache/tiering *is* supposed to work.  Can
 you
 open a
 bug?
 -Sam

 On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
 andrija.pa...@gmail.com wrote:
 This was related to the caching layer, which doesnt support
 snapshooting per
 docs...for sake of closing the thread.

 On 17 August 2015 at 21:15, Voloshanenko Igor
 igor.voloshane...@gmail.com
 wrote:

 Hi all, can you please help me with unexplained situation...

 All snapshot inside ceph broken...

 So, as example, we have VM template, as rbd inside ceph.
 We can map it and mount to check that all ok with it

 root@test:~# rbd map
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
 /dev/rbd0
 root@test:~# parted /dev/rbd0 print
 Model: Unknown (unknown)
 Disk /dev/rbd0: 10.7GB
 Sector size (logical/physical): 512B/512B
 Partition Table: msdos

 Number  Start   End SizeType File system  Flags
  1  1049kB  525MB   524MB   primary  ext4 boot
  2  525MB   10.7GB  10.2GB  primary   lvm

 Than i want to create snap, so i do:
 root@test:~# rbd snap create
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap

 And now i want to map it:

 root@test:~# rbd map
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 /dev/rbd1
 root@test:~# parted /dev/rbd1 print
 Warning: Unable to open /dev/rbd1 read-write (Read-only file
 system).
 /dev/rbd1 has been opened read-only.
 Warning: Unable to open /dev/rbd1 read-write (Read-only file
 system).
 /dev/rbd1 has been opened read-only.
 Error: /dev/rbd1: unrecognised disk label

 Even md5 different...
 root@ix-s2:~# md5sum /dev/rbd0
 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
 root@ix-s2:~# md5sum /dev/rbd1
 e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1


 Ok, now i protect snap and create clone... but same thing...
 md5 for clone same as for snap,,

 root@test

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
I have already killed the cache layer, but I will try to reproduce it in the lab.
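
A hedged sketch of the userspace cross-check (librbd instead of krbd), reusing
the pool/image names from earlier in the thread:

# checksum the image and its snapshot through librbd instead of /dev/rbd*
rbd export cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 - | md5sum
rbd export cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap - | md5sum

# rbd-fuse gives another userspace path to the base image
mkdir -p /mnt/rbd-fuse && rbd-fuse -p cold-storage /mnt/rbd-fuse
md5sum /mnt/rbd-fuse/0e23c701-401d-4465-b9b4-c02939d57bb5
fusermount -u /mnt/rbd-fuse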

2015-08-21 1:58 GMT+03:00 Samuel Just sj...@redhat.com:

 Hmm, that might actually be client side.  Can you attempt to reproduce
 with rbd-fuse (different client side implementation from the kernel)?
 -Sam

 On Thu, Aug 20, 2015 at 3:56 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  root@test:~# uname -a
  Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22
 UTC
  2015 x86_64 x86_64 x86_64 GNU/Linux
 
  2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Also, can you include the kernel version?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote:
   Snapshotting with cache/tiering *is* supposed to work.  Can you open a
   bug?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic 
 andrija.pa...@gmail.com
   wrote:
   This was related to the caching layer, which doesnt support
   snapshooting per
   docs...for sake of closing the thread.
  
   On 17 August 2015 at 21:15, Voloshanenko Igor
   igor.voloshane...@gmail.com
   wrote:
  
   Hi all, can you please help me with unexplained situation...
  
   All snapshot inside ceph broken...
  
   So, as example, we have VM template, as rbd inside ceph.
   We can map it and mount to check that all ok with it
  
   root@test:~# rbd map
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
   /dev/rbd0
   root@test:~# parted /dev/rbd0 print
   Model: Unknown (unknown)
   Disk /dev/rbd0: 10.7GB
   Sector size (logical/physical): 512B/512B
   Partition Table: msdos
  
   Number  Start   End SizeType File system  Flags
1  1049kB  525MB   524MB   primary  ext4 boot
2  525MB   10.7GB  10.2GB  primary   lvm
  
   Than i want to create snap, so i do:
   root@test:~# rbd snap create
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  
   And now i want to map it:
  
   root@test:~# rbd map
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
   /dev/rbd1
   root@test:~# parted /dev/rbd1 print
   Warning: Unable to open /dev/rbd1 read-write (Read-only file
 system).
   /dev/rbd1 has been opened read-only.
   Warning: Unable to open /dev/rbd1 read-write (Read-only file
 system).
   /dev/rbd1 has been opened read-only.
   Error: /dev/rbd1: unrecognised disk label
  
   Even md5 different...
   root@ix-s2:~# md5sum /dev/rbd0
   9a47797a07fee3a3d71316e22891d752  /dev/rbd0
   root@ix-s2:~# md5sum /dev/rbd1
   e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
  
  
   Ok, now i protect snap and create clone... but same thing...
   md5 for clone same as for snap,,
  
   root@test:~# rbd unmap /dev/rbd1
   root@test:~# rbd snap protect
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
   root@test:~# rbd clone
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
   cold-storage/test-image
   root@test:~# rbd map cold-storage/test-image
   /dev/rbd1
   root@test:~# md5sum /dev/rbd1
   e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
  
    but it's broken...
   root@test:~# parted /dev/rbd1 print
   Error: /dev/rbd1: unrecognised disk label
  
  
   =
  
   tech details:
  
   root@test:~# ceph -v
   ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
  
   We have 2 inconstistent pgs, but all images not placed on this
 pgs...
  
   root@test:~# ceph health detail
   HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
   pg 2.490 is active+clean+inconsistent, acting [56,15,29]
   pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
   18 scrub errors
  
   
  
   root@test:~# ceph osd map cold-storage
   0e23c701-401d-4465-b9b4-c02939d57bb5
   osdmap e16770 pool 'cold-storage' (2) object
   '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) -
 up
   ([37,15,14], p37) acting ([37,15,14], p37)
   root@test:~# ceph osd map cold-storage
   0e23c701-401d-4465-b9b4-c02939d57bb5@snap
   osdmap e16770 pool 'cold-storage' (2) object
   '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3
 (2.4a3)
   - up
   ([12,23,17], p12) acting ([12,23,17], p12)
   root@test:~# ceph osd map cold-storage
   0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
   osdmap e16770 pool 'cold-storage' (2) object
   '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' - pg 2.9519c2a9
   (2.2a9)
   - up ([12,44,23], p12) acting ([12,44,23], p12)
  
  
   Also we use cache layer, which in current moment - in forward
 mode...
  
   Can you please help me with this.. As my brain stop to understand
 what
   is
   going on...
  
   Thank in advance!
  
  
  
  
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
  
  
  
   --
  
   Andrija Panić
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
I mean in forward mode it's a permanent problem - snapshots not working.
And in writeback mode, after we changed the max_bytes/max_objects values, it's around
30 to 70... 70% of the time it works... 30% - not. It looks like for old images
snapshots work fine (images which already existed before we changed the
values). For any new images - no.

2015-08-21 2:21 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

 Right. But issues started...

 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com:

 But that was still in writeback mode, right?
 -Sam

 On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  WE haven't set values for max_bytes / max_objects.. and all data
 initially
  writes only to cache layer and not flushed at all to cold layer.
 
  Then we received notification from monitoring that we collect about
 750GB in
  hot pool ) So i changed values for max_object_bytes to be 0,9 of disk
  size... And then evicting/flushing started...
 
  And issue with snapshots arrived
 
  2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Not sure what you mean by:
 
  but it's stop to work in same moment, when cache layer fulfilled with
  data and evict/flush started...
  -Sam
 
  On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   No, when we start draining cache - bad pgs was in place...
   We have big rebalance (disk by disk - to change journal side on both
   hot/cold layers).. All was Ok, but after 2 days - arrived scrub
 errors
   and 2
   pgs inconsistent...
  
   In writeback - yes, looks like snapshot works good. but it's stop to
   work in
   same moment, when cache layer fulfilled with data and evict/flush
   started...
  
  
  
   2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com:
  
   So you started draining the cache pool before you saw either the
   inconsistent pgs or the anomalous snap behavior?  (That is,
 writeback
   mode was working correctly?)
   -Sam
  
   On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
Good joke )
   
2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com:
   
Certainly, don't reproduce this with a cluster you care about :).
-Sam
   
On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com
wrote:
 What's supposed to happen is that the client transparently
 directs
 all
 requests to the cache pool rather than the cold pool when
 there is
 a
 cache pool.  If the kernel is sending requests to the cold
 pool,
 that's probably where the bug is.  Odd.  It could also be a bug
 specific 'forward' mode either in the client or on the osd.
 Why
 did
 you have it in that mode?
 -Sam

 On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
 We used 4.x branch, as we have very good Samsung 850 pro in
 production,
 and they don;t support ncq_trim...

 And 4,x first branch which include exceptions for this in
 libsata.c.

 sure we can backport this 1 line to 3.x branch, but we prefer
 no
 to
 go
 deeper if packege for new kernel exist.

 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
 igor.voloshane...@gmail.com:

 root@test:~# uname -a
 Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
 17:37:22
 UTC
 2015 x86_64 x86_64 x86_64 GNU/Linux

 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:

 Also, can you include the kernel version?
 -Sam

 On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just 
 sj...@redhat.com
 wrote:
  Snapshotting with cache/tiering *is* supposed to work.
 Can
  you
  open a
  bug?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
  andrija.pa...@gmail.com wrote:
  This was related to the caching layer, which doesnt
 support
  snapshooting per
  docs...for sake of closing the thread.
 
  On 17 August 2015 at 21:15, Voloshanenko Igor
  igor.voloshane...@gmail.com
  wrote:
 
  Hi all, can you please help me with unexplained
  situation...
 
  All snapshot inside ceph broken...
 
  So, as example, we have VM template, as rbd inside ceph.
  We can map it and mount to check that all ok with it
 
  root@test:~# rbd map
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
  /dev/rbd0
  root@test:~# parted /dev/rbd0 print
  Model: Unknown (unknown)
  Disk /dev/rbd0: 10.7GB
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos
 
  Number  Start   End SizeType File system
 Flags
   1  1049kB  525MB   524MB   primary  ext4
  boot
   2  525MB   10.7GB  10.2GB  primary
  lvm
 
  Than i want to create snap, so i do:
  root@test:~# rbd snap create
 
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 
  And now i want

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Our initial journal size was enough, but the flush interval was 5 secs,
so we increased the journal size to fit a min|max flush timeframe of 29/30
seconds.

By "flush time" I mean:
  filestore max sync interval = 30
  filestore min sync interval = 29
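
For completeness, a hedged example of applying and verifying the same values at
runtime (the config get has to be run on the host carrying that OSD):

ceph tell osd.* injectargs '--filestore_min_sync_interval 29 --filestore_max_sync_interval 30'
ceph daemon osd.56 config get filestore_max_sync_interval    # run on osd.56's host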

2015-08-21 2:16 GMT+03:00 Samuel Just sj...@redhat.com:

 Also, what do you mean by change journal side?
 -Sam

 On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just sj...@redhat.com wrote:
  Not sure what you mean by:
 
  but it's stop to work in same moment, when cache layer fulfilled with
  data and evict/flush started...
  -Sam
 
  On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
  No, when we start draining cache - bad pgs was in place...
  We have big rebalance (disk by disk - to change journal side on both
  hot/cold layers).. All was Ok, but after 2 days - arrived scrub errors
 and 2
  pgs inconsistent...
 
  In writeback - yes, looks like snapshot works good. but it's stop to
 work in
  same moment, when cache layer fulfilled with data and evict/flush
 started...
 
 
 
  2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com:
 
  So you started draining the cache pool before you saw either the
  inconsistent pgs or the anomalous snap behavior?  (That is, writeback
  mode was working correctly?)
  -Sam
 
  On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Good joke )
  
   2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com:
  
   Certainly, don't reproduce this with a cluster you care about :).
   -Sam
  
   On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com
 wrote:
What's supposed to happen is that the client transparently directs
all
requests to the cache pool rather than the cold pool when there
 is a
cache pool.  If the kernel is sending requests to the cold pool,
that's probably where the bug is.  Odd.  It could also be a bug
specific 'forward' mode either in the client or on the osd.  Why
 did
you have it in that mode?
-Sam
   
On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
We used 4.x branch, as we have very good Samsung 850 pro in
production,
and they don;t support ncq_trim...
   
And 4,x first branch which include exceptions for this in
 libsata.c.
   
sure we can backport this 1 line to 3.x branch, but we prefer no
 to
go
deeper if packege for new kernel exist.
   
2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
igor.voloshane...@gmail.com:
   
root@test:~# uname -a
Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
17:37:22
UTC
2015 x86_64 x86_64 x86_64 GNU/Linux
   
2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:
   
Also, can you include the kernel version?
-Sam
   
On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com
 
wrote:
 Snapshotting with cache/tiering *is* supposed to work.  Can
 you
 open a
 bug?
 -Sam

 On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
 andrija.pa...@gmail.com wrote:
 This was related to the caching layer, which doesnt support
 snapshooting per
 docs...for sake of closing the thread.

 On 17 August 2015 at 21:15, Voloshanenko Igor
 igor.voloshane...@gmail.com
 wrote:

 Hi all, can you please help me with unexplained
 situation...

 All snapshot inside ceph broken...

 So, as example, we have VM template, as rbd inside ceph.
 We can map it and mount to check that all ok with it

 root@test:~# rbd map
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
 /dev/rbd0
 root@test:~# parted /dev/rbd0 print
 Model: Unknown (unknown)
 Disk /dev/rbd0: 10.7GB
 Sector size (logical/physical): 512B/512B
 Partition Table: msdos

 Number  Start   End SizeType File system  Flags
  1  1049kB  525MB   524MB   primary  ext4 boot
  2  525MB   10.7GB  10.2GB  primary   lvm

 Than i want to create snap, so i do:
 root@test:~# rbd snap create
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap

 And now i want to map it:

 root@test:~# rbd map
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 /dev/rbd1
 root@test:~# parted /dev/rbd1 print
 Warning: Unable to open /dev/rbd1 read-write (Read-only
 file
 system).
 /dev/rbd1 has been opened read-only.
 Warning: Unable to open /dev/rbd1 read-write (Read-only
 file
 system).
 /dev/rbd1 has been opened read-only.
 Error: /dev/rbd1: unrecognised disk label

 Even md5 different...
 root@ix-s2:~# md5sum /dev/rbd0
 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
 root@ix-s2:~# md5sum /dev/rbd1
 e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1


 Ok, now i protect snap and create clone... but same
 thing...
 md5 for clone same as for snap

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Yes, will do.

What we see: when the cache tier is in forward mode and I do
rbd snap create, it uses the rbd_header not from the cold tier but from the
hot tier, and these two headers are not synced.
The header can't be evicted from hot-storage, as it's locked by KVM (Qemu). If I
kill the lock and evict the header, everything starts to work..
But it's unacceptable for production... to kill a lock under a running VM (((
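
Roughly, the manual workaround looks like this (pool and image names are only
placeholders for illustration; the lock id and locker must be taken from the
"lock list" output):

# find the lock which KVM (Qemu) holds on the image header
rbd -p cold-storage lock list <image>
# remove exactly that lock (id and locker as printed by "lock list")
rbd -p cold-storage lock remove <image> <lock-id> <locker>
# then force the header object out of the hot tier
rados -p hot-storage cache-evict rbd_header.<image-id>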

2015-08-21 1:51 GMT+03:00 Samuel Just sj...@redhat.com:

 Snapshotting with cache/tiering *is* supposed to work.  Can you open a bug?
 -Sam

 On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  This was related to the caching layer, which doesnt support snapshooting
 per
  docs...for sake of closing the thread.
 
  On 17 August 2015 at 21:15, Voloshanenko Igor 
 igor.voloshane...@gmail.com
  wrote:
 
  Hi all, can you please help me with unexplained situation...
 
  All snapshot inside ceph broken...
 
  So, as example, we have VM template, as rbd inside ceph.
  We can map it and mount to check that all ok with it
 
  root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
  /dev/rbd0
  root@test:~# parted /dev/rbd0 print
  Model: Unknown (unknown)
  Disk /dev/rbd0: 10.7GB
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos
 
  Number  Start   End SizeType File system  Flags
   1  1049kB  525MB   524MB   primary  ext4 boot
   2  525MB   10.7GB  10.2GB  primary   lvm
 
  Than i want to create snap, so i do:
  root@test:~# rbd snap create
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 
  And now i want to map it:
 
  root@test:~# rbd map
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  /dev/rbd1
  root@test:~# parted /dev/rbd1 print
  Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
  /dev/rbd1 has been opened read-only.
  Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
  /dev/rbd1 has been opened read-only.
  Error: /dev/rbd1: unrecognised disk label
 
  Even md5 different...
  root@ix-s2:~# md5sum /dev/rbd0
  9a47797a07fee3a3d71316e22891d752  /dev/rbd0
  root@ix-s2:~# md5sum /dev/rbd1
  e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
 
 
  Ok, now i protect snap and create clone... but same thing...
  md5 for clone same as for snap,,
 
  root@test:~# rbd unmap /dev/rbd1
  root@test:~# rbd snap protect
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  root@test:~# rbd clone
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  cold-storage/test-image
  root@test:~# rbd map cold-storage/test-image
  /dev/rbd1
  root@test:~# md5sum /dev/rbd1
  e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
 
   but it's broken...
  root@test:~# parted /dev/rbd1 print
  Error: /dev/rbd1: unrecognised disk label
 
 
  =
 
  tech details:
 
  root@test:~# ceph -v
  ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 
  We have 2 inconstistent pgs, but all images not placed on this pgs...
 
  root@test:~# ceph health detail
  HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
  pg 2.490 is active+clean+inconsistent, acting [56,15,29]
  pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
  18 scrub errors
 
  
 
  root@test:~# ceph osd map cold-storage
  0e23c701-401d-4465-b9b4-c02939d57bb5
  osdmap e16770 pool 'cold-storage' (2) object
  '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) - up
  ([37,15,14], p37) acting ([37,15,14], p37)
  root@test:~# ceph osd map cold-storage
  0e23c701-401d-4465-b9b4-c02939d57bb5@snap
  osdmap e16770 pool 'cold-storage' (2) object
  '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3 (2.4a3)
 - up
  ([12,23,17], p12) acting ([12,23,17], p12)
  root@test:~# ceph osd map cold-storage
  0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
  osdmap e16770 pool 'cold-storage' (2) object
  '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' - pg 2.9519c2a9
 (2.2a9)
  - up ([12,44,23], p12) acting ([12,44,23], p12)
 
 
  Also we use cache layer, which in current moment - in forward mode...
 
  Can you please help me with this.. As my brain stop to understand what
 is
  going on...
 
  Thank in advance!
 
 
 
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 
  --
 
  Andrija Panić
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Attachment blocked, so posting it as text...

root@zzz:~# cat update_osd.sh
#!/bin/bash
# Recreate a single OSD (given by id) with a new journal and put it back
# into the CRUSH map.

ID=$1
echo "Process OSD# ${ID}"

# data partition of the OSD; ${DEV::-1} strips the partition number to get the disk
DEV=`mount | grep "ceph-${ID} " | cut -d " " -f 1`
echo "OSD# ${ID} hosted on ${DEV::-1}"

TYPE_RAW=`smartctl -a ${DEV} | grep Rota | cut -d " " -f 6`
if [ "${TYPE_RAW}" == "Solid" ]
then
TYPE=ssd
elif [ "${TYPE_RAW}" == "7200" ]
then
TYPE=platter
fi

echo "OSD Type = ${TYPE}"

HOST=`hostname`
echo "Current node hostname: ${HOST}"

echo "Set noout option for CEPH cluster"
ceph osd set noout

echo "Marked OSD # ${ID} out"
ceph osd out ${ID}

echo "Remove OSD # ${ID} from CRUSHMAP"
ceph osd crush remove osd.${ID}

echo "Delete auth for OSD# ${ID}"
ceph auth del osd.${ID}

echo "Stop OSD# ${ID}"
stop ceph-osd id=${ID}

echo "Remove OSD # ${ID} from cluster"
ceph osd rm ${ID}

echo "Unmount OSD# ${ID}"
umount ${DEV}

echo "ZAP ${DEV::-1}"
ceph-disk zap ${DEV::-1}

echo "Create new OSD with ${DEV::-1}"
ceph-disk-prepare ${DEV::-1}

echo "Activate new OSD"
ceph-disk-activate ${DEV}

echo "Dump current CRUSHMAP"
ceph osd getcrushmap -o cm.old

echo "Decompile CRUSHMAP"
crushtool -d cm.old -o cm

echo "Place new OSD in proper place"
sed -i "s/device${ID}/osd.${ID}/" cm
# line number of the closing brace of the ${HOST}-${TYPE} bucket
LINE=`cat -n cm | sed -n "/${HOST}-${TYPE} {/,/}/p" | tail -n 1 | awk '{print $1}'`
# insert the recreated OSD just before that closing brace
sed -i "${LINE}i\item osd.${ID} weight 1.000" cm

echo "Modify ${HOST} weight into CRUSHMAP"
sed -i "s/item ${HOST}-${TYPE} weight 9.000/item ${HOST}-${TYPE} weight 1.000/" cm

echo "Compile new CRUSHMAP"
crushtool -c cm -o cm.new

echo "Inject new CRUSHMAP"
ceph osd setcrushmap -i cm.new

#echo "Clean..."
#rm -rf cm cm.new

echo "Unset noout option for CEPH cluster"
ceph osd unset noout

echo "OSD recreated... Waiting for rebalancing..."
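
To use it, pass the OSD id as the only argument, e.g.:

./update_osd.sh 56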

2015-08-21 2:37 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

 As we use journal collocation now (because we want to
 utilize the cache layer ((( ), I use ceph-disk to create the new OSD (with the
 journal size changed in ceph.conf). I don't prefer manual work))

 So I created a very simple script to update the journal size

 2015-08-21 2:25 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

 Exactly

 On Friday, 21 August 2015, Samuel Just wrote:

 And you adjusted the journals by removing the osd, recreating it with
 a larger journal, and reinserting it?
 -Sam

 On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Right ( but also was rebalancing cycle 2 day before pgs corrupted)
 
  2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Specifically, the snap behavior (we already know that the pgs went
  inconsistent while the pool was in writeback mode, right?).
  -Sam
 
  On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com
 wrote:
   Yeah, I'm trying to confirm that the issues did happen in writeback
   mode.
   -Sam
  
   On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
   Right. But issues started...
  
   2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com:
  
   But that was still in writeback mode, right?
   -Sam
  
   On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
WE haven't set values for max_bytes / max_objects.. and all data
initially
writes only to cache layer and not flushed at all to cold layer.
   
Then we received notification from monitoring that we collect
 about
750GB in
hot pool ) So i changed values for max_object_bytes to be 0,9 of
disk
size... And then evicting/flushing started...
   
And issue with snapshots arrived
   
2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com:
   
Not sure what you mean by:
   
but it's stop to work in same moment, when cache layer
 fulfilled
with
data and evict/flush started...
-Sam
   
On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
 No, when we start draining cache - bad pgs was in place...
 We have big rebalance (disk by disk - to change journal side
 on
 both
 hot/cold layers).. All was Ok, but after 2 days - arrived
 scrub
 errors
 and 2
 pgs inconsistent...

 In writeback - yes, looks like snapshot works good. but it's
 stop
 to
 work in
 same moment, when cache layer fulfilled with data and
 evict/flush
 started...



 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com:

 So you started draining the cache pool before you saw
 either the
 inconsistent pgs or the anomalous snap behavior?  (That is,
 writeback
 mode was working correctly?)
 -Sam

 On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Good joke )
 
  2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Certainly, don't reproduce this with a cluster you care
 about
  :).
  -Sam
 
  On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just
  sj...@redhat.com

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
root@test:~# uname -a
Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux

2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:

 Also, can you include the kernel version?
 -Sam

 On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote:
  Snapshotting with cache/tiering *is* supposed to work.  Can you open a
 bug?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  This was related to the caching layer, which doesnt support
 snapshooting per
  docs...for sake of closing the thread.
 
  On 17 August 2015 at 21:15, Voloshanenko Igor 
 igor.voloshane...@gmail.com
  wrote:
 
  Hi all, can you please help me with unexplained situation...
 
  All snapshot inside ceph broken...
 
  So, as example, we have VM template, as rbd inside ceph.
  We can map it and mount to check that all ok with it
 
  root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
  /dev/rbd0
  root@test:~# parted /dev/rbd0 print
  Model: Unknown (unknown)
  Disk /dev/rbd0: 10.7GB
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos
 
  Number  Start   End SizeType File system  Flags
   1  1049kB  525MB   524MB   primary  ext4 boot
   2  525MB   10.7GB  10.2GB  primary   lvm
 
  Than i want to create snap, so i do:
  root@test:~# rbd snap create
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 
  And now i want to map it:
 
  root@test:~# rbd map
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  /dev/rbd1
  root@test:~# parted /dev/rbd1 print
  Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
  /dev/rbd1 has been opened read-only.
  Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
  /dev/rbd1 has been opened read-only.
  Error: /dev/rbd1: unrecognised disk label
 
  Even md5 different...
  root@ix-s2:~# md5sum /dev/rbd0
  9a47797a07fee3a3d71316e22891d752  /dev/rbd0
  root@ix-s2:~# md5sum /dev/rbd1
  e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
 
 
  Ok, now i protect snap and create clone... but same thing...
  md5 for clone same as for snap,,
 
  root@test:~# rbd unmap /dev/rbd1
  root@test:~# rbd snap protect
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  root@test:~# rbd clone
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  cold-storage/test-image
  root@test:~# rbd map cold-storage/test-image
  /dev/rbd1
  root@test:~# md5sum /dev/rbd1
  e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
 
   but it's broken...
  root@test:~# parted /dev/rbd1 print
  Error: /dev/rbd1: unrecognised disk label
 
 
  =
 
  tech details:
 
  root@test:~# ceph -v
  ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 
  We have 2 inconstistent pgs, but all images not placed on this pgs...
 
  root@test:~# ceph health detail
  HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
  pg 2.490 is active+clean+inconsistent, acting [56,15,29]
  pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
  18 scrub errors
 
  
 
  root@test:~# ceph osd map cold-storage
  0e23c701-401d-4465-b9b4-c02939d57bb5
  osdmap e16770 pool 'cold-storage' (2) object
  '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) - up
  ([37,15,14], p37) acting ([37,15,14], p37)
  root@test:~# ceph osd map cold-storage
  0e23c701-401d-4465-b9b4-c02939d57bb5@snap
  osdmap e16770 pool 'cold-storage' (2) object
  '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3 (2.4a3)
 - up
  ([12,23,17], p12) acting ([12,23,17], p12)
  root@test:~# ceph osd map cold-storage
  0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
  osdmap e16770 pool 'cold-storage' (2) object
  '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' - pg 2.9519c2a9
 (2.2a9)
  - up ([12,44,23], p12) acting ([12,44,23], p12)
 
 
  Also we use cache layer, which in current moment - in forward mode...
 
  Can you please help me with this.. As my brain stop to understand what
 is
  going on...
 
  Thank in advance!
 
 
 
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 
  --
 
  Andrija Panić
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Right ( but there was also a rebalancing cycle 2 days before the pgs got corrupted)

2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com:

 Specifically, the snap behavior (we already know that the pgs went
 inconsistent while the pool was in writeback mode, right?).
 -Sam

 On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com wrote:
  Yeah, I'm trying to confirm that the issues did happen in writeback mode.
  -Sam
 
  On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
  Right. But issues started...
 
  2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com:
 
  But that was still in writeback mode, right?
  -Sam
 
  On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   WE haven't set values for max_bytes / max_objects.. and all data
   initially
   writes only to cache layer and not flushed at all to cold layer.
  
   Then we received notification from monitoring that we collect about
   750GB in
   hot pool ) So i changed values for max_object_bytes to be 0,9 of disk
   size... And then evicting/flushing started...
  
   And issue with snapshots arrived
  
   2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com:
  
   Not sure what you mean by:
  
   but it's stop to work in same moment, when cache layer fulfilled
 with
   data and evict/flush started...
   -Sam
  
   On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
No, when we start draining cache - bad pgs was in place...
We have big rebalance (disk by disk - to change journal side on
 both
hot/cold layers).. All was Ok, but after 2 days - arrived scrub
errors
and 2
pgs inconsistent...
   
In writeback - yes, looks like snapshot works good. but it's stop
 to
work in
same moment, when cache layer fulfilled with data and evict/flush
started...
   
   
   
2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com:
   
So you started draining the cache pool before you saw either the
inconsistent pgs or the anomalous snap behavior?  (That is,
writeback
mode was working correctly?)
-Sam
   
On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
 Good joke )

 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com:

 Certainly, don't reproduce this with a cluster you care about
 :).
 -Sam

 On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just 
 sj...@redhat.com
 wrote:
  What's supposed to happen is that the client transparently
  directs
  all
  requests to the cache pool rather than the cold pool when
 there
  is
  a
  cache pool.  If the kernel is sending requests to the cold
  pool,
  that's probably where the bug is.  Odd.  It could also be a
 bug
  specific 'forward' mode either in the client or on the osd.
  Why
  did
  you have it in that mode?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
  We used 4.x branch, as we have very good Samsung 850 pro
 in
  production,
  and they don;t support ncq_trim...
 
  And 4,x first branch which include exceptions for this in
  libsata.c.
 
  sure we can backport this 1 line to 3.x branch, but we
 prefer
  no
  to
  go
  deeper if packege for new kernel exist.
 
  2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
  igor.voloshane...@gmail.com:
 
  root@test:~# uname -a
  Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun
 May 17
  17:37:22
  UTC
  2015 x86_64 x86_64 x86_64 GNU/Linux
 
  2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Also, can you include the kernel version?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just
  sj...@redhat.com
  wrote:
   Snapshotting with cache/tiering *is* supposed to work.
   Can
   you
   open a
   bug?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
   andrija.pa...@gmail.com wrote:
   This was related to the caching layer, which doesnt
   support
   snapshooting per
   docs...for sake of closing the thread.
  
   On 17 August 2015 at 21:15, Voloshanenko Igor
   igor.voloshane...@gmail.com
   wrote:
  
   Hi all, can you please help me with unexplained
   situation...
  
   All snapshot inside ceph broken...
  
   So, as example, we have VM template, as rbd inside
 ceph.
   We can map it and mount to check that all ok with it
  
   root@test:~# rbd map
   cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
   /dev/rbd0
   root@test:~# parted /dev/rbd0 print
   Model: Unknown (unknown)
   Disk /dev/rbd0: 10.7GB
   Sector size (logical/physical): 512B/512B
   Partition Table: msdos
  
   Number  Start   End SizeType File

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Exactly

On Friday, 21 August 2015, Samuel Just wrote:

 And you adjusted the journals by removing the osd, recreating it with
 a larger journal, and reinserting it?
 -Sam

 On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com javascript:; wrote:
  Right ( but also was rebalancing cycle 2 day before pgs corrupted)
 
  2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com javascript:;:
 
  Specifically, the snap behavior (we already know that the pgs went
  inconsistent while the pool was in writeback mode, right?).
  -Sam
 
  On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com
 javascript:; wrote:
   Yeah, I'm trying to confirm that the issues did happen in writeback
   mode.
   -Sam
  
   On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com javascript:; wrote:
   Right. But issues started...
  
   2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com
 javascript:;:
  
   But that was still in writeback mode, right?
   -Sam
  
   On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com javascript:; wrote:
WE haven't set values for max_bytes / max_objects.. and all data
initially
writes only to cache layer and not flushed at all to cold layer.
   
Then we received notification from monitoring that we collect
 about
750GB in
hot pool ) So i changed values for max_object_bytes to be 0,9 of
disk
size... And then evicting/flushing started...
   
And issue with snapshots arrived
   
2015-08-21 2:15 GMT+03:00 Samuel Just sj...@redhat.com
 javascript:;:
   
Not sure what you mean by:
   
but it's stop to work in same moment, when cache layer fulfilled
with
data and evict/flush started...
-Sam
   
On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
igor.voloshane...@gmail.com javascript:; wrote:
 No, when we start draining cache - bad pgs was in place...
 We have big rebalance (disk by disk - to change journal side on
 both
 hot/cold layers).. All was Ok, but after 2 days - arrived scrub
 errors
 and 2
 pgs inconsistent...

 In writeback - yes, looks like snapshot works good. but it's
 stop
 to
 work in
 same moment, when cache layer fulfilled with data and
 evict/flush
 started...



 2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com
 javascript:;:

 So you started draining the cache pool before you saw either
 the
 inconsistent pgs or the anomalous snap behavior?  (That is,
 writeback
 mode was working correctly?)
 -Sam

 On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com javascript:; wrote:
  Good joke )
 
  2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com
 javascript:;:
 
  Certainly, don't reproduce this with a cluster you care
 about
  :).
  -Sam
 
  On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just
  sj...@redhat.com javascript:;
  wrote:
   What's supposed to happen is that the client
 transparently
   directs
   all
   requests to the cache pool rather than the cold pool when
   there
   is
   a
   cache pool.  If the kernel is sending requests to the
 cold
   pool,
   that's probably where the bug is.  Odd.  It could also
 be a
   bug
   specific 'forward' mode either in the client or on the
 osd.
   Why
   did
   you have it in that mode?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com javascript:; wrote:
   We used 4.x branch, as we have very good Samsung 850
 pro
   in
   production,
   and they don;t support ncq_trim...
  
   And 4,x first branch which include exceptions for this
 in
   libsata.c.
  
   sure we can backport this 1 line to 3.x branch, but we
   prefer
   no
   to
   go
   deeper if packege for new kernel exist.
  
   2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
   igor.voloshane...@gmail.com javascript:;:
  
   root@test:~# uname -a
   Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun
   May 17
   17:37:22
   UTC
   2015 x86_64 x86_64 x86_64 GNU/Linux
  
   2015-08-21 1:54 GMT+03:00 Samuel Just 
 sj...@redhat.com javascript:;:
  
   Also, can you include the kernel version?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just
   sj...@redhat.com javascript:;
   wrote:
Snapshotting with cache/tiering *is* supposed to
 work.
Can
you
open a
bug?
-Sam
   
On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
andrija.pa...@gmail.com javascript:; wrote:
This was related to the caching layer, which doesnt
support
snapshooting per
docs...for sake of closing the thread

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We used the 4.x branch, as we have very good Samsung 850 Pro drives in production,
and they don't support ncq_trim...

And 4.x is the first branch which includes exceptions for this in libsata.c.

Sure, we can backport this 1 line to the 3.x branch, but we prefer not to go
deeper if a package for the new kernel exists.

2015-08-21 1:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

 root@test:~# uname -a
 Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC
 2015 x86_64 x86_64 x86_64 GNU/Linux

 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:

 Also, can you include the kernel version?
 -Sam

 On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote:
  Snapshotting with cache/tiering *is* supposed to work.  Can you open a
 bug?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  This was related to the caching layer, which doesnt support
 snapshooting per
  docs...for sake of closing the thread.
 
  On 17 August 2015 at 21:15, Voloshanenko Igor 
 igor.voloshane...@gmail.com
  wrote:
 
  Hi all, can you please help me with unexplained situation...
 
  All snapshot inside ceph broken...
 
  So, as example, we have VM template, as rbd inside ceph.
  We can map it and mount to check that all ok with it
 
  root@test:~# rbd map
 cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
  /dev/rbd0
  root@test:~# parted /dev/rbd0 print
  Model: Unknown (unknown)
  Disk /dev/rbd0: 10.7GB
  Sector size (logical/physical): 512B/512B
  Partition Table: msdos
 
  Number  Start   End SizeType File system  Flags
   1  1049kB  525MB   524MB   primary  ext4 boot
   2  525MB   10.7GB  10.2GB  primary   lvm
 
  Than i want to create snap, so i do:
  root@test:~# rbd snap create
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
 
  And now i want to map it:
 
  root@test:~# rbd map
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  /dev/rbd1
  root@test:~# parted /dev/rbd1 print
  Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
  /dev/rbd1 has been opened read-only.
  Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
  /dev/rbd1 has been opened read-only.
  Error: /dev/rbd1: unrecognised disk label
 
  Even md5 different...
  root@ix-s2:~# md5sum /dev/rbd0
  9a47797a07fee3a3d71316e22891d752  /dev/rbd0
  root@ix-s2:~# md5sum /dev/rbd1
  e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
 
 
  Ok, now i protect snap and create clone... but same thing...
  md5 for clone same as for snap,,
 
  root@test:~# rbd unmap /dev/rbd1
  root@test:~# rbd snap protect
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  root@test:~# rbd clone
  cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
  cold-storage/test-image
  root@test:~# rbd map cold-storage/test-image
  /dev/rbd1
  root@test:~# md5sum /dev/rbd1
  e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
 
   but it's broken...
  root@test:~# parted /dev/rbd1 print
  Error: /dev/rbd1: unrecognised disk label
 
 
  =
 
  tech details:
 
  root@test:~# ceph -v
  ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 
  We have 2 inconstistent pgs, but all images not placed on this pgs...
 
  root@test:~# ceph health detail
  HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
  pg 2.490 is active+clean+inconsistent, acting [56,15,29]
  pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
  18 scrub errors
 
  
 
  root@test:~# ceph osd map cold-storage
  0e23c701-401d-4465-b9b4-c02939d57bb5
  osdmap e16770 pool 'cold-storage' (2) object
  '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) - up
  ([37,15,14], p37) acting ([37,15,14], p37)
  root@test:~# ceph osd map cold-storage
  0e23c701-401d-4465-b9b4-c02939d57bb5@snap
  osdmap e16770 pool 'cold-storage' (2) object
  '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3 (2.4a3)
 - up
  ([12,23,17], p12) acting ([12,23,17], p12)
  root@test:~# ceph osd map cold-storage
  0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
  osdmap e16770 pool 'cold-storage' (2) object
  '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' - pg 2.9519c2a9
 (2.2a9)
  - up ([12,44,23], p12) acting ([12,44,23], p12)
 
 
  Also we use cache layer, which in current moment - in forward mode...
 
  Can you please help me with this.. As my brain stop to understand
 what is
  going on...
 
  Thank in advance!
 
 
 
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 
  --
 
  Andrija Panić
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
No, when we started draining the cache, the bad pgs were already in place...
We had a big rebalance (disk by disk, to change the journal size on both
hot/cold layers).. All was OK, but after 2 days the scrub errors arrived and
2 pgs went inconsistent...

In writeback - yes, it looks like snapshots work well, but they stop working
at the same moment the cache layer fills up with data and evict/flush
starts...



2015-08-21 2:09 GMT+03:00 Samuel Just sj...@redhat.com:

 So you started draining the cache pool before you saw either the
 inconsistent pgs or the anomalous snap behavior?  (That is, writeback
 mode was working correctly?)
 -Sam

 On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Good joke )
 
  2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Certainly, don't reproduce this with a cluster you care about :).
  -Sam
 
  On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote:
   What's supposed to happen is that the client transparently directs all
   requests to the cache pool rather than the cold pool when there is a
   cache pool.  If the kernel is sending requests to the cold pool,
   that's probably where the bug is.  Odd.  It could also be a bug
   specific 'forward' mode either in the client or on the osd.  Why did
   you have it in that mode?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
   We used 4.x branch, as we have very good Samsung 850 pro in
   production,
   and they don;t support ncq_trim...
  
   And 4,x first branch which include exceptions for this in libsata.c.
  
   sure we can backport this 1 line to 3.x branch, but we prefer no to
 go
   deeper if packege for new kernel exist.
  
   2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
   igor.voloshane...@gmail.com:
  
   root@test:~# uname -a
   Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
 17:37:22
   UTC
   2015 x86_64 x86_64 x86_64 GNU/Linux
  
   2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com:
  
   Also, can you include the kernel version?
   -Sam
  
   On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com
   wrote:
Snapshotting with cache/tiering *is* supposed to work.  Can you
open a
bug?
-Sam
   
On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
andrija.pa...@gmail.com wrote:
This was related to the caching layer, which doesnt support
snapshooting per
docs...for sake of closing the thread.
   
On 17 August 2015 at 21:15, Voloshanenko Igor
igor.voloshane...@gmail.com
wrote:
   
Hi all, can you please help me with unexplained situation...
   
All snapshot inside ceph broken...
   
So, as example, we have VM template, as rbd inside ceph.
We can map it and mount to check that all ok with it
   
root@test:~# rbd map
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
/dev/rbd0
root@test:~# parted /dev/rbd0 print
Model: Unknown (unknown)
Disk /dev/rbd0: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
   
Number  Start   End SizeType File system  Flags
 1  1049kB  525MB   524MB   primary  ext4 boot
 2  525MB   10.7GB  10.2GB  primary   lvm
   
Than i want to create snap, so i do:
root@test:~# rbd snap create
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
   
And now i want to map it:
   
root@test:~# rbd map
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
/dev/rbd1
root@test:~# parted /dev/rbd1 print
Warning: Unable to open /dev/rbd1 read-write (Read-only file
system).
/dev/rbd1 has been opened read-only.
Warning: Unable to open /dev/rbd1 read-write (Read-only file
system).
/dev/rbd1 has been opened read-only.
Error: /dev/rbd1: unrecognised disk label
   
Even md5 different...
root@ix-s2:~# md5sum /dev/rbd0
9a47797a07fee3a3d71316e22891d752  /dev/rbd0
root@ix-s2:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
   
   
Ok, now i protect snap and create clone... but same thing...
md5 for clone same as for snap,,
   
root@test:~# rbd unmap /dev/rbd1
root@test:~# rbd snap protect
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
root@test:~# rbd clone
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
cold-storage/test-image
root@test:~# rbd map cold-storage/test-image
/dev/rbd1
root@test:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
   
 but it's broken...
root@test:~# parted /dev/rbd1 print
Error: /dev/rbd1: unrecognised disk label
   
   
=
   
tech details:
   
root@test:~# ceph -v
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
   
We have 2 inconstistent pgs, but all images not placed on this
pgs...
   
root@test:~# ceph health detail
HEALTH_ERR 2 pgs inconsistent; 18 scrub

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Thank you, Sam!
I also noticed these linked errors during scrub...

Now it all looks reasonable!

So we will wait for the bug to be closed.

Do you need any help on it?

I mean I can help with coding/testing/etc...

2015-08-21 0:52 GMT+03:00 Samuel Just sj...@redhat.com:

 Ah, this is kind of silly.  I think you don't have 37 errors, but 2
 errors.  pg 2.490 object
 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing
 snap 141.  If you look at the objects after that in the log:

 2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster
 [ERR] repair 2.490
 68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
 clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
 2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster
 [ERR] repair 2.490
 ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
 clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2

 The clone from the second line matches the head object from the
 previous line, and they have the same clone id.  I *think* that the
 first error is real, and the subsequent ones are just scrub being
 dumb.  Same deal with pg 2.c4.  I just opened
 http://tracker.ceph.com/issues/12738.

 The original problem is that
 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
 missing a clone.  Not sure how that happened, my money is on a
 cache/tiering evict racing with a snap trim.  If you have any logging
 or relevant information from when that happened, you should open a
 bug.  The 'snapdir' in the two object names indicates that the head
 object has actually been deleted (which makes sense if you moved the
 image to a new image and deleted the old one) and is only being kept
 around since there are live snapshots.  I suggest you leave the
 snapshots for those images alone for the time being -- removing them
 might cause the osd to crash trying to clean up the wierd on disk
 state.  Other than the leaked space from those two image snapshots and
 the annoying spurious scrub errors, I think no actual corruption is
 going on though.  I created a tracker ticket for a feature that would
 let ceph-objectstore-tool remove the spurious clone from the
 head/snapdir metadata.

 Am I right that you haven't actually seen any osd crashes or user
 visible corruption (except possibly on snapshots of those two images)?
 -Sam

 On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Inktank:
 
 https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
 
  Mail-list:
  https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html
 
  2015-08-20 20:06 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Which docs?
  -Sam
 
  On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Not yet. I will create.
   But according to mail lists and Inktank docs - it's expected behaviour
   when
   cache enable
  
   2015-08-20 19:56 GMT+03:00 Samuel Just sj...@redhat.com:
  
   Is there a bug for this in the tracker?
   -Sam
  
   On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
Issue, that in forward mode, fstrim doesn't work proper, and when
 we
take
snapshot - data not proper update in cache layer, and client (ceph)
see
damaged snap.. As headers requested from cache layer.
   
2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com:
   
What was the issue?
-Sam
   
On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
 Samuel, we turned off cache layer few hours ago...
 I will post ceph.log in few minutes

 For snap - we found issue, was connected with cache tier..

 2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com:

 Ok, you appear to be using a replicated cache tier in front of
 a
 replicated base tier.  Please scrub both inconsistent pgs and
 post
 the
 ceph.log from before when you started the scrub until after.
 Also,
 what command are you using to take snapshots?
 -Sam

 On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Hi Samuel, we try to fix it in trick way.
 
  we check all rbd_data chunks from logs (OSD) which are
  affected,
  then
  query
  rbd info to compare which rbd consist bad rbd_data, after
 that
  we
  mount
  this
  rbd as rbd0, create empty rbd, and DD all info from bad
 volume
  to
  new
  one.
 
  But after that - scrub errors growing... Was 15 errors.. .Now
  35...
  We
  laos
  try to out OSD which was lead, but after rebalancing this 2
 pgs
  still
  have
  35 scrub errors...
 
  ceph osd getmap -o outfile - attached
 
 
  2015-08-18 18:48 GMT+03:00 Samuel Just sj

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Samuel, we turned off the cache layer a few hours ago...
I will post the ceph.log in a few minutes.

For the snap - we found the issue; it was connected with the cache tier..
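
For reference, the usual sequence to turn a writeback tier off looks roughly
like this (the pool names are placeholders for our hot/cold pools, and the
flush step can take a while):

ceph osd tier cache-mode hot-storage forward
rados -p hot-storage cache-flush-evict-all
ceph osd tier remove-overlay cold-storage
ceph osd tier remove cold-storage hot-storage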

2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com:

 Ok, you appear to be using a replicated cache tier in front of a
 replicated base tier.  Please scrub both inconsistent pgs and post the
 ceph.log from before when you started the scrub until after.  Also,
 what command are you using to take snapshots?
 -Sam

 On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Hi Samuel, we try to fix it in trick way.
 
  we check all rbd_data chunks from logs (OSD) which are affected, then
 query
  rbd info to compare which rbd consist bad rbd_data, after that we mount
 this
  rbd as rbd0, create empty rbd, and DD all info from bad volume to new
 one.
 
  But after that - scrub errors growing... Was 15 errors.. .Now 35... We
 laos
  try to out OSD which was lead, but after rebalancing this 2 pgs still
 have
  35 scrub errors...
 
  ceph osd getmap -o outfile - attached
 
 
  2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Is the number of inconsistent objects growing?  Can you attach the
  whole ceph.log from the 6 hours before and after the snippet you
  linked above?  Are you using cache/tiering?  Can you attach the osdmap
  (ceph osd getmap -o outfile)?
  -Sam
 
  On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   ceph - 0.94.2
   Its happen during rebalancing
  
   I thought too, that some OSD miss copy, but looks like all miss...
   So any advice in which direction i need to go
  
   2015-08-18 14:14 GMT+03:00 Gregory Farnum gfar...@redhat.com:
  
   From a quick peek it looks like some of the OSDs are missing clones
 of
   objects. I'm not sure how that could happen and I'd expect the pg
   repair to handle that but if it's not there's probably something
   wrong; what version of Ceph are you running? Sam, is this something
   you've seen, a new bug, or some kind of config issue?
   -Greg
  
   On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
Hi all, at our production cluster, due high rebalancing ((( we
 have 2
pgs in
inconsistent state...
   
root@temp:~# ceph health detail | grep inc
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
   
From OSD logs, after recovery attempt:
   
root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while
 read i;
do
ceph pg repair ${i} ; done
dumped all in format plain
instructing pg 2.490 on osd.56 to repair
instructing pg 2.c4 on osd.56 to repair
   
/var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
7f94663b3700
-1
log_channel(cluster) log [ERR] : deep-scrub 2.490
f5759490/rbd_data.1631755377d7e.04da/head//2 expected
clone
90c59490/rbd_data.eb486436f2beb.7a65/141//2
/var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
7f94663b3700
-1
log_channel(cluster) log [ERR] : deep-scrub 2.490
fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected
clone
f5759490/rbd_data.1631755377d7e.04da/141//2
/var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
7f94663b3700
-1
log_channel(cluster) log [ERR] : deep-scrub 2.490
a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected
clone
fee49490/rbd_data.12483d3ba0794b.522f/141//2
/var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
7f94663b3700
-1
log_channel(cluster) log [ERR] : deep-scrub 2.490
bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected
clone
a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
/var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
7f94663b3700
-1
log_channel(cluster) log [ERR] : deep-scrub 2.490
98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected
clone
bac19490/rbd_data.1238e82ae8944a.032e/141//2
/var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314
7f94663b3700
-1
log_channel(cluster) log [ERR] : deep-scrub 2.490
c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected
clone
98519490/rbd_data.123e9c2ae8944a.0807/141//2
/var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363
7f94663b3700
-1
log_channel(cluster) log [ERR] : deep-scrub 2.490
28809490/rbd_data.edea7460fe42b.01d9/head//2 expected
clone
c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
/var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432
7f94663b3700
-1
log_channel(cluster) log [ERR] : deep-scrub 2.490
e1509490/rbd_data.1423897545e146.09a6/head//2 expected
clone
28809490

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
The issue is that in forward mode fstrim doesn't work properly, and when we take a
snapshot the data is not properly updated in the cache layer, so the client (ceph) sees a
damaged snap.. as the headers are requested from the cache layer.
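
A rough way to see it on a scratch image (all names and devices below are just
examples):

rbd create -p cold-storage --size 1024 trim-test
rbd map cold-storage/trim-test              # e.g. /dev/rbd2
mkfs.ext4 /dev/rbd2
mount /dev/rbd2 /mnt && fstrim /mnt         # discard while the tier is in forward mode
umount /mnt
rbd snap create cold-storage/trim-test@s1
rbd map cold-storage/trim-test@s1           # e.g. /dev/rbd3
md5sum /dev/rbd2 /dev/rbd3                  # the checksums diverge when the snap is damaged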

2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com:

 What was the issue?
 -Sam

 On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  Samuel, we turned off cache layer few hours ago...
  I will post ceph.log in few minutes
 
  For snap - we found issue, was connected with cache tier..
 
  2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Ok, you appear to be using a replicated cache tier in front of a
  replicated base tier.  Please scrub both inconsistent pgs and post the
  ceph.log from before when you started the scrub until after.  Also,
  what command are you using to take snapshots?
  -Sam
 
  On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Hi Samuel, we try to fix it in trick way.
  
   we check all rbd_data chunks from logs (OSD) which are affected, then
   query
   rbd info to compare which rbd consist bad rbd_data, after that we
 mount
   this
   rbd as rbd0, create empty rbd, and DD all info from bad volume to new
   one.
  
   But after that - scrub errors growing... Was 15 errors.. .Now 35... We
   laos
   try to out OSD which was lead, but after rebalancing this 2 pgs still
   have
   35 scrub errors...
  
   ceph osd getmap -o outfile - attached
  
  
   2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com:
  
   Is the number of inconsistent objects growing?  Can you attach the
   whole ceph.log from the 6 hours before and after the snippet you
   linked above?  Are you using cache/tiering?  Can you attach the
 osdmap
   (ceph osd getmap -o outfile)?
   -Sam
  
   On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
   igor.voloshane...@gmail.com wrote:
ceph - 0.94.2
Its happen during rebalancing
   
I thought too, that some OSD miss copy, but looks like all miss...
So any advice in which direction i need to go
   
2015-08-18 14:14 GMT+03:00 Gregory Farnum gfar...@redhat.com:
   
From a quick peek it looks like some of the OSDs are missing
 clones
of
objects. I'm not sure how that could happen and I'd expect the pg
repair to handle that but if it's not there's probably something
wrong; what version of Ceph are you running? Sam, is this
 something
you've seen, a new bug, or some kind of config issue?
-Greg
   
On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
igor.voloshane...@gmail.com wrote:
 Hi all, at our production cluster, due high rebalancing ((( we
 have 2
 pgs in
 inconsistent state...

 root@temp:~# ceph health detail | grep inc
 HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
 pg 2.490 is active+clean+inconsistent, acting [56,15,29]
 pg 2.c4 is active+clean+inconsistent, acting [56,10,42]

 From OSD logs, after recovery attempt:

 root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while
 read
 i;
 do
 ceph pg repair ${i} ; done
 dumped all in format plain
 instructing pg 2.490 on osd.56 to repair
 instructing pg 2.c4 on osd.56 to repair

 /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
 7f94663b3700
 -1
 log_channel(cluster) log [ERR] : deep-scrub 2.490
 f5759490/rbd_data.1631755377d7e.04da/head//2
 expected
 clone
 90c59490/rbd_data.eb486436f2beb.7a65/141//2
 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
 7f94663b3700
 -1
 log_channel(cluster) log [ERR] : deep-scrub 2.490
 fee49490/rbd_data.12483d3ba0794b.522f/head//2
 expected
 clone
 f5759490/rbd_data.1631755377d7e.04da/141//2
 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
 7f94663b3700
 -1
 log_channel(cluster) log [ERR] : deep-scrub 2.490
 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2
 expected
 clone
 fee49490/rbd_data.12483d3ba0794b.522f/141//2
 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
 7f94663b3700
 -1
 log_channel(cluster) log [ERR] : deep-scrub 2.490
 bac19490/rbd_data.1238e82ae8944a.032e/head//2
 expected
 clone
 a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
 7f94663b3700
 -1
 log_channel(cluster) log [ERR] : deep-scrub 2.490
 98519490/rbd_data.123e9c2ae8944a.0807/head//2
 expected
 clone
 bac19490/rbd_data.1238e82ae8944a.032e/141//2
 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314
 7f94663b3700
 -1
 log_channel(cluster) log [ERR] : deep-scrub 2.490
 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2
 expected
 clone
 98519490/rbd_data.123e9c2ae8944a.0807/141

Re: [ceph-users] Repair inconsistent pgs..

2015-08-20 Thread Voloshanenko Igor
Sam, I tried to understand which rbd contains these chunks.. but no luck. No
rbd image's block_name_prefix starts with these...
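
I compared it against the block_name_prefix of every image, roughly like this
(the prefix below is one of the reported ones), and nothing matches:

PREFIX=rbd_data.e846e25a70bf7
for IMG in $(rbd -p cold-storage ls); do
    rbd -p cold-storage info ${IMG} | grep -q "${PREFIX}" && echo "${IMG}"
done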

Actually, now that I think about it, you probably didn't remove the
 images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2




2015-08-21 1:36 GMT+03:00 Samuel Just sj...@redhat.com:

 Actually, now that I think about it, you probably didn't remove the
 images for 3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2
 and 22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2, but
 other images (that's why the scrub errors went down briefly, those
 objects -- which were fine -- went away).  You might want to export
 and reimport those two images into new images, but leave the old ones
 alone until you can clean up the on disk state (image and snapshots)
 and clear the scrub errors.  You probably don't want to read the
 snapshots for those images either.  Everything else is, I think,
 harmless.
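 For example, per affected image, something along these lines (image names are
 placeholders):

 rbd -p cold-storage export <old-image> - | rbd -p cold-storage import - <new-image>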

 The ceph-objectstore-tool feature would probably not be too hard,
 actually.  Each head/snapdir image has two attrs (possibly stored in
 leveldb -- that's why you want to modify the ceph-objectstore-tool and
 use its interfaces rather than mucking about with the files directly)
 '_' and 'snapset' which contain encoded representations of
 object_info_t and SnapSet (both can be found in src/osd/osd_types.h).
 SnapSet has a set of clones and related metadata -- you want to read
 the SnapSet attr off disk and commit a transaction writing out a new
 version with that clone removed.  I'd start by cloning the repo,
 starting a vstart cluster locally, and reproducing the issue.  Next,
 get familiar with using ceph-objectstore-tool on the osds in that
 vstart cluster.  A good first change would be creating a
 ceph-objectstore-tool op that lets you dump json for the object_info_t
 and SnapSet (both types have format() methods which make that easy) on
 an object to stdout so you can confirm what's actually there.  oftc
 #ceph-devel or the ceph-devel mailing list would be the right place to
 ask questions.
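
 For example, to peek at what is currently on disk for one of those objects
 (the OSD path and the object spec are placeholders, the spec being taken from
 an --op list run; this assumes the tool build has the get-attr object op and
 that ceph-dencoder lists SnapSet in list_types):

 stop ceph-osd id=56
 ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56 \
     --journal-path /var/lib/ceph/osd/ceph-56/journal \
     '<object-spec-from-op-list>' get-attr snapset > snapset.bin
 ceph-dencoder type SnapSet import snapset.bin decode dump_json
 start ceph-osd id=56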

 Otherwise, it'll probably get done in the next few weeks.
 -Sam

 On Thu, Aug 20, 2015 at 3:10 PM, Voloshanenko Igor
 igor.voloshane...@gmail.com wrote:
  thank you Sam!
  I also noticed this linked errors during scrub...
 
  Now all lools like reasonable!
 
  So we will wait for bug to be closed.
 
  do you need any help on it?
 
  I mean i can help with coding/testing/etc...
 
  2015-08-21 0:52 GMT+03:00 Samuel Just sj...@redhat.com:
 
  Ah, this is kind of silly.  I think you don't have 37 errors, but 2
  errors.  pg 2.490 object
  3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 is missing
  snap 141.  If you look at the objects after that in the log:
 
  2015-08-20 20:15:44.865670 osd.19 10.12.2.6:6838/1861727 298 : cluster
  [ERR] repair 2.490
  68c89490/rbd_data.16796a3d1b58ba.0047/head//2 expected
  clone 2d7b9490/rbd_data.18f92c3d1b58ba.6167/141//2
  2015-08-20 20:15:44.865817 osd.19 10.12.2.6:6838/1861727 299 : cluster
  [ERR] repair 2.490
  ded49490/rbd_data.11a25c7934d3d4.8a8a/head//2 expected
  clone 68c89490/rbd_data.16796a3d1b58ba.0047/141//2
 
  The clone from the second line matches the head object from the
  previous line, and they have the same clone id.  I *think* that the
  first error is real, and the subsequent ones are just scrub being
  dumb.  Same deal with pg 2.c4.  I just opened
  http://tracker.ceph.com/issues/12738.
 
  The original problem is that
  3fac9490/rbd_data.eb5f22eb141f2.04ba/snapdir//2 and
  22ca30c4/rbd_data.e846e25a70bf7.0307/snapdir//2 are both
  missing a clone.  Not sure how that happened, my money is on a
  cache/tiering evict racing with a snap trim.  If you have any logging
  or relevant information from when that happened, you should open a
  bug.  The 'snapdir' in the two object names indicates that the head
  object has actually been deleted (which makes sense if you moved the
  image to a new image and deleted the old one) and is only being kept
  around since there are live snapshots.  I suggest you leave the
  snapshots for those images alone for the time being -- removing them
  might cause the osd to crash trying to clean up the wierd on disk
  state.  Other than the leaked space from those two image snapshots and
  the annoying spurious scrub errors, I think no actual corruption is
  going on though.  I created a tracker ticket for a feature that would
  let ceph-objectstore-tool remove the spurious clone from the
  head/snapdir metadata.
 
  Am I right that you haven't actually seen any osd crashes or user
  visible corruption (except possibly on snapshots of those two images)?
  -Sam
 
  On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor
  igor.voloshane...@gmail.com wrote:
   Inktank:
  
  
 https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding

Re: [ceph-users] Repair inconsistent pgs..

2015-08-18 Thread Voloshanenko Igor
No. This will not help (((
I tried to find the data, but it looks like it either exists with the same
timestamp on all osds or is missing on all osds ...

So I need advice on what to do next...
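
What I did, roughly (the pg id, OSD ids and rbd prefix are just examples): on
each node holding the pg, look for the object files and compare timestamps:

# on each host in the acting set of pg 2.490 (osd.56, osd.15, osd.29)
find /var/lib/ceph/osd/ceph-*/current/2.490_head \
     -name '*1631755377d7e*' -exec ls -l {} \;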

On Tuesday, 18 August 2015, Abhishek L wrote:


 Voloshanenko Igor writes:

  Hi Irek, please read carefully )))
  Your proposal was the first thing I tried to do...  That's why I asked for
  help... (
 
  2015-08-18 8:34 GMT+03:00 Irek Fasikhov malm...@gmail.com
 javascript:;:
 
  Hi, Igor.
 
  You need to repair the PG.
 
  for i in `ceph pg dump| grep inconsistent | grep -v
 'inconsistent+repair'
  | awk {'print$1'}`;do ceph pg repair $i;done
 
  Best regards, Irek Fasikhov
  Mob.: +79229045757
 
  2015-08-18 8:27 GMT+03:00 Voloshanenko Igor 
 igor.voloshane...@gmail.com javascript:;:
 
  Hi all, at our production cluster, due high rebalancing ((( we have 2
 pgs
  in inconsistent state...
 
  root@temp:~# ceph health detail | grep inc
  HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
  pg 2.490 is active+clean+inconsistent, acting [56,15,29]
  pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
 
  From OSD logs, after recovery attempt:
 
  root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i;
 do
  ceph pg repair ${i} ; done
  dumped all in format plain
  instructing pg 2.490 on osd.56 to repair
  instructing pg 2.c4 on osd.56 to repair
 
  /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone
  90c59490/rbd_data.eb486436f2beb.7a65/141//2
  /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected
 clone
  f5759490/rbd_data.1631755377d7e.04da/141//2
  /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected
 clone
  fee49490/rbd_data.12483d3ba0794b.522f/141//2
  /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected
 clone
  a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
  /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected
 clone
  bac19490/rbd_data.1238e82ae8944a.032e/141//2
  /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected
 clone
  98519490/rbd_data.123e9c2ae8944a.0807/141//2
  /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone
  c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
  /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  e1509490/rbd_data.1423897545e146.09a6/head//2 expected
 clone
  28809490/rbd_data.edea7460fe42b.01d9/141//2
  /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors
 
  So, how i can solve expected clone situation by hand?
  Thank in advance!

 I've had an inconsistent pg once, but it was a different sort of an
 error (some sort of digest mismatch, where the secondary object copies
 had later timestamps). This was fixed by moving the object away and
 restarting, the osd which got fixed when the osd peered, similar to what
 was mentioned in Sebastian Han's blog[1].

 I'm guessing the same method will solve this error as well, but not
 completely sure, maybe someone else who has seen this particular error
 could guide you better.

 [1]:
 http://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/

 --
 Abhishek

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Repair inconsistent pgs..

2015-08-18 Thread Voloshanenko Igor
-- Forwarded message -
From: *Voloshanenko Igor* igor.voloshane...@gmail.com
Date: Tuesday, 18 August 2015
Subject: Repair inconsistent pgs..
To: Irek Fasikhov malm...@gmail.com


Some additional information (thanks, Irek, for the questions!)

Pool values:

root@test:~# ceph osd pool get cold-storage size
size: 3
root@test:~# ceph osd pool get cold-storage min_size
min_size: 2
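
Since this cluster had a cache tier, it can also help to dump the full pool
definitions instead of individual keys (standard CLI, nothing version-specific
assumed):

root@test:~# ceph osd dump | grep '^pool'

which shows size, min_size, any tier/read_tier/write_tier relationships and
the cache_mode of the tier pool in one place.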


Broken pgs dump

PG_1 #

{
state: active+clean+inconsistent,
snap_trimq: [],
epoch: 17541,
up: [
56,
10,
42
],
acting: [
56,
10,
42
],
actingbackfill: [
10,
42,
56
],
info: {
pgid: 2.c4,
last_update: 17541'29153,
last_complete: 17541'29153,
log_tail: 16746'26095,
last_user_version: 401173,
last_backfill: MAX,
purged_snaps:
[1~1,6~1,8~3,11~2,17~2,1f~2,25~1,28~1,2c~5,32~4,37~1,39~7,41~5,47~16,5e~19,cb~1,ce~2,d4~7,dc~1,de~1,e6~4,102~1,105~6,10d~1,119~1,150~1,15d~2,160~3,16d~1,16f~5,178~1,184~2,194~1,1a2~1,1a5~1,1ac~2,1c7~1,1cb~2,1ce~1],
history: {
epoch_created: 98,
last_epoch_started: 17531,
last_epoch_clean: 17541,
last_epoch_split: 0,
same_up_since: 17139,
same_interval_since: 17530,
same_primary_since: 17530,
last_scrub: 17541'29114,
last_scrub_stamp: 2015-08-18 07:37:04.567973,
last_deep_scrub: 17541'29114,
last_deep_scrub_stamp: 2015-08-18 07:37:04.567973,
last_clean_scrub_stamp: 2015-08-05 17:23:45.251731
},
stats: {
version: 17541'29153,
reported_seq: 21552,
reported_epoch: 17541,
state: active+clean+inconsistent,
last_fresh: 2015-08-18 07:48:37.667036,
last_change: 2015-08-18 07:37:04.568541,
last_active: 2015-08-18 07:48:37.667036,
last_peered: 2015-08-18 07:48:37.667036,
last_clean: 2015-08-18 07:48:37.667036,
last_became_active: 0.00,
last_became_peered: 0.00,
last_unstale: 2015-08-18 07:48:37.667036,
last_undegraded: 2015-08-18 07:48:37.667036,
last_fullsized: 2015-08-18 07:48:37.667036,
mapping_epoch: 17140,
log_start: 16746'26095,
ondisk_log_start: 16746'26095,
created: 98,
last_epoch_clean: 17541,
parent: 0.0,
parent_split_bits: 0,
last_scrub: 17541'29114,
last_scrub_stamp: 2015-08-18 07:37:04.567973,
last_deep_scrub: 17541'29114,
last_deep_scrub_stamp: 2015-08-18 07:37:04.567973,
last_clean_scrub_stamp: 2015-08-05 17:23:45.251731,
log_size: 3058,
ondisk_log_size: 3058,
stats_invalid: 0,
stat_sum: {
num_bytes: 2236608990,
num_objects: 307,
num_object_clones: 7,
num_object_copies: 921,
num_objects_missing_on_primary: 0,
num_objects_degraded: 0,
num_objects_misplaced: 0,
num_objects_unfound: 0,
num_objects_dirty: 307,
num_whiteouts: 0,
num_read: 15694,
num_read_kb: 401354,
num_write: 55720,
num_write_kb: 2539827,
num_scrub_errors: 1,
num_shallow_scrub_errors: 1,
num_deep_scrub_errors: 0,
num_objects_recovered: 1842,
num_bytes_recovered: 13419653940,
num_keys_recovered: 36,
num_objects_omap: 1,
num_objects_hit_set_archive: 0,
num_bytes_hit_set_archive: 0
},
up: [
56,
10,
42
],
acting: [
56,
10,
42
],
blocked_by: [],
up_primary: 56,
acting_primary: 56
},
empty: 0,
dne: 0,
incomplete: 0,
last_epoch_started: 17531,
hit_set_history: {
current_last_update: 0'0,
current_last_stamp: 0.00,
current_info: {
begin: 0.00,
end: 0.00,
version: 0'0
},
history: []
}
},
peer_info: [
{
peer: 10,
pgid: 2.c4,
last_update: 17541'29153,
last_complete: 17541'29153,
log_tail: 16746'25703,
last_user_version: 400914,
last_backfill: MAX,
purged_snaps:
[1~1,6~1,8~3,11~2,17~2,1f~2,25~1,28~1,2c~5,32~4,37~1,39~7,41~5,47~16,5e~19,cb~1,ce~2,d4~7,dc~1,de~1,e6~4,102~1,105~6,10d~1,119~1,150~1,15d~2,160~3,16d~1,16f
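
(For completeness: the dump above is the kind of output "ceph pg <pgid> query"
produces, so the same information for both broken PGs can be captured with,
for example:

ceph pg 2.c4 query > /tmp/pg-2.c4.json
ceph pg 2.490 query > /tmp/pg-2.490.json

and the peer_info sections and scrub-error counters compared between the two.)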

Re: [ceph-users] Repair inconsistent pgs..

2015-08-18 Thread Voloshanenko Igor
No, this will not help (((
I tried to find the data, but it looks like it either exists with the same
timestamp on all OSDs or is missing on all OSDs...

So, I need advice on what to do...
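
One rough way to check that, assuming the default filestore layout (the PG id,
image prefix and OSD numbers below are just the ones from this thread; the
objects may sit in nested DIR_* subdirectories, hence find):

# run on each host holding a replica of 2.490 (osd.56, osd.15, osd.29)
find /var/lib/ceph/osd/ceph-56/current/2.490_head -name '*1631755377d7e*' \
  -exec ls -l {} \; -exec md5sum {} \;

If the checksums and timestamps agree on all replicas, moving one copy aside
will not help and the snapset metadata itself is what needs fixing.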

On Tuesday, August 18, 2015, Abhishek L wrote:


 Voloshanenko Igor writes:

  Hi Irek, please read carefully )))
  Your proposal was the first thing I tried...  That's why I asked for
  help... (
 
  2015-08-18 8:34 GMT+03:00 Irek Fasikhov malm...@gmail.com:
 
  Hi, Igor.
 
  You need to repair the PG.
 
  for i in `ceph pg dump| grep inconsistent | grep -v
 'inconsistent+repair'
  | awk {'print$1'}`;do ceph pg repair $i;done
 
  Best regards, Irek Fasikhov
  Mobile: +79229045757
 
  2015-08-18 8:27 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:
 
  Hi all, at our production cluster, due to heavy rebalancing ((( we have
  2 PGs in an inconsistent state...
 
  root@temp:~# ceph health detail | grep inc
  HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
  pg 2.490 is active+clean+inconsistent, acting [56,15,29]
  pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
 
  From OSD logs, after recovery attempt:
 
  root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i;
 do
  ceph pg repair ${i} ; done
  dumped all in format plain
  instructing pg 2.490 on osd.56 to repair
  instructing pg 2.c4 on osd.56 to repair
 
  /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone
  90c59490/rbd_data.eb486436f2beb.7a65/141//2
  /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected
 clone
  f5759490/rbd_data.1631755377d7e.04da/141//2
  /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected
 clone
  fee49490/rbd_data.12483d3ba0794b.522f/141//2
  /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected
 clone
  a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
  /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected
 clone
  bac19490/rbd_data.1238e82ae8944a.032e/141//2
  /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected
 clone
  98519490/rbd_data.123e9c2ae8944a.0807/141//2
  /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone
  c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
  /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
  e1509490/rbd_data.1423897545e146.09a6/head//2 expected
 clone
  28809490/rbd_data.edea7460fe42b.01d9/141//2
  /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765
 7f94663b3700
  -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors
 
  So, how can I solve this "expected clone" situation by hand?
  Thanks in advance!

 I've had an inconsistent pg once, but it was a different sort of
 error (some sort of digest mismatch, where the secondary object copies
 had later timestamps). This was fixed by moving the object away and
 restarting the OSD; the PG got repaired when the OSD peered, similar to
 what was mentioned in Sébastien Han's blog[1].

 I'm guessing the same method will solve this error as well, but I'm not
 completely sure; maybe someone else who has seen this particular error
 can guide you better.

 [1]:
 http://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/

 --
 Abhishek

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Repair inconsistent pgs..

2015-08-17 Thread Voloshanenko Igor
Hi Irek, please read carefully )))
Your proposal was the first thing I tried...  That's why I asked for
help... (

2015-08-18 8:34 GMT+03:00 Irek Fasikhov malm...@gmail.com:

 Hi, Igor.

 You need to repair the PG.

 for i in `ceph pg dump| grep inconsistent | grep -v 'inconsistent+repair'
 | awk {'print$1'}`;do ceph pg repair $i;done

 Best regards, Irek Fasikhov
 Mobile: +79229045757

 2015-08-18 8:27 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:

 Hi all, at our production cluster, due to heavy rebalancing ((( we have
 2 PGs in an inconsistent state...

 root@temp:~# ceph health detail | grep inc
 HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
 pg 2.490 is active+clean+inconsistent, acting [56,15,29]
 pg 2.c4 is active+clean+inconsistent, acting [56,10,42]

 From OSD logs, after recovery attempt:

 root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do
 ceph pg repair ${i} ; done
 dumped all in format plain
 instructing pg 2.490 on osd.56 to repair
 instructing pg 2.c4 on osd.56 to repair

 /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone
 90c59490/rbd_data.eb486436f2beb.7a65/141//2
 /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone
 f5759490/rbd_data.1631755377d7e.04da/141//2
 /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone
 fee49490/rbd_data.12483d3ba0794b.522f/141//2
 /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone
 a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
 /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone
 bac19490/rbd_data.1238e82ae8944a.032e/141//2
 /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone
 98519490/rbd_data.123e9c2ae8944a.0807/141//2
 /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone
 c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
 /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700
 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490
 e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone
 28809490/rbd_data.edea7460fe42b.01d9/141//2
 /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700
 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors

 So, how can I solve this "expected clone" situation by hand?
 Thanks in advance!



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Repair inconsistent pgs..

2015-08-17 Thread Voloshanenko Igor
Hi all, at our production cluster, due to heavy rebalancing ((( we have
2 PGs in an inconsistent state...

root@temp:~# ceph health detail | grep inc
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]

From OSD logs, after recovery attempt:

root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do
ceph pg repair ${i} ; done
dumped all in format plain
instructing pg 2.490 on osd.56 to repair
instructing pg 2.c4 on osd.56 to repair

/var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
f5759490/rbd_data.1631755377d7e.04da/head//2 expected clone
90c59490/rbd_data.eb486436f2beb.7a65/141//2
/var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
fee49490/rbd_data.12483d3ba0794b.522f/head//2 expected clone
f5759490/rbd_data.1631755377d7e.04da/141//2
/var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
a9b39490/rbd_data.12483d3ba0794b.37b3/head//2 expected clone
fee49490/rbd_data.12483d3ba0794b.522f/141//2
/var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
bac19490/rbd_data.1238e82ae8944a.032e/head//2 expected clone
a9b39490/rbd_data.12483d3ba0794b.37b3/141//2
/var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
98519490/rbd_data.123e9c2ae8944a.0807/head//2 expected clone
bac19490/rbd_data.1238e82ae8944a.032e/141//2
/var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
c3c09490/rbd_data.1238e82ae8944a.0c2b/head//2 expected clone
98519490/rbd_data.123e9c2ae8944a.0807/141//2
/var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
28809490/rbd_data.edea7460fe42b.01d9/head//2 expected clone
c3c09490/rbd_data.1238e82ae8944a.0c2b/141//2
/var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1
log_channel(cluster) log [ERR] : deep-scrub 2.490
e1509490/rbd_data.1423897545e146.09a6/head//2 expected clone
28809490/rbd_data.edea7460fe42b.01d9/141//2
/var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1
log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors

So, how can I solve this "expected clone" situation by hand?
Thanks in advance!
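
One hedged suggestion for narrowing this down offline: ceph-objectstore-tool
can list what actually exists on disk for the affected objects (hammer-era
syntax from memory - check --help on your build; the OSD must be stopped, and
the ids/paths below are just the ones from this thread):

stop ceph-osd id=56
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56 \
    --journal-path /var/lib/ceph/osd/ceph-56/journal \
    --pgid 2.490 --op list | grep 1631755377d7e
start ceph-osd id=56

The listing shows which head/snapdir/clone entries are really present, which
helps decide whether a clone is genuinely missing or only referenced by stale
snapset metadata.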
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-17 Thread Voloshanenko Igor
Hi all, can you please help me with an unexplained situation...

All snapshots inside Ceph are broken...

So, as an example, we have a VM template as an RBD inside Ceph.
We can map and mount it to check that everything is OK with it:

root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
/dev/rbd0
root@test:~# parted /dev/rbd0 print
Model: Unknown (unknown)
Disk /dev/rbd0: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End SizeType File system  Flags
 1  1049kB  525MB   524MB   primary  ext4 boot
 2  525MB   10.7GB  10.2GB  primary   lvm

Then I want to create a snap, so I do:
root@test:~# rbd snap create
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap

And now I want to map it:

root@test:~# rbd map
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
/dev/rbd1
root@test:~# parted /dev/rbd1 print
Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
 /dev/rbd1 has been opened read-only.
Warning: Unable to open /dev/rbd1 read-write (Read-only file system).
 /dev/rbd1 has been opened read-only.
Error: /dev/rbd1: unrecognised disk label

Even the md5 is different...
root@ix-s2:~# md5sum /dev/rbd0
9a47797a07fee3a3d71316e22891d752  /dev/rbd0
root@ix-s2:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1


OK, now I protect the snap and create a clone... but same thing...
The md5 for the clone is the same as for the snap...

root@test:~# rbd unmap /dev/rbd1
root@test:~# rbd snap protect
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
root@test:~# rbd clone
cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
cold-storage/test-image
root@test:~# rbd map cold-storage/test-image
/dev/rbd1
root@test:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1

 but it's broken...
root@test:~# parted /dev/rbd1 print
Error: /dev/rbd1: unrecognised disk label
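
A quick way to confirm where the data diverges without going through krbd is
to checksum exports (a sketch using the image/snap names from above; "rbd
export ... -" streams to stdout):

rbd export cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 - | md5sum
rbd export cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap - | md5sum

With no writes between taking the snapshot and the export, the two sums should
match; if they do not, the snapshot/clone objects themselves are damaged rather
than anything on the client side.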


=

tech details:

root@test:~# ceph -v
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)

We have 2 inconsistent PGs, but these images are not placed on those PGs...

root@test:~# ceph health detail
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
18 scrub errors



root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5
osdmap e16770 pool 'cold-storage' (2) object
'0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70 (2.770) -> up
([37,15,14], p37) acting ([37,15,14], p37)
root@test:~# ceph osd map cold-storage
0e23c701-401d-4465-b9b4-c02939d57bb5@snap
osdmap e16770 pool 'cold-storage' (2) object
'0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3 (2.4a3) -> up
([12,23,17], p12) acting ([12,23,17], p12)
root@test:~# ceph osd map cold-storage
0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
osdmap e16770 pool 'cold-storage' (2) object
'0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg 2.9519c2a9 (2.2a9)
-> up ([12,44,23], p12) acting ([12,44,23], p12)
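
Note that these lookups hash the plain image name; the image data actually
lives in rbd_data.* objects, so to see whether any of them fall on the two
inconsistent PGs something like the following sketch can be used (take
block_name_prefix from rbd info - the prefix grepped here is just one of those
seen in the scrub errors above):

rbd -p cold-storage info 0e23c701-401d-4465-b9b4-c02939d57bb5 | grep block_name_prefix
rados -p cold-storage ls | grep 'rbd_data.1631755377d7e' | \
  while read obj; do ceph osd map cold-storage "$obj"; done | grep -E '\(2\.(490|c4)\)'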


Also we use a cache layer, which at the moment is in forward mode...

Can you please help me with this? My brain has stopped understanding what is
going on...

Thanks in advance!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH cache layer. Very slow

2015-08-14 Thread Voloshanenko Igor
72 osd, 60 hdd, 12 ssd
Primary workload - rbd, kvm

On Friday, August 14, 2015, Ben Hines wrote:

 Nice to hear that you have had no SSD failures yet in 10 months.

 How many OSDs are you running, and what is your primary ceph workload?
 (RBD, rgw, etc?)

 -Ben

 On Fri, Aug 14, 2015 at 2:23 AM, Межов Игорь Александрович
 me...@yuterra.ru wrote:
  Hi!
 
 
  Of course, it isn't cheap at all, but we use Intel DC S3700 200Gb for
 ceph
  journals
  and DC S3700 400Gb in the SSD pool: same hosts, separate root in
 crushmap.
 
  The SSD pool is not yet in production; the journalling SSDs have worked
  under production load for 10 months. They're in good condition - no faults,
  no degradation.
 
  We specifically took 200Gb SSDs for journals to reduce costs, and also have
  a higher than recommended OSD/SSD ratio: 1 SSD per 10-12 OSDs, while the
  recommended ratio is 1/3 to 1/6.
 
  So, in conclusion - I'd recommend you get a bigger budget and buy durable
  and fast SSDs for Ceph.
 
  Megov Igor
  CIO, Yuterra
 
  
  From: ceph-users ceph-users-boun...@lists.ceph.com on behalf of
  Voloshanenko Igor igor.voloshane...@gmail.com
  Sent: August 13, 2015, 15:54
  To: Jan Schermer
  Cc: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] CEPH cache layer. Very slow
 
  So, good, but the price for the 845 DC PRO 400 GB is about 2x higher than
  the Intel S3500 240G (((
 
  Any other models? (((
 
  2015-08-13 15:45 GMT+03:00 Jan Schermer j...@schermer.cz:
 
  I tested and can recommend the Samsung 845 DC PRO (make sure it is DC
 PRO
  and not just PRO or DC EVO!).
  Those were very cheap but are out of stock at the moment (here).
  Faster than Intels, cheaper, and slightly different technology (3D
 V-NAND)
  which IMO makes them superior without needing many tricks to do its job.
 
  Jan
 
  On 13 Aug 2015, at 14:40, Voloshanenko Igor igor.voloshane...@gmail.com
  wrote:
 
  Thanks, Irek! Will try!
 
  But another question to all: which SSDs are good enough for Ceph now?
 
  I'm looking into the S3500 240G (I have some S3500 120G which show great
  results - around 8x better than the Samsung).
 
  Could you possibly give advice about other vendors/models at the same or
  lower price level as the S3500 240G?
 
  2015-08-13 12:11 GMT+03:00 Irek Fasikhov malm...@gmail.com:
 
  Hi, Igor.
  Try applying the patch from here:
 
 
 http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
 
  P.S. I no longer track changes in this direction (kernel), because we
  already use the recommended SSDs.
 
  Best regards, Irek Fasikhov
  Mobile: +79229045757
 
  2015-08-13 11:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com:
 
  So, after testing the SSD (I wiped 1 SSD and used it for tests):
 
  root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1
  --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
  --group_reporting --name=journal-test
  journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
  iodepth=1
  fio-2.1.3
  Starting 1 process
  Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops]
 [eta
  00m:00s]
  journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13
  10:46:42 2015
write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
  clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
   lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
  clat percentiles (usec):
   |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[
  2928],
   | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[
  3408],
   | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[
  4016],
   | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792],
  99.95th=[10048],
   | 99.99th=[14912]
  bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07,
  stdev=34.31
  lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
cpu  : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
    IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
  >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
  >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
  >=64=0.0%
   issued: total=r=0/w=17243/d=0, short=r=0/w=0/d=0
 
  Run status group 0 (all jobs):
WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s,
  mint=60001msec, maxt=60001msec
 
  Disk stats (read/write):
sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576,
  util=99.30%
 
  So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s
 
  I tried to change the cache mode:
  echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
  echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type
 
  No luck, still the same bad results. I also found this article:
  https://lkml.org/lkml/2013/11/20/264 pointing to an old

Re: [ceph-users] CEPH cache layer. Very slow

2015-08-13 Thread Voloshanenko Igor
So, after testing the SSD (I wiped 1 SSD and used it for tests):

root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1 --rw=write
--bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
iodepth=1
fio-2.1.3
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta
00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13 10:46:42
2015
  write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
 lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
clat percentiles (usec):
 |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
 | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
 | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
 | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
 | 99.99th=[14912]
bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07, stdev=34.31
lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
  cpu  : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 issued: total=r=0/w=17243/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s,
mint=60001msec, maxt=60001msec

Disk stats (read/write):
  sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%

So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s

I tried to change the cache mode:
echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type

No luck, still the same bad results. I also found this article:
https://lkml.org/lkml/2013/11/20/264 pointing to an old, very simple patch
which disables CMD_FLUSH:
https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba

Does anybody have better ideas how to improve this? (Or how to disable
CMD_FLUSH without recompiling the kernel; I use Ubuntu with kernel 4.0.4 for
now - the 4.x branch, because the SSD 850 Pro has an issue with NCQ TRIM and
before 4.0.4 this exception was not included in libsata.c.)
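
Before patching the kernel it may be worth quantifying how much of this really
is CMD_FLUSH: running the same fio job without --sync=1 (so no O_SYNC, hence
no flush per write) shows the ceiling the drive can reach, e.g.:

sudo fio --filename=/dev/sda --direct=1 --rw=write --bs=4k --numjobs=1 \
    --iodepth=1 --runtime=60 --time_based --group_reporting --name=nosync-test

If that run is orders of magnitude faster than the 287 IOPS above, the drive
is stalling on flushes, which matches what people report for consumer Samsung
SSDs used as Ceph journals.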

2015-08-12 19:17 GMT+03:00 Pieter Koorts pieter.koo...@me.com:

 Hi Igor

 I suspect you have very much the same problem as me.

 https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html

 Basically Samsung drives (like many SATA SSDs) are very much hit and miss,
 so you will need to test them as described here to see if they are any
 good.
 http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

 To give you an idea, my average write performance went from 11MB/s (with a
 Samsung SSD) to 30MB/s (without any SSD). This is a very small cluster.

 Pieter

 On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor 
 igor.voloshane...@gmail.com wrote:

 Hi all, we have set up a Ceph cluster with 60 OSDs (2 different types): 5
 nodes, 12 disks each (10 HDD, 2 SSD).

 We also cover this with a custom crushmap with 2 roots:

 ID   WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -100 5.0 root ssd
 -102 1.0 host ix-s2-ssd
2 1.0 osd.2   up  1.0  1.0
9 1.0 osd.9   up  1.0  1.0
 -103 1.0 host ix-s3-ssd
3 1.0 osd.3   up  1.0  1.0
7 1.0 osd.7   up  1.0  1.0
 -104 1.0 host ix-s5-ssd
1 1.0 osd.1   up  1.0  1.0
6 1.0 osd.6   up  1.0  1.0
 -105 1.0 host ix-s6-ssd
4 1.0 osd.4   up  1.0  1.0
8 1.0 osd.8   up  1.0  1.0
 -106 1.0 host ix-s7-ssd
0 1.0 osd.0   up  1.0  1.0
5 1.0 osd.5   up  1.0  1.0
   -1 5.0 root platter
   -2 1.0 host ix-s2-platter
   13 1.0 osd.13  up  1.0  1.0
   17 1.0 osd.17  up  1.0  1.0
   21 1.0 osd.21  up  1.0  1.0
   27 1.0 osd.27  up  1.0  1.0
   32 1.0 osd.32  up  1.0  1.0
   37 1.0 osd.37  up  1.0  1.0
   44 1.0 osd.44  up  1.0  1.0
   48 1.0 osd.48  up  1.0  1.0
   55 1.0 osd.55  up  1.0  1.0

Re: [ceph-users] CEPH cache layer. Very slow

2015-08-13 Thread Voloshanenko Igor
So, good, but the price for the 845 DC PRO 400 GB is about 2x higher than
the Intel S3500 240G (((

Any other models? (((

2015-08-13 15:45 GMT+03:00 Jan Schermer j...@schermer.cz:

 I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO
 and not just PRO or DC EVO!).
 Those were very cheap but are out of stock at the moment (here).
 Faster than Intels, cheaper, and slightly different technology (3D V-NAND)
 which IMO makes them superior without needing many tricks to do its job.

 Jan

 On 13 Aug 2015, at 14:40, Voloshanenko Igor igor.voloshane...@gmail.com
 wrote:

 Thanks, Irek! Will try!

 But another question to all: which SSDs are good enough for Ceph now?

 I'm looking into the S3500 240G (I have some S3500 120G which show great
 results - around 8x better than the Samsung).

 Could you possibly give advice about other vendors/models at the same or
 lower price level as the S3500 240G?

 2015-08-13 12:11 GMT+03:00 Irek Fasikhov malm...@gmail.com:

 Hi, Igor.
 Try applying the patch from here:

 http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov

 P.S. I no longer track changes in this direction (kernel), because we
 already use the recommended SSDs.

 Best regards, Irek Fasikhov
 Mobile: +79229045757

 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor igor.voloshane...@gmail.com
 :

 So, after testing the SSD (I wiped 1 SSD and used it for tests):

 root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1
 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
 --group_reporting --name=journal-test
 journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
 iodepth=1
 fio-2.1.3
 Starting 1 process
 Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta
 00m:00s]
 journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13
 10:46:42 2015
   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
 clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
  lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
 clat percentiles (usec):
  |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[
 2928],
  | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[
 3408],
  | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[
 4016],
  | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792],
 99.95th=[10048],
  | 99.99th=[14912]
 bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07,
 stdev=34.31
 lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
   cpu  : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
 >=64=0.0%
  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
 >=64=0.0%
  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
 >=64=0.0%
  issued: total=r=0/w=17243/d=0, short=r=0/w=0/d=0

 Run status group 0 (all jobs):
   WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s,
 mint=60001msec, maxt=60001msec

 Disk stats (read/write):
   sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%

 So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s

 I tried to change the cache mode:
 echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
 echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type

 No luck, still the same bad results. I also found this article:
 https://lkml.org/lkml/2013/11/20/264 pointing to an old, very simple patch
 which disables CMD_FLUSH:
 https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba

 Does anybody have better ideas how to improve this? (Or how to disable
 CMD_FLUSH without recompiling the kernel; I use Ubuntu with kernel 4.0.4
 for now - the 4.x branch, because the SSD 850 Pro has an issue with NCQ
 TRIM and before 4.0.4 this exception was not included in libsata.c.)

 2015-08-12 19:17 GMT+03:00 Pieter Koorts pieter.koo...@me.com:

 Hi Igor

 I suspect you have very much the same problem as me.

 https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html

 Basically Samsung drives (like many SATA SSDs) are very much hit and
 miss, so you will need to test them as described here to see if they are
 any good.
 http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

 To give you an idea, my average write performance went from 11MB/s (with
 a Samsung SSD) to 30MB/s (without any SSD). This is a very small cluster.

 Pieter

 On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor 
 igor.voloshane...@gmail.com wrote:

 Hi all, we have set up a Ceph cluster with 60 OSDs (2 different types): 5
 nodes, 12 disks each (10 HDD, 2 SSD).

 We also cover this with a custom crushmap with 2 roots:

 ID   WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -100 5.0 root ssd
 -102 1.0 host ix-s2-ssd
2 1.0 osd.2   up  1.0  1.0
9 1.0 osd.9   up  1.0  1.0

[ceph-users] CEPH cache layer. Very slow

2015-08-12 Thread Voloshanenko Igor
Hi all, we have set up a Ceph cluster with 60 OSDs (2 different types): 5
nodes, 12 disks each (10 HDD, 2 SSD).

We also cover this with a custom crushmap with 2 roots:

ID   WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-100 5.0 root ssd
-102 1.0 host ix-s2-ssd
   2 1.0 osd.2   up  1.0  1.0
   9 1.0 osd.9   up  1.0  1.0
-103 1.0 host ix-s3-ssd
   3 1.0 osd.3   up  1.0  1.0
   7 1.0 osd.7   up  1.0  1.0
-104 1.0 host ix-s5-ssd
   1 1.0 osd.1   up  1.0  1.0
   6 1.0 osd.6   up  1.0  1.0
-105 1.0 host ix-s6-ssd
   4 1.0 osd.4   up  1.0  1.0
   8 1.0 osd.8   up  1.0  1.0
-106 1.0 host ix-s7-ssd
   0 1.0 osd.0   up  1.0  1.0
   5 1.0 osd.5   up  1.0  1.0
  -1 5.0 root platter
  -2 1.0 host ix-s2-platter
  13 1.0 osd.13  up  1.0  1.0
  17 1.0 osd.17  up  1.0  1.0
  21 1.0 osd.21  up  1.0  1.0
  27 1.0 osd.27  up  1.0  1.0
  32 1.0 osd.32  up  1.0  1.0
  37 1.0 osd.37  up  1.0  1.0
  44 1.0 osd.44  up  1.0  1.0
  48 1.0 osd.48  up  1.0  1.0
  55 1.0 osd.55  up  1.0  1.0
  59 1.0 osd.59  up  1.0  1.0
  -3 1.0 host ix-s3-platter
  14 1.0 osd.14  up  1.0  1.0
  18 1.0 osd.18  up  1.0  1.0
  23 1.0 osd.23  up  1.0  1.0
  28 1.0 osd.28  up  1.0  1.0
  33 1.0 osd.33  up  1.0  1.0
  39 1.0 osd.39  up  1.0  1.0
  43 1.0 osd.43  up  1.0  1.0
  47 1.0 osd.47  up  1.0  1.0
  54 1.0 osd.54  up  1.0  1.0
  58 1.0 osd.58  up  1.0  1.0
  -4 1.0 host ix-s5-platter
  11 1.0 osd.11  up  1.0  1.0
  16 1.0 osd.16  up  1.0  1.0
  22 1.0 osd.22  up  1.0  1.0
  26 1.0 osd.26  up  1.0  1.0
  31 1.0 osd.31  up  1.0  1.0
  36 1.0 osd.36  up  1.0  1.0
  41 1.0 osd.41  up  1.0  1.0
  46 1.0 osd.46  up  1.0  1.0
  51 1.0 osd.51  up  1.0  1.0
  56 1.0 osd.56  up  1.0  1.0
  -5 1.0 host ix-s6-platter
  12 1.0 osd.12  up  1.0  1.0
  19 1.0 osd.19  up  1.0  1.0
 24 1.0 osd.24  up  1.0  1.0
  29 1.0 osd.29  up  1.0  1.0
  34 1.0 osd.34  up  1.0  1.0
  38 1.0 osd.38  up  1.0  1.0
  42 1.0 osd.42  up  1.0  1.0
  50 1.0 osd.50  up  1.0  1.0
  53 1.0 osd.53  up  1.0  1.0
  57 1.0 osd.57  up  1.0  1.0
  -6 1.0 host ix-s7-platter
  10 1.0 osd.10  up  1.0  1.0
  15 1.0 osd.15  up  1.0  1.0
  20 1.0 osd.20  up  1.0  1.0
  25 1.0 osd.25  up  1.0  1.0
  30 1.0 osd.30  up  1.0  1.0
  35 1.0 osd.35  up  1.0  1.0
  40 1.0 osd.40  up  1.0  1.0
  45 1.0 osd.45  up  1.0  1.0
  49 1.0 osd.49  up  1.0  1.0
  52 1.0 osd.52  up  1.0  1.0


Then we created 2 pools, 1 on HDD (platters), 1 on SSD,
and put the SSD pool in front of the HDD pool (cache tier).
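
(For context, a writeback tier of that kind is normally wired up with the
standard tiering commands, roughly as below - the pool names are placeholders
and the hit_set / target_max_* tuning is omitted, which is itself a common
cause of a slow tier:

ceph osd tier add cold-storage cold-storage-cache
ceph osd tier cache-mode cold-storage-cache writeback
ceph osd tier set-overlay cold-storage cold-storage-cache
ceph osd pool set cold-storage-cache hit_set_type bloom
)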

Now we receive very bad performance results from the cluster.
Even with rados