Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-11-03 Thread Wido den Hollander


On 03-11-15 01:54, Voloshanenko Igor wrote:
> Thank you, Jason!
> 
> Any advice, for troubleshooting
> 
> I'm looking in code, and right now don;t see any bad things :(
> 

Can you run the CloudStack Agent in DEBUG mode and then see after which
lines in the logs it crashes?

Wido

> 2015-11-03 1:32 GMT+02:00 Jason Dillaman  >:
> 
> Most likely not going to be related to 13045 since you aren't
> actively exporting an image diff.  The most likely problem is that
> the RADOS IO context is being closed prior to closing the RBD image.
> 
> --
> 
> Jason Dillaman
> 
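For reference, the ordering issue described above (closing the RADOS IO context
before the RBD image) maps directly onto the underlying librados/librbd C API.
A minimal sketch of the correct teardown order, with error handling omitted -
this is an illustration only, not the agent's actual rados-java/JNA code path,
and the cluster user, pool and image names are placeholders:

#include <rados/librados.h>
#include <rbd/librbd.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    rbd_image_t image;

    /* connect to the cluster as client.admin (example user) */
    rados_create(&cluster, "admin");
    rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
    rados_connect(cluster);

    /* open an IO context on a pool and an image in it (placeholder names) */
    rados_ioctx_create(cluster, "rbd", &io);
    rbd_open(io, "myimage", &image, NULL);

    /* ... image I/O, snapshot listing, etc. ... */

    rbd_close(image);           /* close the image FIRST            */
    rados_ioctx_destroy(io);    /* only then destroy the IO context */
    rados_shutdown(cluster);
    return 0;
}

Destroying the IO context or shutting down the cluster handle while an image is
still open leaves threads such as the ObjectCacher flusher (visible in the
backtrace later in this thread) running against torn-down state, which is
consistent with the assert in SubsystemMap::should_gather.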
> 
> - Original Message -
> 
> > From: "Voloshanenko Igor"  >
> > To: "Ceph Users"  >
> > Sent: Thursday, October 29, 2015 5:27:17 PM
> > Subject: Re: [ceph-users] Cloudstack agent crashed JVM with
> exception in
> > librbd
> 
> > From all we analyzed - look like - it's this issue
> > http://tracker.ceph.com/issues/13045
> 
> > PR: https://github.com/ceph/ceph/pull/6097
> 
> > Can anyone help us to confirm this? :)
> 
> > 2015-10-29 23:13 GMT+02:00 Voloshanenko Igor <
> igor.voloshane...@gmail.com  >
> > :
> 
> > > Additional trace:
> >
> 
> > > #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at
> > > ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> >
> > > #1 0x7f30f98950d8 in __GI_abort () at abort.c:89
> >
> > > #2 0x7f30f87b36b5 in
> __gnu_cxx::__verbose_terminate_handler() () from
> > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >
> > > #3 0x7f30f87b1836 in ?? () from
> > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >
> > > #4 0x7f30f87b1863 in std::terminate() () from
> > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >
> > > #5 0x7f30f87b1aa2 in __cxa_throw () from
> > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >
> > > #6 0x7f2fddb50778 in ceph::__ceph_assert_fail
> > > (assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()",
> >
> > > file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h",
> line=line@entry=62,
> >
> > > func=func@entry=0x7f2fdddedba0
> > >
> <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> 
> "bool
> > > ceph::log::SubsystemMap::should_gather(unsigned int, int)") at
> > > common/assert.cc:77
> >
> > > #7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather
> > > (level=, sub=, this=)
> >
> > > at ./log/SubsystemMap.h:62
> >
> > > #8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather
> > > (this=, sub=, level=)
> >
> > > at ./log/SubsystemMap.h:61
> >
> > > #9 0x7f2fddd879be in ObjectCacher::flusher_entry
> (this=0x7f2ff80b27a0)
> > > at
> > > osdc/ObjectCacher.cc:1527
> >
> > > #10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry
> > > (this= > > out>) at osdc/ObjectCacher.h:374
> >
> > > #11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at
> > > pthread_create.c:312
> >
> > > #12 0x7f30f995547d in clone () at
> > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> 
> > > 2015-10-29 17:38 GMT+02:00 Voloshanenko Igor <
> igor.voloshane...@gmail.com 
> > > >
> > > :
> >
> 
> > > > Hi Wido and all community.
> > >
> >
> 
> > > > We catched very idiotic issue on our Cloudstack installation,
> which
> > > > related
> > > > to ceph and possible to java-rados lib.
> > >
> >
> 
> > > > So, we have constantly agent crashed (which cause very big
> problem for
> > > > us...
> > > > ).
> > >
> >
> 
> > > > When agent crashed - it's crash JVM. And no event in logs at all.
> > >
> >
> > > > We enabled crush dump, and after crash we see next picture:
> > >
> >
> 
> > > > #grep -A1 "Problematic frame" < /hs_err_pid30260.log
> > >
> >
> > > > Problematic frame:
> > >
> >
> > > > C [librbd.so.1.0.0+0x5d681]
> > >
> >
> 
> > > > # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
> > >
> >
> > > > (gdb) bt
> > >
> >
> > > > ...
> > >
> >
> > > > #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather
> > > > (level=, sub=, this=)
> > >
> >
> > > > at ./log/SubsystemMap.h:62
> > >
> >
> > > > #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather
> > > > (this=, sub=, level=)
> > >
> >
> > > > at ./log/SubsystemMap.h:61
> > >
> >
> > > > #9 0x7f30b9d879be in ObjectCacher::flusher_entry
> > > > 

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-11-03 Thread Voloshanenko Igor
Wido, there is also a minor issue with rados-java 0.2.0

We still catch:

-storage/ae1b6e5f-f5f4-4abe-aee3-084f2fe71876
2015-11-02 11:41:14,958 WARN  [cloud.agent.Agent]
(agentRequest-Handler-4:null) Caught:
java.lang.NegativeArraySizeException
at com.ceph.rbd.RbdImage.snapList(Unknown Source)
at
com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.deletePhysicalDisk(LibvirtStorageAdaptor.java:854)
at
com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.deletePhysicalDisk(LibvirtStoragePool.java:175)
at
com.cloud.hypervisor.kvm.storage.KVMStorageProcessor.deleteVolume(KVMStorageProcessor.java:1206)
at
com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:124)
at
com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:57)
at
com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1385)
at com.cloud.agent.Agent.processRequest(Agent.java:503)
at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808)
at com.cloud.utils.nio.Task.run(Task.java:84)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Even with updated lib:

root@ix1-c7-2:/usr/share/cloudstack-agent/lib# ls
/usr/share/cloudstack-agent/lib | grep rados
rados-0.2.0.jar

2015-11-03 11:01 GMT+02:00 Voloshanenko Igor :

> Wido, it's the main issue. No records at all...
>
>
> So, from last time:
>
>
> 2015-11-02 11:40:33,204 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep Mem:|awk
> '{print $2}'
> 2015-11-02 11:40:33,207 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-2:null) Execution is successful.
> 2015-11-02 11:40:35,316 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-4:null) Processing command:
> com.cloud.agent.api.GetVmStatsCommand
> 2015-11-02 11:40:35,867 INFO  [cloud.agent.AgentShell] (main:null) Agent
> started
> 2015-11-02 11:40:35,868 INFO  [cloud.agent.AgentShell] (main:null)
> Implementation Version is 4.5.1
>
> So, almost alsways it's exception after RbdUnprotect then in approx . 20
> minutes - crash..
> Almost all the time - it's happen after GetVmStatsCommand or Disks
> stats... Possible that evil hiden into UpadteDiskInfo method... but i can;t
> find any bad code there (((
>
> 2015-11-03 10:40 GMT+02:00 Wido den Hollander :
>
>>
>>
>> On 03-11-15 01:54, Voloshanenko Igor wrote:
>> > Thank you, Jason!
>> >
>> > Any advice, for troubleshooting
>> >
>> > I'm looking in code, and right now don;t see any bad things :(
>> >
>>
>> Can you run the CloudStack Agent in DEBUG mode and then see after which
>> lines in the logs it crashes?
>>
>> Wido
>>
>> > 2015-11-03 1:32 GMT+02:00 Jason Dillaman > > >:
>> >
>> > Most likely not going to be related to 13045 since you aren't
>> > actively exporting an image diff.  The most likely problem is that
>> > the RADOS IO context is being closed prior to closing the RBD image.
>> >
>> > --
>> >
>> > Jason Dillaman
>> >
>> >
>> > - Original Message -
>> >
>> > > From: "Voloshanenko Igor" > > >
>> > > To: "Ceph Users" > > >
>> > > Sent: Thursday, October 29, 2015 5:27:17 PM
>> > > Subject: Re: [ceph-users] Cloudstack agent crashed JVM with
>> > exception in
>> > > librbd
>> >
>> > > From all we analyzed - look like - it's this issue
>> > > http://tracker.ceph.com/issues/13045
>> >
>> > > PR: https://github.com/ceph/ceph/pull/6097
>> >
>> > > Can anyone help us to confirm this? :)
>> >
>> > > 2015-10-29 23:13 GMT+02:00 Voloshanenko Igor <
>> > igor.voloshane...@gmail.com  >
>> > > :
>> >
>> > > > Additional trace:
>> > >
>> >
>> > > > #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at
>> > > > ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> > >
>> > > > #1 0x7f30f98950d8 in __GI_abort () at abort.c:89
>> > >
>> > > > #2 0x7f30f87b36b5 in
>> > __gnu_cxx::__verbose_terminate_handler() () from
>> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> > >
>> > > > #3 0x7f30f87b1836 in ?? () from
>> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> > >
>> > > > #4 0x7f30f87b1863 in std::terminate() () from
>> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> > >
>> > > > #5 0x7f30f87b1aa2 in __cxa_throw () from
>> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> > >
>> > > > #6 0x7f2fddb50778 in ceph::__ceph_assert_fail
>> > > > 

Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-03 Thread Chris Taylor

On 2015-11-03 12:01 am, gjprabu wrote:


Hi Taylor,

Details are below.

ceph -s
cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f
health HEALTH_OK
monmap e2: 3 mons at 
{integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}

election epoch 526, quorum 0,1,2 integ-hm5,integ-hm6,integ-hm7
osdmap e50127: 3 osds: 3 up, 3 in
pgmap v2923439: 190 pgs, 2 pools, 3401 GB data, 920 kobjects
6711 GB used, 31424 GB / 40160 GB avail
190 active+clean
client io 35153 kB/s rd, 1912 kB/s wr, 672 op/s

The client is automatically unmounted in our case.

Is it possible to change the pg_num in a production setup?


Yes, but increase by small amounts. I would start at 5 and work up to 32 
at a time to find a comfortable number for your cluster. Too many at a 
time will cause a big performance problem until the cluster recovers.


Increase pg_num first: # ceph osd pool set {pool-name} pg_num {pg_num}

PGs will finish peering.

Then increase pgp_num: ceph osd pool set {pool-name} pgp_num {pg_num}

Objects will replicate until cluster is re-balanced.

Take a look at: 
http://docs.ceph.com/docs/master/rados/operations/placement-groups/
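For example (pool name and numbers are placeholders - step pg_num up by no more
than ~32 at a time and let the cluster settle between steps):

# ceph osd pool get {pool-name} pg_num
# ceph osd pool set {pool-name} pg_num 160
  ... wait until "ceph -s" shows all PGs active+clean ...
# ceph osd pool set {pool-name} pgp_num 160
  ... wait for rebalancing to finish, then repeat with the next increment ...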




The journal is stored on SATA 7.2k RPM 6Gbps disks, and we have a 1Gb network interface.



For every write to the cluster the data is written to the journal and 
then to the backend filesystem. The process is repeated for each 
replica. This creates a lot of IO to the disk. Without journals on an 
SSD I think you are better off with each disk as its own OSD instead of 
a large RAID array as a single OSD. Just my opinion. In the case of 
disk/array failure recovery time would be less because of less data to 
recover.


I think a single 1Gb network interface is not enough for a production 
network. A single SATA disk could saturate a 1Gb network link. But it 
will depend on your workload.



We have not configured public and cluster as separate networks; the traffic 
travels over the same LAN. Do we need this setup for better performance?


It will provide extra bandwidth for the cluster to replicate data 
between OSDs instead of using the public/client network.
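A minimal sketch of what that looks like in ceph.conf, with example subnets
(your actual ranges will differ); the OSDs then use the cluster network for
replication and heartbeat traffic:

[global]
public_network = 192.168.112.0/24
cluster_network = 10.10.10.0/24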




Also, what is the better I/O operation setting for the crush map?


AFAIK, the crush map controls data placement, not so much for 
performance.




We are still getting errors in the ceph osd logs; what needs to be done 
about this error?


Have you tried disabling offloading on the NIC? # ethtool -K eth0 tx off
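Depending on the NIC and driver it can be worth checking the other offloads
too; for example (eth0 is a placeholder for the cluster-facing interface):

# ethtool -k eth0                      (show current offload settings)
# ethtool -K eth0 tx off rx off tso off gso off gro off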



2015-11-03 13:04:18.809488 7f387019c700 0 BAD CRC IN DATA 3742210963 != 
EXP 924878202
2015-11-03 13:04:18.812911 7f387019c700 0 -- 192.168.112.231:6800/49908 
>> 192.168.112.192:0/1457324982 pipe(0x170d2000 sd=44 :6800 s=0 pgs=0 cs=0 l=0 c=0x1b18bf40).accept peer addr is really 192.168.112.192:0/1457324982 (socket is 192.168.112.192:47128/0)


Regards
Prabu

 On Tue, 03 Nov 2015 12:50:40 +0530 CHRIS TAYLOR 
 wrote 



On 2015-11-02 10:19 pm, gjprabu wrote:


Hi Taylor,

I have checked the DNS names and all hosts resolve to the correct IP. The MTU
size is 1500 and the switch-level configuration is done. There is no
firewall/selinux running currently.

Also, we would like to know about the queries below, which are already in the thread.

Regards
Prabu

 On Tue, 03 Nov 2015 11:20:07 +0530 CHRIS TAYLOR
 wrote 

I would double check the network configuration on the new node.
Including hosts files and DNS names. Do all the host names resolve to
the correct IP addresses from all hosts?

"... 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 ..."

Looks like the communication between subnets is a problem. Is
xxx.xxx.113.xxx a typo? If that's correct, check MTU sizes. Are they
configured correctly on the switch and all NICs?

Is there any iptables/firewall rules that could be blocking traffic
between hosts?

Hope that helps,

Chris

On 2015-11-02 9:18 pm, gjprabu wrote:

Hi,

Anybody please help me on this issue.

Regards
Prabu

 On Mon, 02 Nov 2015 17:54:27 +0530 GJPRABU 


wrote 

Hi Team,

We have a ceph setup with 2 OSDs and replica 2; it is mounted with
ocfs2 clients and it is working. When we added a new osd, all the clients'
rbd-mapped devices disconnected and running rbd ls or rbd map hung.
We waited for long hours for the new osd to fill, but peering did not
complete even though the data sync finished, and the client-side issue
persisted. We then tried an old osd service stop/start, and after some
time rbd mapped automatically using the existing map script.

After the service stop/start on the old osd, the 3rd OSD rebuilt and
backfilling started, and after some time the clients' rbd-mapped devices
disconnected again and rbd ls or rbd map hung. We waited until the data
sync to the 3rd OSD was completed, but even then the client-side rbd was
not mapped. After we restarted all mon and osd services the client-side
issue was fixed and rbd mounted again. We suspect some issue in our setup;
logs are attached for your reference.



What does 'ceph -s' 

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-11-03 Thread Wido den Hollander


On 03-11-15 10:04, Voloshanenko Igor wrote:
> Wido, also minor issue with 0,2.0 java-rados
> 

Did you also re-compile CloudStack against the new rados-java? I still
think it's related to when the Agent starts cleaning up and there are
snapshots which need to be unprotected.

In the meantime you might want to remove any existing RBD snapshots
using the RBD commands from Ceph; that might solve the problem.
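Roughly like this, with pool/image/snapshot names as placeholders (unprotect
will refuse to run while cloned children still depend on the snapshot):

# rbd snap ls {pool}/{image}
# rbd snap unprotect {pool}/{image}@{snap}
# rbd snap rm {pool}/{image}@{snap}
# rbd snap purge {pool}/{image}        (removes all remaining unprotected snapshots)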

Wido

> We still catch:
> 
> -storage/ae1b6e5f-f5f4-4abe-aee3-084f2fe71876
> 2015-11-02 11:41:14,958 WARN  [cloud.agent.Agent]
> (agentRequest-Handler-4:null) Caught:
> java.lang.NegativeArraySizeException
> at com.ceph.rbd.RbdImage.snapList(Unknown Source)
> at
> com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.deletePhysicalDisk(LibvirtStorageAdaptor.java:854)
> at
> com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.deletePhysicalDisk(LibvirtStoragePool.java:175)
> at
> com.cloud.hypervisor.kvm.storage.KVMStorageProcessor.deleteVolume(KVMStorageProcessor.java:1206)
> at
> com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:124)
> at
> com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:57)
> at
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1385)
> at com.cloud.agent.Agent.processRequest(Agent.java:503)
> at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808)
> at com.cloud.utils.nio.Task.run(Task.java:84)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 
> Even with updated lib:
> 
> root@ix1-c7-2:/usr/share/cloudstack-agent/lib# ls
> /usr/share/cloudstack-agent/lib | grep rados
> rados-0.2.0.jar
> 
> 2015-11-03 11:01 GMT+02:00 Voloshanenko Igor
> >:
> 
> Wido, it's the main issue. No records at all...
> 
> 
> So, from last time:
> 
> 
> 2015-11-02 11:40:33,204 DEBUG
> [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep
> Mem:|awk '{print $2}'
> 2015-11-02 11:40:33,207 DEBUG
> [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-2:null) Execution is successful.
> 2015-11-02 11:40:35,316 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-4:null) Processing command:
> com.cloud.agent.api.GetVmStatsCommand
> 2015-11-02 11:40:35,867 INFO  [cloud.agent.AgentShell] (main:null)
> Agent started
> 2015-11-02 11:40:35,868 INFO  [cloud.agent.AgentShell] (main:null)
> Implementation Version is 4.5.1
> 
> So, almost alsways it's exception after RbdUnprotect then in approx
> . 20 minutes - crash..
> Almost all the time - it's happen after GetVmStatsCommand or Disks
> stats... Possible that evil hiden into UpadteDiskInfo method... but
> i can;t find any bad code there (((
> 
> 2015-11-03 10:40 GMT+02:00 Wido den Hollander  >:
> 
> 
> 
> On 03-11-15 01:54, Voloshanenko Igor wrote:
> > Thank you, Jason!
> >
> > Any advice, for troubleshooting
> >
> > I'm looking in code, and right now don;t see any bad things :(
> >
> 
> Can you run the CloudStack Agent in DEBUG mode and then see
> after which
> lines in the logs it crashes?
> 
> Wido
> 
> > 2015-11-03 1:32 GMT+02:00 Jason Dillaman  
> > >>:
> >
> > Most likely not going to be related to 13045 since you aren't
> > actively exporting an image diff.  The most likely problem is 
> that
> > the RADOS IO context is being closed prior to closing the RBD 
> image.
> >
> > --
> >
> > Jason Dillaman
> >
> >
> > - Original Message -
> >
> > > From: "Voloshanenko Igor"  
> >  >>
> > > To: "Ceph Users"  
> >  >>
> > > Sent: Thursday, October 29, 2015 5:27:17 PM
> > > Subject: Re: [ceph-users] Cloudstack agent crashed JVM with
> > exception in
> > > librbd
> >
> > > From all we analyzed - look like - it's this issue
> > > http://tracker.ceph.com/issues/13045
> >
> > > PR: 

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-11-03 Thread Voloshanenko Igor
Yes, we recompiled ACS too.

Also, we deleted all snapshots... but that only helps for a while...

New snapshots are created each day. And the main issue is the agent crash, not
the exception itself...

Each RBD operation which causes the exception leads to an agent crash in
20-30 minutes...

2015-11-03 11:09 GMT+02:00 Wido den Hollander :

>
>
> On 03-11-15 10:04, Voloshanenko Igor wrote:
> > Wido, also minor issue with 0,2.0 java-rados
> >
>
> Did you also re-compile CloudStack against the new rados-java? I still
> think it's related to when the Agent starts cleaning up and there are
> snapshots which need to be unprotected.
>
> In the meantime you might want to remove any existing RBD snapshots
> using the RBD commands from Ceph, that might solve the problem.
>
> Wido
>
> > We still catch:
> >
> > -storage/ae1b6e5f-f5f4-4abe-aee3-084f2fe71876
> > 2015-11-02 11:41:14,958 WARN  [cloud.agent.Agent]
> > (agentRequest-Handler-4:null) Caught:
> > java.lang.NegativeArraySizeException
> > at com.ceph.rbd.RbdImage.snapList(Unknown Source)
> > at
> >
> com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.deletePhysicalDisk(LibvirtStorageAdaptor.java:854)
> > at
> >
> com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.deletePhysicalDisk(LibvirtStoragePool.java:175)
> > at
> >
> com.cloud.hypervisor.kvm.storage.KVMStorageProcessor.deleteVolume(KVMStorageProcessor.java:1206)
> > at
> >
> com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:124)
> > at
> >
> com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:57)
> > at
> >
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1385)
> > at com.cloud.agent.Agent.processRequest(Agent.java:503)
> > at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808)
> > at com.cloud.utils.nio.Task.run(Task.java:84)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> >
> > Even with updated lib:
> >
> > root@ix1-c7-2:/usr/share/cloudstack-agent/lib# ls
> > /usr/share/cloudstack-agent/lib | grep rados
> > rados-0.2.0.jar
> >
> > 2015-11-03 11:01 GMT+02:00 Voloshanenko Igor
> > >:
> >
> > Wido, it's the main issue. No records at all...
> >
> >
> > So, from last time:
> >
> >
> > 2015-11-02 11:40:33,204 DEBUG
> > [kvm.resource.LibvirtComputingResource]
> > (agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep
> > Mem:|awk '{print $2}'
> > 2015-11-02 11:40:33,207 DEBUG
> > [kvm.resource.LibvirtComputingResource]
> > (agentRequest-Handler-2:null) Execution is successful.
> > 2015-11-02 11:40:35,316 DEBUG [cloud.agent.Agent]
> > (agentRequest-Handler-4:null) Processing command:
> > com.cloud.agent.api.GetVmStatsCommand
> > 2015-11-02 11:40:35,867 INFO  [cloud.agent.AgentShell] (main:null)
> > Agent started
> > 2015-11-02 11:40:35,868 INFO  [cloud.agent.AgentShell] (main:null)
> > Implementation Version is 4.5.1
> >
> > So, almost alsways it's exception after RbdUnprotect then in approx
> > . 20 minutes - crash..
> > Almost all the time - it's happen after GetVmStatsCommand or Disks
> > stats... Possible that evil hiden into UpadteDiskInfo method... but
> > i can;t find any bad code there (((
> >
> > 2015-11-03 10:40 GMT+02:00 Wido den Hollander  > >:
> >
> >
> >
> > On 03-11-15 01:54, Voloshanenko Igor wrote:
> > > Thank you, Jason!
> > >
> > > Any advice, for troubleshooting
> > >
> > > I'm looking in code, and right now don;t see any bad things :(
> > >
> >
> > Can you run the CloudStack Agent in DEBUG mode and then see
> > after which
> > lines in the logs it crashes?
> >
> > Wido
> >
> > > 2015-11-03 1:32 GMT+02:00 Jason Dillaman  
> > > >>:
> > >
> > > Most likely not going to be related to 13045 since you
> aren't
> > > actively exporting an image diff.  The most likely problem
> is that
> > > the RADOS IO context is being closed prior to closing the
> RBD image.
> > >
> > > --
> > >
> > > Jason Dillaman
> > >
> > >
> > > - Original Message -
> > >
> > > > From: "Voloshanenko Igor"  
> > >  > >>
> > > > 

Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-03 Thread gjprabu
Hi Taylor,



   Details are below.



ceph -s

cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f

 health HEALTH_OK

 monmap e2: 3 mons at 
{integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}

election epoch 526, quorum 0,1,2 integ-hm5,integ-hm6,integ-hm7

 osdmap e50127: 3 osds: 3 up, 3 in

  pgmap v2923439: 190 pgs, 2 pools, 3401 GB data, 920 kobjects

6711 GB used, 31424 GB / 40160 GB avail

 190 active+clean

 client io 35153 kB/s rd, 1912 kB/s wr, 672 op/s



The client is automatically unmounted in our case.



Is it possible to change the pg_num in a production setup?



The journal is stored on SATA 7.2k RPM 6Gbps disks and we have a 1Gb network interface.



We have not configured public and cluster as separate networks; the traffic travels 
over the same LAN. Do we need this setup for better performance?



Also, what is the better I/O operation setting for the crush map?





We are still getting errors in the ceph osd logs; what needs to be done about this error?



2015-11-03 13:04:18.809488 7f387019c700  0 bad crc in data 3742210963 != exp 
924878202

2015-11-03 13:04:18.812911 7f387019c700  0 -- 192.168.112.231:6800/49908 
>> 192.168.112.192:0/1457324982 pipe(0x170d2000 sd=44 :6800 s=0 pgs=0 
cs=0 l=0 c=0x1b18bf40).accept peer addr is really 192.168.112.192:0/1457324982 
(socket is 192.168.112.192:47128/0)





Regards

Prabu










 On Tue, 03 Nov 2015 12:50:40 +0530 Chris Taylor ctay...@eyonic.com 
wrote 




On 2015-11-02 10:19 pm, gjprabu wrote: 



 Hi Taylor, 

 

 I have checked DNS name and all host resolve to the correct IP. MTU 

 size is 1500 in switch level configuration done. There is no firewall/ 

 selinux is running currently. 

 

 Also we would like to know below query's which already in the thread. 

 

 Regards 

 Prabu 

 

  On Tue, 03 Nov 2015 11:20:07 +0530 CHRIS TAYLOR 

 ctay...@eyonic.com wrote  

 

 I would double check the network configuration on the new node. 

 Including hosts files and DNS names. Do all the host names resolve to 

 the correct IP addresses from all hosts? 

 

 "... 192.168.112.231:6800/49908  192.168.113.42:0/599324131 ..." 

 

 Looks like the communication between subnets is a problem. Is 

 xxx.xxx.113.xxx a typo? If that's correct, check MTU sizes. Are they 

 configured correctly on the switch and all NICs? 

 

 Is there any iptables/firewall rules that could be blocking traffic 

 between hosts? 

 

 Hope that helps, 

 

 Chris 

 

 On 2015-11-02 9:18 pm, gjprabu wrote: 

 

 Hi, 

 

 Anybody please help me on this issue. 

 

 Regards 

 Prabu 

 

  On Mon, 02 Nov 2015 17:54:27 +0530 GJPRABU 
gjpr...@zohocorp.com 

 wrote  

 

 Hi Team, 

 

 We have ceph setup with 2 OSD and replica 2 and it is mounted with 

 ocfs2 clients and its working. When we added new osd all the clients 

 rbd mapped device disconnected and got hanged by running rbd ls or rbd 

 map command. We waited for long hours to scale the new osd size but 

 peering not completed event data sync finished, but client side issue 

 was persist and thought to try old osd service stop/start, after some 

 time rbd mapped automatically using existing map script. 

 

 After service stop/start in old osd again 3rd OSD rebuild and back 

 filling started and after some time clients rbd mapped device 

 disconnected and got hanged by running rbd ls or rbd map command. We 

 thought to wait till to finished data sync in 3'rd OSD and its 

 completed, even though client side rbd not mapped. After we restarted 

 all mon and osd service and client side issue got fixed and mounted 

 rbd. We suspected some issue in our setup. also attached logs for your 

 reference. 

 



What does 'ceph -s' look like? is the cluster HEALTH_OK? 



 

 Something we are missing in our setup i don't know, highly appreciated 

 if anybody help us to solve this issue. 

 

 Before new osd.2 addition : 

 

 osd.0 - size : 13T and used 2.7 T 

 osd.1 - size : 13T and used 2.7 T 

 

 After new osd addition : 

 osd.0 size : 13T and used 1.8T 

 osd.1 size : 13T and used 2.1T 

 osd.2 size : 15T and used 2.5T 

 

 rbd ls 

 repo / integrepository (pg_num: 126) 

 rbd / integdownloads (pg_num: 64) 

 

 Also we would like to know few clarifications . 

 

 If any new osd will be added whether all client will be unmounted 

 automatically . 

 



Clients do not need to unmount images when OSDs are added. 



 While add new osd can we access ( read / write ) from client machines ? 

 



Clients still have read/write access to RBD images in the cluster while 

adding OSDs and during recovery. 



 How much data will be added in new osd - without change any repilca / 

 pg_num ? 

 



The data will re-balance between OSDs automatically. I found having more 

PGs help distribute the load more evenly. 



 How long to take finish this process ? 



Depends greatly on the hardware and configuration. 

Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd

2015-11-03 Thread Voloshanenko Igor
Wido, it's the main issue. No records at all...


So, from last time:


2015-11-02 11:40:33,204 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep Mem:|awk
'{print $2}'
2015-11-02 11:40:33,207 DEBUG [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-2:null) Execution is successful.
2015-11-02 11:40:35,316 DEBUG [cloud.agent.Agent]
(agentRequest-Handler-4:null) Processing command:
com.cloud.agent.api.GetVmStatsCommand
2015-11-02 11:40:35,867 INFO  [cloud.agent.AgentShell] (main:null) Agent
started
2015-11-02 11:40:35,868 INFO  [cloud.agent.AgentShell] (main:null)
Implementation Version is 4.5.1

So, almost always it's an exception after RbdUnprotect, then in approx. 20
minutes - a crash..
Almost all the time it happens after GetVmStatsCommand or disk stats...
Possibly the evil is hidden in the UpdateDiskInfo method... but I can't find any
bad code there (((

2015-11-03 10:40 GMT+02:00 Wido den Hollander :

>
>
> On 03-11-15 01:54, Voloshanenko Igor wrote:
> > Thank you, Jason!
> >
> > Any advice, for troubleshooting
> >
> > I'm looking in code, and right now don;t see any bad things :(
> >
>
> Can you run the CloudStack Agent in DEBUG mode and then see after which
> lines in the logs it crashes?
>
> Wido
>
> > 2015-11-03 1:32 GMT+02:00 Jason Dillaman  > >:
> >
> > Most likely not going to be related to 13045 since you aren't
> > actively exporting an image diff.  The most likely problem is that
> > the RADOS IO context is being closed prior to closing the RBD image.
> >
> > --
> >
> > Jason Dillaman
> >
> >
> > - Original Message -
> >
> > > From: "Voloshanenko Igor"  > >
> > > To: "Ceph Users"  > >
> > > Sent: Thursday, October 29, 2015 5:27:17 PM
> > > Subject: Re: [ceph-users] Cloudstack agent crashed JVM with
> > exception in
> > > librbd
> >
> > > From all we analyzed - look like - it's this issue
> > > http://tracker.ceph.com/issues/13045
> >
> > > PR: https://github.com/ceph/ceph/pull/6097
> >
> > > Can anyone help us to confirm this? :)
> >
> > > 2015-10-29 23:13 GMT+02:00 Voloshanenko Igor <
> > igor.voloshane...@gmail.com  >
> > > :
> >
> > > > Additional trace:
> > >
> >
> > > > #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at
> > > > ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> > >
> > > > #1 0x7f30f98950d8 in __GI_abort () at abort.c:89
> > >
> > > > #2 0x7f30f87b36b5 in
> > __gnu_cxx::__verbose_terminate_handler() () from
> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > >
> > > > #3 0x7f30f87b1836 in ?? () from
> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > >
> > > > #4 0x7f30f87b1863 in std::terminate() () from
> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > >
> > > > #5 0x7f30f87b1aa2 in __cxa_throw () from
> > > > /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > >
> > > > #6 0x7f2fddb50778 in ceph::__ceph_assert_fail
> > > > (assertion=assertion@entry=0x7f2fdddeca05 "sub <
> m_subsys.size()",
> > >
> > > > file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h",
> > line=line@entry=62,
> > >
> > > > func=func@entry=0x7f2fdddedba0
> > > >
> >
>  <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool
> > > > ceph::log::SubsystemMap::should_gather(unsigned int, int)") at
> > > > common/assert.cc:77
> > >
> > > > #7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather
> > > > (level=, sub=, this= out>)
> > >
> > > > at ./log/SubsystemMap.h:62
> > >
> > > > #8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather
> > > > (this=, sub=, level= out>)
> > >
> > > > at ./log/SubsystemMap.h:61
> > >
> > > > #9 0x7f2fddd879be in ObjectCacher::flusher_entry
> > (this=0x7f2ff80b27a0)
> > > > at
> > > > osdc/ObjectCacher.cc:1527
> > >
> > > > #10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry
> > > > (this= > > > out>) at osdc/ObjectCacher.h:374
> > >
> > > > #11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at
> > > > pthread_create.c:312
> > >
> > > > #12 0x7f30f995547d in clone () at
> > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> > >
> >
> > > > 2015-10-29 17:38 GMT+02:00 Voloshanenko Igor <
> > igor.voloshane...@gmail.com 
> > > > >
> > > > :
> > >
> >
> > > > > Hi Wido and all community.
> > > >
> > >
> >
> > > > > We catched very idiotic issue on our Cloudstack installation,
> > which
> > > > > related
> > 

[ceph-users] rados bench leaves objects in tiered pool

2015-11-03 Thread Дмитрий Глушенок
Hi,

While benchmarking a tiered pool using rados bench it was noticed that objects 
are not being removed after the test.

The test was performed using "rados -p rbd bench 3600 write". The pool is not used 
by anything else.

Just before end of test:
POOLS:
NAME  ID USED   %USED MAX AVAIL OBJECTS
rbd-cache 36 33110M  3.41  114G8366
rbd   37 43472M  4.47  237G   10858

Some time later (few hundreds of writes are flushed, rados automatic cleanup 
finished):
POOLS:
NAME  ID USED   %USED MAX AVAIL OBJECTS
rbd-cache 36  22998 0  157G   16342
rbd   37 46050M  4.74  234G   11503

# rados -p rbd-cache ls | wc -l
16242
# rados -p rbd ls | wc -l
11503
#

# rados -p rbd cleanup
error during cleanup: -2
error 2: (2) No such file or directory
#

# rados -p rbd cleanup --run-name "" --prefix prefix ""
 Warning: using slow linear search
 Removed 0 objects
#

# rados -p rbd ls | head -5
benchmark_data_dropbox01.tzk_7641_object10901
benchmark_data_dropbox01.tzk_7641_object9645
benchmark_data_dropbox01.tzk_7641_object10389
benchmark_data_dropbox01.tzk_7641_object10090
benchmark_data_dropbox01.tzk_7641_object11204
#

#  rados -p rbd-cache ls | head -5
benchmark_data_dropbox01.tzk_7641_object10901
benchmark_data_dropbox01.tzk_7641_object9645
benchmark_data_dropbox01.tzk_7641_object10389
benchmark_data_dropbox01.tzk_7641_object5391
benchmark_data_dropbox01.tzk_7641_object10090
#

So, it looks like the objects are still in place (in both pools?). But it is 
not possible to remove them:

# rados -p rbd rm benchmark_data_dropbox01.tzk_7641_object10901
error removing rbd>benchmark_data_dropbox01.tzk_7641_object10901: (2) No such 
file or directory
#

# ceph health
HEALTH_OK
#


Can somebody explain the behavior? And is it possible to cleanup the benchmark 
data without recreating the pools?


ceph version 0.94.5

# ceph osd dump | grep rbd
pool 36 'rbd-cache' replicated size 3 min_size 1 crush_ruleset 1 object_hash 
rjenkins pg_num 100 pgp_num 100 last_change 755 flags 
hashpspool,incomplete_clones tier_of 37 cache_mode writeback target_bytes 
107374182400 hit_set bloom{false_positive_probability: 0.05, target_size: 0, 
seed: 0} 3600s x1 stripe_width 0
pool 37 'rbd' erasure size 5 min_size 3 crush_ruleset 2 object_hash rjenkins 
pg_num 100 pgp_num 100 last_change 745 lfor 745 flags hashpspool tiers 36 
read_tier 36 write_tier 36 stripe_width 4128
#

# ceph osd pool get rbd-cache hit_set_type
hit_set_type: bloom
# ceph osd pool get rbd-cache hit_set_period
hit_set_period: 3600
# ceph osd pool get rbd-cache hit_set_count
hit_set_count: 1
# ceph osd pool get rbd-cache target_max_objects
target_max_objects: 0
# ceph osd pool get rbd-cache target_max_bytes
target_max_bytes: 107374182400
# ceph osd pool get rbd-cache cache_target_dirty_ratio
cache_target_dirty_ratio: 0.1
# ceph osd pool get rbd-cache cache_target_full_ratio
cache_target_full_ratio: 0.2
#

Crush map:
root cache_tier {   
id -7   # do not change unnecessarily
# weight 0.450
alg straw   
hash 0  # rjenkins1
item osd.0 weight 0.090
item osd.1 weight 0.090
item osd.2 weight 0.090
item osd.3 weight 0.090
item osd.4 weight 0.090
}
root store_tier {   
id -8   # do not change unnecessarily
# weight 0.450
alg straw   
hash 0  # rjenkins1
item osd.5 weight 0.090
item osd.6 weight 0.090
item osd.7 weight 0.090
item osd.8 weight 0.090
item osd.9 weight 0.090
}
rule cache {
ruleset 1
type replicated
min_size 0
max_size 5
step take cache_tier
step chooseleaf firstn 0 type osd
step emit
}
rule store {
ruleset 2
type erasure
min_size 0
max_size 5
step take store_tier
step chooseleaf firstn 0 type osd
step emit
}

Thanks

--
Dmitry Glushenok
Jet Infosystems

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] One object in .rgw.buckets.index causes systemic instability

2015-11-03 Thread Gerd Jakobovitsch

Dear all,

I have a cluster running hammer (0.94.5), with 5 nodes. The main usage 
is for S3-compatible object storage.
I am running into a very troublesome problem on a ceph cluster. A single 
object in .rgw.buckets.index is not responding to requests and takes 
a very long time to recover after an osd restart. During this 
time, the OSDs where this object is mapped get heavily loaded, with high 
cpu as well as memory usage. At the same time, the directory 
/var/lib/ceph/osd/ceph-XX/current/omap gets a large number of entries 
(> 1) that won't decrease.


Very frequently, I get >100 blocked requests for this object, and the 
main OSD that stores it ends up accepting no other requests. Very 
frequently the OSD ends up crashing due to filestore timeout, and 
getting it up again is very troublesome - it usually has to run alone in 
the node for a long time, until the object gets recovered, somehow.


At the OSD logs, there are several entries like these:
 -7051> 2015-11-03 10:46:08.339283 7f776974f700 10 log_client logged 
2015-11-03 10:46:02.942023 osd.63 10.17.0.9:6857/2002 41 : cluster [WRN] 
slow request 120.003081 seconds old, received at 2015-11-03 10:43:56.472825: 
osd_repop(osd.53.236531:7 34.7 8a7482ff/.dir.default.198764998.1/head//34 
v 236984'22) currently commit_sent


2015-11-03 10:28:32.405265 7f0035982700  0 log_channel(cluster) log [WRN] : 
97 slow requests, 1 included below; oldest blocked for > 2046.502848 secs
2015-11-03 10:28:32.405269 7f0035982700  0 log_channel(cluster) log [WRN] : 
slow request 1920.676998 seconds old, received at 2015-11-03 09:56:31.728224: 
osd_op(client.210508702.0:14696798 .dir.default.198764998.1 [call 
rgw.bucket_prepare_op] 15.8a7482ff ondisk+write+known_if_redirected e236956) 
currently waiting for blocked object

Is there any way to go deeper into this problem, or to rebuild the .rgw 
index without losing data? I currently have 30 TB of data in the 
cluster - most of it concentrated in a handful of buckets - that I can't 
lose.
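
One way to dig a bit deeper into what those blocked requests are waiting on is
the admin socket of the primary OSD (osd.63 here is taken from the log above;
adjust to your own OSD id, and run it on the node hosting that OSD):

# ceph daemon osd.63 dump_ops_in_flight
# ceph daemon osd.63 dump_historic_ops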


Regards.
--



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Choosing hp sata or sas SSDs for journals

2015-11-03 Thread Karsten Heymann
Hi,

has anyone experience with HP-branded SSDs for journaling? Given that
everything else is fixed (raid controller, cpu, etc...) and a fixed budget,
would it be better to go with more of the cheaper 6G SATA Write intensive
drives or should I aim for (then fewer) 12G SAS models? Here are the specs:

HP 6G SATA Write Intensive 200 GB (804639-B21):
- Sequential reads / writes (MB/s): 540 / 300
- Random reads /writes (IOPS): 64,500 / 42,000
- DWPD: 10

HP 12G SAS Mainstream Endurance 200 GB (779164-B21):
- Sequential reads / writes (MB/s): 1,000 / 510
- Random reads /writes (IOPS): 70,000 / 51,000
- DWPD: 10

HP 12G SAS Write Intensive 200 GB (802578-B21):
- Sequential reads / writes (MB/s): 1,000 / 660
- Random reads /writes (IOPS): 106,000 / 83,000
- DWPD: 25

(Source: http://www8.hp.com/h20195/v2/GetPDF.aspx%2F4AA4-7186ENW.pdf)

I know that asking does not free me from benchmarking, but maybe someone
has a rough estimate?

Best regards
Karsten
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Openstack deployment

2015-11-03 Thread Iban Cabrillo
Hi all,
During the last week I have been trying to deploy the pre-existing ceph cluster
with our openstack instance.
The ceph-cinder integration was easy (or at least I think so!!)
There is only one volume to attach block storage to our cloud machines.

The client.cinder has permission on this volume (following the guides)
...
client.cinder
key: AQAonXXXRAAPIAj9iErv001a0k+vyFdUg==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx
pool=volumes

The ceph.conf file seems to be OK:

[global]
fsid = 6f5a65a7-316c-4825-afcb-428608941dd1
mon_initial_members = cephadm, cephmon02, cephmon03
mon_host = 10.10.3.1,10.10.3.2,10.10.3.3
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
public_network = 10.10.0.0/16
cluster_network = 192.168.254.0/27

[osd]
osd_journal_size = 2

[client.cinder]
keyring = /etc/ceph/ceph.client.cinder.keyring

[client]
rbd cache = true
rbd cache writethrough until flush = true
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok


The trouble seems to be that blocks are created using client.admin instead of
client.cinder.

From the cinder machine:

cinder:~ # rados ls --pool volumes
rbd_id.volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
rbd_directory
rbd_id.volume-7da08f12-fb0f-4269-931a-d528c1507fee
rbd_header.23d5e33b4c15c
rbd_header.20407190ce77f

But if I try to list them using the cinder client:


  cinder:~ #rados ls --pool volumes --secret client.cinder
  "empty answer"

cinder:~ # ls -la /etc/ceph
total 24
drwxr-xr-x   2 root   root   4096 nov  3 10:17 .
drwxr-xr-x 108 root   root   4096 oct 29 09:52 ..
-rw---   1 root   root 63 nov  3 10:17 ceph.client.admin.keyring
-rw-r--r--   1 cinder cinder   67 oct 28 13:44 ceph.client.cinder.keyring
-rw-r--r--   1 root   root454 oct  1 13:56 ceph.conf
-rw-r--r--   1 root   root 73 sep 27 09:36 ceph.mon.keyring


From a client (I have assumed that this machine only needs the cinder
key...):

cloud28:~ # ls -la /etc/ceph/
total 28
drwx--   2 root root  4096 nov  3 11:01 .
drwxr-xr-x 116 root root 12288 oct 30 14:37 ..
-rw-r--r--   1 nova nova67 oct 28 11:43 ceph.client.cinder.keyring
-rw-r--r--   1 root root   588 nov  3 10:59 ceph.conf
-rw-r--r--   1 root root92 oct 26 16:59 rbdmap

cloud28:~ # rbd -p volumes ls
2015-11-03 11:01:58.782795 7fc6c714b840 -1 monclient(hunting): ERROR:
missing keyring, cannot use cephx for authentication
2015-11-03 11:01:58.782800 7fc6c714b840  0 librados: client.admin
initialization error (2) No such file or directory
rbd: couldn't connect to the cluster!

Any help will be welcome.
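
For what it is worth, rados and rbd default to client.admin unless a user is
specified with --id/--name; a sketch of running the same checks as
client.cinder, assuming the keyring paths already shown above:

cinder:~ # rados ls --pool volumes --id cinder --keyring /etc/ceph/ceph.client.cinder.keyring
cloud28:~ # rbd -p volumes ls --id cinder --keyring /etc/ceph/ceph.client.cinder.keyring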
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados bench leaves objects in tiered pool

2015-11-03 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Try:

rados -p {cachepool} cache-flush-evict-all

and see if the objects clean up.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Nov 3, 2015 at 8:02 AM, Gregory Farnum  wrote:
> When you have a caching pool in writeback mode, updates to objects
> (including deletes) are handled by writeback rather than writethrough.
> Since there's no other activity against these pools, there is nothing
> prompting the cache pool to flush updates out to the backing pool, so
> the backing pool hasn't deleted its objects because nothing's told it
> to. You'll find that the cache pool has deleted the data for its
> objects, but it's keeping around a small "whiteout" and the object
> info metadata.
> The "rados ls" you're using has never played nicely with cache tiering
> and probably never will. :( Listings are expensive operations and
> modifying them to do more than the simple info scan would be fairly
> expensive in terms of computation and IO.
>
> I think there are some caching commands you can send to flush updates
> which would cause the objects to be entirely deleted, but I don't have
> them off-hand. You can probably search the mailing list archives or
> the docs for tiering commands. :)
> -Greg
>
> On Tue, Nov 3, 2015 at 12:40 AM, Дмитрий Глушенок  wrote:
>> Hi,
>>
>> While benchmarking tiered pool using rados bench it was noticed that objects 
>> are not being removed after test.
>>
>> Test was performed using "rados -p rbd bench 3600 write". The pool is not 
>> used by anything else.
>>
>> Just before end of test:
>> POOLS:
>> NAME  ID USED   %USED MAX AVAIL 
>> OBJECTS
>> rbd-cache 36 33110M  3.41  114G
>> 8366
>> rbd   37 43472M  4.47  237G   
>> 10858
>>
>> Some time later (few hundreds of writes are flushed, rados automatic cleanup 
>> finished):
>> POOLS:
>> NAME  ID USED   %USED MAX AVAIL 
>> OBJECTS
>> rbd-cache 36  22998 0  157G   
>> 16342
>> rbd   37 46050M  4.74  234G   
>> 11503
>>
>> # rados -p rbd-cache ls | wc -l
>> 16242
>> # rados -p rbd ls | wc -l
>> 11503
>> #
>>
>> # rados -p rbd cleanup
>> error during cleanup: -2
>> error 2: (2) No such file or directory
>> #
>>
>> # rados -p rbd cleanup --run-name "" --prefix prefix ""
>>  Warning: using slow linear search
>>  Removed 0 objects
>> #
>>
>> # rados -p rbd ls | head -5
>> benchmark_data_dropbox01.tzk_7641_object10901
>> benchmark_data_dropbox01.tzk_7641_object9645
>> benchmark_data_dropbox01.tzk_7641_object10389
>> benchmark_data_dropbox01.tzk_7641_object10090
>> benchmark_data_dropbox01.tzk_7641_object11204
>> #
>>
>> #  rados -p rbd-cache ls | head -5
>> benchmark_data_dropbox01.tzk_7641_object10901
>> benchmark_data_dropbox01.tzk_7641_object9645
>> benchmark_data_dropbox01.tzk_7641_object10389
>> benchmark_data_dropbox01.tzk_7641_object5391
>> benchmark_data_dropbox01.tzk_7641_object10090
>> #
>>
>> So, it looks like the objects are still in place (in both pools?). But it is 
>> not possible to remove them:
>>
>> # rados -p rbd rm benchmark_data_dropbox01.tzk_7641_object10901
>> error removing rbd>benchmark_data_dropbox01.tzk_7641_object10901: (2) No 
>> such file or directory
>> #
>>
>> # ceph health
>> HEALTH_OK
>> #
>>
>>
>> Can somebody explain the behavior? And is it possible to cleanup the 
>> benchmark data without recreating the pools?
>>
>>
>> ceph version 0.94.5
>>
>> # ceph osd dump | grep rbd
>> pool 36 'rbd-cache' replicated size 3 min_size 1 crush_ruleset 1 object_hash 
>> rjenkins pg_num 100 pgp_num 100 last_change 755 flags 
>> hashpspool,incomplete_clones tier_of 37 cache_mode writeback target_bytes 
>> 107374182400 hit_set bloom{false_positive_probability: 0.05, target_size: 0, 
>> seed: 0} 3600s x1 stripe_width 0
>> pool 37 'rbd' erasure size 5 min_size 3 crush_ruleset 2 object_hash rjenkins 
>> pg_num 100 pgp_num 100 last_change 745 lfor 745 flags hashpspool tiers 36 
>> read_tier 36 write_tier 36 stripe_width 4128
>> #
>>
>> # ceph osd pool get rbd-cache hit_set_type
>> hit_set_type: bloom
>> # ceph osd pool get rbd-cache hit_set_period
>> hit_set_period: 3600
>> # ceph osd pool get rbd-cache hit_set_count
>> hit_set_count: 1
>> # ceph osd pool get rbd-cache target_max_objects
>> target_max_objects: 0
>> # ceph osd pool get rbd-cache target_max_bytes
>> target_max_bytes: 107374182400
>> # ceph osd pool get rbd-cache cache_target_dirty_ratio
>> cache_target_dirty_ratio: 0.1
>> # ceph osd pool get rbd-cache cache_target_full_ratio
>> cache_target_full_ratio: 0.2
>> #
>>
>> Crush map:
>> root cache_tier {
>> id -7   # do not change unnecessarily
>> # weight 0.450
>> alg straw
>> hash 0  # rjenkins1
>> item osd.0 

[ceph-users] some postmortem

2015-11-03 Thread Dzianis Kahanovich
OK, now my ceph cluster has died & been re-created. The main problem was too many PGs and 
disabled swap; then one of the nodes had problems with xfs (even getting stuck on mount) and 
everything started to die, lastly while trying to edit PGs & deleting more than needed. But I 
see some issues.


After a ceph-osd crash (out of RAM there) some PGs (being backfilled?) were broken (and 
even replicated that way?). These PGs wait (locked) forever and also produce "slow 
requests" (which live forever too). Some examples from the logs:


2015-11-01 18:06:52.920712 7fc5f7654700 30 osd.3 pg_epoch: 189581 pg[0.123( v 
162977'16156 (155381'13156,162977'16156] local-les=189415 n=552 ec=1 les/c 
189415/189476 189377/189377/189377) [0,10,3] r=2 lpr=189377 pi=163821-189376/285 
luod=0'0 crt=0'0 lcod 0'0 active] lock
2015-11-01 18:06:52.922113 7fc5f7654700 30 osd.3 pg_epoch: 189581 pg[2.121( v 
162977'1918456 (160044'1915456,162977'1918456] local-les=189415 n=340 ec=1 les/c 
189415/189476 189377/189377/189377) [0,10,3] r=2 lpr=189377 pi=163821-189376/277 
luod=0'0 crt=0'0 lcod 0'0 active] lock
2015-11-01 18:06:52.924800 7fc5f7654700 30 osd.3 pg_epoch: 189581 pg[1.124( v 
162378'21481 (160046'18440,162378'21481] local-les=189468 n=69 ec=1 les/c 
189468/189481 189447/189463/189302) [10,12,3] r=2 lpr=189463 pi=178724-189462/54 
luod=0'0 crt=162378'21479 lcod 0'0 active] lock
2015-11-01 18:06:52.925660 7fc5f7654700 30 osd.3 pg_epoch: 189581 pg[0.125( v 
162981'15001 (154638'12001,162981'15001] local-les=189477 n=551 ec=1 les/c 
189477/189484 189447/189464/189302) [10,12,3] r=2 lpr=189464 pi=178724-189463/66 
luod=0'0 crt=0'0 lcod 0'0 active] lock


So, IMHO crash recovery (especially of backfilling) and PG verification after 
restart need to be improved, at least to avoid an "active" state for broken PGs.


PS 0.94.5
PPS With 4.3.0, mount does not get stuck, but xfs_repair is still required.
PPPS Use swap and avoid forced kills.

--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados bench leaves objects in tiered pool

2015-11-03 Thread Дмитрий Глушенок
Hi,

Thanks Gregory and Robert, now it is a bit clearer.

After cache-flush-evict-all almost all objects were deleted, but 101 remained 
in the cache pool. Also, 1 pg changed its state to inconsistent with HEALTH_ERR. 
"ceph pg repair" changed the object count to 100, but at least ceph became healthy.

Now it looks like:
POOLS:
NAME  ID USED  %USED MAX AVAIL OBJECTS 
rbd-cache 36 23185 0  157G 100 
rbd   37 0 0  279G   0 
# rados -p rbd-cache ls -all
# rados -p rbd ls -all
# 

Is there any way to find what the objects are?

"ceph pg ls-by-pool rbd-cache" gives me pgs of the objects. Looking into these 
pgs gives me nothing I can understand :)

# ceph pg ls-by-pool rbd-cache | head -4
pg_stat objects mip degrmispunf bytes   log disklog state   
state_stamp v   reportedup   up_primary   acting  
acting_primary  last_scrub  scrub_stamp last_deep_scrub deep_scrub_stamp
36.01   0   0   0   0   83  926 926 
active+clean2015-11-03 22:06:39.193371  798'926   798:640 [4,0,3] 4 
  [4,0,3] 4   798'926 2015-11-03 22:06:39.193321  798'926 
2015-11-03 22:06:39.193321
36.11   0   0   0   0   193 854 854 
active+clean2015-11-03 18:28:51.190819  798'854   798:515 [1,4,3] 1 
  [1,4,3] 1   796'628 2015-11-03 18:28:51.190749  0'0 
2015-11-02 18:28:42.546224
36.21   0   0   0   0   198 869 869 
active+clean2015-11-03 18:28:44.556048  798'869   798:554 [2,0,1] 2 
  [2,0,1] 2   796'650 2015-11-03 18:28:44.555980  0'0 
2015-11-02 18:28:42.546226
#

# find /var/lib/ceph/osd/ceph-0/current/36.0_head/
/var/lib/ceph/osd/ceph-0/current/36.0_head/
/var/lib/ceph/osd/ceph-0/current/36.0_head/__head___24
/var/lib/ceph/osd/ceph-0/current/36.0_head/hit\uset\u36.0\uarchive\u2015-11-03 
11:12:37.962360\u2015-11-03 21:28:58.149662__head__.ceph-internal_24
# find /var/lib/ceph/osd/ceph-0/current/36.2_head/
/var/lib/ceph/osd/ceph-0/current/36.2_head/
/var/lib/ceph/osd/ceph-0/current/36.2_head/__head_0002__24
/var/lib/ceph/osd/ceph-0/current/36.2_head/hit\uset\u36.2\uarchive\u2015-11-02 
19:50:00.788736\u2015-11-03 21:29:02.460568__head_0002_.ceph-internal_24
#

# ls -l 
/var/lib/ceph/osd/ceph-0/current/36.0_head/hit\\uset\\u36.0\\uarchive\\u2015-11-03\
 11\:12\:37.962360\\u2015-11-03\ 
21\:28\:58.149662__head__.ceph-internal_24 
-rw-r--r--. 1 root root 83 Nov  3 21:28 
/var/lib/ceph/osd/ceph-0/current/36.0_head/hit\uset\u36.0\uarchive\u2015-11-03 
11:12:37.962360\u2015-11-03 21:28:58.149662__head__.ceph-internal_24
# 
# ls -l 
/var/lib/ceph/osd/ceph-0/current/36.2_head/hit\\uset\\u36.2\\uarchive\\u2015-11-02\
 19\:50\:00.788736\\u2015-11-03\ 
21\:29\:02.460568__head_0002_.ceph-internal_24 
-rw-r--r--. 1 root root 198 Nov  3 21:29 
/var/lib/ceph/osd/ceph-0/current/36.2_head/hit\uset\u36.2\uarchive\u2015-11-02 
19:50:00.788736\u2015-11-03 21:29:02.460568__head_0002_.ceph-internal_24
#

--
Dmitry Glushenok
Jet Infosystems


> On 3 Nov 2015, at 20:11, Robert LeBlanc  wrote:
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> Try:
> 
> rados -p {cachepool} cache-flush-evict-all
> 
> and see if the objects clean up.
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Tue, Nov 3, 2015 at 8:02 AM, Gregory Farnum  wrote:
>> When you have a caching pool in writeback mode, updates to objects
>> (including deletes) are handled by writeback rather than writethrough.
>> Since there's no other activity against these pools, there is nothing
>> prompting the cache pool to flush updates out to the backing pool, so
>> the backing pool hasn't deleted its objects because nothing's told it
>> to. You'll find that the cache pool has deleted the data for its
>> objects, but it's keeping around a small "whiteout" and the object
>> info metadata.
>> The "rados ls" you're using has never played nicely with cache tiering
>> and probably never will. :( Listings are expensive operations and
>> modifying them to do more than the simple info scan would be fairly
>> expensive in terms of computation and IO.
>> 
>> I think there are some caching commands you can send to flush updates
>> which would cause the objects to be entirely deleted, but I don't have
>> them off-hand. You can probably search the mailing list archives or
>> the docs for tiering commands. :)
>> -Greg
>> 
>> On Tue, Nov 3, 2015 at 12:40 AM, Дмитрий Глушенок  wrote:
>>> Hi,
>>> 
>>> While benchmarking tiered pool using rados bench it was noticed that 
>>> objects are not being removed after test.
>>> 
>>> Test was performed using "rados -p rbd bench 3600 write". The pool is not 
>>> used by 

Re: [ceph-users] rados bench leaves objects in tiered pool

2015-11-03 Thread Gregory Farnum
Ceph maintains some metadata in objects. In this case, hitsets, which keep
track of object accesses for evaluating how hot an object is when flushing
and evicting from the cache.

On Tuesday, November 3, 2015, Дмитрий Глушенок  wrote:

> Hi,
>
> Thanks Gregory and Robert, now it is a bit clearer.
>
> After cache-flush-evict-all almost all objects were deleted, but 101
> remained in cache pool. Also 1 pg changed its state to inconsistent with
> HEALTH_ERR.
> "ceph pg repair" changed objects count to 100, but at least ceph become
> healthy.
>
> Now it looks like:
> POOLS:
> NAME  ID USED  %USED MAX AVAIL
>  OBJECTS
> rbd-cache 36 23185 0  157G
>  100
> rbd   37 0 0  279G
>0
> # rados -p rbd-cache ls -all
> # rados -p rbd ls -all
> #
>
> Is there any way to find what the objects are?
>
> "ceph pg ls-by-pool rbd-cache" gives me pgs of the objects. Looking into
> these pgs gives me nothing I can understand :)
>
> # ceph pg ls-by-pool rbd-cache | head -4
> pg_stat objects mip degrmispunf bytes   log disklog
> state   state_stamp v   reportedup   up_primary
>  acting  acting_primary  last_scrub  scrub_stamp last_deep_scrub
> deep_scrub_stamp
> 36.01   0   0   0   0   83  926 926
>  active+clean2015-11-03 22:06:39.193371  798'926   798:640
> [4,0,3] 4   [4,0,3] 4   798'926 2015-11-03 22:06:39.193321
> 798'926 2015-11-03 22:06:39.193321
> 36.11   0   0   0   0   193 854 854
>  active+clean2015-11-03 18:28:51.190819  798'854   798:515
> [1,4,3] 1   [1,4,3] 1   796'628 2015-11-03 18:28:51.190749
> 0'0 2015-11-02 18:28:42.546224
> 36.21   0   0   0   0   198 869 869
>  active+clean2015-11-03 18:28:44.556048  798'869   798:554
> [2,0,1] 2   [2,0,1] 2   796'650 2015-11-03 18:28:44.555980
> 0'0 2015-11-02 18:28:42.546226
> #
>
> # find /var/lib/ceph/osd/ceph-0/current/36.0_head/
> /var/lib/ceph/osd/ceph-0/current/36.0_head/
> /var/lib/ceph/osd/ceph-0/current/36.0_head/__head___24
> /var/lib/ceph/osd/ceph-0/current/36.0_head/hit\uset\u36.0\uarchive\u2015-11-03
> 11:12:37.962360\u2015-11-03 21:28:58.149662__head__.ceph-internal_24
> # find /var/lib/ceph/osd/ceph-0/current/36.2_head/
> /var/lib/ceph/osd/ceph-0/current/36.2_head/
> /var/lib/ceph/osd/ceph-0/current/36.2_head/__head_0002__24
> /var/lib/ceph/osd/ceph-0/current/36.2_head/hit\uset\u36.2\uarchive\u2015-11-02
> 19:50:00.788736\u2015-11-03 21:29:02.460568__head_0002_.ceph-internal_24
> #
>
> # ls -l
> /var/lib/ceph/osd/ceph-0/current/36.0_head/hit\\uset\\u36.0\\uarchive\\u2015-11-03\
> 11\:12\:37.962360\\u2015-11-03\
> 21\:28\:58.149662__head__.ceph-internal_24
> -rw-r--r--. 1 root root 83 Nov  3 21:28
> /var/lib/ceph/osd/ceph-0/current/36.0_head/hit\uset\u36.0\uarchive\u2015-11-03
> 11:12:37.962360\u2015-11-03 21:28:58.149662__head__.ceph-internal_24
> #
> # ls -l
> /var/lib/ceph/osd/ceph-0/current/36.2_head/hit\\uset\\u36.2\\uarchive\\u2015-11-02\
> 19\:50\:00.788736\\u2015-11-03\
> 21\:29\:02.460568__head_0002_.ceph-internal_24
> -rw-r--r--. 1 root root 198 Nov  3 21:29
> /var/lib/ceph/osd/ceph-0/current/36.2_head/hit\uset\u36.2\uarchive\u2015-11-02
> 19:50:00.788736\u2015-11-03 21:29:02.460568__head_0002_.ceph-internal_24
> #
>
> --
> Dmitry Glushenok
> Jet Infosystems
>
>
> > On Nov 3, 2015, at 20:11, Robert LeBlanc wrote:
> >
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA256
> >
> > Try:
> >
> > rados -p {cachepool} cache-flush-evict-all
> >
> > and see if the objects clean up.
> > - 
> > Robert LeBlanc
> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >
> >
> > On Tue, Nov 3, 2015 at 8:02 AM, Gregory Farnum  wrote:
> >> When you have a caching pool in writeback mode, updates to objects
> >> (including deletes) are handled by writeback rather than writethrough.
> >> Since there's no other activity against these pools, there is nothing
> >> prompting the cache pool to flush updates out to the backing pool, so
> >> the backing pool hasn't deleted its objects because nothing's told it
> >> to. You'll find that the cache pool has deleted the data for its
> >> objects, but it's keeping around a small "whiteout" and the object
> >> info metadata.
> >> The "rados ls" you're using has never played nicely with cache tiering
> >> and probably never will. :( Listings are expensive operations and
> >> modifying them to do more than the simple info scan would be fairly
> >> expensive in terms of computation and IO.
> >>
> >> I think there are some caching commands you can send to flush updates
> >> which would cause the objects to be entirely deleted, but I don't have
> >> them off-hand. You can probably search the mailing list archives or
> >> the docs for tiering commands. :)

Re: [ceph-users] two or three replicas?

2015-11-03 Thread Udo Lembke
Hi,
for production (with enough OSDs) three replicas is the right choice.
The chance of data loss if two OSDs fail at the same time is too high.

And if that happens, most of your data is lost, because the data is
spread over many OSDs...

And yes - two replicas is faster for writes.
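[Editor: for reference, the replica count is a per-pool setting; a minimal
sketch, assuming a pool named rbd:]

# ceph osd pool set rbd size 3
# ceph osd pool set rbd min_size 2

[size is the number of copies kept; min_size 2 lets the pool keep serving I/O
while one copy is missing, but refuses writes that would land on only a single
surviving copy.]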


Udo


On 02.11.2015 11:10, Wah Peng wrote:
> Hello,
>
> for production application (for example, openstack's block storage),
> is it better to setup data to be stored with two replicas, or three
> replicas? is two replicas with better performance and lower cost?
>
> Thanks.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can snapshot of image still be used while flattening the image?

2015-11-03 Thread Jackie

Hi experts,

I have rbd images and snapshots as follows:
image1 -> snapshot1 (snapshot of image1)
-> image2 (cloned from snapshot1) -> snapshot2 (snapshot of image2)

While image2 is being flattened, can I still use snapshot2 to clone a new image?
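[Editor: for context, the chain described above corresponds to the standard rbd
commands below; the pool name "rbd" is assumed:]

# rbd snap create rbd/image1@snapshot1
# rbd snap protect rbd/image1@snapshot1
# rbd clone rbd/image1@snapshot1 rbd/image2
# rbd snap create rbd/image2@snapshot2
# rbd flatten rbd/image2

[The question is whether snapshot2 can still be protected and cloned while the
flatten of image2 is in progress.]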

Regards,
Jackie
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Using LVM on top of a RBD.

2015-11-03 Thread Daniel Hoffman
Hi All.

I have a legacy server farm made up of 7 nodes running KVM and using
LVM (LVs) for the disks of the virtual machines. The nodes are currently
running CentOS 6.

We would love to remove this small farm from our network and use Ceph RBD
rather than the traditional iSCSI block device we currently use.

Has anyone run into issues with Ceph RBD and LVM on top? I would almost
think this is semi-redundant, as most Linux distros use LVM by default
now, so any RBD being used would probably have LVM on it.
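[Editor: for what it's worth, a minimal sketch of LVM on top of a kernel-mapped
RBD; the pool, image and VG names are made up, and /dev/rbd0 is assumed to be
the device the map produces:]

# rbd create vm-pool/disk1 --size 102400
# rbd map vm-pool/disk1
# pvcreate /dev/rbd0
# vgcreate vg_guests /dev/rbd0
# lvcreate -n guest01 -L 20G vg_guests

[Depending on the LVM version you may also need to add an "rbd" entry to the
types list in the devices section of /etc/lvm/lvm.conf before pvcreate will
accept /dev/rbd* devices.]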

The system does a daily snapshot, copy out (dd) and snapshot removal on
every LV in the cluster. Any issues or performance concerns anyone would
think about here?
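[Editor: the daily cycle described above would look roughly like this per LV
(names illustrative), which is worth keeping in mind when estimating the extra
read and copy-on-write load the RBD-backed volumes would see:]

# lvcreate -s -n guest01_snap -L 5G /dev/vg_guests/guest01
# dd if=/dev/vg_guests/guest01_snap of=/backup/guest01.img bs=4M
# lvremove -f /dev/vg_guests/guest01_snap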

Is the driver shipping with CentOS 6 good enough, or should I look for a
manual kernel upgrade/backport?

Performance issues in general with mode1 drivers?

Any feedback/thoughts would be appreciated.

Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Choosing hp sata or sas SSDs for journals

2015-11-03 Thread Christian Balzer

Hello,

On Tue, 3 Nov 2015 12:01:16 +0100 Karsten Heymann wrote:

> Hi,
> 
> has anyone experiences with hp-branded ssds for journaling? Given that
> everything else is fixed (raid controller, cpu, etc...) and a fixed

A raid controller that can hopefully be run well in JBOD mode or something
mimicking it closely...

> budget, would it be better to go with more of the cheaper 6G SATA Write
> intensive drives or should I aim for (then fewer) 12G SAS models? Here
> are the specs:
> 
> HP 6G SATA Write Intensive 200 GB (804639-B21):
> - Sequential reads / writes (MB/s): 540 / 300
> - Random reads /writes (IOPS): 64,500 / 42,000
> - DWPD: 10
> 
> HP 12G SAS Mainstream Endurance 200 GB (779164-B21):
> - Sequential reads / writes (MB/s): 1,000 / 510
> - Random reads /writes (IOPS): 70,000 / 51,000
> - DWPD: 10
> 
> HP 12G SAS Write Intensive 200 GB (802578-B21):
> - Sequential reads / writes (MB/s): 1,000 / 660
> - Random reads /writes (IOPS): 106,000 / 83,000
> - DWPD: 25
> 
> (Source: http://www8.hp.com/h20195/v2/GetPDF.aspx%2F4AA4-7186ENW.pdf)
> 
> I know that asking does not free me from benchmarking, but maybe someone
> has a rough estimate?
>
Unless you can find out who the original manufacturer is and what models
they are you will indeed have to benchmark things, as they may be
completely unsuitable for Ceph journals (see the countless threads here,
especially with regards to some Samsung products).

Firstly, the sequential read and random IOPS figures are pointless; the speed
at which the SSD can do sequential direct, sync I/O is the only factor that
counts when it comes to Ceph journals.
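[Editor: the usual quick check for that is a direct, synchronous write test
against the bare SSD; a sketch, where /dev/sdX is a placeholder and the test
overwrites data on the target device:]

# dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync

[Repeating it with larger block sizes (e.g. bs=4M) gives a feel for both the
small sync writes and the streaming rate a journal actually sees.]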

Since you have a "fixed" budget, how many HDDs do you plan per node and/or
how many SSDs can you actually fit per node?

A DWPD of 10 is with near certainty going to be sufficient, unless you
plan to put way too many journals per SSD.
  
Looking at the SSDs above, the last one is likely to be far more expensive
than the rest and barely needs the 12Gb/s interface (for writes). So
probably the worst choice. 

SSD #1 will serve 3 HDDs nicely, so that would work out well for something
with 8 bays, 6 HDDs and 2 SSDs, and similar configurations. It will also be
the cheapest one and provide smaller failure domains.

SSD #2 can handle 5-6 HDDs, so if your cluster is big enough it might be a
good choice for denser nodes.

Note that when looking at something similar I chose 4x 100GB DC S3700
over 2x 200GB DC S3700 as the prices were nearly identical; the smaller
SSDs gave me 800MB/s total instead of 730MB/s, and with 8 HDDs per node I
would only lose 2 OSDs in case of an SSD failure.

Christian

> Best regards
> Karsten


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Amazon S3 API

2015-11-03 Thread Богдан Тимофеев

I have 4 Ceph nodes running on virtual machines in our corporate network. I've
installed a Ceph object gateway on the admin node. Can I somehow use it Amazon
S3 style from my Windows machine on the same network, for example using the
Amazon S3 Java API?
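[Editor: radosgw exposes an S3-compatible API, so any S3 client can be pointed
at it. A minimal sketch of the gateway-side part; the user id and display name
are arbitrary:]

# radosgw-admin user create --uid=s3test --display-name="S3 test user"

[The command prints an access_key/secret_key pair; configure your client (s3cmd,
the AWS SDK for Java, etc.) with those keys, set the endpoint to the gateway's
host and port, and use path-style bucket addressing unless wildcard DNS is set
up for the gateway.]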

-- 
Богдан Тимофеев
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados bench leaves objects in tiered pool

2015-11-03 Thread Gregory Farnum
When you have a caching pool in writeback mode, updates to objects
(including deletes) are handled by writeback rather than writethrough.
Since there's no other activity against these pools, there is nothing
prompting the cache pool to flush updates out to the backing pool, so
the backing pool hasn't deleted its objects because nothing's told it
to. You'll find that the cache pool has deleted the data for its
objects, but it's keeping around a small "whiteout" and the object
info metadata.
The "rados ls" you're using has never played nicely with cache tiering
and probably never will. :( Listings are expensive operations and
modifying them to do more than the simple info scan would be fairly
expensive in terms of computation and IO.

I think there are some caching commands you can send to flush updates
which would cause the objects to be entirely deleted, but I don't have
them off-hand. You can probably search the mailing list archives or
the docs for tiering commands. :)
-Greg
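[Editor: the commands Greg refers to are most likely the cache tiering
flush/evict calls; a sketch, assuming the cache pool from this thread:]

# rados -p rbd-cache cache-flush-evict-all

[Optionally the cache mode can be switched to forward first
(ceph osd tier cache-mode rbd-cache forward) so that no new writes are cached
while the flush runs; depending on the release this may ask for an additional
confirmation flag.]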

On Tue, Nov 3, 2015 at 12:40 AM, Дмитрий Глушенок  wrote:
> Hi,
>
> While benchmarking tiered pool using rados bench it was noticed that objects 
> are not being removed after test.
>
> Test was performed using "rados -p rbd bench 3600 write". The pool is not 
> used by anything else.
>
> Just before end of test:
> POOLS:
>     NAME      ID   USED %USED MAX AVAIL OBJECTS
>     rbd-cache 36 33110M  3.41      114G    8366
>     rbd       37 43472M  4.47      237G   10858
>
> Some time later (few hundreds of writes are flushed, rados automatic cleanup 
> finished):
> POOLS:
>     NAME      ID   USED %USED MAX AVAIL OBJECTS
>     rbd-cache 36  22998     0      157G   16342
>     rbd       37 46050M  4.74      234G   11503
>
> # rados -p rbd-cache ls | wc -l
> 16242
> # rados -p rbd ls | wc -l
> 11503
> #
>
> # rados -p rbd cleanup
> error during cleanup: -2
> error 2: (2) No such file or directory
> #
>
> # rados -p rbd cleanup --run-name "" --prefix prefix ""
>  Warning: using slow linear search
>  Removed 0 objects
> #
>
> # rados -p rbd ls | head -5
> benchmark_data_dropbox01.tzk_7641_object10901
> benchmark_data_dropbox01.tzk_7641_object9645
> benchmark_data_dropbox01.tzk_7641_object10389
> benchmark_data_dropbox01.tzk_7641_object10090
> benchmark_data_dropbox01.tzk_7641_object11204
> #
>
> #  rados -p rbd-cache ls | head -5
> benchmark_data_dropbox01.tzk_7641_object10901
> benchmark_data_dropbox01.tzk_7641_object9645
> benchmark_data_dropbox01.tzk_7641_object10389
> benchmark_data_dropbox01.tzk_7641_object5391
> benchmark_data_dropbox01.tzk_7641_object10090
> #
>
> So, it looks like the objects are still in place (in both pools?). But it is 
> not possible to remove them:
>
> # rados -p rbd rm benchmark_data_dropbox01.tzk_7641_object10901
> error removing rbd>benchmark_data_dropbox01.tzk_7641_object10901: (2) No such 
> file or directory
> #
>
> # ceph health
> HEALTH_OK
> #
>
>
> Can somebody explain the behavior? And is it possible to cleanup the 
> benchmark data without recreating the pools?
>
>
> ceph version 0.94.5
>
> # ceph osd dump | grep rbd
> pool 36 'rbd-cache' replicated size 3 min_size 1 crush_ruleset 1 object_hash 
> rjenkins pg_num 100 pgp_num 100 last_change 755 flags 
> hashpspool,incomplete_clones tier_of 37 cache_mode writeback target_bytes 
> 107374182400 hit_set bloom{false_positive_probability: 0.05, target_size: 0, 
> seed: 0} 3600s x1 stripe_width 0
> pool 37 'rbd' erasure size 5 min_size 3 crush_ruleset 2 object_hash rjenkins 
> pg_num 100 pgp_num 100 last_change 745 lfor 745 flags hashpspool tiers 36 
> read_tier 36 write_tier 36 stripe_width 4128
> #
>
> # ceph osd pool get rbd-cache hit_set_type
> hit_set_type: bloom
> # ceph osd pool get rbd-cache hit_set_period
> hit_set_period: 3600
> # ceph osd pool get rbd-cache hit_set_count
> hit_set_count: 1
> # ceph osd pool get rbd-cache target_max_objects
> target_max_objects: 0
> # ceph osd pool get rbd-cache target_max_bytes
> target_max_bytes: 107374182400
> # ceph osd pool get rbd-cache cache_target_dirty_ratio
> cache_target_dirty_ratio: 0.1
> # ceph osd pool get rbd-cache cache_target_full_ratio
> cache_target_full_ratio: 0.2
> #
>
> Crush map:
> root cache_tier {
> id -7   # do not change unnecessarily
> # weight 0.450
> alg straw
> hash 0  # rjenkins1
> item osd.0 weight 0.090
> item osd.1 weight 0.090
> item osd.2 weight 0.090
> item osd.3 weight 0.090
> item osd.4 weight 0.090
> }
> root store_tier {
> id -8   # do not change unnecessarily
> # weight 0.450
> alg straw
> hash 0  # rjenkins1
> item osd.5 weight 0.090
> item osd.6 weight 0.090
> item osd.7 weight 0.090
> item 

[ceph-users] iSCSI over RBD is a good idea?

2015-11-03 Thread Gaetan SLONGO
Dear Ceph users, 

We are currently working on the design of a virtualization infrastructure using
oVirt and we would like to use Ceph.

The problem is that, at this time, there is no native integration of Ceph in oVirt.
One possibility is to export RBD devices over iSCSI (maybe you have a better
one?).

I've seen this post http://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/
but it seems to be deprecated on RHEL 7... Has anyone already done this on
RHEL/CentOS 7 with targetd (or something else)? Are there performance issues?
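[Editor: one commonly reported approach on RHEL/CentOS 7 is to map the RBD with
the kernel client and export it through the in-kernel LIO target via targetcli;
a rough sketch, where the image, backstore and IQN names are made up and exact
targetcli paths can differ between versions:]

# rbd map rbd/iscsi-vol1
# targetcli /backstores/block create name=iscsi-vol1 dev=/dev/rbd0
# targetcli /iscsi create iqn.2015-11.com.example:iscsi-vol1
# targetcli /iscsi/iqn.2015-11.com.example:iscsi-vol1/tpg1/luns create /backstores/block/iscsi-vol1
# targetcli saveconfig

[ACLs and portals still need to be configured for the initiators, and running
the target on a single gateway host adds a new single point of failure in front
of the cluster.]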

Thank you in advance!

Best regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com