Re: [ceph-users] New Ceph cluster - cannot add additional monitor

2015-06-12 Thread Alex Muntada
We've recently found similar problems creating a new cluster over an older
one, even after using "ceph-deploy purge", because some of the data
remained on /var/lib/ceph/*/* (ubuntu trusty) and the nodes were trying to
use old keyrings.
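
For reference, a minimal cleanup sequence before re-deploying might look like the
following (a sketch, assuming the old cluster's data can be discarded and that
ceph-deploy runs from the admin node; the node names are illustrative):

  # remove packages, data and keys via ceph-deploy
  ceph-deploy purge node1 node2 node3
  ceph-deploy purgedata node1 node2 node3
  ceph-deploy forgetkeys

  # on each node, clear anything left behind (mon stores, OSD dirs, old keyrings)
  sudo rm -rf /var/lib/ceph/*/*
  sudo rm -f /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring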

Hope it helps,
Alex
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph compiled on ARM hangs on using any commands.

2015-06-12 Thread Yann Dupont



On 12/06/2015 09:55, Karanvir Singh wrote:

Hi,

I am trying to compile/create packages for the latest ceph version (519c3c9) 
from the hammer branch on an ARM platform.
For google-perftools I am compiling those from 
https://code.google.com/p/gperftools/ .


The packages are generated fine.
I have used the same branch/commit and commands to create packages for 
x86 and those work fine.
But I can't seem to use ceph commands on the ARM platform; e.g. if I 
run ceph -s or ceph -w, it just hangs.


Hi,
please see http://tracker.ceph.com/issues/11432

I have a (quick & dirty) workaround, but there is probably a better fix, 
as suggested in the IRC log attached to #11432.


Anyway, last month I completely lacked the time to properly test and integrate this.

I'll try to do this next week. In the meantime, could you test the 
little patch in #11432?


Cheers,



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd cache + libvirt

2015-06-12 Thread Josh Durgin

On 06/08/2015 09:23 PM, Alexandre DERUMIER wrote:

In the short-term, you can remove the "rbd cache" setting from your ceph.conf


That's not true; you need to remove the ceph.conf file.
Removing rbd_cache is not enough, or the default rbd_cache=false will apply.


I have done tests; here is the result matrix:


host ceph.conf : no rbd_cache    :  guest cache=writeback  : result : no cache (wrong)
host ceph.conf : rbd_cache=false :  guest cache=writeback  : result : no cache (wrong)
host ceph.conf : rbd_cache=true  :  guest cache=writeback  : result : cache
host ceph.conf : no rbd_cache    :  guest cache=none       : result : no cache
host ceph.conf : rbd_cache=false :  guest cache=none       : result : no cache
host ceph.conf : rbd_cache=true  :  guest cache=none       : result : cache (wrong)
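
(For reference, the "guest cache" column above corresponds to the QEMU drive cache
option; a sketch of the relevant -drive lines, with illustrative pool/image names:)

  # guest definition requesting writeback caching
  -drive file=rbd:rbd/vm-disk:id=admin:conf=/etc/ceph/ceph.conf,format=raw,if=virtio,cache=writeback
  # guest definition requesting no caching
  -drive file=rbd:rbd/vm-disk:id=admin:conf=/etc/ceph/ceph.conf,format=raw,if=virtio,cache=none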


QEMU patch 3/4 fixes this:

http://comments.gmane.org/gmane.comp.emulators.qemu.block/2500

Josh


- Original Message -
From: "Jason Dillaman" 
To: "Andrey Korolyov" 
Cc: "Josh Durgin" , "aderumier" , 
"ceph-users" 
Sent: Monday, 8 June 2015 22:29:10
Subject: Re: [ceph-users] rbd cache + libvirt


On Mon, Jun 8, 2015 at 10:43 PM, Josh Durgin  wrote:

On 06/08/2015 11:19 AM, Alexandre DERUMIER wrote:


Hi,


looking at the latest version of QEMU,



It seems that this has already been the behaviour since rbd_cache
parsing was added in rbd.c by Josh in 2012:


http://git.qemu.org/?p=qemu.git;a=blobdiff;f=block/rbd.c;h=eebc3344620058322bb53ba8376af4a82388d277;hp=1280d66d3ca73e552642d7a60743a0e2ce05f664;hb=b11f38fcdf837c6ba1d4287b1c685eb3ae5351a8;hpb=166acf546f476d3594a1c1746dc265f1984c5c85


I'll do tests on my side tomorrow to be sure.



It seems like we should switch the order so ceph.conf is overridden by
qemu's cache settings. I don't remember a good reason to have it the
other way around.

Josh



Erm, doesn't this code *already* represent the right priorities?
The cache=none setting should set BDRV_O_NOCACHE, which effectively
disables the cache in the mentioned snippet.



Yes, the override is applied (correctly) based upon your QEMU cache settings. However, it then reads your configuration 
file and re-applies the "rbd_cache" setting based upon what is in the file (if it exists). So in the case 
where a configuration file has "rbd cache = true", the override of "rbd cache = false" derived from 
your QEMU cache setting would get wiped out. The long term solution would be to, as Josh noted, switch the order (so 
long as there wasn't a use-case for applying values in this order). In the short-term, you can remove the "rbd 
cache" setting from your ceph.conf so that QEMU controls it (i.e. it cannot get overridden when reading the 
configuration file) or use a different ceph.conf for a drive which requires different cache settings from the default 
configuration's settings.
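
As an illustration of that last option (a sketch, with hypothetical paths and image
names): point the drive at a stripped-down ceph.conf that contains no "rbd cache"
line, so the QEMU cache mode is what actually takes effect.

  # /etc/ceph/ceph-nocache.conf is a copy of ceph.conf without any "rbd cache" entry
  -drive file=rbd:rbd/vm-disk:id=admin:conf=/etc/ceph/ceph-nocache.conf,format=raw,if=virtio,cache=none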

Jason



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure coded pools and bit-rot protection

2015-06-12 Thread Gregory Farnum
Okay, Sam thinks he knows what's going on; here's a ticket:
http://tracker.ceph.com/issues/12000

On Fri, Jun 12, 2015 at 12:32 PM, Gregory Farnum  wrote:
> On Fri, Jun 12, 2015 at 1:07 AM, Paweł Sadowski  wrote:
>> Hi All,
>>
>> I'm testing erasure coded pools. Is there any protection from bit-rot
>> errors on object read? If I modify one bit in an object part (directly on
>> the OSD) I'm getting a *broken* object:
>
> Sorry, are you saying that you're getting a broken object if you flip
> a bit in an EC pool? That should detect the chunk as invalid and
> reconstruct on read...
> -Greg
>
>>
>> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
>> bb2d82bbb95be6b9a039d135cc7a5d0d  -
>>
>> # modify one bit directly on OSD
>>
>> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
>> 02f04f590010b4b0e6af4741c4097b4f  -
>>
>> # restore bit to original value
>>
>> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
>> bb2d82bbb95be6b9a039d135cc7a5d0d  -
>>
>> If I run deep-scrub on modified bit I'm getting inconsistent PG which is
>> correct in this case. After restoring bit and running deep-scrub again
>> all PGs are clean.
>>
>>
>> [ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]
>>
>>
>> --
>> PS
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure coded pools and bit-rot protection

2015-06-12 Thread Gregory Farnum
On Fri, Jun 12, 2015 at 1:07 AM, Paweł Sadowski  wrote:
> Hi All,
>
> I'm testing erasure coded pools. Is there any protection from bit-rot
> errors on object read? If I modify one bit in an object part (directly on
> the OSD) I'm getting a *broken* object:

Sorry, are you saying that you're getting a broken object if you flip
a bit in an EC pool? That should detect the chunk as invalid and
reconstruct on read...
-Greg

>
> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
> bb2d82bbb95be6b9a039d135cc7a5d0d  -
>
> # modify one bit directly on OSD
>
> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
> 02f04f590010b4b0e6af4741c4097b4f  -
>
> # restore bit to original value
>
> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
> bb2d82bbb95be6b9a039d135cc7a5d0d  -
>
> If I run deep-scrub on modified bit I'm getting inconsistent PG which is
> correct in this case. After restoring bit and running deep-scrub again
> all PGs are clean.
>
>
> [ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]
>
>
> --
> PS
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure Coding + CephFS, objects not being deleted after rm

2015-06-12 Thread Gregory Farnum
On Fri, Jun 12, 2015 at 11:59 AM, Lincoln Bryant  wrote:
> Thanks John, Greg.
>
> If I understand this correctly, then, doing this:
> rados -p hotpool cache-flush-evict-all
> should start appropriately deleting objects from the cache pool. I just 
> started one up, and that seems to be working.
>
> Otherwise, the cache's configured timeouts/limits should get those deletions 
> propagated through to the cold pool naturally.
>
> Is that right?

Yep!
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure Coding + CephFS, objects not being deleted after rm

2015-06-12 Thread Lincoln Bryant
Thanks John, Greg.

If I understand this correctly, then, doing this:
rados -p hotpool cache-flush-evict-all
should start appropriately deleting objects from the cache pool. I just started 
one up, and that seems to be working.

Otherwise, the cache's configured timeouts/limits should get those deletions 
propagated through to the cold pool naturally.

Is that right?

Thanks again,
Lincoln

On Jun 12, 2015, at 1:12 PM, Gregory Farnum wrote:

> On Fri, Jun 12, 2015 at 11:07 AM, John Spray  wrote:
>> 
>> Just had a go at reproducing this, and yeah, the behaviour is weird.  Our
>> automated testing for cephfs doesn't include any cache tiering, so this is a
>> useful exercise!
>> 
>> With a writeback overlay cache tier pool on an EC pool, I write a bunch of
>> files, then do a rados cache-flush-evict-all, then delete the files in
>> cephfs.  The result is that all the objects are still present in a "rados
>> ls" on either base or cache pool, but if I try to rm any of them I get an
>> ENOENT.
>> 
>> Then, finally, when I do another cache-flush-evict-all, now the objects are
>> all finally disappearing from the df stats (base and cache pool stats
>> ticking down together).
>> 
>> So intuitively, I guess the cache tier is caching the delete-ness of the
>> objects, and only later flushing that (i.e. deleting from the base pool).
>> The object is still "in the cache" on that basis, and presumably not getting
>> flushed (i.e. deleting in base pool) until usual timeouts/space limits
>> apply.
> 
> Yep, that's it exactly. This is expected behavior.
> 
>> Maybe we need something to kick delete flushes to happen much
>> earlier (like, ASAP when the cluster isn't too busy doing other
>> promotions/evictions).
> 
> Sounds like a good RADOS feature request/blueprint that somebody in
> the community might be able to handle.
> -Greg

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure Coding + CephFS, objects not being deleted after rm

2015-06-12 Thread Gregory Farnum
On Fri, Jun 12, 2015 at 11:07 AM, John Spray  wrote:
>
> Just had a go at reproducing this, and yeah, the behaviour is weird.  Our
> automated testing for cephfs doesn't include any cache tiering, so this is a
> useful exercise!
>
> With a writeback overlay cache tier pool on an EC pool, I write a bunch of
> files, then do a rados cache-flush-evict-all, then delete the files in
> cephfs.  The result is that all the objects are still present in a "rados
> ls" on either base or cache pool, but if I try to rm any of them I get an
> ENOENT.
>
> Then, finally, when I do another cache-flush-evict-all, now the objects are
> all finally disappearing from the df stats (base and cache pool stats
> ticking down together).
>
> So intuitively, I guess the cache tier is caching the delete-ness of the
> objects, and only later flushing that (i.e. deleting from the base pool).
> The object is still "in the cache" on that basis, and presumably not getting
> flushed (i.e. deleting in base pool) until usual timeouts/space limits
> apply.

Yep, that's it exactly. This is expected behavior.

> Maybe we need something to kick delete flushes to happen much
> earlier (like, ASAP when the cluster isn't too busy doing other
> promotions/evictions).

Sounds like a good RADOS feature request/blueprint that somebody in
the community might be able to handle.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure Coding + CephFS, objects not being deleted after rm

2015-06-12 Thread John Spray


Just had a go at reproducing this, and yeah, the behaviour is weird.  
Our automated testing for cephfs doesn't include any cache tiering, so 
this is a useful exercise!


With a writeback overlay cache tier pool on an EC pool, I write a bunch 
of files, then do a rados cache-flush-evict-all, then delete the files 
in cephfs.  The result is that all the objects are still present in a 
"rados ls" on either base or cache pool, but if I try to rm any of them 
I get an ENOENT.


Then, finally, when I do another cache-flush-evict-all, now the objects 
are all finally disappearing from the df stats (base and cache pool 
stats ticking down together).


So intuitively, I guess the cache tier is caching the delete-ness of the 
objects, and only later flushing that (i.e. deleting from the base 
pool).  The object is still "in the cache" on that basis, and presumably 
not getting flushed (i.e. deleting in base pool) until usual 
timeouts/space limits apply.  Maybe we need something to kick delete 
flushes to happen much earlier (like, ASAP when the cluster isn't too 
busy doing other promotions/evictions).
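
(For reference, the flush/evict timing described above is governed by pool-level
settings on the cache tier; a sketch with an illustrative pool name and values:)

  ceph osd pool set hotpool cache_target_dirty_ratio 0.4
  ceph osd pool set hotpool cache_target_full_ratio 0.8
  ceph osd pool set hotpool cache_min_flush_age 600
  ceph osd pool set hotpool cache_min_evict_age 1800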


Sam probably has some more informed thoughts than mine on the expected 
behaviour here.


John


On 12/06/2015 16:54, Lincoln Bryant wrote:

Greetings experts,

I've got a test set up with CephFS configured to use an erasure coded 
pool + cache tier on 0.94.2.


I have been writing lots of data to fill the cache to observe the 
behavior and performance when it starts evicting objects to the 
erasure-coded pool.


The thing I have noticed is that after deleting the files via 'rm' 
through my CephFS kernel client, the cache is emptied but the objects 
that were evicted to the EC pool stick around.


I've attached an image that demonstrates what I'm seeing.

Is this intended behavior, or have I misconfigured something?

Thanks,
Lincoln Bryant



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.94.2 Hammer released

2015-06-12 Thread Scottix
I noticed amd64 Ubuntu 12.04 hasn't updated its packages to 0.94.2;
can you check this?

http://ceph.com/debian-hammer/dists/precise/main/binary-amd64/Packages

Package: ceph
Version: 0.94.1-1precise
Architecture: amd64

On Thu, Jun 11, 2015 at 10:35 AM Sage Weil  wrote:

> This Hammer point release fixes a few critical bugs in RGW that can
> prevent objects starting with underscore from behaving properly and that
> prevent garbage collection of deleted objects when using the Civetweb
> standalone mode.
>
> All v0.94.x Hammer users are strongly encouraged to upgrade, and to make
> note of the repair procedure below if RGW is in use.
>
> Upgrading from previous Hammer release
> --
>
> Bug #11442 introduced a change that made rgw objects that start with
> underscore incompatible with previous versions. The fix to that bug
> reverts to the previous behavior. In order to be able to access objects
> that start with an underscore and were created in prior Hammer releases,
> following the upgrade it is required to run (for each affected bucket)::
>
> $ radosgw-admin bucket check --check-head-obj-locator \
>  --bucket=<bucket_name> [--fix]
>
> You can get a list of buckets with
>
> $ radosgw-admin bucket list
>
> Notable changes
> ---
>
> * build: compilation error: No high-precision counter available  (armhf,
>   powerpc..) (#11432, James Page)
> * ceph-dencoder links to libtcmalloc, and shouldn't (#10691, Boris Ranto)
> * ceph-disk: disk zap sgdisk invocation (#11143, Owen Synge)
> * ceph-disk: use a new disk as journal disk,ceph-disk prepare fail
>   (#10983, Loic Dachary)
> * ceph-objectstore-tool should be in the ceph server package (#11376, Ken
>   Dreyer)
> * librados: can get stuck in redirect loop if osdmap epoch ==
>   last_force_op_resend (#11026, Jianpeng Ma)
> * librbd: A retransmit of proxied flatten request can result in -EINVAL
>   (Jason Dillaman)
> * librbd: ImageWatcher should cancel in-flight ops on watch error (#11363,
>   Jason Dillaman)
> * librbd: Objectcacher setting max object counts too low (#7385, Jason
>   Dillaman)
> * librbd: Periodic failure of TestLibRBD.DiffIterateStress (#11369, Jason
>   Dillaman)
> * librbd: Queued AIO reference counters not properly updated (#11478,
>   Jason Dillaman)
> * librbd: deadlock in image refresh (#5488, Jason Dillaman)
> * librbd: notification race condition on snap_create (#11342, Jason
>   Dillaman)
> * mds: Hammer uclient checking (#11510, John Spray)
> * mds: remove caps from revoking list when caps are voluntarily released
>   (#11482, Yan, Zheng)
> * messenger: double clear of pipe in reaper (#11381, Haomai Wang)
> * mon: Total size of OSDs is a magnitude less than it is supposed to be.
>   (#11534, Zhe Zhang)
> * osd: don't check order in finish_proxy_read (#11211, Zhiqiang Wang)
> * osd: handle old semi-deleted pgs after upgrade (#11429, Samuel Just)
> * osd: object creation by write cannot use an offset on an erasure coded
>   pool (#11507, Jianpeng Ma)
> * rgw: Improve rgw HEAD request by avoiding read the body of the first
>   chunk (#11001, Guang Yang)
> * rgw: civetweb is hitting a limit (number of threads 1024) (#10243,
>   Yehuda Sadeh)
> * rgw: civetweb should use unique request id (#10295, Orit Wasserman)
> * rgw: critical fixes for hammer (#11447, #11442, Yehuda Sadeh)
> * rgw: fix swift COPY headers (#10662, #10663, #11087, #10645, Radoslaw
>   Zarzynski)
> * rgw: improve performance for large object  (multiple chunks) GET
>   (#11322, Guang Yang)
> * rgw: init-radosgw: run RGW as root (#11453, Ken Dreyer)
> * rgw: keystone token cache does not work correctly (#11125, Yehuda Sadeh)
> * rgw: make quota/gc thread configurable for starting (#11047, Guang Yang)
> * rgw: make swift responses of RGW return last-modified, content-length,
>   x-trans-id headers.(#10650, Radoslaw Zarzynski)
> * rgw: merge manifests correctly when there's prefix override (#11622,
>   Yehuda Sadeh)
> * rgw: quota not respected in POST object (#11323, Sergey Arkhipov)
> * rgw: restore buffer of multipart upload after EEXIST (#11604, Yehuda
>   Sadeh)
> * rgw: shouldn't need to disable rgw_socket_path if frontend is configured
>   (#11160, Yehuda Sadeh)
> * rgw: swift: Response header of GET request for container does not
>   contain X-Container-Object-Count, X-Container-Bytes-Used and x-trans-id
>   headers (#10666, Dmytro Iurchenko)
> * rgw: swift: Response header of POST request for object does not contain
>   content-length and x-trans-id headers (#10661, Radoslaw Zarzynski)
> * rgw: swift: response for GET/HEAD on container does not contain the
>   X-Timestamp header (#10938, Radoslaw Zarzynski)
> * rgw: swift: response for PUT on /container does not contain the
>   mandatory Content-Length header when FCGI is used (#11036, #10971,
>   Radoslaw Zarzynski)
> * rgw: swift: wrong handling of empty metadata on Swift container (#11088,
>   Radoslaw Zarzynski)
> * tests: TestFlatIndex.cc
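
(As an aside, the per-bucket repair step quoted above can be scripted; a sketch,
assuming jq is installed and that "radosgw-admin bucket list" returns a JSON array
of bucket names:)

  for b in $(radosgw-admin bucket list | jq -r '.[]'); do
      radosgw-admin bucket check --check-head-obj-locator --bucket="$b" --fix
  done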

[ceph-users] Erasure Coding + CephFS, objects not being deleted after rm

2015-06-12 Thread Lincoln Bryant
Greetings experts,

I've got a test set up with CephFS configured to use an erasure coded pool + 
cache tier on 0.94.2. 

I have been writing lots of data to fill the cache to observe the behavior and 
performance when it starts evicting objects to the erasure-coded pool.

The thing I have noticed is that after deleting the files via 'rm' through my 
CephFS kernel client, the cache is emptied but the objects that were evicted to 
the EC pool stick around.

I've attached an image that demonstrates what I'm seeing.

Is this intended behavior, or have I misconfigured something?

Thanks,
Lincoln Bryant

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph OSD with OCFS2

2015-06-12 Thread Somnath Roy
Sorry, it was a typo; I meant to say 1GB only.
I would break the problem down as follows.

1. Run some fio workload, say 1GB, on RBD and run a ceph command like 'ceph df' to 
see how much data was written (see the sketch after this list). I am sure you will 
see the same amount of data. Remember, by default the ceph rados object size is 4MB, 
so it should write 1GB/4MB objects.

2. Also, you can use the 'rados' utility to directly put/get, say, a 1GB file to the 
cluster and check in a similar way.
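
A minimal sketch of both checks (assuming fio is installed and that the paths/names
below are placeholders; write the test file onto a scratch area, not over live data):

  # 1) write 1GB through the RBD-backed mount, then compare against cluster accounting
  fio --name=writetest --filename=/mnt/ocfs2/fio-test.bin --ioengine=libaio \
      --direct=1 --rw=write --bs=4M --size=1G
  ceph df
  rados df

  # 2) push a 1GB file straight into a pool with the rados utility and check again
  rados -p rbd put testobj ./test-1G.bin
  ceph df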

As I said, if your journal is on the same device and you measure the space 
consumed by the entire OSD mount point, it will be more because of the write 
amplification (WA) induced by Ceph. But the size of the individual files you 
transferred should not differ.

<< Also please let us know the reason (an extra 2-3 mins is taken for hg/git 
repository operations like clone, pull, checkout and update.)
Could you please explain a bit more what you are trying to do here?

Thanks & Regards
Somnath

From: gjprabu [mailto:gjpr...@zohocorp.com]
Sent: Friday, June 12, 2015 12:34 AM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com; Kamala Subramani; Siva Sokkumuthu
Subject: Re: RE: RE: [ceph-users] Ceph OSD with OCFS2

Hi,

  I measured only the data that I transferred from the client. For example, after a 
500MB file transfer completes, if I measure that same file its size will be 1GB, not 
10GB.

   Our Configuration is :-
=
ceph -w
cluster f428f5d6-7323-4254-9f66-56a21b099c1a
health HEALTH_OK
monmap e1: 3 mons at 
{cephadmin=172.20.19.235:6789/0,cephnode1=172.20.7.168:6789/0,cephnode2=172.20.9.41:6789/0},
 election epoch 114, quorum 0,1,2 cephnode1,cephnode2,cephadmin
osdmap e9: 2 osds: 2 up, 2 in
pgmap v1022: 64 pgs, 1 pools, 7507 MB data, 1952 objects
26139 MB used, 277 GB / 302 GB avail
64 active+clean
===
ceph.conf
[global]
osd pool default size = 2
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 172.20.7.168,172.20.9.41,172.20.19.235
mon_initial_members = zoho-cephnode1, zoho-cephnode2, zoho-cephadmin
fsid = f428f5d6-7323-4254-9f66-56a21b099c1a


What is the replication policy you are using ?

   We are using the default OSD setup with 2 replicas, not using a custom CRUSH map, 
PG num, erasure coding, etc.

What interface you used to store the data ?

   We are using RBD to store data and it has been mounted with OCFS2 on the 
client side.

How are you removing data ? Are you removing a rbd image ?

   We are not removing the rbd image, only removing existing data 
using the rm command from the client. We didn't set up an async way to 
transfer or remove data.


Also please let us know the reason (an extra 2-3 mins is taken for hg/git 
repository operations like clone, pull, checkout and update).


Regards
Prabu GJ



 On Fri, 12 Jun 2015 00:21:24 +0530 Somnath Roy 
wrote 

Hi,



Ceph's journal works in a different way. It's a write-ahead journal: all the data 
will be persisted first in the journal and then written to its actual place. 
Journal data is encoded. The journal is a fixed-size partition/file and is written 
sequentially. So, if you are placing the journal on HDDs, it will be overwritten; 
in the SSD case, it will be garbage-collected later. So, if you are measuring the 
amount of data written to the device, it will be double. But if you are saying you 
have written a 500MB file to the cluster and you are seeing an actual file size of 
10GB, that should not be the case. How are you seeing this size, BTW?



Could you please tell us more about your configuration ?

What is the replication policy you are using ?

What interface you used to store the data ?



Regarding your other query..



<< If I transfer 1GB of data, what will the size be on the server (OSD)? Will this 
be written in a compressed format?



No, the actual data is not compressed. You don't want to fill up the OSD disk and there 
are some limits you can set. Check the following link:



http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/



It will stop working if the disk is 95% full by default.
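
(The thresholds are configurable; a sketch of the corresponding ceph.conf entries,
shown with their default values:)

  [global]
      mon osd full ratio = .95
      mon osd nearfull ratio = .85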



<< Is it possible to take a backup of the compressed data from the server and copy it 
to another machine as Server_Backup, then start a new client using Server_Backup?

For backup, check the following link if that works for you.



https://ceph.com/community/blog/tag/backup/



Also, you can use RGW federated config for back up.



<< Data removal is very slow



How are you removing data ? Are you removing a rbd image ?



If you are removing an entire pool, that should be fast; it deletes data 
asynchronously, I guess.



Thanks & Regards

Somnath



From: gjprabu [mailto:gjpr...@zohocorp.com]
Sent: Thursday, June 11, 2015 6:38 AM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com

Re: [ceph-users] Best setup for SSD

2015-06-12 Thread Christian Balzer
On Fri, 12 Jun 2015 10:18:18 -0500 Mark Nelson wrote:

> If you are careful about how you balance things, there's probably no 
> reason why SSDs and Spinners in the same server wouldn't work so long as 
> they are not in the same pool.  I imagine that recommendation is 
> probably to keep things simple and have folks avoid designing unbalanced 
> systems.
> 
Precisely.

A mixed system needs VERY intimate knowledge of Ceph and your workload and
use case.
SSD-based OSDs will use a lot more CPU and saturate a lot more network
bandwidth than HDD-based ones.
And putting 2.5" SSDs into 3.5" bays is a waste of (rack) space.

As an example, a 2U server with 12 3.5" bays (OSDs) and 2 2.5" bays (OS
and journals) in the back (hello Supermicro) makes a good, dense spinning
rust based storage node. This will saturate about 10Gb/s.
In the same 2U you can have a twin node (2x 12 2.5" bays), with 8-10 SSDs
per node and the fastest CPUs you can afford, as well as the fastest
network (dual 10Gb/s or faster) that your budget allows.
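
(If SSD and HDD OSDs do end up in the same cluster, they are usually split into
separate CRUSH hierarchies with their own rules; a sketch of the workflow, with
illustrative pool name and ruleset id:)

  # export and decompile the current CRUSH map
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt

  # edit crushmap.txt: add e.g. a "root ssd" hierarchy containing only the SSD
  # hosts/OSDs, plus a rule that takes from that root

  # recompile, inject, and point an SSD-only pool at the new rule
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new
  ceph osd pool set ssdpool crush_ruleset 1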

Christian

> Mark
> 
> On 06/12/2015 10:06 AM, Quentin Hartman wrote:
> > I don't know the official reason, but I would imagine the disparity in
> > performance would lead to weird behaviors and very spiky overall
> > performance. I would think that running a mix of SSD and HDD OSDs in
> > the same pool would be frowned upon, not just the same server.
> >
> > On Fri, Jun 12, 2015 at 9:00 AM, Dominik Zalewski
> > <dzalew...@optlink.co.uk> wrote:
> >
> > Be warned that running SSD and HD based OSDs in the same server
> > is not
> > recommended. If you need the storage capacity, I'd stick to the
> > journals
> > on SSDs plan.
> >
> >
> > Can you please elaborate more why running SSD and HD based OSDs in
> > the same server is not
> > recommended ?
> >
> > Thanks
> >
> > Dominik
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-12 Thread Mark Nelson
If you are careful about how you balance things, there's probably no 
reason why SSDs and Spinners in the same server wouldn't work so long as 
they are not in the same pool.  I imagine that recommendation is 
probably to keep things simple and have folks avoid designing unbalanced 
systems.


Mark

On 06/12/2015 10:06 AM, Quentin Hartman wrote:

I don't know the official reason, but I would imagine the disparity in
performance would lead to weird behaviors and very spiky overall
performance. I would think that running a mix of SSD and HDD OSDs in the
same pool would be frowned upon, not just the same server.

On Fri, Jun 12, 2015 at 9:00 AM, Dominik Zalewski
<dzalew...@optlink.co.uk> wrote:

Be warned that running SSD and HD based OSDs in the same server
is not
recommended. If you need the storage capacity, I'd stick to the
journals
on SSDs plan.


Can you please elaborate more why running SSD and HD based OSDs in
the same server is not
recommended ?

Thanks

Dominik

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best setup for SSD

2015-06-12 Thread Quentin Hartman
I don't know the official reason, but I would imagine the disparity in
performance would lead to weird behaviors and very spiky overall
performance. I would think that running a mix of SSD and HDD OSDs in the
same pool would be frowned upon, not just the same server.

On Fri, Jun 12, 2015 at 9:00 AM, Dominik Zalewski 
wrote:

> Be warned that running SSD and HD based OSDs in the same server is not
>> recommended. If you need the storage capacity, I'd stick to the journals
>> on SSDs plan.
>
>
> Can you please elaborate more why running SSD and HD based OSDs in the
> same server is not
> recommended ?
>
> Thanks
>
> Dominik
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Best setup for SSD

2015-06-12 Thread Dominik Zalewski
>
> Be warned that running SSD and HD based OSDs in the same server is not
> recommended. If you need the storage capacity, I'd stick to the journals
> on SSDs plan.


Can you please elaborate more why running SSD and HD based OSDs in the same
server is not
recommended ?

Thanks

Dominik
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Antw: cephx error - renew key

2015-06-12 Thread Steffen Weißgerber
  


>>> tombo wrote on Tuesday, 9 June 2015 at 21:44:

> 
> Hello guys, 
>

Hi tombo,

that seems to be related to http://tracker.ceph.com/issues/4282. We had the
same effects, but limited to 1 hour. After that, authentication works again.

When you increase the log level while the problem appears, you'll see that the
client's key rotation seems to be the problem: it tries to connect with an old
key which is no longer valid.
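
(A sketch of raising the relevant log levels at runtime, assuming osd.12 is the
affected daemon; either remotely via injectargs or locally via the admin socket:)

  ceph tell osd.12 injectargs '--debug_auth 20 --debug_ms 1'
  # or, on the OSD's own host:
  ceph daemon osd.12 config set debug_auth 20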

> today we had one storage (19xosd) down for 4 hours
> and now we are observing different problems and when I tried to restart
> one osd, I got error related to cephx 
> 
> 2015-06-09 21:09:49.983522
> 7fded00c7700 0 auth: could not find secret_id=6238
> 2015-06-09
> 21:09:49.983585 7fded00c7700 0 cephx: verify_authorizer could not get
> service secret for service osd secret_id=6238
> 2015-06-09 21:09:49.983595
> 7fded00c7700 0 -- X.X.X.32:6808/728850 >> X.X.X.32:6852/1474277
> pipe(0x7fdf47291200 sd=90 :6808 s=0 p
> gs=0 cs=0 l=0
> c=0x7fdf33340940).accept: got bad authorizer
> 

What does the ceph client X.X.X.32 use, kernel-based rbd or qemu?
In the case of kernel rbd, did you change the kernel on X.X.X.32?

> configuration is 
> 
> auth
> cluster required = cephx
> auth service required = none
> auth client
> required = none
> 
> So as I understand, it is not possible to disable the whole
> auth on the fly... so is it possible to renew the key for the osd to see if it helps?
> If yes, how? Remove the old one with
> 
> ceph auth del osd.{osd-num} and generate a
> new one with ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' -i
> /var/lib/ceph/osd/ceph-{osd-num}/keyring ? And I don't want to lose
> that osd's data (as usual, nobody wants to :) )
> 
> Thanks for help.

Regards

Steffen


-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Erasure coded pools and bit-rot protection

2015-06-12 Thread Paweł Sadowski
Hi All,

I'm testing erasure coded pools. Is there any protection from bit-rot
errors on object read? If I modify one bit in an object part (directly on
the OSD) I'm getting a *broken* object:

mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
bb2d82bbb95be6b9a039d135cc7a5d0d  -

# modify one bit directly on OSD

mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
02f04f590010b4b0e6af4741c4097b4f  -

# restore bit to original value

mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
bb2d82bbb95be6b9a039d135cc7a5d0d  -

If I run a deep-scrub on the modified bit I'm getting an inconsistent PG, which is
correct in this case. After restoring the bit and running deep-scrub again,
all PGs are clean.
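
(For reference, a sketch of locating and re-scrubbing the affected placement group
after such a test; the object name and PG id are illustrative:)

  # find which PG and OSDs hold the object
  ceph osd map ecpool `hostname -f`_16

  # after flipping the bit on disk, force a deep scrub of that PG and check health
  ceph pg deep-scrub 2.3f
  ceph health detail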


[ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]
   

-- 
PS
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph compiled on ARM hangs on using any commands.

2015-06-12 Thread Karanvir Singh
Hi,
 I am trying to compile/create packages for the latest ceph version (519c3c9) from 
the hammer branch on an ARM platform.
 For google-perftools I am compiling those from 
https://code.google.com/p/gperftools/ .

 The packages are generated fine.
 I have used the same branch/commit and commands to create packages for x86 and 
those work fine.
 But I can't seem to use ceph commands on the ARM platform; e.g. if I run ceph 
-s or ceph -w, it just hangs.

 with strace it just shows 
 ...
 ..
 gettimeofday({1433782898, 950051}, NULL) = 0
 gettimeofday({1433782898, 950106}, NULL) = 0
 ..
 ..
 and with gdb

 (gdb) bt
 #0 0xb6f0b316 in gettimeofday () at ../sysdeps/unix/syscall-template.S:81
 #1 0xb57e20ba in ?? () from /usr/lib/librados.so.2
 #2 0xb6fbd20e in call_init (l=, argc=3, argv=0xbed3b7a4, 
env=0xbed3b7b4) at dl-init.c:78
 #3 0xb6fbd2a0 in _dl_init (main_map=main_map@entry=0x389ad0, argc=3, 
argv=0xbed3b7a4, env=0xbed3b7b4) at dl-init.c:126
 #4 0xb6fc0076 in dl_open_worker (a=) at dl-open.c:577
 #5 0xb6fbd140 in _dl_catch_error (objname=objname@entry=0xbed3a874, 
errstring=errstring@entry=0xbed3a878, mallocedp=mallocedp@entry=0xbed3a873, 
operate=0xb6fbfe41 , args=args@entry=0xbed3a87c) at 
dl-error.c:187
 #6 0xb6fbfa8e in _dl_open (file=0xb69f978c "librados.so.2", mode=-2147483646, 
caller_dlopen=0xb6cde431, nsid=, argc=3, argv=0xbed3b7a4, 
env=0xbed3b7b4) at dl-open.c:661
 #7 0xb6e99af8 in dlopen_doit (a=0xbed3aad0) at dlopen.c:66
 #8 0xb6fbd140 in _dl_catch_error (objname=0x2ec15c, errstring=0x2ec160, 
mallocedp=0x2ec158, operate=0xb6e99aa5 , args=0xbed3aad0) at 
dl-error.c:187
 #9 0xb6e99f48 in _dlerror_run (operate=0xb6e99aa5 , 
args=args@entry=0xbed3aad0) at dlerror.c:163
 #10 0xb6e99b82 in __dlopen (file=0xb69f978c "librados.so.2", mode=) at dlopen.c:87
 #11 0xb6cde430 in ?? () from 
/usr/lib/python2.7/lib-dynload/_ctypes.arm-linux-gnueabihf.so
 Backtrace stopped: previous frame identical to this frame (corrupt stack?)

 Any help/pointers are appreciated.

 Thanks a lot,
Karanvir Singh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New to CEPH - VR@Sheeltron

2015-06-12 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> V.Ranganath
> Sent: 12 June 2015 06:06
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] New to CEPH - VR@Sheeltron
> 
> Dear Sir,
> 
> I am New to CEPH. I have the following queries:
> 
> 1. I have been using OpenNAS, OpenFiler, Gluster & Nexenta for storage OS.
> How is CEPH different from Gluster & Nexenta ?

Most of those solutions are based on old-style RAID groups (yes, I know ZFS
isn't RAID but it's similar). Ceph is scale-out, object-based storage. Each
disk is effectively controlled by an individual piece of software, and these
pool together to create a storage cluster.

> 
> 2. I also use LUSTRE for our Storage in a HPC Environment. Can CEPH be
> substituted for Lustre ?

Possibly, but I'm not best placed to answer this

> 
> 3. What is the minimum capacity of storage (in TB), where CEPH can be
> deployed ? What is the typical hardware configuration required to support
> CEPH ? Can we use 'commodity hardware' like TYAN - Servers & JBODs to
> stack up the HDDs ?? Do you need RAID Controllers or is RAID/LUN built by
> the OS ?

I believe the minimum would probably be around 10GB, but moving from
traditional RAID to Ceph really only tends to make sense once you get to
around 100TB.

> 
> 4. Do you have any doc. that gives me the comparisons with other Software
> based Storage ?

There isn't anything that I'm aware of, probably as it's so different that a
single comparison is hard. Are there any particular points you are
interested in comparing?

> 
> Thanks & Regards,
> 
> V.Ranganath
> VP - SI&S Division
> Sheeltron Digital Systems Pvt. Ltd.
> Direct: 080-49293307
> Mob:+91 88840 54897
> E-mail: ra...@sheeltron.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph OSD with OCFS2

2015-06-12 Thread gjprabu
Hi,

  I measured only the data that I transferred from the client. For example, after a 
500MB file transfer completes, if I measure that same file its size will be 1GB, not 
10GB. 

   Our Configuration is :-
=
 ceph -w
 cluster f428f5d6-7323-4254-9f66-56a21b099c1a
 health HEALTH_OK
 monmap e1: 3 mons at 
{cephadmin=172.20.19.235:6789/0,cephnode1=172.20.7.168:6789/0,cephnode2=172.20.9.41:6789/0},
 election epoch 114, quorum 0,1,2 cephnode1,cephnode2,cephadmin
 osdmap e9: 2 osds: 2 up, 2 in
 pgmap v1022: 64 pgs, 1 pools, 7507 MB data, 1952 objects
 26139 MB used, 277 GB / 302 GB avail
 64 active+clean
===
ceph.conf
[global]
osd pool default size = 2
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 172.20.7.168,172.20.9.41,172.20.19.235
mon_initial_members = zoho-cephnode1, zoho-cephnode2, zoho-cephadmin
fsid = f428f5d6-7323-4254-9f66-56a21b099c1a


What is the replication policy you are using ?
 
   We are using the default OSD setup with 2 replicas, not using a custom CRUSH map, 
PG num, erasure coding, etc. 

What interface you used to store the data ?

   We are using RBD to store data and it has been mounted with OCFS2 on the 
client side.

How are you removing data ? Are you removing a rbd image ?
 
   We are not removing the rbd image, only removing existing data 
using the rm command from the client. We didn't set up an async way to 
transfer or remove data.


Also please let us know the reason (an extra 2-3 mins is taken for hg/git 
repository operations like clone, pull, checkout and update).



Regards
Prabu GJ





 On Fri, 12 Jun 2015 00:21:24 +0530 Somnath Roy 
 wrote  

  Hi,
  
 Ceph journal works in different way.  It’s a write ahead journal, all the data 
will be persisted first in journal and then will be written to actual place. 
Journal data is encoded. Journal is a fixed size partition/file and written 
sequentially. So, if you are placing journal in HDDs, it will be overwritten, 
for SSD case , it will be GC later. So, if you are measuring amount of data 
written to the device it will be double. But, if you are saying you have 
written a 500MB file to cluster and you are seeing the actual file size is 10G, 
it should not be the case. How are you seeing this size BTW ?
  
 Could you please tell us more about your configuration ?
 What is the replication policy you are using ?
 What interface you used to store the data ?
  
 Regarding your other query..
  
 << If i transfer 1GB data, what will be server size(OSD), Is this will 
write compressed format
  
 No, actual data is not compressed. You don’t want to fill up OSD disk and 
there are some limits you can set . Check the following link
  
 http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
  
 It will stop working if the disk is 95% full by default.
  
 << Is it possible to take backup from server compressed data and copy 
the same to other machine as Server_Backup  - then start new client using 
Server_Backup
 For backup, check the following link if that works for you.
  
 https://ceph.com/community/blog/tag/backup/
  
 Also, you can use RGW federated config for back up.
  
 << Data removal is very slow
  
 How are you removing data ? Are you removing a rbd image ?
  
 If you are removing entire pool , that should be fast and do deletes data 
async way I guess.
  
 Thanks & Regards
 Somnath
  
   From: gjprabu [mailto:gjpr...@zohocorp.com] 
 Sent: Thursday, June 11, 2015 6:38 AM
 To: Somnath Roy
 Cc: ceph-users@lists.ceph.com; Kamala Subramani; Siva Sokkumuthu
 Subject: Re: RE: [ceph-users] Ceph OSD with OCFS2
 
 
  
   Hi Team,
 
 Once the data transfer is completed, the journal should flush all in-memory 
data to its real place, but in our case it is showing double the size after the 
transfer completes. Here everyone will be confused about what the real file and folder 
size is. Also, what will happen if I move the monitor from that OSD server to a 
separate machine; might that solve the double-size issue?
 
 We have the queries below as well.

 1.  An extra 2-3 mins is taken for hg/git repository operations like clone, 
pull, checkout and update.

 2.  If I transfer 1GB of data, what will the size be on the server (OSD)? Will this 
be written in a compressed format?

 3.  Is it possible to take a backup of the compressed data from the server and copy 
the same to another machine as Server_Backup, then start a new client using 
Server_Backup?

 4.  Data removal is very slow.
   Regards
 
 Prabu
   
 
 
  
   
  On Fri, 05 Jun 2015 21:55:28 +0530 Somnath Roy 
 wrote  
 
Yes, Ceph will be writing twice: once for the journal and once for the actual data. 
Considering