Re: [ceph-users] Dataflow/path Client --- OSD

2015-05-07 Thread Wido den Hollander
On 05/07/2015 10:28 AM, Götz Reinicke - IT Koordinator wrote:
 Hi,
 
 still designing and deciding, we asked ourselves: How does the data
 travel from and to an OSD?
 
 E.g. I have my fileserver with an rbd mounted and a client workstation
 writes/reads to/from a share on that rbd.
 
 Is the data directly going to an OSD (node) or is it e.g. travelling
 through the monitors as well?
 

No, the monitors are never in the I/O path. Clients talk to the OSDs
directly.
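
To see that mapping for yourself, `ceph osd map <pool> <object>` asks the
cluster which placement group and which OSDs a given object name maps to; a
minimal sketch, with made-up pool and object names:

```
import subprocess

# Ask the cluster which PG and which OSDs an object name maps to.
# Pool and object names below are made up for illustration; the monitors
# only hand out the cluster maps, the data itself flows client <-> OSD.
print(subprocess.check_output(
    ["ceph", "osd", "map", "rbd", "some-object-name"]).decode())
```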

10Gb would be sufficient, and I think that goes for all the nodes. Bandwidth is
usually not the problem; latency is.

 The point is: If we connect our file servers and OSD nodes with 40Gb,
 does the monitor need 40Gb too? Or would 10Gb be enough?
 
 Oversize is ok :) ...
 
   Thanks and regards, Götz
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Find out the location of OSD Journal

2015-05-07 Thread Martin B Nielsen
Hi,

Inside your mounted OSD directory there is a symlink - journal - pointing to
the file or disk/partition used for it.
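
A small sketch that prints that mapping for every OSD on the host (assuming the
usual /var/lib/ceph/osd/ceph-* mount points):

```
import glob
import os

# Show where each OSD's "journal" symlink actually points.
for osd_dir in sorted(glob.glob("/var/lib/ceph/osd/ceph-*")):
    journal = os.path.join(osd_dir, "journal")
    if os.path.islink(journal):
        print("%s -> %s" % (osd_dir, os.path.realpath(journal)))
```

(`ceph-disk list`, mentioned elsewhere in this thread, gives a similar
per-disk overview.)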

Cheers,
Martin

On Thu, May 7, 2015 at 11:06 AM, Patrik Plank pat...@plank.me wrote:

  Hi,


 I can't remember on which drive I installed which OSD journal :-||
 Is there any command to show this?


 thanks
 regards



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] wrong diff-export format description

2015-05-07 Thread Ultral
Hi all,

It looks like the description is a bit wrong:

http://ceph.com/docs/master/dev/rbd-diff/

   - u8: ‘s’
   - u64: (ending) image size

I suppose that instead of u64 something like le64 should be used, shouldn't it?
Because from this description it is not clear which byte order I should
use.
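
For example, if the integers really are little-endian, a reader for that record
would look like the sketch below (file name made up; it only looks at the first
record after the v1 header banner):

```
import struct

# Read the (ending) image size record from an "rbd export-diff" stream:
# a 1-byte tag 's' followed by a 64-bit integer, assumed little-endian.
with open("image.diff", "rb") as f:
    f.read(len("rbd diff v1\n"))                  # skip the header banner
    tag = f.read(1)
    if tag == b"s":
        (size,) = struct.unpack("<Q", f.read(8))  # "<Q" = little-endian u64
        print("ending image size: %d bytes" % size)
```
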
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Networking question

2015-05-07 Thread MEGATEL / Rafał Gawron
Hi

I have a theoretical question about networking in ceph.
If I have two networks (public and cluster network) and one link in the public
network is broken (the cluster network is fine), what will I see in my cluster?

How does ceph work in this situation?

Or how does ceph work if the link to the cluster network is broken and the
public network is fine?
Will the ceph node still be available?

---
rgawron
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Btrfs defragmentation

2015-05-07 Thread Lionel Bouton
On 05/06/15 19:51, Lionel Bouton wrote:

 During normal operation Btrfs OSD volumes continue to behave in the same
 way XFS ones do on the same system (sometimes faster/sometimes slower).
 What is really slow though is the OSD process startup. I've yet to make
 serious tests (umounting the filesystems to clear caches), but I've
 already seen 3 minutes of delay reading the pgs. Example:

 2015-05-05 16:01:24.854504 7f57c518b780  0 osd.17 22428 load_pgs
 2015-05-05 16:01:24.936111 7f57ae7fc700  0
 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-17) destroy_checkpoint:
 ioctl SNAP_DESTROY got (2) No such file or directory
 2015-05-05 16:01:24.936137 7f57ae7fc700 -1
 filestore(/var/lib/ceph/osd/ceph-17) unable to destroy snap
 'snap_1671188' got (2) No such file or directory
 2015-05-05 16:01:24.991629 7f57ae7fc700  0
 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-17) destroy_checkpoint:
 ioctl SNAP_DESTROY got (2) No such file or directory
 2015-05-05 16:01:24.991654 7f57ae7fc700 -1
 filestore(/var/lib/ceph/osd/ceph-17) unable to destroy snap
 'snap_1671189' got (2) No such file or directory
 2015-05-05 16:04:25.413110 7f57c518b780  0 osd.17 22428 load_pgs opened
 160 pgs

 The filesystem might not have reached its balance between fragmentation
 and defragmentation rate at this time (so this may change) but mirrors
 our initial experience with Btrfs where this was the first symptom of
 bad performance.

We've seen progress on this front. Unfortunately for us we had 2 power
outages and they seem to have damaged the disk controller of the system
we are testing Btrfs on: we just had a system crash.
On the positive side this gives us an update on the OSD boot time.

With a freshly booted system without anything in cache :
- the first Btrfs OSD we installed loaded the pgs in ~1 min 30 s, which is
half of the previous time,
- the second Btrfs OSD, where defragmentation was disabled for some time
and which was considered more fragmented by our tool, took nearly 10 minutes
to load its pgs (and even spent 1 minute before starting to load them),
- the third Btrfs OSD, which was always defragmented, took 4 minutes 30 seconds
to load its pgs (it was considered more fragmented than the first and
less than the second).

My current assumption is that the defragmentation process we use can't
handle large spikes of writes (at least when originally populating the
OSD with data through backfills) but then can repair the damage on
performance they cause at least partially (it's still slower to boot
than the 3 XFS OSDs on the same system where loading pgs took 6-9 seconds).
In the current setup the defragmentation is very slow because I set it up to
generate very little load on the filesystems it processes: there may be room
to improve.

Best regards,

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Dataflow/path Client --- OSD

2015-05-07 Thread Götz Reinicke - IT Koordinator
Hi,

still designing and deciding, we asked ourselves: How does the data
travel from and to an OSD?

E.g. I have my fileserver with an rbd mounted and a client workstation
writes/reads to/from a share on that rbd.

Is the data directly going to an OSD (node) or is it e.g. travelling
through the monitors as well?

The point is: If we connect our file servers and OSD nodes with 40Gb,
does the monitor need 40Gb too? Or would 10Gb be enough?

Oversize is ok :) ...

Thanks and regards, Götz

-- 
Götz Reinicke
IT-Koordinator

Tel. +49 7141 969 82 420
E-Mail goetz.reini...@filmakademie.de

Filmakademie Baden-Württemberg GmbH
Akademiehof 10
71638 Ludwigsburg
www.filmakademie.de

Eintragung Amtsgericht Stuttgart HRB 205016

Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
Staatssekretär im Ministerium für Wissenschaft,
Forschung und Kunst Baden-Württemberg

Geschäftsführer: Prof. Thomas Schadt



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados Gateway and keystone

2015-05-07 Thread Mark Kirkwood

On 07/05/15 20:21, ghislain.cheval...@orange.com wrote:

HI all,

After adding the nss and the keystone admin url  parameters in ceph.conf and 
creating the openSSL certificates, all is working well.

If I had followed the doc and processed by copy/paste, I wouldn't have 
encountered any problems.

As all is working well without this set of parameters using the swift API and 
keystone, It would be helpful if the page 
http://ceph.com/docs/master/radosgw/keystone/  was more precise according to 
this implementation.

Best regards

-Message d'origine-
De : CHEVALIER Ghislain IMT/OLPS
Envoyé : lundi 13 avril 2015 16:17
À : ceph-users
Objet : RE: [ceph-users] Rados Gateway and keystone

Hi all,

Coming back to that issue.

I successfully used keystone users for the rados gateway and the swift API but 
I still don't understand how it can work with S3 API and i.e. S3 users 
(AccessKey/SecretKey)

I found a swift3 initiative but I think It's only compliant in a pure OpenStack 
swift environment  by setting up a specific plug-in.
https://github.com/stackforge/swift3

A rgw can be, at the same, time under keystone control and  standard 
radosgw-admin if
- for swift, you use the right authentication service (keystone or internal)
- for S3, you use the internal authentication service

So, my questions are still valid.
How can a rgw work for S3 users if there are stored in keystone? Which is the 
accesskey and secretkey?
What is the purpose of rgw s3 auth use keystone parameter ?



The difference is that (in particular with the v2 protocol) swift 
clients talk to keystone to a) authenticate and b) find the swift 
storage endpoint (even if it is actually pointing to rgw).


In contrast, s3 clients will talk directly to the rgw, and *it* will talk
to keystone to check the client's s3 credentials for them. That's why
rgw needs to have rgw s3 auth use keystone and similar parameters.


Cheers

Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Find out the location of OSD Journal

2015-05-07 Thread Francois Lafont
Hi,

Patrik Plank wrote:

 I can't remember on which drive I installed which OSD journal :-||
 Is there any command to show this?

It's probably not the answer you hope for, but why not use a simple:

ls -l /var/lib/ceph/osd/ceph-$id/journal

?

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Find out the location of OSD Journal

2015-05-07 Thread Patrik Plank
Hi,



I can't remember on which drive I installed which OSD journal :-||
Is there any command to show this?


thanks
regards

 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] After calamari installation osd start failed

2015-05-07 Thread Patrik Plank
Hi,



after I installed calamari,

ceph shows me the following error when I change/reinstall/add osd.0.



Traceback (most recent call last):
  File "/usr/bin/calamari-crush-location", line 86, in <module>
    sys.exit(main())
  File "/usr/bin/calamari-crush-location", line 83, in main
    print get_osd_location(args.id)
  File "/usr/bin/calamari-crush-location", line 47, in get_osd_location
    last_location = get_last_crush_location(osd_id)
  File "/usr/bin/calamari-crush-location", line 27, in get_last_crush_location
    proc = Popen(c, stdout=PIPE, stderr=PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1259, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Invalid command:  saw 0 of args(string(goodchars [A-Za-z0-9-_.=])) 
[string(goodchars [A-Za-z0-9-_.=])...], expected at least 1
osd crush create-or-move osdname (id|osd.id) float[0.0-] args [args...] 
:  create entry or move existing entry for name weight at/to location args
Error EINVAL: invalid command
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 
--keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 0.46 '



[global]
osd_crush_location_hook = /usr/bin/calamari-crush-location
fsid = 78227661-3a1b-4e56-addc-c2a272933ac2
mon_initial_members = ceph01
mon_host = 10.0.0.20,10.0.0.21,10.0.0.22
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
filestore_op_threads = 32
public_network = 10.0.0.0/24
cluster_network = 10.0.1.0/24
osd_pool_default_size = 3
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096
osd_max_write_size = 200
osd_map_cache_size = 1024
osd_map_cache_bl_size = 128
osd_recovery_op_priority = 1
osd_max_recovery_max_active = 1
osd_recovery_max_backfills = 1
osd_op_threads = 32
osd_disk_threads = 8



After I have recreated osd.0:


3    0.27            osd.3    up    1    
6    0.55            osd.6    up    1    
9    0.55            osd.9    up    1    
12    0.27            osd.12    up    1    
15    0.27            osd.15    up    1    
18    0.27            osd.18    up    1    
21    0.06999            osd.21    up    1    
24    0.27            osd.24    up    1    
27    0.27            osd.27    up    1    
-3    3.18        host ceph02
4    0.55            osd.4    up    1    
7    0.55            osd.7    up    1    
10    0.55            osd.10    up    1    
13    0.27            osd.13    up    1    
1    0.11            osd.1    up    1    
16    0.27            osd.16    up    1    
19    0.27            osd.19    up    1    
22    0.06999            osd.22    up    1    
25    0.27            osd.25    up    1    
28    0.27            osd.28    up    1    
-4    2.76        host ceph03
2    0.11            osd.2    up    1    
5    0.55            osd.5    up    1    
8    0.55            osd.8    up    1    
11    0.13            osd.11    up    1    
14    0.27            osd.14    up    1    
17    0.27            osd.17    up    1    
20    0.27            osd.20    up    1    
23    0.06999            osd.23    up    1    
26    0.27            osd.26    up    1    
29    0.27            osd.29    up    1    
0    0    osd.0    down    0    



Does anybody have an idea how I can solve this?



thanks

cheers



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph_argparse packaging error in Hammer/debian?

2015-05-07 Thread Loic Dachary
Hi,

https://github.com/ceph/ceph/pull/4517 is the fix for 
http://tracker.ceph.com/issues/11388

Cheers

On 07/05/2015 20:28, Andy Allan wrote:
 Hi all,
 
 I've found what I think is a packaging error in Hammer. I've tried
 registering for the tracker.ceph.com site but my confirmation email
 has got lost somewhere!
 
 /usr/bin/ceph is installed by the ceph-common package.
 
 ```
 dpkg -S /usr/bin/ceph
 ceph-common: /usr/bin/ceph
 ```
 
 It relies on ceph_argparse, but that isn't packaged in ceph-common,
 it's packaged in ceph. But the dependency is that ceph relies on
 ceph-common, not the other way around.
 
 ```
 dpkg -S /usr/lib/python2.7/dist-packages/ceph_argparse.py
 ceph: /usr/lib/python2.7/dist-packages/ceph_argparse.py
 ```
 
 Moreover, there's a commit that says move argparse to ceph-common
 but it's actually moved it to the `ceph.install` file, not
 `ceph-common.install`
 
 https://github.com/ceph/ceph/commit/2a23eac54957e596d99985bb9e187a668251a9ec
 
 So I think this is a packaging error, unless I'm misunderstanding something!
 
 Thanks,
 Andy
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd unmap command hangs when there is no network connection with mons and osds

2015-05-07 Thread Vandeir Eduardo
Hi,

when issuing the rbd unmap command while there is no network connection to the
mons and OSDs, the command hangs. Isn't there an option to force the unmap even
in this situation?

Att.

Vandeir.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd unmap command hangs when there is no network connection with mons and osds

2015-05-07 Thread Ilya Dryomov
On Thu, May 7, 2015 at 10:20 PM, Vandeir Eduardo
vandeir.edua...@gmail.com wrote:
 Hi,

 when issuing rbd unmap command when there is no network connection with mons
 and osds, the command hangs. Isn't there a option to force unmap even on
 this situation?

No, but you can Ctrl-C the unmap command and that should do it.  In the
dmesg you'll see something like

  rbd: unable to tear down watch request

and you may have to wait for the cluster to timeout the watch.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph_argparse packaging error in Hammer/debian?

2015-05-07 Thread Andy Allan
Hi Loic,

Sorry for the noise! I'd looked when I first ran into it and didn't
find any reports or PRs, I should have checked again today.

Thanks,
Andy

On 7 May 2015 at 19:41, Loic Dachary l...@dachary.org wrote:
 Hi,

 https://github.com/ceph/ceph/pull/4517 is the fix for 
 http://tracker.ceph.com/issues/11388

 Cheers

 On 07/05/2015 20:28, Andy Allan wrote:
 Hi all,

 I've found what I think is a packaging error in Hammer. I've tried
 registering for the tracker.ceph.com site but my confirmation email
 has got lost somewhere!

 /usr/bin/ceph is installed by the ceph-common package.

 ```
 dpkg -S /usr/bin/ceph
 ceph-common: /usr/bin/ceph
 ```

 It relies on ceph_argparse, but that isn't packaged in ceph-common,
 it's packaged in ceph. But the dependency is that ceph relies on
 ceph-common, not the other way around.

 ```
 dpkg -S /usr/lib/python2.7/dist-packages/ceph_argparse.py
 ceph: /usr/lib/python2.7/dist-packages/ceph_argparse.py
 ```

 Moreover, there's a commit that says move argparse to ceph-common
 but it's actually moved it to the `ceph.install` file, not
 `ceph-common.install`

 https://github.com/ceph/ceph/commit/2a23eac54957e596d99985bb9e187a668251a9ec

 So I think this is a packaging error, unless I'm misunderstanding something!

 Thanks,
 Andy
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 --
 Loïc Dachary, Artisan Logiciel Libre


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph_argparse packaging error in Hammer/debian?

2015-05-07 Thread Andy Allan
Hi all,

I've found what I think is a packaging error in Hammer. I've tried
registering for the tracker.ceph.com site but my confirmation email
has got lost somewhere!

/usr/bin/ceph is installed by the ceph-common package.

```
dpkg -S /usr/bin/ceph
ceph-common: /usr/bin/ceph
```

It relies on ceph_argparse, but that isn't packaged in ceph-common,
it's packaged in ceph. But the dependency is that ceph relies on
ceph-common, not the other way around.

```
dpkg -S /usr/lib/python2.7/dist-packages/ceph_argparse.py
ceph: /usr/lib/python2.7/dist-packages/ceph_argparse.py
```

 Moreover, there's a commit that says "move argparse to ceph-common"
 but it actually moved it to the `ceph.install` file, not
 `ceph-common.install`

https://github.com/ceph/ceph/commit/2a23eac54957e596d99985bb9e187a668251a9ec

So I think this is a packaging error, unless I'm misunderstanding something!

Thanks,
Andy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] unable to start monitor

2015-05-07 Thread Krishna Mohan
Srikanth,

Try if this helps..

sudo initctl list | grep ceph   (should display all ceph daemons)

sudo start ceph-mon-all   (to start all ceph monitors)

Thanks
-Krishna





 On May 7, 2015, at 1:35 PM, Srikanth Madugundi srikanth.madugu...@gmail.com 
 wrote:
 
 Hi,
 
  I am setting up a local instance of a ceph cluster with the latest source
  from GitHub. The build succeeded and the installation was successful, but I
  could not start the monitor.
 
 The ceph start command returns immediately and does not output anything.
 $ sudo /etc/init.d/ceph start mon.monitor1
 
 $
 
 $ ls -l /var/lib/ceph/mon/ceph-monitor1/
 
 total 8
 
 -rw-r--r-- 1 root root0 May  7 20:27 done
 
 -rw-r--r-- 1 root root   77 May  7 19:12 keyring
 
 drwxr-xr-x 2 root root 4096 May  7 19:12 store.db
 
 -rw-r--r-- 1 root root0 May  7 20:26 sysvinit
 
 -rw-r--r-- 1 root root0 May  7 20:09 upstart
 
 
 
 
 
  The log file does not seem to have any details either.
 
 
 
 $ cat /var/log/ceph/ceph-mon.monitor1.log 
 
 
 2015-05-07 19:12:13.356389 7f3f06bdb880 -1 did not load config file, using 
 default settings.
 
 
 
 $ cat /etc/ceph/ceph.conf 
 
 [global]
 
 mon host = 15.43.33.21
 
 fsid = 92f859df-8b27-466a-8d44-01af2b7ea7e6
 
 mon initial members = monitor1
 
 
 
 # Enable authentication
 
 auth cluster required = cephx
 
 auth service required = cephx
 
 auth client required = cephx
 
 
 
 # POOL / PG / CRUSH
 
 osd pool default size = 3  # Write an object 3 times
 
 osd pool default min size = 1 # Allow writing one copy in a degraded state
 
 
 
 # Ensure you have a realistic number of placement groups. We recommend 
 
 # approximately 200 per OSD. E.g., total number of OSDs multiplied by 200 
 
 # divided by the number of replicas (i.e., osd pool default size). 
 
 # !! BE CAREFULL !!
 
 # You properly should never rely on the default numbers when creating pool!
 
 osd pool default pg num = 32
 
 osd pool default pgp num = 32
 
 
 
 #log file = /home/y/logs/ceph/$cluster-$type.$id.log
 
 
 
 # Logging
 
 debug paxos = 0
 
 debug throttle = 0
 
 
 
 keyring = /etc/ceph/ceph.client.admin.keyring
 
 #run dir = /home/y/var/run/ceph
 
 
 
 [mon]
 
 debug mon = 10
 
 debug ms = 1
 
 # We found that when the disk usage reaches 94%, the disk could not be written
 # to at all (no free space), so we lower the full ratio and should start
 # data migration before it becomes full
 
 mon osd full ratio = 0.9
 
 #mon data = /home/y/var/lib/ceph/mon/$cluster-$id
 
 mon osd down out interval = 172800 # 2 * 24 * 60 * 60 seconds
 
 # Ceph monitors need to be told how many reporters must be seen from different
 # OSDs before an OSD can be marked offline; this should be greater than the
 # number of OSDs per OSD host
 
 mon osd min down reporters = 12
 
 #keyring = /home/y/conf/ceph/ceph.mon.keyring
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup hundreds or thousands of TB

2015-05-07 Thread Robert LeBlanc
On Thu, May 7, 2015 at 5:20 AM, Wido den Hollander w...@42on.com wrote:


 Aren't snapshots something that should protect you against removal? If
 snapshots work properly in CephFS you could create a snapshot every hour.


Unless the file is created and removed between snapshots, then the Recycle
Bin feature would have it and the snapshot wouldn't.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW - Can't download complete object

2015-05-07 Thread Yehuda Sadeh-Weinraub


- Original Message -
 From: Sean seapasu...@uchicago.edu
 To: ceph-users@lists.ceph.com
 Sent: Thursday, May 7, 2015 3:35:14 PM
 Subject: [ceph-users] RGW - Can't download complete object
 
 I have another thread going on about truncation of objects and I believe
 this is a separate but equally bad issue in civetweb/radosgw. My cluster
 is completely healthy.
 
 I have one (possibly more) objects stored in the ceph rados gateway that
 will return a different size every time I try to download it:
 
 http://pastebin.com/hK1iqXZH --- ceph -s
 http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object

The two interesting things that I see here are:
 - the multipart upload size for each part is on the big side (is it 1GB for
each part?)
 - it seems that there are a lot of parts that suffered from retries, which
could be a source for the 512k issue

 http://pastebin.com/5TnvgMrX --- python download code
 
 The weird part is every time I download the file it is of a different
 size. I am grabbing the individual objects of the 14g file and will
 update this email once I have them all statted out. Currently I am
 getting, on average, 1.5G to 2Gb files when the total object should be
 14G in size.
 
 lacadmin@kh10-9:~$ python corruptpull.py
 the download failed. The filesize = 2125988202. The actual size is
 14577056082. Attempts = 1
 the download failed. The filesize = 2071462250. The actual size is
 14577056082. Attempts = 2
 the download failed. The filesize = 2016936298. The actual size is
 14577056082. Attempts = 3
 the download failed. The filesize = 1643643242. The actual size is
 14577056082. Attempts = 4
 the download failed. The filesize = 1597505898. The actual size is
 14577056082. Attempts = 5
 the download failed. The filesize = 2075656554. The actual size is
 14577056082. Attempts = 6
 the download failed. The filesize = 650117482. The actual size is
 14577056082. Attempts = 7
 the download failed. The filesize = 1987576170. The actual size is
 14577056082. Attempts = 8
 the download failed. The filesize = 2109210986. The actual size is
 14577056082. Attempts = 9
 the download failed. The filesize = 2142765418. The actual size is
 14577056082. Attempts = 10
 the download failed. The filesize = 2134376810. The actual size is
 14577056082. Attempts = 11
 the download failed. The filesize = 2146959722. The actual size is
 14577056082. Attempts = 12
 the download failed. The filesize = 2142765418. The actual size is
 14577056082. Attempts = 13
 the download failed. The filesize = 1467482474. The actual size is
 14577056082. Attempts = 14
 the download failed. The filesize = 2046296426. The actual size is
 14577056082. Attempts = 15
 the download failed. The filesize = 2021130602. The actual size is
 14577056082. Attempts = 16
 the download failed. The filesize = 177366. The actual size is
 14577056082. Attempts = 17
 the download failed. The filesize = 2146959722. The actual size is
 14577056082. Attempts = 18
 the download failed. The filesize = 2016936298. The actual size is
 14577056082. Attempts = 19
 the download failed. The filesize = 1983381866. The actual size is
 14577056082. Attempts = 20
 the download failed. The filesize = 2134376810. The actual size is
 14577056082. Attempts = 21
 
 Notice it is always different. Once the rados -p .rgw.buckets ls | grep
 finishes I will return the listing of objects as well but this is quite
 odd and I think this is a separate issue.
 
 Has anyone seen this before? Why wouldn't radosgw return an error and
 why am I getting different file sizes?

Usually that means that there was some error in the middle of the download,
maybe a client-to-radosgw communication issue. What does radosgw show when
this happens?
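
For reference, a sketch of the kind of client-side sanity check that narrows
this down: compare the size the gateway advertises with what actually arrived.
The gateway host, bucket, object and credentials below are placeholders.

```
import os

import boto
import boto.s3.connection

# Placeholders: point these at your own radosgw endpoint and credentials.
conn = boto.connect_s3(
    aws_access_key_id="ACCESS", aws_secret_access_key="SECRET",
    host="rgw.example.com", is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

key = conn.get_bucket("mybucket").get_key("bigobject")
key.get_contents_to_filename("/tmp/bigobject")

got = os.path.getsize("/tmp/bigobject")
if got != key.size:
    print("short read: got %d of %d bytes" % (got, key.size))
```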

 
 I would post the log from radosgw but I don't see any err|wrn|fatal
 mentions in the log and the client completes without issue every time.
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Find out the location of OSD Journal

2015-05-07 Thread Robert LeBlanc
You may also be able to use `ceph-disk list`.

On Thu, May 7, 2015 at 3:56 AM, Francois Lafont flafdiv...@free.fr wrote:

 Hi,

 Patrik Plank wrote:

  I can't remember on which drive I installed which OSD journal :-||
  Is there any command to show this?

 It's probably not the answer you hope for, but why not use a simple:

 ls -l /var/lib/ceph/osd/ceph-$id/journal

 ?

 --
 François Lafont
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph_argparse packaging error in Hammer/debian?

2015-05-07 Thread Ken Dreyer
On 05/07/2015 12:53 PM, Andy Allan wrote:
 Hi Loic,
 
 Sorry for the noise! I'd looked when I first ran into it and didn't
 find any reports or PRs, I should have checked again today.
 
 Thanks,
 Andy

That's totally fine. If you want, you can review that PR and give a
thumbs up or down comment there :) More eyes on the Debian-related
changes are always a good thing.

- Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados Gateway and keystone

2015-05-07 Thread ghislain.chevalier
HI all,

After adding the nss and the keystone admin url  parameters in ceph.conf and 
creating the openSSL certificates, all is working well.

If I had followed the doc and proceeded by copy/paste, I wouldn't have
encountered any problems.

As all is working well without this set of parameters when using the swift API
and keystone, it would be helpful if the page
http://ceph.com/docs/master/radosgw/keystone/ were more precise about
this implementation.

Best regards

-----Original Message-----
From: CHEVALIER Ghislain IMT/OLPS
Sent: Monday, 13 April 2015 16:17
To: ceph-users
Subject: RE: [ceph-users] Rados Gateway and keystone

Hi all,

Coming back to that issue.

I successfully used keystone users for the rados gateway and the swift API but 
I still don't understand how it can work with S3 API and i.e. S3 users 
(AccessKey/SecretKey)

I found a swift3 initiative but I think it's only usable in a pure OpenStack
swift environment, by setting up a specific plug-in.
https://github.com/stackforge/swift3

An rgw can be, at the same time, under keystone control and standard
radosgw-admin if
- for swift, you use the right authentication service (keystone or internal)
- for S3, you use the internal authentication service

So, my questions are still valid.
How can an rgw work for S3 users if they are stored in keystone? Which are the
access key and secret key?
What is the purpose of the rgw s3 auth use keystone parameter?

Best regards

--
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of
ghislain.cheval...@orange.com Sent: Monday, 23 March 2015 14:03 To: ceph-users
Subject: [ceph-users] Rados Gateway and keystone

Hi All,

I just would like to be sure about the keystone configuration for the Rados Gateway.

I read the documentation http://ceph.com/docs/master/radosgw/keystone/ and 
http://ceph.com/docs/master/radosgw/config-ref/?highlight=keystone
but I didn't catch whether, after having configured the rados gateway (ceph.conf)
to use keystone, it becomes mandatory to create all the users in it.

In other words, can an rgw be, at the same time, under keystone control and
standard radosgw-admin?
How does it work for S3 users?
What is the purpose of the rgw s3 auth use keystone parameter?

Best regards

- - - - - - - - - - - - - - - - -
Ghislain Chevalier
+33299124432
+33788624370
ghislain.cheval...@orange.com
_

This message and its attachments may contain confidential or privileged 
information that may be protected by law; they should not be distributed, used 
or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup hundreds or thousands of TB

2015-05-07 Thread John Spray

On 06/05/2015 16:58, Scottix wrote:

As a point to
* someone accidentally removed a thing, and now they need a thing back

I thought MooseFS has an interesting feature that I thought would be 
good for CephFS and maybe others.


Basically a timed Trashbin
Deleted files are retained for a configurable period of time (a file 
system level trash bin)


It's an idea to cover this use case.


Until recently we had a bug where deleted files weren't purged until the 
next MDS restart, so maybe we should just back out the fix for that :-D


Seriously though, I didn't know about that MooseFS feature, it's 
interesting that they decided to implement that.  It would be fairly 
straightforward to do that in CephFS (we already put deleted files into 
a 'stray' directory before purging them asynchronously), but I think 
there might be some debate about whether it's really the role of the 
underlying filesystem to do that kind of thing.


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Networking question

2015-05-07 Thread Alexandre DERUMIER
Hi,

If I have two networks (public and cluster network) and one link in public 
network is broken ( cluster network is fine) what I will see in my cluster ? 


See

http://ceph.com/docs/master/rados/configuration/network-config-ref/


Only OSDs use the private (cluster) network, and only between themselves.

So if the public network does not work (on an OSD), the mon will not see the
OSD, so the OSD will be marked out.


Or how works ceph if link to cluster network was broken and public network is 
fine ? 

I'm not sure here; the OSD will be unable to replicate, so maybe it goes out
by itself?



----- Original Message -----
From: MEGATEL / Rafał Gawron rafal.gaw...@megatel.com.pl
To: ceph-users ceph-users@lists.ceph.com
Sent: Thursday, 7 May 2015 10:11:20
Subject: [ceph-users] Networking question



Hi

I have a theoretical question about network in ceph.
If I have two networks (public and cluster network) and one link in the public
network is broken (the cluster network is fine), what will I see in my cluster?

How does ceph work in this situation?

Or how does ceph work if the link to the cluster network is broken and the
public network is fine?
Will the ceph node still be available?

---
rgawron

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Btrfs defragmentation

2015-05-07 Thread Burkhard Linke

Hi,

On 05/07/2015 12:04 PM, Lionel Bouton wrote:

On 05/06/15 19:51, Lionel Bouton wrote:

*snipsnap*

We've seen progress on this front. Unfortunately for us we had 2 power
outages and they seem to have damaged the disk controller of the system
we are testing Btrfs on: we just had a system crash.
On the positive side this gives us an update on the OSD boot time.

With a freshly booted system without anything in cache :
- the first Btrfs OSD we installed loaded the pgs in ~1mn30s which is
half of the previous time,
- the second Btrfs OSD where defragmentation was disabled for some time
and was considered more fragmented by our tool took nearly 10 minutes to
load its pgs (and even spent 1mn before starting to load them).
- the third Btrfs OSD which was always defragmented took 4mn30 seconds
to load its pgs (it was considered more fragmented than the first and
less than the second).

My current assumption is that the defragmentation process we use can't
handle large spikes of writes (at least when originally populating the
OSD with data through backfills) but then can repair the damage on
performance they cause at least partially (it's still slower to boot
than the 3 XFS OSDs on the same system where loading pgs took 6-9 seconds).
In the current setup the defragmentation is very slow to process because
I set it up to generate very little load on the filesystems it processes
: there may be room to improve.


Part of the OSD boot up process is also the handling of existing 
snapshots and journal replay. I've also had several btrfs based OSDs 
that took up to 20-30 minutes to start, especially after a crash. During 
journal replay the OSD daemon creates a number of new snapshots for its 
operations (newly created snap_XYZ directories that vanish after a short 
time). This snapshotting probably also adds overhead to the OSD startup 
time.
I have disabled snapshots in my setup now, since the stock ubuntu trusty 
kernel had some stability problems with btrfs.


I also had to establish cron jobs for rebalancing the btrfs partitions. 
It compacts the extents and may reduce the total amount of space taken. 
Unfortunately this procedure is not a default in most distributions (it 
definitely should be!). The problems associated with unbalanced extents 
should have been solved in kernel 3.18, but I didn't have the time to 
check it yet.
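
For illustration, the cron-driven rebalance I mean boils down to a filtered
balance per OSD filesystem; a sketch (mount points and the usage threshold are
just examples):

```
import glob
import subprocess

# Compact only data chunks that are at most 50% used on each btrfs-backed OSD.
# Mount points and the threshold are examples; tune for your own setup.
for mountpoint in sorted(glob.glob("/var/lib/ceph/osd/ceph-*")):
    subprocess.call(["btrfs", "balance", "start", "-dusage=50", mountpoint])
```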


As a side note: I had several OSDs with dangling snapshots (more than the 
two usually handled by the OSD). They are probably due to crashed OSD 
daemons. You have to remove them manually, otherwise they start to 
consume disk space.


Best regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Networking question

2015-05-07 Thread Simon Hallam
This page explains what happens quite well:

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#flapping-osds

We recommend using both a public (front-end) network and a cluster (back-end) 
network so that you can better meet the capacity requirements of object 
replication. Another advantage is that you can run a cluster network such that 
it isn’t connected to the internet, thereby preventing some denial of service 
attacks. When OSDs peer and check heartbeats, they use the cluster (back-end) 
network when it’s available. See Monitor/OSD Interaction for details.

However, if the cluster (back-end) network fails or develops significant 
latency while the public (front-end) network operates optimally, OSDs currently 
do not handle this situation well. What happens is that OSDs mark each other 
down on the monitor, while marking themselves up. We call this scenario 
‘flapping`.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Alexandre DERUMIER
Sent: 07 May 2015 12:43
To: MEGATEL / Rafał Gawron
Cc: ceph-users
Subject: Re: [ceph-users] Networking question

Hi,

If I have two networks (public and cluster network) and one link in public 
network is broken ( cluster network is fine) what I will see in my cluster ? 


See

http://ceph.com/docs/master/rados/configuration/network-config-ref/


only osd between them use private network.

so if public network not work (on osd), mon will not see osd, so osd will be 
out.


Or how works ceph if link to cluster network was broken and public network is 
fine ? 

I'm not sure here, osd will be enable to replicate, so maybe it going out 
itself ?



- Mail original -
De: MEGATEL / Rafał Gawron rafal.gaw...@megatel.com.pl
À: ceph-users ceph-users@lists.ceph.com
Envoyé: Jeudi 7 Mai 2015 10:11:20
Objet: [ceph-users] Networking question



Hi

I have a theoretical question about network in ceph.
If I have two networks (public and cluster network) and one link in the public
network is broken (the cluster network is fine), what will I see in my cluster?

How does ceph work in this situation?

Or how does ceph work if the link to the cluster network is broken and the
public network is fine?
Will the ceph node still be available?

---
rgawron

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Please visit our new website at www.pml.ac.uk and follow us on Twitter  
@PlymouthMarine

Winner of the Environment  Conservation category, the Charity Awards 2014.

Plymouth Marine Laboratory (PML) is a company limited by guarantee registered 
in England  Wales, company number 4178503. Registered Charity No. 1091222. 
Registered Office: Prospect Place, The Hoe, Plymouth  PL1 3DH, UK. 

This message is private and confidential. If you have received this message in 
error, please notify the sender and remove it from your system. You are 
reminded that e-mail communications are not secure and may contain viruses; PML 
accepts no liability for any loss or damage which may be caused by viruses.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD in ceph.conf

2015-05-07 Thread Robert LeBlanc
I have not used ceph-deploy, but it should use ceph-disk for the OSD
preparation. Ceph-disk creates GPT partitions with specific partition
UUIDs for data and journals. When udev or init starts the OSD, it mounts it
to a temp location, reads the whoami file and the journal, then remounts it
in the correct location. There is no need for fstab entries or the like.
This allows you to easily move OSD disks between servers (if you take the
journals with them). It's magic! But I think I just gave away the secret.
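
If you want to see that mapping on a running host, a rough sketch (it just
reads /proc/mounts and each OSD's whoami file; paths assume the default
layout):

```
import os

# Map mounted devices to OSD ids by reading /proc/mounts and each OSD's
# "whoami" file - roughly the information ceph-disk/udev relies on, which
# is why no fstab entries are needed.
with open("/proc/mounts") as mounts:
    for line in mounts:
        dev, mountpoint = line.split()[:2]
        if not mountpoint.startswith("/var/lib/ceph/osd/"):
            continue
        whoami = os.path.join(mountpoint, "whoami")
        if os.path.exists(whoami):
            print("%s -> osd.%s" % (dev, open(whoami).read().strip()))
```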

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On May 7, 2015 5:16 AM, Georgios Dimitrakakis gior...@acmac.uoc.gr
wrote:

 Indeed it is not necessary to have any OSD entries in the Ceph.conf file
 but what happens in the event of a disk failure resulting in changing the
 mount device?

 For what I can see is that OSDs are mounted from entries in /etc/mtab (I
 am on CentOS 6.6)
 like this:

 /dev/sdj1 /var/lib/ceph/osd/ceph-8 xfs rw,noatime,inode64 0 0
 /dev/sdh1 /var/lib/ceph/osd/ceph-6 xfs rw,noatime,inode64 0 0
 /dev/sdg1 /var/lib/ceph/osd/ceph-5 xfs rw,noatime,inode64 0 0
 /dev/sde1 /var/lib/ceph/osd/ceph-3 xfs rw,noatime,inode64 0 0
 /dev/sdi1 /var/lib/ceph/osd/ceph-7 xfs rw,noatime,inode64 0 0
 /dev/sdf1 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64 0 0
 /dev/sdd1 /var/lib/ceph/osd/ceph-2 xfs rw,noatime,inode64 0 0
 /dev/sdk1 /var/lib/ceph/osd/ceph-9 xfs rw,noatime,inode64 0 0
 /dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs rw,noatime,inode64 0 0
 /dev/sdc1 /var/lib/ceph/osd/ceph-1 xfs rw,noatime,inode64 0 0


 So in the event of a disk failure (e.g. disk SDH fails) then in the order
 the next one will take its place meaning that
 SDI will be seen as SDH upon next reboot thus it will be mounted as CEPH-6
 instead of CEPH-7 and so on...resulting in a problematic configuration (I
 guess that lots of data will be start moving around, PGs will be misplaced
 etc.)


 Correct me if I am wrong but the proper way to mount them would be by
 using the UUID of the partition.

 Is it OK if I change the entries in /etc/mtab using the UUID=xx
 instead of /dev/sdX1??

 Does CEPH try to mount them using a different config file and perhaps
 exports the entries at boot in /etc/mtab (in the latter case no
 modification in /etc/mtab will be taken into account)??

 I have deployed the Ceph cluster using only the ceph-deploy command. Is
 there a parameter that I 've missed that must be used during deployment in
 order to specify the mount points using the UUIDs instead of the device
 names?


 Regards,


 George




 On Wed, 6 May 2015 22:36:14 -0600, Robert LeBlanc wrote:

 We don't have OSD entries in our Ceph config. They are not needed if
 you don't have specific configs for different OSDs.

 Robert LeBlanc

 Sent from a mobile device please excuse any typos.
 On May 6, 2015 7:18 PM, Florent MONTHEL  wrote:

  Hi team,

 Is it necessary to indicate in ceph.conf all the OSDs that we have in the
 cluster?
 We have today rebooted a cluster (5 nodes, RHEL 6.5) and some OSDs seem
 to have changed ID, so the crush map does not match reality.
 Thanks

 FLORENT MONTHEL
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com [1]
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]



 Links:
 --
 [1] mailto:ceph-users@lists.ceph.com
 [2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 [3] mailto:florent.mont...@flox-arts.net


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD in ceph.conf

2015-05-07 Thread Georgios Dimitrakakis
Indeed it is not necessary to have any OSD entries in the ceph.conf file,
but what happens in the event of a disk failure resulting in changing
the mount device?


For what I can see is that OSDs are mounted from entries in /etc/mtab 
(I am on CentOS 6.6)

like this:

/dev/sdj1 /var/lib/ceph/osd/ceph-8 xfs rw,noatime,inode64 0 0
/dev/sdh1 /var/lib/ceph/osd/ceph-6 xfs rw,noatime,inode64 0 0
/dev/sdg1 /var/lib/ceph/osd/ceph-5 xfs rw,noatime,inode64 0 0
/dev/sde1 /var/lib/ceph/osd/ceph-3 xfs rw,noatime,inode64 0 0
/dev/sdi1 /var/lib/ceph/osd/ceph-7 xfs rw,noatime,inode64 0 0
/dev/sdf1 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64 0 0
/dev/sdd1 /var/lib/ceph/osd/ceph-2 xfs rw,noatime,inode64 0 0
/dev/sdk1 /var/lib/ceph/osd/ceph-9 xfs rw,noatime,inode64 0 0
/dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs rw,noatime,inode64 0 0
/dev/sdc1 /var/lib/ceph/osd/ceph-1 xfs rw,noatime,inode64 0 0


So in the event of a disk failure (e.g. disk SDH fails), the next one in the
order will take its place, meaning that SDI will be seen as SDH upon the next
reboot and thus will be mounted as CEPH-6 instead of CEPH-7, and so on,
resulting in a problematic configuration (I guess that lots of data will
start moving around, PGs will be misplaced etc.)



Correct me if I am wrong but the proper way to mount them would be by 
using the UUID of the partition.


Is it OK if I change the entries in /etc/mtab using the UUID=xx 
instead of /dev/sdX1??


Does CEPH try to mount them using a different config file and perhaps 
exports the entries at boot in /etc/mtab (in the latter case no 
modification in /etc/mtab will be taken into account)??


I have deployed the Ceph cluster using only the ceph-deploy command. 
Is there a parameter that I've missed that must be used during 
deployment in order to specify the mount points using the UUIDs instead 
of the device names?
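
For reference, a minimal sketch that lists which device node each filesystem
UUID currently resolves to (purely illustrative):

```
import glob
import os

# List filesystem UUIDs and the device node each one currently points to,
# using the /dev/disk/by-uuid symlinks maintained by udev.
for link in sorted(glob.glob("/dev/disk/by-uuid/*")):
    print("UUID=%s -> %s" % (os.path.basename(link), os.path.realpath(link)))
```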



Regards,


George




On Wed, 6 May 2015 22:36:14 -0600, Robert LeBlanc wrote:

We don't have OSD entries in our Ceph config. They are not needed if
you don't have specific configs for different OSDs.

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On May 6, 2015 7:18 PM, Florent MONTHEL  wrote:


Hi team,

Is it necessary to indicate in ceph.conf all the OSDs that we have in the
cluster?
We have today rebooted a cluster (5 nodes, RHEL 6.5) and some OSDs seem
to have changed ID, so the crush map does not match reality.
Thanks

FLORENT MONTHEL
___
ceph-users mailing list
ceph-users@lists.ceph.com [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]



Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3] mailto:florent.mont...@flox-arts.net


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS unexplained writes

2015-05-07 Thread Gregory Farnum
Sam? This looks to be the HashIndex::SUBDIR_ATTR, but I don't know
exactly what it's for nor why it would be getting constantly created
and removed on a pure read workload...
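
For anyone who wants to poke at it, that attribute is readable directly off the
collection subdirectories; a rough sketch (Python 3, OSD path made up):

```
import os

# Dump the user.cephos.phash.contents xattr that shows up in the strace
# below.  The OSD path is made up; point it at a filestore data directory.
top = "/var/lib/ceph/osd/ceph-10/current"
for root, dirs, files in os.walk(top):
    try:
        val = os.getxattr(root, "user.cephos.phash.contents")
    except OSError:
        continue  # this directory has no phash attribute
    print(root, val.hex())
```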

On Thu, May 7, 2015 at 2:55 PM, Erik Logtenberg e...@logtenberg.eu wrote:
 It does sound contradictory: why would read operations in cephfs result
 in writes to disk? But they do. I upgraded to Hammer last week and I am
 still seeing this.

 The setup is as follows:

 EC-pool on hdd's for data
 replicated pool on ssd's for data-cache
 replicated pool on ssd's for meta-data

 Now whenever I start doing heavy reads on cephfs, I see intense bursts
 of write operations on the hdd's. The reads I'm doing are things like
 reading a large file (streaming a video), or running a big rsync job
 with --dry-run (so it just checks meta-data). No clue why that would
 have any effect on the hdd's, but it does.

 Now, to further figure out what's going on, I tried using lsof, atop,
 iotop, but those tools don't provide the necessary information. In lsof
 I just see a whole bunch of files opened at any time, but it doesn't
 change much during these tests.
 In atop and iotop I can clearly see that the hdd's are doing a lot of
 writes when I'm reading in cephfs, but those tools can't tell me what
 those writes are.

 So I tried strace, which can trace file operations and attach to running
 processes.
 # strace -f -e trace=file -p 5076
 This gave me an idea of what was going on. 5076 is the process id of the
 osd for one of the hdd's. I saw mostly stat's and open's, but those are
 all reads, not writes. Of course btrfs can cause writes when doing reads
 (atime), but I have the osd mounted with noatime.
 The only write operations that I saw a lot of are these:

 [pid  5350]
 getxattr(/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3,
 user.cephos.phash.contents, \1Q\0\0\0\0\0\0\0\0\0\0\0\4\0\0, 1024) = 17
 [pid  5350]
 setxattr(/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3,
 user.cephos.phash.contents, \1R\0\0\0\0\0\0\0\0\0\0\0\4\0\0, 17, 0) = 0
 [pid  5350]
 removexattr(/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3,
 user.cephos.phash.contents@1) = -1 ENODATA (No data available)

 So it appears that the osd's aren't writing actual data to disk, but
 metadata in the form of xattr's. Can anyone explain what this setting
 and removing of xattr's could be for?

 Kind regards,

 Erik.


 On 03/16/2015 10:44 PM, Gregory Farnum wrote:
 The information you're giving sounds a little contradictory, but my
 guess is that you're seeing the impacts of object promotion and
 flushing. You can sample the operations the OSDs are doing at any
 given time by running ops_in_progress (or similar, I forget exact
 phrasing) command on the OSD admin socket. I'm not sure if rados df
 is going to report cache movement activity or not.

 That though would mostly be written to the SSDs, not the hard drives —
 although the hard drives could still get metadata updates written when
 objects are flushed. What data exactly are you seeing that's leading
 you to believe writes are happening against these drives? What is the
 exact CephFS and cache pool configuration?
 -Greg

 On Mon, Mar 16, 2015 at 2:36 PM, Erik Logtenberg e...@logtenberg.eu wrote:
 Hi,

 I forgot to mention: while I am seeing these writes in iotop and
 /proc/diskstats for the hdd's, I am -not- seeing any writes in rados
 df for the pool residing on these disks. There is only one pool active
 on the hdd's and according to rados df it is getting zero writes when
 I'm just reading big files from cephfs.

 So apparently the osd's are doing some non-trivial amount of writing on
 their own behalf. What could it be?

 Thanks,

 Erik.


 On 03/16/2015 10:26 PM, Erik Logtenberg wrote:
 Hi,

 I am getting relatively bad performance from cephfs. I use a replicated
 cache pool on ssd in front of an erasure coded pool on rotating media.

 When reading big files (streaming video), I see a lot of disk i/o,
 especially writes. I have no clue what could cause these writes. The
 writes are going to the hdd's and they stop when I stop reading.

 I mounted everything with noatime and nodiratime so it shouldn't be
 that. On a related note, the Cephfs metadata is stored on ssd too, so
 metadata-related changes shouldn't hit the hdd's anyway I think.

 Any thoughts? How can I get more information about what ceph is doing?
 Using iotop I only see that the osd processes are busy but it doesn't
 give many hints as to what they are doing.

 Thanks,

 Erik.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 

[ceph-users] osd does not start when object store is set to newstore

2015-05-07 Thread Srikanth Madugundi
Hi,

I built and installed ceph from source (the wip-newstore branch) and could not
start the OSD with newstore as the osd objectstore.

$ sudo /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c
/etc/ceph/ceph.conf --cluster ceph -f
2015-05-08 05:49:16.130073 7f286be01880 -1 unable to create object store
$

 ceph.conf (I have the following settings):

[global]
osd objectstore = newstore
newstore backend = rocksdb

enable experimental unrecoverable data corrupting features = newstore

The logs do not show much detail.

$ tail -f /var/log/ceph/ceph-osd.0.log
2015-05-08 00:01:54.331136 7fb00e07c880  0 ceph version  (), process
ceph-osd, pid 23514
2015-05-08 00:01:54.331202 7fb00e07c880 -1 unable to create object store

Am I missing something?

Regards
Srikanth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [cephfs][ceph-fuse] cache size or memory leak?

2015-05-07 Thread Dexter Xiong
I tried echo 3 > /proc/sys/vm/drop_caches and dentry_pinned_count dropped.

Thanks for your help.

On Thu, Apr 30, 2015 at 11:34 PM Yan, Zheng uker...@gmail.com wrote:

 On Thu, Apr 30, 2015 at 4:37 PM, Dexter Xiong dxtxi...@gmail.com wrote:
  Hi,
  I got these message when I remount:
  2015-04-30 15:47:58.199837 7f9ad30a27c0 -1 asok(0x3c83480)
  AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed
 to
  bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok':
 (17)
  File exists
  fuse: bad mount point `ceph-fuse': No such file or directory
  ceph-fuse[2576]: fuse failed to initialize
  2015-04-30 15:47:58.199980 7f9ad30a27c0 -1 init, newargv = 0x3ca9b00
  newargc=14
  2015-04-30 15:47:58.200020 7f9ad30a27c0 -1 fuse_parse_cmdline failed.
  ceph-fuse[2574]: mount failed: (22) Invalid argument.
 
  It seems that FUSE doesn't support remount? This link is google
 result.
 

 please try echo 3 > /proc/sys/vm/drop_caches. Check if the pinned
 dentries count drops after executing the command.

 Regards
 Yan, Zheng

  I am using ceph-dokan too. And I got the similar memory problem. I
 don't
  know if it is the same problem. I switched to use kernel module and
 Samba to
  replace previous solution temporarily.
  I'm trying to read and track the ceph  ceph-dokan source code to
 find
  more useful information.
 
 
  I don't know if my previous email arrived the list(Maybe the
 attachment
  is too large). Here is its content:
 
  I wrote a test case with Python:
  '''
  import os
  for i in range(200):
  dir_name = '/srv/ceph_fs/test/d%s'%i
  os.mkdir(dir_name)
  for j in range(3):
  with open('%s/%s'%(dir_name, j), 'w') as f:
  f.write('0')
  '''
 
  The output of status command after test on a fresh mount:
  {
  "metadata": {
  "ceph_sha1": "e4bfad3a3c51054df7e537a724c8d0bf9be972ff",
  "ceph_version": "ceph version 0.94.1
  (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)",
  "entity_id": "admin",
  "hostname": "local-share-server",
  "mount_point": "\/srv\/ceph_fs"
  },
  "dentry_count": 204,
  "dentry_pinned_count": 201,
  "inode_count": 802,
  "mds_epoch": 25,
  "osd_epoch": 177,
  "osd_epoch_barrier": 176
  }
  From the dump cache command output, it seems that all pinned dentries are
  directories.
 
  Attachment is a package of debug log and dump cache content.
 
 
  On Thu, Apr 30, 2015 at 2:55 PM Yan, Zheng uker...@gmail.com wrote:
 
  On Wed, Apr 29, 2015 at 4:33 PM, Dexter Xiong dxtxi...@gmail.com
 wrote:
   The output of status command of fuse daemon:
   dentry_count: 128966,
   dentry_pinned_count: 128965,
   inode_count: 409696,
   I saw the pinned dentry count is nearly the same as the dentry count.
   So I enabled the debug log (debug client = 20/20) and read the Client.cc
   source code in general. I found that an entry will not be trimmed if it is
   pinned.
   But how can I unpin dentries?
   But how can I unpin dentrys?
 
  Maybe these dentries are pinned by the fuse kernel module (ceph-fuse does
  not try trimming the kernel cache when its cache size < client_cache_size).
  Could you please run mount -o remount <mount point>, then run the status
  command again, and check if the number of pinned dentries drops.
 
  Regards
  Yan, Zheng
 
 
  
   On Wed, Apr 29, 2015 at 12:19 PM Dexter Xiong dxtxi...@gmail.com
   wrote:
  
   I tried set client cache size = 100, but it doesn't solve the
 problem.
   I tested ceph-fuse with kernel version 3.13.0-24 3.13.0-49 and
   3.16.0-34.
  
  
  
   On Tue, Apr 28, 2015 at 7:39 PM John Spray john.sp...@redhat.com
   wrote:
  
  
  
   On 28/04/2015 06:55, Dexter Xiong wrote:
Hi,
I've deployed a small hammer cluster 0.94.1. And I mount it
 via
ceph-fuse on Ubuntu 14.04. After several hours I found that the
ceph-fuse process crashed. The end is the crash log from
/var/log/ceph/ceph-client.admin.log. The memory cost of ceph-fuse
process was huge(more than 4GB) when it crashed.
Then I did some test and found these actions will increase
memory
cost of ceph-fuse rapidly and the memory cost never seem to
decrease:
   
  * rsync command to sync small files(rsync -a /mnt/some_small
/srv/ceph)
  * chown command/ chmod command(chmod 775 /srv/ceph -R)
   
But chown/chmod command on accessed files will not increase the
memory
cost.
It seems that ceph-fuse caches the file nodes but never releases
them.
I don't know if there is an option to control the cache size. I
set mds cache size = 2147483647 option to improve the performance
 of
mds, and I tried to set mds cache size = 1000 at client side but
 it
doesn't affect the result.
  
   The setting for client-side cache limit is client cache size,
   default
   is 16384
  
   What kernel version are you using on the client?  There have been
 some
   issues with cache trimming vs. fuse in recent kernels, but we
 thought
   we
   had workarounds in place...
  
   

Re: [ceph-users] Kicking 'Remapped' PGs

2015-05-07 Thread Gregory Farnum
This is pretty weird to me. Normally those PGs should be reported as
active, or stale, or something else in addition to remapped. Sam
suggests that they're probably stuck activating for some reason (which
is a state in new enough code, but not all versions), but I can't tell
or imagine why from these settings. You might have hit a bug I'm not
familiar with that will be jostled by just restarting the OSDs in
question... :/
-Greg
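
(A sketch of what restarting the OSDs in question could look like here,
taking the primaries osd.77 and osd.0 from the pg dump quoted below; adjust
the restart command to whichever init system the OSD hosts actually use.)

# see what state the PG itself reports (look at recovery_state)
ceph pg 11.e2f query
ceph pg 11.323 query

# bounce the primary OSDs of the stuck PGs
sudo restart ceph-osd id=77              # upstart
sudo /etc/init.d/ceph restart osd.0      # sysvinit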


On Tue, May 5, 2015 at 7:46 AM, Paul Evans p...@daystrom.com wrote:
  Gregory Farnum g...@gregs42.com wrote:

 Oh. That's strange; they are all mapped to two OSDs but are placed on
 two different ones. I'm...not sure why that would happen. Are these
 PGs active? What's the full output of ceph -s?


 Those 4 PG’s went inactive at some point, and we had the luxury of  time to
 understand how we arrived at this state before we truly have to fix it (but
 that time is soon).
 So...We kicked a couple of OSD’s out yesterday to let the cluster re-shuffle
 things (osd.19 and osd.34…both of which were non-primary copies of the
 ‘acting’ PG map) and now the cluster status is even more interesting, IMHO:

 ceph@nc48-n1:/ceph-deploy/nautilus$ ceph -s
 cluster 68bc69c1-1382-4c30-9bf8-480e32cc5b92
  health HEALTH_WARN 2 pgs stuck inactive; 2 pgs stuck unclean;
 nodeep-scrub flag(s) set; crush map has legacy tunables
  monmap e1: 3 mons at
 {nc48-n1=10.253.50.211:6789/0,nc48-n2=10.253.50.212:6789/0,nc48-n3=10.253.50.213:6789/0},
 election epoch 564, quorum 0,1,2 nc48-n1,nc48-n2,nc48-n3
  osdmap e80862: 94 osds: 94 up, 92 in
 flags nodeep-scrub
   pgmap v1954234: 6144 pgs, 2 pools, 35251 GB data, 4419 kobjects
 91727 GB used, 245 TB / 334 TB avail
 6140 active+clean
2 remapped
2 active+clean+scrubbing
 ceph@nc48-n1:/ceph-deploy/nautilus$ ceph pg dump_stuck
 ok
 pg_stat  objects  mip  degr  unf  bytes       log   disklog  state
 11.e2f   280      0    0     0    2339844181  3001  3001     remapped
 11.323   282      0    0     0    2357186647  3001  3001     remapped

 11.e2f: state_stamp 2015-04-23 13:18:59.299589, v 68310'51082,
 reported 80862:121916, up [77,4] (up_primary 77), acting [77,34]
 (acting_primary 77), last_scrub 68310'51082 at 2015-04-23 11:40:11.565487,
 last_deep_scrub 0'0 at 2014-10-20 13:41:46.122624

 11.323: state_stamp 2015-04-23 13:18:58.970396, v 70105'48961,
 reported 80862:126346, up [0,37] (up_primary 0), acting [0,19]
 (acting_primary 0), last_scrub 70105'48961 at 2015-04-23 11:47:02.980145,
 last_deep_scrub 8145'44375 at 2015-03-30 16:09:36.975875


 --
 Paul
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd does not start when object store is set to newstore

2015-05-07 Thread Somnath Roy
I think you need to add the following:

enable experimental unrecoverable data corrupting features = newstore rocksdb
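
For reference, the [global] section would then look roughly like this (a
sketch based on the conf quoted below; the trailing "rocksdb" token on the
experimental-features line is the only change):

[global]
osd objectstore = newstore
newstore backend = rocksdb
enable experimental unrecoverable data corrupting features = newstore rocksdb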

Thanks & Regards
Somnath


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Srikanth Madugundi
Sent: Thursday, May 07, 2015 10:56 PM
To: ceph-us...@ceph.com
Subject: [ceph-users] osd does not start when object store is set to newstore

Hi,

I built and installed Ceph from source (wip-newstore branch) and could not 
start the OSD with newstore as the osd objectstore.

$ sudo /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c 
/etc/ceph/ceph.conf --cluster ceph -f
2015-05-08 05:49:16.130073 7f286be01880 -1 unable to create object store
$

 ceph.conf (I have the following settings in ceph.conf):

[global]
osd objectstore = newstore
newstore backend = rocksdb

enable experimental unrecoverable data corrupting features = newstore

The logs do not show much detail.

$ tail -f /var/log/ceph/ceph-osd.0.log
2015-05-08 00:01:54.331136 7fb00e07c880  0 ceph version  (), process ceph-osd, 
pid 23514
2015-05-08 00:01:54.331202 7fb00e07c880 -1 unable to create object store

Am I missing something?

Regards
Srikanth




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] About Ceph Cache Tier parameters

2015-05-07 Thread Nick Fisk
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 GODIN Vincent (SILCA)
 Sent: 07 May 2015 11:13
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] About Ceph Cache Tier parameters
 
 Hi,
 
 In Cache Tier parameters, there is nothing to tell the cache to flush
 dirty objects to cold storage when the cache is under-utilized (as long
 as you're under the cache_target_dirty_ratio, it looks like dirty
 objects can be kept in the cache for years).

Yes, this is correct. I have played around with a cron job to flush the dirty
blocks when I know the cluster will be idle; this improves write performance
for the next bunch of bursty writes. I think the current cache design is more
geared to something like running VMs, where typically the same hot blocks are
written to over and over again.

My workload involves a significant number of blocks which are written once
and then never again and so flushing the cache before each job run seems to
improve performance.
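
A minimal sketch of such a job, assuming a cache pool called "cachepool" (the
name is a placeholder) and a client with admin credentials on the box running
cron; cache-try-flush-evict-all flushes dirty objects and evicts clean ones,
skipping objects that are currently in use rather than blocking on them:

# crontab entry: flush/evict the cache tier at 03:00, when the cluster is idle
0 3 * * *  rados -p cachepool cache-try-flush-evict-all > /dev/null 2>&1

Note this empties the cache, so the first reads afterwards are promotions from
the cold tier; if that hurts, temporarily lowering cache_target_dirty_ratio
with "ceph osd pool set" and raising it again achieves a gentler flush.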

 
 That is to say, the flush operations will always start during writes,
 once we have reached the cache_target_dirty_ratio value: this will slow
 down the current write IO.
 
 Are some futur requests planned to improve this behavior ?

Not that I'm currently aware of, but I did post here a couple of weeks ago
suggesting that maybe having high and low watermarks for the cache flushing
might improve performance. At the low watermark, cache would be flushed with
a low/idle priority (much like scrub options) and at the high watermark the
current flushing behaviour would start. I didn't get any response, so I
think this idea may have hit a bit of a dead end. I did start looking
through the Ceph source to see if it was something I could try doing
myself, but I haven't found enough time to get my head round it.

 
 Thanks for your response
 
 Vince
 
 
 






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Change pool id

2015-05-07 Thread Tuomas Juntunen
Hi

 

Just wanted to mention this again, if it went unnoticed.

 

The problem is that I need to get the same ID for a pool as it was before, or a
way to tell Ceph where to find the original images for the VMs. I have them
available.

 

T

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Tuomas Juntunen
Sent: 5. toukokuuta 2015 16:24
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Change pool id

 

Hi

 

Previously I had to delete one pool because of a mishap I did. Now I need to
create the pool again and give it the same id. How would one do that? 

 

I assume my root problem is that, since I had to delete the images pool, the
base images the VMs use are missing. I have the images available in the images
pool. Would changing the id of the pool fix this (images is now id 18,
should be id 4)?

 

Below is the result of 'rbd ls vms -l' which shows that the file obviously
is missing.

 

2015-05-05 16:19:12.634163 7fe7a9b22840 -1 librbd: error looking up name for
pool id 4: (2) No such file or directory

2015-05-05 16:19:12.634194 7fe7a9b22840 -1 librbd: error opening parent
snapshot: (2) No such file or directory

 

Br,

T

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] unable to start monitor

2015-05-07 Thread Srikanth Madugundi
Hi,

I am setting up a local instance of a Ceph cluster with the latest source from
GitHub. The build succeeded and the installation was successful, but I could
not start the monitor.

The ceph start command returns immediately and does not output anything.

$ sudo /etc/init.d/ceph start mon.monitor1

$
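
A way to surface the real error is to run the monitor in the foreground,
bypassing the init script (a sketch; "monitor1" is the id from the conf below):

sudo ceph-mon -i monitor1 -d                               # foreground, log to stderr
sudo ceph-mon -i monitor1 -f --debug_mon 20 --debug_ms 1   # more verbose variant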

$ ls -l /var/lib/ceph/mon/ceph-monitor1/

total 8

-rw-r--r-- 1 root root    0 May  7 20:27 done

-rw-r--r-- 1 root root   77 May  7 19:12 keyring

drwxr-xr-x 2 root root 4096 May  7 19:12 store.db

-rw-r--r-- 1 root root    0 May  7 20:26 sysvinit

-rw-r--r-- 1 root root    0 May  7 20:09 upstart



The log file does not seem to have any details either.


$ cat /var/log/ceph/ceph-mon.monitor1.log

2015-05-07 19:12:13.356389 7f3f06bdb880 -1 did not load config file, using
default settings.


$ cat /etc/ceph/ceph.conf

[global]

mon host = 15.43.33.21

fsid = 92f859df-8b27-466a-8d44-01af2b7ea7e6

mon initial members = monitor1


# Enable authentication

auth cluster required = cephx

auth service required = cephx

auth client required = cephx


# POOL / PG / CRUSH

osd pool default size = 3  # Write an object 3 times

osd pool default min size = 1 # Allow writing one copy in a degraded state


# Ensure you have a realistic number of placement groups. We recommend

# approximately 200 per OSD. E.g., total number of OSDs multiplied by 200

# divided by the number of replicas (i.e., osd pool default size).

# !! BE CAREFUL !!

# You probably should never rely on the default numbers when creating a pool!

osd pool default pg num = 32

osd pool default pgp num = 32


#log file = /home/y/logs/ceph/$cluster-$type.$id.log


# Logging

debug paxos = 0

debug throttle = 0


keyring = /etc/ceph/ceph.client.admin.keyring

#run dir = /home/y/var/run/ceph


[mon]

debug mon = 10

debug ms = 1

# We found that when disk usage reaches 94%, the disk cannot be written

# to at all (no free space), so we lower the full ratio; we should start

# data migration before it becomes full

mon osd full ratio = 0.9

#mon data = /home/y/var/lib/ceph/mon/$cluster-$id

mon osd down out interval = 172800 # 2 * 24 * 60 * 60 seconds

# Ceph monitors need to be told how many reporters must be seen from different

# OSDs before an OSD can be marked offline; this should be greater than the

# number of OSDs per OSD host

mon osd min down reporters = 12

#keyring = /home/y/conf/ceph/ceph.mon.keyring
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW - Can't download complete object

2015-05-07 Thread Sean
I have another thread going on about truncation of objects, and I believe 
this is a separate but equally bad issue in civetweb/radosgw. My cluster 
is completely healthy.


I have one (possibly more) object stored in the Ceph rados gateway that 
will return a different size every time I try to download it:


http://pastebin.com/hK1iqXZH --- ceph -s
http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object
http://pastebin.com/5TnvgMrX --- python download code

The weird part is that every time I download the file it is a different 
size. I am grabbing the individual objects of the 14G file and will 
update this email once I have them all statted out. Currently I am 
getting, on average, 1.5GB to 2GB files when the total object should be 
14G in size.


lacadmin@kh10-9:~$ python corruptpull.py
the download failed. The filesize = 2125988202. The actual size is 
14577056082. Attempts = 1
the download failed. The filesize = 2071462250. The actual size is 
14577056082. Attempts = 2
the download failed. The filesize = 2016936298. The actual size is 
14577056082. Attempts = 3
the download failed. The filesize = 1643643242. The actual size is 
14577056082. Attempts = 4
the download failed. The filesize = 1597505898. The actual size is 
14577056082. Attempts = 5
the download failed. The filesize = 2075656554. The actual size is 
14577056082. Attempts = 6
the download failed. The filesize = 650117482. The actual size is 
14577056082. Attempts = 7
the download failed. The filesize = 1987576170. The actual size is 
14577056082. Attempts = 8
the download failed. The filesize = 2109210986. The actual size is 
14577056082. Attempts = 9
the download failed. The filesize = 2142765418. The actual size is 
14577056082. Attempts = 10
the download failed. The filesize = 2134376810. The actual size is 
14577056082. Attempts = 11
the download failed. The filesize = 2146959722. The actual size is 
14577056082. Attempts = 12
the download failed. The filesize = 2142765418. The actual size is 
14577056082. Attempts = 13
the download failed. The filesize = 1467482474. The actual size is 
14577056082. Attempts = 14
the download failed. The filesize = 2046296426. The actual size is 
14577056082. Attempts = 15
the download failed. The filesize = 2021130602. The actual size is 
14577056082. Attempts = 16
the download failed. The filesize = 177366. The actual size is 
14577056082. Attempts = 17
the download failed. The filesize = 2146959722. The actual size is 
14577056082. Attempts = 18
the download failed. The filesize = 2016936298. The actual size is 
14577056082. Attempts = 19
the download failed. The filesize = 1983381866. The actual size is 
14577056082. Attempts = 20
the download failed. The filesize = 2134376810. The actual size is 
14577056082. Attempts = 21


Notice it is always different. Once the rados -p .rgw.buckets ls | grep 
finishes I will return the listing of objects as well but this is quite 
odd and I think this is a separate issue.
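
A sketch of that check: sum the sizes of the RADOS shards behind the S3
object and compare the total with the 14577056082 bytes reported by
radosgw-admin object stat (PREFIX stands for the object's marker/prefix from
that stat output):

rados -p .rgw.buckets ls | grep PREFIX > shards.txt
while read obj; do
    rados -p .rgw.buckets stat "$obj"
done < shards.txt | awk '{sum += $NF} END {print sum, "bytes in", NR, "shards"}'

If the shards add up to the full 14G, the data is intact in RADOS and the
truncation is happening on the radosgw/civetweb side.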


Has anyone seen this before? Why wouldn't radosgw return an error and 
why am I getting different file sizes?


I would post the log from radosgw but I don't see any err|wrn|fatal 
mentions in the log and the client completes without issue every time.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS unexplained writes

2015-05-07 Thread Erik Logtenberg
It does sound contradictory: why would read operations in cephfs result
in writes to disk? But they do. I upgraded to Hammer last week and I am
still seeing this.

The setup is as follows:

EC-pool on hdd's for data
replicated pool on ssd's for data-cache
replicated pool on ssd's for meta-data

Now whenever I start doing heavy reads on cephfs, I see intense bursts
of write operations on the hdd's. The reads I'm doing are things like
reading a large file (streaming a video), or running a big rsync job
with --dry-run (so it just checks meta-data). No clue why that would
have any effect on the hdd's, but it does.

Now, to further figure out what's going on, I tried using lsof, atop,
iotop, but those tools don't provide the necessary information. In lsof
I just see a whole bunch of files opened at any time, but it doesn't
change much during these tests.
In atop and iotop I can clearly see that the hdd's are doing a lot of
writes when I'm reading in cephfs, but those tools can't tell me what
those writes are.

So I tried strace, which can trace file operations and attach to running
processes.
# strace -f -e trace=file -p 5076
This gave me an idea of what was going on. 5076 is the process id of the
osd for one of the hdd's. I saw mostly stat's and open's, but those are
all reads, not writes. Of course btrfs can cause writes when doing reads
(atime), but I have the osd mounted with noatime.
The only write operations that I saw a lot of are these:

[pid  5350]
getxattr(/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3,
user.cephos.phash.contents, \1Q\0\0\0\0\0\0\0\0\0\0\0\4\0\0, 1024) = 17
[pid  5350]
setxattr(/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3,
user.cephos.phash.contents, \1R\0\0\0\0\0\0\0\0\0\0\0\4\0\0, 17, 0) = 0
[pid  5350]
removexattr(/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3,
user.cephos.phash.contents@1) = -1 ENODATA (No data available)

So it appears that the osd's aren't writing actual data to disk, but
metadata in the form of xattr's. Can anyone explain what this setting
and removing of xattr's could be for?
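
For reference, a sketch of two further probes (osd.10 and pid 5076 are the
ones from the strace above): restrict strace to the mutating syscalls, and
ask the OSD itself what it is busy with via its admin socket -- the latter is
presumably the "ops_in_progress" command Greg mentions further down, which is
spelled dump_ops_in_flight:

# watch only the metadata-mutating calls on the OSD process
strace -f -e trace=write,pwrite64,setxattr,fsetxattr,removexattr -p 5076

# list the operations currently in flight on the OSD
ceph daemon osd.10 dump_ops_in_flight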

Kind regards,

Erik.


On 03/16/2015 10:44 PM, Gregory Farnum wrote:
 The information you're giving sounds a little contradictory, but my
 guess is that you're seeing the impacts of object promotion and
 flushing. You can sample the operations the OSDs are doing at any
 given time by running ops_in_progress (or similar, I forget exact
 phrasing) command on the OSD admin socket. I'm not sure if rados df
 is going to report cache movement activity or not.
 
 That though would mostly be written to the SSDs, not the hard drives —
 although the hard drives could still get metadata updates written when
 objects are flushed. What data exactly are you seeing that's leading
 you to believe writes are happening against these drives? What is the
 exact CephFS and cache pool configuration?
 -Greg
 
 On Mon, Mar 16, 2015 at 2:36 PM, Erik Logtenberg e...@logtenberg.eu wrote:
 Hi,

 I forgot to mention: while I am seeing these writes in iotop and
 /proc/diskstats for the hdd's, I am -not- seeing any writes in rados
 df for the pool residing on these disks. There is only one pool active
 on the hdd's and according to rados df it is getting zero writes when
 I'm just reading big files from cephfs.

 So apparently the osd's are doing some non-trivial amount of writing on
 their own behalf. What could it be?

 Thanks,

 Erik.


 On 03/16/2015 10:26 PM, Erik Logtenberg wrote:
 Hi,

 I am getting relatively bad performance from cephfs. I use a replicated
 cache pool on ssd in front of an erasure coded pool on rotating media.

 When reading big files (streaming video), I see a lot of disk i/o,
 especially writes. I have no clue what could cause these writes. The
 writes are going to the hdd's and they stop when I stop reading.

 I mounted everything with noatime and nodiratime so it shouldn't be
 that. On a related note, the Cephfs metadata is stored on ssd too, so
 metadata-related changes shouldn't hit the hdd's anyway I think.

 Any thoughts? How can I get more information about what ceph is doing?
 Using iotop I only see that the osd processes are busy but it doesn't
 give many hints as to what they are doing.

 Thanks,

 Erik.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] export-diff exported only 4kb instead of 200-600gb

2015-05-07 Thread Ultral
Hi all,


Something strange occurred.
I have Ceph version 0.87 and a 2048GB format 1 image. I decided to make
incremental backups between the clusters.

I've made an initial copy:

time bbcp -x 7M -P 3600 -w 32M -s 6 -Z 5030:5035 -N io rbd
export-diff --cluster cluster1 --pool RBD-01 --image
CEPH_006__01__NA__0003__ESX__ALL_EXT --snap move2db24-20150428 -
1.1.1.1:rbd import-diff - --cluster cluster2 --pool
TST-INT-SD-RBD-1DC --image temp

and decided to move an incremental (it should be about 200-600GB of changes):

time bbcp -c -x 7M -P 3600 -w 32M -s 6 -Z 5030:5035 -N io rbd
--cluster cluster1 --pool RBD-01 --image
CEPH_006__01__NA__0003__ESX__ALL_EXT --from-snap
move2db24-20150428 --snap 2015-05-05 - 1.1.1.1:rbd import-diff -
--cluster cluster2 --pool TST-INT-SD-RBD-1DC --image temp

It took about 30 min (which seemed too fast, because I have a 7M limit between
the clusters), so I decided to check how much data was transferred:

time rbd export-diff --cluster cluster1 --pool RBD-01 --image
CEPH_006__01__NA__0003__ESX__ALL_EXT --from-snap
move2db24-20150428 --snap 2015-05-05 -|wc -c
4753
Exporting image: 100% complete...done.


I've double-checked it: it was really 4753 bytes, so I decided to check the
export-diff file:

000: 7262 6420 6469  2076 310a 6612   rbd diff v1.f...
010: 006d 6f76 6532 6462 3234 2d32 3031 3530  .move2db24-20150
020: 3432 3874 0a00  3230 3135 2d30 352d  428t2015-05-
030: 3035 7300   0200 0077 0080 5501  05sw..U.
040:   0002    02ef cdab  
050: 0080 3500   0d00     ..5.
060: 2d58 3aff 5002  bc56 2255 08fc 14a9  -X:.PVU
070: e6c0 e839 351a 942c 01de 4603 0e00   ...95..,..F.
080: 3a00         :...
090:          
0a0:          

..
0001270:          
0001280:          
0001290: 65   e

It looks like the correct format.

I've made a clone (like a flex clone) from snapshot 2015-05-05 and found that
it doesn't have the changes from snap move2db24-20150428.

 Do you have any ideas what I should check? Why did this happen?
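
One sanity check (a sketch, run against cluster1) is to ask rbd itself how
much changed between the two snapshots, without export-diff in the path, and
to confirm both snapshots really exist on the source image:

rbd diff --cluster cluster1 --pool RBD-01 \
    --image CEPH_006__01__NA__0003__ESX__ALL_EXT \
    --from-snap move2db24-20150428 --snap 2015-05-05 | head

rbd snap ls --cluster cluster1 --pool RBD-01 \
    --image CEPH_006__01__NA__0003__ESX__ALL_EXT

If rbd diff also reports next to nothing, the problem is in the snapshots
themselves (for example, the writes landed after 2015-05-05 was taken) rather
than in export-diff.
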
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] wrong diff-export format description

2015-05-07 Thread Jason Dillaman
You are correct -- it is little endian like the other values. I'll open a 
ticket to correct the document. 

-- 

Jason Dillaman 
Red Hat 
dilla...@redhat.com 
http://www.redhat.com 


- Original Message - 
From: Ultral ultral...@gmail.com 
To: ceph-us...@ceph.com 
Sent: Thursday, May 7, 2015 5:23:12 AM 
Subject: [ceph-users] wrong diff-export format description 

Hi all, 

It looks like a bit wrong description 

http://ceph.com/docs/master/dev/rbd-diff/ 


* u8: ‘s’ 
* u64: (ending) image size 
I suppose that instead of u64 should be used something like le64, isn't it? 
Because of from this description is not clear which bytes order should I use.. 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to backup hundreds or thousands of TB

2015-05-07 Thread Wido den Hollander
On 05/07/2015 12:10 PM, John Spray wrote:
 On 06/05/2015 16:58, Scottix wrote:
 As a point to
 * someone accidentally removed a thing, and now they need a thing back

 I thought MooseFS has an interesting feature that I thought would be
 good for CephFS and maybe others.

 Basically a timed Trashbin
 Deleted files are retained for a configurable period of time (a file
 system level trash bin)

 It's an idea to cover this use case.
 
 Until recently we had a bug where deleted files weren't purged until the
 next MDS restart, so maybe we should just back out the fix for that :-D
 
 Seriously though, I didn't know about that MooseFS feature, it's
 interesting that they decided to implement that.  It would be fairly
 straightforward to do that in CephFS (we already put deleted files into
 a 'stray' directory before purging them asynchronously), but I think
 there might be some debate about whether it's really the role of the
 underlying filesystem to do that kind of thing.
 

Aren't snapshots something that should protect you against removal? IF
snapshots work properly in CephFS you could create a snapshot every hour.

With the recursive statistics [0] of CephFS you could easily back up
all your data to a different Ceph system or anything not Ceph.

I've done this with a ~700TB CephFS cluster and that is still working
properly.
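
Roughly what that looks like in practice, assuming snapshots are enabled on
the filesystem and it is mounted at /mnt/cephfs (paths are illustrative):

# hourly snapshot: snapshots are created by mkdir inside the special .snap dir
mkdir /mnt/cephfs/.snap/hourly-$(date +%Y%m%d-%H)

# recursive statistics: per-directory totals exposed as virtual xattrs
getfattr -n ceph.dir.rbytes /mnt/cephfs/some/dir
getfattr -n ceph.dir.rctime /mnt/cephfs/some/dir   # newest change time below this dir

ceph.dir.rctime in particular makes incremental backups cheap: you only have
to descend into directories whose rctime is newer than your last backup run.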

Wido

[0]:
http://blog.widodh.nl/2015/04/playing-with-cephfs-recursive-statistics/

 John
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] About Ceph Cache Tier parameters

2015-05-07 Thread GODIN Vincent (SILCA)
Hi,


In Cache Tier parameters, there is nothing to tell the cache to flush dirty 
objects to cold storage when the cache is under-utilized (as long as you're 
under the cache_target_dirty_ratio, it looks like dirty objects can be 
kept in the cache for years).



That is to say, the flush operations will always start during writes, once 
we have reached the cache_target_dirty_ratio value: this will slow down 
the current write IO.



Are any future changes planned to improve this behavior?



Thanks for your response



Vince


___

Ce message et toutes les pièces jointes (ci-après le Message) sont 
strictement confidentiels et sont établis à l'attention exclusive de ses 
destinataires.

Si vous recevez ce message par erreur, merci de le détruire et d'en avertir 
immédiatement l'expéditeur par e-mail. 

Toute utilisation de ce message non conforme à sa destination, toute 
modification, édition, ou diffusion totale ou partielle non autorisée est 
interdite. SILCA  décline toute responsabilité au titre de ce Message s'il a 
été altéré, déformé, falsifié ou encore édité ou diffusé sans autorisation.



This  mail message and attachments (the Message) are confidential and solely 
intended for the addressees.

If you receive this message in error, please delete it and immediately notify 
the sender by e-mail.

Any  use  other than its intended purpose, review, retransmission, 
dissemination, either whole or partial is prohibited except  if  formal 
approval  is granted.

SILCA shall not be liable for the Message if altered, changed, falsified , 
retransmeted or disseminated.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Btrfs defragmentation

2015-05-07 Thread Lionel Bouton
Hi,

On 05/07/15 12:30, Burkhard Linke wrote:
 [...]
 Part of the OSD boot up process is also the handling of existing
 snapshots and journal replay. I've also had several btrfs based OSDs
 that took up to 20-30 minutes to start, especially after a crash.
 During journal replay the OSD daemon creates a number of new snapshot
 for its operations (newly created snap_XYZ directories that vanish
 after a short time). This snapshotting probably also adds overhead to
 the OSD startup time.
 I have disabled snapshots in my setup now, since the stock ubuntu
 trusty kernel had some stability problems with btrfs.

 I also had to establish cron jobs for rebalancing the btrfs
 partitions. It compacts the extents and may reduce the total amount of
 space taken.

I'm not sure what you mean by compacting extents. I'm sure balance
doesn't defragment or compress files. It moves extents, and before 3.14,
according to the Btrfs wiki, it was used to reclaim allocated but unused
space.
This shouldn't affect performance and with modern kernels may not be
needed to reclaim unused space anymore.

 Unfortunately this procedure is not a default in most distributions (it
 definitely should be!). The problems associated with unbalanced
 extents should have been solved in kernel 3.18, but I didn't had the
 time to check it yet.

I don't have any btrfs filesystem running on 3.17 or earlier version
anymore (with a notable exception, see below) so I can't comment. I have
old btrfs filesystems that were created on 3.14 and are now on 3.18.x or
3.19.x (by the way avoid 3.18.9 to 3.19.4 if you can have any sort of
power failure, there's a possibility of a mount deadlock which requires
btrfs-zero-log to solve...). btrfs fi usage doesn't show anything
suspicious on these old fs.
I have a Jolla Phone which comes with a btrfs filesystem and uses an old
heavily patched 3.4 kernel. It didn't have any problem yet but I don't
stuff it with data (I've seen discussions about triggering a balance
before a SailfishOS upgrade).
I assume that you shouldn't have any problem with filesystems that
aren't heavily used which should be the case with Ceph OSD (for example
our current alert level is at 75% space usage).


 As a side note: I had several OSDs with dangling snapshots (more than
 the two usually handled by the OSD). They are probably due to crashed
 OSD daemons. You have to remove them manually, otherwise they start to
 consume disk space.

Thanks a lot, I didn't think it could happen. I'll configure an alert
for this case.
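
A sketch of such a check -- the OSD normally keeps only a couple of current
snap_<seq> subvolumes per data dir, so anything beyond that is suspect; paths
assume the usual /var/lib/ceph/osd/ceph-<id> layout:

# warn if an OSD data dir carries more than the two expected snap_* subvolumes
for d in /var/lib/ceph/osd/ceph-*; do
    n=$(btrfs subvolume list "$d" | grep -c 'snap_')
    [ "$n" -gt 2 ] && echo "WARNING: $d has $n snap_* subvolumes"
done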

Best regards,

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com