Re: [ceph-users] RBD Snapshot

2019-04-07 Thread Jean-Charles Lopez
Hi

Have you tried the rbd du command?

Info simply tells you the provisioned size, not the space actually consumed. 
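For example, something like this (hypothetical pool/image names; rbd du reports 
provisioned versus actually used space for the image and each of its snapshots, 
plus a total):

$ rbd du rbd/vm-disk
$ rbd du rbd/vm-disk@snap1

Note that if the object-map/fast-diff features are not enabled on the image, rbd 
du has to scan every object, so it can take a while on large images.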

Regards

JC

While moving. Excuse unintended typos.

> On Apr 7, 2019, at 13:21, Spencer Babcock  wrote:
> 
> Hello,
>  
> I currently have an issue regarding RBD snapshot.
>  
> The issue:
> I can fill a VM’s disk with dd, perform an rbd snapshot, delete the dd file, 
> and repeat until the pool is full. All while RBD info shows the virtual size 
> of the disk specified at creation. The commands I am aware of do not 
> accurately reflect the volume’s total storage consumption.
>  
> How can I get a better picture of a volume’s storage footprint? “#rados df” , 
>  “#rbd diff pool/image@snap | awk '{ SUM += $2 } END { print SUM/1024/1024 " 
> MB" }'”, “#rbd snap ls”, and “rbd export/qemu-img info” aren’t functional 
> enough to show the storage consumed by differentials for an image.
>  
> Thanks!
> 
> 
> 
> Spencer Babcock
> Phone:
> Email: spencer.babc...@enseva.com
> Website: www.enseva.com
> Enseva on Facebook | Enseva on Twitter | Enseva on LinkedIn
> 
> NOTICE: This message is covered by the Electronic Communications Privacy Act, 
> Title 18, United States Code, Sections 2510-2521. This e-mail and any 
> attached files are the exclusive property of Enseva LLC, are deemed 
> privileged and confidential, and are intended solely for the use of the 
> individual(s) or entity to whom this e-mail is addressed. If you are not one 
> of the named recipient(s) or believe that you have received this message in 
> error, please delete this e-mail and any attachments and notify the sender 
> immediately. Any other use, re-creation, dissemination, forwarding or copying 
> of this e-mail is strictly prohibited and may be unlawful.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Doubts about parameter "osd sleep recovery"

2019-02-18 Thread Jean-Charles Lopez
Hi Fabio,

have a look here: 
https://github.com/ceph/ceph/blob/luminous/src/common/options.cc#L2355 


It’s designed to relieve the pressure generated by recovery and backfill on 
both the drives and the network: it slows down these activities by 
introducing a sleep after each of these respective ops.
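As a sketch, it can be changed at runtime with something like this (the value is 
just an illustration; on Luminous there are also per-device-class variants such 
as osd_recovery_sleep_hdd and osd_recovery_sleep_ssd):

$ ceph tell osd.* injectargs '--osd_recovery_sleep 0.1'

and made persistent by adding osd recovery sleep = 0.1 under [osd] in ceph.conf.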

Regards
JC

> On Feb 18, 2019, at 09:28, Fabio Abreu  wrote:
> 
> Hi Everybody !
> 
> I am configuring my cluster to receive new disks and PGs, and after setting the 
> main standard configuration I looked at the parameter "osd 
> sleep recovery" to implement in the production environment, but I only found 
> sparse documentation about this config. 
> 
> Does someone have experience with this parameter? 
> 
> The only discussion on the internet about this:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025574.html 
> 
> 
> My main configuration to receive new OSDs in a Jewel 10.2.7 cluster: 
> 
> Before including new nodes: 
> $ ceph tell osd.* injectargs '--osd-max-backfills 2'
> $ ceph tell osd.* injectargs '--osd-recovery-threads 1'
> $ ceph tell osd.* injectargs '--osd-recovery-op-priority 2'
> $ ceph tell osd.* injectargs '--osd-client-op-priority 63'
> $ ceph tell osd.* injectargs '--osd-recovery-max-active 2'
> 
> After including new nodes: 
> $ ceph tell osd.* injectargs '--osd-max-backfills 1'
> $ ceph tell osd.* injectargs '--osd-recovery-threads 1'
> $ ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
> $ ceph tell osd.* injectargs '--osd-client-op-priority 63'
> $ ceph tell osd.* injectargs '--osd-recovery-max-active 1'
> 
> 
> Regards, 
> 
> Fabio Abreu Reis
> http://fajlinux.com.br 
> Tel : +55 21 98244-0161
> Skype : fabioabreureis
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

2019-01-29 Thread Jean-Charles Lopez
Hi,

I suspect your generated CRUSH rule is incorrect because of 
osd_crush_chooseleaf_type=2: by default, chassis buckets are not created.

Changing the bucket type to host (osd_crush_chooseleaf_type=1, which is the 
default when using old ceph-deploy or ceph-ansible) for your deployment should 
fix the problem.

Could you show the output of ceph osd crush rule dump so we can verify how the 
rule was built?
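If the rule indeed uses chassis as the failure domain, a possible fix on Mimic (a 
sketch, with a hypothetical rule name; substitute your real pool names) is to 
create a host-based replicated rule and switch the pools to it:

# ceph osd crush rule create-replicated replicated_host default host
# ceph osd pool set <pool> crush_rule replicated_host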

JC

> On Jan 29, 2019, at 10:08, PHARABOT Vincent  wrote:
> 
> Hello,
>  
> I have a bright new cluster with 2 pools, but cluster keeps pgs in inactive 
> state.
> I have 3 OSDs and 1 Mon… all seems ok except I could not have pgs in 
> clean+active state !
>  
> I might be missing something obvious but I really don’t know what… Could 
> someone help me?
> I tried to seek answers among the list mail threads, but no luck; the other 
> situations seem different
>  
> Thank you a lot for your help
>  
> Vincent
>  
> # ceph -v
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
>  
> # ceph -s
> cluster:
> id: ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
> health: HEALTH_WARN
> Reduced data availability: 200 pgs inactive
>  
> services:
> mon: 1 daemons, quorum ip-10-8-66-123.eu-west-2.compute.internal
> mgr: ip-10-8-66-123.eu-west-2.compute.internal(active)
> osd: 3 osds: 3 up, 3 in
>  
> data:
> pools: 2 pools, 200 pgs
> objects: 0 objects, 0 B
> usage: 3.0 GiB used, 2.9 TiB / 2.9 TiB avail
> pgs: 100.000% pgs unknown
> 200 unknown
>  
> # ceph osd tree -f json-pretty
>  
> {
> "nodes": [
> {
> "id": -1,
> "name": "default",
> "type": "root",
> "type_id": 10,
> "children": [
> -3,
> -5,
> -7
> ]
> },
> {
> "id": -7,
> "name": "ip-10-8-10-108",
> "type": "host",
> "type_id": 1,
> "pool_weights": {},
> "children": [
> 2
> ]
> },
> {
> "id": 2,
> "device_class": "hdd",
> "name": "osd.2",
> "type": "osd",
> "type_id": 0,
> "crush_weight": 0.976593,
> "depth": 2,
> "pool_weights": {},
> "exists": 1,
> "status": "up",
> "reweight": 1.00,
> "primary_affinity": 1.00
> },
> {
> "id": -5,
> "name": "ip-10-8-22-148",
> "type": "host",
> "type_id": 1,
> "pool_weights": {},
> "children": [
> 1
> ]
> },
> {
> "id": 1,
> "device_class": "hdd",
> "name": "osd.1",
> "type": "osd",
> "type_id": 0,
> "crush_weight": 0.976593,
> "depth": 2,
> "pool_weights": {},
> "exists": 1,
> "status": "up",
> "reweight": 1.00,
> "primary_affinity": 1.00
> },
> {
> "id": -3,
> "name": "ip-10-8-5-246",
> "type": "host",
> "type_id": 1,
> "pool_weights": {},
> "children": [
> 0
> ]
> },
> {
> "id": 0,
> "device_class": "hdd",
> "name": "osd.0",
> "type": "osd",
> "type_id": 0,
> "crush_weight": 0.976593,
>"depth": 2,
> "pool_weights": {},
> "exists": 1,
> "status": "up",
> "reweight": 1.00,
> "primary_affinity": 1.00
> }
> ],
> "stray": []
> }
>  
> # cat /etc/ceph/ceph.conf
> [global]
> fsid = ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
> mon initial members = ip-10-8-66-123
> mon host = 10.8.66.123
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> pid file = /var/run/$cluster/$type.pid
>  
>  
> #Choose reasonable numbers for number of replicas and placement groups.
> osd pool default size = 3 # Write an object 3 times
> osd pool default min size = 2 # Allow writing 2 copy in a degraded state
> osd pool default pg num = 100
> osd pool default pgp num = 100
>  
> #Choose a reasonable crush leaf type
> #0 for a 1-node cluster.
> #1 for a multi node cluster in a single rack
> #2 for a multi node, multi chassis cluster with multiple hosts in a chassis
> #3 for a multi node cluster with hosts across racks, etc.
> osd crush chooseleaf type = 2
>  
> [mon]
> debug mon = 20
>  
> # ceph health detail
> HEALTH_WARN Reduced data availability: 200 pgs inactive
> PG_AVAILABILITY Reduced data availability: 200 pgs inactive
> pg 1.46 is stuck inactive for 10848.068201, current state unknown, last 
> acting []
> pg 1.47 is stuck inactive for 10848.068

Re: [ceph-users] read performance, separate client CRUSH maps or limit osd read access from each client

2018-11-13 Thread Jean-Charles Lopez
Hi Vlad,

No need for a specific CRUSH map configuration. I’d suggest you use the 
primary-affinity setting on the OSDs so that only the OSDs that are close to 
your read point are selected as primary.

See https://ceph.com/geen-categorie/ceph-primary-affinity/ for information

Just set the primary affinity of all the OSDs in building 2 to 0.

Only the OSDs in building 1 should then be used as primary OSDs.
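Something like this, as a sketch (osd.10 is just an example ID; repeat for every 
OSD in building 2, and on pre-Luminous releases you may also need mon osd allow 
primary affinity = true on the MONs):

$ ceph osd primary-affinity osd.10 0

On recent releases ceph osd tree will then show the lowered primary affinity so 
you can verify the change.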

BR
JC

> On Nov 13, 2018, at 12:19, Vlad Kopylov  wrote:
> 
> Or is it possible to mount one OSD directly for read file access?
> 
> v
> 
> On Sun, Nov 11, 2018 at 1:47 PM Vlad Kopylov  > wrote:
> Maybe it is possible if done via gateway-nfs export?
> Settings for gateway allow read osd selection?
> 
> v
> 
> On Sun, Nov 11, 2018 at 1:01 AM Martin Verges  > wrote:
> Hello Vlad,
> 
> If you want to read from the same data, then it is not possible (as far as I 
> know).
> 
> --
> Martin Verges
> Managing director
> 
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io 
> Chat: https://t.me/MartinVerges 
> 
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> 
> Web: https://croit.io 
> YouTube: https://goo.gl/PGE1Bx 
> Am Sa., 10. Nov. 2018, 03:47 hat Vlad Kopylov  > geschrieben:
> Maybe I missed something, but CephFS explicitly selects the pools used for files 
> and metadata, like I did below.
> So if I create new pools, the data in them will be different. If I apply the 
> rule dc1_primary to the cfs_data pool, and a client from dc3 connects to fs t01, 
> it will start using dc1 hosts
> 
> 
> ceph osd pool create cfs_data 100
> ceph osd pool create cfs_meta 100
> ceph fs new t01 cfs_data cfs_meta
> sudo mount -t ceph ceph1:6789:/ /mnt/t01 -o 
> name=admin,secretfile=/home/mciadmin/admin.secret
> 
> rule dc1_primary {
> id 1
> type replicated
> min_size 1
> max_size 10
> step take dc1
> step chooseleaf firstn 1 type host
> step emit
> step take dc2
> step chooseleaf firstn -2 type host
> step emit
> step take dc3
> step chooseleaf firstn -2 type host
> step emit
> }
> 
> On Fri, Nov 9, 2018 at 9:32 PM Vlad Kopylov  > wrote:
> Just to confirm - it will still populate 3 copies in each datacenter?
> I thought this map was to select where to write to; I guess it does the write 
> replication on the back end.
> 
> I thought pools were completely separate and clients would not see each other's 
> data?
> 
> Thank you Martin!
> 
> 
> 
> 
> On Fri, Nov 9, 2018 at 2:10 PM Martin Verges  > wrote:
> Hello Vlad,
> 
> you can generate something like this:
> 
> rule dc1_primary_dc2_secondary {
> id 1
> type replicated
> min_size 1
> max_size 10
> step take dc1
> step chooseleaf firstn 1 type host
> step emit
> step take dc2
> step chooseleaf firstn 1 type host
> step emit
> step take dc3
> step chooseleaf firstn -2 type host
> step emit
> }
> 
> rule dc2_primary_dc1_secondary {
> id 2
> type replicated
> min_size 1
> max_size 10
> step take dc1
> step chooseleaf firstn 1 type host
> step emit
> step take dc2
> step chooseleaf firstn 1 type host
> step emit
> step take dc3
> step chooseleaf firstn -2 type host
> step emit
> }
> 
> After you added such crush rules, you can configure the pools:
> 
> ~ $ ceph osd pool set <pool> crush_ruleset 1
> ~ $ ceph osd pool set <pool> crush_ruleset 2
> 
> Now you place your workload from dc1 to the dc1 pool, and workload
> from dc2 to the dc2 pool. You could also use HDD with SSD journal (if
> your workload issn't that write intensive) and save some money in dc3
> as your client would always read from a SSD and write to Hybrid.
> 
> Btw. all this could be done with a few simple clicks through our web
> frontend. Even if you want to export it via CephFS / NFS / .. it is
> possible to set it on a per folder level. Feel free to take a look at
> https://www.youtube.com/watch?v=V33f7ipw9d4 
>  to see how easy it could
> be.
> 
> --
> Martin Verges
> Managing director
> 
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io 
> Chat: https://t.me/MartinVerges 
> 
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> 
> Web: https://croit.io 
> YouTube: https://goo.gl/PGE1Bx 
> 
> 
> 2018-11-09 17:35 GMT+01:00 Vlad Kopylov  >:
> > 

Re: [ceph-users] network architecture questions

2018-09-18 Thread Jean-Charles Lopez
They don’t go through the MONs for IOs but they need access to the MONs over the 
public network for authentication and to receive the cluster map. 

JC

While moving. Excuse unintended typos.

> On Sep 18, 2018, at 17:51, Jean-Charles Lopez  wrote:
> 
> Hi
> 
> You deploy 3 MONs on a production cluster for HA. 
> 
> CephFS clients talk to MONs MDSs and OSDs over the public network. 
> 
> CephFS is not NFS and you’ll need ganesha to enable NFS access into your Ceph 
> File system. See http://docs.ceph.com/docs/master/cephfs/nfs/ Ganesha will 
> access your ceph cluster over the public network like any regular ceph 
> client. 
> 
> JC
> 
> While moving. Excuse unintended typos.
> 
>> On Sep 18, 2018, at 16:56, solarflow99  wrote:
>> 
>> thanks for the replies, I don't know that cephFS clients go through the 
>> MONs, they reach the OSDs directly.  When I mentioned NFS, I meant NFS 
>> clients (i.e. not cephFS clients). This should have been pretty 
>> straightforward.
>> Anyone doing HA on the MONs?  How do you mount the cephFS shares, surely 
>> you'd have a vip?
>> 
>> 
>> 
>>> On Tue, Sep 18, 2018 at 12:37 PM Jean-Charles Lopez  
>>> wrote:
>>> > On Sep 17, 2018, at 16:13, solarflow99  wrote:
>>> > 
>>> > Hi, I read through the various documentation and had a few questions:
>>> > 
>>> > - From what I understand cephFS clients reach the OSDs directly, does the 
>>> > cluster network need to be opened up as a public network? 
>>> Client traffic only goes over the public network. Only OSD-to-OSD traffic 
>>> (replication, rebalancing, recovery) goes over the cluster network.
>>> > 
>>> > - Is it still necessary to have a public and cluster network when the 
>>> > using cephFS since the clients all reach the OSD's directly?  
>>> Separating the network is a plus for troubleshooting and sizing for 
>>> bandwidth
>>> > 
>>> > - Simplest way to do HA on the mons for providing NFS, etc?  
>>> Don’t really understand the question (NFS vs CephFS).
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] network architecture questions

2018-09-18 Thread Jean-Charles Lopez
> On Sep 17, 2018, at 16:13, solarflow99  wrote:
> 
> Hi, I read through the various documentation and had a few questions:
> 
> - From what I understand cephFS clients reach the OSDs directly, does the 
> cluster network need to be opened up as a public network? 
Client traffic only goes over the public network. Only OSD-to-OSD traffic 
(replication, rebalancing, recovery) goes over the cluster network.
> 
> - Is it still necessary to have a public and cluster network when the using 
> cephFS since the clients all reach the OSD's directly?  
Separating the networks is a plus for troubleshooting and for sizing bandwidth.
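If you do separate them, the declaration in ceph.conf is just something like this 
(example subnets, adjust to your own):

[global]
public network = 192.168.100.0/24
cluster network = 192.168.200.0/24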
> 
> - Simplest way to do HA on the mons for providing NFS, etc?  
Don’t really understand the question (NFS vs CephFS).
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests are blocked

2018-05-08 Thread Jean-Charles Lopez
Hi Grigory,

are these lines the only lines in your log file for OSD 15?

Just for sanity, what log levels have you set, if any, in your config file away 
from the default? If you set all log levels to 0, like some people do, you may 
want to simply go back to the defaults by commenting out the debug_ 
lines in your config file. If you want to see something more detailed you can 
indeed increase the log level to 5 or 10.

What you can also do is to use the admin socket on the machine to see what 
operations are actually blocked: ceph daemon osd.15 dump_ops_in_flight and ceph 
daemon osd.15 dump_historic_ops.

These two commands and their output will show you exactly which operations are 
blocked and will also point you to the other OSDs this OSD is working with to 
serve the IO. Maybe the culprit is actually one of the OSDs handling the 
subops, or it could be a network problem.
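For example, run on the node hosting osd.15 (the output is JSON; fields such as 
age and flag_point should tell you whether an op is waiting for subops on 
another OSD):

# ceph daemon osd.15 dump_ops_in_flight
# ceph daemon osd.15 dump_historic_ops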

Regards
JC

> On May 8, 2018, at 03:11, Grigory Murashov  wrote:
> 
> Hello Jean-Charles!
> 
> I have finally caught the problem. It was at 13:02.
> 
> [cephuser@storage-ru1-osd3 ~]$ ceph health detail
> HEALTH_WARN 18 slow requests are blocked > 32 sec
> REQUEST_SLOW 18 slow requests are blocked > 32 sec
> 3 ops are blocked > 65.536 sec
> 15 ops are blocked > 32.768 sec
> osd.15 has blocked requests > 65.536 sec
> [cephuser@storage-ru1-osd3 ~]$
> 
> 
> But surprise - there is no information in ceph-osd.15.log at that time
> 
> 
> 2018-05-08 12:54:26.105919 7f003f5f9700  4 rocksdb: (Original Log Time 
> 2018/05/08-12:54:26.105843) EVENT_LOG_v1 {"time_micros": 1525773266105834, 
> "job": 2793, "event": "trivial_move", "dest
> ination_level": 3, "files": 1, "total_files_size": 68316970}
> 2018-05-08 12:54:26.105926 7f003f5f9700  4 rocksdb: (Original Log Time 
> 2018/05/08-12:54:26.105854) 
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABL
> E_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_compaction_flush.cc:1537]
>  [default] Moved #1 files to level-3 68316970 bytes OK
> : base level 1 max bytes base 268435456 files[0 4 45 403 722 0 0] max score 
> 0.98
> 
> 2018-05-08 13:07:29.711425 7f004f619700  4 rocksdb: 
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/r
> elease/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_write.cc:684] 
> reusing log 8051 from recycle list
> 
> 2018-05-08 13:07:29.711497 7f004f619700  4 rocksdb: 
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/r
> elease/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_write.cc:725] 
> [default] New memtable created with log file: #8089. Immutable memtables: 0.
> 
> 2018-05-08 13:07:29.726107 7f003fdfa700  4 rocksdb: (Original Log Time 
> 2018/05/08-13:07:29.711524) 
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABL
> E_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_compaction_flush.cc:1158]
>  Calling FlushMemTableToOutputFile with column family
> [default], flush slots available 1, compaction slots allowed 1, compaction 
> slots scheduled 1
> 2018-05-08 13:07:29.726124 7f003fdfa700  4 rocksdb: 
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/r
> elease/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/flush_job.cc:264] 
> [default] [JOB 2794] Flushing memtable with next log file: 8089
> 
> Should I have more detailed logging?
> 
> 
> Grigory Murashov
> Voximplant
> 
> 07.05.2018 18:59, Jean-Charles Lopez пишет:
>> Hi,
>> 
>> ceph health detail
>> 
>> This will tell you which OSDs are experiencing the problem so you can then 
>> go and inspect the logs and use the admin socket to find out which requests 
>> are at the source.
>> 
>> Regards
>> JC
>> 
>>> On May 7, 2018, at 03:52, Grigory Murashov  wrote:
>>> 
>>> Hello!
>>> 
>>> I'm not very experienced in ceph troubleshooting; that's why I'm asking for help.
>>> 
>>> I have multiple warnings coming from zabbix as a result of ceph -s
>>> 
>>> REQUEST_SLOW: HEALTH_WARN : 21 slow requests are blocked > 32 sec
>>> 
>>> I don't see any hardware problems that time.
>>> 
>>> I'm able to find the same strings in ceph.log and ceph-mon.log like
>>> 
>&g

Re: [ceph-users] slow requests are blocked

2018-05-07 Thread Jean-Charles Lopez
Hi,

ceph health detail

This will tell you which OSDs are experiencing the problem so you can then go 
and inspect the logs and use the admin socket to find out which requests are at 
the source.

Regards
JC

> On May 7, 2018, at 03:52, Grigory Murashov  wrote:
> 
> Hello!
> 
> I'm not very experienced in ceph troubleshooting; that's why I'm asking for help.
> 
> I have multiple warnings coming from zabbix as a result of ceph -s
> 
> REQUEST_SLOW: HEALTH_WARN : 21 slow requests are blocked > 32 sec
> 
> I don't see any hardware problems at that time.
> 
> I'm able to find the same strings in ceph.log and ceph-mon.log like
> 
> 2018-05-07 12:37:57.375546 7f3037dae700  0 log_channel(cluster) log [WRN] : 
> Health check failed: 12 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 
> Now it's important to find the root of the problem.
> 
> How to find out:
> 
> 1. which OSDs are affected
> 
> 2. which particular requests were slowed and blocked?
> 
> I assume I need more detailed logging - how to do that?
> 
> Appreciate your help.
> 
> -- 
> Grigory Murashov
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] session lost, hunting for new mon / session established : every 30s until unmount/remount

2018-03-28 Thread Jean-Charles Lopez
Hi,

if I read you correctly you have 3 MONs in each data center. This means that 
when the link goes down you will lose quorum, making the cluster unavailable.

If my perception is correct, you’d have to start a 7th MON somewhere else 
accessible from both sites for your cluster to maintain quorum during this 
event.
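A rough sketch if you use ceph-deploy (the hostname is hypothetical; the 
tie-breaker only needs IP reachability to both sites and a correct mon host 
entry in ceph.conf):

$ ceph-deploy mon add arbiter-host

You can then verify the new monmap with ceph mon stat or ceph quorum_status.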

Regards
JC

> On Mar 28, 2018, at 15:40, Nicolas Huillard  wrote:
> 
> Hi all,
> 
> I didn't find much information regarding this kernel client loop in the
> ML. Here are my observation, around which I'll try to investigate.
> 
> My setup:
> * 2 datacenters connected using an IPsec tunnel configured for routing
> (2 subnets)
> * connection to the WAN using PPPoE and the pppd kernel module
> * the PPP connection lasts exactly 7 days, after which the provider
> kills it, and my PPP client restarts it (the WAN/inter-cluster
> communication is thus disconnected during ~30s)
> * 3 MON+n×OSD+MGR+MDS on each datacenter
> * 2 client servers using cephfs/kernel module; one of them on each
> datacenter runs the pppd client and the IPSec endpoint (Pacemaker
> manages this front-end aspect of the cluster)
> * a single cephfs mount which is not managed by Pacemaker
> 
> Observations:
> * when the ppp0 connection stops, the pppd restores the default route
> from "using the PPP tunnel" to "using a virtual IP which happens to be
> on the same host" (but could move to the other peer)
> 
> Mar 28 19:07:09 neon pppd[5543]: restoring old default route to eth0 
> [172.21.0.254]
> 
> * IPsec et al. react cleanly (remove the tunnel, recreate it when PPP
> is up again)
> 
> Mar 28 19:07:43 neon pppd[5543]: Connected to 02::42 via interface 
> eth1.835
> Mar 28 19:07:43 neon pppd[5543]: CHAP authentication succeeded
> Mar 28 19:07:43 neon pppd[5543]: peer from calling number 02::42 
> authorized
> Mar 28 19:07:43 neon pppd[5543]: replacing old default route to eth0 
> [172.21.0.254]
> 
> * 20s after the PPP link is up and IPsec is restored, libceph starts to
> complain (neon is the client/gateway on 172.21.0.0/16 which lost its
> PPP, sodium is the remote side of the same IPsec tunnel) :
> 
> Mar 28 19:08:03 neon kernel: [1232455.656828] libceph: mon1 172.21.0.18:6789 
> socket closed (con state OPEN)
> Mar 28 19:08:12 neon kernel: [1232463.846633] ceph: mds0 caps stale
> Mar 28 19:08:16 neon kernel: [1232468.128577] ceph: mds0 caps went stale, 
> renewing
> Mar 28 19:08:16 neon kernel: [1232468.128581] ceph: mds0 caps stale
> Mar 28 19:08:30 neon kernel: [1232482.601183] libceph: mon3 172.22.0.16:6789 
> session established
> Mar 28 19:09:01 neon kernel: [1232513.256059] libceph: mon3 172.22.0.16:6789 
> session lost, hunting for new mon
> Mar 28 19:09:01 neon kernel: [1232513.321176] libceph: mon5 172.22.0.20:6789 
> session established
> Mar 28 19:09:32 neon kernel: [1232543.977003] libceph: mon5 172.22.0.20:6789 
> session lost, hunting for new mon
> Mar 28 19:09:32 neon kernel: [1232543.979567] libceph: mon2 172.21.0.20:6789 
> session established
> Mar 28 19:09:39 neon kernel: [1232551.435001] ceph: mds0 caps renewed
> Mar 28 19:10:02 neon kernel: [1232574.697885] libceph: mon2 172.21.0.20:6789 
> session lost, hunting for new mon
> Mar 28 19:10:02 neon kernel: [1232574.763614] libceph: mon4 172.22.0.18:6789 
> session established
> Mar 28 19:10:33 neon kernel: [1232605.418776] libceph: mon4 172.22.0.18:6789 
> session lost, hunting for new mon
> Mar 28 19:10:33 neon kernel: [1232605.420896] libceph: mon0 172.21.0.16:6789 
> session established
> Mar 28 19:11:04 neon kernel: [1232636.139720] libceph: mon0 172.21.0.16:6789 
> session lost, hunting for new mon
> Mar 28 19:11:04 neon kernel: [1232636.205717] libceph: mon3 172.22.0.16:6789 
> session established
> 
> Mar 28 19:07:40 sodium kernel: [1211268.708716] libceph: mon0 
> 172.21.0.16:6789 session lost, hunting for new mon
> Mar 28 19:07:44 sodium kernel: [1211272.208735] libceph: mon5 
> 172.22.0.20:6789 socket closed (con state OPEN)
> Mar 28 19:07:53 sodium kernel: [1211281.683700] libceph: mon2 
> 172.21.0.20:6789 socket closed (con state OPEN)
> Mar 28 19:08:18 sodium kernel: [1211306.856489] libceph: mon5 
> 172.22.0.20:6789 session established
> Mar 28 19:08:49 sodium kernel: [1211337.575101] libceph: mon5 
> 172.22.0.20:6789 session lost, hunting for new mon
> Mar 28 19:08:49 sodium kernel: [1211337.640884] libceph: mon0 
> 172.21.0.16:6789 session established
> Mar 28 19:09:20 sodium kernel: [1211368.296187] libceph: mon0 
> 172.21.0.16:6789 session lost, hunting for new mon
> Mar 28 19:09:20 sodium kernel: [1211368.299194] libceph: mon4 
> 172.22.0.18:6789 session established
> Mar 28 19:09:50 sodium kernel: [1211399.017229] libceph: mon4 
> 172.22.0.18:6789 session lost, hunting for new mon
> Mar 28 19:09:50 sodium kernel: [1211399.019655] libceph: mon5 
> 172.22.0.20:6789 session established
> 
> * the active MDS happens to be on sodium's side (172.22.*), whereas the
> primary MON happens to be on neon's side (172.21.*), which explains t

Re: [ceph-users] Cannot delete a pool

2018-03-01 Thread Jean-Charles Lopez
Hi,

connect to the ceph-node1 machine and run : ceph daemon mon.ceph-node1 config 
set mon_allow_pool_delete true

You are just using the wrong parameter as an ID
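As a sketch, assuming your MON IDs are ceph-node1/2/3 and the pool is called 
bench, the full sequence would look like this (repeat the first command on each 
MON node to be safe):

# ceph daemon mon.ceph-node1 config set mon_allow_pool_delete true
# ceph osd pool delete bench bench --yes-i-really-really-mean-it

The pool name has to be given twice on purpose, as an extra safety.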

JC

> On Mar 1, 2018, at 07:41, Max Cuttins  wrote:
> 
> I get:
> 
> #ceph daemon mon.0 config set mon_allow_pool_delete true
> admin_socket: exception getting command descriptions: [Errno 13] Permission 
> denied
> 
> 
> Il 01/03/2018 14:00, Eugen Block ha scritto:
>> It's not necessary to restart a mon if you just want to delete a pool, even 
>> if the "not observed" message appears. And I would not recommend permanently 
>> enabling the "easy" way of deleting a pool. If you are not able to 
>> delete the pool after "ceph tell mon ..." try this:
>> 
>> ceph daemon mon.<id> config set mon_allow_pool_delete true
>> 
>> and then retry deleting the pool. This works for me without restarting any 
>> services or changing config files.
>> 
>> Regards
>> 
>> 
>> Zitat von Ronny Aasen :
>> 
>>> On 01. mars 2018 13:04, Max Cuttins wrote:
 I was testing IO and I created a bench pool.
 
 But if I tried to delete I get:
 
Error EPERM: pool deletion is disabled; you must first set the
mon_allow_pool_delete config option to true before you can destroy a
pool
 
 So I run:
 
ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
mon.ceph-node1: injectargs:mon_allow_pool_delete = 'true' (not
observed, change may require restart)
mon.ceph-node2: injectargs:mon_allow_pool_delete = 'true' (not
observed, change may require restart)
mon.ceph-node3: injectargs:mon_allow_pool_delete = 'true' (not
observed, change may require restart)
 
 I restarted all the nodes.
 But the flag has not been observed.
 
 Is this the right way to remove a pool?
>>> 
>>> i think you need to set the option in the ceph.conf of the monitors.
>>> and then restart the mon's one by one.
>>> 
>>> afaik that is by design.
>>> https://blog.widodh.nl/2015/04/protecting-your-ceph-pools-against-removal-or-property-changes/
>>>  
>>> 
>>> kind regards
>>> Ronny Aasen
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> 
>> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot Create MGR

2018-02-28 Thread Jean-Charles Lopez
Hi,

looks like you haven’t run the ceph-deploy command with the same user name, and 
maybe not from the same current working directory. This could explain your problem.

Make sure the other daemons have an mgr cap authorization. You can find details 
on this ML about MGR caps being incorrect for OSDs and MONs after a Jewel to 
Luminous upgrade. The output of a ceph auth list command should help you find 
out if that’s the case.

Are your ceph daemons still running? What does a ceph daemon mon.$(hostname -s) 
quorum_status give you on a MON server?
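If the MON is healthy, you can also regenerate the missing bootstrap-mgr keyring 
by hand, a sketch assuming the default cluster name and that you run it from 
your ceph-deploy working directory:

# ceph auth get-or-create client.bootstrap-mgr mon 'allow profile bootstrap-mgr' -o ceph.bootstrap-mgr.keyring

and then retry ceph-deploy gatherkeys / ceph-deploy mgr create.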

JC

> On Feb 28, 2018, at 10:05, Georgios Dimitrakakis  wrote:
> 
> 
> Indeed John,
> 
> you are right! I have updated "ceph-deploy" (which was installed via "pip", 
> that's why it wasn't updated with the rest of the ceph packages) but now it 
> complains that keys are missing
> 
> $ ceph-deploy mgr create controller
> [ceph_deploy.conf][DEBUG ] found configuration file at: 
> /home/user/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy mgr create 
> controller
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
> [ceph_deploy.cli][INFO  ]  username  : None
> [ceph_deploy.cli][INFO  ]  verbose   : False
> [ceph_deploy.cli][INFO  ]  mgr   : [('controller', 
> 'controller')]
> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
> [ceph_deploy.cli][INFO  ]  subcommand: create
> [ceph_deploy.cli][INFO  ]  quiet : False
> [ceph_deploy.cli][INFO  ]  cd_conf   : 
> 
> [ceph_deploy.cli][INFO  ]  cluster   : ceph
> [ceph_deploy.cli][INFO  ]  func  :  0x1cce500>
> [ceph_deploy.cli][INFO  ]  ceph_conf : None
> [ceph_deploy.cli][INFO  ]  default_release   : False
> [ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts 
> controller:controller
> [ceph_deploy][ERROR ] RuntimeError: bootstrap-mgr keyring not found; run 
> 'gatherkeys'
> 
> 
> and I cannot get the keys...
> 
> 
> 
> $ ceph-deploy gatherkeys controller
> [ceph_deploy.conf][DEBUG ] found configuration file at: 
> /home/user/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy gatherkeys 
> controller
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
> [ceph_deploy.cli][INFO  ]  username  : None
> [ceph_deploy.cli][INFO  ]  verbose   : False
> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
> [ceph_deploy.cli][INFO  ]  quiet : False
> [ceph_deploy.cli][INFO  ]  cd_conf   : 
> 
> [ceph_deploy.cli][INFO  ]  cluster   : ceph
> [ceph_deploy.cli][INFO  ]  mon   : ['controller']
> [ceph_deploy.cli][INFO  ]  func  :  gatherkeys at 0x198b2a8>
> [ceph_deploy.cli][INFO  ]  ceph_conf : None
> [ceph_deploy.cli][INFO  ]  default_release   : False
> [ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory /tmp/tmpPQ895t
> [controller][DEBUG ] connection detected need for sudo
> [controller][DEBUG ] connected to host: controller
> [controller][DEBUG ] detect platform information from remote host
> [controller][DEBUG ] detect machine type
> [controller][DEBUG ] get remote short hostname
> [controller][DEBUG ] fetch remote file
> [ceph_deploy.gatherkeys][WARNIN] No mon key found in host: controller
> [ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:controller
> [ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmpPQ895t
> [ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon
> 
> 
> 
> 
>> On Wed, Feb 28, 2018 at 5:21 PM, Georgios Dimitrakakis
>>  wrote:
>>> All,
>>> 
>>> I have updated my test ceph cluster from Jewer (10.2.10) to Luminous
>>> (12.2.4) using CentOS packages.
>>> 
>>> I have updated all packages, restarted all services with the proper order
>>> but I get a warning that the Manager Daemon doesn't exist.
>>> 
>>> Here is the output:
>>> 
>>> # ceph -s
>>>  cluster:
>>>id: d357a551-5b7a-4501-8d8f-009c63b2c972
>>>health: HEALTH_WARN
>>>no active mgr
>>> 
>>>  services:
>>>mon: 1 daemons, quorum controller
>>>mgr: no daemons active
>>>osd: 2 osds: 2 up, 2 in
>>> 
>>>  data:
>>>pools:   0 pools, 0 pgs
>>>objects: 0 objects, 0 bytes
>>>usage:   0 kB used, 0 kB / 0 kB avail
>>>pgs:
>>> 
>>> 
>>> While at the same time the system service is up and running
>>> 
>>> # systemctl status ceph-mgr.target
>>> ● ceph-mgr.target - ceph target allowing to start/stop all ceph-mgr@.service
>>> instances at once
>>>   Loaded: loaded (/usr/lib/systemd/system/ceph-mgr.target; enabled; vendor
>>> preset: enabled)
>>>   Active: active since Wed 2018-02-28 18:57:13 EET; 12min ago
>>> 
>>> 
>>> I understand that I have to add a new MGR but when I try to do it via
>>> "ceph-deploy" it fails with 

Re: [ceph-users] radosgw not listening after installation

2018-02-05 Thread Jean-Charles Lopez
Hi

see inline

JC

> On Feb 5, 2018, at 18:14, Piers Haken  wrote:
> 
> Thanks, JC,
>  
> You’re right I didn’t deploy any OSDs at that point. I didn’t think that 
> would be a problem since the last `ceph-deploy` command completed without 
> error and its log ended with:
>  
> The Ceph Object Gateway (RGW) is now running on host storage-test01 and 
> default port 7480
>  
> Maybe that’s a bug?
What version is this (ceph -v and ceph-deploy version)?
>  
>  
> Anyway, I purged the cluster, rebuilt it with some OSDs, but I still don’t 
> see radosgw listening on port 7480.
>  
> Here’s my ceph.conf (it’s just the default, I haven’t touched it):
>  
> [global]
> fsid = 849f7b15-1e31-450b-b17c-6599fb6ff94d
> mon_initial_members = storage-test01
> mon_host = 10.0.4.127
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
Maybe some minimum pieces are missing from your config file, assuming the 
output above is complete. It’s been a long time since I last deployed with ceph-deploy
[client.rgw.storage-test01]
rgw_frontends = "civetweb port=7480" just to be sure
>  
> here’s my netstat:
>  
> # netstat -alp | grep rados
>  
> tcp0  0 10.0.4.127:5750410.0.4.127:6816 
> ESTABLISHED 19833/radosgw
> tcp0  0 10.0.4.127:4346210.0.4.127:6832 
> ESTABLISHED 19833/radosgw
> tcp0  0 10.0.4.127:4984810.0.4.127:6789 
> ESTABLISHED 19833/radosgw
Here you can see that the RGW is connected to the MON (port 6789), not listening on 
6789. The second column is the remote address; the first one is the local address
> unix  2  [ ACC ] STREAM LISTENING 262758   19833/radosgw  
>   /var/run/ceph/ceph-client.rgw.storage-test01.asok
> unix  3  [ ] STREAM CONNECTED 264433   19833/radosgw
>  
> ps:
>  
> 20243 ?Ssl0:00 /usr/bin/radosgw -f --cluster ceph --name 
> client.rgw.storage-test01 --setuser ceph --setgroup ceph
>  
>  
> Sometimes there’s an intermittent  “Initialization timeout, failed to 
> initialize” in the rgw log, but it doesn’t occur when I restart the services.
I suspect that the RGW doesn’t fully initialize because it can’t create the 
necessary pools. The only way you can trace that is by bumping up the log level with 
debug_rgw = 20 in the configuration file; from there you should be able to 
see where it sits and what it did.
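A sketch of what I would put in the RGW section of ceph.conf (section name taken 
from your setup) before restarting the gateway:

[client.rgw.storage-test01]
rgw_frontends = "civetweb port=7480"
debug_rgw = 20
debug_ms = 1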
>  
> Let me know if I can send anything else, I’d really like to get this up and 
> running!
> 
> Thanks
> Piers.
>  
>  
> From: Jean-Charles Lopez [mailto:jelo...@redhat.com] 
> Sent: Monday, February 05, 2018 5:23 PM
> To: Piers Haken 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] radosgw not listening after installation
>  
> Hi,
>  
> first of all just in case, it looks like your script does not deploy any OSDs 
> as you go straight from MON to RGW.
>  
> then, RGW does listen by default on 7480 and what you see on 6789 is the MON 
> listening.
>  
> Investigation:
> - Make sure your ceph-radosgw process is running first.
> - If not running, have a look at the log to see why it may have failed.
> - Paste some more information in this mailing list so we can help you find 
> the problem (e.g. output of ceph-deploy, log of your RGW, ...)
>  
My bet is that given that you haven’t deployed any OSDs the RGW can’t create 
the pools it needs to store data. Maybe not, but I'm just guessing from what you 
showed us.
>  
> Regards
> JC
>  
> On Feb 5, 2018, at 16:51, Piers Haken  <mailto:pie...@hotmail.com>> wrote:
>  
> I'm trying to set up radosgw on a brand new cluster, but I'm running into an 
> issue where it's not listening on the default port (7480)
>  
> here's my install script:
>  
>ceph-deploy new $NODE
>ceph-deploy install --release luminous $NODE
>ceph-deploy install --release luminous --rgw $NODE
>ceph-deploy mon create-initial
>ceph-deploy admin $NODE
>ceph-deploy rgw create $NODE
>  
> this is on debian 9.3 (stretch) on a clean machine.
>  
> the /usr/bin/radosgw process is running, and it's listening on port 6789 
> (this is not an HTTP server, but some internal binary protocol), but the 
> docs say it should be listening for HTTP requests on port 7480.
>  
> what am i missing here?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw not listening after installation

2018-02-05 Thread Jean-Charles Lopez
Hi,

first of all just in case, it looks like your script does not deploy any OSDs 
as you go straight from MON to RGW.

then, RGW does listen by default on 7480 and what you see on 6789 is the MON 
listening.

Investigation:
- Make sure your ceph-radosgw process is running first.
- If not running, have a look at the log to see why it may have failed.
- Paste some more information in this mailing list so we can help you find the 
problem (e.g. output of ceph-deploy, log of your RGW, ...)

My bet is that given that you haven’t deployed any OSDs the RGW can’t create 
the pools it needs to store data. Maybe not, but I'm just guessing from what you 
showed us.
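A quick way to check, as a sketch: once at least some OSDs are in and the RGW 
initializes properly, it should create its pools on first start.

# ceph -s
# rados lspools

You should then see .rgw.root and the default.rgw.* pools appear.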

Regards
JC

> On Feb 5, 2018, at 16:51, Piers Haken  wrote:
> 
> I'm trying to set up radosgw on a brand new cluster, but I'm running into an 
> issue where it's not listening on the default port (7480)
> 
> here's my install script:
> 
>ceph-deploy new $NODE
>ceph-deploy install --release luminous $NODE
>ceph-deploy install --release luminous --rgw $NODE
>ceph-deploy mon create-initial
>ceph-deploy admin $NODE
>ceph-deploy rgw create $NODE
> 
> this is on debian 9.3 (stretch) on a clean machine.
> 
> the /usr/bin/radosgw process is running, and it's listening on port 6789 
> (this is not an HTTP server, but some internal binary protocol), but the 
> docs say it should be listening for HTTP requests on port 7480.
> 
> what am i missing here?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full Ratio

2018-01-24 Thread Jean-Charles Lopez
Hi,

if you are using an older Ceph version, note that mon_osd_nearfull_ratio 
and mon_osd_full_ratio must be set in the config file on the MON hosts first, 
and then the MONs restarted one after the other.

If using a recent version, there are the commands ceph osd set-full-ratio and ceph 
osd set-nearfull-ratio
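For example (the values are just an illustration; check the result in the OSDMap 
afterwards):

$ ceph osd set-nearfull-ratio 0.88
$ ceph osd set-full-ratio 0.97
$ ceph osd dump | grep ratio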

Regards
JC

> On Jan 24, 2018, at 11:07, Karun Josy  wrote:
> 
> Hi,
> 
> I am trying to increase the full ratio of OSDs in a cluster.
> While adding a new node, one of the new disks got backfilled to more than 95% 
> and the cluster froze. So I am trying to keep it from happening again.
> 
> 
> Tried pg set command but it is not working : 
> $ ceph pg set_nearfull_ratio 0.88
> Error ENOTSUP: this command is obsolete
> 
> I had increased the full ratio on the OSDs using injectargs initially, but it 
> didn't work: when the disk reached 95% it showed osd full status
> 
> $ ceph tell osd.* injectargs '--mon_osd_full_ratio 0.97'
> osd.0: mon_osd_full_ratio = '0.97' (not observed, change may require 
> restart)
> osd.1: mon_osd_full_ratio = '0.97' (not observed, change may require 
> restart)
> 
> 
> 
> How can I set full ratio to more than 95% ? 
> 
> Karun 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hadoop on Ceph error

2018-01-18 Thread Jean-Charles Lopez
Hi,

What’s your Hadoop xml config file like?

Have you checked the permissions of the ceph.conf and keyring file?

If all of that looks good, maybe consider setting a debug option via ceph.conf.options in 
the Hadoop XML config file
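A quick sanity sketch, assuming the Hadoop job runs as a dedicated user (the user 
name below is hypothetical):

# ls -l /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring
# sudo -u hadoop ceph -s

If the second command cannot reach the cluster, the Java bindings will not be 
able to mount either.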

JC


> On Jan 18, 2018, at 16:55, Bishoy Mikhael  wrote:
> 
> Hi All,
> 
> I've a tiny Ceph 12.2.2 cluster setup with three nodes, 17 OSDs, 3 
> MON,MDS,MGR (spanned across the three nodes).
> Hadoop 2.7.3 is configured on only one of the three nodes as follows:
> - Hadoop binaries was extracted to /opt/hadoop/bin/
> - Hadoop config files where at /opt/hadoop/etc/hadoop/
> - Hadoop-cephfs.jar was downloaded from http://download.ceph.com/tarballs/ 
> to /opt/hadoop/lib/; the last update to it was on 12-Mar-2013
> - The following symbolic links have been done:
> # ln -s /usr/lib64/libcephfs_jni.so.1.0.0 /usr/lib64/libcephfs_jni.so
> # cp /usr/lib64/libcephfs_jni.so.1.0.0 /opt/hadoop/lib/native/
> # ln -s /opt/hadoop/lib/native/libcephfs_jni.so.1.0.0 
> /opt/hadoop/lib/native/libcephfs_jni.so.1
> # ln -s /opt/hadoop/lib/native/libcephfs_jni.so.1.0.0 
> /opt/hadoop/lib/native/libcephfs_jni.so
> # ln -s /usr/share/java/libcephfs.jar /opt/hadoop/lib/
> The following modification to Hadoop-config.sh has been done:
> /opt/hadoop/libexec/hadoop-config.sh
> # CLASSPATH initially contains $HADOOP_CONF_DIR
> CLASSPATH="${HADOOP_CONF_DIR}:/opt/hadoop/lib/libcephfs.jar:/opt/hadoop/lib/hadoop-cephfs.jar"
> 
> So writes and reads to/from Ceph using HDFS CLI works fine, but when I use 
> hadoop Java library I get the following error:
> 
> ERROR HdfsTraveller:58 - com.ceph.fs.CephNotMountedException: not mounted
> 
> fileSystem.globStatus(path)
> FileSystem.globStatus in hdfs api
> ceph returns null pointer
> 
> Any idea what's going on? is it a configuration problem? is it a Ceph 
> problem? Did anybody see that error before?
> 
> 
> Regards,
> Bishoy
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph command hangs

2018-01-16 Thread Jean-Charles Lopez
Hi Nathan,

I would have placed the mon_host parameter in the global section and assigned it 
the IP address of your monitor host, so that the client (the ceph -s command) can 
find the MON.

Have you also checked your firewall setup on your MON box?

To help diagnose you can also use ceph -s --debug-ms=1 so you can follow the 
network exchange between your client machine and the MON.
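As a sketch, the client-facing part of your ceph.conf would then look like this 
(fsid and address taken from your own config below):

[global]
fsid = a736559a-92d1-483e-9289-d2c7feed510f
ms bind ipv6 = true
mon host = [2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789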

Regards
JC


> On Jan 16, 2018, at 13:24, Nathan Dehnel  wrote:
> 
> I'm doing a manual setup following 
> http://docs.ceph.com/docs/master/install/manual-deployment/ 
> 
> 
> The ceph command hangs until I kill it. I have 1 monitor service started.
> ==
> gentooserver ~ # ceph -s
> ^CError EINTR: problem getting command descriptions from mon.
> =
> gentooserver ~ # emerge -pqv ceph
> [ebuild   R   ] sys-cluster/ceph-12.2.1  USE="mgr radosgw ssl systemd 
> tcmalloc -babeltrace -cephfs -fuse -jemalloc -ldap -lttng -nss -static-libs 
> {-test} -xfs -zfs" CPU_FLAGS_X86="sse sse2 sse3 sse4_1 sse4_2 ssse3" 
> PYTHON_TARGETS="python2_7 python3_5 -python3_4 -python3_6" 
> ==
> gentooserver ~ # cat /etc/ceph/ceph.conf
> [global]
> cluster = ceph
> fsid = a736559a-92d1-483e-9289-d2c7feed510f
> ms bind ipv6 = true
> #public network = 2001:1c:d64b:91c5:3a84:dfce:8546:9982/0
> auth cluster required = none
> auth service required = none
> auth client required = none
> 
> [mon]
> mon initial members = mon0
> mon host = gentooserver
> mon addr = [2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789
> 
> 
> [mon.mon0]
> host = gentooserver
> mon addr = [2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789
> 
> [osd]
> osd journal size = 1
> osd crush chooseleaf type = 0
> ==
> gentooserver ~ # monmaptool --print /tmp/monmap
> monmaptool: monmap file /tmp/monmap
> epoch 0
> fsid a736559a-92d1-483e-9289-d2c7feed510f
> last_changed 2018-01-14 16:50:59.838277
> created 2018-01-14 16:50:59.838277
> 0: [2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 mon.mon0
> ==
> gentooserver ~ # systemctl status ceph-mon@mon0 | cat
> ● ceph-mon@mon0.service - Ceph cluster monitor daemon
>Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor 
> preset: disabled)
>Active: active (running) since Tue 2018-01-16 14:50:18 CST; 17min ago
>  Main PID: 75938 (ceph-mon)
>CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@mon0.service
>└─75938 /usr/bin/ceph-mon -f --cluster ceph --id mon0 --setuser 
> ceph --setgroup ceph
> 
> Jan 16 14:50:18 gentooserver systemd[1]: Started Ceph cluster monitor daemon.
> Jan 16 14:50:18 gentooserver ceph-mon[75938]: 2018-01-16 14:50:18.977494 
> 7ff07d4cef80 -1 distro_detect - can't detect distro_version
> ===
> gentooserver ~ # cat /var/log/ceph/ceph.log
> 2018-01-16 14:50:18.977541 mon.mon0 mon.0 
> [2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 1 : cluster [INF] mon.mon0@0 
> won leader election with quorum 0
> 2018-01-16 14:50:18.977656 mon.mon0 mon.0 
> [2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0 2 : cluster [INF] monmap e1: 1 
> mons at {mon0=[2001:1c:d64b:91c5:3a84:dfce:8546:9982]:6789/0}
> ---
> gentooserver ~ # cat /var/log/ceph/ceph-mon.mon0.log
> 2018-01-16 14:50:18.760533 7ff07d4cef80  0 set uid:gid to 110:239 (ceph:ceph)
> 2018-01-16 14:50:18.760549 7ff07d4cef80  0 ceph version 12.2.1 
> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process 
> (unknown), pid 75938
> 2018-01-16 14:50:18.760591 7ff07d4cef80  0 pidfile_write: ignore empty 
> --pid-file
> 2018-01-16 14:50:18.765642 7ff07d4cef80  0 load: jerasure load: lrc load: isa 
> 2018-01-16 14:50:18.765702 7ff07d4cef80  0  set rocksdb option compression = 
> kNoCompression
> 2018-01-16 14:50:18.765709 7ff07d4cef80  0  set rocksdb option 
> write_buffer_size = 33554432
> 2018-01-16 14:50:18.765722 7ff07d4cef80  0  set rocksdb option compression = 
> kNoCompression
> 2018-01-16 14:50:18.765726 7ff07d4cef80  0  set rocksdb option 
> write_buffer_size = 33554432
> 2018-01-16 14:50:18.765798 7ff07d4cef80  4 rocksdb: RocksDB version: 5.4.0
> 
> 2018-01-16 14:50:18.765804 7ff07d4cef80  4 rocksdb: Git sha 
> rocksdb_build_git_sha:@0@
> 2018-01-16 14:50:18.765806 7ff07d4cef80  4 rocksdb: Compile date Dec 14 2017
> 2018-01-16 14:50:18.765808 7ff07d4cef80  4 rocksdb: DB SUMMARY
> 
> 2018-01-16 14:50:18.765837 7ff07d4cef80  4 rocksdb: CURRENT file:  CURRENT
> 
> 2018-01-16 14:50:18.765840 7ff07d4cef80  4 rocksdb: IDENTITY file:  IDENTITY
> 
> 2018-01-16 14:50:18.765843 7ff07d4cef80  4 rocksdb: MANIFEST file:  
> MANIFEST-11 size: 210 Bytes
> 
> 2018-01-16 14:50:18.765845 7ff07d4cef80  4 rocksdb: SST files in 
> /var/lib/c

Re: [ceph-users] ceph status doesnt show available and used disk space after upgrade

2017-12-20 Thread Jean-Charles Lopez
Hi Kevin

looks like the pb comes from the mgr user itself then. 

Can you get me the output of 
- ceph auth list 
- cat /etc/ceph/ceph.conf on your mgr node

Regards

JC

While moving. Excuse unintended typos.

> On Dec 20, 2017, at 18:40, kevin parrikar  wrote:
> 
> Thanks JC,
> I tried 
> ceph auth caps client.admin osd 'allow *' mds 'allow *' mon 'allow *' mgr 
> 'allow *'
> 
> but the status is still the same; also, mgr.log is being flooded with the errors below.
> 
> 2017-12-21 02:39:10.622834 7fb40a22b700  0 Cannot get stat of OSD 140
> 2017-12-21 02:39:10.622835 7fb40a22b700  0 Cannot get stat of OSD 141
> Not sure what's wrong in my setup
> 
> Regards,
> Kevin
> 
> 
>> On Thu, Dec 21, 2017 at 2:37 AM, Jean-Charles Lopez  
>> wrote:
>> Hi,
>> 
>> make sure client.admin user has an MGR cap using ceph auth list. At some 
>> point there was a glitch with the update process that was not adding the MGR 
>> cap to the client.admin user.
>> 
>> JC
>> 
>> 
>>> On Dec 20, 2017, at 10:02, kevin parrikar  wrote:
>>> 
>>> hi All,
>>> I have upgraded the cluster from Hammer to Jewel and to Luminous .
>>> 
>>> i am able to upload/download glance images but ceph -s shows 0kb used and 
>>> Available and probably because of that cinder create is failing.
>>> 
>>> 
>>> ceph -s
>>>   cluster:
>>> id: 06c5c906-fc43-499f-8a6f-6c8e21807acf
>>> health: HEALTH_WARN
>>> Reduced data availability: 6176 pgs inactive
>>> Degraded data redundancy: 6176 pgs unclean
>>> 
>>>   services:
>>> mon: 3 daemons, quorum controller3,controller2,controller1
>>> mgr: controller3(active)
>>> osd: 71 osds: 71 up, 71 in
>>> rgw: 1 daemon active
>>> 
>>>   data:
>>> pools:   4 pools, 6176 pgs
>>> objects: 0 objects, 0 bytes
>>> usage:   0 kB used, 0 kB / 0 kB avail
>>> pgs: 100.000% pgs unknown
>>>  6176 unknown
>>> 
>>> 
>>> i deployed ceph-mgr using ceph-deploy gather-keys && ceph-deploy mgr create 
>>> ,it was successfull but for some reason ceph -s is not showing correct 
>>> values.
>>> Can some one help me here please
>>> 
>>> Regards,
>>> Kevin
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph status doesnt show available and used disk space after upgrade

2017-12-20 Thread Jean-Charles Lopez
Hi,

make sure the client.admin user has an MGR cap, using ceph auth list. At some point 
there was a glitch with the update process that was not adding the MGR cap to 
the client.admin user.

JC


> On Dec 20, 2017, at 10:02, kevin parrikar  wrote:
> 
> hi All,
> I have upgraded the cluster from Hammer to Jewel and to Luminous .
> 
> I am able to upload/download glance images, but ceph -s shows 0 kB used and 
> available, and probably because of that cinder create is failing.
> 
> 
> ceph -s
>   cluster:
> id: 06c5c906-fc43-499f-8a6f-6c8e21807acf
> health: HEALTH_WARN
> Reduced data availability: 6176 pgs inactive
> Degraded data redundancy: 6176 pgs unclean
> 
>   services:
> mon: 3 daemons, quorum controller3,controller2,controller1
> mgr: controller3(active)
> osd: 71 osds: 71 up, 71 in
> rgw: 1 daemon active
> 
>   data:
> pools:   4 pools, 6176 pgs
> objects: 0 objects, 0 bytes
> usage:   0 kB used, 0 kB / 0 kB avail
> pgs: 100.000% pgs unknown
>  6176 unknown
> 
> 
> I deployed ceph-mgr using ceph-deploy gather-keys && ceph-deploy mgr create; 
> it was successful, but for some reason ceph -s is not showing correct values.
> Can some one help me here please
> 
> Regards,
> Kevin
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] POOL_NEARFULL

2017-12-19 Thread Jean-Charles Lopez
Update your ceph.conf file
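For the record, on Luminous the ratios are also kept in the OSDMap, so as a 
sketch you can change the running value with:

$ ceph osd set-nearfull-ratio 0.86

injectargs on mon_osd_nearfull_ratio alone will not change the value stored in 
the OSDMap.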

JC

> On Dec 19, 2017, at 09:03, Karun Josy  wrote:
> 
> Hi ,
> 
> That makes sense.
> 
> How can I adjust the osd nearfull ratio? I tried this, however it didn't 
> change. 
> 
> $ ceph tell mon.* injectargs "--mon_osd_nearfull_ratio .86"
> mon.mon-a1: injectargs:mon_osd_nearfull_ratio = '0.86' (not observed, 
> change may require restart)
> mon.mon-a2: injectargs:mon_osd_nearfull_ratio = '0.86' (not observed, 
> change may require restart)
> mon.mon-a3: injectargs:mon_osd_nearfull_ratio = '0.86' (not observed, 
> change may require restart)
> 
> 
> Karun Josy
> 
> On Tue, Dec 19, 2017 at 10:05 PM, Jean-Charles Lopez  <mailto:jelo...@redhat.com>> wrote:
> OK so it’s telling you that the near full OSD holds PGs for these three pools.
> 
> JC
> 
>> On Dec 19, 2017, at 08:05, Karun Josy > <mailto:karunjo...@gmail.com>> wrote:
>> 
>> No, I haven't.
>> 
>> Interestingly, the POOL_NEARFULL flag is shown only when there is 
>> OSD_NEARFULL  flag.
>> I have recently upgraded to Luminous 12.2.2, haven't seen this flag in 12.2.1
>> 
>> 
>> 
>> Karun Josy
>> 
>> On Tue, Dec 19, 2017 at 9:27 PM, Jean-Charles Lopez > <mailto:jelo...@redhat.com>> wrote:
>> Hi
>> 
>> did you set quotas on these pools?
>> 
>> See this page for explanation of most error messages: 
>> http://docs.ceph.com/docs/master/rados/operations/health-checks/#pool-near-full
>>  
>> <http://docs.ceph.com/docs/master/rados/operations/health-checks/#pool-near-full>
>> 
>> JC
>> 
>>> On Dec 19, 2017, at 01:48, Karun Josy >> <mailto:karunjo...@gmail.com>> wrote:
>>> 
>>> Hello,
>>> 
>>> In one of our clusters, health is showing these warnings :
>>> -
>>> OSD_NEARFULL 1 nearfull osd(s)
>>> osd.22 is near full
>>> POOL_NEARFULL 3 pool(s) nearfull
>>> pool 'templates' is nearfull
>>> pool 'cvm' is nearfull
>>> pool 'ecpool' is nearfull
>>> 
>>> 
>>> One osd is above 85% used, which I know caused the OSD_Nearfull flag.
>>> But what does pool(s) nearfull mean ?
>>> And how can I correct it ?
>>> 
>>> ]$ ceph df
>>> GLOBAL:
>>> SIZE   AVAIL  RAW USED %RAW USED
>>> 31742G 11147G   20594G 64.88
>>> POOLS:
>>> NAMEID USED   %USED MAX AVAIL OBJECTS
>>> templates  5196G 23.28  645G   50202
>>> cvm   66528 0 1076G 770
>>> ecpool   7  10260G 83.56 2018G 3004031
>>> 
>>> 
>>> 
>>> Karun 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>>> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
>> 
>> 
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] POOL_NEARFULL

2017-12-19 Thread Jean-Charles Lopez
OK so it’s telling you that the near full OSD holds PGs for these three pools.

JC

> On Dec 19, 2017, at 08:05, Karun Josy  wrote:
> 
> No, I haven't.
> 
> Interestingly, the POOL_NEARFULL flag is shown only when there is 
> OSD_NEARFULL  flag.
> I have recently upgraded to Luminous 12.2.2, haven't seen this flag in 12.2.1
> 
> 
> 
> Karun Josy
> 
> On Tue, Dec 19, 2017 at 9:27 PM, Jean-Charles Lopez  <mailto:jelo...@redhat.com>> wrote:
> Hi
> 
> did you set quotas on these pools?
> 
> See this page for explanation of most error messages: 
> http://docs.ceph.com/docs/master/rados/operations/health-checks/#pool-near-full
>  
> <http://docs.ceph.com/docs/master/rados/operations/health-checks/#pool-near-full>
> 
> JC
> 
>> On Dec 19, 2017, at 01:48, Karun Josy > <mailto:karunjo...@gmail.com>> wrote:
>> 
>> Hello,
>> 
>> In one of our clusters, health is showing these warnings :
>> -
>> OSD_NEARFULL 1 nearfull osd(s)
>> osd.22 is near full
>> POOL_NEARFULL 3 pool(s) nearfull
>> pool 'templates' is nearfull
>> pool 'cvm' is nearfull
>> pool 'ecpool' is nearfull
>> 
>> 
>> One osd is above 85% used, which I know caused the OSD_Nearfull flag.
>> But what does pool(s) nearfull mean ?
>> And how can I correct it ?
>> 
>> ]$ ceph df
>> GLOBAL:
>> SIZE   AVAIL  RAW USED %RAW USED
>> 31742G 11147G   20594G 64.88
>> POOLS:
>> NAMEID USED   %USED MAX AVAIL OBJECTS
>> templates  5196G 23.28  645G   50202
>> cvm   66528 0 1076G 770
>> ecpool   7  10260G 83.56 2018G 3004031
>> 
>> 
>> 
>> Karun 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] POOL_NEARFULL

2017-12-19 Thread Jean-Charles Lopez
Hi

did you set quotas on these pools?

See this page for an explanation of most error messages: 
http://docs.ceph.com/docs/master/rados/operations/health-checks/#pool-near-full 
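
If no quota is set, the pool flag simply mirrors the near-full OSD: every pool
that has PGs on that OSD gets reported. A quick sketch to check both (pool
names taken from your ceph df output):

ceph osd pool get-quota templates
ceph osd pool get-quota cvm
ceph osd pool get-quota ecpool

# if there is no quota, rebalancing the over-full OSD clears the warning,
# for example by nudging its weight down a little
ceph osd reweight 22 0.95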


JC

> On Dec 19, 2017, at 01:48, Karun Josy  wrote:
> 
> Hello,
> 
> In one of our clusters, health is showing these warnings :
> -
> OSD_NEARFULL 1 nearfull osd(s)
> osd.22 is near full
> POOL_NEARFULL 3 pool(s) nearfull
> pool 'templates' is nearfull
> pool 'cvm' is nearfull
> pool 'ecpool' is nearfull
> 
> 
> One osd is above 85% used, which I know caused the OSD_Nearfull flag.
> But what does pool(s) nearfull mean ?
> And how can I correct it ?
> 
> ]$ ceph df
> GLOBAL:
> SIZE   AVAIL  RAW USED %RAW USED
> 31742G 11147G   20594G 64.88
> POOLS:
> NAMEID USED   %USED MAX AVAIL OBJECTS
> templates  5196G 23.28  645G   50202
> cvm   66528 0 1076G 770
> ecpool   7  10260G 83.56 2018G 3004031
> 
> 
> 
> Karun 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw: Couldn't init storage provider (RADOS)

2017-12-19 Thread Jean-Charles Lopez
Hi,

try having a look at:
- network connectivity issues
- firewall configuration issues
- missing or inaccessible keyring file for client.rgw.ceph-rgw1
- missing or inaccessible ceph.conf file
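
As a quick sketch from the RGW node (the keyring path below is an assumption,
use whatever your ceph.conf points to):

# can we reach a monitor at all? 6789 is the default mon port
nc -zv <mon-host> 6789

# does the RGW key exist in the cluster and on disk?
ceph auth get client.rgw.ceph-rgw1
ls -l /var/lib/ceph/radosgw/ceph-rgw.ceph-rgw1/keyring

# can the RGW identity actually talk to the cluster?
ceph -s --name client.rgw.ceph-rgw1 \
  --keyring /var/lib/ceph/radosgw/ceph-rgw.ceph-rgw1/keyring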

Regards
JC Lopez
Senior Technical Instructor, Global Storage Consulting Practice
Red Hat, Inc.
jelo...@redhat.com 
+1 408-680-6959

> On Dec 18, 2017, at 13:34, Youzhong Yang  wrote:
> 
> Hello,
> 
> I tried to install Ceph 12.2.2 (Luminous) on Ubuntu 16.04.3 LTS (kernel 
> 4.4.0-104-generic), but I am having trouble starting radosgw service:
> 
> # systemctl status ceph-rado...@rgw.ceph-rgw1
> ● ceph-rado...@rgw.ceph-rgw1.service - Ceph rados gateway
>Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor 
> preset: enabled)
>Active: inactive (dead) (Result: exit-code) since Mon 2017-12-18 16:10:18 
> EST; 15min ago
>   Process: 4571 ExecStart=/usr/bin/radosgw -f --cluster ${CLUSTER} --name 
> client.%i --setuser ceph --setgroup ceph (code=exited, status=5)
>  Main PID: 4571 (code=exited, status=5)
> 
> Dec 18 16:10:17 ceph-rgw1 systemd[1]: ceph-rado...@rgw.ceph-rgw1.service: 
> Unit entered failed state.
> Dec 18 16:10:17 ceph-rgw1 systemd[1]: ceph-rado...@rgw.ceph-rgw1.service: 
> Failed with result 'exit-code'.
> Dec 18 16:10:18 ceph-rgw1 systemd[1]: ceph-rado...@rgw.ceph-rgw1.service: 
> Service hold-off time over, scheduling restart.
> Dec 18 16:10:18 ceph-rgw1 systemd[1]: Stopped Ceph rados gateway.
> Dec 18 16:10:18 ceph-rgw1 systemd[1]: ceph-rado...@rgw.ceph-rgw1.service: 
> Start request repeated too quickly.
> Dec 18 16:10:18 ceph-rgw1 systemd[1]: Failed to start Ceph rados gateway.
> 
> If I ran the following command directly, it failed immediately:
> 
> # /usr/bin/radosgw -f --cluster ceph --name client.rgw.ceph-rgw1 --setuser 
> ceph --setgroup ceph
> 2017-12-18 16:26:56.413135 7ff11b00fe80 -1 Couldn't init storage provider 
> (RADOS)
> 
> There's no issue when I installed Kraken (version 11.2.1). Did I miss 
> anything? 
> 
> Your help would be very much appreciated.
> 
> Thanks,
> 
> --Youzhong
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One OSD misbehaving (spinning 100% CPU, delayed ops)

2017-11-29 Thread Jean-Charles Lopez
Hi Matthew,

anything special happening on the NIC side that could cause a problem? Packet 
drops? Incorrect jumbo frame settings causing fragmentation?

Have you checked the cstate settings on the box?

Have you disabled energy saving settings differently from the other boxes?

Any unexpected wait time on some devices on the box?

Have you compared the kernel parameters on this box with those on the other
boxes?
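
A quick way to compare the suspect host against a healthy one, as a sketch:

ip -s link show <nic>                    # drops/errors on the interface?
ip link show <nic> | grep mtu            # jumbo frame mismatch?
cat /sys/module/intel_idle/parameters/max_cstate   # if intel_idle is in use
iostat -x 1 5                            # per-device wait/utilisation
sysctl -a | sort > /tmp/sysctl.$(hostname)         # then diff between hosts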

Just in case
JC

> On Nov 29, 2017, at 09:24, Matthew Vernon  wrote:
> 
> Hi,
> 
> We have a 3,060 OSD ceph cluster (running Jewel
> 10.2.7-0ubuntu0.16.04.1), and one OSD on one host keeps misbehaving - by
> which I mean it keeps spinning ~100% CPU (cf ~5% for other OSDs on that
> host), and having ops blocking on it for some time. It will then behave
> for a bit, and then go back to doing this.
> 
> It's always the same OSD, and we've tried replacing the underlying disk.
> 
> The logs have lots of entries of the form
> 
> 2017-11-29 17:18:51.097230 7fcc06919700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fcc29fec700' had timed out after 15
> 
> I've had a brief poke through the collectd metrics for this osd (and
> comparing them with other OSDs on the same host) but other than showing
> spikes in latency for that OSD (iostat et al show no issues with the
> underlying disk) there's nothing obviously explanatory.
> 
> I tried ceph tell osd.2054 injectargs --osd-op-thread-timeout 90 (which
> is what googling for the above message suggests), but that just said
> "unchangeable", and didn't seem to make any difference.
> 
> Any ideas? Other metrics to consider? ...
> 
> Thanks,
> 
> Matthew
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to set osd_max_backfills in Luminous

2017-11-21 Thread Jean-Charles Lopez
Hi,

To check a current value, use the following command on the machine where the
OSD you want to check is running:

ceph daemon osd.{id} config show | grep {parameter}
  Or
ceph daemon osd.{id} config get {parameter}

What you are seeing is actually a known glitch where you are being told it has 
no effect when in fact it does. See capture below
[root@luminous ceph-deploy]# ceph daemon osd.0 config get osd_max_backfills
{
"osd_max_backfills": "1"
}
[root@luminous ceph-deploy]# ceph tell osd.* injectargs '--osd_max_backfills 2'
osd.0: osd_max_backfills = '2' rocksdb_separate_wal_dir = 'false' (not 
observed, change may require restart)
osd.1: osd_max_backfills = '2' rocksdb_separate_wal_dir = 'false' (not 
observed, change may require restart)
osd.2: osd_max_backfills = '2' rocksdb_separate_wal_dir = 'false' (not 
observed, change may require restart)
[root@luminous ceph-deploy]# ceph daemon osd.0 config get osd_max_backfills
{
"osd_max_backfills": "2"
}

Regards
JC

> On Nov 21, 2017, at 15:17, Karun Josy  wrote:
> 
> Hello,
> 
> We added couple of OSDs to the cluster and the recovery is taking much time.
> 
> So I tried to increase the osd_max_backfills value dynamically. But its 
> saying the change may need restart. 
> 
> $ ceph tell osd.* injectargs '--osd-max-backfills 5'
> osd.0: osd_max_backfills = '5' osd_objectstore = 'bluestore' (not observed, 
> change may require restart) rocksdb_separate_wal_dir = 'false' (not observed, 
> change may require restart)
> 
> 
> =
> 
> The value seems to be not changed too.
> 
> [cephuser@ceph-las-admin-a1 home]$  ceph -n osd.0 --show-config | grep 
> osd_max_backfills
> osd_max_backfills = 1
> 
> Do I have to really restart all the OSD daemons ?
> 
> 
> 
> Karun 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Configuring ceph usage statistics

2017-11-20 Thread Jean-Charles Lopez
Hi Richard,

you need to grant admin ops capabilities to a specific user to be able to query 
the usage stats.

radosgw-admin caps add --caps "usage=*;buckets=*;metadata=*;users=*;zone=*" 
--uid=johndoe

* can be replaced with “read” or “read, write” depending on what you want the
user to be able to do.

[root@ex-sem-1 ~]# radosgw-admin caps add --caps 
"usage=*;buckets=*;metadata=*;users=*;zone=*" --uid=johndoe
{
"user_id": "johndoe",
"display_name": "John Doe",
"email": "j...@redhat.com",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "johndoe",
"access_key": “x",
"secret_key": “y"
}
],
"swift_keys": [],
"caps": [
{
"type": "buckets",
"perm": "*"
},
{
"type": "metadata",
"perm": "*"
},
{
"type": "usage",
"perm": "*"
},
{
"type": "users",
"perm": "*"
},
{
"type": "zone",
"perm": "*"
}
],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"temp_url_keys": []
}

[root@ex-sem-1 ~]# radosgw-admin usage show --uid=johndoe
{
"entries": [
{
"user": "johndoe",
"buckets": [
{
"bucket": "bucket1",
"time": "2017-11-20 22:00:00.00Z",
"epoch": 1511215200,
"owner": "johndoe",
"categories": [
{
"category": "put_obj",
"bytes_sent": 0,
"bytes_received": 3939,
"ops": 3,
"successful_ops": 3
}
]
}
]
}
],
"summary": [
{
"user": "johndoe",
"categories": [
{
"category": "put_obj",
"bytes_sent": 0,
"bytes_received": 3939,
"ops": 3,
"successful_ops": 3
}
],
"total": {
"bytes_sent": 0,
"bytes_received": 3939,
"ops": 3,
"successful_ops": 3
}
}
]
}

Play with the caps to allow what you feel is necessary.

Note that you also have this to check byte usage

[root@ex-sem-1 ~]# radosgw-admin user stats --uid=johndoe
{
"stats": {
"total_entries": 6,
"total_bytes": 37307,
"total_bytes_rounded": 53248
},
"last_stats_sync": "2017-07-22 22:50:37.572798Z",
"last_stats_update": "2017-11-20 22:56:56.311295Z"
}

Best regards
JC

> On Nov 20, 2017, at 13:30, Richard Cox  wrote:
> 
> Attempting to set up a proof of concept ceph cluster (3 osd’s 1 mon node), 
> and everything is working as far as radowsgw and s3 connectivity, however I 
> can’t seem to get any usage statistics.
>  
> Looking at the documentation this is enabled by default, but just in case it 
> isn’t, I have 
>  
> [client.radosgw.gateway]
>  
> rgw enable usage log = true
> rgw usage log tick interval = 30
> rgw usage log flush threshold = 1024
> rgw usage max shards = 32
> rgw usage max user shards = 1
>  
> I read and write to the cluster using a set up demo account; however when I 
> try to view the usage stats:
>  
> radosgw-admin –uid=demo usage show
> {
> "entries": [],
> "summary": []
> }
>  
> I’m sure there’s something blindingly obvious that I’m missing, but I’m at my 
> wits end to what it could be.
>  
> Thanks for any assistance!
>  
> Richard.
>  
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-19 Thread Jean-Charles Lopez
Hi Russell,

as you have 4 servers, and assuming you are not using EC pools, just stop all
the OSDs on the second, questionable server, mark those OSDs out, let the
cluster rebalance, and when all PGs are active+clean rerun the test.

All IOs should then go only to the other 3 servers.
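
As a rough sketch (the OSD ids are placeholders, take them from ceph osd tree):

# on the suspect server: stop every OSD daemon hosted there
systemctl stop ceph-osd@<id>      # repeat for each of its OSD ids

# from an admin node: mark those OSDs out so data re-replicates elsewhere
ceph osd out <id>                 # repeat for the same ids

# wait until all PGs are active+clean again, then rerun the rados bench
watch ceph -s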

JC

> On Oct 19, 2017, at 13:49, Russell Glaue  wrote:
> 
> No, I have not ruled out the disk controller and backplane making the disks 
> slower.
> Is there a way I could test that theory, other than swapping out hardware?
> -RG
> 
> On Thu, Oct 19, 2017 at 3:44 PM, David Turner  > wrote:
> Have you ruled out the disk controller and backplane in the server running 
> slower?
> 
> On Thu, Oct 19, 2017 at 4:42 PM Russell Glaue  > wrote:
> I ran the test on the Ceph pool, and ran atop on all 4 storage servers, as 
> suggested.
> 
> Out of the 4 servers:
> 3 of them performed with 17% to 30% disk %busy, and 11% CPU wait. Momentarily 
> spiking up to 50% on one server, and 80% on another
> The 2nd newest server was almost averaging 90% disk %busy and 150% CPU wait. 
> And more than momentarily spiking to 101% disk busy and 250% CPU wait.
> For this 2nd newest server, this was the statistics for about 8 of 9 disks, 
> with the 9th disk not far behind the others.
> 
> I cannot believe all 9 disks are bad
> They are the same disks as the newest 1st server, Crucial_CT960M500SSD1, and 
> same exact server hardware too.
> They were purchased at the same time in the same purchase order and arrived 
> at the same time.
> So I cannot believe I just happened to put 9 bad disks in one server, and 9 
> good ones in the other.
> 
> I know I have Ceph configured exactly the same on all servers
> And I am sure I have the hardware settings configured exactly the same on the 
> 1st and 2nd servers.
> So if I were someone else, I would say it maybe is bad hardware on the 2nd 
> server.
> But the 2nd server is running very well without any hint of a problem.
> 
> Any other ideas or suggestions?
> 
> -RG
> 
> 
> On Wed, Oct 18, 2017 at 3:40 PM, Maged Mokhtar  > wrote:
> just run the same 32 threaded rados test as you did before and this time run 
> atop while the test is running looking for %busy of cpu/disks. It should give 
> an idea if there is a bottleneck in them. 
> 
> On 2017-10-18 21:35, Russell Glaue wrote:
> 
>> I cannot run the write test reviewed at the 
>> ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device blog. The tests 
>> write directly to the raw disk device.
>> Reading an infile (created with urandom) on one SSD, writing the outfile to 
>> another osd, yields about 17MB/s.
>> But Isn't this write speed limited by the speed in which in the dd infile 
>> can be read?
>> And I assume the best test should be run with no other load.
>>  
>> How does one run the rados bench "as stress"?
>>  
>> -RG
>>  
>> 
>> On Wed, Oct 18, 2017 at 1:33 PM, Maged Mokhtar > > wrote:
>> measuring resource load as outlined earlier will show if the drives are 
>> performing well or not. Also how many osds do you have  ?
>> 
>> On 2017-10-18 19:26, Russell Glaue wrote:
>> 
>> The SSD drives are Crucial M500
>> A Ceph user did some benchmarks and found it had good performance
>> https://forum.proxmox.com/threads/ceph-bad-performance-in-qemu-guests.21551/ 
>> 
>>  
>> However, a user comment from 3 years ago on the blog post you linked to says 
>> to avoid the Crucial M500
>>  
>> Yet, this performance posting tells that the Crucial M500 is good.
>> https://inside.servers.com/ssd-performance-2017-c4307a92dea 
>> 
>> 
>> On Wed, Oct 18, 2017 at 11:53 AM, Maged Mokhtar > > wrote:
>> Check out the following link: some SSDs perform bad in Ceph due to sync 
>> writes to journal
>> 
>> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>>  
>> 
>> Anther thing that can help is to re-run the rados 32 threads as stress and 
>> view resource usage using atop (or collectl/sar) to check for %busy cpu and 
>> %busy disks to give you an idea of what is holding down your cluster..for 
>> example: if cpu/disk % are all low then check your network/switches.  If 
>> disk %busy is high (90%) for all disks then your disks are the bottleneck: 
>> which either means you have SSDs that are not suitable for Ceph or you have 
>> too few disks (which i doubt is the case). If only 1 disk %busy is high, 
>> there may be something wrong with this disk should be removed.
>> 
>> Maged
>> 
>> On 2017-10-18 18:13, Russell Glaue wrote:
>> 
>> In my previous post, in one of my points I was wondering if the request size 
>> wo

Re: [ceph-users] Not able to start OSD

2017-10-19 Thread Jean-Charles Lopez
Hi,

have you checked the output of "ceph-disk list" on the nodes where the OSDs are
not coming back up?

This should give you a hint on what's going on.

Also use dmesg to search for any error messages.

And finally inspect /var/log/ceph/ceph-osd.${id}.log to see messages produced 
by the OSD itself when it starts.
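
For example, on the node carrying osd.1 from your systemctl output, something
along these lines:

ceph-disk list                      # are data/journal partitions recognised?
mount | grep ceph                   # are they actually mounted?
dmesg | tail -50                    # kernel-level disk or filesystem errors
tail -100 /var/log/ceph/ceph-osd.1.log
journalctl -u ceph-osd@1 --since "1 hour ago"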

Regards
JC

> On Oct 19, 2017, at 12:11, Josy  wrote:
> 
> Hi,
> 
> I am not able to start some of the OSDs in the cluster.
> 
> This is a test cluster and had 8 OSDs. One node was taken out for 
> maintenance. I set the noout flag and after the server came back up I unset 
> the noout flag.
> 
> Suddenly couple of OSDs went down.
> 
> And now I can start the OSDs manually from each node, but the status is still 
> "down"
> 
> $  ceph osd stat
> 8 osds: 2 up, 5 in
> 
> 
> $ ceph osd tree
> ID  CLASS WEIGHT  TYPE NAME STATUS REWEIGHT PRI-AFF
>  -1   7.97388 root default
>  -3   1.86469 host a1-osd
>   1   ssd 1.86469 osd.1   down0 1.0
>  -5   0.87320 host a2-osd
>   2   ssd 0.87320 osd.2   down0 1.0
>  -7   0.87320 host a3-osd
>   4   ssd 0.87320 osd.4   down  1.0 1.0
>  -9   0.87320 host a4-osd
>   8   ssd 0.87320 osd.8 up  1.0 1.0
> -11   0.87320 host a5-osd
>  12   ssd 0.87320 osd.12  down  1.0 1.0
> -13   0.87320 host a6-osd
>  17   ssd 0.87320 osd.17up  1.0 1.0
> -15   0.87320 host a7-osd
>  21   ssd 0.87320 osd.21  down  1.0 1.0
> -17   0.87000 host a8-osd
>  28   ssd 0.87000 osd.28  down0 1.0
> 
> Also can see this error in each OSD node.
> 
> # systemctl status ceph-osd@1
> ● ceph-osd@1.service - Ceph object storage daemon osd.1
>Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor 
> preset: disabled)
>Active: failed (Result: start-limit) since Thu 2017-10-19 11:35:18 PDT; 
> 19min ago
>   Process: 4163 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i 
> --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
>   Process: 4158 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster 
> ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>  Main PID: 4163 (code=killed, signal=ABRT)
> 
> Oct 19 11:34:58 ceph-las1-a1-osd systemd[1]: Unit ceph-osd@1.service entered 
> failed state.
> Oct 19 11:34:58 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service failed.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service holdoff time 
> over, scheduling restart.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: start request repeated too 
> quickly for ceph-osd@1.service
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: Failed to start Ceph object 
> storage daemon osd.1.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: Unit ceph-osd@1.service entered 
> failed state.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: ceph-osd@1.service failed.
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Jean-Charles Lopez
Hi Josy,

this is correct.

Just make sure that your current user as well as the user for your VMs (if you 
are using a VM environment) are allowed to write to this directory.

Also make sure that /var/run/ceph exists.

Once you have fixed the permissions problem and made sure that the path where
you want to create the socket file exists, it will be OK.

Note that the socket file can be created anywhere, so you could simply set the
parameter to admin_socket = “./my-client.asok” just for the purpose of a test.
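
As a quick sketch (chmod 777 is only for a quick test, tighten it afterwards):

sudo mkdir -p /var/run/ceph
sudo chmod 777 /var/run/ceph

# after restarting the client/VM the socket should appear and can be queried
ls /var/run/ceph/
sudo ceph --admin-daemon /var/run/ceph/ceph-client.admin.<pid>.<cctid>.asok \
  config get rbd_cache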

Regards
JC Lopez
Senior Technical Instructor, Global Storage Consulting Practice
Red Hat, Inc.
jelo...@redhat.com
+1 408-680-6959

> On Oct 17, 2017, at 16:07, Josy  wrote:
> 
> I think it is permission error, because when running ceph -s it shows this 
> error at the top
> 
> -
> 
> $ ceph -s
> 2017-10-17 15:53:26.132180 7f7698834700 -1 asok(0x7f76940017a0) 
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
> bind the UNIX domain socket to 
> '/var/run/ceph/ceph-client.admin.29983.140147265902928.asok': (13) Permission 
> denied
>   cluster:
> id: de296604-d85c-46ab-a3af-add3367f0e6d
> health: HEALTH_OK
> 
> 
> Selinux is disabled in the server. Also I changed ownership of /var/run/ceph 
> to the ceph user.
> Still no luck. 'ls /var/run/ceph/ lists' no files in the client server
> 
> 
> 
> On 18-10-2017 04:07, Jason Dillaman wrote:
>> On Tue, Oct 17, 2017 at 6:30 PM, Josy  wrote:
>>> Hi,
>>> 
>>> I am running the command  from the admin server.
>>> 
>>> Because there are no asok file in the client server
>>> ls /var/run/ceph/ lists no files in the client server.
>> Most likely a permissions or SElinux/AppArmor issue where the librbd
>> client application cannot write to the directory.
>> 
>>>>> As Jason points it out you also need to make sure that your restart the
>>>>> client connection for the changes in the ceph.conf file to take effect.
>>> You mean restart the client server ?
>>> 
>>> (I am sorry, this is something new for me. I have just started learning
>>> ceph.)
>> Assuming this is for QEMU, QEMU is the librbd client so you would have
>> to stop/start the VM to pick up any configuration changes (or perform
>> a live migration to another server).
>> 
>>> On 18-10-2017 03:32, Jean-Charles Lopez wrote:
>>> 
>>> Hi Josy,
>>> 
>>> just a doubt but it looks like your ASOK file is the one from a Ceph
>>> Manager. So my suspicion is that you may be running the command from the
>>> wrong machine.
>>> 
>>> To run this command, you need to ssh into the machine where the client
>>> connection is being initiated.
>>> 
>>> But may be I am wrong regarding your exact connection point.
>>> 
>>> As Jason points it out you also need to make sure that your restart the
>>> client connection for the changes in the ceph.conf file to take effect.
>>> 
>>> Regards
>>> JC Lopez
>>> Senior Technical Instructor, Global Storage Consulting Practice
>>> Red Hat, Inc.
>>> jelo...@redhat.com
>>> +1 408-680-6959
>>> 
>>> On Oct 17, 2017, at 14:29, Josy  wrote:
>>> 
>>> Thanks for the reply.
>>> 
>>> I added rbd_non_blocking_aio = false in ceph.conf and pushed the admin file
>>> to all nodes.
>>> 
>>> -
>>> [client]
>>> admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
>>> log file = /var/log/ceph/client.log
>>> debug rbd = 20
>>> debug librbd = 20
>>> rbd_non_blocking_aio = false
>>> --
>>> 
>>> 
>>> However the config show command still shows it as true.
>>> ---
>>> [cephuser@ceph-las-admin-a1 ceph-cluster]$ sudo ceph --admin-daemon
>>> /var/run/ceph/ceph-mgr.ceph-las-admin-a1.asok config show | grep
>>> "rbd_non_blocking_aio"
>>> "rbd_non_blocking_aio": "true",
>>> ---
>>> 
>>> Did I miss something ?
>>> 
>>> 
>>> On 18-10-2017 01:22, Jean-Charles Lopez wrote:
>>> 
>>> Hi
>>> 
>>> syntax uses the admin socket file : ceph --admin-daemon
>>> /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok config get rbd_cache
>>> 
>>> Should be /var/run/ceph/ceph.client.admin.$pid.$cctid.asok if your
>>> connection is using client.admin to connect to the cluster and your c

Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Jean-Charles Lopez
Hi Josy,

just a doubt but it looks like your ASOK file is the one from a Ceph Manager. 
So my suspicion is that you may be running the command from the wrong machine.

To run this command, you need to ssh into the machine where the client 
connection is being initiated.

But maybe I am wrong regarding your exact connection point.

As Jason points out, you also need to make sure that you restart the client
connection for the changes in the ceph.conf file to take effect.

Regards
JC Lopez
Senior Technical Instructor, Global Storage Consulting Practice
Red Hat, Inc.
jelo...@redhat.com
+1 408-680-6959

> On Oct 17, 2017, at 14:29, Josy  wrote:
> 
> Thanks for the reply.
> 
> I added rbd_non_blocking_aio = false in ceph.conf and pushed the admin file 
> to all nodes.
> 
> -
> [client]
> admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
> log file = /var/log/ceph/client.log
> debug rbd = 20
> debug librbd = 20
> rbd_non_blocking_aio = false
> --
> 
> 
> However the config show command still shows it as true. 
> ---
> [cephuser@ceph-las-admin-a1 ceph-cluster]$ sudo ceph --admin-daemon 
> /var/run/ceph/ceph-mgr.ceph-las-admin-a1.asok config show | grep 
> "rbd_non_blocking_aio"
> "rbd_non_blocking_aio": "true",
> ---
> 
> Did I miss something ? 
> 
> On 18-10-2017 01:22, Jean-Charles Lopez wrote:
>> Hi 
>> 
>> syntax uses the admin socket file : ceph --admin-daemon 
>> /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok config get rbd_cache
>> 
>> Should be /var/run/ceph/ceph.client.admin.$pid.$cctid.asok if your 
>> connection is using client.admin to connect to the cluster and your cluster 
>> name is set to the default of ceph. But obviously can’t know from here the 
>> PID and the CCTID you will have to identify.
>> 
>> You can actually do a ls /var/run/ceph to find the correct admin socket file
>> 
>> Regards
>> JC Lopez
>> Senior Technical Instructor, Global Storage Consulting Practice
>> Red Hat, Inc.
>> jelo...@redhat.com <mailto:jelo...@redhat.com>
>> +1 408-680-6959
>> 
>>> On Oct 17, 2017, at 12:50, Josy >> <mailto:j...@colossuscloudtech.com>> wrote:
>>> 
>>> Hi,
>>> 
>>> 
>>> I am following this article :
>>> 
>>> http://ceph.com/geen-categorie/ceph-validate-that-the-rbd-cache-is-active/
>>> I have enabled this flag in ceph.conf
>>> 
>>> [client]
>>> admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
>>> log file = /var/log/ceph/
>>> 
>>> But the command to show the conf is not working : 
>>> [cephuser@ceph-las-admin-a1 ceph-cluster]$ sudo ceph --admin-daemon  
>>> /etc/ceph/ceph.client.admin.keyring config show
>>> admin_socket: exception getting command descriptions: [Errno 111] 
>>> Connection refused
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Jean-Charles Lopez
Hi 

The syntax uses the admin socket file: ceph --admin-daemon 
/var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok config get rbd_cache

It should be /var/run/ceph/ceph.client.admin.$pid.$cctid.asok if your connection
is using client.admin to connect to the cluster and your cluster name is set to
the default of ceph. Obviously I can't know the PID and the CCTID from here;
you will have to identify them yourself.

You can actually do a ls /var/run/ceph to find the correct admin socket file

Regards
JC Lopez
Senior Technical Instructor, Global Storage Consulting Practice
Red Hat, Inc.
jelo...@redhat.com 
+1 408-680-6959

> On Oct 17, 2017, at 12:50, Josy  wrote:
> 
> Hi,
> 
> 
> I am following this article :
> 
> http://ceph.com/geen-categorie/ceph-validate-that-the-rbd-cache-is-active/ 
> 
> I have enabled this flag in ceph.conf
> 
> [client]
> admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
> log file = /var/log/ceph/
> 
> But the command to show the conf is not working : 
> [cephuser@ceph-las-admin-a1 ceph-cluster]$ sudo ceph --admin-daemon  
> /etc/ceph/ceph.client.admin.keyring config show
> admin_socket: exception getting command descriptions: [Errno 111] Connection 
> refused
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Possible to change the location of run_dir?

2017-09-20 Thread Jean-Charles Lopez
Hi

use the run_dir parameter in your /etc/ceph/ceph.conf file. It defaults to 
/var/run/ceph

Or

use admin_socket = /full/path/to/admin/socket for each daemon, or in the global
section using environment variables such as $type, $id, $cluster and so on.
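
A sketch of what that could look like in the [global] section (the
/data/ceph-run path is just an illustration):

[global]
run_dir = /data/ceph-run
admin_socket = /data/ceph-run/$cluster-$name.asok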

Regards
JC

> On Sep 20, 2017, at 11:27, Bryan Banister  wrote:
> 
> We are running telegraf and would like to have the telegraf user read the 
> admin sockets from ceph, which is required for the ceph telegraf plugin to 
> apply the ceph  related tags to the data.  The ceph admin sockets are by 
> default stored in /var/run/ceph, but this is recreated at boot time, so we 
> can’t set permissions on these sockets which will persist.
>  
> We would like to change the run_dir for ceph to be a persistent directory.  
> Is there a way to do this?
>  
> Would be nice if there was a [global] config option or something we could put 
> in the /etc/sysconfig/ceph file.
>  
> Thanks,
> -Bryan
> 
> 
> Note: This email is for the confidential use of the named addressee(s) only 
> and may contain proprietary, confidential or privileged information. If you 
> are not the intended recipient, you are hereby notified that any review, 
> dissemination or copying of this email is strictly prohibited, and to please 
> notify the sender immediately and destroy this email and any attachments. 
> Email transmission cannot be guaranteed to be secure or error-free. The 
> Company, therefore, does not make any guarantees as to the completeness or 
> accuracy of this email or any attachments. This email is for informational 
> purposes only and does not constitute a recommendation, offer, request or 
> solicitation of any kind to buy, sell, subscribe, redeem or perform any type 
> of transaction of a financial product.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph fails to recover

2017-09-20 Thread Jean-Charles Lopez
Hi,

you can play with the following 2 parameters:
osd_recovery_max_active
osd_max_backfills

The higher the number, the more PGs are processed at the same time.
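
For example, to bump both at runtime on all OSDs (the values are only an
example; watch client latency while you raise them):

ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'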

Regards
Jean-Charles LOPEZ
jeanchlo...@mac.com



JC Lopez
Senior Technical Instructor, Global Storage Consulting Practice
Red Hat, Inc.
jelo...@redhat.com
+1 408-680-6959

> On Sep 20, 2017, at 08:26, Jonas Jaszkowic  
> wrote:
> 
> Thank you, that is very helpful. I didn’t know about the osd_max_backfills 
> option. Recovery is now working faster. 
> 
> What is the best way to make recovery as fast as possible assuming that I do 
> not care about read/write speed? (Besides
> setting osd_max_backfills as high as possible). Are there any important 
> options that I have to know?
> 
> What is the best practice to deal with the issue recovery speed vs. 
> read/write speed during a recovery situation? Do you
> have any suggestions/references/hints how to deal with such situations?
> 
> 
>> Am 20.09.2017 um 16:45 schrieb David Turner > <mailto:drakonst...@gmail.com>>:
>> 
>> To help things look a little better, I would also stop the daemon for osd.6 
>> and mark it down `ceph osd down 6`.  Note that if the OSD is still running 
>> it will likely mark itself back up and in on its own.  I don't think that 
>> the OSD still running and being up in the cluster is causing the issue, but 
>> it might.  After that, I would increase how many PGs can recover at the same 
>> time by increasing osd_max_backfills `ceph tell osd.* injectargs 
>> '--osd_max_backfills=5'`.  Note that for production you'll want to set this 
>> number to something that doesn't negatively impact your client IO, but high 
>> enough to help recover your cluster faster.  You can figure out that number 
>> by increasing it 1 at a time and watching the OSD performance with `iostat 
>> -x 1` or something to see how heavily used the OSDs are during your normal 
>> usage and again during recover while testing the settings.  For testing, you 
>> can set it as high as you'd like (probably no need to go above 20 as that 
>> will likely saturate your disks' performance) to get the PGs out of the wait 
>> status and into active recovery and backfilling.
>> 
>> On Wed, Sep 20, 2017 at 10:03 AM Jonas Jaszkowic 
>> mailto:jonasjaszkowic.w...@gmail.com>> wrote:
>> Output of ceph status:
>> 
>> cluster 18e87fd8-17c1-4045-a1a2-07aac106f200
>>  health HEALTH_WARN
>> 1 pgs backfill_wait
>> 56 pgs degraded
>> 1 pgs recovering
>> 55 pgs recovery_wait
>> 56 pgs stuck degraded
>> 57 pgs stuck unclean
>> recovery 50570/1369003 objects degraded (3.694%)
>> recovery 854/1369003 objects misplaced (0.062%)
>>  monmap e2: 1 mons at {ip-172-31-16-102=172.31.16.102:6789/0 
>> <http://172.31.16.102:6789/0>}
>> election epoch 4, quorum 0 ip-172-31-16-102
>> mgr active: ip-172-31-16-102
>>  osdmap e247: 32 osds: 32 up, 31 in; 1 remapped pgs
>> flags sortbitwise,require_jewel_osds,require_kraken_osds
>>   pgmap v10860: 256 pgs, 1 pools, 1975 GB data, 111 kobjects
>> 2923 GB used, 6836 GB / 9760 GB avail
>> 50570/1369003 objects degraded (3.694%)
>> 854/1369003 objects misplaced (0.062%)
>>  199 active+clean
>>   55 active+recovery_wait+degraded
>>1 active+remapped+backfill_wait
>>1 active+recovering+degraded
>>   client io 513 MB/s rd, 131 op/s rd, 0 op/s wr
>> 
>> Output of ceph osd tree:
>> 
>> ID  WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>  -1 9.83984 root default
>>  -2 0.30750 host ip-172-31-24-96
>>   0 0.30750 osd.0  up  1.0  1.0
>>  -3 0.30750 host ip-172-31-30-32
>>   1 0.30750 osd.1  up  1.0  1.0
>>  -4 0.30750 host ip-172-31-28-36
>>   2 0.30750 osd.2  up  1.0  1.0
>>  -5 0.30750 host ip-172-31-18-100
>>   3 0.30750 osd.3  up  1.0  1.0
>>  -6 0.30750 host ip-172-31-25-240
>>   4 0.30750 osd.4  up  1.0  1.0
>>  -7 0.30750 host ip-172-31-24-110
>>   5 0.30750 osd.5  up  1.0  1.0
>>  -8 0.30750 host ip-172-31-20-245
>>

Re: [ceph-users] Rgw install manual install luminous

2017-09-12 Thread Jean-Charles Lopez
Hi,

see comment inline

Regards
JC

> On Sep 12, 2017, at 13:31, Marc Roos  wrote:
> 
> 
> 
> I have been trying to setup the rados gateway (without deploy), but I am 
> missing some commands to enable the service I guess? How do I populate 
> the /var/lib/ceph/radosgw/ceph-gw1. I didn’t see any command like the 
> ceph-mon.
> 
> service ceph-radosgw@gw1 start
> Gives:
> 2017-09-12 22:26:06.390523 7fb9d7f27e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:26:06.390537 7fb9d7f27e00  0 deferred set uid:gid to 
> 167:167 (ceph:ceph)
> 2017-09-12 22:26:06.390592 7fb9d7f27e00  0 ceph version 12.2.0 
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process 
> (unknown), pid 28481
> 2017-09-12 22:26:06.412882 7fb9d7f27e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:26:06.415335 7fb9d7f27e00 -1 auth: error parsing file 
> /var/lib/ceph/radosgw/ceph-gw1/keyring
> 2017-09-12 22:26:06.415342 7fb9d7f27e00 -1 auth: failed to load 
> /var/lib/ceph/radosgw/ceph-gw1/keyring: (5) Input/output error
> 2017-09-12 22:26:06.415355 7fb9d7f27e00  0 librados: client.gw1 
> initialization error (5) Input/output error
> 2017-09-12 22:26:06.415981 7fb9d7f27e00 -1 Couldn't init storage 
> provider (RADOS)
> 2017-09-12 22:26:06.669892 7f1740d89e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:26:06.669919 7f1740d89e00  0 deferred set uid:gid to 
> 167:167 (ceph:ceph)
> 2017-09-12 22:26:06.669977 7f1740d89e00  0 ceph version 12.2.0 
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process 
> (unknown), pid 28497
> 2017-09-12 22:26:06.693019 7f1740d89e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:26:06.695963 7f1740d89e00 -1 auth: error parsing file 
> /var/lib/ceph/radosgw/ceph-gw1/keyring
> 2017-09-12 22:26:06.695971 7f1740d89e00 -1 auth: failed to load 
> /var/lib/ceph/radosgw/ceph-gw1/keyring: (5) Input/output error
Looks like you don’t have the keyring for the RGW user. The error message tells 
you about the location and the filename to use.
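
Something along these lines should create and install it (the caps below are
the usual ones for an RGW instance; adjust the client name if you intend to run
it as client.radosgw.gw1 instead):

mkdir -p /var/lib/ceph/radosgw/ceph-gw1
ceph auth get-or-create client.gw1 mon 'allow rw' osd 'allow rwx' \
  -o /var/lib/ceph/radosgw/ceph-gw1/keyring
chown -R ceph:ceph /var/lib/ceph/radosgw/ceph-gw1
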
> 2017-09-12 22:26:06.695989 7f1740d89e00  0 librados: client.gw1 
> initialization error (5) Input/output error
> 2017-09-12 22:26:06.696850 7f1740d89e00 -1 Couldn't init storage 
> provider (RADOS
> 
> radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gw1 -f --log-to-stderr 
> --debug-rgw=1 --debug-ms=1
> Gives:
> 2017-09-12 22:20:55.845184 7f9004b54e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:20:55.845457 7f9004b54e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:20:55.845508 7f9004b54e00  0 ceph version 12.2.0 
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process 
> (unknown), pid 28122
> 2017-09-12 22:20:55.867423 7f9004b54e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:20:55.869509 7f9004b54e00  1  Processor -- start
> 2017-09-12 22:20:55.869573 7f9004b54e00  1 -- - start start
> 2017-09-12 22:20:55.870324 7f9004b54e00  1 -- - --> 
> 192.168.10.111:6789/0 -- auth(proto 0 36 bytes epoch 0) v1 -- 
> 0x7f9006e6ec80 con 0
> 2017-09-12 22:20:55.870350 7f9004b54e00  1 -- - --> 
> 192.168.10.112:6789/0 -- auth(proto 0 36 bytes epoch 0) v1 -- 
> 0x7f9006e6ef00 con 0
> 2017-09-12 22:20:55.870824 7f8ff1fc4700  1 -- 
> 192.168.10.114:0/4093088986 learned_addr learned my addr 
> 192.168.10.114:0/4093088986
> 2017-09-12 22:20:55.871413 7f8ff07c1700  1 -- 
> 192.168.10.114:0/4093088986 <== mon.0 192.168.10.111:6789/0 1  
> mon_map magic: 0 v1  361+0+0 (1785674138 0 0) 0x7f9006e8afc0 con 
> 0x7f90070d8800
> 2017-09-12 22:20:55.871567 7f8ff07c1700  1 -- 
> 192.168.10.114:0/4093088986 <== mon.0 192.168.10.111:6789/0 2  
> auth_reply(proto 2 0 (0) Success) v1  33+0+0 (4108244008 0 0) 
> 0x7f9006e6ec80 con 0x7f90070d8800
> 2017-09-12 22:20:55.871662 7f8ff07c1700  1 -- 
> 192.168.10.114:0/4093088986 --> 192.168.10.111:6789/0 -- auth(proto 2 2 
> bytes epoch 0) v1 -- 0x7f9006e6f900 con 0
> 2017-09-12 22:20:55.871688 7f8ff07c1700  1 -- 
> 192.168.10.114:0/4093088986 <== mon.1 192.168.10.112:6789/0 1  
> mon_map magic: 0 v1  361+0+0 (1785674138 0 0) 0x7f9006e8b200 con 
> 0x7f90070d7000
> 2017-09-12 22:20:55.871734 7f8ff07c1700  1 -- 
> 192.168.10.114:0/4093088986 <== mon.1 192.168.10.112:6789/0 2  
> auth_reply(proto 2 0 (0) Success) v1  33+0+0 (3872865519 0 0) 
> 0x7f9006e6ef00 con 0x7f90070d7000
> 2017-09-12 22:20:55.871759 7f8ff07c1700  1 -- 
> 192.168.10.114:0/4093088986 --> 192.168.10.112:6789/0 -- auth(proto 2 2 
> bytes epoch 0) v1 -- 0x7f9006e6ec80 con 0
> 2017-09-12 22:20:55.872083 7f8ff07c1700  1 -- 
> 192.168.10.114:0/4093088986 <== mon.0 192.168.10.111:6789/0 3  
> auth_reply(proto 2 -22 (22) Invalid argument) v1 ===

Re: [ceph-users] too few PGs per OSD (16 < min 30) but I set pool_default_pg_num: 300 in Ansible

2017-06-14 Thread Jean-Charles LOPEZ
Hi,

see comments below.

JC
> On Jun 14, 2017, at 07:23, Stéphane Klein  wrote:
> 
> Hi,
> 
> I have this parameter in my Ansible configuration:
> 
> pool_default_pg_num: 300 # (100 * 6) / 2 = 300
> 
> But I have this error:
> 
> # ceph status
> cluster 800221d2-4b8c-11e7-9bb9-cffc42889917
>  health HEALTH_ERR
> 73 pgs are stuck inactive for more than 300 seconds
> 22 pgs degraded
> 9 pgs peering
> 64 pgs stale
> 22 pgs stuck degraded
> 9 pgs stuck inactive
> 64 pgs stuck stale
> 31 pgs stuck unclean
> 22 pgs stuck undersized
> 22 pgs undersized
> too few PGs per OSD (16 < min 30)
>  monmap e1: 2 mons at 
> {ceph-storage-rbx-1=172.29.20.30:6789/0,ceph-storage-rbx-2=172.29.20.31:6789/0
>  }
> election epoch 4, quorum 0,1 ceph-storage-rbx-1,ceph-storage-rbx-2
>  osdmap e41: 12 osds: 6 up, 6 in; 8 remapped pgs
> flags sortbitwise,require_jewel_osds
>   pgmap v79: 64 pgs, 1 pools, 0 bytes data, 0 objects
As this line shows, you only have 64 PGs in your cluster so far, hence the 
warning. This parameter must be set before you deploy your cluster, or at least 
before you create your first pool.
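
For an already-created pool you can still raise it by hand, for example
(assuming the default rbd pool, check ceph osd lspools; raise in steps if the
monitor refuses too large an increase):

ceph osd pool set rbd pg_num 128
ceph osd pool set rbd pgp_num 128
ceph osd pool set rbd pg_num 256
ceph osd pool set rbd pgp_num 256
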
> 30919 MB used, 22194 GB / 5 GB avail
>   33 stale+active+clean
>   22 stale+active+undersized+degraded
>9 stale+peering
> 
> I have 2 hosts with 3 partitions, then 3 x 2 OSD ?
> 
> Why 16 < min 30 ? I set 300 pg_num
> 
> Best regards,
> Stéphane
> -- 
> Stéphane Klein  >
> blog: http://stephane-klein.info 
> cv : http://cv.stephane-klein.info 
> Twitter: http://twitter.com/klein_stephane 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ONE pg deep-scrub blocks cluster

2016-08-29 Thread Jean-Charles Lopez
Hi Mehmet

OK so it does come from a rados put. 

As you were able to check, the VM device object size is 4 MB. 

So we'll see after you have removed the object with rados -p rbd rm. 
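
As a sketch of the safety-first sequence discussed (the backup path is a
placeholder):

rados -p rbd stat vm-101-disk-2                          # confirm it is there
rados -p rbd get vm-101-disk-2 /backup/vm-101-disk-2.obj # keep a copy first
rados -p rbd rm vm-101-disk-2
ceph pg deep-scrub <pgid>                                # re-check the PG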

I'll wait for an update. 

JC

While moving. Excuse unintended typos.

> On Aug 29, 2016, at 14:34, Mehmet  wrote:
> 
> Hey JC,
> 
> after setting up the ceph-cluster i tried to migrate an image from one of our 
> production vm into ceph via
> 
> # rados -p rbd put ...
> 
> but i have got always "file too large". I guess this file
> 
> # -rw-r--r-- 1 ceph ceph 100G Jul 31 01:04 vm-101-disk-2__head_383C3223__0
> 
> is the result of this :) - did not thought that there will be something stay 
> in ceph after the mentioned error above.
> Seems i was wrong...
> 
> This could match the time where the issue happened first time...:
> 
> 1. i tried to put via "rados -p rbd put..." this did not worked (tried to put 
> a ~400G file...)
> 2. after ~ 1 week i see the blocked requests after first running "deep-scrub" 
> (default where ceph starts deep-scrubbing)
> 
> I guess the deleting of this file should solve the issue.
> Did you see my mail where i wrote the test results of this?
> 
> # osd_scrub_chunk_max = 5
> # osd_deep_scrub_stride = 1048576
> 
> Only corner note.
> 
>> This seems more to me like a pure radios object of 100GB that was
>> uploaded to the cluster. From the name it could be a VM disk image
>> that was uploaded as an object. If it was an RBD object, it’s size
>> would be in the boundaries of an RBD objects (order 12=4K order
>> 25=32MB).
> 
>> Verify that when you do a "rados -p rbd ls | grep vm-101-disk-2”
>> command, you can see an object named vm-101-disk-2.
> 
> root@:~# rados -p rbd ls | grep vm-101-disk-2
> rbd_id.vm-101-disk-2
> vm-101-disk-2
> 
>> Verify if you have an RBD named this way “rbd -p rbd ls | grep vm-101-disk-2"
> 
> root@:~# rbd -p rbd ls | grep vm-101-disk-2
> vm-101-disk-2
> 
>> As I’m not familiar with proxmox so I’d suggest the following:
>> If yes to 1, for security, copy this file somewhere else and then to a
>> rados -p rbd rm vm-101-disk-2.
> 
> root@:~# rbd -p rbd info vm-101-disk-2
> rbd image 'vm-101-disk-2':
>size 400 GB in 102400 objects
>order 22 (4096 kB objects)
>block_name_prefix: rbd_data.5e7d1238e1f29
>format: 2
>features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>flags:
> 
> The VM with the id "101" is up and running. This is using "vm-101-disk-2" as 
> disk - i have moved the disk sucessfully in another way :) (same name :/) 
> after "rados put" did not worked. And as we can see here the objects for this 
> image also exists within ceph
> 
> root@:~# rados -p rbd ls | grep "rbd_data.5e7d1238e1f29" | wc -l
> 53011
> 
> I assumed here to get 102400 objects but as ceph is doing thin provisining 
> this should be ok.
> 
>> If no to 1, for security, copy this file somewhere else and then to a
>> rm -rf vm-101-disk-2__head_383C3223__0
> 
> I should be able to delete the mentioned "100G file".
> 
>> Make sure all your PG copies show the same content and wait for the
>> next scrub to see what is happening.
> 
> Will make a backup of this file and in addition from the vm within proxmox 
> tomorrow on all involved osds and then start a deep-scrub and of course keep 
> you informed.
> 
>> If anything goes wrong you will be able to upload an object with the
>> exact same content from the file you copied.
>> Is proxmox using such huge objects for something to your knowledge (VM
>> boot image or something else)? Can you search the proxmox mailing list
>> and open tickets to verify.
> 
> As i already wrote in this eMail i guess that i am the cause for this :*( 
> with the wrong usage of "rados put".
> Proxmox is using librbd to talk with ceph so it should not be able to create 
> such a large one file.
> 
>> And is this the cause of the long deep scrub? I do think so but I’m
>> not in front of the cluster.
> 
> Let it see :) - i hope that my next eMail will close this issue.
> 
> Thank you very much for your help!
> 
> Best regards,
> - Mehmet
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem with radosgw

2016-02-16 Thread Jean-Charles LOPEZ
Hi,

first checks you can do:
- Check the RADOSGW process is running
- Check the output of ceph auth list for typos in permissions for the RADOSGW 
user
- Check you have the keyring file for the user you created on the RADOSGW node
- Check the output of ceph df to verify the RADOSGW was able to create its pools
- Check the execute permission on the FCGI script file
- Check the content of your ceph.conf file on the RADOSGW node and check for 
typos.
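
Roughly, on the gateway node (the keyring path is an assumption, use the one
referenced in your ceph.conf):

ps -ef | grep radosgw                    # is the daemon running?
ceph auth list                           # typos in the caps?
ls -l /etc/ceph/ceph.client.radosgw.gateway.keyring
ceph df                                  # were the .rgw.* pools created?
ls -l /var/www/html/s3gw.fcgi            # execute bit set?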

Feel free to post the result of those checks (ceph.conf file, ls -l, ceph df 
output, ps -ef | grep radosgw output); remove any keys before posting.

JC

> On Feb 16, 2016, at 08:08, Alexandr Porunov  
> wrote:
> 
> I have problem with radosgw. I have pass this tutorial but without success: 
> http://docs.ceph.com/docs/hammer/radosgw/config/ 
> 
> 
> When I try:
> curl http://porunov.com 
> 
> I always get the same page:
> ...
> 500 Internal Server Error
> ...
> 
> /var/log/httpd/error.log shows:
> ...
> [Tue Feb 16 17:32:37.413558 2016] [:error] [pid 6377] (13)Permission denied: 
> [client 192.168.56.80:41121 ] FastCGI: failed to 
> connect to server "/var/www/html/s3gw.fcgi": connect() failed
> [Tue Feb 16 17:32:37.413596 2016] [:error] [pid 6377] [client 
> 192.168.56.80:41121 ] FastCGI: incomplete 
> headers (0 bytes) recived from server "/var/www/html/s3gw.fcgi"
> 
> /var/log/httpd/access.log shows:
> ...
> 192.168.56.80 - - [16/Feb/2016:17:32:37 + 0200] "GET / HTTP/1.1" 500 530 "-" 
> "curl/7.29.0"
> 
> I have 6 nodes:
> node1 (ip: 192.168.56.101) - mon, osd
> node2 (ip: 192.168.56.102) - mon, osd
> node3 (ip: 192.168.56.103) - mon, osd
> admin-node (ip: 192.168.56.100)
> ns1 (ip: 192.168.56.50) - dns server (bind 9)
> ceph-rgw (ip: 192.168.56.80) - Ceph Gateway Node
> 
> Dns server have this zone file:
> $TTL 86400
> @IN SOA porunov.com . admin.porunov.com 
> . (
> 2016021000
> 43200
> 3600
> 360
> 2592000 )
> ;
> @IN NS ns1.porunov.com .
> @IN A 192.168.56.80
> *  IN CNAME @
> 
> /var/www/html/s3gw.fcgi contains:
> #!/bin/sh
> exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
> 
> /etc/httpd/conf.d/rgw.conf contains:
> FastCgiExternalServer /var/www/html/s3gw.fcgi -socket 
> /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> 
>   ServerName porunov.com 
>   ServerAlias *.porunov.com 
>   ServerAdmin ad...@porunov.com 
>   DocumentRoot /var/www/html
>   RewriteEngine On
>   RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} 
> [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
>   
> 
>   Options +ExecCGI
>   AllowOverride All
>   SetHandler fastcgi-script
>   Order allow,deny
>   Allow from all
>   AuthBasicAuthoritative Off
> 
>   
>   AllowEncodedSlashes On
>   ErrorLog /var/log/httpd/error.log
>   CustomLog /var/log/httpd/access.log combined
>   ServerSignature Off
> 
> 
> I use CentOS 7 on all nodes. Also I can not start radosgw with this command:
> systemctl start ceph-radosgw
> because it shows:
> Failed to start ceph-radosgw.service: Unit ceph-radosgw.service failed to 
> load: No such file or directory.
> 
> But this command seems to work:
> systemctl start ceph-radosgw@radosgw.gateway.service
> 
> httpd and ceph-radosgw@radosgw.gateway service is: active (running)
> 
> Please help me to figure out how to repair it.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] deep scrubbing causes osd down

2015-04-12 Thread Jean-Charles Lopez
Hi Andrei

There is one parameter, osd_max_scrubs, that controls the number of concurrent 
scrubs per OSD. The default is 1 if I'm correct. 

Can you check on one of your OSDs with the admin socket?

Then there remains the option of scheduling the deep scrubs via a cron job 
after setting nodeep-scrub to prevent automatic deep scrubbing. 
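
A rough sketch of both (the sleep value is only an example, pace it to your
hardware):

# check the per-OSD limit via the admin socket, on an OSD host
ceph daemon osd.0 config get osd_max_scrubs

# disable automatic deep scrubs, then drive them yourself from a cron job
ceph osd set nodeep-scrub
for pg in $(ceph pg dump pgs_brief 2>/dev/null | awk '/active/ {print $1}'); do
  ceph pg deep-scrub "$pg"
  sleep 60
done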

Dan Van Der Ster had a post on this ML regarding this.
JC

While moving. Excuse unintended typos.

> On Apr 12, 2015, at 05:21, Andrei Mikhailovsky  wrote:
> 
> 
> JC, 
> 
> the restart of the osd servers seems to have stabilised the cluster. It has 
> been a few hours since the restart and I haven't not seen a single osd 
> disconnect.
> 
> Is there a way to limit the total number of scrub and/or deep-scrub processes 
> running at the same time? For instance, I do not want to have more than 1 or 
> 2 scrub/deep-scrubs running at the same time on my cluster. How do I 
> implement this?
> 
> Thanks
> 
> Andrei
> 
> From: "Andrei Mikhailovsky" 
> To: "LOPEZ Jean-Charles" 
> Cc: ceph-users@lists.ceph.com
> Sent: Sunday, 12 April, 2015 9:02:05 AM
> Subject: Re: [ceph-users] deep scrubbing causes osd down
> 
> JC,
> 
> I've implemented the following changes to the ceph.conf and restarted mons 
> and osds.
> 
> osd_scrub_chunk_min = 1
> osd_scrub_chunk_max =5
> 
> 
> Things have become considerably worse after the changes. Shortly after doing 
> that, majority of osd processes started taking up over 100% cpu and the 
> cluster has considerably slowed down. All my vms are reporting high IO wait 
> (between 30-80%), even vms which are pretty idle and don't do much.
> 
> i have tried restarting all osds, but shortly after the restart the cpu usage 
> goes up. The osds are showing the following logs:
> 
> 2015-04-12 08:39:28.853860 7f96f81dd700  0 log_channel(default) log [WRN] : 
> slow request 60.277590 seconds old, received at 2015-04-12 08:38:28.576168: 
> osd_op(client.69637439.0:290325926 rbd_data.265f967a5f7514.4a00 
> [set-alloc-hint object_size 4194304 write_size 4194304,write 1249280~4096] 
> 5.cb2620e0 snapc ac=[ac] ack+ondisk+write+known_if_redirected e74834) 
> currently waiting for missing object
> 2015-04-12 08:39:28.853863 7f96f81dd700  0 log_channel(default) log [WRN] : 
> slow request 60.246943 seconds old, received at 2015-04-12 08:38:28.606815: 
> osd_op(client.69637439.0:290325927 rbd_data.265f967a5f7514.4a00 
> [set-alloc-hint object_size 4194304 write_size 4194304,write 1310720~4096] 
> 5.cb2620e0 snapc ac=[ac] ack+ondisk+write+known_if_redirected e74834) 
> currently waiting for missing object
> 2015-04-12 08:39:36.855180 7f96f81dd700  0 log_channel(default) log [WRN] : 7 
> slow requests, 1 included below; oldest blocked for > 68.278951 secs
> 2015-04-12 08:39:36.855191 7f96f81dd700  0 log_channel(default) log [WRN] : 
> slow request 30.268450 seconds old, received at 2015-04-12 08:39:06.586669: 
> osd_op(client.64965167.0:1607510 rbd_data.1f264b2ae8944a.0228 
> [set-alloc-hint object_size 4194304 write_size 4194304,write 3584000~69632] 
> 5.30418007 ack+ondisk+write+known_if_redirected e74834) currently waiting for 
> subops from 9
> 2015-04-12 08:40:43.570004 7f96dd693700  0  cls/rgw/cls_rgw.cc:1458: 
> gc_iterate_entries end_key=1_01428824443.569998000
> 
> [In total i've got around 40,000 slow request entries accumulated overnight 
> ((( ]
> 
> On top of that, I have reports of osds going down and back up as frequently 
> as every 10-20 minutes. This effects all osds and not a particular set of 
> osds.
> 
> I will restart the osd servers to see if it makes a difference, otherwise, I 
> will need to revert back to the default settings as the cluster as it 
> currently is is not functional.
> 
> Andrei
> 
> From: "LOPEZ Jean-Charles" 
> To: "Andrei Mikhailovsky" 
> Cc: "LOPEZ Jean-Charles" , ceph-users@lists.ceph.com
> Sent: Saturday, 11 April, 2015 7:54:18 PM
> Subject: Re: [ceph-users] deep scrubbing causes osd down
> 
> Hi Andrei,
> 
> 1) what ceph version are you running?
> 2) what distro and version are you running?
> 3) have you checked the disk elevator for the OSD devices to be set to cfq?
> 4) Have have you considered exploring the following  parameters to further 
> tune
> - osd_scrub_chunk_min lower the default value of 5. e.g. = 1
> - osd_scrub_chunk_max lower the default value of 25. e.g. = 5
> - osd_deep_scrub_stride If you have lowered parameters above, you can play 
> with this one to fit best your physical disk behaviour.
> - osd_scrub_sleep introduce a half second sleep between 2 scrubs; e.g. = 0.5 
> to start with a half second delay
> 
> 
> Cheers
> JC
> 
> 
> On 10 Apr 2015, at 12:01, Andrei Mikhailovsky  wrote:
> 
> Hi guys,
> 
> I was wondering if anyone noticed that the deep scrubbing process causes some 
> osd to go down?
> 
> I have been keeping an eye on a few remaining stability issues in my test 
> cluster. One of the unsolved issues is the occasional reporting of osd(s) 
> going down and com

Re: [ceph-users] Directly connect client to OSD using HTTP

2015-03-28 Thread Jean-Charles Lopez
Hi

There is no limit on the number of gateways you can deploy, so just put a load 
balancer in front of several of them to avoid a single point of failure and a 
bottleneck. 
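
As a sketch, an HAProxy front end over several radosgw instances could look
like this (hosts and port are placeholders, 7480 being the civetweb default):

frontend rgw_http
  bind *:80
  mode http
  default_backend rgw_pool

backend rgw_pool
  mode http
  balance roundrobin
  server rgw1 10.0.0.11:7480 check
  server rgw2 10.0.0.12:7480 check
  server rgw3 10.0.0.13:7480 check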

Rgds

JC

While moving. Excuse unintended typos.

> On Mar 28, 2015, at 08:22, c...@jack.fr.eu.org wrote:
> 
> Hi,
> 
> I am designing an infrastructure using Ceph.
> The client will fetch data though HTTP.
> 
> I saw the radosgw, that is made for that, it has, however, some weakness
> for me : as far as I understood, when a client want to fetch a file, it
> connects to the radosgw, which will connect to the right OSD and pipe
> data to the client.
> 
> Is there any way to remove such bottleneck (proxyfing all data ?)
> 
> A solution would be to create a radosgw on each OSD-servers, I still
> need a way to redirect customers to the right way (to the radosgw that
> lives on the correct OSD).
> I dig the docs, and even with librados, I could not find this information.
> 
> To conclude with a global overview, my needs:
> - few data, many servers, many bandwidth
> - each servers are limited by network, not by disk IO
> - client's URI are forged (they first download some sort of index, then
> the files): a programmable solution can be integrated there
> 
> Thanks for reading
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs: from a file name determine the objects name

2015-02-01 Thread Jean-Charles Lopez
Hi Verma

All layout questions are detailed here for CephFS:

http://docs.ceph.com/docs/master/cephfs/file-layouts/

Hope this is what you are looking for
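
For the naming itself: CephFS data objects are called <inode in hex>.<object
index>, so as a rough sketch (mount point and data pool name are assumptions):

# inode of the file, printed in hex, taken on a CephFS client
ino=$(printf '%x' "$(stat -c %i /mnt/cephfs/somefile)")

# the file's data objects in the data pool
rados -p cephfs_data ls | grep "^${ino}\."

# map one of them to its PG and OSDs
ceph osd map cephfs_data "${ino}.00000000"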

Cheers

JC

While moving. Excuse unintended typos.

> On Feb 1, 2015, at 08:08, Mudit Verma  wrote:
> 
> Hi All, 
> 
> CEPHFS - Given a file name, how can one determine the exact location and the 
> name of the objects on OSDs. 
> 
> So far I could understand that the objects data is stored in .../current dir 
> in OSDs, but what naming convention do they use? 
> 
> Many thanks in advance 
> 
> Thanks
> Mudit
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about primary OSD of a pool

2015-02-01 Thread Jean-Charles Lopez
Hi

You can verify the exact mapping using the following command: ceph osd map 
{poolname} {objectname}

Check page http://docs.ceph.com/docs/master/man/8/ceph for the ceph command.
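
For your example that would be:

ceph osd map testpool john
ceph osd map testpool paul

Each line of output ends with the PG the object maps to and its up/acting OSD
sets; the first OSD in the acting set is the primary, and it will usually
differ between the two objects.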

Cheers

JC

While moving. Excuse unintended typos.

> On Feb 1, 2015, at 08:04, Loic Dachary  wrote:
> 
> 
> 
>> On 01/02/2015 14:47, Dennis Chen wrote:
>> Hello,
>> 
>> If I write 2 different objects, eg, "john" and "paul" respectively to
>> a same pool like "testpool" in the cluster, is the primary OSD
>> calculated by CRUSH for the 2 objects the same?
> 
> Hi,
> 
> CRUSH is likely to place john on an OSD and paul on another OSD.
> 
> Cheers
> 
> -- 
> Loïc Dachary, Artisan Logiciel Libre
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Having an issue with: 7 pgs stuck inactive; 7 pgs stuck unclean; 71 requests are blocked > 32

2015-01-23 Thread Jean-Charles Lopez
Hi Glen

Run a ceph pg {id} query on one of your stuck PGs to find out what the PG
is waiting for before it can complete peering.
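
For example, with one of the PGs from your health detail output:

ceph pg 3.38b query

and look at the "recovery_state" section, in particular entries such as
"down_osds_we_would_probe" or "peering_blocked_by", to see which OSD the PG is
waiting on.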

Rgds
JC


On Friday, January 23, 2015, Glen Aidukas 
wrote:

>  Hello fellow ceph users,
>
>
>
> I ran into a major issue were two KVM hosts will not start due to issues
> with my Ceph cluster.
>
>
>
> Here are some details:
>
>
>
> Running ceph version 0.87.  There are 10 hosts with 6 drives each for 60
> OSDs.
>
>
>
> # ceph -s
>
> cluster 1431e336-faa2-4b13-b50d-c1d375b4e64b
>
>  health HEALTH_WARN 7 pgs incomplete; 7 pgs stuck inactive; 7 pgs
> stuck unclean; 71 requests are blocked > 32 sec; pool rbd-b has too few pgs
>
>  monmap e1: 3 mons at {xx},
> election epoch 92, quorum 0,1,2 ceph-b01,ceph-b02,ceph-b03
>
>  mdsmap e49: 1/1/1 up {0=pmceph-b06=up:active}, 1 up:standby
>
>  osdmap e10023: 60 osds: 60 up, 60 in
>
>   pgmap v19851672: 45056 pgs, 22 pools, 13318 GB data, 3922 kobjects
>
> 39863 GB used, 178 TB / 217 TB avail
>
>45049 active+clean
>
>7 incomplete
>
>   client io 954 kB/s rd, 386 kB/s wr, 78 op/s
>
>
>
> # ceph health detail
>
> HEALTH_WARN 7 pgs incomplete; 7 pgs stuck inactive; 7 pgs stuck unclean;
> 69 requests are blocked > 32 sec; 5 osds have slow requests; pool rbd-b has
> too few pgs
>
> pg 3.38b is stuck inactive since forever, current state incomplete, last
> acting [48,35,2]
>
> pg 1.541 is stuck inactive since forever, current state incomplete, last
> acting [48,20,2]
>
> pg 3.57d is stuck inactive for 15676.967208, current state incomplete,
> last acting [55,48,2]
>
> pg 3.5c9 is stuck inactive since forever, current state incomplete, last
> acting [48,2,15]
>
> pg 3.540 is stuck inactive for 15676.959093, current state incomplete,
> last acting [57,48,2]
>
> pg 3.5a5 is stuck inactive since forever, current state incomplete, last
> acting [2,48,57]
>
> pg 3.305 is stuck inactive for 15676.855987, current state incomplete,
> last acting [39,2,48]
>
> pg 3.38b is stuck unclean since forever, current state incomplete, last
> acting [48,35,2]
>
> pg 1.541 is stuck unclean since forever, current state incomplete, last
> acting [48,20,2]
>
> pg 3.57d is stuck unclean for 15676.971318, current state incomplete, last
> acting [55,48,2]
>
> pg 3.5c9 is stuck unclean since forever, current state incomplete, last
> acting [48,2,15]
>
> pg 3.540 is stuck unclean for 15676.963204, current state incomplete, last
> acting [57,48,2]
>
> pg 3.5a5 is stuck unclean since forever, current state incomplete, last
> acting [2,48,57]
>
> pg 3.305 is stuck unclean for 15676.860098, current state incomplete, last
> acting [39,2,48]
>
> pg 3.5c9 is incomplete, acting [48,2,15] (reducing pool rbd-b min_size
> from 2 may help; search ceph.com/docs for 'incomplete')
>
> pg 3.5a5 is incomplete, acting [2,48,57] (reducing pool rbd-b min_size
> from 2 may help; search ceph.com/docs for 'incomplete')
>
> pg 3.57d is incomplete, acting [55,48,2] (reducing pool rbd-b min_size
> from 2 may help; search ceph.com/docs for 'incomplete')
>
> pg 3.540 is incomplete, acting [57,48,2] (reducing pool rbd-b min_size
> from 2 may help; search ceph.com/docs for 'incomplete')
>
> pg 1.541 is incomplete, acting [48,20,2] (reducing pool metadata min_size
> from 2 may help; search ceph.com/docs for 'incomplete')
>
> pg 3.38b is incomplete, acting [48,35,2] (reducing pool rbd-b min_size
> from 2 may help; search ceph.com/docs for 'incomplete')
>
> pg 3.305 is incomplete, acting [39,2,48] (reducing pool rbd-b min_size
> from 2 may help; search ceph.com/docs for 'incomplete')
>
> 20 ops are blocked > 2097.15 sec
>
> 49 ops are blocked > 1048.58 sec
>
> 13 ops are blocked > 2097.15 sec on osd.2
>
> 7 ops are blocked > 2097.15 sec on osd.39
>
> 3 ops are blocked > 1048.58 sec on osd.39
>
> 41 ops are blocked > 1048.58 sec on osd.48
>
> 4 ops are blocked > 1048.58 sec on osd.55
>
> 1 ops are blocked > 1048.58 sec on osd.57
>
> 5 osds have slow requests
>
> pool rbd-b objects per pg (1084) is more than 12.1798 times cluster
> average (89)
>
>
>
> I ran the following but did not help:
>
>
>
> # ceph health detail | grep ^pg | cut -c4-9 | while read i; do ceph pg
> repair ${i} ; done
>
> instructing pg 3.38b on osd.48 to repair
>
> instructing pg 1.541 on osd.48 to repair
>
> instructing pg 3.57d on osd.55 to repair
>
> instructing pg 3.5c9 on osd.48 to repair
>
> instructing pg 3.540 on osd.57 to repair
>
> instructing pg 3.5a5 on osd.2 to repair
>
> instructing pg 3.305 on osd.39 to repair
>
> instructing pg 3.38b on osd.48 to repair
>
> instructing pg 1.541 on osd.48 to repair
>
> instructing pg 3.57d on osd.55 to repair
>
> instructing pg 3.5c9 on osd.48 to repair
>
> instructing pg 3.540 on osd.57 to repair
>
> instructing pg 3.5a5 on osd.2 to repair
>
> instructing pg 3.305 on osd.39 to repair
>
> instructing pg 3.5c9 on osd.48 to repair
>
> instructing pg 3.5a5 on osd.2 to repair
>
> instructing pg 3.5

Re: [ceph-users] export from Amazon S3 -> Ceph

2014-11-27 Thread Jean-Charles LOPEZ
Hi Geoff,

The obsync tool, which ships with Ceph (or ceph-extras), does a per-bucket copy 
of the data from S3 to the RGW using the S3 API.

Have a look at http://ceph.com/docs/argonaut/man/1/obsync/ 
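
From memory, a per-bucket copy looks something like the line below; please 
double check the exact option and environment variable names against the man 
page above, and the bucket/host names are of course placeholders:

SRC_AKEY={aws-access-key} SRC_SKEY={aws-secret-key} \
DST_AKEY={rgw-access-key} DST_SKEY={rgw-secret-key} \
obsync s3://s3.amazonaws.com/mybucket s3://my-rgw-host/mybucket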


Cheers
JC



> On Nov 27, 2014, at 01:40, Geoff Galitz  wrote:
> 
> 
> Any advice on the best way to export from Amazon S3 to a local ceph storage 
> system?The data in question we'd still access via the S3 interface access 
> to ceph.
> 
> -G
> 
> 
> 
> 
> -- 
> Geoff Galitz, ggal...@shutterstock.com 
> Shutterstock GmbH
> Infastructure Engineering
> Schönhauser Allee 36, 10435 Berlin, Germany
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about ceph-deploy

2014-11-27 Thread Jean-Charles LOPEZ
Hi Louis,

the page you mentioned originally is intended as a quick starter guide for 
deploying the latest Ceph LTS release and that’s its sole purpose. For specific 
and advanced ceph-deploy features and usage, there is a dedicated ceph-deploy 
site right here: http://ceph.com/ceph-deploy/docs 

So I guess no need for a documentation update ;-)

Cheers
JC



> On Nov 26, 2014, at 23:26, mail list  wrote:
> 
> Thanks JC, It works, and i think ceph should modify the manual.
> 
> On Nov 27, 2014, at 13:59, Jean-Charles LOPEZ  <mailto:jc.lo...@inktank.com>> wrote:
> 
>> Hi Louis,
>> 
>> ceph-deploy install --release=giant admin-node
>> 
>> Cheers
>> JC
>> 
>> 
>> 
>>> On Nov 26, 2014, at 20:38, mail list >> <mailto:louis.hust...@gmail.com>> wrote:
>>> 
>>> ceph-deploy install admin-node
>> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about ceph-deploy

2014-11-26 Thread Jean-Charles LOPEZ
Hi Louis,

ceph-deploy install --release=giant admin-node

Cheers
JC



> On Nov 26, 2014, at 20:38, mail list  wrote:
> 
> ceph-deploy install admin-node

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] evaluating Ceph

2014-11-25 Thread Jean-Charles LOPEZ
Hi,

Use ceph mds newfs {metaid} {dataid} instead
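
On a 0.80.x cluster the whole sequence would be something like this (pool 
names and PG counts are only examples, and depending on the exact release the 
newfs command may ask for --yes-i-really-mean-it):

ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128
ceph osd lspools                      # note the ids of the two new pools
ceph mds newfs {metadata-pool-id} {data-pool-id} --yes-i-really-mean-it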

JC



> On Nov 25, 2014, at 12:27, Jeripotula, Shashiraj 
>  wrote:
> 
> Hi All,
>  
> I am evaluating Ceph for one of our product requirements.
>  
> I have gone through the website, http://ceph.com/docs/master/start/ 
> 
>  
> I am using Ubuntu 14.04 LTS and am done with most of the steps. 
>  
> Finally, I am stuck on Creating File System. From the website, 
> The ceph fs new command was introduced in Ceph 0.84. Prior to this release, 
> no manual steps are required to create a filesystem, and pools named data and 
> metadata exist by default.
> 
> Even though, I did update from ceph-deploy and directly from ceph, 
> 
> ceph -v
> 
> ceph version 0.80.7(6c0127fcb58008793d3c8b62d925bc91963672a3)
> 
> I am not able to upgrade/update to .84. Everytime, it says, you are already 
> at the latest version, so I am not able to deploy File System. 
> 
> Any help, is highly appreciated.
> 
> Thanks
> 
> Raj
> 
> ---
> 
> Shashiraj Jeripotula(Raj)
> DMTS
> Systems Engineering, Internet Software and Technology Group | Verizon 
> Corporate Technology
>  
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph fs has error: no valid command found; 10 closest matches: fsid

2014-11-24 Thread Jean-Charles LOPEZ
Hi,

Use ceph mds newfs {metaid} {dataid} instead

JC



> On Nov 20, 2014, at 20:43, Huynh Dac Nguyen  wrote:
> 
> Hi All,
> 
> When i create a new cephfs, this error shows
> 
> Is it a bug?
> 
> [root@ho-srv-ceph-02 ceph]# ceph fs new cephfs cephfs_metadata cephfs_data
> no valid command found; 10 closest matches:
> fsid
> Error EINVAL: invalid command
> 
> 
> Regards,
> Ndhuynh
> 
> This e-mail message including any attachments is for the sole use of the 
> intended(s) and may contain privileged or confidential information. Any 
> unauthorized review, use, disclosure or distribution is prohibited. If you 
> are not intended recipient, please immediately contact the sender by reply 
> e-mail and delete the original message and destroy all copies thereof.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests/blocked

2014-11-20 Thread Jean-Charles LOPEZ
Hi Jeff,

it would probably wise to first check what these slow requests are:
1) ceph health detail -> This will tell you which OSDs are experiencing the 
slow requests
2) ceph daemon osd.{id} dump_ops_in_flight -> To be issued on one of the above 
OSDs; it will tell you what these ops are waiting for.

My fair guess is that either you have a network problem or some other drives in 
your cluster are about to die or are experiencing write errors causing retries 
and slowing the request processing.

Just to be sure, if your drives are SMART capable, use smartctl to look at the 
stats for the drives you will have identified in the steps above.
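
Putting it together, something along these lines (the osd id and the device 
name are placeholders):

ceph health detail | grep 'slow request'
ceph daemon osd.3 dump_ops_in_flight          # run on the node hosting osd.3
smartctl -a /dev/sdc | egrep -i 'reallocated|pending|uncorrectable'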

Regards
JC



> On Nov 20, 2014, at 06:00, Jeff  wrote:
> 
> Hi,
> 
>   We have a five node cluster that has been running for a long
> time (over a year).  A few weeks ago we upgraded to 0.87 (giant) and 
> things continued to work well.  
> 
>   Last week a drive failed on one of the nodes.  We replaced the
> drive and things were working well again.
> 
>   After about six days we started getting lots of "slow
> requests...blocked for..." messages (100's/hour) and performance has been
> terrible.  Since then we've made sure to have all of the latest OS patches
> and rebooted all five nodes.  We are still seeing a lot of slow
> requests/blocked messages.  Any idea(s) on what's wrong/where to look?
> 
> Thanks!
>   Jeff
> -- 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bucket cleanup speed

2014-11-15 Thread Jean-Charles LOPEZ
Hi,

this is an old thing I remember and it may not be exactly related, but just 
in case, let’s try it out.

Just verify the value of the rgw_gc_max_objs parameter in the RGW 
configuration via the admin socket (ceph daemon {rgw.id} config get 
rgw_gc_max_objs).
If the value is 32, update the ceph.conf file for your RGW, set it to 37 or 
any higher prime number such as 997, and restart your RGW. Make sure you do the 
restart before your next big batch of object removals. It will take some time to 
clear the overloaded gc buckets, maybe, but it should at least distribute the new 
ones evenly onto all gc buckets after this and make the removal of the newly 
deleted objects way quicker.
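
For reference, the ceph.conf change would look something like this (the client 
section name below is an assumption, use whatever your RGW instance is called, 
and the admin socket path may differ on your system):

[client.radosgw.gateway]
    rgw gc max objs = 997

ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config get rgw_gc_max_objs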

Keep us posted to tell us if it has improved anything.

JC




> On Nov 14, 2014, at 01:20, Daniel Hoffman  wrote:
> 
> Hi All.
> 
> Running a Ceph Cluster (firefly) ceph version 0.80.5
> 
> We use ceph mainly for backups via the radosGW at the moment.
> 
> There had to be an account deleted/bucket removed which had a very large 
> number of objects and was about 60TB in space.
> 
> We have been monitoring it for days now, and the data is purging but very 
> very slowly. We are actually putting new backups in much faster than the old 
> data is being removed.
> 
> As we are doing a lot of work with backups and aging out data we need to find 
> a way to improve the cleanup process.
> 
> Is there a way to improve the purge/clean performance?
> Is the clean/purge performance impacted by disk thread ioprio class setting?
> 
> Any advice or tunables to improve the removal of data is appreciated.
> 
> Thanks
> 
> Daniel
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache Tier Statistics

2014-11-08 Thread Jean-Charles Lopez
Hi Nick

If my brain doesn't fail me you can try
ceph daemon osd.{id} perf dump
ceph report (not 100% sure the cache stats are in there, though)
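
If memory serves, the cache related counters in the perf dump are the ones 
whose names start with tier_ (promotions, flushes, evictions and so on), so 
something like this gives a quick view (the osd id is a placeholder):

ceph daemon osd.3 perf dump | python -m json.tool | grep tier_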

Rgds
JC


On Saturday, November 8, 2014, Nick Fisk  wrote:

> Hi,
>
>
>
> Does anyone know if there any statistics available specific to the cache
> tier functionality, I’m thinking along the lines of cache hit ratios? Or
> should I be pulling out the Read statistics for backing+cache pools and
> assuming that if a read happens from the backing pool it was a miss and
> then calculating it from that?
>
>
>
> Thanks,
>
> Nick
>
>

-- 
Sent while moving
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Installing CephFs via puppet

2014-11-07 Thread Jean-Charles LOPEZ
Hi,

with ceph-deploy do the following
1) Install ceph-deploy
2) mkdir ~/ceph-deploy
3) cd ~/ceph-deploy
4) ceph-deploy --overwrite-conf config pull {monitorhostname}
5) If version is Giant
a) ceph osd pool create cephfsdata xxx
b) ceph osd pool create cephfsmeta xxx
c) ceph mds newfs {cephfsmeta_poolid} {cephfsdata_poolid}
6) ceph-deploy mds create {mdshostname}

Make sure you have password-less ssh access into the latter host.

I think this should do the trick

JC



> On Nov 6, 2014, at 20:07, JIten Shah  wrote:
> 
> Thanks Loic. 
> 
> What is the recommended puppet module for installing cephFS ?
> 
> I can send more details about puppet-ceph but basically I haven't changed 
> anything in there except for assigning values to the required params in the 
> yaml file. 
> 
> --Jiten 
> 
> 
> 
>> On Nov 6, 2014, at 7:24 PM, Loic Dachary  wrote:
>> 
>> Hi,
>> 
>> At the moment puppet-ceph does not support CephFS. The error you're seeing 
>> does not ring a bell, would you have more context to help diagnose it ?
>> 
>> Cheers
>> 
>>> On 06/11/2014 23:44, JIten Shah wrote:
>>> Hi Guys,
>>> 
>>> I am sure many of you guys have installed cephfs using puppet. I am trying 
>>> to install “firefly” using the puppet module from  
>>> https://github.com/ceph/puppet-ceph.git  
>>> 
>>> and running into the “ceph_config” file issue where it’s unable to find the 
>>> config file and I am not sure why.
>>> 
>>> Here’s the error I get while running puppet on one of the mon nodes:
>>> 
>>> Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pgp_num]: 
>>> Could not evaluate: No ability to determine if ceph_config exists
>>> Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_min_size]: 
>>> Could not evaluate: No ability to determine if ceph_config exists
>>> Error: /Stage[main]/Ceph/Ceph_config[global/auth_service_required]: Could 
>>> not evaluate: No ability to determine if ceph_config exists
>>> Error: /Stage[main]/Ceph/Ceph_config[global/mon_initial_members]: Could not 
>>> evaluate: No ability to determine if ceph_config exists
>>> Error: /Stage[main]/Ceph/Ceph_config[global/fsid]: Could not evaluate: No 
>>> ability to determine if ceph_config exists
>>> Error: /Stage[main]/Ceph/Ceph_config[global/auth_supported]: Could not 
>>> evaluate: No ability to determine if ceph_config exists
>>> Error: /Stage[main]/Ceph/Ceph_config[global/auth_cluster_required]: Could 
>>> not evaluate: No ability to determine if ceph_config exists
>>> 
>>> —Jiten
>>> 
>>> 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> -- 
>> Loïc Dachary, Artisan Logiciel Libre
>> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best practice about using multiple disks on one single OSD

2014-09-25 Thread Jean-Charles LOPEZ
Hi James,

the best practice is to set up 1 OSD daemon per physical disk drive.

In your case, each OSD node would hence be 4 OSD daemons using one physical 
drive per daemon, and deploying a minimum of 3 servers so each object copy 
resides on a separate physical server.
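
With ceph-deploy that simply means one osd create per drive, for example 
(hostnames and device names below are placeholders):

ceph-deploy osd create node1:sdb node1:sdc node1:sdd node1:sde
ceph-deploy osd create node2:sdb node2:sdc node2:sdd node2:sde
ceph-deploy osd create node3:sdb node3:sdc node3:sdd node3:sde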

JC



On Sep 25, 2014, at 20:42, James Pan  wrote:

> Hi,
> 
> I have several servers and each server has 4 disks.
> Now I am going to setup Ceph on these servers and use all the 4 disks but it 
> seems one OSD instance can be configured with one backend storage. 
> 
> So there seems two options to me:
> 
> 1. Make the 4 disks into a raid0 then setup OSD to use this raid0 but 
> obviously this is not good because one disk failure will ruin the entire 
> storage.
> 2. Build FS on each disk and start 4 OSD instances on the server.
> 
> Both options are not good. So I am wondering what's the best practice of 
> setting up multiple didks on one OSD for Ceph.
> 
> 
> Thanks!
> Best Regards,
> 
> 
> 
> James Jiaming Pan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache pool stats

2014-09-15 Thread Jean-Charles Lopez
Hi

ceph daemon osd.x perf dump will show you the stats Andrei

JC

On Monday, September 15, 2014, Andrei Mikhailovsky 
wrote:

> Hi
>
> Does anyone know how to check the basic cache pool stats for the
> information like how well the cache layer is working for a recent or
> historic time frame? Things like cache hit ratio would be very helpful as
> well as.
>
> Thanks
>
> Andrei
>


-- 
Sent while moving
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache tier unable to auto flush data to storage tier

2014-09-14 Thread Jean-Charles LOPEZ
Hi Karan,

maybe the statistical information about the temperature of the objects in the 
pool that has to be maintained by the flushing/evicting agent?

Otherwise, hard to say from my seat.

JC



On Sep 14, 2014, at 08:05, Karan Singh  wrote:

> Thanks JC  , it worked , now cache tiering agent is migrating data between 
> tiers.
> 
> 
> But Now , i am seeing a new ISSUE :  Cache-pool has got some EXTRA objects , 
> that is not visible with # rados -p cache-pool ls but under #ceph df i can 
> see the count of those objects.
> 
> [root@ceph-node1 ~]# ceph df | egrep -i "objects|pool"
> POOLS:
> NAME   ID USED  %USED OBJECTS
> EC-pool15 1000M 1.21  2
> cache-pool 16 252   0 3
> [root@ceph-node1 ~]#
> [root@ceph-node1 ~]# rados -p cache-pool ls
> [root@ceph-node1 ~]# rados -p cache-pool  cache-flush-evict-all
> [root@ceph-node1 ~]# rados -p cache-pool ls
> [root@ceph-node1 ~]# ceph df | egrep -i "objects|pool"
> POOLS:
> NAME   ID USED  %USED OBJECTS
> EC-pool15 1000M 1.21  2
> cache-pool 16 252   0 3
> [root@ceph-node1 ~]#
> 
> 
> # Also when i create ONE object manually , #ceph df says that 2 objects has 
> been added. From where this extra object coming
> 
> [root@ceph-node1 ~]# ceph df | egrep -i "objects|pool"
> POOLS:
> NAME   ID USED  %USED OBJECTS
> EC-pool15 1000M 1.21  2
> cache-pool 16 252   0 3
> [root@ceph-node1 ~]#
> [root@ceph-node1 ~]#
> [root@ceph-node1 ~]# rados -p cache-pool put test /etc/hosts( I have 
> added one object in this step )
> [root@ceph-node1 ~]# rados -p cache-pool ls   
> ( when i list i can see only 1 object that i have recently created)
> test
> [root@ceph-node1 ~]# ceph df | egrep -i "objects|pool"
> POOLS:
> NAME   ID USED  %USED OBJECTS
> EC-pool15 1000M 1.21  2
> cache-pool 16 651   0 5   
> (Why it is showing 5 objects  , while earlier its 
> showing 3 Objects , why it has increased by 2  on adding only 1 object )
> [root@ceph-node1 ~]#
> 
> 
> - Karan -
> 
> On 14 Sep 2014, at 03:42, Jean-Charles LOPEZ  wrote:
> 
>> Hi Karan,
>> 
>> May be setting the dirty byte ratio (flush) and the full ratio (eviction). 
>> Just try to see if it makes any difference
>> - cache_target_dirty_ratio .1
>> - cache_target_full_ratio .2
>> 
>> Tune the percentage as desired relatively to target_max_bytes and 
>> target_max_objects. The first threshold reached will trigger flush or 
>> eviction (num objects or num bytes)
>> 
>> JC
>> 
>> 
>> 
>> On Sep 13, 2014, at 15:23, Karan Singh  wrote:
>> 
>>> Hello Cephers
>>> 
>>> I have created a Cache pool and looks like cache tiering agent is not able 
>>> to flush/evict data as per defined policy. However when i manually evict / 
>>> flush data , it migrates data from cache-tier to storage-tier
>>> 
>>> Kindly advice if there is something wrong with policy or anything else i am 
>>> missing.
>>> 
>>> Ceph Version: 0.80.5
>>> OS : Cent OS 6.4
>>> 
>>> Cache pool created using the following commands :
>>> 
>>> ceph osd tier add data cache-pool 
>>> ceph osd tier cache-mode cache-pool writeback
>>> ceph osd tier set-overlay data cache-pool
>>> ceph osd pool set cache-pool hit_set_type bloom
>>> ceph osd pool set cache-pool hit_set_count 1
>>> ceph osd pool set cache-pool hit_set_period 300
>>> ceph osd pool set cache-pool target_max_bytes 1
>>> ceph osd pool set cache-pool target_max_objects 100
>>> ceph osd pool set cache-pool cache_min_flush_age 60
>>> ceph osd pool set cache-pool cache_min_evict_age 60
>>> 
>>> 
>>> [root@ceph-node1 ~]# date
>>> Sun Sep 14 00:49:59 EEST 2014
>>> [root@ceph-node1 ~]# rados -p data  put file1 /etc/hosts
>>> [root@ceph-node1 ~]# rados -p data ls
>>> [root@ceph-node1 ~]# rados -p cache-pool ls
>>> file1
>>> [root@ceph-node1 ~]#
>>> 
>>> 
>>> [root@ceph-node1 ~]# date
>>> Sun Sep 14 00:59:33 EEST 2014
>>> [root@ceph-node1 ~]# rados -p data ls
>>> [root@ceph-node1 ~]# 
>>> [root@ceph-node1 ~

Re: [ceph-users] Cache tier unable to auto flush data to storage tier

2014-09-13 Thread Jean-Charles LOPEZ
Hi Karan,

Maybe set the dirty byte ratio (flush) and the full ratio (eviction). Just 
try them and see if it makes any difference:
- cache_target_dirty_ratio .1
- cache_target_full_ratio .2

Tune the percentages as desired relative to target_max_bytes and 
target_max_objects. The first threshold reached will trigger flush or eviction 
(num objects or num bytes)
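
For the pool in your example that would be:

ceph osd pool set cache-pool cache_target_dirty_ratio 0.1
ceph osd pool set cache-pool cache_target_full_ratio 0.2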

JC



On Sep 13, 2014, at 15:23, Karan Singh  wrote:

> Hello Cephers
> 
> I have created a Cache pool and looks like cache tiering agent is not able to 
> flush/evict data as per defined policy. However when i manually evict / flush 
> data , it migrates data from cache-tier to storage-tier
> 
> Kindly advice if there is something wrong with policy or anything else i am 
> missing.
> 
> Ceph Version: 0.80.5
> OS : Cent OS 6.4
> 
> Cache pool created using the following commands :
> 
> ceph osd tier add data cache-pool 
> ceph osd tier cache-mode cache-pool writeback
> ceph osd tier set-overlay data cache-pool
> ceph osd pool set cache-pool hit_set_type bloom
> ceph osd pool set cache-pool hit_set_count 1
> ceph osd pool set cache-pool hit_set_period 300
> ceph osd pool set cache-pool target_max_bytes 1
> ceph osd pool set cache-pool target_max_objects 100
> ceph osd pool set cache-pool cache_min_flush_age 60
> ceph osd pool set cache-pool cache_min_evict_age 60
> 
> 
> [root@ceph-node1 ~]# date
> Sun Sep 14 00:49:59 EEST 2014
> [root@ceph-node1 ~]# rados -p data  put file1 /etc/hosts
> [root@ceph-node1 ~]# rados -p data ls
> [root@ceph-node1 ~]# rados -p cache-pool ls
> file1
> [root@ceph-node1 ~]#
> 
> 
> [root@ceph-node1 ~]# date
> Sun Sep 14 00:59:33 EEST 2014
> [root@ceph-node1 ~]# rados -p data ls
> [root@ceph-node1 ~]# 
> [root@ceph-node1 ~]# rados -p cache-pool ls
> file1
> [root@ceph-node1 ~]#
> 
> 
> [root@ceph-node1 ~]# date
> Sun Sep 14 01:08:02 EEST 2014
> [root@ceph-node1 ~]# rados -p data ls
> [root@ceph-node1 ~]# rados -p cache-pool ls
> file1
> [root@ceph-node1 ~]#
> 
> 
> 
> [root@ceph-node1 ~]# rados -p cache-pool  cache-flush-evict-all
> file1
> [root@ceph-node1 ~]#
> [root@ceph-node1 ~]# rados -p data ls
> file1
> [root@ceph-node1 ~]# rados -p cache-pool ls
> [root@ceph-node1 ~]#
> 
> 
> Regards
> Karan Singh
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS mounting error

2014-09-12 Thread Jean-Charles LOPEZ
Hi Erick,

the address to use in the mount syntax is the address of your MON node, not 
that of the MDS node.

Or maybe you have deployed both a MON and an MDS on ceph01?
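
In other words, assuming your monitor runs on a host called mon01 and listens 
on the default port, the mount would look like:

mount -t ceph mon01:6789:/ /mnt -o name=admin,secretfile=/root/ceph/admin.key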


JC



On Sep 12, 2014, at 18:41, Erick Ocrospoma  wrote:

> 
> 
> On 12 September 2014 20:32, JIten Shah  wrote:
> What does your mount command look like ?
> 
> 
> mount -t ceph ceph01:/mnt /mnt -o name=admin,secretfile=/root/ceph/admin.key
> 
> Where ceph01 is my mds server.
> 
>  
> Sent from my iPhone 5S
> 
> 
> 
> On Sep 12, 2014, at 4:56 PM, Erick Ocrospoma  wrote:
> 
>> Hi,
>> 
>> I'm n00b in the ceph world, so here I go. I was following this tutorials 
>> [1][2] (in case you need to know if I missed something), while trying to 
>> mount a block from an isolated machine using cehpfs I got this error 
>> (actually following what's there in [2]).
>> 
>> mount error 5 = Input/output error
>> 
>> I've searched on the internet but with no success. No idea what's going on, 
>> and the logs don't seem to have any clue about this error. My setup consists of 1 
>> mds server and 3 OSD servers, and I perform all tests with the root user. I've 
>> seen other tutorials (and this one as well) using a specific user; I don't know 
>> if that could have impacted the whole setup.
>> 
>> 
>> [1] http://www.server-world.info/en/note?os=Ubuntu_14.04&p=ceph
>> [2] http://www.server-world.info/en/note?os=Ubuntu_14.04&p=ceph&f=2
>> 
>> 
>> -- 
>> 
>> 
>> 
>> ~ Happy install !
>> 
>> 
>> 
>> 
>> 
>> Erick.
>> 
>> ---
>> 
>> IRC :   zerick
>> About :  http://about.me/zerick
>> Linux User ID :  549567
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> 
> 
> 
> ~ Happy install !
> 
> 
> 
> 
> 
> Erick.
> 
> ---
> 
> IRC :   zerick
> About :  http://about.me/zerick
> Linux User ID :  549567
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw ERROR: can't get key: ret=-2

2014-06-27 Thread Jean-Charles LOPEZ
Hi Benjamin,

code extract

The sync_all_users() call that is erroring is the periodic sync of user stats:

/* 
* thread, full sync all users stats periodically 
* 
* only sync non idle users or ones that never got synced before, this is needed 
so that 
* users that didn't have quota turned on before (or existed before the user 
objclass 
* tracked stats) need to get their backend stats up to date. 
*/

Nothing to really worry about if it is a brand new install, as there is nothing 
to synchronize in terms of stats.

JC



On Jun 27, 2014, at 12:35, Benjamin Lynch  wrote:

> Hello Ceph users,
> 
> Has anyone seen a radosgw error like this:
> 
> 2014-06-27 14:02:39.254210 7f06b11587c0  0 ceph version 0.80.1
> (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process radosgw, pid 15471
> 2014-06-27 14:02:39.341198 7f06955ea700  0 ERROR: can't get key: ret=-2
> 2014-06-27 14:02:39.341212 7f06955ea700  0 ERROR: sync_all_users()
> returned ret=-2
> 
> This is a new install of radosgw.  It created the default pools after
> I started it up, so I would assume the client keyring it set
> correctly. However, it's having trouble getting a key of sort (from
> the error message).  Any idea which key it's looking for?
> 
> Thanks.
> 
> - Ben
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_WARN pool has too few pgs

2014-06-11 Thread Jean-Charles LOPEZ
Hi Eric,

increase the number of PGs in your pool with:
Step 1: ceph osd pool set {pool-name} pg_num {value}
Step 2: ceph osd pool set {pool-name} pgp_num {value}

You can check the number of PGs in your pool with ceph osd dump | grep ^pool

See documentation: http://ceph.com/docs/master/rados/operations/pools/
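
For the pool from your output that would be something like the following (256 
is only an example value, size it for your cluster):

ceph osd dump | grep ^pool          # check the current pg_num for pool Ray
ceph osd pool set Ray pg_num 256
ceph osd pool set Ray pgp_num 256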

JC



On Jun 11, 2014, at 12:59, Eric Eastman  wrote:

> Hi,
> 
> I am seeing the following warning on one of my test clusters:
> 
> # ceph health detail
> HEALTH_WARN pool Ray has too few pgs
> pool Ray objects per pg (24) is more than 12 times cluster average (2)
> 
> This is a reported issue and is set to "Won't Fix" at:
> http://tracker.ceph.com/issues/8103
> 
> My test cluster has a mix of test data, and the pool showing the warning is 
> used for RBD Images.
> 
> 
> # ceph df detail
> GLOBAL:
>   SIZE  AVAIL RAW USED %RAW USED OBJECTS
>   1009G 513G  496G 49.14 33396
> POOLS:
>NAME   ID CATEGORY USED   %USED OBJECTS
>  DIRTY READ   WRITE
>data   0  -0  0 0  
> 0 0  0
>metadata   1  -0  0 0  
> 0 0  0
>rbd2  -0  0 0  
> 0 0  0
>iscsi  3  -847M   0.08  241
> 211   11839k 10655k
>cinder 4  -305M   0.03  53 
> 2 51579  31584
>glance 5  -65653M 6.35  8222   
>  7 512k   10405
>.users.swift   7  -0  0 0  
> 0 0  4
>.rgw.root  8  -1045   0 4  
> 4 23 5
>.rgw.control   9  -0  0 8  
> 8 0  0
>.rgw   10 -2520 2  
> 2 3  11
>.rgw.gc11 -0  0 32 
> 324958   3328
>.users.uid 12 -5750 3  
> 3 70 23
>.users 13 -9  0 1  
> 1 0  9
>.users.email   14 -0  0 0  
> 0 0  0
>.rgw.buckets   15 -0  0 0  
> 0 0  0
>.rgw.buckets.index 16 -0  0 1  
> 1 1  1
>Ray17 -99290M 9.61  24829  
>  24829 0  0
> 
> 
> It would be nice if we could turn off this message.
> 
> Eric
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is it still unsafe to map a RBD device on an OSD server?

2014-06-10 Thread Jean-Charles LOPEZ
Hi Sébastien,

still the case. Depending on what you do, the OSD process can hang 
and will eventually commit suicide.

Regards
JC

On Jun 10, 2014, at 09:46, Sebastien Han  wrote:

> Hi all,
> 
> A couple of years ago, I heard that it wasn’t safe to map a krbd block on an 
> OSD host.
> It was more or less like mounting a NFS mount on the NFS server, we can 
> potentially end up with some deadlocks.
> 
> At least, I tried again recently and didn’t encounter any problem.
> 
> What do you think?
> 
> Cheers.
>  
> Sébastien Han 
> Cloud Engineer 
> 
> "Always give 100%. Unless you're giving blood."
> 
> Phone: +33 (0)1 49 70 99 72 
> Mail: sebastien@enovance.com 
> Address : 11 bis, rue Roquépine - 75008 Paris
> Web : www.enovance.com - Twitter : @enovance 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mounting CephFS RO?

2014-05-29 Thread Jean-Charles LOPEZ
Hi Shawn,

just use the standard read-only mount option (-o ro).
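
For example (the monitor address and the secret file path are placeholders):

# on the writer
mount -t ceph mon01:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# on the read-only consumer
mount -t ceph mon01:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,ro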

Cheers
JC



On May 29, 2014, at 20:10, Shawn Edwards  wrote:

> Is it possible to have a CephFS mounted as RW on one machine and RO on 
> another machine?  We have a use case where we would have one box writing 
> files to CephFS and at least one other which would need the information in 
> CephFS.  It seems silly to us to use NFS or something like that to get the 
> files out of CephFS via the first box, as that will add unnecessary network 
> load.
> 
> Yes, I know I can try this, but I wanted to know if anyone else has tried it 
> or if it is known to not work because of some architectural issue.
> 
> Thanks!
> 
> -- 
>  Shawn Edwards
>  Beware programmers with screwdrivers.  They tend to spill them on their 
> keyboards.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS block size

2014-05-25 Thread Jean-Charles LOPEZ
Hi Sherry

go to this page and look at the description of show_layout and 
set_layout, which will do what you are looking for:
https://ceph.com/docs/master/man/8/cephfs/
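
As a quick illustration (the paths are placeholders, and please double check 
the exact flag names against the man page):

cephfs /mnt/cephfs/somefile show_layout
cephfs /mnt/cephfs/somedir set_layout -u 1048576 -c 1 -s 1048576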

Cheers
JC



On May 24, 2014, at 21:49, Sherry Shahbazi  wrote:

> Hi all, 
> 
> I appreciate if anyone could let me know:
> 
> 1) How to check CephFS block size (It seems to be 4 MB which is really high)?
> 2) How to change CephFS default block size? I had a look at the following 
> link but it seems that no option is provided for specifying block size!
> https://ceph.com/docs/master/man/8/mount.ceph/#options
> 
> Thanks, 
> Sherry
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Performance stats

2014-05-15 Thread Jean-Charles LOPEZ
Hi,

fio is probably the tool you are looking for: it supports both RBD images and 
other disk devices for testing, so you can bench your Ceph cluster as well 
as compare it with other devices.
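
A minimal fio job file against an RBD image could look like this (the pool and 
image names are placeholders and the image must already exist):

[rbd-test]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimg
rw=randwrite
bs=4k
iodepth=32
runtime=60
time_based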

Cheers
JC



On May 14, 2014, at 23:37, yalla.gnan.ku...@accenture.com wrote:

> Hi All,
>  
> Is there a way by which we can measure the performance of Ceph block devices 
> ? (Example :  I/O stats, data to identify bottlenecks etc).
> Also what are the available ways in which we can compare Ceph storage 
> performance with other storage solutions  ?
>  
>  
> Thanks
> Kumar
>  
> 
> 
> This message is for the designated recipient only and may contain privileged, 
> proprietary, or otherwise confidential information. If you have received it 
> in error, please notify the sender immediately and delete the original. Any 
> other use of the e-mail by you is prohibited. Where allowed by local law, 
> electronic communications with Accenture and its affiliates, including e-mail 
> and instant messaging (including content), may be scanned by our systems for 
> the purposes of information security and assessment of internal compliance 
> with Accenture policy. 
> __
> 
> www.accenture.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Default pool ruleset problem

2014-05-06 Thread Jean-Charles Lopez
Hi

Before removing the rules, modify all pools to use the same crush rule with
the following command.

ceph osd pool set pool-name crush_ruleset n

Pool-name is the name of the pool
N is the rule number you wish to keep

And this will make sure your cluster remains healthy.
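
Assuming you keep ruleset 0, that would be:

ceph osd pool set data crush_ruleset 0
ceph osd pool set metadata crush_ruleset 0
ceph osd pool set rbd crush_ruleset 0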

My 2 cts

JC
Sent from a mobile terminal

On Tuesday, May 6, 2014, Cao, Buddy  wrote:

>  Hi,
>
>
>
> After I setup ceph cluster thru mkcephfs, there are 3 default pools named
> metadata/data/rbd. And I notice that the 3 default pools assign to ruleset
> 0,1,2 respectively.  What if I refresh the crushmap with only 1 ruleset?
>  It means there are at least two default pools would have no corresponding
> rulesets. Will stuck unclean pgs or other odd status come?
>
>
>
>
>
> Wei Cao (Buddy)
>
>
>


-- 
Sent while moving
Pardon my French and any spelling &| grammar glitches
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph not replicating

2014-04-19 Thread Jean-Charles Lopez
Hi again

Looked at your ceph -s.

You have only 2 OSDs, one on each node. The default replica count is 2, the
default crush map says each replica on a different host, or maybe you set
it to 2 different OSDs. Anyway, when one of your OSDs goes down, Ceph can no
longer find another OSD to host the second replica it must create.

Looking at your crushmap we would know better.

Recommendation: to test efficiently with the most options available,
functionally speaking, deploying a cluster with 3 nodes and 3 OSDs each is my
best practice.

Or make 1 node with 3 OSDs, modifying your crushmap rulesets to choose
replicas at the OSD level ("chooseleaf ... type osd"), as sketched below.
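
In the decompiled crushmap the relevant line of the rule would then read
roughly like this (the rest of the rule is left out):

step chooseleaf firstn 0 type osd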

JC


On Saturday, April 19, 2014, Gonzalo Aguilar Delgado <
gagui...@aguilardelgado.com> wrote:

> Hi,
>
> I'm building a cluster where two nodes replicate objects inside. I found
> that shutting down just one of the nodes (the second one), makes everything
> "incomplete".
>
> I cannot find why, since crushmap looks good to me.
>
> after shutting down one node
>
> cluster 9028f4da-0d77-462b-be9b-dbdf7fa57771
>  health HEALTH_WARN 192 pgs incomplete; 96 pgs stuck inactive; 96 pgs
> stuck unclean; 1/2 in osds are down
>  monmap e9: 1 mons at {blue-compute=172.16.0.119:6789/0}, election
> epoch 1, quorum 0 blue-compute
>  osdmap e73: 2 osds: 1 up, 2 in
>   pgmap v172: 192 pgs, 3 pools, 275 bytes data, 1 objects
> 7552 kB used, 919 GB / 921 GB avail
>  192 incomplete
>
>
> Both nodes has WD Caviar Black 500MB disk with btrfs filesystem on it.
> Full disk used.
>
> I cannot understand why does not replicate to both nodes.
>
> Someone can help?
>
> Best regards,
>


-- 
Sent while moving
Pardon my French and any spelling &| grammar glitches
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph not replicating

2014-04-19 Thread Jean-Charles Lopez
Hi

Do you have chooseleaf type host or type node in your crush map?

How many OSDs do you run on each hosts?

Thx
JC

On Saturday, April 19, 2014, Gonzalo Aguilar Delgado <
gagui...@aguilardelgado.com> wrote:

> Hi,
>
> I'm building a cluster where two nodes replicate objects inside. I found
> that shutting down just one of the nodes (the second one), makes everything
> "incomplete".
>
> I cannot find why, since crushmap looks good to me.
>
> after shutting down one node
>
> cluster 9028f4da-0d77-462b-be9b-dbdf7fa57771
>  health HEALTH_WARN 192 pgs incomplete; 96 pgs stuck inactive; 96 pgs
> stuck unclean; 1/2 in osds are down
>  monmap e9: 1 mons at {blue-compute=172.16.0.119:6789/0}, election
> epoch 1, quorum 0 blue-compute
>  osdmap e73: 2 osds: 1 up, 2 in
>   pgmap v172: 192 pgs, 3 pools, 275 bytes data, 1 objects
> 7552 kB used, 919 GB / 921 GB avail
>  192 incomplete
>
>
> Both nodes has WD Caviar Black 500MB disk with btrfs filesystem on it.
> Full disk used.
>
> I cannot understand why does not replicate to both nodes.
>
> Someone can help?
>
> Best regards,
>


-- 
Sent while moving
Pardon my French and any spelling &| grammar glitches
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean in a pool without name

2014-04-18 Thread Jean-Charles LOPEZ
Hi Cedric,

use the rados command to remove the empty pool name if you need to.

rados rmpool '' '' --yes-i-really-really-mean-it

You won’t be able to remove it with the ceph command.

JC



On Apr 18, 2014, at 03:51, Cedric Lemarchand  wrote:

> Hi,
> 
> I am facing a strange behaviour where a pool is stucked, I have no idea how 
> this pool appear in the cluster in the way I have not played with pool 
> creation, *yet*.
> 
> # root@node1:~# ceph -s
> cluster 1b147882-722c-43d8-8dfb-38b78d9fbec3
>  health HEALTH_WARN 333 pgs degraded; 333 pgs stuck unclean; pool 
> .rgw.buckets has too few pgs
>  monmap e1: 1 mons at {node1=127.0.0.1:6789/0}, election epoch 1, quorum 
> 0 node1
>  osdmap e154: 3 osds: 3 up, 3 in
>   pgmap v16812: 3855 pgs, 14 pools, 41193 MB data, 24792 objects
> 57236 MB used, 644 GB / 738 GB avail
> 3522 active+clean
>  333 active+degraded
> 
> # root@node1:/etc/ceph# ceph osd dump
> epoch 154
> fsid 1b147882-722c-43d8-8dfb-38b78d9fbec3
> created 2014-04-16 20:46:46.516403
> modified 2014-04-18 12:14:29.052231
> flags 
> 
> pool 0 'data' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins 
> pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 1 min_size 1 crush_ruleset 1 object_hash rjenkins 
> pg_num 64 pgp_num 64 last_change 1 owner 0
> pool 2 'rbd' rep size 1 min_size 1 crush_ruleset 2 object_hash rjenkins 
> pg_num 64 pgp_num 64 last_change 1 owner 0
> pool 3 '.rgw.root' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins 
> pg_num 333 pgp_num 333 last_change 16 owner 0
> pool 4 '.rgw.control' rep size 1 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 333 pgp_num 333 last_change 18 owner 0
> pool 5 '.rgw' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins 
> pg_num 333 pgp_num 333 last_change 20 owner 0
> pool 6 '.rgw.gc' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins 
> pg_num 333 pgp_num 333 last_change 21 owner 0
> pool 7 '.users.uid' rep size 1 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 333 pgp_num 333 last_change 22 owner 0
> pool 8 '.users' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins 
> pg_num 333 pgp_num 333 last_change 26 owner 0
> pool 9 '.users.swift' rep size 1 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 333 pgp_num 333 last_change 28 owner 0
> pool 10 '.users.email' rep size 1 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 333 pgp_num 333 last_change 56 owner 0
> pool 11 '.rgw.buckets.index' rep size 1 min_size 1 crush_ruleset 0 
> object_hash rjenkins pg_num 333 pgp_num 333 last_change 58 owner 
> 18446744073709551615
> pool 12 '.rgw.buckets' rep size 1 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 333 pgp_num 333 last_change 60 owner 18446744073709551615
> pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 
> 333 pgp_num 333 last_change 146 owner 18446744073709551615
> 
> max_osd 5
> osd.0 up   in  weight 1 up_from 151 up_thru 151 down_at 148 
> last_clean_interval [144,147) 192.168.1.18:6800/26681 192.168.1.18:6801/26681 
> 192.168.1.18:6802/26681 192.168.1.18:6803/26681 exists,up 
> f6f63e8a-42af-4dda-b523-ffb835165420
> osd.1 up   in  weight 1 up_from 149 up_thru 149 down_at 148 
> last_clean_interval [139,147) 192.168.1.18:6805/26685 192.168.1.18:6806/26685 
> 192.168.1.18:6807/26685 192.168.1.18:6808/26685 exists,up 
> fa4689ac-e0ca-4ec3-ab2a-6afa57cc7498
> osd.2 up   in  weight 1 up_from 153 up_thru 153 down_at 148 
> last_clean_interval [141,147) 192.168.1.18:6810/26691 192.168.1.18:6811/26691 
> 192.168.1.18:6812/26691 192.168.1.18:6813/26691 exists,up 
> 6b2f7e3f-619c-4922-bdf9-bb0f2eee7413
> 
> # root@node1:/etc/ceph# ceph pg dump_stuck unclean |sort 
> 13.00000000active+degraded2014-04-18 
> 12:14:28.4385230'0154:13[0][0]0'02014-04-18 
> 11:12:05.3228550'02014-04-18 11:12:05.322855
> 13.1000000000active+degraded2014-04-18 
> 12:14:26.1106330'0154:13[0][0]0'02014-04-18 
> 11:12:06.3181590'02014-04-18 11:12:06.318159
> 13.100000000active+degraded2014-04-18 
> 12:14:37.0810870'0154:12[2][2]0'02014-04-18 
> 11:12:05.6423170'02014-04-18 11:12:05.642317
> 13.10000000active+degraded2014-04-18 
> 12:14:20.8748290'0154:13[1][1]0'02014-04-18 
> 11:12:05.5808740'02014-04-18 11:12:05.580874
> 13.1010000000active+degraded2014-04-18 
> 12:14:16.7231000'0154:14[1][1]0'02014-04-18 
> 11:12:06.5409750'02014-04-18 11:12:06.540975
> 13.1020000000active+degraded2014-04-18 
> 12:14:35.7954910'0154:12[2][2]0'02014-04-18 
> 11:12:06.543

Re: [ceph-users] radosgw-agent won't run

2014-04-04 Thread Jean-Charles Lopez
What is the status of your PGs on the slave zone side?

A down or stale PG could definitely cause this.

Maybe a quick ceph -s and ceph health detail could help locate the PG with
a problem, which could then help you issue the right ceph pg {pgid}
query command to find out which OSD is causing it.

JC

On Friday, April 4, 2014, Craig Lewis  wrote:

>  I've been seeing this warning on ceph -w for a while:
> 2014-04-04 11:26:29.438992 osd.3 [WRN] 84 slow requests, 1 included below;
> oldest blocked for > 90124.336765 secs
> 2014-04-04 11:26:29.438996 osd.3 [WRN] slow request 1920.199044 seconds
> old, received at 2014-04-04 10:54:29.239906: osd_op(client.45483332.0:79
> .dir.us-west-1.51941060.1 [call rgw.bucket_list] 11.7c96a483 e13266) v4
> currently waiting for missing object
>
> It appears to be causing problems for radosgw-agent (this warning is in
> the slave zone).
>
> Requests blocked for more than a day are a bit of a problem.  Is there
> anything I can do about this?
>
>
> --
>
>  *Craig Lewis*
>  Senior Systems Engineer
> Office +1.714.602.1309
> Email 
> cle...@centraldesktop.com
>
>  *Central Desktop. Work together in ways you never thought possible.*
>  Connect with us   Website   |  
> Twitter |
> Facebook   |  
> LinkedIn |
> Blog 
>


-- 
Sent while moving
Pardon my French and any spelling &| grammar glitches
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cleaning up; data usage, snap-shots, auth users

2014-04-02 Thread Jean-Charles Lopez
Hi

From what is pasted, your remove failed, so make sure you purge the snapshots
first and then remove the rbd image.

For the user removal, as explained in www.ceph.com/docs or ceph auth help
just issue ceph auth del {user}
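
Putting the whole sequence together for the image from your paste (the
snapshot name and the client name are placeholders, list them first with
rbd snap ls and ceph auth list):

rbd -p cloudstack snap ls 6fa36869-4afe-485a-90a3-93fba1b5d15e
rbd -p cloudstack snap unprotect 6fa36869-4afe-485a-90a3-93fba1b5d15e@{snapname}   # only if protected
rbd -p cloudstack snap purge 6fa36869-4afe-485a-90a3-93fba1b5d15e
rbd -p cloudstack rm 6fa36869-4afe-485a-90a3-93fba1b5d15e
ceph auth del client.{username}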

JC

On Wednesday, April 2, 2014, Jonathan Gowar  wrote:

> Hi,
>
>I have a small 8TB testing cluster.  During testing I've used 94G.
> But, I have since removed pools and images from Ceph, I shouldn't be
> using any space, but still the 94G usage remains.  How can I reclaim old
> used space?
>
> Also, this:-
>
> ceph@ceph-admin:~$ rbd rm 6fa36869-4afe-485a-90a3-93fba1b5d15e
> 2014-04-03 01:02:23.304323 7f92e2ced760 -1 librbd::ImageCtx: error
> finding header: (2) No such file or directory
> Removing image: 2014-04-03 01:02:23.312212 7f92e2ced760 -1 librbd: error
> removing img from new-style directory: (2) No such file or directory
> 0% complete...failed.
> rbd: delete error: (2) No such file or directory
> ceph@ceph-admin:~$ rbd rm 6fa36869-4afe-485a-90a3-93fba1b5d15e -p
> cloudstack
> 2014-04-03 01:02:34.424626 7fd556d00760 -1 librbd: image has snapshots -
> not removing
> Removing image: 0% complete...failed.
> rbd: image has snapshots - these must be deleted with 'rbd snap purge'
> before the image can be removed.
> ceph@ceph-admin:~$ rbd snap purge 6fa36869-4afe-485a-90a3-93fba1b5d15e
> -p cloudstack
> Removing all snapshots2014-04-03 01:02:46.863370 7f2949461760 -1 librbd:
> removing snapshot from header failed: (16) Device or resource busy
> : 0% complete...failed.
> rbd: removing snaps failed: (16) Device or resource busy
>
> Lastly, how can I remove a user from the auth list?
>
> Regards,
> Jon
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Sent while moving
Pardon my French and any spelling &| grammar glitches
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph: Error librbd to create a clone

2014-03-31 Thread Jean-Charles Lopez
Hi Patrick

When you call the clone method, add an extra argument specifying the
features to be used for the clone:
1 for layering only
3 for layering and striping

Adapt the value to your requirements

Then it should work.

JC

On Monday, March 31, 2014, COPINE Patrick  wrote:

> Hi,
>
> Now, I have an exception. Before starting the Python program, I made the
> following actions.
>
>
>
>
>
> The exception is :
>
>> root@ceph-clt:~/src# python v4.py 
>>
>> Traceback (most recent call last):
>>
>>   File "v4.py", line 22, in 
>>
>> rbd_inst.clone(p_ioctx, 'foo3', 's1-foo3', c_ioctx, 'c1-foo3')
>>
>>   File "/usr/lib/python2.7/dist-packages/rbd.py", line 239, in clone
>>
>> raise make_ex(ret, 'error creating clone')
>>
>> rbd.Error: error creating clone: error code 8
>>
>
>
> 2014-03-28 21:13 GMT+01:00 COPINE Patrick 
> 
> >:
>
>> Yes, I forgot that. I was influenced by the error message. I will protect
>> the snapshot before performing the clone. Thank you.
>>  Le 28 mars 2014 17:30, "COPINE Patrick" 
>> >
>> a écrit :
>>
>>  Hi,
>>>
>>> I would like create a clone with a Python code, but I have a syntax
>>> error. In first, I have created an image (type 2), and after a snapshot.
>>> The code is here.
>>>
>>> Best regards,
>>>
>>> root@ceph-clt:~/src# cat v4.py

 # -*- coding:utf-8 -*-

 import rbd

 import rados


 FCONF='/etc/ceph/ceph.conf'

 POOL='rbd'


 try:

   # connexion au cluster

   cluster = rados.Rados(conffile=FCONF)

   cluster.connect()


   # utilisation du pool 'rbd'

   p_ioctx = cluster.open_ioctx(POOL)

   c_ioctx = cluster.open_ioctx(POOL)


   try:

 # instanciation d'un objet de type 'rbd' pour cloner l'image

 rbd_inst = rbd.RBD()


 # création du clone "c1-foo3

 rbd_inst.clone(p_ioctx, 'foo3', 's1-foo3', c_ioctx, 'c1-foo3')


   finally:

 p_ioctx.close()

 c_ioctx.close()

 finally:

   cluster.shutdown()

>>>
>>>  root@ceph-clt:~/src# python v4.py

 Traceback (most recent call last):

   File "v4.py", line 22, in 

 rbd_inst.clone(p_ioctx, 'foo3', 's1-foo3', c_ioctx, 'c1-foo3', 0,
 0)

   File "/usr/lib/python2.7/dist-packages/rbd.py", line 239, in clone

 raise make_ex(ret, 'error creating clone')

 rbd.InvalidArgument: error creating clone

>>>
>>>
>>> Patrick COPINE *(*tél: *+33680117101 <%2B33680117101>, **+33652566319
>>> <%2B33652566319>)*
>>>
>>
>
>
> --
> Cordialement,
>
> Patrick COPINE *(*tél: *+33680117101, **+33652566319)*
>


-- 
Sent while moving
Pardon my French and any spelling &| grammar glitches
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph: Error librbd to create a clone

2014-03-28 Thread Jean-Charles LOPEZ
Hi Patrick,

when you clone, you clone from a snapshot. This snapshot must be protected and 
above all, your RBD image must be format 2.

Are all of these elements true?
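
If not, the usual preparation looks like this (image and snapshot names taken 
from your script, the size is arbitrary):

rbd create foo3 --size 1024 --image-format 2
rbd snap create foo3@s1-foo3
rbd snap protect foo3@s1-foo3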

JC



On Mar 28, 2014, at 09:30, COPINE Patrick  wrote:

> Hi,
> 
> I would like create a clone with a Python code, but I have a syntax error. In 
> first, I have created an image (type 2), and after a snapshot. The code is 
> here. 
> 
> Best regards,
> 
> root@ceph-clt:~/src# cat v4.py
> 
> # -*- coding:utf-8 -*-
> 
> import rbd
> 
> import rados
> 
> 
> 
> FCONF='/etc/ceph/ceph.conf'
> 
> POOL='rbd'
> 
> 
> 
> try:
> 
>   # connexion au cluster
> 
>   cluster = rados.Rados(conffile=FCONF)
> 
>   cluster.connect()
> 
> 
> 
>   # utilisation du pool 'rbd'
> 
>   p_ioctx = cluster.open_ioctx(POOL)
> 
>   c_ioctx = cluster.open_ioctx(POOL)
> 
> 
> 
>   try:
> 
> # instanciation d'un objet de type 'rbd' pour cloner l'image
> 
> rbd_inst = rbd.RBD()
> 
> 
> 
> # création du clone "c1-foo3
> 
> rbd_inst.clone(p_ioctx, 'foo3', 's1-foo3', c_ioctx, 'c1-foo3') 
> 
> 
> 
>   finally:
> 
> p_ioctx.close()
> 
> c_ioctx.close()
> 
> finally:
> 
>   cluster.shutdown()
> 
> 
> root@ceph-clt:~/src# python v4.py
> 
> Traceback (most recent call last):
> 
>   File "v4.py", line 22, in 
> 
> rbd_inst.clone(p_ioctx, 'foo3', 's1-foo3', c_ioctx, 'c1-foo3', 0, 0) 
> 
>   File "/usr/lib/python2.7/dist-packages/rbd.py", line 239, in clone
> 
> raise make_ex(ret, 'error creating clone')
> 
> rbd.InvalidArgument: error creating clone
> 
> 
> 
> 
> Patrick COPINE (tél: +33680117101, +33652566319)
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove volume

2014-03-13 Thread Jean-Charles LOPEZ
Hi,

use rbd children to see dependencies:

rbd children {pool}/{image}@{snapshot} for example

http://ceph.com/docs/rbd
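
For the first image in your list the full cleanup would look roughly like this
(Glance usually names its protected snapshot 'snap', but check with rbd snap ls
first, and make sure no clone still needs the parent before unprotecting):

rbd -p images snap ls 0c605116-0634-4aed-9b3f-12d9483cd38a
rbd children images/0c605116-0634-4aed-9b3f-12d9483cd38a@snap
rbd -p images snap unprotect 0c605116-0634-4aed-9b3f-12d9483cd38a@snap
rbd -p images snap purge 0c605116-0634-4aed-9b3f-12d9483cd38a
rbd -p images rm 0c605116-0634-4aed-9b3f-12d9483cd38a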

JC



On Mar 13, 2014, at 24:09, yalla.gnan.ku...@accenture.com wrote:

> Hi Jean,
>  
> Thanks a lot.
>  
> I have  the following images in ‘ images’ pool:
> 
>  
> root@compute:/home/oss# rbd -p images ls
> 0c605116-0634-4aed-9b3f-12d9483cd38a
> 9f1a5bdc-3450-4934-99b3-d1b834ad9592
> b16aac0e-621f-4f36-a027-39c86d28011f
>  
> When I try deleting them I get this error:
> --
> root@compute:/home/oss# rbd -p images rm 0c605116-0634-4aed-9b3f-12d9483cd38a
> 2014-03-13 04:28:19.455049 7f751275e780 -1 librbd: image has snapshots - not 
> removing
> Removing image: 0% complete...failed.
> rbd: image has snapshots - these must be deleted with 'rbd snap purge' before 
> the image can be removed.
>  
> When I try deleting the snaps , I face the following error:
> 
> root@compute:/home/oss# rbd -p images snap purge 
> 0c605116-0634-4aed-9b3f-12d9483cd38a
> Removing all snapshots: 0% complete...failed.
> rbd: removing snaps failed: (16) Device or resource busy
> 2014-03-13 04:31:39.512199 7f1ac4134780 -1 librbd: removing snapshot from 
> header failed: (16) Device or resource busy
>  
>  
> How to delete the images ?
>  
>  
> Thanks
> Kumar
>  
>  
> From: Jean-Charles Lopez [mailto:jc.lo...@inktank.com] 
> Sent: Thursday, March 13, 2014 12:37 PM
> To: Gnan Kumar, Yalla
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Remove volume
>  
> Hi
>  
> rbd -p  rm 
>  
> e.g. rbd -p volumes rm volume-55abf0d4-01a7-41d0-9e7e-407ad0db213c
>  
> http://ceph.com/docs
>  
> JC
> 
> On Wednesday, March 12, 2014,  wrote:
> Hi All,
>  
> I have ceph installed on Ubuntu nodes. At present I have the following 
> volumes in Volumes pool:
> -
> root@compute:/home/oss# rbd -p volumes ls
> volume-53f0dc29-956f-48f1-8db1-b0f9c1b0e9f1
> volume-55abf0d4-01a7-41d0-9e7e-407ad0db213c
> volume-a73d1bd0-2937-41c4-bbca-2545454eefac
> volume-bd45af55-489f-4d09-bc14-33229c1e3096
> volume-cb11564f-7550-4e23-8197-4f8af09e506c
> volume-f3f67d69-8ac3-41a9-8001-4a2b512af933
>  
>  
> What is the command to delete the above volumes ?
>  
> Thanks
> Kumar
>  
> 
> This message is for the designated recipient only and may contain privileged, 
> proprietary, or otherwise confidential information. If you have received it 
> in error, please notify the sender immediately and delete the original. Any 
> other use of the e-mail by you is prohibited. Where allowed by local law, 
> electronic communications with Accenture and its affiliates, including e-mail 
> and instant messaging (including content), may be scanned by our systems for 
> the purposes of information security and assessment of internal compliance 
> with Accenture policy. 
> __
> 
> www.accenture.com
> 
> 
> -- 
> Sent while moving
> Pardon my French and any spelling &| grammar glitches
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove volume

2014-03-13 Thread Jean-Charles Lopez
Hi

Probably because you have snapshots mapped somewhere on a node, or because
you have cloned some images from protected snapshots to deploy VMs.

Either case will prevent the deletion of the snapshots.

JC
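
A possible cleanup sequence for that situation, assuming the clones are no longer
needed (pool and image names come from the thread; snapshot and clone names are
placeholders):

# rbd children images/0c605116-0634-4aed-9b3f-12d9483cd38a@<snapshot-name>
# rbd flatten <pool>/<clone-image>        (or rbd rm the clone if it can be discarded)
# rbd snap unprotect images/0c605116-0634-4aed-9b3f-12d9483cd38a@<snapshot-name>
# rbd snap purge images/0c605116-0634-4aed-9b3f-12d9483cd38a
# rbd rm images/0c605116-0634-4aed-9b3f-12d9483cd38a

If the image or one of its snapshots is still mapped on a client (check with
rbd showmapped, then rbd unmap), that can also keep the image busy.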

On Thursday, March 13, 2014,  wrote:

>  Hi Jean,
>
>
>
> Thanks a lot.
>
>
>
> I have  the following images in ' images' pool:
>
> 
>
>
>
> root@compute:/home/oss# rbd -p images ls
>
> 0c605116-0634-4aed-9b3f-12d9483cd38a
>
> 9f1a5bdc-3450-4934-99b3-d1b834ad9592
>
> b16aac0e-621f-4f36-a027-39c86d28011f
>
>
>
> When I try deleting them I get this error:
>
> --
>
> root@compute:/home/oss# rbd -p images rm
> 0c605116-0634-4aed-9b3f-12d9483cd38a
>
> 2014-03-13 04:28:19.455049 7f751275e780 -1 librbd: image has snapshots -
> not removing
>
> Removing image: 0% complete...failed.
>
> rbd: image has snapshots - these must be deleted with 'rbd snap purge'
> before the image can be removed.
>
>
>
> When I try deleting the snaps , I face the following error:
>
> 
>
> root@compute:/home/oss# rbd -p images snap purge
> 0c605116-0634-4aed-9b3f-12d9483cd38a
>
> Removing all snapshots: 0% complete...failed.
>
> rbd: removing snaps failed: (16) Device or resource busy
>
> 2014-03-13 04:31:39.512199 7f1ac4134780 -1 librbd: removing snapshot from
> header failed: (16) Device or resource busy
>
>
>
>
>
> How to delete the images ?
>
>
>
>
>
> Thanks
>
> Kumar
>
>
>
>
>
> *From:* Jean-Charles Lopez 
> [mailto:jc.lo...@inktank.com]
>
> *Sent:* Thursday, March 13, 2014 12:37 PM
> *To:* Gnan Kumar, Yalla
> *Cc:* 
> ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Remove volume
>
>
>
> Hi
>
>
>
> rbd -p <pool> rm <image-name>
>
>
>
> e.g. rbd -p volumes rm volume-55abf0d4-01a7-41d0-9e7e-407ad0db213c
>
>
>
> http://ceph.com/docs
>
>
>
> JC
>
> On Wednesday, March 12, 2014, 
> >
> wrote:
>
> Hi All,
>
>
>
> I have ceph installed on Ubuntu nodes. At present I have the following
> volumes in Volumes pool:
>
> -
>
> root@compute:/home/oss# rbd -p volumes ls
>
> volume-53f0dc29-956f-48f1-8db1-b0f9c1b0e9f1
>
> volume-55abf0d4-01a7-41d0-9e7e-407ad0db213c
>
> volume-a73d1bd0-2937-41c4-bbca-2545454eefac
>
> volume-bd45af55-489f-4d09-bc14-33229c1e3096
>
> volume-cb11564f-7550-4e23-8197-4f8af09e506c
>
> volume-f3f67d69-8ac3-41a9-8001-4a2b512af933
>
>
>
>
>
> What is the command to delete the above volumes ?
>
>
>
> Thanks
>
> Kumar
>
>
>  --
>
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you have
> received it in error, please notify the sender immediately and delete the
> original. Any other use of the e-mail by you is prohibited. Where allowed
> by local law, electronic communications with Accenture and its affiliates,
> including e-mail and instant messaging (including content), may be scanned
> by our systems for the purposes of information security and assessment of
> internal compliance with Accenture policy.
>
> __
>
> www.accenture.com
>
>
>
> --
> Sent while moving
> Pardon my French and any spelling &| grammar glitches
>


-- 
Sent while moving
Pardon my French and any spelling &| grammar glitches
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove volume

2014-03-13 Thread Jean-Charles Lopez
Hi

rbd -p <pool> rm <image-name>

e.g. rbd -p volumes rm volume-55abf0d4-01a7-41d0-9e7e-407ad0db213c

http://ceph.com/docs

JC

On Wednesday, March 12, 2014,  wrote:

>  Hi All,
>
>
>
> I have ceph installed on Ubuntu nodes. At present I have the following
> volumes in Volumes pool:
>
> -
>
> root@compute:/home/oss# rbd -p volumes ls
>
> volume-53f0dc29-956f-48f1-8db1-b0f9c1b0e9f1
>
> volume-55abf0d4-01a7-41d0-9e7e-407ad0db213c
>
> volume-a73d1bd0-2937-41c4-bbca-2545454eefac
>
> volume-bd45af55-489f-4d09-bc14-33229c1e3096
>
> volume-cb11564f-7550-4e23-8197-4f8af09e506c
>
> volume-f3f67d69-8ac3-41a9-8001-4a2b512af933
>
>
>
>
>
> What is the command to delete the above volumes ?
>
>
>
> Thanks
>
> Kumar
>
> --
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you have
> received it in error, please notify the sender immediately and delete the
> original. Any other use of the e-mail by you is prohibited. Where allowed
> by local law, electronic communications with Accenture and its affiliates,
> including e-mail and instant messaging (including content), may be scanned
> by our systems for the purposes of information security and assessment of
> internal compliance with Accenture policy.
>
> __
>
> www.accenture.com
>


-- 
Sent while moving
Pardon my French and any spelling &| grammar glitches
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd create ... STRIPINGV2 and format 2 or later required

2014-03-12 Thread Jean-Charles LOPEZ
Hi Dieter,

you have a problem with your command.

You set order = 16, so your RBD objects are going to be 65536 bytes each.

Then you tell RBD that your stripe-unit is going to be 65536, which is the size
of your full object.

Either decrease the size of --stripe-unit, to 8192 for example,
or increase the order so that the object is bigger than your stripe unit and contains a
whole multiple of stripe-units (e.g. order 21).

And it will work without any problem
JC
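
With the command from the thread, either of these variants follows that advice (same
pool, size and image name as in the original command; only the order or stripe-unit
changes, so treat them as a sketch rather than a tested recipe):

# rbd create --pool SSD-r2 --size 20480 --order 16 --image-format 2 --stripe-unit 8192 --stripe-count 4 t2
# rbd create --pool SSD-r2 --size 20480 --order 21 --image-format 2 --stripe-unit 65536 --stripe-count 4 t2

In both cases the object size (2^order) is larger than the stripe unit and a whole
multiple of it, which is the condition described above.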


On Mar 11, 2014, at 07:22, Kasper Dieter  wrote:

> So, should I open a bug report ?
> 
> STRIPINGV2 feature was added in Ceph v0.53, and I'm running v0.61 and using 
> '--image-format 2' during 'rbd create'
> 
> Regards,
> -Dieter
> 
> 
> On Tue, Mar 11, 2014 at 03:13:28PM +0100, Srinivasa Rao Ragolu wrote:
>>   of course. rbd userland utilities provide you create  images on RADOS as   
>>   
>>   block storage. 
>>   
>> 
>>   On Tue, Mar 11, 2014 at 7:37 PM, Kasper Dieter 
>>   
>>   <[1]dieter.kas...@ts.fujitsu.com> wrote:   
>>   
>> 
>> I know, that format2 in rbd.ko is supported with kernel version 3.10 and 
>>   
>> above.   
>>   
>> 
>> But, if I want to create an rbd-image
>>   
>> only the Ceph Userland services should be involved, shouldn't it ?   
>>   
>> 
>> -Dieter  
>>   
>> 
>> BTW the kernel version on the nodes hosting the OSDs processes is
>>   
>> 2.6.32-358.el6.x86_64
>>   
>> but I can also boot with a 3.10.32 kernel.   
>>   
>> 
>> On Tue, Mar 11, 2014 at 02:57:05PM +0100, Srinivasa Rao Ragolu wrote:
>>   
>>>   Please check the kernel version . Only kernel version 3.10 and
>> above are
>>   
>>>   supported to create format type 2 images. 
>>> 
>>>   On Tue, Mar 11, 2014 at 7:16 PM, Kasper Dieter
>>>   <[1][2]dieter.kas...@ts.fujitsu.com> wrote:   
>>> 
>>> When using "rbd create ... --image-format 2" in some cases this 
>> CMD is   
>>   
>>> rejected by 
>>> EINVAL with the message "librbd: STRIPINGV2 and format 2 or later   
>>> required for non-default striping"  
>>> But, in v0.61.9 "STRIPINGV2 and format 2" should be supported   
>>> 
>>> [root@rx37-3 ~]# rbd create --pool SSD-r2 --size 20480 --order 16   
>>> --image-format 2 --stripe-unit 65536 --stripe-count 4 t2
>>> rbd: create error: (22) Invalid argument
>>> 2014-03-11 14:39:03.885185 7f15bc170760 -1 librbd: STRIPINGV2 and   
>> format   
>>   
>>> 2 or later required for non-default striping
>>> 
>>> [root@rx37-3 ~]# ceph -v
>>> ceph version 0.61.9 (7440dcd135750839fa0f00263f80722ff6f51e90)  
>>> 
>>> Any hints ? 
>>> 
>>> Regards,
>>> -Dieter 
>>> ___ 
>>> ceph-users mailing list 
>>> [2][3]ceph-users@lists.ceph.com 
>>> [3][4]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> 
>>> References   
>>> 
>>>   Visible links 
>>>   1. mailto:[5]dieter.kas...@ts.fujitsu.com 
>>>   2. mailto:[6]ceph-users@lists.ceph.com
>>>   3. [7]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com  
>> 
>> References
>> 
>>   Visible links
>>   1. mailto:dieter.kas...@ts.fujitsu.com
>>   2. mailto:dieter.kas...@ts.fujitsu.com
>>   3. mailto:ceph-users@lists.ceph.com
>>   4. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>   5. mailto:dieter.kas...@ts.fujitsu.com
>>   6. mailto:ceph-users@lists.ceph.com
>>   7. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] how to configure ceph object gateway

2014-03-12 Thread Jean-Charles LOPEZ
Hi,

what commands are “not found”?

This page for configuring the RGW works fine as far as I know; I used it no 
later than a week ago.

Can you please give us more details? What is your layout (radosgw installed on 
a ceph node, mon node, standalone node)?

Note: In order to get it running, remember you need to have a web server (apache) 
installed and running, the ceph base packages obviously, the swift client if you want 
to use the swift tool, as well as s3cmd, s3curl, …

JC
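
Once the gateway answers, a quick way to verify it end to end is to create a test user
and list its (empty) buckets with the swift client; the user name, gateway host and
secret below are placeholders:

# radosgw-admin user create --uid=testuser --display-name="Test User"
# radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full
# radosgw-admin key create --subuser=testuser:swift --key-type=swift --gen-secret
$ swift -A http://<gateway-host>/auth/1.0 -U testuser:swift -K <swift-secret-key> list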

On Mar 10, 2014, at 19:35, wsnote  wrote:

> OS: CentOS 6.4
> version: ceph 0.67.7
> 
> Hello, everyone.
> With the help of document, I have install ceph gateway.
> But I don't know how to configure it. The web 
> http://ceph.com/docs/master/radosgw/config/ has many command not found.I 
> thought it's written in the ubuntu.
> can anyone help?
> Thanks!
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd create ... STRIPINGV2 and format 2 or later required

2014-03-11 Thread Jean-Charles LOPEZ
Hi Greg,

but our default also has stripe-count = 1, so that no more than 1 stripe-unit is 
included in each object of 2^order bytes.

So if you do --order 16 --stripe-unit 65536 --stripe-count 1 it then works.

I’m not sure if this is what you meant.
JC



On Mar 11, 2014, at 08:32, Gregory Farnum  wrote:

> If the stripe size and object size are the same it's just chunking --
> that's our default. Should work fine.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 
> 
> On Tue, Mar 11, 2014 at 8:23 AM, Jean-Charles LOPEZ
>  wrote:
>> Hi Dieter,
>> 
>> you have a problem with your command.
>> 
>> You set order = 16 so your RBD objects is going to be 65536 bytes
>> 
>> Then you tell RBD that you stripe-unit is going to be 65536 which is the 
>> size of your full object.
>> 
>> Either decrease the size of --stripe-unit to 8192 for example
>> Or increase order so that it is bigger than your stripe unit and contains a 
>> multiple of stripe-units (e.g. 21)
>> 
>> And it will work without any problem
>> JC
>> 
>> 
>> 
>> On Mar 11, 2014, at 07:22, Kasper Dieter  
>> wrote:
>> 
>>> So, should I open a bug report ?
>>> 
>>> STRIPINGV2 feature was added in Ceph v0.53, and I'm running v0.61 and using 
>>> '--image-format 2' during 'rbd create'
>>> 
>>> Regards,
>>> -Dieter
>>> 
>>> 
>>> On Tue, Mar 11, 2014 at 03:13:28PM +0100, Srinivasa Rao Ragolu wrote:
>>>>  of course. rbd userland utilities provide you create  images on RADOS as
>>>>  block storage.
>>>> 
>>>>  On Tue, Mar 11, 2014 at 7:37 PM, Kasper Dieter
>>>>  <[1]dieter.kas...@ts.fujitsu.com> wrote:
>>>> 
>>>>I know, that format2 in rbd.ko is supported with kernel version 3.10 and
>>>>above.
>>>> 
>>>>But, if I want to create an rbd-image
>>>>only the Ceph Userland services should be involved, shouldn't it ?
>>>> 
>>>>-Dieter
>>>> 
>>>>BTW the kernel version on the nodes hosting the OSDs processes is
>>>>2.6.32-358.el6.x86_64
>>>>but I can also boot with a 3.10.32 kernel.
>>>> 
>>>>On Tue, Mar 11, 2014 at 02:57:05PM +0100, Srinivasa Rao Ragolu wrote:
>>>>>  Please check the kernel version . Only kernel version 3.10 and
>>>>above are
>>>>>  supported to create format type 2 images.
>>>>> 
>>>>>  On Tue, Mar 11, 2014 at 7:16 PM, Kasper Dieter
>>>>>  <[1][2]dieter.kas...@ts.fujitsu.com> wrote:
>>>>> 
>>>>>When using "rbd create ... --image-format 2" in some cases this
>>>>CMD is
>>>>>rejected by
>>>>>EINVAL with the message "librbd: STRIPINGV2 and format 2 or later
>>>>>required for non-default striping"
>>>>>But, in v0.61.9 "STRIPINGV2 and format 2" should be supported
>>>>> 
>>>>>[root@rx37-3 ~]# rbd create --pool SSD-r2 --size 20480 --order 16
>>>>>--image-format 2 --stripe-unit 65536 --stripe-count 4 t2
>>>>>rbd: create error: (22) Invalid argument
>>>>>2014-03-11 14:39:03.885185 7f15bc170760 -1 librbd: STRIPINGV2 and
>>>>format
>>>>>2 or later required for non-default striping
>>>>> 
>>>>>[root@rx37-3 ~]# ceph -v
>>>>>ceph version 0.61.9 (7440dcd135750839fa0f00263f80722ff6f51e90)
>>>>> 
>>>>>Any hints ?
>>>>> 
>>>>>Regards,
>>>>>-Dieter
>>>>>___
>>>>>ceph-users mailing list
>>>>>[2][3]ceph-users@lists.ceph.com
>>>>>[3][4]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>> 
>>>>> References
>>>>> 
>>>>>  Visible links
>>>>>  1. mailto:[5]dieter.kas...@ts.fujitsu.com
>>>>>  2. mailto:[6]ceph-users@lists.ceph.com
>>>>>  3. [7]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> 
>>>> References
>>>> 
>>>>  Visible links
>>>>  1. mailto:dieter.kas...@ts.fujitsu.com
>>>>  2. mailto:dieter.kas...@ts.fujitsu.com
>>>>  3. mailto:ceph-users@lists.ceph.com
>>>>  4. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>  5. mailto:dieter.kas...@ts.fujitsu.com
>>>>  6. mailto:ceph-users@lists.ceph.com
>>>>  7. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd create ... STRIPINGV2 and format 2 or later required

2014-03-11 Thread Jean-Charles LOPEZ
Hi Dieter,

you have a problem with your command.

You set order = 16, so your RBD objects are going to be 65536 bytes each.

Then you tell RBD that your stripe-unit is going to be 65536, which is the size
of your full object.

Either decrease the size of --stripe-unit, to 8192 for example,
or increase the order so that the object is bigger than your stripe unit and contains a
whole multiple of stripe-units (e.g. order 21).

And it will work without any problem
JC



On Mar 11, 2014, at 07:22, Kasper Dieter  wrote:

> So, should I open a bug report ?
> 
> STRIPINGV2 feature was added in Ceph v0.53, and I'm running v0.61 and using 
> '--image-format 2' during 'rbd create'
> 
> Regards,
> -Dieter
> 
> 
> On Tue, Mar 11, 2014 at 03:13:28PM +0100, Srinivasa Rao Ragolu wrote:
>>   of course. rbd userland utilities provide you create  images on RADOS as   
>>   
>>   block storage. 
>>   
>> 
>>   On Tue, Mar 11, 2014 at 7:37 PM, Kasper Dieter 
>>   
>>   <[1]dieter.kas...@ts.fujitsu.com> wrote:   
>>   
>> 
>> I know, that format2 in rbd.ko is supported with kernel version 3.10 and 
>>   
>> above.   
>>   
>> 
>> But, if I want to create an rbd-image
>>   
>> only the Ceph Userland services should be involved, shouldn't it ?   
>>   
>> 
>> -Dieter  
>>   
>> 
>> BTW the kernel version on the nodes hosting the OSDs processes is
>>   
>> 2.6.32-358.el6.x86_64
>>   
>> but I can also boot with a 3.10.32 kernel.   
>>   
>> 
>> On Tue, Mar 11, 2014 at 02:57:05PM +0100, Srinivasa Rao Ragolu wrote:
>>   
>>>   Please check the kernel version . Only kernel version 3.10 and
>> above are
>>   
>>>   supported to create format type 2 images. 
>>> 
>>>   On Tue, Mar 11, 2014 at 7:16 PM, Kasper Dieter
>>>   <[1][2]dieter.kas...@ts.fujitsu.com> wrote:   
>>> 
>>> When using "rbd create ... --image-format 2" in some cases this 
>> CMD is   
>>   
>>> rejected by 
>>> EINVAL with the message "librbd: STRIPINGV2 and format 2 or later   
>>> required for non-default striping"  
>>> But, in v0.61.9 "STRIPINGV2 and format 2" should be supported   
>>> 
>>> [root@rx37-3 ~]# rbd create --pool SSD-r2 --size 20480 --order 16   
>>> --image-format 2 --stripe-unit 65536 --stripe-count 4 t2
>>> rbd: create error: (22) Invalid argument
>>> 2014-03-11 14:39:03.885185 7f15bc170760 -1 librbd: STRIPINGV2 and   
>> format   
>>   
>>> 2 or later required for non-default striping
>>> 
>>> [root@rx37-3 ~]# ceph -v
>>> ceph version 0.61.9 (7440dcd135750839fa0f00263f80722ff6f51e90)  
>>> 
>>> Any hints ? 
>>> 
>>> Regards,
>>> -Dieter 
>>> ___ 
>>> ceph-users mailing list 
>>> [2][3]ceph-users@lists.ceph.com 
>>> [3][4]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> 
>>> References   
>>> 
>>>   Visible links 
>>>   1. mailto:[5]dieter.kas...@ts.fujitsu.com 
>>>   2. mailto:[6]ceph-users@lists.ceph.com
>>>   3. [7]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com  
>> 
>> References
>> 
>>   Visible links
>>   1. mailto:dieter.kas...@ts.fujitsu.com
>>   2. mailto:dieter.kas...@ts.fujitsu.com
>>   3. mailto:ceph-users@lists.ceph.com
>>   4. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>   5. mailto:dieter.kas...@ts.fujitsu.com
>>   6. mailto:ceph-users@lists.ceph.com
>>   7. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] how to configure ceph object gateway

2014-03-10 Thread Jean-Charles LOPEZ
Hi,

looks like this comes from the apache install. Something is wrong or different 
with CentOS.

Replace first command with
ln -s /etc/httpd/sites-available/rgw.conf /etc/httpd/conf.d/rgw.conf

Replace second command with
unlink /etc/httpd/conf.d/default

This should do the trick

JC
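
On CentOS the equivalent of the Ubuntu a2ensite/a2dissite steps is therefore just the
two commands above, followed by a restart of the services (a sketch; the ceph-radosgw
init script name is the one mentioned elsewhere in this thread):

# mkdir -p /etc/httpd/sites-available       (only if the directory does not exist yet)
# ln -s /etc/httpd/sites-available/rgw.conf /etc/httpd/conf.d/rgw.conf
# service httpd restart
# service ceph-radosgw restart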



On Mar 10, 2014, at 21:53, wsnote  wrote:

> >You must also create an rgw.conf file in the /etc/apache2/sites-enabled 
> >directory. 
> There is no  /etc/apache2/sites-enabled directory in the CentOS. So I didn't 
> create rgw.conf. I put the content of rgw.conf to the httpd.conf.
> 
> >sudo a2ensite rgw.conf
> >sudo a2dissite default
> 
> These 2 commands was not found in the CentOS.
> I can start /etc/init.d/ceph-radosgw and create a gateway user.But when I 
> used API, it shows "403 forbidden".
> I didn't add wildcart to DNS, because I didn't use domain.
> 
> 在 2014-03-11 10:47:57,"Jean-Charles LOPEZ"  写道:
> >Hi,
> >
> >what commands are “not found”?
> >
> >This page for configuring the RGW works fine as far as I know as I used it 
> >no later than a week ago.
> >
> >Can you please give us more details? What is your layout (radosgw installed 
> >on a ceph node, mon node, standalone node)?
> >
> >Note: In order to get it running, remember you need to have a web server 
> >installed and running (apache), ceph base packages obviously, swift if you 
> >want to use the swift tool, s3cmd also, s3curl, …
> >
> >JC
> >
> >On Mar 10, 2014, at 19:35, wsnote  wrote:
> >
> >> OS: CentOS 6.4
> >> version: ceph 0.67.7
> >> 
> >> Hello, everyone.
> >> With the help of document, I have install ceph gateway.
> >> But I don't know how to configure it. The web 
> >> http://ceph.com/docs/master/radosgw/config/ has many command not found.I 
> >> thought it's written in the ubuntu.
> >> can anyone help?
> >> Thanks!
> >> 
> >> 
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to configure ceph object gateway

2014-03-10 Thread Jean-Charles LOPEZ
Hi,

what commands are “not found”?

This page for configuring the RGW works fine as far as I know; I used it no 
later than a week ago.

Can you please give us more details? What is your layout (radosgw installed on 
a ceph node, mon node, standalone node)?

Note: In order to get it running, remember you need to have a web server (apache) 
installed and running, the ceph base packages obviously, the swift client if you want 
to use the swift tool, as well as s3cmd, s3curl, …

JC



On Mar 10, 2014, at 19:35, wsnote  wrote:

> OS: CentOS 6.4
> version: ceph 0.67.7
> 
> Hello, everyone.
> With the help of document, I have install ceph gateway.
> But I don't know how to configure it. The web 
> http://ceph.com/docs/master/radosgw/config/ has many command not found.I 
> thought it's written in the ubuntu.
> can anyone help?
> Thanks!
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd down

2014-02-16 Thread Jean-Charles LOPEZ
Hi Pavel,

It looks like you have deployed your 2 OSDs on the same host. By default, in 
the CRUSH map, each object is going to be assigned to 2 OSDs that are on 
different hosts.

If you want this to work for testing, you’ll have to adapt your CRUSH map so 
that each copy is dispatched to a bucket of type ‘osd’ and not ‘host’.

Being unable to find a candidate OSD according to the current CRUSH map you 
have is probably why your PGs remain stuck and inactive. I reproduced your 
setup and got the same result, but as soon as I modified the map all the PGs 
came up active+clean.

mtc
JC
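
A sketch of that CRUSH map change, assuming the default replicated rule is the one in
use (file names are arbitrary):

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
  (in crushmap.txt, change "step chooseleaf firstn 0 type host" to
   "step chooseleaf firstn 0 type osd" in the rule your pools use)
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new

The OSD daemons also need to be reported up and in before the PGs can go active+clean.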



On Feb 16, 2014, at 14:02, Pavel V. Kaygorodov  wrote:

> Hi!
> 
> I have tried, but situation not changed significantly:
> 
> # ceph -w
> cluster e90dfd37-98d1-45bb-a847-8590a5ed8e71
>  health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 2/2 in 
> osds are down
>  monmap e1: 1 mons at {host1=172.17.0.4:6789/0}, election epoch 1, quorum 
> 0 host1
>  osdmap e9: 2 osds: 0 up, 2 in
>   pgmap v10: 192 pgs, 3 pools, 0 bytes data, 0 objects
> 0 kB used, 0 kB / 0 kB avail
>  192 creating
> 2014-02-16 17:25:29.872538 mon.0 [INF] osdmap e9: 2 osds: 0 up, 2 in
> 
> # ceph osd tree
> # idweight  type name   up/down reweight
> -1  2   root default
> -2  2   host host1
> 0   1   osd.0   down1
> 1   1   osd.1   down1
> 
> # ceph health
> HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 2/2 in osds are 
> down
> 
> ps showed both osd daemons running.
> 
> Pavel.
> 
> 17 февр. 2014 г., в 1:50, Karan Singh  написал(а):
> 
>> Hi Pavel
>> 
>> Try to add at least 1 more OSD ( bare minimum ) and set pool replication to 
>> 2 after that.
>> For osd.0  try  ,   # ceph osd in osd.0   , once the osd is IN , try to 
>> bring up osd.0 services up 
>> 
>> 
>> Finally your both the OSD should be  IN  and UP , so that your cluster can 
>> store data.
>> 
>> Regards
>> Karan
>> 
>> 
>> On 16 Feb 2014, at 20:06, Pavel V. Kaygorodov  wrote:
>> 
>>> Hi, All!
>>> 
>>> I am trying to setup ceph from scratch, without dedicated drive, with one 
>>> mon and one osd.
>>> After all, I see following output of ceph osd tree:
>>> 
>>> # idweight  type name   up/down reweight
>>> -1  1   root default
>>> -2  1   host host1
>>> 0   1   osd.0   down0
>>> 
>>> ceph -w:
>>> 
>>>cluster e90dfd37-98d1-45bb-a847-8590a5ed8e71
>>> health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
>>> monmap e1: 1 mons at {host1=172.17.0.4:6789/0}, election epoch 1, 
>>> quorum 0 host1
>>> osdmap e5: 1 osds: 0 up, 0 in
>>>  pgmap v6: 192 pgs, 3 pools, 0 bytes data, 0 objects
>>>0 kB used, 0 kB / 0 kB avail
>>> 192 creating
>>> 
>>> 2014-02-16 13:27:30.095938 mon.0 [INF] osdmap e5: 1 osds: 0 up, 0 in
>>> 
>>> What can be wrong?
>>> I see working daemons, and nothing bad in log files.
>>> 
>>> 
>>> 
>>> How to reproduce:
>>> I have cloned and compiled sources on debian/jessie:
>>> 
>>> git clone --recursive -b v0.75 https://github.com/ceph/ceph.git
>>> cd /ceph/ && ./autogen.sh && ./configure && make && make install
>>> 
>>> Everything seems ok.
>>> 
>>> I have created ceph.conf:
>>> 
>>> [global]
>>> 
>>> fsid = e90dfd37-98d1-45bb-a847-8590a5ed8e71
>>> mon initial members = host1
>>> 
>>> auth cluster required = cephx
>>> auth service required = cephx
>>> auth client required = cephx
>>> 
>>> keyring = /data/ceph.client.admin.keyring
>>> 
>>> osd pool default size = 1
>>> osd pool default min size = 1
>>> osd pool default pg num = 333
>>> osd pool default pgp num = 333
>>> osd crush chooseleaf type = 0   
>>> osd journal size = 1000
>>> 
>>> filestore xattr use omap = true
>>> 
>>> ;journal dio = false
>>> ;journal aio = false
>>> 
>>> mon addr = ceph.dkctl
>>> mon host = ceph.dkctl
>>> 
>>> log file = /data/logs/ceph.log
>>> 
>>> [mon]
>>> mon data = /data/mon0
>>> keyring = /data/ceph.mon.keyring
>>> log file = /data/logs/mon0.log
>>> 
>>> [osd.0]
>>> osd host= host1
>>> osd data= /data/osd0
>>> osd journal = /data/osd0.journal
>>> log file= /data/logs/osd0.log
>>> keyring = /data/ceph.osd0.keyring
>>> 
>>> ///
>>> 
>>> I have initialized mon and osd using following script:
>>> 
>>> /usr/local/bin/ceph-authtool --create-keyring /data/ceph.mon.keyring 
>>> --gen-key -n mon. --cap mon 'allow *'
>>> /usr/local/bin/ceph-authtool --create-keyring 
>>> /data/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap 
>>> mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
>>> /usr/local/bin/ceph-authtool /data/ceph.mon.keyring --import-keyring 
>>> /data/ceph.client.admin.keyring 
>>> /usr/local/bin/monmaptool --create --add host1 `grep ceph /etc/hosts | awk 
>>> '{print $1}'` --fsid de90dfd37-98d1-45bb-a847-8590a5ed8e71 /data/monmap
>

Re: [ceph-users] Block Devices and OpenStack

2014-02-15 Thread Jean-Charles LOPEZ
Hi,

what do you get when you run a 'ceph auth list' command for the user name 
(client.cinder) you created for cinder? Are the caps and the key for this user 
correct? No typo in the hostname in the cinder.conf file (host=)? Did you copy 
the keyring to the node running cinder (can’t really say from your output, and 
there is no 'ceph -s' output to check the monitor names)?

It could just be a typo in the ceph auth get-or-create command that’s causing 
it.

Rgds
JC
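
For comparison, a cinder client is typically created with caps along these lines (the
pool names volumes, vms and images follow the rbd-openstack document the original
poster used and may differ on your cluster):

# ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images'
# ceph auth get client.cinder
# ceph auth get-or-create client.cinder | ssh <openstack-node> sudo tee /etc/ceph/ceph.client.cinder.keyring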



On Feb 15, 2014, at 10:35, Ashish Chandra  wrote:

> Hi Cephers,
> 
> I am trying to configure ceph rbd as backend for cinder and glance by 
> following the steps mentioned in:
> 
> http://ceph.com/docs/master/rbd/rbd-openstack/
> 
> Before I start all openstack services are running normally and ceph cluster 
> health shows "HEALTH_OK"
> 
> But once I am done with all steps and restart openstack services, 
> cinder-volume fails to start and throws an error.
> 
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd Traceback (most 
> recent call last):
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File 
> "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 262, in 
> check_for_setup_error
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd with 
> RADOSClient(self):
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File 
> "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 234, in __init__
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd self.cluster, 
> self.ioctx = driver._connect_to_rados(pool)
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File 
> "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 282, in 
> _connect_to_rados
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd client.connect()
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File 
> "/usr/lib/python2.7/dist-packages/rados.py", line 185, in connect
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd raise 
> make_ex(ret, "error calling connect")
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd Error: error calling 
> connect: error code 95
> 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd
> 2014-02-16 00:01:42.591 ERROR cinder.volume.manager 
> [req-8134a4d7-53f8-4ada-b4b5-4d96d7cad4bc None None] Error encountered during 
> initialization of driver: RBDDriver
> 2014-02-16 00:01:42.592 ERROR cinder.volume.manager 
> [req-8134a4d7-53f8-4ada-b4b5-4d96d7cad4bc None None] Bad or unexpected 
> response from the storage volume backend API: error connecting to ceph cluster
> 2014-02-16 00:01:42.592 TRACE cinder.volume.manager Traceback (most recent 
> call last):
> 2014-02-16 00:01:42.592 TRACE cinder.volume.manager   File 
> "/opt/stack/cinder/cinder/volume/manager.py", line 190, in init_host
> 2014-02-16 00:01:42.592 TRACE cinder.volume.manager 
> self.driver.check_for_setup_error()
> 2014-02-16 00:01:42.592 TRACE cinder.volume.manager   File 
> "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 267, in 
> check_for_setup_error
> 2014-02-16 00:01:42.592 TRACE cinder.volume.manager raise 
> exception.VolumeBackendAPIException(data=msg)
> 2014-02-16 00:01:42.592 TRACE cinder.volume.manager 
> VolumeBackendAPIException: Bad or unexpected response from the storage volume 
> backend API: error connecting to ceph cluster
> 
> 
> Here is the content of my /etc/ceph in openstack node: 
> 
> ashish@ubuntu:/etc/ceph$ ls -lrt
> total 16
> -rw-r--r-- 1 cinder cinder 229 Feb 15 23:45 ceph.conf
> -rw-r--r-- 1 glance glance  65 Feb 15 23:46 ceph.client.glance.keyring
> -rw-r--r-- 1 cinder cinder  65 Feb 15 23:47 ceph.client.cinder.keyring
> -rw-r--r-- 1 cinder cinder  72 Feb 15 23:47 ceph.client.cinder-backup.keyring
> 
> I am really stuck and tried a lot. What Could possibly I be doing wrong.
> 
> 
> HELP.
> 
> 
> Thanks and Regards
> Ashish Chandra
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG not getting clean

2014-02-14 Thread Jean-Charles LOPEZ
Hi Karan,

have you tried first to identify which PGs are in this status: ceph pg dump | 
grep peering (this will match the peering, down+peering and remapped+peering states)?

This might point you to a specific OSD for all of them, or to a few specific OSDs. 
If that’s the case, just restart the OSDs holding those PGs, one after the other, 
depending on how many OSDs are involved.

One other question: Are these real nodes or VMs you are using for a test? 
Because I sometimes had this behavior after hibernating a VM and restarting it.

JC
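
A few commands that can help narrow this down (the pgid and osd id are placeholders;
use the values the dump reports):

# ceph pg dump_stuck inactive
# ceph pg dump_stuck unclean
# ceph pg dump | grep peering
# ceph pg <pgid> query                      (shows which OSDs the PG is waiting on)
# /etc/init.d/ceph restart osd.<id>         (restart an OSD the stuck PGs point to)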



On Feb 14, 2014, at 08:58, Karan Singh  wrote:

> Hello Cephers
> 
> I am struggling with my ceph cluster health  ,  PGS are not getting clean , i 
> waited for recovery process to get end was hoping after recovery PG will 
> become clean , but it didn’t. Can you please share your suggestions.
> 
>cluster 0ff473d9-0670-42a3-89ff-81bbfb2e676a
> health HEALTH_WARN 119 pgs down; 303 pgs peering; 303 pgs stuck inactive; 
> 303 pgs stuck unclean; mds cluster is degraded; crush map has no
> n-optimal tunables
> monmap e3: 3 mons at 
> {ceph-mon1=192.168.1.38:6789/0,ceph-mon2=192.168.1.33:6789/0,ceph-mon3=192.168.1.31:6789/0},
>  election epoch 4226, quo
> rum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
> mdsmap e8465: 1/1/1 up {0=ceph-mon1=up:replay}
> osdmap e250466: 10 osds: 10 up, 10 in
>  pgmap v585809: 576 pgs, 6 pools, 101933 MB data, 25453 objects
>343 GB used, 5423 GB / 5767 GB avail
> 273 active+clean
> 108 peering
> 119 down+peering
>  76 remapped+peering
> 
> 
> # id  weight  type name   up/down reweight
> -15.65root default
> -20   host ceph-node1
> -31.72host ceph-node2
> 4 0.43osd.4   up  1
> 5 0.43osd.5   up  1
> 6 0.43osd.6   up  1
> 7 0.43osd.7   up  1
> -41.31host ceph-node4
> 8 0.88osd.8   up  1
> 1 0.43osd.1   up  1
> -51.31host ceph-node5
> 9 0.88osd.9   up  1
> 2 0.43osd.2   up  1
> -60.88host ceph-node6
> 100.88osd.10  up  1
> -70.43host ceph-node3
> 0 0.43osd.0   up  1
> 
> 
> 
> Regards
> karan
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Use of daily-created/-deleted pools

2014-02-11 Thread Jean-Charles Lopez
Hi Lee

You could use a Ceph RBD device on a server and export a directory that
you would have created on this RBD through NFS.

3 days after the files are uploaded, you could snapshot the RBD device,
delete the directory containing the files, and a week later, once you are sure
you do not need the snapshot for a restore, remove the snapshot.

Hope this can help
Rgds
JC
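
A rough sketch of that workflow; pool, image, size, date and mount point are all
placeholders:

# rbd create nfs-pool/daily-share --size 102400
# rbd map nfs-pool/daily-share
# mkfs.xfs /dev/rbd/nfs-pool/daily-share
# mount /dev/rbd/nfs-pool/daily-share /export/daily
  (export /export/daily over NFS and let the clients upload into dated directories)
# rbd snap create nfs-pool/daily-share@2014-02-12      (3 days after the upload day)
# rm -rf /export/daily/2014-02-12
# rbd snap rm nfs-pool/daily-share@2014-02-12          (a week later, once a restore is no longer needed)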


On Tuesday, February 11, 2014, Hyangtack Lee  wrote:

> I'm new to Ceph, and looking for a new storage to replace legacy system.
>
> My system has a lot of files accessing temporarily for 2 or 3 days.
> Those files are uploaded from many clients everyday, and batch job deletes
> unused files everyday.
>
> In this case, can I use Ceph's pool to store daily uploaded files?
> Scenario is like below:
> 1. create daily pool, e.g. pool-2014-02-12
> 2. store files to the pool
> 3. After 3 days, remove the pool created at step 1.
>
> Is it possible? Is there anyone trying like this?
> Or, can you recommend a good way to delete a group of files(i.e. a
> directory on posix fs) on Ceph?
>
> According to http://ceph.com/docs/master/cephfs/, Ceph FS is currently
> not recommended for production data. So I exclude Ceph FS from my list and
> focus on Ceph Storage Cluster accessing by librados.
>
> Thanks in advance.
>


-- 
Sent while moving
Pardon my French and any spelling &| grammar glitches
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com