Re: [ceph-users] CephFS in the wild
Hello,

On Fri, 3 Jun 2016 15:43:11 +0100 David wrote:

> I'm hoping to implement cephfs in production at some point this year so
> I'd be interested to hear your progress on this.
>
> Have you considered SSD for your metadata pool? You wouldn't need loads
> of capacity although even with reliable SSD I'd probably still do x3
> replication for metadata. I've been looking at the intel s3610's for
> this.
>
That's an interesting and potentially quite beneficial thought, but it
depends on a number of things (more below).

I'm using S3610s (800GB) for a cache pool with 2x replication and am quite
happy with that, but then again I have a very predictable usage pattern and
am monitoring those SSDs religiously and I'm sure they will outlive things
by a huge margin.

We didn't go for 3x replication due to (in order):
a) cost
b) rack space
c) increased performance with 2x

Now for how useful/helpful a fast meta-data pool would be, I reckon it
depends on a number of things:

a) Is the cluster write or read heavy?
b) Do reads, flocks, anything that is not directly considered a read cause
   writes to the meta-data pool?
c) Anything else that might cause write storms to the meta-data pool, like
   the bit in the current NFS over CephFS thread with sync?

A quick glance at my test cluster seems to indicate that CephFS meta data
per filesystem object is about 2KB, somebody with actual clues please
confirm this.

Brady has large amounts of NVMe space left over in his current design,
assuming 10GB journals about 2.8TB of raw space.
So if running the (verified) numbers indicates that the meta data can fit
in this space, I'd put it there.

Otherwise larger SSDs (indeed S3610s) for OS and meta-data pool storage may
be the way forward.

Regards,

Christian
>
>
> On Wed, Jun 1, 2016 at 9:50 PM, Brady Deetz wrote:
>
> > Question:
> > I'm curious if there is anybody else out there running CephFS at the
> > scale I'm planning for. I'd like to know some of the issues you didn't
> > expect that I should be looking out for. I'd also like to simply see
> > when CephFS hasn't worked out and why. Basically, give me your war
> > stories.
> >
> >
> > Problem Details:
> > Now that I'm out of my design phase and finished testing on VMs, I'm
> > ready to drop $100k on a pilot. I'd like to get some sense of
> > confidence from the community that this is going to work before I pull
> > the trigger.
> >
> > I'm planning to replace my 110 disk 300TB (usable) Oracle ZFS 7320 with
> > CephFS by this time next year (hopefully by December). My workload is
> > a mix of small and very large files (100GB+ in size). We do fMRI
> > analysis on DICOM image sets as well as other physio data collected
> > from subjects. We also have plenty of spreadsheets, scripts, etc.
> > Currently 90% of our analysis is I/O bound and generally sequential.
> >
> > In deploying Ceph, I am hoping to see more throughput than the 7320 can
> > currently provide. I'm also looking to get away from traditional
> > file-systems that require forklift upgrades. That's where Ceph really
> > shines for us.
> >
> > I don't have a total file count, but I do know that we have about 500k
> > directories.
> >
> >
> > Planned Architecture:
> >
> > Storage Interconnect:
> > Brocade VDX 6940 (40 gig)
> >
> > Access Switches for clients (servers):
> > Brocade VDX 6740 (10 gig)
> >
> > Access Switches for clients (workstations):
> > Brocade ICX 7450
> >
> > 3x MON:
> > 128GB RAM
> > 2x 200GB SSD for OS
> > 2x 400GB P3700 for LevelDB
> > 2x E5-2660v4
> > 1x Dual Port 40Gb Ethernet
> >
> > 2x MDS:
> > 128GB RAM
> > 2x 200GB SSD for OS
> > 2x 400GB P3700 for LevelDB (is this necessary?)
> > 2x E5-2660v4
> > 1x Dual Port 40Gb Ethernet
> >
> > 8x OSD:
> > 128GB RAM
> > 2x 200GB SSD for OS
> > 2x 400GB P3700 for Journals
> > 24x 6TB Enterprise SATA
> > 2x E5-2660v4
> > 1x Dual Port 40Gb Ethernet
> >

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
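
Editorial aside on "running the numbers": a minimal sketch of the sizing check Christian describes, assuming his unconfirmed figure of roughly 2KB of metadata per filesystem object and the 2.8TB of spare raw NVMe he estimates. The object count and the 3x replication are hypothetical inputs chosen for illustration, and the function names are made up for this note; none of it comes from measurements on Brady's system.

```python
# Back-of-the-envelope sizing for a CephFS metadata pool.
# Assumptions (not measured): ~2 KB of metadata per filesystem object, as
# guessed in the thread, and a hypothetical object count and replica count.

def metadata_pool_raw_bytes(num_objects, bytes_per_object=2 * 1024, replication=3):
    """Raw capacity consumed by metadata for num_objects files and directories."""
    return num_objects * bytes_per_object * replication


def fits(num_objects, raw_capacity_bytes, **kwargs):
    """True if the estimated metadata footprint fits in the given raw capacity."""
    return metadata_pool_raw_bytes(num_objects, **kwargs) <= raw_capacity_bytes


if __name__ == "__main__":
    TB = 1000 ** 4
    spare_nvme = int(2.8 * TB)   # spare raw NVMe space estimated in the thread
    objects = 100_000_000        # hypothetical: 100 million files and directories
    need = metadata_pool_raw_bytes(objects)
    print(f"estimated raw metadata footprint: {need / TB:.2f} TB")
    print("fits in spare NVMe:", fits(objects, spare_nvme))
```

The estimate is only as good as the per-object figure, which Christian explicitly asks someone to confirm, so a real decision would want a measured value from a populated test filesystem.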
Re: [ceph-users] CephFS in the wild
On Wed, Jun 1, 2016 at 1:50 PM, Brady Deetz wrote:
> Question:
> I'm curious if there is anybody else out there running CephFS at the scale
> I'm planning for. I'd like to know some of the issues you didn't expect that
> I should be looking out for. I'd also like to simply see when CephFS hasn't
> worked out and why. Basically, give me your war stories.
>
>
> Problem Details:
> Now that I'm out of my design phase and finished testing on VMs, I'm ready
> to drop $100k on a pilot. I'd like to get some sense of confidence from the
> community that this is going to work before I pull the trigger.
>
> I'm planning to replace my 110 disk 300TB (usable) Oracle ZFS 7320 with
> CephFS by this time next year (hopefully by December). My workload is a mix
> of small and very large files (100GB+ in size). We do fMRI analysis on DICOM
> image sets as well as other physio data collected from subjects. We also
> have plenty of spreadsheets, scripts, etc. Currently 90% of our analysis is
> I/O bound and generally sequential.
>
> In deploying Ceph, I am hoping to see more throughput than the 7320 can
> currently provide. I'm also looking to get away from traditional
> file-systems that require forklift upgrades. That's where Ceph really shines
> for us.
>
> I don't have a total file count, but I do know that we have about 500k
> directories.
>
>
> Planned Architecture:
>
> Storage Interconnect:
> Brocade VDX 6940 (40 gig)
>
> Access Switches for clients (servers):
> Brocade VDX 6740 (10 gig)
>
> Access Switches for clients (workstations):
> Brocade ICX 7450
>
> 3x MON:
> 128GB RAM
> 2x 200GB SSD for OS
> 2x 400GB P3700 for LevelDB
> 2x E5-2660v4
> 1x Dual Port 40Gb Ethernet
>
> 2x MDS:
> 128GB RAM
> 2x 200GB SSD for OS
> 2x 400GB P3700 for LevelDB (is this necessary?)
> 2x E5-2660v4
> 1x Dual Port 40Gb Ethernet

The MDS doesn't use any local storage, other than for storing its
ceph.conf and keyring.

>
> 8x OSD:
> 128GB RAM
> 2x 200GB SSD for OS
> 2x 400GB P3700 for Journals
> 24x 6TB Enterprise SATA
> 2x E5-2660v4
> 1x Dual Port 40Gb Ethernet

I don't know what kind of throughput you're currently seeing on your
ZFS system. Unfortunately most of the big CephFS users are pretty
quiet on the lists :( although they sometimes come out to play at
events like https://www.msi.umn.edu/sc15Ceph. :)

You'll definitely want to do some tuning. Right now we default to 100k
inodes in the metadata cache for instance, which fits in <1GB of RAM.
You'll want to bump that way, way up.

Also keep in mind that CephFS' performance characteristics are just
weirdly different to NAS boxes or ZFS in ways you might not be ready
for. So large streaming writes will do great, but if you have shared
RW files or directories, that might be much faster in some places and
much slower in ones you didn't think about. Large streaming reads and
writes will go as quickly as RADOS can drive them (80-100MB/s per OSD
for reads is generally a good estimate, I think? And divide that by
replication factor for writes); with smaller ops you start running
into latency issues and the fact that CephFS (since it's sending RADOS
writes to separate objects) can't coalesce writes as much as local
FSes (or boxes built on them).
-Greg
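
Editorial aside: Greg's rule of thumb (80-100MB/s per OSD for streaming reads, divided by the replication factor for writes) can be turned into a quick ceiling estimate. This is a sketch only; it assumes one OSD per data disk (8 hosts x 24 disks = 192 OSDs) and 3x replication, neither of which is stated in the thread, and it ignores network, journal, CPU and client limits, so the result is an optimistic upper bound rather than a prediction.

```python
# Rough aggregate streaming throughput from the per-OSD rule of thumb.
# Assumptions (hypothetical): one OSD per data disk and 3x replication.

def streaming_estimate_mb_s(num_osds, per_osd_mb_s, replication):
    """Return (aggregate_read_MBps, aggregate_write_MBps) as an upper bound."""
    reads = num_osds * per_osd_mb_s
    writes = reads / replication   # every client write lands on `replication` OSDs
    return reads, writes


if __name__ == "__main__":
    osds = 8 * 24                  # 8 OSD hosts with 24 SATA disks each
    for per_osd in (80, 100):      # low and high end of the quoted range
        r, w = streaming_estimate_mb_s(osds, per_osd, replication=3)
        print(f"{per_osd} MB/s per OSD -> reads <= {r} MB/s, writes <= {w:.0f} MB/s")
```

In practice the 40Gb interconnect and the journal devices are likely to cap throughput well before the disks do, which is one reason to treat these numbers as a ceiling.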
Re: [ceph-users] ceph-fuse performance about hammer and jewel
Yan, Zheng:

Thanks for your reply.
But after changing to jewel, the application reads and writes the disk
slowly, which confirms the iops measured with fio.
Are there any other possibilities?

On 16/6/1 21:39, Yan, Zheng wrote:
> On Wed, Jun 1, 2016 at 6:52 PM, qisy wrote:
>> my test fio
>>
>> fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite -size=1G
>> -filename=test.iso -name="CEPH 4KB randwrite test" -iodepth=32 -runtime=60
>>
> You were testing direct-IO performance. Hammer does not handle
> direct-IO correctly, data are cached in ceph-fuse.
>
> Regards
> Yan, Zheng
>
>> On 16/6/1 15:22, Yan, Zheng wrote:
>>> On Mon, May 30, 2016 at 10:22 PM, qisy wrote:
>>>> Hi,
>>>> After jewel was released with a production-ready fs, I upgraded the
>>>> old hammer cluster, but iops dropped a lot.
>>>>
>>>> I made a test with 3 nodes, each one with 8c 16G 1osd; the osd device
>>>> got 15000 iops.
>>>>
>>>> I found the ceph-fuse client has better performance on hammer than
>>>> jewel.
>>>>
>>>> fio randwrite 4K
>>>> |               | jewel server | hammer server |
>>>> | jewel client  | 480+ iops    | no test       |
>>>> | hammer client | 6000+ iops   | 6000+ iops    |
>>>
>>> please post the fio config file.
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>>> ceph-fuse(jewel) mounted against a jewel server got pitiful iops; are
>>>> there any special options that need to be set?
>>>> If I continue to use ceph-fuse(hammer) with a jewel server, will that
>>>> cause any problems?
>>>>
>>>> thanks
>>>>
>>>> my ceph.conf below:
>>>>
>>>> [global]
>>>> fsid = xxx
>>>> mon_initial_members = xxx, xxx, xxx
>>>> mon_host = 10.0.0.1,10.0.0.2,10.0.0.3
>>>> auth_cluster_required = cephx
>>>> auth_service_required = cephx
>>>> auth_client_required = cephx
>>>> filestore_xattr_use_omap = true
>>>> osd_pool_default_size = 2
>>>> osd_pool_default_min_size = 1
>>>> mon_data_avail_warn = 15
>>>> mon_data_avail_crit = 5
>>>> mon_clock_drift_allowed = 0.6
>>>>
>>>> [osd]
>>>> osd_disk_threads = 8
>>>> osd_op_threads = 8
>>>> journal_block_align = true
>>>> journal_dio = true
>>>> journal_aio = true
>>>> journal_force_aio = true
>>>> filestore_journal_writeahead = true
>>>> filestore_max_sync_interval = 15
>>>> filestore_min_sync_interval = 10
>>>> filestore_queue_max_ops = 25000
>>>> filestore_queue_committing_max_ops = 5000
>>>> filestore_op_threads = 32
>>>> osd_journal_size = 2
>>>> osd_map_cache_size = 1024
>>>> osd_max_write_size = 512
>>>> osd_scrub_load_threshold = 1
>>>> osd_heartbeat_grace = 30
>>>>
>>>> [mds]
>>>> mds_session_timeout = 120
>>>> mds_session_autoclose = 600
Re: [ceph-users] CephFS: slow writes over NFS when fs is mounted with kernel driver but fast with Fuse
On Fri, Jun 3, 2016 at 10:43 PM, Jan Schermer wrote:
> I'd be worried about it getting "fast" all of a sudden. Test crash
> consistency.
> If you test something like file creation you should be able to estimate if
> it should be that fast. (So it should be some fraction of theoretical IOPS
> on the drives/backing rbd device...)

Sudden "fast" is because MDS flushes its journal more frequently.
There is no risk of metadata/data loss.

Yan, Zheng

>
> If it's too fast then maybe the "sync" isn't working properly...
>
> Jan
>
> On 03 Jun 2016, at 16:26, David wrote:
>
> Zheng, thanks for looking into this, it makes sense although strangely I've
> set up a new nfs server (different hardware, same OS, Kernel etc.) and I'm
> unable to recreate the issue. I'm no longer getting the delay, the nfs
> export is still using sync. I'm now comparing the servers to see what's
> different on the original server. Apologies if I've wasted your time on
> this!
>
> Jan, I did some more testing with Fuse on the original server and I was
> seeing the same issue, yes I was testing from the nfs client. As above I
> think there was something weird with that original server. Noted on sync vs
> async, I plan on sticking with sync.
>
> On Fri, Jun 3, 2016 at 5:03 AM, Yan, Zheng wrote:
>>
>> On Mon, May 30, 2016 at 10:29 PM, David wrote:
>> > Hi All
>> >
>> > I'm having an issue with slow writes over NFS (v3) when cephfs is
>> > mounted with the kernel driver. Writing a single 4K file from the NFS
>> > client is taking 3 - 4 seconds, however a 4K write (with sync) into the
>> > same folder on the server is fast as you would expect. When mounted
>> > with ceph-fuse, I don't get this issue on the NFS client.
>> >
>> > Test environment is a small cluster with a single MON and single MDS,
>> > all running 10.2.1, CephFS metadata is an ssd pool, data is on
>> > spinners. The NFS server is CentOS 7, I've tested with the current
>> > shipped kernel (3.10), ELrepo 4.4 and ELrepo 4.6.
>> >
>> > More info:
>> >
>> > With the kernel driver, I mount the filesystem with "-o
>> > name=admin,secret"
>> >
>> > I've exported a folder with the following options:
>> >
>> > *(rw,root_squash,sync,wdelay,no_subtree_check,fsid=1244,sec=1)
>> >
>> > I then mount the folder on a CentOS 6 client with the following options
>> > (all default):
>> >
>> > rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.3.231,mountvers=3,mountport=597,mountproto=udp,local_lock=none
>> >
>> > A small 4k write is taking 3 - 4 secs:
>> >
>> > # time dd if=/dev/zero of=testfile bs=4k count=1
>> > 1+0 records in
>> > 1+0 records out
>> > 4096 bytes (4.1 kB) copied, 3.59678 s, 1.1 kB/s
>> >
>> > real    0m3.624s
>> > user    0m0.000s
>> > sys     0m0.001s
>> >
>> > But a sync write on the server directly into the same folder is fast
>> > (this is with the kernel driver):
>> >
>> > # time dd if=/dev/zero of=testfile2 bs=4k count=1 conv=fdatasync
>> > 1+0 records in
>> > 1+0 records out
>> > 4096 bytes (4.1 kB) copied, 0.0121925 s, 336 kB/s
>>
>>
>> Your nfs export has sync option. 'dd if=/dev/zero of=testfile bs=4k
>> count=1' on nfs client is equivalent to 'dd if=/dev/zero of=testfile
>> bs=4k count=1 conv=fsync' on cephfs. The reason that sync metadata
>> operation takes 3~4 seconds is that the MDS flushes its journal every
>> 5 seconds. Adding async option to nfs export can avoid this delay.
>>
>> >
>> > real    0m0.015s
>> > user    0m0.000s
>> > sys     0m0.002s
>> >
>> > If I mount cephfs with Fuse instead of the kernel, the NFS client write
>> > is fast:
>> >
>> > dd if=/dev/zero of=fuse01 bs=4k count=1
>> > 1+0 records in
>> > 1+0 records out
>> > 4096 bytes (4.1 kB) copied, 0.026078 s, 157 kB/s
>> >
>>
>> In this case, ceph-fuse sends an extra request (getattr request on
>> directory) to MDS. The request causes MDS to flush its journal.
>> Whether or not client sends the extra request depends on what
>> capabilities it has. What capabilities client has, in turn, depend on
>> how many clients are accessing the directory. In my test, nfs on
>> ceph-fuse is not always fast.
>>
>> Yan, Zheng
>>
>>
>> > Does anyone know what's going on here?
>>
>>
>>
>> >
>> > Thanks
>> >
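
Editorial aside on the 3-4 second figure: a minimal sketch of why a periodic 5-second journal flush produces multi-second sync latencies. It assumes a deliberately simplified model in which a sync metadata operation simply waits for the next periodic flush and arrival times are spread evenly across the interval; the function names are made up for this note and the model is not a description of the MDS internals.

```python
# Toy model: a sync request completes at the next periodic journal flush.
# The 5 second interval comes from the thread; everything else is assumed.

def wait_for_next_flush(arrival_s, flush_interval_s=5.0):
    """Seconds until the next periodic flush for a request arriving at arrival_s."""
    remainder = arrival_s % flush_interval_s
    return 0.0 if remainder == 0 else flush_interval_s - remainder


def mean_wait(flush_interval_s=5.0, samples=1000):
    """Average wait for arrivals spread evenly across one flush interval."""
    arrivals = [(i + 0.5) * flush_interval_s / samples for i in range(samples)]
    return sum(wait_for_next_flush(a, flush_interval_s) for a in arrivals) / samples


if __name__ == "__main__":
    print("request 1.4 s after a flush waits", wait_for_next_flush(1.4), "s")
    print("mean wait over one interval:", mean_wait(), "s")
```

Under this model any single write waits somewhere between 0 and 5 seconds, which is consistent with the 3.6 s that David measured, and anything that makes the MDS flush earlier (such as the extra getattr from ceph-fuse) shortens the wait.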
Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
Thanks Jason.

I don’t have anything specified explicitly for osd class dir. I suspect it might be related to the OSDs being restarted during the package upgrade process before all libraries are upgraded.


> -Original Message-
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: Monday, 6 June 2016 12:37 PM
> To: Adrian Saul
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
>
> Odd -- sounds like you might have Jewel and Infernalis class objects and
> OSDs intermixed. I would double-check your installation and see if your
> configuration has any overload for "osd class dir".
>
> On Sun, Jun 5, 2016 at 10:28 PM, Adrian Saul
> wrote:
> >
> > I have traced it back to an OSD giving this error:
> >
> > 2016-06-06 12:18:14.315573 7fd714679700 -1 osd.20 23623 class rbd open got (5) Input/output error
> > 2016-06-06 12:19:49.835227 7fd714679700 0 _load_class could not open class /usr/lib64/rados-classes/libcls_rbd.so (dlopen failed): /usr/lib64/rados-classes/libcls_rbd.so: undefined symbol: _ZN4ceph6buffer4list8iteratorC1EPS1_j
> >
> > Trying to figure out why that is the case.
Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
I couldn't find anything wrong with the packages and everything seemed installed ok.

Once I restarted the OSDs the directory issue went away but the error started moving to other rbd output, and the same class open error occurred on other OSDs. I have gone through and bounced all the OSDs and that seems to have cleared the issue.

I am guessing that perhaps the restart of the OSDs during the package upgrade is occurring before all library packages are upgraded and so they are starting with the wrong versions loaded, so when these class libraries are dynamically opened later they are failing.


> -Original Message-
> From: Adrian Saul
> Sent: Monday, 6 June 2016 12:29 PM
> To: Adrian Saul; dilla...@redhat.com
> Cc: ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] Jewel upgrade - rbd errors after upgrade
>
>
> I have traced it back to an OSD giving this error:
>
> 2016-06-06 12:18:14.315573 7fd714679700 -1 osd.20 23623 class rbd open got (5) Input/output error
> 2016-06-06 12:19:49.835227 7fd714679700 0 _load_class could not open class /usr/lib64/rados-classes/libcls_rbd.so (dlopen failed): /usr/lib64/rados-classes/libcls_rbd.so: undefined symbol: _ZN4ceph6buffer4list8iteratorC1EPS1_j
>
> Trying to figure out why that is the case.
Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
Odd -- sounds like you might have Jewel and Infernalis class objects and OSDs intermixed. I would double-check your installation and see if your configuration has any overload for "osd class dir". On Sun, Jun 5, 2016 at 10:28 PM, Adrian Saul wrote: > > I have traced it back to an OSD giving this error: > > 2016-06-06 12:18:14.315573 7fd714679700 -1 osd.20 23623 class rbd open got > (5) Input/output error > 2016-06-06 12:19:49.835227 7fd714679700 0 _load_class could not open class > /usr/lib64/rados-classes/libcls_rbd.so (dlopen failed): > /usr/lib64/rados-classes/libcls_rbd.so: undefined symbol: > _ZN4ceph6buffer4list8iteratorC1EPS1_j > > Trying to figure out why that is the case. > > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Adrian Saul >> Sent: Monday, 6 June 2016 11:11 AM >> To: dilla...@redhat.com >> Cc: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade >> >> >> No - it throws a usage error - if I add a file argument after it works: >> >> [root@ceph-glb-fec-02 ceph]# rados -p glebe-sata get rbd_id.hypervtst- >> lun04 /tmp/crap >> [root@ceph-glb-fec-02 ceph]# cat /tmp/crap 109eb01f5f89de >> >> stat works: >> >> [root@ceph-glb-fec-02 ceph]# rados -p glebe-sata stat rbd_id.hypervtst- >> lun04 >> glebe-sata/rbd_id.hypervtst-lun04 mtime 2016-06-06 10:55:08.00, size 18 >> >> >> I can do a rados ls: >> >> [root@ceph-glb-fec-02 ceph]# rados ls -p glebe-sata|grep rbd_id >> rbd_id.cloud2sql-lun01 >> rbd_id.glbcluster3-vm17 >> rbd_id.holder <<< a create that said it failed while I was debugging this >> rbd_id.pvtcloud-nfs01 >> rbd_id.hypervtst-lun05 >> rbd_id.test02 >> rbd_id.cloud2sql-lun02 >> rbd_id.fiotest2 >> rbd_id.radmast02-lun04 >> rbd_id.hypervtst-lun04 >> rbd_id.cloud2fs-lun00 >> rbd_id.radmast02-lun03 >> rbd_id.hypervtst-lun00 >> rbd_id.cloud2sql-lun00 >> rbd_id.radmast02-lun02 >> >> >> > -Original Message- >> > From: Jason Dillaman 
[mailto:jdill...@redhat.com] >> > Sent: Monday, 6 June 2016 11:00 AM >> > To: Adrian Saul >> > Cc: ceph-users@lists.ceph.com >> > Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade >> > >> > Are you able to successfully run the following command successfully? >> > >> > rados -p glebe-sata get rbd_id.hypervtst-lun04 >> > >> > >> > >> > On Sun, Jun 5, 2016 at 8:49 PM, Adrian Saul >> > wrote: >> > > >> > > I upgraded my Infernalis semi-production cluster to Jewel on Friday. >> > > While >> > the upgrade went through smoothly (aside from a time wasting >> > restorecon /var/lib/ceph in the selinux package upgrade) and the >> > services continued running without interruption. However this morning >> > when I went to create some new RBD images I am unable to do much at all >> with RBD. >> > > >> > > Just about any rbd command fails with an I/O error. I can run >> > showmapped but that is about it - anything like an ls, info or status >> > fails. This applies to all my pools. >> > > >> > > I can see no errors in any log files that appear to suggest an >> > > issue. I have >> > also tried the commands on other cluster members that have not done >> > anything with RBD before (I was wondering if perhaps the kernel rbd >> > was pinning the old library version open or something) but the same error >> occurs. >> > > >> > > Where can I start trying to resolve this? 
>> > > >> > > Cheers, >> > > Adrian >> > > >> > > >> > > [root@ceph-glb-fec-01 ceph]# rbd ls glebe-sata >> > > rbd: list: (5) Input/output error >> > > 2016-06-06 10:41:31.792720 7f53c06a2d80 -1 librbd: error listing >> > > image in directory: (5) Input/output error >> > > 2016-06-06 10:41:31.792749 7f53c06a2d80 -1 librbd: error listing v2 >> > > images: (5) Input/output error >> > > >> > > [root@ceph-glb-fec-01 ceph]# rbd ls glebe-ssd >> > > rbd: list: (5) Input/output error >> > > 2016-06-06 10:41:33.956648 7f90de663d80 -1 librbd: error listing >> > > image in directory: (5) Input/output error >> > > 2016-06-06 10:41:33.956672 7f90de663d80 -1 librbd: error listing v2 >> > > images: (5) Input/output error >> > > >> > > [root@ceph-glb-fec-02 ~]# rbd showmapped >> > > id pool image snap device >> > > 0 glebe-sata test02-/dev/rbd0 >> > > 1 glebe-ssd zfstest -/dev/rbd1 >> > > 10 glebe-sata hypervtst-lun00 -/dev/rbd10 >> > > 11 glebe-sata hypervtst-lun02 -/dev/rbd11 >> > > 12 glebe-sata hypervtst-lun03 -/dev/rbd12 >> > > 13 glebe-ssd nspprd01_lun00-/dev/rbd13 >> > > 14 glebe-sata cirrux-nfs01 -/dev/rbd14 >> > > 15 glebe-sata hypervtst-lun04 -/dev/rbd15 >> > > 16 glebe-sata hypervtst-lun05 -/dev/rbd16 >> > > 17 glebe-sata pvtcloud-nfs01-/dev/rbd17 >> > > 18 glebe-sata cloud2sql-lun00 -/dev/rbd18 >> > > 19 glebe-sata cloud2sql-lun01 -/dev/rbd19 >> > > 2 glebe-sata radmast02-lun00 -/dev/rbd2 >> > > 20
Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
I have traced it back to an OSD giving this error:

2016-06-06 12:18:14.315573 7fd714679700 -1 osd.20 23623 class rbd open got (5) Input/output error
2016-06-06 12:19:49.835227 7fd714679700  0 _load_class could not open class /usr/lib64/rados-classes/libcls_rbd.so (dlopen failed): /usr/lib64/rados-classes/libcls_rbd.so: undefined symbol: _ZN4ceph6buffer4list8iteratorC1EPS1_j

Trying to figure out why that is the case.
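The dlopen failure reported by the OSD can be pulled out of a log mechanically. The following is only a sketch: the regular expression is inferred from the single "_load_class" line in the message, and the function name is made up for illustration. The mangled symbol should demangle (for example with c++filt) to ceph::buffer::list::iterator::iterator(ceph::buffer::list*, unsigned int).

```python
import re

# Pattern inferred from the single "_load_class" line quoted in the message;
# the exact field layout of other OSD log lines is an assumption.
LOAD_CLASS_RE = re.compile(
    r"_load_class could not open class (?P<path>\S+) "
    r"\(dlopen failed\): \S+: undefined symbol: (?P<symbol>\S+)"
)


def find_missing_symbols(log_text):
    """Return (class library path, undefined symbol) pairs found in log_text."""
    return [(m.group("path"), m.group("symbol"))
            for m in LOAD_CLASS_RE.finditer(log_text)]


sample_log = (
    "2016-06-06 12:18:14.315573 7fd714679700 -1 osd.20 23623 class rbd open "
    "got (5) Input/output error\n"
    "2016-06-06 12:19:49.835227 7fd714679700 0 _load_class could not open "
    "class /usr/lib64/rados-classes/libcls_rbd.so (dlopen failed): "
    "/usr/lib64/rados-classes/libcls_rbd.so: undefined symbol: "
    "_ZN4ceph6buffer4list8iteratorC1EPS1_j\n"
)

for path, symbol in find_missing_symbols(sample_log):
    print(path, symbol)
```

Running this over the logs of every OSD would show whether all of them fail to load the same class library with the same symbol.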
Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
The rbd_directory object is empty -- all data is stored as omap key/value pairs, which you can list via "rados listomapvals rbd_directory".

What is the output when you run "rbd ls --debug-ms=1 glebe-sata" and "rbd info --debug-ms=1 glebe-sata/hypervtst-lun04"? I am interested in the lines that look like the following:

** rbd ls **
2016-06-05 22:22:54.816801 7f25d4e4d1c0 1 -- 127.0.0.1:0/2033136975 --> 127.0.0.1:6800/29402 -- osd_op(client.4111.0:2 0.30a98c1c rbd_directory [call rbd.dir_list] snapc 0=[] ack+read+known_if_redirected e7) v7 -- ?+0 0x5598b0459410 con 0x5598b04580d0
2016-06-05 22:22:54.817396 7f25b8207700 1 -- 127.0.0.1:0/2033136975 <== osd.0 127.0.0.1:6800/29402 2 osd_op_reply(2 rbd_directory [call] v0'0 uv1 ondisk = 0) v7 133+0+27 (2231830616 0 2896097477) 0x7f258c000a20 con 0x5598b04580d0
foo

** rbd info **
2016-06-05 22:25:54.534064 7fab3cff9700 1 -- 127.0.0.1:0/951637948 --> 127.0.0.1:6800/29402 -- osd_op(client.4112.0:2 0.6a181655 rbd_id.foo [call rbd.get_id] snapc 0=[] ack+read+known_if_redirected e7) v7 -- ?+0 0x7fab180020a0 con 0x55e833b5e520
2016-06-05 22:25:54.534434 7fab4c589700 1 -- 127.0.0.1:0/951637948 <== osd.0 127.0.0.1:6800/29402 2 osd_op_reply(2 rbd_id.foo [call] v0'0 uv2 ondisk = 0) v7 130+0+16 (2464064221 0 855464132) 0x7fab24000b40 con 0x55e833b5e520

I suspect you are having issues with executing OSD class methods for some reason (like rbd.dir_list against rbd_directory and rbd.get_id against rbd_id.).
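To pick the interesting lines out of a long --debug-ms=1 trace, something like the sketch below may help. It is an assumption-laden illustration: both patterns are derived only from the sample lines in this message, and the helper name is hypothetical. It reports each OSD class method call and the return code of the matching reply.

```python
import re

# Both patterns are inferred from the sample debug-ms lines in the message;
# other Ceph versions may format these lines differently.
CALL_RE = re.compile(
    r"osd_op\(\S+ \S+ (?P<obj>\S+) \[call (?P<method>[\w.]+)\]")
REPLY_RE = re.compile(
    r"osd_op_reply\(\d+ (?P<obj>\S+) \[call\].*? = (?P<rc>-?\d+)")


def class_call_events(lines):
    """Return ("call", object, method) and ("reply", object, rc) tuples."""
    events = []
    for line in lines:
        call = CALL_RE.search(line)
        if call:
            events.append(("call", call.group("obj"), call.group("method")))
            continue
        reply = REPLY_RE.search(line)
        if reply:
            events.append(("reply", reply.group("obj"), int(reply.group("rc"))))
    return events


sample_lines = [
    "2016-06-05 22:22:54.816801 7f25d4e4d1c0 1 -- 127.0.0.1:0/2033136975 --> "
    "127.0.0.1:6800/29402 -- osd_op(client.4111.0:2 0.30a98c1c rbd_directory "
    "[call rbd.dir_list] snapc 0=[] ack+read+known_if_redirected e7) v7 -- "
    "?+0 0x5598b0459410 con 0x5598b04580d0",
    "2016-06-05 22:22:54.817396 7f25b8207700 1 -- 127.0.0.1:0/2033136975 <== "
    "osd.0 127.0.0.1:6800/29402 2 osd_op_reply(2 rbd_directory [call] v0'0 "
    "uv1 ondisk = 0) v7 133+0+27 (2231830616 0 2896097477) 0x7f258c000a20 "
    "con 0x5598b04580d0",
]

for event in class_call_events(sample_lines):
    print(event)
```

A reply whose return code is non-zero (for example -5 for an I/O error) would point at the class method that the OSD could not execute.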
Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
Seems like my rbd_directory is empty for some reason:

[root@ceph-glb-fec-02 ceph]# rados get -p glebe-sata rbd_directory /tmp/dir
[root@ceph-glb-fec-02 ceph]# strings /tmp/dir
[root@ceph-glb-fec-02 ceph]# ls -la /tmp/dir
-rw-r--r--. 1 root root 0 Jun 6 11:12 /tmp/dir

[root@ceph-glb-fec-02 ceph]# rados stat -p glebe-sata rbd_directory
glebe-sata/rbd_directory mtime 2016-06-06 10:18:28.00, size 0
Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
No - it throws a usage error. If I add a file argument after it, it works:

[root@ceph-glb-fec-02 ceph]# rados -p glebe-sata get rbd_id.hypervtst-lun04 /tmp/crap
[root@ceph-glb-fec-02 ceph]# cat /tmp/crap
109eb01f5f89de

stat works:

[root@ceph-glb-fec-02 ceph]# rados -p glebe-sata stat rbd_id.hypervtst-lun04
glebe-sata/rbd_id.hypervtst-lun04 mtime 2016-06-06 10:55:08.00, size 18

I can do a rados ls:

[root@ceph-glb-fec-02 ceph]# rados ls -p glebe-sata|grep rbd_id
rbd_id.cloud2sql-lun01
rbd_id.glbcluster3-vm17
rbd_id.holder <<< a create that said it failed while I was debugging this
rbd_id.pvtcloud-nfs01
rbd_id.hypervtst-lun05
rbd_id.test02
rbd_id.cloud2sql-lun02
rbd_id.fiotest2
rbd_id.radmast02-lun04
rbd_id.hypervtst-lun04
rbd_id.cloud2fs-lun00
rbd_id.radmast02-lun03
rbd_id.hypervtst-lun00
rbd_id.cloud2sql-lun00
rbd_id.radmast02-lun02
Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
Are you able to run the following command successfully?

rados -p glebe-sata get rbd_id.hypervtst-lun04

--
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Jewel upgrade - rbd errors after upgrade
I upgraded my Infernalis semi-production cluster to Jewel on Friday. The upgrade went through smoothly (aside from a time-wasting restorecon /var/lib/ceph in the selinux package upgrade) and the services continued running without interruption. However, this morning when I went to create some new RBD images I found I am unable to do much at all with RBD.

Just about any rbd command fails with an I/O error. I can run showmapped but that is about it - anything like an ls, info or status fails. This applies to all my pools.

I can see no errors in any log files that appear to suggest an issue. I have also tried the commands on other cluster members that have not done anything with RBD before (I was wondering if perhaps the kernel rbd was pinning the old library version open or something) but the same error occurs.

Where can I start trying to resolve this?

Cheers,
Adrian

[root@ceph-glb-fec-01 ceph]# rbd ls glebe-sata
rbd: list: (5) Input/output error
2016-06-06 10:41:31.792720 7f53c06a2d80 -1 librbd: error listing image in directory: (5) Input/output error
2016-06-06 10:41:31.792749 7f53c06a2d80 -1 librbd: error listing v2 images: (5) Input/output error

[root@ceph-glb-fec-01 ceph]# rbd ls glebe-ssd
rbd: list: (5) Input/output error
2016-06-06 10:41:33.956648 7f90de663d80 -1 librbd: error listing image in directory: (5) Input/output error
2016-06-06 10:41:33.956672 7f90de663d80 -1 librbd: error listing v2 images: (5) Input/output error

[root@ceph-glb-fec-02 ~]# rbd showmapped
id pool       image                 snap device
0  glebe-sata test02                -    /dev/rbd0
1  glebe-ssd  zfstest               -    /dev/rbd1
10 glebe-sata hypervtst-lun00       -    /dev/rbd10
11 glebe-sata hypervtst-lun02       -    /dev/rbd11
12 glebe-sata hypervtst-lun03       -    /dev/rbd12
13 glebe-ssd  nspprd01_lun00        -    /dev/rbd13
14 glebe-sata cirrux-nfs01          -    /dev/rbd14
15 glebe-sata hypervtst-lun04       -    /dev/rbd15
16 glebe-sata hypervtst-lun05       -    /dev/rbd16
17 glebe-sata pvtcloud-nfs01        -    /dev/rbd17
18 glebe-sata cloud2sql-lun00       -    /dev/rbd18
19 glebe-sata cloud2sql-lun01       -    /dev/rbd19
2  glebe-sata radmast02-lun00       -    /dev/rbd2
20 glebe-sata cloud2sql-lun02       -    /dev/rbd20
21 glebe-sata cloud2fs-lun00        -    /dev/rbd21
22 glebe-sata cloud2fs-lun01        -    /dev/rbd22
3  glebe-sata radmast02-lun01       -    /dev/rbd3
4  glebe-sata radmast02-lun02       -    /dev/rbd4
5  glebe-sata radmast02-lun03       -    /dev/rbd5
6  glebe-sata radmast02-lun04       -    /dev/rbd6
7  glebe-ssd  sybase_iquser02_lun00 -    /dev/rbd7
8  glebe-ssd  sybase_iquser03_lun00 -    /dev/rbd8
9  glebe-ssd  sybase_iquser04_lun00 -    /dev/rbd9

[root@ceph-glb-fec-02 ~]# rbd status glebe-sata/hypervtst-lun04
2016-06-06 10:47:30.221453 7fc0030dc700 -1 librbd::image::OpenRequest: failed to retrieve image id: (5) Input/output error
2016-06-06 10:47:30.221556 7fc0028db700 -1 librbd::ImageState: failed to open image: (5) Input/output error
rbd: error opening image hypervtst-lun04: (5) Input/output error
Re: [ceph-users] Best upgrade strategy
If your monitor nodes are separate from the OSD nodes, I'd get ceph upgraded to the latest point release of your current line (0.94.7). Upgrade monitors, then OSDs, then other dependent services (mds, rgw, qemu).

Once everything is happy again, I'd run OS and ceph upgrades together, starting with monitors, then OSDs, and (again) dependent services. Keep in mind that you'll want to chown all of the ceph data while you're doing this (per the upgrade notes).

If they're combined, I'd probably upgrade ceph, then the OS: first from 0.94.5 to 0.94.7, then to Jewel, and then I'd upgrade the OS version. The standard order still applies: monitors -> osds -> dependent services.

--
Adam
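The ordering described in this reply can be written down as a small plan generator. This is only a sketch for keeping track of the sequence: the phase labels are paraphrases of the advice, the function name is made up, and nothing here runs any ceph command.

```python
# Order in which daemon types are handled within each phase, as stated in
# the reply: monitors first, then OSDs, then dependent services.
DAEMON_ORDER = ["monitors", "osds", "dependent services (mds, rgw, qemu)"]


def upgrade_plan(mons_separate):
    """Return an ordered list of (phase, daemon type) steps.

    mons_separate: True if the monitors run on their own nodes,
    False if they share nodes with the OSDs.
    """
    if mons_separate:
        phases = ["ceph 0.94.5 -> 0.94.7",
                  "OS upgrade together with ceph -> Jewel"]
    else:
        phases = ["ceph 0.94.5 -> 0.94.7",
                  "ceph 0.94.7 -> Jewel",
                  "OS upgrade"]
    return [(phase, daemon) for phase in phases for daemon in DAEMON_ORDER]


for step in upgrade_plan(mons_separate=False):
    print(step)
```

The chown of the ceph data mentioned above is not modelled; it would belong to whichever phase moves a node to Jewel.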
[ceph-users] Best upgrade strategy
Hi,

we are running a cluster with 6 storage nodes (72 OSDs) and 3 monitors. The OSDs and monitors are running on Ubuntu 14.04 with ceph 0.94.5. We want to upgrade the cluster to Jewel and at the same time the OS to Ubuntu 16.04. What would be the best way to do this? Should we first upgrade the OS and then ceph to 0.94.7 followed by 10.2.1? Or should we first upgrade Ceph and then Ubuntu? Or maybe do it all at once?

Regards
Sebastian
[ceph-users] Disaster recovery and backups
Let's assume that everything went very, very badly and I have to manually recover a cluster with an unconfigured ceph.

1. How can I recover data directly from raw disks? Is this possible?
2. How can I restore a ceph cluster (and get the data back) by using the existing disks?
3. How do you manage backups for ceph in huge clusters?
Re: [ceph-users] rados complexity
OK, seems like my problem could be cephfs-related. I have 16 cephfs clients that do heavy, sub-optimal writes simultaneously. The cluster has no problems handling the load up until circa 2 kobjects. Above this threshold the OSDs start to go down randomly and eventually get killed by ceph's watchdog mechanism. The funny thing is that the CPUs and HDDs are not really overloaded during these events. So I am really puzzled at the moment.

-Mykola
[ceph-users] no osds in jewel
Hi,

When I run the script below to install Ceph (10.2.0), I get the error "no osds". The same script installed Hammer without problems, so I think I am missing something new that was introduced since Hammer. Do you know what I am missing?

--- The script ---
#!/bin/sh
set -x

ceph-deploy new csElsa
echo "osd pool default size = 1" >> ceph.conf
ceph-deploy install csElsa csAnt csBull csCat
ceph-deploy mon create-initial
ceph-deploy mon create csElsa
ceph-deploy gatherkeys csElsa
ceph-deploy disk zap csAnt:sda
ceph-deploy disk zap csBull:sda
ceph-deploy disk zap csCat:sda
ceph-deploy osd create csAnt:sda csBull:sda csCat:sda
ceph-deploy admin csElsa csElsa csAnt csBull csCat
sudo chmod +r /etc/ceph/ceph.client.admin.keyring
ceph health
--- end ---

--- The result of "ceph -w" ---
# I masked the IP
jae@csElsa:~/git/ceph$ ceph -w
    cluster 8b2816e9-1953-4157-aaf7-95e9e668fe46
     health HEALTH_ERR
            64 pgs are stuck inactive for more than 300 seconds
            64 pgs stuck inactive
            no osds
     monmap e1: 1 mons at {csElsa=1xx.1xx.2xx.1:6789/0}
            election epoch 3, quorum 0 csElsa
     osdmap e1: 0 osds: 0 up, 0 in
            flags sortbitwise
      pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                  64 creating

2016-06-06 01:59:08.054985 mon.0 [INF] from='client.? 1xx.1xx.2xx.1:0/115687' entity='client.admin' cmd='[{"prefix": "auth get-or-create", "entity": "client.bootstrap-mds", "caps": ["mon", "allow profile bootstrap-mds"]}]': finished
--- end ---

Best regards,
Jae

--
Jaemyoun Lee
CPS Lab. (Cyber-Physical Systems Laboratory in Hanyang University)
E-mail : jaemy...@hanyang.ac.kr
Website : http://cpslab.hanyang.ac.kr
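When a deployment script like the one above is rerun several times, it can be handy to check the OSD counts automatically instead of reading the status by eye. A rough sketch, assuming the "osdmap" line always has the shape shown in the output above (the regular expression and function name are illustrative, not part of any ceph tool):

```python
import re

# Pattern taken from the "osdmap e1: 0 osds: 0 up, 0 in" line in the status
# output; treat the exact format as an assumption.
OSDMAP_RE = re.compile(
    r"osdmap e(?P<epoch>\d+): (?P<total>\d+) osds: "
    r"(?P<up>\d+) up, (?P<in_>\d+) in")


def osd_counts(status_text):
    """Return (total, up, in) OSD counts, or None if no osdmap line is found."""
    match = OSDMAP_RE.search(status_text)
    if match is None:
        return None
    return (int(match.group("total")),
            int(match.group("up")),
            int(match.group("in_")))


status = """\
     monmap e1: 1 mons at {csElsa=1xx.1xx.2xx.1:6789/0}
     osdmap e1: 0 osds: 0 up, 0 in
            flags sortbitwise
"""

total, up, in_ = osd_counts(status)
if total == 0:
    print("no OSDs were registered; the osd create step probably did not succeed")
```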
Re: [ceph-users] rados complexity
We've got a simple cluster with 45 OSDs and more than 5 kobjects, and have not had any issues so far. Our cluster mainly serves some rados pools for an application which usually writes data once and reads it multiple times.

- Sven
[ceph-users] rados complexity
Are there any ceph users with pools containing >2 kobjects?

If so, have you noticed any instabilities of the clusters once this threshold is reached?

-Mykola
[ceph-users] RGW AWS4 SignatureDoesNotMatch when requests with port != 80 or != 443
Hi!

I get the error "SignatureDoesNotMatch" when I use a presigned URL with an endpoint port other than 80 or 443. For example, if I use the host http://192.168.1.1: then this is what I have in the RGW log:

//
RGWEnv::set(): HTTP_HOST: 192.168.1.1:
//
RGWEnv::set(): SERVER_PORT:
//
HTTP_HOST=192.168.1.1:
//
SERVER_PORT=
//
host=192.168.1.1
//
canonical headers format = host:192.168.1.1::
//
canonical request = GET
/
X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=%2F20160605%2Fap%2Fs3%2Faws4_request&X-Amz-Date=20160605T125927Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host
host:192.168.1.1::
host
UNSIGNED-PAYLOAD
//
- Verifying signatures
//
failed to authorize request
//

I see this in src/rgw/rgw_rest_s3.cc:

int RGW_Auth_S3::authorize_v4() {
  //
  string port = s->info.env->get("SERVER_PORT", "");
  string secure_port = s->info.env->get("SERVER_PORT_SECURE", "");
  //
  if (using_qs && (token == "host")) {
    if (!port.empty() && port != "80") {
      token_value = token_value + ":" + port;
    } else if (!secure_port.empty() && secure_port != "443") {
      token_value = token_value + ":" + secure_port;
    }
  }

Is this caused by a mistake on my side? Can somebody please help me out?

Thanks!
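The doubled colon in "host:192.168.1.1::" is consistent with the port being appended to a Host value that already contains it. The sketch below mirrors the quoted branch in Python to illustrate that reading; it is not the RGW implementation, the function name is made up, and 8080 is a hypothetical port chosen only because the real one is not shown in the log.

```python
def canonical_host(http_host, server_port="", secure_port="", using_qs=True):
    """Mirror of the quoted branch: append a non-default port to the host value.

    http_host is the value of the Host header as received, which in the log
    above already seems to carry the port.
    """
    value = http_host
    if using_qs:
        if server_port and server_port != "80":
            value = value + ":" + server_port
        elif secure_port and secure_port != "443":
            value = value + ":" + secure_port
    return value


# 8080 is a hypothetical port, used only for illustration.
signed_by_client = "192.168.1.1:8080"
built_by_server = canonical_host("192.168.1.1:8080", server_port="8080")

print(signed_by_client)   # what the client put into the canonical request
print(built_by_server)    # host value with the port appended a second time
```

If the client signs the first value and the server rebuilds the second, the two canonical requests differ and the signatures cannot match, which would explain why only the default ports 80 and 443 work.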
Re: [ceph-users] 403 AccessDenied with presigned url in Jewel AWS4.
Thank you, Robin H. Johnson!

I've set "debug rgw = 20" in the RGW config file and I see "NOTICE: now = 1464998270, now_req = 1464973070, exp = 3600" in the RGW log file. It appears that now is the local time on the RGW server (my timezone is UTC+7) while now_req is UTC time. This triggers the following check in src/rgw/rgw_rest_s3.cc:

    int RGW_Auth_S3::authorize_v4(..){
      //
      if (now >= now_req + exp) {
        dout(10) << "NOTICE: now = " << now << ", now_req = " << now_req << ", exp = " << exp << dendl;
        return -EPERM;
      }
      //

Then I set the time on the RGW server to UTC and it works fine! Is this a bug?

2016-06-03 11:44 GMT+07:00 Robin H. Johnson :
> On Fri, Jun 03, 2016 at 11:34:35AM +0700, Khang Nguyễn Nhật wrote:
> > s3 = boto3.client(service_name='s3', region_name='', use_ssl=False,
> > endpoint_url='http://192.168.1.10:', aws_access_key_id=access_key,
> > aws_secret_access_key= secret_key,
> > config=Config(signature_version='s3v4', region_name=''))
> The region part doesn't seem right. Try setting it to 'ap' or
> 'ap-southeast'.
>
> Failing that, turn up the RGW loglevel to 20, and run a request, then
> look at the logs of how it created the signature, and manually compare
> them to what your client should have built (with boto in verbose
> debugging).
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
> E-Mail : robb...@gentoo.org
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
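The values in the NOTICE line are consistent with a seven-hour offset between the two clocks. Here is a minimal Python sketch of that arithmetic, using only the three numbers quoted from the log. `is_expired` is a hypothetical helper that mirrors the quoted comparison; it is not RGW code.

```python
# Values quoted from the RGW log line.
now = 1464998270      # server-side "now"
now_req = 1464973070  # request timestamp
exp = 3600            # X-Amz-Expires, in seconds


def is_expired(now: int, now_req: int, exp: int) -> bool:
    """Same comparison as the quoted check: reject once now >= now_req + exp."""
    return now >= now_req + exp


offset_hours = (now - now_req) / 3600
print(offset_hours)                       # 7.0
print(is_expired(now, now_req, exp))      # True

# With both timestamps taken from the same clock, the fresh URL is accepted.
print(is_expired(now_req, now_req, exp))  # False
```

A gap of exactly 25200 seconds matches the UTC+7 timezone mentioned in the message, so any presigned URL with an expiry shorter than seven hours would be rejected under these conditions.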
Re: [ceph-users] Older Ceph packages for Ubuntu 12.04 (Precise Pangolin) to recompile libvirt with RBD support
Hi,

Can anyone assist with this?

Looking forward to your reply, thank you.

Cheers.

On Fri, Jun 3, 2016 at 11:56 AM, Cloud List wrote:
> Dear all,
>
> I am trying to set up an older version of CloudStack (4.2.0) on Ubuntu 12.04 to
> use Ceph RBD as primary storage for our upgrade testing. Two of
> the steps involved adding the repository below in order to manually compile
> libvirt with RBD support, since the default libvirt on Ubuntu 12.04 doesn't
> have RBD support, unlike on Ubuntu 14.04:
>
> # wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
> OK
>
> # echo deb http://eu.ceph.com/debian-cuttlefish/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
> deb http://eu.ceph.com/debian-cuttlefish/ precise main
>
> But when I ran sudo apt-get update, I received this error (excerpts):
>
> Err http://eu.ceph.com precise/main amd64 Packages
>   404 Not Found
> Err http://eu.ceph.com precise/main i386 Packages
>   404 Not Found
>
> W: Failed to fetch
> http://eu.ceph.com/debian-cuttlefish/dists/precise/main/binary-amd64/Packages
> 404 Not Found
>
> W: Failed to fetch
> http://eu.ceph.com/debian-cuttlefish/dists/precise/main/binary-i386/Packages
> 404 Not Found
>
> E: Some index files failed to download. They have been ignored, or old
> ones used instead.
>
> It seems that the repository for the required packages has been
> removed. Can anyone advise whether I can get the required packages, maybe from
> a different location?
>
> Any help is greatly appreciated.
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
> -ip-