Re: [ceph-users] CephFS in the wild

2016-06-05 Thread Christian Balzer

Hello,

On Fri, 3 Jun 2016 15:43:11 +0100 David wrote:

> I'm hoping to implement cephfs in production at some point this year so
> I'd be interested to hear your progress on this.
> 
> Have you considered SSD for your metadata pool? You wouldn't need loads
> of capacity although even with reliable SSD I'd probably still do x3
> replication for metadata. I've been looking at the intel s3610's for
> this.
> 
That's an interesting and potentially quite beneficial thought, but it
depends on a number of things (more below).

I'm using S3610s (800GB) for a cache pool with 2x replication and am quite
happy with that, but then again I have a very predictable usage pattern
and am monitoring those SSDs religiously and I'm sure they will outlive
things by a huge margin. 

We didn't go for 3x replication due to (in order):
a) cost
b) rack space
c) increased performance with 2x


Now for how useful/helpful a fast meta-data pool would be, I reckon it
depends on a number of things:

a) Is the cluster write or read heavy?
b) Do reads, flocks, and other operations that are not normally considered
   writes cause writes to the meta-data pool?
c) Anything else that might cause write storms to the meta-data pool, like
   the behaviour seen in the current NFS over CephFS thread with sync?

A quick glance at my test cluster seems to indicate that CephFS meta data
per filesystem object is about 2KB, somebody with actual clues please
confirm this.

Brady has large amounts of NVMe space left over in his current design;
assuming 10GB journals, about 2.8TB of raw space.
So if running the (verified) numbers indicates that the meta data can fit
in this space, I'd put it there.
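A quick back-of-the-envelope version of that check, assuming the unverified ~2KB-per-object figure above (the object count below is purely illustrative, plug in your own):

```python
# Rough check: does the CephFS metadata pool fit in the leftover NVMe space?
# The ~2KB/object figure is the unverified estimate from this thread; the
# object count is illustrative only.

TB = 1000 ** 4

def metadata_fits(num_objects, bytes_per_object, replication, raw_space):
    """Return (required_raw_bytes, fits) for a metadata pool of this size."""
    required = num_objects * bytes_per_object * replication
    return required, required <= raw_space

# e.g. 50 million files/dirs at ~2KB each, 3x replication, 2.8TB leftover NVMe
required, fits = metadata_fits(50_000_000, 2048, 3, int(2.8 * TB))
print(required / TB, fits)  # roughly 0.31TB of raw space needed -> it fits
```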

Otherwise larger SSDs (indeed S3610s) for OS and meta-data pool storage may
be the way forward.

Regards,

Christian
> 
> 
> On Wed, Jun 1, 2016 at 9:50 PM, Brady Deetz  wrote:
> 
> > Question:
> > I'm curious if there is anybody else out there running CephFS at the
> > scale I'm planning for. I'd like to know some of the issues you didn't
> > expect that I should be looking out for. I'd also like to simply see
> > when CephFS hasn't worked out and why. Basically, give me your war
> > stories.
> >
> >
> > Problem Details:
> > Now that I'm out of my design phase and finished testing on VMs, I'm
> > ready to drop $100k on a pilot. I'd like to get some sense of
> > confidence from the community that this is going to work before I pull
> > the trigger.
> >
> > I'm planning to replace my 110 disk 300TB (usable) Oracle ZFS 7320 with
> > CephFS by this time next year (hopefully by December). My workload is
> > a mix of small and very large files (100GB+ in size). We do fMRI
> > analysis on DICOM image sets as well as other physio data collected
> > from subjects. We also have plenty of spreadsheets, scripts, etc.
> > Currently 90% of our analysis is I/O bound and generally sequential.
> >
> > In deploying Ceph, I am hoping to see more throughput than the 7320 can
> > currently provide. I'm also looking to get away from traditional
> > file-systems that require forklift upgrades. That's where Ceph really
> > shines for us.
> >
> > I don't have a total file count, but I do know that we have about 500k
> > directories.
> >
> >
> > Planned Architecture:
> >
> > Storage Interconnect:
> > Brocade VDX 6940 (40 gig)
> >
> > Access Switches for clients (servers):
> > Brocade VDX 6740 (10 gig)
> >
> > Access Switches for clients (workstations):
> > Brocade ICX 7450
> >
> > 3x MON:
> > 128GB RAM
> > 2x 200GB SSD for OS
> > 2x 400GB P3700 for LevelDB
> > 2x E5-2660v4
> > 1x Dual Port 40Gb Ethernet
> >
> > 2x MDS:
> > 128GB RAM
> > 2x 200GB SSD for OS
> > 2x 400GB P3700 for LevelDB (is this necessary?)
> > 2x E5-2660v4
> > 1x Dual Port 40Gb Ethernet
> >
> > 8x OSD:
> > 128GB RAM
> > 2x 200GB SSD for OS
> > 2x 400GB P3700 for Journals
> > 24x 6TB Enterprise SATA
> > 2x E5-2660v4
> > 1x Dual Port 40Gb Ethernet
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] CephFS in the wild

2016-06-05 Thread Gregory Farnum
On Wed, Jun 1, 2016 at 1:50 PM, Brady Deetz  wrote:
> Question:
> I'm curious if there is anybody else out there running CephFS at the scale
> I'm planning for. I'd like to know some of the issues you didn't expect that
> I should be looking out for. I'd also like to simply see when CephFS hasn't
> worked out and why. Basically, give me your war stories.
>
>
> Problem Details:
> Now that I'm out of my design phase and finished testing on VMs, I'm ready
> to drop $100k on a pilot. I'd like to get some sense of confidence from the
> community that this is going to work before I pull the trigger.
>
> I'm planning to replace my 110 disk 300TB (usable) Oracle ZFS 7320 with
> CephFS by this time next year (hopefully by December). My workload is a mix
> of small and very large files (100GB+ in size). We do fMRI analysis on DICOM
> image sets as well as other physio data collected from subjects. We also
> have plenty of spreadsheets, scripts, etc. Currently 90% of our analysis is
> I/O bound and generally sequential.
>
> In deploying Ceph, I am hoping to see more throughput than the 7320 can
> currently provide. I'm also looking to get away from traditional
> file-systems that require forklift upgrades. That's where Ceph really shines
> for us.
>
> I don't have a total file count, but I do know that we have about 500k
> directories.
>
>
> Planned Architecture:
>
> Storage Interconnect:
> Brocade VDX 6940 (40 gig)
>
> Access Switches for clients (servers):
> Brocade VDX 6740 (10 gig)
>
> Access Switches for clients (workstations):
> Brocade ICX 7450
>
> 3x MON:
> 128GB RAM
> 2x 200GB SSD for OS
> 2x 400GB P3700 for LevelDB
> 2x E5-2660v4
> 1x Dual Port 40Gb Ethernet
>
> 2x MDS:
> 128GB RAM
> 2x 200GB SSD for OS
> 2x 400GB P3700 for LevelDB (is this necessary?)
> 2x E5-2660v4
> 1x Dual Port 40Gb Ethernet

The MDS doesn't use any local storage, other than for storing its
ceph.conf and keyring.

>
> 8x OSD:
> 128GB RAM
> 2x 200GB SSD for OS
> 2x 400GB P3700 for Journals
> 24x 6TB Enterprise SATA
> 2x E5-2660v4
> 1x Dual Port 40Gb Ethernet

I don't know what kind of throughput you're currently seeing on your
ZFS system. Unfortunately most of the big CephFS users are pretty
quiet on the lists :( although they sometimes come out to play at
events like https://www.msi.umn.edu/sc15Ceph. :)

You'll definitely want to do some tuning. Right now we default to 100k
inodes in the metadata cache for instance, which fits in <1GB of RAM.
You'll want to bump that way, way up. Also keep in mind that CephFS'
performance characteristics are just weirdly different to NAS boxes or
ZFS in ways you might not be ready for. So large streaming writes will
do great, but if you have shared RW files or directories, that might
be much faster in some places and much slower in ones you didn't think
about. Large streaming reads and writes will go as quickly as RADOS
can drive them (80-100MB/s per OSD for reads is generally a good
estimate, I think? And divide that by replication factor for writes);
with smaller ops you start running into latency issues and the fact
that CephFS (since it's sending RADOS writes to separate objects)
can't coalesce writes as much as local FSes (or boxes built on them).
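For reference, the cache bump suggested above would be expressed in ceph.conf along these lines on a Jewel-era release (the value here is an illustrative assumption; size it to the RAM available on the MDS):

```ini
[mds]
# Jewel defaults to 100000 cached inodes (fits in <1GB of RAM); for trees
# with millions of files you want this much higher -- illustrative value only
mds cache size = 4000000
```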
-Greg


Re: [ceph-users] ceph-fuse performance about hammer and jewel

2016-06-05 Thread qisy

Yan, Zheng:

Thanks for your reply.
But after changing to Jewel, applications read and write the disk slowly,
which matches the fio-tested IOPS.

Are there any other possibilities?


On 16/6/1 21:39, Yan, Zheng wrote:

On Wed, Jun 1, 2016 at 6:52 PM, qisy  wrote:

my test fio

fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite -size=1G
-filename=test.iso  -name="CEPH 4KB randwrite test" -iodepth=32 -runtime=60
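For reference, that command line transcribes one-to-one into fio's job-file form:

```ini
; equivalent fio job file for the command line above
[CEPH 4KB randwrite test]
ioengine=libaio
bs=4k
direct=1
thread
rw=randwrite
size=1G
filename=test.iso
iodepth=32
runtime=60
```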


You were testing direct-IO performance. Hammer does not handle
direct-IO correctly; data are cached in ceph-fuse.

Regards
Yan, Zheng


On 16/6/1 15:22, Yan, Zheng wrote:


On Mon, May 30, 2016 at 10:22 PM, qisy  wrote:

Hi,
  After Jewel was released with a production-ready CephFS, I upgraded
the old Hammer cluster, but IOPS dropped a lot.

  I made a test with 3 nodes, each with 8 cores, 16GB RAM and 1 OSD;
the OSD device gets 15000 IOPS.

  I found the ceph-fuse client has better performance on Hammer than
on Jewel.

  fio randwrite 4K
  |               | jewel server | hammer server |
  | jewel client  | 480+ iops    | no test       |
  | hammer client | 6000+ iops   | 6000+ iops    |

please post the fio config file.

Regards
Yan, Zheng


  ceph-fuse (jewel) mounted against a jewel server gets very poor IOPS; are
there any special options that need to be set?
  If I continue to use ceph-fuse (hammer) with a jewel server, will that
cause any problems?

  thanks

  my ceph.conf below:

[global]
fsid = xxx
mon_initial_members = xxx, xxx, xxx
mon_host = 10.0.0.1,10.0.0.2,10.0.0.3
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 1
mon_data_avail_warn = 15
mon_data_avail_crit = 5
mon_clock_drift_allowed = 0.6

[osd]
osd_disk_threads = 8
osd_op_threads = 8
journal_block_align = true
journal_dio = true
journal_aio = true
journal_force_aio = true
filestore_journal_writeahead = true
filestore_max_sync_interval = 15
filestore_min_sync_interval = 10
filestore_queue_max_ops = 25000
filestore_queue_committing_max_ops = 5000
filestore_op_threads = 32
osd_journal_size = 2
osd_map_cache_size = 1024
osd_max_write_size = 512
osd_scrub_load_threshold = 1
osd_heartbeat_grace = 30

[mds]
mds_session_timeout = 120
mds_session_autoclose = 600


Re: [ceph-users] CephFS: slow writes over NFS when fs is mounted with kernel driver but fast with Fuse

2016-06-05 Thread Yan, Zheng
On Fri, Jun 3, 2016 at 10:43 PM, Jan Schermer  wrote:
> I'd be worried about it getting "fast" all of a sudden. Test crash
> consistency.
> If you test something like file creation you should be able to estimate if
> it should be that fast. (So it should be some fraction of theoretical IOPS
> on the drives/backing rbd device...)

The sudden "fast" is because the MDS flushes its journal more frequently.
There is no risk of metadata/data loss.
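For anyone following along, the async variant discussed in the quoted exchange below would look something like this in /etc/exports, reusing the export options quoted later in this message (the path is illustrative; async trades the flush delay for weaker crash consistency):

```
/mnt/cephfs/share  *(rw,root_squash,async,wdelay,no_subtree_check,fsid=1244)
```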

Yan, Zheng

>
> If it's too fast then maybe the "sync" isn't working properly...
>
> Jan
>
> On 03 Jun 2016, at 16:26, David  wrote:
>
> Zheng, thanks for looking into this, it makes sense although strangely I've
> set up a new nfs server (different hardware, same OS, Kernel etc.) and I'm
> unable to recreate the issue. I'm no longer getting the delay, the nfs
> export is still using sync. I'm now comparing the servers to see what's
> different on the original server. Apologies if I've wasted your time on
> this!
>
> Jan, I did some more testing with Fuse on the original server and I was
> seeing the same issue, yes I was testing from the nfs client. As above I
> think there was something weird with that original server. Noted on sync vs
> async, I plan on sticking with sync.
>
> On Fri, Jun 3, 2016 at 5:03 AM, Yan, Zheng  wrote:
>>
>> On Mon, May 30, 2016 at 10:29 PM, David  wrote:
>> > Hi All
>> >
>> > I'm having an issue with slow writes over NFS (v3) when cephfs is
>> > mounted
>> > with the kernel driver. Writing a single 4K file from the NFS client is
>> > taking 3 - 4 seconds, however a 4K write (with sync) into the same
>> > folder on
>> > the server is fast as you would expect. When mounted with ceph-fuse, I
>> > don't
>> > get this issue on the NFS client.
>> >
>> > Test environment is a small cluster with a single MON and single MDS,
>> > all
>> > running 10.2.1, CephFS metadata is an ssd pool, data is on spinners. The
>> > NFS
>> > server is CentOS 7, I've tested with the current shipped kernel (3.10),
>> > ELrepo 4.4 and ELrepo 4.6.
>> >
>> > More info:
>> >
>> > With the kernel driver, I mount the filesystem with "-o
>> > name=admin,secret"
>> >
>> > I've exported a folder with the following options:
>> >
>> > *(rw,root_squash,sync,wdelay,no_subtree_check,fsid=1244,sec=1)
>> >
>> > I then mount the folder on a CentOS 6 client with the following options
>> > (all
>> > default):
>> >
>> >
>> > rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.3.231,mountvers=3,mountport=597,mountproto=udp,local_lock=none
>> >
>> > A small 4k write is taking 3 - 4 secs:
>> >
>> >  # time dd if=/dev/zero of=testfile bs=4k count=1
>> > 1+0 records in
>> > 1+0 records out
>> > 4096 bytes (4.1 kB) copied, 3.59678 s, 1.1 kB/s
>> >
>> > real0m3.624s
>> > user0m0.000s
>> > sys 0m0.001s
>> >
>> > But a sync write on the sever directly into the same folder is fast
>> > (this is
>> > with the kernel driver):
>> >
>> > # time dd if=/dev/zero of=testfile2 bs=4k count=1 conv=fdatasync
>> > 1+0 records in
>> > 1+0 records out
>> > 4096 bytes (4.1 kB) copied, 0.0121925 s, 336 kB/s
>>
>>
>> Your nfs export has sync option. 'dd if=/dev/zero of=testfile bs=4k
>> count=1' on nfs client is equivalent to 'dd if=/dev/zero of=testfile
>> bs=4k count=1 conv=fsync' on cephfs. The reason that sync metadata
>> operation takes 3~4 seconds is that the MDS flushes its journal every
>> 5 seconds.  Adding async option to nfs export can avoid this delay.
>>
>> >
>> > real0m0.015s
>> > user0m0.000s
>> > sys 0m0.002s
>> >
>> > If I mount cephfs with Fuse instead of the kernel, the NFS client write
>> > is
>> > fast:
>> >
>> > dd if=/dev/zero of=fuse01 bs=4k count=1
>> > 1+0 records in
>> > 1+0 records out
>> > 4096 bytes (4.1 kB) copied, 0.026078 s, 157 kB/s
>> >
>>
>> In this case, ceph-fuse sends an extra request (getattr request on
>> directory) to MDS. The request causes MDS to flush its journal.
>> Whether or not client sends the extra request depends on what
>> capabilities it has.  What capabilities client has, in turn, depend on
>> how many clients are accessing the directory. In my test, nfs on
>> ceph-fuse is not always fast.
>>
>> Yan, Zheng
>>
>>
>> > Does anyone know what's going on here?
>>
>>
>>
>> >
>> > Thanks
>> >
>> >


Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul

Thanks Jason.

I don’t have anything specified explicitly for osd class dir.   I suspect it 
might be related to the OSDs being restarted during the package upgrade process 
before all libraries are upgraded.


> -Original Message-
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: Monday, 6 June 2016 12:37 PM
> To: Adrian Saul
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
>
> Odd -- sounds like you might have Jewel and Infernalis class objects and
> OSDs intermixed. I would double-check your installation and see if your
> configuration has any overload for "osd class dir".
>
> On Sun, Jun 5, 2016 at 10:28 PM, Adrian Saul
>  wrote:
> >
> > I have traced it back to an OSD giving this error:
> >
> > 2016-06-06 12:18:14.315573 7fd714679700 -1 osd.20 23623 class rbd open
> > got (5) Input/output error
> > 2016-06-06 12:19:49.835227 7fd714679700  0 _load_class could not open
> > class /usr/lib64/rados-classes/libcls_rbd.so (dlopen failed):
> > /usr/lib64/rados-classes/libcls_rbd.so: undefined symbol:
> > _ZN4ceph6buffer4list8iteratorC1EPS1_j
> >
> > Trying to figure out why that is the case.
> >
> >
> >> -Original Message-
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> >> Of Adrian Saul
> >> Sent: Monday, 6 June 2016 11:11 AM
> >> To: dilla...@redhat.com
> >> Cc: ceph-users@lists.ceph.com
> >> Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
> >>
> >>
> >> No - it throws a usage error - if I add a file argument after it works:
> >>
> >> [root@ceph-glb-fec-02 ceph]# rados -p glebe-sata get
> >> rbd_id.hypervtst-
> >> lun04 /tmp/crap
> >> [root@ceph-glb-fec-02 ceph]# cat /tmp/crap 109eb01f5f89de
> >>
> >> stat works:
> >>
> >> [root@ceph-glb-fec-02 ceph]# rados -p glebe-sata stat
> >> rbd_id.hypervtst-
> >> lun04
> >> glebe-sata/rbd_id.hypervtst-lun04 mtime 2016-06-06 10:55:08.00,
> >> size 18
> >>
> >>
> >> I can do a rados ls:
> >>
> >> [root@ceph-glb-fec-02 ceph]# rados ls -p glebe-sata|grep rbd_id
> >> rbd_id.cloud2sql-lun01
> >> rbd_id.glbcluster3-vm17
> >> rbd_id.holder   <<<  a create that said it failed while I was debugging 
> >> this
> >> rbd_id.pvtcloud-nfs01
> >> rbd_id.hypervtst-lun05
> >> rbd_id.test02
> >> rbd_id.cloud2sql-lun02
> >> rbd_id.fiotest2
> >> rbd_id.radmast02-lun04
> >> rbd_id.hypervtst-lun04
> >> rbd_id.cloud2fs-lun00
> >> rbd_id.radmast02-lun03
> >> rbd_id.hypervtst-lun00
> >> rbd_id.cloud2sql-lun00
> >> rbd_id.radmast02-lun02
> >>
> >>
> >> > -Original Message-
> >> > From: Jason Dillaman [mailto:jdill...@redhat.com]
> >> > Sent: Monday, 6 June 2016 11:00 AM
> >> > To: Adrian Saul
> >> > Cc: ceph-users@lists.ceph.com
> >> > Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
> >> >
> >> > Are you able to successfully run the following command successfully?
> >> >
> >> > rados -p glebe-sata get rbd_id.hypervtst-lun04
> >> >
> >> >
> >> >
> >> > On Sun, Jun 5, 2016 at 8:49 PM, Adrian Saul
> >> >  wrote:
> >> > >
> >> > > I upgraded my Infernalis semi-production cluster to Jewel on Friday.
> >> > > While
> >> > the upgrade went through smoothly (aside from a time wasting
> >> > restorecon /var/lib/ceph in the selinux package upgrade) and the
> >> > services continued running without interruption.  However this
> >> > morning when I went to create some new RBD images I am unable to do
> >> > much at all
> >> with RBD.
> >> > >
> >> > > Just about any rbd command fails with an I/O error.   I can run
> >> > showmapped but that is about it - anything like an ls, info or
> >> > status fails.  This applies to all my pools.
> >> > >
> >> > > I can see no errors in any log files that appear to suggest an
> >> > > issue.  I  have
> >> > also tried the commands on other cluster members that have not done
> >> > anything with RBD before (I was wondering if perhaps the kernel rbd
> >> > was pinning the old library version open or something) but the same
> >> > error
> >> occurs.
> >> > >
> >> > > Where can I start trying to resolve this?
> >> > >
> >> > > Cheers,
> >> > >  Adrian
> >> > >
> >> > >
> >> > > [root@ceph-glb-fec-01 ceph]# rbd ls glebe-sata
> >> > > rbd: list: (5) Input/output error
> >> > > 2016-06-06 10:41:31.792720 7f53c06a2d80 -1 librbd: error listing
> >> > > image in directory: (5) Input/output error
> >> > > 2016-06-06 10:41:31.792749 7f53c06a2d80 -1 librbd: error listing
> >> > > v2
> >> > > images: (5) Input/output error
> >> > >
> >> > > [root@ceph-glb-fec-01 ceph]# rbd ls glebe-ssd
> >> > > rbd: list: (5) Input/output error
> >> > > 2016-06-06 10:41:33.956648 7f90de663d80 -1 librbd: error listing
> >> > > image in directory: (5) Input/output error
> >> > > 2016-06-06 10:41:33.956672 7f90de663d80 -1 librbd: error listing
> >> > > v2
> >> > > images: (5) Input/output error
> >> > >
> >> > > [root@ceph-glb-fec-02 ~]# rbd showmapped
> >> > > id pool   image snap device
> >> > > 0  glebe-sata test02 

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul

I couldn't find anything wrong with the packages and everything seemed 
installed ok.

Once I restarted the OSDs the directory issue went away but the error started 
moving to other rbd output, and the same class open error occurred on other 
OSDs.  I have gone through and bounced all the OSDs and that seems to have 
cleared the issue.

I am guessing that the restart of the OSDs during the package upgrade occurs
before all library packages are upgraded, so they start with the wrong
versions loaded; when the class libraries are dynamically opened later, they
fail.
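One way to confirm a mismatch like this is to attempt the same dlopen the OSD performs and inspect the error. A minimal sketch (the rados-classes path is taken from the error messages above; everything else is generic):

```python
import ctypes

def try_dlopen(path):
    """Attempt to dlopen a shared object the way the OSD loads its rados
    classes. An OSError mentioning 'undefined symbol' points at a library
    version mismatch; 'cannot open' points at a missing file."""
    try:
        ctypes.CDLL(path)
        return "ok"
    except OSError as exc:
        return str(exc)

# e.g. on an affected OSD host:
#   try_dlopen("/usr/lib64/rados-classes/libcls_rbd.so")
print(try_dlopen("libm.so.6"))  # a cleanly-loading library returns "ok"
```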



> -Original Message-
> From: Adrian Saul
> Sent: Monday, 6 June 2016 12:29 PM
> To: Adrian Saul; dilla...@redhat.com
> Cc: ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] Jewel upgrade - rbd errors after upgrade
>
>
> I have traced it back to an OSD giving this error:
>
> 2016-06-06 12:18:14.315573 7fd714679700 -1 osd.20 23623 class rbd open got
> (5) Input/output error
> 2016-06-06 12:19:49.835227 7fd714679700  0 _load_class could not open class
> /usr/lib64/rados-classes/libcls_rbd.so (dlopen failed): /usr/lib64/rados-
> classes/libcls_rbd.so: undefined symbol:
> _ZN4ceph6buffer4list8iteratorC1EPS1_j
>
> Trying to figure out why that is the case.
>
>
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > Of Adrian Saul
> > Sent: Monday, 6 June 2016 11:11 AM
> > To: dilla...@redhat.com
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
> >
> >
> > No - it throws a usage error - if I add a file argument after it works:
> >
> > [root@ceph-glb-fec-02 ceph]# rados -p glebe-sata get rbd_id.hypervtst-
> > lun04 /tmp/crap
> > [root@ceph-glb-fec-02 ceph]# cat /tmp/crap 109eb01f5f89de
> >
> > stat works:
> >
> > [root@ceph-glb-fec-02 ceph]# rados -p glebe-sata stat
> > rbd_id.hypervtst-
> > lun04
> > glebe-sata/rbd_id.hypervtst-lun04 mtime 2016-06-06 10:55:08.00,
> > size 18
> >
> >
> > I can do a rados ls:
> >
> > [root@ceph-glb-fec-02 ceph]# rados ls -p glebe-sata|grep rbd_id
> > rbd_id.cloud2sql-lun01
> > rbd_id.glbcluster3-vm17
> > rbd_id.holder   <<<  a create that said it failed while I was debugging this
> > rbd_id.pvtcloud-nfs01
> > rbd_id.hypervtst-lun05
> > rbd_id.test02
> > rbd_id.cloud2sql-lun02
> > rbd_id.fiotest2
> > rbd_id.radmast02-lun04
> > rbd_id.hypervtst-lun04
> > rbd_id.cloud2fs-lun00
> > rbd_id.radmast02-lun03
> > rbd_id.hypervtst-lun00
> > rbd_id.cloud2sql-lun00
> > rbd_id.radmast02-lun02
> >
> >
> > > -Original Message-
> > > From: Jason Dillaman [mailto:jdill...@redhat.com]
> > > Sent: Monday, 6 June 2016 11:00 AM
> > > To: Adrian Saul
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
> > >
> > > Are you able to successfully run the following command successfully?
> > >
> > > rados -p glebe-sata get rbd_id.hypervtst-lun04
> > >
> > >
> > >
> > > On Sun, Jun 5, 2016 at 8:49 PM, Adrian Saul
> > >  wrote:
> > > >
> > > > I upgraded my Infernalis semi-production cluster to Jewel on Friday.
> > > > While
> > > the upgrade went through smoothly (aside from a time wasting
> > > restorecon /var/lib/ceph in the selinux package upgrade) and the
> > > services continued running without interruption.  However this
> > > morning when I went to create some new RBD images I am unable to do
> > > much at all
> > with RBD.
> > > >
> > > > Just about any rbd command fails with an I/O error.   I can run
> > > showmapped but that is about it - anything like an ls, info or
> > > status fails.  This applies to all my pools.
> > > >
> > > > I can see no errors in any log files that appear to suggest an
> > > > issue.  I  have
> > > also tried the commands on other cluster members that have not done
> > > anything with RBD before (I was wondering if perhaps the kernel rbd
> > > was pinning the old library version open or something) but the same
> > > error
> > occurs.
> > > >
> > > > Where can I start trying to resolve this?
> > > >
> > > > Cheers,
> > > >  Adrian
> > > >
> > > >
> > > > [root@ceph-glb-fec-01 ceph]# rbd ls glebe-sata
> > > > rbd: list: (5) Input/output error
> > > > 2016-06-06 10:41:31.792720 7f53c06a2d80 -1 librbd: error listing
> > > > image in directory: (5) Input/output error
> > > > 2016-06-06 10:41:31.792749 7f53c06a2d80 -1 librbd: error listing
> > > > v2
> > > > images: (5) Input/output error
> > > >
> > > > [root@ceph-glb-fec-01 ceph]# rbd ls glebe-ssd
> > > > rbd: list: (5) Input/output error
> > > > 2016-06-06 10:41:33.956648 7f90de663d80 -1 librbd: error listing
> > > > image in directory: (5) Input/output error
> > > > 2016-06-06 10:41:33.956672 7f90de663d80 -1 librbd: error listing
> > > > v2
> > > > images: (5) Input/output error
> > > >
> > > > [root@ceph-glb-fec-02 ~]# rbd showmapped
> > > > id pool   image snap device
> > > > 0  glebe-sata test02-/de

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Jason Dillaman
Odd -- sounds like you might have Jewel and Infernalis class objects
and OSDs intermixed. I would double-check your installation and see if
your configuration has any overload for "osd class dir".

On Sun, Jun 5, 2016 at 10:28 PM, Adrian Saul
 wrote:
>
> I have traced it back to an OSD giving this error:
>
> 2016-06-06 12:18:14.315573 7fd714679700 -1 osd.20 23623 class rbd open got 
> (5) Input/output error
> 2016-06-06 12:19:49.835227 7fd714679700  0 _load_class could not open class 
> /usr/lib64/rados-classes/libcls_rbd.so (dlopen failed): 
> /usr/lib64/rados-classes/libcls_rbd.so: undefined symbol: 
> _ZN4ceph6buffer4list8iteratorC1EPS1_j
>
> Trying to figure out why that is the case.
>
>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Adrian Saul
>> Sent: Monday, 6 June 2016 11:11 AM
>> To: dilla...@redhat.com
>> Cc: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
>>
>>
>> No - it throws a usage error - if I add a file argument after it works:
>>
>> [root@ceph-glb-fec-02 ceph]# rados -p glebe-sata get rbd_id.hypervtst-
>> lun04 /tmp/crap
>> [root@ceph-glb-fec-02 ceph]# cat /tmp/crap 109eb01f5f89de
>>
>> stat works:
>>
>> [root@ceph-glb-fec-02 ceph]# rados -p glebe-sata stat rbd_id.hypervtst-
>> lun04
>> glebe-sata/rbd_id.hypervtst-lun04 mtime 2016-06-06 10:55:08.00, size 18
>>
>>
>> I can do a rados ls:
>>
>> [root@ceph-glb-fec-02 ceph]# rados ls -p glebe-sata|grep rbd_id
>> rbd_id.cloud2sql-lun01
>> rbd_id.glbcluster3-vm17
>> rbd_id.holder   <<<  a create that said it failed while I was debugging this
>> rbd_id.pvtcloud-nfs01
>> rbd_id.hypervtst-lun05
>> rbd_id.test02
>> rbd_id.cloud2sql-lun02
>> rbd_id.fiotest2
>> rbd_id.radmast02-lun04
>> rbd_id.hypervtst-lun04
>> rbd_id.cloud2fs-lun00
>> rbd_id.radmast02-lun03
>> rbd_id.hypervtst-lun00
>> rbd_id.cloud2sql-lun00
>> rbd_id.radmast02-lun02
>>
>>
>> > -Original Message-
>> > From: Jason Dillaman [mailto:jdill...@redhat.com]
>> > Sent: Monday, 6 June 2016 11:00 AM
>> > To: Adrian Saul
>> > Cc: ceph-users@lists.ceph.com
>> > Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
>> >
>> > Are you able to successfully run the following command successfully?
>> >
>> > rados -p glebe-sata get rbd_id.hypervtst-lun04
>> >
>> >
>> >
>> > On Sun, Jun 5, 2016 at 8:49 PM, Adrian Saul
>> >  wrote:
>> > >
>> > > I upgraded my Infernalis semi-production cluster to Jewel on Friday.
>> > > While
>> > the upgrade went through smoothly (aside from a time wasting
>> > restorecon /var/lib/ceph in the selinux package upgrade) and the
>> > services continued running without interruption.  However this morning
>> > when I went to create some new RBD images I am unable to do much at all
>> with RBD.
>> > >
>> > > Just about any rbd command fails with an I/O error.   I can run
>> > showmapped but that is about it - anything like an ls, info or status
>> > fails.  This applies to all my pools.
>> > >
>> > > I can see no errors in any log files that appear to suggest an
>> > > issue.  I  have
>> > also tried the commands on other cluster members that have not done
>> > anything with RBD before (I was wondering if perhaps the kernel rbd
>> > was pinning the old library version open or something) but the same error
>> occurs.
>> > >
>> > > Where can I start trying to resolve this?
>> > >
>> > > Cheers,
>> > >  Adrian
>> > >
>> > >
>> > > [root@ceph-glb-fec-01 ceph]# rbd ls glebe-sata
>> > > rbd: list: (5) Input/output error
>> > > 2016-06-06 10:41:31.792720 7f53c06a2d80 -1 librbd: error listing
>> > > image in directory: (5) Input/output error
>> > > 2016-06-06 10:41:31.792749 7f53c06a2d80 -1 librbd: error listing v2
>> > > images: (5) Input/output error
>> > >
>> > > [root@ceph-glb-fec-01 ceph]# rbd ls glebe-ssd
>> > > rbd: list: (5) Input/output error
>> > > 2016-06-06 10:41:33.956648 7f90de663d80 -1 librbd: error listing
>> > > image in directory: (5) Input/output error
>> > > 2016-06-06 10:41:33.956672 7f90de663d80 -1 librbd: error listing v2
>> > > images: (5) Input/output error
>> > >
>> > > [root@ceph-glb-fec-02 ~]# rbd showmapped
>> > > id pool   image snap device
>> > > 0  glebe-sata test02-/dev/rbd0
>> > > 1  glebe-ssd  zfstest   -/dev/rbd1
>> > > 10 glebe-sata hypervtst-lun00   -/dev/rbd10
>> > > 11 glebe-sata hypervtst-lun02   -/dev/rbd11
>> > > 12 glebe-sata hypervtst-lun03   -/dev/rbd12
>> > > 13 glebe-ssd  nspprd01_lun00-/dev/rbd13
>> > > 14 glebe-sata cirrux-nfs01  -/dev/rbd14
>> > > 15 glebe-sata hypervtst-lun04   -/dev/rbd15
>> > > 16 glebe-sata hypervtst-lun05   -/dev/rbd16
>> > > 17 glebe-sata pvtcloud-nfs01-/dev/rbd17
>> > > 18 glebe-sata cloud2sql-lun00   -/dev/rbd18
>> > > 19 glebe-sata cloud2sql-lun01   -/dev/rbd19
>> > > 2  glebe-sata radmast02-lun00   -/dev/rbd2
>> > > 20

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul

I have traced it back to an OSD giving this error:

2016-06-06 12:18:14.315573 7fd714679700 -1 osd.20 23623 class rbd open got (5) 
Input/output error
2016-06-06 12:19:49.835227 7fd714679700  0 _load_class could not open class 
/usr/lib64/rados-classes/libcls_rbd.so (dlopen failed): 
/usr/lib64/rados-classes/libcls_rbd.so: undefined symbol: 
_ZN4ceph6buffer4list8iteratorC1EPS1_j

Trying to figure out why that is the case.



Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Jason Dillaman
The rbd_directory object is empty -- all data is stored as omap
key/value pairs, which you can list via "rados listomapvals
rbd_directory". What is the output when you run "rbd ls --debug-ms=1
glebe-sata" and "rbd info --debug-ms=1 glebe-sata/hypervtst-lun04"? I
am interested in the lines that look like the following:

** rbd ls **

2016-06-05 22:22:54.816801 7f25d4e4d1c0  1 -- 127.0.0.1:0/2033136975
--> 127.0.0.1:6800/29402 -- osd_op(client.4111.0:2 0.30a98c1c
rbd_directory [call rbd.dir_list] snapc 0=[]
ack+read+known_if_redirected e7) v7 -- ?+0 0x5598b0459410 con
0x5598b04580d0

2016-06-05 22:22:54.817396 7f25b8207700  1 -- 127.0.0.1:0/2033136975
<== osd.0 127.0.0.1:6800/29402 2  osd_op_reply(2 rbd_directory
[call] v0'0 uv1 ondisk = 0) v7  133+0+27 (2231830616 0 2896097477)
0x7f258c000a20 con 0x5598b04580d0

** rbd info **

2016-06-05 22:25:54.534064 7fab3cff9700  1 -- 127.0.0.1:0/951637948
--> 127.0.0.1:6800/29402 -- osd_op(client.4112.0:2 0.6a181655
rbd_id.foo [call rbd.get_id] snapc 0=[] ack+read+known_if_redirected
e7) v7 -- ?+0 0x7fab180020a0 con 0x55e833b5e520

2016-06-05 22:25:54.534434 7fab4c589700  1 -- 127.0.0.1:0/951637948
<== osd.0 127.0.0.1:6800/29402 2  osd_op_reply(2 rbd_id.foo [call]
v0'0 uv2 ondisk = 0) v7  130+0+16 (2464064221 0 855464132)
0x7fab24000b40 con 0x55e833b5e520


I suspect you are having issues with executing OSD class methods for
some reason (like rbd.dir_list against rbd_directory and rbd.get_id
against rbd_id.).
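Jason's point -- a zero-byte rbd_directory is normal, because the index lives in omap key/value pairs rather than in the object's data -- can be illustrated with a toy model. This is pure Python with a dict standing in for the OSD's omap; the `name_`/`id_` key prefixes follow the RBD v2 convention as I understand it, so treat the exact key format as an assumption:

```python
# Toy model of the rbd_directory omap index. A plain dict stands in for
# the OSD-side omap; the object's *data* payload stays empty throughout.
def dir_add_image(omap, name, image_id):
    omap["name_" + name] = image_id   # forward lookup: image name -> id
    omap["id_" + image_id] = name     # reverse lookup: id -> name

def dir_list(omap):
    # Roughly what the rbd.dir_list class method does server-side:
    # enumerate the name_* omap keys instead of reading object data.
    return sorted(k[len("name_"):] for k in omap if k.startswith("name_"))

directory = {}
dir_add_image(directory, "hypervtst-lun04", "109eb01f5f89de")
dir_add_image(directory, "test02", "20d41a8b4567")
print(dir_list(directory))  # ['hypervtst-lun04', 'test02']
```

This is why `rados get` on rbd_directory returns an empty file while `rados listomapvals rbd_directory` still shows the images.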

On Sun, Jun 5, 2016 at 9:16 PM, Adrian Saul
 wrote:
>
> Seems like my rbd_directory is empty for some reason:
>
> [root@ceph-glb-fec-02 ceph]# rados get -p glebe-sata rbd_directory /tmp/dir
> [root@ceph-glb-fec-02 ceph]# strings /tmp/dir
> [root@ceph-glb-fec-02 ceph]# ls -la /tmp/dir
> -rw-r--r--. 1 root root 0 Jun  6 11:12 /tmp/dir
>
> [root@ceph-glb-fec-02 ceph]# rados stat -p glebe-sata rbd_directory
> glebe-sata/rbd_directory mtime 2016-06-06 10:18:28.00, size 0
>
>
>

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul

Seems like my rbd_directory is empty for some reason:

[root@ceph-glb-fec-02 ceph]# rados get -p glebe-sata rbd_directory /tmp/dir
[root@ceph-glb-fec-02 ceph]# strings /tmp/dir
[root@ceph-glb-fec-02 ceph]# ls -la /tmp/dir
-rw-r--r--. 1 root root 0 Jun  6 11:12 /tmp/dir

[root@ceph-glb-fec-02 ceph]# rados stat -p glebe-sata rbd_directory
glebe-sata/rbd_directory mtime 2016-06-06 10:18:28.00, size 0




Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul

No - it throws a usage error - if I add a file argument after it works:

[root@ceph-glb-fec-02 ceph]# rados -p glebe-sata get rbd_id.hypervtst-lun04 
/tmp/crap
[root@ceph-glb-fec-02 ceph]# cat /tmp/crap
109eb01f5f89de

stat works:

[root@ceph-glb-fec-02 ceph]# rados -p glebe-sata stat rbd_id.hypervtst-lun04
glebe-sata/rbd_id.hypervtst-lun04 mtime 2016-06-06 10:55:08.00, size 18


I can do a rados ls:

[root@ceph-glb-fec-02 ceph]# rados ls -p glebe-sata|grep rbd_id
rbd_id.cloud2sql-lun01
rbd_id.glbcluster3-vm17
rbd_id.holder   <<<  a create that said it failed while I was debugging this
rbd_id.pvtcloud-nfs01
rbd_id.hypervtst-lun05
rbd_id.test02
rbd_id.cloud2sql-lun02
rbd_id.fiotest2
rbd_id.radmast02-lun04
rbd_id.hypervtst-lun04
rbd_id.cloud2fs-lun00
rbd_id.radmast02-lun03
rbd_id.hypervtst-lun00
rbd_id.cloud2sql-lun00
rbd_id.radmast02-lun02


> -Original Message-
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: Monday, 6 June 2016 11:00 AM
> To: Adrian Saul
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
>
> Are you able to run the following command successfully?
>
> rados -p glebe-sata get rbd_id.hypervtst-lun04

Re: [ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Jason Dillaman
Are you able to run the following command successfully?

rados -p glebe-sata get rbd_id.hypervtst-lun04






-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Jewel upgrade - rbd errors after upgrade

2016-06-05 Thread Adrian Saul

I upgraded my Infernalis semi-production cluster to Jewel on Friday. The
upgrade went through smoothly (aside from a time-wasting restorecon
/var/lib/ceph in the SELinux package upgrade) and the services continued
running without interruption. However, this morning when I went to create some
new RBD images I found I was unable to do much at all with RBD.

Just about any rbd command fails with an I/O error.   I can run showmapped but 
that is about it - anything like an ls, info or status fails.  This applies to 
all my pools.

I can see no errors in any log files that suggest an issue. I have also
tried the commands on other cluster members that have not done anything
with RBD before (I was wondering if perhaps the kernel RBD was pinning the old
library version open or something) but the same error occurs.

Where can I start trying to resolve this?

Cheers,
 Adrian


[root@ceph-glb-fec-01 ceph]# rbd ls glebe-sata
rbd: list: (5) Input/output error
2016-06-06 10:41:31.792720 7f53c06a2d80 -1 librbd: error listing image in 
directory: (5) Input/output error
2016-06-06 10:41:31.792749 7f53c06a2d80 -1 librbd: error listing v2 images: (5) 
Input/output error

[root@ceph-glb-fec-01 ceph]# rbd ls glebe-ssd
rbd: list: (5) Input/output error
2016-06-06 10:41:33.956648 7f90de663d80 -1 librbd: error listing image in 
directory: (5) Input/output error
2016-06-06 10:41:33.956672 7f90de663d80 -1 librbd: error listing v2 images: (5) 
Input/output error

[root@ceph-glb-fec-02 ~]# rbd showmapped
id pool   image snap device
0  glebe-sata test02-/dev/rbd0
1  glebe-ssd  zfstest   -/dev/rbd1
10 glebe-sata hypervtst-lun00   -/dev/rbd10
11 glebe-sata hypervtst-lun02   -/dev/rbd11
12 glebe-sata hypervtst-lun03   -/dev/rbd12
13 glebe-ssd  nspprd01_lun00-/dev/rbd13
14 glebe-sata cirrux-nfs01  -/dev/rbd14
15 glebe-sata hypervtst-lun04   -/dev/rbd15
16 glebe-sata hypervtst-lun05   -/dev/rbd16
17 glebe-sata pvtcloud-nfs01-/dev/rbd17
18 glebe-sata cloud2sql-lun00   -/dev/rbd18
19 glebe-sata cloud2sql-lun01   -/dev/rbd19
2  glebe-sata radmast02-lun00   -/dev/rbd2
20 glebe-sata cloud2sql-lun02   -/dev/rbd20
21 glebe-sata cloud2fs-lun00-/dev/rbd21
22 glebe-sata cloud2fs-lun01-/dev/rbd22
3  glebe-sata radmast02-lun01   -/dev/rbd3
4  glebe-sata radmast02-lun02   -/dev/rbd4
5  glebe-sata radmast02-lun03   -/dev/rbd5
6  glebe-sata radmast02-lun04   -/dev/rbd6
7  glebe-ssd  sybase_iquser02_lun00 -/dev/rbd7
8  glebe-ssd  sybase_iquser03_lun00 -/dev/rbd8
9  glebe-ssd  sybase_iquser04_lun00 -/dev/rbd9

[root@ceph-glb-fec-02 ~]# rbd status glebe-sata/hypervtst-lun04
2016-06-06 10:47:30.221453 7fc0030dc700 -1 librbd::image::OpenRequest: failed 
to retrieve image id: (5) Input/output error
2016-06-06 10:47:30.221556 7fc0028db700 -1 librbd::ImageState: failed to open 
image: (5) Input/output error
rbd: error opening image hypervtst-lun04: (5) Input/output error
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best upgrade strategy

2016-06-05 Thread Adam Tygart
If your monitor nodes are separate from the osd nodes, I'd get ceph
upgraded to the latest point release of your current line (0.94.7).
Upgrade monitors, then osds, then other dependent services (mds, rgw,
qemu).
Once everything is happy again, I'd run OS and ceph upgrades together,
starting with monitors, then osds, and (again) dependent services.
Keep in mind that you'll want to chown all of the Ceph data in there
while you're doing this (per the upgrade notes).

If they're combined, I'd probably upgrade ceph, then the OS. First
from 0.94.5 to 0.94.7, then to Jewel, then I'd upgrade the OS version.
Standard order still applies, monitors->osds->dependent services.
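A hedged sketch of that ordering in Python (the hostnames and role lists below are made up, and "qemu" just stands in for client-side upgrades):

```python
# Encode the upgrade order described above:
# monitors first, then OSDs, then dependent services.
UPGRADE_ORDER = ["mon", "osd", "mds", "rgw", "qemu"]

def upgrade_plan(daemons_by_host):
    """Return hosts in the order their daemons should be upgraded."""
    steps = []
    for role in UPGRADE_ORDER:
        for host in sorted(daemons_by_host):
            if role in daemons_by_host[host] and host not in steps:
                steps.append(host)
    return steps

print(upgrade_plan({"mon1": ["mon"], "osd1": ["osd"], "gw1": ["rgw"]}))
# ['mon1', 'osd1', 'gw1']
```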
--
Adam

On Sun, Jun 5, 2016 at 6:47 PM, Sebastian Köhler  wrote:
> Hi,
>
> we are running a cluster with 6 storage nodes(72 osds) and 3 monitors.
> The osds and and monitors are running on Ubuntu 14.04 and with ceph 0.94.5.
> We want to upgrade the cluster to Jewel and at the same time the OS to
> Ubuntu 16.04. What would be the best way to this? First to upgrade the
> OS and then ceph to 0.94.7 followed by 10.2.1. Or should we first
> upgrade Ceph and then Ubuntu? Or maybe doing it all at once?
>
> Regards
> Sebastian
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Best upgrade strategy

2016-06-05 Thread Sebastian Köhler
Hi,

we are running a cluster with 6 storage nodes (72 OSDs) and 3 monitors.
The OSDs and monitors are running on Ubuntu 14.04 with Ceph 0.94.5.
We want to upgrade the cluster to Jewel and at the same time the OS to
Ubuntu 16.04. What would be the best way to do this? First upgrade the
OS and then Ceph to 0.94.7 followed by 10.2.1? Or should we first
upgrade Ceph and then Ubuntu? Or maybe do it all at once?

Regards
Sebastian



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Disaster recovery and backups

2016-06-05 Thread Gandalf Corvotempesta
Let's assume that everything went very, very bad and I have to manually
recover a cluster with an unconfigured Ceph.

1. How can I recover data directly from the raw disks? Is this possible?
2. How can I restore a Ceph cluster (and get the data back) by using the
existing disks?
3. How do you manage backups for Ceph, in huge clusters?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados complexity

2016-06-05 Thread Mykola Dvornik
Ok, seems like my problem could be CephFS-related. I have 16 CephFS
clients that do heavy, sub-optimal writes simultaneously. The cluster
has no problems handling the load up until circa 2 kobjects.
Above this threshold the OSDs start to go down randomly and eventually
get killed by Ceph's watchdog mechanism. The funny thing is that the
CPU and HDDs are not really overloaded during these events. So I am
really puzzled at the moment.
-Mykola
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] no osds in jewel

2016-06-05 Thread Jaemyoun Lee
Hi,

When I run the script below to install Ceph (10.2.0), I get a "no osds" error.
Hammer was installed successfully by the same script, so I think I am missing
something new that was released since Hammer.

Do you know what I am missing?

--- The script ---
#!/bin/sh

set -x

ceph-deploy new csElsa
echo "osd pool default size = 1" >> ceph.conf
ceph-deploy install csElsa csAnt csBull csCat
ceph-deploy mon create-initial
ceph-deploy mon create csElsa
ceph-deploy gatherkeys csElsa
ceph-deploy disk zap csAnt:sda
ceph-deploy disk zap csBull:sda
ceph-deploy disk zap csCat:sda
ceph-deploy osd create csAnt:sda csBull:sda csCat:sda
ceph-deploy admin csElsa csElsa csAnt csBull csCat
sudo chmod +r /etc/ceph/ceph.client.admin.keyring
ceph health
--- end ---

--- The result of "ceph -w" ---
# I blocked the IP
jae@csElsa:~/git/ceph$ ceph -w

cluster 8b2816e9-1953-4157-aaf7-95e9e668fe46
 health HEALTH_ERR
64 pgs are stuck inactive for more than 300 seconds
64 pgs stuck inactive
no osds
 monmap e1: 1 mons at {csElsa=1xx.1xx.2xx.1:6789/0}
election epoch 3, quorum 0 csElsa
 osdmap e1: 0 osds: 0 up, 0 in
flags sortbitwise
  pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
0 kB used, 0 kB / 0 kB avail
  64 creating

2016-06-06 01:59:08.054985 mon.0 [INF] from='client.?
1xx.1xx.2xx.1:0/115687' entity='client.admin' cmd='[{"prefix": "auth
get-or-create", "entity": "client.bootstrap-mds", "caps": ["mon", "allow
profile bootstrap-mds"]}]': finished
--- end ---

Best regards,
Jae

-- 
  Jaemyoun Lee

  CPS Lab. (Cyber-Physical Systems Laboratory in Hanyang University)
  E-mail : jaemy...@hanyang.ac.kr
  Website : http://cpslab.hanyang.ac.kr
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados complexity

2016-06-05 Thread Sven Höper
We've got a simple cluster with 45 OSDs, are above 5 kobjects, and have not
had any issues so far. Our cluster mainly serves some rados pools for an
application which usually writes data once and reads it multiple times.

- Sven

Am Sonntag, den 05.06.2016, 18:47 +0200 schrieb Mykola Dvornik:
> Are there any ceph users with pools containing >2 kobjects?
> 
> If so, have you noticed any instabilities of the clusters once this threshold
> is reached?
> 
> -Mykola


[ceph-users] rados complexity

2016-06-05 Thread Mykola Dvornik
Are there any ceph users with pools containing >2 kobjects?

If so, have you noticed any instabilities of the clusters once this
threshold is reached?

-Mykola


[ceph-users] RGW AWS4 SignatureDoesNotMatch when requests with port != 80 or != 443

2016-06-05 Thread Khang Nguyễn Nhật
Hi!
I get the error "SignatureDoesNotMatch" when I use a presigned URL with an
endpoint port != 80 and != 443. For example, if I use
host http://192.168.1.1: then this is what I have in the RGW log:
//
RGWEnv::set(): HTTP_HOST: 192.168.1.1:
//
RGWEnv::set(): SERVER_PORT: 
//
HTTP_HOST=192.168.1.1:
//
SERVER_PORT=
//
host=192.168.1.1
//
canonical headers format = host:192.168.1.1::
//
canonical request = GET
/
X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=%2F20160605%2Fap%2Fs3%2Faws4_request&X-Amz-Date=20160605T125927Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host
host:192.168.1.1::

host
UNSIGNED-PAYLOAD
//
- Verifying signatures
//
failed to authorize request
//


I see this in src/rgw/rgw_rest_s3.cc:

int RGW_Auth_S3::authorize_v4() {
  //
  string port = s->info.env->get("SERVER_PORT", "");
  string secure_port = s->info.env->get("SERVER_PORT_SECURE", "");
  //
  if (using_qs && (token == "host")) {
    if (!port.empty() && port != "80") {
      token_value = token_value + ":" + port;
    } else if (!secure_port.empty() && secure_port != "443") {
      token_value = token_value + ":" + secure_port;
    }
  }
}
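Read as plain logic, the port handling above can be mirrored in a few lines.
This is a sketch of the quoted C++ in Python (the function name
canonical_host is mine, not from the Ceph source), useful for checking what
"host" value each side ends up signing:

```python
def canonical_host(host, port="", secure_port=""):
    """Sketch of the port handling in the quoted authorize_v4() code:
    the port is appended to the signed 'host' value only when it is a
    non-default port (not 80 for HTTP, not 443 for HTTPS)."""
    value = host
    if port and port != "80":
        value = value + ":" + port
    elif secure_port and secure_port != "443":
        value = value + ":" + secure_port
    return value

print(canonical_host("192.168.1.1", port="8080"))  # non-default port kept
print(canonical_host("192.168.1.1", port="80"))    # default port dropped
```

Both the client and RGW must arrive at the same value here; the doubled colon
in the logged canonical header (host:192.168.1.1::) suggests the two sides
disagree on it.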

Is this caused by a mistake on my side? Can somebody please help me out?
Thanks!
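For completeness: the port is only one input. The credential scope in the URL
above (20160605/ap/s3/aws4_request) also has to match on both sides, because
date, region, and service are folded into the signing key. A minimal sketch of
the standard SigV4 key derivation (the secret below is made up):

```python
import hashlib
import hmac

def sigv4_signing_key(secret, datestamp, region, service):
    """Standard AWS Signature Version 4 key derivation: each scope
    component is chained into the HMAC key, so any mismatch in date,
    region, or service yields a completely different signature."""
    def sign(key, msg):
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()
    k_date = sign(("AWS4" + secret).encode("utf-8"), datestamp)
    k_region = sign(k_date, region)
    k_service = sign(k_region, service)
    return sign(k_service, "aws4_request")

# Same inputs give the same key; a different region gives a different key.
key_ap = sigv4_signing_key("dummy-secret", "20160605", "ap", "s3")
key_empty = sigv4_signing_key("dummy-secret", "20160605", "", "s3")
print(key_ap != key_empty)  # True
```

This is why a client signing with an empty region while the gateway expects
"ap" (or vice versa) produces SignatureDoesNotMatch even when everything else
agrees.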


Re: [ceph-users] 403 AccessDenied with presigned url in Jewel AWS4.

2016-06-05 Thread Khang Nguyễn Nhật
Thank Robin H. Johnson!

I've set "debug rgw = 20" in the RGW config file and I can see "NOTICE: now =
1464998270, now_req = 1464973070, exp = 3600" in the RGW log file. It turns
out that now is the local time on the RGW server (my timezone is UTC+7) while
now_req is UTC time. This leads to an error in src/rgw/rgw_rest_s3.cc:
int RGW_Auth_S3::authorize_v4(..){
//
  if (now >= now_req + exp) {
dout(10) << "NOTICE: now = " << now << ", now_req = " << now_req <<
", exp = " << exp << dendl;
return -EPERM;
  }
//
Then I set the time on the RGW server to UTC and it works fine!
Is this a bug?
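The 25200-second gap between now and now_req in the log is exactly the UTC+7
offset, which supports the timezone theory. As a sketch (names are mine, not
from the Ceph source): X-Amz-Date is defined as UTC, so it must be parsed with
a UTC-aware function such as calendar.timegm(), not mktime(), which assumes
local time:

```python
import calendar
import time

def req_epoch_utc(amz_date):
    """Parse an X-Amz-Date value (always UTC, e.g. '20160605T125927Z')
    into a Unix epoch; timegm() interprets the struct_time as UTC
    regardless of the server's local timezone."""
    return calendar.timegm(time.strptime(amz_date, "%Y%m%dT%H%M%SZ"))

def is_expired(amz_date, expires_secs, now=None):
    """Expiry check comparing two UTC epochs, mirroring the quoted
    'now >= now_req + exp' test but with a timezone-safe now_req."""
    now = time.time() if now is None else now
    return now >= req_epoch_utc(amz_date) + expires_secs

# The gap reported in the log is exactly the UTC+7 offset:
print((1464998270 - 1464973070) == 7 * 3600)  # True
```

With this parsing, the check is independent of the server's timezone, so
setting the server clock to UTC should no longer be necessary.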

2016-06-03 11:44 GMT+07:00 Robin H. Johnson :

> On Fri, Jun 03, 2016 at 11:34:35AM +0700, Khang Nguyễn Nhật wrote:
> > s3 = boto3.client(service_name='s3', region_name='', use_ssl=False,
> > endpoint_url='http://192.168.1.10:', aws_access_key_id=access_key,
> >   aws_secret_access_key= secret_key,
> >   config=Config(signature_version='s3v4',
> region_name=''))
> The region part doesn't seem right. Try setting it to 'ap' or
> 'ap-southeast'.
>
> Failing that, turn up the RGW loglevel to 20, and run a request, then
> look at the logs of how it created the signature, and manually compare
> them to what your client should have built (with boto in verbose
> debugging).
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
> E-Mail   : robb...@gentoo.org
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
>


Re: [ceph-users] Older Ceph packages for Ubuntu 12.04 (Precise Pangolin) to recompile libvirt with RBD support

2016-06-05 Thread Cloud List
Hi,

Anyone can assist on this?

Looking forward to your reply, thank you.

Cheers.


On Fri, Jun 3, 2016 at 11:56 AM, Cloud List  wrote:

> Dear all,
>
> I am trying to setup older version of CloudStack 4.2.0 on Ubuntu 12.04 to
> use Ceph RBD as primary storage for our upgrade testing purposes. Two of
> the steps involved were to add below repository to manually compile libvirt
> to have RBD support, since the default libvirt on Ubuntu 12.04 doesn't have
> RBD support by default, unlike on Ubuntu 14.04:
>
> 
> # wget -q -O- '
> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo
> apt-key add -
> OK
>
> # echo deb http://eu.ceph.com/debian-cuttlefish/ $(lsb_release -sc) main
> | sudo tee /etc/apt/sources.list.d/ceph.list
> deb http://eu.ceph.com/debian-cuttlefish/ precise main
> 
>
> But when I ran sudo apt-get update, I am receiving this error (excerpts):
>
> 
> Err http://eu.ceph.com precise/main amd64 Packages
>   404  Not Found
> Err http://eu.ceph.com precise/main i386 Packages
>   404  Not Found
>
> W: Failed to fetch
> http://eu.ceph.com/debian-cuttlefish/dists/precise/main/binary-amd64/Packages
> 404  Not Found
>
> W: Failed to fetch
> http://eu.ceph.com/debian-cuttlefish/dists/precise/main/binary-i386/Packages
> 404  Not Found
>
> E: Some index files failed to download. They have been ignored, or old
> ones used instead.
> 
>
> It seems that the repository for the particular required packages has been
> removed, anyone can advise if I can get the required packages, may be from
> a different location?
>
> Any help is greatly appreciated.
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
> -ip-
>
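The two failing URLs follow mechanically from the deb line: apt expands the
base URL, dist, component, and architecture into
dists/<dist>/<component>/binary-<arch>/Packages. A small sketch of that
expansion (function name is assumed), matching the 404s above:

```python
def apt_packages_urls(base, dist, components, arches):
    """Build the index URLs 'apt-get update' will fetch for a
    'deb <base> <dist> <components>' sources.list line."""
    base = base.rstrip("/")
    return [
        f"{base}/dists/{dist}/{comp}/binary-{arch}/Packages"
        for comp in components
        for arch in arches
    ]

for url in apt_packages_urls("http://eu.ceph.com/debian-cuttlefish/",
                             "precise", ["main"], ["amd64", "i386"]):
    print(url)
```

Since those are exactly the URLs returning 404, the dists/precise tree is
simply gone from that mirror; the fix is finding a mirror or archive that
still carries the cuttlefish precise packages, not anything on the client
side.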