Re: [ceph-users] trouble starting second monitor

2014-12-01 Thread Irek Fasikhov
[celtic][DEBUG ] create the mon path if it does not exist

mkdir /var/lib/ceph/mon/
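
For example, a minimal sketch on the new mon host (assuming the default cluster
name "ceph" and the mon id "celtic" from the log below, and sudo as used by
ceph-deploy; adjust to your setup):

ssh celtic sudo mkdir -p /var/lib/ceph/mon/ceph-celtic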

2014-12-01 4:32 GMT+03:00 K Richard Pixley r...@noir.com:

 What does this mean, please?

 --rich

 ceph@adriatic:~/my-cluster$ ceph status
 cluster 1023db58-982f-4b78-b507-481233747b13
  health HEALTH_OK
  monmap e1: 1 mons at {black=192.168.1.77:6789/0}, election epoch 2,
 quorum 0 black
  mdsmap e7: 1/1/1 up {0=adriatic=up:active}, 3 up:standby
  osdmap e17: 4 osds: 4 up, 4 in
   pgmap v48: 192 pgs, 3 pools, 1884 bytes data, 20 objects
 29134 MB used, 113 GB / 149 GB avail
  192 active+clean
 ceph@adriatic:~/my-cluster$ ceph-deploy mon create celtic
 [ceph_deploy.conf][DEBUG ] found configuration file at:
 /home/ceph/.cephdeploy.conf
 [ceph_deploy.cli][INFO  ] Invoked (1.5.20): /usr/bin/ceph-deploy mon
 create celtic
 [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts celtic
 [ceph_deploy.mon][DEBUG ] detecting platform for host celtic ...
 [celtic][DEBUG ] connection detected need for sudo
 [celtic][DEBUG ] connected to host: celtic
 [celtic][DEBUG ] detect platform information from remote host
 [celtic][DEBUG ] detect machine type
 [ceph_deploy.mon][INFO  ] distro info: Ubuntu 14.04 trusty
 [celtic][DEBUG ] determining if provided host has same hostname in remote
 [celtic][DEBUG ] get remote short hostname
 [celtic][DEBUG ] deploying mon to celtic
 [celtic][DEBUG ] get remote short hostname
 [celtic][DEBUG ] remote hostname: celtic
 [celtic][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
 [celtic][DEBUG ] create the mon path if it does not exist
 [celtic][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-celtic/done
 [celtic][DEBUG ] create a done file to avoid re-doing the mon deployment
 [celtic][DEBUG ] create the init path if it does not exist
 [celtic][DEBUG ] locating the `service` executable...
 [celtic][INFO  ] Running command: sudo initctl emit ceph-mon cluster=ceph
 id=celtic
 [celtic][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
 /var/run/ceph/ceph-mon.celtic.asok mon_status
 [celtic][ERROR ] admin_socket: exception getting command descriptions:
 [Errno 2] No such file or directory
 [celtic][WARNIN] monitor: mon.celtic, might not be running yet
 [celtic][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
 /var/run/ceph/ceph-mon.celtic.asok mon_status
 [celtic][ERROR ] admin_socket: exception getting command descriptions:
 [Errno 2] No such file or directory
 [celtic][WARNIN] celtic is not defined in `mon initial members`
 [celtic][WARNIN] monitor celtic does not exist in monmap
 [celtic][WARNIN] neither `public_addr` nor `public_network` keys are
 defined for monitors
 [celtic][WARNIN] monitors may not be able to form quorum

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
Best regards, Irek Fasikhov
Mobile: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Giant + nfs over cephfs hang tasks

2014-12-01 Thread Ilya Dryomov
On Mon, Dec 1, 2014 at 12:30 AM, Andrei Mikhailovsky and...@arhont.com wrote:

 Ilya, further to your email I have switched back to the 3.18 kernel that
 you've sent and I got similar looking dmesg output as I had on the 3.17
 kernel. Please find it attached for your reference. As before, this is the
 command I've ran on the client:


 time dd if=/dev/zero of=4G00 bs=4M count=5K oflag=direct  time dd
 if=/dev/zero of=4G11 bs=4M count=5K oflag=direct time dd if=/dev/zero
 of=4G22 bs=4M count=5K oflag=direct time dd if=/dev/zero of=4G33 bs=4M
 count=5K oflag=direct  time dd if=/dev/zero of=4G44 bs=4M count=5K
 oflag=direct  time dd if=/dev/zero of=4G55 bs=4M count=5K oflag=direct
 time dd if=/dev/zero of=4G66 bs=4M count=5K oflag=direct time dd
 if=/dev/zero of=4G77 bs=4M count=5K oflag=direct 

Can you run that command again - on 3.18 kernel, to completion - and
paste

- the entire dmesg
- time results for each dd

?

Compare those to your results with four dds (or any other number which
doesn't trigger page allocation failures).
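
(For reference, a minimal sketch of one way to capture both, assuming bash; the
dd parameters are taken from the quoted command, whose "&"/"&&" separators appear
to have been stripped by the list archive, so concurrent execution is assumed
here:)

for i in 0 1 2 3 4 5 6 7; do
  # per-dd timing and dd's own transfer stats go to dd-$i.log
  ( time dd if=/dev/zero of=4G$i$i bs=4M count=5K oflag=direct ) > dd-$i.log 2>&1 &
done
wait
dmesg > dmesg-after-8dd.txt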

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fastest way to shrink/rewrite rbd image ?

2014-12-01 Thread Alexandre DERUMIER
I think if you enable TRIM support on your RBD, then run fstrim on your 
filesystems inside the guest (assuming ext4 / XFS guest filesystem), 
Ceph should reclaim the trimmed space. 

Yes, it's working fine.

(you need to use virtio-scsi and enable discard option)
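
A minimal sketch of what that can look like on the QEMU command line; the pool
and image names are placeholders, and the exact option set depends on your QEMU
version, so treat this as an illustration rather than a definitive recipe:

qemu-system-x86_64 ... \
  -device virtio-scsi-pci,id=scsi0 \
  -drive file=rbd:rbd/myimage,format=raw,if=none,id=drive0,cache=writeback,discard=unmap \
  -device scsi-hd,bus=scsi0.0,drive=drive0

# then, inside the guest, after deleting data on an ext4/XFS filesystem:
fstrim -v /mountpoint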


- Original Message - 

From: Daniel Swarbrick daniel.swarbr...@profitbricks.com 
To: ceph-users@lists.ceph.com 
Sent: Friday, 28 November 2014 17:16:14 
Subject: Re: [ceph-users] Fastest way to shrink/rewrite rbd image ? 

Take a look at 
http://ceph.com/docs/master/rbd/qemu-rbd/#enabling-discard-trim 

I think if you enable TRIM support on your RBD, then run fstrim on your 
filesystems inside the guest (assuming ext4 / XFS guest filesystem), 
Ceph should reclaim the trimmed space. 

On 28/11/14 17:05, Christoph Adomeit wrote: 
 Hi, 
 
 I would like to shrink a thin provisioned rbd image which has grown to 
 maximum. 
 90% of the data in the image is deleted data which is still hidden in the 
 image and marked as deleted. 
 
 So I think I can fill the whole Image with zeroes and then qemu-img convert 
 it. 
 So the newly created image should be only 10% of the maximum size. 
 
 I will do something like 
 qemu-img convert -O raw rbd:pool/origimage rbd:pool/smallimage 
 rbd rename origimage origimage-saved 
 rbd rename smallimage origimage 
 
 Would this be the best and fastest way or are there other ways to do this ? 
 
 Thanks 
 Christoph 
 
 
 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Removing Snapshots Killing Cluster Performance

2014-12-01 Thread Daniel Schneller

Hi!

We take regular (nightly) snapshots of our Rados Gateway Pools for
backup purposes. This allows us - with some manual pokery - to restore
clients' documents should they delete them accidentally.

The cluster is a 4 server setup with 12x4TB spinning disks each,
totaling about 175TB. We are running firefly.

We have now completed our first month of snapshots and want to remove
the oldest ones. Unfortunately doing so practically kills everything
else that is using the cluster, because performance drops to almost zero
while the OSDs work their disks 100% (as per iostat). It seems this is
the same phenomenon I asked about some time ago where we were deleting
whole pools.

I could not find any way to throttle the background deletion activity
(the command returns almost immediately). Here is a graph of the I/O
operations waiting (colored by device) while deleting a few snapshots.
Each of the blocks in the graph shows one snapshot being removed. The
big one in the middle was a snapshot of the .rgw.buckets pool. It took
about 15 minutes during which basically nothing relying on the cluster
was working due to immense slowdowns. This included users getting 
kicked off their SSH sessions due to timeouts.


https://public.centerdevice.de/8c95f1c2-a7c3-457f-83b6-834688e0d048

While this is a big issue in itself for us, we would at least try to
estimate how long the process will take per snapshot / per pool. I
assume the time needed is a function of the number of objects that were
modified between two snapshots. We tried to get an idea of at least how
many objects were added/removed in total by running `rados df` with a
snapshot specified as a parameter, but it seems we still always get the
current values:

$ sudo rados -p .rgw df --snap backup-20141109
selected snap 13 'backup-20141109'
pool name       category                 KB      objects
.rgw            -                     276165      1368545

$ sudo rados -p .rgw df --snap backup-20141124
selected snap 28 'backup-20141124'
pool name       category                 KB      objects
.rgw            -                     276165      1368546

$ sudo rados -p .rgw df
pool name       category                 KB      objects
.rgw            -                     276165      1368547

So there are a few questions:

1) Is there any way to control how much such an operation will
tax the cluster (we would be happy to have it run longer, if that meant
not utilizing all disks fully during that time)?

2) Is there a way to get a decent approximation of how much work
deleting a specific snapshot will entail (in terms of objects, time,
whatever)?

3) Would SSD journals help here? Or any other hardware configuration
change for that matter?

4) Any other recommendations? We definitely need to remove the data,
not because of a lack of space (at least not at the moment), but because
when customers delete stuff / cancel accounts, we are obliged to remove
their data at least after a reasonable amount of time.

Cheers,
Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing Snapshots Killing Cluster Performance

2014-12-01 Thread Dan Van Der Ster
Hi,
Which version of Ceph are you using? This could be related: 
http://tracker.ceph.com/issues/9487
See "ReplicatedPG: don't move on to the next snap immediately"; basically, the 
OSD is getting into a tight loop trimming the snapshot objects. The fix above 
breaks out of that loop more frequently, and then you can use the osd snap trim 
sleep option to throttle it further. I’m not sure if the fix above will be 
sufficient if you have many objects to remove per snapshot.
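
(For illustration, a hedged sketch of how that throttle can be applied once you
are on a release containing the fix; the 0.05 value is an arbitrary example, not
a recommendation from this thread:)

# inject at runtime on all OSDs (or set "osd snap trim sleep" in the [osd] section of ceph.conf)
ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'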

That commit is only in giant at the moment. The backport to dumpling is in the 
dumpling branch but not yet in a release, and firefly is still pending.
Cheers, Dan


 On 01 Dec 2014, at 10:51, Daniel Schneller 
 daniel.schnel...@centerdevice.com wrote:
 
 Hi!
 
 We take regular (nightly) snapshots of our Rados Gateway Pools for
 backup purposes. This allows us - with some manual pokery - to restore
 clients' documents should they delete them accidentally.
 
 The cluster is a 4 server setup with 12x4TB spinning disks each,
 totaling about 175TB. We are running firefly.
 
 We have now completed our first month of snapshots and want to remove
 the oldest ones. Unfortunately doing so practically kills everything
 else that is using the cluster, because performance drops to almost zero
 while the OSDs work their disks 100% (as per iostat). It seems this is
 the same phenomenon I asked about some time ago where we were deleting
 whole pools.
 
 I could not find any way to throttle the background deletion activity
 (the command returns almost immediately). Here is a graph the I/O
 operations waiting (colored by device) while deleting a few snapshots.
 Each of the blocks in the graph show one snapshot being removed. The
 big one in the middle was a snapshot of the .rgw.buckets pool. It took
 about 15 minutes during which basically nothing relying on the cluster
 was working due to immense slowdowns. This included users getting 
 kicked off their SSH sessions due to timeouts.
 
 https://public.centerdevice.de/8c95f1c2-a7c3-457f-83b6-834688e0d048
 
 While this is a big issue in itself for us, we would at least try to
 estimate how long the process will take per snapshot / per pool. I
 assume the time needed is a function of the number of objects that were
 modified between two snapshots. We tried to get an idea of at least how
 many objects were added/removed in total by running `rados df` with a
 snapshot specified as a parameter, but it seems we still always get the
 current values:
 
 $ sudo rados -p .rgw df --snap backup-20141109
 selected snap 13 'backup-20141109'
 pool name   category KB  objects
 .rgw- 276165  1368545
 
 $ sudo rados -p .rgw df --snap backup-20141124
 selected snap 28 'backup-20141124'
 pool name   category KB  objects
 .rgw- 276165  1368546
 
 $ sudo rados -p .rgw df
 pool name   category KB  objects
 .rgw- 276165  1368547
 
 So there are a few questions:
 
 1) Is there any way to control how much such an operation will
 tax the cluster (we would be happy to have it run longer, if that meant
 not utilizing all disks fully during that time)?
 
 2) Is there a way to get a decent approximation of how much work
 deleting a specific snapshot will entail (in terms of objects, time,
 whatever)?
 
 3) Would SSD journals help here? Or any other hardware configuration
 change for that matter?
 
 4) Any other recommendations? We definitely need to remove the data,
 not because of a lack of space (at least not at the moment), but because
 when customers delete stuff / cancel accounts, we are obliged to remove
 their data at least after a reasonable amount of time.
 
 Cheers,
 Daniel
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Compile from source with Kinetic support

2014-12-01 Thread Julien Lutran
I'm sorry, but the compilation still fails after including the cpp-client 
headers:



  CXX  os/libos_la-KeyValueDB.lo
os/KeyValueDB.cc: In static member function 'static KeyValueDB* 
KeyValueDB::create(CephContext*, const string&, const string&)':

os/KeyValueDB.cc:18:16: error: expected type-specifier before 'KineticStore'
 return new KineticStore(cct);
^
os/KeyValueDB.cc:18:16: error: expected ';' before 'KineticStore'
os/KeyValueDB.cc:18:32: error: 'KineticStore' was not declared in this scope
 return new KineticStore(cct);
^
os/KeyValueDB.cc: In static member function 'static int 
KeyValueDB::test_init(const string&, const string&)':

os/KeyValueDB.cc:36:12: error: 'KineticStore' has not been declared
 return KineticStore::_test_init(g_ceph_context);
^
  CXX  os/libos_la-KeyValueStore.lo
make[3]: *** [os/libos_la-KeyValueDB.lo] Error 1
make[3]: *** Waiting for unfinished jobs
In file included from os/KeyValueStore.cc:53:0:
os/KineticStore.h:13:29: fatal error: kinetic/kinetic.h: No such file or 
directory

 #include <kinetic/kinetic.h>
 ^
compilation terminated.
make[3]: *** [os/libos_la-KeyValueStore.lo] Error 1


-- Julien


On 11/28/2014 08:54 PM, Nigel Williams wrote:

On Sat, Nov 29, 2014 at 5:19 AM, Julien Lutran julien.lut...@ovh.net wrote:

Where can I find this kinetic devel package ?

I guess you want this (C++ kinetic client)? It has kinetic.h at least.

https://github.com/Seagate/kinetic-cpp-client
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] LevelDB support status is still experimental on Giant?

2014-12-01 Thread Satoru Funai
Hi guys,
I'm interested in using a key/value store as a backend for Ceph OSDs.
When firefly was released, LevelDB support was mentioned as experimental;
is it still the same status in the Giant release?
Regards,

Satoru Funai
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Giant + nfs over cephfs hang tasks

2014-12-01 Thread Andrei Mikhailovsky
Ilya, 

I will try doing that once again tonight, as this is a production cluster, and 
when the dds trigger that dmesg error the cluster's IO becomes very bad and I have 
to reboot the server to get things back on track. Most of my VMs start having 70-90% 
iowait until that server is rebooted. 

I've actually checked what you asked the last time I ran the test. 

When I do 4 dds concurrently nothing appears in the dmesg output. No messages 
at all. 

The kern.log file that I sent last time is what I got about a minute after 
I started 8 dds. I pasted the full output. The 8 dds did actually 
complete, but it took a rather long time. I was getting about 6MB/s per dd 
process compared to around 70MB/s per dd process when 4 dds were running. Do 
you still want me to run this or is the information I've provided enough? 

Cheers 

Andrei 

- Original Message -

 From: Ilya Dryomov ilya.dryo...@inktank.com
 To: Andrei Mikhailovsky and...@arhont.com
 Cc: ceph-users ceph-users@lists.ceph.com, Gregory Farnum
 g...@gregs42.com
 Sent: Monday, 1 December, 2014 8:22:08 AM
 Subject: Re: [ceph-users] Giant + nfs over cephfs hang tasks

 On Mon, Dec 1, 2014 at 12:30 AM, Andrei Mikhailovsky
 and...@arhont.com wrote:
 
  Ilya, further to your email I have switched back to the 3.18 kernel
  that
  you've sent and I got similar looking dmesg output as I had on the
  3.17
  kernel. Please find it attached for your reference. As before, this
  is the
  command I've ran on the client:
 
 
  time dd if=/dev/zero of=4G00 bs=4M count=5K oflag=direct  time dd
  if=/dev/zero of=4G11 bs=4M count=5K oflag=direct time dd
  if=/dev/zero
  of=4G22 bs=4M count=5K oflag=direct time dd if=/dev/zero of=4G33
  bs=4M
  count=5K oflag=direct  time dd if=/dev/zero of=4G44 bs=4M count=5K
  oflag=direct  time dd if=/dev/zero of=4G55 bs=4M count=5K
  oflag=direct
  time dd if=/dev/zero of=4G66 bs=4M count=5K oflag=direct time dd
  if=/dev/zero of=4G77 bs=4M count=5K oflag=direct 

 Can you run that command again - on 3.18 kernel, to completion - and
 paste

 - the entire dmesg
 - time results for each dd

 ?

 Compare those to your results with four dds (or any other number
 which
 doesn't trigger page allocation failures).

 Thanks,

 Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] large reads become 512 kbyte reads on qemu-kvm rbd

2014-12-01 Thread Ilya Dryomov
On Mon, Dec 1, 2014 at 1:09 PM, Dan Van Der Ster
daniel.vanders...@cern.ch wrote:
 Hi Ilya,

 On 28 Nov 2014, at 17:56, Ilya Dryomov ilya.dryo...@inktank.com wrote:

 On Fri, Nov 28, 2014 at 5:46 PM, Dan Van Der Ster
 daniel.vanders...@cern.ch wrote:
 Hi Andrei,
 Yes, I’m testing from within the guest.

 Here is an example. First, I do 2MB reads when the max_sectors_kb=512, and
 we see the reads are split into 4. (fio sees 25 iops, though iostat reports
 100 smaller iops):

 # echo 512 > /sys/block/vdb/queue/max_sectors_kb   # this is the default
 # fio --readonly --name /dev/vdb --rw=read --size=1G  --ioengine=libaio
 --direct=1 --runtime=10s --blocksize=2m
 /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
 fio-2.0.13
 Starting 1 process
 Jobs: 1 (f=1): [R] [100.0% done] [51200K/0K/0K /s] [25 /0 /0  iops] [eta
 00m:00s]

 meanwhile iostat is reporting 100 iops of average size 1024 sectors (i.e.
 512kB):

 Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz
 avgqu-sz   await  svctm  %util
 vdb   0.00 0.00  100.000.0050.00 0.00  1024.00
 3.02   30.25  10.00 100.00



 Now increase the max_sectors_kb to 4MB, and the IOs are no longer split:

 # echo 4096 > /sys/block/vdb/queue/max_sectors_kb
 # fio --readonly --name /dev/vdb --rw=read --size=1G  --ioengine=libaio
 --direct=1 --runtime=10s --blocksize=2m
 /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
 fio-2.0.13
 Starting 1 process
 Jobs: 1 (f=1): [R] [100.0% done] [200.0M/0K/0K /s] [100 /0 /0  iops] [eta
 00m:00s]

 iostat reports 100 iops, 4096 sectors each read (i.e. 2MB):

 Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz
 avgqu-sz   await  svctm  %util
 vdb 300.00 0.00  100.000.00   200.00 0.00  4096.00
 0.999.94   9.94  99.40

 We set the hard request size limit to rbd object size (4M typically)

blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);


 Are you referring to librbd or krbd? My observations are limited to librbd at 
 the moment. (I didn’t try this on krbd).

Yes, I was referring to krbd.  But it looks like that patch from
Christoph will change this for qemu+librbd as well - an artificial soft
limit imposed by the VM kernel will disappear.  CC'ing Josh.


 but block layer then sets the soft limit for fs requests to 512K

   BLK_DEF_MAX_SECTORS = 1024,

   limits->max_sectors = min_t(unsigned int, max_hw_sectors,
   BLK_DEF_MAX_SECTORS);

 which you are supposed to change on a per-device basis via sysfs.  We
 could probably raise the soft limit to rbd object size by default as
 well - I don't see any harm in that.


 Indeed, this patch which was being targeted for 3.19:

 https://lkml.org/lkml/2014/9/6/123

Oh good, I was just about to send a patch for krbd.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Giant + nfs over cephfs hang tasks

2014-12-01 Thread Ilya Dryomov
On Mon, Dec 1, 2014 at 1:39 PM, Andrei Mikhailovsky and...@arhont.com wrote:
 Ilya,

 I will try doing that once again tonight as this is a production cluster and
 when dds trigger that dmesg error the cluster's io becomes very bad and I
 have to reboot the server to get things on track. Most of my vms start
 having 70-90% iowait until that server is rebooted.

That's easily explained - those splats in dmesg indicate a case of
severe memory pressure.


 I've actually checked what you've asked last time i've ran the test.

 When I do 4 dds concurrently nothing aprears in the dmesg output. No
 messages at all.

 The kern.log file that i've sent last time is what I got about a minute
 after i've started 8 dds. I've pasted the full output. The 8 dds did
 actually complete, but it took a rather long time. I was getting about 6MB/s
 per dd process compared to around 70MB/s per dd process when 4 dds were
 running. Do you still want me to run this or is the information i've
 provided enough?

No, no need if it's a production cluster.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Giant + nfs over cephfs hang tasks

2014-12-01 Thread Andrei Mikhailovsky
Ilya, 

I see. My server has 24GB of RAM + 3GB of swap. While running the tests, 
I noticed that the server had 14GB of RAM shown as cached and only 2MB 
used from swap. Not sure if this is helpful to your debugging. 

Andrei 

-- 
Andrei Mikhailovsky 
Director 
Arhont Information Security 

Web: http://www.arhont.com 
http://www.wi-foo.com 
Tel: +44 (0)870 4431337 
Fax: +44 (0)208 429 3111 
PGP: Key ID - 0x2B3438DE 
PGP: Server - keyserver.pgp.com 

DISCLAIMER 

The information contained in this email is intended only for the use of the 
person(s) to whom it is addressed and may be confidential or contain legally 
privileged information. If you are not the intended recipient you are hereby 
notified that any perusal, use, distribution, copying or disclosure is strictly 
prohibited. If you have received this email in error please immediately advise 
us by return email at and...@arhont.com and delete and purge the email and any 
attachments without making a copy. 

- Original Message -

 From: Ilya Dryomov ilya.dryo...@inktank.com
 To: Andrei Mikhailovsky and...@arhont.com
 Cc: ceph-users ceph-users@lists.ceph.com, Gregory Farnum
 g...@gregs42.com
 Sent: Monday, 1 December, 2014 11:06:37 AM
 Subject: Re: [ceph-users] Giant + nfs over cephfs hang tasks

 On Mon, Dec 1, 2014 at 1:39 PM, Andrei Mikhailovsky
 and...@arhont.com wrote:
  Ilya,
 
  I will try doing that once again tonight as this is a production
  cluster and
  when dds trigger that dmesg error the cluster's io becomes very bad
  and I
  have to reboot the server to get things on track. Most of my vms
  start
  having 70-90% iowait until that server is rebooted.

 That's easily explained - those splats in dmesg indicate a case of a
 severe memory pressure.

 
  I've actually checked what you've asked last time i've ran the
  test.
 
  When I do 4 dds concurrently nothing aprears in the dmesg output.
  No
  messages at all.
 
  The kern.log file that i've sent last time is what I got about a
  minute
  after i've started 8 dds. I've pasted the full output. The 8 dds
  did
  actually complete, but it took a rather long time. I was getting
  about 6MB/s
  per dd process compared to around 70MB/s per dd process when 4 dds
  were
  running. Do you still want me to run this or is the information
  i've
  provided enough?

 No, no need if it's a production cluster.

 Thanks,

 Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fastest way to shrink/rewrite rbd image ?

2014-12-01 Thread Daniel Swarbrick
On 01/12/14 10:22, Alexandre DERUMIER wrote:
 
 Yes, it's working fine.
 
 (you need to use virtio-scsi and enable discard option)
 

Does it work with virtio-blk if you attach the RBD as a LUN? Supposedly,
SCSI pass-through works in this mode, e.g.

<disk type='block' device='lun'>
  <target dev='vda' bus='virtio'/>
  ...
</disk>

However, it seems that virtio-scsi is slowly becoming preferred over
virtio-blk. Are there any disadvantages to using virtio-scsi now? Does
it support live migration?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing Snapshots Killing Cluster Performance

2014-12-01 Thread Daniel Schneller

On 2014-12-01 10:03:35 +, Dan Van Der Ster said:

Which version of Ceph are you using? This could be related: 
http://tracker.ceph.com/issues/9487


Firefly. I had seen this ticket earlier (when deleting a whole pool) and hoped
the backport of the fix would be available some time soon. I must admit, I did
not look this up before posting, because I had forgotten about it.

See ReplicatedPG: don't move on to the next snap immediately; 
basically, the OSD is getting into a tight loop trimming the snapshot 
objects. The fix above breaks out of that loop more frequently, and 
then you can use the osd snap trim sleep option to throttle it further. 
I’m not sure if the fix above will be sufficient if you have many 
objects to remove per snapshot.


Just so I get this right: with the fix alone you are not sure it would be nice
enough, so adjusting the snap trim sleep option in addition might be needed?
I assume the loop that will be broken up by the fix for 9487 does not take the
sleep time into account?

That commit is only in giant at the moment. The backport to dumpling is 
in the dumpling branch but not yet in a release, and firefly is still 
pending.


Holding my breath :)

Any thoughts on the other items I had in the original post?


2) Is there a way to get a decent approximation of how much work
deleting a specific snapshot will entail (in terms of objects, time,
whatever)?

3) Would SSD journals help here? Or any other hardware configuration
change for that matter?



Thanks!
Daniel


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Degraded

2014-12-01 Thread Georgios Dimitrakakis

Hi Andrei!

I had a similar setting with replicated size 2 and min_size also 2.

Changing that didn't change the status of the cluster.

I've also tried to remove the pools and recreate them, without success.

Removing and re-adding the OSDs also didn't have any influence!

Therefore, and since I didn't have any data at all, I performed a force 
recreate on all PGs, and after that things went back to normal.
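
(For reference, a minimal sketch of what a force recreate can look like; <pgid>
is a placeholder for each affected placement group, e.g. taken from "ceph health
detail", and this is only an option when losing the data in those PGs is
acceptable:)

ceph pg force_create_pg <pgid>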


Thanks for your reply!


Best,


George

On Sat, 29 Nov 2014 11:39:51 + (GMT), Andrei Mikhailovsky wrote:
I think I had a similar issue recently when I added a new pool. All
pgs that corresponded to the new pool were shown as degraded/unclean.
After doing a bit of testing I realized that my issue was down to
this:

replicated size 2
min_size 2

The replicated size and min size were the same. In my case, I've got 2 osd
servers with a total replica count of 2. The minimal size should be set to 1,
so that the cluster would still work with at least one PG being up.

After I've changed the min_size to 1 the cluster sorted itself out.
Try doing this for your pools.

Andrei

-


FROM: Georgios Dimitrakakis
TO: ceph-users@lists.ceph.com
SENT: Saturday, 29 November, 2014 11:13:05 AM
SUBJECT: [ceph-users] Ceph Degraded

Hi all!

I am setting UP a new cluster with 10 OSDs
and the state is degraded!

# ceph health
HEALTH_WARN 940 pgs degraded; 1536 pgs stuck unclean
#

There are only the default pools

# ceph osd lspools
0 data,1 metadata,2 rbd,

with each one having 512 pg_num and 512 pgp_num

# ceph osd dump | grep replic
pool 0 'data' replicated size 2 min_size 2 crush_ruleset 0
object_hash
rjenkins pg_num 512 pgp_num 512 last_change 286 flags hashpspool
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 512 pgp_num 512 last_change 287 flags
hashpspool stripe_width 0
pool 2 'rbd' replicated size 2 min_size 2 crush_ruleset 0
object_hash
rjenkins pg_num 512 pgp_num 512 last_change 288 flags hashpspool
stripe_width 0

No data yet so is there something I can do to repair it as it is?

Best regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fastest way to shrink/rewrite rbd image ?

2014-12-01 Thread Alexandre DERUMIER
Does it work with virtio-blk if you attach the RBD as a LUN?

virtio-blk doesn't support discard and trimming

 Supposedly, SCSI pass-through works in this mode, e.g.

SCSI pass-through works only with virtio-scsi, not virtio-blk

However, it seems that virtio-scsi is slowly becoming preferred over 
virtio-blk. Are there any disadvantages to using virtio-scsi now? 

It's a little bit slower sometimes.
(But it can be faster than virtio-blk with multiqueue and SCSI passthrough.)

With librbd, I see a little slowdown vs virtio-blk (maybe 20% slower).

Does it support live migration? 
Yes, of course.

- Original Message - 

From: Daniel Swarbrick daniel.swarbr...@profitbricks.com 
To: ceph-users@lists.ceph.com 
Sent: Monday, 1 December 2014 13:32:15 
Subject: Re: [ceph-users] Fastest way to shrink/rewrite rbd image ? 

On 01/12/14 10:22, Alexandre DERUMIER wrote: 
 
 Yes, it's working fine. 
 
 (you need to use virtio-scsi and enable discard option) 
 

Does it work with virtio-blk if you attach the RBD as a LUN? Supposedly, 
SCSI pass-through works in this mode, e.g. 

<disk type='block' device='lun'> 
  <target dev='vda' bus='virtio'/> 
  ... 
</disk> 

However, it seems that virtio-scsi is slowly becoming preferred over 
virtio-blk. Are there any disadvantages to using virtio-scsi now? Does 
it support live migration? 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] LevelDB support status is still experimental on Giant?

2014-12-01 Thread Haomai Wang
Yeah, it's mainly used in test environments.

On Mon, Dec 1, 2014 at 6:29 PM, Satoru Funai satoru.fu...@gmail.com wrote:

 Hi guys,
 I'm interested in to use key/value store as a backend of Ceph OSD.
 When firefly release, LevelDB support is mentioned as experimental,
 is it same status on Giant release?
 Regards,

 Satoru Funai
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] LevelDB support status is still experimental on Giant?

2014-12-01 Thread Chen, Xiaoxi
We have tested it for a while; basically it seems kind of stable but shows 
terribly bad performance.

This is not the fault of Ceph, but of LevelDB, or more generally, all K-V 
storage with an LSM design (RocksDB, etc.). The LSM tree structure naturally 
introduces very large write amplification, 10X to 20X, when you have tens of GB 
of data per OSD. So you can always see very bad sequential write performance 
(~200MB/s for a 12-SSD setup); we can share more details in the performance 
meeting.

To this end, a key-value backend with LevelDB is not usable for RBD usage, but 
may be workable (not tested) in the LOSF cases (tons of small objects stored via 
rados, where a K-V backend can prevent the FS metadata from becoming the 
bottleneck).

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Haomai 
Wang
Sent: Monday, December 1, 2014 9:48 PM
To: Satoru Funai
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] LevelDB support status is still experimental on Giant?

Yeah, mainly used by test env.

On Mon, Dec 1, 2014 at 6:29 PM, Satoru Funai 
satoru.fu...@gmail.commailto:satoru.fu...@gmail.com wrote:
Hi guys,
I'm interested in to use key/value store as a backend of Ceph OSD.
When firefly release, LevelDB support is mentioned as experimental,
is it same status on Giant release?
Regards,

Satoru Funai
___
ceph-users mailing list
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--

Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] do I have to use sudo for CEPH install

2014-12-01 Thread Jiri Kanicky

Hi.

Do I have to install sudo in Debian Wheezy to deploy CEPH successfully? I 
don't normally use sudo.


Thank you
Jiri
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] LevelDB support status is still experimental on Giant?

2014-12-01 Thread Haomai Wang
Exactly, I'm just looking forward to a better DB backend suitable for
KeyValueStore. It may be a traditional B-tree design.

Originally I thought Kinetic was a good backend, but it doesn't support
range queries :-(



On Mon, Dec 1, 2014 at 10:04 PM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote:

  We have tested it for a while, basically it seems kind of stable but
 show terrible bad performance.



 This is not the fault of Ceph , but levelDB, or more generally,  all K-V
 storage with LSM design(RocksDB,etc), the LSM tree structure naturally
 introduce very large write amplification 10X to 20X when you have tens
 GB of data per OSD. So you can always see very bad sequential write
 performance (~200MB/s for a 12SSD setup), we can share more details on the
 performance meeting.



 To this end,  key-value backend with LevelDB is not useable for RBD usage,
 but maybe workable(not tested) in the LOSF cases ( tons of small objects
 stored via rados , k-v backend can prevent the FS metadata become the
 bottleneck)



 *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
 Of *Haomai Wang
 *Sent:* Monday, December 1, 2014 9:48 PM
 *To:* Satoru Funai
 *Cc:* ceph-us...@ceph.com
 *Subject:* Re: [ceph-users] LevelDB support status is still experimental
 on Giant?



 Yeah, mainly used by test env.



 On Mon, Dec 1, 2014 at 6:29 PM, Satoru Funai satoru.fu...@gmail.com
 wrote:

 Hi guys,
 I'm interested in to use key/value store as a backend of Ceph OSD.
 When firefly release, LevelDB support is mentioned as experimental,
 is it same status on Giant release?
 Regards,

 Satoru Funai
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





 --

 Best Regards,

 Wheat




-- 

Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Compile from source with Kinetic support

2014-12-01 Thread Haomai Wang
Hmm, src/os/KeyValueDB.cc lacks these lines:

#ifdef WITH_KINETIC
#include "KineticStore.h"
#endif
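
(For completeness, a hedged sketch of rebuilding after adding those lines; the
--with-kinetic flag is an assumption inferred from the WITH_KINETIC define and
may be named differently in your tree:)

./autogen.sh
./configure --with-kinetic
make -j$(nproc)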


On Mon, Dec 1, 2014 at 6:14 PM, Julien Lutran julien.lut...@ovh.net wrote:

 I'm sorry but the compilation still fails after including the cpp-client
 headers :


   CXX  os/libos_la-KeyValueDB.lo
 os/KeyValueDB.cc: In static member function 'static KeyValueDB*
 KeyValueDB::create(CephContext*, const string, const string)':
 os/KeyValueDB.cc:18:16: error: expected type-specifier before
 'KineticStore'
  return new KineticStore(cct);
 ^
 os/KeyValueDB.cc:18:16: error: expected ';' before 'KineticStore'
 os/KeyValueDB.cc:18:32: error: 'KineticStore' was not declared in this
 scope
  return new KineticStore(cct);
 ^
 os/KeyValueDB.cc: In static member function 'static int
 KeyValueDB::test_init(const string, const string)':
 os/KeyValueDB.cc:36:12: error: 'KineticStore' has not been declared
  return KineticStore::_test_init(g_ceph_context);
 ^
   CXX  os/libos_la-KeyValueStore.lo
 make[3]: *** [os/libos_la-KeyValueDB.lo] Error 1
 make[3]: *** Waiting for unfinished jobs
 In file included from os/KeyValueStore.cc:53:0:
 os/KineticStore.h:13:29: fatal error: kinetic/kinetic.h: No such file or
 directory
  #include kinetic/kinetic.h
  ^
 compilation terminated.
 make[3]: *** [os/libos_la-KeyValueStore.lo] Error 1


 -- Julien



 On 11/28/2014 08:54 PM, Nigel Williams wrote:

 On Sat, Nov 29, 2014 at 5:19 AM, Julien Lutran julien.lut...@ovh.net
 wrote:

 Where can I find this kinetic devel package ?

 I guess you want this (C== kinetic client)? it has kinetic.h at least.

 https://github.com/Seagate/kinetic-cpp-client
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] LevelDB support status is still experimental on Giant?

2014-12-01 Thread Chen, Xiaoxi
Range queries are not that important on today's SSDs: you can see very high 
random read IOPS in SSD specs, and it is getting higher day by day. The key problem 
here is trying to exactly match one query (get/put) to one SSD 
IO (read/write), eliminating the read/write amplification. We kind of believe 
OpenNvmKV may be the right approach.

Back to the context of Ceph: can we find some use case for today's key-value 
backend? We would like to learn from the community what the workload pattern is 
if you want a K-V backed Ceph, or whether you just want to have a try. I think 
before we get a suitable DB backend, we had better optimize the key-value backend 
code to support a specific kind of load.



From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Monday, December 1, 2014 10:14 PM
To: Chen, Xiaoxi
Cc: Satoru Funai; ceph-us...@ceph.com
Subject: Re: [ceph-users] LevelDB support status is still experimental on Giant?

Exactly, I'm just looking forward a better DB backend suitable for 
KeyValueStore. It maybe traditional B-tree design.

Kinetic original I think it was a good backend, but it doesn't support range 
query :-(



On Mon, Dec 1, 2014 at 10:04 PM, Chen, Xiaoxi 
xiaoxi.c...@intel.commailto:xiaoxi.c...@intel.com wrote:
We have tested it for a while, basically it seems kind of stable but show 
terrible bad performance.

This is not the fault of Ceph , but levelDB, or more generally,  all K-V 
storage with LSM design(RocksDB,etc), the LSM tree structure naturally 
introduce very large write amplification 10X to 20X when you have tens GB 
of data per OSD. So you can always see very bad sequential write performance 
(~200MB/s for a 12SSD setup), we can share more details on the performance 
meeting.

To this end,  key-value backend with LevelDB is not useable for RBD usage, but 
maybe workable(not tested) in the LOSF cases ( tons of small objects stored via 
rados , k-v backend can prevent the FS metadata become the bottleneck)

From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.commailto:ceph-users-boun...@lists.ceph.com]
 On Behalf Of Haomai Wang
Sent: Monday, December 1, 2014 9:48 PM
To: Satoru Funai
Cc: ceph-us...@ceph.commailto:ceph-us...@ceph.com
Subject: Re: [ceph-users] LevelDB support status is still experimental on Giant?

Yeah, mainly used by test env.

On Mon, Dec 1, 2014 at 6:29 PM, Satoru Funai 
satoru.fu...@gmail.commailto:satoru.fu...@gmail.com wrote:
Hi guys,
I'm interested in to use key/value store as a backend of Ceph OSD.
When firefly release, LevelDB support is mentioned as experimental,
is it same status on Giant release?
Regards,

Satoru Funai
___
ceph-users mailing list
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--

Best Regards,

Wheat



--

Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing Snapshots Killing Cluster Performance

2014-12-01 Thread Daniel Schneller

Thanks for your input. We will see what we can find out
with the logs and how to proceed from there. 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] To clarify requirements for Monitors

2014-12-01 Thread Roman Naumenko

Thank you, Paulo.

Metadata = MDS, so the metadata server should have CPU power.

--Roman

On 14-11-28 05:34 PM, Paulo Almeida wrote:

On Fri, 2014-11-28 at 16:37 -0500, Roman Naumenko wrote:

And if I understand correctly, monitors are the access points to the
cluster, so they should provide enough aggregated network output for
all connected clients based on number of OSDs in the cluster?

I'm not sure what you mean by access points to the cluster, but the
monitors only provide the cluster map to the client, which then
communicates directly with the OSDs. Quoting the documentation[1]:

Ceph eliminates the centralized gateway to enable clients to interact
with Ceph OSD Daemons directly. (...) Before Ceph Clients can read or
write data, they must contact a Ceph Monitor to obtain the most recent
copy of the cluster map.

[1] http://ceph.com/docs/master/architecture/

Cheers,
Paulo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rsync mirror for repository?

2014-12-01 Thread Brian Rak

Is there a place I can download the entire repository for giant?

I'm really just looking for a rsync server that presents all the files 
here: http://download.ceph.com/ceph/giant/centos6.5/


I know that eu.ceph.com runs one, but I'm not sure how up to date that 
is (because of http://eu.ceph.com/rpm-giant/el6/x86_64/ , it has two 
versions in that directory).


Ceph is fairly critical to us, so we don't want to rely on an external 
mirror (we've had issues with other software where the files on the 
external mirror suddenly become broken).


For now, I downloaded it via 'wget -r', but this really isn't ideal.

I already tried:

$ rsync rsync://download.ceph.com
rsync: failed to connect to download.ceph.com: Connection refused (111)
$ rsync rsync://ceph.com --contimeout=2
rsync error: timeout waiting for daemon connection (code 35) at 
socket.c(279) [receiver=3.0.6]
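
(For what it's worth, a minimal wget-based mirroring sketch until an rsync
endpoint is available; these are standard wget flags, with --cut-dirs chosen to
taste:)

wget -r -N -np -nH --cut-dirs=1 -e robots=off \
  http://download.ceph.com/ceph/giant/centos6.5/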


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problems with pgs incomplete

2014-12-01 Thread Butkeev Stas
Hi all,
I have a Ceph cluster + RGW. Now I have problems with one of the OSDs; it's down. I 
checked ceph status and see this information:

[root@node-1 ceph-0]# ceph -s
cluster fc8c3ecc-ccb8-4065-876c-dc9fc992d62d
 health HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck 
unclean
 monmap e1: 3 mons at 
{a=10.29.226.39:6789/0,b=10.29.226.29:6789/0,c=10.29.226.40:6789/0}, election 
epoch 294, quorum 0,1,2 b,a,c
 osdmap e418: 6 osds: 5 up, 5 in
  pgmap v23588: 312 pgs, 16 pools, 141 kB data, 594 objects
5241 MB used, 494 GB / 499 GB avail
 308 active+clean
   4 incomplete

Why do I have 4 pgs incomplete in the .rgw.buckets pool if I have 
replicated size 2 and min_size 2?

My osd tree
[root@node-1 ceph-0]# ceph osd tree
# idweight  type name   up/down reweight
-1  4   root croc
-2  4   region ru
-4  3   datacenter vol-5
-5  1   host node-1
0   1   osd.0   down0
-6  1   host node-2
1   1   osd.1   up  1
-7  1   host node-3
2   1   osd.2   up  1
-3  1   datacenter comp
-8  1   host node-4
3   1   osd.3   up  1
-9  1   host node-5
4   1   osd.4   up  1
-10 1   host node-6
5   1   osd.5   up  1

Additional information:

[root@node-1 ceph-0]# ceph health detail
HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean
pg 13.6 is stuck inactive for 1547.665758, current state incomplete, last 
acting [1,3]
pg 13.4 is stuck inactive for 1547.652111, current state incomplete, last 
acting [1,2]
pg 13.5 is stuck inactive for 4502.009928, current state incomplete, last 
acting [1,3]
pg 13.2 is stuck inactive for 4501.979770, current state incomplete, last 
acting [1,3]
pg 13.6 is stuck unclean for 4501.969914, current state incomplete, last acting 
[1,3]
pg 13.4 is stuck unclean for 4502.001114, current state incomplete, last acting 
[1,2]
pg 13.5 is stuck unclean for 4502.009942, current state incomplete, last acting 
[1,3]
pg 13.2 is stuck unclean for 4501.979784, current state incomplete, last acting 
[1,3]
pg 13.2 is incomplete, acting [1,3] (reducing pool .rgw.buckets min_size from 2 
may help; search ceph.com/docs for 'incomplete')
pg 13.6 is incomplete, acting [1,3] (reducing pool .rgw.buckets min_size from 2 
may help; search ceph.com/docs for 'incomplete')
pg 13.4 is incomplete, acting [1,2] (reducing pool .rgw.buckets min_size from 2 
may help; search ceph.com/docs for 'incomplete')
pg 13.5 is incomplete, acting [1,3] (reducing pool .rgw.buckets min_size from 2 
may help; search ceph.com/docs for 'incomplete')

[root@node-1 ceph-0]# ceph osd dump | grep 'pool'
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 34 owner 18446744073709551615 flags 
hashpspool stripe_width 0
pool 2 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 36 owner 18446744073709551615 flags 
hashpspool stripe_width 0
pool 3 '.rgw' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 8 pgp_num 8 last_change 38 owner 18446744073709551615 flags hashpspool 
stripe_width 0
pool 4 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 39 flags hashpspool stripe_width 0
pool 5 '.users.uid' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 40 owner 18446744073709551615 flags 
hashpspool stripe_width 0
pool 6 '.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 8 pgp_num 8 last_change 42 owner 18446744073709551615 flags hashpspool 
stripe_width 0
pool 7 '.users' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 44 flags hashpspool stripe_width 0
pool 8 '.users.swift' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 46 flags hashpspool stripe_width 0
pool 9 '.usage' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 48 flags hashpspool stripe_width 0
pool 10 'test' replicated size 2 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 136 pgp_num 136 last_change 68 flags hashpspool stripe_width 0
pool 11 '.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 

Re: [ceph-users] Problems with pgs incomplete

2014-12-01 Thread Georgios Dimitrakakis

Hi!

I had a very similar issue a few days ago.

For me it wasn't too much of a problem since the cluster was new 
without data and I could force recreate the PGs. I really hope that in 
your case it won't be necessary to do the same thing.


As a first step, try to reduce the min_size from 2 to 1 as suggested for 
the .rgw.buckets pool and see if this can bring your cluster back to 
health.
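
A minimal sketch of that change (the pool name is taken from the health output above):

ceph osd pool set .rgw.buckets min_size 1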


Regards,

George

On Mon, 01 Dec 2014 17:09:31 +0300, Butkeev Stas wrote:

Hi all,
I have Ceph cluster+rgw. Now I have problems with one of OSD, it's
down now. I check ceph status and see this information

[root@node-1 ceph-0]# ceph -s
cluster fc8c3ecc-ccb8-4065-876c-dc9fc992d62d
 health HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs
stuck unclean
 monmap e1: 3 mons at
{a=10.29.226.39:6789/0,b=10.29.226.29:6789/0,c=10.29.226.40:6789/0},
election epoch 294, quorum 0,1,2 b,a,c
 osdmap e418: 6 osds: 5 up, 5 in
  pgmap v23588: 312 pgs, 16 pools, 141 kB data, 594 objects
5241 MB used, 494 GB / 499 GB avail
 308 active+clean
   4 incomplete

Why am I having 4 pgs incomplete in bucket .rgw.buckets if I am
having replicated size 2 and min_size 2?

My osd tree
[root@node-1 ceph-0]# ceph osd tree
# idweight  type name   up/down reweight
-1  4   root croc
-2  4   region ru
-4  3   datacenter vol-5
-5  1   host node-1
0   1   osd.0   down0
-6  1   host node-2
1   1   osd.1   up  1
-7  1   host node-3
2   1   osd.2   up  1
-3  1   datacenter comp
-8  1   host node-4
3   1   osd.3   up  1
-9  1   host node-5
4   1   osd.4   up  1
-10 1   host node-6
5   1   osd.5   up  1

Addition information:

[root@node-1 ceph-0]# ceph health detail
HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck 
unclean

pg 13.6 is stuck inactive for 1547.665758, current state incomplete,
last acting [1,3]
pg 13.4 is stuck inactive for 1547.652111, current state incomplete,
last acting [1,2]
pg 13.5 is stuck inactive for 4502.009928, current state incomplete,
last acting [1,3]
pg 13.2 is stuck inactive for 4501.979770, current state incomplete,
last acting [1,3]
pg 13.6 is stuck unclean for 4501.969914, current state incomplete,
last acting [1,3]
pg 13.4 is stuck unclean for 4502.001114, current state incomplete,
last acting [1,2]
pg 13.5 is stuck unclean for 4502.009942, current state incomplete,
last acting [1,3]
pg 13.2 is stuck unclean for 4501.979784, current state incomplete,
last acting [1,3]
pg 13.2 is incomplete, acting [1,3] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 13.6 is incomplete, acting [1,3] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 13.4 is incomplete, acting [1,2] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 13.5 is incomplete, acting [1,3] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')

[root@node-1 ceph-0]# ceph osd dump | grep 'pool'
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool
stripe_width 0
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 34 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 2 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 36 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 3 '.rgw' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 38 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 4 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 39 flags
hashpspool stripe_width 0
pool 5 '.users.uid' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 40 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 6 '.log' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 42 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 7 '.users' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 44 flags
hashpspool stripe_width 0
pool 8 '.users.swift' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins 

Re: [ceph-users] Problems with pgs incomplete

2014-12-01 Thread Tomasz Kuzemko
On Mon, Dec 01, 2014 at 05:09:31PM +0300, Butkeev Stas wrote:
 Hi all,
 I have Ceph cluster+rgw. Now I have problems with one of OSD, it's down now. 
 I check ceph status and see this information
 
 [root@node-1 ceph-0]# ceph -s
 cluster fc8c3ecc-ccb8-4065-876c-dc9fc992d62d
  health HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck 
 unclean
  monmap e1: 3 mons at 
 {a=10.29.226.39:6789/0,b=10.29.226.29:6789/0,c=10.29.226.40:6789/0}, election 
 epoch 294, quorum 0,1,2 b,a,c
  osdmap e418: 6 osds: 5 up, 5 in
   pgmap v23588: 312 pgs, 16 pools, 141 kB data, 594 objects
 5241 MB used, 494 GB / 499 GB avail
  308 active+clean
4 incomplete
 
 Why am I having 4 pgs incomplete in bucket .rgw.buckets if I am having 
 replicated size 2 and min_size 2?
 
 My osd tree
 [root@node-1 ceph-0]# ceph osd tree
 # idweight  type name   up/down reweight
 -1  4   root croc
 -2  4   region ru
 -4  3   datacenter vol-5
 -5  1   host node-1
 0   1   osd.0   down0
 -6  1   host node-2
 1   1   osd.1   up  1
 -7  1   host node-3
 2   1   osd.2   up  1
 -3  1   datacenter comp
 -8  1   host node-4
 3   1   osd.3   up  1
 -9  1   host node-5
 4   1   osd.4   up  1
 -10 1   host node-6
 5   1   osd.5   up  1
 
 Addition information:
 
 [root@node-1 ceph-0]# ceph health detail
 HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean
 pg 13.6 is stuck inactive for 1547.665758, current state incomplete, last 
 acting [1,3]
 pg 13.4 is stuck inactive for 1547.652111, current state incomplete, last 
 acting [1,2]
 pg 13.5 is stuck inactive for 4502.009928, current state incomplete, last 
 acting [1,3]
 pg 13.2 is stuck inactive for 4501.979770, current state incomplete, last 
 acting [1,3]
 pg 13.6 is stuck unclean for 4501.969914, current state incomplete, last 
 acting [1,3]
 pg 13.4 is stuck unclean for 4502.001114, current state incomplete, last 
 acting [1,2]
 pg 13.5 is stuck unclean for 4502.009942, current state incomplete, last 
 acting [1,3]
 pg 13.2 is stuck unclean for 4501.979784, current state incomplete, last 
 acting [1,3]
 pg 13.2 is incomplete, acting [1,3] (reducing pool .rgw.buckets min_size from 
 2 may help; search ceph.com/docs for 'incomplete')
 pg 13.6 is incomplete, acting [1,3] (reducing pool .rgw.buckets min_size from 
 2 may help; search ceph.com/docs for 'incomplete')
 pg 13.4 is incomplete, acting [1,2] (reducing pool .rgw.buckets min_size from 
 2 may help; search ceph.com/docs for 'incomplete')
 pg 13.5 is incomplete, acting [1,3] (reducing pool .rgw.buckets min_size from 
 2 may help; search ceph.com/docs for 'incomplete')
 
 [root@node-1 ceph-0]# ceph osd dump | grep 'pool'
 pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
 pool 1 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 8 pgp_num 8 last_change 34 owner 18446744073709551615 flags 
 hashpspool stripe_width 0
 pool 2 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 
 object_hash rjenkins pg_num 8 pgp_num 8 last_change 36 owner 
 18446744073709551615 flags hashpspool stripe_width 0
 pool 3 '.rgw' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 8 pgp_num 8 last_change 38 owner 18446744073709551615 flags 
 hashpspool stripe_width 0
 pool 4 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 8 pgp_num 8 last_change 39 flags hashpspool stripe_width 0
 pool 5 '.users.uid' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 8 pgp_num 8 last_change 40 owner 18446744073709551615 flags 
 hashpspool stripe_width 0
 pool 6 '.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 8 pgp_num 8 last_change 42 owner 18446744073709551615 flags 
 hashpspool stripe_width 0
 pool 7 '.users' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 8 pgp_num 8 last_change 44 flags hashpspool stripe_width 0
 pool 8 '.users.swift' replicated size 3 min_size 2 crush_ruleset 0 
 object_hash rjenkins pg_num 8 pgp_num 8 last_change 46 flags hashpspool 
 stripe_width 0
 pool 9 '.usage' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 8 pgp_num 8 last_change 48 flags hashpspool stripe_width 0
 pool 10 'test' replicated size 2 min_size 2 crush_ruleset 0 

Re: [ceph-users] Revisiting MDS memory footprint

2014-12-01 Thread John Spray
I meant to chime in earlier here but then the weekend happened, comments inline

On Sun, Nov 30, 2014 at 7:20 PM, Wido den Hollander w...@42on.com wrote:
 Why would you want all CephFS metadata in memory? With any filesystem
 that will be a problem.

The latency associated with a cache miss (RADOS OMAP dirfrag read) is
fairly high, so the goal when sizing will be to allow the MDSs to keep a
very large proportion of the metadata in RAM.  In a local FS, the
filesystem metadata in RAM is relatively small, and the speed to disk
is relatively high.  In Ceph FS, that is reversed: we want to
compensate for the cache miss latency by having lots of RAM in the MDS
and a big cache.

hot-standby MDSs are another manifestation of the expected large
cache: we expect these caches to be big, to the point where refilling
from the backing store on a failure would be annoyingly slow, and it's
worth keeping that hot standby cache.

Also, remember that because we embed inodes in dentries, when we load
a directory fragment we are also loading all the inodes in that
directory fragment -- if you have only one file open, but it has an
ancestor with lots of files, then you'll have more files in cache than
you might have expected.

 We do however need a good rule of thumb of how much memory is used for
 each inode.

Yes -- and ideally some practical measurements too :-)

One important point that I don't think anyone mentioned so far: the
memory consumption per inode depends on how many clients have
capabilities on the inode.  So if many clients hold a read capability
on a file, more memory will be used MDS-side for that file.  If
designing a benchmark for this, the client count and the level of overlap
in the client workloads would be important dimensions.

The number of *open* files on clients strongly affects the ability of
the MDS to trim its cache, since the MDS pins in cache any inode which
is in use by a client.  We recently added health checks so that the
MDS can complain about clients that are failing to respond to requests
to trim their caches, and the way we test this is to have a client
obstinately keep some number of files open.

We also allocate memory for pending metadata updates (so-called
'projected inodes') while they are in the journal, so the memory usage
will depend on the journal size and the number of writes in flight.

It would be really useful to come up with a test script that monitors
MDS memory consumption as a function of number of files in cache,
number of files opened by clients, number of clients opening the same
files.  I feel a 3d chart plot coming on :-)
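
As a rough starting point, something like the loop below could collect the
raw numbers (only a sketch: it assumes a single local MDS with id 'a', and
the perf counter names vary between versions):

    # sample MDS memory and cache size once a minute
    while sleep 60; do
        rss=$(ps -C ceph-mds -o rss= | head -n1)                   # resident memory, KB
        ino=$(ceph daemon mds.a perf dump | grep -m1 '"inodes"')   # cached inode count
        echo "$(date +%s) rss_kb=$rss $ino"
    done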

Cheers,
John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with pgs incomplete

2014-12-01 Thread Lionel Bouton
Le 01/12/2014 15:09, Butkeev Stas a écrit :
 pg 13.2 is incomplete, acting [1,3] (reducing pool .rgw.buckets min_size from 
 2 may help; search ceph.com/docs for 'incomplete')
The answer is in the logs: your .rgw.buckets pool is using min_size = 2.
So it doesn't have enough valid pg replicas to start recovering.

IIRC from past messages on this list, you must have size > min_size to recover
from a failed OSD, as Ceph doesn't try to use the available data to recover
if it can't respect min_size.

I may be wrong here (I'm surprised you only have 4 incomplete pgs, I'd
expect ~1/3rd of your pgs to be incomplete given your ceph osd tree
output) but reducing min_size to 1 should be harmless and should
unfreeze the recovering process.
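
For example, using the pool named in the health output, something like:

    ceph osd pool set .rgw.buckets min_size 1
    # and once recovery has finished:
    ceph osd pool set .rgw.buckets min_size 2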

Best regards,

Lionel Bouton
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with pgs incomplete

2014-12-01 Thread Lionel Bouton
Le 01/12/2014 17:08, Lionel Bouton a écrit :
 I may be wrong here (I'm surprised you only have 4 incomplete pgs, I'd
 expect ~1/3rd of your pgs to be incomplete given your ceph osd tree
 output) but reducing min_size to 1 should be harmless and should
 unfreeze the recovering process.

Ignore this part : I wasn't paying enough attention to the osd tree
output and mixed osd/host levels.

Others have pointed out that you have size = 3 for some pools. In this
case you might have lost an OSD before a previous recovering process
finished which would explain your current state (in this case my earlier
advice still applies).

Best regards,

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Revisiting MDS memory footprint

2014-12-01 Thread John Spray
On Fri, Nov 28, 2014 at 1:48 PM, Florian Haas flor...@hastexo.com wrote:
 Out of curiosity: would it matter at all whether or not a significant
 fraction of the files in CephFS were hard links? Clearly the only
 thing that differs in metadata between individual hard-linked files is
 the file name, but I wonder if the Ceph MDS actually takes this into
 consideration. In other words, I'm not sure whether the MDS simply
 adds another pointer to the same set of metadata, or whether that set
 of metadata is actually duplicated in MDS memory. I am guessing the
 latter, but it would be nice to be sure.

When we load a hard link dentry (in CDir::_omap_fetched), if we
already have the inode in cache then we just refer to that copy -- we
never have two of the same inode (CInode object) in memory.  If we
don't have the inode in cache, then the inode isn't loaded until
someone tries to traverse the dentry (i.e. touch the file in any way),
at which point we go to fetch the backtrace from the RADOS object for
that file.

So hard links may incur less memory overhead when loading a directory
fragment, but you will take an I/O hammering when dereferencing them
if the linked inode is not already in cache, as each individual hard
link has to be followed via a separate RADOS object.

In general I would be very cautious about workloads that do a lot of
reads of cold hard linked files, e.g. if benchmarking this case for
backups then you should try to create the hard links, let the files
fall out of cache, then observe the performance of a restore where
many hard links are being dereferenced via backtraces.

I'm mostly reading this from the code rather than from memory, so I'm
sure Greg or Sage will jump in if I'm getting any of these cases
wrong.

Cheers,
John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with pgs incomplete

2014-12-01 Thread Butkeev Stas
Thank you Lionel,
Indeed, I had forgotten about size > min_size. I have set min_size to 1 and my
cluster is UP now. I have deleted the crashed OSD and set size to 3 and min_size
to 2.

---
With regards,
Stanislav 


01.12.2014, 19:15, Lionel Bouton lionel-subscript...@bouton.name:
 Le 01/12/2014 17:08, Lionel Bouton a écrit :
  I may be wrong here (I'm surprised you only have 4 incomplete pgs, I'd
  expect ~1/3rd of your pgs to be incomplete given your ceph osd tree
  output) but reducing min_size to 1 should be harmless and should
  unfreeze the recovering process.

 Ignore this part : I wasn't paying enough attention to the osd tree
 output and mixed osd/host levels.

 Others have pointed out that you have size = 3 for some pools. In this
 case you might have lost an OSD before a previous recovering process
 finished which would explain your current state (in this case my earlier
 advice still applies).

 Best regards,

 Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to see which crush tunables are active in a ceph-cluster?

2014-12-01 Thread Udo Lembke
Hi all,
http://ceph.com/docs/master/rados/operations/crush-map/#crush-tunables
describes how to set the tunables to legacy, argonaut, bobtail, firefly
or optimal.

But how can I see which profile is active in a Ceph cluster?

With ceph osd getcrushmap I did not get much info
(only tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50)
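
(For reference, that text presumably comes from decompiling the map, along
the lines of:

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    head crush.txt     # the tunable lines are at the top

Newer releases also have 'ceph osd crush show-tunables', if your version
includes it, which prints the effective tunables directly.)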


Udo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Compile from source with Kinetic support

2014-12-01 Thread Ken Dreyer
On 11/28/14 7:04 AM, Haomai Wang wrote:
 Yeah, ceph source repo doesn't contain Kinetic header file and library
 souce, you need to install kinetic devel package separately.

Hi Haomai,

I'm wondering if we need AC_CHECK_HEADER([kinetic/kinetic.h], ...) in
configure.ac to double-check when the user specifies --with-kinetic? It
might help to avoid some user confusion if we can have ./configure bail
out early instead of continuing all the way through the build.

Something like this? (completely untested)

--- a/configure.ac
+++ b/configure.ac
@@ -557,7 +557,13 @@ AC_ARG_WITH([kinetic],
 #AS_IF([test x$with_kinetic = xyes],
 #[PKG_CHECK_MODULES([KINETIC], [kinetic_client], [], [true])])
 AS_IF([test x$with_kinetic = xyes],
-[AC_DEFINE([HAVE_KINETIC], [1], [Defined if you have kinetic enable
+[AC_CHECK_HEADER([kinetic/kinetic.h],
+  [AC_DEFINE(
+ [HAVE_KINETIC], [1], [Defined if you have kinetic enabled])],
+  [AC_MSG_FAILURE(
+ [Can't find kinetic headers; please install them])
+)]
+])
 AM_CONDITIONAL(WITH_KINETIC, [ test $with_kinetic = yes ])
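
With a check like that in place, a missing header should fail at configure
time rather than deep in the build, e.g. (equally untested):

    ./autogen.sh
    ./configure --with-kinetic   # should now abort early if kinetic/kinetic.h is absent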
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Compile from source with Kinetic support

2014-12-01 Thread Julien Lutran


Sorry, It didn't change anything :

root@host:~/sources/ceph# head -12 src/os/KeyValueDB.cc
// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
// vim: ts=8 sw=2 smarttab

#include "KeyValueDB.h"
#include "LevelDBStore.h"
#ifdef HAVE_LIBROCKSDB
#include "RocksDBStore.h"
#endif
#ifdef WITH_KINETIC
#include "KineticStore.h"
#endif

root@host:~/sources/ceph# make
[...]
  CXX  os/libos_la-KeyValueDB.lo
os/KeyValueDB.cc: In static member function 'static KeyValueDB* 
KeyValueDB::create(CephContext*, const string, const string)':

os/KeyValueDB.cc:21:16: error: expected type-specifier before 'KineticStore'
 return new KineticStore(cct);
^
os/KeyValueDB.cc:21:16: error: expected ';' before 'KineticStore'
os/KeyValueDB.cc:21:32: error: 'KineticStore' was not declared in this scope
 return new KineticStore(cct);
^
os/KeyValueDB.cc: In static member function 'static int 
KeyValueDB::test_init(const string, const string)':

os/KeyValueDB.cc:39:12: error: 'KineticStore' has not been declared
 return KineticStore::_test_init(g_ceph_context);
^
make[3]: *** [os/libos_la-KeyValueDB.lo] Error 1


On 12/01/2014 03:22 PM, Haomai Wang wrote:

#ifdef WITH_KINETIC
#include KineticStore.h
#endif


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Compile from source with Kinetic support

2014-12-01 Thread Haomai Wang
Sorry, it's a typo

/WITH_KINETIC/HAVE_KINETIC/

:-)
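
That is, the guard added to src/os/KeyValueDB.cc should test HAVE_KINETIC,
the macro configure actually defines, e.g.:

    sed -i 's/#ifdef WITH_KINETIC/#ifdef HAVE_KINETIC/' src/os/KeyValueDB.cc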

On Tue, Dec 2, 2014 at 12:51 AM, Julien Lutran julien.lut...@ovh.net
wrote:


 Sorry, It didn't change anything :

 root@host:~/sources/ceph# head -12 src/os/KeyValueDB.cc
 // -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
 // vim: ts=8 sw=2 smarttab

 #include KeyValueDB.h
 #include LevelDBStore.h
 #ifdef HAVE_LIBROCKSDB
 #include RocksDBStore.h
 #endif
 #ifdef WITH_KINETIC
 #include KineticStore.h
 #endif

 root@host:~/sources/ceph# make
 [...]
   CXX  os/libos_la-KeyValueDB.lo
 os/KeyValueDB.cc: In static member function 'static KeyValueDB*
 KeyValueDB::create(CephContext*, const string, const string)':
 os/KeyValueDB.cc:21:16: error: expected type-specifier before
 'KineticStore'
  return new KineticStore(cct);
 ^
 os/KeyValueDB.cc:21:16: error: expected ';' before 'KineticStore'
 os/KeyValueDB.cc:21:32: error: 'KineticStore' was not declared in this
 scope
  return new KineticStore(cct);
 ^
 os/KeyValueDB.cc: In static member function 'static int
 KeyValueDB::test_init(const string, const string)':
 os/KeyValueDB.cc:39:12: error: 'KineticStore' has not been declared
  return KineticStore::_test_init(g_ceph_context);
 ^
 make[3]: *** [os/libos_la-KeyValueDB.lo] Error 1


 On 12/01/2014 03:22 PM, Haomai Wang wrote:

 #ifdef WITH_KINETIC
 #include KineticStore.h
 #endif





-- 

Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Compile from source with Kinetic support

2014-12-01 Thread Haomai Wang
On Tue, Dec 2, 2014 at 12:38 AM, Ken Dreyer kdre...@redhat.com wrote:

 On 11/28/14 7:04 AM, Haomai Wang wrote:
  Yeah, ceph source repo doesn't contain Kinetic header file and library
  souce, you need to install kinetic devel package separately.

 Hi Haomai,

 I'm wondering if we need AC_CHECK_HEADER([kinetic/kinetic.h], ...) in
 configure.ac to double-check when the user specifies --with-kinetic? It
 might help to avoid some user confusion if we can have ./configure bail
 out early instead of continuing all the way through the build.

 Something like this? (completely untested)

 --- a/configure.ac
 +++ b/configure.ac
 @@ -557,7 +557,13 @@ AC_ARG_WITH([kinetic],
  #AS_IF([test x$with_kinetic = xyes],
  #[PKG_CHECK_MODULES([KINETIC], [kinetic_client], [], [true])])
  AS_IF([test x$with_kinetic = xyes],
 -[AC_DEFINE([HAVE_KINETIC], [1], [Defined if you have kinetic enable
 +[AC_CHECK_HEADER([kinetic/kinetic.h],
 +  [AC_DEFINE(
 + [HAVE_KINETIC], [1], [Defined if you have kinetic enabled])],
 +  [AC_MSG_FAILURE(
 + [Can't find kinetic headers; please install them])
 +)]
 +])
  AM_CONDITIONAL(WITH_KINETIC, [ test $with_kinetic = yes ])


Yeah, it's better. Could anyone help to add this?
You can close https://github.com/ceph/ceph/pull/3046 and create a PR. I
don't have a std-c++11 env to test it at all :-(


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 

Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Optimal or recommended threads values

2014-12-01 Thread Craig Lewis
I'm still using the default values, mostly because I haven't had time to
test.

On Thu, Nov 27, 2014 at 2:44 AM, Andrei Mikhailovsky and...@arhont.com
wrote:

 Hi Craig,

 Are you keeping the filestore, disk and op threads at their default
 values? or did you also change them?

 Cheers


 Tuning these values depends on a lot more than just the SSDs and HDDs.
 Which kernel and IO scheduler are you using?  Does your HBA do write
 caching?  It also depends on what your goals are.  Tuning for a RadosGW
 cluster is different than for an RBD cluster.  The short answer is that you
 are the only person who can tell you what your optimal values are.  As
 always, the best benchmark is production load.


 In my small cluster (5 nodes, 44 osds), I'm optimizing to minimize latency
 during recovery.  When the cluster is healthy, bandwidth and latency are
 more than adequate for my needs.  Even with journals on SSDs, I've found
 that reducing the number of operations and threads has reduced my average
 latency.

 I use injectargs to try out new values while I monitor cluster latency.  I
 monitor latency while the cluster is healthy and recovering.  If a change
 is deemed better, only then will I persist the change to ceph.conf.  This
 gives me a fallback: any change that causes massive problems can be
 undone with a restart or reboot.
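
 For example, a value can be tried on the fly with something like:

     ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

 and it reverts to whatever ceph.conf says on the next restart.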


 So far, the configs that I've written to ceph.conf are
 [global]
   mon osd down out interval = 900
   mon osd min down reporters = 9
   mon osd min down reports = 12
   osd pool default flag hashpspool = true

 [osd]
   osd max backfills = 1
   osd recovery max active = 1
   osd recovery op priority = 1


 I have it on my list to investigate filestore max sync interval.  And now
 that I've pasted that, I need to revisit the min down reports/reporters.  I
 have some nodes with 10 OSDs, and I don't want any one node able to mark
 the rest of the cluster as down (it happened once).




 On Sat, Nov 22, 2014 at 6:24 AM, Andrei Mikhailovsky and...@arhont.com
 wrote:

 Hello guys,

 Could someone comment on the optimal or recommended values of the various
 thread settings in ceph.conf?

 At the moment I have the following settings:

 filestore_op_threads = 8
 osd_disk_threads = 8
 osd_op_threads = 8
 filestore_merge_threshold = 40
 filestore_split_multiple = 8

 Are these reasonable for a small cluster made of 7.2K SAS disks with ssd
 journals with a ratio of 4:1?

 What are the settings that other people are using?

 Thanks

 Andrei



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fs-common ceph-mds on ARM Raspberry Debian 7.6

2014-12-01 Thread Florent MONTHEL
Hi Paulo,

Thanks a lot. I’ve just added the backports line below to /etc/apt/sources.list:

deb http://ftp.debian.org/debian/ wheezy-backports main

And : apt-get update

But ceph-deploy still threw alerts, so I added the packages manually (taking them
from wheezy-backports):

apt-get -t wheezy-backports install ceph ceph-mds ceph-common ceph-fs-common 
gdisk

And ceph-deploy is now OK :
root@socrate:~/cluster# ceph-deploy install socrate.flox-arts.in
…
[socrate.flox-arts.in][DEBUG ] ceph version 0.80.7 
(6c0127fcb58008793d3c8b62d925bc91963672a3)

Thanks


Florent Monthel





 Le 1 déc. 2014 à 00:03, Paulo Almeida palme...@igc.gulbenkian.pt a écrit :
 
 Hi,
 
 You should be able to use the wheezy-backports repository, which has
 ceph 0.80.7.
 
 Cheers,
 Paulo
 
 On Sun, 2014-11-30 at 19:31 +0100, Florent MONTHEL wrote:
 Hi,
 
 
 I’m trying to deploy CEPH (with ceph-deploy) on Raspberry Debian 7.6
 and I have below error on ceph-deploy install command :
 
 
 
 
 [socrate.flox-arts.in][INFO  ] Running command: env
 DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get -q -o
 Dpkg::Options::=--force-confnew --no-install-recommends --assume-yes
 install -- ceph ceph-mds ceph-common ceph-fs-common gdisk
 [socrate.flox-arts.in][DEBUG ] Reading package lists...
 [socrate.flox-arts.in][DEBUG ] Building dependency tree...
 [socrate.flox-arts.in][DEBUG ] Reading state information...
 [socrate.flox-arts.in][WARNIN] E: Unable to locate package ceph-mds
 [socrate.flox-arts.in][WARNIN] E: Unable to locate package
 ceph-fs-common
 [socrate.flox-arts.in][ERROR ] RuntimeError: command returned non-zero
 exit status: 100
 [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env
 DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get -q -o
 Dpkg::Options::=--force-confnew --no-install-recommends --assume-yes
 install -- ceph ceph-mds ceph-common ceph-fs-common gdisk
 
 
 Do you know how I can have these 2 package on this platform ?
 Thanks
 
 
 
 Florent Monthel
 
 
 
 
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] do I have to use sudo for CEPH install

2014-12-01 Thread Lindsay Mathieson
You have to be a root user, either via login, su or sudo.

So no, you don't have to use sudo - just log on as root.

On 2 December 2014 at 00:05, Jiri Kanicky ji...@ganomi.com wrote:
 Hi.

 Do I have to install sudo in Debian Wheezy to deploy CEPH successfully? I
 don't normally use sudo.

 Thank you
 Jiri
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Yehuda Sadeh
On Sat, Nov 29, 2014 at 2:26 PM, Ben b@benjackson.email wrote:

 On 29/11/14 11:40, Yehuda Sadeh wrote:

 On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:

 On 29/11/14 01:50, Yehuda Sadeh wrote:

 On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:

 On 2014-11-28 15:42, Yehuda Sadeh wrote:

 On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:

 On 2014-11-27 11:36, Yehuda Sadeh wrote:


 On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:


 On 2014-11-27 10:21, Yehuda Sadeh wrote:



 On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:



 On 2014-11-27 09:38, Yehuda Sadeh wrote:




 On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:




 I've been deleting a bucket which originally had 60TB of data in it, with
 our cluster doing only 1 replication, the total usage was 120TB.

 I've been deleting the objects slowly using S3 browser, and I can see the
 bucket usage is now down to around 2.5TB or 5TB with duplication, but the
 usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc list --include all)
 and it just reports square brackets []

 I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't
 appear to have any effect.

 Is there a way to check where this space is being consumed?

 Running 'ceph df' the USED space in the buckets pool is not showing any of
 the 57TB that should have been freed up from the deletion so far.

 Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and
 adding up all the buckets usage, this shows that the space has been freed
 from the bucket, but the cluster is all sorts of messed up.


 ANY IDEAS? What can I look at?





 Can you run 'radosgw-admin gc list --include-all'?

 Yehuda





 I've done it before, and it just returns square brackets [] (see
 below)

 radosgw-admin gc list --include-all
 []




 Do you know which of the rados pools have all that extra data? Try
 to
 list that pool's objects, verify that there are no surprises there
 (e.g., use 'rados -p pool ls').

 Yehuda




 I'm just running that command now, and its taking some time. There
 is
 a
 large number of objects.

 Once it has finished, what should I be looking for?



 I assume the pool in question is the one that holds your objects
 data?
 You should be looking for objects that are not expected to exist
 anymore, and objects of buckets that don't exist anymore. The
 problem
 here is to identify these.
 I suggest starting by looking at all the existing buckets, compose a
 list of all the bucket prefixes for the existing buckets, and try to
 look whether there are objects that have different prefixes.

 Yehuda



 Any ideas? I've found the prefix; the number of objects in the pool
 that match that prefix is in the 21 millions.
 The actual 'radosgw-admin bucket stats' command reports it as only
 having 1.2 million.


 Well, the objects you're seeing are raw objects, and since rgw stripes
 the data, it is expected to have more raw objects than objects in the
 bucket. Still, it seems that you have much too many of these. You can
 try to check whether there are pending multipart uploads that were
 never completed using the S3 api.
 At the moment there's no easy way to figure out which raw objects are
 not supposed to exist. The process would be like this:
 1. rados ls -p data pool
 keep the list sorted
 2. list objects in the bucket
 3. for each object in (2), do: radosgw-admin object stat
 --bucket=bucket --object=object --rgw-cache-enabled=false
 (disabling the cache so that it goes quicker)
 4. look at the result of (3), and generate a list of all the parts.
 5. sort result of (4), compare it to (1)

 Note that if you're running firefly or later, the raw objects are not
 specified explicitly in the command you run at (3), so you might need
 a different procedure, e.g., find out the raw objects random string
 that is being used, remove it from the list generated in 1, etc.)

 That's basically it.
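
 (Roughly, as a shell sketch; pool and bucket names are placeholders and
 the manifest parsing depends on your version:

     rados ls -p .rgw.buckets | sort > raw.txt
     radosgw-admin bucket list --bucket=mybucket > objects.json
     # for each object name listed in objects.json:
     radosgw-admin object stat --bucket=mybucket --object=myobject \
         --rgw-cache-enabled=false >> stats.json
     # collect every part/shadow name from the manifests into parts.txt, then:
     sort -u parts.txt | comm -23 raw.txt - > orphan-candidates.txt

 comm -23 keeps the raw objects nothing refers to; head objects of live
 bucket objects still need to be filtered out of that list.)
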
 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try verifying that it's
 working by:
- create an object (let's say ~10MB in size).
- radosgw-admin object stat --bucket=bucket --object=object
  (keep this info, see
- remove the object
- run radosgw-admin gc list --include-all and verify that the raw
 parts are listed there
- wait a few hours, repeat last step, see that the parts don't
 appear
 there anymore
- run rados -p pool ls, check to see if the raw objects still
 exist

 Yehuda

 Not sure where to go from here, and our cluster is slowly filling up
 while
 not clearing any space.



 I did the last section:

 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try verifying that it's
 working by:
- create an object (let's say ~10MB in size).
- radosgw-admin object stat --bucket=bucket 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Ben

On 2014-12-02 08:39, Yehuda Sadeh wrote:

On Sat, Nov 29, 2014 at 2:26 PM, Ben b@benjackson.email wrote:


On 29/11/14 11:40, Yehuda Sadeh wrote:


On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:


On 29/11/14 01:50, Yehuda Sadeh wrote:


On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:


On 2014-11-28 15:42, Yehuda Sadeh wrote:


On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:


On 2014-11-27 11:36, Yehuda Sadeh wrote:



On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:



On 2014-11-27 10:21, Yehuda Sadeh wrote:




On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email 
wrote:




On 2014-11-27 09:38, Yehuda Sadeh wrote:





On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email 
wrote:





I've been deleting a bucket which originally had 60TB of 
data

in
it,
with
our cluster doing only 1 replication, the total usage was
120TB.

I've been deleting the objects slowly using S3 browser, 
and I

can
see
the
bucket usage is now down to around 2.5TB or 5TB with
duplication,
but
the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin gc list
--include
all)
and
it just reports square brackets []

I've run radosgw-admin temp remove --date=2014-11-20, and 
it

doesn't
appear
to have any effect.

Is there a way to check where this space is being 
consumed?


Running 'ceph df' the USED space in the buckets pool is 
not

showing
any
of
the 57TB that should have been freed up from the deletion 
so

far.

Running 'radosgw-admin bucket stats | jshon | grep
size_kb_actual'
and
adding up all the buckets usage, this shows that the space 
has

been
freed
from the bucket, but the cluster is all sorts of messed 
up.



ANY IDEAS? What can I look at?






Can you run 'radosgw-admin gc list --include-all'?

Yehuda






I've done it before, and it just returns square brackets [] 
(see

below)

radosgw-admin gc list --include-all
[]





Do you know which of the rados pools have all that extra 
data? Try

to
list that pool's objects, verify that there are no surprises 
there

(e.g., use 'rados -p pool ls').

Yehuda





I'm just running that command now, and its taking some time. 
There

is
a
large number of objects.

Once it has finished, what should I be looking for?




I assume the pool in question is the one that holds your 
objects

data?
You should be looking for objects that are not expected to 
exist

anymore, and objects of buckets that don't exist anymore. The
problem
here is to identify these.
I suggest starting by looking at all the existing buckets, 
compose a
list of all the bucket prefixes for the existing buckets, and 
try to

look whether there are objects that have different prefixes.

Yehuda




Any ideas? I've found the prefix, the number of objects in the 
pool

that
match that prefix numbers in the 21 millions
The actual 'radosgw-admin bucket stats' command reports it as 
only

having
1.2 million.



Well, the objects you're seeing are raw objects, and since rgw 
stripes
the data, it is expected to have more raw objects than objects in 
the
bucket. Still, it seems that you have much too many of these. You 
can
try to check whether there are pending multipart uploads that 
were

never completed using the S3 api.
At the moment there's no easy way to figure out which raw objects 
are

not supposed to exist. The process would be like this:
1. rados ls -p data pool
keep the list sorted
2. list objects in the bucket
3. for each object in (2), do: radosgw-admin object stat
--bucket=bucket --object=object --rgw-cache-enabled=false
(disabling the cache so that it goes quicker)
4. look at the result of (3), and generate a list of all the 
parts.

5. sort result of (4), compare it to (1)

Note that if you're running firefly or later, the raw objects are 
not
specified explicitly in the command you run at (3), so you might 
need
a different procedure, e.g., find out the raw objects random 
string

that is being used, remove it from the list generated in 1, etc.)

That's basically it.
I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that 
it's

working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket 
--object=object

 (keep this info, see
   - remove the object
   - run radosgw-admin gc list --include-all and verify that the 
raw

parts are listed there
   - wait a few hours, repeat last step, see that the parts don't
appear
there anymore
   - run rados -p pool ls, check to see if the raw objects 
still

exist

Yehuda

Not sure where to go from here, and our cluster is slowly 
filling up

while
not clearing any space.




I did the last section:


I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that 
it's

working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Yehuda Sadeh
On Mon, Dec 1, 2014 at 2:10 PM, Ben b@benjackson.email wrote:
 On 2014-12-02 08:39, Yehuda Sadeh wrote:

 On Sat, Nov 29, 2014 at 2:26 PM, Ben b@benjackson.email wrote:


 On 29/11/14 11:40, Yehuda Sadeh wrote:


 On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:


 On 29/11/14 01:50, Yehuda Sadeh wrote:


 On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:


 On 2014-11-28 15:42, Yehuda Sadeh wrote:


 On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:


 On 2014-11-27 11:36, Yehuda Sadeh wrote:



 On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:



 On 2014-11-27 10:21, Yehuda Sadeh wrote:




 On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:




 On 2014-11-27 09:38, Yehuda Sadeh wrote:





 On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email wrote:





 I've been deleting a bucket which originally had 60TB of data
 in
 it,
 with
 our cluster doing only 1 replication, the total usage was
 120TB.

 I've been deleting the objects slowly using S3 browser, and I
 can
 see
 the
 bucket usage is now down to around 2.5TB or 5TB with
 duplication,
 but
 the
 usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc list
 --include
 all)
 and
 it just reports square brackets []

 I've run radosgw-admin temp remove --date=2014-11-20, and it
 doesn't
 appear
 to have any effect.

 Is there a way to check where this space is being consumed?

 Running 'ceph df' the USED space in the buckets pool is not
 showing
 any
 of
 the 57TB that should have been freed up from the deletion so
 far.

 Running 'radosgw-admin bucket stats | jshon | grep
 size_kb_actual'
 and
 adding up all the buckets usage, this shows that the space
 has
 been
 freed
 from the bucket, but the cluster is all sorts of messed up.


 ANY IDEAS? What can I look at?






 Can you run 'radosgw-admin gc list --include-all'?

 Yehuda






 I've done it before, and it just returns square brackets []
 (see
 below)

 radosgw-admin gc list --include-all
 []





 Do you know which of the rados pools have all that extra data?
 Try
 to
 list that pool's objects, verify that there are no surprises
 there
 (e.g., use 'rados -p pool ls').

 Yehuda





 I'm just running that command now, and its taking some time.
 There
 is
 a
 large number of objects.

 Once it has finished, what should I be looking for?




 I assume the pool in question is the one that holds your objects
 data?
 You should be looking for objects that are not expected to exist
 anymore, and objects of buckets that don't exist anymore. The
 problem
 here is to identify these.
 I suggest starting by looking at all the existing buckets, compose
 a
 list of all the bucket prefixes for the existing buckets, and try
 to
 look whether there are objects that have different prefixes.

 Yehuda




 Any ideas? I've found the prefix, the number of objects in the pool
 that
 match that prefix numbers in the 21 millions
 The actual 'radosgw-admin bucket stats' command reports it as only
 having
 1.2 million.



 Well, the objects you're seeing are raw objects, and since rgw
 stripes
 the data, it is expected to have more raw objects than objects in
 the
 bucket. Still, it seems that you have much too many of these. You
 can
 try to check whether there are pending multipart uploads that were
 never completed using the S3 api.
 At the moment there's no easy way to figure out which raw objects
 are
 not supposed to exist. The process would be like this:
 1. rados ls -p data pool
 keep the list sorted
 2. list objects in the bucket
 3. for each object in (2), do: radosgw-admin object stat
 --bucket=bucket --object=object --rgw-cache-enabled=false
 (disabling the cache so that it goes quicker)
 4. look at the result of (3), and generate a list of all the parts.
 5. sort result of (4), compare it to (1)

 Note that if you're running firefly or later, the raw objects are
 not
 specified explicitly in the command you run at (3), so you might
 need
 a different procedure, e.g., find out the raw objects random string
 that is being used, remove it from the list generated in 1, etc.)

 That's basically it.
 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try verifying that it's
 working by:
- create an object (let's say ~10MB in size).
- radosgw-admin object stat --bucket=bucket --object=object
  (keep this info, see
- remove the object
- run radosgw-admin gc list --include-all and verify that the raw
 parts are listed there
- wait a few hours, repeat last step, see that the parts don't
 appear
 there anymore
- run rados -p pool ls, check to see if the raw objects still
 exist

 Yehuda

 Not sure where to go from here, and our cluster is slowly filling
 up
 while
 not clearing any space.




 I did the last section:


 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Ben

On 2014-12-02 09:25, Yehuda Sadeh wrote:

On Mon, Dec 1, 2014 at 2:10 PM, Ben b@benjackson.email wrote:

On 2014-12-02 08:39, Yehuda Sadeh wrote:


On Sat, Nov 29, 2014 at 2:26 PM, Ben b@benjackson.email wrote:



On 29/11/14 11:40, Yehuda Sadeh wrote:



On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:



On 29/11/14 01:50, Yehuda Sadeh wrote:



On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:



On 2014-11-28 15:42, Yehuda Sadeh wrote:



On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:



On 2014-11-27 11:36, Yehuda Sadeh wrote:




On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email 
wrote:




On 2014-11-27 10:21, Yehuda Sadeh wrote:





On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email 
wrote:





On 2014-11-27 09:38, Yehuda Sadeh wrote:






On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email 
wrote:






I've been deleting a bucket which originally had 60TB of 
data

in
it,
with
our cluster doing only 1 replication, the total usage 
was

120TB.

I've been deleting the objects slowly using S3 browser, 
and I

can
see
the
bucket usage is now down to around 2.5TB or 5TB with
duplication,
but
the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin gc list
--include
all)
and
it just reports square brackets []

I've run radosgw-admin temp remove --date=2014-11-20, 
and it

doesn't
appear
to have any effect.

Is there a way to check where this space is being 
consumed?


Running 'ceph df' the USED space in the buckets pool is 
not

showing
any
of
the 57TB that should have been freed up from the 
deletion so

far.

Running 'radosgw-admin bucket stats | jshon | grep
size_kb_actual'
and
adding up all the buckets usage, this shows that the 
space

has
been
freed
from the bucket, but the cluster is all sorts of messed 
up.



ANY IDEAS? What can I look at?







Can you run 'radosgw-admin gc list --include-all'?

Yehuda







I've done it before, and it just returns square brackets 
[]

(see
below)

radosgw-admin gc list --include-all
[]






Do you know which of the rados pools have all that extra 
data?

Try
to
list that pool's objects, verify that there are no 
surprises

there
(e.g., use 'rados -p pool ls').

Yehuda






I'm just running that command now, and its taking some time.
There
is
a
large number of objects.

Once it has finished, what should I be looking for?





I assume the pool in question is the one that holds your 
objects

data?
You should be looking for objects that are not expected to 
exist

anymore, and objects of buckets that don't exist anymore. The
problem
here is to identify these.
I suggest starting by looking at all the existing buckets, 
compose

a
list of all the bucket prefixes for the existing buckets, and 
try

to
look whether there are objects that have different prefixes.

Yehuda





Any ideas? I've found the prefix, the number of objects in the 
pool

that
match that prefix numbers in the 21 millions
The actual 'radosgw-admin bucket stats' command reports it as 
only

having
1.2 million.




Well, the objects you're seeing are raw objects, and since rgw
stripes
the data, it is expected to have more raw objects than objects 
in

the
bucket. Still, it seems that you have much too many of these. 
You

can
try to check whether there are pending multipart uploads that 
were

never completed using the S3 api.
At the moment there's no easy way to figure out which raw 
objects

are
not supposed to exist. The process would be like this:
1. rados ls -p data pool
keep the list sorted
2. list objects in the bucket
3. for each object in (2), do: radosgw-admin object stat
--bucket=bucket --object=object --rgw-cache-enabled=false
(disabling the cache so that it goes quicker)
4. look at the result of (3), and generate a list of all the 
parts.

5. sort result of (4), compare it to (1)

Note that if you're running firefly or later, the raw objects 
are

not
specified explicitly in the command you run at (3), so you 
might

need
a different procedure, e.g., find out the raw objects random 
string
that is being used, remove it from the list generated in 1, 
etc.)


That's basically it.
I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try verifying that 
it's

working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket 
--object=object

 (keep this info, see
   - remove the object
   - run radosgw-admin gc list --include-all and verify that 
the raw

parts are listed there
   - wait a few hours, repeat last step, see that the parts 
don't

appear
there anymore
   - run rados -p pool ls, check to see if the raw objects 
still

exist

Yehuda

Not sure where to go from here, and our cluster is slowly 
filling

up
while
not clearing any space.





I did the last section:



I'll be interested to figure out what happened, why the garbage
collection didn't work correctly. You could try 

Re: [ceph-users] Radosgw agent only syncing metadata

2014-12-01 Thread Mark Kirkwood

On 25/11/14 12:40, Mark Kirkwood wrote:

On 25/11/14 11:58, Yehuda Sadeh wrote:

On Mon, Nov 24, 2014 at 2:43 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:

On 22/11/14 10:54, Yehuda Sadeh wrote:


On Thu, Nov 20, 2014 at 6:52 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:




Fri Nov 21 02:13:31 2014

x-amz-copy-source:bucketbig/_multipart_big.dat.2/fjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta

/bucketbig/__multipart_big.dat.2%2Ffjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta

2014-11-21 15:13:31.914925 7fb5e3f87700 15 generated auth header: AWS
us-west key:tk7RgBQMD92je2Nz1m2D/GV+VNM=
2014-11-21 15:13:31.914964 7fb5e3f87700 20 sending request to

http://ceph2:80/bucketbig/__multipart_big.dat.2%2Ffjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta?rgwx-uid=us-westrgwx-region=usrgwx-prepend-metadata=us

2014-11-21 15:13:31.920510 7fb5e3f87700 10 receive_http_header
2014-11-21 15:13:31.920525 7fb5e3f87700 10 received header:HTTP/1.1
411
Length Required



It looks like you're running the wrong fastcgi module.

Yehuda



Thanks Yehuda - so what would be the right fastcgi? Do you mean
http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-precise-x86_64-basic/ref/master/




This one should work, yeah.



Looks that that was the issue:

$ rados df|grep bucket
.us-east.rgw.buckets -  93740   24
00   0   3493746  216 93740
.us-east.rgw.buckets.index -  01
 00   0   24   25
270
.us-west.rgw.buckets -  93740   24
00   000  215 93740
.us-west.rgw.buckets.index -  01
 00   0   19   18
190

Now I reinstalled the Ceph-patched apache2 and fastcgi module (not
sure if apache2 was needed as well):

$ cat /etc/apt/sources.list.d/ceph.list
...
deb
http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-precise-x86_64-basic/ref/master/
precise main
deb
http://gitbuilder.ceph.com/apache2-deb-precise-x86_64-basic/ref/master/
precise main
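
(Package names assumed here; after adding those entries, the patched builds
can be pulled in with something like:

    apt-get update
    apt-get install --reinstall apache2 libapache2-mod-fastcgi
)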

Now that I've got that working I'll look at getting a more complex setup


Just for the record, using these apache and fastcgi modules seems to be 
the story - I've managed to run through the more complicated examples:


- zones in different ceph clusters
- zones in different regions

... and get replication working (on Ubuntu 12.04 and 14.04 with Ceph 
0.87). Thanks for your help. I have some further questions that I'll ask 
in a new thread (as they are not really about 'how to make it work').


regards

Mark


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Yehuda Sadeh
On Mon, Dec 1, 2014 at 3:20 PM, Ben b@benjackson.email wrote:
 On 2014-12-02 09:25, Yehuda Sadeh wrote:

 On Mon, Dec 1, 2014 at 2:10 PM, Ben b@benjackson.email wrote:

 On 2014-12-02 08:39, Yehuda Sadeh wrote:


 On Sat, Nov 29, 2014 at 2:26 PM, Ben b@benjackson.email wrote:



 On 29/11/14 11:40, Yehuda Sadeh wrote:



 On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:



 On 29/11/14 01:50, Yehuda Sadeh wrote:



 On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:



 On 2014-11-28 15:42, Yehuda Sadeh wrote:



 On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:



 On 2014-11-27 11:36, Yehuda Sadeh wrote:




 On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:




 On 2014-11-27 10:21, Yehuda Sadeh wrote:





 On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email wrote:





 On 2014-11-27 09:38, Yehuda Sadeh wrote:






 On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email
 wrote:






 I've been deleting a bucket which originally had 60TB of
 data
 in
 it,
 with
 our cluster doing only 1 replication, the total usage was
 120TB.

 I've been deleting the objects slowly using S3 browser, and
 I
 can
 see
 the
 bucket usage is now down to around 2.5TB or 5TB with
 duplication,
 but
 the
 usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc list
 --include
 all)
 and
 it just reports square brackets []

 I've run radosgw-admin temp remove --date=2014-11-20, and
 it
 doesn't
 appear
 to have any effect.

 Is there a way to check where this space is being consumed?

 Running 'ceph df' the USED space in the buckets pool is not
 showing
 any
 of
 the 57TB that should have been freed up from the deletion
 so
 far.

 Running 'radosgw-admin bucket stats | jshon | grep
 size_kb_actual'
 and
 adding up all the buckets usage, this shows that the space
 has
 been
 freed
 from the bucket, but the cluster is all sorts of messed up.


 ANY IDEAS? What can I look at?







 Can you run 'radosgw-admin gc list --include-all'?

 Yehuda







 I've done it before, and it just returns square brackets []
 (see
 below)

 radosgw-admin gc list --include-all
 []






 Do you know which of the rados pools have all that extra data?
 Try
 to
 list that pool's objects, verify that there are no surprises
 there
 (e.g., use 'rados -p pool ls').

 Yehuda






 I'm just running that command now, and its taking some time.
 There
 is
 a
 large number of objects.

 Once it has finished, what should I be looking for?





 I assume the pool in question is the one that holds your objects
 data?
 You should be looking for objects that are not expected to exist
 anymore, and objects of buckets that don't exist anymore. The
 problem
 here is to identify these.
 I suggest starting by looking at all the existing buckets,
 compose
 a
 list of all the bucket prefixes for the existing buckets, and
 try
 to
 look whether there are objects that have different prefixes.

 Yehuda





 Any ideas? I've found the prefix, the number of objects in the
 pool
 that
 match that prefix numbers in the 21 millions
 The actual 'radosgw-admin bucket stats' command reports it as
 only
 having
 1.2 million.




 Well, the objects you're seeing are raw objects, and since rgw
 stripes
 the data, it is expected to have more raw objects than objects in
 the
 bucket. Still, it seems that you have much too many of these. You
 can
 try to check whether there are pending multipart uploads that were
 never completed using the S3 api.
 At the moment there's no easy way to figure out which raw objects
 are
 not supposed to exist. The process would be like this:
 1. rados ls -p data pool
 keep the list sorted
 2. list objects in the bucket
 3. for each object in (2), do: radosgw-admin object stat
 --bucket=bucket --object=object --rgw-cache-enabled=false
 (disabling the cache so that it goes quicker)
 4. look at the result of (3), and generate a list of all the
 parts.
 5. sort result of (4), compare it to (1)

 Note that if you're running firefly or later, the raw objects are
 not
 specified explicitly in the command you run at (3), so you might
 need
 a different procedure, e.g., find out the raw objects random
 string
 that is being used, remove it from the list generated in 1, etc.)

 That's basically it.
 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try verifying that
 it's
 working by:
- create an object (let's say ~10MB in size).
- radosgw-admin object stat --bucket=bucket --object=object
  (keep this info, see
- remove the object
- run radosgw-admin gc list --include-all and verify that the
 raw
 parts are listed there
- wait a few hours, repeat last step, see that the parts don't
 appear
 there anymore
- run rados -p pool ls, check to see if the raw objects still
 exist

 Yehuda

 Not sure where to go from here, and our cluster is slowly filling
 up
 while
 not clearing any space.





 I 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Ben

On 2014-12-02 11:21, Yehuda Sadeh wrote:

On Mon, Dec 1, 2014 at 3:47 PM, Ben b@benjackson.email wrote:

On 2014-12-02 10:40, Yehuda Sadeh wrote:


On Mon, Dec 1, 2014 at 3:20 PM, Ben b@benjackson.email wrote:


On 2014-12-02 09:25, Yehuda Sadeh wrote:



On Mon, Dec 1, 2014 at 2:10 PM, Ben b@benjackson.email wrote:



On 2014-12-02 08:39, Yehuda Sadeh wrote:




On Sat, Nov 29, 2014 at 2:26 PM, Ben b@benjackson.email wrote:





On 29/11/14 11:40, Yehuda Sadeh wrote:





On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email 
wrote:





On 29/11/14 01:50, Yehuda Sadeh wrote:





On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email 
wrote:





On 2014-11-28 15:42, Yehuda Sadeh wrote:





On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email 
wrote:





On 2014-11-27 11:36, Yehuda Sadeh wrote:






On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email 
wrote:






On 2014-11-27 10:21, Yehuda Sadeh wrote:







On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email
wrote:







On 2014-11-27 09:38, Yehuda Sadeh wrote:








On Wed, Nov 26, 2014 at 2:32 PM, b 
b@benjackson.email

wrote:








I've been deleting a bucket which originally had 
60TB of

data
in
it,
with
our cluster doing only 1 replication, the total 
usage was

120TB.

I've been deleting the objects slowly using S3 
browser,

and
I
can
see
the
bucket usage is now down to around 2.5TB or 5TB with
duplication,
but
the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin gc 
list

--include
all)
and
it just reports square brackets []

I've run radosgw-admin temp remove 
--date=2014-11-20, and

it
doesn't
appear
to have any effect.

Is there a way to check where this space is being
consumed?

Running 'ceph df' the USED space in the buckets pool 
is

not
showing
any
of
the 57TB that should have been freed up from the 
deletion

so
far.

Running 'radosgw-admin bucket stats | jshon | grep
size_kb_actual'
and
adding up all the buckets usage, this shows that the
space
has
been
freed
from the bucket, but the cluster is all sorts of 
messed

up.


ANY IDEAS? What can I look at?









Can you run 'radosgw-admin gc list --include-all'?

Yehuda









I've done it before, and it just returns square 
brackets []

(see
below)

radosgw-admin gc list --include-all
[]








Do you know which of the rados pools have all that 
extra

data?
Try
to
list that pool's objects, verify that there are no 
surprises

there
(e.g., use 'rados -p pool ls').

Yehuda








I'm just running that command now, and its taking some 
time.

There
is
a
large number of objects.

Once it has finished, what should I be looking for?







I assume the pool in question is the one that holds your
objects
data?
You should be looking for objects that are not expected 
to

exist
anymore, and objects of buckets that don't exist anymore. 
The

problem
here is to identify these.
I suggest starting by looking at all the existing 
buckets,

compose
a
list of all the bucket prefixes for the existing buckets, 
and

try
to
look whether there are objects that have different 
prefixes.


Yehuda







Any ideas? I've found the prefix, the number of objects in 
the

pool
that
match that prefix numbers in the 21 millions
The actual 'radosgw-admin bucket stats' command reports it 
as

only
having
1.2 million.






Well, the objects you're seeing are raw objects, and since 
rgw

stripes
the data, it is expected to have more raw objects than 
objects

in
the
bucket. Still, it seems that you have much too many of 
these.

You
can
try to check whether there are pending multipart uploads 
that

were
never completed using the S3 api.
At the moment there's no easy way to figure out which raw
objects
are
not supposed to exist. The process would be like this:
1. rados ls -p data pool
keep the list sorted
2. list objects in the bucket
3. for each object in (2), do: radosgw-admin object stat
--bucket=bucket --object=object 
--rgw-cache-enabled=false

(disabling the cache so that it goes quicker)
4. look at the result of (3), and generate a list of all 
the

parts.
5. sort result of (4), compare it to (1)

Note that if you're running firefly or later, the raw 
objects

are
not
specified explicitly in the command you run at (3), so you 
might

need
a different procedure, e.g., find out the raw objects 
random

string
that is being used, remove it from the list generated in 1,
etc.)

That's basically it.
I'll be interested to figure out what happened, why the 
garbage
collection didn't work correctly. You could try verifying 
that

it's
working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket
--object=object
 (keep this info, see
   - remove the object
   - run radosgw-admin gc list --include-all and verify 
that the

raw
parts are listed there
   - wait a few hours, repeat last step, see that the parts
don't
appear
there anymore
   - run rados -p pool ls, check to see if the raw 
objects

still
exist


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Yehuda Sadeh
On Mon, Dec 1, 2014 at 4:23 PM, Ben b@benjackson.email wrote:
 On 2014-12-02 11:21, Yehuda Sadeh wrote:

 On Mon, Dec 1, 2014 at 3:47 PM, Ben b@benjackson.email wrote:

 On 2014-12-02 10:40, Yehuda Sadeh wrote:


 On Mon, Dec 1, 2014 at 3:20 PM, Ben b@benjackson.email wrote:


 On 2014-12-02 09:25, Yehuda Sadeh wrote:



 On Mon, Dec 1, 2014 at 2:10 PM, Ben b@benjackson.email wrote:



 On 2014-12-02 08:39, Yehuda Sadeh wrote:




 On Sat, Nov 29, 2014 at 2:26 PM, Ben b@benjackson.email wrote:





 On 29/11/14 11:40, Yehuda Sadeh wrote:





 On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:





 On 29/11/14 01:50, Yehuda Sadeh wrote:





 On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:





 On 2014-11-28 15:42, Yehuda Sadeh wrote:





 On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:





 On 2014-11-27 11:36, Yehuda Sadeh wrote:






 On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email
 wrote:






 On 2014-11-27 10:21, Yehuda Sadeh wrote:







 On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email
 wrote:







 On 2014-11-27 09:38, Yehuda Sadeh wrote:








 On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email
 wrote:








 I've been deleting a bucket which originally had 60TB
 of
 data
 in
 it,
 with
 our cluster doing only 1 replication, the total usage
 was
 120TB.

 I've been deleting the objects slowly using S3 browser,
 and
 I
 can
 see
 the
 bucket usage is now down to around 2.5TB or 5TB with
 duplication,
 but
 the
 usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc
 list
 --include
 all)
 and
 it just reports square brackets []

 I've run radosgw-admin temp remove --date=2014-11-20,
 and
 it
 doesn't
 appear
 to have any effect.

 Is there a way to check where this space is being
 consumed?

 Running 'ceph df' the USED space in the buckets pool is
 not
 showing
 any
 of
 the 57TB that should have been freed up from the
 deletion
 so
 far.

 Running 'radosgw-admin bucket stats | jshon | grep
 size_kb_actual'
 and
 adding up all the buckets usage, this shows that the
 space
 has
 been
 freed
 from the bucket, but the cluster is all sorts of messed
 up.


 ANY IDEAS? What can I look at?









 Can you run 'radosgw-admin gc list --include-all'?

 Yehuda









 I've done it before, and it just returns square brackets
 []
 (see
 below)

 radosgw-admin gc list --include-all
 []








 Do you know which of the rados pools have all that extra
 data?
 Try
 to
 list that pool's objects, verify that there are no
 surprises
 there
 (e.g., use 'rados -p pool ls').

 Yehuda








 I'm just running that command now, and its taking some
 time.
 There
 is
 a
 large number of objects.

 Once it has finished, what should I be looking for?







 I assume the pool in question is the one that holds your
 objects
 data?
 You should be looking for objects that are not expected to
 exist
 anymore, and objects of buckets that don't exist anymore.
 The
 problem
 here is to identify these.
 I suggest starting by looking at all the existing buckets,
 compose
 a
 list of all the bucket prefixes for the existing buckets,
 and
 try
 to
 look whether there are objects that have different prefixes.

 Yehuda







 Any ideas? I've found the prefix, the number of objects in
 the
 pool
 that
 match that prefix numbers in the 21 millions
 The actual 'radosgw-admin bucket stats' command reports it as
 only
 having
 1.2 million.






 Well, the objects you're seeing are raw objects, and since rgw
 stripes
 the data, it is expected to have more raw objects than objects
 in
 the
 bucket. Still, it seems that you have much too many of these.
 You
 can
 try to check whether there are pending multipart uploads that
 were
 never completed using the S3 api.
 At the moment there's no easy way to figure out which raw
 objects
 are
 not supposed to exist. The process would be like this:
 1. rados ls -p data pool
 keep the list sorted
 2. list objects in the bucket
 3. for each object in (2), do: radosgw-admin object stat
 --bucket=bucket --object=object --rgw-cache-enabled=false
 (disabling the cache so that it goes quicker)
 4. look at the result of (3), and generate a list of all the
 parts.
 5. sort result of (4), compare it to (1)
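
 [A rough, untested shell sketch of steps 1-5 above, assuming the data pool is
 .rgw.buckets, a single bucket named mybucket, and dumpling-era stat output;
 the JSON field extraction below is a guess and varies by release:

   # 1. raw objects currently in the data pool, sorted
   rados -p .rgw.buckets ls | sort > /tmp/raw.txt

   # 2. logical objects still in the bucket
   radosgw-admin bucket list --bucket=mybucket > /tmp/bucket.json

   # 3+4. stat each object with the cache disabled and collect the raw part
   #      names from the manifest (exact JSON layout differs per release)
   grep '"name"' /tmp/bucket.json | cut -d'"' -f4 | while read -r obj; do
       radosgw-admin object stat --bucket=mybucket --object="$obj" \
           --rgw-cache-enabled=false
   done > /tmp/stats.json
   grep -o '"default\.[0-9]*\.[0-9]*__[^"]*"' /tmp/stats.json | tr -d '"' \
       | sort -u > /tmp/expected.txt

   # 5. raw objects in the pool that no surviving bucket object references
   #    (head objects are not matched by this grep and need separate handling)
   comm -23 /tmp/raw.txt /tmp/expected.txt > /tmp/orphan-candidates.txt
 ]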

 Note that if you're running firefly or later, the raw objects
 are
 not
 specified explicitly in the command you run at (3), so you
 might
 need
 a different procedure, e.g., find out the raw objects random
 string
 that is being used, remove it from the list generated in 1,
 etc.)

 That's basically it.
 I'll be interested to figure out what happened, why the
 garbage
 collection didn't work correctly. You could try verifying that
 it's
 working by:
- create an object (let's say ~10MB in size).
- radosgw-admin object stat --bucket=bucket
 --object=object
  (keep this info, see
- remove the object
- run radosgw-admin gc list --include-all and verify that
 the
 raw
 parts are 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Ben

On 2014-12-02 11:25, Yehuda Sadeh wrote:

On Mon, Dec 1, 2014 at 4:23 PM, Ben b@benjackson.email wrote:

On 2014-12-02 11:21, Yehuda Sadeh wrote:


On Mon, Dec 1, 2014 at 3:47 PM, Ben b@benjackson.email wrote:


On 2014-12-02 10:40, Yehuda Sadeh wrote:



On Mon, Dec 1, 2014 at 3:20 PM, Ben b@benjackson.email wrote:



On 2014-12-02 09:25, Yehuda Sadeh wrote:




On Mon, Dec 1, 2014 at 2:10 PM, Ben b@benjackson.email wrote:




On 2014-12-02 08:39, Yehuda Sadeh wrote:





On Sat, Nov 29, 2014 at 2:26 PM, Ben b@benjackson.email 
wrote:






On 29/11/14 11:40, Yehuda Sadeh wrote:






On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email 
wrote:






On 29/11/14 01:50, Yehuda Sadeh wrote:






On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email 
wrote:






On 2014-11-28 15:42, Yehuda Sadeh wrote:






On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email 
wrote:






On 2014-11-27 11:36, Yehuda Sadeh wrote:







On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email
wrote:







On 2014-11-27 10:21, Yehuda Sadeh wrote:








On Wed, Nov 26, 2014 at 3:09 PM, b 
b@benjackson.email

wrote:








On 2014-11-27 09:38, Yehuda Sadeh wrote:









On Wed, Nov 26, 2014 at 2:32 PM, b 
b@benjackson.email

wrote:









I've been deleting a bucket which originally had 
60TB

of
data
in
it,
with
our cluster doing only 1 replication, the total 
usage

was
120TB.

I've been deleting the objects slowly using S3 
browser,

and
I
can
see
the
bucket usage is now down to around 2.5TB or 5TB 
with

duplication,
but
the
usage in the cluster has not changed.

I've looked at garbage collection (radosgw-admin 
gc

list
--include
all)
and
it just reports square brackets []

I've run radosgw-admin temp remove 
--date=2014-11-20,

and
it
doesn't
appear
to have any effect.

Is there a way to check where this space is being
consumed?

Running 'ceph df' the USED space in the buckets 
pool is

not
showing
any
of
the 57TB that should have been freed up from the
deletion
so
far.

Running 'radosgw-admin bucket stats | jshon | grep
size_kb_actual'
and
adding up all the buckets usage, this shows that 
the

space
has
been
freed
from the bucket, but the cluster is all sorts of 
messed

up.


ANY IDEAS? What can I look at?










Can you run 'radosgw-admin gc list --include-all'?

Yehuda










I've done it before, and it just returns square 
brackets

[]
(see
below)

radosgw-admin gc list --include-all
[]









Do you know which of the rados pools have all that 
extra

data?
Try
to
list that pool's objects, verify that there are no
surprises
there
(e.g., use 'rados -p pool ls').

Yehuda









I'm just running that command now, and its taking some
time.
There
is
a
large number of objects.

Once it has finished, what should I be looking for?








I assume the pool in question is the one that holds 
your

objects
data?
You should be looking for objects that are not expected 
to

exist
anymore, and objects of buckets that don't exist 
anymore.

The
problem
here is to identify these.
I suggest starting by looking at all the existing 
buckets,

compose
a
list of all the bucket prefixes for the existing 
buckets,

and
try
to
look whether there are objects that have different 
prefixes.


Yehuda








Any ideas? I've found the prefix, the number of objects 
in

the
pool
that
match that prefix numbers in the 21 millions
The actual 'radosgw-admin bucket stats' command reports 
it as

only
having
1.2 million.







Well, the objects you're seeing are raw objects, and 
since rgw

stripes
the data, it is expected to have more raw objects than 
objects

in
the
bucket. Still, it seems that you have much too many of 
these.

You
can
try to check whether there are pending multipart uploads 
that

were
never completed using the S3 api.
At the moment there's no easy way to figure out which raw
objects
are
not supposed to exist. The process would be like this:
1. rados ls -p data pool
keep the list sorted
2. list objects in the bucket
3. for each object in (2), do: radosgw-admin object stat
--bucket=bucket --object=object 
--rgw-cache-enabled=false

(disabling the cache so that it goes quicker)
4. look at the result of (3), and generate a list of all 
the

parts.
5. sort result of (4), compare it to (1)

Note that if you're running firefly or later, the raw 
objects

are
not
specified explicitly in the command you run at (3), so 
you

might
need
a different procedure, e.g., find out the raw objects 
random

string
that is being used, remove it from the list generated in 
1,

etc.)

That's basically it.
I'll be interested to figure out what happened, why the
garbage
collection didn't work correctly. You could try verifying 
that

it's
working by:
   - create an object (let's say ~10MB in size).
   - radosgw-admin object stat --bucket=bucket
--object=object
 (keep this info, see
   - remove the object
   - run radosgw-admin gc list --include-all and verify 
that

the
raw
parts are listed there
   - wait a few hours, 

Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Yehuda Sadeh
On Mon, Dec 1, 2014 at 3:47 PM, Ben b@benjackson.email wrote:
 On 2014-12-02 10:40, Yehuda Sadeh wrote:

 On Mon, Dec 1, 2014 at 3:20 PM, Ben b@benjackson.email wrote:

 On 2014-12-02 09:25, Yehuda Sadeh wrote:


 On Mon, Dec 1, 2014 at 2:10 PM, Ben b@benjackson.email wrote:


 On 2014-12-02 08:39, Yehuda Sadeh wrote:



 On Sat, Nov 29, 2014 at 2:26 PM, Ben b@benjackson.email wrote:




 On 29/11/14 11:40, Yehuda Sadeh wrote:




 On Fri, Nov 28, 2014 at 1:38 PM, Ben b@benjackson.email wrote:




 On 29/11/14 01:50, Yehuda Sadeh wrote:




 On Thu, Nov 27, 2014 at 9:22 PM, Ben b@benjackson.email wrote:




 On 2014-11-28 15:42, Yehuda Sadeh wrote:




 On Thu, Nov 27, 2014 at 2:15 PM, b b@benjackson.email wrote:




 On 2014-11-27 11:36, Yehuda Sadeh wrote:





 On Wed, Nov 26, 2014 at 3:49 PM, b b@benjackson.email wrote:





 On 2014-11-27 10:21, Yehuda Sadeh wrote:






 On Wed, Nov 26, 2014 at 3:09 PM, b b@benjackson.email
 wrote:






 On 2014-11-27 09:38, Yehuda Sadeh wrote:







 On Wed, Nov 26, 2014 at 2:32 PM, b b@benjackson.email
 wrote:







 I've been deleting a bucket which originally had 60TB of
 data
 in
 it,
 with
 our cluster doing only 1 replication, the total usage was
 120TB.

 I've been deleting the objects slowly using S3 browser,
 and
 I
 can
 see
 the
 bucket usage is now down to around 2.5TB or 5TB with
 duplication,
 but
 the
 usage in the cluster has not changed.

 I've looked at garbage collection (radosgw-admin gc list
 --include
 all)
 and
 it just reports square brackets []

 I've run radosgw-admin temp remove --date=2014-11-20, and
 it
 doesn't
 appear
 to have any effect.

 Is there a way to check where this space is being
 consumed?

 Running 'ceph df' the USED space in the buckets pool is
 not
 showing
 any
 of
 the 57TB that should have been freed up from the deletion
 so
 far.

 Running 'radosgw-admin bucket stats | jshon | grep
 size_kb_actual'
 and
 adding up all the buckets usage, this shows that the
 space
 has
 been
 freed
 from the bucket, but the cluster is all sorts of messed
 up.


 ANY IDEAS? What can I look at?








 Can you run 'radosgw-admin gc list --include-all'?

 Yehuda








 I've done it before, and it just returns square brackets []
 (see
 below)

 radosgw-admin gc list --include-all
 []







 Do you know which of the rados pools have all that extra
 data?
 Try
 to
 list that pool's objects, verify that there are no surprises
 there
 (e.g., use 'rados -p pool ls').

 Yehuda







 I'm just running that command now, and its taking some time.
 There
 is
 a
 large number of objects.

 Once it has finished, what should I be looking for?






 I assume the pool in question is the one that holds your
 objects
 data?
 You should be looking for objects that are not expected to
 exist
 anymore, and objects of buckets that don't exist anymore. The
 problem
 here is to identify these.
 I suggest starting by looking at all the existing buckets,
 compose
 a
 list of all the bucket prefixes for the existing buckets, and
 try
 to
 look whether there are objects that have different prefixes.

 Yehuda






 Any ideas? I've found the prefix, the number of objects in the
 pool
 that
 match that prefix numbers in the 21 millions
 The actual 'radosgw-admin bucket stats' command reports it as
 only
 having
 1.2 million.





 Well, the objects you're seeing are raw objects, and since rgw
 stripes
 the data, it is expected to have more raw objects than objects
 in
 the
 bucket. Still, it seems that you have much too many of these.
 You
 can
 try to check whether there are pending multipart uploads that
 were
 never completed using the S3 api.
 At the moment there's no easy way to figure out which raw
 objects
 are
 not supposed to exist. The process would be like this:
 1. rados ls -p data pool
 keep the list sorted
 2. list objects in the bucket
 3. for each object in (2), do: radosgw-admin object stat
 --bucket=bucket --object=object --rgw-cache-enabled=false
 (disabling the cache so that it goes quicker)
 4. look at the result of (3), and generate a list of all the
 parts.
 5. sort result of (4), compare it to (1)

 Note that if you're running firefly or later, the raw objects
 are
 not
 specified explicitly in the command you run at (3), so you might
 need
 a different procedure, e.g., find out the raw objects random
 string
 that is being used, remove it from the list generated in 1,
 etc.)

 That's basically it.
 I'll be interested to figure out what happened, why the garbage
 collection didn't work correctly. You could try verifying that
 it's
 working by:
- create an object (let's say ~10MB in size).
- radosgw-admin object stat --bucket=bucket
 --object=object
  (keep this info, see
- remove the object
- run radosgw-admin gc list --include-all and verify that the
 raw
 parts are listed there
- wait a few hours, repeat last step, see that the parts
 don't
 appear
 there anymore
- run rados -p pool ls, check to see if 
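
 [A minimal sketch of that verification loop, assuming an s3cmd client and
 made-up bucket/object names; any S3 client works:

   # 1. create a ~10MB object through radosgw
   dd if=/dev/zero of=/tmp/gc-test.bin bs=1M count=10
   s3cmd put /tmp/gc-test.bin s3://testbucket/gc-test.bin

   # 2. record which raw parts back it
   radosgw-admin object stat --bucket=testbucket --object=gc-test.bin > /tmp/gc-test.stat

   # 3. remove the object, then confirm its parts show up in the gc queue
   s3cmd del s3://testbucket/gc-test.bin
   radosgw-admin gc list --include-all

   # 4. after the gc grace period (a few hours by default) the entries should
   #    be gone from the gc list, and the part names recorded in
   #    /tmp/gc-test.stat should no longer appear in the data pool
   radosgw-admin gc list --include-all
   rados -p .rgw.buckets ls > /tmp/pool-after.txt   # then grep for the recorded part names
 ]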

[ceph-users] Incomplete PGs

2014-12-01 Thread Aaron Bassett
Hi all, I have a problem with some incomplete pgs. Here’s the backstory: I had 
a pool that I had accidentally left with a size of 2. On one of the osd nodes, 
the system hdd started to fail and I attempted to rescue it by sacrificing one 
of my osd nodes. That went ok and I was able to bring the node back up minus 
the one osd. Now I have 11 incomplete pgs. I believe these are mostly from the 
pool that only had size two, but I can't tell for sure. I found another thread 
on here that talked about using ceph_objectstore_tool to add or remove pg data 
to get out of an incomplete state. 

Let’s start with the one pg I’ve been playing with, this is a loose description 
of where I’ve been. First I saw that it had the missing osd in 
“down_osds_we_would_probe” when I queried it, and some reading around that told 
me to recreate the missing osd, so I did that. It (obviously) didn't have the 
missing data, but it took the pg from down+incomplete to just incomplete. Then 
I tried pg_force_create and that didn't seem to make a difference. Some more 
googling then brought me to ceph_objectstore_tool and I started to take a 
closer look at the results from pg query. I noticed that the list of probing 
osds gets longer and longer till the end of the query has something like:

 probing_osds: [
   0,
   3,
   4,
   16,
   23,
   26,
   35,
   41,
   44,
   51,
   56],

So I took a look at those osds and noticed that some of them have data in the 
directory for the troublesome pg and others don't. So I tried picking one with 
the *most* data and I used ceph_objectstore_tool to export the pg. It was 6G 
so a fair amount of data is still there. I then imported it (after removing) 
into all the others in that list. Unfortunately, it is still incomplete. I’m 
not sure what my next step should be here. Here’s some other stuff from the 
query on it:

info: { pgid: 0.63b,
last_update: 50495'8246,
last_complete: 50495'8246,
log_tail: 20346'5245,
last_user_version: 8246,
last_backfill: MAX,
purged_snaps: [],
history: { epoch_created: 1,
last_epoch_started: 51102,
last_epoch_clean: 50495,
last_epoch_split: 0,
same_up_since: 68312,
same_interval_since: 68312,
same_primary_since: 68190,
last_scrub: 28158'8240,
last_scrub_stamp: 2014-11-18 17:08:49.368486,
last_deep_scrub: 28158'8240,
last_deep_scrub_stamp: 2014-11-18 17:08:49.368486,
last_clean_scrub_stamp: 2014-11-18 17:08:49.368486},
stats: { version: 50495'8246,
reported_seq: 84279,
reported_epoch: 69394,
state: down+incomplete,
last_fresh: 2014-12-01 23:23:07.355308,
last_change: 2014-12-01 21:28:52.771807,
last_active: 2014-11-24 13:37:09.784417,
last_clean: 2014-11-22 21:59:49.821836,
last_became_active: 0.00,
last_unstale: 2014-12-01 23:23:07.355308,
last_undegraded: 2014-12-01 23:23:07.355308,
last_fullsized: 2014-12-01 23:23:07.355308,
mapping_epoch: 68285,
log_start: 20346'5245,
ondisk_log_start: 20346'5245,
created: 1,
last_epoch_clean: 50495,
parent: 0.0,
parent_split_bits: 0,
last_scrub: 28158'8240,
last_scrub_stamp: 2014-11-18 17:08:49.368486,
last_deep_scrub: 28158'8240,
last_deep_scrub_stamp: 2014-11-18 17:08:49.368486,
last_clean_scrub_stamp: 2014-11-18 17:08:49.368486,
log_size: 3001,
ondisk_log_size: 3001,


Also in the peering section, all the peers now have the same last_update: which 
makes me think it should just pick up and take off. 
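
[A sketch of the export/import sequence described above, using the pg id from
this post but hypothetical osd ids, paths and upstart stop/start syntax; the
osd must be stopped while ceph_objectstore_tool runs:

  # on the osd that still holds the most data for pg 0.63b (say osd.23)
  stop ceph-osd id=23
  ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-23 \
      --journal-path /var/lib/ceph/osd/ceph-23/journal \
      --pgid 0.63b --op export --file /tmp/pg0.63b.export

  # on each other osd in the probing set (say osd.44): remove any stale copy,
  # then import the exported one
  stop ceph-osd id=44
  ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-44 \
      --journal-path /var/lib/ceph/osd/ceph-44/journal \
      --pgid 0.63b --op remove
  ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-44 \
      --journal-path /var/lib/ceph/osd/ceph-44/journal \
      --pgid 0.63b --op import --file /tmp/pg0.63b.export
  start ceph-osd id=44
]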

There is another thing I’m having problems with and I’m not sure if it’s 
related or not. I set a crush map manually as I have a mix of ssd and platter 
osds, and it seems to work when I set it: the cluster starts rebalancing, etc. 
But if I do a restart ceph-all on all my nodes the crush map seems to revert 
to the one I didn’t set. I don’t know if it's being blocked from taking effect by these 
incomplete pgs or if I’m missing a step to get it to “stick”. It makes me think that 
when I stop and start these osds to use ceph_objectstore_tool on them, 
they may be getting out of sync with the cluster.

Any insights would be greatly appreciated,

Aaron 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Client forward compatibility

2014-12-01 Thread Gregory Farnum
On Tue, Nov 25, 2014 at 1:00 AM, Dan Van Der Ster
daniel.vanders...@cern.ch wrote:
 Hi Greg,


 On 24 Nov 2014, at 22:01, Gregory Farnum g...@gregs42.com wrote:

 On Thu, Nov 20, 2014 at 9:08 AM, Dan van der Ster
 daniel.vanders...@cern.ch wrote:
 Hi all,
 What is compatibility/incompatibility of dumpling clients to talk to firefly
 and giant clusters?

 We sadly don't have a good matrix about this yet, but in general you
 should assume that anything which changed the way the data is
 physically placed on the cluster will prevent them from communicating;
 if you don't enable those features then they should remain compatible.


 It would be good to have such a compat matrix, as I was confused, probably 
 others are confused, and if I’m not wrong, even you are confused ... see 
 below.


 In particular

 I know that tunables=firefly will prevent dumpling
 clients from talking to a firefly cluster, but how about the existence or
 not of erasure pools?

 As you mention, updating the tunables will prevent old clients from
 accessing them (although that shouldn't be the case in future now that
 they're all set by the crush map for later interpretation). Erasure
 pools are a special case (precisely because people had issues with
 them) and you should be able to communicate with a cluster that has EC
 pools while using old clients


 That’s what we’d hoped, but alas we get the same error mentioned here: 
 http://tracker.ceph.com/issues/8178
 In our case (0.67.11 clients talking to the latest firefly gitbuilder build) 
 we get:
protocol feature mismatch, my 407 < peer 417 missing 10

 By adding an EC pool, we lose connectivity for dumpling clients to even the 
 replicated pools. The good news is that when we remove the EC pool, the 
 10 feature bit is removed so dumpling clients can connect again. But 
 nevertheless it leaves open the possibility of accidentally breaking the 
 users’ access.

yep. Sorry, apparently we tried to do this and didn't quite make
it all the way. :/

We discussed last week trying to build and maintain a forward
compatibility matrix briefly, but haven't done it yet. There's one
floating around somewhere in the docs for the kernel client but a
userspace one just hasn't been anything people have asked for
previously, so we never thought of it. Meanwhile, I'm sure it's not
the most pleasant way to do things but if you go over the upgrade
notes for each major release they should include the possible break
points.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Giant + nfs over cephfs hang tasks

2014-12-01 Thread Gregory Farnum
On Sun, Nov 30, 2014 at 1:15 PM, Andrei Mikhailovsky and...@arhont.com wrote:
 Greg, thanks for your comment. Could you please share what OS, kernel and
 any nfs/cephfs settings you've used to achieve the pretty well stability?
 Also, what kind of tests have you ran to check that?


We're just doing it on our testing cluster with the
teuthology/ceph-qa-suite stuff in
https://github.com/ceph/ceph-qa-suite/tree/master/suites/knfs/basic
So that'll be running our ceph-client kernel, which I believe is
usually a recent rc release with the new Ceph changes on top, with
knfs exporting a kcephfs mount, and then running each of the tasks
named in the tasks folder on top of a client to that knfs export.
-Greg
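
[For anyone reproducing a similar setup by hand, the knfs-over-kcephfs
arrangement amounts to roughly the following sketch; hostnames, paths and
export options here are invented, not from the test suite:

  # on the NFS server: mount cephfs with the kernel client...
  mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

  # ...and re-export it over kernel NFS
  echo '/mnt/cephfs  *(rw,no_subtree_check,fsid=100)' >> /etc/exports
  exportfs -ra

  # on the NFS client
  mount -t nfs nfsserver:/mnt/cephfs /mnt/nfs
]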
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Revisiting MDS memory footprint

2014-12-01 Thread Gregory Farnum
On Mon, Dec 1, 2014 at 8:06 AM, John Spray john.sp...@redhat.com wrote:
 I meant to chime in earlier here but then the weekend happened, comments 
 inline

 On Sun, Nov 30, 2014 at 7:20 PM, Wido den Hollander w...@42on.com wrote:
 Why would you want all CephFS metadata in memory? With any filesystem
 that will be a problem.

 The latency associated with a cache miss (RADOS OMAP dirfrag read) is
 fairly high, so the goal when sizing will to allow the MDSs to keep a
 very large proportion of the metadata in RAM.  In a local FS, the
 filesystem metadata in RAM is relatively small, and the speed to disk
 is relatively high.  In Ceph FS, that is reversed: we want to
 compensate for the cache miss latency by having lots of RAM in the MDS
 and a big cache.

 hot-standby MDSs are another manifestation of the expected large
 cache: we expect these caches to be big, to the point where refilling
 from the backing store on a failure would be annoyingly slow, and it's
 worth keeping that hot standby cache.

I actually don't think the cache misses should be *dramatically* more
expensive than local FS misses. They'll be larger since it's remote
and a leveldb lookup is a bit slower than hitting the right spot on
disk, but everything's nicely streamed in and such so it's not too
bad.
But I'm also making this up as much as you are the rest of it, which
looks good to me. :)

The one thing I'd also bring up is just to be a bit more explicit
about CephFS in-memory inode size having nothing to do with that of a
local FS. We don't need to keep track of things like block locations,
but we do keep track of file capabilities (leases) and a whole bunch
of other state like the scrubbing/fsck status of it (coming soon!),
the clean/dirty status in a lot more detail than the kernel does, any
old versions of the inode that have been snapshotted, etc etc etc.
Once upon a time Sage did have some numbers indicating that a cached
dentry took about 1KB, but things change in both directions pretty
frequently and memory use will likely be a thing we look at around the
time we're wondering if we should declare CephFS to be ready for
community use in production previews.
-Greg


 Also, remember that because we embed inodes in dentries, when we load
 a directory fragment we are also loading all the inodes in that
 directory fragment -- if you have only one file open, but it has an
 ancestor with lots of files, then you'll have more files in cache than
 you might have expected.

 We do however need a good rule of thumb of how much memory is used for
 each inode.

 Yes -- and ideally some practical measurements too :-)

 One important point that I don't think anyone mentioned so far: the
 memory consumption per inode depends on how many clients have
 capabilities on the inode.  So if many clients hold a read capability
 on a file, more memory will be used MDS-side for that file.  If
 designing a benchmark for this, the client count, and level of overlap
 in the client workloads would be an important dimension.

 The number of *open* files on clients strongly affects the ability of
 the MDS to trim is cache, since the MDS pins in cache any inode which
 is in use by a client.  We recently added health checks so that the
 MDS can complain about clients that are failing to respond to requests
 to trim their caches, and the way we test this is to have a client
 obstinately keep some number of files open.

 We also allocate memory for pending metadata updates (so-called
 'projected inodes') while they are in the journal, so the memory usage
 will depend on the journal size and the number of writes in flight.

 It would be really useful to come up with a test script that monitors
 MDS memory consumption as a function of number of files in cache,
 number of files opened by clients, number of clients opening the same
 files.  I feel a 3d chart plot coming on :-)

 Cheers,
 John
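
[A very rough starting point for the kind of monitoring script John describes
above; the admin socket counter names, mds name and paths below are guesses
and vary by release, and jq is assumed to be installed:

  # sample MDS RSS and a cache counter once a minute
  while true; do
      ts=$(date +%s)
      rss_kb=$(ps -C ceph-mds -o rss= | head -n1)
      inodes=$(ceph daemon mds.$(hostname -s) perf dump 2>/dev/null \
               | jq '.mds.inodes')       # counter path may differ per release
      echo "$ts $rss_kb $inodes"
      sleep 60
  done >> /var/log/mds-memory-samples.log
]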
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] trouble starting second monitor

2014-12-01 Thread K Richard Pixley

Hm.  Already exists.

And now I'm completely confused.  Ok, so I'm trying to start over. I've 
ceph-deploy purge'd all my machines a few times with ceph-deploy 
purgedata intermixed.  I've manually removed all the files I could see 
that were generated, except my osd directories, which I apparently can't 
remove.


ceph@adriatic:~$ sudo rm -rf osd
rm: cannot remove '…': Operation not permitted
rm: cannot remove '…': Operation not permitted
rm: cannot remove '…': Operation not permitted

What's up with that and how do I get rid of it in order to start over?

--rich

On 12/1/14 00:01 , Irek Fasikhov wrote:

[celtic][DEBUG ] create the mon path if it does not exist

mkdir /var/lib/ceph/mon/

2014-12-01 4:32 GMT+03:00 K Richard Pixley r...@noir.com 
mailto:r...@noir.com:


What does this mean, please?

--rich

ceph@adriatic:~/my-cluster$ ceph status
cluster 1023db58-982f-4b78-b507-481233747b13
 health HEALTH_OK
 monmap e1: 1 mons at {black=192.168.1.77:6789/0
http://192.168.1.77:6789/0}, election epoch 2, quorum 0 black
 mdsmap e7: 1/1/1 up {0=adriatic=up:active}, 3 up:standby
 osdmap e17: 4 osds: 4 up, 4 in
  pgmap v48: 192 pgs, 3 pools, 1884 bytes data, 20 objects
29134 MB used, 113 GB / 149 GB avail
 192 active+clean
ceph@adriatic:~/my-cluster$ ceph-deploy mon create celtic
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.20): /usr/bin/ceph-deploy
mon create celtic
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts celtic
[ceph_deploy.mon][DEBUG ] detecting platform for host celtic ...
[celtic][DEBUG ] connection detected need for sudo
[celtic][DEBUG ] connected to host: celtic
[celtic][DEBUG ] detect platform information from remote host
[celtic][DEBUG ] detect machine type
[ceph_deploy.mon][INFO  ] distro info: Ubuntu 14.04 trusty
[celtic][DEBUG ] determining if provided host has same hostname in
remote
[celtic][DEBUG ] get remote short hostname
[celtic][DEBUG ] deploying mon to celtic
[celtic][DEBUG ] get remote short hostname
[celtic][DEBUG ] remote hostname: celtic
[celtic][DEBUG ] write cluster configuration to
/etc/ceph/{cluster}.conf
[celtic][DEBUG ] create the mon path if it does not exist
[celtic][DEBUG ] checking for done path:
/var/lib/ceph/mon/ceph-celtic/done
[celtic][DEBUG ] create a done file to avoid re-doing the mon
deployment
[celtic][DEBUG ] create the init path if it does not exist
[celtic][DEBUG ] locating the `service` executable...
[celtic][INFO  ] Running command: sudo initctl emit ceph-mon
cluster=ceph id=celtic
[celtic][INFO  ] Running command: sudo ceph --cluster=ceph
--admin-daemon /var/run/ceph/ceph-mon.celtic.asok mon_status
[celtic][ERROR ] admin_socket: exception getting command
descriptions: [Errno 2] No such file or directory
[celtic][WARNIN] monitor: mon.celtic, might not be running yet
[celtic][INFO  ] Running command: sudo ceph --cluster=ceph
--admin-daemon /var/run/ceph/ceph-mon.celtic.asok mon_status
[celtic][ERROR ] admin_socket: exception getting command
descriptions: [Errno 2] No such file or directory
[celtic][WARNIN] celtic is not defined in `mon initial members`
[celtic][WARNIN] monitor celtic does not exist in monmap
[celtic][WARNIN] neither `public_addr` nor `public_network` keys
are defined for monitors
[celtic][WARNIN] monitors may not be able to form quorum
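
[Those last warnings generally refer to keys like the following in ceph.conf;
this is only a hedged example, and the second address here is invented:

  [global]
  mon initial members = black, celtic
  mon host = 192.168.1.77, 192.168.1.78
  public network = 192.168.1.0/24

In ceph-deploy of that era, monitors beyond the initial one were typically
added with "ceph-deploy mon add <host>" (which updates the monmap) rather
than "mon create".]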

___
ceph-users mailing list
ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
С уважением, Фасихов Ирек Нургаязович
Моб.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Yehuda Sadeh
On Mon, Dec 1, 2014 at 4:26 PM, Ben b@benjackson.email wrote:
 On 2014-12-02 11:25, Yehuda Sadeh wrote:

 On Mon, Dec 1, 2014 at 4:23 PM, Ben b@benjackson.email wrote:

...

 How can I tell if the shard has an object in it from the logs?




 Search for a different sequence (e.g., search for rgw.gc_remove).

 Yehuda





 0 Results in the logs for rgw.gc_remove



 Well, something is modifying the gc log. Do you happen to have more
 than one radosgw running on the same cluster?

 Yehuda



 We have 2 radosgw servers
 obj01 and obj02


 Are both of them pointing at the same zone?


 Yes, they are load balanced

Well, the gc log shows entries, and then it doesn't, so something
clears these up. Try reproducing again with logs on, see if you see
new entries in the rgw logs. If you don't see these, maybe try turning
on 'debug ms = 1' on your osds (ceph tell osd.* injectargs '--debug_ms
1'), and look in your osd logs for such messages. These might give you
some hint for their origin.
Also, could it be that you ran 'radosgw-admin gc process', instead of
waiting for the gc cycle to complete?

Yehuda
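
[As background on the gc timing being discussed here: by default rgw keeps
deleted parts queued for a while before the gc processor removes them. A
hedged sketch of the relevant knobs and commands; the values shown are the
usual defaults of that era, double-check against your release:

  # ceph.conf, in the radosgw client section
  rgw gc obj min wait = 7200        # seconds a deleted part lingers before gc may remove it
  rgw gc processor period = 3600    # how often the gc processor runs
  rgw gc processor max time = 3600  # max runtime of one gc pass

  # inspect the queue, or force a cycle by hand
  radosgw-admin gc list --include-all
  radosgw-admin gc process
]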
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deleting buckets and objects fails to reduce reported cluster usage

2014-12-01 Thread Ben

On 2014-12-02 15:03, Yehuda Sadeh wrote:

On Mon, Dec 1, 2014 at 4:26 PM, Ben b@benjackson.email wrote:

On 2014-12-02 11:25, Yehuda Sadeh wrote:


On Mon, Dec 1, 2014 at 4:23 PM, Ben b@benjackson.email wrote:


...


How can I tell if the shard has an object in it from the logs?





Search for a different sequence (e.g., search for rgw.gc_remove).

Yehuda






0 Results in the logs for rgw.gc_remove




Well, something is modifying the gc log. Do you happen to have more
than one radosgw running on the same cluster?

Yehuda




We have 2 radosgw servers
obj01 and obj02



Are both of them pointing at the same zone?



Yes, they are load balanced


Well, the gc log show entries, and then it doesn't, so something
clears these up. Try reproducing again with logs on, see if you see
new entries in the rgw logs. If you don't see these, maybe try turning
on 'debug ms = 1' on your osds (ceph tell osd.* injectargs '--debug_ms
1'), and look in your osd logs for such messages. These might give you
some hint for their origin.
Also, could it be that you ran 'radosgw-admin gc process', instead of
waiting for the gc cycle to complete?

Yehuda


I did another test, this time with a 600MB file. I uploaded it, then 
deleted the file and did a gc list --include-all.
It displayed around 143 _shadow_ files. I let GC process itself (I did 
not force this process) and I checked the pool afterward by running 
'rados ls -p .rgw.buckets | grep gc-listed-shadowfiles' and they no 
longer exist.


I've added the debug ms setting to the OSDs; I'll do another test with the 600MB 
file.


Also worth noting, I have started clearing out files from the 
.rgw.buckets pool that are from a bucket which has been deleted and is no 
longer visible, by running 'rados -p .rgw.buckets rm' over all the _shadow_ 
files contained in that bucket prefix default.4804.14__shadow_

Is this alright to do, or is there a better way to clear out files?
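
[A sketch of that bulk cleanup, using the pool and prefix named in this
thread; it bypasses radosgw entirely, so it is only safe if the bucket really
is gone and nothing else references these objects:

  pool=.rgw.buckets
  prefix=default.4804.14__shadow_

  rados -p "$pool" ls | grep "^${prefix}" > /tmp/stale_shadow_objs.txt
  wc -l /tmp/stale_shadow_objs.txt          # sanity-check the count first
  while read -r obj; do
      rados -p "$pool" rm "$obj"
  done < /tmp/stale_shadow_objs.txt
]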
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] LevelDB support status is still experimental on Giant?

2014-12-01 Thread Chen, Xiaoxi
Compared to Filestore on SSD (we run LevelDB on top of SSD). The usage pattern 
is RBD sequential write (64K * QD8) and random write (4K * QD8); reads seem on 
par.

I would suspect a KV backend on HDD will be even worse, compared to Filestore on 
HDD.

From: Satoru Funai [mailto:satoru.fu...@gmail.com]
Sent: Tuesday, December 2, 2014 1:27 PM
To: Chen, Xiaoxi
Cc: ceph-us...@ceph.com; Haomai Wang
Subject: Re: [ceph-users] LevelDB support status is still experimental on Giant?

Hi Xiaoxi,
Thanks for the very useful information.
Can you share more details? The "terrible bad performance" is compared against 
what, and with what kind of usage pattern?
I'm just interested in a key/value backend for better cost/performance without 
expensive HW such as SSD/Fusion-io.
Regards,
Satoru Funai

From: Xiaoxi Chen xiaoxi.c...@intel.com
To: Haomai Wang haomaiw...@gmail.com
Cc: Satoru Funai satoru.fu...@gmail.com, ceph-us...@ceph.com
Sent: Monday, December 1, 2014, 11:26:56 PM
Subject: RE: [ceph-users] LevelDB support status is still experimental on Giant?
Range queries are not that important on today's SSDs: you can see very high 
random read IOPS in SSD specs, and it keeps getting higher day by day. The key 
problem here is trying to match exactly one query (get/put) to one SSD 
IO (read/write), eliminating the read/write amplification. We kind of believe 
OpenNvmKV may be the right approach.

Back to the context of Ceph, can we find some use cases for today's key-value 
backend? We would like to learn from the community what the workload pattern is 
if you want a K-V backed Ceph, or whether you just want to have a try. I think 
before we get a suitable DB backend, we had better optimize the key-value 
backend code to support that specific kind of load.



From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Monday, December 1, 2014 10:14 PM
To: Chen, Xiaoxi
Cc: Satoru Funai; ceph-us...@ceph.com
Subject: Re: [ceph-users] LevelDB support status is still experimental on Giant?

Exactly, I'm just looking forward to a better DB backend suitable for 
KeyValueStore. It may be a traditional B-tree design.

Originally I thought Kinetic was a good backend, but it doesn't support range 
queries :-(



On Mon, Dec 1, 2014 at 10:04 PM, Chen, Xiaoxi 
xiaoxi.c...@intel.com wrote:
We have tested it for a while; basically it seems kind of stable but shows 
terribly bad performance.

This is not the fault of Ceph, but of LevelDB, or more generally, of all K-V 
storage with an LSM design (RocksDB, etc.): the LSM tree structure naturally 
introduces very large write amplification (10X to 20X) when you have tens of GB 
of data per OSD. So you always see very bad sequential write performance 
(~200MB/s for a 12-SSD setup); we can share more details in the performance 
meeting.
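
(To put rough, purely illustrative numbers on that: with 12 SSDs each able to
absorb, say, ~350 MB/s of writes, 20X amplification leaves about
12 x 350 / 20 ≈ 210 MB/s of client-visible sequential write throughput before
replication is counted, which is in the same ballpark as the ~200MB/s figure
above. The per-SSD number is an assumption, not from the original mail.)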

To this end, a key-value backend with LevelDB is not usable for RBD usage, but 
maybe workable (not tested) in the LOSF cases (tons of small objects stored via 
rados, where a K-V backend can prevent the FS metadata from becoming the bottleneck).

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Haomai Wang
Sent: Monday, December 1, 2014 9:48 PM
To: Satoru Funai
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] LevelDB support status is still experimental on Giant?

Yeah, mainly used by test env.

On Mon, Dec 1, 2014 at 6:29 PM, Satoru Funai 
satoru.fu...@gmail.com wrote:
Hi guys,
I'm interested in to use key/value store as a backend of Ceph OSD.
When firefly release, LevelDB support is mentioned as experimental,
is it same status on Giant release?
Regards,

Satoru Funai
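
[For completeness, enabling the experimental key/value backend in that era
looked roughly like the sketch below; the option spelling changed between
releases (the firefly/giant series generally used keyvaluestore-dev, later
releases renamed it), so treat these names as assumptions and check the
release notes for your version:

  [osd]
  osd objectstore = keyvaluestore-dev
  # which embedded K-V library the backend uses; leveldb is the usual default
  keyvaluestore backend = leveldb
]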
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--

Best Regards,

Wheat



--

Best Regards,

Wheat

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] LevelDB support status is still experimental on Giant?

2014-12-01 Thread Satoru Funai
Hi Xiaoxi, 
Thanks for the very useful information. 
Can you share more details? The "terrible bad performance" is compared against 
what, and with what kind of usage pattern? 
I'm just interested in a key/value backend for better cost/performance without 
expensive HW such as SSD/Fusion-io. 
Regards, 
Satoru Funai 
- Original Message -

 From: Xiaoxi Chen xiaoxi.c...@intel.com
 To: Haomai Wang haomaiw...@gmail.com
 Cc: Satoru Funai satoru.fu...@gmail.com, ceph-us...@ceph.com
 Sent: Monday, December 1, 2014, 11:26:56 PM
 Subject: RE: [ceph-users] LevelDB support status is still experimental on
 Giant?

 Range query is not that important in nowadays SSDyou can see very
 high read random read IOPS in ssd spec, and getting higher day by
 day.The key problem here is trying to exactly matching one
 query(get/put) to one SSD IO(read/write), eliminate the read/write
 amplification. We kind of believe OpenNvmKV may be the right
 approach.

 Back to the context of Ceph, can we find some use case of nowadays
 key-value backend? We would like to learn from community what’s the
 workload pattern if you wants a K-V backed Ceph? Or just have a try?
 I think before we get a suitable DB backend ,we had better off to
 optimize the key-value backend code to support specified kind of
 load.

 From: Haomai Wang [mailto:haomaiw...@gmail.com]
 Sent: Monday, December 1, 2014 10:14 PM
 To: Chen, Xiaoxi
 Cc: Satoru Funai; ceph-us...@ceph.com
 Subject: Re: [ceph-users] LevelDB support status is still
 experimental on Giant?

 Exactly, I'm just looking forward a better DB backend suitable for
 KeyValueStore. It maybe traditional B-tree design.

 Kinetic original I think it was a good backend, but it doesn't
 support range query :-(

 On Mon, Dec 1, 2014 at 10:04 PM, Chen, Xiaoxi  xiaoxi.c...@intel.com
  wrote:
  We have tested it for a while, basically it seems kind of stable
  but
  show terrible bad performance.
 

  This is not the fault of Ceph , but levelDB, or more generally, all
  K-V storage with LSM design(RocksDB,etc), the LSM tree structure
  naturally introduce very large write amplification 10X to 20X
  when you have tens GB of data per OSD. So you can always see very
  bad sequential write performance (~200MB/s for a 12SSD setup), we
  can share more details on the performance meeting.
 

  To this end, key-value backend with LevelDB is not useable for RBD
  usage, but maybe workable(not tested) in the LOSF cases ( tons of
  small objects stored via rados , k-v backend can prevent the FS
  metadata become the bottleneck)
 

  From: ceph-users [mailto: ceph-users-boun...@lists.ceph.com ] On
  Behalf Of Haomai Wang
 
  Sent: Monday, December 1, 2014 9:48 PM
 
  To: Satoru Funai
 
  Cc: ceph-us...@ceph.com
 
  Subject: Re: [ceph-users] LevelDB support status is still
  experimental on Giant?
 

  Yeah, mainly used by test env.
 

  On Mon, Dec 1, 2014 at 6:29 PM, Satoru Funai 
  satoru.fu...@gmail.com
   wrote:
 
   Hi guys,
  
 
   I'm interested in to use key/value store as a backend of Ceph
   OSD.
  
 
   When firefly release, LevelDB support is mentioned as
   experimental,
  
 
   is it same status on Giant release?
  
 
   Regards,
  
 

   Satoru Funai
  
 
   ___
  
 
   ceph-users mailing list
  
 
   ceph-users@lists.ceph.com
  
 
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
 
  --
 

  Best Regards,
 
  Wheat
 
 --

 Best Regards,
 Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com