Re: [ceph-users] radosgw leaking objects

2017-03-31 Thread Marius Vaitiekunas
On Fri, Mar 31, 2017 at 11:15 AM, Luis Periquito 
wrote:

> But wasn't that what orphans finish was supposed to do?
>
>
orphans finish only removes search results from a log pool.
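
For reference, the usual sequence looks roughly like this (a sketch only - the pool
and job-id are placeholders). 'orphans finish' only cleans up the job's entries in
the log pool; the objects reported by the search still have to be removed separately,
e.g. with 'rados -p .rgw.buckets rm <object>':

# radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphans1
# radosgw-admin orphans list-jobs
# radosgw-admin orphans finish --job-id=orphans1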


-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-28 Thread Marius Vaitiekunas
On Mon, Mar 27, 2017 at 11:17 PM, Peter Maloney <
peter.malo...@brockmann-consult.de> wrote:

> I can't guarantee it's the same as my issue, but from that it sounds the
> same.
>
> Jewel 10.2.4, 10.2.5 tested
> hypervisors are proxmox qemu-kvm, using librbd
> 3 ceph nodes with mon+osd on each
>
> -faster journals, more disks, bcache, rbd_cache, fewer VMs on ceph, iops
> and bw limits on client side, jumbo frames, etc. all improve/smooth out
> performance and mitigate the hangs, but don't prevent it.
> -hangs are usually associated with blocked requests (I set the complaint
> time to 5s to see them)
> -hangs are very easily caused by rbd snapshot + rbd export-diff to do
> incremental backup (one snap persistent, plus one more during backup)
> -when qemu VM io hangs, I have to kill -9 the qemu process for it to
> stop. Some broken VMs don't appear to be hung until I try to live
> migrate them (live migrating all VMs helped test solutions)
>
> Finally I have a workaround... disable exclusive-lock, object-map, and
> fast-diff rbd features (and restart clients via live migrate).
> (object-map and fast-diff appear to have no effect on diff or export-diff
> ... so I don't miss them). I'll file a bug at some point (after I move
> all VMs back and see if it is still stable). And one other user on IRC
> said this solved the same problem (also using rbd snapshots).
>
> And strangely, they don't seem to hang if I put back those features,
> until a few days later (making testing much less easy...but now I'm very
> sure removing them prevents the issue)
>
> I hope this works for you (and maybe gets some attention from devs too),
> so you don't waste months like me.
>
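
A sketch of that workaround for a single image - pool/image names are placeholders,
and the features are disabled in this order because fast-diff depends on object-map,
which in turn depends on exclusive-lock; check 'rbd info' first to see what is enabled:

# rbd info rbd/vm-disk-1 | grep features
# rbd feature disable rbd/vm-disk-1 fast-diff
# rbd feature disable rbd/vm-disk-1 object-map
# rbd feature disable rbd/vm-disk-1 exclusive-lock

then live migrate or restart the client so the running qemu picks up the change.
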
> On 03/27/17 19:31, Hall, Eric wrote:
> > In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel),
> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and
> ceph hosts, we occasionally see hung processes (usually during boot, but
> otherwise as well), with errors reported in the instance logs as shown
> below.  Configuration is vanilla, based on openstack/ceph docs.
> >
> > Neither the compute hosts nor the ceph hosts appear to be overloaded in
> terms of memory or network bandwidth, none of the 67 osds are over 80%
> full, nor do any of them appear to be overwhelmed in terms of IO.  Compute
> hosts and ceph cluster are connected via a relatively quiet 1Gb network,
> with an IBoE net between the ceph nodes.  Neither network appears
> overloaded.
> >
> > I don’t see any related (to my eye) errors in client or server logs,
> even with 20/20 logging from various components (rbd, rados, client,
> objectcacher, etc.)  I’ve increased the qemu file descriptor limit
> (currently 64k... overkill for sure.)
> >
> > It “feels” like a performance problem, but I can’t find any capacity
> issues or constraining bottlenecks.
> >
> > Any suggestions or insights into this situation are appreciated.  Thank
> you for your time,
> > --
> > Eric
> >
> >
> > [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more
> than 120 seconds.
> > [Fri Mar 24 20:30:40 2017]   Not tainted 3.13.0-52-generic #85-Ubuntu
> > [Fri Mar 24 20:30:40 2017] "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> > [Fri Mar 24 20:30:40 2017] jbd2/vda1-8 D 88043fd13180 0
>  226  2 0x
> > [Fri Mar 24 20:30:40 2017]  88003728bbd8 0046
> 88042690 88003728bfd8
> > [Fri Mar 24 20:30:40 2017]  00013180 00013180
> 88042690 88043fd13a18
> > [Fri Mar 24 20:30:40 2017]  88043ffb9478 0002
> 811ef7c0 88003728bc50
> > [Fri Mar 24 20:30:40 2017] Call Trace:
> > [Fri Mar 24 20:30:40 2017]  [] ?
> generic_block_bmap+0x50/0x50
> > [Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
> > [Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
> > [Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
> > [Fri Mar 24 20:30:40 2017]  [] ?
> generic_block_bmap+0x50/0x50
> > [Fri Mar 24 20:30:40 2017]  []
> out_of_line_wait_on_bit+0x77/0x90
> > [Fri Mar 24 20:30:40 2017]  [] ?
> autoremove_wake_function+0x40/0x40
> > [Fri Mar 24 20:30:40 2017]  []
> __wait_on_buffer+0x2a/0x30
> > [Fri Mar 24 20:30:40 2017]  [] jbd2_journal_commit_
> transaction+0x185d/0x1ab0
> > [Fri Mar 24 20:30:40 2017]  [] ?
> try_to_del_timer_sync+0x4f/0x70
> > [Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
> > [Fri Mar 24 20:30:40 2017]  [] ?
> prepare_to_wait_event+0x100/0x100
> > [Fri Mar 24 20:30:40 2017]  [] ?
> commit_timeout+0x10/0x10
> > [Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
> > [Fri Mar 24 20:30:40 2017]  [] ?
> kthread_create_on_node+0x1c0/0x1c0
> > [Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
> > [Fri Mar 24 20:30:40 2017]  [] ?
> kthread_create_on_node+0x1c0/0x1c0
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] rgw multisite resync only one bucket

2017-02-28 Thread Marius Vaitiekunas
On Wed, Mar 1, 2017 at 9:06 AM, Marius Vaitiekunas <
mariusvaitieku...@gmail.com> wrote:

>
>
> On Mon, Feb 27, 2017 at 11:40 AM, Marius Vaitiekunas <
> mariusvaitieku...@gmail.com> wrote:
>
>>
>>
>> On Mon, Feb 27, 2017 at 9:59 AM, Marius Vaitiekunas <
>> mariusvaitieku...@gmail.com> wrote:
>>
>>>
>>>
>>> On Fri, Feb 24, 2017 at 6:35 PM, Yehuda Sadeh-Weinraub <
>>> yeh...@redhat.com> wrote:
>>>
>>>> On Fri, Feb 24, 2017 at 3:59 AM, Marius Vaitiekunas
>>>>  wrote:
>>>> >
>>>> >
>>>> > On Wed, Feb 22, 2017 at 8:33 PM, Yehuda Sadeh-Weinraub <
>>>> yeh...@redhat.com>
>>>> > wrote:
>>>> >>
>>>> >> On Wed, Feb 22, 2017 at 6:19 AM, Marius Vaitiekunas
>>>> >>  wrote:
>>>> >> > Hi Cephers,
>>>> >> >
>>>> >> > We are testing rgw multisite solution between to DC. We have one
>>>> >> > zonegroup
>>>> >> > and to zones. At the moment all writes/deletes are done only to
>>>> primary
>>>> >> > zone.
>>>> >> >
>>>> >> > Sometimes not all the objects are replicated.. We've written
>>>> prometheus
>>>> >> > exporter to check replication status. It gives us each bucket
>>>> object
>>>> >> > count
>>>> >> > from user perspective, because we have millions of objects and
>>>> hundreds
>>>> >> > of
>>>> >> > buckets. We just want to be sure, that everything is replicated
>>>> without
>>>> >> > using ceph internals like rgw admin api for now.
>>>> >> >
>>>> >> > Is it possible to initiate full resync of only one rgw bucket from
>>>> >> > master
>>>> >> > zone? What are the options about resync when things go wrong and
>>>> >> > replication
>>>> >> > misses some objects?
>>>> >> >
>>>> >> > We run latest jewel 10.2.5.
>>>> >>
>>>> >>
>>>> >> There's the 'radosgw-admin bucket sync init' command that you can run
>>>> >> on the specific bucket on the target zone. This will reinitialize the
>>>> >> sync state, so that when it starts syncing it will go through the
>>>> >> whole full sync process. Note that it shouldn't actually copy data
>>>> >> that already exists on the target. Also, in order to actually start
>>>> >> the sync, you'll need to have some change that would trigger the sync
>>>> >> on that bucket, e.g., create a new object there.
>>>> >>
>>>> >> Yehuda
>>>> >>
>>>> >
>>>> > Hi,
>>>> >
>>>> > I've tried to resync a bucket, but it didn't manage to resync a
>>>> missing
>>>> > object. If I try to copy missing object by hand into secondary zone,
>>>> i get
>>>> > asked to overwrite existing object.. It looks like the object is
>>>> replicated,
>>>> > but is not in a bucket index. I've tried to check bucket index with
>>>> --fix
>>>> > and --check-objects flags, but nothing changes. What else should i
>>>> try?
>>>> >
>>>>
>>>> That's weird. Do you see anything when you run 'radosgw-admin bi list
>>>> --bucket='?
>>>>
>>>> Yehuda
>>>>
>>>
>>> 'radosgw-admin bi list --bucket=' gives me an error:
>>> 2017-02-27 08:55:30.861659 7f20c15779c0  0 error in read_id for id  :
>>> (2) No such file or directory
>>> 2017-02-27 08:55:30.861991 7f20c15779c0  0 error in read_id for id  :
>>> (2) No such file or directory
>>> ERROR: bi_list(): (5) Input/output error
>>>
>>> 'radosgw-admin bucket list --bucket=' successfully list all the
>>> files except missing ones.
>>>
>>>
>>>
>>>
>>
>> I've done some more investigation. These missing objects could be found
>> in "rgw.buckets.data" pool, but bucket index is not aware about them.
>> How does 'radosgw-admin bucket check -b  --fix --check-objects'
>> works?
>> I guess that it's not scanning "rgw.buckets.data" pool for "leaked"
>> objects? These unreplicated objects looks for me the same like leaked ones
>> :)
>>
>>
>>
> By the way in rgw logs I can find all the missing files with http 304
> return code. For example:
> "GET /go84/WRWRDGROWKFKROTWKHXXIBHERRLHBK HTTP/1.1" 304 0 - -
>
> All the gateways in both sites are behind haproxies. Any ideas?
>
>
>
>
>
Sorry.. Don't pay attention to my last message. Wrong 'grep'.

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw multisite resync only one bucket

2017-02-28 Thread Marius Vaitiekunas
On Mon, Feb 27, 2017 at 11:40 AM, Marius Vaitiekunas <
mariusvaitieku...@gmail.com> wrote:

>
>
> On Mon, Feb 27, 2017 at 9:59 AM, Marius Vaitiekunas <
> mariusvaitieku...@gmail.com> wrote:
>
>>
>>
>> On Fri, Feb 24, 2017 at 6:35 PM, Yehuda Sadeh-Weinraub > > wrote:
>>
>>> On Fri, Feb 24, 2017 at 3:59 AM, Marius Vaitiekunas
>>>  wrote:
>>> >
>>> >
>>> > On Wed, Feb 22, 2017 at 8:33 PM, Yehuda Sadeh-Weinraub <
>>> yeh...@redhat.com>
>>> > wrote:
>>> >>
>>> >> On Wed, Feb 22, 2017 at 6:19 AM, Marius Vaitiekunas
>>> >>  wrote:
>>> >> > Hi Cephers,
>>> >> >
>>> >> > We are testing rgw multisite solution between to DC. We have one
>>> >> > zonegroup
>>> >> > and to zones. At the moment all writes/deletes are done only to
>>> primary
>>> >> > zone.
>>> >> >
>>> >> > Sometimes not all the objects are replicated.. We've written
>>> prometheus
>>> >> > exporter to check replication status. It gives us each bucket object
>>> >> > count
>>> >> > from user perspective, because we have millions of objects and
>>> hundreds
>>> >> > of
>>> >> > buckets. We just want to be sure, that everything is replicated
>>> without
>>> >> > using ceph internals like rgw admin api for now.
>>> >> >
>>> >> > Is it possible to initiate full resync of only one rgw bucket from
>>> >> > master
>>> >> > zone? What are the options about resync when things go wrong and
>>> >> > replication
>>> >> > misses some objects?
>>> >> >
>>> >> > We run latest jewel 10.2.5.
>>> >>
>>> >>
>>> >> There's the 'radosgw-admin bucket sync init' command that you can run
>>> >> on the specific bucket on the target zone. This will reinitialize the
>>> >> sync state, so that when it starts syncing it will go through the
>>> >> whole full sync process. Note that it shouldn't actually copy data
>>> >> that already exists on the target. Also, in order to actually start
>>> >> the sync, you'll need to have some change that would trigger the sync
>>> >> on that bucket, e.g., create a new object there.
>>> >>
>>> >> Yehuda
>>> >>
>>> >
>>> > Hi,
>>> >
>>> > I've tried to resync a bucket, but it didn't manage to resync a missing
>>> > object. If I try to copy missing object by hand into secondary zone, i
>>> get
>>> > asked to overwrite existing object.. It looks like the object is
>>> replicated,
>>> > but is not in a bucket index. I've tried to check bucket index with
>>> --fix
>>> > and --check-objects flags, but nothing changes. What else should i try?
>>> >
>>>
>>> That's weird. Do you see anything when you run 'radosgw-admin bi list
>>> --bucket='?
>>>
>>> Yehuda
>>>
>>
>> 'radosgw-admin bi list --bucket=' gives me an error:
>> 2017-02-27 08:55:30.861659 7f20c15779c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-02-27 08:55:30.861991 7f20c15779c0  0 error in read_id for id  : (2)
>> No such file or directory
>> ERROR: bi_list(): (5) Input/output error
>>
>> 'radosgw-admin bucket list --bucket=' successfully list all the
>> files except missing ones.
>>
>>
>>
>>
>
> I've done some more investigation. These missing objects could be found in
> "rgw.buckets.data" pool, but bucket index is not aware about them.
> How does 'radosgw-admin bucket check -b  --fix --check-objects'
> works?
> I guess that it's not scanning "rgw.buckets.data" pool for "leaked"
> objects? These unreplicated objects looks for me the same like leaked ones
> :)
>
>
>
By the way, in the rgw logs I can find all the missing files with an HTTP 304
return code. For example:
"GET /go84/WRWRDGROWKFKROTWKHXXIBHERRLHBK HTTP/1.1" 304 0 - -

All the gateways in both sites are behind haproxies. Any ideas?


-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw multisite resync only one bucket

2017-02-27 Thread Marius Vaitiekunas
On Mon, Feb 27, 2017 at 9:59 AM, Marius Vaitiekunas <
mariusvaitieku...@gmail.com> wrote:

>
>
> On Fri, Feb 24, 2017 at 6:35 PM, Yehuda Sadeh-Weinraub 
> wrote:
>
>> On Fri, Feb 24, 2017 at 3:59 AM, Marius Vaitiekunas
>>  wrote:
>> >
>> >
>> > On Wed, Feb 22, 2017 at 8:33 PM, Yehuda Sadeh-Weinraub <
>> yeh...@redhat.com>
>> > wrote:
>> >>
>> >> On Wed, Feb 22, 2017 at 6:19 AM, Marius Vaitiekunas
>> >>  wrote:
>> >> > Hi Cephers,
>> >> >
>> >> > We are testing rgw multisite solution between to DC. We have one
>> >> > zonegroup
>> >> > and to zones. At the moment all writes/deletes are done only to
>> primary
>> >> > zone.
>> >> >
>> >> > Sometimes not all the objects are replicated.. We've written
>> prometheus
>> >> > exporter to check replication status. It gives us each bucket object
>> >> > count
>> >> > from user perspective, because we have millions of objects and
>> hundreds
>> >> > of
>> >> > buckets. We just want to be sure, that everything is replicated
>> without
>> >> > using ceph internals like rgw admin api for now.
>> >> >
>> >> > Is it possible to initiate full resync of only one rgw bucket from
>> >> > master
>> >> > zone? What are the options about resync when things go wrong and
>> >> > replication
>> >> > misses some objects?
>> >> >
>> >> > We run latest jewel 10.2.5.
>> >>
>> >>
>> >> There's the 'radosgw-admin bucket sync init' command that you can run
>> >> on the specific bucket on the target zone. This will reinitialize the
>> >> sync state, so that when it starts syncing it will go through the
>> >> whole full sync process. Note that it shouldn't actually copy data
>> >> that already exists on the target. Also, in order to actually start
>> >> the sync, you'll need to have some change that would trigger the sync
>> >> on that bucket, e.g., create a new object there.
>> >>
>> >> Yehuda
>> >>
>> >
>> > Hi,
>> >
>> > I've tried to resync a bucket, but it didn't manage to resync a missing
>> > object. If I try to copy missing object by hand into secondary zone, i
>> get
>> > asked to overwrite existing object.. It looks like the object is
>> replicated,
>> > but is not in a bucket index. I've tried to check bucket index with
>> --fix
>> > and --check-objects flags, but nothing changes. What else should i try?
>> >
>>
>> That's weird. Do you see anything when you run 'radosgw-admin bi list
>> --bucket='?
>>
>> Yehuda
>>
>
> 'radosgw-admin bi list --bucket=' gives me an error:
> 2017-02-27 08:55:30.861659 7f20c15779c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-02-27 08:55:30.861991 7f20c15779c0  0 error in read_id for id  : (2)
> No such file or directory
> ERROR: bi_list(): (5) Input/output error
>
> 'radosgw-admin bucket list --bucket=' successfully list all the
> files except missing ones.
>
> --
> Marius Vaitiekūnas
>


I've done some more investigation. These missing objects can be found in
the "rgw.buckets.data" pool, but the bucket index is not aware of them.
How does 'radosgw-admin bucket check -b  --fix --check-objects' work?
I guess it doesn't scan the "rgw.buckets.data" pool for "leaked" objects?
These unreplicated objects look the same as leaked ones to me :)

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw multisite resync only one bucket

2017-02-27 Thread Marius Vaitiekunas
On Fri, Feb 24, 2017 at 6:35 PM, Yehuda Sadeh-Weinraub 
wrote:

> On Fri, Feb 24, 2017 at 3:59 AM, Marius Vaitiekunas
>  wrote:
> >
> >
> > On Wed, Feb 22, 2017 at 8:33 PM, Yehuda Sadeh-Weinraub <
> yeh...@redhat.com>
> > wrote:
> >>
> >> On Wed, Feb 22, 2017 at 6:19 AM, Marius Vaitiekunas
> >>  wrote:
> >> > Hi Cephers,
> >> >
> >> > We are testing rgw multisite solution between to DC. We have one
> >> > zonegroup
> >> > and to zones. At the moment all writes/deletes are done only to
> primary
> >> > zone.
> >> >
> >> > Sometimes not all the objects are replicated.. We've written
> prometheus
> >> > exporter to check replication status. It gives us each bucket object
> >> > count
> >> > from user perspective, because we have millions of objects and
> hundreds
> >> > of
> >> > buckets. We just want to be sure, that everything is replicated
> without
> >> > using ceph internals like rgw admin api for now.
> >> >
> >> > Is it possible to initiate full resync of only one rgw bucket from
> >> > master
> >> > zone? What are the options about resync when things go wrong and
> >> > replication
> >> > misses some objects?
> >> >
> >> > We run latest jewel 10.2.5.
> >>
> >>
> >> There's the 'radosgw-admin bucket sync init' command that you can run
> >> on the specific bucket on the target zone. This will reinitialize the
> >> sync state, so that when it starts syncing it will go through the
> >> whole full sync process. Note that it shouldn't actually copy data
> >> that already exists on the target. Also, in order to actually start
> >> the sync, you'll need to have some change that would trigger the sync
> >> on that bucket, e.g., create a new object there.
> >>
> >> Yehuda
> >>
> >
> > Hi,
> >
> > I've tried to resync a bucket, but it didn't manage to resync a missing
> > object. If I try to copy missing object by hand into secondary zone, i
> get
> > asked to overwrite existing object.. It looks like the object is
> replicated,
> > but is not in a bucket index. I've tried to check bucket index with --fix
> > and --check-objects flags, but nothing changes. What else should i try?
> >
>
> That's weird. Do you see anything when you run 'radosgw-admin bi list
> --bucket='?
>
> Yehuda
>

'radosgw-admin bi list --bucket=' gives me an error:
2017-02-27 08:55:30.861659 7f20c15779c0  0 error in read_id for id  : (2)
No such file or directory
2017-02-27 08:55:30.861991 7f20c15779c0  0 error in read_id for id  : (2)
No such file or directory
ERROR: bi_list(): (5) Input/output error

'radosgw-admin bucket list --bucket=' successfully lists all the
files except the missing ones.

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw multisite resync only one bucket

2017-02-24 Thread Marius Vaitiekunas
On Wed, Feb 22, 2017 at 8:33 PM, Yehuda Sadeh-Weinraub 
wrote:

> On Wed, Feb 22, 2017 at 6:19 AM, Marius Vaitiekunas
>  wrote:
> > Hi Cephers,
> >
> > We are testing rgw multisite solution between to DC. We have one
> zonegroup
> > and to zones. At the moment all writes/deletes are done only to primary
> > zone.
> >
> > Sometimes not all the objects are replicated.. We've written prometheus
> > exporter to check replication status. It gives us each bucket object
> count
> > from user perspective, because we have millions of objects and hundreds
> of
> > buckets. We just want to be sure, that everything is replicated without
> > using ceph internals like rgw admin api for now.
> >
> > Is it possible to initiate full resync of only one rgw bucket from master
> > zone? What are the options about resync when things go wrong and
> replication
> > misses some objects?
> >
> > We run latest jewel 10.2.5.
>
>
> There's the 'radosgw-admin bucket sync init' command that you can run
> on the specific bucket on the target zone. This will reinitialize the
> sync state, so that when it starts syncing it will go through the
> whole full sync process. Note that it shouldn't actually copy data
> that already exists on the target. Also, in order to actually start
> the sync, you'll need to have some change that would trigger the sync
> on that bucket, e.g., create a new object there.
>
> Yehuda
>
>
Hi,

I've tried to resync a bucket, but it didn't manage to resync a missing
object. If I try to copy the missing object by hand into the secondary zone, I get
asked to overwrite an existing object.. It looks like the object is
replicated, but it is not in the bucket index. I've tried to check the bucket index
with the --fix and --check-objects flags, but nothing changes. What else should
I try?

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-admin bucket check kills SSD disks

2017-02-23 Thread Marius Vaitiekunas
On Wed, Feb 22, 2017 at 4:06 PM, Marius Vaitiekunas <
mariusvaitieku...@gmail.com> wrote:

> Hi Cephers,
>
> We are running latest jewel (10.2.5). Bucket index sharding is set to 8.
> rgw pools except data are placed on SSD.
> Today I've done some testing and run bucket index check on a bucket with
> ~120k objects:
>
> # radosgw-admin bucket check -b mybucket --fix --check-objects
> --rgw-realm=myrealm
>
> In a minute or two three SSD disks were down and flapping. My guess is
> that these disks host a PG with index of this bucket.
>
> Should we expect that with --check-objects flag and do not use it?
>
> --
> Marius Vaitiekūnas
>

Hi,

In case somebody else hits the issue :) It turned out that the bucket index shard
count in our cluster was actually 1. We didn't know about the
'rgw_override_bucket_index_max_shards' setting and thought that
'bucket_index_max_shards' alone was enough.
Keep in mind that you also need to update existing zonegroups to make sharding
take effect after the correct settings are in place..
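
The kind of change involved, as a sketch (the rgw section name and shard count are
placeholders, and as far as we understand the override only applies to buckets
created after the change):

[client.rgw.gateway1]
rgw_override_bucket_index_max_shards = 8

# radosgw-admin zonegroup get --rgw-zonegroup=default > zonegroup.json
#   (edit bucket_index_max_shards for each zone entry in zonegroup.json)
# radosgw-admin zonegroup set < zonegroup.json
# radosgw-admin period update --commit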

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw multisite resync only one bucket

2017-02-22 Thread Marius Vaitiekunas
On Wed, Feb 22, 2017 at 8:33 PM, Yehuda Sadeh-Weinraub 
wrote:

> On Wed, Feb 22, 2017 at 6:19 AM, Marius Vaitiekunas
>  wrote:
> > Hi Cephers,
> >
> > We are testing rgw multisite solution between to DC. We have one
> zonegroup
> > and to zones. At the moment all writes/deletes are done only to primary
> > zone.
> >
> > Sometimes not all the objects are replicated.. We've written prometheus
> > exporter to check replication status. It gives us each bucket object
> count
> > from user perspective, because we have millions of objects and hundreds
> of
> > buckets. We just want to be sure, that everything is replicated without
> > using ceph internals like rgw admin api for now.
> >
> > Is it possible to initiate full resync of only one rgw bucket from master
> > zone? What are the options about resync when things go wrong and
> replication
> > misses some objects?
> >
> > We run latest jewel 10.2.5.
>
>
> There's the 'radosgw-admin bucket sync init' command that you can run
> on the specific bucket on the target zone. This will reinitialize the
> sync state, so that when it starts syncing it will go through the
> whole full sync process. Note that it shouldn't actually copy data
> that already exists on the target. Also, in order to actually start
> the sync, you'll need to have some change that would trigger the sync
> on that bucket, e.g., create a new object there.
>
> Yehuda
>
>

Great! I guess 'radosgw-admin data sync init' reinitializes data sync for the whole
zone, and 'radosgw-admin metadata sync init' does the same for metadata, again on
some trigger? And when should we use 'radosgw-admin data/metadata sync run'?
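
For a single bucket, a sketch of the flow Yehuda describes, run against the secondary
(target) zone - the bucket name and trigger object are placeholders:

# radosgw-admin bucket sync init --bucket=mybucket
# s3cmd put trigger.txt s3://mybucket/trigger.txt
# radosgw-admin sync status

The second step is just any write to the bucket, to trigger the sync; the last command
shows the overall sync state.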

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-admin bucket check kills SSD disks

2017-02-22 Thread Marius Vaitiekunas
On Wed, Feb 22, 2017 at 4:06 PM, Marius Vaitiekunas <
mariusvaitieku...@gmail.com> wrote:

> Hi Cephers,
>
> We are running latest jewel (10.2.5). Bucket index sharding is set to 8.
> rgw pools except data are placed on SSD.
> Today I've done some testing and run bucket index check on a bucket with
> ~120k objects:
>
> # radosgw-admin bucket check -b mybucket --fix --check-objects
> --rgw-realm=myrealm
>
> In a minute or two three SSD disks were down and flapping. My guess is
> that these disks host a PG with index of this bucket.
>
> Should we expect that with --check-objects flag and do not use it?
>
> --
> Marius Vaitiekūnas
>

By the way, I've got the same situation without the --check-objects flag:

# radosgw-admin bucket check -b mybucket --fix --rgw-realm=myrealm

Any hints, what's wrong? :)
-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw multisite resync only one bucket

2017-02-22 Thread Marius Vaitiekunas
Hi Cephers,

We are testing an rgw multisite setup between two DCs. We have one zonegroup
and two zones. At the moment all writes/deletes are done only to the primary
zone.

Sometimes not all the objects are replicated.. We've written a prometheus
exporter to check replication status. It gives us each bucket's object count
from the user's perspective, because we have millions of objects and hundreds of
buckets. We just want to be sure that everything is replicated, without
using ceph internals like the rgw admin api for now.
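
The check itself boils down to listing each bucket through both endpoints and
comparing the counts, e.g. with two s3cmd configs pointing at the two zones (a
sketch; the config file names and bucket are placeholders):

# s3cmd -c primary-zone.cfg ls --recursive s3://mybucket | wc -l
# s3cmd -c secondary-zone.cfg ls --recursive s3://mybucket | wc -l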

Is it possible to initiate a full resync of only one rgw bucket from the master
zone? What are the options for resyncing when things go wrong and
replication misses some objects?

We run latest jewel 10.2.5.


-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw-admin bucket check kills SSD disks

2017-02-22 Thread Marius Vaitiekunas
Hi Cephers,

We are running the latest jewel (10.2.5). Bucket index sharding is set to 8.
All rgw pools except data are placed on SSDs.
Today I did some testing and ran a bucket index check on a bucket with
~120k objects:

# radosgw-admin bucket check -b mybucket --fix --check-objects
--rgw-realm=myrealm

Within a minute or two, three SSD disks were down and flapping. My guess is that
these disks host the PG with this bucket's index.

Should we expect this with the --check-objects flag and simply avoid using it?

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Monitoring

2017-01-16 Thread Marius Vaitiekunas
On Mon, Jan 16, 2017 at 3:54 PM, Andre Forigato 
wrote:

> Hello Marius Vaitiekunas, Chris Jones,
>
> Thank you for your contributions.
> I was looking for this information.
>
> I'm starting to use Ceph, and my concern is about monitoring.
>
> Do you have any scripts for this monitoring?
> If you can help me. I will be very grateful to you.
>
> (Excuse me if there is misinterpretation)
>
> Best Regards,
> André Forigato
>
>
Try prometheus exporter for monitoring:
https://github.com/digitalocean/ceph_exporter

And inkscope is a nice tool for management tasks -
https://github.com/inkscope/inkscope


-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Monitoring

2017-01-15 Thread Marius Vaitiekunas
On Fri, 13 Jan 2017 at 22:15, Chris Jones  wrote:

> General question/survey:
>
> Those that have larger clusters, how are you doing alerting/monitoring?
> Meaning, do you trigger off of 'HEALTH_WARN', etc? Not really talking about
> collectd related but more on initial alerts of an issue or potential issue?
> What threshold do you use basically? Just trying to get a pulse of what
> others are doing.
>
> Thanks in advance.
>
> --
> Best Regards,
> Chris Jones
> ​Bloomberg​
>
> Hi,
>
> We monitor for 'low iops'. The number differs between our clusters. For example,
> if we see only 3000 iops, something is wrong.
>
> Another good check is for the s3 api. We try to read an object through the s3 api
> every 30 seconds.
>
> We also have many checks like more than 10% of osds down, pgs inactive,
> cluster has degraded capacity, and similar. Some of these checks are not
> critical and we only get emails.
>
> One more important thing is disk latency monitoring. We've had huge
> slowdowns on our cluster when journaling ssd disks wore out. It's quite
> hard to understand what's going on, because all osds are up and running,
> but the cluster is not performing at all.
>
> Network errors on interfaces can also be important. We had some issues when a
> physical cable was malfunctioning and the cluster had many blocked requests.
>
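
The s3 api check mentioned above can be as simple as the following - a minimal
sketch, assuming s3cmd is configured against the gateway and a small canary object
already exists; run it every 30 seconds from the monitoring system:

# timeout 10 s3cmd get --force s3://healthcheck/canary.txt /tmp/canary.txt || echo "rgw s3 read check FAILED"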
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw swift api long term support

2017-01-10 Thread Marius Vaitiekunas
Hi,

I would like to ask the ceph developers if there is any chance that swift api
support for rgw is going to be dropped in the future (say, within 5 years).

Why am I asking? :)

We were happy openstack glance users on the ceph s3 api until openstack decided
to drop glance s3 support.. So we need to switch our image backend. The swift
api on ceph looks like quite a good solution.

Thanks in advance!

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw leaking data, orphan search loop

2016-12-26 Thread Marius Vaitiekunas
On Sat, Dec 24, 2016 at 2:47 PM, Wido den Hollander  wrote:

>
> > Op 23 december 2016 om 16:05 schreef Wido den Hollander :
> >
> >
> >
> > > Op 22 december 2016 om 19:00 schreef Orit Wasserman <
> owass...@redhat.com>:
> > >
> > >
> > > HI Maruis,
> > >
> > > On Thu, Dec 22, 2016 at 12:00 PM, Marius Vaitiekunas
> > >  wrote:
> > > > On Thu, Dec 22, 2016 at 11:58 AM, Marius Vaitiekunas
> > > >  wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> 1) I've written before into mailing list, but one more time. We
> have big
> > > >> issues recently with rgw on jewel. because of leaked data - the
> rate is
> > > >> about 50GB/hour.
> > > >>
> > > >> We've hitted these bugs:
> > > >> rgw: fix put_acls for objects starting and ending with underscore
> > > >> (issue#17625, pr#11669, Orit Wasserman)
> > > >>
> > > >> Upgraded to jewel 10.2.5 - no luck.
> > > >>
> > > >> Also we've hitted this one:
> > > >> rgw: RGW loses realm/period/zonegroup/zone data: period overwritten
> if
> > > >> somewhere in the cluster is still running Hammer (issue#17371,
> pr#11519,
> > > >> Orit Wasserman)
> > > >>
> > > >> Fixed zonemaps - also no luck.
> > > >>
> > > >> We do not use multisite - only default realm, zonegroup, zone.
> > > >>
> > > >> We have no more ideas, how these data leak could happen. gc is
> working -
> > > >> we can see it in rgw logs.
> > > >>
> > > >> Maybe, someone could give any hint about this? Where should we look?
> > > >>
> > > >>
> > > >> 2) Another story is about removing all the leaked/orphan objects.
> > > >> radosgw-admin orphans find enters the loop state on stage when it
> starts
> > > >> linking objects.
> > > >>
> > > >> We've tried to change the number of shards to 16, 64 (default),
> 512. At
> > > >> the moment it's running with shards number 1.
> > > >>
> > > >> Again, any ideas how to make orphan search happen?
> > > >>
> > > >>
> > > >> I could provide any logs, configs, etc. if someone is ready to help
> on
> > > >> this case.
> > > >>
> > > >>
> > >
> > > How many buckets do you have ? how many object in each?
> > > Can you provide the output of rados ls -p .rgw.buckets ?
> >
> > Marius asked me to look into this for him, so I did.
> >
> > What I found is that at *least* three buckets have way more RADOS
> objects then they should.
> >
> > The .rgw.buckets pool has 35.651.590 objects totaling 76880G.
> >
> > I listed all objects in the .rgw.buckets pool and summed them per
> bucket, the top 5:
> >
> >  783844 default.25918901.102486
> >  876013 default.25918901.3
> > 3325825 default.24201682.7
> > 6324217 default.84795862.29891
> > 7805208 default.25933378.233873
> >
> > So I started to rados_stat() (using Python) all the objects in the last
> three pools. While these stat() calls are still running. I statted about
> 30% of the objects and their total size is already 17511GB/17TB.
> >
> > size_kb_actual summed up for bucket default.24201682.7,
> default.84795862.29891 and default.25933378.233873 sums up to 12TB.
> >
> > So I'm currently at 30% of statting the objects and I'm already 5TB over
> the total size of these buckets.
> >
>
> The stat calls have finished. The grant total is 65TB.
>
> So while the buckets should consume only 12TB they seems to occupy 65TB of
> storage.
>

All these leaking buckets have one thing in common - the hadoop S3A client (
https://wiki.apache.org/hadoop/AmazonS3) is used. And some of the objects
have long names with many underscores. For example:
dt=20160814-060014-911/_temporary/0/_temporary/attempt_201608140600_0001_m_03_339/part-3.gz
dt=20160814-083014-948/_temporary/0/_temporary/attempt_201608140830_0001_m_06_294/part-6.gz


>
> > What I noticed is that it's mainly *shadow* objects which are all 4MB in
> size.
> >
> > I know that 'radosgw-admin orphans find --pool=.rgw.buckets
> --job-id=xyz' should also do this for me, but as mentioned, this keeps
> looping and hangs.
> >
>
> I started this tool about 2

Re: [ceph-users] rgw leaking data, orphan search loop

2016-12-22 Thread Marius Vaitiekunas
On Thu, Dec 22, 2016 at 11:58 AM, Marius Vaitiekunas <
mariusvaitieku...@gmail.com> wrote:

> Hi,
>
> 1) I've written before into mailing list, but one more time. We have big
> issues recently with rgw on jewel. because of leaked data - the rate is
> about 50GB/hour.
>
> We've hitted these bugs:
> rgw: fix put_acls for objects starting and ending with underscore (
> issue#17625 <http://tracker.ceph.com/issues/17625>, pr#11669
> <http://github.com/ceph/ceph/pull/11669>, Orit Wasserman)
>
> Upgraded to jewel 10.2.5 - no luck.
>
> Also we've hitted this one:
> rgw: RGW loses realm/period/zonegroup/zone data: period overwritten if
> somewhere in the cluster is still running Hammer (issue#17371
> <http://tracker.ceph.com/issues/17371>, pr#11519
> <http://github.com/ceph/ceph/pull/11519>, Orit Wasserman)
>
> Fixed zonemaps - also no luck.
>
> We do not use multisite - only default realm, zonegroup, zone.
>
> We have no more ideas, how these data leak could happen. gc is working -
> we can see it in rgw logs.
>
> Maybe, someone could give any hint about this? Where should we look?
>
>
> 2) Another story is about removing all the leaked/orphan objects.
> radosgw-admin orphans find enters the loop state on stage when it starts
> linking objects.
>
> We've tried to change the number of shards to 16, 64 (default), 512. At
> the moment it's running with shards number 1.
>
> Again, any ideas how to make orphan search happen?
>
>
> I could provide any logs, configs, etc. if someone is ready to help on
> this case.
>
>
>
Sorry, I forgot to mention that we've registered two issues on the tracker:
http://tracker.ceph.com/issues/18331
http://tracker.ceph.com/issues/18258

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw leaking data, orphan search loop

2016-12-22 Thread Marius Vaitiekunas
Hi,

1) I've written to the mailing list before, but one more time: we have had big
issues recently with rgw on jewel because of leaked data - the rate is
about 50GB/hour.

We've hit these bugs:
rgw: fix put_acls for objects starting and ending with underscore (
issue#17625 , pr#11669
, Orit Wasserman)

Upgraded to jewel 10.2.5 - no luck.

We've also hit this one:
rgw: RGW loses realm/period/zonegroup/zone data: period overwritten if
somewhere in the cluster is still running Hammer (issue#17371
, pr#11519
, Orit Wasserman)

Fixed zonemaps - also no luck.

We do not use multisite - only default realm, zonegroup, zone.

We have no more ideas how this data leak could happen. gc is working - we
can see it in the rgw logs.

Maybe someone could give a hint about this? Where should we look?


2) Another story is about removing all the leaked/orphan objects.
'radosgw-admin orphans find' enters a loop at the stage when it starts
linking objects.

We've tried changing the number of shards to 16, 64 (default) and 512. At the
moment it's running with 1 shard.

Again, any ideas how to make the orphan search succeed?


I could provide any logs, configs, etc. if someone is ready to help on this
case.

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How radosgw works with .rgw pools?

2016-12-20 Thread Marius Vaitiekunas
On Tue, Dec 20, 2016 at 3:18 PM, Marius Vaitiekunas <
mariusvaitieku...@gmail.com> wrote:

> Hi Cephers,
>
> Could anybody explain, how rgw works with pools? I don't understand
> how .rgw.control, .rgw.gc,,  .rgw.buckets.index pools could be 0 size, but
> also have some objects?
>
> # ceph df detail
> GLOBAL:
>     SIZE    AVAIL    RAW USED    %RAW USED    OBJECTS
>     507T    190T     316T        62.49        42285k
> POOLS:
>     NAME                 ID    CATEGORY    QUOTA OBJECTS    QUOTA BYTES    USED    %USED    MAX AVAIL    OBJECTS    DIRTY    READ     WRITE    RAW USED
>     .rgw.control         14    -           N/A              N/A            0       0        13605G       8          3        0        0        0
>     .rgw.gc              16    -           N/A              N/A            0       0        13605G       512        492      365M     449M     0
>     .rgw.buckets.index   20    -           N/A              N/A            0       0        13605G       977        977      1593M    743M     0
>
>
> If I try to download objects from a .rgw.gc pool with rados -p .rgw.gc get
> ${object} ${object} , all the objects are zero size. So, how gc process
> know what should be deleted?
>
>
> Thanks!
>
>
We found an article about how rgw works with pools:
https://allthenodes.wordpress.com/2016/01/29/how-indexes-work-in-ceph-rados-gateway/

Just in case someone else needs it :)
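
The short version: the bucket index and gc state are kept in omap (and xattrs), which
ceph df does not count as object data - hence pools showing 0 bytes used but a
non-zero object count. A quick way to see it (a sketch; the gc shard and bucket-id
below are only examples):

# rados -p .rgw.gc ls
# rados -p .rgw.gc listomapkeys gc.0
# rados -p .rgw.buckets.index listomapkeys .dir.default.25918901.3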

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How radosgw works with .rgw pools?

2016-12-20 Thread Marius Vaitiekunas
Hi Cephers,

Could anybody explain how rgw works with its pools? I don't understand how
the .rgw.control, .rgw.gc and .rgw.buckets.index pools can be 0 in size but
still have some objects.

# ceph df detail
GLOBAL:
    SIZE    AVAIL    RAW USED    %RAW USED    OBJECTS
    507T    190T     316T        62.49        42285k
POOLS:
    NAME                 ID    CATEGORY    QUOTA OBJECTS    QUOTA BYTES    USED    %USED    MAX AVAIL    OBJECTS    DIRTY    READ     WRITE    RAW USED
    .rgw.control         14    -           N/A              N/A            0       0        13605G       8          3        0        0        0
    .rgw.gc              16    -           N/A              N/A            0       0        13605G       512        492      365M     449M     0
    .rgw.buckets.index   20    -           N/A              N/A            0       0        13605G       977        977      1593M    743M     0


If I try to download objects from the .rgw.gc pool with rados -p .rgw.gc get
${object} ${object}, all the objects are zero size. So how does the gc process
know what should be deleted?


Thanks!


-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Loop in radosgw-admin orphan find

2016-12-14 Thread Marius Vaitiekunas
Hello,

We see the same loop in our jobs on 2 clusters. The only difference is
that our clusters don't use erasure coding. The same cluster version -
10.2.2. Any ideas what could be wrong?
Maybe we need to upgrade? :)

BR,

On Thu, Oct 13, 2016 at 6:15 PM, Yoann Moulin  wrote:

> Hello,
>
> I run a cluster in jewel 10.2.2, I have deleted the last Bucket of a
> radosGW pool to delete this pool and recreate it in EC (was replicate)
>
> Detail of the pool :
>
> > pool 36 'erasure.rgw.buckets.data' replicated size 3 min_size 2
> crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change
> 31459 flags hashpspool stripe_width 0
>
> > POOLS:
> >NAME  ID USED   %USED MAX AVAIL
>OBJECTS
> >erasure.rgw.buckets.data  36 11838M 075013G
>4735
>
> After the GC, I found lots of orphan objects still remain in the pool :
>
> > $ rados ls -p erasure.rgw.buckets.data  | egrep -c "(multipart|shadow)"
> > 4735
> > $ rados ls -p erasure.rgw.buckets.data  | grep -c multipart
> > 2368
> > $ rados ls -p erasure.rgw.buckets.data  | grep -c shadow
> > 2367
>
> example :
>
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__
> multipart_CC-MAIN-2016-40/segments/1474738660158.61/
> warc/CC-MAIN-20160924173740-00147-ip-10-143-35-109.ec2.internal.warc.gz.2~
> WezpbEQW1C9nskvtnyAteCVoO3D255Q.29
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__
> multipart_CC-MAIN-2016-40/segments/1474738660158.61/
> warc/CC-MAIN-20160924173740-00147-ip-10-143-35-109.ec2.internal.warc.gz.2~
> WezpbEQW1C9nskvtnyAteCVoO3D255Q.61
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__
> shadow_segments/1466783398869.97/wet/CC-MAIN-20160624154958-
> 00194-ip-10-164-35-72.ec2.internal.warc.wet.gz.2~7ru9WPCLMf9Lpi__
> TP1NXuYwjSU7KQK.11_1
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__
> shadow_crawl-data/CC-MAIN-2016-26/segments/1466783396147.66/wet/CC-MAIN-
> 20160624154956-00071-ip-10-164-35-72.ec2.internal.warc.wet.gz.2~
> 7bKg6WEmNo23IQ6rd8oWF_vbaG0QAFR.6_1
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__
> shadow_segments/1466783398516.82/wet/CC-MAIN-20160624154958-
> 00172-ip-10-164-35-72.ec2.internal.warc.wet.gz.2~ap5QynCJTco_L7yK6bn4M_
> bnHBbBe64.14_1
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__
> multipart_CC-MAIN-2016-40/segments/1474738662400.75/
> warc/CC-MAIN-20160924173742-00076-ip-10-143-35-109.ec2.internal.warc.gz.2~
> LEM4bpbbdiTu86rs3Ew_LFNN_oHg_m7.13
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__shadow_CC-MAIN-2016-40/
> segments/1474738662400.75/warc/CC-MAIN-20160924173742-
> 00033-ip-10-143-35-109.ec2.internal.warc.gz.2~
> FrN02NmencyDwXavvuzwqR8M8WnWNbH.8_1
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__multipart_segments/
> 1466783395560.14/wet/CC-MAIN-20160624154955-00118-ip-10-
> 164-35-72.ec2.internal.warc.wet.gz.2~GqyEUdSepIxGwPOXfKLSxtS8miWGASe.3
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__multipart_segments/
> 1466783395346.6/wet/CC-MAIN-20160624154955-00083-ip-10-
> 164-35-72.ec2.internal.warc.wet.gz.2~cTQ86ZEmOvxYD4BUI7zW37X-JcJeMgW.19
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__
> multipart_CC-MAIN-2016-40/segments/1474738660158.61/
> warc/CC-MAIN-20160924173740-00147-ip-10-143-35-109.ec2.internal.warc.gz.2~
> WezpbEQW1C9nskvtnyAteCVoO3D255Q.62
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__shadow_CC-MAIN-2016-40/
> segments/1474738662400.75/warc/CC-MAIN-20160924173742-
> 00259-ip-10-143-35-109.ec2.internal.warc.gz.2~1b-
> olF9koids0gqT9DsO0y1vAsTOasf.12_1
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__shadow_CC-MAIN-2016-40/
> segments/1474738660338.16/warc/CC-MAIN-20160924173740-
> 00067-ip-10-143-35-109.ec2.internal.warc.gz.2~
> JxuX8v0DmsSgAr3iprPBoHx6PoTKRi6.19_1
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__multipart_segments/
> 1466783397864.87/wet/CC-MAIN-20160624154957-00110-ip-10-
> 164-35-72.ec2.internal.warc.wet.gz.2~q2_hY5oSoBWaSZgxh0NdK8JvxmEySPB.29
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__
> shadow_segments/1466783396949.33/wet/CC-MAIN-20160624154956-
> 0-ip-10-164-35-72.ec2.internal.warc.wet.gz.2~
> kUInFVpsWy23JFm9eWNPiFNKlXrjDQU.18_1
> > c9724aff-5fa0-4dd9-b494-57bdb48fab4e.1371134.1__
> multipart_CC-MAIN-2016-40/segments/1474738662400.75/
> warc/CC-MAIN-20160924173742-00076-ip-10-143-35-109.ec2.internal.warc.gz.2~
> LEM4bpbbdiTu86rs3Ew_LFNN_oHg_m7.36
>
> firstly, Can I delete the pool even if there is orphan object in ? Should
> I delete other metadata (index, data_extra) pools related to this pool
> defined in the zone ? is there other data I should clean to be sure to no
> have side effect by removing those objects by deleting the pool
> instead of deleting them with radosgw-admin orphan ?
>
> for now, I have followed this doc to find and delete them :
>
> https://access.redhat.com/documentation/en/red-hat-ceph-
> storage/1.3/single/object-gateway-guide-for-ubuntu/#finding_orphan_objects
>
> I have ran this command :
>
> > radosgw-admin --c

[ceph-users] radosgw leaked orphan objects

2016-12-02 Thread Marius Vaitiekunas
Hi Cephers,

I would like to ask more about this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1254398

On our backup cluster we've run a search for leaked objects:
# radosgw-admin orphans find --pool=.rgw.buckets --job-id=bck1

The result is 131288. Before running radosgw-admin orphans finish, I would
like to hear about other cephers' experience. Has anybody tried to delete leaked
objects? How did it go?

Maybe Yehuda, as the author, could give us some confidence about this tool,
because our production cluster has 35TB of data which is probably leaked.
We've counted the usage across all of our buckets and compared it to the rgw
buckets pool usage. The pool is 60TB in size and all the buckets take only 25TB.
We would like to get those 35TB back :)
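
Something like the following gives that comparison (a sketch; it assumes jq is
installed and that 'radosgw-admin bucket stats' without --bucket lists every bucket):

# radosgw-admin bucket stats | jq '[.[].usage["rgw.main"].size_kb_actual // 0] | add'
# rados df | grep rgw.buckets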

How safe is it to run leaked object deletion? Any horror stories?

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW quota

2016-03-19 Thread Marius Vaitiekunas
On Wednesday, 16 March 2016, Derek Yarnell  wrote:

> Hi,
>
> We have a user with a 50GB quota and has now a single bucket with 20GB
> of files.  They had previous buckets created and removed but the quota
> has not decreased.  I understand that we do garbage collection but it
> has been significantly longer than the defaults that we have not
> overridden.  They get 403 QuotaExceeded when trying to write additional
> data to a new bucket or the existing bucket.
>
> # radosgw-admin user info --uid=username
> ...
> "user_quota": {
> "enabled": true,
> "max_size_kb": 52428800,
> "max_objects": -1
> },
>
> # radosgw-admin bucket stats --bucket=start
> ...
> "usage": {
> "rgw.main": {
> "size_kb": 21516505,
> "size_kb_actual": 21516992,
> "num_objects": 243
> }
> },
>
> # radosgw-admin user stats --uid=username
> ...
> {
> "stats": {
> "total_entries": 737,
> "total_bytes": 55060794604,
> "total_bytes_rounded": 55062102016
> },
> "last_stats_sync": "2016-03-16 14:16:25.205060Z",
> "last_stats_update": "2016-03-16 14:16:25.190605Z"
> }
>
> Thanks,
> derek
>
> --
> Derek T. Yarnell
> University of Maryland
> Institute for Advanced Computer Studies
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

 Hi,
It's possible that somebody changed the owner of some bucket, but all the
objects in that bucket still belong to this user. That way you can get
quota exceeded. We had the same situation.
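
A sketch of what to check in that situation (bucket and uid as in the example above;
the bucket link syntax may differ slightly between versions):

# radosgw-admin bucket stats --bucket=start | grep owner
# radosgw-admin bucket link --bucket=start --uid=username
# radosgw-admin user stats --uid=username --sync-stats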

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Delete a bucket with 14 millions objects

2016-01-28 Thread Marius Vaitiekunas
Hi,

Could anybody give a hint on how to delete a bucket with lots of files (about
14 million)? I've tried the following without success:
# radosgw-admin bucket rm --bucket=big-bucket --purge-objects
--yes-i-really-mean-it
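
One alternative approach is to delete the objects through the S3 API in a separate
step and only then remove the (by then empty) bucket - a sketch, assuming s3cmd is
configured for the bucket owner:

# s3cmd del --recursive --force s3://big-bucket
# radosgw-admin bucket rm --bucket=big-bucket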


-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph + Libvirt + QEMU-KVM

2016-01-27 Thread Marius Vaitiekunas
Hi,

With ceph rbd you should use the raw image format. As far as I know, qcow2 is not
supported.
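
For example, creating the image as raw directly on rbd, or converting an existing
qcow2 image into it, looks roughly like this (a sketch, reusing the pool/image from
the message below; CentOS7.qcow2 is a placeholder for a local source image):

# qemu-img create -f raw rbd:storage1/CentOS7-3 10G
# qemu-img convert -f qcow2 -O raw CentOS7.qcow2 rbd:storage1/CentOS7-3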

On Thu, Jan 28, 2016 at 6:21 AM, Bill WONG  wrote:

> Hi Simon,
>
> i have installed ceph package into the compute node, but it looks qcow2
> format is unable to create.. it show error with : Could not write qcow2
> header: Invalid argument
>
> ---
> qemu-img create -f qcow2 rbd:storage1/CentOS7-3 10G
> Formatting 'rbd:storage1/CentOS7-3', fmt=qcow2 size=10737418240
> encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
> qemu-img: rbd:storage1/CentOS7-3: Could not write qcow2 header: Invalid
> argument
> ---
>
> any ideas?
> thank you!
>
> On Thu, Jan 28, 2016 at 1:01 AM, Simon Ironside 
> wrote:
>
>> On 27/01/16 16:51, Bill WONG wrote:
>>
>> i have ceph cluster and KVM in different machine the qemu-kvm
>>> (CentOS7) is dedicated compute node installed with qemu-kvm + libvirtd
>>> only, there should be no /etc/ceph/ceph.conf
>>>
>>
>> Likewise, my compute nodes are separate machines from the OSDs/monitors
>> but the compute nodes still have the ceph package installed and
>> /etc/ceph/ceph.conf present. They just aren't running any ceph daemons.
>>
>> I give the compute nodes their own ceph key with write access to the pool
>> for VM storage and read access to the monitors. I can then use ceph status,
>> rbd create, qemu-img etc directly on the compute nodes.
>>
>> Cheers,
>> Simon.
>>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] raid0 and ceph?

2015-11-12 Thread Marius Vaitiekunas
>> We have write cache enabled on raid0. Everything is good while it works,
but
>> we had one strange incident with cluster. Looks like SSD disk failed and
>> linux didn't remove it from the system. All data disks which are using
this
>> SSD for journaling started to flap (up/down). Cluster performance dropped
>> down terribly. We managed to replace SSD and everything was back to
normal.

> What was the failing drive actually giving Ceph?  EIO errors?  Was it
> still readable in terms of listing partitions etc?  Was the ceph-osd
> process flapping (something restarting it?) or just the mon's idea of
> whether it was up or down?

There were some EIO errors in dmesg. ceph-disk list was struggling to list
all the disks, but I was able to list partitions with the parted command.
As I understood it, the ssd disk was not completely dead and some random io was
still getting through to it. The ceph-osd processes were running, but random io
was failing, so the mon marked them up/down.

>> Could it be related to raid0 usage or we encountered some other bug? We
>> haven't found anything similar on google. Any thoughts would be very
>> appreciated. Thanks in advance.

> You might find it interesting to follow up with whoever provides the
> RAID controller/software that you're using, to find out why drive
> failure was manifesting itself in some way other than the drive
> becoming fully inaccessible (which is pretty much what we expect iirc
> in order to properly have the OSD go away)

The HP P420i raid controller didn't detect that the ssd disk was failing, because
it was an intel ssd, I guess..

I would like to ask about the whole idea of using raid0. Am I understanding
correctly that ceph-osd processes go down only when the ssd journaling disk on
raid0 is completely dead?
What does the ceph-osd process do when random io to the journal is failing?
When does it decide to go down? Does anybody successfully use raid0 for ceph,
or is it a very bad way to go? :)
We just need to know, for future hardware design, how ceph works with raid0.
At the moment we have servers which don't support HBA mode, so we cannot
easily rebuild on the same hardware.


On Wed, Nov 11, 2015 at 4:12 PM, John Spray  wrote:

> On Wed, Nov 11, 2015 at 9:54 AM, Marius Vaitiekunas
>  wrote:
> > Hi,
> >
> > We use firefly 0.80.9.
> >
> > We have some ceph nodes in our cluster configured to use raid0. The node
> > configuration looks like this:
> >
> > 2xHDD - RAID1 - /dev/sda  -  OS
> > 1xSSD - RAID0 - /dev/sdb  -  ceph journaling disk, usually one for four
> data
> > disks
> > 1xHDD - RAID0 - /dev/sdc  -  ceph data disk
> > 1xHDD - RAID0 - /dev/sdd  -  ceph data disk
> > 1xHDD - RAID0 - /dev/sde  -  ceph data disk
> > 1xHDD - RAID0 - /dev/sdf   -  ceph data disk
> > 
> >
> > We have write cache enabled on raid0. Everything is good while it works,
> but
> > we had one strange incident with cluster. Looks like SSD disk failed and
> > linux didn't remove it from the system. All data disks which are using
> this
> > SSD for journaling started to flap (up/down). Cluster performance dropped
> > down terribly. We managed to replace SSD and everything was back to
> normal.
>
> What was the failing drive actually giving Ceph?  EIO errors?  Was it
> still readable in terms of listing partitions etc?  Was the ceph-osd
> process flapping (something restarting it?) or just the mon's idea of
> whether it was up or down?
>
> > Could it be related to raid0 usage or we encountered some other bug? We
> > haven't found anything similar on google. Any thoughts would be very
> > appreciated. Thanks in advance.
>
> You might find it interesting to follow up with whoever provides the
> RAID controller/software that you're using, to find out why drive
> failure was manifesting itself in some way other than the drive
> becoming fully inaccessible (which is pretty much what we expect iirc
> in order to properly have the OSD go away)
>
> John
>



-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] raid0 and ceph?

2015-11-11 Thread Marius Vaitiekunas
Hi,

We use firefly 0.80.9.

We have some ceph nodes in our cluster configured to use raid0. The node
configuration looks like this:

2xHDD - RAID1 - /dev/sda  -  OS
1xSSD - RAID0 - /dev/sdb  -  ceph journaling disk, usually one for four
data disks
1xHDD - RAID0 - /dev/sdc  -  ceph data disk
1xHDD - RAID0 - /dev/sdd  -  ceph data disk
1xHDD - RAID0 - /dev/sde  -  ceph data disk
1xHDD - RAID0 - /dev/sdf   -  ceph data disk


We have the write cache enabled on raid0. Everything is fine while it works,
but we had one strange incident with the cluster. It looks like an SSD disk failed
and linux didn't remove it from the system. All data disks which use
this SSD for journaling started to flap (up/down). Cluster performance
dropped terribly. We managed to replace the SSD and everything was back to
normal.

Could it be related to the raid0 usage, or did we encounter some other bug? We
haven't found anything similar on google. Any thoughts would be much
appreciated. Thanks in advance.





-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com