Re: [ceph-users] Large LOG like files on monitor

2015-10-08 Thread Christian Balzer

Hello,

On Thu, 8 Oct 2015 10:27:16 +0200 Erwin Lubbers wrote:

> Christian,
> 
> Still running Dumpling (I know I have to start upgrading). Cluster has
> 66 OSD’s and a total size close to 100 GB. Cluster is running for around
> 2 years now and the monitor server has an uptime of 258 days.
> 
> The LOG file is 1.2 GB in size and ls shows the current time for it. The
> LOG.OLD is 1.5 GB and ls tells me a date of Jan 23 2015 (which is 258 days
> before today).
> 
> lsof tells that the LOG is open, while the LOG.OLD isn’t.
> 
> So it doesn't seem to rotate without restarting the monitor.
> 
Yup, that sounds about right.
Firefly will auto-rotate frequently, keeping logs in the few MB range.

I guess deleting the old one, restarting the monitor, and then deleting the
rotated-out former LOG is your best way forward.

This is of course assuming that you have 3 or 5 MONs up and running.
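
For illustration, a rough sketch of that procedure (assuming the mon id from the
store.db path quoted below and a sysvinit-style service script; adjust the
restart command to your init system, and only do this while the remaining
monitors are healthy and in quorum):

cd /var/lib/ceph/mon/ceph-l16-s01/store.db
rm LOG.OLD                            # the stale, already-closed leveldb log
service ceph restart mon.l16-s01      # rotates the in-use LOG out
rm LOG.OLD                            # the former LOG, closed by the restart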

Christian

> Regards,
> Erwin
> 
> 
> > On 8 Oct 2015, at 09:57, Christian Balzer  wrote:
> > 
> > 
> > Hello,
> > 
> > On Thu, 8 Oct 2015 09:38:02 +0200 Erwin Lubbers wrote:
> > 
> >> Hi,
> >> 
> >> In the /var/lib/ceph/mon/ceph-l16-s01/store.db/ directory there are
> >> two very large files LOG and LOG.OLD (multiple GB's) and my diskspace
> >> is running low. Can I safely delete those files?
> >> 
> > That sounds odd, what version of Ceph are you running, something
> > pre-Firefly?
> > 
> > How long has this monitor been running, what size, how busy is your
> > cluster?
> > 
> > Those leveldb log files shouldn't get that big and should be getting
> > auto-rotated.
> > 
> > In general, if lsof shows that file being used (most likely if the
> > date of LOG is current) it obviously isn't safe to remove it. 
> > 
> > Christian 
> > 
> >> Regards,
> >> Erwin
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> 
> > 
> > 
> > -- 
> > Christian Balzer        Network/Systems Engineer
> > ch...@gol.com   Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
> 
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] O_DIRECT on deep-scrub read

2015-10-08 Thread Paweł Sadowski
On 10/07/2015 10:52 PM, Sage Weil wrote:
> On Wed, 7 Oct 2015, David Zafman wrote:
>> There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after 
>> deep-scrub reads for objects not recently accessed by clients.
> Yeah, it's the 'except for stuff already in cache' part that we don't do 
> (and the kernel doesn't give us a good interface for).  IIRC there was a 
> patch that guessed based on whether the obc was already in cache, which 
> seems like a pretty decent heuristic, but I forget if that was in the 
> final version.

I've run some tests and it looks like on XFS the cache is discarded on
O_DIRECT write and read, but on EXT4 it is discarded only on O_DIRECT write.
I've found some patches to add support for "read only if in page cache"
(preadv2/RWF_NONBLOCK) but can't find them in kernel source. Maybe
Milosz Tanski can tell more about that. I think it could help a bit
during deep scrub.
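
A small experiment along those lines, as a hedged sketch (assumes the
third-party vmtouch tool is installed; the file path is a placeholder on the
filesystem under test):

F=/mnt/xfs/pagecache-test
dd if=/dev/urandom of="$F" bs=4M count=4 conv=fsync
cat "$F" > /dev/null                        # warm the page cache
vmtouch -v "$F"                             # should report ~100% of pages resident
dd if="$F" of=/dev/null bs=4M iflag=direct  # O_DIRECT read
vmtouch -v "$F"                             # shows whether the direct read dropped the cache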

>> I see the NewStore objectstore sometimes using the O_DIRECT  flag for writes.
>> This concerns me because the open(2) man pages says:
>>
>> "Applications should avoid mixing O_DIRECT and normal I/O to the same file,
>> and especially to overlapping byte regions in the same file.  Even when the
>> filesystem correctly handles the coherency issues in this situation, overall
>> I/O throughput is likely to be slower than using either mode alone."
> Yeah: an O_DIRECT write will do a cache flush on the write range, so if 
> there was already dirty data in cache you'll write twice.  There's 
> similarly an invalidate on read.  I need to go back through the newstore 
> code and see how the modes are being mixed and how it can be avoided...
>
> sage
>
>
>> On 10/7/15 7:50 AM, Sage Weil wrote:
>>> It's not, but it would not be hard to do this.  There are fadvise-style
>>> hints being passed down that could trigger O_DIRECT reads in this case.
>>> That may not be the best choice, though--it won't use data that happens
>>> to be in cache and it'll also throw it out..
>>>
>>> On Wed, 7 Oct 2015, Paweł Sadowski wrote:
>>>
 Hi,

 Can anyone tell if deep scrub is done using O_DIRECT flag or not? I'm
 not able to verify that in source code.

 If not would it be possible to add such feature (maybe config option) to
 help keeping Linux page cache in better shape?

 Thanks,

-- 
PS
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Large LOG like files on monitor

2015-10-08 Thread Erwin Lubbers
Christian,

Still running Dumpling (I know I have to start upgrading). Cluster has 66 OSD’s 
and a total size close to 100 GB. Cluster is running for around 2 years now and 
the monitor server has an uptime of 258 days.

The LOG file is 1.2 GB in size and ls shows the current time for it. The 
LOG.OLD is 1.5 GB and ls tells me a date of Jan 23 2015 (which is 258 days
before today).

lsof tells that the LOG is open, while the LOG.OLD isn’t.

So it doesn't seem to rotate without restarting the monitor.

Regards,
Erwin


> On 8 Oct 2015, at 09:57, Christian Balzer  wrote:
> 
> 
> Hello,
> 
> On Thu, 8 Oct 2015 09:38:02 +0200 Erwin Lubbers wrote:
> 
>> Hi,
>> 
>> In the /var/lib/ceph/mon/ceph-l16-s01/store.db/ directory there are two
>> very large files LOG and LOG.OLD (multiple GB's) and my diskspace is
>> running low. Can I safely delete those files?
>> 
> That sounds odd, what version of Ceph are you running, something
> pre-Firefly?
> 
> How long has this monitor been running, what size, how busy is your
> cluster?
> 
> Those leveldb log files shouldn't get that big and should be getting auto-rotated.
> 
> In general, if lsof shows that file being used (most likely if the date of
> LOG is current) it obviously isn't safe to remove it. 
> 
> Christian 
> 
>> Regards,
>> Erwin
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> 
> -- 
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com Global OnLine Japan/Fusion Communications
> http://www.gol.com/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] "stray" objects in empty cephfs data pool

2015-10-08 Thread Burkhard Linke

Hi,

I've moved all files from a CephFS data pool (EC pool with frontend 
cache tier) in order to remove the pool completely.


Some objects are left in the pools ('ceph df' output of the affected pools):

cephfs_ec_data   19  7565k 0 66288G   13

Listing the objects and the readable part of their 'parent' attribute:

# for obj in $(rados -p cephfs_ec_data ls); do echo $obj; rados -p 
cephfs_ec_data getxattr $obj parent | strings; done

1f6119f.
1f6119f
stray9
1f63fe5.
1f6119f
stray9
1f61196.
1f6119f
stray9
...

The names are valid CephFS object names. But the parent attribute does 
not contain the path of the file the object belongs to; instead the string 
'stray' is the only useful information (without dissecting the binary 
content of the parent attribute).


What are those objects and is it safe to remove the pool in this state?

Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Large LOG like files on monitor

2015-10-08 Thread Erwin Lubbers
Hi,

In the /var/lib/ceph/mon/ceph-l16-s01/store.db/ directory there are two very 
large files LOG and LOG.OLD (multiple GB's) and my diskspace is running low. 
Can I safely delete those files?

Regards,
Erwin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] proxmox 4.0 release : lxc with krbd support and qemu librbd improvements

2015-10-08 Thread Irek Fasikhov
Hi, Alexandre.

Very Very Good!
Thank you for your work! :)

Best regards, Irek Fasikhov (Фасихов Ирек Нургаязович)
Mob.: +79229045757

2015-10-07 7:25 GMT+03:00 Alexandre DERUMIER :

> Hi,
>
> proxmox 4.0 has been released:
>
> http://forum.proxmox.com/threads/23780-Proxmox-VE-4-0-released!
>
>
> Some ceph improvements :
>
> - lxc containers with krbd support (multiple disks + snapshots)
> - qemu with jemalloc support (improve librbd performance)
> - qemu iothread option by disk (improve scaling rbd  with multiple disk)
> - librbd hammer version
>
> Regards,
>
> Alexandre
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] input / output error

2015-10-08 Thread gjprabu
Hi All, 



   We have servers with CEPH RBD mounted via OCFS2. We are facing i/o errors
while moving data within the same disk (copying does not have any problem). As a
temporary fix we remount the partition and the issue gets resolved, but after
some time the problem reproduces again. If anybody has faced the same issue,
please help us.



Note: We have a total of 5 nodes; two nodes are working fine, while the other
nodes show the input/output errors below on moved data.



ls -althr 

ls: cannot access MICKEYLITE_3_0_M4_1_TEST: Input/output error 

ls: cannot access MICKEYLITE_3_0_M4_1_OLD: Input/output error 

total 0 

d? ? ? ? ? ? MICKEYLITE_3_0_M4_1_TEST 

d? ? ? ? ? ? MICKEYLITE_3_0_M4_1_OLD 





Regards

Prabu









___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] leveldb compaction error

2015-10-08 Thread Selcuk TUNC
Hi Narendra,

we upgraded from (0.80.9)Firefly to Hammer.

On Thu, Oct 8, 2015 at 2:49 AM, Narendra Trivedi (natrived) <
natri...@cisco.com> wrote:

> Hi Selcuk,
>
>
>
> Which version of ceph did you upgrade from to Hammer (0.94)?
>
>
>
> --Narendra
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Selcuk TUNC
> *Sent:* Thursday, September 17, 2015 12:41 AM
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] leveldb compaction error
>
>
>
> hello,
>
>
>
> we have noticed leveldb compaction on mount causes a segmentation fault in
> hammer release(0.94).
>
> It seems related to this pull request (github.com/ceph/ceph/pull/4372).
> Are you planning to backport
>
> this fix to next hammer release?
>
>
>
> --
>
> st
>



-- 
st
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Placement rule not resolved

2015-10-08 Thread ghislain.chevalier
Hi all,

I hadn't noticed that the osd reweight for the ssd OSDs was curiously set to a low value.
I don't know how and when these values were set so low.
Our environment is Mirantis-driven and the installation was powered by fuel and 
puppet.
(the installation was run by the openstack team and I checked the ceph cluster 
configuration afterwards)

After reweighting them to 1, the ceph cluster is working properly.
Thanks to the object lookup module of inkscope, I checked that the osd
allocation was ok.

What is not normal is that crush tried to allocate osds that are not targeted
by the rule, in that case sas disks instead of ssd disks.
Shouldn't the cluster's normal behavior, i.e. the pg allocation, be to freeze
instead?
I can say that because I analyzed the stuck pgs (inkscope module) and noticed
that the osd allocation for these pgs was either not correct (acting list) or
incomplete.
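
As a rough illustration of the checks described above (the osd id and rule
number are placeholders, not values from this cluster):

ceph osd tree                      # the REWEIGHT column shows the unexpectedly low values
ceph osd reweight 42 1.0           # reset a given osd id back to 1
ceph osd getcrushmap -o /tmp/cm
crushtool -d /tmp/cm -o /tmp/cm.txt
crushtool -i /tmp/cm --test --rule 4 --num-rep 3 --show-mappings | head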

Best regards

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
ghislain.cheval...@orange.com
Sent: Tuesday, 6 October 2015 14:18
To: ceph-users
Subject: [ceph-users] Placement rule not resolved

Hi,

Context:
Firefly 0.80.9
8 storage nodes
176 osds : 14*8 sas and 8*8 ssd
3 monitors

I created an alternate crushmap in order to fulfill a tiering requirement, i.e.
select ssd or sas.
I created specific buckets "host-ssd" and "host-sas" and regrouped them in
"tier-ssd" and "tier-sas" under a "tier-root".
E.g. I want to select 1 ssd in 3 distinct hosts

I don't understand why the placement rule for sas is working and not for ssd.
SAS disks are selected even if, according to the crushmap, they are not in the
right tree.
Even when 3 ssds do get selected, the pgs stay stuck but active.

I attached the crushmap and ceph osd tree.

Can someone have a look and tell me where the fault is?

Bgrds
- - - - - - - - - - - - - - - - -
Ghislain Chevalier
ORANGE/IMT/OLPS/ASE/DAPI/CSE
Architecte de services d'infrastructure de stockage
Sofware-Defined Storage Architect
+33299124432
+33788624370
ghislain.cheval...@orange.com
Please consider the environment before printing this message!


_



This message and its attachments may contain confidential or privileged
information that may be protected by law; they should not be distributed, used
or copied without authorisation. If you have received this email in error,
please notify the sender and delete this message and its attachments. As emails
may be altered, Orange is not liable for messages that have been modified,
changed or falsified. Thank you.

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to improve 'rbd ls [pool]' response time

2015-10-08 Thread WD_Hwang
Hi, all:
  If the Ceph cluster health status is HEALTH_OK, the execution time of 'sudo 
rbd ls rbd' is very short, like the following results.
$ time sudo rbd ls rbd
real    0m0.096s
user    0m0.014s
sys     0m0.028s

  But if there are several warnings (eg: 1 pgs degraded; 6 pgs incomplete; 1650 
pgs peering; 7 pgs stale;), the execution time of 'sudo rbd ls rbd' may take a 
long time.

Is there any way to improve the response time of 'rbd' commands?
Any help would be much appreciated.

Best Regards,
WD Hwang

---
This email contains confidential or legally privileged information and is for 
the sole use of its intended recipient. 
Any unauthorized review, use, copying or distribution of this email or the 
content of this email is strictly prohibited.
If you are not the intended recipient, you may reply to the sender and should 
delete this e-mail immediately.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Large LOG like files on monitor

2015-10-08 Thread Christian Balzer

Hello,

On Thu, 8 Oct 2015 09:38:02 +0200 Erwin Lubbers wrote:

> Hi,
> 
> In the /var/lib/ceph/mon/ceph-l16-s01/store.db/ directory there are two
> very large files LOG and LOG.OLD (multiple GB's) and my diskspace is
> running low. Can I safely delete those files?
> 
That sounds odd, what version of Ceph are you running, something
pre-Firefly?

How long has this monitor been running, what size, how busy is your
cluster?

Those leveldb log files shouldn't get that big and should be getting auto-rotated.

In general, if lsof shows that file being used (most likely if the date of
LOG is current) it obviously isn't safe to remove it. 

Christian 

> Regards,
> Erwin
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] get user list via rados-rest: {code: 403, message: Forbidden}

2015-10-08 Thread Klaus Franken
Hi,

I'm trying to get a list of all users from the radosgw admin REST API, analogous
to "radosgw-admin metadata list user".

I can retrieve the user info for a specified user from
https://rgw01.XXX.de/admin/user?uid=klaus&format=json.
http://docs.ceph.com/docs/master/radosgw/adminops/#get-user-info says "If no
user is specified returns the list of all users along with suspension
information".
But when using the same url without "uid=klaus" I always get a {code: 403,
message: Forbidden}. I tried to give the user all capabilities I found, but
without success.

How can I get more debug messages (/var/log/ceph/radosgw.log wasn’t helpful 
even with a higher debug level)?
Or is that maybe a bug?
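
For reference, one hedged way to turn up gateway-side logging; the admin socket
daemon name (client.radosgw.rgw01) is an assumption here, use whatever your
instance is called:

ceph daemon client.radosgw.rgw01 config set debug_rgw 20
ceph daemon client.radosgw.rgw01 config set debug_ms 1
# or persistently, in the gateway's ceph.conf section, then restart radosgw:
#   debug rgw = 20
#   debug ms = 1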


Successful with uid=:
- request:
body: null
headers:
  Accept: ['*/*']
  Accept-Encoding: ['gzip, deflate']
  Authorization: ['AWS ']
  Connection: [keep-alive]
  User-Agent: [python-requests/2.7.0 CPython/3.4.2 Darwin/14.5.0]
  date: ['Thu, 08 Oct 2015 09:12:14 GMT']
method: GET
uri: https://rgw01.XXX.de/admin/user?uid=klaus&format=json
  response:
body: {string: '{"user_id":"klaus","display_name":"Klaus 
Franken","email":"","suspended":0,"max_buckets":1000,"subusers":[],"keys":[{"user":"klaus","access_key“:"","secret_key":"SpxxE\/"}],"swift_keys":[],"caps":[{"type":"buckets","perm":"*"},{"type":"metadata","perm":"*"},{"type":"usage","perm":"*"},{"type":"users","perm":"*"}]}'}
headers:
  Connection: [close]
  Content-Type: [application/json]
  Date: ['Thu, 08 Oct 2015 09:12:14 GMT']
  Server: [Apache]
status: {code: 200, message: OK}

403 without uid=:
- request:
body: null
headers:
  Accept: ['*/*']
  Accept-Encoding: ['gzip, deflate']
  Authorization: ['AWS =']
  Connection: [keep-alive]
  User-Agent: [python-requests/2.7.0 CPython/3.4.2 Darwin/14.5.0]
  date: ['Thu, 08 Oct 2015 09:13:15 GMT']
method: GET
uri: https://rgw01.XXX.de/admin/user?format=json
  response:
body: {string: '{"Code":"AccessDenied"}'}
headers:
  Accept-Ranges: [bytes]
  Connection: [close]
  Content-Length: ['23']
  Content-Type: [application/json]
  Date: ['Thu, 08 Oct 2015 09:13:16 GMT']
  Server: [Apache]
status: {code: 403, message: Forbidden}
version: 1


Thank you,
Klaus


noris network AG - Thomas-Mann-Straße 16-20 - D-90471 Nürnberg -
Tel +49-911-9352-0 - Fax +49-911-9352-100

http://www.noris.de - The IT-Outsourcing Company

Vorstand: Ingo Kraupa (Vorsitzender), Joachim Astel -
Vorsitzender des Aufsichtsrats: Stefan Schnabel - AG Nürnberg HRB 17689



smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Potential OSD deadlock?

2015-10-08 Thread Dzianis Kahanovich
I probably have a similar situation on latest hammer & 4.1+ kernels on spinning
OSDs (journal on a leased partition on the same HDD): eventual slow requests, etc. Try:

1) even on leased partition journal - "journal aio = false";
2) single-queue "noop" scheduler (OSDs);
3) reduce nr_requests to 32 (OSDs);
4) remove all other queue "tunes";
5) killall irqbalance (& any balancers, excluding in-kernel NUMA auto-balancing);
6) net.ipv4.tcp_congestion_control = scalable
7) net.ipv4.tcp_notsent_lowat = 131072
8) vm.zone_reclaim_mode = 7

These are neutral, fairness-oriented settings.
If everything is fixed, play with other values (the "yeah" congestion algorithm, etc.).

Also I put all active processes (ceph daemons & qemu) into a single RR priority
level ("chrt -par 3 $pid", etc.).
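
A hedged sketch of applying the settings above on an OSD host (the block device
name sdb is a placeholder; repeat the queue settings for every OSD disk):

sysctl -w net.ipv4.tcp_congestion_control=scalable
sysctl -w net.ipv4.tcp_notsent_lowat=131072
sysctl -w vm.zone_reclaim_mode=7
echo noop > /sys/block/sdb/queue/scheduler    # single-queue noop scheduler
echo 32   > /sys/block/sdb/queue/nr_requests
killall irqbalance
for pid in $(pidof ceph-osd); do chrt -a -r -p 3 "$pid"; done
# "journal aio = false" goes into the [osd] section of ceph.conf and needs an OSD restart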


Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

We have had two situations where I/O just seems to be indefinitely
blocked on our production cluster today (0.94.3). In the case this
morning, it was just normal I/O traffic, no recovery or backfill. The
case this evening, we were backfilling to some new OSDs. I would have
loved to have bumped up the debugging to get an idea of what was going
on, but time was exhausted. The incident this evening I was able to do
some additional troubleshooting, but got real anxious after I/O had
been blocked for 10 minutes and OPs was getting hot around the collar.

Here are the important parts of the logs:
[osd.30]
2015-09-18 23:05:36.188251 7efed0ef0700  0 log_channel(cluster) log
[WRN] : slow request 30.662958 seconds old,
  received at 2015-09-18 23:05:05.525220: osd_op(client.3117179.0:18654441
  rbd_data.1099d2f67aaea.0f62 [set-alloc-hint object_size
8388608 write_size 8388608,write 1048576~643072] 4.5ba1672c
ack+ondisk+write+known_if_redirected e55919)
  currently waiting for subops from 32,70,72

[osd.72]
2015-09-18 23:05:19.302985 7f3fa19f8700  0 log_channel(cluster) log
[WRN] : slow request 30.200408 seconds old,
  received at 2015-09-18 23:04:49.102519: osd_op(client.4267090.0:3510311
  rbd_data.3f41d41bd65b28.9e2b [set-alloc-hint object_size
4194304 write_size 4194304,write 1048576~421888] 17.40adcada
ack+ondisk+write+known_if_redirected e55919)
  currently waiting for subops from 2,30,90

The other OSDs listed (32,70,2,90) did not have any errors in the logs
about blocked I/O. It seems that osd.30 was waiting for osd.72 and
visa versa. I looked at top and iostat of these two hosts and the OSD
processes and disk I/O were pretty idle.

I know that this isn't a lot to go on. Our cluster is under very heavy
load and we get several blocked I/Os every hour, but they usually
clear up within 15 seconds. We seem to get I/O blocked when the op
latency of the cluster goes above 1 (average from all OSDs as seen by
Graphite).

Has anyone seen this infinite blocked I/O? Bouncing osd.72 immediately
cleared all the blocked I/O and then it was fine after rejoining the
cluster. Increasing what logs and to what level would be most
beneficial in this case for troubleshooting?

I hope this makes sense, it has been a long day.

- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.1.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJV/QiuCRDmVDuy+mK58QAAfskP/A0+RRAtq49pwfJcmuaV
LKMsdaOFu0WL1zNLgnj4KOTR1oYyEShXW3Xn0axw1C2U2qXkJQfvMyQ7PTj7
cKqNeZl7rcgwkgXlij1hPYs9tjsetjYXBmmui+CqbSyNNo95aPrtUnWPcYnc
K7blP6wuv7p0ddaF8wgw3Jf0GhzlHyykvVlxLYjQWwBh1CTrSzNWcEiHz5NE
9Y/GU5VZn7o8jeJDh6tQGgSbUjdk4NM2WuhyWNEP1klV+x1P51krXYDR7cNC
DSWaud1hNtqYdquVPzx0UCcUVR0JfVlEX26uxRLgNd0dDkq+CRXIGhakVU75
Yxf8jwVdbAg1CpGtgHx6bWyho2rrsTzxeul8AFLWtELfod0e5nLsSUfQuQ2c
MXrIoyHUcs7ySP3ozazPOdxwBEpiovUZOBy1gl2sCSGvYsmYokHEO0eop2rl
kVS4dSAvDezmDhWumH60Y661uzySBGtrMlV/u3nw8vfvLhEAbuE+lLybMmtY
nJvJIzbTqFzxaeX4PTWcUhXRNaPp8PDS5obmx5Fpn+AYOeLet/S1Alz1qNM2
4w34JKwKO92PtDYqzA6cj628fltdLkxFNoz7DFfqxr80DM7ndLukmSkPY+Oq
qYOQMoownMnHuL0IrC9Jo8vK07H8agQyLF8/m4c3oTqnzZhh/rPRlPfyHEio
Roj5
=ut4B
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] O_DIRECT on deep-scrub read

2015-10-08 Thread Lionel Bouton
On 07/10/2015 13:44, Paweł Sadowski wrote:
> Hi,
>
> Can anyone tell if deep scrub is done using O_DIRECT flag or not? I'm
> not able to verify that in source code.
>
> If not would it be possible to add such feature (maybe config option) to
> help keeping Linux page cache in better shape?

Note : this would probably be even more useful with backfills when
inserting/replacing OSDs because they focus most of the IOs on these
OSDs (I recently posted that we got far better performance when
rebuilding OSDs if we selectively disabled the RAID card cache for them
for example).

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "stray" objects in empty cephfs data pool

2015-10-08 Thread John Spray
On Thu, Oct 8, 2015 at 11:41 AM, Burkhard Linke
 wrote:
> Hi John,
>
> On 10/08/2015 12:05 PM, John Spray wrote:
>>
>> On Thu, Oct 8, 2015 at 10:21 AM, Burkhard Linke
>>  wrote:
>>>
>>> Hi,
>
> *snipsnap*
>>>
>>>
>>> I've moved all files from a CephFS data pool (EC pool with frontend cache
>>> tier) in order to remove the pool completely.
>>>
>>> Some objects are left in the pools ('ceph df' output of the affected
>>> pools):
>>>
>>>  cephfs_ec_data   19  7565k 0 66288G   13
>>>
>>> Listing the objects and the readable part of their 'parent' attribute:
>>>
>>> # for obj in $(rados -p cephfs_ec_data ls); do echo $obj; rados -p
>>> cephfs_ec_data getxattr $obj parent | strings; done
>>> 1f6119f.
>>> 1f6119f
>>> stray9
>>> 1f63fe5.
>>> 1f6119f
>>> stray9
>>> 1f61196.
>>> 1f6119f
>>> stray9
>>> ...
>
>
> *snipsnap*
>>
>>
>> Well, they're strays :-)
>>
>> You get stray dentries when you unlink files.  They hang around either
>> until the inode is ready to be purged, or if there are hard links then
>> they hang around until something prompts ceph to "reintegrate" the
>> stray into a new path.
>
> Thanks for the fast reply. During the transfer of all files from the EC pool
> to a standard replicated pool I've copied the file to a new file name,
> removed the original one and renamed the copy. There might have been some
> processes with open files at that time, which might explain the stray file
> objects.
>
> I've also been able to locate some processes that might be the reason for
> these leftover files. I've terminated these processes, but the objects are
> still present in the pool. How long does purging an inode usually take?

If nothing is holding a file open, it'll start purging within a couple
of journal-latencies of the unlink (i.e. pretty darn quick), and it'll
take as long to purge as there are objects in the file (again, pretty
darn quick for normal-sized files and a non-overloaded cluster).
Chances are if you're noticing strays, they're stuck for some reason.
You're probably on the right track looking for processes holding files
open.

>> You don't say what version you're running, so it's possible you're
>> running an older version (pre hammer, I think) where you're
>> experiencing either a bug holding up deletion (we've had a few) or a
>> bug preventing reintegration (we had one of those too).  The bugs
>> holding up deletion can usually be worked around with some client
>> and/or mds restarts.
>
> The cluster is running on hammer. I'm going to restart the mds to try to get
> rid of these objects.

OK, let us know how it goes.  You may find the num_strays,
num_strays_purging, num_strays_delayed performance counters (ceph
daemon mds. perf dump) useful.

>> It isn't safe to remove the pool in this state.  The MDS is likely to
>> crash if it eventually gets around to trying to purge these files.
>
> That's bad. Does the mds provide a way to get more information about these
> files, e.g. which client is blocking purging? We have about 3 hosts working
> on CephFS, and checking every process might be difficult.

If a client has caps on an inode, you can find out about it by dumping
(the whole!) cache from a running MDS.  We have tickets for adding a
more surgical version of this[1] but for now it's bit of a heavyweight
thing.  You can do JSON ("ceph daemon mds. dump cache > foo.json")
or plain text ("ceph daemon mds. dump cache foo.txt").  The latter
version is harder to parse but is less likely to eat all the memory on
your MDS (JSON output builds the whole thing in memory before writing
it)!

In the dump output, search for the inode number you're interested in,
and look for client caps.  Remember if search json output to look for
the decimal form of the inode, vs. the hex form in plan text output.
Resolve the client session ID in the caps to a meaningful name with
"ceph daemon mds. session ls", assuming the clients are recent
enough to report the hostnames.

You can also look at "ceph daemon mds. dump_ops_in_flight" to
check there are no (stuck) requests touching the inode.
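
Pulling those steps together as a rough sketch (the mds name "a" and the inode
are assumptions; adjust to your rank/id and the object names you found):

INO=1f6119f                                       # inode in hex, from the object name
ceph daemon mds.a dump cache /tmp/mdscache.txt    # plain-text dump, written on the MDS host
grep "$INO" /tmp/mdscache.txt                     # find the inode and any client caps on it
ceph daemon mds.a session ls                      # map the session id from the caps to a host
ceph daemon mds.a dump_ops_in_flight              # any stuck requests touching the inode
printf '%d\n' 0x$INO                              # decimal form, for searching a JSON dump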

John

1.  http://tracker.ceph.com/issues/11171,
http://tracker.ceph.com/issues/11172,
http://tracker.ceph.com/issues/11173
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "stray" objects in empty cephfs data pool

2015-10-08 Thread John Spray
On Thu, Oct 8, 2015 at 10:21 AM, Burkhard Linke
 wrote:
> Hi,
>
> I've moved all files from a CephFS data pool (EC pool with frontend cache
> tier) in order to remove the pool completely.
>
> Some objects are left in the pools ('ceph df' output of the affected pools):
>
> cephfs_ec_data   19  7565k 0 66288G   13
>
> Listing the objects and the readable part of their 'parent' attribute:
>
> # for obj in $(rados -p cephfs_ec_data ls); do echo $obj; rados -p
> cephfs_ec_data getxattr $obj parent | strings; done
> 1f6119f.
> 1f6119f
> stray9
> 1f63fe5.
> 1f6119f
> stray9
> 1f61196.
> 1f6119f
> stray9
> ...
>
> The names are valid CephFS object names. But the parent attribute does not
> contain the path of the file the object belongs to; instead the string 'stray'
> is the only useful information (without dissecting the binary content of the
> parent attribute).
>
> What are those objects and is it safe to remove the pool in this state?


Well, they're strays :-)

You get stray dentries when you unlink files.  They hang around either
until the inode is ready to be purged, or if there are hard links then
they hang around until something prompts ceph to "reintegrate" the
stray into a new path.

You don't say what version you're running, so it's possible you're
running an older version (pre hammer, I think) where you're
experiencing either a bug holding up deletion (we've had a few) or a
bug preventing reintegration (we had one of those too).  The bugs
holding up deletion can usually be worked around with some client
and/or mds restarts.

It isn't safe to remove the pool in this state.  The MDS is likely to
crash if it eventually gets around to trying to purge these files.

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "stray" objects in empty cephfs data pool

2015-10-08 Thread Burkhard Linke

Hi John,

On 10/08/2015 12:05 PM, John Spray wrote:

On Thu, Oct 8, 2015 at 10:21 AM, Burkhard Linke
 wrote:

Hi,

*snipsnap*


I've moved all files from a CephFS data pool (EC pool with frontend cache
tier) in order to remove the pool completely.

Some objects are left in the pools ('ceph df' output of the affected pools):

 cephfs_ec_data   19  7565k 0 66288G   13

Listing the objects and the readable part of their 'parent' attribute:

# for obj in $(rados -p cephfs_ec_data ls); do echo $obj; rados -p
cephfs_ec_data getxattr $obj parent | strings; done
1f6119f.
1f6119f
stray9
1f63fe5.
1f6119f
stray9
1f61196.
1f6119f
stray9
...


*snipsnap*


Well, they're strays :-)

You get stray dentries when you unlink files.  They hang around either
until the inode is ready to be purged, or if there are hard links then
they hang around until something prompts ceph to "reintegrate" the
stray into a new path.
Thanks for the fast reply. During the transfer of all files from the EC 
pool to a standard replicated pool I've copied the file to a new file 
name, removed the original one and renamed the copy. There might have 
been some processes with open files at that time, which might explain 
the stray file objects.


I've also been able to locate some processes that might be the reason 
for these leftover files. I've terminated these processes, but the 
objects are still present in the pool. How long does purging an inode 
usually take?


You don't say what version you're running, so it's possible you're
running an older version (pre hammer, I think) where you're
experiencing either a bug holding up deletion (we've had a few) or a
bug preventing reintegration (we had one of those too).  The bugs
holding up deletion can usually be worked around with some client
and/or mds restarts.
The cluster is running on hammer. I'm going to restart the mds to try to 
get rid of these objects.


It isn't safe to remove the pool in this state.  The MDS is likely to
crash if it eventually gets around to trying to purge these files.
That's bad. Does the mds provide a way to get more information about 
these files, e.g. which client is blocking purging? We have about 3 
hosts working on CephFS, and checking every process might be difficult.


Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "stray" objects in empty cephfs data pool

2015-10-08 Thread Burkhard Linke

Hi John,

On 10/08/2015 01:03 PM, John Spray wrote:

On Thu, Oct 8, 2015 at 11:41 AM, Burkhard Linke
 wrote:


*snipsnap*


Thanks for the fast reply. During the transfer of all files from the EC pool
to a standard replicated pool I've copied the file to a new file name,
removed the orignal one and renamed the copy. There might have been some
processed with open files at that time, which might explain the stray files
objects.

I've also been able to locate some processes that might be the reason for
these leftover files. I've terminated these processes, but the objects are
still present in the pool. How long does purging an inode usually take?

If nothing is holding a file open, it'll start purging within a couple
of journal-latencies of the unlink (i.e. pretty darn quick), and it'll
take as long to purge as there are objects in the file (again, pretty
darn quick for normal-sized files and a non-overloaded cluster).
Chances are if you're noticing strays, they're stuck for some reason.
You're probably on the right track looking for processes holding files
open.


You don't say what version you're running, so it's possible you're
running an older version (pre hammer, I think) where you're
experiencing either a bug holding up deletion (we've had a few) or a
bug preventing reintegration (we had one of those too).  The bugs
holding up deletion can usually be worked around with some client
and/or mds restarts.

The cluster is running on hammer. I'm going to restart the mds to try to get
rid of these objects.

OK, let us know how it goes.  You may find the num_strays,
num_strays_purging, num_strays_delayed performance counters (ceph
daemon mds. perf dump) useful.
The number of objects dropped to 7 after the mds restart. I was also 
able to identify the application the objects belong to (some were perl 
modules), but I've been unable to locate a running instance of this 
application. The main user of this application is also not aware of any 
running instance at the moment.

It isn't safe to remove the pool in this state.  The MDS is likely to
crash if it eventually gets around to trying to purge these files.

That's bad. Does the mds provide a way to get more information about these
files, e.g. which client is blocking purging? We have about 3 hosts working
on CephFS, and checking every process might be difficult.

If a client has caps on an inode, you can find out about it by dumping
(the whole!) cache from a running MDS.  We have tickets for adding a
more surgical version of this[1] but for now it's bit of a heavyweight
thing.  You can do JSON ("ceph daemon mds. dump cache > foo.json")
or plain text ("ceph daemon mds. dump cache foo.txt").  The latter
version is harder to parse but is less likely to eat all the memory on
your MDS (JSON output builds the whole thing in memory before writing
it)!
Hammer 0.94.3 does not support a 'dump cache' mds command. 
'dump_ops_in_flight' does not list any pending operations. Is there any 
other way to access the cache?


'perf dump' stray information (after mds restart):
"num_strays": 2327,
"num_strays_purging": 0,
"num_strays_delayed": 0,
"strays_created": 33,
"strays_purged": 34,

The data pool is a combination of EC pool and cache tier. I've evicted 
the cache pool resulting in 128 objects left (one per PG? hitset 
information?). After restarting the MDS the number of objects increases 
by 7 objects (the ones left in the data pool). So either the MDS rejoin 
process promotes them back to the cache, or some ceph-fuse instance 
insists on reading them.
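
For completeness, a hedged sketch of the eviction and recount steps described
above (the cache pool name cephfs_ec_cache is an assumption; only the data pool
is named in this thread):

rados -p cephfs_ec_cache cache-flush-evict-all    # flush and evict the cache tier
rados df | grep cephfs_ec                         # per-pool object counts
rados -p cephfs_ec_data ls | wc -l                # objects left in the data pool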



Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to improve 'rbd ls [pool]' response time

2015-10-08 Thread Wido den Hollander
On 10/08/2015 10:46 AM, wd_hw...@wistron.com wrote:
> Hi, all:
>   If the Ceph cluster health status is HEALTH_OK, the execution time of 'sudo 
> rbd ls rbd' is very short, like the following results.
> $ time sudo rbd ls rbd
> real    0m0.096s
> user    0m0.014s
> sys     0m0.028s
> 
>   But if there are several warnings (eg: 1 pgs degraded; 6 pgs incomplete; 
> 1650 pgs peering; 7 pgs stale;), the execution time of 'sudo rbd ls rbd' may 
> take a long time.
> 
> Is there any way to improve the response time of 'rbd' commands?

If you have incomplete PGs those can't serve any I/O. Same goes for PGs
which are in peering state.

That causes all RADOS operations to block. You'd have to resolve that
situation and that will make 'rbd ls' snappy again.
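
A quick, hedged way to see which PGs are holding things up (the pgid in the last
command is a placeholder):

ceph health detail             # lists the degraded / incomplete / peering / stale pgs
ceph pg dump_stuck inactive    # pgs that cannot serve I/O right now
ceph pg dump_stuck unclean
ceph pg 19.1f query            # drill into one stuck pg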

> Any help would be much appreciated.
> 
> Best Regards,
> WD Hwang
> 
> ---
> This email contains confidential or legally privileged information and is for 
> the sole use of its intended recipient. 
> Any unauthorized review, use, copying or distribution of this email or the 
> content of this email is strictly prohibited.
> If you are not the intended recipient, you may reply to the sender and should 
> delete this e-mail immediately.
> ---
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to improve 'rbd ls [pool]' response time

2015-10-08 Thread WD_Hwang
Hi Wido:
  According to your reply, if I add/remove OSDs from the Ceph cluster, I have to
wait until all PG movement is completed.
  Then the 'rbd ls' operation may work well again.
  Is there any way to speed up the PG activity when adding/removing OSDs?

  Thanks a lot.

Best Regards,
WD

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido 
den Hollander
Sent: Thursday, October 08, 2015 10:06 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How to improve 'rbd ls [pool]' response time

On 10/08/2015 10:46 AM, wd_hw...@wistron.com wrote:
> Hi, all:
>   If the Ceph cluster health status is HEALTH_OK, the execution time of 'sudo 
> rbd ls rbd' is very short, like the following results.
> $ time sudo rbd ls rbd
> real    0m0.096s
> user    0m0.014s
> sys     0m0.028s
> 
>   But if there are several warnings (eg: 1 pgs degraded; 6 pgs incomplete; 
> 1650 pgs peering; 7 pgs stale;), the execution time of 'sudo rbd ls rbd' may 
> take a long time.
> 
> Is there any way to improve the response time of 'rbd' commands?

If you have incomplete PGs those can't serve any I/O. Same goes for PGs which 
are in peering state.

That causes all RADOS operations to block. You'd have to resolve that situation 
and that will make 'rbd ls' snappy again.

> Any help would be much appreciated.
> 
> Best Regards,
> WD Hwang
> 
> --
> --
> --- This email contains confidential or legally 
> privileged information and is for the sole use of its intended recipient.
> Any unauthorized review, use, copying or distribution of this email or the 
> content of this email is strictly prohibited.
> If you are not the intended recipient, you may reply to the sender and should 
> delete this e-mail immediately.
> --
> --
> ---
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to improve 'rbd ls [pool]' response time

2015-10-08 Thread Wido den Hollander
On 10/08/2015 04:28 PM, wd_hw...@wistron.com wrote:
> Hi Wido:
>   According to your reply, if I add/remove OSDs from the Ceph cluster, I have to
> wait until all PG movement is completed.
>   Then the 'rbd ls' operation may work well again.
>   Is there any way to speed up the PG activity when adding/removing OSDs?
> 

It completely depends on your cluster. Usually more CPU power improves
the peering performance, but it's not a guarantee.

Usually a peering operation shouldn't take very long.

Wido

>   Thanks a lot.
> 
> Best Regards,
> WD
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido 
> den Hollander
> Sent: Thursday, October 08, 2015 10:06 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] How to improve 'rbd ls [pool]' response time
> 
> On 10/08/2015 10:46 AM, wd_hw...@wistron.com wrote:
>> Hi, all:
>>   If the Ceph cluster health status is HEALTH_OK, the execution time of 
>> 'sudo rbd ls rbd' is very short, like the following results.
>> $ time sudo rbd ls rbd
>> real    0m0.096s
>> user    0m0.014s
>> sys     0m0.028s
>>
>>   But if there are several warnings (eg: 1 pgs degraded; 6 pgs incomplete; 
>> 1650 pgs peering; 7 pgs stale;), the execution time of 'sudo rbd ls rbd' may 
>> take a long time.
>>
>> Is there any way to improve the response time of 'rbd' commands?
> 
> If you have incomplete PGs those can't serve any I/O. Same goes for PGs which 
> are in peering state.
> 
> That causes all RADOS operations to block. You'd have to resolve that 
> situation and that will make 'rbd ls' snappy again.
> 
>> Any help would be much appreciated.
>>
>> Best Regards,
>> WD Hwang
>>
>> --
>> --
>> --- This email contains confidential or legally 
>> privileged information and is for the sole use of its intended recipient.
>> Any unauthorized review, use, copying or distribution of this email or the 
>> content of this email is strictly prohibited.
>> If you are not the intended recipient, you may reply to the sender and 
>> should delete this e-mail immediately.
>> --
>> --
>> ---
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 
> 
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy error

2015-10-08 Thread Ken Dreyer
This issue with the conflicts between Firefly and EPEL is tracked at
http://tracker.ceph.com/issues/11104

On Sun, Aug 30, 2015 at 4:11 PM, pavana bhat
 wrote:
> In case someone else runs into the same issue in future:
>
> I got out of this issue by installing epel-release before installing
> ceph-deploy. If the order of installation is ceph-deploy followed by
> epel-release, the issue is hit.
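
For reference, a hedged sketch of that working order, reusing the host name and
release from the log below:

sudo yum -y install epel-release
sudo yum -y install ceph-deploy
ceph-deploy install --release firefly ceph-vm-mon1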
>
> Thanks,
> Pavana
>
> On Sat, Aug 29, 2015 at 10:02 AM, pavana bhat 
> wrote:
>>
>> Hi,
>>
>> I'm trying to install ceph for the first time following the quick
>> installation guide. I'm getting the below error, can someone please help?
>>
>> ceph-deploy install --release=firefly ceph-vm-mon1
>>
>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>> /home/cloud-user/.cephdeploy.conf
>>
>> [ceph_deploy.cli][INFO  ] Invoked (1.5.28): /usr/bin/ceph-deploy install
>> --release=firefly ceph-vm-mon1
>>
>> [ceph_deploy.cli][INFO  ] ceph-deploy options:
>>
>> [ceph_deploy.cli][INFO  ]  verbose   : False
>>
>> [ceph_deploy.cli][INFO  ]  testing   : None
>>
>> [ceph_deploy.cli][INFO  ]  cd_conf   :
>> 
>>
>> [ceph_deploy.cli][INFO  ]  cluster   : ceph
>>
>> [ceph_deploy.cli][INFO  ]  install_mds   : False
>>
>> [ceph_deploy.cli][INFO  ]  stable: None
>>
>> [ceph_deploy.cli][INFO  ]  default_release   : False
>>
>> [ceph_deploy.cli][INFO  ]  username  : None
>>
>> [ceph_deploy.cli][INFO  ]  adjust_repos  : True
>>
>> [ceph_deploy.cli][INFO  ]  func  : > install at 0x7f34b410e938>
>>
>> [ceph_deploy.cli][INFO  ]  install_all   : False
>>
>> [ceph_deploy.cli][INFO  ]  repo  : False
>>
>> [ceph_deploy.cli][INFO  ]  host  :
>> ['ceph-vm-mon1']
>>
>> [ceph_deploy.cli][INFO  ]  install_rgw   : False
>>
>> [ceph_deploy.cli][INFO  ]  repo_url  : None
>>
>> [ceph_deploy.cli][INFO  ]  ceph_conf : None
>>
>> [ceph_deploy.cli][INFO  ]  install_osd   : False
>>
>> [ceph_deploy.cli][INFO  ]  version_kind  : stable
>>
>> [ceph_deploy.cli][INFO  ]  install_common: False
>>
>> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
>>
>> [ceph_deploy.cli][INFO  ]  quiet : False
>>
>> [ceph_deploy.cli][INFO  ]  dev   : master
>>
>> [ceph_deploy.cli][INFO  ]  local_mirror  : None
>>
>> [ceph_deploy.cli][INFO  ]  release   : firefly
>>
>> [ceph_deploy.cli][INFO  ]  install_mon   : False
>>
>> [ceph_deploy.cli][INFO  ]  gpg_url   : None
>>
>> [ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster
>> ceph hosts ceph-vm-mon1
>>
>> [ceph_deploy.install][DEBUG ] Detecting platform for host ceph-vm-mon1 ...
>>
>> [ceph-vm-mon1][DEBUG ] connection detected need for sudo
>>
>> [ceph-vm-mon1][DEBUG ] connected to host: ceph-vm-mon1
>>
>> [ceph-vm-mon1][DEBUG ] detect platform information from remote host
>>
>> [ceph-vm-mon1][DEBUG ] detect machine type
>>
>> [ceph_deploy.install][INFO  ] Distro info: Red Hat Enterprise Linux Server
>> 7.1 Maipo
>>
>> [ceph-vm-mon1][INFO  ] installing Ceph on ceph-vm-mon1
>>
>> [ceph-vm-mon1][INFO  ] Running command: sudo yum clean all
>>
>> [ceph-vm-mon1][DEBUG ] Loaded plugins: fastestmirror, priorities
>>
>> [ceph-vm-mon1][DEBUG ] Cleaning repos: epel rhel-7-ha-rpms
>> rhel-7-optional-rpms rhel-7-server-rpms
>>
>> [ceph-vm-mon1][DEBUG ]   : rhel-7-supplemental-rpms
>>
>> [ceph-vm-mon1][DEBUG ] Cleaning up everything
>>
>> [ceph-vm-mon1][DEBUG ] Cleaning up list of fastest mirrors
>>
>> [ceph-vm-mon1][INFO  ] Running command: sudo yum -y install epel-release
>>
>> [ceph-vm-mon1][DEBUG ] Loaded plugins: fastestmirror, priorities
>>
>> [ceph-vm-mon1][DEBUG ] Determining fastest mirrors
>>
>> [ceph-vm-mon1][DEBUG ]  * epel: kdeforge2.unl.edu
>>
>> [ceph-vm-mon1][DEBUG ]  * rhel-7-ha-rpms:
>> rhel-repo.eu-biere-1.t-systems.cloud.cisco.com
>>
>> [ceph-vm-mon1][DEBUG ]  * rhel-7-optional-rpms:
>> rhel-repo.eu-biere-1.t-systems.cloud.cisco.com
>>
>> [ceph-vm-mon1][DEBUG ]  * rhel-7-server-rpms:
>> rhel-repo.eu-biere-1.t-systems.cloud.cisco.com
>>
>> [ceph-vm-mon1][DEBUG ]  * rhel-7-supplemental-rpms:
>> rhel-repo.eu-biere-1.t-systems.cloud.cisco.com
>>
>> [ceph-vm-mon1][DEBUG ] Package epel-release-7-5.noarch already installed
>> and latest version
>>
>> [ceph-vm-mon1][DEBUG ] Nothing to do
>>
>> [ceph-vm-mon1][INFO  ] Running command: sudo yum -y install
>> yum-plugin-priorities
>>
>> [ceph-vm-mon1][DEBUG ] Loaded plugins: fastestmirror, priorities
>>
>> [ceph-vm-mon1][DEBUG ] Loading mirror speeds from cached hostfile
>>
>> 

Re: [ceph-users] Annoying libust warning on ceph reload

2015-10-08 Thread Ken Dreyer
On Wed, Sep 30, 2015 at 7:46 PM, Goncalo Borges
 wrote:
> - Each time logrotate is executed, we received a daily notice with the
> message
>
> libust[8241/8241]: Warning: HOME environment variable not set. Disabling
> LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)

Thanks for this detailed report!

Would you mind filing a new bug in tracker.ceph.com for this? It would
be nice to fix this in Ceph or LTTNG without having to set the HOME
env var.

- Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS "corruption" -- Nulled bytes

2015-10-08 Thread Lincoln Bryant
Hi Sage,

Will this patch be in 0.94.4? We've got the same problem here.

-Lincoln

> On Oct 8, 2015, at 12:11 AM, Sage Weil  wrote:
> 
> On Wed, 7 Oct 2015, Adam Tygart wrote:
>> Does this patch fix files that have been corrupted in this manner?
> 
> Nope, it'll only prevent it from happening to new files (that haven't yet 
> been migrated between the cache and base tier).
> 
>> If not, or I guess even if it does, is there a way to walk the
>> metadata and data pools and find objects that are affected?
> 
> Hmm, this may actually do the trick: find a file that appears to be 
> zeroed, and truncate it up and then down again.  For example, if foo is 
> 100 bytes, do
> 
> truncate --size 101 foo
> truncate --size 100 foo
> 
> then unmount and remount the client and see if the content reappears.
> 
> Assuming that works (it did in my simple test) it'd be pretty easy to 
> write something that walks the tree and does the truncate trick for any 
> file whose first however many bytes are 0 (though it will mess up 
> mtime...).
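
A hedged sketch of that walk, under the stated caveats (test on a copy first,
it updates mtime, and the subtree path is a placeholder):

ROOT=/mnt/cephfs/subtree/to/check
find "$ROOT" -type f -size +0c | while read -r f; do
    nz=$(head -c 4096 "$f" | tr -d '\0' | wc -c)   # non-zero bytes in the first 4 KiB
    if [ "$nz" -eq 0 ]; then
        sz=$(stat -c %s "$f")
        echo "re-truncating $f ($sz bytes)"
        truncate --size $((sz + 1)) "$f"
        truncate --size "$sz" "$f"
    fi
done
# then unmount/remount the client and verify the content reappears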
> 
>> Is that '_' xattr in hammer? If so, how can I access it? Doing a
>> listxattr on the inode just lists 'parent', and doing the same on the
>> parent directory's inode simply lists 'parent'.
> 
> This is the file in /var/lib/ceph/osd/ceph-NNN/current.  For example,
> 
> $ attr -l ./3.0_head/100.__head_F0B56F30__3
> Attribute "cephos.spill_out" has a 2 byte value for 
> ./3.0_head/100.__head_F0B56F30__3
> Attribute "cephos.seq" has a 23 byte value for 
> ./3.0_head/100.__head_F0B56F30__3
> Attribute "ceph._" has a 250 byte value for 
> ./3.0_head/100.__head_F0B56F30__3
> Attribute "ceph._@1" has a 5 byte value for 
> ./3.0_head/100.__head_F0B56F30__3
> Attribute "ceph.snapset" has a 31 byte value for 
> ./3.0_head/100.__head_F0B56F30__3
> 
> ...but hopefully you won't need to touch any of that ;)
> 
> sage
> 
> 
>> 
>> Thanks for your time.
>> 
>> --
>> Adam
>> 
>> 
>> On Mon, Oct 5, 2015 at 9:36 AM, Sage Weil  wrote:
>>> On Mon, 5 Oct 2015, Adam Tygart wrote:
 Okay, this has happened several more times. Always seems to be a small
 file that should be read-only (perhaps simultaneously) on many
 different clients. It is just through the cephfs interface that the
 files are corrupted, the objects in the cachepool and erasure coded
 pool are still correct. I am beginning to doubt these files are
 getting a truncation request.
>>> 
>>> This is still consistent with the #12551 bug.  The object data is correct,
>>> but the cephfs truncation metadata on the object is wrong, causing it to
>>> be implicitly zeroed out on read.  It's easily triggered by writers who
>>> use O_TRUNC on open...
>>> 
 Twice now it has been different perl files: once someone's .bashrc,
 once an input file for another application; timestamps on the
 files indicate that the files haven't been modified in weeks.
 
 Any other possibilities? Or any way to figure out what happened?
>>> 
>>> You can confirm by extracting the '_' xattr on the object (append any @1
>>> etc fragments) and feeding it to ceph-dencoder with
>>> 
>>> ceph-dencoder type object_info_t import  decode 
>>> dump_json
>>> 
>>> and confirming that truncate_seq is 0, and verifying that the truncate_seq
>>> on the read request is non-zero.. you'd need to turn up the osd logs with
>>> debug ms = 1 and look for the osd_op that looks like "read 0~$length
>>> [$truncate_seq@$truncate_size]" (with real values in there).
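
As a hedged sketch of that check (paths and the object file name are taken from
the example earlier in the thread and are illustrative only):

cd /var/lib/ceph/osd/ceph-NNN/current/3.0_head
OBJ=100.__head_F0B56F30__3
attr -q -g ceph._ "$OBJ" > /tmp/oi.bin
attr -q -g ceph._@1 "$OBJ" >> /tmp/oi.bin 2>/dev/null || true
ceph-dencoder type object_info_t import /tmp/oi.bin decode dump_json | grep truncate_seq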
>>> 
>>> ...but it really sounds like you're hitting the bug.  Unfortunately
>>> the fix is not backported to hammer just yet.  You can follow
>>>http://tracker.ceph.com/issues/13034
>>> 
>>> sage
>>> 
>>> 
>>> 
 
 --
 Adam
 
 On Sun, Sep 27, 2015 at 10:44 PM, Adam Tygart  wrote:
> I've done some digging into cp and mv's semantics (from coreutils). If
> the inode is existing, the file will get truncated, then data will get
> copied in. This is definitely within the scope of the bug above.
> 
> --
> Adam
> 
> On Fri, Sep 25, 2015 at 8:08 PM, Adam Tygart  wrote:
>> It may have been. Although the timestamp on the file was almost a
>> month ago. The typical workflow for this particular file is to copy an
>> updated version overtop of it.
>> 
>> i.e. 'cp qss kstat'
>> 
>> I'm not sure if cp semantics would keep the same inode and simply
>> truncate/overwrite the contents, or if it would do an unlink and then
>> create a new file.
>> --
>> Adam
>> 
>> On Fri, Sep 25, 2015 at 8:00 PM, Ivo Jimenez  wrote:
>>> Looks like you might be experiencing this bug:
>>> 
>>>  http://tracker.ceph.com/issues/12551
>>> 
>>> Fix has been merged to master and I believe it'll be part of 
>>> infernalis. The

Re: [ceph-users] Annoying libust warning on ceph reload

2015-10-08 Thread Jason Dillaman
Somewhat related to this, I have a pending pull request to dynamically load 
LTTng-UST via your ceph.conf or via the admin socket [1].  While it won't solve 
this particular issue if you have manually enabled tracing, it will prevent 
these messages in the new default case where tracing isn't enabled.

[1] https://github.com/ceph/ceph/pull/6135

-- 

Jason Dillaman 


- Original Message -
> From: "Ken Dreyer" 
> To: "Goncalo Borges" 
> Cc: ceph-users@lists.ceph.com
> Sent: Thursday, October 8, 2015 11:58:27 AM
> Subject: Re: [ceph-users] Annoying libust warning on ceph reload
> 
> On Wed, Sep 30, 2015 at 7:46 PM, Goncalo Borges
>  wrote:
> > - Each time logrotate is executed, we received a daily notice with the
> > message
> >
> > ibust[8241/8241]: Warning: HOME environment variable not set. Disabling
> > LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
> 
> Thanks for this detailed report!
> 
> Would you mind filing a new bug in tracker.ceph.com for this? It would
> be nice to fix this in Ceph or LTTNG without having to set the HOME
> env var.
> 
> - Ken
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "stray" objects in empty cephfs data pool

2015-10-08 Thread John Spray
On Thu, Oct 8, 2015 at 7:23 PM, Gregory Farnum  wrote:
> On Thu, Oct 8, 2015 at 6:29 AM, Burkhard Linke
>  wrote:
>> Hammer 0.94.3 does not support a 'dump cache' mds command.
>> 'dump_ops_in_flight' does not list any pending operations. Is there any
>> other way to access the cache?
>
> "dumpcache", it looks like. You can get all the supported commands
> with "help" and look for things of interest or alternative phrasings.
> :)

To head off any confusion for someone trying to just replace dump
cache with dumpcache: "dump cache" is the new (post hammer,
apparently) admin socket command, dumpcache is the old tell command.
So it's "ceph mds tell  dumpcache ".

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS file to rados object mapping

2015-10-08 Thread Gregory Farnum
On Tue, Sep 29, 2015 at 7:24 AM, Andras Pataki
 wrote:
> Thanks, that makes a lot of sense.
> One more question about checksumming objects in rados.  Our cluster uses
> two copies per object, and I have some where the checkums mismatch between
> the two copies (that deep scrub warns about).  Does ceph store an
> authoritative checksum of what the block should look like?  I.e. Is there
> a way to tell which version of the block is correct?  I seem to recall
> some changelog entry that Hammer is adding checksum storage for blocks, or
> am I wrong?

There's no general authoritative checksumming yet. EC pools get
checksums, and replicated pools checksum the objects as part of deep
scrub. But maintaining a checksum when you do a partial overwrite
requires reading the whole object and updating the checksum, so we
don't do that.
NewStore is doing some stuff like this (but I'm not sure how much you
can count on it) and there is "opportunistic checksumming" in the code
base, but that's really just a developer feature rather than something
users should be running.

So that means there's no automated way to guarantee the right copy of
an object when scrubbing. If you have 3+ copies I'd recommend checking
each of them and picking the one that's duplicated...
-Greg
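
One rough way to do that comparison, once each OSD's on-disk copy of the
object has been pulled somewhere local (e.g. scp from each OSD's
current/<pgid>_head directory on a FileStore backend), is to hash every copy
and look for the majority value. A minimal Python sketch, with the copies
passed as command-line arguments:

import hashlib
import sys
from collections import Counter

# Usage: python pick_copy.py <copy from osd A> <copy from osd B> ...
# Each argument is one OSD's on-disk version of the same object, fetched locally.

def md5(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

digests = dict((p, md5(p)) for p in sys.argv[1:])
for path, digest in sorted(digests.items()):
    print("%s  %s" % (digest, path))

winner, votes = Counter(digests.values()).most_common(1)[0]
if votes > 1:
    print("majority checksum: %s (%d of %d copies agree)" % (winner, votes, len(digests)))
else:
    print("no two copies agree - inspect the contents by hand")

The copy carrying the majority checksum is the candidate to keep; the odd one
out can then be moved aside before repairing.
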

>
> Andras
>
>
> On 9/29/15, 9:58 AM, "Gregory Farnum"  wrote:
>
>>The formula for objects in a file is <inode>.<sequence>. So you'll have noticed they all look something like
>>12345.0001, 12345.0002, 12345.0003, ...
>>
>>So if you've got a particular inode and file size, you can generate a
>>list of all the possible objects in it. To find the object->OSD
>>mapping you'd need to run crush, by making use of the crushtool or
>>similar.
>>-Greg
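
To illustrate that forward mapping, a rough Python sketch follows. It assumes
the default 4 MB object size (adjust for custom layouts), uses placeholder
path and pool names, and calls 'ceph osd map' to run CRUSH for each object
rather than crushtool:

import os
import subprocess

path = "/mnt/cephfs/some/file"   # placeholder: a file on the mounted cephfs
pool = "data"                    # placeholder: the cephfs data pool
object_size = 4 * 1024 * 1024    # default layout; adjust for custom layouts

st = os.stat(path)
count = max(1, (st.st_size + object_size - 1) // object_size)

for seq in range(count):
    oid = "%x.%08x" % (st.st_ino, seq)   # <inode in hex>.<sequence>
    # 'ceph osd map' runs CRUSH for one object and prints its PG and OSD set
    out = subprocess.check_output(["ceph", "osd", "map", pool, oid])
    print(out.decode().strip())
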
>>
>>On Tue, Sep 29, 2015 at 6:29 AM, Andras Pataki
>> wrote:
>>> Thanks, that worked.  Is there a mapping in the other direction easily
>>> available, i.e. to find where all the 4MB pieces of a file are?
>>>
>>> On 9/28/15, 4:56 PM, "John Spray"  wrote:
>>>
On Mon, Sep 28, 2015 at 9:46 PM, Andras Pataki
 wrote:
> Hi,
>
> Is there a way to find out which rados objects a file in cephfs is mapped
> to from the command line?  Or vice versa, which file a particular rados
> object belongs to?

The part of the object name before the period is the inode number (in
hex).

John

> Our ceph cluster has some inconsistencies/corruptions and I am trying
>to
> find out which files are impacted in cephfs.
>
> Thanks,
>
> Andras
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Peering algorithm questions

2015-10-08 Thread Gregory Farnum
On Tue, Sep 29, 2015 at 12:08 AM, Balázs Kossovics  wrote:
> Hey!
>
> I'm trying to understand the peering algorithm based on [1] and [2]. There
> are things that aren't really clear or I'm not entirely sure if I understood
> them correctly, so I'd like to ask some clarification on the points below:
>
> 1, Is it right, that the primary writes the operations to the PG log
> immediately upon its reception?

The operation is written into the PG log as part of the same
transaction that logs the op itself. The primary ships off the
operations to the replicas concurrently with this happening (it
doesn't care about the ordering of those bits) so while it might
happen first on the primary, there's no particular guarantee of that.

>
> 2, Is it possible that an operation is persisted, but never acknowledged?
> Imagine this situation: a write arrives to an object, the operation is
> copied to and get written to the journal by the replicas, but the primary
> OSD dies and never recovers before it could acknowledge to the user. Upon
> the next peering, this operations will make part of the authoritative
> history?

Operations can be persisted without being acknowledged, yes. Persisted
operations that aren't acknowledged will *usually* end up as part of
the authoritative history, but it depends on which OSDs persisted it
and which are involved in peering as part of the next set.

>
> 3, Quote from the second step of the peering algorithm: "generate a list of
> past intervals since last epoch started"
> If there was no peering failure, than there is exactly one past interval?

Yes? I'm not quite clear on your question.

>
> 4, Quote from the same step: "the subset for which peering could have
> completed before the acting set changed to another set of OSDs".
> The other intervals are ignored, because we can be sure that no write
> operations were allowed during those?

I can't find these quotes and don't know which bit you're asking about.

>
> 5, In each moment, the Up set is either equals to, or a strict subset of the
> Acting set?

No. CRUSH calculates the set of OSDs responsible for a PG. That set
can include OSDs which are not currently running, so the filtered set
of OSDs which are both responsible for a PG and are currently up
(running, not dead, etc) is the "up set".
However, in some cases it's possible that Ceph has forcibly remapped
the PG to a different set of OSDs. This happens a lot when
rebalancing, for instance — if a PG moves from a,b,c to a,b,d it will
make a,b,c the "acting set" in order to maintain the requested 3
copies while OSD d gets backfilled.
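
One way to watch that distinction on a live cluster is to list the PGs whose
two sets differ; a small Python sketch, assuming the JSON from 'ceph pg dump'
carries 'pg_stats' entries with 'up' and 'acting' arrays:

import json
import subprocess

# List every PG whose acting set differs from its up set, i.e. PGs that are
# currently remapped (typically while backfill to a new OSD is in progress).
raw = subprocess.check_output(["ceph", "pg", "dump", "--format=json"])
dump = json.loads(raw.decode())

for pg in dump.get("pg_stats", []):
    if pg["up"] != pg["acting"]:
        print("%-10s up=%s acting=%s" % (pg["pgid"], pg["up"], pg["acting"]))
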

>
> 6, When does OSDs repeer? Only when an OSD goes from in -> out, or even if
> an OSD goes down (but not yet marked automatically out)?

OSDs go through the peering process whenever a member of one of their
PGs changes state (up, down, in, out, whichever). This is usually a
fast process if data doesn't actually have to move.

> 7, For what reasons can the peering fail? If the OSD map changes before the
> peering completes, then it's a failure? If the OSD map doesn't change, then
> a reason for failure is not being able to contact "at least one OSD from
> each of past interval's acting set"?

Peering only "fails" if the OSDs can't find enough members of prior
acting sets. OSD map changes won't cause failure, they just might
require peering to re-run a lot.

> 8, up_thru: is a per OSD value in the OSD map, which is updated for the
> primary after successfully agreeing on the authoritative history, but before
> completing the peering. What about the secondaries?

up_thru is an indicator that the PGs on this OSD might have been
written to. It's an optimization (albeit an important one) to keep
track of it (and allow later peering processes to skip any epoch which
doesn't have a high enough up_thru value) and requiring it of the
secondaries wouldn't really improve anything, since the primary OSD
doesn't necessarily require any individual one of them in order to go
active.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to setup Ceph radosgw to support multi-tenancy?

2015-10-08 Thread Christian Sarrasin
After discovering this excellent blog post [1], I thought that taking 
advantage of users' "default_placement" feature would be a preferable 
way to achieve my multi-tenancy requirements (see previous post).


Alas I seem to be hitting a snag. Any attempt to create a bucket with a 
user setup with a non-empty default_placement results in a 400 error 
thrown back to the client and the following msg in the radosgw logs:


"could not find placement rule placement-user2 within region"

(The pools exist, I reloaded the radosgw service and ran 'radosgw-admin 
regionmap update' as suggested in the blog post before running the 
client test)


Here's the setup.  What am I doing wrong?  Any insight is really 
appreciated!


radosgw-admin region get
{ "name": "default",
  "api_name": "",
  "is_master": "true",
  "endpoints": [],
  "master_zone": "",
  "zones": [
{ "name": "default",
  "endpoints": [],
  "log_meta": "false",
  "log_data": "false"}],
  "placement_targets": [
{ "name": "default-placement",
  "tags": []},
{ "name": "placement-user2",
  "tags": []}],
  "default_placement": "default-placement"}

radosgw-admin zone get default
{ "domain_root": ".rgw",
  "control_pool": ".rgw.control",
  "gc_pool": ".rgw.gc",
  "log_pool": ".log",
  "intent_log_pool": ".intent-log",
  "usage_log_pool": ".usage",
  "user_keys_pool": ".users",
  "user_email_pool": ".users.email",
  "user_swift_pool": ".users.swift",
  "user_uid_pool": ".users.uid",
  "system_key": { "access_key": "",
  "secret_key": ""},
  "placement_pools": [
{ "key": "default-placement",
  "val": { "index_pool": ".rgw.buckets.index",
  "data_pool": ".rgw.buckets",
  "data_extra_pool": ".rgw.buckets.extra"}},
{ "key": "placement-user2",
  "val": { "index_pool": ".rgw.index.user2",
  "data_pool": ".rgw.buckets.user2",
  "data_extra_pool": ".rgw.buckets.extra"}}]}

radosgw-admin user info --uid=user2
{ "user_id": "user2",
  "display_name": "User2",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [],
  "keys": [
{ "user": "user2",
  "access_key": "VYM2EEU1X5H6Y82D0K4F",
  "secret_key": "vEeJ9+yadvtqZrb2xoCAEuM2AlVyZ7UTArbfIEek"}],
  "swift_keys": [],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "placement-user2",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
  "max_size_kb": -1,
  "max_objects": -1},
  "user_quota": { "enabled": false,
  "max_size_kb": -1,
  "max_objects": -1},
  "temp_url_keys": []}

[1] http://cephnotes.ksperis.com/blog/2014/11/28/placement-pools-on-rados-gw

On 03/10/15 19:48, Christian Sarrasin wrote:

What are the best options to setup the Ceph radosgw so it supports
separate/independent "tenants"? What I'm after:

1. Ensure isolation between tenants, ie: no overlap/conflict in bucket
namespace; something separate radosgw "users" doesn't achieve
2. Ability to backup/restore tenants' pools individually

Referring to the docs [1], it seems this could possibly be achieved with
zones; one zone per tenant and leave out synchronization. Seems a little
heavy handed and presumably the overhead is non-negligible.

Is this "supported"? Is there a better way?

I'm running Firefly. I'm also rather new to Ceph so apologies if this is
already covered somewhere; kindly send pointers if so...

Cheers,
Christian

PS: cross-posted from [2]

[1] http://docs.ceph.com/docs/v0.80/radosgw/federated-config/
[2]
http://serverfault.com/questions/726491/how-to-setup-ceph-radosgw-to-support-multi-tenancy




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to setup Ceph radosgw to support multi-tenancy?

2015-10-08 Thread Yehuda Sadeh-Weinraub
On Thu, Oct 8, 2015 at 1:55 PM, Christian Sarrasin
 wrote:
> After discovering this excellent blog post [1], I thought that taking
> advantage of users' "default_placement" feature would be a preferable way to
> achieve my multi-tenancy requirements (see previous post).
>
> Alas I seem to be hitting a snag. Any attempt to create a bucket with a user
> setup with a non-empty default_placement results in a 400 error thrown back
> to the client and the following msg in the radosgw logs:
>
> "could not find placement rule placement-user2 within region"
>
> (The pools exist, I reloaded the radosgw service and ran 'radosgw-admin
> regionmap update' as suggested in the blog post before running the client
> test)
>
> Here's the setup.  What am I doing wrong?  Any insight is really
> appreciated!

Not sure. Did you run 'radosgw-admin regionmap update'?

>
> radosgw-admin region get
> { "name": "default",
>   "api_name": "",
>   "is_master": "true",
>   "endpoints": [],
>   "master_zone": "",
>   "zones": [
> { "name": "default",
>   "endpoints": [],
>   "log_meta": "false",
>   "log_data": "false"}],
>   "placement_targets": [
> { "name": "default-placement",
>   "tags": []},
> { "name": "placement-user2",
>   "tags": []}],
>   "default_placement": "default-placement"}
>
> radosgw-admin zone get default
> { "domain_root": ".rgw",
>   "control_pool": ".rgw.control",
>   "gc_pool": ".rgw.gc",
>   "log_pool": ".log",
>   "intent_log_pool": ".intent-log",
>   "usage_log_pool": ".usage",
>   "user_keys_pool": ".users",
>   "user_email_pool": ".users.email",
>   "user_swift_pool": ".users.swift",
>   "user_uid_pool": ".users.uid",
>   "system_key": { "access_key": "",
>   "secret_key": ""},
>   "placement_pools": [
> { "key": "default-placement",
>   "val": { "index_pool": ".rgw.buckets.index",
>   "data_pool": ".rgw.buckets",
>   "data_extra_pool": ".rgw.buckets.extra"}},
> { "key": "placement-user2",
>   "val": { "index_pool": ".rgw.index.user2",
>   "data_pool": ".rgw.buckets.user2",
>   "data_extra_pool": ".rgw.buckets.extra"}}]}
>
> radosgw-admin user info --uid=user2
> { "user_id": "user2",
>   "display_name": "User2",
>   "email": "",
>   "suspended": 0,
>   "max_buckets": 1000,
>   "auid": 0,
>   "subusers": [],
>   "keys": [
> { "user": "user2",
>   "access_key": "VYM2EEU1X5H6Y82D0K4F",
>   "secret_key": "vEeJ9+yadvtqZrb2xoCAEuM2AlVyZ7UTArbfIEek"}],
>   "swift_keys": [],
>   "caps": [],
>   "op_mask": "read, write, delete",
>   "default_placement": "placement-user2",
>   "placement_tags": [],
>   "bucket_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1},
>   "user_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1},
>   "temp_url_keys": []}
>
> [1] http://cephnotes.ksperis.com/blog/2014/11/28/placement-pools-on-rados-gw
>
>
> On 03/10/15 19:48, Christian Sarrasin wrote:
>>
>> What are the best options to setup the Ceph radosgw so it supports
>> separate/independent "tenants"? What I'm after:
>>
>> 1. Ensure isolation between tenants, ie: no overlap/conflict in bucket
>> namespace; something separate radosgw "users" doesn't achieve
>> 2. Ability to backup/restore tenants' pools individually
>>
>> Referring to the docs [1], it seems this could possibly be achieved with
>> zones; one zone per tenant and leave out synchronization. Seems a little
>> heavy handed and presumably the overhead is non-negligible.
>>
>> Is this "supported"? Is there a better way?
>>
>> I'm running Firefly. I'm also rather new to Ceph so apologies if this is
>> already covered somewhere; kindly send pointers if so...
>>
>> Cheers,
>> Christian
>>
>> PS: cross-posted from [2]
>>
>> [1] http://docs.ceph.com/docs/v0.80/radosgw/federated-config/
>> [2]
>>
>> http://serverfault.com/questions/726491/how-to-setup-ceph-radosgw-to-support-multi-tenancy
>>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD reaching file open limit - known issues?

2015-10-08 Thread Gregory Farnum
On Fri, Sep 25, 2015 at 10:04 AM, Jan Schermer  wrote:
> I get that, even though I think it should be handled more gracefully.
> But is it expected to also lead to consistency issues like this?

I don't think it's expected, but obviously we never reproduced it in
the lab. Given that dumpling is EOL and the huge number of changes in
all this code since then, I don't think you should expect anybody to
figure out why it broke, though. :/

>
> I think this is exactly what we're hitting right now
> http://tracker.ceph.com/issues/6101 except I have no idea why it also
> happens on a freshly backfilled OSD...

Yeah, probably the backfill went through a different path that didn't
fail but copied the bad data over. You could try using the patches on
that ticket to get things up and running again...
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to setup Ceph radosgw to support multi-tenancy?

2015-10-08 Thread Christian Sarrasin

Hi Yehuda,

Yes I did run "radosgw-admin regionmap update" and the regionmap appears 
to know about my custom placement_target.  Any other idea?


Thanks a lot
Christian

radosgw-admin region-map get
{ "regions": [
{ "key": "default",
  "val": { "name": "default",
  "api_name": "",
  "is_master": "true",
  "endpoints": [],
  "master_zone": "",
  "zones": [
{ "name": "default",
  "endpoints": [],
  "log_meta": "false",
  "log_data": "false"}],
  "placement_targets": [
{ "name": "default-placement",
  "tags": []},
{ "name": "placement-user2",
  "tags": []}],
  "default_placement": "default-placement"}}],
  "master_region": "default",
  "bucket_quota": { "enabled": false,
  "max_size_kb": -1,
  "max_objects": -1},
  "user_quota": { "enabled": false,
  "max_size_kb": -1,
  "max_objects": -1}}

On 08/10/15 23:02, Yehuda Sadeh-Weinraub wrote:


Here's the setup.  What am I doing wrong?  Any insight is really
appreciated!


Not sure. Did you run 'radosgw-admin regionmap update'?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to setup Ceph radosgw to support multi-tenancy?

2015-10-08 Thread Yehuda Sadeh-Weinraub
When you start radosgw, do you explicitly state the name of the region
that gateway belongs to?


On Thu, Oct 8, 2015 at 2:19 PM, Christian Sarrasin
 wrote:
> Hi Yehuda,
>
> Yes I did run "radosgw-admin regionmap update" and the regionmap appears to
> know about my custom placement_target.  Any other idea?
>
> Thanks a lot
> Christian
>
> radosgw-admin region-map get
> { "regions": [
> { "key": "default",
>   "val": { "name": "default",
>   "api_name": "",
>   "is_master": "true",
>   "endpoints": [],
>   "master_zone": "",
>   "zones": [
> { "name": "default",
>   "endpoints": [],
>   "log_meta": "false",
>   "log_data": "false"}],
>   "placement_targets": [
> { "name": "default-placement",
>   "tags": []},
> { "name": "placement-user2",
>   "tags": []}],
>   "default_placement": "default-placement"}}],
>   "master_region": "default",
>   "bucket_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1},
>   "user_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1}}
>
> On 08/10/15 23:02, Yehuda Sadeh-Weinraub wrote:
>
>>> Here's the setup.  What am I doing wrong?  Any insight is really
>>> appreciated!
>>
>>
>> Not sure. Did you run 'radosgw-admin regionmap update'?
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados python library missing functions

2015-10-08 Thread Rumen Telbizov
Sounds good. We'll try to work on this.

On Thu, Oct 8, 2015 at 5:06 PM, Gregory Farnum  wrote:

> On Thu, Oct 8, 2015 at 5:01 PM, Rumen Telbizov  wrote:
> > Hello everyone,
> >
> > I am very new to Ceph so, please excuse me if this has already been
> > discussed. I couldn't find anything on the web.
> >
> > We are interested in using Ceph and access it directly via its native
> rados
> > API with python. We noticed that certain functions that are available in
> the
> > C library aren't exposed/implemented in the Python module.
> >
> > For example: rados_write_op_cmpxattr or rados_write_op_omap_cmp.
> >
> > Is there any reason why they aren't implemented in python?
> >
> > Are there any particular problems that might stop us from using them from
> > python directly via the C library ourselves?
>
> I think this is just a result of the python bindings growing ad-hoc as
> people need particular functions. Patches welcome!
> -Greg
>



-- 
Rumen Telbizov
Unix Systems Administrator 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS file to rados object mapping

2015-10-08 Thread Gregory Farnum
On Thu, Oct 8, 2015 at 6:45 PM, Francois Lafont  wrote:
> Hi,
>
> On 08/10/2015 22:25, Gregory Farnum wrote:
>
>> So that means there's no automated way to guarantee the right copy of
>> an object when scrubbing. If you have 3+ copies I'd recommend checking
>> each of them and picking the one that's duplicated...
>
> It's curious because I have already tried with cephfs to "corrupt" a
> file in the OSD backend. I had a little text file in cephfs mapped to
> the object "$inode.$num" and this object was in the PG $pg_id, in the
> primary OSD $primary and in the secondary OSD $secondary (I had indeed
> size == 2). I thought that the primary OSD was always taken as reference
> by the "ceph pg repair" command, so I have tried this:
>
> # Test A
> echo "foo blabla..." 
> >/var/lib/ceph/osd/ceph-$primary/current/$pg_id_head/$inode.$num
> ceph pg repair $pg_id
>
> and the "repair" command worked correctly and my file was repaired
> correctly. I have tried to change the file in the secondary OSD too with:
>
> # Test B
> echo "foo blabla..." 
> >/var/lib/ceph/osd/ceph-$secondary/current/$pg_id_head/$inode.$num
> ceph pg repair $pg_id
>
> and it was the same, the file was repaired correctly too. In these 2
> cases, the good OSD was taken as reference (the secondary for the test
> A and the primary for the test B).
>
> So, in this case, how did ceph know which copy was the correct object?

The size of the on-disk file didn't match the OSD's record of the
object size, so it rejected it. This works for that kind of gross
change, but it won't catch stuff like a partial overwrite or loss of
data within a file.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados python library missing functions

2015-10-08 Thread Gregory Farnum
On Thu, Oct 8, 2015 at 5:01 PM, Rumen Telbizov  wrote:
> Hello everyone,
>
> I am very new to Ceph so, please excuse me if this has already been
> discussed. I couldn't find anything on the web.
>
> We are interested in using Ceph and access it directly via its native rados
> API with python. We noticed that certain functions that are available in the
> C library aren't exposed/implemented in the Python module.
>
> For example: rados_write_op_cmpxattr or rados_write_op_omap_cmp.
>
> Is there any reason why they aren't implemented in python?
>
> Are there any particular problems that might stop us from using them from
> python directly via the C library ourselves?

I think this is just a result of the python bindings growing ad-hoc as
people need particular functions. Patches welcome!
-Greg
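
In the meantime, nothing should stop you from driving the C write-op calls
through ctypes. A rough Python sketch under several assumptions: librados2 is
installed, the default ceph.conf and admin keyring are readable, and a
placeholder pool "rbd" holds an object "testobj" whose xattr "version"
currently equals "1":

import ctypes
import ctypes.util

rados = ctypes.CDLL(ctypes.util.find_library("rados"))
rados.rados_create_write_op.restype = ctypes.c_void_p   # pointer-sized handle

cluster = ctypes.c_void_p()
rados.rados_create(ctypes.byref(cluster), None)   # None -> connect as client.admin
rados.rados_conf_read_file(cluster, None)         # None -> default config search path
rados.rados_connect(cluster)

ioctx = ctypes.c_void_p()
rados.rados_ioctx_create(cluster, b"rbd", ctypes.byref(ioctx))

LIBRADOS_CMPXATTR_OP_EQ = 1   # value from include/rados/librados.h

op = ctypes.c_void_p(rados.rados_create_write_op())
# Fail the whole op with -ECANCELED unless xattr "version" == "1" ...
rados.rados_write_op_cmpxattr(op, b"version", LIBRADOS_CMPXATTR_OP_EQ, b"1", 1)
# ... otherwise replace the object body atomically in the same op.
rados.rados_write_op_write_full(op, b"new payload", len(b"new payload"))
ret = rados.rados_write_op_operate(op, ioctx, b"testobj", None, 0)
print("rados_write_op_operate returned %d" % ret)   # 0 on success

rados.rados_release_write_op(op)
rados.rados_ioctx_destroy(ioctx)
rados.rados_shutdown(cluster)

If calls like this turn out to be needed often, wrapping them in a small
helper (or sending a patch for the binding, as suggested above) keeps the
ctypes plumbing in one place.
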
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rados python library missing functions

2015-10-08 Thread Rumen Telbizov
Hello everyone,

I am very new to Ceph so, please excuse me if this has already been
discussed. I couldn't find anything on the web.

We are interested in using Ceph and access it directly via its native rados
API with python. We noticed that certain functions that are available in
the C library aren't exposed/implemented in the Python module.

For example: rados_write_op_cmpxattr or rados_write_op_omap_cmp.

Is there any reason why they aren't implemented in python?

Are there any particular problems that might stop us from using them from
python directly via the C library ourselves?

Thank you,
-- 
Rumen Telbizov
Unix Systems Administrator 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Potential OSD deadlock?

2015-10-08 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Sage,

After trying to bisect this issue (all tests moved the bisect towards
Infernalis) and eventually testing the Infernalis branch again, it
looks like the problem still exists although it is handled a tad
better in Infernalis. I'm going to test against Firefly/Giant next
week and then try and dive into the code to see if I can expose anything.

If I can do anything to provide you with information, please let me know.

Thanks,
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.2.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWF1QlCRDmVDuy+mK58QAAWLgP/2l+TkcpeKihDxF8h/kw
YFffNWODNfOMq8FVDQkQceo2mFCFc29JnBYiAeqW+XPelwuU5S86LG998aUB
BvIU4EHaJNJ31X1NCIA7nwi8rXlFYfSG2qQn58+IzqZoWCQM5vD/THISV1rP
qQKtoOAEuRxz+vOAJGI1A1xJSOiFwTRjs4LjE1zYjSP26LdEF61D/lb+AVzV
ufxi/ci6mAla/4VTAH4VqEviDgC8AbAZnWFGfUPcTUxJQS99kFrfjJnWvgyF
V9EmWtQCvhRO74hQLBqspOwdAxEJesPfGcJT1LjR0eEAMWvbGPtaqbSFAEWa
jjyy5wP9+4NnGLdhba6UBtLphjqTcl0e2vVwRj0zLhI14moAOlbhIKmZ1Dt+
1P6vfgOUGvO76xgDMwrVKRoQgWJO/0Tup9+oqInnNYgf4W+ZWsLgLgo7ETAF
VcI7LP1wkwAI3lz5YphY/TnKNGs6i+wVjKBamOt3R1yz9WeylaG0T6xgGHrs
VugrRSUuO+ND9+mE5EsUgITCZoaavXJESJMb30XkK6hYGB+T/q+hBafc6Wle
Jgs+aT2m1erdSyZn0ZC9a6CjWmwJXY6FCSGhE53BbefBxmCFxn+8tVav+Q8W
7s14TntP6ex4ca7eTwGuSXC9FU5fAVa+3+3aXDAC1QPAkeVkXyB716W1XG6b
BCFo
=GJL4
-END PGP SIGNATURE-

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Oct 7, 2015 at 1:25 PM, Robert LeBlanc  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> We forgot to upload the ceph.log yesterday. It is there now.
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Tue, Oct 6, 2015 at 5:40 PM, Robert LeBlanc  wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> I upped the debug on about everything and ran the test for about 40
>> minutes. I took OSD.19 on ceph1 down and then brought it back in.
>> There was at least one op on osd.19 that was blocked for over 1,000
>> seconds. Hopefully this will have something that will cast a light on
>> what is going on.
>>
>> We are going to upgrade this cluster to Infernalis tomorrow and rerun
>> the test to verify the results from the dev cluster. This cluster
>> matches the hardware of our production cluster but is not yet in
>> production so we can safely wipe it to downgrade back to Hammer.
>>
>> Logs are located at http://dev.v3trae.net/~jlavoy/ceph/logs/
>>
>> Let me know what else we can do to help.
>>
>> Thanks,
>> -BEGIN PGP SIGNATURE-
>> Version: Mailvelope v1.2.0
>> Comment: https://www.mailvelope.com
>>
>> wsFcBAEBCAAQBQJWFFwACRDmVDuy+mK58QAAs/UP/1L+y7DEfHqD/5OpkiNQ
>> xuEEDm7fNJK58tLRmKsCrDrsFUvWCjiqUwboPg/E40e2GN7Lt+VkhMUEUWoo
>> e3L20ig04c8Zu6fE/SXX3lnvayxsWTPcMnYI+HsmIV9E/efDLVLEf6T4fvXg
>> 5dKLiqQ8Apu+UMVfd1+aKKDdLdnYlgBCZcIV9AQe1GB8X2VJJhmNWh6TQ3Xr
>> gNXDexBdYjFBLu84FXOITd3ZtyUkgx/exCUMmwsJSc90jduzipS5hArvf7LN
>> HD6m1gBkZNbfWfc/4nzqOQnKdY1pd9jyoiQM70jn0R5b2BlZT0wLjiAJm+07
>> eCCQ99TZHFyeu1LyovakrYncXcnPtP5TfBFZW952FWQugupvxPCcaduz+GJV
>> OhPAJ9dv90qbbGCO+8kpTMAD1aHgt/7+0/hKZTg8WMHhua68SFCXmdGAmqje
>> IkIKswIAX4/uIoo5mK4TYB5HdEMJf9DzBFd+1RzzfRrrRalVkBfsu5ChFTx3
>> mu5LAMwKTslvILMxAct0JwnwkOX5Gd+OFvmBRdm16UpDaDTQT2DfykylcmJd
>> Cf9rPZxUv0ZHtZyTTyP2e6vgrc7UM/Ie5KonABxQ11mGtT8ysra3c9kMhYpw
>> D6hcAZGtdvpiBRXBC5gORfiFWFxwu5kQ+daUhgUIe/O/EWyeD0rirZoqlLnZ
>> EDrG
>> =BZVw
>> -END PGP SIGNATURE-
>> 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Tue, Oct 6, 2015 at 2:36 PM, Robert LeBlanc  wrote:
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA256
>>>
>>> On my second test (a much longer one), it took nearly an hour, but a
>>> few messages have popped up over a 20 window. Still far less than I
>>> have been seeing.
>>> - 
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>
>>>
>>> On Tue, Oct 6, 2015 at 2:00 PM, Robert LeBlanc  wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 I'll capture another set of logs. Is there any other debugging you
 want turned up? I've seen the same thing where I see the message
 dispatched to the secondary OSD, but the message just doesn't show up
 for 30+ seconds in the secondary OSD logs.
 - 
 Robert LeBlanc
 PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


 On Tue, Oct 6, 2015 at 1:34 PM, Sage Weil  wrote:
> On Tue, 6 Oct 2015, Robert LeBlanc wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> I can't think of anything. In my dev cluster the only thing that has
>> changed is the Ceph versions (no reboot). What I like is even though
>> the disks are 100% utilized, it is performing as I expect now. Client
>> I/O is slightly degraded during the recovery, but no 

Re: [ceph-users] "stray" objects in empty cephfs data pool

2015-10-08 Thread Gregory Farnum
On Thu, Oct 8, 2015 at 6:29 AM, Burkhard Linke
 wrote:
> Hammer 0.94.3 does not support a 'dump cache' mds command.
> 'dump_ops_in_flight' does not list any pending operations. Is there any
> other way to access the cache?

"dumpcache", it looks like. You can get all the supported commands
with "help" and look for things of interest or alternative phrasings.
:)

>
> 'perf dump' stray information (after mds restart):
> "num_strays": 2327,
> "num_strays_purging": 0,
> "num_strays_delayed": 0,
> "strays_created": 33,
> "strays_purged": 34,
>
> The data pool is a combination of EC pool and cache tier. I've evicted the
> cache pool resulting in 128 objects left (one per PG? hitset information?).

Yeah, probably. I don't remember the naming scheme, but it does keep
hitset objects. I don't think you should be able to list them via
rados but they probably show up in the aggregate stats.

-Greg

> After restarting the MDS the number of objects increases by 7 objects (the
> ones left in the data pool). So either the MDS rejoin process promotes them
> back to the cache, or some ceph-fuse instance insists on reading them.
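
To check the aggregate-stats point, the gap between the pool's reported
object count and what a listing returns gives a rough count of such internal
objects. A small Python sketch against the rados binding, with the conffile
and cache pool name as placeholders:

import rados

# Compare the pool's aggregate object count with what a plain listing returns;
# internal objects such as hit set archives may only show up in the former.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("cephfs_data_cache")

counted = ioctx.get_stats()["num_objects"]      # aggregate stats from the OSDs
listed = sum(1 for _ in ioctx.list_objects())   # objects a listing can see

print("pool stats report %d objects, listing returns %d" % (counted, listed))

ioctx.close()
cluster.shutdown()
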
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com