[ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-05 Thread Dennis Kramer (DT)

Hi,

I'm getting a bunch of "loaded dup inode" errors in the MDS logs.
How can this be fixed?

logs:
2018-07-05 10:20:05.591948 mds.mds05 [ERR] loaded dup inode 0x1991921 
[2,head] v160 at , but inode 0x1991921.head v146 already 
exists at 
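
For reference, the Luminous-era commands to inspect this kind of message are
the MDS damage table and a forward scrub (a sketch only; the daemon name is a
placeholder and the repair flag should be used with care):

  # list any metadata damage the MDS has already recorded
  ceph tell mds.<name> damage ls

  # forward scrub of the tree, optionally letting the MDS repair what it can
  ceph daemon mds.<name> scrub_path / recursive repair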






[ceph-users] CephFS MDS server stuck in "resolve" state

2018-06-27 Thread Dennis Kramer (DT)

Hi,

Currently I'm running Ceph Luminous 12.2.5.

This morning I tried running Multi MDS with:
ceph fs set  max_mds 2

I have 5 MDS servers. After running the above command,
I had 2 active MDSs, 2 standby-active and 1 standby.

After trying a failover on one of the active MDSs, a standby-active MDS did a
replay but went laggy or crashed. Memory and CPU usage went sky high on that
MDS and it became unresponsive after some time. I ended up with one active MDS
but got stuck with a degraded filesystem and warning messages about the MDS
being behind on trimming.


I haven't got any additional MDS active since then. I tried restarting the
last active MDS (because the filesystem was becoming unresponsive and had
a load of slow requests) and it never got past replay -> resolve. My MDS
cluster still isn't active... :(


What is the "resolve" state? I had never seen it pre-Luminous.
Debug level 20 doesn't give me much.

I also tried removing the multi-MDS setup, but my CephFS cluster won't go
active. How can I get my CephFS up and running again in an active state?
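
For reference, the usual Luminous sequence for dropping back to a single
active MDS looks roughly like this (a sketch; the filesystem name and rank
are placeholders):

  # allow only one active rank again
  ceph fs set <fs_name> max_mds 1

  # stop the now-surplus rank (Luminous-era syntax)
  ceph mds deactivate <fs_name>:1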


Please help.




Re: [ceph-users] cephfs/ceph-fuse: mds0: Client XXX:XXX failingtorespond to capability release

2016-09-14 Thread Dennis Kramer (DT)

Hi Burkhard,

Thank you for your reply, see inline:

On Wed, 14 Sep 2016, Burkhard Linke wrote:


Hi,


On 09/14/2016 12:43 PM, Dennis Kramer (DT) wrote:

Hi Goncalo,

Thank you. Yes, I have seen that thread, but I have no near-full OSDs and
my MDS cache size is pretty high.


You can use the daemon socket on the mds server to get an overview of the 
current cache state:


ceph daemon mds.XXX perf dump

The message itself indicates that the mds is in fact trying to convince 
clients to release capabilities, probably because it is running out of cache.
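
For example, a quick way to eyeball the relevant counters (a sketch; the
daemon name is a placeholder and counter names can differ between releases):

  ceph daemon mds.<name> perf dump | grep -E '"inodes|caps'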


My cache is set to mds_cache_size = 1500, but you are right, it seems the
complete cache is used. That shouldn't be a real problem, though, if the
clients can release the caps in time. Correct me if I'm wrong, but that
cache_size is pretty high compared to the default (100k). I will raise the
mds_cache_size a bit and see if it helps.


The 'session ls' command on the daemon socket lists all current ceph clients
and the number of capabilities for each client. Depending on your workload /
applications you might be surprised how many capabilities are assigned to 
individual nodes...
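
A rough way to rank clients by capability count (a sketch; assumes jq is
available and that the session entries expose a num_caps field, which may
vary between releases):

  ceph daemon mds.<name> session ls | \
    jq -r '.[] | "\(.id) \(.num_caps)"' | sort -k2 -rn | head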


From the client's point of view the error means that there's either a bug in
the client, or an application is keeping a large number of files open (e.g. do
you run mlocate on the clients?)

I haven't had this issue when I was on Hammer and the number of clients
hasn't changed. I have "ceph fuse.ceph fuse.ceph-fuse" in my PRUNEFS for
updatedb, so it probably isn't mlocate causing this issue.
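
For reference, that setting lives in /etc/updatedb.conf and looks roughly
like this (a sketch; the rest of the filesystem list will differ per distro):

  # keep mlocate/updatedb from crawling network and FUSE mounts
  PRUNEFS="NFS nfs nfs4 afs ceph fuse.ceph fuse.ceph-fuse"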

The only real difference is my upgrade to Jewel.


If you use the kernel-based client, re-mounting won't help, since the internal
state is kept the same (afaik). In case of the ceph-fuse client, the ugly way
to get rid of the mount point is a lazy / forced umount, killing the
ceph-fuse process if necessary. Processes with open file handles will
complain afterwards.
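
Such a forced teardown of a ceph-fuse mount would look something like this
(a sketch; the mount point is a placeholder):

  # lazy unmount so new accesses fail immediately
  fusermount -uz /mnt/cephfs    # or: umount -l /mnt/cephfs

  # kill the ceph-fuse process if it is still hanging around
  pkill -f 'ceph-fuse.*/mnt/cephfs'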



Before using rude ways to terminate the client session I would propose looking
for rogue applications on the involved host. We had a number of problems
with multithreaded applications and concurrent file access in the past (both
with ceph-fuse from Hammer and kernel-based clients). lsof or other tools
might help to locate the application.
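
For example (a sketch; /mnt/cephfs is a placeholder mount point):

  # list every process with files open on the CephFS mount
  lsof /mnt/cephfs

  # or limit it to one directory tree (can be slow on large trees)
  lsof +D /mnt/cephfs/some/dir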


My cluster is back to HEALTH_OK; the involved host has been restarted by
the user. But I will debug some more on the host when I see this issue
again.


PS: For completeness: I stated that this issue was often seen in my
current Jewel environment; I meant to say that it comes up only
sometimes. But the times when I *do* have this issue, it blocks some
client I/O as a consequence.



Regards,
Burkhard


Re: [ceph-users] cephfs/ceph-fuse: mds0: Client XXX:XXX failing to respond to capability release

2016-09-14 Thread Dennis Kramer (DT)

Hi Goncalo,

Thank you. Yes, I have seen that thread, but I have no near-full OSDs and
my MDS cache size is pretty high.


On Wed, 14 Sep 2016, Goncalo Borges wrote:


Hi Dennis
Have you checked

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007207.html ?

The issue there was some near full osd blocking IO.

Cheers
G.


From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Dennis Kramer 
(DBS) [den...@holmes.nl]
Sent: 14 September 2016 17:44
To: ceph-users@lists.ceph.com
Subject: [ceph-users] cephfs/ceph-fuse: mds0: Client XXX:XXX failing to respond 
to capability release

Hi All,

Running Ubuntu 16.04 with Ceph Jewel, version 10.2.2
(45107e21c568dd033c2f0a3107dec8f0b0e58374)

In our environment we are running CephFS and our clients connect
through ceph-fuse. Since I upgraded from Hammer to Jewel I have been
haunted by ceph-fuse segfaults, which were resolved by using the
patch from https://github.com/ceph/ceph/pull/10027

But lately I'm often getting the error:
"mds0: Client XXX:XXX failing to respond to capability release"

These clients are always (patched) ceph-fuse clients and are generating
the following MDS log error:

2016-09-14 08:06:36.953168 7f07df309700  0 log_channel(cluster) log
[WRN] : client.196596070 isn't responding to mclientcaps(revoke), ino
10001d28611 pending pAsLsXsFscr issued pAsLsXsFsxcrwb, sent 30723.302011
seconds ago
2016-09-14 08:29:42.262518 7f07df309700  0 log_channel(cluster) log
[WRN] : client.196596070 isn't responding to mclientcaps(revoke), ino
10001d28613 pending pAsxLsXsxFcb issued pAsxLsXsxFcwb, sent 62.097011
seconds ago
2016-09-14 08:30:42.263593 7f07df309700  0 log_channel(cluster) log
[WRN] : client.196596070 isn't responding to mclientcaps(revoke), ino
10001d28613 pending pAsxLsXsxFcb issued pAsxLsXsxFcwb, sent 122.098144
seconds ago
2016-09-14 08:32:42.283509 7f07df309700  0 log_channel(cluster) log
[WRN] : client.196596070 isn't responding to mclientcaps(revoke), ino
10001d28613 pending pAsxLsXsxFcb issued pAsxLsXsxFcwb, sent 242.118043
seconds ago
2016-09-14 09:37:32.347160 7f07df309700  0 log_channel(cluster) log
[INF] : closing stale session client.196596070 10.5.5.83:0/4117427533
after 304.829698

It seems that these clients also trigger blocked requests.
Everything is running Jewel code and I have no old clients anymore.

My only solution is to restart the client (or forcibly remount).
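
Such a forced remount of a ceph-fuse client typically looks like this (a
sketch; client id, monitor address and mount point are placeholders):

  fusermount -uz /mnt/cephfs
  pkill -f 'ceph-fuse.*/mnt/cephfs'
  ceph-fuse --id myclient -m mon1.example.com:6789 /mnt/cephfs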


[ceph-users] ceph-fuse "Transport endpoint is not connected" on Jewel 10.2.2

2016-08-24 Thread Dennis Kramer (DT)

Hi all,

Running ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) on 
Ubuntu 16.04LTS.


Currently I have the weirdest thing: I have a bunch of Linux clients,
mostly Debian-based (Ubuntu/Mint). They all use version 10.2.2 of
ceph-fuse. I've been running CephFS since Hammer without any issues, but
I upgraded last week to Jewel and now my clients get:

"Transport endpoint is not connected".

It seems the error only arises when the clients use a GUI to browse
through the ceph-fuse mount; some use Nemo, some Nautilus. The error
doesn't show up immediately; sometimes the client can browse through
the share for some time before being kicked out with the error.


But when I strictly use the shell to browse the ceph-fuse mount it works
without any issues; when I try to use a GUI file browser on the same
client, the error shows up and I get kicked out of the ceph-fuse mount until
I remount.
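
To capture more detail the next time it happens, client-side debug logging
can be turned up for ceph-fuse (a sketch; the log path and debug levels are
just examples):

  # /etc/ceph/ceph.conf on the affected client
  [client]
      log file = /var/log/ceph/ceph-fuse.$pid.log
      debug client = 20
      debug ms = 1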


Any suggestions?

With regards,




Re: [ceph-users] CephFS Path restriction

2015-12-08 Thread Dennis Kramer (DT)

Ah, that explains a lot. Thank you.
Yes, it was a bit confusing which version it applied to.

Awesome addition by the way, I like the path parameter!

Cheers.

On 12/08/2015 03:15 PM, John Spray wrote:
> On Tue, Dec 8, 2015 at 1:43 PM, Dennis Kramer (DT)
>  wrote:
> 
> 
> Hi,
> 
> I'm trying to restrict clients to mount a specific path in CephFS. 
> I've been using the official doc for this: 
> http://docs.ceph.com/docs/master/cephfs/client-auth/
> 
> After setting these cap restrictions, the client can still mount
> and use all directories in CephFS. Am I missing something?
> 
>> You're looking at the master docs -- this functionality is newer
>> than Hammer.  It'll be in the Jewel release.
> 
>> I should have noted that on the page, because people do tend to
>> end up finding master docs no matter what version they're using.
> 
>> John
> 
> 
> I'm using the Hammer release version 0.94.5 
> (9764da52395923e0b32908d83a9f7304401fee43)
> 



[ceph-users] CephFS Path restriction

2015-12-08 Thread Dennis Kramer (DT)



Hi,

I'm trying to restrict clients to mount a specific path in CephFS.
I've been using the official doc for this:
http://docs.ceph.com/docs/master/cephfs/client-auth/

After setting these cap restrictions, the client can still mount and
use all directories in CephFS. Am I missing something?

I'm using the Hammer release version 0.94.5
(9764da52395923e0b32908d83a9f7304401fee43)
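
For reference, the Jewel-style caps that this restriction relies on look
roughly like this (a sketch; client name, path and pool are placeholders --
on Hammer the MDS simply does not enforce the path part):

  ceph auth get-or-create client.foo \
    mon 'allow r' \
    mds 'allow r, allow rw path=/foo' \
    osd 'allow rw pool=cephfs_data'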


Re: [ceph-users] State of nfs-ganesha CEPH fsal

2015-12-01 Thread Dennis Kramer (DT)

Hi,

I've been testing out the options below, but I still have the same problem
that files are not visible on different clients. After a "touch" of a
new file (or directory) all files are visible again. It definitely
looks like a directory cache problem.

Client mount options like "noac" or "actimeo=0" solved it for some,
but after a while the clients ran into the same problem again. I'm a
bit at a loss here, so hopefully someone can shed some more light on
this annoying problem.
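
For reference, the client-side mount with attribute caching disabled looks
something like this (a sketch; server name and paths are placeholders):

  mount -t nfs4 -o noac,actimeo=0 nfs-server.example.com:/DATA /mnt/data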

It seems that when I restart the NFS server, the problem disappears for a
while. After a week or so, the problem resurfaces.

I've used the following options for the NFS-Ganesha config:
NFSv4
{
DomainName = "<>";
IdmapConf = "/etc/idmapd.conf";
}
NFS_KRB5
{
Active_krb5 = false;
}

NFS_DupReq_Hash
{
Index_Size = 17 ;
Alphabet_Length = 10 ;
}

NFSv4_ClientId_Cache
{
Index_Size = 17 ;
Alphabet_Length = 10 ;
}

CEPH
{
}

CacheInode_Client
{
Entry_Prealloc_PoolSize = 1000 ;
Attr_Expiration_Time = Immediate ;
Symlink_Expiration_Time = Immediate ;
Directory_Expiration_Time = Immediate ;
Use_Test_Access = 1 ;
}

CacheInode
{
Attr_Expiration_Time = 0 ;
Use_Getattr_Directory_Invalidation = true;
}

EXPORT_DEFAULTS
{
Disable_ACL = FALSE;
SecType = "sys";
Protocols = "4";
Transports = "TCP";
Manage_Gids = TRUE;
}

EXPORT
{
Export_ID=1;
FSAL {
Name = Ceph;
}
Path = "/DATA/SHARE";
Pseudo = "/DATA";
Tag = "DATA";
CLIENT {
Clients = 172.17.0.0/16;
Access_Type = RW;
Squash = Root;
}
}


With regards,


On 10/28/2015 05:37 PM, Lincoln Bryant wrote:
> Hi Dennis,
> 
> We're using NFS Ganesha here as well. I can send you my
> configuration which is working but we squash users and groups down
> to a particular uid/gid, so it may not be super helpful for you.
> 
> I think files not being immediately visible is working as intended,
> due to directory caching. I _believe_ what you need to do is set
> the following (comments shamelessly stolen from the Gluster FSAL): 
> # If thuis flag is set to yes, a getattr is performed each time a
> readdir is done # if mtime do not match, the directory is renewed.
> This will make the cache more # synchronous to the FSAL, but will
> strongly decrease the directory cache performance 
> Use_Getattr_Directory_Invalidation = true;
> 
> Hope that helps.
> 
> Thanks, Lincoln
> 
>> On Oct 28, 2015, at 9:08 AM, Dennis Kramer (DT)
>>  wrote:
>> 
> Sorry for raising this topic from the dead, but i'm having the
> same issues with NFS-GANESHA /w the wrong user/group information.
> 
> Do you maybe have a working ganesha.conf? I'm assuming I might 
> mis-configured something in this file. It's also nice to have some 
> reference config file from a working FSAL CEPH, the sample config
> is very minimalistic.
> 
> I also have another issue with files that are not immediately
> visible in a NFS folder after another system (using the same NFS)
> has created it. There seems to be a slight delay before all system
> have the same directory listing. This can be enforced by creating a
> *new* file in this directory which will cause a refresh on this
> folder. Changing directories also helps on affected system(s).
> 
> On 07/28/2015 11:30 AM, Haomai Wang wrote:
>>>> On Tue, Jul 28, 2015 at 5:28 PM, Burkhard Linke 
>>>>  wrote:
>>>>> Hi,
>>>>> 
>>>>> On 07/28/2015 11:08 AM, Haomai Wang wrote:
>>>>>> 
>>>>>> On Tue, Jul 28, 2015 at 4:47 PM, Gregory Farnum 
>>>>>>  wrote:
>>>>>>> 
>>>>>>> On Tue, Jul 28, 2015 at 8:01 AM, Burkhard Linke 
>>>>>>> 
>>>>>>> wrote:
>>>>> 
>>>>> 
>>>>> *snipsnap*
>>>>>>>> 
>>>>>>>> Can you give some details on that issues? I'm
>>>>>>>> currently looking for a way to provide NFS based
>>>>>>>> access to CephFS to our desktop machines.
>>>>>>> 
>>>>>>> Ummm...sadly I can't; we don't appear to have any
>>>>>>> tracker tickets and I'm not sure where the report went
>>>>>>> to. :( I think it was from Haomai...
>>>>>> 
>>>>>> My fault, I should report this to 

Re: [ceph-users] State of nfs-ganesha CEPH fsal

2015-10-28 Thread Dennis Kramer (DT)

Sorry for raising this topic from the dead, but I'm having the same
issues with NFS-Ganesha w/ the wrong user/group information.

Do you maybe have a working ganesha.conf? I'm assuming I might have
mis-configured something in this file. It would also be nice to have a
reference config file for a working CEPH FSAL; the sample config is
very minimalistic.

I also have another issue with files that are not immediately visible
in an NFS folder after another system (using the same NFS) has created
them. There seems to be a slight delay before all systems have the same
directory listing. A refresh can be forced by creating a *new* file in
the directory. Changing directories also helps on the affected system(s).

On 07/28/2015 11:30 AM, Haomai Wang wrote:
> On Tue, Jul 28, 2015 at 5:28 PM, Burkhard Linke 
>  wrote:
>> Hi,
>> 
>> On 07/28/2015 11:08 AM, Haomai Wang wrote:
>>> 
>>> On Tue, Jul 28, 2015 at 4:47 PM, Gregory Farnum
>>>  wrote:
 
 On Tue, Jul 28, 2015 at 8:01 AM, Burkhard Linke 
  wrote:
>> 
>> 
>> *snipsnap*
> 
> Can you give some details on that issues? I'm currently
> looking for a way to provide NFS based access to CephFS to
> our desktop machines.
 
 Ummm...sadly I can't; we don't appear to have any tracker
 tickets and I'm not sure where the report went to. :( I think
 it was from Haomai...
>>> 
>>> My fault, I should report this to ticket.
>>> 
>>> I have forgotten the details about the problem, I submit the
>>> infos to IRC :-(
>>> 
>>> It related to the "ls" output. It will print the wrong
>>> user/group owner as "-1", maybe related to root squash?
>> 
>> Are you sure this problem is related to the CephFS FSAL? I also
>> had a hard time setting up ganesha correctly, especially with
>> respect to user and group mappings, especially with a kerberized
>> setup.
>> 
>> I'm currently running a small test setup with one server and one
>> client to single out the last kerberos related problems
>> (nfs-ganesha 2.2.0 / Ceph Hammer 0.94.2 / Ubuntu 14.04).
>> User/group listings have been OK so far. Do you remember whether
>> the problem occurs every time or just arbitrarily?
>> 
> 
> Great!
> 
> I'm not sure the reason. I guess it may related to nfs-ganesha
> version or client distro version.
> 
>> Best regards, Burkhard 
> 
> 
> 


Re: [ceph-users] Introducing "Learning Ceph" : The First ever Book on Ceph

2015-02-13 Thread Dennis Kramer (DT)


Awesome!
Just bought the paper back copy. The sample looked very good. Thanks!

Grt,

On Fri, 6 Feb 2015, Karan Singh wrote:


Hello Community Members

I am happy to introduce the first book on Ceph with the title "Learning Ceph".

Me and many folks from the publishing house together with technical reviewers 
spent several months to get this book compiled and published.

Finally the book is up for sale on , I hope you will like it and surely will
learn a lot from it.

Amazon :  
http://www.amazon.com/Learning-Ceph-Karan-Singh/dp/1783985623/ref=sr_1_1?s=books&ie=UTF8&qid=1423174441&sr=1-1&keywords=ceph
 

Packtpub : https://www.packtpub.com/application-development/learning-ceph 


You can grab the sample copy from here :  
https://www.dropbox.com/s/ek76r01r9prs6pb/Learning_Ceph_Packt.pdf?dl=0 


Finally , I would like to express my sincere thanks to

Sage Weil - For developing Ceph and everything around it, as well as writing
the foreword for "Learning Ceph".
Patrick McGarry - For his usual off the track support that too always.

Last but not least, to our great community members, who are also
reviewers of the book: Don Talton, Julien Recurt, Sebastien Han and Zihong
Chen. Thank you guys for your efforts.



Karan Singh
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/ 







Re: [ceph-users] Status of SAMBA VFS

2015-02-13 Thread Dennis Kramer (DT)



On Fri, 6 Feb 2015, Gregory Farnum wrote:


On Fri, Feb 6, 2015 at 7:11 AM, Dennis Kramer (DT)  wrote:


On Fri, 6 Feb 2015, Gregory Farnum wrote:


On Fri, Feb 6, 2015 at 6:39 AM, Dennis Kramer (DT) wrote:


I've used the upstream module for our production cephfs cluster, but i've
noticed a bug where timestamps aren't being updated correctly. Modified
files are being reset to the beginning of Unix time.

It looks like this bug only manifest itself in applications like MS
Office
where extra metadata is added to files. If I for example modify a text
file
in notepad everything is working fine, but when I modify a docx (or .xls
for
that matter), the timestamp is getting a reset to 1-1-1970.
You can imagine that this could be a real dealbreaker for production use
(think of backups/rsyncs based on mtimes which will render useless).

Further more the return values for free/total disk space is also not
working
correctly when you mount a share in Windows. My 340TB cluster had 7.3EB
storage available in Windows ;) This could be fixed with a workaround by
using a custom "dfree command =" script in the smb.conf, but VFS will
override this and thus this script will not work (unless you remove the
lines of codes for these disk operations in vfs_ceph.c).

My experience with the VFS module is pretty awesome nonetheless. I really
noticed an improvement in throughput when using this module instead of an
re-export with the kernel client. So I hope the VFS module will be
maintained actively again any time soon.



Can you file bugs for these? The timestamp one isn't anything I've
heard of before.

http://tracker.ceph.com/issues/10834



The weird free space on Windows actually does sound familiar; I think
it has to do with either Windows or the Samba/Windows interface not
handling our odd block sizes properly...

This one has been fixed.
http://tracker.ceph.com/issues/10835



Re: [ceph-users] Cache pressure fail

2015-02-11 Thread Dennis Kramer (DT)

On Wed, 11 Feb 2015, Wido den Hollander wrote:


On 11-02-15 12:57, Dennis Kramer (DT) wrote:

On Fri, 7 Nov 2014, Gregory Farnum wrote:


Did you upgrade your clients along with the MDS? This warning indicates the
MDS asked the clients to boot some inodes out of cache and they have taken
too long to do so.
It might also just mean that you're actively using more inodes at any given
time than your MDS is configured to keep in memory.
-Greg

How can one verify this? I'm getting the same warnings. I'm curious how
I can check if there are indeed more inodes actively used than my MDS
can keep in memory.



I think that using the admin socket you can query the MDS for how much
Inodes are cached.

$ ceph daemon mds.X help

I don't know the exact syntax from the top of my head, but it should be
something you can fetch there.

And iirc it also prints this ones every X seconds in the MDS log file.

Wido


After setting the debug level to 2, I can see:
2015-02-11 13:36:31.922262 7f0b38294700  2 mds.0.cache check_memory_usage 
total 58516068, rss 57508660, heap 32676, malloc 1227560 mmap 0, baseline 
39848, buffers 0, max 67108864, 8656261 / 931 inodes have caps, 
10367318 caps, 1.03674 caps per inode


It doesn't look like it has serious memory problems, unless my
interpretation of the output is wrong.
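
One way to put those numbers next to the configured limit (a sketch; the
daemon name is a placeholder and counter names can vary between releases):

  # configured inode cache limit
  ceph daemon mds.<name> config get mds_cache_size

  # current cached inode / cap counters
  ceph daemon mds.<name> perf dump | grep -E '"inodes|caps'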


It looks like I have the same symptoms as:
http://tracker.ceph.com/issues/10151

I'm running 0.87 on all my nodes.


Thanks.


On Fri, Nov 7, 2014 at 5:17 AM, Daniel Takatori Ohara wrote:


Hi,

In my cluster, when I execute the command ceph health detail, it shows me the
message.

mds0: Many clients (17) failing to respond to cache
pressure(client_count:
)

This message appeared when I upgraded Ceph to 0.87 from 0.80.7.

Anyone help me?

Thank's,

Att.

---
Daniel Takatori Ohara.
System Administrator - Lab. of Bioinformatics
Molecular Oncology Center
Instituto Sírio-Libanês de Ensino e Pesquisa
Hospital Sírio-Libanês
Phone: +55 11 3155-0200 (extension 1927)
R: Cel. Nicolau dos Santos, 69
São Paulo-SP. 01308-060
http://www.bioinfo.mochsl.org.br



Re: [ceph-users] Cache pressure fail

2015-02-11 Thread Dennis Kramer (DT)

On Fri, 7 Nov 2014, Gregory Farnum wrote:


Did you upgrade your clients along with the MDS? This warning indicates the
MDS asked the clients to boot some inodes out of cache and they have taken
too long to do so.
It might also just mean that you're actively using more inodes at any given
time than your MDS is configured to keep in memory.
-Greg
How can one verify this? I'm getting the same warnings. I'm curious how I 
can check if there are indeed more inodes actively used than my MDS can 
keep in memory.


Thanks.


On Fri, Nov 7, 2014 at 5:17 AM, Daniel Takatori Ohara wrote:


Hi,

In my cluster, when I execute the command ceph health detail, it shows me the
message.

mds0: Many clients (17) failing to respond to cache pressure(client_count:
)

This message appeared when I upgraded Ceph to 0.87 from 0.80.7.

Anyone help me?

Thank's,

Att.

---
Daniel Takatori Ohara.
System Administrator - Lab. of Bioinformatics
Molecular Oncology Center
Instituto Sírio-Libanês de Ensino e Pesquisa
Hospital Sírio-Libanês
Phone: +55 11 3155-0200 (extension 1927)
R: Cel. Nicolau dos Santos, 69
São Paulo-SP. 01308-060
http://www.bioinfo.mochsl.org.br






Kramer M.D.
Infrastructure Engineer


Nederlands Forensisch Instituut
Digitale Technologie & Biometrie
Laan van Ypenburg 6 | 2497 GB | Den Haag
Postbus 24044 | 2490 AA | Den Haag

T 070 888 66 46
M 06 29 62 12 02
d.kra...@nfi.minvenj.nl / den...@holmes.nl
PGP publickey: http://www.holmes.nl/dennis.asc
www.forensischinstituut.nl

Nederlands Forensisch Instituut. In feiten het beste.



Re: [ceph-users] Status of SAMBA VFS

2015-02-06 Thread Dennis Kramer (DT)


On Fri, 6 Feb 2015, Gregory Farnum wrote:


On Fri, Feb 6, 2015 at 6:39 AM, Dennis Kramer (DT)  wrote:

I've used the upstream module for our production cephfs cluster, but i've
noticed a bug where timestamps aren't being updated correctly. Modified
files are being reset to the beginning of Unix time.

It looks like this bug only manifest itself in applications like MS Office
where extra metadata is added to files. If I for example modify a text file
in notepad everything is working fine, but when I modify a docx (or .xls for
that matter), the timestamp is getting a reset to 1-1-1970.
You can imagine that this could be a real dealbreaker for production use
(think of backups/rsyncs based on mtimes which will render useless).

Further more the return values for free/total disk space is also not working
correctly when you mount a share in Windows. My 340TB cluster had 7.3EB
storage available in Windows ;) This could be fixed with a workaround by
using a custom "dfree command =" script in the smb.conf, but VFS will
override this and thus this script will not work (unless you remove the
lines of codes for these disk operations in vfs_ceph.c).

My experience with the VFS module is pretty awesome nonetheless. I really
noticed an improvement in throughput when using this module instead of an
re-export with the kernel client. So I hope the VFS module will be
maintained actively again any time soon.


Can you file bugs for these? The timestamp one isn't anything I've
heard of before.
The weird free space on Windows actually does sound familiar; I think
it has to do with either Windows or the Samba/Windows interface not
handling our odd block sizes properly...
-Greg



Sure, just point me in the right direction for these bug reports.

It's true BTW, IIRC Windows defaults to a 1024k block size for calculating
the free/total space, but this could be managed by the VFS module.
Windows only expects two mandatory values, available and total space in
bytes, and optionally the block size as a third value.



Re: [ceph-users] Status of SAMBA VFS

2015-02-06 Thread Dennis Kramer (DT)
I've used the upstream module for our production CephFS cluster, but I've
noticed a bug where timestamps aren't being updated correctly.
Modified files are being reset to the beginning of Unix time.


It looks like this bug only manifests itself in applications like MS Office,
where extra metadata is added to files. If I for example modify a text
file in Notepad everything works fine, but when I modify a .docx (or
.xls for that matter), the timestamp gets reset to 1-1-1970.
You can imagine that this could be a real dealbreaker for production use
(think of backups/rsyncs based on mtimes, which would be rendered useless).


Furthermore, the return values for free/total disk space are also not
working correctly when you mount a share in Windows. My 340TB cluster had
7.3EB of storage available in Windows ;) This could be fixed with a
workaround by using a custom "dfree command =" script in smb.conf,
but the VFS will override this and thus the script will not work (unless you
remove the lines of code for these disk operations in vfs_ceph.c).
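
The dfree workaround mentioned above would look roughly like this (a sketch;
paths are placeholders, and as noted the VFS module may still override it):

  # in smb.conf (share or global section):
  #   dfree command = /usr/local/bin/cephfs-dfree

  #!/bin/sh
  # /usr/local/bin/cephfs-dfree: print "total available" in 1024-byte blocks
  df -Pk /mnt/cephfs | awk 'NR==2 {print $2" "$4}'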


My experience with the VFS module is pretty awesome nonetheless. I really
noticed an improvement in throughput when using this module instead of a
re-export with the kernel client. So I hope the VFS module will be
actively maintained again any time soon.



On Fri, 6 Feb 2015, Sage Weil wrote:


On Fri, 6 Feb 2015, Dennis Kramer (DT) wrote:

Hi,

Is the Samba VFS module for CephFS actively maintained at this moment?
I haven't seen much updates in the ceph/samba git repo.


You should really ignore the ceph/samba fork; it isn't used.  The Ceph VFS
driver is upstream in Samba and maintained there.

That said, it isn't being actively developed at the moment, but I'm hoping
to change that shortly!  We do some basic nightly testing in the ceph lab
but I'd be very interested in hearing about users' experiences.

Thanks!
sage






[ceph-users] Status of SAMBA VFS

2015-02-06 Thread Dennis Kramer (DT)

Hi,

Is the Samba VFS module for CephFS actively maintained at this moment?
I haven't seen much updates in the ceph/samba git repo.

With regards,



Re: [ceph-users] [Nova] [RBD] Copy-on-write cloning for RBD-backed disks

2014-07-16 Thread Dennis Kramer (DT)

Hi Dmitry,

I've been using Ubuntu 14.04LTS + Icehouse w/ Ceph as a storage
backend for Glance, Cinder and Nova (KVM/libvirt). I *really* would
love to see this patch series land in Juno. It's been a real performance
issue because of the unnecessary re-copy from and to Ceph when using
the default "boot from image" option. It seems that your fix would
be the solution to all of this. IMHO this is one of the most important
features when using Ceph RBD as a backend for OpenStack Nova.

Can you point me in the right direction on how to apply this patch series
on a default Ubuntu 14.04LTS + Icehouse installation? I'm using
the default Ubuntu packages since Icehouse lives in core, and I'm not
sure how to apply the patch series. I would love to test and review it.
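
For context, the RBD settings this patch series builds on are roughly the
following (a sketch based on the Icehouse Ceph/RBD docs; pool names, user
and secret UUID are placeholders, and exact option names/sections vary
between releases -- this is not the patch itself):

  # glance-api.conf -- expose direct image URLs so they can be cloned
  show_image_direct_url = True

  # nova.conf, [libvirt] section -- RBD-backed ephemeral disks
  images_type = rbd
  images_rbd_pool = vms
  images_rbd_ceph_conf = /etc/ceph/ceph.conf
  rbd_user = cinder
  rbd_secret_uuid = <libvirt-secret-uuid>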

With regards,

Dennis

On 07/16/2014 11:18 PM, Dmitry Borodaenko wrote:
> I've got a bit of good news and bad news about the state of
> landing the rbd-ephemeral-clone patch series for Nova in Juno.
> 
> The good news is that the first patch in the series 
> (https://review.openstack.org/91722 fixing a data loss inducing
> bug with live migrations of instances with RBD backed ephemeral
> drives) was merged yesterday.
> 
> The bad news is that after 2 months of sitting in review queue and 
> only getting its first a +1 from a core reviewer on the spec
> approval freeze day, the spec for the blueprint
> rbd-clone-image-handler (https://review.openstack.org/91486) wasn't
> approved in time. Because of that, today the blueprint was rejected
> along with the rest of the commits in the series, even though the
> code itself was reviewed and approved a number of times.
> 
> Our last chance to avoid putting this work on hold for yet another 
> OpenStack release cycle is to petition for a spec freeze exception
> in the next Nova team meeting: 
> https://wiki.openstack.org/wiki/Meetings/Nova
> 
> If you're using Ceph RBD as backend for ephemeral disks in Nova
> and are interested this patch series, please speak up. Since the
> biggest concern raised about this spec so far has been lack of CI
> coverage, please let us know if you're already using this patch
> series with Juno, Icehouse, or Havana.
> 
> I've put together an etherpad with a summary of where things are
> with this patch series and how we got here: 
> https://etherpad.openstack.org/p/nova-ephemeral-rbd-clone-status
> 
> Previous thread about this patch series on ceph-users ML: 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-March/028097.html
>


Re: [ceph-users] PERC H710 raid card

2014-07-16 Thread Dennis Kramer (DT)

Hi,

What do you recommend in case of a disk failure in this kind of
configuration? Are you bringing down the host when you replace the
disk and re-create the RAID-0 for the replaced disk? I reckon that
Linux doesn't automatically pick up the disk replacement either...
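
The replacement can usually be handled online with the LSI tooling, without
rebooting the host (a sketch of the MegaCli-style flow; enclosure/slot and
adapter numbers are placeholders and differ per system):

  # find the replaced drive (state should show up as Unconfigured(good))
  MegaCli64 -PDList -aALL | grep -E 'Slot|Firmware state'

  # create a new single-disk RAID-0 on enclosure 32, slot 5, adapter 0
  MegaCli64 -CfgLdAdd -r0 [32:5] -a0

  # rescan so Linux picks up the new block device
  echo "- - -" > /sys/class/scsi_host/host0/scan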

Dennis

On 07/16/2014 11:02 PM, Shain Miley wrote:
> Robert, We use those cards here in our Dell R-720 servers.
> 
> We just ended up creating a bunch of single disk RAID-0 units,
> since there was no jbod option available.
> 
> Shain
> 
> 
> On 07/16/2014 04:55 PM, Robert Fantini wrote:
>> I've 2 dell systems with PERC H710 raid cards. Those are very
>> good end cards , but do not support jbod .
>> 
>> They support raid 0, 1, 5, 6, 10, 50, 60 .
>> 
>> lspci shows them as:  LSI Logic / Symbios Logic MegaRAID SAS 2208
>>  [Thunderbolt] (rev 05)
>> 
>> The firmware Dell uses on the card does not support jbod.
>> 
>> My question is how can this be best used for Ceph? Or should it
>> not be used?
>> 
>> 
>> 
>> 
>> 
> 
> -- Shain Miley | Manager of Systems and Infrastructure, Digital
> Media | smi...@npr.org | 202.513.3649
> 

