Re: [ceph-users] how to run rados command by non-root user in ceph node

2014-11-24 Thread Huynh Dac Nguyen

Hi Michael,  

It works now, after running "chmod +r /etc/ceph/*".
It was a silly mistake; I thought I had checked that before.

Thank you Michael. 

Regards,  
Ndhuynh 

>>> Michael Kuriger  11/25/2014 1:21 AM >>>

The non root user needs to be able to read the key file. 
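
For reference, a slightly tighter alternative to a blanket "chmod +r" is to
give the non-root user its own keyring and point the client at it explicitly.
This is only a sketch; the user name (oneadmin) and keyring path are taken
from this thread rather than any standard layout:

    # export the key created earlier into a keyring the oneadmin user can read
    ceph auth get client.oneadmin -o /etc/ceph/ceph.client.oneadmin.keyring
    chown oneadmin /etc/ceph/ceph.client.oneadmin.keyring
    chmod 600 /etc/ceph/ceph.client.oneadmin.keyring

    # run rados with that identity instead of client.admin
    rados df --id oneadmin --keyring /etc/ceph/ceph.client.oneadmin.keyring -p one

That way /etc/ceph/ceph.client.admin.keyring can stay readable by root only.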

Michael Kuriger  

mk7...@yp.com 

818-649-7235 

MikeKuriger (IM) 



From: Huynh Dac Nguyen 
Date: Wednesday, November 19, 2014 at 8:44 PM
To: "ceph-users@lists.ceph.com" 
Subject: [ceph-users] how to run rados command by non-root user in ceph
node



Hi All,  


After setting up Ceph, I can run the ceph and rados commands as the root user.



How can I run them as a non-root account?


I have already created a key:
[root@ho-srv-ceph-02 ~]# ceph auth list 
client.oneadmin 
key: AQBQY21UaLLhCBAAKjsM8qRxFpJA4ppbA7Rn9A== 
caps: [mon] allow r 
caps: [osd] allow rw pool=one 


[oneadmin@ho-srv-ceph-02 ~]$ rados df 
2014-11-20 11:43:44.143434 7f53ac04a760 -1 monclient(hunting): ERROR:
missing keyring, cannot use cephx for authentication 
2014-11-20 11:43:44.143611 7f53ac04a760  0 librados: client.admin
initialization error (2) No such file or directory 
couldn't connect to cluster! error -2 



Regards, 
Ndhuynh 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph fs has error: no valid command found; 10 closest matches: fsid

2014-11-24 Thread Huynh Dac Nguyen

Hi Chris,  


I see.
I'm running version 0.80.7.
How do we know which part of the documentation applies to our version? As you
see, there is only one Ceph document here, which is confusing.
Could you point me to the documentation for Ceph version 0.80.7?


Regards,  
Ndhuynh 


>>> Christopher Armstrong  11/25/2014 2:57 AM >>>

You're likely not running a release which includes this command - it
was only introduced in v0.84. If you're running giant, it should work.
If not, you won't have it (and won't need to create a filesystem - it
will exist by default). 



Chris Armstrong
Head of Services & Community
OpDemand / Deis.io 
GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/ 



On Thu, Nov 20, 2014 at 8:43 PM, Huynh Dac Nguyen
 wrote:



Hi All,  


When i create a new cephfs, this error shows 


Is it a bug?  


[root@ho-srv-ceph-02 ceph]# ceph fs new cephfs cephfs_metadata
cephfs_data 
no valid command found; 10 closest matches: 
fsid 
Error EINVAL: invalid command 



Regards,  
Ndhuynh 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph fs has error: no valid command found; 10 closest matches: fsid

2014-11-24 Thread Huynh Dac Nguyen

Hi JC,  

Thank you, JC. 

# ceph osd lspools 
0 data,1 metadata,2 rbd,3 one,4 cephfs_data,5 cephfs_metadata, 

# ceph mds newfs 4 5 --yes-i-really-mean-it 
new fs with metadata pool 4 and data pool 5 

- So is the documentation wrong, then?

- When "ceph mds newfs 4 5" is run without "--yes-i-really-mean-it",
this warning appears: "Error EPERM: this is DANGEROUS and will wipe out the
mdsmap's fs, and may clobber data in the new pools you specify.  add
--yes-i-really-mean-it if you do". Is it really dangerous? Can we lose
data here?

- How can we check the newfs information? (One way is sketched below.)
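
One way to check it (a sketch; the pool IDs 4 and 5 come from the lspools
output above, and the commands below should be available on 0.80.x):

    # map pool names to the IDs that newfs expects
    ceph osd lspools

    # after running newfs, confirm which pools the MDS map points at
    ceph mds dump | grep -E 'metadata_pool|data_pools'
    ceph mds stat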


Regards,  
Ndhuynh

>>> Jean-Charles LOPEZ  11/25/2014 1:58 AM >>>
Hi, 


Use ceph mds newfs {metaid} {dataid} instead 



JC 









On Nov 20, 2014, at 20:43, Huynh Dac Nguyen 
wrote: 



Hi All,  



When i create a new cephfs, this error shows 



Is it a bug?  



[root@ho-srv-ceph-02 ceph]# ceph fs new cephfs cephfs_metadata
cephfs_data 

no valid command found; 10 closest matches: 

fsid 

Error EINVAL: invalid command 




Regards,  

Ndhuynh 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fiemap bug on giant

2014-11-24 Thread Haomai Wang
It's surprising that the test machines run on very new kernels but still
hit this problem:

plana 47 is 12.04.5 with kernel 3.18.0-rc6-ceph-00024-geb0e5fd

plana 50 is 12.04.4 with kernel 3.17.0-rc6-ceph-2-ge8acad6

Which local filesystem are they running on?

On Tue, Nov 25, 2014 at 5:03 AM, Samuel Just  wrote:
> Bug #10166 (http://tracker.ceph.com/issues/10166) can cause recovery
> to result in incorrect object sizes on giant if the setting 'filestore
> fiemap' is set to true.  This setting is disabled by default.  This
> should be fixed in a future point release, though filestore fiemap
> will probably continue to default to false.
> -Sam
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Negative number of objects degraded for extended period of time

2014-11-24 Thread Craig Lewis
To disable RadosGW GC, you could bump rgw_gc_obj_min_wait to something
really big.   If you set it to a week, you should have a week with no GC.
When you return it to normal, it should just need a couple passes,
depending on how much stuff you delete while GC is stalled.

injectargs doesn't appear to work with radosgw (at least, I can't figure
out how to do it), so you'll have to edit ceph.conf and restart all of your
radosgw daemons.
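
Something along these lines, as a sketch only -- the client section name
depends on how your gateway instances are named in ceph.conf, and 604800 is
one week expressed in seconds:

    [client.radosgw.gateway]           # assumed instance name; match your own
        rgw gc obj min wait = 604800   # default is 7200 (2 hours)

Then restart each radosgw daemon so the new value takes effect, and revert it
the same way afterwards.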


I think that you should have completed some of those backfills by now.  The
OSDs that are currently backfilling, are they doing a lot of IO?  If
they're doing practically nothing, I'd restart those ceph-osd daemons.

You have a large number of PGs for the number of OSDs you have, over 1000
each.  An excessive number of PGs can cause memory pressure on the osd
daemons.  You're not having any performance problems while this is going on?



On Mon, Nov 24, 2014 at 8:36 AM, Fred Yang  wrote:

> Well, after another 2 weeks, the backfilling still going, although it did
> drop(or increase?) to -0.284%. If I could count from -0.925% to -0.284% is
> 75% complete, probably one more week to go:
>
> 2014-11-24 11:11:56.445007 mon.0 [INF] pgmap v6805197: 44816 pgs: 44713
> active+clean, 1 active+backfilling, 20 active+remapped+wait_backfill, 27
> active+remapped+wait_backfill+backfill_toofull, 11 active+recovery_wait, 33
> active+remapped+backfilling, 11 active+wait_backfill+backfill_toofull; 2308
> GB data, 4664 GB used, 15445 GB / 20109 GB avail; 96114 B/s rd, 122 MB/s
> wr, 360 op/s; -5419/1906450 objects degraded (-0.284%)
>
> Yes, I can leave it run since it's not production environment. If this
> indeed is production environment, I would have to answer the question
> quicker regarding what's the cause, and, how do I tune the pace to let the
> cluster back to healthy state faster rather than just cross the finger and
> let it run.
>
> I suspected it might be caused by the default garbage collection process
> are too low to handle large amount of pending object deletion:
>
>   "rgw_gc_max_objs": "32",
>   "rgw_gc_obj_min_wait": "7200",
>   "rgw_gc_processor_max_time": "3600",
>   "rgw_gc_processor_period": "3600",
>
> However, after increasing rgw_gc_max_objs to 1024, I'm actually seeing
> object degraded go from -0.284% to 0.301%. Which seems like this is
> actually garbage collector contention between multiple radosgw servers.
>
> I have trouble to find out document regarding how radosgw garbage
> collection works and how to disable garbage collector for some of the
> radosgw to prove that's the issue.
>
> Yehuda Sadeh mentioned back in 2012 that ""we may also want to explore
> doing that as part of a bigger garbage collection scheme that we'll soon be
> working on." in below thread:
> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/7927
>
> I'm hoping he can give some insight to this..
>
> Fred
>
> On Mon, Nov 17, 2014 at 5:36 PM, Craig Lewis 
> wrote:
>
>> Well, after 4 days, this is probably moot.  Hopefully it's finished
>> backfilling, and your problem is gone.
>>
>> If not, I believe that if you fix those backfill_toofull, the negative
>> numbers will start approaching zero.  I seem to recall that negative
>> degraded is a special case of degraded, but I don't remember exactly, and
>> can't find any references.  I have seen it before, and it went away when my
>> cluster became healthy.
>>
>> As long as you still have OSDs completing their backfilling, I'd let it
>> run.
>>
>> If you get to the point that all of the backfills are done, and you're
>> left with only wait_backfill+backfill_toofull, then you can bump
>> osd_backfill_full_ratio, mon_osd_nearfull_ratio, and maybe 
>> osd_failsafe_nearfull_ratio.
>>  If you do, be careful, and only bump them just enough to let them start
>> backfilling.  If you set them to 0.99, bad things will happen.
>>
>>
>>
>>
>> On Thu, Nov 13, 2014 at 7:57 AM, Fred Yang 
>> wrote:
>>
>>> Hi,
>>>
>>> The Ceph cluster we are running have few OSDs approaching to 95% 1+
>>> weeks ago so I ran a reweight to balance it out, in the meantime,
>>> instructing application to purge data not required. But after large amount
>>> of data purge issued from application side(all OSDs' usage dropped below
>>> 20%), the cluster fall into this weird state for days, the "objects
>>> degraded" remain negative for more than 7 days, I'm seeing some IOs going
>>> on on OSDs consistently, but the number(negative) objects degraded does not
>>> change much:
>>>
>>> 2014-11-13 10:43:07.237292 mon.0 [INF] pgmap v5935301: 44816 pgs: 44713
>>> active+clean, 1 active+backfilling, 20 active+remapped+wait_backfill, 27
>>> active+remapped+wait_backfill+backfill_toofull, 11 active+recovery_wait, 33
>>> active+remapped+backfilling, 11 active+wait_backfill+backfill_toofull; 1473
>>> GB data, 2985 GB used, 17123 GB / 20109 GB avail; 30172 kB/s wr, 58 op/s;
>>> -13582/1468299 objects degraded (-0.925%)
>>> 2014-11-13 10:43:08.248232 mon.0 [INF] pgmap v5935302: 44816 pgs:

Re: [ceph-users] Optimal or recommended threads values

2014-11-24 Thread Mark Nelson
Don't forget number of cores in the node.  Basically you want enough 
threads to keep all of the cores busy while not having so many that you 
end up with a ton of context switching overhead.  Also as you said 
there are a lot of other factors that may have an effect, like the number 
of AGs (assuming XFS), the scheduler, the HBA, etc.  What I found a while back 
was that increasing the OSD op thread count to ~8 helped reads in some 
cases on the node I was testing, but could hurt write performance if 
increased too high.  Increasing the other thread counts didn't make 
enough of a difference to be able to discern if they helped or hurt.


It may be different now though with all of the improvements that have 
gone in to Giant.


Mark

On 11/24/2014 06:33 PM, Craig Lewis wrote:

Tuning these values depends on a lot more than just the SSDs and HDDs.
Which kernel and IO scheduler are you using?  Does your HBA do write
caching?  It also depends on what your goals are.  Tuning for a RadosGW
cluster is different than for an RBD cluster.  The short answer is that
you are the only person who can tell you what your optimal values
are.  As always, the best benchmark is production load.


In my small cluster (5 nodes, 44 osds), I'm optimizing to minimize
latency during recovery.  When the cluster is healthy, bandwidth and
latency are more than adequate for my needs.  Even with journals on
SSDs, I've found that reducing the number of operations and threads has
reduced my average latency.

I use injectargs to try out new values while I monitor cluster latency.
I monitor latency while the cluster is healthy and recovering.  If a
change is deemed better, only then will I persist the change to
ceph.conf.  This gives me a fallback: any change that causes
massive problems can be undone with a restart or reboot.


So far, the configs that I've written to ceph.conf are
[global]
   mon osd down out interval = 900
   mon osd min down reporters = 9
   mon osd min down reports = 12
   osd pool default flag hashpspool = true

[osd]
   osd max backfills = 1
   osd recovery max active = 1
   osd recovery op priority = 1


I have it on my list to investigate filestore max sync interval.  And
now that I've pasted that, I need to revisit the min down
reports/reporters.  I have some nodes with 10 OSDs, and I don't want any
one node able to mark the rest of the cluster as down (it happened once).




On Sat, Nov 22, 2014 at 6:24 AM, Andrei Mikhailovsky  wrote:

Hello guys,

Could some one comment on the optimal or recommended values of
various threads values in ceph.conf?

At the moment I have the following settings:

filestore_op_threads = 8
osd_disk_threads = 8
osd_op_threads = 8
filestore_merge_threshold = 40
filestore_split_multiple = 8

Are these reasonable for a small cluster made of 7.2K SAS disks with
ssd journals with a ratio of 4:1?

What are the settings that other people are using?

Thanks

Andrei



___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Optimal or recommended threads values

2014-11-24 Thread Craig Lewis
Tuning these values depends on a lot more than just the SSDs and HDDs.
Which kernel and IO scheduler are you using?  Does your HBA do write
caching?  It also depends on what your goals are.  Tuning for a RadosGW
cluster is different than for an RBD cluster.  The short answer is that you
are the only person who can tell you what your optimal values are.  As
always, the best benchmark is production load.


In my small cluster (5 nodes, 44 osds), I'm optimizing to minimize latency
during recovery.  When the cluster is healthy, bandwidth and latency are
more than adequate for my needs.  Even with journals on SSDs, I've found
that reducing the number of operations and threads has reduced my average
latency.

I use injectargs to try out new values while I monitor cluster latency.  I
monitor latency while the cluster is healthy and recovering.  If a change
is deemed better, only then will I persist the change to ceph.conf.  This
gives me a fallback: any change that causes massive problems can be
undone with a restart or reboot.
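
As a sketch of that workflow (the option and value here are only an
illustration):

    # try a value at runtime across all OSDs; it does not survive a restart
    ceph tell osd.\* injectargs '--osd-max-backfills 1'

    # only once it has proven itself, persist it under [osd] in ceph.conf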


So far, the configs that I've written to ceph.conf are
[global]
  mon osd down out interval = 900
  mon osd min down reporters = 9
  mon osd min down reports = 12
  osd pool default flag hashpspool = true

[osd]
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery op priority = 1


I have it on my list to investigate filestore max sync interval.  And now
that I've pasted that, I need to revisit the min down reports/reporters.  I
have some nodes with 10 OSDs, and I don't want any one node to be able to mark
the rest of the cluster as down (it happened once).




On Sat, Nov 22, 2014 at 6:24 AM, Andrei Mikhailovsky 
wrote:

> Hello guys,
>
> Could some one comment on the optimal or recommended values of various
> threads values in ceph.conf?
>
> At the moment I have the following settings:
>
> filestore_op_threads = 8
> osd_disk_threads = 8
> osd_op_threads = 8
> filestore_merge_threshold = 40
> filestore_split_multiple = 8
>
> Are these reasonable for a small cluster made of 7.2K SAS disks with ssd
> journals with a ratio of 4:1?
>
> What are the settings that other people are using?
>
> Thanks
>
> Andrei
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw agent only syncing metadata

2014-11-24 Thread Mark Kirkwood

On 25/11/14 11:58, Yehuda Sadeh wrote:

On Mon, Nov 24, 2014 at 2:43 PM, Mark Kirkwood
 wrote:

On 22/11/14 10:54, Yehuda Sadeh wrote:


On Thu, Nov 20, 2014 at 6:52 PM, Mark Kirkwood
 wrote:




Fri Nov 21 02:13:31 2014

x-amz-copy-source:bucketbig/_multipart_big.dat.2/fjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta
/bucketbig/__multipart_big.dat.2%2Ffjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta
2014-11-21 15:13:31.914925 7fb5e3f87700 15 generated auth header: AWS
us-west key:tk7RgBQMD92je2Nz1m2D/GV+VNM=
2014-11-21 15:13:31.914964 7fb5e3f87700 20 sending request to

http://ceph2:80/bucketbig/__multipart_big.dat.2%2Ffjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta?rgwx-uid=us-west&rgwx-region=us&rgwx-prepend-metadata=us
2014-11-21 15:13:31.920510 7fb5e3f87700 10 receive_http_header
2014-11-21 15:13:31.920525 7fb5e3f87700 10 received header:HTTP/1.1 411
Length Required



It looks like you're running the wrong fastcgi module.

Yehuda



Thanks Yehuda - so what would be the right fastcgi? Do you mean
http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-precise-x86_64-basic/ref/master/



This one should work, yeah.



Looks like that was the issue:

$ rados df|grep bucket
.us-east.rgw.buckets -  93740   24 
  00   0   3493746  216 
93740
.us-east.rgw.buckets.index -  01 
00   0   24   25 
270
.us-west.rgw.buckets -  93740   24 
  00   000  215 
93740
.us-west.rgw.buckets.index -  01 
00   0   19   18 
190


Now I reinstalled the Ceph-patched apache2 and fastcgi module (not 
sure if apache2 was needed as well):


$ cat /etc/apt/sources.list.d/ceph.list
...
deb 
http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-precise-x86_64-basic/ref/master/ 
precise main
deb 
http://gitbuilder.ceph.com/apache2-deb-precise-x86_64-basic/ref/master/ 
precise main


Now that I've got that working I'll look at getting a more complex setup 
going.


regards

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] announcing ceph-announce

2014-11-24 Thread Sage Weil
On Mon, 24 Nov 2014, Christopher Armstrong wrote:
> Is it safe to say that if we're subbed to ceph-users, we don't need to sub
> separately to ceph-announce? (i.e. we'll already be aware of updates here?)

Yes.  Everything sent to -announce list will also copy -users and -devel 
at a minimum.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] announcing ceph-announce

2014-11-24 Thread Christopher Armstrong
Is it safe to say that if we're subbed to ceph-users, we don't need to sub
separately to ceph-announce? (i.e. we'll already be aware of updates here?)


*Chris Armstrong*Head of Services & Community
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Mon, Nov 24, 2014 at 1:32 PM, Sage Weil  wrote:

> This list will get release announcements and other general news items.
> It will be moderated. It's intended for those that aren't interested in
> the volume of messages on ceph-users but would like to hear general
> announcements about the project.
>
> Sign up here:
>
> http://lists.ceph.com/listinfo.cgi/ceph-announce-ceph.com
>
> Cheers,
> sage
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw agent only syncing metadata

2014-11-24 Thread Yehuda Sadeh
On Mon, Nov 24, 2014 at 2:43 PM, Mark Kirkwood
 wrote:
> On 22/11/14 10:54, Yehuda Sadeh wrote:
>>
>> On Thu, Nov 20, 2014 at 6:52 PM, Mark Kirkwood
>>  wrote:
>
>
>>> Fri Nov 21 02:13:31 2014
>>>
>>> x-amz-copy-source:bucketbig/_multipart_big.dat.2/fjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta
>>> /bucketbig/__multipart_big.dat.2%2Ffjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta
>>> 2014-11-21 15:13:31.914925 7fb5e3f87700 15 generated auth header: AWS
>>> us-west key:tk7RgBQMD92je2Nz1m2D/GV+VNM=
>>> 2014-11-21 15:13:31.914964 7fb5e3f87700 20 sending request to
>>>
>>> http://ceph2:80/bucketbig/__multipart_big.dat.2%2Ffjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta?rgwx-uid=us-west&rgwx-region=us&rgwx-prepend-metadata=us
>>> 2014-11-21 15:13:31.920510 7fb5e3f87700 10 receive_http_header
>>> 2014-11-21 15:13:31.920525 7fb5e3f87700 10 received header:HTTP/1.1 411
>>> Length Required
>>
>>
>> It looks like you're running the wrong fastcgi module.
>>
>> Yehuda
>>
>
> Thanks Yehuda - so what would be the right fastcgi? Do you mean
> http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-precise-x86_64-basic/ref/master/
>

This one should work, yeah.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw agent only syncing metadata

2014-11-24 Thread Mark Kirkwood

On 22/11/14 10:54, Yehuda Sadeh wrote:

On Thu, Nov 20, 2014 at 6:52 PM, Mark Kirkwood
 wrote:



Fri Nov 21 02:13:31 2014
x-amz-copy-source:bucketbig/_multipart_big.dat.2/fjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta
/bucketbig/__multipart_big.dat.2%2Ffjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta
2014-11-21 15:13:31.914925 7fb5e3f87700 15 generated auth header: AWS
us-west key:tk7RgBQMD92je2Nz1m2D/GV+VNM=
2014-11-21 15:13:31.914964 7fb5e3f87700 20 sending request to
http://ceph2:80/bucketbig/__multipart_big.dat.2%2Ffjid6CneDQYKisHf0pRFOT5cEWF_EQr.meta?rgwx-uid=us-west&rgwx-region=us&rgwx-prepend-metadata=us
2014-11-21 15:13:31.920510 7fb5e3f87700 10 receive_http_header
2014-11-21 15:13:31.920525 7fb5e3f87700 10 received header:HTTP/1.1 411
Length Required


It looks like you're running the wrong fastcgi module.

Yehuda



Thanks Yehuda - so what would be the right fastcgi? Do you mean 
http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-precise-x86_64-basic/ref/master/


regards

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] announcing ceph-announce

2014-11-24 Thread Sage Weil
This list will get release announcements and other general news items.  
It will be moderated. It's intended for those that aren't interested in 
the volume of messages on ceph-users but would like to hear general 
announcements about the project.

Sign up here:

http://lists.ceph.com/listinfo.cgi/ceph-announce-ceph.com

Cheers,
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] fiemap bug on giant

2014-11-24 Thread Samuel Just
Bug #10166 (http://tracker.ceph.com/issues/10166) can cause recovery
to result in incorrect object sizes on giant if the setting 'filestore
fiemap' is set to true.  This setting is disabled by default.  This
should be fixed in a future point release, though filestore fiemap
will probably continue to default to false.
-Sam
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Client forward compatibility

2014-11-24 Thread Gregory Farnum
On Thu, Nov 20, 2014 at 9:08 AM, Dan van der Ster
 wrote:
> Hi all,
> What is compatibility/incompatibility of dumpling clients to talk to firefly
> and giant clusters?

We sadly don't have a good matrix about this yet, but in general you
should assume that anything which changed the way the data is
physically placed on the cluster will prevent them from communicating;
if you don't enable those features then they should remain compatible.
In particular

> I know that tunables=firefly will prevent dumpling
> clients from talking to a firefly cluster, but how about the existence or
> not of erasure pools?

As you mention, updating the tunables will prevent old clients from
accessing them (although that shouldn't be the case in future now that
they're all set by the crush map for later interpretation). Erasure
pools are a special case (precisely because people had issues with
them) and you should be able to communicate with a cluster that has EC
pools while using old clients — but, no:

> Can a dumpling client talk to a Firefly/Giant erasure
> pool if the tunables are still dumping?

Definitely not. EC pools use a slightly different CRUSH algorithm than
the old clients could, among many other things.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-announce list

2014-11-24 Thread Gregory Farnum
On Fri, Nov 21, 2014 at 12:34 AM, JuanFra Rodriguez Cardoso
 wrote:
> Hi all:
>
> As it was asked weeks ago.. what is the way the ceph community uses to
> stay tuned on new features and bug fixes?

I asked Sage about this today and he said he'd set one up. Seems like
a good idea; just not something we've ever thought about before. :)
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to mount cephfs from fstab

2014-11-24 Thread Alek Paunov

On 24.11.2014 19:08, Erik Logtenberg wrote:

...



So, how do my fellow cephfs-users do this?



I do not use cephfs yet, but there seems to be a workaround for your problem 
on a systemd-based OS:


http://www.cepheid.org/~jeff/?p=69
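
For the kernel client, a minimal fstab entry looks roughly like the line below
(a sketch; the monitor address, mount point and secretfile path are
assumptions). The _netdev option delays the mount until the network is up,
which is usually the ordering problem the systemd unit above works around:

    # /etc/fstab
    192.168.0.1:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0 0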

Kind Regards,
Alek




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph fs has error: no valid command found; 10 closest matches: fsid

2014-11-24 Thread Christopher Armstrong
You're likely not running a release which includes this command - it was
only introduced in v0.84. If you're running giant, it should work. If not,
you won't have it (and won't need to create a filesystem - it will exist by
default).


*Chris Armstrong*Head of Services & Community
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Thu, Nov 20, 2014 at 8:43 PM, Huynh Dac Nguyen 
wrote:

>  Hi All,
>
>  When i create a new cephfs, this error shows
>
>  Is it a bug?
>
>  [root@ho-srv-ceph-02 ceph]# ceph fs new cephfs cephfs_metadata
> cephfs_data
>
> no valid command found; 10 closest matches:
>
> fsid
>
> Error EINVAL: invalid command
>
>
>  Regards,
>
> Ndhuynh
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Virtual machines using RBD remount read-only on OSD slow requests

2014-11-24 Thread Alexandre DERUMIER
Hi,
try mounting your filesystems with the errors=continue option.


From the mount(8) man page:

errors={continue|remount-ro|panic}
Define the behaviour when an error is encountered.  (Either ignore errors
and just  mark  the  filesystem  erroneous and continue, or remount the
filesystem read-only, or panic and halt the system.)  The default is set in
the  filesystem superblock, and can be changed using tune2fs(8).
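
A sketch of what that looks like inside a guest (the device and mount point
are assumptions):

    # change it on a running system
    mount -o remount,errors=continue /dev/vda1 /

    # or make it permanent in the guest's /etc/fstab
    /dev/vda1   /   ext4   defaults,errors=continue   0 1

    # or set the superblock default, as the man page notes
    tune2fs -e continue /dev/vda1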




- Mail original - 

De: "Paulo Almeida"  
À: ceph-users@lists.ceph.com 
Envoyé: Lundi 24 Novembre 2014 17:06:40 
Objet: [ceph-users] Virtual machines using RBD remount read-only on OSD slow 
requests 

Hi, 

I have a Ceph cluster with 4 disk servers, 14 OSDs and replica size of 
3. A number of KVM virtual machines are using RBD as their only storage 
device. Whenever some OSDs (always on a single server) have slow 
requests, caused, I believe, by flaky hardware or, on one occasion, by a 
S.M.A.R.T command that crashed the system disk of one of the disk 
servers, most virtual machines remount their disk read-only and need to 
be rebooted. 

One of the virtual machines still has Debian 6 installed, and it never 
crashes. It also has an ext3 filesystem, contrary to some other 
machines, which have ext4. ext3 does crash in systems with Debian 7, but 
those have different mount flags, such as "barrier" and "data=ordered". 
I suspect (but haven't tested) that using "nobarrier" may solve the 
problem, but that doesn't seem to be an ideal solution. 

Most of those machines have Debian 7 or Ubuntu 12.04, but two of them 
have Ubuntu 14.04 (and thus a more recent kernel) and they also remount 
read-only. 

I searched the mailing list and found a couple of relevant messages. One 
person seemed to have the same problem[1], but someone else replied that 
it didn't happen in his case ("I've had multiple VMs hang for hours at a 
time when I broke a Ceph cluster and after fixing it the VMs would start 
working again"). The other message[2] is not very informative. 

Are other people experiencing this problem? Is there a file system or 
kernel version that is recommended for KVM guests that would prevent it? 
Or does this problem indicate that something else is wrong and should be 
fixed? I did configure all machines to use "cache=writeback", but never 
investigated whether that makes a difference or even whether it is 
actually working. 

Thanks, 
Paulo Almeida 
Instituto Gulbenkian de Ciência, Oeiras, Portugal 


[1] http://thread.gmane.org/gmane.comp.file-systems.ceph.user/8011 
[2] http://thread.gmane.org/gmane.comp.file-systems.ceph.user/1742 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph fs has error: no valid command found; 10 closest matches: fsid

2014-11-24 Thread Jean-Charles LOPEZ
Hi,

Use ceph mds newfs {metaid} {dataid} instead

JC



> On Nov 20, 2014, at 20:43, Huynh Dac Nguyen  wrote:
> 
> Hi All,
> 
> When i create a new cephfs, this error shows
> 
> Is it a bug?
> 
> [root@ho-srv-ceph-02 ceph]# ceph fs new cephfs cephfs_metadata cephfs_data
> no valid command found; 10 closest matches:
> fsid
> Error EINVAL: invalid command
> 
> 
> Regards,
> Ndhuynh
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Regarding Federated Gateways - Zone Sync Issues

2014-11-24 Thread Craig Lewis
I'm really not sure.  I'm using the S3 interface rather than the Swift
interface.  Once my non-system user replicated, I was able to access
everything in the secondary cluster just fine.

Hopefully somebody else with Swift experience will chime in.



On Sat, Nov 22, 2014 at 12:47 AM, Vinod H I  wrote:

> Thanks for the clarification.
> Now I have done exactly as you suggested.
> "us-east" is the master zone and "us-west" is the secondary zone.
> Each zone has two system users "us-east" and "us-west".
> These system users have same access/secret keys in both zones.
> I have checked the pools to confirm that the non-system swift user which i
> created("east-user:swift") in the primary has been replicated to the
> secondary zone.
> The buckets which are created in primary by the swift user are also there
> in the pools of the secondary zone.
> But when i try to authenticate this swift user in secondary zone, it says
> access denied.
>
> Here are the relevant logs from the secondary zone, when i try to
> authenticate the swift user.
>
> 2014-11-22 14:19:14.239976 7f73ecff9700  2
> RGWDataChangesLog::ChangesRenewThread: start
> 2014-11-22 14:19:14.243454 7f73fe236780 20 get_obj_state: rctx=0x2316ce0
> obj=.us.rgw.root:region_info.us state=0x2319048 s->prefetch_data=0
> 2014-11-22 14:19:14.243454 7f73fe236780 10 cache get: name=.us.rgw.root+
> region_info.us : miss
> 2014-11-22 14:19:14.252263 7f73fe236780 10 cache put: name=.us.rgw.root+
> region_info.us
> 2014-11-22 14:19:14.252283 7f73fe236780 10 adding .us.rgw.root+
> region_info.us to cache LRU end
> 2014-11-22 14:19:14.252310 7f73fe236780 20 get_obj_state: s->obj_tag was
> set empty
> 2014-11-22 14:19:14.252336 7f73fe236780 10 cache get: name=.us.rgw.root+
> region_info.us : type miss (requested=1, cached=6)
> 2014-11-22 14:19:14.252376 7f73fe236780 20 get_obj_state: rctx=0x2316ce0
> obj=.us.rgw.root:region_info.us state=0x2319958 s->prefetch_data=0
> 2014-11-22 14:19:14.252386 7f73fe236780 10 cache get: name=.us.rgw.root+
> region_info.us : hit
> 2014-11-22 14:19:14.252391 7f73fe236780 20 get_obj_state: s->obj_tag was
> set empty
> 2014-11-22 14:19:14.252404 7f73fe236780 20 get_obj_state: rctx=0x2316ce0
> obj=.us.rgw.root:region_info.us state=0x2319958 s->prefetch_data=0
> 2014-11-22 14:19:14.252409 7f73fe236780 20 state for obj=.us.rgw.root:
> region_info.us is not atomic, not appending atomic test
> 2014-11-22 14:19:14.252412 7f73fe236780 20 rados->read obj-ofs=0
> read_ofs=0 read_len=524288
> 2014-11-22 14:19:14.264611 7f73fe236780 20 rados->read r=0 bl.length=266
> 2014-11-22 14:19:14.264650 7f73fe236780 10 cache put: name=.us.rgw.root+
> region_info.us
> 2014-11-22 14:19:14.264653 7f73fe236780 10 moving .us.rgw.root+
> region_info.us to cache LRU end
> 2014-11-22 14:19:14.264766 7f73fe236780 20 get_obj_state: rctx=0x2319860
> obj=.us-west.rgw.root:zone_info.us-west state=0x2313b98 s->prefetch_data=0
> 2014-11-22 14:19:14.264779 7f73fe236780 10 cache get:
> name=.us-west.rgw.root+zone_info.us-west : miss
> 2014-11-22 14:19:14.276114 7f73fe236780 10 cache put:
> name=.us-west.rgw.root+zone_info.us-west
> 2014-11-22 14:19:14.276131 7f73fe236780 10 adding
> .us-west.rgw.root+zone_info.us-west to cache LRU end
> 2014-11-22 14:19:14.276142 7f73fe236780 20 get_obj_state: s->obj_tag was
> set empty
> 2014-11-22 14:19:14.276161 7f73fe236780 10 cache get:
> name=.us-west.rgw.root+zone_info.us-west : type miss (requested=1, cached=6)
> 2014-11-22 14:19:14.276203 7f73fe236780 20 get_obj_state: rctx=0x2314660
> obj=.us-west.rgw.root:zone_info.us-west state=0x2313b98 s->prefetch_data=0
> 2014-11-22 14:19:14.276212 7f73fe236780 10 cache get:
> name=.us-west.rgw.root+zone_info.us-west : hit
> 2014-11-22 14:19:14.276218 7f73fe236780 20 get_obj_state: s->obj_tag was
> set empty
> 2014-11-22 14:19:14.276229 7f73fe236780 20 get_obj_state: rctx=0x2314660
> obj=.us-west.rgw.root:zone_info.us-west state=0x2313b98 s->prefetch_data=0
> 2014-11-22 14:19:14.276235 7f73fe236780 20 state for
> obj=.us-west.rgw.root:zone_info.us-west is not atomic, not appending atomic
> test
> 2014-11-22 14:19:14.276238 7f73fe236780 20 rados->read obj-ofs=0
> read_ofs=0 read_len=524288
> 2014-11-22 14:19:14.290757 7f73fe236780 20 rados->read r=0 bl.length=997
> 2014-11-22 14:19:14.290797 7f73fe236780 10 cache put:
> name=.us-west.rgw.root+zone_info.us-west
> 2014-11-22 14:19:14.290803 7f73fe236780 10 moving
> .us-west.rgw.root+zone_info.us-west to cache LRU end
> 2014-11-22 14:19:14.290857 7f73fe236780  2 zone us-west is NOT master
> 2014-11-22 14:19:14.290931 7f73fe236780 20 get_obj_state: rctx=0x2313cc0
> obj=.us-west.rgw.root:region_map state=0x2311e08 s->prefetch_data=0
> 2014-11-22 14:19:14.290949 7f73fe236780 10 cache get:
> name=.us-west.rgw.root+region_map : miss
> 2014-11-22 14:19:14.298169 7f73fe236780 10 cache put:
> name=.us-west.rgw.root+region_map
> 2014-11-22 14:19:14.298184 7f73fe236780 10 adding
> .us-west.rgw.root+region_map to cache LRU end
> 2014-11-22 14:19:14.298195 7f73fe2

Re: [ceph-users] how to run rados command by non-root user in ceph node

2014-11-24 Thread Michael Kuriger
The non root user needs to be able to read the key file.
Michael Kuriger
mk7...@yp.com
818-649-7235
MikeKuriger (IM)

From: Huynh Dac Nguyen 
Date: Wednesday, November 19, 2014 at 8:44 PM
To: "ceph-users@lists.ceph.com" 
Subject: [ceph-users] how to run rados command by non-root user in ceph node


Hi All,


After setting up the ceph, i can run ceph, rados command by root user,


how can i run them by non-root account?


i created a key already:

[root@ho-srv-ceph-02 ~]# ceph auth list

client.oneadmin

key: AQBQY21UaLLhCBAAKjsM8qRxFpJA4ppbA7Rn9A==

caps: [mon] allow r

caps: [osd] allow rw pool=one


[oneadmin@ho-srv-ceph-02 ~]$ rados df

2014-11-20 11:43:44.143434 7f53ac04a760 -1 monclient(hunting): ERROR: missing 
keyring, cannot use cephx for authentication

2014-11-20 11:43:44.143611 7f53ac04a760  0 librados: client.admin 
initialization error (2) No such file or directory

couldn't connect to cluster! error -2



Regards,

Ndhuynh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to run rados command by non-root user in ceph node

2014-11-24 Thread Huynh Dac Nguyen

Hi All,  

After setting up Ceph, I can run the ceph and rados commands as the root user.


How can I run them as a non-root account?

I have already created a key:
[root@ho-srv-ceph-02 ~]# ceph auth list 
client.oneadmin 
key: AQBQY21UaLLhCBAAKjsM8qRxFpJA4ppbA7Rn9A== 
caps: [mon] allow r 
caps: [osd] allow rw pool=one 

[oneadmin@ho-srv-ceph-02 ~]$ rados df 
2014-11-20 11:43:44.143434 7f53ac04a760 -1 monclient(hunting): ERROR:
missing keyring, cannot use cephx for authentication 
2014-11-20 11:43:44.143611 7f53ac04a760  0 librados: client.admin
initialization error (2) No such file or directory 
couldn't connect to cluster! error -2 


Regards, 
Ndhuynh 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stanza to add cgroup support to ceph upstart jobs

2014-11-24 Thread vogelc
Is anyone else doing this?  I think it would be a very good use case when
we have systems with 40+ cores to separate the stacks.

cory
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph fs has error: no valid command found; 10 closest matches: fsid

2014-11-24 Thread Huynh Dac Nguyen

Hi All,  

When I create a new cephfs, this error appears:

Is it a bug?  

[root@ho-srv-ceph-02 ceph]# ceph fs new cephfs cephfs_metadata
cephfs_data 
no valid command found; 10 closest matches: 
fsid 
Error EINVAL: invalid command 


Regards,  
Ndhuynh 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] isolate_freepages_block and excessive CPU usage by OSD process

2014-11-24 Thread Vlastimil Babka
On 11/15/2014 06:10 PM, Andrey Korolyov wrote:
> On Sat, Nov 15, 2014 at 7:32 PM, Vlastimil Babka  wrote:
>> On 11/15/2014 12:48 PM, Andrey Korolyov wrote:
>>> Hello,
>>>
>>> I had found recently that the OSD daemons under certain conditions
>>> (moderate vm pressure, moderate I/O, slightly altered vm settings) can
>>> go into loop involving isolate_freepages and effectively hit Ceph
>>> cluster performance. I found this thread
>>
>> Do you feel it is a regression, compared to some older kernel version or 
>> something?
> 
> No, it`s just a rare but very concerning stuff. The higher pressure
> is, the more chance to hit this particular issue, although absolute
> numbers are still very large (e.g. room for cache memory). Some
> googling also found simular question on sf:
> http://serverfault.com/questions/642883/cause-of-page-fragmentation-on-large-server-with-xfs-20-disks-and-ceph
> but there are no perf info unfortunately so I cannot say if the issue
> is the same or not.

Well it would be useful to find out what's doing the high-order allocations.
With 'perf record -g -a' and then 'perf report -g' you can determine the call
stack. Order and allocation flags can be captured by enabling the page_alloc
tracepoint.
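
Roughly like this (a sketch; the allocation tracepoint is named
kmem:mm_page_alloc on most kernels):

    # system-wide call stacks while the stall is happening
    perf record -g -a -- sleep 60
    perf report -g                      # look for callers of isolate_freepages

    # capture order and gfp flags of the allocations driving compaction
    perf record -e kmem:mm_page_alloc -a -- sleep 60
    perf script | less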

>>
>>> https://lkml.org/lkml/2012/6/27/545, but looks like that the
>>> significant decrease of bdi max_ratio did not helped even for a bit.
>>> Although I have approximately a half of physical memory for cache-like
>>> stuff, the problem with mm persists, so I would like to try
>>> suggestions from the other people. In current testing iteration I had
>>> decreased vfs_cache_pressure to 10 and raised vm_dirty_ratio and
>>> background ratio to 15 and 10 correspondingly (because default values
>>> are too spiky for mine workloads). The host kernel is a linux-stable
>>> 3.10.
>>
>> Well I'm glad to hear it's not 3.18-rc3 this time. But I would recommend 
>> trying
>> it, or at least 3.17. Lot of patches went to reduce compaction overhead for
>> (especially for transparent hugepages) since 3.10.
> 
> Heh, I may say that I limited to pushing knobs in 3.10, because it has
> a well-known set of problems and any major version switch will lead to
> months-long QA procedures, but I may try that if none of mine knob
> selection will help. I am not THP user, the problem is happening with
> regular 4k pages and almost default VM settings. Also it worth to mean

OK that's useful to know. So it might be some driver (do you also have
Mellanox?) or maybe SLUB (do you have it enabled?) that is trying high-order
allocations.

> that kernel messages are not complaining about allocation failures, as
> in case in URL from above, compaction just tightens up to some limit

Without the warnings, that's why we need tracing/profiling to find out what's
causing it.

> and (after it 'locked' system for a couple of minutes, reducing actual
> I/O and derived amount of memory operations) it goes back to normal.
> Cache flush fixing this just in a moment, so should large room for

That could perhaps suggest a poor coordination between reclaim and compaction,
made worse by the fact that there are more parallel ongoing attempts and the
watermark checking doesn't take that into account.

> min_free_kbytes. Over couple of days, depends on which nodes with
> certain settings issue will reappear, I may judge if my ideas was
> wrong.
> 
>>
>>> Non-default VM settings are:
>>> vm.swappiness = 5
>>> vm.dirty_ratio=10
>>> vm.dirty_background_ratio=5
>>> bdi_max_ratio was 100%, right now 20%, at a glance it looks like the
>>> situation worsened, because unstable OSD host cause domino-like effect
>>> on other hosts, which are starting to flap too and only cache flush
>>> via drop_caches is helping.
>>>
>>> Unfortunately there are no slab info from "exhausted" state due to
>>> sporadic nature of this bug, will try to catch next time.
>>>
>>> slabtop (normal state):
>>>  Active / Total Objects (% used): 8675843 / 8965833 (96.8%)
>>>  Active / Total Slabs (% used)  : 224858 / 224858 (100.0%)
>>>  Active / Total Caches (% used) : 86 / 132 (65.2%)
>>>  Active / Total Size (% used)   : 1152171.37K / 1253116.37K (91.9%)
>>>  Minimum / Average / Maximum Object : 0.01K / 0.14K / 15.75K
>>>
>>>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>>> 6890130 6889185  99%0.10K 176670   39706680K buffer_head
>>> 751232 721707  96%0.06K  11738   64 46952K kmalloc-64
>>> 251636 226228  89%0.55K   8987   28143792K radix_tree_node
>>> 121696  45710  37%0.25K   3803   32 30424K kmalloc-256
>>> 113022  80618  71%0.19K   2691   42 21528K dentry
>>> 112672  35160  31%0.50K   3521   32 56336K kmalloc-512
>>>  73136  72800  99%0.07K   1306   56  5224K Acpi-ParseExt
>>>  61696  58644  95%0.02K241  256   964K kmalloc-16
>>>  54348  36649  67%0.38K   1294   42 20704K ip6_dst_cache
>>>  53136  51787  97%0.11K   1476   36  5904K sysfs_dir_ca

[ceph-users] Questions about deploying multiple cluster on same servers

2014-11-24 Thread van
Dear Ceph Team,

  Our company is trying to use Ceph as our backend storage, and I've 
encountered an error while deploying multiple clusters on the same servers. I 
wonder whether this is a bug or whether I did something wrong.

  According to the documentation on ceph.com, I successfully built a Ceph 
cluster containing 2 mons, 2 OSDs and 1 mds on 2 servers via ceph-deploy 
(following the installation quick-start guide on ceph.com). I noticed that 
Ceph supports deploying multiple clusters on the same hardware, so I tried to 
use ceph-deploy on the same servers to create another cluster, executing all 
the ceph-deploy commands with "--cluster ceph2" for the new cluster "ceph2".
  
  Below is the command sequence I executed:

# dev208 & dev209 already host a ceph cluster with the default name "ceph"
# now I'm trying to create a new cluster named ceph2
mkdir cluster2 && cd cluster2
ceph-deploy --cluster ceph2 new dev208 # dev208 is the server hostname
ceph-deploy --cluster ceph2 mon create-initial # an error occurred in this step
…


Below are some of the logs:

[dev208][DEBUG ] determining if provided host has same hostname in remote
[dev208][DEBUG ] get remote short hostname
[dev208][DEBUG ] deploying mon to dev208
[dev208][DEBUG ] get remote short hostname
[dev208][DEBUG ] remote hostname: dev208
[dev208][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[dev208][DEBUG ] create the mon path if it does not exist
[dev208][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph2-dev208/done
[dev208][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph2-dev208/done
[dev208][INFO  ] creating keyring file: 
/var/lib/ceph/tmp/ceph2-dev208.mon.keyring
[dev208][DEBUG ] create the monitor keyring file
[dev208][INFO  ] Running command: sudo ceph-mon --cluster ceph2 --mkfs -i 
dev208 --keyring /var/lib/ceph/tmp/ceph2-dev208.mon.keyring
[dev208][DEBUG ] ceph-mon: renaming mon.noname-a 192.168.0.208:6789/0 to 
mon.dev208
[dev208][DEBUG ] ceph-mon: set fsid to 0df0dff1-5a6d-431f-9a28-4ef8fd53c207
[dev208][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph2-dev208 for 
mon.dev208
[dev208][INFO  ] unlinking keyring file 
/var/lib/ceph/tmp/ceph2-dev208.mon.keyring
[dev208][DEBUG ] create a done file to avoid re-doing the mon deployment
[dev208][DEBUG ] create the init path if it does not exist
[dev208][DEBUG ] locating the `service` executable...
[dev208][INFO  ] Running command: sudo /usr/sbin/service ceph -c 
/etc/ceph/ceph2.conf start mon.dev208
[dev208][DEBUG ] === mon.dev208 ===
[dev208][DEBUG ] Starting Ceph mon.dev208 on dev208...already running
[dev208][INFO  ] Running command: sudo systemctl enable ceph
[dev208][WARNING] ceph.service is not a native service, redirecting to 
/sbin/chkconfig.
[dev208][WARNING] Executing /sbin/chkconfig ceph on
[dev208][WARNING] The unit files have no [Install] section. They are not meant 
to be enabled
[dev208][WARNING] using systemctl.
[dev208][WARNING] Possible reasons for having this kind of units are:
[dev208][WARNING] 1) A unit may be statically enabled by being symlinked from 
another unit's
[dev208][WARNING].wants/ or .requires/ directory.
[dev208][WARNING] 2) A unit's purpose may be to act as a helper for some other 
unit which has
[dev208][WARNING]a requirement dependency on it.
[dev208][INFO  ] Running command: sudo ceph --cluster=ceph2 --admin-daemon 
/var/run/ceph/ceph2-mon.dev208.asok mon_status
[dev208][ERROR ] admin_socket: exception getting command descriptions: [Errno 
2] No such file or directory

I've noticed that the mon for the new cluster ceph2 is not started, since a 
monitor from the first cluster is already running, so it cannot find 
/var/run/ceph/ceph2-mon.dev208.asok.

It seems that the service script /etc/init.d/ceph does not correctly handle 
multiple Ceph clusters on the same server.

Could you give some guidance on deploying multiple clusters on the same 
servers?
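
As a quick way to confirm whether the init script or the monitor itself is at
fault, the second cluster's monitor can be driven by hand with --cluster (a
sketch; the names and paths are the ones from the log above):

    # start the ceph2 monitor directly, bypassing /etc/init.d/ceph
    sudo ceph-mon --cluster ceph2 -i dev208
    # note: a second cluster's mon needs its own mon addr/port in ceph2.conf,
    # or it will clash with the first cluster's monitor on the same host

    # it should then answer on its own admin socket
    sudo ceph --cluster ceph2 --admin-daemon /var/run/ceph/ceph2-mon.dev208.asok mon_status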

Best regards,

van
chaofa...@owtware.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] isolate_freepages_block and excessive CPU usage by OSD process

2014-11-24 Thread Vlastimil Babka
On 11/15/2014 12:48 PM, Andrey Korolyov wrote:
> Hello,
> 
> I had found recently that the OSD daemons under certain conditions
> (moderate vm pressure, moderate I/O, slightly altered vm settings) can
> go into loop involving isolate_freepages and effectively hit Ceph
> cluster performance. I found this thread

Do you feel it is a regression, compared to some older kernel version or 
something?

> https://lkml.org/lkml/2012/6/27/545, but looks like that the
> significant decrease of bdi max_ratio did not helped even for a bit.
> Although I have approximately a half of physical memory for cache-like
> stuff, the problem with mm persists, so I would like to try
> suggestions from the other people. In current testing iteration I had
> decreased vfs_cache_pressure to 10 and raised vm_dirty_ratio and
> background ratio to 15 and 10 correspondingly (because default values
> are too spiky for mine workloads). The host kernel is a linux-stable
> 3.10.

Well I'm glad to hear it's not 3.18-rc3 this time. But I would recommend trying
it, or at least 3.17. A lot of patches to reduce compaction overhead
(especially for transparent hugepages) went in since 3.10.

> Non-default VM settings are:
> vm.swappiness = 5
> vm.dirty_ratio=10
> vm.dirty_background_ratio=5
> bdi_max_ratio was 100%, right now 20%, at a glance it looks like the
> situation worsened, because unstable OSD host cause domino-like effect
> on other hosts, which are starting to flap too and only cache flush
> via drop_caches is helping.
> 
> Unfortunately there are no slab info from "exhausted" state due to
> sporadic nature of this bug, will try to catch next time.
> 
> slabtop (normal state):
>  Active / Total Objects (% used): 8675843 / 8965833 (96.8%)
>  Active / Total Slabs (% used)  : 224858 / 224858 (100.0%)
>  Active / Total Caches (% used) : 86 / 132 (65.2%)
>  Active / Total Size (% used)   : 1152171.37K / 1253116.37K (91.9%)
>  Minimum / Average / Maximum Object : 0.01K / 0.14K / 15.75K
> 
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 6890130 6889185  99%0.10K 176670   39706680K buffer_head
> 751232 721707  96%0.06K  11738   64 46952K kmalloc-64
> 251636 226228  89%0.55K   8987   28143792K radix_tree_node
> 121696  45710  37%0.25K   3803   32 30424K kmalloc-256
> 113022  80618  71%0.19K   2691   42 21528K dentry
> 112672  35160  31%0.50K   3521   32 56336K kmalloc-512
>  73136  72800  99%0.07K   1306   56  5224K Acpi-ParseExt
>  61696  58644  95%0.02K241  256   964K kmalloc-16
>  54348  36649  67%0.38K   1294   42 20704K ip6_dst_cache
>  53136  51787  97%0.11K   1476   36  5904K sysfs_dir_cache
>  51200  50724  99%0.03K400  128  1600K kmalloc-32
>  49120  46105  93%1.00K   1535   32 49120K xfs_inode
>  30702  30702 100%0.04K301  102  1204K Acpi-Namespace
>  28224  25742  91%0.12K882   32  3528K kmalloc-128
>  28028  22691  80%0.18K637   44  5096K vm_area_struct
>  28008  28008 100%0.22K778   36  6224K xfs_ili
>  18944  18944 100%0.01K 37  512   148K kmalloc-8
>  16576  15154  91%0.06K259   64  1036K anon_vma
>  16475  14200  86%0.16K659   25  2636K sigqueue
> 
> zoneinfo (normal state, attached)
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [ann] fio plugin for libcephfs

2014-11-24 Thread Noah Watkins
I've posted a preliminary patch set to support a libcephfs io engine in fio:

   http://github.com/noahdesu/fio cephfs

You can use this right now to generate load through libcephfs. The
plugin needs a bit more work before it goes upstream (patches
welcome), but feel free to play around with it. There is an example
script in examples/cephfs.fio.
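
For reference, a job file for this engine might look roughly like the snippet
below. The engine name and any cephfs-specific options are assumptions on my
part, so treat examples/cephfs.fio in the branch as the authoritative version:

    ; minimal sketch of a cephfs fio job
    [global]
    ioengine=cephfs     ; assumed engine name
    rw=write
    bs=4m
    size=1g
    numjobs=2

    [cephfs-job]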

Issues:
  Currently all the files that are created get the same size as the
total job size rather than the total size being divided by the number
of threads.

- Noah
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Way to improve "rados get" bandwidth?

2014-11-24 Thread Daniel Schwager
Hi,

I did some bandwidth tests with "rados put/get" (1), (2). 

I'm wondering why the "rados get" (read) bandwidth results are not that good.

If I run 
- 1 instance of rados get (2), the bandwidth is about 170MB/sec, with
- 4 instances (2) in parallel, I get 4 times ~170MB/sec == ~ 680MB/sec

Is there a chance to improve the bandwidth of a single running instance?

regards
Danny


(1) create object
dd if=/dev/zero bs=4M count=400 | rados put -p test1 myobject -

(2) read object
rados get  -p test1 myobject - | dd of=/dev/null bs=4M
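
(3) as a workaround for throughput testing, fan the reads out; a sketch using
the same pool and object as above:

    for i in 1 2 3 4; do
        rados get -p test1 myobject - | dd of=/dev/null bs=4M &
    done
    wait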


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy osd activate Hang - (doc followed step by step)

2014-11-24 Thread Massimiliano Cuttini

Every time I try to create a second OSD I get this hang:

   $ ceph-deploy osd activate ceph-node2:/var/local/osd1
   [cut ...]
   [ceph_deploy.cli][INFO  ] Invoked (1.5.20): /usr/bin/ceph-deploy osd
   activate ceph-node2:/var/local/osd1
   [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
   ceph-node2:/var/local/osd1:
   [ceph-node2][DEBUG ] connection detected need for sudo
   [ceph-node2][DEBUG ] connected to host: ceph-node2
   [ceph-node2][DEBUG ] detect platform information from remote host
   [ceph-node2][DEBUG ] detect machine type
   [ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.0.1406 Core
   [ceph_deploy.osd][DEBUG ] activating host ceph-node2 disk
   /var/local/osd1
   [ceph_deploy.osd][DEBUG ] will use init type: sysvinit
   [ceph-node2][INFO  ] Running command: sudo ceph-disk -v activate
   --mark-init sysvinit --mount /var/local/osd1
   [ceph-node2][WARNIN] DEBUG:ceph-disk:Cluster uuid is
   9f774eb5-e430-4d38-a470-e39d76c98c2b
   [ceph-node2][WARNIN] INFO:ceph-disk:Running command:
   /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
   [ceph-node2][WARNIN] DEBUG:ceph-disk:Cluster name is ceph
   [ceph-node2][WARNIN] DEBUG:ceph-disk:OSD uuid is
   b8d7c3c1-d436-4f52-8b3d-05a9d4af64ba
   [ceph-node2][WARNIN] DEBUG:ceph-disk:Allocating OSD id...
   [ceph-node2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph
   --cluster ceph --name client.bootstrap-osd --keyring
   /var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise
   b8d7c3c1-d436-4f52-8b3d-05a9d4af64ba
   *[ceph-node2][WARNIN] 2014-11-24 18:39:35.259728 7f95207c1700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f951c026300 sd=4 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f951c026590).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:38.259891 7f95206c0700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f951c00 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f951e90).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:41.260031 7f95207c1700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f95100030e0 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510003370).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:44.260242 7f95206c0700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f9510003a60 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510003cf0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:47.260442 7f95207c1700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f9510002510 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f95100027a0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:50.260763 7f95206c0700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f9510003a60 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510003cf0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:53.260952 7f95207c1700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f9510002510 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f95100027a0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:56.261208 7f95206c0700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f9510003fb0 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510004240).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:39:59.261422 7f95207c1700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f9510004830 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510004ac0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:40:02.261659 7f95206c0700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f9510005f40 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f95100061d0).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:40:05.261885 7f95207c1700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f95100092d0 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510009560).fault*
   *[ceph-node2][WARNIN] 2014-11-24 18:40:08.262083 7f95206c0700  0 --
   :/1017705 >> 172.20.20.105:6789/0 pipe(0x7f9510006010 sd=5 :0 s=1
   pgs=0 cs=0 l=1 c=0x7f9510006490).fault*

... and so on

Can somebody help me get past this problem?
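
For reference, the repeated "pipe ... fault" lines above usually mean
that ceph-node2 simply cannot reach the monitor at 172.20.20.105:6789
(monitor not running, wrong mon address in ceph.conf, or a firewall in
between). A rough checklist sketch, assuming CentOS 7 defaults:

    # from ceph-node2: is the monitor port reachable at all?
    timeout 3 bash -c 'echo > /dev/tcp/172.20.20.105/6789' && echo "mon reachable"

    # does any cluster command work from this node?
    sudo ceph -s

    # on the monitor host: is 6789/tcp allowed through firewalld?
    sudo firewall-cmd --list-all
    sudo firewall-cmd --permanent --add-port=6789/tcp
    sudo firewall-cmd --reload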


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to mount cephfs from fstab

2014-11-24 Thread Manfred Hollstein
Hi there,

On Mon, 24 Nov 2014, 18:08:50 +0100, Erik Logtenberg wrote:
> Hi,
> 
> I would like to mount a cephfs share from fstab, but it doesn't
> completely work.
> 
> First of all, I followed the documentation [1], which resulted in the
> following line in fstab:
> 
> ceph-01:6789:/ /mnt/cephfs/ ceph
> name=testhost,secretfile=/root/testhost.key,noacl 0 2

it is my understanding that the network must be fully initialized before
mounting a cephfs type directory is possible. The mount option "_netdev"
exists for this exact scenario. Can you check whether the following
entry works?

  ceph-01:6789:/ /mnt/cephfs/ ceph name=testhost,secretfile=/root/testhost.key,noacl,_netdev 0 2

> Yes, this works when I manually try "mount /mnt/cephfs", but it does
> give me the following error/warning:
> 
> mount: error writing /etc/mtab: Invalid argument
> 
> Now, even though this error doesn't influence the mounting itself, it
> does prohibit my machine from booting right. Apparently Fedora/systemd
> doesn't like this error when going through fstab, so booting is not
> possible.
> 
> The mtab issue can easily be worked around, by calling mount manually
> and using the -n (--no-mtab) argument, like this:
> 
> mount -t ceph -n ceph-01:6789:/ /mnt/cephfs/ -o
> name=testhost,secretfile=/root/testhost.key,noacl
> 
> However, I can't find a way to put that -n option in /etc/fstab itself
> (since it's not a "-o option". Currently, I have the "noauto" setting in
> fstab, so it doesn't get mounted on boot at all. Then I have to manually
> log in and say "mount /mnt/cephfs" to explicitly mount the share. Far
> from ideal.

if the above doesn't work, this could be at least achieved by mounting
the cephfs directories via autofs.
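
A rough sketch of what that could look like (the map syntax here is
from memory, so double-check it against autofs(5)):

    # /etc/auto.master
    /mnt  /etc/auto.ceph  --timeout=60

    # /etc/auto.ceph
    cephfs  -fstype=ceph,name=testhost,secretfile=/root/testhost.key,noacl  ceph-01:6789:/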

> So, how do my fellow cephfs-users do this?
> 
> Thanks,
> 
> Erik.
> 
> [1] http://ceph.com/docs/giant/cephfs/fstab/

HTH, cheers.

l8er
manfred


signature.asc
Description: Digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to mount cephfs from fstab

2014-11-24 Thread Michael Kuriger
I mount mine with an init-script.
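
Something along these lines, roughly (a trimmed-down sketch; the mount
line is taken from your mail rather than from my actual script):

    #!/bin/sh
    # /etc/init.d/cephfs-mount (sketch)
    case "$1" in
      start)
        mount -t ceph -n ceph-01:6789:/ /mnt/cephfs/ -o \
          name=testhost,secretfile=/root/testhost.key,noacl
        ;;
      stop)
        umount /mnt/cephfs/
        ;;
    esac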

Michael Kuriger 
mk7...@yp.com
818-649-7235

MikeKuriger (IM)




On 11/24/14, 9:08 AM, "Erik Logtenberg"  wrote:

>Hi,
>
>I would like to mount a cephfs share from fstab, but it doesn't
>completely work.
>
>First of all, I followed the documentation [1], which resulted in the
>following line in fstab:
>
>ceph-01:6789:/ /mnt/cephfs/ ceph
>name=testhost,secretfile=/root/testhost.key,noacl 0 2
>
>Yes, this works when I manually try "mount /mnt/cephfs", but it does
>give me the following error/warning:
>
>mount: error writing /etc/mtab: Invalid argument
>
>Now, even though this error doesn't influence the mounting itself, it
>does prohibit my machine from booting right. Apparently Fedora/systemd
>doesn't like this error when going through fstab, so booting is not
>possible.
>
>The mtab issue can easily be worked around, by calling mount manually
>and using the -n (--no-mtab) argument, like this:
>
>mount -t ceph -n ceph-01:6789:/ /mnt/cephfs/ -o
>name=testhost,secretfile=/root/testhost.key,noacl
>
>However, I can't find a way to put that -n option in /etc/fstab itself
>(since it's not a "-o option". Currently, I have the "noauto" setting in
>fstab, so it doesn't get mounted on boot at all. Then I have to manually
>log in and say "mount /mnt/cephfs" to explicitly mount the share. Far
>from ideal.
>
>So, how do my fellow cephfs-users do this?
>
>Thanks,
>
>Erik.
>
>[1] http://ceph.com/docs/giant/cephfs/fstab/
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to mount cephfs from fstab

2014-11-24 Thread Erik Logtenberg
Hi,

I would like to mount a cephfs share from fstab, but it doesn't
completely work.

First of all, I followed the documentation [1], which resulted in the
following line in fstab:

ceph-01:6789:/ /mnt/cephfs/ ceph
name=testhost,secretfile=/root/testhost.key,noacl 0 2

Yes, this works when I manually try "mount /mnt/cephfs", but it does
give me the following error/warning:

mount: error writing /etc/mtab: Invalid argument

Now, even though this error doesn't influence the mounting itself, it
does prevent my machine from booting properly. Apparently Fedora/systemd
doesn't like this error when going through fstab, so booting is not
possible.

The mtab issue can easily be worked around, by calling mount manually
and using the -n (--no-mtab) argument, like this:

mount -t ceph -n ceph-01:6789:/ /mnt/cephfs/ -o
name=testhost,secretfile=/root/testhost.key,noacl

However, I can't find a way to put that -n option in /etc/fstab itself
(since it's not a "-o" option). Currently, I have the "noauto" setting in
fstab, so it doesn't get mounted on boot at all. Then I have to manually
log in and say "mount /mnt/cephfs" to explicitly mount the share. Far
from ideal.

So, how do my fellow cephfs-users do this?

Thanks,

Erik.

[1] http://ceph.com/docs/giant/cephfs/fstab/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [rgw] chunk size

2014-11-24 Thread ghislain.chevalier
Hi all

Context : firefly 0.80.7
OS : ubuntu 14.04.1 LTS

I'd like to change the chunk size for objects stored with rgw to 1MB (4MB is
the default).

I changed ceph.conf, setting rgw object stripe size = 1048576, and restarted
the rgw.

The chunk size remains at 4MB.

I saw in various exchanges that rgw max chunk size must also be set, but it's
not documented.

Is there a guideline for setting the right parameters?
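
For clarity, this is the combination I'm trying, sketched below (I'm
assuming the settings belong in the usual rgw client section and that
both take a size in bytes):

    [client.radosgw.gateway]
    rgw object stripe size = 1048576
    rgw max chunk size = 1048576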

Best regards

Ghislain


_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Virtual machines using RBD remount read-only on OSD slow requests

2014-11-24 Thread Paulo Almeida
Hi,

I have a Ceph cluster with 4 disk servers, 14 OSDs and replica size of
3. A number of KVM virtual machines are using RBD as their only storage
device. Whenever some OSDs (always on a single server) have slow
requests, caused, I believe, by flaky hardware or, on one occasion, by a
S.M.A.R.T. command that crashed the system disk of one of the disk
servers, most virtual machines remount their disk read-only and need to
be rebooted.

One of the virtual machines still has Debian 6 installed, and it never
crashes. It also has an ext3 filesystem, unlike some other machines,
which have ext4. ext3 does crash on systems with Debian 7, but
those have different mount flags, such as "barrier" and "data=ordered".
I suspect (but haven't tested) that using "nobarrier" may solve the
problem, but that doesn't seem to be an ideal solution.
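
For reference, the relevant guest-side settings can be inspected like
this before testing that hypothesis (a sketch; device and mount names
are just examples):

    # how will ext4 react to an I/O error, and which options is / mounted with?
    tune2fs -l /dev/vda1 | grep -i "errors behavior"
    grep " / " /proc/mounts

    # temporarily test nobarrier on a non-critical guest
    mount -o remount,nobarrier /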

Most of those machines have Debian 7 or Ubuntu 12.04, but two of them
have Ubuntu 14.04 (and thus a more recent kernel) and they also remount
read-only.

I searched the mailing list and found a couple of relevant messages. One
person seemed to have the same problem[1], but someone else replied that
it didn't happen in his case ("I've had multiple VMs hang for hours at a
time when I broke a Ceph cluster and after fixing it the VMs would start
working again"). The other message[2] is not very informative.

Are other people experiencing this problem? Is there a file system or
kernel version that is recommended for KVM guests that would prevent it?
Or does this problem indicate that something else is wrong and should be
fixed? I did configure all machines to use "cache=writeback", but never
investigated whether that makes a difference or even whether it is
actually working.

Thanks,
Paulo Almeida
Instituto Gulbenkian de Ciência, Oeiras, Portugal


[1] http://thread.gmane.org/gmane.comp.file-systems.ceph.user/8011
[2] http://thread.gmane.org/gmane.comp.file-systems.ceph.user/1742

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph inconsistency after deep-scrub

2014-11-24 Thread Paweł Sadowski
On 11/21/2014 10:46 PM, Paweł Sadowski wrote:
> W dniu 21.11.2014 o 20:12, Gregory Farnum pisze:
>> On Fri, Nov 21, 2014 at 2:35 AM, Paweł Sadowski  wrote:
>>> Hi,
>>>
>>> During deep-scrub Ceph discovered some inconsistency between OSDs on my
>>> cluster (size 3, min size 2). I have fund broken object and calculated
>>> md5sum of it on each OSD (osd.195 is acting_primary):
>>>  osd.195 - md5sum_
>>>  osd.40 - md5sum_
>>>  osd.314 - md5sum_
>>>
>>> I run ceph pg repair and Ceph successfully reported that everything went
>>> OK. I checked md5sum of the objects again:
>>>  osd.195 - md5sum_
>>>  osd.40 - md5sum_
>>>  osd.314 - md5sum_
>>>
>>> This is a bit odd. How Ceph decides which copy is the correct one? Based
>>> on last modification time/sequence number (or similar)? If yes, then why
>>> object can be stored on one node only? If not, then why Ceph selected
>>> osd.314 as a correct one? What would happen if osd.314 goes down? Will
>>> ceph return wrong (old?) data, even with three copies and no failure in
>>> the cluster?
>> Right now, Ceph recovers replicated PGs by pushing the primary's copy
>> to everybody. There are tickets to improve this, but for now it's best
>> if you handle this yourself by moving the right things into place, or
>> removing the primary's copy if it's incorrect before running the
>> repair command. This is why it doesn't do repair automatically.
>> -Greg
> But in my case Ceph used non-primary's copy to repair data while two
> other OSDs had the same data (and one of them was primary). That
> should not happen.
>
> Beside that there should be big red warning in documentation[1]
> regarding /ceph pg repair/.
>
> 1:
> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent

Do any of you use the "filestore_sloppy_crc" option? It's not documented
(on purpose, I assume), but it allows the OSD to detect bad/broken data
(and crash).
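
For the record, this is roughly how a broken object can be located and
compared by hand before deciding what to repair (a sketch assuming the
default filestore paths; <pgid>, <id> and <object> are placeholders):

    # find the inconsistent PG and its acting set
    ceph health detail | grep inconsistent
    ceph pg map <pgid>

    # on each OSD host in the acting set, locate the object and checksum it
    find /var/lib/ceph/osd/ceph-<id>/current/<pgid>_head/ -name '*<object>*' \
        -exec md5sum {} \;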

Cheers,
PS
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] can I use librgw APIS ?

2014-11-24 Thread baijia...@126.com
Can I use the librgw APIs like librados? If I can, how do I do it?




baijia...@126.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple MDS servers...

2014-11-24 Thread Gregory Farnum
On Sun, Nov 23, 2014 at 10:36 PM, JIten Shah  wrote:
> Hi Greg,
>
> I haven’t setup anything in ceph.conf as mds.cephmon002 nor in any ceph
> folders. I have always tried to set it up as mds.lab-cephmon002, so I am
> wondering where is it getting that value from?

No idea, sorry. Probably some odd mismatch between expectations and
how the names are actually being parsed and saved.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Advantages of using Ceph with LXC

2014-11-24 Thread Pavel V. Kaygorodov
Hi!

> What are a few advantages of using Ceph with LXC ?

I'm using ceph daemons, packed in docker containers (http://docker.io).
The main advantages are security and reliability: the software components
don't interact with each other, all daemons have different IP addresses,
different filesystems, etc.
Also, almost all of the configuration files are shared between containers;
every container mounts the configs read-only from the host machine, so I'm
always sure the config files are consistent.
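
A minimal sketch of how such a container can be started with the config
bind-mounted read-only (image name, container name and ceph-mon id are
illustrative, not my actual setup):

    docker run -d --name ceph-mon-1 \
        --net=host \
        -v /etc/ceph:/etc/ceph:ro \
        -v /var/lib/ceph/mon:/var/lib/ceph/mon \
        my-ceph-image \
        ceph-mon -i mon-1 -f
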
The main disadvantage: you will have to install ceph by hand, without the
provided automation scripts.

Pavel.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com