Re: [ceph-users] "stray" objects in empty cephfs data pool

2015-10-23 Thread Gregory Farnum
On Fri, Oct 23, 2015 at 7:08 AM, Burkhard Linke
 wrote:
> Hi,
>
> On 10/14/2015 06:32 AM, Gregory Farnum wrote:
>>
>> On Mon, Oct 12, 2015 at 12:50 AM, Burkhard Linke
>>  wrote:
>>>
>>>
> *snipsnap*
>>>
>>> Thanks, that did the trick. I was able to locate the host blocking the
>>> file
>>> handles and remove the objects from the EC pool.
>>>
>>> Well, all except one:
>>>
>>> # ceph df
>>>...
>>>  ec_ssd_cache 18  4216k 0 2500G  129
>>>  cephfs_ec_data   19  4096k 0 31574G1
>>>
>>> # rados -p ec_ssd_cache ls
>>> 1ef540f.0386
>>> # rados -p cephfs_ec_data ls
>>> 1ef540f.0386
>>> # ceph mds tell cb-dell-pe620r dumpcache cache.file
>>> # grep 1ef540f /cache.file
>>> #
>>>
>>> It does not show up in the dumped cache file, but keeps being promoted to
>>> the cache tier after MDS restarts. I've restarted most of the cephfs
>>> clients
>>> by unmounting cephfs and restarting ceph-fuse, but the object remains
>>> active.
>>
>> You can enable MDS debug logging and see if the inode shows up in the
>> log during replay. It's possible it's getting read in (from journal
>> operations) but then getting evicted from cache if nobody's accessing
>> it any more.
>> You can also look at the xattrs on the object to see what the
>> backtrace is and if that file is in cephfs.
>
> After the last MDS restart the stray object was not promoted to the cache
> anymore:
> ec_ssd_cache 18   120k 0 3842G  128
> cephfs_ec_data   19  4096k 0 10392G1
>
> There are no xattrs available for the stray object, so it's not possible to
> find out which file it belongs/belonged to:
> # rados -p cephfs_ec_data ls
> 1ef540f.0386
> # rados -p cephfs_ec_data listxattr 1ef540f.0386
> #
>
> Is it possible to list pending journal operations to be on the safe side?

Check out the cephfs-journal-tool. I don't remember the exact commands
but I think it has good help text.
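Something like the following is probably what you're after -- syntax may differ
slightly between releases, so trust the built-in help over this sketch:

# cephfs-journal-tool journal inspect
# cephfs-journal-tool header get
# cephfs-journal-tool event get summary
# cephfs-journal-tool event get list

"journal inspect" reports on journal integrity, and the "event get" variants dump
the pending events, which you can then search for the inode in question.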
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] why was osd pool default size changed from 2 to 3.

2015-10-23 Thread Stefan Eriksson

Hi

I have been looking for info about "osd pool default size" and the 
reason it's 3 by default.


I see it got changed in v0.82 from 2 to 3.

Here it's 2:
http://docs.ceph.com/docs/v0.81/rados/configuration/pool-pg-config-ref/

and in v0.82 it's 3:
http://docs.ceph.com/docs/v0.82/rados/configuration/pool-pg-config-ref/

likewise "osd pool default min size" went from 1 to 1.5 which goes up to 
2. (Default:0, which means no particular minimum. If 0, minimum is 
size - (size / 2).)


I've looked at the changelog for v0.82 but I can't find the reason for 
this change. I'm interested to know why this change was made. I 
understand 2 is less secure, but did something change that made it less 
secure after v0.82?


A size of 2 seems pretty OK if you compare it to RAID 5/6.
OpenStack users, and other users who host virtual machine images on Ceph: do 
you use the default "osd pool default size = 3"?
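(For what it's worth, whatever default ends up in the config can still be checked 
and overridden per pool; a rough sketch, with "rbd" standing in for whichever pool 
holds the images:

# ceph osd pool get rbd size
# ceph osd pool set rbd size 2
# ceph osd pool set rbd min_size 1
)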

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow ssd journal

2015-10-23 Thread K K

I understand that my SSD is not suitable for a journal. I want to test ceph using 
existing components before buying a more expensive SSD (such as an Intel DC S3700).
I ran fio with these options:
[global]
ioengine=libaio
invalidate=1
ramp_time=5
iodepth=1
runtime=300
time_based
direct=1 
bs=4k
size=1m
filename=/mnt/test.file
sync=1
fsync=1
direct=1
[seq-write]
stonewall
rw=write
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/16232KB/0KB /s] [0/4058/0 iops] [eta 
00m:00s]
seq-write: (groupid=0, jobs=1): err= 0: pid=338872: Fri Oct 23 19:59:38 2015
write: io=4955.1MB, bw=16916KB/s, iops=4229, runt=29msec
slat (usec): min=8, max=270, avg=14.62, stdev= 3.56
clat (usec): min=42, max=7673, avg=198.72, stdev=60.81
lat (usec): min=101, max=7689, avg=213.85, stdev=62.71
clat percentiles (usec):
| 1.00th=[ 137], 5.00th=[ 151], 10.00th=[ 155], 20.00th=[ 165],
| 30.00th=[ 181], 40.00th=[ 189], 50.00th=[ 193], 60.00th=[ 197],
| 70.00th=[ 203], 80.00th=[ 209], 90.00th=[ 227], 95.00th=[ 334],
| 99.00th=[ 386], 99.50th=[ 402], 99.90th=[ 486], 99.95th=[ 524],
| 99.99th=[ 796]
lat (usec) : 50=0.01%, 100=0.01%, 250=92.61%, 500=7.31%, 750=0.07%
lat (usec) : 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%
cpu : usr=6.21%, sys=19.22%, ctx=2580951, majf=0, minf=239
IO depths : 1=101.7%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=1268728/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: io=4955.1MB, aggrb=16916KB/s, minb=16916KB/s, maxb=16916KB/s, 
mint=29msec, maxt=29msec
Disk stats (read/write):
sdh: ios=0/3869292, merge=0/3, ticks=0/210080, in_queue=208900, util=68.54%
Not so bad, but 16MB/sec with sequential 4k blocks.


>Пятница, 23 октября 2015, 16:35 +02:00 от Jan Schermer :
>
>The drive you have is not suitable at all for journal. Horrible, actually.
>
>"test with fio (qd=32,128,256, bs=4k) show very good performance of SSD disk 
>(10-30k write io)."
>
>This is not realistic. Try:
>
>fio --sync=1 --fsync=1 --direct=1 --iodepth=1 --ioengine=aio 
>
>Jan
>
>On 23 Oct 2015, at 16:31, K K < n...@mail.ru > wrote:
>Hello.
>Some strange things happen with my ceph installation after I was moved journal 
>to SSD disk.
>OS: Ubuntu 15.04 with ceph version 0.94.2-0ubuntu0.15.04.1
>server: dell r510 with PERC H700 Integrated 512MB RAID cache
>my cluster have:
>1 monitor node
>2 OSD nodes with 6 OSD daemons at each server (3Tb HDD SATA 7200 rpm disks XFS 
>system). 
>network: 1Gbit to hypervisor and 1 Gbit among all ceph nodes
>ceph.conf:
>[global]
>public network = 10.12.0.0/16
>cluster network = 192.168.133.0/24
>auth cluster required = cephx
>auth service required = cephx
>auth client required = cephx
>filestore xattr use omap = true
>filestore max sync interval = 10
>filestore min sync interval = 1
>filestore queue max ops = 500
>#filestore queue max bytes = 16 MiB
>#filestore queue committing max ops = 4096
>#filestore queue committing max bytes = 16 MiB
>filestore op threads = 20
>filestore flusher = false
>filestore journal parallel = false
>filestore journal writeahead = true
>#filestore fsync flushes journal data = true
>journal dio = true
>journal aio = true
>osd pool default size = 2 # Write an object n times.
>osd pool default min size = 1 # Allow writing n copy in a degraded state.
>osd pool default pg num = 333
>osd pool default pgp num = 333
>osd crush chooseleaf type = 1
>
>[client]
>rbd cache = true
>rbd cache size = 102400
>rbd cache max dirty = 12800
>[osd]
>osd journal size = 5200
>#osd journal = /dev/disk/by-partlabel/journal-$id
>Without SSD as a journal i have a ~112MB/sec throughput
>After I was added SSD 64Gb ADATA for a journal disk and create 6 raw 
>partitions. And I get a very slow bandwidth with rados bench:
>Total time run: 302.350730
>Total writes made: 1146
>Write size: 4194304
>Bandwidth (MB/sec): 15.161
>Stddev Bandwidth: 11.5658
>Max bandwidth (MB/sec): 52
>Min bandwidth (MB/sec): 0
>Average Latency: 4.21521
>Stddev Latency: 1.25742
>Max latency: 8.32535
>Min latency: 0.277449
>
>iostat show a few write io (no more than 200):
>
>Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await 
>w_await svctm %util
>sdh 0.00 0.00 0.00 8.00     0.00 1024.00    256.00 129.48  2120.50  0.00   
>2120.50 124.50 99.60
>sdh 0.00 0.00 0.00 124.00 0.00 14744.00  237.81 148.44  1723.81  0.00   
>1723.81  8.10 100.40
>sdh 0.00 0.00 0.00 114.00 0.00 13508.00  236.98 144.27  1394.91  0.00   
>1394.91  8.77 100.00
>sdh 0.00 0.00 0.00 122.00 0.00 13964.00  228.92 122.99  1439.74  0.00   
>1439.74  8.20 100.00
>sdh 0.00 0.00 0.00 161.00 0.00 19640.00  243.98 154.98  1251.16  0.00   
>1251.16  6.21 100.00
>sdh 0.00 0.00 0.00 11.00   0.00 1408.00    256.00 152.68   717.09   0.00   
>717.09    90.91 100.00
>sdh 0.00 0.00 0.00 154.00 0.00 

Re: [ceph-users] how to understand deep flatten implementation

2015-10-23 Thread Jason Dillaman
> After reading and understanding your mail, i moved on to do some experiments
> regarding deep flatten. some questions showed up:
> here is my experiement:
> ceph version I used: ceph -v output:
> ceph version 9.1.0-299-g89b2b9b

> 1. create a separate pool for test:
> rados mkpool pool100
> 2. create parent image with deep-flatten feature:
> rbd create --image-feature deep-flatten --image-feature layering -p pool100
> user1_image1 --size 1024 --image-format 2
> 3. create snap:
> rbd snap create pool100/user1_image1@user1_image1_snap
> 4. protect snap:
> rbd snap protect pool100/user1_image1@user1_image1_snap
> 5. clone child image based on this snap:
> rbd clone pool100/user1_image1@user1_image1_snap pool100/user1_image2
> 6. create snap on clone image:
> rbd snap create pool100/user1_image2@user1_image2_snap
> 7. flatten the clone image:
> rbd flatten pool100/user1_image2

> test output:
> rbd info pool100/user1_image2
> rbd image 'user1_image2':
> size 1024 MB in 256 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.1016317b2d6
> format: 2
> features: layering < why after flatten, cloned image is without
> deep-flatten feature?
> flags:


'rbd clone' doesn't copy features from the parent image -- you needed to 
specify "--image-feature deep-flatten" when creating the clone.  


> rbd info pool100/user1_image2@user1_image2_snap

> rbd image 'user1_image2':
> size 1024 MB in 256 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.1016317b2d6
> format: 2
> features: layering
> flags:
> protected: True
> parent: pool100/user1_image1@user1_image1_snap < why after flatten, child
> snapshot still has parent snap info?
> overlap: 1024 MB


Because deep-flatten wasn't enabled on the clone.


> Another question is since deep-flatten operations are applied to cloned
> image, why we need to create parent image with deep-flatten image features??


The deep-flatten feature is not required on the parent image (since non-cloned 
images cannot be flattened).


> Cory


-- 

Jason Dillaman 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Proper Ceph network configuration

2015-10-23 Thread Wido den Hollander


On 23-10-15 14:58, Jon Heese wrote:
> Hello,
> 
>  
> 
> We have two separate networks in our Ceph cluster design:
> 
>  
> 
> 10.197.5.0/24 - The "front end" network, "skinny pipe", all 1Gbe,
> intended to be a management or control plane network
> 
> 10.174.1.0/24 - The "back end" network, "fat pipe", all OSD nodes use 2x
> bonded 10Gbe, intended to be the data network
> 
>  
> 
> So we want all of the OSD traffic to go over the "back end", and the MON
> traffic to go over the "front end".  We thought the following would do that:
> 
>  
> 
> public network = 10.197.5.0/24   # skinny pipe, mgmt & MON traffic
> 
> cluster network = 10.174.1.0/24  # fat pipe, OSD traffic
> 
>  
> 
> But that doesn't seem to be the case -- iftop and netstat show that
> little/no OSD communication is happening over the 10.174.1 network and
> it's all happening over the 10.197.5 network.
> 
>  
> 
> What configuration should we be running to enforce the networks per our
> design?  Thanks!
> 
> 

Do the OSD nodes have a IP in 10.174.1.0/24? And are you sure this is in
ceph.conf?

What does 'ip a' show on the OSD nodes?
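One quick check (illustrative) is to look at the addresses the OSDs actually
registered:

# ceph osd dump | grep '^osd'

Each osd line shows a public address followed by a cluster address; if both sit in
10.197.5.0/24, the 'cluster network' setting is not being picked up.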


> 
> /Jon Heese/
> /Systems Engineer/
> *INetU Managed Hosting*
> P: 610.266.7441 x 261
> F: 610.266.7434
> www.inetu.net 
> 
> /** This message contains confidential information, which also may be
> privileged, and is intended only for the person(s) addressed above. Any
> unauthorized use, distribution, copying or disclosure of confidential
> and/or privileged information is strictly prohibited. If you have
> received this communication in error, please erase all copies of the
> message and its attachments and notify the sender immediately via reply
> e-mail. **/
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Proper Ceph network configuration

2015-10-23 Thread Jon Heese
Bill,

Thanks for the explanation – that helps a lot.  In that case, I actually want 
the 10.174.1.0/24 network to be both my cluster and my public network, because 
I want all “heavy” data traffic to be on that network.  And by “heavy”, I mean 
large volumes of data, both normal Ceph client traffic and OSD-to-OSD 
communication.  Contrast this with the more “control plane” connections between 
the MONs and the OSDs, which we intend to go over the lighter-weight management 
network.

The documentation seems to indicate that the MONs also communicate on the 
“public” network, but our MONs aren’t currently on that network (we were 
treating it as an OSD/Client network).  I guess I need to put them on that 
network…?

Thanks.

Jon Heese
Systems Engineer
INetU Managed Hosting
P: 610.266.7441 x 261
F: 610.266.7434
www.inetu.net
** This message contains confidential information, which also may be 
privileged, and is intended only for the person(s) addressed above. Any 
unauthorized use, distribution, copying or disclosure of confidential and/or 
privileged information is strictly prohibited. If you have received this 
communication in error, please erase all copies of the message and its 
attachments and notify the sender immediately via reply e-mail. **
From: Campbell, Bill [mailto:bcampb...@axcess-financial.com]
Sent: Friday, October 23, 2015 9:11 AM
To: Jon Heese 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Proper Ceph network configuration

The "public" network is where all storage accesses from other systems or 
clients will occur.  When you map RBD's to other hosts, access object storage 
through the RGW, or CephFS access, you will access the data through the 
"public" network.  The "cluster" network is where all internal replication 
between OSD processes will occur.  As an example in our set up, we have a 10GbE 
public network for hypervisor nodes to access, along with a 10GbE cluster 
network for back-end replication/communication.  Our 1GbE network is used for 
monitoring integration and system administration.


From: "Jon Heese" >
To: ceph-users@lists.ceph.com
Sent: Friday, October 23, 2015 8:58:28 AM
Subject: [ceph-users] Proper Ceph network configuration


Hello,



We have two separate networks in our Ceph cluster design:



10.197.5.0/24 - The "front end" network, "skinny pipe", all 1Gbe, intended to 
be a management or control plane network

10.174.1.0/24 - The "back end" network, "fat pipe", all OSD nodes use 2x bonded 
10Gbe, intended to be the data network



So we want all of the OSD traffic to go over the "back end", and the MON 
traffic to go over the "front end".  We thought the following would do that:



public network = 10.197.5.0/24   # skinny pipe, mgmt & MON traffic

cluster network = 10.174.1.0/24  # fat pipe, OSD traffic



But that doesn't seem to be the case -- iftop and netstat show that little/no 
OSD communication is happening over the 10.174.1 network and it's all happening 
over the 10.197.5 network.


What configuration should we be running to enforce the networks per our design? 
 Thanks!

Jon Heese
Systems Engineer
INetU Managed Hosting
P: 610.266.7441 x 261
F: 610.266.7434
www.inetu.net
** This message contains confidential information, which also may be 
privileged, and is intended only for the person(s) addressed above. Any 
unauthorized use, distribution, copying or disclosure of confidential and/or 
privileged information is strictly prohibited. If you have received this 
communication in error, please erase all copies of the message and its 
attachments and notify the sender immediately via reply e-mail. **

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Proper Ceph network configuration

2015-10-23 Thread Campbell, Bill
The "public" network is where all storage accesses from other systems or 
clients will occur. When you map RBD's to other hosts, access object storage 
through the RGW, or CephFS access, you will access the data through the 
"public" network. The "cluster" network is where all internal replication 
between OSD processes will occur. As an example in our set up, we have a 10GbE 
public network for hypervisor nodes to access, along with a 10GbE cluster 
network for back-end replication/communication. Our 1GbE network is used for 
monitoring integration and system administration. 
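In ceph.conf terms that split looks roughly like this (the subnets here are only 
placeholders):

[global]
public network  = 10.0.10.0/24   # client/MON-facing 10GbE
cluster network = 10.0.20.0/24   # OSD replication 10GbE

with the MON addresses living in the public subnet.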

- Original Message -

From: "Jon Heese"  
To: ceph-users@lists.ceph.com 
Sent: Friday, October 23, 2015 8:58:28 AM 
Subject: [ceph-users] Proper Ceph network configuration 



Hello, 



We have two separate networks in our Ceph cluster design: 



10.197.5.0/24 - The "front end" network, "skinny pipe", all 1Gbe, intended to 
be a management or control plane network 

10.174.1.0/24 - The "back end" network, "fat pipe", all OSD nodes use 2x bonded 
10Gbe, intended to be the data network 



So we want all of the OSD traffic to go over the "back end", and the MON 
traffic to go over the "front end". We thought the following would do that: 



public network = 10.197.5.0/24 # skinny pipe, mgmt & MON traffic 

cluster network = 10.174.1.0/24 # fat pipe, OSD traffic 



But that doesn't seem to be the case -- iftop and netstat show that little/no 
OSD communication is happening over the 10.174.1 network and it's all happening 
over the 10.197.5 network. 



What configuration should we be running to enforce the networks per our design? 
Thanks! 



Jon Heese 
Systems Engineer 
INetU Managed Hosting 
P: 610.266.7441 x 261 
F: 610.266.7434 
www.inetu.net 

** This message contains confidential information, which also may be 
privileged, and is intended only for the person(s) addressed above. Any 
unauthorized use, distribution, copying or disclosure of confidential and/or 
privileged information is strictly prohibited. If you have received this 
communication in error, please erase all copies of the message and its 
attachments and notify the sender immediately via reply e-mail. ** 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Proper Ceph network configuration

2015-10-23 Thread Jon Heese
Hello,



We have two separate networks in our Ceph cluster design:



10.197.5.0/24 - The "front end" network, "skinny pipe", all 1Gbe, intended to 
be a management or control plane network

10.174.1.0/24 - The "back end" network, "fat pipe", all OSD nodes use 2x bonded 
10Gbe, intended to be the data network



So we want all of the OSD traffic to go over the "back end", and the MON 
traffic to go over the "front end".  We thought the following would do that:



public network = 10.197.5.0/24   # skinny pipe, mgmt & MON traffic

cluster network = 10.174.1.0/24  # fat pipe, OSD traffic



But that doesn't seem to be the case -- iftop and netstat show that little/no 
OSD communication is happening over the 10.174.1 network and it's all happening 
over the 10.197.5 network.


What configuration should we be running to enforce the networks per our design? 
 Thanks!

Jon Heese
Systems Engineer
INetU Managed Hosting
P: 610.266.7441 x 261
F: 610.266.7434
www.inetu.net
** This message contains confidential information, which also may be 
privileged, and is intended only for the person(s) addressed above. Any 
unauthorized use, distribution, copying or disclosure of confidential and/or 
privileged information is strictly prohibited. If you have received this 
communication in error, please erase all copies of the message and its 
attachments and notify the sender immediately via reply e-mail. **
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "stray" objects in empty cephfs data pool

2015-10-23 Thread Burkhard Linke

Hi,

On 10/14/2015 06:32 AM, Gregory Farnum wrote:

On Mon, Oct 12, 2015 at 12:50 AM, Burkhard Linke
 wrote:



*snipsnap*

Thanks, that did the trick. I was able to locate the host blocking the file
handles and remove the objects from the EC pool.

Well, all except one:

# ceph df
   ...
 ec_ssd_cache 18  4216k 0 2500G  129
 cephfs_ec_data   19  4096k 0 31574G1

# rados -p ec_ssd_cache ls
1ef540f.0386
# rados -p cephfs_ec_data ls
1ef540f.0386
# ceph mds tell cb-dell-pe620r dumpcache cache.file
# grep 1ef540f /cache.file
#

It does not show up in the dumped cache file, but keeps being promoted to
the cache tier after MDS restarts. I've restarted most of the cephfs clients
by unmounting cephfs and restarting ceph-fuse, but the object remains
active.

You can enable MDS debug logging and see if the inode shows up in the
log during replay. It's possible it's getting read in (from journal
operations) but then getting evicted from cache if nobody's accessing
it any more.
You can also look at the xattrs on the object to see what the
backtrace is and if that file is in cephfs.
After the last MDS restart the stray object was not promoted to the 
cache anymore:

ec_ssd_cache 18   120k 0 3842G  128
cephfs_ec_data   19  4096k 0 10392G1

There are no xattrs available for the stray object, so it's not possible 
to find out which file it belongs/belonged to:

# rados -p cephfs_ec_data ls
1ef540f.0386
# rados -p cephfs_ec_data listxattr 1ef540f.0386
#
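(When the backtrace xattr is present it can be decoded with something along these 
lines -- the temp file name is arbitrary:

# rados -p cephfs_ec_data getxattr 1ef540f.0386 parent > /tmp/backtrace.bin
# ceph-dencoder type inode_backtrace_t import /tmp/backtrace.bin decode dump_json

but here the xattr simply does not exist.)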

Is it possible to list pending journal operations to be on the safe side?

Regards,
Burkhard

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] librbd regression with Hammer v0.94.4 -- use caution!

2015-10-23 Thread Dzianis Kahanovich
After upgrading to git + the 802cf861352d3c77800488d812009cbbc7184c73 patch, I got 
a repeated-restart problem on one host: 2/3 OSDs hit "osd/ReplicatedPG.cc: 387: FAILED 
assert(needs_recovery)" after a repair was started. After a series of restarts, as the 
repair percentage progressed, the cluster reached HEALTH_OK. Is there anything more 
worth reporting?



Sage Weil пишет:

There is a regression in librbd in v0.94.4 that can cause VMs to crash.
For now, please refrain from upgrading hypervisor nodes or other librbd
users to v0.94.4.

http://tracker.ceph.com/issues/13559

The problem does not affect server-side daemons (ceph-mon, ceph-osd,
etc.).

Jason's identified the bug and has a fix prepared, but it'll probably take
a few days before we have v0.94.5 out.


https://github.com/ceph/ceph/commit/4692c330bd992a06b97b5b8975ab71952b22477a

Thanks!
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd unmap immediately consistent?

2015-10-23 Thread Ilya Dryomov
On Thu, Oct 22, 2015 at 10:59 PM, Allen Liao  wrote:
> Does ceph guarantee image consistency if an rbd image is unmapped on one
> machine then immediately mapped on another machine?  If so, does the same
> apply to issuing a snapshot command on machine B as soon as the unmap
> command finishes on machine A?
>
> In other words, does the unmap operation flush all changes to the ceph
> cluster synchronously?

Yes, rbd unmap won't return from the kernel until all cached buffers
are flushed.
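So a handoff along these lines is safe (device names and mount points are
illustrative):

# on host A
umount /mnt/vol
rbd unmap /dev/rbd0

# on host B
rbd map mypool/myimage
mount /dev/rbd0 /mnt/vol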

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] upgrading to major releases

2015-10-23 Thread John-Paul Robinson
Hi,

When upgrading to the next release, is it necessary to first upgrade to
the most recent point release of the prior release or can one upgrade
from the initial release of the named version?  The release notes don't
appear to indicate it is necessary
(http://docs.ceph.com/docs/master/release-notes), just wondering if
there are benefits or assumptions.

Thanks,

~jpr
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] inotify, etc?

2015-10-23 Thread Edward Ned Harvey (ceph)
Trying to figure out if ceph supports inotify, or some form of notification, I 
see this issue from 4 years ago:
http://tracker.ceph.com/issues/1296

And the corresponding discussion thread
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/3355

Basically no information there, and what is there is stale. But that's the best 
information I can find.

Is that all there is? Does ceph support distributed notification?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] upgrading to major releases

2015-10-23 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Depends on the releases. To go from Hammer to Infernalis you do, for
example, but I don't think there is any requirement for Firefly to
Hammer. It is always a good idea to go through the latest point
release just to be safe.
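A quick way to confirm what each daemon is actually running before and after every
step -- assuming the default admin socket locations and that the mon id matches the
short hostname, as with ceph-deploy defaults -- is something like:

# ceph tell osd.* version
# ceph daemon mon.$(hostname -s) version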
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Oct 23, 2015 at 10:27 AM, John-Paul Robinson  wrote:
> Hi,
>
> When upgrading to the next release, is it necessary to first upgrade to
> the most recent point release of the prior release or can one upgrade
> from the initial release of the named version?  The release notes don't
> appear to indicate it is necessary
> (http://docs.ceph.com/docs/master/release-notes), just wondering if
> there are benefits or assumptions.
>
> Thanks,
>
> ~jpr
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.2.2
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWKm+8CRDmVDuy+mK58QAAY2gQAKlH6mvSPgfhGx/XmeTK
6UlMro3tcPAe/J/NVfmgjB56CfDO/iDaxHch04PxMlu23kSKFjPYiuHsm/In
4Peqbumnn3rV48vZorSOUoycvwG+qBM0hiuzyc+3nBDU/zo2FCdKK2+70Dj3
lL6hVdps65yGma3nWJiNKyzT8vd5GeLFojTS6+LOeU4XGM6N0XmlasrNF/MN
1HOC2qzb4Ikvjt9xFlEg67OkF6AxwqFDz9uDgGRxQL7XKuL0hBzWZw06w0sh
MqNjVqbLsEbA5ZXVVPJkC17t8jepQdxg0L4i7ldZdNfvLj5Yq0qKaQPUYKM2
FeoBWxi3IR8lY/aN6YsNYyWKawegjVeUF7eXpVkZkQWxgp2aJYoDS2BFPWa6
39aiQj4GgjzwY5KUSQuiFIUcH/RUWdUAi3osQc864vi0WK5uAE1LD+U1KcW1
x00sBZ2Z58JGucE+q+yYFjxNzpNZzG0BelBmErodtNb9lJXZQ32mqEalqqP+
ueUSt2F5MTjsKn3aBKdICzQ0tSj65/k52mtTpQx/TlKfyd06dOSO4fmfu05i
+isbuum1cFJrWzo6yRpq0o3xEiVBoDxdTijmOFyYrIxipwFJk50aFhrVPv1h
AIW8Sb0DHyYEVoh04V+LPsSOwvOWykwLPRgrB+6bhKhrvvZiUkFR9CZmbs5g
EuEz
=C8xi
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Proper Ceph network configuration

2015-10-23 Thread Campbell, Bill
Yes, that's correct. 

We use the public/cluster networks exclusively, so in the configuration we 
specify the MON addresses on the public network, and define both the 
public/cluster network subnet. I've not tested, but wonder if it's possible to 
have the MON addresses on a 1GbE network, then define public/cluster networks 
in the config and things still operate? 

- Original Message -

From: "Jon Heese"  
To: "Bill Campbell"  
Cc: ceph-users@lists.ceph.com 
Sent: Friday, October 23, 2015 10:03:46 AM 
Subject: RE: [ceph-users] Proper Ceph network configuration 



Bill, 



Thanks for the explanation – that helps a lot. In that case, I actually want 
the 10.174.1.0/24 network to be both my cluster and my public network, because 
I want all “heavy” data traffic to be on that network. And by “heavy”, I mean 
large volumes of data, both normal Ceph client traffic and OSD-to-OSD 
communication. Contrast this with the more “control plane” connections between 
the MONs and the OSDs, which we intend to go over the lighter-weight management 
network. 



The documentation seems to indicate that the MONs also communicate on the 
“public” network, but our MONs aren’t currently on that network (we were 
treating it as an OSD/Client network). I guess I need to put them on that 
network…? 



Thanks. 




Jon Heese 
Systems Engineer 
INetU Managed Hosting 
P: 610.266.7441 x 261 
F: 610.266.7434 
www.inetu.net 


** This message contains confidential information, which also may be 
privileged, and is intended only for the person(s) addressed above. Any 
unauthorized use, distribution, copying or disclosure of confidential and/or 
privileged information is strictly prohibited. If you have received this 
communication in error, please erase all copies of the message and its 
attachments and notify the sender immediately via reply e-mail. ** 


From: Campbell, Bill [mailto:bcampb...@axcess-financial.com] 
Sent: Friday, October 23, 2015 9:11 AM 
To: Jon Heese  
Cc: ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] Proper Ceph network configuration 





The "public" network is where all storage accesses from other systems or 
clients will occur. When you map RBD's to other hosts, access object storage 
through the RGW, or CephFS access, you will access the data through the 
"public" network. The "cluster" network is where all internal replication 
between OSD processes will occur. As an example in our set up, we have a 10GbE 
public network for hypervisor nodes to access, along with a 10GbE cluster 
network for back-end replication/communication. Our 1GbE network is used for 
monitoring integration and system administration. 






From: "Jon Heese" < jhe...@inetu.net > 
To: ceph-users@lists.ceph.com 
Sent: Friday, October 23, 2015 8:58:28 AM 
Subject: [ceph-users] Proper Ceph network configuration 





Hello, 



We have two separate networks in our Ceph cluster design: 



10.197.5.0/24 - The "front end" network, "skinny pipe", all 1Gbe, intended to 
be a management or control plane network 

10.174.1.0/24 - The "back end" network, "fat pipe", all OSD nodes use 2x bonded 
10Gbe, intended to be the data network 



So we want all of the OSD traffic to go over the "back end", and the MON 
traffic to go over the "front end". We thought the following would do that: 



public network = 10.197.5.0/24 # skinny pipe, mgmt & MON traffic 

cluster network = 10.174.1.0/24 # fat pipe, OSD traffic 



But that doesn't seem to be the case -- iftop and netstat show that little/no 
OSD communication is happening over the 10.174.1 network and it's all happening 
over the 10.197.5 network. 



What configuration should we be running to enforce the networks per our design? 
Thanks! 



Jon Heese 
Systems Engineer 
INetU Managed Hosting 
P: 610.266.7441 x 261 
F: 610.266.7434 
www.inetu.net 

** This message contains confidential information, which also may be 
privileged, and is intended only for the person(s) addressed above. Any 
unauthorized use, distribution, copying or disclosure of confidential and/or 
privileged information is strictly prohibited. If you have received this 
communication in error, please erase all copies of the message and its 
attachments and notify the sender immediately via reply e-mail. ** 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 







NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies. 




NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.

___
ceph-users mailing list

[ceph-users] Permission denied when activating a new OSD in 9.1.0

2015-10-23 Thread Max Yehorov
I am trying to add a filestore OSD node to my cluster and got this
during ceph-deploy activate.
The message still appears when "ceph-disk activate" is run as root. Is
this functionality broken in 9.1.0 or is it something misconfigured on
my box? And /var/lib/ceph is chown'ed to ceph:ceph.
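Both failures below are (13) Permission denied -- one binding the admin socket under
/var/run/ceph, the other writing the journal -- so presumably the ownership of those
paths matters as well; something along these lines (the partition name is just an
example):

# ls -ld /var/run/ceph /var/lib/ceph
# ls -l /dev/sdb*   # the raw journal partition ceph-disk created, if any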

[WARNING] 2015-10-23 13:42:57.424153 7f2212389980 -1
asok(0x7f2214bf2200) AdminSocketConfigObs::init: failed:
AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to
'/var/run/ceph/ceph-osd.0.asok': (13) Permission denied
[WARNING] 2015-10-23 13:42:57.427160 7f2212389980 -1
filestore(/var/lib/ceph/tmp/mnt.FHYolj) mkjournal error creating
journal on /var/lib/ceph/tmp/mnt.FHYolj/journal: (13) Permission
denied
[WARNING] 2015-10-23 13:42:57.427171 7f2212389980 -1 OSD::mkfs:
ObjectStore::mkfs failed with error -13
[WARNING] 2015-10-23 13:42:57.427191 7f2212389980 -1 ** ERROR:
error creating empty object store in /var/lib/ceph/tmp/mnt.FHYolj:
(13) Permission denied
[WARNING] ERROR:ceph-disk:Failed to activate
[WARNING] DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.FHYolj
[WARNING] INFO:ceph-disk:Running command: /bin/umount --
/var/lib/ceph/tmp/mnt.FHYolj
[WARNING] Traceback (most recent call last):
[WARNING]   File "/usr/sbin/ceph-disk", line 3576, in <module>
[WARNING] main(sys.argv[1:])
[WARNING]   File "/usr/sbin/ceph-disk", line 3530, in main
[WARNING] args.func(args)
[WARNING]   File "/usr/sbin/ceph-disk", line 2424, in main_activate
[WARNING] dmcrypt_key_dir=args.dmcrypt_key_dir,
[WARNING]   File "/usr/sbin/ceph-disk", line 2197, in mount_activate
[WARNING] (osd_id, cluster) = activate(path,
activate_key_template, init)
[WARNING]   File "/usr/sbin/ceph-disk", line 2360, in activate
[WARNING] keyring=keyring,
[WARNING]   File "/usr/sbin/ceph-disk", line 1950, in mkfs
[WARNING] '--setgroup', get_ceph_user(),
[WARNING]   File "/usr/sbin/ceph-disk", line 349, in
command_check_call
[WARNING] return subprocess.check_call(arguments)
[WARNING]   File "/usr/lib/python2.7/subprocess.py", line 540, in
check_call
[WARNING] raise CalledProcessError(retcode, cmd)
[WARNING] subprocess.CalledProcessError: Command
'['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i',
'0', '--monmap', '/var/lib/ceph/tmp/mnt.FHYolj/activate.monmap',
'--osd-data', '/var/lib/ceph/tmp/mnt.FHYolj', '--osd-journal',
'/var/lib/ceph/
tmp/mnt.FHYolj/journal', '--osd-uuid',
'304b6d9b-9186-447d-9bf9-bcf70c8fc249', '--keyring',
'/var/lib/ceph/tmp/mnt.FHYolj/keyring', '--setuser', 'ceph',
'--setgroup', 'ceph']' returned non-zero exit status 1
[ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command:
ceph-disk -v activate --mark-init upstart --mount /dev/sdb1
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] why was osd pool default size changed from 2 to 3.

2015-10-23 Thread Corin Langosch
Am 23.10.2015 um 20:53 schrieb Gregory Farnum:
> On Fri, Oct 23, 2015 at 8:17 AM, Stefan Eriksson  wrote:
>
> Nothing changed to make two copies less secure. 3 copies is just so
> much more secure and is the number that all the companies providing
> support recommend, so we changed the default.
> (If you're using it for data you care about, you should really use 3 copies!)
> -Greg

I assume that number really depends on the (number of) OSDs you have in your 
crush rule for that pool. A replication of 2 might be OK for a pool spread over 
10 OSDs, but not for one spread over 100 OSDs.

Corin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cache tier write-back upper bound?

2015-10-23 Thread Brian Kroth
Hi, I'm wondering when using a cache pool tier if there's an upper bound 
on when something written to the cache is flushed back to the backing 
pool?  Something like a cache_max_flush_age setting?  Basically I'm 
wondering, in the unfortunate case where all of the SSD replicas for 
a cache pool object go at once, how far behind is the backing pool 
object from the latest data?


Also, am I reading things correctly that if you wanted to turn the 
write-back mode into something close to a write-through (though not 
exactly), you'd do something like the following?


# ceph osd pool set cachepool cache_target_dirty_ratio 0.00
# ceph osd pool set cachepool cache_min_flush_age 0

That should still ack the client as soon as the replicas were confirmed 
on the cachepool layer, but then immediately let the background flusher 
start writing the updates to the backing pool, all while still leaving 
the object available for further updates from clients, correct?  Or does 
the background flusher need to lock the object while it writes it to the 
backing pool, thus stalling further client updates to to the object 
until that completes?


I'm guessing that setting cache_target_dirty_ratio to 0 and 
cache_min_flush_age to N, still wouldn't quite implement 
cache_max_flush_age since if the object is continually getting updated, 
then that timer is continually getting reset, so it never becomes a 
candidate to get updated in the backing store, right?


Thanks,
Brian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-23 Thread deeepdish
@John-Paul Robinson:

I’ve also experienced nfsd being blocked when serving rbd devices (XFS filesystem).  
In my scenario I had an rbd device mapped on an OSD host and NFS-exported (lab 
scenario).  Log entries below.  Running CentOS 7 w/ 3.10.0-229.14.1.el7.x86_64.  
Next step for me is to compile 3.18.22 and test NFS and SCST (iSCSI / FC).

Oct 22 13:30:01 osdhost01 systemd: Started Session 14 of user root.
Oct 22 13:37:04 osdhost01 kernel: INFO: task nfsd:12672 blocked for more than 
120 seconds.
Oct 22 13:37:04 osdhost01 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 13:37:04 osdhost01 kernel: nfsdD 880627c73680 0 
12672  2 0x0080
Oct 22 13:37:04 osdhost01 kernel: 880bda763b08 0046 
880be73af1c0 880bda763fd8
Oct 22 13:37:04 osdhost01 kernel: 880bda763fd8 880bda763fd8 
880be73af1c0 880627c73f48
Oct 22 13:37:04 osdhost01 kernel: 880c3ff98ae8 0002 
811562e0 880bda763b80
Oct 22 13:37:04 osdhost01 kernel: Call Trace:
Oct 22 13:37:04 osdhost01 kernel: [] ? 
wait_on_page_read+0x60/0x60
Oct 22 13:37:04 osdhost01 kernel: [] io_schedule+0x9d/0x130
Oct 22 13:37:04 osdhost01 kernel: [] sleep_on_page+0xe/0x20
Oct 22 13:37:04 osdhost01 kernel: [] __wait_on_bit+0x60/0x90
Oct 22 13:37:04 osdhost01 kernel: [] 
wait_on_page_bit+0x86/0xb0
Oct 22 13:37:04 osdhost01 kernel: [] ? 
autoremove_wake_function+0x40/0x40
Oct 22 13:37:04 osdhost01 kernel: [] 
filemap_fdatawait_range+0x111/0x1b0
Oct 22 13:37:04 osdhost01 kernel: [] 
filemap_write_and_wait_range+0x3f/0x70
Oct 22 13:37:04 osdhost01 kernel: [] 
xfs_file_fsync+0x66/0x1f0 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] vfs_fsync_range+0x1d/0x30
Oct 22 13:37:04 osdhost01 kernel: [] nfsd_commit+0xb9/0xe0 
[nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd4_commit+0x57/0x60 
[nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] 
nfsd4_proc_compound+0x4d7/0x7f0 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd_dispatch+0xbb/0x200 
[nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] 
svc_process_common+0x453/0x6f0 [sunrpc]
Oct 22 13:37:04 osdhost01 kernel: [] svc_process+0x103/0x170 
[sunrpc]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd+0xe7/0x150 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] ? nfsd_destroy+0x80/0x80 
[nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] kthread+0xcf/0xe0
Oct 22 13:37:04 osdhost01 kernel: [] ? 
kthread_create_on_node+0x140/0x140
Oct 22 13:37:04 osdhost01 kernel: [] ret_from_fork+0x58/0x90
Oct 22 13:37:04 osdhost01 kernel: [] ? 
kthread_create_on_node+0x140/0x140
Oct 22 13:37:04 osdhost01 kernel: INFO: task kworker/u50:81:15660 blocked for 
more than 120 seconds.
Oct 22 13:37:04 osdhost01 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 13:37:04 osdhost01 kernel: kworker/u50:81  D 880c3fc73680 0 
15660  2 0x0080
Oct 22 13:37:04 osdhost01 kernel: Workqueue: writeback bdi_writeback_workfn 
(flush-252:0)
Oct 22 13:37:04 osdhost01 kernel: 88086deeb738 0046 
880beb6796c0 88086deebfd8
Oct 22 13:37:04 osdhost01 kernel: 88086deebfd8 88086deebfd8 
880beb6796c0 880c3fc73f48
Oct 22 13:37:04 osdhost01 kernel: 88061aec0fc0 880c1bb2dea0 
88061aec0ff0 88061aec0fc0
Oct 22 13:37:04 osdhost01 kernel: Call Trace:
Oct 22 13:37:04 osdhost01 kernel: [] io_schedule+0x9d/0x130
Oct 22 13:37:04 osdhost01 kernel: [] get_request+0x1b5/0x780
Oct 22 13:37:04 osdhost01 kernel: [] ? wake_up_bit+0x30/0x30
Oct 22 13:37:04 osdhost01 kernel: [] blk_queue_bio+0xc6/0x390
Oct 22 13:37:04 osdhost01 kernel: [] 
generic_make_request+0xe2/0x130
Oct 22 13:37:04 osdhost01 kernel: [] submit_bio+0x71/0x150
Oct 22 13:37:04 osdhost01 kernel: [] 
xfs_submit_ioend_bio.isra.12+0x33/0x40 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] 
xfs_submit_ioend+0xef/0x130 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] 
xfs_vm_writepage+0x36a/0x5d0 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] __writepage+0x13/0x50
Oct 22 13:37:04 osdhost01 kernel: [] 
write_cache_pages+0x251/0x4d0
Oct 22 13:37:04 osdhost01 kernel: [] ? 
global_dirtyable_memory+0x70/0x70
Oct 22 13:37:04 osdhost01 kernel: [] 
generic_writepages+0x4d/0x80
Oct 22 13:37:04 osdhost01 kernel: [] 
xfs_vm_writepages+0x43/0x50 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] do_writepages+0x1e/0x40
Oct 22 13:37:04 osdhost01 kernel: [] 
__writeback_single_inode+0x40/0x220
Oct 22 13:37:04 osdhost01 kernel: [] 
writeback_sb_inodes+0x25e/0x420
Oct 22 13:37:04 osdhost01 kernel: [] 
__writeback_inodes_wb+0x9f/0xd0
Oct 22 13:37:04 osdhost01 kernel: [] wb_writeback+0x263/0x2f0
Oct 22 13:37:04 osdhost01 kernel: [] 
bdi_writeback_workfn+0x1cc/0x460
Oct 22 13:37:04 osdhost01 kernel: [] 
process_one_work+0x17b/0x470
Oct 22 13:37:04 osdhost01 kernel: [] worker_thread+0x11b/0x400
Oct 22 13:37:04 osdhost01 kernel: [] ? 
rescuer_thread+0x400/0x400
Oct 22 13:37:04 osdhost01 kernel: [] kthread+0xcf/0xe0
Oct 22 13:37:04 osdhost01 kernel: [] ? 
kthread_create_on_node+0x140/0x140

[ceph-users] Older version repo

2015-10-23 Thread Logan Barfield
I'm currently working on deploying a new VM cluster using KVM + RBD.  I've
noticed through the list that the latest "Hammer" (0.94.4) release can
cause issues with librbd and caching.

We've worked around this issue in our existing clusters by only upgrading
the OSD & MON hosts, while leaving the hypervisor/client hosts on v0.94.3
as recommended by another user on the list.

Is there an archive repo somewhere for Ceph that we can use to install
0.94.3 on Ubuntu 14.04, or is building from source our only option?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] slow ssd journal

2015-10-23 Thread K K

Hello.
Some strange things have happened with my ceph installation after I moved the journal 
to an SSD disk.
OS: Ubuntu 15.04 with ceph version 0.94.2-0ubuntu0.15.04.1
server: dell r510 with PERC H700 Integrated 512MB RAID cache
my cluster have:
1 monitor node
2 OSD nodes with 6 OSD daemons at each server (3Tb HDD SATA 7200 rpm disks XFS 
system). 
network: 1Gbit to hypervisor and 1 Gbit among all ceph nodes
ceph.conf:
[global]
public network = 10.12.0.0/16
cluster network = 192.168.133.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
filestore xattr use omap = true
filestore max sync interval = 10
filestore min sync interval = 1
filestore queue max ops = 500
#filestore queue max bytes = 16 MiB
#filestore queue committing max ops = 4096
#filestore queue committing max bytes = 16 MiB
filestore op threads = 20
filestore flusher = false
filestore journal parallel = false
filestore journal writeahead = true
#filestore fsync flushes journal data = true
journal dio = true
journal aio = true
osd pool default size = 2 # Write an object n times.
osd pool default min size = 1 # Allow writing n copy in a degraded state.
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 1

[client]
rbd cache = true
rbd cache size = 102400
rbd cache max dirty = 12800
[osd]
osd journal size = 5200
#osd journal = /dev/disk/by-partlabel/journal-$id
Without an SSD journal I have ~112MB/sec throughput.
After I added a 64GB ADATA SSD as the journal disk and created 6 raw 
partitions on it, I get very slow bandwidth with rados bench:
Total time run: 302.350730
Total writes made: 1146
Write size: 4194304
Bandwidth (MB/sec): 15.161
Stddev Bandwidth: 11.5658
Max bandwidth (MB/sec): 52
Min bandwidth (MB/sec): 0
Average Latency: 4.21521
Stddev Latency: 1.25742
Max latency: 8.32535
Min latency: 0.277449

iostat show a few write io (no more than 200):

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await 
w_await svctm %util
sdh 0.00 0.00 0.00 8.00     0.00 1024.00    256.00 129.48  2120.50  0.00   
2120.50 124.50 99.60
sdh 0.00 0.00 0.00 124.00 0.00 14744.00  237.81 148.44  1723.81  0.00   1723.81 
 8.10 100.40
sdh 0.00 0.00 0.00 114.00 0.00 13508.00  236.98 144.27  1394.91  0.00   1394.91 
 8.77 100.00
sdh 0.00 0.00 0.00 122.00 0.00 13964.00  228.92 122.99  1439.74  0.00   1439.74 
 8.20 100.00
sdh 0.00 0.00 0.00 161.00 0.00 19640.00  243.98 154.98  1251.16  0.00   1251.16 
 6.21 100.00
sdh 0.00 0.00 0.00 11.00   0.00 1408.00    256.00 152.68   717.09   0.00   
717.09    90.91 100.00
sdh 0.00 0.00 0.00 154.00 0.00 18696.00  242.81 142.09  1278.65  0.00   1278.65 
 6.49 100.00
test with fio (qd=32,128,256, bs=4k) show very good performance of SSD disk 
(10-30k write io).
Can anybody help me? Has anyone faced a similar problem?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow ssd journal

2015-10-23 Thread Jan Schermer
The drive you have is not suitable at all for journal. Horrible, actually.

"test with fio (qd=32,128,256, bs=4k) show very good performance of SSD disk 
(10-30k write io)."

This is not realistic. Try:

fio --sync=1 --fsync=1 --direct=1 --iodepth=1 --ioengine=aio 
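A more complete invocation for a journal-style test might look like this (filename
and runtime are placeholders -- point it at a scratch partition, since writing to a
raw device is destructive):

fio --name=journal-test --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --sync=1 --fsync=1 --iodepth=1 --bs=4k --rw=write --runtime=60 --time_based

A good journal SSD sustains thousands of such synced 4k writes per second; consumer
drives often manage only a few hundred.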

Jan

On 23 Oct 2015, at 16:31, K K  wrote:

Hello.

Some strange things have happened with my ceph installation after I moved the journal 
to an SSD disk.

OS: Ubuntu 15.04 with ceph version 0.94.2-0ubuntu0.15.04.1
server: dell r510 with PERC H700 Integrated 512MB RAID cache
my cluster have:
1 monitor node
2 OSD nodes with 6 OSD daemons at each server (3Tb HDD SATA 7200 rpm disks XFS 
system). 
network: 1Gbit to hypervisor and 1 Gbit among all ceph nodes
ceph.conf:
[global]
public network = 10.12.0.0/16
cluster network = 192.168.133.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
filestore xattr use omap = true
filestore max sync interval = 10
filestore min sync interval = 1
filestore queue max ops = 500
#filestore queue max bytes = 16 MiB
#filestore queue committing max ops = 4096
#filestore queue committing max bytes = 16 MiB
filestore op threads = 20
filestore flusher = false
filestore journal parallel = false
filestore journal writeahead = true
#filestore fsync flushes journal data = true
journal dio = true
journal aio = true
osd pool default size = 2 # Write an object n times.
osd pool default min size = 1 # Allow writing n copy in a degraded state.
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 1

[client]
rbd cache = true
rbd cache size = 102400
rbd cache max dirty = 12800

[osd]
osd journal size = 5200
#osd journal = /dev/disk/by-partlabel/journal-$id

Without an SSD journal I have ~112MB/sec throughput.

After I added a 64GB ADATA SSD as the journal disk and created 6 raw 
partitions on it, I get very slow bandwidth with rados bench:

Total time run: 302.350730
Total writes made: 1146
Write size: 4194304
Bandwidth (MB/sec): 15.161

Stddev Bandwidth: 11.5658
Max bandwidth (MB/sec): 52
Min bandwidth (MB/sec): 0
Average Latency: 4.21521
Stddev Latency: 1.25742
Max latency: 8.32535
Min latency: 0.277449

iostat show a few write io (no more than 200):


Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await 
w_await svctm %util
sdh 0.00 0.00 0.00 8.00 0.00 1024.00    256.00 129.48  2120.50  0.00   
2120.50 124.50 99.60
sdh 0.00 0.00 0.00 124.00 0.00 14744.00  237.81 148.44  1723.81  0.00   1723.81 
 8.10 100.40
sdh 0.00 0.00 0.00 114.00 0.00 13508.00  236.98 144.27  1394.91  0.00   1394.91 
 8.77 100.00
sdh 0.00 0.00 0.00 122.00 0.00 13964.00  228.92 122.99  1439.74  0.00   1439.74 
 8.20 100.00
sdh 0.00 0.00 0.00 161.00 0.00 19640.00  243.98 154.98  1251.16  0.00   1251.16 
 6.21 100.00
sdh 0.00 0.00 0.00 11.00   0.00 1408.00    256.00 152.68   717.09   0.00   
717.09    90.91 100.00
sdh 0.00 0.00 0.00 154.00 0.00 18696.00  242.81 142.09  1278.65  0.00   1278.65 
 6.49 100.00

test with fio (qd=32,128,256, bs=4k) show very good performance of SSD disk 
(10-30k write io).

Can anybody help me? Has anyone faced a similar problem?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to understand deep flatten implementation

2015-10-23 Thread Max Yehorov
I am trying to pass deep-flatten during clone creation and got this:

rbd clone --image-feature deep-flatten d0@s0 d1

rbd: image format can only be set when creating or importing an image

On Fri, Oct 23, 2015 at 6:27 AM, Jason Dillaman  wrote:
>> After reading and understanding your mail, i moved on to do some experiments
>> regarding deep flatten. some questions showed up:
>> here is my experiement:
>> ceph version I used: ceph -v output:
>> ceph version 9.1.0-299-g89b2b9b
>
>> 1. create a separate pool for test:
>> rados mkpool pool100
>> 2. create parent image with deep-flatten feature:
>> rbd create --image-feature deep-flatten --image-feature layering -p pool100
>> user1_image1 --size 1024 --image-format 2
>> 3. create snap:
>> rbd snap create pool100/user1_image1@user1_image1_snap
>> 4. protect snap:
>> rbd snap protect pool100/user1_image1@user1_image1_snap
>> 5. clone child image based on this snap:
>> rbd clone pool100/user1_image1@user1_image1_snap pool100/user1_image2
>> 6. create snap on clone image:
>> rbd snap create pool100/user1_image2@user1_image2_snap
>> 7. flatten the clone image:
>> rbd flatten pool100/user1_image2
>
>> test output:
>> rbd info pool100/user1_image2
>> rbd image 'user1_image2':
>> size 1024 MB in 256 objects
>> order 22 (4096 kB objects)
>> block_name_prefix: rbd_data.1016317b2d6
>> format: 2
>> features: layering < why after flatten, cloned image is without
>> deep-flatten feature?
>> flags:
>
>
> 'rbd clone' doesn't copy features from the parent image -- you needed to 
> specify "--image-feature deep-flatten" when creating the clone.
>
>
>> rbd info pool100/user1_image2@user1_image2_snap
>
>> rbd image 'user1_image2':
>> size 1024 MB in 256 objects
>> order 22 (4096 kB objects)
>> block_name_prefix: rbd_data.1016317b2d6
>> format: 2
>> features: layering
>> flags:
>> protected: True
>> parent: pool100/user1_image1@user1_image1_snap < why after flatten, child
>> snapshot still has parent snap info?
>> overlap: 1024 MB
>
>
> Because deep-flatten wasn't enabled on the clone.
>
>
>> Another question is since deep-flatten operations are applied to cloned
>> image, why we need to create parent image with deep-flatten image features??
>
>
> The deep-flatten feature is not required on the parent image (since 
> non-cloned images cannot be flattened).
>
>
>> Cory
>
>
> --
>
> Jason Dillaman
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to understand deep flatten implementation

2015-10-23 Thread Max Yehorov
Looks like it is a bug:

Features are parsed and set here:
https://github.com/ceph/ceph/blob/master/src/rbd.cc#L3235

format_specified is forced to true here:
https://github.com/ceph/ceph/blob/master/src/rbd.cc#L3268

Error is produced here:
https://github.com/ceph/ceph/blob/master/src/rbd.cc#L3449

On Fri, Oct 23, 2015 at 1:53 PM, Max Yehorov  wrote:
> I am trying to pass deep-flatten during clone creation and got this:
>
> rbd clone --image-feature deep-flatten d0@s0 d1
>
> rbd: image format can only be set when creating or importing an image
>
> On Fri, Oct 23, 2015 at 6:27 AM, Jason Dillaman  wrote:
>>> After reading and understanding your mail, i moved on to do some experiments
>>> regarding deep flatten. some questions showed up:
>>> here is my experiement:
>>> ceph version I used: ceph -v output:
>>> ceph version 9.1.0-299-g89b2b9b
>>
>>> 1. create a separate pool for test:
>>> rados mkpool pool100
>>> 2. create parent image with deep-flatten feature:
>>> rbd create --image-feature deep-flatten --image-feature layering -p pool100
>>> user1_image1 --size 1024 --image-format 2
>>> 3. create snap:
>>> rbd snap create pool100/user1_image1@user1_image1_snap
>>> 4. protect snap:
>>> rbd snap protect pool100/user1_image1@user1_image1_snap
>>> 5. clone child image based on this snap:
>>> rbd clone pool100/user1_image1@user1_image1_snap pool100/user1_image2
>>> 6. create snap on clone image:
>>> rbd snap create pool100/user1_image2@user1_image2_snap
>>> 7. flatten the clone image:
>>> rbd flatten pool100/user1_image2
>>
>>> test output:
>>> rbd info pool100/user1_image2
>>> rbd image 'user1_image2':
>>> size 1024 MB in 256 objects
>>> order 22 (4096 kB objects)
>>> block_name_prefix: rbd_data.1016317b2d6
>>> format: 2
>>> features: layering < why after flatten, cloned image is without
>>> deep-flatten feature?
>>> flags:
>>
>>
>> 'rbd clone' doesn't copy features from the parent image -- you needed to 
>> specify "--image-feature deep-flatten" when creating the clone.
>>
>>
>>> rbd info pool100/user1_image2@user1_image2_snap
>>
>>> rbd image 'user1_image2':
>>> size 1024 MB in 256 objects
>>> order 22 (4096 kB objects)
>>> block_name_prefix: rbd_data.1016317b2d6
>>> format: 2
>>> features: layering
>>> flags:
>>> protected: True
>>> parent: pool100/user1_image1@user1_image1_snap < why after flatten, 
>>> child
>>> snapshot still has parent snap info?
>>> overlap: 1024 MB
>>
>>
>> Because deep-flatten wasn't enabled on the clone.
>>
>>
>>> Another question is since deep-flatten operations are applied to cloned
>>> image, why we need to create parent image with deep-flatten image features??
>>
>>
>> The deep-flatten feature is not required on the parent image (since 
>> non-cloned images cannot be flattened).
>>
>>
>>> Cory
>>
>>
>> --
>>
>> Jason Dillaman
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph same rbd on multiple client

2015-10-23 Thread Lindsay Mathieson
On 22 May 2015 at 00:10, gjprabu  wrote:

> Hi All,
>
> We are using rbd and map the same rbd image to the rbd device on
> two different client but i can't see the data until i umount and mount -a
> partition. Kindly share the solution for this issue.
>

What's the image used for? If it's a filesystem image such as ext4, that
won't work. You'd need a cluster-aware filesystem such as CephFS.


-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph same rbd on multiple client

2015-10-23 Thread gjprabu
Hi Henrik,



Thanks for your reply. We are still facing the same issue. We found the dmesg logs 
below; they are expected messages, because we took node1 down and brought it back up 
ourselves, and apart from that we did not find any error messages. We also have a 
problem while unmounting: the umount process goes into the "D" state, and fsck fails 
with an fsck.ocfs2 I/O error. If you need any other command to be run, please let me know.
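A couple of generic (not Ceph-specific) commands that might help show where the stuck umount is blocked, assuming the sysrq interface is enabled on that node:

# list processes in uninterruptible sleep (state D) and what they are waiting on
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'
# dump kernel stacks of all blocked tasks into the kernel log, then read it
echo w > /proc/sysrq-trigger
dmesg | tail -n 100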



ocfs2 version

debugfs.ocfs2 1.8.0



# cat /etc/sysconfig/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver.  It is generated by running /etc/init.d/o2cb configure.
# On Debian based systems the preferred method is running
# 'dpkg-reconfigure ocfs2-tools'.
#

# O2CB_STACK: The name of the cluster stack backing O2CB.
O2CB_STACK=o2cb

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=3

# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=2000

# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=2000



# fsck.ocfs2 -fy /home/build/downloads/

fsck.ocfs2 1.8.0

fsck.ocfs2: I/O error on channel while opening "/zoho/build/downloads/"



dmesg logs



[ 4229.886284] o2dlm: Joining domain A895BC216BE641A8A7E20AA89D57E051 ( 5 ) 1 
nodes

[ 4251.437451] o2dlm: Node 3 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 3 
5 ) 2 nodes

[ 4267.836392] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 
3 5 ) 3 nodes

[ 4292.755589] o2dlm: Node 2 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 
2 3 5 ) 4 nodes

[ 4306.262165] o2dlm: Node 4 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 
2 3 4 5 ) 5 nodes

[316476.505401] (kworker/u192:0,95923,0):dlm_do_assert_master:1717 ERROR: Error 
-112 when sending message 502 (key 0xc3460ae7) to node 1

[316476.505470] o2cb: o2dlm has evicted node 1 from domain 
A895BC216BE641A8A7E20AA89D57E051

[316480.437231] o2dlm: Begin recovery on domain 
A895BC216BE641A8A7E20AA89D57E051 for node 1

[316480.442389] o2cb: o2dlm has evicted node 1 from domain 
A895BC216BE641A8A7E20AA89D57E051

[316480.442412] (kworker/u192:0,95923,20):dlm_begin_reco_handler:2765 
A895BC216BE641A8A7E20AA89D57E051: dead_node previously set to 1, node 3 
changing it to 1

[316480.541237] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 
in domain A895BC216BE641A8A7E20AA89D57E051

[316480.541241] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051

[316485.542733] o2dlm: Begin recovery on domain 
A895BC216BE641A8A7E20AA89D57E051 for node 1

[316485.542740] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 
in domain A895BC216BE641A8A7E20AA89D57E051

[316485.542742] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051

[316490.544535] o2dlm: Begin recovery on domain 
A895BC216BE641A8A7E20AA89D57E051 for node 1

[316490.544538] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 
in domain A895BC216BE641A8A7E20AA89D57E051

[316490.544539] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051

[316495.546356] o2dlm: Begin recovery on domain 
A895BC216BE641A8A7E20AA89D57E051 for node 1

[316495.546362] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 
in domain A895BC216BE641A8A7E20AA89D57E051

[316495.546364] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051

[316500.548135] o2dlm: Begin recovery on domain 
A895BC216BE641A8A7E20AA89D57E051 for node 1

[316500.548139] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 
in domain A895BC216BE641A8A7E20AA89D57E051

[316500.548140] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051

[316505.549947] o2dlm: Begin recovery on domain 
A895BC216BE641A8A7E20AA89D57E051 for node 1

[316505.549951] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 
in domain A895BC216BE641A8A7E20AA89D57E051

[316505.549952] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051

[316510.551734] o2dlm: Begin recovery on domain 
A895BC216BE641A8A7E20AA89D57E051 for node 1

[316510.551739] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 
in domain A895BC216BE641A8A7E20AA89D57E051

[316510.551740] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051

[316515.553543] o2dlm: Begin recovery on domain 
A895BC216BE641A8A7E20AA89D57E051 for node 1

[316515.553547] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 
in domain A895BC216BE641A8A7E20AA89D57E051

[316515.553548] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051

[316520.555337] o2dlm: Begin recovery on domain 
A895BC216BE641A8A7E20AA89D57E051 for node 1

[316520.555341] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 
in domain 

Re: [ceph-users] ceph same rbd on multiple client

2015-10-23 Thread Henrik Korkuc
can you paste dmesg and system logs? I am using 3 node OCFS2 with RBD 
and had no problems.


On 15-10-23 08:40, gjprabu wrote:

Hi Frederic,

   Can you suggest a solution? We have been spending a lot of time trying to 
resolve this issue.


Regards
Prabu




 On Thu, 15 Oct 2015 17:14:13 +0530, Tyler Bishop wrote: 


I don't know enough about OCFS to help. It sounds like you have uncoordinated
concurrent writes, though.

Sent from TypeMail 
On Oct 15, 2015, at 1:53 AM, gjprabu wrote:

Hi Tyler,

   Can please send me the next setup action to be taken on
this issue.

Regards
Prabu


 On Wed, 14 Oct 2015 13:43:29 +0530, gjprabu wrote: 

Hi Tyler,

 Thanks for your reply. We have disabled rbd_cache,
but the issue still persists. Please find our configuration
file below.

# cat /etc/ceph/ceph.conf
[global]
fsid = 944fa0af-b7be-45a9-93ff-b9907cfaee3f
mon_initial_members = integ-hm5, integ-hm6, integ-hm7
mon_host = 192.168.112.192,192.168.112.193,192.168.112.194
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2

[mon]
mon_clock_drift_allowed = .500

[client]
rbd_cache = false
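One way to double-check that a running client really picked up rbd_cache = false is the admin socket, assuming 'admin socket' is configured in the [client] section; the socket path below is only an example:

ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok config show | grep rbd_cache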


--

 cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f
 health HEALTH_OK
 monmap e2: 3 mons at

{integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}
election epoch 480, quorum 0,1,2
integ-hm5,integ-hm6,integ-hm7
 osdmap e49780: 2 osds: 2 up, 2 in
  pgmap v2256565: 190 pgs, 2 pools, 1364 GB data, 410
kobjects
2559 GB used, 21106 GB / 24921 GB avail
 190 active+clean
  client io 373 kB/s rd, 13910 B/s wr, 103 op/s


Regards
Prabu

 On Tue, 13 Oct 2015 19:59:38 +0530, Tyler Bishop wrote: 

You need to disable RBD caching.





Tyler Bishop
Chief Technical Officer
513-299-7108 x10

tyler.bis...@beyondhosting.net








*From: *"gjprabu" >
*To: *"Frédéric Nass" >
*Cc: *">"
>, "Siva Sokkumuthu"
>, "Kamal Kannan
Subramani(kamalakannan)" >
*Sent: *Tuesday, October 13, 2015 9:11:30 AM
*Subject: *Re: [ceph-users] ceph same rbd on multiple
client

Hi,

 We have Ceph RBD with OCFS2 mounted on our servers. We are
facing I/O errors when we move a folder from one node: on the
other nodes the replicated data shows the error below (copying
does not cause any problem). As a workaround, remounting the
partition resolves the issue, but after some time the problem
reoccurs. Please help with this issue.

Note: we have five nodes in total; two nodes are working
fine, while the other nodes show input/output errors like
the ones below on the moved data.

ls -althr
ls: cannot access LITE_3_0_M4_1_TEST: Input/output error
ls: cannot access LITE_3_0_M4_1_OLD: Input/output error
total 0
d? ? ? 

Re: [ceph-users] why was osd pool default size changed from 2 to 3.

2015-10-23 Thread Gregory Farnum
On Fri, Oct 23, 2015 at 8:17 AM, Stefan Eriksson  wrote:
> Hi
>
> I have been looking for info about "osd pool default size" and the reason
> its 3 as default.
>
> I see it got changed in v0.82 from 2 to 3,
>
> Here its 2.
> http://docs.ceph.com/docs/v0.81/rados/configuration/pool-pg-config-ref/
>
> and in v0.82 its 3.
> http://docs.ceph.com/docs/v0.82/rados/configuration/pool-pg-config-ref/
>
> likewise "osd pool default min size" went from 1 to 1.5 which goes up to 2.
> (Default:0, which means no particular minimum. If 0, minimum is size -
> (size / 2).)
>
> I've looked at the changelog for v0.82 but I cant find the reason for this
> change. I'm interested to know why this change was made, I understand 2 is
> less secure, but did something change which made it less secure after v0.82?
>
> It seems pretty ok if you compare it to a RAID5,6
> Openstack users, and other users which host virtual images on ceph, do you
> use the default "osd pool default min size = 3"?

Nothing changed to make two copies less secure. 3 copies is just so
much more secure and is the number that all the companies providing
support recommend, so we changed the default.
(If you're using it for data you care about, you should really use 3 copies!)
-Greg
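For reference, these values can be set as cluster-wide defaults in ceph.conf or changed per pool at runtime; with size 3, the fallback formula quoted above gives min_size = 3 - 3/2 = 2 with integer division. The pool name 'rbd' below is only an example:

# change an existing pool at runtime
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2

# defaults for newly created pools, in ceph.conf
[global]
osd pool default size = 3
osd pool default min size = 2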
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com