Re: [ceph-users] Status of SAMBA VFS

2015-02-06 Thread Gregory Farnum
On Fri, Feb 6, 2015 at 7:11 AM, Dennis Kramer (DT) den...@holmes.nl wrote:

 On Fri, 6 Feb 2015, Gregory Farnum wrote:

 On Fri, Feb 6, 2015 at 6:39 AM, Dennis Kramer (DT) den...@holmes.nl
 wrote:

 I've used the upstream module for our production cephfs cluster, but I've
 noticed a bug where timestamps aren't being updated correctly. Modified
 files are being reset to the beginning of Unix time.

 It looks like this bug only manifests itself in applications like MS Office
 where extra metadata is added to files. If I, for example, modify a text
 file in Notepad, everything works fine, but when I modify a .docx (or .xls
 for that matter), the timestamp gets reset to 1-1-1970.
 You can imagine that this could be a real dealbreaker for production use
 (think of backups/rsyncs based on mtimes, which will be rendered useless).

 Furthermore, the return values for free/total disk space are also not
 working correctly when you mount a share in Windows. My 340TB cluster had
 7.3EB of storage available in Windows ;) This could be fixed with a
 workaround by using a custom "dfree command" script in smb.conf, but the
 VFS module will override this, so the script will not work (unless you
 remove the lines of code for these disk operations in vfs_ceph.c).

 My experience with the VFS module is pretty awesome nonetheless. I really
 noticed an improvement in throughput when using this module instead of a
 re-export via the kernel client. So I hope the VFS module will be actively
 maintained again soon.


 Can you file bugs for these? The timestamp one isn't anything I've
 heard of before.
 The weird free space on Windows actually does sound familiar; I think
 it has to do with either Windows or the Samba/Windows interface not
 handling our odd block sizes properly...
 -Greg


 Sure, just point me in the right direction for these bug reports.

 It's true, BTW; IIRC Windows defaults to a 1024k block size for calculating
 the free/total space, but this could be managed by the VFS module. Windows
 only expects two mandatory values, available and total space in bytes, and
 optionally the block size as a third value.

You can just report them at http://tracker.ceph.com/projects/cephfs.
-Greg


[ceph-users] Compilation problem

2015-02-06 Thread David J. Arias
Hello!

I am the sysadmin for a small IT consulting enterprise in México.

We are trying to integrate three servers running RHEL 5.9 into a new
Ceph cluster.

I downloaded the source code and tried compiling it, though I got stuck
with the requirements for leveldb and libblkid.

The versions installed by the OS are behind the recommended ones, so I am
wondering if it is possible to compile updated ones from source, install
them in another location (/usr/local/{}) and use those for Ceph.
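
In principle, that kind of out-of-tree dependency build would look roughly
like the sketch below. The prefix and the leveldb version are illustrative
assumptions on my part, and whether Ceph's configure then picks everything
up cleanly on RHEL 5.9 is exactly the open question:

  # build leveldb into a private prefix (version here is just an example)
  tar xzf leveldb-1.15.0.tar.gz && cd leveldb-1.15.0
  make
  cp -a libleveldb.* /usr/local/lib/
  cp -a include/leveldb /usr/local/include/

  # point Ceph's autoconf build at that prefix
  cd /path/to/ceph
  ./configure CPPFLAGS="-I/usr/local/include" \
              LDFLAGS="-L/usr/local/lib -Wl,-rpath,/usr/local/lib" \
              PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

The same pattern should apply to a newer libblkid built into the same prefix.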

Upgrading the OS is difficult (although not impossible), since these are
production servers which hold critical applications, and some of those
are legacy ones :-(

I tried googling around but had no luck as to how to accomplish this;
./configure --help doesn't show a way, and I tried --system-root
without success.

I am following the instructions from:

https://wiki.ceph.com/FAQs/What_Kind_of_OS_Does_Ceph_Require%3F
http://docs.ceph.com/docs/master/install/install-storage-cluster/#installing-a-build
http://docs.ceph.com/docs/master/install/#get-software
http://wiki.ceph.com/FAQs

The only data I've found so far, although related, doesn't really apply to
my case:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-July/041683.html
http://article.gmane.org/gmane.comp.file-systems.ceph.user/3010/match=redhat+5.9

Any help/ideas/pointers would be great.


-- 
Saludos
David J. Arias López M.
---
Toto, I don't think we're in Kansas anymore. -- Judy Garland, Wizard of
Oz 
---





Re: [ceph-users] updation of container and account while using Swift API

2015-02-06 Thread Abhishek L

Hi 

pragya jain writes:

 Hello all!
 I have some basic questions about the process followed by Ceph
 software when a user uses the Swift APIs for accessing its storage.

 1. According to my understanding, to keep the object listing in
 containers and the container listing in an account, Ceph software
 maintains different pools for accounts and containers. To what extent
 is this right?

Yes, they are maintained in different pools.

The .rgw pool stores the buckets, and the users/accounts are stored in
the .users (or .users.uid) pool. IIRC, the list of buckets per user is
stored as omap keys of the user object.

 2. When a user uploads an object using the Swift APIs, what procedure
 does Ceph software follow to update the object listing and the bytes used
 in the container and account? Please help in this regard.

The objects are stored in the .rgw.buckets pool; a bucket stores the list
of objects it contains as omap keys of the bucket object. So that would be
the place to look.
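
If you want to poke at this directly, something like the following should
show those structures. This is only a sketch based on the default pool
layout mentioned above; the exact object names (the "<uid>.buckets" user
object and the ".dir.<bucket marker>" index object) and whether the index
lives in .rgw.buckets or a separate .rgw.buckets.index pool depend on your
version:

  # buckets and users live in their own pools
  rados -p .rgw ls
  rados -p .users.uid ls
  # per-user bucket list: omap keys on the "<uid>.buckets" object
  rados -p .users.uid listomapkeys <uid>.buckets
  # per-bucket object list: omap keys on the ".dir.<bucket marker>" index object
  rados -p .rgw.buckets listomapkeys .dir.<bucket marker>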

I'm not sure where the size info for objects/buckets is stored when doing a
HEAD on the Swift account, though it would be interesting to know.

I had written some notes on this, mostly from different mailing list
discussions in ceph-devel[1], though they are not as up to date as I wanted
them to be.

[1] https://github.com/theanalyst/notes/blob/master/rgw.org#buckets 


 Thank you.
 Regards,
 Pragya Jain
 Department of Computer Science
 University of Delhi
 Delhi, India

-- 
Abhishek




Re: [ceph-users] Ceph Supermicro hardware recommendation

2015-02-06 Thread Mohamed Pakkeer
Hi all,

We are building an EC cluster with a cache tier for CephFS. We are planning
to use the following 1U chassis, along with Intel SSD DC S3700s, for the
cache tier. It has 10 x 2.5" slots. Could you recommend a suitable Intel
processor and amount of RAM to cater for 10 SSDs?

http://www.supermicro.com/products/system/1U/1028/SYS-1028R-WTRT.cfm


Regards

K.Mohamed Pakkeer



On Fri, Feb 6, 2015 at 2:57 PM, Stephan Seitz s.se...@heinlein-support.de
wrote:

 Hi,

 On Tuesday, 03.02.2015 at 15:16, Colombo Marco wrote:
  Hi all,
  I have to build a new Ceph storage cluster. After I've read the
  hardware recommendations and some mail from this mailing list, I would
  like to buy these servers:

 just FYI:

 SuperMicro already focuses on Ceph with a product line:
 http://www.supermicro.com/solutions/datasheet_Ceph.pdf
 http://www.supermicro.com/solutions/storage_ceph.cfm



 regards,


 Stephan Seitz

 --

 Heinlein Support GmbH
 Schwedter Str. 8/9b, 10119 Berlin

 http://www.heinlein-support.de

 Tel: 030 / 405051-44
 Fax: 030 / 405051-19

 Mandatory disclosures per §35a GmbHG: HRB 93818 B / District Court
 Berlin-Charlottenburg,
 Managing Director: Peer Heinlein -- Registered office: Berlin






-- 
Thanks & Regards
K.Mohamed Pakkeer
Mobile- 0091-8754410114


[ceph-users] Introducing Learning Ceph : The First ever Book on Ceph

2015-02-06 Thread Karan Singh
Hello Community Members

I am happy to introduce the first book on Ceph with the title “Learning Ceph”. 

Many folks from the publishing house, the technical reviewers, and I spent
several months getting this book compiled and published.

Finally, the book is up for sale; I hope you will like it and will surely
learn a lot from it.

Amazon : http://www.amazon.com/Learning-Ceph-Karan-Singh/dp/1783985623/ref=sr_1_1?s=booksie=UTF8qid=1423174441sr=1-1keywords=ceph
Packtpub : https://www.packtpub.com/application-development/learning-ceph

You can grab the sample copy from here : https://www.dropbox.com/s/ek76r01r9prs6pb/Learning_Ceph_Packt.pdf?dl=0

Finally, I would like to express my sincere thanks to:

Sage Weil - for developing Ceph and everything around it, as well as writing
the foreword for “Learning Ceph”.
Patrick McGarry - for his usual off-the-track support, as always.

Last but not least, thanks to our great community members, who are also
reviewers of the book: Don Talton, Julien Recurt, Sebastien Han and Zihong
Chen. Thank you guys for your efforts.



Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/






[ceph-users] Status of SAMBA VFS

2015-02-06 Thread Dennis Kramer (DT)

Hi,

Is the Samba VFS module for CephFS actively maintained at this moment?
I haven't seen many updates in the ceph/samba git repo.

With regards,



[ceph-users] 0.80.8 ReplicationPG Fail

2015-02-06 Thread Irek Fasikhov
This morning I found that some OSDs had dropped out of the cache tier pool.
Maybe it's a coincidence, but a rollback was running at that point.

2015-02-05 23:23:18.231723 7fd747ff1700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7fd747ff1700

 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
 1: /usr/bin/ceph-osd() [0x9bde51]
 2: (()+0xf710) [0x7fd766f97710]
 3: (std::_Rb_tree_decrement(std::_Rb_tree_node_base*)+0xa) [0x7fd7666c1eca]
 4: (ReplicatedPG::make_writeable(ReplicatedPG::OpContext*)+0x14c) [0x87cd5c]
 5: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x1db) [0x89d29b]
 6: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xcd4) [0x89e0f4]
 7: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x2ca5) [0x8a2a55]
 8: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x5b1) [0x832251]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x37c) [0x61344c]
 10: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x63d) [0x6472ad]
 11: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x67dcde]
 12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0xa2a181]
 13: (ThreadPool::WorkThread::entry()+0x10) [0xa2d260]
 14: (()+0x79d1) [0x7fd766f8f9d1]
 15: (clone()+0x6d) [0x7fd765f088fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
 to interpret this.

Are there any ideas? Thanks.

http://tracker.ceph.com/issues/10778
-- 
Best regards, Irek Fasikhov
Mob.: +79229045757


Re: [ceph-users] Status of SAMBA VFS

2015-02-06 Thread Dennis Kramer (DT)
I've used the upstream module for our production cephfs cluster, but I've
noticed a bug where timestamps aren't being updated correctly.
Modified files are being reset to the beginning of Unix time.


It looks like this bug only manifests itself in applications like MS Office
where extra metadata is added to files. If I, for example, modify a text
file in Notepad, everything works fine, but when I modify a .docx (or
.xls for that matter), the timestamp gets reset to 1-1-1970.
You can imagine that this could be a real dealbreaker for production use
(think of backups/rsyncs based on mtimes, which will be rendered useless).


Furthermore, the return values for free/total disk space are also not
working correctly when you mount a share in Windows. My 340TB cluster had
7.3EB of storage available in Windows ;) This could be fixed with a
workaround by using a custom "dfree command" script in smb.conf,
but the VFS module will override this, so the script will not work (unless
you remove the lines of code for these disk operations in vfs_ceph.c).


My experience with the VFS module is pretty awesome nonetheless. I really
noticed an improvement in throughput when using this module instead of a
re-export via the kernel client. So I hope the VFS module will be
actively maintained again soon.



On Fri, 6 Feb 2015, Sage Weil wrote:


On Fri, 6 Feb 2015, Dennis Kramer (DT) wrote:

Hi,

Is the Samba VFS module for CephFS actively maintained at this moment?
I haven't seen many updates in the ceph/samba git repo.


You should really ignore the ceph/samba fork; it isn't used.  The Ceph VFS
driver is upstream in Samba and maintained there.

That said, it isn't being actively developed at the moment, but I'm hoping
to change that shortly!  We do some basic nightly testing in the ceph lab
but I'd be very interested in hearing about users' experiences.

Thanks!
sage






[ceph-users] parsing ceph -s and how much free space, really?

2015-02-06 Thread pixelfairy
Here's the output of 'ceph -s' from a KVM instance running as a ceph node.
All 3 nodes are monitors, each with six 4 GB OSDs.

mon_osd_full ratio: .611
mon_osd_nearfull ratio: .60

What is the 23689 MB "used"? Is that a buffer because of the mon_osd_full
ratio?

Is there a way to query a pool for how much usable space is really
available to clients? For example, in this case: 3 nodes with 6 OSDs each,
4G per OSD = 72G, so with a replica size of 3 I'd like to see something
that says close to 20G available, 1.7G in use.

ceph3:~# ceph -s
cluster 2198abdb-2669-438a-8673-fc4f226a226c
 health HEALTH_OK
 monmap e1: 3 mons at
{ceph1=172.21.0.31:6789/0,ceph2=172.21.0.32:6789/0,ceph3=172.21.0.33:6789/0},
election epoch 16, quorum 0,1,2 ceph1,ceph2,ceph3
 osdmap e104: 18 osds: 18 up, 18 in
  pgmap v5557: 600 pgs, 1 pools, 1694 MB data, 432 objects
23689 MB used, 49858 MB / 73548 MB avail
 600 active+clean
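
A rough way to get at the client-usable number, assuming the pool really is
running with a replication size of 3 (the "rbd" pool name below is the
default and an assumption on my part):

  # raw cluster and per-pool usage
  ceph df
  # confirm the replica count
  ceph osd pool get rbd size
  # back-of-the-envelope, from the ceph -s figures above:
  #   49858 MB raw avail / 3 replicas ~= 16600 MB usable
  #   1694 MB of data * 3 replicas    ~=  5082 MB of the raw "used"

The remainder of the "used" figure is presumably per-OSD overhead (journals,
filesystem metadata), which looms large on 4 GB OSDs; newer releases also
add a per-pool MAX AVAIL column to `ceph df` that accounts for replication.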


Re: [ceph-users] Introducing Learning Ceph : The First ever Book on Ceph

2015-02-06 Thread pixelfairy
Congrats!

Page 17: Xen is spelled with an X, not a Z.

On Fri, Feb 6, 2015 at 1:17 AM, Karan Singh karan.si...@csc.fi wrote:
 Hello Community Members

 I am happy to introduce the first book on Ceph with the title “Learning
 Ceph”.

 Many folks from the publishing house, the technical reviewers, and I spent
 several months getting this book compiled and published.

 Finally, the book is up for sale; I hope you will like it and will surely
 learn a lot from it.

 Amazon :
 http://www.amazon.com/Learning-Ceph-Karan-Singh/dp/1783985623/ref=sr_1_1?s=booksie=UTF8qid=1423174441sr=1-1keywords=ceph
 Packtpub : https://www.packtpub.com/application-development/learning-ceph

 You can grab the sample copy from here :
 https://www.dropbox.com/s/ek76r01r9prs6pb/Learning_Ceph_Packt.pdf?dl=0

 Finally, I would like to express my sincere thanks to:

 Sage Weil - for developing Ceph and everything around it, as well as
 writing the foreword for “Learning Ceph”.
 Patrick McGarry - for his usual off-the-track support, as always.

 Last but not least, thanks to our great community members, who are also
 reviewers of the book: Don Talton, Julien Recurt, Sebastien Han and Zihong
 Chen. Thank you guys for your efforts.


 
 Karan Singh
 Systems Specialist , Storage Platforms
 CSC - IT Center for Science,
 Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
 mobile: +358 503 812758
 tel. +358 9 4572001
 fax +358 9 4572302
 http://www.csc.fi/
 





[ceph-users] journal placement for small office?

2015-02-06 Thread pixelfairy
3 nodes, each with 2x1TB in a RAID (for /) and 6x4TB for storage. All of
this will be used for block devices for KVM instances. Typical office
stuff: databases, file servers, internal web servers, a couple dozen thin
clients. Not using the object store or CephFS.

I was thinking about putting the journals on the root disk (this is how my
virtual cluster works, because in that version the OSDs are 4G instead of
4TB) and keeping that on its current RAID 1 for resiliency, but I'm worried
about creating a performance bottleneck. I'm tempted to swap these out for
SSDs; if so, how big should I get? Is 1/2TB enough?

The other thought was small journal partitions on each OSD. We're doing XFS
because I don't know enough about btrfs to feel comfortable with it. Would
the performance degradation be worse?

is there a better way?
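
For reference, a minimal sketch of the two layouts being weighed, as
ceph.conf fragments; the device paths and the 5 GB size are illustrative
assumptions, not recommendations:

  # (a) journals on a separate SSD (or the RAID-1 root device),
  #     one small partition per OSD
  [osd.0]
      osd journal = /dev/disk/by-partlabel/journal-osd0
  [osd.1]
      osd journal = /dev/disk/by-partlabel/journal-osd1

  # (b) default layout: journal co-located on each OSD's own disk
  [osd]
      # journal size in MB
      osd journal size = 5120

With co-located journals every write hits the same spindle twice, which is
the usual source of the degradation mentioned above; a dedicated journal
device typically only needs a few GB per OSD rather than hundreds of GB.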


[ceph-users] replica or erasure coding for small office?

2015-02-06 Thread pixelfairy
Is there any reliability trade-off with erasure coding vs a replica size of 3?

How would you get the most out of 6x4TB OSDs in 3 nodes?
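
For the capacity side of that trade-off (reliability aside), a rough worked
example assuming the 3 x 6 x 4TB layout above and ignoring full-ratio
headroom and filesystem overhead:

  raw:             3 hosts x 6 OSDs x 4TB = 72TB
  replica size 3:  72TB / 3               = 24TB usable
  EC k=4, m=2:     72TB x 4/6             = 48TB usable

Whether either layout survives a full host failure on only 3 nodes depends
on the CRUSH rule; see the erasure-code thread elsewhere in this digest.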


[ceph-users] Replacing an OSD Drive

2015-02-06 Thread Gaylord Holder

When the time comes to replace an OSD, I've used the following procedure:

1) Stop/down/out the OSD and replace the drive
2) Create the ceph OSD directory: ceph-osd -i N --mkfs
3) Copy the OSD key out of the authorized keys list
4) ceph osd crush rm osd.N
5) ceph osd crush add osd.N $osd_size root=default host=$(hostname -s)
6) ceph osd in osd.N
7) service ceph start osd.N
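
Concretely, for a hypothetical osd.40 with a 2.73 TB drive, those steps come
out roughly as below; the ID, weight, and the auth command are illustrative
and depend on how your keys are managed:

  service ceph stop osd.40
  ceph osd down osd.40 && ceph osd out osd.40
  # ...replace the drive, recreate and mount the filesystem...
  ceph-osd -i 40 --mkfs --mkkey
  ceph auth add osd.40 osd 'allow *' mon 'allow rwx' \
      -i /var/lib/ceph/osd/ceph-40/keyring
  ceph osd crush rm osd.40
  ceph osd crush add osd.40 2.73 root=default host=$(hostname -s)
  ceph osd in osd.40
  service ceph start osd.40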

If I don't do steps 4 and 5, the osd process times out in futex:

[pid 22822] futex(0x4604cc4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME,
98, {1423237460, 296281000}, <unfinished ...>

[pid 22821] futex(0x4604cc0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22822] <... futex resumed> ) = -1 EAGAIN (Resource temporarily
unavailable)


Upping the debugging only shows:

2015-02-06 10:48:22.656012 7f9acf967700 20 osd.40 396 update_osd_stat 
osd_stat(62060 kB used, 2793 GB avail, 2793 GB total, peers []/[] op 
hist [])
2015-02-06 10:48:22.656025 7f9acf967700  5 osd.40 396 heartbeat: 
osd_stat(62060 kB used, 2793 GB avail, 2793 GB total, peers []/[] op 
hist [])

2015-02-06 10:48:23.356299 7f9ae76c7700  5 osd.40 396 tick
2015-02-06 10:48:23.356308 7f9ae76c7700 10 osd.40 396 do_waiters -- start
2015-02-06 10:48:23.356310 7f9ae76c7700 10 osd.40 396 do_waiters -- finish
2015-02-06 10:48:24.356114 7f9acf967700 20 osd.40 396 update_osd_stat 
osd_stat(62060 kB used, 2793 GB avail, 2793 GB total, peers []/[] op 
hist [])


in the osd log file.

What is ceph-osd doing that re-creating the OSD in the CRUSH map changes?

Thanks for any enlightenment on this.
-Gaylord



Re: [ceph-users] erasure code : number of chunks for a small cluster ?

2015-02-06 Thread Hector Martin
On 06/02/15 21:07, Udo Lembke wrote:
 On 06.02.2015 09:06, Hector Martin wrote:
 On 02/02/15 03:38, Udo Lembke wrote:
  With 3 hosts only you can't survive a full node failure, because for
  that you need
  hosts >= k + m.

 Sure you can. k=2, m=1 with the failure domain set to host will survive
 a full host failure.

 
 Hi,
 Alexandre has the requirement of two failed disks or one full node failure.
 This is the reason why I wrote that this is not possible...

But it is, I just explained how that can be achieved with only 3 nodes,
with k=4, m=2, and a custom CRUSH rule. Placing precisely two chunks on
each host, on two distinct OSDs, satisfies this requirement: any two
OSDs can fail (leaving 4/6 chunks) or any host can fail (again leaving
4/6 chunks).
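
As a rough sketch of how that might be set up (the profile, pool name and PG
count here are made up, and the two-chunks-per-host rule itself still has to
be added to the CRUSH map by hand, e.g. by decompiling and recompiling it
with crushtool):

  # erasure-code profile with k=4, m=2
  ceph osd erasure-code-profile set ec42 k=4 m=2 ruleset-failure-domain=osd
  ceph osd erasure-code-profile get ec42
  # pool using that profile
  ceph osd pool create ecpool 256 256 erasure ec42
  # then point the pool at the custom rule once it is in the map
  ceph osd pool set ecpool crush_ruleset <ruleset id>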

-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc


Re: [ceph-users] Status of SAMBA VFS

2015-02-06 Thread Dennis Kramer (DT)


On Fri, 6 Feb 2015, Gregory Farnum wrote:


On Fri, Feb 6, 2015 at 6:39 AM, Dennis Kramer (DT) den...@holmes.nl wrote:

I've used the upstream module for our production cephfs cluster, but I've
noticed a bug where timestamps aren't being updated correctly. Modified
files are being reset to the beginning of Unix time.

It looks like this bug only manifests itself in applications like MS Office
where extra metadata is added to files. If I, for example, modify a text file
in Notepad, everything works fine, but when I modify a .docx (or .xls for
that matter), the timestamp gets reset to 1-1-1970.
You can imagine that this could be a real dealbreaker for production use
(think of backups/rsyncs based on mtimes, which will be rendered useless).

Furthermore, the return values for free/total disk space are also not working
correctly when you mount a share in Windows. My 340TB cluster had 7.3EB
of storage available in Windows ;) This could be fixed with a workaround by
using a custom "dfree command" script in smb.conf, but the VFS module will
override this, so the script will not work (unless you remove the
lines of code for these disk operations in vfs_ceph.c).

My experience with the VFS module is pretty awesome nonetheless. I really
noticed an improvement in throughput when using this module instead of a
re-export via the kernel client. So I hope the VFS module will be
actively maintained again soon.


Can you file bugs for these? The timestamp one isn't anything I've
heard of before.
The weird free space on Windows actually does sound familiar; I think
it has to do with either Windows or the Samba/Windows interface not
handling our odd block sizes properly...
-Greg



Sure, just point me in the right direction for these bug reports.

It's true, BTW; IIRC Windows defaults to a 1024k block size for calculating
the free/total space, but this could be managed by the VFS module.
Windows only expects two mandatory values, available and total space in
bytes, and optionally the block size as a third value.
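
For reference, a minimal sketch of the dfree workaround described above,
assuming a hypothetical /usr/local/bin/cephfs-dfree helper and a CephFS
mount that is visible to df (and, as noted, that the disk-usage calls in
vfs_ceph.c are removed so Samba actually honours it):

  # smb.conf (global or per-share)
  dfree command = /usr/local/bin/cephfs-dfree

  # /usr/local/bin/cephfs-dfree
  #!/bin/sh
  # Samba passes the queried directory as $1; print total and available
  # space in 1K blocks, plus an optional third value for the block size.
  df -P -k "${1:-.}" | awk 'NR==2 { print $2, $4, 1024 }'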



Re: [ceph-users] erasure code : number of chunks for a small cluster ?

2015-02-06 Thread Udo Lembke
On 06.02.2015 09:06, Hector Martin wrote:
 On 02/02/15 03:38, Udo Lembke wrote:
 With 3 hosts only you can't survive a full node failure, because for
 that you need
 hosts >= k + m.
 
 Sure you can. k=2, m=1 with the failure domain set to host will survive
 a full host failure.
 

Hi,
Alexandre has the requirement of two failed disks or one full node failure.
This is the reason why I wrote that this is not possible...

Udo


Re: [ceph-users] erasure code : number of chunks for a small cluster ?

2015-02-06 Thread Alexandre DERUMIER
Oh, I hadn't thought about this.

Thanks, Hector!


- Original Message -
From: Hector Martin hec...@marcansoft.com
To: ceph-users ceph-users@lists.ceph.com
Sent: Friday, 6 February 2015 09:06:29
Subject: Re: [ceph-users] erasure code : number of chunks for a small cluster ?

On 02/02/15 03:38, Udo Lembke wrote:
 With 3 hosts only you can't survive a full node failure, because for
 that you need
 hosts >= k + m.

Sure you can. k=2, m=1 with the failure domain set to host will survive 
a full host failure. 

Configuring an encoding that survives one full host failure or two OSDs 
anywhere on the cluster is possible. Use k=4, m=2, then define a CRUSH 
rule like this: 

step take default 
step choose indep 3 type host 
step choose indep 2 type osd 
step emit 

That will ensure that for each PG, each host gets two chunks on two 
independent OSDs. That means that you can lose any pair of OSDs (since 
no PG will have two chunks on the same OSD, and the encoding can survive 
a two-chunk loss). You can also lose any host, which will cause the loss 
of exactly two chunks for every PG. 

Of course, with a setup like this, if you lose a host, the cluster will
be degraded until you can bring the host back, and will not be able to
recover those chunks anywhere (since the ruleset prevents it), so any
further failure of an OSD while a host is down will necessarily lose data.

-- 
Hector Martin (hec...@marcansoft.com) 
Public Key: https://marcan.st/marcan.asc 



Re: [ceph-users] Status of SAMBA VFS

2015-02-06 Thread Gregory Farnum
On Fri, Feb 6, 2015 at 6:39 AM, Dennis Kramer (DT) den...@holmes.nl wrote:
 I've used the upstream module for our production cephfs cluster, but I've
 noticed a bug where timestamps aren't being updated correctly. Modified
 files are being reset to the beginning of Unix time.

 It looks like this bug only manifests itself in applications like MS Office
 where extra metadata is added to files. If I, for example, modify a text file
 in Notepad, everything works fine, but when I modify a .docx (or .xls for
 that matter), the timestamp gets reset to 1-1-1970.
 You can imagine that this could be a real dealbreaker for production use
 (think of backups/rsyncs based on mtimes, which will be rendered useless).

 Furthermore, the return values for free/total disk space are also not working
 correctly when you mount a share in Windows. My 340TB cluster had 7.3EB
 of storage available in Windows ;) This could be fixed with a workaround by
 using a custom "dfree command" script in smb.conf, but the VFS module will
 override this, so the script will not work (unless you remove the
 lines of code for these disk operations in vfs_ceph.c).

 My experience with the VFS module is pretty awesome nonetheless. I really
 noticed an improvement in throughput when using this module instead of a
 re-export via the kernel client. So I hope the VFS module will be
 actively maintained again soon.

Can you file bugs for these? The timestamp one isn't anything I've
heard of before.
The weird free space on Windows actually does sound familiar; I think
it has to do with either Windows or the Samba/Windows interface not
handling our odd block sizes properly...
-Greg