[ceph-users] help,ceph stuck in pg creating and never end

2015-01-16 Thread wrong
Hi all,
My Ceph cluster is stuck while re-creating PGs; the relevant information is below.
Version 0.9, OS CentOS 7.0.
How can I analyse this further? Thanks.
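For reference, the commands I am using to dig further (2.43 is one of the
stuck PGs shown below; note that a pg query may hang while the PG has no
acting OSDs):

ceph health detail
ceph pg dump_stuck inactive
ceph pg 2.43 query
ceph osd crush rule dump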
[ceph@dev140 ~]$ ceph -s
cluster 84d08382-9f31-4d1b-9870-75e6975f69fe
 health HEALTH_WARN
9 pgs stuck inactive
9 pgs stuck unclean
13 requests are blocked > 32 sec
recovery 926/26092 unfound (3.549%)
mds 0 is laggy
 monmap e3: 3 mons at 
{dev140=192.168.193.140:6789/0,dev141=192.168.193.141:6789/0,dev144=192.168.193.144:6789/0}
election epoch 8, quorum 0,1,2 dev140,dev141,dev144
 mdsmap e7: 1/1/1 up {0=0=up:active(laggy or crashed)}
 osdmap e1472: 7 osds: 7 up, 7 in
  pgmap v12486: 364 pgs, 4 pools, 101 GB data, 26092 objects
210 GB used, 1082 GB / 1293 GB avail
926/26092 unfound (3.549%)
   9 creating
 353 active+clean
   2 active+clean+scrubbing+deep



[ceph@dev140 ~]$ ceph osd tree
# id    weight  type name       up/down reweight
-1      7       root default
-2      2               host dev141
0       1                       osd.0   up      1
3       1                       osd.3   up      1
-3      2               host dev143
1       1                       osd.1   up      1
4       1                       osd.4   up      1
-4      1               host dev140
2       1                       osd.2   up      1
-5      2               host dev144
5       1                       osd.5   up      1
6       1                       osd.6   up      1



[ceph@dev140 ~]$ ceph pg dump | grep creating
dumped all in format plain
2.43    0   0   0   0   0   0   0   0   creating    2015-01-14 08:54:45.286672  0'0 0:0 []  -1  []  -1  0'0 0.00    0'0 0.00
0.32    0   0   0   0   0   0   0   0   creating    2015-01-14 08:55:07.210229  0'0 0:0 []  -1  []  -1  0'0 0.00    0'0 0.00
0.30    0   0   0   0   0   0   0   0   creating    2015-01-14 08:55:10.775056  0'0 0:0 []  -1  []  -1  0'0 0.00    0'0 0.00
0.29    0   0   0   0   0   0   0   0   creating    2015-01-14 08:54:58.151515  0'0 0:0 []  -1  []  -1  0'0 0.00    0'0 0.00
0.20    0   0   0   0   0   0   0   0   creating    2015-01-14 08:54:30.055803  0'0 0:0 []  -1  []  -1  0'0 0.00    0'0 0.00
0.15    0   0   0   0   0   0   0   0   creating    2015-01-14 08:46:23.051705  0'0 0:0 []  -1  []  -1  0'0 0.00    0'0 0.00
0.14    0   0   0   0   0   0   0   0   creating    2015-01-14 08:54:52.968815  0'0 0:0 []  -1  []  -1  0'0 0.00    0'0 0.00
0.d     0   0   0   0   0   0   0   0   creating    2015-01-13 18:22:33.626593  0'0 0:0 []  -1  []  -1  0'0 0.00    0'0 0.00
0.6     0   0   0   0   0   0   0   0   creating    2015-01-14 08:55:02.627555  0'0 0:0 []  -1  []  -1  0'0 0.00    0'0 0.00
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Adding monitors to osd nodes failed

2015-01-16 Thread Hoc Phan
Hi, I am adding two monitors on osd0 (node2) and osd1 (node3), following the "ADD
MONITORS" step at http://ceph.com/docs/master/start/quick-ceph-deploy/
But it failed to create the 2 monitors: http://pastebin.com/aSPwKs0H
Can you help me understand why?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread JM
Hi Roland,

You should tune your Ceph CRUSH map with a custom rule in order to do that
(write first to s3 and then to the others). This custom rule will then be
applied to your Proxmox pool.
(What you want to do is only worthwhile if you run the VMs from host s3.)

Can you give us your crushmap?
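To give you an idea, such a rule would look roughly like this once the map is
decompiled (bucket names and replica counts are placeholders; we would adapt
it to your actual map):

rule s3_first {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        # first replica from host s3
        step take s3
        step chooseleaf firstn 1 type osd
        step emit
        # remaining replicas from the rest of the cluster
        # (note: 'default' still contains s3, so in practice you would take a
        # bucket that excludes s3 to avoid picking the same OSD twice)
        step take default
        step chooseleaf firstn -1 type host
        step emit
}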



2015-01-13 22:03 GMT+01:00 Roland Giesler :

> I have a 4 node ceph cluster, but the disks are not equally distributed
> across all machines (they are substantially different from each other)
>
> One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3)
> and two machines have only two 1 TB drives each (s2 & s1).
>
> Now machine s3 has by far the most CPU's and RAM, so I'm running my VM's
> mostly from there, but I want to make sure that the writes that happen to
> the ceph cluster get written to the "local" osd's on s3 first and then the
> additional writes/copies get done to the network.
>
> Is this possible with ceph.  The VM's are KVM in Proxmox in case it's
> relevant.
>
> regards
>
>
> *Roland *
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Spark/Mesos on top of Ceph/Btrfs

2015-01-16 Thread wireless

On 01/14/2015 07:37 AM, John Spray wrote:

On Tue, Jan 13, 2015 at 1:25 PM, James  wrote:

I was wondering if anyone has Mesos running on top of Ceph?
I want to test/use Ceph in lieu of HDFS.


You might be interested in http://ceph.com/docs/master/cephfs/hadoop/

It allows you to expose CephFS to applications that expect HDFS.

However, as I understand it HDFS is optional with Mesos anyway, so
it's not completely clear what you're trying to accomplish.

Cheers,
John




Hello one and all,

I am supposed to be able to post to this group via gmane, but
I'm not seeing the postings via gmane:

http://news.gmane.org/gmane.comp.file-systems.ceph.user

Maybe my application to this group did not get processed?


Long version (hopefully clearer):

I want a distributed, heterogeneous cluster, without Hadoop. Spark
(in-memory) processing [1] of large FEM [2] (Finite Element [math]
Methods) problems is the daunting application that will be used in all sorts
of scientific simulations with very large datasets. This will also include
rendering some very complex 3D video/simulations of fluid-type flows
[3]. Hopefully the simulations will be computed and rendered in real
time (less than 200 ms of latency). Surely other massive types of
scientific simulations can benefit from Spark/Mesos/cephfs/btrfs, imho.


Being able to use the cluster for routine distcc compilations,
Continuous Integration [4], log-file processing, security scans and most
other forms of routine server usage is of keen interest too.



Eventually, the cluster(s) will use GPUs, x64 and the new
ARM 64-bit processors, all with as much RAM as possible. This is a long
journey, but I believe that cephfs on top of btrfs will eventually 
mature into the robust solution that is necessary.


The other portions of the solution, like distributed features (Chronos,
ansible/puppet/chef, DBs, etc.), will also be needed, but there does
seem to be an abundance of choices for those needs; so discussion
is warmly received in these areas too, as they relate to cephfs/btrfs.


Cephfs  on top of Btrfs is the most challenging part of this journey so
far. I use openrc on gentoo, and have no interest in systemd, just so
you know.


James

[1] https://amplab.cs.berkeley.edu/

[2] http://dune.mathematik.uni-freiburg.de/

[3] http://www.opengeosys.org/

[4] http://www.zentoo.org/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] subscribe

2015-01-16 Thread wireless

subscribe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs modification time

2015-01-16 Thread 严正
I tracked down the bug. Please try the attached patch

Regards
Yan, Zheng




patch
Description: Binary data

> On 13 January 2015, at 07:40, Gregory Farnum  wrote:
> 
> Zheng, this looks like a kernel client issue to me, or else something
> funny is going on with the cap flushing and the timestamps (note how
> the reading client's ctime is set to an even second, while the mtime
> is ~.63 seconds later and matches what the writing client sees). Any
> ideas?
> -Greg
> 
> On Mon, Jan 12, 2015 at 12:19 PM, Lorieri  wrote:
>> Hi Gregory,
>> 
>> 
>> $ uname -a
>> Linux coreos2 3.17.7+ #2 SMP Tue Jan 6 08:22:04 UTC 2015 x86_64
>> Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz GenuineIntel GNU/Linux
>> 
>> 
>> Kernel Client, using  `mount -t ceph ...`
>> 
>> 
>> core@coreos2 /var/run/systemd/system $ modinfo ceph
>> filename:   /lib/modules/3.17.7+/kernel/fs/ceph/ceph.ko
>> license:GPL
>> description:Ceph filesystem for Linux
>> author: Patience Warnick 
>> author: Yehuda Sadeh 
>> author: Sage Weil 
>> alias:  fs-ceph
>> depends:libceph
>> intree: Y
>> vermagic:   3.17.7+ SMP mod_unload
>> signer: Magrathea: Glacier signing key
>> sig_key:D4:BB:DE:E9:C6:D8:FC:90:9F:23:59:B2:19:1B:B8:FA:57:A1:AF:D2
>> sig_hashalgo:   sha256
>> 
>> core@coreos2 /var/run/systemd/system $ modinfo libceph
>> filename:   /lib/modules/3.17.7+/kernel/net/ceph/libceph.ko
>> license:GPL
>> description:Ceph filesystem for Linux
>> author: Patience Warnick 
>> author: Yehuda Sadeh 
>> author: Sage Weil 
>> depends:libcrc32c
>> intree: Y
>> vermagic:   3.17.7+ SMP mod_unload
>> signer: Magrathea: Glacier signing key
>> sig_key:D4:BB:DE:E9:C6:D8:FC:90:9F:23:59:B2:19:1B:B8:FA:57:A1:AF:D2
>> sig_hashalgo:   sha256
>> 
>> 
>> 
>> ceph is installed in ubuntu containers (same kernel):
>> 
>> $ dpkg -l |grep ceph
>> 
>> ii  ceph 0.87-1trusty
>> amd64distributed storage and file system
>> ii  ceph-common  0.87-1trusty
>> amd64common utilities to mount and interact with a ceph
>> storage cluster
>> ii  ceph-fs-common   0.87-1trusty
>> amd64common utilities to mount and interact with a ceph file
>> system
>> ii  ceph-fuse0.87-1trusty
>> amd64FUSE-based client for the Ceph distributed file system
>> ii  ceph-mds 0.87-1trusty
>> amd64metadata server for the ceph distributed file system
>> ii  libcephfs1   0.87-1trusty
>> amd64Ceph distributed file system client library
>> ii  python-ceph  0.87-1trusty
>> amd64Python libraries for the Ceph distributed filesystem
>> 
>> 
>> 
>> Reproducing the error:
>> 
>> at machine 1:
>> core@coreos1 /var/lib/deis/store/logs $ > test.log
>> core@coreos1 /var/lib/deis/store/logs $ echo 1 > test.log
>> core@coreos1 /var/lib/deis/store/logs $ stat test.log
>>  File: 'test.log'
>>  Size: 2 Blocks: 1  IO Block: 4194304 regular file
>> Device: 0h/0d Inode: 1099511629882  Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (  500/core)   Gid: (  500/core)
>> Access: 2015-01-12 20:05:03.0 +
>> Modify: 2015-01-12 20:06:09.637234229 +
>> Change: 2015-01-12 20:06:09.637234229 +
>> Birth: -
>> 
>> at machine 2:
>> core@coreos2 /var/lib/deis/store/logs $ stat test.log
>>  File: 'test.log'
>>  Size: 2 Blocks: 1  IO Block: 4194304 regular file
>> Device: 0h/0d Inode: 1099511629882  Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (  500/core)   Gid: (  500/core)
>> Access: 2015-01-12 20:05:03.0 +
>> Modify: 2015-01-12 20:06:09.637234229 +
>> Change: 2015-01-12 20:06:09.0 +
>> Birth: -
>> 
>> 
>> The change time is not updated, which makes some tail libraries not show new
>> content until you force the change time to be updated, e.g. by running
>> "touch" on the file.
>> Some tools freeze and trigger other issues in the system.
>> 
>> 
>> Tests, all in the machine #2:
>> 
>> FAILED -> https://github.com/ActiveState/tail
>> FAILED -> /usr/bin/tail of a Google docker image running debian wheezy
>> PASSED -> /usr/bin/tail of a ubuntu 14.04 docker image
>> PASSED -> /usr/bin/tail of the coreos release 494.5.0
>> 
>> 
>> In the tests on machine #1 (the same machine that is writing the file), all tests pass.
>> 
>> 
>> 
>> On Mon, Jan 12, 2015 at 5:14 PM, Gregory Farnum  wrote:
>>> What versions of all the Ceph pieces are you using? (Kernel
>>> client/ceph-fuse, MDS, etc)
>>> 
>>> Can you provide more details on exactly what the program is doing on
>>> which nodes?
>>> -Greg
>>> 
>>> On Fri, Jan 9, 2015 at 5:15 PM, Lorieri  wrote:
 the first 3 stat commands show blocks and size changing, but not the times;
 after a touch it changes and tail works

 I saw some cephfs freezes related to it; it came back after touching the
 file

Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Roland Giesler
On 14 January 2015 at 12:08, JM  wrote:

> Hi Roland,
>
> You should tune your Ceph Crushmap with a custom rule in order to do that
> (write first on s3 and then to others). This custom rule will be applied
> then to your proxmox pool.
> (what you want to do is only interesting if you run VM from host s3)
>
> Can you give us your crushmap ?
>

Please note that I made a mistake in my email.  The machine that I want
writes to go to first is S1, not S3.

For the life of me I cannot find how to extract the crush map.  I found:

ceph osd getcrushmap -o crushfilename

Where can I find the crush file?  I've never needed this before.
This is my first installation, so please bear with me while I learn!

Lionel: I read what you're saying.  However, the strange thing is that last
year I had this Windows 2008 VM running on the same cluster without changes,
and coming back from leave in the new year, it has crawled to a painfully
slow state.  I don't quite know where to start tracing this.  The
Windows machine itself is not the problem, since the boot process of the VM
is very slow even before Windows starts up.

thanks

Roland




>
>
>
> 2015-01-13 22:03 GMT+01:00 Roland Giesler :
>
>> I have a 4 node ceph cluster, but the disks are not equally distributed
>> across all machines (they are substantially different from each other)
>>
>> One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3)
>> and two machines have only two 1 TB drives each (s2 & s1).
>>
>> Now machine s3 has by far the most CPU's and RAM, so I'm running my VM's
>> mostly from there, but I want to make sure that the writes that happen to
>> the ceph cluster get written to the "local" osd's on s3 first and then the
>> additional writes/copies get done to the network.
>>
>> Is this possible with ceph.  The VM's are KVM in Proxmox in case it's
>> relevant.
>>
>> regards
>>
>>
>> *Roland *
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cold-storage tuning Ceph

2015-01-16 Thread Martin Millnert
Hello list,

I'm currently trying to understand what I can do with Ceph to optimize
it for a cold-storage-like (write-once, read-very-rarely) scenario,
trying to compare its cost against LTO-6 tape.

There is a single main objective:
 - minimal cost/GB/month of operations (including power, DC)

To achieve this, I can break it down to:
 - Use best cost/GB HDD
   * SMR today
 - Minimal cost/3.5"-slot
 - Minimal power-utilization/drive

While staying within what is available today, I don't imagine powering
down individual disk slots using IPMI etc., as some vendors allow.

Now, putting Ceph on this, drives will be on, but it would be very
useful to be able to spin-down drives that aren't used.

It then seems to me that I want to do a few things with Ceph:
 - Have only a subset of the cluster 'active' for writes at any point in
   time
 - Yet, still have the entire cluster online and available for reads
 - Minimize concurrent OSD operations in a node that uses RAM, e.g.
   - Scrubbing, minimal number of OSDs active (ideally max 1)
   - In general, minimize concurrent "active" OSDs as per above
 - Minimize risk that any type of re-balancing of data occurs at all
   - E.g. use a "high" number of EC parity chunks
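For concreteness, the kind of EC profile I have in mind would be created
roughly like this (profile/pool names, pg counts and k/m values are just
placeholders):

ceph osd erasure-code-profile set coldprofile k=8 m=4 ruleset-failure-domain=host
ceph osd pool create coldpool 1024 1024 erasure coldprofile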


Assuming e.g. 16 drives/host and 10TB drives, at ~100MB/s read and
nearly full cluster, deep scrubbing the host would take 18.5 days.
This means roughly 2 deep scrubs per month.
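(For the record, the arithmetic behind that figure:
 16 drives x 10 TB = 160 TB per host; 160e12 B / 100e6 B/s = 1.6e6 s ~ 18.5 days.)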
Using an EC pool, I wouldn't be very worried about errors, so perhaps
that's ok (calculable), but they obviously need to be repaired.
Mathematically, I can use an increase in parity chunks to lengthen the
interval between deep scrubs.


Is there anyone on the list who can provide some thoughts on the
higher-order goal of "Minimizing concurrently active OSDs in a node"?

I imagine I need to steer writes towards a subset of the system, but I
have no idea how to implement it. Using multiple separate clusters, e.g.
having each OSD on a node participate in a unique cluster, could perhaps help.

Any feedback appreciated.  It does appear to be a hot topic (pun intended).

Best,
Martin


signature.asc
Description: Digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph, LIO, VMWARE anyone?

2015-01-16 Thread Jake Young
Yes, it's active/active and I found that VMWare can switch from path to
path with no issues or service impact.


I posted some config files here: github.com/jak3kaj/misc

One set is from my LIO nodes, both the primary and secondary configs, so you
can see what I needed to make unique.  The other set (targets.conf) is
from my tgt nodes.  They are both 4-LUN configs.
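For reference, an rbd-backed target in targets.conf looks roughly like this
(the IQN and pool/image names are placeholders; check the tgt docs for the
exact directives your build supports):

<target iqn.2015-01.com.example:rbd-lun1>
    driver iscsi
    bs-type rbd
    backing-store rbd/vm-image-1
    initiator-address ALL
</target>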

Like I said in my previous email, there is no performance difference
between LIO and tgt.  The only service I'm running on these nodes is a
single iscsi target instance (either LIO or tgt).

Jake

On Wed, Jan 14, 2015 at 8:41 AM, Nick Fisk  wrote:

> Hi Jake,
>
>
>
> I can’t remember the exact details, but it was something to do with a
> potential problem when using the pacemaker resource agents. I think it was
> to do with a potential hanging issue when one LUN on a shared target failed
> and then it tried to kill all the other LUNS to fail the target over to
> another host. This then leaves the TCM part of LIO locking the RBD which
> also can’t fail over.
>
>
>
> That said I did try multiple LUNS on one target as a test and didn’t
> experience any problems.
>
>
>
> I’m interested in the way you have your setup configured though. Are you
> saying you effectively have an active/active configuration with a path
> going to either host, or are you failing the iSCSI IP between hosts? If
> it’s the former, have you had any problems with scsi
> locking/reservations…etc between the two targets?
>
>
>
> I can see the advantage to that configuration as you reduce/eliminate a
> lot of the troubles I have had with resources failing over.
>
>
>
> Nick
>
>
>
> *From:* Jake Young [mailto:jak3...@gmail.com]
> *Sent:* 14 January 2015 12:50
> *To:* Nick Fisk
> *Cc:* Giuseppe Civitella; ceph-users
> *Subject:* Re: [ceph-users] Ceph, LIO, VMWARE anyone?
>
>
>
> Nick,
>
>
>
> Where did you read that having more than 1 LUN per target causes stability
> problems?
>
>
>
> I am running 4 LUNs per target.
>
>
>
> For HA I'm running two linux iscsi target servers that map the same 4 rbd
> images. The two targets have the same serial numbers, T10 address, etc.  I
> copy the primary's config to the backup and change IPs. This way VMWare
> thinks they are different target IPs on the same host. This has worked very
> well for me.
>
>
>
> One suggestion I have is to try using rbd enabled tgt. The performance is
> equivalent to LIO, but I found it is much better at recovering from a
> cluster outage. I've had LIO lock up the kernel or simply not recognize
> that the rbd images are available; where tgt will eventually present the
> rbd images again.
>
>
>
> I have been slowly adding servers and am expanding my test setup to a
> production setup (nice thing about ceph). I now have 6 OSD hosts with 7
> disks on each. I'm using the LSI Nytro cache raid controller, so I don't
> have a separate journal and have 40Gb networking. I plan to add another 6
> OSD hosts in another rack in the next 6 months (and then another 6 next
> year). I'm doing 3x replication, so I want to end up with 3 racks.
>
>
>
> Jake
>
> On Wednesday, January 14, 2015, Nick Fisk  wrote:
>
> Hi Giuseppe,
>
>
>
> I am working on something very similar at the moment. I currently have it
> working on some test hardware but seems to be working reasonably well.
>
>
>
> I say reasonably as I have had a few instability’s but these are on the HA
> side, the LIO and RBD side of things have been rock solid so far. The main
> problems I have had seem to be around recovering from failure with
> resources ending up in a unmanaged state. I’m not currently using fencing
> so this may be part of the cause.
>
>
>
> As a brief description of my configuration.
>
>
>
> 4 Hosts each having 2 OSD’s also running the monitor role
>
> 3 additional host in a HA cluster which act as iSCSI proxy nodes.
>
>
>
> I’m using the IP, RBD, iSCSITarget and iSCSILUN resource agents to provide
> HA iSCSI LUN which maps back to a RBD. All the agents for each RBD are in a
> group so they follow each other between hosts.
>
>
>
> I’m using 1 LUN per target as I read somewhere there are stability
> problems using more than 1 LUN per target.
>
>
>
> Performance seems ok, I can get about 1.2k random IO’s out the iSCSI LUN.
> These seems to be about right for the Ceph cluster size, so I don’t think
> the LIO part is causing any significant overhead.
>
>
>
> We should be getting our production hardware shortly which wil have 40
> OSD’s with journals and a SSD caching tier, so within the next month or so
> I will have a better idea of running it in a production environment and the
> performance of the system.
>
>
>
> Hope that helps, if you have any questions, please let me know.
>
>
>
> Nick
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Giuseppe Civitella
> *Sent:* 13 January 2015 11:23
> *To:* ceph-users
> *Subject:* [ceph-users] Ceph, LIO, VMWARE anyone?
>
>
>
> Hi all,
>
>
>
> I'm working on a lab setup regarding C

Re: [ceph-users] Part 2: ssd osd fails often with "FAILED assert(soid < scrubber.start || soid >= scrubber.end)"

2015-01-16 Thread Udo Lembke
Hi Loic,
thanks for the answer. I hope it's not like
http://tracker.ceph.com/issues/8747, where the issue happens with a
patched version, if I understand right.

So I only have to wait a few months ;-) for a backport...

Udo

Am 14.01.2015 09:40, schrieb Loic Dachary:
> Hi,
> 
> This is http://tracker.ceph.com/issues/8011 which is being
> backported.
> 
> Cheers
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread JM
# Get the compiled crushmap
root@server01:~# ceph osd getcrushmap -o /tmp/myfirstcrushmap

# Decompile the compiled crushmap above
root@server01:~# crushtool -d /tmp/myfirstcrushmap -o
/tmp/myfirstcrushmap.txt

then give us your /tmp/myfirstcrushmap.txt file.. :)
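(And for later: once the rules are edited, the map goes back in roughly like
this; the file names are just the ones used above.)

root@server01:~# crushtool -c /tmp/myfirstcrushmap.txt -o /tmp/mynewcrushmap
root@server01:~# ceph osd setcrushmap -i /tmp/mynewcrushmap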


2015-01-14 17:36 GMT+01:00 Roland Giesler :

> On 14 January 2015 at 12:08, JM  wrote:
>
>> Hi Roland,
>>
>> You should tune your Ceph Crushmap with a custom rule in order to do that
>> (write first on s3 and then to others). This custom rule will be applied
>> then to your proxmox pool.
>> (what you want to do is only interesting if you run VM from host s3)
>>
>> Can you give us your crushmap ?
>>
>
> Please note that I made a mistake in my email.  The machine that I want to
> run on write first, is S1 not S3
>
> For the life of me I cannot find how to extract the crush map.  I found:
>
> ceph osd getcrushmap -o crushfilename
>
> Where can I find the crush file?  I've never needed this.
> This is my first installation, so please bear with my while I learn!
>
> Lionel: I read what you're saying.  However, the strange thing is that
> last year I had this Windows 2008 VM running on the same cluster without
> changes and coming back from leave in the new year, it has crawled to a
> painfully slow state.  And I don't quite know where to start to trace
> this.  The windows machine is not the problem, since even before windows
> starts up the boot process of the VM is very slow.
>
> thanks
>
> Roland
>
>
>
>
>>
>>
>>
>> 2015-01-13 22:03 GMT+01:00 Roland Giesler :
>>
>>> I have a 4 node ceph cluster, but the disks are not equally distributed
>>> across all machines (they are substantially different from each other)
>>>
>>> One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3)
>>> and two machines have only two 1 TB drives each (s2 & s1).
>>>
>>> Now machine s3 has by far the most CPU's and RAM, so I'm running my VM's
>>> mostly from there, but I want to make sure that the writes that happen to
>>> the ceph cluster get written to the "local" osd's on s3 first and then the
>>> additional writes/copies get done to the network.
>>>
>>> Is this possible with ceph.  The VM's are KVM in Proxmox in case it's
>>> relevant.
>>>
>>> regards
>>>
>>>
>>> *Roland *
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Roland Giesler
So you can see my server names and their osd's too...

# id    weight  type name       up/down reweight
-1      11.13   root default
-2      8.14            host h1
1       0.9                     osd.1   up      1
3       0.9                     osd.3   up      1
4       0.9                     osd.4   up      1
5       0.68                    osd.5   up      1
6       0.68                    osd.6   up      1
7       0.68                    osd.7   up      1
8       0.68                    osd.8   up      1
9       0.68                    osd.9   up      1
10      0.68                    osd.10  up      1
11      0.68                    osd.11  up      1
12      0.68                    osd.12  up      1
-3      0.45            host s3
2       0.45                    osd.2   up      1
-4      0.9             host s2
13      0.9                     osd.13  up      1
-5      1.64            host s1
14      0.29                    osd.14  up      1
0       0.27                    osd.0   down    0
15      0.27                    osd.15  up      1
16      0.27                    osd.16  up      1
17      0.27                    osd.17  up      1
18      0.27                    osd.18  up      1

regards

Roland
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Part 2: ssd osd fails often with "FAILED assert(soid < scrubber.start || soid >= scrubber.end)"

2015-01-16 Thread Loic Dachary


On 14/01/2015 18:33, Udo Lembke wrote:
> Hi Loic,
> thanks for the answer. I hope it's not like in
> http://tracker.ceph.com/issues/8747 where the issue happens with an
> patched version if understand right.

http://tracker.ceph.com/issues/8747 is a duplicate of 
http://tracker.ceph.com/issues/8011 indeed :-)
> 
> So I must only wait few month ;-) for an backport...
> 
> Udo
> 
> Am 14.01.2015 09:40, schrieb Loic Dachary:
>> Hi,
>>
>> This is http://tracker.ceph.com/issues/8011 which is being
>> backported.
>>
>> Cheers
>>
>>

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Is it possible to compile and use ceph with Raspberry Pi single-board computers?

2015-01-16 Thread Prof. Dr. Christian Baun
Hi all,

I am trying to compile and use Ceph on a cluster of Raspberry Pi
single-board computers with Raspbian as the operating system. I tried it
this way:

wget http://ceph.com/download/ceph-0.91.tar.bz2
tar -xvjf ceph-0.91.tar.bz2
cd ceph-0.91
./autogen.sh
./configure  --without-tcmalloc
make -j2

But as a result, I got this error message:

...
  CC       common/module.lo
  CXX      common/Readahead.lo
  CXX      common/Cycles.lo
In file included from common/Cycles.cc:38:0:
common/Cycles.h:76:2: error: #error No high-precision counter
available for your OS/arch
common/Cycles.h: In static member function 'static uint64_t Cycles::rdtsc()':
common/Cycles.h:78:3: warning: no return statement in function
returning non-void [-Wreturn-type]
Makefile:13166: recipe for target 'common/Cycles.lo' failed
make[3]: *** [common/Cycles.lo] Error 1
make[3]: *** Waiting for unfinished jobs
make[3]: Leaving directory '/usr/src/ceph-0.91/src'
Makefile:17129: recipe for target 'all-recursive' failed
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory '/usr/src/ceph-0.91/src'
Makefile:6645: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/usr/src/ceph-0.91/src'
Makefile:405: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

Is it possible at all to build and use Ceph on the ARMv6 architecture?

Thanks for any help.

Best Regards
   Christian Baun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem with Rados gateway

2015-01-16 Thread Walter Valenti






- Original Message -
> From: Yehuda Sadeh 
> To: Walter Valenti 
> Cc: "ceph-users@lists.ceph.com" 
> Sent: Tuesday, 13 January 2015, 1:13
> Subject: Re: [ceph-users] Problem with Rados gateway
> 
> Try setting 'rgw print continue = false' in your ceph.conf.
> 
> Yehuda


Thanks, but I've already got rgw_print_continue = false

 

Walter
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r >=0)

2015-01-16 Thread Wido den Hollander
On 01/16/2015 08:37 AM, Mohd Bazli Ab Karim wrote:
> Dear Ceph-Users, Ceph-Devel,
> 
> Apologize me if you get double post of this email.
> 
> I am running a ceph cluster version 0.72.2 and one MDS (in fact, it's 3, 2 
> down and only 1 up) at the moment.
> Plus I have one CephFS client mounted to it.
> 

In the Ceph world 0.72.2 is ancient and pretty old. If you want to play with
CephFS I recommend you upgrade to 0.90 and also use at least kernel 3.18.

> Now, the MDS always get aborted after recovery and active for 4 secs.
> Some parts of the log are as below:
> 
> -3> 2015-01-15 14:10:28.464706 7fbcc8226700  1 -- 10.4.118.21:6800/5390 
> <== osd.19 10.4.118.32:6821/243161 73  osd_op_re
> ply(3742 1000240c57e. [create 0~0,setxattr (99)] v56640'1871414 
> uv1871414 ondisk = 0) v6  221+0+0 (261801329 0 0) 0x
> 7770bc80 con 0x69c7dc0
> -2> 2015-01-15 14:10:28.464730 7fbcc8226700  1 -- 10.4.118.21:6800/5390 
> <== osd.18 10.4.118.32:6818/243072 67  osd_op_re
> ply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567 ondisk = 
> 0) v6  179+0+0 (3759887079 0 0) 0x7757ec80 con
> 0x1c6bb00
> -1> 2015-01-15 14:10:28.464754 7fbcc8226700  1 -- 10.4.118.21:6800/5390 
> <== osd.47 10.4.118.35:6809/8290 79  osd_op_repl
> y(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90 (Message 
> too long)) v6  174+0+0 (3942056372 0 0) 0x69f94
> a00 con 0x1c6b9a0
>  0> 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In 
> function 'void MDSTable::save_2(int, version_t)' thread 7
> fbcc8226700 time 2015-01-15 14:10:28.46
> mds/MDSTable.cc: 83: FAILED assert(r >= 0)
> 
>  ceph version  ()
>  1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
>  2: (Context::complete(int)+0x9) [0x568d29]
>  3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
>  4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
>  5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
>  6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
>  7: (DispatchQueue::entry()+0x549) [0x975739]
>  8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
>  9: (()+0x7e9a) [0x7fbcccb0de9a]
>  10: (clone()+0x6d) [0x7fbccb4ba3fd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
> 
> Is there any workaround/patch to fix this issue? Let me know if need to see 
> the log with debug-mds of certain level as well.
> Any helps would be very much appreciated.
> 
> Thanks.
> Bazli
> 
> 
> DISCLAIMER:
> 
> 
> This e-mail (including any attachments) is for the addressee(s) only and may 
> be confidential, especially as regards personal data. If you are not the 
> intended recipient, please note that any dealing, review, distribution, 
> printing, copying or use of this e-mail is strictly prohibited. If you have 
> received this email in error, please notify the sender immediately and delete 
> the original message (including any attachments).
> 
> 
> MIMOS Berhad is a research and development institution under the purview of 
> the Malaysian Ministry of Science, Technology and Innovation. Opinions, 
> conclusions and other information in this e-mail that do not relate to the 
> official business of MIMOS Berhad and/or its subsidiaries shall be understood 
> as neither given nor endorsed by MIMOS Berhad and/or its subsidiaries and 
> neither MIMOS Berhad nor its subsidiaries accepts responsibility for the 
> same. All liability arising from or in connection with computer viruses 
> and/or corrupted e-mails is excluded to the fullest extent permitted by law.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem with Rados gateway

2015-01-16 Thread Yehuda Sadeh
2015-01-15 1:08 GMT-08:00 Walter Valenti :
>
>
>
>
>
>
> - Original Message -
>> From: Yehuda Sadeh 
>> To: Walter Valenti 
>> Cc: "ceph-users@lists.ceph.com" 
>> Sent: Tuesday, 13 January 2015, 1:13
>> Subject: Re: [ceph-users] Problem with Rados gateway
>>
>> Try setting 'rgw print continue = false' in your ceph.conf.
>>
>> Yehuda
>
>
> Thanks, but I've already got rgw_print_continue = false
>

Looking at it again, I'm not sure what exactly is going wrong. It looks like
you let apache spawn radosgw. You should set up radosgw as an external
server (as specified in the docs), then make sure it initializes
correctly (note the 'failed to initialize' notice, which shouldn't be
there).

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd cp vs rbd snap flatten

2015-01-16 Thread Fabian Zimmermann
Hi,

if I want to clone a running VM disk, would it be enough to "cp", or do I
have to "snap, protect, flatten, unprotect, rm" the snapshot to get as
consistent a clone as possible?

Or: does cp use an internal snapshot while copying the blocks?
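Spelled out, the snapshot-based sequence I mean is roughly this (image and
snapshot names are just placeholders):

rbd snap create rbd/vm-disk@clone-snap
rbd snap protect rbd/vm-disk@clone-snap
rbd clone rbd/vm-disk@clone-snap rbd/vm-disk-copy
rbd flatten rbd/vm-disk-copy
rbd snap unprotect rbd/vm-disk@clone-snap
rbd snap rm rbd/vm-disk@clone-snap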

Thanks,

Fabian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CEPH Expansion

2015-01-16 Thread Georgios Dimitrakakis

Hi all!

I would like to expand our CEPH Cluster and add a second OSD node.

In this node I will have ten 4TB disks dedicated to CEPH.

What is the proper way of bringing them into the already available CEPH
cluster?


I guess that the first thing to do is to prepare them with ceph-deploy 
and mark them as out at preparation.


I should then restart the services and add (mark as in) one of them.
Afterwards, I have to wait for the rebalance to occur, and when it finishes
I will add the second, and so on. Is this
safe enough?
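Roughly, the commands I have in mind are the following (host and device
names are placeholders):

# keep the new OSDs from being marked in automatically
ceph osd set noin
# prepare and activate the ten disks on the new node
ceph-deploy osd create node2:sdb node2:sdc ...
# then bring them in one at a time, waiting for rebalancing in between
ceph osd in <osd-id>
# when everything is in, clear the flag
ceph osd unset noin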



How long do you expect the rebalancing procedure to take?


I already have ten more 4TB disks at another node and the amount of 
data is around 40GB with 2x replication factor.

The connection is over Gigabit.


Best,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] got "XmlParseFailure" when libs3 client accessing radosgw object gateway

2015-01-16 Thread Liu, Xuezhao
Thanks for the replying.

After disabling the default site (a2dissite 000-default), I can use libs3's
command-line tool s3 to create/list buckets; getting an object also works.

But put object failed:

root@xuezhaoUbuntu74:~# s3 -u put bucket11/seqdata filename=seqdata

it hangs forever, and on the object gateway server I can see log entries like these:
root@xuezhaoUbuntu73:~# cat /var/log/apache2/error.log
[Thu Jan 15 10:24:10.896926 2015] [:warn] [pid 20751:tid 140666587821824] 
FastCGI: 10.32.231.74 PUT http://xuezhaoubuntu73/bucket11/seqdata auth AWS 
VTT37Z56CWEPI3LCI38U:r3cX5AKJBA0t6zwygFvk5i1YHIY=
[Thu Jan 15 10:24:32.906128 2015] [:warn] [pid 20752:tid 1401236] 
FastCGI: 10.32.231.74 PUT http://xuezhaoubuntu73/bucket11/seqdata auth AWS 
VTT37Z56CWEPI3LCI38U:bm23T98hihu1LXfHSJK0o4wKW7M=
[Thu Jan 15 10:24:40.918845 2015] [fastcgi:error] [pid 20751:tid 
140666587821824] [client 10.32.231.74:36311] FastCGI: comm with server 
"/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[Thu Jan 15 10:24:40.919282 2015] [fastcgi:error] [pid 20751:tid 
140666587821824] [client 10.32.231.74:36311] FastCGI: incomplete headers (0 
bytes) received from server "/var/www/s3gw.fcgi"
[Thu Jan 15 10:24:49.908398 2015] [:warn] [pid 20751:tid 140666579429120] 
FastCGI: 10.32.231.74 PUT http://xuezhaoubuntu73/bucket11/seqdata auth AWS 
VTT37Z56CWEPI3LCI38U:CNy4pmZcslar7u5+AaW0fPGUEbY=

Same result when using different libs3 (https://github.com/bji/libs3 or 
http://github.com/wido/libs3.git )

But when using the Python tool s3cmd, putting the object works.

Thanks,
Xuezhao
> -Original Message-
> 
> Looks like your apache is misconfigured. Did you disable the default site?
> 
> Yehuda

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] got "XmlParseFailure" when libs3 client accessing radosgw object gateway

2015-01-16 Thread Yehuda Sadeh
On Wed, Jan 14, 2015 at 7:27 PM, Liu, Xuezhao  wrote:
> Thanks for the replying.
>
> After disable the default site (a2dissite 000-default), I can use libs3's 
> commander s3 to create/list bucket, get object also works.
>
> But put object failed:
>
> root@xuezhaoUbuntu74:~# s3 -u put bucket11/seqdata filename=seqdata
>
> it hangs forever, and one the object gateway server, I can see such log:
> root@xuezhaoUbuntu73:~# cat /var/log/apache2/error.log
> [Thu Jan 15 10:24:10.896926 2015] [:warn] [pid 20751:tid 140666587821824] 
> FastCGI: 10.32.231.74 PUT http://xuezhaoubuntu73/bucket11/seqdata auth AWS 
> VTT37Z56CWEPI3LCI38U:r3cX5AKJBA0t6zwygFvk5i1YHIY=
> [Thu Jan 15 10:24:32.906128 2015] [:warn] [pid 20752:tid 1401236] 
> FastCGI: 10.32.231.74 PUT http://xuezhaoubuntu73/bucket11/seqdata auth AWS 
> VTT37Z56CWEPI3LCI38U:bm23T98hihu1LXfHSJK0o4wKW7M=
> [Thu Jan 15 10:24:40.918845 2015] [fastcgi:error] [pid 20751:tid 
> 140666587821824] [client 10.32.231.74:36311] FastCGI: comm with server 
> "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Thu Jan 15 10:24:40.919282 2015] [fastcgi:error] [pid 20751:tid 
> 140666587821824] [client 10.32.231.74:36311] FastCGI: incomplete headers (0 
> bytes) received from server "/var/www/s3gw.fcgi"
> [Thu Jan 15 10:24:49.908398 2015] [:warn] [pid 20751:tid 140666579429120] 
> FastCGI: 10.32.231.74 PUT http://xuezhaoubuntu73/bucket11/seqdata auth AWS 
> VTT37Z56CWEPI3LCI38U:CNy4pmZcslar7u5+AaW0fPGUEbY=
>
> Same result when using different libs3 (https://github.com/bji/libs3 or 
> http://github.com/wido/libs3.git )
>
> But if using the python tool s3cmd the object putting can work.

This sounds like you're having trouble with 100-continue. Try setting
'rgw print continue = false' in your ceph.conf, or replace the apache
fastcgi module (as specified in the ceph docs).
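For example, in ceph.conf (the client section name depends on how your
gateway instance is named; this is only a sketch):

[client.radosgw.gateway]
    rgw print continue = false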

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] two mount points, two diffrent data

2015-01-16 Thread Robert Sander
On 14.01.2015 14:20, Rafał Michalak wrote:
> 
> #node1
> mount /dev/rbd/rbd/test /mnt
> 
> #node2
> mount /dev/rbd/rbd/test /mnt

If you want to mount a filesystem on one block device onto multiple
clients, the filesystem has to be clustered, e.g. OCFS2.

A "normal" local filesystem like ext4 or XFS is not aware that other
clients may alter the underlying block device. This is a sure recipe
for data corruption and loss.

Maybe you should look at CephFS.
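With CephFS, both nodes could mount the same tree, roughly like this
(monitor address, user name and secret file are placeholders):

mount -t ceph 192.168.0.1:6789:/ /mnt -o name=admin,secretfile=/etc/ceph/admin.secret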

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] MDS aborted after recovery and active, FAILED assert (r >=0)

2015-01-16 Thread Mohd Bazli Ab Karim
Dear Ceph-Users, Ceph-Devel,

Apologize me if you get double post of this email.

I am running a Ceph cluster, version 0.72.2, and one MDS at the moment (in
fact there are 3: 2 are down and only 1 is up).
Plus I have one CephFS client mounted to it.

Now, the MDS always aborts after recovering and being active for 4 secs.
Some parts of the log are as below:

-3> 2015-01-15 14:10:28.464706 7fbcc8226700  1 -- 10.4.118.21:6800/5390 <== 
osd.19 10.4.118.32:6821/243161 73  osd_op_re
ply(3742 1000240c57e. [create 0~0,setxattr (99)] v56640'1871414 
uv1871414 ondisk = 0) v6  221+0+0 (261801329 0 0) 0x
7770bc80 con 0x69c7dc0
-2> 2015-01-15 14:10:28.464730 7fbcc8226700  1 -- 10.4.118.21:6800/5390 <== 
osd.18 10.4.118.32:6818/243072 67  osd_op_re
ply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567 ondisk = 0) 
v6  179+0+0 (3759887079 0 0) 0x7757ec80 con
0x1c6bb00
-1> 2015-01-15 14:10:28.464754 7fbcc8226700  1 -- 10.4.118.21:6800/5390 <== 
osd.47 10.4.118.35:6809/8290 79  osd_op_repl
y(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90 (Message 
too long)) v6  174+0+0 (3942056372 0 0) 0x69f94
a00 con 0x1c6b9a0
 0> 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In function 
'void MDSTable::save_2(int, version_t)' thread 7
fbcc8226700 time 2015-01-15 14:10:28.46
mds/MDSTable.cc: 83: FAILED assert(r >= 0)

 ceph version  ()
 1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
 2: (Context::complete(int)+0x9) [0x568d29]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
 4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
 5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
 6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
 7: (DispatchQueue::entry()+0x549) [0x975739]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
 9: (()+0x7e9a) [0x7fbcccb0de9a]
 10: (clone()+0x6d) [0x7fbccb4ba3fd]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

Is there any workaround/patch to fix this issue? Let me know if need to see the 
log with debug-mds of certain level as well.
Any helps would be very much appreciated.

Thanks.
Bazli


DISCLAIMER:


This e-mail (including any attachments) is for the addressee(s) only and may be 
confidential, especially as regards personal data. If you are not the intended 
recipient, please note that any dealing, review, distribution, printing, 
copying or use of this e-mail is strictly prohibited. If you have received this 
email in error, please notify the sender immediately and delete the original 
message (including any attachments).


MIMOS Berhad is a research and development institution under the purview of the 
Malaysian Ministry of Science, Technology and Innovation. Opinions, conclusions 
and other information in this e-mail that do not relate to the official 
business of MIMOS Berhad and/or its subsidiaries shall be understood as neither 
given nor endorsed by MIMOS Berhad and/or its subsidiaries and neither MIMOS 
Berhad nor its subsidiaries accepts responsibility for the same. All liability 
arising from or in connection with computer viruses and/or corrupted e-mails is 
excluded to the fullest extent permitted by law.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r >=0)

2015-01-16 Thread Mohd Bazli Ab Karim
Agreed. I was about to upgrade to 0.90, but have postponed it due to this error.
Is there any chance for me to recover it first before upgrading?

Thanks Wido.

Regards,
Bazli

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Wido den Hollander
Sent: Friday, January 16, 2015 3:50 PM
To: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: Re: MDS aborted after recovery and active, FAILED assert (r >=0)

On 01/16/2015 08:37 AM, Mohd Bazli Ab Karim wrote:
> Dear Ceph-Users, Ceph-Devel,
>
> Apologize me if you get double post of this email.
>
> I am running a ceph cluster version 0.72.2 and one MDS (in fact, it's 3, 2 
> down and only 1 up) at the moment.
> Plus I have one CephFS client mounted to it.
>

In Ceph world 0.72.2 is ancient en pretty old. If you want to play with CephFS 
I recommend you upgrade to 0.90 and also use at least kernel 3.18

> Now, the MDS always get aborted after recovery and active for 4 secs.
> Some parts of the log are as below:
>
> -3> 2015-01-15 14:10:28.464706 7fbcc8226700  1 --
> 10.4.118.21:6800/5390 <== osd.19 10.4.118.32:6821/243161 73 
> osd_op_re
> ply(3742 1000240c57e. [create 0~0,setxattr (99)]
> v56640'1871414 uv1871414 ondisk = 0) v6  221+0+0 (261801329 0 0)
> 0x
> 7770bc80 con 0x69c7dc0
> -2> 2015-01-15 14:10:28.464730 7fbcc8226700  1 --
> 10.4.118.21:6800/5390 <== osd.18 10.4.118.32:6818/243072 67 
> osd_op_re
> ply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567
> ondisk = 0) v6  179+0+0 (3759887079 0 0) 0x7757ec80 con
> 0x1c6bb00
> -1> 2015-01-15 14:10:28.464754 7fbcc8226700  1 --
> 10.4.118.21:6800/5390 <== osd.47 10.4.118.35:6809/8290 79 
> osd_op_repl
> y(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90
> (Message too long)) v6  174+0+0 (3942056372 0 0) 0x69f94
> a00 con 0x1c6b9a0
>  0> 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In
> function 'void MDSTable::save_2(int, version_t)' thread 7
> fbcc8226700 time 2015-01-15 14:10:28.46
> mds/MDSTable.cc: 83: FAILED assert(r >= 0)
>
>  ceph version  ()
>  1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
>  2: (Context::complete(int)+0x9) [0x568d29]
>  3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
>  4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
>  5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
>  6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
>  7: (DispatchQueue::entry()+0x549) [0x975739]
>  8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
>  9: (()+0x7e9a) [0x7fbcccb0de9a]
>  10: (clone()+0x6d) [0x7fbccb4ba3fd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
>
> Is there any workaround/patch to fix this issue? Let me know if need to see 
> the log with debug-mds of certain level as well.
> Any helps would be very much appreciated.
>
> Thanks.
> Bazli
>
> 
> DISCLAIMER:
>
>
> This e-mail (including any attachments) is for the addressee(s) only and may 
> be confidential, especially as regards personal data. If you are not the 
> intended recipient, please note that any dealing, review, distribution, 
> printing, copying or use of this e-mail is strictly prohibited. If you have 
> received this email in error, please notify the sender immediately and delete 
> the original message (including any attachments).
>
>
> MIMOS Berhad is a research and development institution under the purview of 
> the Malaysian Ministry of Science, Technology and Innovation. Opinions, 
> conclusions and other information in this e-mail that do not relate to the 
> official business of MIMOS Berhad and/or its subsidiaries shall be understood 
> as neither given nor endorsed by MIMOS Berhad and/or its subsidiaries and 
> neither MIMOS Berhad nor its subsidiaries accepts responsibility for the 
> same. All liability arising from or in connection with computer viruses 
> and/or corrupted e-mails is excluded to the fullest extent permitted by law.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> in the body of a message to majord...@vger.kernel.org More majordomo
> info at  http://vger.kernel.org/majordomo-info.html
>


--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html


DISCLAIMER:


This e-mail (including any attachments) is for the addressee(s) only and may be 
confidential, especially as regards personal data. If you are not the intended 
recipient, please note that any dealing, review, distribution, printing, 
copying or use of this e-mail is strictly prohibited. If you have received this 
email in error, please not

Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r >=0)

2015-01-16 Thread John Spray
Hmm, upgrading should help here, as the problematic data structure
(anchortable) no longer exists in the latest version.  I haven't
checked, but hopefully we don't try to write it during upgrades.

The bug you're hitting is more or less the same as a similar one we
have with the sessiontable in the latest ceph, but you won't hit it
there unless you're very unlucky!

John

On Fri, Jan 16, 2015 at 7:37 AM, Mohd Bazli Ab Karim
 wrote:
> Dear Ceph-Users, Ceph-Devel,
>
> Apologize me if you get double post of this email.
>
> I am running a ceph cluster version 0.72.2 and one MDS (in fact, it's 3, 2 
> down and only 1 up) at the moment.
> Plus I have one CephFS client mounted to it.
>
> Now, the MDS always get aborted after recovery and active for 4 secs.
> Some parts of the log are as below:
>
> -3> 2015-01-15 14:10:28.464706 7fbcc8226700  1 -- 10.4.118.21:6800/5390 
> <== osd.19 10.4.118.32:6821/243161 73  osd_op_re
> ply(3742 1000240c57e. [create 0~0,setxattr (99)] v56640'1871414 
> uv1871414 ondisk = 0) v6  221+0+0 (261801329 0 0) 0x
> 7770bc80 con 0x69c7dc0
> -2> 2015-01-15 14:10:28.464730 7fbcc8226700  1 -- 10.4.118.21:6800/5390 
> <== osd.18 10.4.118.32:6818/243072 67  osd_op_re
> ply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567 ondisk = 
> 0) v6  179+0+0 (3759887079 0 0) 0x7757ec80 con
> 0x1c6bb00
> -1> 2015-01-15 14:10:28.464754 7fbcc8226700  1 -- 10.4.118.21:6800/5390 
> <== osd.47 10.4.118.35:6809/8290 79  osd_op_repl
> y(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90 (Message 
> too long)) v6  174+0+0 (3942056372 0 0) 0x69f94
> a00 con 0x1c6b9a0
>  0> 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In 
> function 'void MDSTable::save_2(int, version_t)' thread 7
> fbcc8226700 time 2015-01-15 14:10:28.46
> mds/MDSTable.cc: 83: FAILED assert(r >= 0)
>
>  ceph version  ()
>  1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
>  2: (Context::complete(int)+0x9) [0x568d29]
>  3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
>  4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
>  5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
>  6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
>  7: (DispatchQueue::entry()+0x549) [0x975739]
>  8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
>  9: (()+0x7e9a) [0x7fbcccb0de9a]
>  10: (clone()+0x6d) [0x7fbccb4ba3fd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
>
> Is there any workaround/patch to fix this issue? Let me know if need to see 
> the log with debug-mds of certain level as well.
> Any helps would be very much appreciated.
>
> Thanks.
> Bazli
>
> 
> DISCLAIMER:
>
>
> This e-mail (including any attachments) is for the addressee(s) only and may 
> be confidential, especially as regards personal data. If you are not the 
> intended recipient, please note that any dealing, review, distribution, 
> printing, copying or use of this e-mail is strictly prohibited. If you have 
> received this email in error, please notify the sender immediately and delete 
> the original message (including any attachments).
>
>
> MIMOS Berhad is a research and development institution under the purview of 
> the Malaysian Ministry of Science, Technology and Innovation. Opinions, 
> conclusions and other information in this e-mail that do not relate to the 
> official business of MIMOS Berhad and/or its subsidiaries shall be understood 
> as neither given nor endorsed by MIMOS Berhad and/or its subsidiaries and 
> neither MIMOS Berhad nor its subsidiaries accepts responsibility for the 
> same. All liability arising from or in connection with computer viruses 
> and/or corrupted e-mails is excluded to the fullest extent permitted by law.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Roland Giesler
On 14 January 2015 at 21:46, Gregory Farnum  wrote:

> On Tue, Jan 13, 2015 at 1:03 PM, Roland Giesler 
> wrote:
> > I have a 4 node ceph cluster, but the disks are not equally distributed
> > across all machines (they are substantially different from each other)
> >
> > One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3)
> and
> > two machines have only two 1 TB drives each (s2 & s1).
> >
> > Now machine s3 has by far the most CPU's and RAM, so I'm running my VM's
> > mostly from there, but I want to make sure that the writes that happen to
> > the ceph cluster get written to the "local" osd's on s3 first and then
> the
> > additional writes/copies get done to the network.
> >
> > Is this possible with ceph.  The VM's are KVM in Proxmox in case it's
> > relevant.
>
> In general you can't set up Ceph to write to the local node first. In
> some specific cases you can if you're willing to do a lot more work
> around placement, and this *might* be one of those cases.
>
> To do this, you'd need to change the CRUSH rules pretty extensively,
> so that instead of selecting OSDs at random, they have two steps:
> 1) starting from bucket s3, select a random OSD and put it at the
> front of the OSD list for the PG.
> 2) Starting from a bucket which contains all the other OSDs, select
> N-1 more at random (where N is the number of desired replicas).
>

I understand in principle what you're saying.  Let me go back a step and
ask the question somewhat differently then:

I have set up 4 machines in a cluster.  When I created the Windows 2008
server VM on S1 (I corrected my first email: I have three Sunfire X series
servers, S1, S2, S3), since S1 has 36GB of RAM and 8 x 300GB SAS drives, it
was running normally, pretty close to what I had on the bare metal.  About
a month later (after being on leave for 2 weeks), I found a machine that is
crawling at a snail's pace, and I cannot figure out why.

So instead of suggesting something from my side (without in-depth knowledge
yet), what should I do to get this machine to run at speed again?

Further to my hardware and network:

S1: 2 x Quad Code Xeon, 36GB RAM, 8 x 300GB HDD's
S2: 1 x Opteron Dual Core, 8GB RAM, 2 x 750GB HDD's
S3: 1 x Opteron Dual Core, 8GB RAM, 2 x 750GB HDD's
H1: 1 x Xeon Dual Core, 5GB RAM, 12 x 1TB HDD's
(All these machines are at full drive capacity, that is all their slots are
being utilised)

All the servers are linked with dual Gigabit Ethernet connections to a
switch with LACP enabled, and the links are bonded on each server.  While
this doesn't raise the speed of a single transfer, it does allow more
aggregate bandwidth between the servers.

The H1 machine is only running ceph and thus acts only as storage.  The
other machines (S1, S2 & S3) are for web servers (development and
production), the Windows 2008 server and a few other functions all managed
from proxmox.

The hardware is what my client has been using, but there were lots of
inefficiencies and little redundancy in the setup before we embarked on
this project.  However, the hardware is sufficient for their needs.

I hope that gives you a reasonable picture of the setup, so that you are
able to give me some advice on how to troubleshoot this.

regards

Roland



>
> You can look at the documentation on CRUSH or search the list archives
> for more on this subject.
>
> Note that doing this has a bunch of down sides: you'll have balance
> issues because every piece of data will be on the s3 node (that's a
> TERRIBLE name for a project which has API support for Amazon S3, btw
> :p), if you add new VMs on a different node they'll all be going to
> the s3 node for all their writes (unless you set them up on a
> different pool with different CRUSH rules), s3 will be satisfying all
> the read requests so the other nodes are just backups in case of disk
> failure, etc.
> -Greg
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-agent failed to parse

2015-01-16 Thread ghislain.chevalier
Hi all,

Context: Ubuntu 14.04 LTS, firefly 0.80.7

I recently encountered the same issue as described below.
Maybe I missed something between July and January…


I found that the http request wasn't correctly built by 
/usr/lib/python2.7/dist-packages/radosgw_agent/client.py



I did the changes below

#url = '{protocol}://{host}{path}'.format(protocol=request.protocol,
#                                         host=request.host,
#                                         path=request.path)
url = '{path}'.format(protocol="", host="", path=request.path)



The request is then correctly built and sent.

Best regards

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Peter
Sent: Wednesday, 23 July 2014 13:38
To: Craig Lewis; Ceph Users
Subject: Re: [ceph-users] radosgw-agent failed to parse

Hello again,

I have reviewed my deployment, and the --name argument is used everywhere with 
the radosgw-admin command.

As I create the pools during deploy, there is no rgw.root pool, so I cannot 
make changes to it.

I still think this is an issue with radosgw-agent, as if I change the 
destination on the command line, it also changes in the botched-up URL it tries 
to hit:  "radosgw-agent http://example2.net"; this appears in the error:

DEBUG:boto:url = 
'http://example2.nethttp://example2.net/admin/config'

so it is definitely coming from the input on the command line.
On 22/07/14 20:44, Craig Lewis wrote:
You should use the --name argument with every radosgw-admin command.  If you 
don't, you'll end up making changes to .rgw.root, not .us.rgw.root.

I'd run through the federation setup again, making sure to include the 
appropriate --name.  As Kurt said, it's safe to reload and reapply the configs. 
 Make sure you restart radosgw when it says.

One of my problems during setup was that I had a bad config loaded in 
.rgw.root, but the correct one in .us.rgw.root.  It caused all sorts of 
problems when I forgot the --name arg.

Setting up federation is somewhat sensitive to order of operations.  When I was 
testing it, I frequently messed something up.  Several times it was faster to 
delete all the pools and start over, rather than figuring out what I broke.


On Tue, Jul 22, 2014 at 7:46 AM, Peter 
mailto:ptier...@tchpc.tcd.ie>> wrote:
Adding --name to the regionmap update command has allowed me to update the 
regionmap:


radosgw-admin regionmap update --name client.radosgw.us-master-1

So now I have reloaded the zone and region and updated the region map on the 
gateway in each zone, then restarted whole clusters, then restarted apache and 
radosgw - same problem.

I cannot see how this can be anything other than an issue inside radosgw-agent 
as it is not hitting the gateway due to the botched


DEBUG:boto:url = 
'https://example.comhttps://example.com/admin/config'

I'm out of ideas. Should I submit this as a bug?


On 22/07/14 15:25, Bachelder, Kurt wrote:
It certainly doesn’t hurt to reload your zone and region configurations on your 
RGWs and re-run the regionmap update for the instances tied to each zone, just 
to ensure consistency.

From: Peter [mailto:ptier...@tchpc.tcd.ie]
Sent: Tuesday, July 22, 2014 10:20 AM
To: Bachelder, Kurt; Craig Lewis
Cc: Ceph Users
Subject: Re: [ceph-users] radosgw-agent failed to parse

Thanks for the suggestion. I've attempted a regionmap update but I'm hitting this 
error:

failed to list regions: (2) No such file or directory
2014-07-22 14:13:04.096601 7ff825ac77c0 -1 failed to list objects 
pool_iterate_begin() returned r=-2

so perhaps i do have some issue with my configuration. Although i would have 
thought that if the gateway is outputting the correct regionmap at 
/admin/config path, then all should be well with regionmap.


On 22/07/14 14:13, Bachelder, Kurt wrote:
I’m sure you’ve already tried this, but we’ve gotten burned a few times by not 
running radosgw-admin regionmap update after making region/zone changes.  
Bouncing the RGW’s probably wouldn’t hurt either.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Peter
Sent: Tuesday, July 22, 2014 4:51 AM
To: Craig Lewis
Cc: Ceph Users
Subject: Re: [ceph-users] radosgw-agent failed to parse

Yes, I'm scratching my head over this too. It doesn't seem to be an 
authentication issue, as the radosgw-agent never reaches the us-secondary 
gateway (I've kept an eye on the us-secondary logs as I execute radosgw-agent on 
us-master).

On 22/07/14 03:51, Craig Lewis wrote:
I was hoping for some easy fixes :-P

I created two system users, in both zones.  Each user has different access and 
secret, but I copied the access and secret from the primary to the secondary.  
I can't imagine that this would cause the problem you're seeing, but it is 
something different from the examples.

Sorry, I'm out of ideas.


On Mon, Jul 21, 2014 at 7:13 AM, Peter 
mailto:ptier...@tchpc.tcd.ie>> wrote:
hello 

Re: [ceph-users] Better way to use osd's of different size

2015-01-16 Thread Udo Lembke
Hi Megov,
you should weight each OSD so that the weight represents its size (e.g. a
weight of 3.68 for a 4TB HDD).
ceph-deploy does this automatically.

Nevertheless, even with the correct weights the disks are not filled in an
equal distribution. For that purpose you can use reweight for single
OSDs, or do it automatically with "ceph osd reweight-by-utilization".
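For example (OSD id and values below are only illustrative, not taken from your
cluster):

# set the CRUSH weight of a 4TB OSD to its size in TiB
ceph osd crush reweight osd.12 3.68

# or let Ceph lower the reweight of the most-utilized OSDs automatically;
# the optional argument is the utilization threshold in percent
ceph osd reweight-by-utilization 120

# a single over-full OSD can also be adjusted by hand (1.0 = full weight)
ceph osd reweight 12 0.85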

Udo

On 14.01.2015 16:36, Межов Игорь Александрович wrote:
>
> Hi!
>
>
> We have a small production ceph cluster, based on firefly release.
>
>
> It was built using hardware we already have in our site so it is not
> "new & shiny",
>
> but works quite good. It was started in 2014.09 as a "proof of
> concept" from 4 hosts
>
> with 3 x 1tb osd's each: 1U dual socket Intel 54XX & 55XX platforms on
> 1 gbit network.
>
>
> Now it contains 4x12 osd nodes on shared 10Gbit network. We use it as
> a backstore
>
> for running VMs under qemu+rbd.
>
>
> During migration we temporarily use 1U nodes with 2tb osds and already
> face some
>
> problems with uneven distribution. I know, that the best practice is
> to use osds of same
>
> capacity, but it is impossible sometimes.
>
>
> Now we have 24-28 spare 2tb drives and want to increase capacity on
> the same boxes.
>
> What is the more right way to do it:
>
> - replace 12x1tb drives with 12x2tb drives, so we will have 2 nodes
> full of 2tb drives and
>
> other nodes remain in the 12x1tb config
>
> - or replace 1tb with 2tb drives in a more uniform way, so every node will
> have 6x1tb + 6x2tb drives?
>
>
> I feel that the second way will give a smoother distribution among
> the nodes, and
>
> an outage of one node may have less impact on the cluster. Am I right, and
> what can you
>
> advise me in such a situation?
>
>
>
>
> Megov Igor
> yuterra.ru, CIO
> me...@yuterra.ru
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.91 released

2015-01-16 Thread Sage Weil
We are quickly approaching the Hammer feature freeze but have a few more 
dev releases to go before we get there.  The headline items are 
subtree-based quota support in CephFS (ceph-fuse/libcephfs client support 
only for now), a rewrite of the watch/notify librados API used by RBD and 
RGW, OSDMap checksums to ensure that maps are always consistent inside the 
cluster, new API calls in librados and librbd for IO hinting modeled after 
posix_fadvise, and improved storage of per-PG state.

We expect two more releases before the Hammer feature freeze (v0.93).

Upgrading
-

* The 'category' field for objects has been removed.  This was originally 
  added to track PG stat summations over different categories of objects 
  for use by radosgw.  It no longer has any known users and is prone to 
  abuse because it can lead to a pg_stat_t structure that is unbounded.  
  The librados API calls that accept this field now ignore it, and the OSD 
  no longer tracks the per-category summations.

* The output for 'rados df' has changed.  The 'category' level has been
  eliminated, so there is now a single stat object per pool.  The 
  structure of the JSON output is different, and the plaintext output has 
  one less column.

* The 'rados create  [category]' optional category argument is 
  no longer supported or recognized.

* rados.py's Rados class no longer has a __del__ method; it was causing
  problems on interpreter shutdown and use of threads.  If your code has
  Rados objects with limited lifetimes and you're concerned about locked
  resources, call Rados.shutdown() explicitly.

* There is a new version of the librados watch/notify API with vastly
  improved semantics.  Any applications using this interface are
  encouraged to migrate to the new API.  The old API calls are marked
  as deprecated and will eventually be removed.

* The librados rados_unwatch() call used to be safe to call on an
  invalid handle.  The new version has undefined behavior when passed
  a bogus value (for example, when rados_watch() returns an error and
  handle is not defined).

* The structure of the formatted 'pg stat' command is changed for the
  portion that counts states by name to avoid using the '+' character
  (which appears in state names) as part of the XML token (it is not
  legal).

Notable Changes
---

* asyncmsgr: misc fixes (Haomai Wang)
* buffer: add 'shareable' construct (Matt Benjamin)
* build: aarch64 build fixes (Noah Watkins, Haomai Wang)
* build: support for jemalloc (Shishir Gowda)
* ceph-disk: allow journal partition re-use (#10146 Loic Dachary, Dan van 
  der Ster)
* ceph-disk: misc fixes (Christos Stavrakakis)
* ceph-fuse: fix kernel cache trimming (#10277 Yan, Zheng)
* ceph-objectstore-tool: many many improvements (David Zafman)
* common: support new gperftools header locations (Ken Dreyer)
* crush: straw bucket weight calculation fixes (#9998 Sage Weil)
* doc: misc improvements (Nilamdyuti Goswami, John Wilkins, Chris 
  Holcombe)
* libcephfs,ceph-fuse: add 'status' asok (John Spray)
* librados, osd: new watch/notify implementation (Sage Weil)
* librados: drop 'category' feature (Sage Weil)
* librados: fix pool deletion handling (#10372 Sage Weil)
* librados: new fadvise API (Ma Jianpeng)
* libradosstriper: fix remove() (Dongmao Zhang)
* librbd: complete pending ops before closing image (#10299 Josh Durgin)
* librbd: fadvise API (Ma Jianpeng)
* mds: ENOSPC and OSDMap epoch barriers (#7317 John Spray)
* mds: dirfrag buf fix (Yan, Zheng)
* mds: disallow most commands on inactive MDS's (Greg Farnum)
* mds: drop dentries, leases on deleted directories (#10164 Yan, Zheng)
* mds: handle zero-size xattr (#10335 Yan, Zheng)
* mds: subtree quota support (Yunchuan Wen)
* memstore: free space tracking (John Spray)
* misc cleanup (Danny Al-Gaaf, David Anderson)
* mon: 'osd crush reweight-all' command (Sage Weil)
* mon: allow full flag to be manually cleared (#9323 Sage Weil)
* mon: delay failure injection (Joao Eduardo Luis)
* mon: fix paxos timeouts (#10220 Joao Eduardo Luis)
* mon: get canonical OSDMap from leader (#10422 Sage Weil)
* msgr: fix RESETSESSION bug (#10080 Greg Farnum)
* objectstore: deprecate collection attrs (Sage Weil)
* osd, mon: add checksums to all OSDMaps (Sage Weil)
* osd: allow deletion of objects with watcher (#2339 Sage Weil)
* osd: allow sparse read for Push/Pull (Haomai Wang)
* osd: cache reverse_nibbles hash value (Dong Yuan)
* osd: drop upgrade support for pre-dumpling (Sage Weil)
* osd: enable and use posix_fadvise (Sage Weil)
* osd: erasure-code: enforce chunk size alignment (#10211 Loic Dachary)
* osd: erasure-code: jerasure support for NEON (Loic Dachary)
* osd: erasure-code: relax cauchy w restrictions (#10325 David Zhang, Loic 
  Dachary)
* osd: erasure-code: update gf-complete to latest upstream (Loic Dachary)
* osd: fix WBTHrottle perf counters (Haomai Wang)
* osd: fix backfill bug (#10150 Samuel Just)
* osd: fix occasional peering stalls (#10431 Sage Weil)

Re: [ceph-users] two mount points, two diffrent data

2015-01-16 Thread Lindsay Mathieson
On Wed, 14 Jan 2015 02:20:21 PM Rafał Michalak wrote:
> Why is the data not replicated between the mounted filesystems?
> I tried with the ext4 and xfs filesystems.
> The data is visible only when unmounted and mounted again.


Because you are not using a cluster-aware filesystem, the respective mounts 
don't know when changes are made to the underlying block device (rbd) by the 
other mount. What you are doing *will* lead to file corruption.

You need to use a distributed filesystem such as GFS2 or CephFS.

CephFS would probably be the easiest to set up.
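A minimal sketch of the CephFS route, assuming a monitor at 192.168.0.1, an MDS
already running, and placeholder paths and keyring/secret file:

# on *both* nodes
mkdir -p /mnt/cephfs
mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# or, with the FUSE client
ceph-fuse -m 192.168.0.1:6789 /mnt/cephfs

Both mounts then see the same files immediately, which is the behaviour you were
expecting from the rbd approach.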
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RE: Better way to use osd's of different size

2015-01-16 Thread Межов Игорь Александрович
Thanks!


Of course, I know about osd weights and the ability to adjust them to make the 
distribution

more-or-less uniform. And we use ceph-deploy to bring up osds and have already

noticed that the weights of different-sized osds are chosen proportionally to 
their sizes.


But the question is about a slightly different thing - which variant (whole 2tb 
nodes + whole 1tb nodes OR all nodes with 6x2tb+6x1tb) will give a more uniform 
distribution of used space

and possibly a more uniform IO load on the nodes by default, i.e. without 
hand-tuning the crushmap

and weights? And also, which variant will better survive at least a full single 
node failure?


Indeed, even with osds of the same size but a different count per node, we face

"backfilltoofull" situations rather often. For example, during migration from 
3-OSD

"proof of concept" nodes to 12-OSD pre-production nodes there will be plenty

of room on the newer 12-OSD nodes, but a space shortage on the old 3-OSD ones. 
And we have

only a single solution - temporarily add some 2-3-OSD nodes to the cluster as 
"helpers",

and remove them once rebalancing is nearly complete.



Межов Игорь

Director of Information
Technology and Operations,
federal supermarket chain
"Uyuterra"

me...@yuterra.ru
me...@mail.ru
+7 915 855 3139
+7 4742 762 909

From: Udo Lembke 
Sent: 15 January 2015 10:41
To: Межов Игорь Александрович
Cc: ceph-users@lists.ceph.com >> Ceph Users
Subject: Re: [ceph-users] Better way to use osd's of different size

Hi Megov,
you should weight each OSD so that the weight represents its size (e.g. a weight 
of 3.68 for a 4TB HDD).
ceph-deploy does this automatically.

Nevertheless, even with the correct weights the disks are not filled in an equal 
distribution. For that purpose you can use reweight for single OSDs, or do it 
automatically with "ceph osd reweight-by-utilization".

Udo

On 14.01.2015 16:36, Межов Игорь Александрович wrote:

Hi!


We have a small production ceph cluster, based on firefly release.


It was built using hardware we already have in our site so it is not "new & 
shiny",

but works quite good. It was started in 2014.09 as a "proof of concept" from 4 
hosts

with 3 x 1tb osd's each: 1U dual socket Intel 54XX & 55XX platforms on 1 gbit 
network.


Now it contains 4x12 osd nodes on shared 10Gbit network. We use it as a 
backstore

for running VMs under qemu+rbd.


During migration we temporarily use 1U nodes with 2tb osds and already face some

problems with uneven distribution. I know, that the best practice is to use 
osds of same

capacity, but it is impossible sometimes.


Now we have 24-28 spare 2tb drives and want to increase capacity on the same 
boxes.

What is the more right way to do it:

- replace 12x1tb drives with 12x2tb drives, so we will have 2 nodes full of 2tb 
drives and

other nodes remain in the 12x1tb config

- or replace 1tb with 2tb drives in a more uniform way, so every node will have 6x1tb 
+ 6x2tb drives?


I feel that the second way will give a smoother distribution among the nodes, 
and

an outage of one node may have less impact on the cluster. Am I right, and what 
can you

advise me in such a situation?




Megov Igor
yuterra.ru, CIO
me...@yuterra.ru



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Gregory Farnum
On Fri, Jan 16, 2015 at 2:52 AM, Roland Giesler  wrote:
> On 14 January 2015 at 21:46, Gregory Farnum  wrote:
>>
>> On Tue, Jan 13, 2015 at 1:03 PM, Roland Giesler 
>> wrote:
>> > I have a 4 node ceph cluster, but the disks are not equally distributed
>> > across all machines (they are substantially different from each other)
>> >
>> > One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3)
>> > and
>> > two machines have only two 1 TB drives each (s2 & s1).
>> >
>> > Now machine s3 has by far the most CPU's and RAM, so I'm running my VM's
>> > mostly from there, but I want to make sure that the writes that happen
>> > to
>> > the ceph cluster get written to the "local" osd's on s3 first and then
>> > the
>> > additional writes/copies get done to the network.
>> >
>> > Is this possible with ceph.  The VM's are KVM in Proxmox in case it's
>> > relevant.
>>
>> In general you can't set up Ceph to write to the local node first. In
>> some specific cases you can if you're willing to do a lot more work
>> around placement, and this *might* be one of those cases.
>>
>> To do this, you'd need to change the CRUSH rules pretty extensively,
>> so that instead of selecting OSDs at random, they have two steps:
>> 1) starting from bucket s3, select a random OSD and put it at the
>> front of the OSD list for the PG.
>> 2) Starting from a bucket which contains all the other OSDs, select
>> N-1 more at random (where N is the number of desired replicas).
>
>
> I understand in principle what you're saying.  Let me go back a step and ask
> the question somewhat differently then:
>
> I have set up 4 machines in a cluster.  When I created the Windows 2008
> server VM on S1 (I corrected my first email: I have three Sunfire X series
> servers, S1, S2, S3), since S1 has 36GB of RAM and 8 x 300GB SAS drives, it
> was running normally, pretty close to what I had on the bare metal.  About a
> month later (after being on leave for 2 weeks), I found a machine that is
> crawling at a snail's pace and I cannot figure out why.

You mean one of the VMs has very slow disk access? Or one of the hosts
is very slow?

In any case, you'd need to look at what about that system is different
from the others and poke at that difference until it exposes an issue,
I suppose.
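A few generic things that can help expose such a difference (the pool name
below is a placeholder for whatever pool backs the VM images):

ceph health detail              # any slow or blocked requests?
ceph osd perf                   # per-OSD commit/apply latency; look for outliers
ceph osd tree                   # confirm every OSD is up and weighted as expected
rados bench -p <pool> 30 write  # raw write throughput measured from the slow host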
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] problem for remove files in cephfs

2015-01-16 Thread Daniel Takatori Ohara
Hi,

I have a problem removing one file in CephFS. With the command ls, all
the attributes show up as ???.

*ls: cannot access refseq/source_step2: No such file or directory*
*total 0*
*drwxrwxr-x 1 dtohara BioInfoHSL Users0 Jan 15 15:01 .*
*drwxrwxr-x 1 dtohara BioInfoHSL Users 3.8G Jan 15 14:55 ..*
*l? ? ?   ?   ?? source_step2*

Can anyone help me?

P.S.: Sorry for my English.

Thanks,

Att.

---
Daniel Takatori Ohara.
System Administrator - Lab. of Bioinformatics
Molecular Oncology Center
Instituto Sírio-Libanês de Ensino e Pesquisa
Hospital Sírio-Libanês
Phone: +55 11 3155-0200 (extension 1927)
R: Cel. Nicolau dos Santos, 69
São Paulo-SP. 01308-060
http://www.bioinfo.mochsl.org.br
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] got "XmlParseFailure" when libs3 client accessing radosgw object gateway

2015-01-16 Thread Liu, Xuezhao
Thanks for the hints.
My original configuration had "rgw print continue = false", but it did not 
work.

Just now I tested changing it to "true" and restarted the radosgw and apache2 
services, and strangely everything works now.

Best,
Xuezhao
> 
> This sounds like you're having trouble with 100-continue. Try setting 'rgw 
> print
> continue = false' in your ceph.conf, or replace the apache fastcgi module (as
> specified in the ceph docs).
> 
> Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Roland Giesler
On 16 January 2015 at 17:15, Gregory Farnum  wrote:

> > I have set up 4 machines in a cluster.  When I created the Windows 2008
> > server VM on S1 (I corrected my first email: I have three Sunfire X
> series
> > servers, S1, S2, S3), since S1 has 36GB of RAM and 8 x 300GB SAS drives, it
> > was running normally, pretty close to what I had on the bare metal.
> About a
> > month later (after being on leave for 2 weeks), I found a machine that is
> > crawling at a snail's pace and I cannot figure out why.
>
> You mean one of the VMs has very slow disk access? Or one of the hosts
> is very slow?
>

The Windows 2008 VM is very slow.  Inside Windows all seems normal, the
CPUs are never more than 20% used, and even navigating the menus takes a
long time.  The host (S1) is not slow.


> In any case, you'd need to look at what about that system is different
> from the others and poke at that difference until it exposes an issue,
> I suppose.
>

I'll move the machine to one of the smaller hosts (S2 or S3).  I'll just
have to lower the spec of the VM, since I've set RAM at 10GB, which is much
more than S2 or S3 have.  Let's see what happens.



> -Greg
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Total number PGs using multiple pools

2015-01-16 Thread Italo Santos
Hello,

In the placement groups documentation 
(http://ceph.com/docs/giant/rados/operations/placement-groups/) there is the 
note below:

“When using multiple data pools for storing objects, you need to ensure that 
you balance the number of placement groups per pool with the number of 
placement groups per OSD so that you arrive at a reasonable total number of 
placement groups that provides reasonably low variance per OSD without taxing 
system resources or making the peering process too slow.”

Does this mean that, if I have a cluster with 10 OSDs and 3 pools with size = 3, 
each pool can have only ~111 PGs?

Ex.: (100 * 10 OSDs) / 3 replicas ≈ 333 PGs total; 333 / 3 pools = 111 PGs per pool
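In command form, rounding 111 up to the next power of two, that would be
something like the following (pool names here are just examples):

ceph osd pool create pool-a 128 128      # pg_num and pgp_num
ceph osd pool set pool-a size 3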

I don't know if my reasoning is right… I'd be glad for any help.

Regards.

Italo Santos
http://italosantos.com.br/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy dependency errors on fc20 with firefly

2015-01-16 Thread Noah Watkins
Thanks! I'll give this a shot.

On Thu, Jan 8, 2015 at 8:51 AM, Travis Rhoden  wrote:
> Hi Noah,
>
> The root cause has been found.  Please see
> http://tracker.ceph.com/issues/10476 for details.
>
> In short, it's an issue between RPM obsoletes and yum priorities
> plugin.  Final solution is pending, but details of a work around are
> in the issue comments.
>
>  - Travis
>
> On Wed, Jan 7, 2015 at 4:05 PM, Travis Rhoden  wrote:
>> Hi Noah,
>>
>> I'll try to recreate this on a fresh FC20 install as well.  Looks to
>> me like there might be a repo priority issue.  It's mixing packages
>> from Fedora downstream repos and the ceph.com upstream repos.  That's
>> not supposed to happen.
>>
>>  - Travis
>>
>> On Wed, Jan 7, 2015 at 2:15 PM, Noah Watkins  
>> wrote:
>>> I'm trying to install Firefly on an up-to-date FC20 box. I'm getting
>>> the following errors:
>>>
>>> [nwatkins@kyoto cluster]$ ../ceph-deploy/ceph-deploy install --release
>>> firefly kyoto
>>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>>> /home/nwatkins/.cephdeploy.conf
>>> [ceph_deploy.cli][INFO  ] Invoked (1.5.21): ../ceph-deploy/ceph-deploy
>>> install --release firefly kyoto
>>> [ceph_deploy.install][DEBUG ] Installing stable version firefly on
>>> cluster ceph hosts kyoto
>>> [ceph_deploy.install][DEBUG ] Detecting platform for host kyoto ...
>>> [kyoto][DEBUG ] connection detected need for sudo
>>> [kyoto][DEBUG ] connected to host: kyoto
>>> [kyoto][DEBUG ] detect platform information from remote host
>>> [kyoto][DEBUG ] detect machine type
>>> [ceph_deploy.install][INFO  ] Distro info: Fedora 20 Heisenbug
>>> [kyoto][INFO  ] installing ceph on kyoto
>>> [kyoto][INFO  ] Running command: sudo yum -y install yum-plugin-priorities
>>> [kyoto][DEBUG ] Loaded plugins: langpacks, priorities, refresh-packagekit
>>> [kyoto][DEBUG ] Package yum-plugin-priorities-1.1.31-27.fc20.noarch
>>> already installed and latest version
>>> [kyoto][DEBUG ] Nothing to do
>>> [kyoto][INFO  ] Running command: sudo rpm --import
>>> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> [kyoto][INFO  ] Running command: sudo rpm -Uvh --replacepkgs --force
>>> --quiet 
>>> http://ceph.com/rpm-firefly/fc20/noarch/ceph-release-1-0.fc20.noarch.rpm
>>> [kyoto][DEBUG ] 
>>> [kyoto][DEBUG ] Updating / installing...
>>> [kyoto][DEBUG ] 
>>> [kyoto][WARNIN] ensuring that /etc/yum.repos.d/ceph.repo contains a
>>> high priority
>>> [kyoto][WARNIN] altered ceph.repo priorities to contain: priority=1
>>> [kyoto][INFO  ] Running command: sudo yum -y -q install ceph
>>> [kyoto][WARNIN] Error: Package: 1:python-cephfs-0.80.7-1.fc20.x86_64 
>>> (updates)
>>> [kyoto][WARNIN]Requires: libcephfs1 = 1:0.80.7-1.fc20
>>> [kyoto][WARNIN]Available: libcephfs1-0.80.1-0.fc20.x86_64 (Ceph)
>>> [kyoto][DEBUG ]  You could try using --skip-broken to work around the 
>>> problem
>>> [kyoto][WARNIN]libcephfs1 = 0.80.1-0.fc20
>>> [kyoto][WARNIN]Available: libcephfs1-0.80.3-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]libcephfs1 = 0.80.3-0.fc20
>>> [kyoto][WARNIN]Available: libcephfs1-0.80.4-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]libcephfs1 = 0.80.4-0.fc20
>>> [kyoto][WARNIN]Available: libcephfs1-0.80.5-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]libcephfs1 = 0.80.5-0.fc20
>>> [kyoto][WARNIN]Available: libcephfs1-0.80.6-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]libcephfs1 = 0.80.6-0.fc20
>>> [kyoto][WARNIN]Installing: libcephfs1-0.80.7-0.fc20.x86_64 
>>> (Ceph)
>>> [kyoto][WARNIN]libcephfs1 = 0.80.7-0.fc20
>>> [kyoto][WARNIN] Error: Package: 1:python-rbd-0.80.7-1.fc20.x86_64 (updates)
>>> [kyoto][WARNIN]Requires: librbd1 = 1:0.80.7-1.fc20
>>> [kyoto][WARNIN]Available: librbd1-0.80.1-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]librbd1 = 0.80.1-0.fc20
>>> [kyoto][WARNIN]Available: librbd1-0.80.3-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]librbd1 = 0.80.3-0.fc20
>>> [kyoto][WARNIN]Available: librbd1-0.80.4-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]librbd1 = 0.80.4-0.fc20
>>> [kyoto][WARNIN]Available: librbd1-0.80.5-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]librbd1 = 0.80.5-0.fc20
>>> [kyoto][WARNIN]Available: librbd1-0.80.6-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]librbd1 = 0.80.6-0.fc20
>>> [kyoto][WARNIN]Installing: librbd1-0.80.7-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]librbd1 = 0.80.7-0.fc20
>>> [kyoto][WARNIN] Error: Package: 1:python-rados-0.80.7-1.fc20.x86_64 
>>> (updates)
>>> [kyoto][WARNIN]Requires: librados2 = 1:0.80.7-1.fc20
>>> [kyoto][WARNIN]Available: librados2-0.80.1-0.fc20.x86_64 (Ceph)
>>> [kyoto][WARNIN]lib

Re: [ceph-users] How to tell a VM to write more local ceph nodes than to the network.

2015-01-16 Thread Roland Giesler
On 14 January 2015 at 12:08, JM  wrote:

> Hi Roland,
>
> You should tune your Ceph Crushmap with a custom rule in order to do that
> (write first on s3 and then to others). This custom rule will be applied
> then to your proxmox pool.
> (what you want to do is only interesting if you run VM from host s3)
>
> Can you give us your crushmap ?
>


# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host h1 {
id -2# do not change unnecessarily
# weight 8.140
alg straw
hash 0# rjenkins1
item osd.1 weight 0.900
item osd.3 weight 0.900
item osd.4 weight 0.900
item osd.5 weight 0.680
item osd.6 weight 0.680
item osd.7 weight 0.680
item osd.8 weight 0.680
item osd.9 weight 0.680
item osd.10 weight 0.680
item osd.11 weight 0.680
item osd.12 weight 0.680
}
host s3 {
id -3# do not change unnecessarily
# weight 0.450
alg straw
hash 0# rjenkins1
item osd.2 weight 0.450
}
host s2 {
id -4# do not change unnecessarily
# weight 0.900
alg straw
hash 0# rjenkins1
item osd.13 weight 0.900
}
host s1 {
id -5# do not change unnecessarily
# weight 1.640
alg straw
hash 0# rjenkins1
item osd.14 weight 0.290
item osd.0 weight 0.270
item osd.15 weight 0.270
item osd.16 weight 0.270
item osd.17 weight 0.270
item osd.18 weight 0.270
}
root default {
id -1# do not change unnecessarily
# weight 11.130
alg straw
hash 0# rjenkins1
item h1 weight 8.140
item s3 weight 0.450
item s2 weight 0.900
item s1 weight 1.640
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map
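For what it's worth, my reading of Greg's two-step suggestion is a rule roughly
like the one sketched below (untested; the second "take" should really use a
bucket that contains only the other hosts, which this map does not define yet,
and the pool name at the end is a placeholder):

rule s3_first {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take s3
    step choose firstn 1 type osd
    step emit
    step take default
    step chooseleaf firstn -1 type host
    step emit
}

applied with the usual decompile/edit/recompile cycle, then pointing the
Proxmox pool at the new rule:

ceph osd getcrushmap -o map.bin
crushtool -d map.bin -o map.txt     # add the rule here
crushtool -c map.txt -o map.new
ceph osd setcrushmap -i map.new
ceph osd pool set <proxmox-pool> crush_ruleset 1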

thanks so far!

regards

Roland



>
>
>
> 2015-01-13 22:03 GMT+01:00 Roland Giesler :
>
>> I have a 4 node ceph cluster, but the disks are not equally distributed
>> across all machines (they are substantially different from each other)
>>
>> One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3)
>> and two machines have only two 1 TB drives each (s2 & s1).
>>
>> Now machine s3 has by far the most CPU's and RAM, so I'm running my VM's
>> mostly from there, but I want to make sure that the writes that happen to
>> the ceph cluster get written to the "local" osd's on s3 first and then the
>> additional writes/copies get done to the network.
>>
>> Is this possible with ceph.  The VM's are KVM in Proxmox in case it's
>> relevant.
>>
>> regards
>>
>>
>> *Roland *
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r >=0)

2015-01-16 Thread John Spray
It has just been pointed out to me that you can also work around this
issue on your existing system by increasing the osd_max_write_size
setting on your OSDs (default 90MB) to something higher, but still
smaller than your osd journal size.  That might get you on a path to
having an accessible filesystem before you consider an upgrade.
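For example (the value is only illustrative and must stay below the OSD journal
size, in MB):

ceph tell osd.* injectargs '--osd_max_write_size 256'

and, to make it persistent across restarts, in the [osd] section of ceph.conf:

osd max write size = 256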

John

On Fri, Jan 16, 2015 at 10:57 AM, John Spray  wrote:
> Hmm, upgrading should help here, as the problematic data structure
> (anchortable) no longer exists in the latest version.  I haven't
> checked, but hopefully we don't try to write it during upgrades.
>
> The bug you're hitting is more or less the same as a similar one we
> have with the sessiontable in the latest ceph, but you won't hit it
> there unless you're very unlucky!
>
> John
>
> On Fri, Jan 16, 2015 at 7:37 AM, Mohd Bazli Ab Karim
>  wrote:
>> Dear Ceph-Users, Ceph-Devel,
>>
>> Apologies if you get a double post of this email.
>>
>> I am running a ceph cluster version 0.72.2 and one MDS at the moment (in fact 
>> there are 3, but 2 are down and only 1 is up).
>> Plus I have one CephFS client mounted to it.
>>
>> Now, the MDS always gets aborted after recovering and being active for 4 secs.
>> Some parts of the log are as below:
>>
>> -3> 2015-01-15 14:10:28.464706 7fbcc8226700  1 -- 10.4.118.21:6800/5390 
>> <== osd.19 10.4.118.32:6821/243161 73  osd_op_re
>> ply(3742 1000240c57e. [create 0~0,setxattr (99)] v56640'1871414 
>> uv1871414 ondisk = 0) v6  221+0+0 (261801329 0 0) 0x
>> 7770bc80 con 0x69c7dc0
>> -2> 2015-01-15 14:10:28.464730 7fbcc8226700  1 -- 10.4.118.21:6800/5390 
>> <== osd.18 10.4.118.32:6818/243072 67  osd_op_re
>> ply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567 ondisk = 
>> 0) v6  179+0+0 (3759887079 0 0) 0x7757ec80 con
>> 0x1c6bb00
>> -1> 2015-01-15 14:10:28.464754 7fbcc8226700  1 -- 10.4.118.21:6800/5390 
>> <== osd.47 10.4.118.35:6809/8290 79  osd_op_repl
>> y(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90 (Message 
>> too long)) v6  174+0+0 (3942056372 0 0) 0x69f94
>> a00 con 0x1c6b9a0
>>  0> 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In 
>> function 'void MDSTable::save_2(int, version_t)' thread 7
>> fbcc8226700 time 2015-01-15 14:10:28.46
>> mds/MDSTable.cc: 83: FAILED assert(r >= 0)
>>
>>  ceph version  ()
>>  1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
>>  2: (Context::complete(int)+0x9) [0x568d29]
>>  3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
>>  4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
>>  5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
>>  6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
>>  7: (DispatchQueue::entry()+0x549) [0x975739]
>>  8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
>>  9: (()+0x7e9a) [0x7fbcccb0de9a]
>>  10: (clone()+0x6d) [0x7fbccb4ba3fd]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
>> interpret this.
>>
>> Is there any workaround/patch to fix this issue? Let me know if you need to see 
>> the log at a certain debug-mds level as well.
>> Any helps would be very much appreciated.
>>
>> Thanks.
>> Bazli
>>
>> 
>> DISCLAIMER:
>>
>>
>> This e-mail (including any attachments) is for the addressee(s) only and may 
>> be confidential, especially as regards personal data. If you are not the 
>> intended recipient, please note that any dealing, review, distribution, 
>> printing, copying or use of this e-mail is strictly prohibited. If you have 
>> received this email in error, please notify the sender immediately and 
>> delete the original message (including any attachments).
>>
>>
>> MIMOS Berhad is a research and development institution under the purview of 
>> the Malaysian Ministry of Science, Technology and Innovation. Opinions, 
>> conclusions and other information in this e-mail that do not relate to the 
>> official business of MIMOS Berhad and/or its subsidiaries shall be 
>> understood as neither given nor endorsed by MIMOS Berhad and/or its 
>> subsidiaries and neither MIMOS Berhad nor its subsidiaries accepts 
>> responsibility for the same. All liability arising from or in connection 
>> with computer viruses and/or corrupted e-mails is excluded to the fullest 
>> extent permitted by law.
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] two mount points, two diffrent data

2015-01-16 Thread Michael Kuriger
You’re using a file system on 2 hosts that is not cluster aware.  Metadata 
written on hosta is not sent to hostb in this case.  You may be interested in 
looking at cephfs for this use case.


Michael Kuriger
mk7...@yp.com
818-649-7235
MikeKuriger (IM)

From: Rafał Michalak mailto:rafa...@gmail.com>>
Date: Wednesday, January 14, 2015 at 5:20 AM
To: "ceph-users@lists.ceph.com" 
mailto:ceph-users@lists.ceph.com>>
Subject: [ceph-users] two mount points, two diffrent data

Hello, I have trouble with this situation:

#node1
mount /dev/rbd/rbd/test /mnt
cd /mnt
touch test1
ls (i see test1, OK)

#node2
mount /dev/rbd/rbd/test /mnt
cd /mnt
(i see test1, OK)
touch test2
ls (i see test2, OK)

#node1
ls (i see test1, BAD)
touch test3
ls (i see test1, test3 BAD)

#node2
ls (i see test1, test2 BAD)

Why is the data not replicated between the mounted filesystems?
I tried with the ext4 and xfs filesystems.
The data is visible only when unmounted and mounted again.

I checked health with "ceph status" and it is HEALTH_OK.

What am I doing wrong?
Thanks for any help.


My system
Ubuntu 14.04.01 LTS

#ceph --version
ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

#modinfo libceph
filename:   /lib/modules/3.13.0-44-generic/kernel/net/ceph/libceph.ko
license: GPL
description:Ceph filesystem for Linux
author:   Patience Warnick 
mailto:patie...@newdream.net>>
author:   Yehuda Sadeh 
mailto:yeh...@hq.newdream.net>>
author:   Sage Weil mailto:s...@newdream.net>>
srcversion: B8E83D4DFC53B113603CF52
depends:libcrc32c
intree:Y
vermagic:   3.13.0-44-generic SMP mod_unload modversions
signer:   Magrathea: Glacier signing key
sig_key:  50:8C:3B:4B:F1:08:ED:36:B6:06:2F:81:27:82:F7:7C:37:B9:85:37
sig_hashalgo:   sha512

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Better way to use osd's of different size

2015-01-16 Thread John Spray
On Wed, Jan 14, 2015 at 3:36 PM, Межов Игорь Александрович
 wrote:
> What is the more right way to do it:
>
> - replace 12x1tb drives with 12x2tb drives, so we will have 2 nodes full of
> 2tb drives and
>
> other nodes remains in 12x1tb confifg
>
> - or replace 1tb to 2tb drives in more unify way, so every node will have
> 6x1tb + 6x2tb drives?
>
>
> I feel that the second way will give more smooth distribution among the
> nodes, and
>
> outage of one node may give lesser impact on cluster. Am I right and what
> you can
>
> advice me in such a situation?

You are correct.  The CRUSH weight assigned to an OSD depends on its
capacity, so in order to fill a cluster evenly we have to write twice as
quickly to a 2TB OSD as to a 1TB OSD.  If some nodes had all the big
drives, then the network interfaces to those nodes would be overloaded
compared with the network interfaces to the other nodes.

However, even if the drives are spread out across nodes such that
there is no network imbalance, you will still have the local imbalance
within a node: if you are writing (across many PGs) 100MB/s to the 2TB
drives then you will only be writing 50MB/s to the 1TB drives.  You
could solve this in turn with some creative arrangement of pools with
crush rules to make sure that each pool was only using a single drive
size: that way you could have two pools that each got full bandwidth,
but one pool would be smaller than the other.  But if you don't care
about the bandwidth under-utilization on the older drives, then that
would be unnecessary complication.
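If you did want to go that way, the rough shape of it (all names below are made
up) is a separate CRUSH root per drive size, a simple rule per root, and a pool
pointed at each rule:

ceph osd crush add-bucket root-1tb root
ceph osd crush add-bucket root-2tb root
# place each OSD under a host bucket beneath the matching root, e.g.:
ceph osd crush set osd.12 1.82 root=root-2tb host=node1-2tb
ceph osd crush rule create-simple replicated-1tb root-1tb host
ceph osd crush rule create-simple replicated-2tb root-2tb host
ceph osd pool set pool-1tb crush_ruleset <id of replicated-1tb>
ceph osd pool set pool-2tb crush_ruleset <id of replicated-2tb>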

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.80.8 Firefly released

2015-01-16 Thread Sage Weil
This is a long-awaited bugfix release for firefly.  It has several 
important (but relatively rare) OSD peering fixes, fixes for performance issues 
when snapshots are trimmed, several RGW fixes, a paxos corner case fix, and 
some packaging updates.

We recommend that all users of v0.80.x firefly upgrade when it is 
convenient to do so.

Notable Changes
---

* build: remove stack-execute bit from assembled code sections (#10114 Dan 
  Mick)
* ceph-disk: fix dmcrypt key permissions (#9785 Loic Dachary)
* ceph-disk: fix keyring location (#9653 Loic Dachary)
* ceph-disk: make partition checks more robust (#9721 #9665 Loic Dachary)
* ceph: cleanly shut down librados context on shutdown (#8797 Dan Mick)
* common: add $cctid config metavariable (#6228 Adam Crume)
* crush: align rule and ruleset ids (#9675 Xiaoxi Chen)
* crush: fix negative weight bug during create_or_move_item (#9998 Pawel 
  Sadowski)
* crush: fix potential buffer overflow in erasure rules (#9492 Johnu 
  George)
* debian: fix python-ceph -> ceph file movement (Sage Weil)
* libcephfs,ceph-fuse: fix flush tid wraparound bug (#9869 Greg Farnum, 
  Yan, Zheng)
* libcephfs: close fd befure umount (#10415 Yan, Zheng)
* librados: fix crash from C API when read timeout is enabled (#9582 Sage 
  Weil)
* librados: handle reply race with pool deletion (#10372 Sage Weil)
* librbd: cap memory utilization for read requests (Jason Dillaman)
* librbd: do not close a closed parent image on failure (#10030 Jason 
  Dillaman)
* librbd: fix diff tests (#10002 Josh Durgin)
* librbd: protect list_children from invalid pools (#10123 Jason Dillaman)
* make check improvemens (Loic Dachary)
* mds: fix ctime updates (#9514 Greg Farnum)
* mds: fix journal import tool (#10025 John Spray)
* mds: fix rare NULL deref in cap flush handler (Greg Farnum)
* mds: handle unknown lock messages (Yan, Zheng)
* mds: store backtrace for straydir (Yan, Zheng)
* mon: abort startup if disk is full (#9502 Joao Eduardo Luis)
* mon: add paxos instrumentation (Sage Weil)
* mon: fix double-free in rare OSD startup path (Sage Weil)
* mon: fix osdmap trimming (#9987 Sage Weil)
* mon: fix paxos corner cases (#9301 #9053 Sage Weil)
* osd: cancel callback on blacklisted watchers (#8315 Samuel Just)
* osd: cleanly abort set-alloc-hint operations during upgrade (#9419 David 
  Zafman)
* osd: clear rollback PG metadata on PG deletion (#9293 Samuel Just)
* osd: do not abort deep scrub if hinfo is missing (#10018 Loic Dachary)
* osd: erasure-code regression tests (Loic Dachary)
* osd: fix distro metadata reporting for SUSE (#8654 Danny Al-Gaaf)
* osd: fix full OSD checks during backfill (#9574 Samuel Just)
* osd: fix ioprio parsing (#9677 Loic Dachary)
* osd: fix journal direct-io shutdown (#9073 Mark Kirkwood, Ma Jianpeng, 
  Somnath Roy)
* osd: fix journal dump (Ma Jianpeng)
* osd: fix occasional stall during peering or activation (Sage Weil)
* osd: fix past_interval display bug (#9752 Loic Dachary)
* osd: fix rare crash triggered by admin socket dump_ops_in_filght (#9916 
  Dong Lei)
* osd: fix snap trimming performance issues (#9487 #9113 Samuel Just, Sage 
  Weil, Dan van der Ster, Florian Haas)
* osd: fix snapdir handling on cache eviction (#8629 Sage Weil)
* osd: handle map gaps in map advance code (Sage Weil)
* osd: handle undefined CRUSH results in interval check (#9718 Samuel 
  Just)
* osd: include shard in JSON dump of ghobject (#10063 Loic Dachary)
* osd: make backfill reservation denial handling more robust (#9626 Samuel 
  Just)
* osd: make misdirected op checks handle EC + primary affinity (#9835 
  Samuel Just, Sage Weil)
* osd: mount XFS with inode64 by default (Sage Weil)
* osd: other misc bugs (#9821 #9875 Samuel Just)
* rgw: add .log to default log path (#9353 Alexandre Marangone)
* rgw: clean up fcgi request context (#10194 Yehuda Sadeh)
* rgw: convert header underscores to dashes (#9206 Yehuda Sadeh)
* rgw: copy object data if copy target is in different pool (#9039 Yehuda 
  Sadeh)
* rgw: don't try to authenticate CORS preflight request (#8718 Robert 
  Hubbard, Yehuda Sadeh)
* rgw: fix civetweb URL decoding (#8621 Yehuda Sadeh)
* rgw: fix hash calculation during PUT (Yehuda Sadeh)
* rgw: fix misc bugs (#9089 #9201 Yehuda Sadeh)
* rgw: fix object tail test (#9226 Sylvain Munaut, Yehuda Sadeh)
* rgw: make sysvinit script run rgw under systemd context as needed 
  (#10125 Loic Dachary)
* rgw: separate civetweb log from rgw log (Yehuda Sadeh)
* rgw: set length for keystone token validations (#7796 Mark Kirkwood, 
  Yehuda Sadeh)
* rgw: subuser creation fixes (#8587 Yehuda Sadeh)
* rpm: misc packaging improvements (Sandon Van Ness, Dan Mick, Erik 
  Logthenberg, Boris Ranto)
* rpm: use standard udev rules for CentOS7/RHEL7 (#9747 Loic Dachary)

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.80.8.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master

Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r >=0)

2015-01-16 Thread Lindsay Mathieson
On Fri, 16 Jan 2015 08:48:38 AM Wido den Hollander wrote:
> In Ceph world 0.72.2 is ancient en pretty old. If you want to play with
> CephFS I recommend you upgrade to 0.90 and also use at least kernel 3.18

Does the kernel version matter if you are using ceph-fuse?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r >=0)

2015-01-16 Thread Yan, Zheng
On Sat, Jan 17, 2015 at 11:47 AM, Lindsay Mathieson
 wrote:
> On Fri, 16 Jan 2015 08:48:38 AM Wido den Hollander wrote:
>> In Ceph world 0.72.2 is ancient en pretty old. If you want to play with
>> CephFS I recommend you upgrade to 0.90 and also use at least kernel 3.18
>
> Does the kernel version matter if you are using ceph-fuse?
>

No, the kernel version does not matter if you use ceph-fuse.

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com