Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-12-11 Thread SCHAER Frederic
Hi,

Back on this.
I finally found out a logic in the mapping.

So after taking the time to note all the disk serial numbers on 3 different 
machines and 2 different OSes, I now know that my specific LSI SAS 2008 cards 
(no reference on them, but I think those are LSI SAS 9207-8i) map the disks of 
the MD1000 in reverse alphabetical order:

sd{b..p} map to slot{14..0}
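
For what it's worth, a tiny shell sketch of that mapping (an assumption that only
holds for this particular card/MD1000 combination and scan order):

for dev in sd{b..p}; do
    letter=$(printf '%d' "'${dev:2:1}")     # ASCII code of the drive letter
    slot=$(( 14 - (letter - 98) ))          # 'b' (98) -> slot 14 ... 'p' (112) -> slot 0
    echo "$dev -> slot$slot"
done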

There is absolutely nothing else that appears usable, except the sas_address of 
the disks, which seems to be associated with the slots. 
But even that differs from machine to machine, and the address <-> slot 
mapping does not seem very obvious, to say the least...

Good thing is that I now know that fun tools exist in packages such as 
sg3_utils, smp_utils and others like mpt-status...
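
(For instance, sg3_utils' sg_ses can dump the enclosure's slot <-> SAS address 
table and blink a bay LED; a sketch only, the enclosure sg device and slot index 
below are placeholders that will differ per machine:)

lsscsi -g                               # find the enclosure's /dev/sgN ("enclosu" line)
sg_ses --page=aes /dev/sgN              # Additional Element Status: bay number per device
sg_ses --index=7 --set=ident /dev/sgN   # light the ident/locate LED of bay 7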
Next step is to try an md1200 ;)

Thanks again
Cheers

-Message d'origine-
De : JF Le Fillâtre [mailto:jean-francois.lefilla...@uni.lu] 
Envoyé : mercredi 19 novembre 2014 13:42
À : SCHAER Frederic
Cc : ceph-users@lists.ceph.com
Objet : Re: [ceph-users] jbod + SMART : how to identify failing disks ?


Hello again,

So whatever magic allows the Dell MD1200 to report the slot position for
each disk isn't present in your JBODs. Time for something else.

There are two sides to your problem:

1) Identifying which disk is where in your JBOD

Quite easy. Again I'd go for a udev rule + script that will either
rename the disks entirely, or create a symlink with a name like
"jbodX-slotY" or something similar, to easily figure out which is which. The
end-device-to-slot mapping can be static in the script, so you only need to
identify once the order in which the kernel scans the slots, and then you
can map them.
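
A rough sketch of that scheme (the rule file name, script path and the mapping
table itself are assumptions to be filled in per machine):

# /etc/udev/rules.d/60-jbod-slots.rules
KERNEL=="sd*", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", PROGRAM="/usr/local/bin/jbod-slot.sh %k", SYMLINK+="%c"

#!/bin/bash
# /usr/local/bin/jbod-slot.sh -- print "jbodX-slotY" for a kernel name (%k),
# based on the end_device found in its sysfs path.
dev="$1"
path=$(readlink -f "/sys/class/block/$dev") || exit 1
end_dev=$(grep -o 'end_device-[0-9]*:[0-9]*:[0-9]*' <<< "$path" | tail -n 1)
case "$end_dev" in
    end_device-1:1:0)  echo "jbod1-slot14" ;;   # static table, one line per slot
    end_device-1:1:1)  echo "jbod1-slot13" ;;
    *) exit 1 ;;                                # unknown device: no symlink created
esac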

But it won't survive a disk swap or a change of scanning order from a
kernel upgrade, so it's not enough.

2) Finding a way of identification independent of hot-plugs and scan order

That's the tricky part. If you remove a disk from your JBOD and replace
it with another one, the other one will get another "sdX" name, and in
my experience even another "end_device-..." name. But given that you
want the new disk to have the exact same name or symlink as the previous
one, you have to find something in the path of the device or (better) in
the udev attributes that is immutable.

If possible at all, it will depend on your specific hardware
combination, so you will have to try for yourself.

Suggested methodology:

1) write down the serial number of one drive in any slot, and figure out
its device name (sdX) with "smartctl -i /dev/sd..."

2) grab the detailed /sys path name and list of udev attributes:
readlink -f /sys/class/block/sdX
udevadm info --attribute-walk /dev/sdX

3) pull that disk and replace it. Check the logs to see which is its new
device name (sdY)

4) rerun the commands from #2 with sdY

5) compare the outputs and find something in the path or in the
attributes that didn't change and is unique to that disk (i.e. not a
common parent, for example).

If you have something that really didn't change, you're in luck. Either
use the serial numbers or unplug and replug all the disks one by one to
figure out the slot-number-to-immutable-item mapping.

Then write the udev rule. :)
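
If, for instance, step 5 turns up an attribute that really is slot-bound and
immutable, the rule can key on it directly; a sketch only, where
ATTRS{sas_address} merely stands in for whatever attribute you actually found
and the values are placeholders:

# /etc/udev/rules.d/61-jbod-by-slot.rules (sketch; values are placeholders)
KERNEL=="sd*", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ATTRS{sas_address}=="0x5000xxxxxxxxxxx0", SYMLINK+="jbod1-slot0"
KERNEL=="sd*", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ATTRS{sas_address}=="0x5000xxxxxxxxxxx1", SYMLINK+="jbod1-slot1"
# ... one line per slot ...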

Thanks!
JF



On 19/11/14 11:29, SCHAER Frederic wrote:
> Hi
> 
> Thanks.
> I hoped it would be it, but no ;)
> 
> With this mapping :
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdb -> 
> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:0/end_device-1:1:0/target1:0:1/1:0:1:0/block/sdb
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdc -> 
> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:1/end_device-1:1:1/target1:0:2/1:0:2:0/block/sdc
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdd -> 
> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:2/end_device-1:1:2/target1:0:3/1:0:3:0/block/sdd
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sde -> 
> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:3/end_device-1:1:3/target1:0:4/1:0:4:0/block/sde
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdf -> 
> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:4/end_device-1:1:4/target1:0:5/1:0:5:0/block/sdf
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdg -> 
> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:5/end_device-1:1:5/target1:0:6/1:0:6:0/block/sdg
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdh -> 
> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:6/end_device-1:1:6/target1:0:7/1:0:7:0/block/sdh
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdi -> 
> ../../devices/pci0000:00/0000:00:04.0

Re: [ceph-users] Again: full ssd ceph cluster

2014-12-11 Thread Christian Balzer

Hello,

On Wed, 10 Dec 2014 18:08:23 +0300 Mike wrote:

> Hello all!
> Some our customer asked for only ssd storage.
> By now we looking to 2027R-AR24NV w/ 3 x HBA controllers (LSI3008 chip,
> 8 internal 12Gb ports on each), 24 x Intel DC S3700 800Gb SSD drives, 2
> x mellanox 40Gbit ConnectX-3 (maybe newer ConnectX-4 100Gbit) and Xeon
> e5-2660V2 with 64Gb RAM.

A bit skimpy on the RAM given the amount of money you're willing to spend
otherwise.
And while you're giving it 20 2.2GHz cores, that's not going to cut it, not
by a long shot. 
I did some brief tests with a machine having 8 DC S3700 100GB for OSDs
(replica 1) under 0.80.6, and the right (make that wrong) type of load
(small, 4k I/Os) did melt all of the 8 3.5GHz cores in that box.

The suggested 1GHz per OSD from the Ceph team is for pure HDD-based OSDs; the
moment you add journals on SSDs, it already becomes barely enough with 3GHz
cores when dealing with many small I/Os.
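
To put rough numbers on it: 20 cores x 2.2GHz is about 44GHz for 24 OSDs, so
roughly 1.8GHz per OSD, well short of the ~3GHz per OSD that small-I/O loads on
SSDs seem to want (back-of-the-envelope only, of course).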

> Replica is 2.
> Or something like that but in 1U w/ 8 SSD's.
> 
The potential CPU power to OSD ratio will be much better with this.

> We see a little bottle neck on network cards, but the biggest question
> can ceph (giant release) with sharding io and new cool stuff release
> this potential?
> 
You shouldn't worry too much about network bandwidth unless you're going
to use this super expensive setup for streaming backups. ^o^ 
I'm certain you'll run out of IOPS long before you'll run out of network
bandwidth.

Given what I recall of the last SSD cluster discussion, most of the
Giant benefits were for read operations, and the write improvement was
about double that of Firefly. While nice, given my limited tests that is
still a far cry from what those SSDs can do; see above.

> Any ideas?
>
Somebody who actually has upgraded an SSD cluster from Firefly to Giant
would be in the correct position to answer that.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Again: full ssd ceph cluster

2014-12-11 Thread Florent MONTHEL
Hi

Is it possible to share performance results with this kind of config? How many 
IOPS? Bandwidth? Latency?
Thanks

Sent from my iPhone

> On 11 déc. 2014, at 09:35, Christian Balzer  wrote:
> 
> 
> Hello,
> 
>> On Wed, 10 Dec 2014 18:08:23 +0300 Mike wrote:
>> 
>> Hello all!
>> Some our customer asked for only ssd storage.
>> By now we looking to 2027R-AR24NV w/ 3 x HBA controllers (LSI3008 chip,
>> 8 internal 12Gb ports on each), 24 x Intel DC S3700 800Gb SSD drives, 2
>> x mellanox 40Gbit ConnectX-3 (maybe newer ConnectX-4 100Gbit) and Xeon
>> e5-2660V2 with 64Gb RAM.
> 
> A bit skimpy on the RAM given the amount of money you're willing to spend
> otherwise.
> And while you're giving it 20 2.2GHz cores, that's not going to cut, not
> by a long shot. 
> I did some brief tests with a machine having 8 DC S3700 100GB for OSDs
> (replica 1) under 0.80.6 and the right (make that wrong) type of load
> (small, 4k I/Os) did melt all of the 8 3.5GHz cores in that box.
> 
> The suggest 1GHz per OSD by the Ceph team is for pure HDD based OSDs, the
> moment you add journals on SSDs it already becomes barely enough with 3GHz
> cores when dealing with many small I/Os.
> 
>> Replica is 2.
>> Or something like that but in 1U w/ 8 SSD's.
> The potential CPU power to OSD ratio will be much better with this.
> 
>> We see a little bottle neck on network cards, but the biggest question
>> can ceph (giant release) with sharding io and new cool stuff release
>> this potential?
> You shouldn't worry too much about network bandwidth unless you're going
> to use this super expensive setup for streaming backups. ^o^ 
> I'm certain you'll run out of IOPS long before you'll run out of network
> bandwidth.
> 
> Given that what I recall of the last SSD cluster discussion, most of the
> Giant benefits were for read operations and the write improvement was
> about double that of Firefly. While nice, given my limited tests that is
> still a far cry away from what those SSDs can do, see above.
> 
>> Any ideas?
> Somebody who actually has upgraded an SSD cluster from Firefly to Giant
> would be in the correct position to answer that.
> 
> Christian
> -- 
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Fusion Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is mon initial members used after the first quorum?

2014-12-11 Thread Joao Eduardo Luis

On 12/11/2014 04:18 AM, Christian Balzer wrote:

On Wed, 10 Dec 2014 20:09:01 -0800 Christopher Armstrong wrote:


Christian,

That indeed looks like the bug! We tried with moving the monitor
host/address into global and everything works as expected - see
https://github.com/deis/deis/issues/2711#issuecomment-66566318

This seems like a potentially bad bug - how has it not come up before?


Ah, but as you can see from the issue report it has come up before.
But that discussion as well as that report clearly fell through the cracks.

It's another reason I dislike ceph-deploy, as people using just it
(probably the vast majority) will be unaffected as it stuffs everything
into [global].

People reading the documentation examples or coming from older versions
(and making changes to their config) will get bitten.


I find this extremely weird.  I, as I suppose most devs do, use clusters 
for testing that are not deployed using ceph-deploy.  These are built 
and configured using vstart.sh, which builds a ceph.conf from scratch 
without 'mon initial members' or 'mon hosts', with the monmap derived 
from specific [mon.X] sections.


In any case, I decided to give this a shot and build a local, 3-node 
cluster on addresses 127.0.0.{1,2,3}:6789, using Christopher's 
configuration file, relying as much as possible on specific mon config 
(attached).


You will notice that the main difference between this config file and a 
production config is the lot of config keys that were overridden 
from their default paths to something like 
'/home/ubuntu/tmp/foo/{run,dev,out}/' -- this allows me to run ceph from 
a dev branch instead of having to install it on the system (I could have 
used docker but didn't think that was the point).


Anyway, you'll also notice that each mon section has a bunch of config 
options that you wouldn't otherwise see; this is mostly a dev conf, and 
I copied whatever I found reasonable from a vstart.sh-generated ceph.conf.
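
(The attachment isn't reproduced in the archive; purely as a rough sketch of its 
shape, with a placeholder fsid and the paths mentioned above, not the actual 
attached file, it boils down to something like:)

# sketch only - not the actual attached config
[global]
fsid = <uuid>
run dir = /home/ubuntu/tmp/foo/run
log file = /home/ubuntu/tmp/foo/out/$name.log

[mon.nodo-1]
host = nodo-1
mon addr = 127.0.0.1:6789
mon data = /home/ubuntu/tmp/foo/dev/mon.nodo-1

[mon.nodo-2]
host = nodo-2
mon addr = 127.0.0.2:6789
mon data = /home/ubuntu/tmp/foo/dev/mon.nodo-2

[mon.nodo-3]
host = nodo-3
mon addr = 127.0.0.3:6789
mon data = /home/ubuntu/tmp/foo/dev/mon.nodo-3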


I also dropped the 'mon initial members = nodo-3' from the [global] 
section.  Keeping it would lead the monitors to be unable to create a 
proper monmap on bootstrap, as each would only know of a single monitor. 
 Besides, the point was to test the specific [mon.X] config sections, 
and if I were to properly configure 'mon initial members' we would end up 
in the scenario that Christian is complaining about.


Anyway, I aliased a few lo addresses to make sure each mon gets a 
different ip (albeit local).  Monitors were built relying solely on the 
config file:


for i in 1 2 3; do ceph-mon -i nodo-$i --mkfs -d || break ; done

and were run in much the same way:

for i in 1 2 3; do ceph-mon -i nodo-$i || break ; done

From tailing the logs it was clear that the monitors had formed a quorum (one 
was the leader, two were peons), so they were clearly able to build a 
proper monmap and find each other.


Running the 'ceph' tool with '--debug-monc 10' also shows the monitors 
are able to build an initial monmap (and they later on reach the 
monitors for a status report):


ubuntu@terminus:~/tmp/foo$ ceph -s --debug-monc 10
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2014-12-11 09:47:44.114010 7fb4660c0700 10 monclient(hunting): 
build_initial_monmap

2014-12-11 09:47:44.114338 7fb4660c0700 10 monclient(hunting): init
2014-12-11 09:47:44.114345 7fb4660c0700 10 monclient(hunting): 
auth_supported 2 method cephx
2014-12-11 09:47:44.114564 7fb4660c0700 10 monclient(hunting): 
_reopen_session rank -1 name
2014-12-11 09:47:44.114606 7fb4660c0700 10 monclient(hunting): picked 
mon.nodo-2 con 0x7fb4600102f0 addr 127.0.0.2:6789/0
2014-12-11 09:47:44.114623 7fb4660c0700 10 monclient(hunting): 
_send_mon_message to mon.nodo-2 at 127.0.0.2:6789/0


[...]

2014-12-11 09:47:44.151646 7fb4577fe700 10 monclient: 
handle_mon_command_ack 2 [{"prefix": "status"}]

2014-12-11 09:47:44.151648 7fb4577fe700 10 monclient: _finish_command 2 = 0
cluster fc0e2e09-ade3-4ff6-b23e-f789775b2515
 health HEALTH_ERR
64 pgs stuck inactive
64 pgs stuck unclean
no osds
mon.nodo-1 low disk space
mon.nodo-2 low disk space
mon.nodo-3 low disk space
 monmap e1: 3 mons at 
{nodo-1=127.0.0.1:6789/0,nodo-2=127.0.0.2:6789/0,nodo-3=127.0.0.3:6789/0}

election epoch 6, quorum 0,1,2 nodo-1,nodo-2,nodo-3
 osdmap e1: 0 osds: 0 up, 0 in
  pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
0 kB used, 0 kB / 0 kB avail
  64 creating

Next, radosgw:

ubuntu@terminus:~/tmp/foo$ radosgw -d --debug-monc 10
warning: line 19: 'log_file' in section 'global' redefined
2014-12-11 09:51:05.015312 7f6ad66787c0  0 ceph version 
0.89-420-g6f4b98d (6f4b98df317816c11838f0c339be3a8d19d47a25), process 
lt-radosgw, pid 4480
2014-12-11 09:51:05.017022 7f6ad66787c0 10 monclient(hunting): 
build_initial_monmap

2014-12-11 09:51:05.017376 7f6ad66787c0 10 monclient(hunting): init
2014-12-11 09:51:05.017398 7f6ad

Re: [ceph-users] unable to repair PG

2014-12-11 Thread Luis Periquito
Hi,

I've stopped OSD.16, removed the PG from the local filesystem and started
the OSD again. After ceph rebuilt the PG on that OSD, I ran a deep-scrub
and the PG is still inconsistent.

I'm running out of ideas for solving this. Does this mean that all copies
of the object should also be inconsistent? Should I just try to figure out
which object/bucket it belongs to and delete it / copy it again to the
ceph cluster?

Also, do you know what the error message means? Is it just some sort of
metadata for this object that isn't correct, rather than the object itself?

On Wed, Dec 10, 2014 at 11:11 AM, Luis Periquito 
wrote:

> Hi,
>
> In the last few days this PG (pool is .rgw.buckets) has been in error
> after running the scrub process.
>
> After getting the error, and trying to see what may be the issue (and
> finding none), I've just issued a ceph repair followed by a ceph
> deep-scrub. However it doesn't seem to have fixed the issue and it still
> remains.
>
> The relevant log from the OSD is as follows.
>
> 2014-12-10 09:38:09.348110 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0
> missing, 1 inconsistent objects
> 2014-12-10 09:38:09.348116 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1
> errors
> 2014-12-10 10:13:15.922065 7f8f618be700  0 log [INF] : 9.180 repair ok, 0
> fixed
> 2014-12-10 10:55:27.556358 7f8f618be700  0 log [ERR] : 9.180 shard 6: soid
> 370cbf80/29145.4_xxx/head//9 missing attr _, missing attr _user.rgw.acl,
> missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing
> attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing attr
> _user.rgw.x-amz-meta-md5sum, missing attr _user.rgw.x-amz-meta-stat,
> missing attr snapset
> 2014-12-10 10:56:50.597952 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0
> missing, 1 inconsistent objects
> 2014-12-10 10:56:50.597957 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1
> errors
>
> I'm running version firefly 0.80.7.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What's the difference between ceph-0.87-0.el6.x86_64.rpm and ceph-0.80.7-0.el6.x86_64.rpm

2014-12-11 Thread Rodrigo Severo
On Thu, Dec 11, 2014 at 3:18 AM, Irek Fasikhov  wrote:
> Hi, Cao.
>
> https://github.com/ceph/ceph/commits/firefly
>
>
> 2014-12-11 5:00 GMT+03:00 Cao, Buddy :
>>
>> Hi, I tried to download firefly rpm package, but found two rpms existing
>> in different folders, what is the difference of 0.87.0 and  0.80.7?


Maybe a better place to understand the difference is:
http://ceph.com/category/releases/

0.80.7 is firefly
0.87 is giant


Regards,

Rodrigo Severo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Again: full ssd ceph cluster

2014-12-11 Thread Mike
Hello,
On 12/11/2014 11:35 AM, Christian Balzer wrote:
> 
> Hello,
> 
> On Wed, 10 Dec 2014 18:08:23 +0300 Mike wrote:
> 
>> Hello all!
>> Some our customer asked for only ssd storage.
>> By now we looking to 2027R-AR24NV w/ 3 x HBA controllers (LSI3008 chip,
>> 8 internal 12Gb ports on each), 24 x Intel DC S3700 800Gb SSD drives, 2
>> x mellanox 40Gbit ConnectX-3 (maybe newer ConnectX-4 100Gbit) and Xeon
>> e5-2660V2 with 64Gb RAM.
> 
> A bit skimpy on the RAM given the amount of money you're willing to spend
> otherwise.
I think a larger amount of RAM can help with the rebalance process in the
cluster when one node fails.

> And while you're giving it 20 2.2GHz cores, that's not going to cut, not
> by a long shot. 
> I did some brief tests with a machine having 8 DC S3700 100GB for OSDs
> (replica 1) under 0.80.6 and the right (make that wrong) type of load
> (small, 4k I/Os) did melt all of the 8 3.5GHz cores in that box.
We can choose something more powerful from the E5-266xV3 family.

> The suggest 1GHz per OSD by the Ceph team is for pure HDD based OSDs, the
> moment you add journals on SSDs it already becomes barely enough with 3GHz
> cores when dealing with many small I/Os.
> 
>> Replica is 2.
>> Or something like that but in 1U w/ 8 SSD's.
>>
> The potential CPU power to OSD ratio will be much better with this.
> 
Yes, that looks more reasonable.

>> We see a little bottle neck on network cards, but the biggest question
>> can ceph (giant release) with sharding io and new cool stuff release
>> this potential?
>>
> You shouldn't worry too much about network bandwidth unless you're going
> to use this super expensive setup for streaming backups. ^o^ 
> I'm certain you'll run out of IOPS long before you'll run out of network
> bandwidth.
> 
I'm also thinking about a possible bottleneck in the kernel I/O subsystem.

> Given that what I recall of the last SSD cluster discussion, most of the
> Giant benefits were for read operations and the write improvement was
> about double that of Firefly. While nice, given my limited tests that is
> still a far cry away from what those SSDs can do, see above.
> 
I also read all these threads about Giant read performance. But on write, is
it still about twice as bad now?

>> Any ideas?
>>
> Somebody who actually has upgraded an SSD cluster from Firefly to Giant
> would be in the correct position to answer that.
> 
> Christian
> 

Thank you for the useful opinion, Christian!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is mon initial members used after the first quorum?

2014-12-11 Thread Joao Eduardo Luis

On 12/11/2014 02:46 PM, Gregory Farnum wrote:

On Thu, Dec 11, 2014 at 2:21 AM, Joao Eduardo Luis  wrote:

On 12/11/2014 04:28 AM, Christopher Armstrong wrote:


If someone could point me to where this fix should go in the code, I'd
actually love to dive in - I've been wanting to contribute back to Ceph,
and this bug has hit us personally so I think it's a good candidate :)



I'm not sure where the bug is or what it may be (see reply to Christian's
email sent a few minutes ago).

I believe the first step to assess what's happening is to reliably reproduce
this.  Ideally in a different environment, or such that it makes it clear
it's not an issue specific to your deployment.

Next, say it's indeed the config file that is being misread: you'd probably
want to look into common/config.{cc,h}.  If it happens to be a bug while
building a monmap, you'd want to look into mon/MonMap.cc and
mon/MonClient.cc.  Being a radosgw issue, you'll probably want to look into
'rgw/*' and/or 'librados/*', but maybe someone else could give you the
pointers for those.

I think the main task now is to reliably reproduce this.  I haven't been
able to from the config you provided, but I may have made some assumptions
that end up negating the whole bug.


It might require some kind of library bug, maybe...? What OS did you
do this on, and what are people running when they see this not work?
-Greg



For what it's worth, I tested this against a few-days-old master 
(bc2b9f6bf5fa629e127852720d6ad42ef1276b12) on ubuntu 14.04.


  -Joao

--
Joao Eduardo Luis
Software Engineer | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] tgt / rbd performance

2014-12-11 Thread ano nym
There is a ceph pool on an HP DL360 G5 with 25 SAS 10k disks (sda-sdy) on an MSA70,
which gives me about 600 MB/s continuous write speed with the rados write bench.
tgt on the server with the rbd backend uses this pool. Mounting it locally (on the
host) with iscsiadm, sdz is the virtual iSCSI device. As you can see, sdz maxes out
at 100% util and ~55MB/s when writing to it.

I know that tgt-rbd is more a proof-of-concept than production-ready.

Anyway, is someone using it and/or are there any hints to speed it up?

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.31    0.00   26.01   53.28    0.00    2.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    0.00   35.00     0.00  8316.50   475.23     2.36   68.74    0.00   68.74  15.11  52.90
sdb               0.00     0.00    0.00   24.00     0.00  3718.50   309.88     0.87   49.54    0.00   49.54  14.96  35.90
sdc               0.00     9.00    0.00   64.00     0.00 19739.00   616.84     3.33   51.12    0.00   51.12   9.20  58.90
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    2.00     0.00   520.00   520.00     0.66   86.00    0.00   86.00  78.00  15.60
sdf               0.00     8.00    0.00  103.00     0.00 34638.00   672.58     7.45   78.69    0.00   78.69   9.46  97.40
sdg               0.00     3.00    0.00   74.00     0.00 25472.00   688.43     4.47   60.50    0.00   60.50   9.62  71.20
sdh               0.00     0.00    0.00   17.00     0.00  5760.00   677.65     0.62   45.65    0.00   45.65  14.29  24.30
sdi               0.00     3.00    0.00   43.00     0.00 14724.00   684.84     2.92   67.91    0.00   67.91  12.98  55.80
sdj               0.00     4.00    1.00   45.00     4.00 14152.00   615.48     3.18   58.52   19.00   59.40  13.59  62.50
sdk               0.00     6.00    0.00   75.00     0.00 29047.50   774.60     6.35   61.08    0.00   61.08  10.23  76.70
sdm               0.00     3.00    0.00   48.00     0.00 16488.00   687.00     1.75   36.40    0.00   36.40   9.08  43.60
sdl               0.00     1.00    0.00   46.00     0.00 17412.00   757.04     4.94  108.63    0.00  108.63   9.48  43.60
sdn               0.00     0.00    0.00   51.00     0.00 17692.00   693.80     4.49   88.10    0.00   88.10   9.90  50.50
sdo               0.00     8.00    0.00   55.00     0.00 13574.00   493.60     3.44   62.89    0.00   62.89   8.55  47.00
sdp               0.00     0.00    0.00   40.00     0.00 14488.00   724.40     2.82   69.72    0.00   69.72  16.88  67.50
sdq               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdr               0.00     1.00    0.00   32.00     0.00  8856.00   553.50     0.78   24.22    0.00   24.22   9.53  30.50
sds               0.00     7.00    1.00   51.00     4.00 16132.50   620.63     3.14   58.98   79.00   58.59  16.48  85.70
sdt               0.00     2.00    0.00   50.00     0.00 19040.00   761.60     3.56   71.12    0.00   71.12   9.54  47.70
sdu               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdw               0.00     0.00    0.00   16.00     0.00  6388.00   798.50     0.95   77.62    0.00   77.62  11.56  18.50
sdx               0.00     1.00    0.00   28.00     0.00  9840.00   702.86     1.40   49.86    0.00   49.86  12.43  34.80
sdy               0.00     2.00    0.00   54.00     0.00 20168.00   746.96     5.64   72.65    0.00   72.65   9.72  52.50
sdv               0.00     0.00    0.00   15.00     0.00  4300.00   573.33     0.49   33.07    0.00   33.07  19.00  28.50

sdz               0.00     0.00    0.00  115.00     0.00 57468.00   999.44   143.50  736.48    0.00  736.48   8.70 100.00

b.r. Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RESOLVED Re: Cluster with pgs in active (unclean) status

2014-12-11 Thread Gregory Farnum
Was there any activity against your cluster when you reduced the size
from 3 -> 2? I think maybe it was just taking time to percolate
through the system if nothing else was going on. When you reduced them
to size 1, data needed to be deleted, so everything woke up and
started processing.
-Greg

On Wed, Dec 10, 2014 at 5:27 AM, Eneko Lacunza  wrote:
> Hi all,
>
> I fixed the issue with the following commands:
> # ceph osd pool set data size 1
> (wait some seconds for clean+active state of +64pgs)
> # ceph osd pool set data size 2
> # ceph osd pool set metadata size 1
> (wait some seconds for clean+active state of +64pgs)
> # ceph osd pool set metadata size 2
> # ceph osd pool set rbd size 1
> (wait some seconds for clean+active state of +64pgs)
> # ceph osd pool set rbd size 2
>
> This now gives me:
> # ceph status
> cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7
>  health HEALTH_OK
>  monmap e3: 3 mons at
> {0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election epoch 32,
> quorum 0,1,2 1,2,0
>  osdmap e275: 2 osds: 2 up, 2 in
>   pgmap v395557: 256 pgs, 4 pools, 194 GB data, 49820 objects
> 388 GB used, 116 GB / 505 GB avail
>  256 active+clean
>
> I'm still curious whether this can be fixed without this trick?
>
> Cheers
> Eneko
>
>
> On 10/12/14 13:14, Eneko Lacunza wrote:
>>
>> Hi all,
>>
>> I have a small ceph cluster with just 2 OSDs, latest firefly.
>>
>> Default data, metadata and rbd pools were created with size=3 and
>> min_size=1
>> An additional pool rbd2 was created with size=2 and min_size=1
>>
>> This would give me a warning status, saying that 64 pgs were active+clean
>> and 192 active+degraded. (there are 64 pg per pool).
>>
>> I realized it was due to the size=3 in the three pools, so I changed that
>> value to 2:
>> # ceph osd pool set data size 2
>> # ceph osd pool set metadata size 2
>> # ceph osd pool set rbd size 2
>>
>> Those 3 pools are empty. After those commands status would report 64 pgs
>> active+clean, and 192 pgs active, with a warning saying 192 pgs were
>> unclean.
>>
>> I have created a rbd block with:
>> rbd create -p rbd --image test --size 1024
>>
>> And now the status is:
>> # ceph status
>> cluster 3e91b908-2af3-4288-98a5-dbb77056ecc7
>>  health HEALTH_WARN 192 pgs stuck unclean; recovery 2/99640 objects
>> degraded (0.002%)
>>  monmap e3: 3 mons at
>> {0=10.0.3.3:6789/0,1=10.0.3.1:6789/0,2=10.0.3.2:6789/0}, election epoch 32,
>> quorum 0,1,2 1,2,0
>>  osdmap e263: 2 osds: 2 up, 2 in
>>   pgmap v393763: 256 pgs, 4 pools, 194 GB data, 49820 objects
>> 388 GB used, 116 GB / 505 GB avail
>> 2/99640 objects degraded (0.002%)
>>  192 active
>>   64 active+clean
>>
>> Looking to an unclean non-empty pg:
>> # ceph pg 2.14 query
>> { "state": "active",
>>   "epoch": 263,
>>   "up": [
>> 0,
>> 1],
>>   "acting": [
>> 0,
>> 1],
>>   "actingbackfill": [
>> "0",
>> "1"],
>>   "info": { "pgid": "2.14",
>>   "last_update": "263'1",
>>   "last_complete": "263'1",
>>   "log_tail": "0'0",
>>   "last_user_version": 1,
>>   "last_backfill": "MAX",
>>   "purged_snaps": "[]",
>>   "history": { "epoch_created": 1,
>>   "last_epoch_started": 136,
>>   "last_epoch_clean": 136,
>>   "last_epoch_split": 0,
>>   "same_up_since": 135,
>>   "same_interval_since": 135,
>>   "same_primary_since": 11,
>>   "last_scrub": "0'0",
>>   "last_scrub_stamp": "2014-11-26 12:23:57.023493",
>>   "last_deep_scrub": "0'0",
>>   "last_deep_scrub_stamp": "2014-11-26 12:23:57.023493",
>>   "last_clean_scrub_stamp": "0.00"},
>>   "stats": { "version": "263'1",
>>   "reported_seq": "306",
>>   "reported_epoch": "263",
>>   "state": "active",
>>   "last_fresh": "2014-12-10 12:53:37.766465",
>>   "last_change": "2014-12-10 10:32:24.189000",
>>   "last_active": "2014-12-10 12:53:37.766465",
>>   "last_clean": "0.00",
>>   "last_became_active": "0.00",
>>   "last_unstale": "2014-12-10 12:53:37.766465",
>>   "mapping_epoch": 128,
>>   "log_start": "0'0",
>>   "ondisk_log_start": "0'0",
>>   "created": 1,
>>   "last_epoch_clean": 136,
>>   "parent": "0.0",
>>   "parent_split_bits": 0,
>>   "last_scrub": "0'0",
>>   "last_scrub_stamp": "2014-11-26 12:23:57.023493",
>>   "last_deep_scrub": "0'0",
>>   "last_deep_scrub_stamp": "2014-11-26 12:23:57.023493",
>>   "last_clean_scrub_stamp": "0.00",
>>   "log_size": 1,
>>   "ondisk_log_size": 1,
>>   "stats_invalid": "0",
>>   "stat_sum": { "num_bytes": 112,
>>   "num_objects": 1,
>>   "num_object_clones": 0,
>>   "num_object_copies": 

Re: [ceph-users] Is mon initial members used after the first quorum?

2014-12-11 Thread Christopher Armstrong
Our users are running CoreOS with kernel 3.17.2. Our user tested this by
setting up the config and then bringing down one of the mons. See
https://github.com/deis/deis/issues/2711#issuecomment-66566318 for his
testing scenario.

On Thu, Dec 11, 2014 at 8:16 AM, Joao Eduardo Luis  wrote:

> On 12/11/2014 02:46 PM, Gregory Farnum wrote:
>
>> On Thu, Dec 11, 2014 at 2:21 AM, Joao Eduardo Luis 
>> wrote:
>>
>>> On 12/11/2014 04:28 AM, Christopher Armstrong wrote:
>>>

 If someone could point me to where this fix should go in the code, I'd
 actually love to dive in - I've been wanting to contribute back to Ceph,
 and this bug has hit us personally so I think it's a good candidate :)

>>>
>>>
>>> I'm not sure where the bug is or what it may be (see reply to Christian's
>>> email sent a few minutes ago).
>>>
>>> I believe the first step to assess what's happening is to reliably
>>> reproduce
>>> this.  Ideally in a different environment, or such that it makes it clear
>>> it's not an issue specific to your deployment.
>>>
>>> Next, say it's indeed the config file that is being misread: you'd
>>> probably
>>> want to look into common/config.{cc,h}.  If it happens to be a bug while
>>> building a monmap, you'd want to look into mon/MonMap.cc and
>>> mon/MonClient.cc.  Being a radosgw issue, you'll probably want to look
>>> into
>>> 'rgw/*' and/or 'librados/*', but maybe someone else could give you the
>>> pointers for those.
>>>
>>> I think the main task now is to reliably reproduce this.  I haven't been
>>> able to from the config you provided, but I may have made some
>>> assumptions
>>> that end up negating the whole bug.
>>>
>>
>> It might require some kind of library bug, maybe...? What OS did you
>> do this on, and what are people running when they see this not work?
>> -Greg
>>
>>
> For what is worth, I tested this against few-days-old master (
> bc2b9f6bf5fa629e127852720d6ad42ef1276b12) on ubuntu 14.04.
>
>   -Joao
>
>
> --
> Joao Eduardo Luis
> Software Engineer | http://ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is mon initial members used after the first quorum?

2014-12-11 Thread Christopher Armstrong
(On Giant, v0.87)

On Thu, Dec 11, 2014 at 10:34 AM, Christopher Armstrong 
wrote:

> Our users are running CoreOS with kernel 3.17.2. Our user tested this by
> setting up the config and then bringing down one of the mons. See
> https://github.com/deis/deis/issues/2711#issuecomment-66566318 for his
> testing scenario.
>
> On Thu, Dec 11, 2014 at 8:16 AM, Joao Eduardo Luis 
> wrote:
>
>> On 12/11/2014 02:46 PM, Gregory Farnum wrote:
>>
>>> On Thu, Dec 11, 2014 at 2:21 AM, Joao Eduardo Luis 
>>> wrote:
>>>
 On 12/11/2014 04:28 AM, Christopher Armstrong wrote:

>
> If someone could point me to where this fix should go in the code, I'd
> actually love to dive in - I've been wanting to contribute back to
> Ceph,
> and this bug has hit us personally so I think it's a good candidate :)
>


 I'm not sure where the bug is or what it may be (see reply to
 Christian's
 email sent a few minutes ago).

 I believe the first step to assess what's happening is to reliably
 reproduce
 this.  Ideally in a different environment, or such that it makes it
 clear
 it's not an issue specific to your deployment.

 Next, say it's indeed the config file that is being misread: you'd
 probably
 want to look into common/config.{cc,h}.  If it happens to be a bug while
 building a monmap, you'd want to look into mon/MonMap.cc and
 mon/MonClient.cc.  Being a radosgw issue, you'll probably want to look
 into
 'rgw/*' and/or 'librados/*', but maybe someone else could give you the
 pointers for those.

 I think the main task now is to reliably reproduce this.  I haven't been
 able to from the config you provided, but I may have made some
 assumptions
 that end up negating the whole bug.

>>>
>>> It might require some kind of library bug, maybe...? What OS did you
>>> do this on, and what are people running when they see this not work?
>>> -Greg
>>>
>>>
>> For what is worth, I tested this against few-days-old master (
>> bc2b9f6bf5fa629e127852720d6ad42ef1276b12) on ubuntu 14.04.
>>
>>   -Joao
>>
>>
>> --
>> Joao Eduardo Luis
>> Software Engineer | http://ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] unable to repair PG

2014-12-11 Thread Gregory Farnum
On Thu, Dec 11, 2014 at 2:57 AM, Luis Periquito  wrote:
> Hi,
>
> I've stopped OSD.16, removed the PG from the local filesystem and started
> the OSD again. After ceph rebuilt the PG in the removed OSD I ran a
> deep-scrub and the PG is still inconsistent.

What led you to remove it from osd 16? Is that the one hosting the log
you snipped from? Is osd 16 the one hosting shard 6 of that PG, or was
it the primary?
Anyway, the message means that shard 6 (which I think is the seventh
OSD in the list) of PG 9.180 is missing a bunch of xattrs on object
370cbf80/29145.4_xxx/head//9. I'm actually a little surprised it
didn't crash if it's missing the "_" attr
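
For reference, on a firefly filestore OSD one way to see what is actually on
disk for that shard is to look at the object file's xattrs directly; just a
sketch, with the OSD id and paths as placeholders:

# on the OSD holding shard 6, locate the on-disk file for that object
find /var/lib/ceph/osd/ceph-NN/current/9.180_head -name '*29145.4*'
# dump its xattrs; a healthy replica should show user.ceph._ , user.ceph.snapset
# and the user.ceph._user.rgw.* attrs matching the list in the log above
getfattr -d -m '.' -e hex '<path printed by find>'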
-Greg

>
> I'm running out of ideas on trying to solve this. Does this mean that all
> copies of the object should also be inconsistent? Should I just try to
> figure which object/bucket this belongs to and delete it/copy it again to
> the ceph cluster?
>
> Also, do you know what the error message means? is it just some sort of
> metadata for this object that isn't correct, not the object itself?
>
> On Wed, Dec 10, 2014 at 11:11 AM, Luis Periquito 
> wrote:
>>
>> Hi,
>>
>> In the last few days this PG (pool is .rgw.buckets) has been in error
>> after running the scrub process.
>>
>> After getting the error, and trying to see what may be the issue (and
>> finding none), I've just issued a ceph repair followed by a ceph deep-scrub.
>> However it doesn't seem to have fixed the issue and it still remains.
>>
>> The relevant log from the OSD is as follows.
>>
>> 2014-12-10 09:38:09.348110 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0
>> missing, 1 inconsistent objects
>> 2014-12-10 09:38:09.348116 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1
>> errors
>> 2014-12-10 10:13:15.922065 7f8f618be700  0 log [INF] : 9.180 repair ok, 0
>> fixed
>> 2014-12-10 10:55:27.556358 7f8f618be700  0 log [ERR] : 9.180 shard 6: soid
>> 370cbf80/29145.4_xxx/head//9 missing attr _, missing attr _user.rgw.acl,
>> missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing
>> attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing attr
>> _user.rgw.x-amz-meta-md5sum, missing attr _user.rgw.x-amz-meta-stat, missing
>> attr snapset
>> 2014-12-10 10:56:50.597952 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0
>> missing, 1 inconsistent objects
>> 2014-12-10 10:56:50.597957 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1
>> errors
>>
>> I'm running version firefly 0.80.7.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tgt / rbd performance

2014-12-11 Thread Nick Fisk
Hi,

 

Can you post the commands you ran for both benchmarks? Without knowing the 
block size, write pattern and queue depth, it's hard to determine where the 
bottleneck might be.
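
For example, something along the lines of the following fio run makes all three 
explicit and directly comparable against the rados bench numbers (device name and 
sizes are placeholders, and writing to /dev/sdz this way destroys its contents):

fio --name=seqwrite --filename=/dev/sdz --ioengine=libaio --direct=1 \
    --rw=write --bs=4M --iodepth=32 --numjobs=1 --runtime=60 --time_based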

 

I can see OSD sde has a very high service time, which could be a sign of a 
problem. Does it always show up high during tests, or is this just a one-off 
spike?

 

The service times as a whole also look slightly high; I'm assuming there is a 
write-back cache on the HP controller?

 

It might also be worth checking the max queue depths of both target and 
initiator to make sure you are not hitting any limits.
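
A few places to look, as a sketch (session numbers and device names are placeholders):

iscsiadm -m session -P 3 | grep -iE 'depth|cmds'    # negotiated initiator limits
# defaults: node.session.cmds_max / node.session.queue_depth in /etc/iscsi/iscsid.conf
tgtadm --lld iscsi --mode target --op show          # target-side view from tgt
cat /sys/block/sdz/queue/nr_requests                # block-layer queue limit of sdz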

 

Nick

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of ano nym
Sent: 11 December 2014 17:40
To: ceph-users@lists.ceph.com
Subject: [ceph-users] tgt / rbd performance

 

 

there is a ceph pool on a hp dl360g5 with 25 sas 10k (sda-sdy) on a msa70 which 
gives me about 600 MB/s continous write speed with rados write bench. tgt on 
the server with rbd backend uses this pool. mounting local(host) with iscsiadm, 
sdz is the virtual iscsi device. As you can see, sdz max out with 100%util at 
~55MB/s when writing to it. 

 

I know that tgt-rbd is more a proof-of-concept then production-ready. 

 

Anyway, is someone using it and/or are there any hints to speed it up?

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

  18.310.00   26.01   53.280.002.40

 

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util

sda   0.00 5.000.00   35.00 0.00  8316.50   475.23 
2.36   68.740.00   68.74  15.11  52.90

sdb   0.00 0.000.00   24.00 0.00  3718.50   309.88 
0.87   49.540.00   49.54  14.96  35.90

sdc   0.00 9.000.00   64.00 0.00 19739.00   616.84 
3.33   51.120.00   51.12   9.20  58.90

sdd   0.00 0.000.000.00 0.00 0.00 0.00 
0.000.000.000.00   0.00   0.00

sde   0.00 0.000.002.00 0.00   520.00   520.00 
0.66   86.000.00   86.00  78.00  15.60

sdf   0.00 8.000.00  103.00 0.00 34638.00   672.58 
7.45   78.690.00   78.69   9.46  97.40

sdg   0.00 3.000.00   74.00 0.00 25472.00   688.43 
4.47   60.500.00   60.50   9.62  71.20

sdh   0.00 0.000.00   17.00 0.00  5760.00   677.65 
0.62   45.650.00   45.65  14.29  24.30

sdi   0.00 3.000.00   43.00 0.00 14724.00   684.84 
2.92   67.910.00   67.91  12.98  55.80

sdj   0.00 4.001.00   45.00 4.00 14152.00   615.48 
3.18   58.52   19.00   59.40  13.59  62.50

sdk   0.00 6.000.00   75.00 0.00 29047.50   774.60 
6.35   61.080.00   61.08  10.23  76.70

sdm   0.00 3.000.00   48.00 0.00 16488.00   687.00 
1.75   36.400.00   36.40   9.08  43.60

sdl   0.00 1.000.00   46.00 0.00 17412.00   757.04 
4.94  108.630.00  108.63   9.48  43.60

sdn   0.00 0.000.00   51.00 0.00 17692.00   693.80 
4.49   88.100.00   88.10   9.90  50.50

sdo   0.00 8.000.00   55.00 0.00 13574.00   493.60 
3.44   62.890.00   62.89   8.55  47.00

sdp   0.00 0.000.00   40.00 0.00 14488.00   724.40 
2.82   69.720.00   69.72  16.88  67.50

sdq   0.00 0.000.000.00 0.00 0.00 0.00 
0.000.000.000.00   0.00   0.00

sdr   0.00 1.000.00   32.00 0.00  8856.00   553.50 
0.78   24.220.00   24.22   9.53  30.50

sds   0.00 7.001.00   51.00 4.00 16132.50   620.63 
3.14   58.98   79.00   58.59  16.48  85.70

sdt   0.00 2.000.00   50.00 0.00 19040.00   761.60 
3.56   71.120.00   71.12   9.54  47.70

sdu   0.00 0.000.000.00 0.00 0.00 0.00 
0.000.000.000.00   0.00   0.00

sdw   0.00 0.000.00   16.00 0.00  6388.00   798.50 
0.95   77.620.00   77.62  11.56  18.50

sdx   0.00 1.000.00   28.00 0.00  9840.00   702.86 
1.40   49.860.00   49.86  12.43  34.80

sdy   0.00 2.000.00   54.00 0.00 20168.00   746.96 
5.64   72.650.00   72.65   9.72  52.50

sdv   0.00 0.000.00   15.00 0.00  4300.00   573.33 
0.49   33.070.00   33.07  19.00  28.50

 

sdz   0.00 0.000.00  115.00 0.00 57468.00   999.44   
143.50  736.480.00  736.48   8.70 100.00

 

b.r. Michael




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-11 Thread Yehuda Sadeh
The branch I pushed earlier was based off the recent development branch. I
just pushed one based off firefly (wip-10271-firefly). It will
probably take a bit to build.

Yehuda

On Thu, Dec 11, 2014 at 12:03 PM, Georgios Dimitrakakis
 wrote:
> Hi again!
>
> I have installed and enabled the development branch repositories as
> described here:
>
> http://ceph.com/docs/master/install/get-packages/#add-ceph-development
>
> and when I try to update the ceph-radosgw package I get the following:
>
> Installed Packages
> Name: ceph-radosgw
> Arch: x86_64
> Version : 0.80.7
> Release : 0.el6
> Size: 3.8 M
> Repo: installed
> From repo   : Ceph
> Summary : Rados REST gateway
> URL : http://ceph.com/
> License : GPL-2.0
> Description : radosgw is an S3 HTTP REST gateway for the RADOS object store.
> It is
> : implemented as a FastCGI module using libfcgi, and can be used
> in
> : conjunction with any FastCGI capable web server.
>
> Available Packages
> Name: ceph-radosgw
> Arch: x86_64
> Epoch   : 1
> Version : 0.80.5
> Release : 9.el6
> Size: 1.3 M
> Repo: epel
> Summary : Rados REST gateway
> URL : http://ceph.com/
> License : GPL-2.0
> Description : radosgw is an S3 HTTP REST gateway for the RADOS object store.
> It is
> : implemented as a FastCGI module using libfcgi, and can be used
> in
> : conjunction with any FastCGI capable web server.
>
>
>
> Is this normal???
>
> I am concerned because the installed version is 0.80.7 and the available
> update package is 0.80.5
>
> Have I missed something?
>
> Regards,
>
> George
>
>
>
>> Pushed a fix to wip-10271. Haven't tested it though, let me know if
>> you try it.
>>
>> Thanks,
>> Yehuda
>>
>> On Thu, Dec 11, 2014 at 8:38 AM, Yehuda Sadeh  wrote:
>>>
>>> I don't think it has been fixed recently. I'm looking at it now, and
>>> not sure why it hasn't triggered before in other areas.
>>>
>>> Yehuda
>>>
>>> On Thu, Dec 11, 2014 at 5:55 AM, Georgios Dimitrakakis
>>>  wrote:

 This issue seems very similar to these:

 http://tracker.ceph.com/issues/8202
 http://tracker.ceph.com/issues/8702


 Would it make any difference if I try to build CEPH from sources?

 I mean is someone aware of it been fixed on any of the recent commits
 and
 probably hasn't passed yet to the repositories?

 Regards,

 George





 On Mon, 08 Dec 2014 19:47:59 +0200, Georgios Dimitrakakis wrote:
>
>
> I 've just created issues #10271
>
> Best,
>
> George
>
> On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:
>>
>>
>> It looks like a bug. Can you open an issue on tracker.ceph.com,
>> describing what you see?
>>
>> Thanks,
>> Yehuda
>>
>> On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
>>  wrote:
>>>
>>>
>>> It would be nice to see where and how "uploadId"
>>>
>>> is being calculated...
>>>
>>>
>>> Thanks,
>>>
>>>
>>> George
>>>
>>>
>>>
 For example if I try to perform the same multipart upload at an
 older
 version ceph version 0.72.2
 (a913ded2ff138aefb8cb84d347d72164099cfd60)


 I can see the upload ID in the apache log as:

 "PUT



 /test/.dat?partNumber=25&uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
 HTTP/1.1" 200 - "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"

 but when I try the same at ceph version 0.80.7
 (6c0127fcb58008793d3c8b62d925bc91963672a3)

 I get the following:

 "PUT





 /test/.dat?partNumber=12&uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
 HTTP/1.1" 403 78 "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"


 and my guess is that the "%2F" at the latter is the one that is
 causing the problem and hence the 403 error.



 What do you think???


 Best,

 George



> Hi all!
>
> I am using AWS SDK JS v.2.0.29 to perform a multipart upload into
> Radosgw with ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 403
> error.
>
>
> I believe that the id which is send to all requests and has been
> urlencoded by the aws-sdk-js doesn't match with the one in rados
> because it's not urlencoded.
>
> Is that the case? Can you confirm it?
>
> Is there something I can do?
>
>
> Regards,
>
> George
>
> ___
> ceph-us

Re: [ceph-users] Is mon initial members used after the first quorum?

2014-12-11 Thread Christopher Armstrong
If someone could point me to where this fix should go in the code, I'd
actually love to dive in - I've been wanting to contribute back to Ceph,
and this bug has hit us personally so I think it's a good candidate :)

On Wed, Dec 10, 2014 at 8:25 PM, Christopher Armstrong 
wrote:

> We're running Ceph entirely in Docker containers, so we couldn't use
> ceph-deploy due to the requirement of having a process management daemon
> (upstart, in Ubuntu's case). So, I wrote things out and templated them
> myself following the documentation.
>
> Thanks for linking the bug, Christian! You saved us a lot of time and
> troubleshooting. I'll post a comment on the bug.
>
> Chris
>
> On Wed, Dec 10, 2014 at 8:18 PM, Christian Balzer  wrote:
>
>> On Wed, 10 Dec 2014 20:09:01 -0800 Christopher Armstrong wrote:
>>
>> > Christian,
>> >
>> > That indeed looks like the bug! We tried with moving the monitor
>> > host/address into global and everything works as expected - see
>> > https://github.com/deis/deis/issues/2711#issuecomment-66566318
>> >
>> > This seems like a potentially bad bug - how has it not come up before?
>>
>> Ah, but as you can see from the issue report is has come up before.
>> But that discussion as well as that report clearly fell through the
>> cracks.
>>
>> It's another reason I dislike ceph-deploy, as people using just it
>> (probably the vast majority) will be unaffected as it stuffs everything
>> into [global].
>>
>> People reading the documentation examples or coming from older versions
>> (and making changes to their config) will get bitten.
>>
>> Christian
>>
>> > Anything we can do to help with a patch?
>> >
>> > Chris
>> >
>> > On Wed, Dec 10, 2014 at 5:14 PM, Christian Balzer 
>> wrote:
>> >
>> > >
>> > > Hello,
>> > >
>> > > I think this might very well be my poor, unacknowledged bug report:
>> > > http://tracker.ceph.com/issues/10012
>> > >
>> > > People with a mon_hosts entry in [global] (as created by ceph-deploy)
>> > > will be fine, people with mons specified outside of [global] will not.
>> > >
>> > > Regards,
>> > >
>> > > Christian
>> > >
>> > > On Thu, 11 Dec 2014 00:49:03 + Joao Eduardo Luis wrote:
>> > >
>> > > > On 12/10/2014 09:05 PM, Gregory Farnum wrote:
>> > > > > What version is he running?
>> > > > >
>> > > > > Joao, does this make any sense to you?
>> > > >
>> > > >  From the MonMap code I'm pretty sure that the client should have
>> > > > built the monmap from the [mon.X] sections, and solely based on 'mon
>> > > > addr'.
>> > > >
>> > > > 'mon_initial_members' is only useful to the monitors anyway, so it
>> > > > can be disregarded.
>> > > >
>> > > > Thus, there are two ways for a client to build a monmap:
>> > > > 1) based on 'mon_hosts' on the config (or -m on cli); or
>> > > > 2) based on 'mon addr = ip1,ip2...' from the [mon.X] sections
>> > > >
>> > > > I don't see a 'mon hosts = ip1,ip2,...' on the config file, and I'm
>> > > > assuming a '-m ip1,ip2...' has been supplied on the cli, so we would
>> > > > have been left with the 'mon addr' options on each individual
>> [mon.X]
>> > > > section.
>> > > >
>> > > > We are left with two options here: assume there was unexpected
>> > > > behavior on this code path -- logs or steps to reproduce would be
>> > > > appreciated in this case! -- or assume something else failed:
>> > > >
>> > > > - are the ips on the remaining mon sections correct (nodo-1 &&
>> > > > nodo-2)?
>> > > > - were all the remaining monitors up and running when the failure
>> > > > occurred?
>> > > > - were the remaining monitors reachable by the client?
>> > > >
>> > > > In case you are able to reproduce this behavior, would be nice if
>> you
>> > > > could provide logs with 'debug monc = 10' and 'debug ms = 1'.
>> > > >
>> > > > Cheers!
>> > > >
>> > > >-Joao
>> > > >
>> > > >
>> > > > > -Greg
>> > > > >
>> > > > > On Wed, Dec 10, 2014 at 11:54 AM, Christopher Armstrong
>> > > > >  wrote:
>> > > > >> Thanks Greg - I thought the same thing, but confirmed with the
>> > > > >> user that it appears the radosgw client is indeed using initial
>> > > > >> members - when he added all of his hosts to initial members,
>> > > > >> things worked just fine. In either event, all of the monitors
>> > > > >> were always fully enumerated later in the config file. Is this
>> > > > >> potentially a bug specific to radosgw? Here's his config file:
>> > > > >>
>> > > > >> [global]
>> > > > >> fsid = fc0e2e09-ade3-4ff6-b23e-f789775b2515
>> > > > >> mon initial members = nodo-3
>> > > > >> auth cluster required = cephx
>> > > > >> auth service required = cephx
>> > > > >> auth client required = cephx
>> > > > >> osd pool default size = 3
>> > > > >> osd pool default min_size = 1
>> > > > >> osd pool default pg_num = 128
>> > > > >> osd pool default pgp_num = 128
>> > > > >> osd recovery delay start = 15
>> > > > >> log file = /dev/stdout
>> > > > >> mon_clock_drift_allowed = 1
>> > > > >>
>> > > > >>
>> > > > >> [mon.nodo-1]
>> > > > >> host = nodo-1
>> > > > >> mon add

Re: [ceph-users] Is mon initial members used after the first quorum?

2014-12-11 Thread Christopher Armstrong
Christian,

That indeed looks like the bug! We tried with moving the monitor
host/address into global and everything works as expected - see
https://github.com/deis/deis/issues/2711#issuecomment-66566318
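
(Concretely, the workaround boils down to something like this in [global], using 
the addresses from the config quoted further down; a sketch, not the exact diff:)

[global]
mon initial members = nodo-1, nodo-2, nodo-3
mon host = 192.168.2.200,192.168.2.201,192.168.2.202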

This seems like a potentially bad bug - how has it not come up before?
Anything we can do to help with a patch?

Chris

On Wed, Dec 10, 2014 at 5:14 PM, Christian Balzer  wrote:

>
> Hello,
>
> I think this might very well be my poor, unacknowledged bug report:
> http://tracker.ceph.com/issues/10012
>
> People with a mon_hosts entry in [global] (as created by ceph-deploy) will
> be fine, people with mons specified outside of [global] will not.
>
> Regards,
>
> Christian
>
> On Thu, 11 Dec 2014 00:49:03 + Joao Eduardo Luis wrote:
>
> > On 12/10/2014 09:05 PM, Gregory Farnum wrote:
> > > What version is he running?
> > >
> > > Joao, does this make any sense to you?
> >
> >  From the MonMap code I'm pretty sure that the client should have built
> > the monmap from the [mon.X] sections, and solely based on 'mon addr'.
> >
> > 'mon_initial_members' is only useful to the monitors anyway, so it can
> > be disregarded.
> >
> > Thus, there are two ways for a client to build a monmap:
> > 1) based on 'mon_hosts' on the config (or -m on cli); or
> > 2) based on 'mon addr = ip1,ip2...' from the [mon.X] sections
> >
> > I don't see a 'mon hosts = ip1,ip2,...' on the config file, and I'm
> > assuming a '-m ip1,ip2...' has been supplied on the cli, so we would
> > have been left with the 'mon addr' options on each individual [mon.X]
> > section.
> >
> > We are left with two options here: assume there was unexpected behavior
> > on this code path -- logs or steps to reproduce would be appreciated in
> > this case! -- or assume something else failed:
> >
> > - are the ips on the remaining mon sections correct (nodo-1 && nodo-2)?
> > - were all the remaining monitors up and running when the failure
> > occurred?
> > - were the remaining monitors reachable by the client?
> >
> > In case you are able to reproduce this behavior, would be nice if you
> > could provide logs with 'debug monc = 10' and 'debug ms = 1'.
> >
> > Cheers!
> >
> >-Joao
> >
> >
> > > -Greg
> > >
> > > On Wed, Dec 10, 2014 at 11:54 AM, Christopher Armstrong
> > >  wrote:
> > >> Thanks Greg - I thought the same thing, but confirmed with the user
> > >> that it appears the radosgw client is indeed using initial members -
> > >> when he added all of his hosts to initial members, things worked just
> > >> fine. In either event, all of the monitors were always fully
> > >> enumerated later in the config file. Is this potentially a bug
> > >> specific to radosgw? Here's his config file:
> > >>
> > >> [global]
> > >> fsid = fc0e2e09-ade3-4ff6-b23e-f789775b2515
> > >> mon initial members = nodo-3
> > >> auth cluster required = cephx
> > >> auth service required = cephx
> > >> auth client required = cephx
> > >> osd pool default size = 3
> > >> osd pool default min_size = 1
> > >> osd pool default pg_num = 128
> > >> osd pool default pgp_num = 128
> > >> osd recovery delay start = 15
> > >> log file = /dev/stdout
> > >> mon_clock_drift_allowed = 1
> > >>
> > >>
> > >> [mon.nodo-1]
> > >> host = nodo-1
> > >> mon addr = 192.168.2.200:6789
> > >>
> > >> [mon.nodo-2]
> > >> host = nodo-2
> > >> mon addr = 192.168.2.201:6789
> > >>
> > >> [mon.nodo-3]
> > >> host = nodo-3
> > >> mon addr = 192.168.2.202:6789
> > >>
> > >>
> > >>
> > >> [client.radosgw.gateway]
> > >> host = deis-store-gateway
> > >> keyring = /etc/ceph/ceph.client.radosgw.keyring
> > >> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> > >> log file = /dev/stdout
> > >>
> > >>
> > >> On Wed, Dec 10, 2014 at 11:40 AM, Gregory Farnum 
> > >> wrote:
> > >>>
> > >>> On Tue, Dec 9, 2014 at 3:11 PM, Christopher Armstrong
> > >>>  wrote:
> >  Hi folks,
> > 
> >  I think we have a bit of confusion around how initial members is
> >  used. I understand that we can specify a single monitor (or a
> >  subset of monitors) so
> >  that the cluster can form a quorum when it first comes up. This is
> >  how we're
> >  using the setting now - so the cluster can come up with just one
> >  monitor,
> >  with the other monitors to follow later.
> > 
> >  However, a Deis user reported that when the monitor in his initial
> >  members
> >  list went down, radosgw stopped functioning, even though there are
> >  three mons in his config file. I would think that the radosgw
> >  client would connect
> >  to any of the nodes in the config file to get the state of the
> >  cluster, and
> >  that the initial members list is only used when the monitors first
> >  come up
> >  and are trying to achieve quorum.
> > 
> >  The issue he filed is here:
> https://github.com/deis/deis/issues/2711
> > 
> >  He also found this Ceph issue filed:
> >  https://github.com/ceph/ceph/pull/1233
> > >>>
> > >>> Nope, t

Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-11 Thread Georgios Dimitrakakis

This issue seems very similar to these:

http://tracker.ceph.com/issues/8202
http://tracker.ceph.com/issues/8702


Would it make any difference if I try to build CEPH from sources?

I mean, is someone aware of it having been fixed in any of the recent commits, 
with the fix perhaps just not having reached the repositories yet?


Regards,

George




On Mon, 08 Dec 2014 19:47:59 +0200, Georgios Dimitrakakis wrote:

I 've just created issues #10271

Best,

George

On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:

It looks like a bug. Can you open an issue on tracker.ceph.com,
describing what you see?

Thanks,
Yehuda

On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
 wrote:

It would be nice to see where and how "uploadId"

is being calculated...


Thanks,


George



For example if I try to perform the same multipart upload at an 
older
version ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60)



I can see the upload ID in the apache log as:

"PUT

/test/.dat?partNumber=25&uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
HTTP/1.1" 200 - "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"

but when I try the same at ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3)

I get the following:

"PUT


/test/.dat?partNumber=12&uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
HTTP/1.1" 403 78 "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"


and my guess is that the "%2F" at the latter is the one that is
causing the problem and hence the 403 error.



What do you think???


Best,

George




Hi all!

I am using AWS SDK JS v.2.0.29 to perform a multipart upload into
Radosgw with ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 403
error.


I believe that the id which is sent with all requests, and has been
urlencoded by the aws-sdk-js, doesn't match the one in rados
because the latter is not urlencoded.

Is that the case? Can you confirm it?

Is there something I can do?


Regards,

George

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Error occurs while using ceph-deploy

2014-12-11 Thread mail list
Hi all, 

I followed http://docs.ceph.com/docs/master/start/quick-ceph-deploy/ to 
deploy Ceph,
but when installing the monitor node, I got the error below:

{code}
[louis@adminnode my-cluster]$ ceph-deploy new node1
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/louis/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.21): /usr/bin/ceph-deploy new node1
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
[ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
[node1][DEBUG ] connected to host: adminnode
[node1][INFO  ] Running command: ssh -CT -o BatchMode=yes node1
[node1][DEBUG ] connection detected need for sudo
[node1][DEBUG ] connected to host: node1
[node1][DEBUG ] detect platform information from remote host
[node1][DEBUG ] detect machine type
[node1][DEBUG ] find the location of an executable
[node1][INFO  ] Running command: sudo /sbin/ip link show
[node1][INFO  ] Running command: sudo /sbin/ip addr show
[node1][DEBUG ] IP addresses found: ['10.211.55.12']
[ceph_deploy.new][DEBUG ] Resolving host node1
[ceph_deploy.new][DEBUG ] Monitor node1 at 10.211.55.12
[ceph_deploy.new][DEBUG ] Monitor initial members are ['node1']
[ceph_deploy.new][DEBUG ] Monitor addrs are ['10.211.55.12']
[ceph_deploy.new][DEBUG ] Creating a random mon key...
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
Error in sys.exitfunc:
{code}

The command line does not print what the error is. I then checked ceph.log and 
found no ERR.
Has anybody encountered this error?
My OS is CentOS 6.5 x86_64 (Final).

I am a newbie to Ceph; any ideas will be appreciated!___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-11 Thread Yehuda Sadeh
I don't think it has been fixed recently. I'm looking at it now, and
not sure why it hasn't triggered before in other areas.

Yehuda

On Thu, Dec 11, 2014 at 5:55 AM, Georgios Dimitrakakis
 wrote:
> This issue seems very similar to these:
>
> http://tracker.ceph.com/issues/8202
> http://tracker.ceph.com/issues/8702
>
>
> Would it make any difference if I try to build CEPH from sources?
>
> I mean is someone aware of it been fixed on any of the recent commits and
> probably hasn't passed yet to the repositories?
>
> Regards,
>
> George
>
>
>
>
>
> On Mon, 08 Dec 2014 19:47:59 +0200, Georgios Dimitrakakis wrote:
>>
>> I 've just created issues #10271
>>
>> Best,
>>
>> George
>>
>> On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:
>>>
>>> It looks like a bug. Can you open an issue on tracker.ceph.com,
>>> describing what you see?
>>>
>>> Thanks,
>>> Yehuda
>>>
>>> On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
>>>  wrote:

 It would be nice to see where and how "uploadId"

 is being calculated...


 Thanks,


 George



> For example if I try to perform the same multipart upload at an older
> version ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>
>
> I can see the upload ID in the apache log as:
>
> "PUT
>
> /test/.dat?partNumber=25&uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
> HTTP/1.1" 200 - "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"
>
> but when I try the same at ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3)
>
> I get the following:
>
> "PUT
>
>
>
> /test/.dat?partNumber=12&uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
> HTTP/1.1" 403 78 "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"
>
>
> and my guess is that the "%2F" at the latter is the one that is
> causing the problem and hence the 403 error.
>
>
>
> What do you think???
>
>
> Best,
>
> George
>
>
>
>> Hi all!
>>
>> I am using AWS SDK JS v.2.0.29 to perform a multipart upload into
>> Radosgw with ceph version 0.80.7
>> (6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 403
>> error.
>>
>>
>> I believe that the id which is send to all requests and has been
>> urlencoded by the aws-sdk-js doesn't match with the one in rados
>> because it's not urlencoded.
>>
>> Is that the case? Can you confirm it?
>>
>> Is there something I can do?
>>
>>
>> Regards,
>>
>> George
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-11 Thread Yehuda Sadeh
Pushed a fix to wip-10271. Haven't tested it though, let me know if you try it.
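
If you want to try the branch before packages show up, a rough sketch of building it 
from source (autotools-era build; you will need the usual build dependencies, and you 
may have to add --with-radosgw to configure if rgw is not enabled by default):

git clone https://github.com/ceph/ceph.git
cd ceph
git checkout wip-10271
git submodule update --init --recursive
./autogen.sh && ./configure && make -j4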

Thanks,
Yehuda

On Thu, Dec 11, 2014 at 8:38 AM, Yehuda Sadeh  wrote:
> I don't think it has been fixed recently. I'm looking at it now, and
> not sure why it hasn't triggered before in other areas.
>
> Yehuda
>
> On Thu, Dec 11, 2014 at 5:55 AM, Georgios Dimitrakakis
>  wrote:
>> This issue seems very similar to these:
>>
>> http://tracker.ceph.com/issues/8202
>> http://tracker.ceph.com/issues/8702
>>
>>
>> Would it make any difference if I try to build CEPH from sources?
>>
>> I mean is someone aware of it been fixed on any of the recent commits and
>> probably hasn't passed yet to the repositories?
>>
>> Regards,
>>
>> George
>>
>>
>>
>>
>>
>> On Mon, 08 Dec 2014 19:47:59 +0200, Georgios Dimitrakakis wrote:
>>>
>>> I 've just created issues #10271
>>>
>>> Best,
>>>
>>> George
>>>
>>> On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:

 It looks like a bug. Can you open an issue on tracker.ceph.com,
 describing what you see?

 Thanks,
 Yehuda

 On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
  wrote:
>
> It would be nice to see where and how "uploadId"
>
> is being calculated...
>
>
> Thanks,
>
>
> George
>
>
>
>> For example if I try to perform the same multipart upload at an older
>> version ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>
>>
>> I can see the upload ID in the apache log as:
>>
>> "PUT
>>
>> /test/.dat?partNumber=25&uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
>> HTTP/1.1" 200 - "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"
>>
>> but when I try the same at ceph version 0.80.7
>> (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>
>> I get the following:
>>
>> "PUT
>>
>>
>>
>> /test/.dat?partNumber=12&uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
>> HTTP/1.1" 403 78 "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"
>>
>>
>> and my guess is that the "%2F" at the latter is the one that is
>> causing the problem and hence the 403 error.
>>
>>
>>
>> What do you think???
>>
>>
>> Best,
>>
>> George
>>
>>
>>
>>> Hi all!
>>>
>>> I am using AWS SDK JS v.2.0.29 to perform a multipart upload into
>>> Radosgw with ceph version 0.80.7
>>> (6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 403
>>> error.
>>>
>>>
>>> I believe that the id which is send to all requests and has been
>>> urlencoded by the aws-sdk-js doesn't match with the one in rados
>>> because it's not urlencoded.
>>>
>>> Is that the case? Can you confirm it?
>>>
>>> Is there something I can do?
>>>
>>>
>>> Regards,
>>>
>>> George
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] For all LSI SAS9201-16i users - don't upgrade to firmware P20

2014-12-11 Thread Udo Lembke
Hi all,
I have upgraded two LSI SAS9201-16i HBAs to the latest firmware, P20.00.00,
and after that I got the following syslog messages:

Dec  9 18:11:31 ceph-03 kernel: [  484.602834] mpt2sas0: log_info(0x3108): 
originator(PL), code(0x08), sub_code(0x)
Dec  9 18:12:15 ceph-03 kernel: [  528.310174] mpt2sas0: log_info(0x3108): 
originator(PL), code(0x08), sub_code(0x)
Dec  9 18:15:25 ceph-03 kernel: [  718.782477] mpt2sas0: log_info(0x3108): 
originator(PL), code(0x08), sub_code(0x)

The next night one OSD went down (the filesystem was remounted read-only and I had to 
repair it with fsck), and then two other OSDs followed.

Then I changed the card, and after some tries I was able to downgrade* the cards to 
P17, which runs stable.


Udo


* downgraded on a fourth computer booted into DOS, with "sas2flsh -o -e 6"...
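
Roughly, the downgrade under DOS looks like this (a sketch -- the P17 firmware/BIOS 
file names depend on the package you download and are only examples):

sas2flsh -listall                                # note the controller first
sas2flsh -o -e 6                                 # advanced mode, erase the flash region
sas2flsh -o -f 9201-16i_P17.bin -b mptsas2.rom   # write the P17 firmware and BIOS
sas2flsh -listall                                # verify the version afterwards
# do not power off or reboot between the erase and the flash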
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-11 Thread Georgios Dimitrakakis

Hi again!

I have installed and enabled the development branch repositories as 
described here:


http://ceph.com/docs/master/install/get-packages/#add-ceph-development

and when I try to update the ceph-radosgw package I get the following:

Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 0.el6
Size: 3.8 M
Repo: installed
From repo   : Ceph
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS object 
store. It is
: implemented as a FastCGI module using libfcgi, and can be 
used in

: conjunction with any FastCGI capable web server.

Available Packages
Name: ceph-radosgw
Arch: x86_64
Epoch   : 1
Version : 0.80.5
Release : 9.el6
Size: 1.3 M
Repo: epel
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS object 
store. It is
: implemented as a FastCGI module using libfcgi, and can be 
used in

: conjunction with any FastCGI capable web server.



Is this normal???

I am concerned because the installed version is 0.80.7 and the 
available update package is 0.80.5


Have I missed something?
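
To see which repo is offering what before deciding anything, something like this 
helps (a sketch):

yum --showduplicates list ceph-radosgw   # every version each enabled repo offers
yum info ceph-radosgw                    # which repo the installed 0.80.7 came from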

Regards,

George



Pushed a fix to wip-10271. Haven't tested it though, let me know if
you try it.

Thanks,
Yehuda

On Thu, Dec 11, 2014 at 8:38 AM, Yehuda Sadeh  
wrote:

I don't think it has been fixed recently. I'm looking at it now, and
not sure why it hasn't triggered before in other areas.

Yehuda

On Thu, Dec 11, 2014 at 5:55 AM, Georgios Dimitrakakis
 wrote:

This issue seems very similar to these:

http://tracker.ceph.com/issues/8202
http://tracker.ceph.com/issues/8702


Would it make any difference if I try to build CEPH from sources?

I mean is someone aware of it been fixed on any of the recent 
commits and

probably hasn't passed yet to the repositories?

Regards,

George





On Mon, 08 Dec 2014 19:47:59 +0200, Georgios Dimitrakakis wrote:


I 've just created issues #10271

Best,

George

On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:


It looks like a bug. Can you open an issue on tracker.ceph.com,
describing what you see?

Thanks,
Yehuda

On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
 wrote:


It would be nice to see where and how "uploadId"

is being calculated...


Thanks,


George



For example if I try to perform the same multipart upload at an 
older
version ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60)



I can see the upload ID in the apache log as:

"PUT


/test/.dat?partNumber=25&uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
HTTP/1.1" 200 - "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"

but when I try the same at ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3)

I get the following:

"PUT




/test/.dat?partNumber=12&uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
HTTP/1.1" 403 78 "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"


and my guess is that the "%2F" at the latter is the one that is
causing the problem and hence the 403 error.



What do you think???


Best,

George




Hi all!

I am using AWS SDK JS v.2.0.29 to perform a multipart upload 
into

Radosgw with ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 
403

error.


I believe that the id which is send to all requests and has 
been
urlencoded by the aws-sdk-js doesn't match with the one in 
rados

because it's not urlencoded.

Is that the case? Can you confirm it?

Is there something I can do?


Regards,

George

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Error while deploying ceph

2014-12-11 Thread mail list
Hi all,

I am using ceph-deploy as follows on CentOS 6.5 x86_64:

ceph-deploy -v  install --release=giant adminnode

As you can see, I specified the release version as giant, but got the following 
error:

[adminnode][WARNIN] curl: (22) The requested URL returned error: 404 Not Found
[adminnode][DEBUG ] Retrieving 
http://ceph.com/rpm-giant/el6/noarch/ceph-release-1-0.el6.noarch.rpm
[adminnode][WARNIN] error: skipping 
http://ceph.com/rpm-giant/el6/noarch/ceph-release-1-0.el6.noarch.rpm - transfer 
failed
[adminnode][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: rpm -Uvh 
--replacepkgs 
http://ceph.com/rpm-giant/el6/noarch/ceph-release-1-0.el6.noarch.rpm

So I checked http://ceph.com/rpm-giant/el6 and found that 
ceph-release-1-0.el6.noarch.rpm is located at http://ceph.com/rpm-giant/el6/ 
instead of http://ceph.com/rpm-giant/el6/noarch/, which is different from the 
firefly layout. So maybe the package was put in the wrong location!!

Then I tried without specifying the release version, which defaults to firefly, 
and it works!
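
A possible workaround until the package location is fixed (an untested sketch; it 
assumes your ceph-deploy has --no-adjust-repos): install the release rpm by hand from 
the path that does exist, then let ceph-deploy use the repos as they are:

sudo rpm -Uvh --replacepkgs http://ceph.com/rpm-giant/el6/ceph-release-1-0.el6.noarch.rpm
ceph-deploy -v install --no-adjust-repos adminnode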
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Again: full ssd ceph cluster

2014-12-11 Thread Wido den Hollander
On 12/10/2014 04:08 PM, Mike wrote:
> Hello all!
> Some our customer asked for only ssd storage.
> By now we looking to 2027R-AR24NV w/ 3 x HBA controllers (LSI3008 chip,
> 8 internal 12Gb ports on each), 24 x Intel DC S3700 800Gb SSD drives, 2
> x mellanox 40Gbit ConnectX-3 (maybe newer ConnectX-4 100Gbit) and Xeon
> e5-2660V2 with 64Gb RAM.
> Replica is 2.
> Or something like that but in 1U w/ 8 SSD's.

I would recommend 1U with 8 SSDs. Such huge machines with so many SSDs
will require some serious bandwidth and CPU power.

It's better to go for more, but smaller, machines. Your cluster will
suffer less from losing a machine.
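
A rough back-of-the-envelope makes that concrete (assuming ~500 MB/s of sequential
throughput per DC S3700; real mixed-workload numbers will be lower):

echo "24 SSDs: $(( 24 * 500 * 8 / 1000 )) Gbit/s raw"   # ~96 Gbit/s, more than 2x 40GbE
echo " 8 SSDs: $((  8 * 500 * 8 / 1000 )) Gbit/s raw"   # ~32 Gbit/s, fits one 40GbE link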

> 
> We see a little bottle neck on network cards, but the biggest question
> can ceph (giant release) with sharding io and new cool stuff release
> this potential?
> 
> Any ideas?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] unable to repair PG

2014-12-11 Thread Tomasz Kuzemko
Be very careful with running "ceph pg repair". Have a look at this
thread:

http://thread.gmane.org/gmane.comp.file-systems.ceph.user/15185
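
Before repairing anything, one way to see what actually differs is to compare the 
object's xattrs on each OSD that holds a copy of 9.180 by hand -- a minimal sketch, 
assuming default filestore paths (osd.16 shown; the object name is a placeholder for 
the one in the scrub log):

find /var/lib/ceph/osd/ceph-16/current/9.180_head -name '*29145.4*'
getfattr -d -m '.' -e hex /var/lib/ceph/osd/ceph-16/current/9.180_head/<object file>
# run the same on the other OSDs in the acting set and diff the attribute lists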

-- 
Tomasz Kuzemko
tomasz.kuze...@ovh.net

On Thu, Dec 11, 2014 at 10:57:22AM +, Luis Periquito wrote:
> Hi,
> 
> I've stopped OSD.16, removed the PG from the local filesystem and started
> the OSD again. After ceph rebuilt the PG in the removed OSD I ran a
> deep-scrub and the PG is still inconsistent.
> 
> I'm running out of ideas on trying to solve this. Does this mean that all
> copies of the object should also be inconsistent? Should I just try to
> figure which object/bucket this belongs to and delete it/copy it again to
> the ceph cluster?
> 
> Also, do you know what the error message means? is it just some sort of
> metadata for this object that isn't correct, not the object itself?
> 
> On Wed, Dec 10, 2014 at 11:11 AM, Luis Periquito 
> wrote:
> 
> > Hi,
> >
> > In the last few days this PG (pool is .rgw.buckets) has been in error
> > after running the scrub process.
> >
> > After getting the error, and trying to see what may be the issue (and
> > finding none), I've just issued a ceph repair followed by a ceph
> > deep-scrub. However it doesn't seem to have fixed the issue and it still
> > remains.
> >
> > The relevant log from the OSD is as follows.
> >
> > 2014-12-10 09:38:09.348110 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0
> > missing, 1 inconsistent objects
> > 2014-12-10 09:38:09.348116 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1
> > errors
> > 2014-12-10 10:13:15.922065 7f8f618be700  0 log [INF] : 9.180 repair ok, 0
> > fixed
> > 2014-12-10 10:55:27.556358 7f8f618be700  0 log [ERR] : 9.180 shard 6: soid
> > 370cbf80/29145.4_xxx/head//9 missing attr _, missing attr _user.rgw.acl,
> > missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing
> > attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing attr
> > _user.rgw.x-amz-meta-md5sum, missing attr _user.rgw.x-amz-meta-stat,
> > missing attr snapset
> > 2014-12-10 10:56:50.597952 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0
> > missing, 1 inconsistent objects
> > 2014-12-10 10:56:50.597957 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1
> > errors
> >
> > I'm running version firefly 0.80.7.
> >

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



signature.asc
Description: Digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is mon initial members used after the first quorum?

2014-12-11 Thread Joao Eduardo Luis

On 12/11/2014 04:28 AM, Christopher Armstrong wrote:

If someone could point me to where this fix should go in the code, I'd
actually love to dive in - I've been wanting to contribute back to Ceph,
and this bug has hit us personally so I think it's a good candidate :)


I'm not sure where the bug is or what it may be (see reply to 
Christian's email sent a few minutes ago).


I believe the first step to assess what's happening is to reliably 
reproduce this.  Ideally in a different environment, or such that it 
makes it clear it's not an issue specific to your deployment.


Next, say it's indeed the config file that is being misread: you'd 
probably want to look into common/config.{cc,h}.  If it happens to be a 
bug while building a monmap, you'd want to look into mon/MonMap.cc and 
mon/MonClient.cc.  Being a radosgw issue, you'll probably want to look 
into 'rgw/*' and/or 'librados/*', but maybe someone else could give you 
the pointers for those.


I think the main task now is to reliably reproduce this.  I haven't been 
able to from the config you provided, but I may have made some 
assumptions that end up negating the whole bug.


Cheers!

  -Joao



On Wed, Dec 10, 2014 at 8:25 PM, Christopher Armstrong
<ch...@opdemand.com> wrote:

We're running Ceph entirely in Docker containers, so we couldn't use
ceph-deploy due to the requirement of having a process management
daemon (upstart, in Ubuntu's case). So, I wrote things out and
templated them myself following the documentation.

Thanks for linking the bug, Christian! You saved us a lot of time
and troubleshooting. I'll post a comment on the bug.

Chris

On Wed, Dec 10, 2014 at 8:18 PM, Christian Balzer <ch...@gol.com> wrote:

On Wed, 10 Dec 2014 20:09:01 -0800 Christopher Armstrong wrote:

> Christian,
>
> That indeed looks like the bug! We tried with moving the monitor
> host/address into global and everything works as expected - see
>https://github.com/deis/deis/issues/2711#issuecomment-66566318
>
> This seems like a potentially bad bug - how has it not come up before?

Ah, but as you can see from the issue report it has come up before.
But that discussion as well as that report clearly fell through
the cracks.

It's another reason I dislike ceph-deploy, as people using just it
(probably the vast majority) will be unaffected as it stuffs
everything
into [global].

People reading the documentation examples or coming from older
versions
(and making changes to their config) will get bitten.

Christian
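
For comparison, the layout ceph-deploy generates (which the text above says is 
unaffected) keeps the monitor list in [global] -- a minimal sketch reusing the 
addresses from the config quoted earlier in the thread:

[global]
    mon initial members = nodo-1, nodo-2, nodo-3
    mon host = 192.168.2.200, 192.168.2.201, 192.168.2.202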

 > Anything we can do to help with a patch?
 >
 > Chris
 >
 > On Wed, Dec 10, 2014 at 5:14 PM, Christian Balzer
<ch...@gol.com> wrote:
 >
 > >
 > > Hello,
 > >
 > > I think this might very well be my poor, unacknowledged bug
report:
 > > http://tracker.ceph.com/issues/10012
 > >
 > > People with a mon_hosts entry in [global] (as created by
ceph-deploy)
 > > will be fine, people with mons specified outside of
[global] will not.
 > >
 > > Regards,
 > >
 > > Christian
 > >
 > > On Thu, 11 Dec 2014 00:49:03 + Joao Eduardo Luis wrote:
 > >
 > > > On 12/10/2014 09:05 PM, Gregory Farnum wrote:
 > > > > What version is he running?
 > > > >
 > > > > Joao, does this make any sense to you?
 > > >
 > > >  From the MonMap code I'm pretty sure that the client
should have
 > > > built the monmap from the [mon.X] sections, and solely
based on 'mon
 > > > addr'.
 > > >
 > > > 'mon_initial_members' is only useful to the monitors
anyway, so it
 > > > can be disregarded.
 > > >
 > > > Thus, there are two ways for a client to build a monmap:
 > > > 1) based on 'mon_hosts' on the config (or -m on cli); or
 > > > 2) based on 'mon addr = ip1,ip2...' from the [mon.X] sections
 > > >
 > > > I don't see a 'mon hosts = ip1,ip2,...' on the config
file, and I'm
 > > > assuming a '-m ip1,ip2...' has been supplied on the cli,
so we would
 > > > have been left with the 'mon addr' options on each
individual [mon.X]
 > > > section.
 > > >
 > > > We are left with two options here: assume there was
unexpected
 > > > behavior on this code path -- logs or steps to reproduce
would be
 > > > appreciated in this case! -- or assume something else failed:
 > > >
 > > > - are the ips on the remaining mon sections correct
(nodo-1 &&
 > > > nodo-2)?
 > > > - were all the remaining monitors up and running when 

Re: [ceph-users] "store is getting too big" on monitors after Firefly to Giant upgrade

2014-12-11 Thread Joao Eduardo Luis

On 12/10/2014 07:30 PM, Kevin Sumner wrote:

The mons have grown another 30GB each overnight (except for 003?), which
is quite worrying.  I ran a little bit of testing yesterday after my
post, but not a significant amount.

I wouldn’t expect compact on start to help this situation based on the
name since we don’t (shouldn’t?) restart the mons regularly, but there
appears to be no documentation on it.  We’re pretty good on disk space
on the mons currently, but if that changes, I’ll probably use this to
see about bringing these numbers in line.


This is an issue that has been seen on larger clusters, and it usually 
takes a monitor restart, with 'mon compact on start = true' or manual 
compaction 'ceph tell mon.FOO compact' to bring the monitor back to a 
sane disk usage level.
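
A quick way to see whether compaction helps, assuming the default mon data path:

du -sh /var/lib/ceph/mon/ceph-FOO/store.db    # size before
ceph tell mon.FOO compact
du -sh /var/lib/ceph/mon/ceph-FOO/store.db    # size after (it may grow temporarily first)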


However, I have not been able to reproduce this in order to track the 
source.  I'm guessing I lack the scale of the cluster, or the 
appropriate workload (maybe both).


What kind of workload are you running the cluster through?  You mention 
cephfs, but do you have any more info you can share that could help us 
reproduce this state?


Sage also fixed an issue that could potentially cause this (depending on 
what is causing it in the first place) [1,2,3].  This bug, #9987, is due 
to a given cached value not being updated, leading to the monitor not 
removing unnecessary data, potentially causing this growth.  This cached 
value would be set to its proper value when the monitor is restarted 
though, so a simple restart would have all this unnecessary data blown away.


Restarting the monitor ends up masking the true cause of the store 
growth: whether from #9987 or from obsolete data kept by the monitor's 
backing store (leveldb), either due to misuse of leveldb or due to 
leveldb's nature (haven't been able to ascertain which may be at fault, 
partly due to being unable to reproduce the problem).


If you are up to it, I would suggest the following approach in hope to 
determine what may be at fault:


1) 'ceph tell mon.FOO compact' -- which will force the monitor to 
compact its store.  This won't close leveldb, so it won't have much 
effect on the store size if it happens to be leveldb holding on to some 
data (I could go into further detail, but I don't think this is the 
right medium).
1.a) you may notice the store increasing in size during this period; 
it's expected.
1.b) compaction may take a while, but in the end you'll hopefully see a 
significant reduction in size.


2) Assuming that failed, I would suggest doing the following:

2.1) grab ceph-kvstore-tool from the ceph-test package
2.2) stop the monitor
2.3) run 'ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db list > 
store.dump'

2.4) run the following (for the above store's location, let's call it $STORE):

for m in osdmap pgmap; do
  for k in first_committed last_committed; do
ceph-kvstore-tool $STORE get $m $k >> store.dump
  done
done

ceph-kvstore-tool $STORE get pgmap_meta last_osdmap_epoch >> store.dump
ceph-kvstore-tool $STORE get pgmap_meta version >> store.dump

2.5) send over the results of the dump
2.6) if you were to compress the store as well and send me a link to 
grab it I would appreciate it.


3) Next you could simply restart the monitor (without 'mon compact on 
start = true'); if the monitor's store size decreases, then there's a 
fair chance that you've been bit by #9987.  Otherwise, it may be 
leveldb's clutter.  You should also note that leveldb may itself compact 
automatically on start, so it's hard to say for sure what fixed what.


4) If store size hasn't gone back to sane levels by now, you may wish to 
restart with 'mon compact on start = true' and see if it helps.  If it 
doesn't, then we may have a completely different issue in our hands.


Now, assuming your store size went down on step 3, and if you are 
willing, it would be interesting to see if Sage's patches helps out in 
any way.  The patches have not been backported to the giant branch yet, 
so you would have to apply them yourself.  For them to work you would 
have to run the patched monitor as the leader.  I would suggest leaving 
the other monitors running an unpatched version so they could act as the 
control group.


Let us know if any of this helps.

Cheers!

  -Joao

[1] - http://tracker.ceph.com/issues/9987
[2] - 093c5f0cabeb552b90d944da2c50de48fcf6f564
[3] - 3fb731b722c50672a5a9de0c86a621f5f50f2d06



:: ~ » ceph health detail | grep 'too big'
HEALTH_WARN mon.cluster4-monitor001 store is getting too big! 77365 MB
 >= 15360 MB; mon.cluster4-monitor002 store is getting too big! 87868 MB
 >= 15360 MB; mon.cluster4-monitor003 store is getting too big! 30359 MB
 >= 15360 MB; mon.cluster4-monitor004 store is getting too big! 93414 MB
 >= 15360 MB; mon.cluster4-monitor005 store is getting too big! 88232 MB
 >= 15360 MB
mon.cluster4-monitor001 store is getting too big! 77365 MB >= 15360 MB
-- 72% avail
mon.cluster4-monitor002 store is getting too big! 87868 MB >= 15360 MB
-- 70% avail

[ceph-users] VM restore on Ceph *very* slow

2014-12-11 Thread Lindsay Mathieson

Anyone know why a VM live restore would be excessively slow on Ceph? Restoring 
a small VM with a 12GB disk/2GB RAM is taking 18 *minutes*. Larger VMs can be 
over half an hour.

The same VMs on the same disks, but native or on glusterfs, take less than 30 
seconds.

VM's are KVM on Proxmox.


thanks,
-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is mon initial members used after the first quorum?

2014-12-11 Thread Gregory Farnum
On Thu, Dec 11, 2014 at 2:21 AM, Joao Eduardo Luis  wrote:
> On 12/11/2014 04:28 AM, Christopher Armstrong wrote:
>>
>> If someone could point me to where this fix should go in the code, I'd
>> actually love to dive in - I've been wanting to contribute back to Ceph,
>> and this bug has hit us personally so I think it's a good candidate :)
>
>
> I'm not sure where the bug is or what it may be (see reply to Christian's
> email sent a few minutes ago).
>
> I believe the first step to assess what's happening is to reliably reproduce
> this.  Ideally in a different environment, or such that it makes it clear
> it's not an issue specific to your deployment.
>
> Next, say it's indeed the config file that is being misread: you'd probably
> want to look into common/config.{cc,h}.  If it happens to be a bug while
> building a monmap, you'd want to look into mon/MonMap.cc and
> mon/MonClient.cc.  Being a radosgw issue, you'll probably want to look into
> 'rgw/*' and/or 'librados/*', but maybe someone else could give you the
> pointers for those.
>
> I think the main task now is to reliably reproduce this.  I haven't been
> able to from the config you provided, but I may have made some assumptions
> that end up negating the whole bug.

It might require some kind of library bug, maybe...? What OS did you
do this on, and what are people running when they see this not work?
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error occurs while using ceph-deploy

2014-12-11 Thread mail list
Hi all,

Can anyone help?

On Dec 11, 2014, at 20:34, mail list  wrote:

> Hi all, 
> 
> I follow the http://docs.ceph.com/docs/master/start/quick-ceph-deploy/ to 
> deploy ceph,
> But when install the monitor node, i got error as below:
> 
> {code}
> [louis@adminnode my-cluster]$ ceph-deploy new node1
> [ceph_deploy.conf][DEBUG ] found configuration file at: 
> /home/louis/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.21): /usr/bin/ceph-deploy new node1
> [ceph_deploy.new][DEBUG ] Creating new cluster named ceph
> [ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
> [node1][DEBUG ] connected to host: adminnode
> [node1][INFO  ] Running command: ssh -CT -o BatchMode=yes node1
> [node1][DEBUG ] connection detected need for sudo
> [node1][DEBUG ] connected to host: node1
> [node1][DEBUG ] detect platform information from remote host
> [node1][DEBUG ] detect machine type
> [node1][DEBUG ] find the location of an executable
> [node1][INFO  ] Running command: sudo /sbin/ip link show
> [node1][INFO  ] Running command: sudo /sbin/ip addr show
> [node1][DEBUG ] IP addresses found: ['10.211.55.12']
> [ceph_deploy.new][DEBUG ] Resolving host node1
> [ceph_deploy.new][DEBUG ] Monitor node1 at 10.211.55.12
> [ceph_deploy.new][DEBUG ] Monitor initial members are ['node1']
> [ceph_deploy.new][DEBUG ] Monitor addrs are ['10.211.55.12']
> [ceph_deploy.new][DEBUG ] Creating a random mon key...
> [ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
> [ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
> Error in sys.exitfunc:
> {code}
> 
> The command line do not print what is the ERROR.Then I check the ceph.log and 
> found no ERR, 
> any body encounter this error?
> My OS is centos6.5 x86_64 (final).
> 
> I am newbie to ceph, any idea will be appreciated!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-11 Thread Georgios Dimitrakakis

OK! I will give it some time and will try again later!

Thanks a lot for your help!

Warmest regards,

George


The branch I pushed earlier was based off recent development branch. 
I

just pushed one based off firefly (wip-10271-firefly). It will
probably take a bit to build.

Yehuda

On Thu, Dec 11, 2014 at 12:03 PM, Georgios Dimitrakakis
 wrote:

Hi again!

I have installed and enabled the development branch repositories as
described here:


http://ceph.com/docs/master/install/get-packages/#add-ceph-development

and when I try to update the ceph-radosgw package I get the 
following:


Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 0.el6
Size: 3.8 M
Repo: installed
From repo   : Ceph
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object store.

It is
: implemented as a FastCGI module using libfcgi, and can 
be used

in
: conjunction with any FastCGI capable web server.

Available Packages
Name: ceph-radosgw
Arch: x86_64
Epoch   : 1
Version : 0.80.5
Release : 9.el6
Size: 1.3 M
Repo: epel
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object store.

It is
: implemented as a FastCGI module using libfcgi, and can 
be used

in
: conjunction with any FastCGI capable web server.



Is this normal???

I am concerned because the installed version is 0.80.7 and the 
available

update package is 0.80.5

Have I missed something?

Regards,

George




Pushed a fix to wip-10271. Haven't tested it though, let me know if
you try it.

Thanks,
Yehuda

On Thu, Dec 11, 2014 at 8:38 AM, Yehuda Sadeh  
wrote:


I don't think it has been fixed recently. I'm looking at it now, 
and

not sure why it hasn't triggered before in other areas.

Yehuda

On Thu, Dec 11, 2014 at 5:55 AM, Georgios Dimitrakakis
 wrote:


This issue seems very similar to these:

http://tracker.ceph.com/issues/8202
http://tracker.ceph.com/issues/8702


Would it make any difference if I try to build CEPH from sources?

I mean is someone aware of it been fixed on any of the recent 
commits

and
probably hasn't passed yet to the repositories?

Regards,

George





On Mon, 08 Dec 2014 19:47:59 +0200, Georgios Dimitrakakis wrote:



I 've just created issues #10271

Best,

George

On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:



It looks like a bug. Can you open an issue on tracker.ceph.com,
describing what you see?

Thanks,
Yehuda

On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
 wrote:



It would be nice to see where and how "uploadId"

is being calculated...


Thanks,


George



For example if I try to perform the same multipart upload at 
an

older
version ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60)


I can see the upload ID in the apache log as:

"PUT




/test/.dat?partNumber=25&uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
HTTP/1.1" 200 - "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"

but when I try the same at ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3)

I get the following:

"PUT






/test/.dat?partNumber=12&uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
HTTP/1.1" 403 78 "-" "aws-sdk-nodejs/2.0.29 linux/v0.10.33"


and my guess is that the "%2F" at the latter is the one that 
is

causing the problem and hence the 403 error.



What do you think???


Best,

George




Hi all!

I am using AWS SDK JS v.2.0.29 to perform a multipart upload 
into

Radosgw with ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting 
a 403

error.


I believe that the id which is send to all requests and has 
been
urlencoded by the aws-sdk-js doesn't match with the one in 
rados

because it's not urlencoded.

Is that the case? Can you confirm it?

Is there something I can do?


Regards,

George

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] system metrics monitoring

2014-12-11 Thread pragya jain
hello sir!
I need some open source monitoring tool for examining these metrics.
Please suggest some open source monitoring software.
Thanks & Regards
Pragya Jain

 On Thursday, 11 December 2014 9:16 PM, Denish Patel  
wrote:
   
 

 Try http://www.circonus.com
On Thu, Dec 11, 2014 at 1:22 AM, pragya jain  wrote:

please somebody reply my query.
Regards
Pragya Jain

 On Tuesday, 9 December 2014 11:53 AM, pragya jain  
wrote:
   
 

 hello all!
As mentioned on the statistics and monitoring page of Riak, the system metrics to graph are:

  Available Disk Space
  IOWait
  Read Operations
  Write Operations
  Network Throughput
  Load Average

Can somebody suggest some monitoring tools that monitor these metrics?

Regards
Pragya Jain

 

___
riak-users mailing list
riak-us...@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com





-- 
Denish Patel,
OmniTI Computer Consulting Inc.
Database Architect,
http://omniti.com/does/data-management
http://www.pateldenish.com


 
   ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] system metrics monitoring

2014-12-11 Thread Irek Fasikhov
Hi.

We use Zabbix.
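
For the metrics in that list the stock Zabbix agent already ships items, roughly along
these lines (exact keys vary a bit between Zabbix versions, and the sda/eth0 device
names are just examples):

vfs.fs.size[/,pfree]          # available disk space (% free on /)
system.cpu.util[,iowait]      # IOWait
vfs.dev.read[sda,ops]         # read operations per second
vfs.dev.write[sda,ops]        # write operations per second
net.if.in[eth0]               # network throughput, inbound
net.if.out[eth0]              # network throughput, outbound
system.cpu.load[all,avg1]     # 1-minute load average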

2014-12-12 8:33 GMT+03:00 pragya jain :

> hello sir!
>
> I need some open source monitoring tool for examining these metrics.
>
> Please suggest some open source monitoring software.
>
> Thanks
> Regards
> Pragya Jain
>
>
>   On Thursday, 11 December 2014 9:16 PM, Denish Patel 
> wrote:
>
>
>
> Try http://www.circonus.com
>
> On Thu, Dec 11, 2014 at 1:22 AM, pragya jain 
> wrote:
>
> please somebody reply my query.
>
> Regards
> Pragya Jain
>
>
>   On Tuesday, 9 December 2014 11:53 AM, pragya jain 
> wrote:
>
>
>
> hello all!
>
> As mentioned at statistics and monitoring page of Riak
> Systems Metrics To Graph
> 
> Metric: Available Disk Space, IOWait, Read Operations, Write Operations, Network
> Throughput, Load Average
> Can somebody suggest me some monitoring tools that monitor these metrics?
>
> Regards
> Pragya Jain
>
>
>
> ___
> riak-users mailing list
> riak-us...@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>
>
> --
> Denish Patel,
> OmniTI Computer Consulting Inc.
> Database Architect,
> http://omniti.com/does/data-management
> http://www.pateldenish.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Best regards, Фасихов Ирек Нургаязович
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM restore on Ceph *very* slow

2014-12-11 Thread Irek Fasikhov
Hi.

For faster operation, use rbd export/export-diff and import/import-diff

2014-12-11 17:17 GMT+03:00 Lindsay Mathieson :

>
> Anyone know why a VM live restore would be excessively slow on Ceph?
> restoring
> a  small VM with 12GB disk/2GB Ram is taking 18 *minutes*. Larger VM's can
> be
> over half an hour.
>
> The same VM's on the same disks, but native, or glusterfs take less than 30
> seconds.
>
> VM's are KVM on Proxmox.
>
>
> thanks,
> --
> Lindsay
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Best regards, Фасихов Ирек Нургаязович
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM restore on Ceph *very* slow

2014-12-11 Thread Irek Fasikhov
Examples

Backups:
/usr/bin/nice -n +20 /usr/bin/rbd -n client.backup export \
    test/vm-105-disk-1@rbd_data.505392ae8944a - \
  | /usr/bin/pv -s 40G -n -i 1 \
  | /usr/bin/nice -n +20 /usr/bin/pbzip2 -c > /backup/vm-105-disk-1

Restore:
pbzip2 -dk /nfs/RBD/big-vm-268-disk-1-LyncV2-20140830-011308.pbzip2 -c \
  | rbd -n client.rbdbackup -k /etc/ceph/big.keyring -c /etc/ceph/big.conf \
    import --image-format 2 - rbd/Lyncolddisk1

2014-12-12 8:38 GMT+03:00 Irek Fasikhov :

> Hi.
>
> For faster operation, use rbd export/export-diff and import/import-diff
>
> 2014-12-11 17:17 GMT+03:00 Lindsay Mathieson 
> :
>
>>
>> Anyone know why a VM live restore would be excessively slow on Ceph?
>> restoring
>> a  small VM with 12GB disk/2GB Ram is taking 18 *minutes*. Larger VM's
>> can be
>> over half an hour.
>>
>> The same VM's on the same disks, but native, or glusterfs take less than
>> 30
>> seconds.
>>
>> VM's are KVM on Proxmox.
>>
>>
>> thanks,
>> --
>> Lindsay
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> Best regards, Фасихов Ирек Нургаязович
> Mob.: +79229045757
>



-- 
Best regards, Фасихов Ирек Нургаязович
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] system metrics monitoring

2014-12-11 Thread pragya jain
hello sir!
According to TomiTakussaari/riak_zabbix, the currently supported Zabbix keys are:

riak.ring_num_partitions
riak.memory_total
riak.memory_processes_used
riak.pbc_active
riak.pbc_connects
riak.node_gets
riak.node_puts
riak.node_get_fsm_time_median
riak.node_put_fsm_time_median

All these metrics are also monitored by collectd, OpenTSDB and Ganglia. I need some 
monitoring tool that monitors metrics like:

  Available Disk Space
  IOWait
  Read Operations
  Write Operations
  Network Throughput
  Load Average

Does Zabbix provide monitoring of these metrics?

Thanks & Regards
Pragya Jain

 On Friday, 12 December 2014 11:05 AM, Irek Fasikhov  
wrote:
   
 

 Hi.
We use Zabbix.

2014-12-12 8:33 GMT+03:00 pragya jain :

hello sir!
I need some open source monitoring tool for examining these metrics.
Please suggest some open source monitoring software.
Thanks Regards Pragya Jain 

 On Thursday, 11 December 2014 9:16 PM, Denish Patel  
wrote:
   
 

 Try http://www.circonus.com
On Thu, Dec 11, 2014 at 1:22 AM, pragya jain  wrote:

please somebody reply my query.
RegardsPragya Jain 

 On Tuesday, 9 December 2014 11:53 AM, pragya jain  
wrote:
   
 

 hello all!
As mentioned on the statistics and monitoring page of Riak, the system metrics to graph are:

  Available Disk Space
  IOWait
  Read Operations
  Write Operations
  Network Throughput
  Load Average

Can somebody suggest some monitoring tools that monitor these metrics?
Regards
Pragya Jain

 

___
riak-users mailing list
riak-us...@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com





-- 
Denish Patel,
OmniTI Computer Consulting Inc.
Database Architect,
http://omniti.com/does/data-management
http://www.pateldenish.com


 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





-- 
Best regards, Фасихов Ирек Нургаязович
Mob.: +79229045757

 
   ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com