Re: Many dns domain names in radosgw

2012-11-19 Thread Sławomir Skowron
Yes. I am looking to use the domains x.com and y.com with virtual-host
buckets like b.x.com and c.y.com.

But if that's not possible, I can handle this with a CNAME for *.x.com
and use only b and c under the x.com domain.

Thanks for response.

On 19 Nov 2012 at 19:02, "Yehuda Sadeh"  wrote:
>
> On Sat, Nov 17, 2012 at 1:50 PM, Sławomir Skowron  wrote:
> > Welcome,
> >
> > I have a question. Is there any way to support multiple domain names
> > in one radosgw with virtual-host-style connections in S3?
> >
> Are you aiming at having multiple virtual domain names pointing at the
> same bucket?
>
> Currently a gateway can only be set up with a single domain, so the
> virtual bucket scheme will only translate subdomains of that domain as
> buckets. Starting at 0.55 there will be a way to point alternative
> domains to a specific bucket (by modifying their dns CNAME record),
> however, it doesn't sound like that's what you're looking for.
>
> Yehuda
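The virtual-host translation Yehuda describes can be sketched as follows. This is illustrative only, not radosgw's actual code; `gateway_domain` stands for the single domain the gateway is configured with.

```python
def bucket_from_host(host, gateway_domain):
    """Map a Host header to a bucket, virtual-host style.

    Only subdomains of the gateway's single configured domain resolve
    directly; a host under any other domain (e.g. c.y.com) would need
    the per-bucket CNAME mechanism mentioned for 0.55.
    """
    suffix = "." + gateway_domain
    if host.endswith(suffix):
        return host[:-len(suffix)]  # "b.x.com" -> "b"
    return None  # foreign domain: not resolvable without a CNAME lookup

# Example, with the gateway configured for x.com:
print(bucket_from_host("b.x.com", "x.com"))  # -> b
print(bucket_from_host("c.y.com", "x.com"))  # -> None
```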
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Geo-replication with RBD

2013-01-31 Thread Sławomir Skowron
We are using nginx on top of rgw. In nginx we built logic for using
AMQP and asynchronous operations via queues. Workers on each side then
get data from their own queue and copy it from source to destination
over the s3 API. This works for PUT/DELETE, and fails over
automatically when production moves to another location.

On Thu, Jan 31, 2013 at 9:25 AM, Gandalf Corvotempesta
 wrote:
> 2013/1/31 Skowron Sławomir :
>> We have managed async geo-replication of the s3 service between two ceph
>> clusters in two DCs, and to amazon s3 as a third, all via the s3 API. I'd
>> love to see native RGW geo-replication with the features described in the
>> other thread.
>
> how did you do this?



-- 
-
Regards

Sławek "sZiBis" Skowron


Re: Geo-replication with RBD

2013-02-18 Thread Sławomir Skowron
Hi, I can respond now, after being sick.

Nginx is compiled with perl or lua support. Inside the nginx
configuration there is a hook for perl or lua code, whichever you
prefer, which runs inline with each request. We first tried driving
replication from the logs, but it was not a good idea. The inline
option has the advantage that we can reject PUTs if AMQP is not
working, so we don't need to resync all requests. If AMQP is down for
long, we can disable the queue, go direct without AMQP, and resync
offline from the logs with a simple admin tool.

This inline functionality handles only DELETE and PUT; the rest are
skipped. Every DELETE and PUT has its own queue, with its own priority
and custom info in the header used to calculate the synchronization
lag. Nginx itself only puts data into the queues; all data goes into
our S3 (ceph) and into Amazon S3 via nginx, with almost the same
configuration, distributed by puppet.

In every datacenter we have a bunch of workers consuming the queues
dedicated to that location and syncing data from source to
destination. If data can't be fetched from the source, the message
goes into an error queue, which is re-checked for some time.
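The worker step with the error-queue fallback can be sketched like this; the `fetch`/`store` callables stand in for S3 GET/PUT against source and destination, and all names are illustrative.

```python
def sync_one(msg, fetch, store, error_queue):
    """Copy one object from source to destination.

    On a source failure the message is parked on the error queue,
    which is re-checked periodically, as described above.
    """
    try:
        data = fetch(msg["bucket"], msg["key"])
    except IOError:
        error_queue.append(msg)
        return False
    store(msg["bucket"], msg["key"], data)
    return True

# Example with in-memory stand-ins for the two S3 endpoints:
source = {("b", "k"): b"payload"}
dest, errors = {}, []

def fetch(bucket, key):
    try:
        return source[(bucket, key)]
    except KeyError:
        raise IOError("source object unavailable")

def store(bucket, key, data):
    dest[(bucket, key)] = data

sync_one({"bucket": "b", "key": "k"}, fetch, store, errors)        # copied
sync_one({"bucket": "b", "key": "missing"}, fetch, store, errors)  # parked
```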

I am in the middle of writing an article about this, but my sickness
has slowed the process down slightly.


On Thu, Jan 31, 2013 at 10:50 AM, Gandalf Corvotempesta
 wrote:
> 2013/1/31 Sławomir Skowron :
>> We are using nginx on top of rgw. In nginx we built logic for using AMQP
>> and async operations via queues. Workers on each side then get data from
>> their own queue and copy it from source to destination over the s3 API.
>> This works for PUT/DELETE, and fails over automatically when production
>> moves to another location.
>
> I don't know much about messaging, are you able to share some
> configuration or more details ?



-- 
-
Regards

Sławek "sZiBis" Skowron



Re: Geo-replication with RBD

2013-02-18 Thread Sławomir Skowron
Hi, sorry for the very late response, but I was sick.

Our case is to run a failover rbd instance in another cluster. We
store block device images for services like databases. We need two
synchronized clusters for a quick failover if the first cluster goes
down, for upgrades with restart, and for many other cases.

Volumes come in many sizes (1-500 GB): external block devices for KVM
VMs, like EBS.
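Since no native RBD replication exists, periodic snapshot shipping is the usual workaround. A sketch of how the commands for one incremental round could be assembled; this assumes the rbd export-diff/import-diff commands, which only appeared in Ceph releases after this thread, and the remote host name is a placeholder.

```python
def incremental_ship_cmds(pool, image, prev_snap, new_snap,
                          remote="backup-site"):
    """Build the command pipeline for shipping one incremental snapshot:
    rbd export-diff ... - | ssh remote rbd import-diff - ...
    """
    export = ["rbd", "export-diff", "--from-snap", prev_snap,
              "%s/%s@%s" % (pool, image, new_snap), "-"]
    imp = ["ssh", remote, "rbd", "import-diff", "-",
           "%s/%s" % (pool, image)]
    return export, imp

export, imp = incremental_ship_cmds("rbd", "db-volume", "snap1", "snap2")
print(" ".join(export) + " | " + " ".join(imp))
```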

On Mon, Feb 18, 2013 at 3:07 PM, Sławomir Skowron  wrote:
> Hi, Sorry for very late response, but i was sick.
>
> Our case is to make a failover rbd instance in another cluster. We are
> storing block device images, for some services like Database. We need to
> have a two clusters, synchronized, for a quick failover, if first cluster
> goes down, or for upgrade with restart, or many other cases.
>
> Volumes are in many sizes: 1-500GB
> external block device for kvm vm, like EBS.
>
>
> On Fri, Feb 1, 2013 at 12:27 AM, Neil Levine 
> wrote:
>>
>> Skowron,
>>
>> Can you go into a bit more detail on your specific use-case? What type
>> of data are you storing in rbd (type, volume)?
>>
>> Neil
>>
>> On Wed, Jan 30, 2013 at 10:42 PM, Skowron Sławomir
>>  wrote:
>> > I make new thread, because i think it's a diffrent case.
>> >
>> > We have managed async geo-replication of s3 service, beetwen two ceph
>> > clusters in two DC's, and to amazon s3 as third. All this via s3 API. I 
>> > love
>> > to see native RGW geo-replication with described features in another 
>> > thread.
>> >
>> > There is another case. What about RBD replication ?? It's much more
>> > complicated, and for disaster recovery much more important, just like in
>> > enterprise storage arrays.
>> > One cluster in two DC's, not solving problem, because we need security
>> > in data consistency, and isolation.
>> > Do you thinking about this case ??
>> >
>> > Regards
>> > Slawomir Skowron
>
>
>
>
> --
> -
> Regards
>
> Sławek "sZiBis" Skowron



--
-
Regards

Sławek "sZiBis" Skowron


Re: Geo-replication with RBD

2013-02-19 Thread Sławomir Skowron
My requirement is full disaster recovery, business continuity, and
failover of automated services to the second datacenter, not on the
same ceph cluster.
The datacenters have a dedicated 10GbE link for communication, and
there is an option to stretch the cluster across both datacenters, but
that is not what I mean.
That option has advantages, like fast snapshots and fast switching of
services, but it also has some problems.

When we talk about disaster recovery I mean the whole storage cluster
having problems, not only the services on top of the storage. I am
thinking of a bug, or an admin mistake, that makes the cluster
inaccessible in every copy; an upgrade that corrupts data; or an
upgrade that is disruptive for services - auto-failover of services
into the other DC before upgrading the cluster.

If the cluster had a way to replicate rbd image data to the next
cluster, only the data would be migrated, and when disaster comes
there would be no need to work from the last imported snapshot
(snapshots can be imported continuously, but still lag minutes or
hours behind production); we could work on the data as of now. And
once we have an automated solution to recover the DB clusters (one of
the app services on top of rbd) on the new datacenter infrastructure,
we have a real disaster recovery solution.

That's why we built S3 API layer synchronization to the other DC and
to Amazon; only RBD is left.

On 19 Feb 2013 at 10:23, "Sébastien Han"  wrote:

> Hi,
>
> First of all, I have some questions about your setup:
>
> * What are your requirements?
> * Are the DCs far from each others?
>
> If they are reasonably close to each others, you can setup a single
> cluster, with replicas across both DCs and manage the RBD devices with
> pacemaker.
>
> Cheers.
>
> --
> Regards,
> Sébastien Han.
>
>
> On Mon, Feb 18, 2013 at 3:20 PM, Sławomir Skowron  wrote:
>> Hi, Sorry for very late response, but i was sick.
>>
>> Our case is to make a failover rbd instance in another cluster. We are
>> storing block device images, for some services like Database. We need
>> to have a two clusters, synchronized, for a quick failover, if first
>> cluster goes down, or for upgrade with restart, or many other cases.
>>
>> Volumes are in many sizes: 1-500GB
>> external block device for kvm vm, like EBS.
>>
>> On Mon, Feb 18, 2013 at 3:07 PM, Sławomir Skowron  wrote:
>>> Hi, Sorry for very late response, but i was sick.
>>>
>>> Our case is to make a failover rbd instance in another cluster. We are
>>> storing block device images, for some services like Database. We need to
>>> have a two clusters, synchronized, for a quick failover, if first cluster
>>> goes down, or for upgrade with restart, or many other cases.
>>>
>>> Volumes are in many sizes: 1-500GB
>>> external block device for kvm vm, like EBS.
>>>
>>>
>>> On Fri, Feb 1, 2013 at 12:27 AM, Neil Levine 
>>> wrote:
>>>>
>>>> Skowron,
>>>>
>>>> Can you go into a bit more detail on your specific use-case? What type
>>>> of data are you storing in rbd (type, volume)?
>>>>
>>>> Neil
>>>>
>>>> On Wed, Jan 30, 2013 at 10:42 PM, Skowron Sławomir
>>>>  wrote:
>>>>> I make new thread, because i think it's a diffrent case.
>>>>>
>>>>> We have managed async geo-replication of s3 service, beetwen two ceph
>>>>> clusters in two DC's, and to amazon s3 as third. All this via s3 API. I 
>>>>> love
>>>>> to see native RGW geo-replication with described features in another 
>>>>> thread.
>>>>>
>>>>> There is another case. What about RBD replication ?? It's much more
>>>>> complicated, and for disaster recovery much more important, just like in
>>>>> enterprise storage arrays.
>>>>> One cluster in two DC's, not solving problem, because we need security
>>>>> in data consistency, and isolation.
>>>>> Do you thinking about this case ??
>>>>>
>>>>> Regards
>>>>> Slawomir Skowron
>>>
>>>
>>>
>>>
>>> --
>>> -
>>> Regards
>>>
>>> Sławek "sZiBis" Skowron
>>
>>
>>
>> --
>> -
>> Regards
>>
>> Sławek "sZiBis" Skowron


Re: Geo-replication with RBD

2013-02-20 Thread Sławomir Skowron
Like I say, yes. For now it is the only option to migrate data from
one cluster to the other, and for now it must be enough, with some
automation on top.

But is there any timeline, or any brainstorming in ceph internal
meetings, about possible block-level replication, or something like
that?

On 20 Feb 2013, at 17:33, Sage Weil  wrote:

> On Wed, 20 Feb 2013, Sławomir Skowron wrote:
>> My requirement is to have full disaster recovery, buisness continuity,
>> and failover of automatet services on second Datacenter, and not on
>> same ceph cluster.
>> Datacenters have 10GE dedicated link, for communication, and there is
>> option to expand cluster into two DataCenters, but it is not what i
>> mean.
>> There are advantages of this option like fast snapshots, and fast
>> switch of services, but there are some problems.
>>
>> When we talk about disaster recovery i mean that whole storage cluster
>> have problems, not only services at top of storage. I am thinking
>> about bug, or mistake of admin, that makes cluster not accessible in
>> any copy, or a upgrade that makes data corruption, or upgrade that is
>> disruptive for services - auto failover services into another DC,
>> before upgrade cluster.
>>
>> If cluster have a solution to replicate data in rbd images to next
>> cluster, than, only data are migrated, and when disaster comes, than
>> there is no need to work on last imported snapshot (there can be
>> constantly imported snapshot with minutes, or hour, before last
>> production), but work on data from now. And when we have automated
>> solution to recover DB (one of app service on top of rbd) clusters in
>> new datacenter infrastructure, than we have a real disaster recovery
>> solution.
>>
>> That's why we made, a s3 api layer synchronization to another DC, and
>> Amazon, and only RBD is left.
>
> Have you read the thread from Jens last week, 'snapshot, clone and mount a
> VM-Image'?  Would this type of capability capture your requirements?
>
> sage
>
>>
>> On 19 Feb 2013 at 10:23, "Sébastien Han"  wrote:
>>
>>> Hi,
>>>
>>> First of all, I have some questions about your setup:
>>>
>>> * What are your requirements?
>>> * Are the DCs far from each others?
>>>
>>> If they are reasonably close to each others, you can setup a single
>>> cluster, with replicas across both DCs and manage the RBD devices with
>>> pacemaker.
>>>
>>> Cheers.
>>>
>>> --
>>> Regards,
>>> Sébastien Han.
>>>
>>>
>>> On Mon, Feb 18, 2013 at 3:20 PM, Sławomir Skowron  wrote:
 Hi, Sorry for very late response, but i was sick.

 Our case is to make a failover rbd instance in another cluster. We are
 storing block device images, for some services like Database. We need
 to have a two clusters, synchronized, for a quick failover, if first
 cluster goes down, or for upgrade with restart, or many other cases.

 Volumes are in many sizes: 1-500GB
 external block device for kvm vm, like EBS.

 On Mon, Feb 18, 2013 at 3:07 PM, Sławomir Skowron  wrote:
> Hi, Sorry for very late response, but i was sick.
>
> Our case is to make a failover rbd instance in another cluster. We are
> storing block device images, for some services like Database. We need to
> have a two clusters, synchronized, for a quick failover, if first cluster
> goes down, or for upgrade with restart, or many other cases.
>
> Volumes are in many sizes: 1-500GB
> external block device for kvm vm, like EBS.
>
>
> On Fri, Feb 1, 2013 at 12:27 AM, Neil Levine 
> wrote:
>>
>> Skowron,
>>
>> Can you go into a bit more detail on your specific use-case? What type
>> of data are you storing in rbd (type, volume)?
>>
>> Neil
>>
>> On Wed, Jan 30, 2013 at 10:42 PM, Skowron Sławomir
>>  wrote:
>>> I make new thread, because i think it's a diffrent case.
>>>
>>> We have managed async geo-replication of s3 service, beetwen two ceph
>>> clusters in two DC's, and to amazon s3 as third. All this via s3 API. I 
>>> love
>>> to see native RGW geo-replication with described features in another 
>>> thread.
>>>
>>> There is another case. What about RBD replication ?? It's much more
>>> complicated, and for disaster recovery much more important, just like in
>>> enterprise storage arrays.
>>> One cluster in two DC's, not solving problem, because we need security
>>> in data consistency, and isolation.
>>> Do you thinking about this case ??
>>>
>>> Regards
>>> Slawomir Skowron

[0.48.3] cluster health - 1 pgs incomplete state

2013-02-20 Thread Sławomir Skowron
Hi,

I have a problem. After expanding the OSDs and reorganizing the crush
map, I have 1 pg in the incomplete state. How can I solve this?

ceph -s
   health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs
stuck unclean
   monmap e21: 3 mons at
{0=10.178.64.4:6790/0,1=10.178.64.5:6790/0,2=10.178.64.6:6790/0},
election epoch 54, quorum 0,1,2 0,1,2
   osdmap e87682: 156 osds: 156 up, 156 in
pgmap v13097839: 6480 pgs: 6479 active+clean, 1 incomplete; 1484
GB data, 7202 GB used, 36218 GB / 43420 GB avail
   mdsmap e1: 0/0/1 up

ceph health details
HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean
pg 5.5c is stuck incomplete, last acting [35,68,120]
pg 5.5c is stuck incomplete, last acting [35,68,120]
pg 5.5c is incomplete, acting [35,68,120]

Attached is the output of ceph pg 5.5c query.

Regards
Sławek "sZiBis" Skowron
ceph pg 5.5c query

{ "state": "incomplete",
  "up": [
35,
68,
120],
  "acting": [
35,
68,
120],
  "info": { "pgid": "5.5c",
  "last_update": "28692'1809",
  "last_complete": "28692'1809",
  "log_tail": "509'809",
  "last_backfill": "0\/\/0\/\/-1",
  "purged_snaps": "[]",
  "history": { "epoch_created": 365,
  "last_epoch_started": 37673,
  "last_epoch_clean": 24973,
  "last_epoch_split": 37673,
  "same_up_since": 81165,
  "same_interval_since": 81165,
  "same_primary_since": 55011,
  "last_scrub": "19046'1806",
  "last_scrub_stamp": "2013-02-11 00:20:52.190807"},
  "stats": { "version": "28692'1809",
  "reported": "55011'57838",
  "state": "incomplete",
  "last_fresh": "2013-02-20 16:27:35.078140",
  "last_change": "2013-02-19 11:14:39.520274",
  "last_active": "0.00",
  "last_clean": "0.00",
  "last_unstale": "2013-02-20 16:27:35.078140",
  "mapping_epoch": 70925,
  "log_start": "509'809",
  "ondisk_log_start": "509'809",
  "created": 365,
  "last_epoch_clean": 365,
  "parent": "0.0",
  "parent_split_bits": 0,
  "last_scrub": "19046'1806",
  "last_scrub_stamp": "2013-02-11 00:20:52.190807",
  "log_size": 203950,
  "ondisk_log_size": 203950,
  "stat_sum": { "num_bytes": 0,
  "num_objects": 0,
  "num_object_clones": 0,
  "num_object_copies": 0,
  "num_objects_missing_on_primary": 0,
  "num_objects_degraded": 0,
  "num_objects_unfound": 0,
  "num_read": 0,
  "num_read_kb": 0,
  "num_write": 0,
  "num_write_kb": 0},
  "stat_cat_sum": {},
  "up": [
35,
68,
120],
  "acting": [
35,
68,
120]},
  "empty": 0,
  "dne": 0,
  "incomplete": 1},
  "recovery_state": [
{ "name": "Started\/Primary\/Peering",
  "enter_time": "2013-02-19 11:14:35.094762",
  "past_intervals": [
{ "first": 23374,
  "last": 23496,
  "maybe_went_rw": 1,
  "up": [
95],
  "acting": [
95]},
{ "first": 23497,
  "last": 23498,
  "maybe_went_rw": 1,
  "up": [
56,
95],
  "acting": [
56,
95]},
{ "first": 23499,
  "last": 23540,
  "maybe_went_rw": 1,
  "up": [
56,
95],
  "acting": [
95,
56]},
{ "first": 23541,
  "last": 24899,
  "maybe_went_rw": 1,
  "up": [
56,
95],
  "acting": [
56,
95]},
{ "first": 24900,
  "last": 24908,
  "maybe_went_rw": 1,
  "up": [
68,
95],
  "acting": [
68,
95]},
{ "first": 24909,
  "last": 24950,
  "maybe_went_rw": 1,
  "up": [
72,
95],
  "acting": [
72,
95]},
{ "first": 24951,
  "last": 25727,
  "maybe_went_rw": 1,
  "up": [
72,
95],
  "acting": [
72,
95

Re: RGW Blocking on 1-2 PG's - argonaut

2013-03-04 Thread Sławomir Skowron
OK, thanks for the response. But I have a crush map like the one in
the attachment.

All data should be balanced equally, excluding the hosts with 0.5
weight.

How can I make the data auto-balance when I know that some pgs have
too much data? I have 4800 pgs on the RGW pool alone, with 78 OSDs,
which should be quite enough.

pool 3 '.rgw.buckets' rep size 3 crush_ruleset 0 object_hash rjenkins
pg_num 4800 pgp_num 4800 last_change 908 owner 0

When will it be possible to expand the number of pgs?

Best Regards

Slawomir Skowron

On Mon, Mar 4, 2013 at 3:16 PM, Yehuda Sadeh  wrote:
> On Mon, Mar 4, 2013 at 3:02 AM, Sławomir Skowron  wrote:
>> Hi,
>>
>> We have a big problem with RGW. I don't know what the initial trigger
>> is, but I have a theory.
>>
>> 2-3 osds, out of 78 in the cluster (6480 PGs on the RGW pool), have 3x
>> more RAM usage, many more operations in the journal, and much bigger
>> latency.
>>
>> When we PUT some objects, in some cases there are so many operations
>> in triple replication on one of these osds (one PG) that the triple
>> can't handle the load and goes down; the drives behind that osd catch
>> fire with big wait-io and big response times. RGW waits for this PG,
>> and eventually blocks all other operations once 1024 operations are
>> blocked in the queue. Then the whole cluster has problems, and we have
>> an outage.
>>
>> When RGW blocks operations there is only one PG with >1000 operations
>> in its queue -
>> ceph pg map 3.9447554d
>> osdmap e11404 pg 3.9447554d (3.54d) -> up [53,45,23] acting [53,45,23]
>>
>> Now this osd has been migrated, with a 0.5 ratio, but before it was
>>
>> ceph pg map 3.9447554d
>> osdmap e11404 pg 3.9447554d (3.54d) -> up [71,45,23] acting [71,45,23]
>>
>> and these three osds have the problems. Under these osds there are
>> only 3 drives, one drive per osd, which is why the impact is so big.
>>
>> What I did: I gave these osds a 50% smaller ratio in CRUSH, but the
>> data moved to other osds, and these osds now have half their possible
>> capacity. I don't think it will help in the long term, and it's not a
>> solution.
>>
>> I have a second cluster, used only for replication, with the same
>> symptoms. The attachment explains everything: every parameter on the
>> bad osd is much higher than on the others. There are 2-3 osds with
>> such high counters.
>>
>> Is this a bug? Maybe the problem is gone in bobtail? I can't switch
>> quickly to bobtail, which is why I need some answers about which way
>> to go.
>>
>
> Not sure if bobtail is going to help much here, although there were a
> few performance fixes that went in. If your cluster is unbalanced (in
> terms of performance) then requests are going to be accumulated on the
> weakest link. Reweighting the osd like what you did is a valid way to
> go. You need to make sure that on the steady state, there's no one osd
> that starts holding all the traffic.
> Also, make sure that your pools have enough pgs so that the placement
> distribution is uniform.
>
> Yehuda
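Yehuda's "enough pgs" advice is commonly estimated with the rule of thumb of roughly 100 PGs per OSD divided by the replica count, rounded up to a power of two. This heuristic comes from later Ceph documentation, not from this thread; the sketch below just applies the arithmetic.

```python
import math

def recommended_pg_num(num_osds, replicas, pgs_per_osd=100):
    """Round (osds * pgs_per_osd) / replicas up to the next power of two."""
    target = num_osds * pgs_per_osd / float(replicas)
    return 2 ** int(math.ceil(math.log(target, 2)))

# For the cluster in this thread: 78 OSDs, 3x replication.
print(recommended_pg_num(78, 3))  # -> 4096 (vs the 4800 configured)
```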



--
-
Regards

Sławek "sZiBis" Skowron
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
device 28 osd.28
device 29 osd.29
device 30 osd.30
device 31 osd.31
device 32 osd.32
device 33 osd.33
device 34 osd.34
device 35 osd.35
device 36 osd.36
device 37 osd.37
device 38 osd.38
device 39 osd.39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47
device 48 osd.48
device 49 osd.49
device 50 osd.50
device 51 osd.51
device 52 osd.52
device 53 osd.53
device 54 osd.54
device 55 osd.55
device 56 osd.56
device 57 osd.57
device 58 osd.58
device 59 osd.59
device 60 osd.60
device 61 osd.61
device 62 osd.62
device 63 osd.63
device 64 osd.64
device 65 osd.65
device 66 osd.66
device 67 osd.67
device 68 osd.68
device 69 osd.69
device 70 osd.70
device 71 osd.71
device 72 osd.72
device 73 osd.73
device 74 osd.74
device 75 osd.75
device 76 osd.76
device 77 osd.77

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 pool

# buckets
host s3-10-177-64-4 {
id -2   # do not change unnecessarily
# weight 25.500
alg straw
hash 0  # rje

Re: RGW Blocking on 1-2 PG's - argonaut

2013-03-04 Thread Sławomir Skowron
On Mon, Mar 4, 2013 at 6:02 PM, Sage Weil  wrote:
> On Mon, 4 Mar 2013, Sławomir Skowron wrote:
>> Ok, thanks for response. But if i have crush map like this in attachment.
>>
>> All data should be balanced equal, not including hosts with 0.5 weight.
>>
>> How make data auto balanced ?? when i know that some pq's have too
>> much data ?? I have 4800 pg's on RGW only with 78 OSD, it is quite
>> enough.
>>
>> pool 3 '.rgw.buckets' rep size 3 crush_ruleset 0 object_hash rjenkins
>> pg_num 4800 pgp_num 4800 last_change 908 owner 0
>>
>> When will bee possible to expand number of pg's ??
>
> Soon.  :)
>
> The bigger question for me is why there is one PG that is getting pounded
> while the others are not.  Is there a large skew in the workload toward a
> small number of very hot objects?

Yes, there are constantly about 100-200 operations per second, all
going into the RGW backend. But when problems come there are more
requests, more GETs and PUTs, because applications reconnect with
short timeouts. Statistically, though, new PUTs normally go to many
pgs, so this should not overload a single primary OSD. Maybe
balancing reads across all replicas could help a little?

>  I expect it should be obvious if you go
> to the loaded osd and do
>
>  ceph --admin-daemon /var/run/ceph/ceph-osd.NN.asok dump_ops_in_flight
>

Yes, I did that, but such long operations appear only when the
cluster becomes unstable. Normally there are no ops in the queue;
they appear only when the cluster rebalances, remaps, or similar.

> and look at the request queue.
>
> sage
>
>
>>
>> Best Regards
>>
>> Slawomir Skowron
>>
>> On Mon, Mar 4, 2013 at 3:16 PM, Yehuda Sadeh  wrote:
>> > On Mon, Mar 4, 2013 at 3:02 AM, Sławomir Skowron  wrote:
>> >> Hi,
>> >>
>> >> We have a big problem with RGW. I don't know what is the initial
>> >> trigger, but i have theory.
>> >>
>> >> 2-3 osd, from 78 in cluster (6480 PG on RGW pool), have 3x time more
>> >> RAM usage, they have much more operations in journal, and much bigger
>> >> latency.
>> >>
>> >> When we PUT some objects then in some cases, there are so many
>> >> operations in triple replication on this osd (one PG). Then this
>> >> triple can't handle this load, and goes down, drives on backend of
>> >> this osd are getting fire with big wait-io, and big response times.
>> >> RGW waiting for this PG, and eventually block all the others
>> >> operations when makes 1024 operations blocked in queue.
>> >> Then whole cluster have problems, and we have an outage.
>> >>
>> >> When RGW block operations there is only one PG that have >1000
>> >> operations in queue -
>> >> ceph pg map 3.9447554d
>> >> osdmap e11404 pg 3.9447554d (3.54d) -> up [53,45,23] acting [53,45,23]
>> >>
>> >> now this osd are migrated, with ratio 0.5 on, but before it was
>> >>
>> >> ceph pg map 3.9447554d
>> >> osdmap e11404 pg 3.9447554d (3.54d) -> up [71,45,23] acting [71,45,23]
>> >>
>> >> and this three osd's have such a problems. Under this osd's are only 3
>> >> drive, one drive per osd, that's why this have such a big impact.
>> >>
>> >> What i done. I gave 50% smaller ratio in CRUSH for this osd's, but
>> >> data move to other osd, and this osd, have half of possible capacity.
>> >> I think it won't help in long term, and it's not a solution.
>> >>
>> >> I have second cluster, with only replication on it, and there are same
>> >> case. Attachment explain everything. Every parameter on this bad osd
>> >> is much higher than on others. There are 2-3 osd with such high
>> >> counters.
>> >>
>> >> Is this a bug ?? maybe there is no problems in bobtail ?? I can't
>> >> switch quick into bobtail that's why i need some answers, which way i
>> >> need to go.
>> >>
>> >
>> > Not sure if bobtail is going to help much here, although there were a
>> > few performance fixes that went in. If your cluster is unbalanced (in
>> > terms of performance) then requests are going to be accumulated on the
>> > weakest link. Reweighting the osd like what you did is a valid way to
>> > go. You need to make sure that on the steady state, there's no one osd
>> > that starts holding all the traffic.
>> > Also, make sure that your pools have enough pgs so that the placement
>> > distribution is uniform.
>> >
>> > Yehuda
>>
>>
>>
>> --
>> -
>> Regards
>>
>> Sławek "sZiBis" Skowron
>>



--
-
Regards

Sławek "sZiBis" Skowron


Re: RGW Blocking on 1-2 PG's - argonaut

2013-03-04 Thread Sławomir Skowron
Alone (one of the slow osds in the mentioned triple):

2013-03-04 18:39:27.683035 osd.23 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 15.241943 sec at 68795 KB/sec

in for loop (some slow request appear):

for x in `seq 0 25`; do ceph osd tell $x bench;done
2013-03-04 18:41:08.259454 osd.12 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.658448 sec at 27844 KB/sec
2013-03-04 18:41:07.850213 osd.5 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.402402 sec at 28034 KB/sec
2013-03-04 18:41:07.850231 osd.11 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.201831 sec at 28186 KB/sec
2013-03-04 18:41:08.100186 osd.10 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.540605 sec at 27931 KB/sec
2013-03-04 18:41:08.319766 osd.21 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.532806 sec at 27937 KB/sec
2013-03-04 18:41:08.415835 osd.14 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.772730 sec at 27760 KB/sec
2013-03-04 18:41:08.775264 osd.9 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.195523 sec at 27452 KB/sec
2013-03-04 18:41:08.808824 osd.6 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.338387 sec at 27350 KB/sec
2013-03-04 18:41:08.923809 osd.19 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.177933 sec at 27465 KB/sec
2013-03-04 18:41:08.925848 osd.18 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.201476 sec at 27448 KB/sec
2013-03-04 18:41:08.936961 osd.15 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.273058 sec at 27397 KB/sec
2013-03-04 18:41:08.619022 osd.20 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.713017 sec at 27804 KB/sec
2013-03-04 18:41:08.764705 osd.22 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.954886 sec at 27626 KB/sec
2013-03-04 18:41:08.499156 osd.0 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.035553 sec at 27568 KB/sec
2013-03-04 18:41:07.873457 osd.2 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.489969 sec at 27969 KB/sec
2013-03-04 18:41:08.134530 osd.13 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.513056 sec at 27952 KB/sec
2013-03-04 18:41:08.219142 osd.1 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.856368 sec at 27698 KB/sec
2013-03-04 18:41:08.485806 osd.4 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.060621 sec at 27550 KB/sec
2013-03-04 18:41:08.612236 osd.7 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.122105 sec at 27505 KB/sec
2013-03-04 18:41:08.647494 osd.8 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.134885 sec at 27496 KB/sec
2013-03-04 18:41:08.649267 osd.3 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.961966 sec at 27621 KB/sec
2013-03-04 18:41:08.943610 osd.24 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.091272 sec at 27527 KB/sec
2013-03-04 18:41:08.975838 osd.17 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.270884 sec at 27398 KB/sec
2013-03-04 18:41:09.544561 osd.23 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.715030 sec at 27084 KB/sec
2013-03-04 18:41:08.969981 osd.16 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 38.287596 sec at 27386 KB/sec
2013-03-04 18:41:09.533789 osd.25 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.954333 sec at 27627 KB/sec

My XFS is a little fragmented, but performance is still good.
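To spot outliers in output like the above, the per-OSD throughput can be parsed out of the log lines. A minimal sketch (the helper is hypothetical; the regex assumes the exact wrapped `ceph osd tell N bench` log format shown above):

```python
import re

# Matches wrapped log lines like:
# 2013-03-04 18:41:08.259454 osd.12 [INF] bench: wrote 1024 MB in blocks
# of 4096 KB in 37.658448 sec at 27844 KB/sec
BENCH_RE = re.compile(r"osd\.(\d+) \[INF\] bench: .*? at (\d+) KB/sec", re.S)

def rank_osds(log_text):
    """Return (osd_id, KB/sec) pairs parsed from bench output, slowest first."""
    results = [(int(osd), int(kbs)) for osd, kbs in BENCH_RE.findall(log_text)]
    return sorted(results, key=lambda r: r[1])

log = """2013-03-04 18:41:08.259454 osd.12 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 37.658448 sec at 27844 KB/sec
2013-03-04 18:39:27.683035 osd.23 [INF] bench: wrote 1024 MB in blocks
of 4096 KB in 15.241943 sec at 68795 KB/sec"""

print(rank_osds(log))  # slowest OSD first
```

Feeding the whole for-loop output through this makes it easy to see whether one triple of OSDs is consistently at the bottom.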

On Mon, Mar 4, 2013 at 6:25 PM, Gregory Farnum  wrote:
> On Mon, Mar 4, 2013 at 9:23 AM, Sławomir Skowron  wrote:
>> On Mon, Mar 4, 2013 at 6:02 PM, Sage Weil  wrote:
>>> On Mon, 4 Mar 2013, S?awomir Skowron wrote:
>>>> Ok, thanks for the response. But what if I have a crush map like the one in the attachment?
>>>>
>>>> All data should be balanced equally, not including hosts with 0.5 weight.
>>>>
>>>> How can I make data auto-balance when I know that some PGs have too
>>>> much data? I have 4800 PGs on RGW alone, with 78 OSDs; it is quite
>>>> enough.
>>>>
>>>> pool 3 '.rgw.buckets' rep size 3 crush_ruleset 0 object_hash rjenkins
>>>> pg_num 4800 pgp_num 4800 last_change 908 owner 0
>>>>
>>>> When will it be possible to expand the number of PGs?
>>>
>>> Soon.  :)
>>>
>>> The bigger question for me is why there is one PG that is getting pounded
>>> while the others are not.  Is there a large skew in the workload toward a
>>> small number of very hot objects?
>>
>> Yes, there are constantly about 100-200 operations per second, all
>> going into the RGW backend. But when problems come, there are more
>> requests, more GETs and PUTs, because applications reconnect with
>> short timeouts. But statistically all new PUTs normally go to many
>> PGs; this should not overload a single master OSD. Maybe balanced
>> reads from all replicas could help a little?
>>
>>>  I expect it should be ob

Re: RGW Blocking on 1-2 PG's - argonaut

2013-03-04 Thread Sławomir Skowron
And some output from rest-bench:

2013-03-04 19:31:41.503865 min lat: 0.166207 max lat: 3.44611 avg lat: 0.911577
2013-03-04 19:31:41.503865   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
2013-03-04 19:31:41.503865    40      16       715       699   69.7985        64   1.54288  0.911577
2013-03-04 19:31:42.504218    41      16       721       705   68.6825        24  0.949049  0.909889
2013-03-04 19:31:43.504528    42      16       742       726   69.0462        84  0.566944    0.9164
2013-03-04 19:31:44.504857    43      16       761       745   69.2071        76   1.17317  0.919921
2013-03-04 19:31:45.505099    44      16       766       750   68.0899        20   1.23423  0.918905
2013-03-04 19:31:46.506975    45      16       785       769   68.2626        76  0.711296   0.92321
2013-03-04 19:31:47.507964    46      16       794       778   67.5607        36   1.79786  0.926638
2013-03-04 19:31:48.508148    47      16       812       796   67.6548        72  0.847533  0.930029
2013-03-04 19:31:49.508347    48      16       829       813   67.6617        68  0.807918  0.940498
2013-03-04 19:31:50.508547    49      16       840       824   67.1792        44   0.95126  0.938767
2013-03-04 19:31:51.508753    50      16       858       842   67.2752        72  0.711993  0.937664
2013-03-04 19:31:52.509076    51      13       859       846   66.2706        16   1.49896  0.939526
2013-03-04 19:31:53.509662 Total time run: 51.235707
Total writes made:  859
Write size: 4194304
Bandwidth (MB/sec): 67.063

Stddev Bandwidth:   22.35
Max bandwidth (MB/sec): 100
Min bandwidth (MB/sec): 0
Average Latency:        0.951978
Stddev Latency:         0.456654
Max latency:            3.44611
Min latency:            0.166207
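The summary figures rest-bench prints (average, stddev, min/max latency) are ordinary sample statistics over the per-operation latencies; a small sketch with made-up sample values, not the actual rest-bench data:

```python
import statistics

def latency_summary(samples):
    """Summarize per-op latencies (in seconds) the way a benchmark report does."""
    return {
        "avg": statistics.mean(samples),
        "stddev": statistics.stdev(samples),  # sample standard deviation
        "min": min(samples),
        "max": max(samples),
    }

# Hypothetical per-operation latencies for illustration.
lats = [0.166, 0.91, 1.54, 0.95, 3.45, 0.71]
s = latency_summary(lats)
print(s["min"], s["max"])
```

A large gap between average and max latency, as in the run above (0.95 s vs 3.45 s), is the usual sign that a few operations are stalling on a hot spot rather than the whole cluster being slow.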

On Mon, Mar 4, 2013 at 6:42 PM, Sławomir Skowron  wrote:
> Alone (one of this slow osd in mentioned tripple)
>
> 2013-03-04 18:39:27.683035 osd.23 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 15.241943 sec at 68795 KB/sec
>
> in for loop (some slow request appear):
>
> for x in `seq 0 25`; do ceph osd tell $x bench;done
> 2013-03-04 18:41:08.259454 osd.12 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.658448 sec at 27844 KB/sec
> 2013-03-04 18:41:07.850213 osd.5 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.402402 sec at 28034 KB/sec
> 2013-03-04 18:41:07.850231 osd.11 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.201831 sec at 28186 KB/sec
> 2013-03-04 18:41:08.100186 osd.10 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.540605 sec at 27931 KB/sec
> 2013-03-04 18:41:08.319766 osd.21 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.532806 sec at 27937 KB/sec
> 2013-03-04 18:41:08.415835 osd.14 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.772730 sec at 27760 KB/sec
> 2013-03-04 18:41:08.775264 osd.9 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.195523 sec at 27452 KB/sec
> 2013-03-04 18:41:08.808824 osd.6 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.338387 sec at 27350 KB/sec
> 2013-03-04 18:41:08.923809 osd.19 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.177933 sec at 27465 KB/sec
> 2013-03-04 18:41:08.925848 osd.18 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.201476 sec at 27448 KB/sec
> 2013-03-04 18:41:08.936961 osd.15 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.273058 sec at 27397 KB/sec
> 2013-03-04 18:41:08.619022 osd.20 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.713017 sec at 27804 KB/sec
> 2013-03-04 18:41:08.764705 osd.22 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.954886 sec at 27626 KB/sec
> 2013-03-04 18:41:08.499156 osd.0 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.035553 sec at 27568 KB/sec
> 2013-03-04 18:41:07.873457 osd.2 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.489969 sec at 27969 KB/sec
> 2013-03-04 18:41:08.134530 osd.13 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.513056 sec at 27952 KB/sec
> 2013-03-04 18:41:08.219142 osd.1 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.856368 sec at 27698 KB/sec
> 2013-03-04 18:41:08.485806 osd.4 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.060621 sec at 27550 KB/sec
> 2013-03-04 18:41:08.612236 osd.7 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.122105 sec at 27505 KB/sec
> 2013-03-04 18:41:08.647494 osd.8 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.134885 sec at 27496 KB/sec
> 2013-03-04 18:41:08.649267 osd.3 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 37.961966 sec at 27621 KB/sec
> 2013-03-04 18:41:08.943610 osd.24 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.091272 sec at 27527 KB/sec
> 2013-03-04 18:41:08.975838 osd.17 [INF] bench: wrote 1024 MB in blocks
> of 4096 KB in 38.27

Re: RGW Blocking on 1-2 PG's - argonaut

2013-03-06 Thread Sławomir Skowron
Hi, I did some tests to reproduce this problem.

As you can see, only one drive (each drive in the same PG) is much more
utilized than the others, and there are some ops queued on this slow
OSD. The test fetches heads of S3 objects, alphabetically sorted. This
is strange: why do these requests go mostly to this one triple of OSDs?

Checking which OSDs are in this PG:

 ceph pg map 7.35b
osdmap e117008 pg 7.35b (7.35b) -> up [18,61,133] acting [18,61,133]

On osd.61

{ "num_ops": 13,
  "ops": [
{ "description": "osd_sub_op(client.10376104.0:961532 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370134
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.448543",
  "age": "0.032431",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376110.0:972570 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370135
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.453829",
  "age": "0.027145",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376104.0:961534 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370136
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.454012",
  "age": "0.026962",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376107.0:952760 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370137
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.458980",
  "age": "0.021994",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376110.0:972572 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370138
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.459546",
  "age": "0.021428",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376110.0:972574 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370139
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.463680",
  "age": "0.017294",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376107.0:952762 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370140
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.464660",
  "age": "0.016314",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376104.0:961536 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370141
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.468076",
  "age": "0.012898",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376110.0:972576 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370142
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.468332",
  "age": "0.012642",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376107.0:952764 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370143
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.470480",
  "age": "0.010494",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376107.0:952766 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370144
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.475372",
  "age": "0.005602",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376104.0:961538 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370145
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.479391",
  "age": "0.001583",
  "flag_point": "started"},
{ "description": "osd_sub_op(client.10376107.0:952768 7.35b
2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370146
snapset=0=[]:[] snapc=0=[])",
  "received_at": "2013-03-06 13:59:18.480276",
  "age": "0.000698",
  "flag_point": "started"}]}

On osd.18

{ "num_ops": 9,
  "ops": [
{ "description": "osd_op(client.10391092.0:718883
2013-03-06-13-8700.1-ocdn [append 0~299] 7.2b11a75b)",
  "received_at": "2013-03-06 13:57:52.929677",
  "age": "0.025480",
  "flag_point": "waiting for sub ops",
  "client_info": { "client": "client.10391092",
  "tid": 718883}},
{ "description": "osd_op(client.10373691.0:956595
2013-03-06-13-8700.1-ocdn [append 0~299] 7.2b11a75b)",
  "received_at": "2013-03-06 13:57:52.934533",
  "age": "0.020624",
  "flag_point": "waiting for sub ops",
  "client_info": { "client": "client.10373691",
  "tid": 956595}},
{ "description": "osd_op(client.10391092.0:718885
2013-03-06-13-8700.1-ocdn [appen

Re: RGW Blocking on 1-2 PG's - argonaut

2013-03-06 Thread Sławomir Skowron
Great, thanks. Now I understand everything.

Best Regards
SS

On 6 Mar 2013, at 15:04, Yehuda Sadeh wrote:

> On Wed, Mar 6, 2013 at 5:06 AM, Sławomir Skowron  wrote:
>> Hi, i do some test, to reproduce this problem.
>>
>> As you can see, only one drive (each drive in same PG) is much more
>> utilize, then others, and there are some ops in queue on this slow
>> osd. This test is getting heads from s3 objects, alphabetically
>> sorted. This is strange. why this files is going in much part only
>> from this triple osd's.
>>
>> checking what osd are in this pg.
>>
>> ceph pg map 7.35b
>> osdmap e117008 pg 7.35b (7.35b) -> up [18,61,133] acting [18,61,133]
>>
>> On osd.61
>>
>> { "num_ops": 13,
>>  "ops": [
>>{ "description": "osd_sub_op(client.10376104.0:961532 7.35b
>> 2b11a75b\/2013-03-06-13-8700.1-ocdn\/head\/\/7 [] v 117008'1370134
>
> The ops log is slowing you down. Unless you really need it, set 'rgw
> enable ops log = false'. This is off by default in bobtail.
>
>
> Yehuda
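Yehuda's explanation can be illustrated: every request appends to the same per-time-window ops-log object (here `2013-03-06-13-8700.1-ocdn`), and object placement is deterministic, so all of those appends land on a single PG while distinct object names spread out. A toy sketch using a generic hash, not Ceph's actual rjenkins/CRUSH mapping:

```python
import hashlib

PG_NUM = 4800  # pg_num of .rgw.buckets from the thread above

def pg_for(object_name):
    """Toy stand-in for Ceph's object->PG mapping: a deterministic hash mod pg_num."""
    h = int.from_bytes(hashlib.md5(object_name.encode()).digest()[:4], "little")
    return h % PG_NUM

# Appends to the single ops-log object always map to the same PG...
ops_log = "2013-03-06-13-8700.1-ocdn"
pgs_for_ops_log = {pg_for(ops_log) for _ in range(1000)}

# ...while distinct object names spread across many PGs.
spread = {pg_for("obj-%d" % i) for i in range(1000)}
print(len(pgs_for_ops_log), len(spread))
```

This is why disabling the ops log (or anything else that funnels every request through one object) removes the hot PG: the skew comes from the workload, not from the placement algorithm.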
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


mkfs on osd - failed in 0.47

2012-05-21 Thread Sławomir Skowron
Ubuntu precise:
Linux obs-10-177-66-4 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10
20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

# mount
/dev/sdc on /vol0/data/osd.0 type ext4
(rw,noexec,nodev,noatime,nodiratime,user_xattr,data=writeback,barrier=0)

# ceph-osd -i 0 --mkjournal --mkfs --monmap /tmp/monmap
2012-05-21 22:36:54.150374 7f65fbc0b780 -1 filestore(/vol0/data/osd.0)
leveldb db created
2012-05-21 22:36:54.152699 7f65fbc0b780 -1 filestore(/vol0/data/osd.0)
Extended attributes don't appear to work. Got error (2) No such file
or directory. If you are using ext3 or ext4, be sure to mount the
underlying file system with the 'user_xattr' option.
2012-05-21 22:36:54.152729 7f65fbc0b780 -1 OSD::mkfs: couldn't mount
FileStore: error -95
2012-05-21 22:36:54.152761 7f65fbc0b780 -1  ** ERROR: error creating
empty object store in /vol0/data/osd.0: (95) Operation not supported

# ls -la /vol0/data/osd.0
total 524328
drwxr-xr-x  4 root root      4096 May 21 22:36 .
drwxr-xr-x 29 root root      4096 May 21 22:34 ..
drwxr-xr-x  3 root root      4096 May 21 22:36 current
-rw-r--r--  1 root root        37 May 21 22:36 fsid
-rw-r--r--  1 root root 536870912 May 21 22:36 journal
drwx------  2 root root     16384 May  9 15:06 lost+found
-rw-r--r--  1 root root         4 May 21 22:36 store_version

Does anyone have any idea?
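The "Extended attributes don't appear to work" error comes from ceph-osd probing xattr support in the data directory at mkfs time (in this case user_xattr was already mounted and the failure turned out to be a 0.47 regression, fixed later in the thread). The same probe can be reproduced by hand; a minimal sketch using Python's `os.setxattr` (Linux-only; the attribute name is arbitrary):

```python
import os
import tempfile

def xattrs_work(directory):
    """Return True if user.* extended attributes can be written in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="xattr_test.")
    os.close(fd)
    try:
        os.setxattr(path, "user.ceph.test", b"1")
        return os.getxattr(path, "user.ceph.test") == b"1"
    except (OSError, AttributeError):
        # e.g. ext3/ext4 mounted without 'user_xattr', or a non-Linux platform
        return False
    finally:
        os.unlink(path)

print(xattrs_work(tempfile.gettempdir()))
```

If this returns False on an ext3/ext4 OSD directory, remounting with the `user_xattr` option is the first thing to check before suspecting the ceph binaries.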

-- 
-
Pozdrawiam

Sławek "sZiBis" Skowron


Re: mkfs on osd - failed in 0.47

2012-05-21 Thread Sławomir Skowron
Yes, as root. 30 minutes ago, on 0.46, the same operation worked.

ii  ceph         0.47-1   precise   distributed storage and file system
ii  ceph-common  0.47-1   precise   common utilities to mount and interact with a ceph filesystem
ii  ceph-fuse    0.47-1   precise   FUSE-based client for the Ceph distributed file system
ii  libcephfs1   0.47-1   precise   Ceph distributed file system client library
ii  python-ceph  0.47-1   precise   Python libraries for the Ceph distributed filesystem

Repo:

deb http://ceph.com/debian/ precise main


On Mon, May 21, 2012 at 10:54 PM, Gregory Farnum  wrote:
> Are you actually running as root? (ie, right perms?)
> What version of Ceph are you using? If you pulled and built binaries
> off of master over the weekend, I believe it was broken for a few
> hours in a way that will manifest somewhat like this.
>
> On Mon, May 21, 2012 at 1:49 PM, Stefan Priebe  wrote:
> >> On 21.05.2012 22:41, Sławomir Skowron wrote:
>>
>>> # ceph-osd -i 0 --mkjournal --mkfs --monmap /tmp/monmap
>>> 2012-05-21 22:36:54.150374 7f65fbc0b780 -1 filestore(/vol0/data/osd.0)
>>> leveldb db created
>>> 2012-05-21 22:36:54.152699 7f65fbc0b780 -1 filestore(/vol0/data/osd.0)
>>> Extended attributes don't appear to work. Got error (2) No such file
>>> or directory. If you are using ext3 or ext4, be sure to mount the
>>> underlying file system with the 'user_xattr' option.
>>> 2012-05-21 22:36:54.152729 7f65fbc0b780 -1 OSD::mkfs: couldn't mount
>>> FileStore: error -95
>>> 2012-05-21 22:36:54.152761 7f65fbc0b780 -1  ** ERROR: error creating
>>> empty object store in /vol0/data/osd.0: (95) Operation not supported
>>
>>
>> I get the same (v0.47) while using XFS and trying to restore an osd:
>>
>> #~: ceph-osd -c /etc/ceph/ceph.conf -i 1 --mkfs --monmap /tmp/monmap
>> 2012-05-21 22:47:34.151171 7f9090a80780 -1 filestore(/srv) leveldb db
>> created
>> 2012-05-21 22:47:34.539570 7f9090a80780 -1 filestore(/srv) Extended
>> attributes don't appear to work. Got error (2) No such file or directory. If
>> you are using ext3 or ext4, be sure to mount the underlying file system with
>> the 'user_xattr' option.
>> 2012-05-21 22:47:34.539618 7f9090a80780 -1 OSD::mkfs: couldn't mount
>> FileStore: error -95
>> 2012-05-21 22:47:34.539642 7f9090a80780 -1  ** ERROR: error creating empty
>> object store in /srv: (95) Operation not supported
>>
>> Stefan
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
-
Pozdrawiam

Sławek "sZiBis" Skowron


Re: Designing a cluster guide

2012-05-21 Thread Sławomir Skowron
Maybe a good option for the journal would be two cheap MLC Intel drives on SandForce (320/520), 120GB or 240GB, with the HPA shrunk to 20-30GB, used only for separate journaling partitions in hardware RAID1.

I would like to test a setup like this, but maybe someone has some real-life info?
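For sizing such a journal partition, a common rule of thumb (from later Ceph documentation) is journal size >= 2 x expected throughput x filestore max sync interval; a quick arithmetic sketch, where the throughput and interval values are illustrative assumptions:

```python
def journal_size_mb(throughput_mb_s, sync_interval_s=5, safety=2):
    """Rule of thumb: journal >= safety * throughput * filestore max sync interval."""
    return safety * throughput_mb_s * sync_interval_s

# e.g. an MLC SSD sustaining ~250 MB/s with a 5 s filestore sync interval
print(journal_size_mb(250))  # 2500 (MB), so a 20-30 GB HPA slice leaves ample headroom
```

By this estimate the extra HPA space mostly buys SSD endurance (over-provisioning) rather than capacity the journal actually needs.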

On Mon, May 21, 2012 at 5:07 PM, Tomasz Paszkowski  wrote:
> Another great thing that should be mentioned is:
> https://github.com/facebook/flashcache/. It gives really huge
> performance improvements for reads/writes (especialy on FunsionIO
> drives) event without using librbd caching :-)
>
>
>
> On Sat, May 19, 2012 at 6:15 PM, Alexandre DERUMIER  
> wrote:
>> Hi,
>>
>> For your journal , if you have money, you can use
>>
>> stec zeusram ssd drive. (around 2000€ /8GB / 10 iops read/write with 4k 
>> block).
>> I'm using them with zfs san, they rocks for journal.
>> http://www.stec-inc.com/product/zeusram.php
>>
>> another interessesting product is ddrdrive
>> http://www.ddrdrive.com/
>>
>> - Mail original -
>>
>> De: "Stefan Priebe" 
>> À: "Gregory Farnum" 
>> Cc: ceph-devel@vger.kernel.org
>> Envoyé: Samedi 19 Mai 2012 10:37:01
>> Objet: Re: Designing a cluster guide
>>
>> Hi Greg,
>>
>> On 17.05.2012 23:27, Gregory Farnum wrote:
 It mentions for example "Fast CPU" for the mds system. What does fast
 mean? Just the speed of one core? Or is ceph designed to use multi core?
 Is multi core or more speed important?
>>> Right now, it's primarily the speed of a single core. The MDS is
>>> highly threaded but doing most things requires grabbing a big lock.
>>> How fast is a qualitative rather than quantitative assessment at this
>>> point, though.
>> So would you recommand a fast (more ghz) Core i3 instead of a single
>> xeon for this system? (price per ghz is better).
>>
>>> It depends on what your nodes look like, and what sort of cluster
>>> you're running. The monitors are pretty lightweight, but they will add
>>> *some* load. More important is their disk access patterns — they have
>>> to do a lot of syncs. So if they're sharing a machine with some other
>>> daemon you want them to have an independent disk and to be running a
>>> new kernel&glibc so that they can use syncfs rather than sync. (The
>>> only distribution I know for sure does this is Ubuntu 12.04.)
>> Which kernel and which glibc version supports this? I have searched
>> google but haven't found an exact version. We're using debian lenny
>> squeeze with a custom kernel.
>>
 Regarding the OSDs is it fine to use an SSD Raid 1 for the journal and
 perhaps 22x SATA Disks in a Raid 10 for the FS or is this quite absurd
 and you should go for 22x SSD Disks in a Raid 6?
>>> You'll need to do your own failure calculations on this one, I'm
>>> afraid. Just take note that you'll presumably be limited to the speed
>>> of your journaling device here.
>> Yeah that's why i wanted to use a Raid 1 of SSDs for the journaling. Or
>> is this still too slow? Another idea was to use only a ramdisk for the
>> journal and backup the files while shutting down to disk and restore
>> them after boot.
>>
>>> Given that Ceph is going to be doing its own replication, though, I
>>> wouldn't want to add in another whole layer of replication with raid10
>>> — do you really want to multiply your storage requirements by another
>>> factor of two?
>> OK correct bad idea.
>>
 Is it more useful the use a Raid 6 HW Controller or the btrfs raid?
>>> I would use the hardware controller over btrfs raid for now; it allows
>>> more flexibility in eg switching to xfs. :)
>> OK but overall you would recommand running one osd per disk right? So
>> instead of using a Raid 6 with for example 10 disks you would run 6 osds
>> on this machine?
>>
 Use single socket Xeon for the OSDs or Dual Socket?
>>> Dual socket servers will be overkill given the setup you're
>>> describing. Our WAG rule of thumb is 1GHz of modern CPU per OSD
>>> daemon. You might consider it if you decided you wanted to do an OSD
>>> per disk instead (that's a more common configuration, but it requires
>>> more CPU and RAM per disk and we don't know yet which is the better
>>> choice).
>> Is there also a rule of thumb for the memory?
>>
>> My biggest problem with ceph right now is the awful slow speed while
>> doing random reads and writes.
>>
>> Sequential read and writes are at 200Mb/s (that's pretty good for bonded
>> dual Gbit/s). But random reads and write are only at 0,8 - 1,5 Mb/s
>> which is def. too slow.
>>
>> Stefan
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>>
>> --
>>
>>
>>
>>
>>        Alexandre Derumier
>> Ingénieur Système
>> Fixe : 03 20 68 88 90
>> Fax : 03 20 68 90 81
>> 45 Bvd du Général Leclerc 59100 Roubaix - France
>> 12 rue Marivaux 75002 Paris - France
>>
>> --
>> To unsubscribe from this list: send the line "unsubscri

Re: mkfs on osd - failed in 0.47

2012-05-21 Thread Sławomir Skowron
Great Thanks.

On Mon, May 21, 2012 at 11:24 PM, Sage Weil  wrote:
> On Mon, 21 May 2012, Stefan Priebe wrote:
>> On 21.05.2012 22:58, Sławomir Skowron wrote:
>> > Yes on root. 30 minutes ago on 0.46 same operation works.
>> >
>> ...
>> > Repo:
>> >
>> > deb http://ceph.com/debian/ precise main
>>
>> Same to me except i'm using debian squeeze.
>
> I just pushed a fix for this to the stable branch.  Will build a v0.47.1
> shortly.
>
> sage



-- 
-
Pozdrawiam

Sławek "sZiBis" Skowron


Re: Designing a cluster guide

2012-05-21 Thread Sławomir Skowron
I get around 320 MB/s on a VM from a 3-node RBD cluster, but with
10GbE and 26 2.5" SAS drives used in every machine, that is not
everything it could be.
Every OSD drive is a single-disk RAID0 behind the battery-backed NVRAM
cache of the hardware RAID controller.
Every OSD takes a lot of RAM for caching.

That's why I am thinking about swapping 2 drives for SSDs in RAID1,
with the HPA tuned to increase drive durability for journaling, if
this works ;)

With the newest drives I can theoretically get 500 MB/s at a long
queue depth. This means that in theory I can improve bandwidth, get
lower latency, and handle multiple IO writes from many hosts better.
Reads are cached in RAM by the OSD daemon, by the VFS in the kernel,
by the NVRAM in the controller, and in the near future by the cache in
KVM (I need to test that; it should improve performance).

But if the SSD drive slows down, it can drag whole write performance
down with it. It is very delicate.

Pozdrawiam

iSS

On 22 May 2012, at 02:47, Quenten Grasso wrote:

> I Should have added For storage I'm considering something like Enterprise 
> nearline SAS 3TB disks running individual disks not raided with rep level of 
> 2 as suggested :)
>
>
> Regards,
> Quenten
>
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org 
> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Quenten Grasso
> Sent: Tuesday, 22 May 2012 10:43 AM
> To: 'Gregory Farnum'
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: Designing a cluster guide
>
> Hi Greg,
>
> I'm only talking about journal disks not storage. :)
>
>
>
> Regards,
> Quenten
>
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org 
> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Gregory Farnum
> Sent: Tuesday, 22 May 2012 10:30 AM
> To: Quenten Grasso
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: Designing a cluster guide
>
> On Mon, May 21, 2012 at 4:52 PM, Quenten Grasso  wrote:
>> Hi All,
>>
>>
>> I've been thinking about this issue myself past few days, and an idea I've 
>> come up with is running 16 x 2.5" 15K 72/146GB Disks,
>> in raid 10 inside a 2U Server with JBOD's attached to the server for actual 
>> storage.
>>
>> Can someone help clarify this one,
>>
>> Once the data is written to the (journal disk) and then read from the 
>> (journal disk) then written to the (storage disk) once this is complete this 
>> is considered a successful write by the client?
>> Or
>> Once the data is written to the (journal disk) is this considered successful 
>> by the client?
> This one — the write is considered "safe" once it is on-disk on all
> OSDs currently responsible for hosting the object.
>
> Every time anybody mentions RAID10 I have to remind them of the
> storage amplification that entails, though. Are you sure you want that
> on top of (well, underneath, really) Ceph's own replication?
>
>> Or
>> Once the data is written to the (journal disk) and written to the (storage 
>> disk) at the same time, once complete this is considered a successful write 
>> by the client? (if this is the case SSD's may not be so useful)
>>
>>
>> Pros
>> Quite fast Write throughput to the journal disks,
>> No write wareout of SSD's
>> RAID 10 with 1GB Cache Controller also helps improve things (if really keen 
>> you could use a cachecade as well)
>>
>>
>> Cons
>> Not as fast as SSD's
>> More rackspace required per server.
>>
>>
>> Regards,
>> Quenten
>>
>> -Original Message-
>> From: ceph-devel-ow...@vger.kernel.org 
>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Slawomir Skowron
>> Sent: Tuesday, 22 May 2012 7:22 AM
>> To: ceph-devel@vger.kernel.org
>> Cc: Tomasz Paszkowski
>> Subject: Re: Designing a cluster guide
>>
>> Maybe good for journal will be two cheap MLC Intel drives on Sandforce
>> (320/520), 120GB or 240GB, and HPA changed to 20-30GB only for
>> separate journaling partitions with hardware RAID1.
>>
>> I like to test setup like this, but maybe someone have any real life info ??
>>
>> On Mon, May 21, 2012 at 5:07 PM, Tomasz Paszkowski  wrote:
>>> Another great thing that should be mentioned is:
>>> https://github.com/facebook/flashcache/. It gives really huge
>>> performance improvements for reads/writes (especialy on FunsionIO
>>> drives) event without using librbd caching :-)
>>>
>>>
>>>
>>> On Sat, May 19, 2012 at 6:15 PM, Alexandre DERUMIER  
>>> wrote:
 Hi,

 For your journal , if you have money, you can use

 stec zeusram ssd drive. (around 2000€ /8GB / 10 iops read/write with 
 4k block).
 I'm using them with zfs san, they rocks for journal.
 http://www.stec-inc.com/product/zeusram.php

 another interessesting product is ddrdrive
 http://www.ddrdrive.com/

 - Mail original -

 De: "Stefan Priebe" 
 À: "Gregory Farnum" 
 Cc: ceph-devel@vger.kernel.org
 Envoyé: Samedi 19 Mai 2012 10:37:01
 Objet: Re: Designing a cluster guide

 Hi Greg,

 Am 17.05.2012 23:

Re: Designing a cluster guide

2012-05-21 Thread Sławomir Skowron
http://en.wikipedia.org/wiki/Host_protected_area

On Tue, May 22, 2012 at 8:30 AM, Stefan Priebe - Profihost AG
 wrote:
> On 21.05.2012 23:22, Sławomir Skowron wrote:
>> Maybe good for journal will be two cheap MLC Intel drives on Sandforce
>> (320/520), 120GB or 240GB, and HPA changed to 20-30GB only for
>> separate journaling partitions with hardware RAID1.
>> I like to test setup like this, but maybe someone have any real life
>> info ??
>
> HPA?
>
> That was also my idea but most of the people here still claim that
> they're too slow and you need something MORE powerful like.
>
> zeus ram: http://www.stec-inc.com/product/zeusram.php
> fusion io: http://www.fusionio.com/platforms/iodrive2/

But for commodity hardware, cheap servers, or even mid-range machines,
the cost of a PCIe flash/RAM card is too high, even in a small
cluster.

>
> Stefan



-- 
-
Pozdrawiam

Sławek "sZiBis" Skowron


Re: mkfs on osd - failed in 0.47

2012-05-22 Thread Sławomir Skowron
One more thing:

=== osd.0 ===
2012-05-22 10:14:09.801059 7ffc4414a780 -1 filestore(/vol0/data/osd.0)
leveldb db created
2012-05-22 10:14:09.804227 7ffc4414a780 -1 filestore(/vol0/data/osd.0)
limited size xattrs -- enable filestore_xattr_use_omap
2012-05-22 10:14:09.804250 7ffc4414a780 -1 OSD::mkfs: couldn't mount
FileStore: error -95
2012-05-22 10:14:09.804364 7ffc4414a780 -1  ** ERROR: error creating
empty object store in /vol0/data/osd.0: (95) Operation not supported
failed: '/sbin/mkcephfs -d /tmp/mkcephfs.T8O2L4YPFH --init-daemon osd.0'

/dev/sdb on /vol0/data/osd.0 type ext4
(rw,noexec,nodev,noatime,nodiratime,user_xattr,data=writeback,barrier=0)

ls -la osd.0
total 524332
drwxr-xr-x  4 root root  4096 May 22 10:16 .
drwxr-xr-x 29 root root  4096 May 22 10:15 ..
drwxr-xr-x  3 root root  4096 May 22 10:16 current
-rw-r--r--  1 root root        37 May 22 10:16 fsid
-rw-r--r--  1 root root 536870912 May 22 10:16 journal
drwx------  2 root root     16384 May 22 10:15 lost+found
-rw-r--r--  1 root root         4 May 22 10:16 store_version
-rwx------  1 root root         0 May 22 10:16 xattr_test


On Mon, May 21, 2012 at 11:25 PM, Sławomir Skowron  wrote:
> Great Thanks.
>
> On Mon, May 21, 2012 at 11:24 PM, Sage Weil  wrote:
>> On Mon, 21 May 2012, Stefan Priebe wrote:
>>> On 21.05.2012 22:58, Sławomir Skowron wrote:
>>> > Yes on root. 30 minutes ago on 0.46 same operation works.
>>> >
>>> ...
>>> > Repo:
>>> >
>>> > deb http://ceph.com/debian/ precise main
>>>
>>> Same to me except i'm using debian squeeze.
>>
>> I just pushed a fix for this to the stable branch.  Will build a v0.47.1
>> shortly.
>>
>> sage
>
>
>
> --
> -
> Pozdrawiam
>
> Sławek "sZiBis" Skowron

-- 
-
Pozdrawiam

Sławek Skowron


Re: mkfs on osd - failed in 0.47

2012-05-22 Thread Sławomir Skowron
Ok, now it is clear to me.

I will leave filestore_xattr_use_omap disabled for now, and I will try
to move the puppet class to XFS for a new cluster init :)

Thanks

On Tue, May 22, 2012 at 7:47 PM, Greg Farnum  wrote:
> On Tuesday, May 22, 2012 at 1:21 AM, Sławomir Skowron wrote:
>> One more thing:
>>
>> === osd.0 ===
>> 2012-05-22 10:14:09.801059 7ffc4414a780 -1 filestore(/vol0/data/osd.0)
>> leveldb db created
>> 2012-05-22 10:14:09.804227 7ffc4414a780 -1 filestore(/vol0/data/osd.0)
>> limited size xattrs -- enable filestore_xattr_use_omap
>> 2012-05-22 10:14:09.804250 7ffc4414a780 -1 OSD::mkfs: couldn't mount
>> FileStore: error -95
>> 2012-05-22 10:14:09.804364 7ffc4414a780 -1 ** ERROR: error creating
>> empty object store in /vol0/data/osd.0: (95) Operation not supported
>> failed: '/sbin/mkcephfs -d /tmp/mkcephfs.T8O2L4YPFH --init-daemon osd.0'
>
> That's because you're using something that only does small xattrs (ext4, 
> probably?) and the system wants you to turn on "filestore_xattr_use_omap" in 
> the config file (for OSDs). :)
> I'm not sure why it's off by default -- Sam?
> -Greg
>
>>
>> /dev/sdb on /vol0/data/osd.0 type ext4
>> (rw,noexec,nodev,noatime,nodiratime,user_xattr,data=writeback,barrier=0)
>>
>> ls -la osd.0
>> total 524332
>> drwxr-xr-x 4 root root 4096 May 22 10:16 .
>> drwxr-xr-x 29 root root 4096 May 22 10:15 ..
>> drwxr-xr-x 3 root root 4096 May 22 10:16 current
>> -rw-r--r-- 1 root root 37 May 22 10:16 fsid
>> -rw-r--r-- 1 root root 536870912 May 22 10:16 journal
>> drwx------ 2 root root 16384 May 22 10:15 lost+found
>> -rw-r--r-- 1 root root 4 May 22 10:16 store_version
>> -rwx------ 1 root root 0 May 22 10:16 xattr_test
>>
>>
>> On Mon, May 21, 2012 at 11:25 PM, Sławomir Skowron wrote:
>> > Great Thanks.
>> >
>> > On Mon, May 21, 2012 at 11:24 PM, Sage Weil wrote:
>> > > On Mon, 21 May 2012, Stefan Priebe wrote:
>> > > > On 21.05.2012 22:58, Sławomir Skowron wrote:
>> > > > > Yes on root. 30 minutes ago on 0.46 same operation works.
>> > > >
>> > > >
>> > > > ...
>> > > > > Repo:
>> > > > >
>> > > > > deb http://ceph.com/debian/ precise main
>> > > >
>> > > > Same to me except i'm using debian squeeze.
>> > >
>> > > I just pushed a fix for this to the stable branch. Will build a v0.47.1
>> > > shortly.
>> > >
>> > > sage
>> >
>> >
>> >
>> > --
>> > Regards
>> >
>> > Sławek "sZiBis" Skowron
>>
>> --
>> Regards
>>
>> Sławek Skowron
>
>



-- 
Regards

Sławek "sZiBis" Skowron


Re: RGW, future directions

2012-05-22 Thread Sławomir Skowron
On Tue, May 22, 2012 at 8:07 PM, Yehuda Sadeh  wrote:
> RGW is maturing. Beside looking at performance, which highly ties into
> RADOS performance, we'd like to hear whether there are certain pain
> points or future directions that you (you as in the ceph community)
> would like to see us taking.
>
> There are a few directions that we were thinking about:
>
> 1. Extend Object Storage API
>
> Swift and S3 has some features that we don't currently support. We can
> certainly extend our functionality, however, is there any demand for
> more features? E.g., self destructing objects, web site, user logs,
> etc.

More compatibility with S3 and Swift is good.

>
> 2. Better OpenStack interoperability
>
> Keystone support? Other?
>
> 3. New features
>
> Some examples:
>
>  - multitenancy: api for domains and user management
>  - snapshots
>  - computation front end: upload object, then do some data
> transformation/calculation.
>  - simple key-value api
>
> 4. CDMI
>
> Sage brought up the CDMI support question to ceph-devel, and I don't
> remember him getting any response. Is there any intereset in CDMI?
>
>
> 5. Native apache/nginx module or embedded web server
>
> We still need to prove that the web server is a bottleneck, or poses
> scaling issues. Writing a correct native nginx module will require
> turning rgw process model into event driven, which is not going to be
> easy.
>

An nginx module would be a nice thing.

> 6. Improve garbage collection
>
> Currently rgw generates intent logs for garabage removal that require
> running an external tool later, which is an administrative pain. We
> can implement other solutions (OSD side garbage collection,
> integrating cleanup process into the gateway, etc.) but we need to
> understand the priority.

A crontab can handle this task for now, but under a big workload it
would be better integrated, like scrub, and tuned via the conf.
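
As a sketch of that stopgap, an /etc/cron.d style entry could drive the cleanup nightly. The `radosgw-admin temp remove --date=...` invocation and the 3-day retention window are my assumptions about the era's tooling, not something stated in this thread, so verify both against your release:

```shell
# /etc/cron.d/radosgw-gc (hypothetical): purge intent-logged garbage
# older than ~3 days, once per night at 03:30
30 3 * * * root radosgw-admin temp remove --date="$(date -d '-3 days' '+%Y-%m-%d')"
```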

>
> 7. libradosgw
>
> We have had this in mind for some time now. Creating a programming api
> for rgw, not too different from librados and librbd. It'll hopefully
> make code much cleaner. It will allow users to write different front
> ends for the rgw backend, and it will make it easier for users to
> write applications that interact with the backend, e.g., do processing
> on objects that users uploaded, FUSE for rgw without S3 as an
> intermediate, etc.
>
> 8. Administration tools improvement
>
> We can always do better there.
>
> 9. Other ideas?

- I would like to see a feature that provides replication between
clusters, two clusters for a start. It is a very good feature when you
have two datacenters and replication goes over a high-speed link, but
the applications on top of the clusters do not need to handle this
task themselves, and the data stays consistent.

- graceful upgrades

- reloading the cluster config without restarting daemons, or maybe
this exists right now?

>
>
> Any comments are welcome!
>
> Thanks,
> Yehuda



-- 
Regards

Sławek "sZiBis" Skowron


Re: RGW, future directions

2012-05-22 Thread Sławomir Skowron
On Tue, May 22, 2012 at 9:09 PM, Yehuda Sadeh  wrote:
> On Tue, May 22, 2012 at 11:25 AM, Sławomir Skowron  wrote:
>> On Tue, May 22, 2012 at 8:07 PM, Yehuda Sadeh  wrote:
>>> RGW is maturing. Beside looking at performance, which highly ties into
>>> RADOS performance, we'd like to hear whether there are certain pain
>>> points or future directions that you (you as in the ceph community)
>>> would like to see us taking.
>>>
>>> There are a few directions that we were thinking about:
>>>
>>> 1. Extend Object Storage API
>>>
>>> Swift and S3 has some features that we don't currently support. We can
>>> certainly extend our functionality, however, is there any demand for
>>> more features? E.g., self destructing objects, web site, user logs,
>>> etc.
>>
>> More compatibility with S3 and swift is good.

Right now I have no issues there, because we use only the simplest
parts of S3, but better compatibility is always good.

>
> Any specific functional interest?
>
>>
>>>
>>> 2. Better OpenStack interoperability
>>>
>>> Keystone support? Other?
>>>
>>> 3. New features
>>>
>>> Some examples:
>>>
>>>  - multitenancy: api for domains and user management

Multitenancy (quotas, QoS, and user management) is a nice feature for
an admin, especially in cloud systems with many different kinds of
control, including performance, and a nice integration API for
external accounting services in a cloud system.

>>>  - snapshots
>>>  - computation front end: upload object, then do some data
>>> transformation/calculation.
>>>  - simple key-value api
>>>
>>> 4. CDMI
>>>
>>> Sage brought up the CDMI support question to ceph-devel, and I don't
>>> remember him getting any response. Is there any intereset in CDMI?
>>>
>>>
>>> 5. Native apache/nginx module or embedded web server
>>>
>>> We still need to prove that the web server is a bottleneck, or poses
>>> scaling issues. Writing a correct native nginx module will require
>>> turning rgw process model into event driven, which is not going to be
>>> easy.
>>>
>>
>> nginx module is nice thing.
>
> It would be nice to have some concrete numbers as to where apache or
> nginx with fastcgi holding us back, and how a dedicated module is
> going to improve that. As a rule of thumb it is a no brainer, but
> still we want to have a better understanding of the situation before
> we dive into such a project.

It's hard to say right now, because there is no intensive traffic on
this cluster yet; I only have synthetic tests. Reads from one machine
in the cluster run at about 500 req/s (no nginx cache) via nginx <->
radosgw fastcgi, but the machine can handle more (CPU, RAM, drives). I
need to test how it really holds up for the whole service. With
caching enabled in nginx, read operations are no issue, but writes
(PUTs, DELETEs) will always be problematic, although radosgw has
improved a lot there.
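
For context, the nginx-to-radosgw wiring in such a setup is plain fastcgi proxying. A minimal sketch follows; the socket path, server name, and listen port are illustrative assumptions, not the actual production config discussed here:

```nginx
server {
    listen 80;
    server_name s3.example.com;

    location / {
        include fastcgi_params;
        # Request headers (including Authorization) reach radosgw as
        # HTTP_* FastCGI params; nginx passes them by default.
        fastcgi_pass unix:/var/run/ceph/radosgw.sock;
    }
}
```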

>
>>
>>> 6. Improve garbage collection
>>>
>>> Currently rgw generates intent logs for garabage removal that require
>>> running an external tool later, which is an administrative pain. We
>>> can implement other solutions (OSD side garbage collection,
>>> integrating cleanup process into the gateway, etc.) but we need to
>>> understand the priority.
>>
>> crontab can handle this task for now, but in big workload, better if
>> it's integrated, like scrub, and tuned via conf
>
> Yeah. One of the original ideas was to leverage scrubbing for objects
> expiration (issue #1994). The discussion never converged, as the devil
> is as always in the details. We can revive that discussion.
>
>>
>>>
>>> 7. libradosgw
>>>
>>> We have had this in mind for some time now. Creating a programming api
>>> for rgw, not too different from librados and librbd. It'll hopefully
>>> make code much cleaner. It will allow users to write different front
>>> ends for the rgw backend, and it will make it easier for users to
>>> write applications that interact with the backend, e.g., do processing
>>> on objects that users uploaded, FUSE for rgw without S3 as an
>>> intermediate, etc.
>>>
>>> 8. Administration tools improvement
>>>
>>> We can always do better there.
>>>
>>> 9. Other ideas?
>>
>> - I would like to see a feature that can make replication between
>> clusters. For start 2 clusters. It'

Re: RGW, future directions

2012-05-24 Thread Sławomir Skowron
On Thu, May 24, 2012 at 7:15 AM, Wido den Hollander  wrote:
>
>
> On 22-05-12 20:07, Yehuda Sadeh wrote:
>>
>> RGW is maturing. Beside looking at performance, which highly ties into
>> RADOS performance, we'd like to hear whether there are certain pain
>> points or future directions that you (you as in the ceph community)
>> would like to see us taking.
>>
>> There are a few directions that we were thinking about:
>>
>> 1. Extend Object Storage API
>>
>> Swift and S3 has some features that we don't currently support. We can
>> certainly extend our functionality, however, is there any demand for
>> more features? E.g., self destructing objects, web site, user logs,
>> etc.
>>
>> 2. Better OpenStack interoperability
>>
>> Keystone support? Other?
>>
>> 3. New features
>>
>> Some examples:
>>
>>  - multitenancy: api for domains and user management
>>  - snapshots
>>  - computation front end: upload object, then do some data
>> transformation/calculation.
>>  - simple key-value api
>>
>> 4. CDMI
>>
>> Sage brought up the CDMI support question to ceph-devel, and I don't
>> remember him getting any response. Is there any intereset in CDMI?
>>
>>
>> 5. Native apache/nginx module or embedded web server
>>
>> We still need to prove that the web server is a bottleneck, or poses
>> scaling issues. Writing a correct native nginx module will require
>> turning rgw process model into event driven, which is not going to be
>> easy.
>>
>
> I'd not go for a native nginx or Apache module, that would bring extra C
> code into the story which would mean extra dependencies.
>
> My vote would still go to a embedded webserver written in something like
> Python. You could then use Apache/nginx/Varnish as a reverse proxy in front
> and do all kinds of cool stuff.
>
> You could even doing caching in nginx or Varnish and let the RGW notify
> those proxy's when an object has changed so they can purge their cache. This
> would dramatically improve the performance of the gateway.
>
> It would also simplify the code, why try to do caching on your own when some
> great HTTP caches are out there?
>

100% +1

>
>> 6. Improve garbage collection
>>
>> Currently rgw generates intent logs for garabage removal that require
>> running an external tool later, which is an administrative pain. We
>> can implement other solutions (OSD side garbage collection,
>> integrating cleanup process into the gateway, etc.) but we need to
>> understand the priority.
>>
>> 7. libradosgw
>>
>> We have had this in mind for some time now. Creating a programming api
>> for rgw, not too different from librados and librbd. It'll hopefully
>> make code much cleaner. It will allow users to write different front
>> ends for the rgw backend, and it will make it easier for users to
>> write applications that interact with the backend, e.g., do processing
>> on objects that users uploaded, FUSE for rgw without S3 as an
>> intermediate, etc.
>>
>
> Yes, I would really like this. Combine this with the Python
> stand-alone/embedded webserver I proposed and you get a really nice RGW I
> think.
>
>
>> 8. Administration tools improvement
>>
>> We can always do better there.
>>
>
> When we have libradosgw it wouldn't be that hard to make a nice web
> front-end where you can manage the whole thing.
>
>
>> 9. Other ideas?
>>
>>
>> Any comments are welcome!
>>
>> Thanks,
>> Yehuda



-- 
Regards

Sławek "sZiBis" Skowron


RBD stale on VM, and RBD cache enable problem

2012-06-11 Thread Sławomir Skowron
I have two questions. My newly created cluster runs xfs on all OSDs,
Ubuntu precise, kernel 3.2.0-23-generic, Ceph 0.47.2-1precise.

pool 0 'data' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num
64 pgp_num 64 last_change 1228 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 crush_ruleset 1 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1226 owner 0
pool 2 'rbd' rep size 3 crush_ruleset 2 object_hash rjenkins pg_num 64
pgp_num 64 last_change 1232 owner 0
pool 3 '.rgw' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8
pgp_num 8 last_change 3878 owner 18446744073709551615

1. After I stop all daemons on one machine in my 3-node cluster with 3
replicas, rbd image operations in the VM stall. A dd on this device in
the VM freezes, and after ceph is started on that machine again
everything goes back online. Is there a problem with my config? In
this situation ceph should serve reads from the other copies and send
writes to the other OSDs in the replica chain, yes?

Another test: iozone on the device stops after the daemons are stopped
on one machine, and once the OSDs are up again iozone continues. How
can I tune this to work without the freeze?

2012-06-11 21:38:49.583133    pg v88173: 200 pgs: 60 active+clean, 1
stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
2012-06-11 21:38:50.582257    pg v88174: 200 pgs: 60 active+clean, 1
stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
.
2012-06-11 21:39:49.991893    pg v88197: 200 pgs: 60 active+clean, 1
stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
2012-06-11 21:39:50.992755    pg v88198: 200 pgs: 60 active+clean, 1
stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
2012-06-11 21:39:51.993533    pg v88199: 200 pgs: 60 active+clean, 1
stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
2012-06-11 21:39:52.994397    pg v88200: 200 pgs: 60 active+clean, 1
stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)

After booting all OSDs on the stopped machine:

2012-06-11 21:40:37.826619   osd e4162: 72 osds: 53 up, 72 in
2012-06-11 21:40:37.825706 mon.0 10.177.66.4:6790/0 348 : [INF] osd.24
10.177.66.6:6800/21597 boot
2012-06-11 21:40:38.825297    pg v88202: 200 pgs: 54 active+clean, 7
stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
2012-06-11 21:40:38.826517   osd e4163: 72 osds: 54 up, 72 in
2012-06-11 21:40:38.825250 mon.0 10.177.66.4:6790/0 349 : [INF] osd.25
10.177.66.6:6803/21712 boot
2012-06-11 21:40:38.825655 mon.0 10.177.66.4:6790/0 350 : [INF] osd.28
10.177.66.6:6812/26210 boot
2012-06-11 21:40:38.825907 mon.0 10.177.66.4:6790/0 351 : [INF] osd.29
10.177.66.6:6815/26327 boot
2012-06-11 21:40:39.826738    pg v88203: 200 pgs: 56 active+clean, 4
stale+active+clean, 3 peering, 137 active+degraded; 783 GB data, 1928
GB used, 18111 GB / 20040 GB avail; 76921/254952 degraded (30.171%)
2012-06-11 21:40:39.830098   osd e4164: 72 osds: 59 up, 72 in
2012-06-11 21:40:39.826570 mon.0 10.177.66.4:6790/0 352 : [INF] osd.26
10.177.66.6:6806/21835 boot
2012-06-11 21:40:39.826961 mon.0 10.177.66.4:6790/0 353 : [INF] osd.27
10.177.66.6:6809/21953 boot
2012-06-11 21:40:39.828147 mon.0 10.177.66.4:6790/0 354 : [INF] osd.30
10.177.66.6:6818/26511 boot
2012-06-11 21:40:39.828418 mon.0 10.177.66.4:6790/0 355 : [INF] osd.31
10.177.66.6:6821/26583 boot
2012-06-11 21:40:39.828935 mon.0 10.177.66.4:6790/0 356 : [INF] osd.33
10.177.66.6:6827/26859 boot
2012-06-11 21:40:39.829274 mon.0 10.177.66.4:6790/0 357 : [INF] osd.34
10.177.66.6:6830/26979 boot
2012-06-11 21:40:40.827935    pg v88204: 200 pgs: 56 active+clean, 4
stale+active+clean, 3 peering, 137 active+degraded; 783 GB data, 1928
GB used, 18111 GB / 20040 GB avail; 76921/254952 degraded (30.171%)
2012-06-11 21:40:40.830059   osd e4165: 72 osds: 62 up, 72 in
2012-06-11 21:40:40.827798 mon.0 10.177.66.4:6790/0 358 : [INF] osd.32
10.177.66.6:6824/26701 boot
2012-06-11 21:40:40.829043 mon.0 10.177.66.4:6790/0 359 : [INF] osd.35
10.177.66.6:6833/27165 boot
2012-06-11 21:40:40.829316 mon.0 10.177.66.4:6790/0 360 : [INF] osd.36
10.177.66.6:6836/27280 boot
2012-06-11 21:40:40.829602 mon.0 10.177.66.4:6790/0 361 : [INF] osd.37
10.177.66.6:6839/27397 boot
2012-06-11 21:40:41.828776    pg v88205: 200 pgs: 56 active+clean, 4
stale+active+clean, 3 peering, 137 active+degraded; 783 GB data, 1928
GB used, 18111 GB / 20040 GB avail; 76921/254952 degraded (30.171%)
2012-06-11 21:40:41.831823   osd e4166: 72 osds: 68 up, 72 in
2012-06-11 21:40:41.828713 mon.0 10.177.66.4:6790/0 362 : [INF] osd.38
10.177.66.6:6842/27513 boot
2012-06-11 21:40:41.82944

Re: RBD stale on VM, and RBD cache enable problem

2012-07-04 Thread Sławomir Skowron
On Tue, Jul 3, 2012 at 7:39 PM, Gregory Farnum  wrote:
> On Mon, Jun 11, 2012 at 12:53 PM, Sławomir Skowron  wrote:
>> I have two questions. My newly created cluster with xfs on all osd,
>> ubuntu precise, kernel 3.2.0-23-generic. Ceph 0.47.2-1precise
>>
>> pool 0 'data' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num
>> 64 pgp_num 64 last_change 1228 owner 0 crash_replay_interval 45
>> pool 1 'metadata' rep size 3 crush_ruleset 1 object_hash rjenkins
>> pg_num 64 pgp_num 64 last_change 1226 owner 0
>> pool 2 'rbd' rep size 3 crush_ruleset 2 object_hash rjenkins pg_num 64
>> pgp_num 64 last_change 1232 owner 0
>> pool 3 '.rgw' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8
>> pgp_num 8 last_change 3878 owner 18446744073709551615
>>
>> 1. After i stop all daemons on 1 machine in my 3 node cluster with 3
>> replicas, rbd image operations on vm, staling. DD on this device in VM
>> freezing, and after ceph start on this machine everything goes online.
>> Is there any problem with my config ?? in this situation ceph should
>> go from another copies with reads, and writes into another osd in
>> replica chain, yes ??
>
> It should switch to a new "primary" OSD as soon as they detect that
> one machine is missing, which by default will be ~25 seconds. How long
> did you wait to see if it would continue?
> If you'd like to reduce this time, you can turn down some combination of
> osd_heartbeat_grace -- default 20 seconds, and controls how long an OSD
> will wait before it decides a peer is down.
> osd_min_down_reporters -- default 1, controls how many OSDs need to
> report an OSD as down before accepting it. This is already as low as
> it should go
> osd_min_down_reports -- default 3, controls how many failure reports
> the monitor needs to receive before accepting an OSD as down. Since
> you only have 3 OSDs, and one is down, leaving this at 3 means you're
> going to wait for osd_heartbeat_grace plus osd_mon_report_interval_min
> (default 5; don't change this) before an OSD is marked down.

Thanks for these options, I will try them on the integration cluster.
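
As a sketch, the options Greg lists would land in ceph.conf roughly like this. The values and section placement are illustrative assumptions (in particular whether the reporter/report options belong under [mon] in this release), so treat it as a starting point rather than a recommendation:

```ini
[osd]
    ; mark a silent peer down after 10s instead of the default 20s
    osd heartbeat grace = 10

[mon]
    ; accept an OSD as down after fewer failure reports
    osd min down reporters = 1
    osd min down reports = 2
```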

> Given the logging you include I'm a little concerned that you have 1
> PG "stale", indicating that the monitor hasn't gotten a report on that
> PG in a very long time. That means either that one PG is somehow
> broken, or else that the OSD you turned off isn't getting marked down
> and that PG is the only one noticing it.
> Could you re-run this test with monitor debugging turned up, see how
> long it takes for the OSD to get marked down (using "ceph -w"), and
> report back?

There will be some problems with that, because this cluster was just
re-initialized (from ext4 to xfs, and with many other changes).
Currently there are two clusters, with one application on top syncing
data to both.

The real problem is that during backfill, when I change the crush
config and rebalancing starts, or a machine/group of OSDs goes down,
radosgw has problems with PUT requests (writes) for the next 9
minutes, causing some 504s (timeout on the load balancer) in nginx
and, further down, some delayed operations in the Ceph cluster.

I will try to test this on the integration cluster, but in this sprint
it will be difficult :(

> -Greg
>
>> Another test iozone on device, and it's stop after daemons stop on 1
>> machine, and after osd up, iozone go forward, how can i tune this to
>> work without freeze ??
>>
>> 2012-06-11 21:38:49.583133    pg v88173: 200 pgs: 60 active+clean, 1
>> stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
>> 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
>> 2012-06-11 21:38:50.582257    pg v88174: 200 pgs: 60 active+clean, 1
>> stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
>> 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
>> .
>> 2012-06-11 21:39:49.991893    pg v88197: 200 pgs: 60 active+clean, 1
>> stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
>> 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
>> 2012-06-11 21:39:50.992755    pg v88198: 200 pgs: 60 active+clean, 1
>> stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
>> 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
>> 2012-06-11 21:39:51.993533    pg v88199: 200 pgs: 60 active+clean, 1
>> stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
>> 18111 GB / 20040 GB avail; 78169/254952 degraded (30.660%)
>> 2012-06-11 21:39:52.994397    pg v88200: 200 pgs: 60 active+clean, 1
>> stale+active+clean, 139 active+degraded; 783 GB data, 1928 GB used,
>>

Increase number of PG

2012-07-20 Thread Sławomir Skowron
I know that this feature is disabled; are you planning to enable it in
the near future?

I have many drives, and my S3 installation uses only a few of them at
a time; I need to improve that.

When I use the cluster for rbd, it uses all of them.


Regards

Slawomir Skowron


Re: Increase number of PG

2012-07-22 Thread Sławomir Skowron
On 21 Jul 2012, at 20:08, Yehuda Sadeh wrote:

> On Sat, Jul 21, 2012 at 10:13 AM, Gregory Farnum  wrote:
>> On Fri, Jul 20, 2012 at 1:15 PM, Tommi Virtanen wrote:
>>> On Fri, Jul 20, 2012 at 8:31 AM, Sławomir Skowron wrote:
>>>> I know that this feature is disabled, are you planning to enable this
>>>> in near future ??
>>>>
>>>
>>>
>>> PG splitting/joining is the next major project for the OSD. It won't
>>> be backported to argonaut, but it will be in the next stable release,
>>> and will probably appear in our regular development release in 2-3
>>> months.

OK, so I am waiting for this feature. In the meantime, can I move my
objects to a new pool with more PGs, created manually, and use it as
the bucket pool in radosgw? How can I tell radosgw to use this pool,
or can't I?
At the moment my pool .rgw.buckets has the default 8 PGs, and that is
too small.

>>>
>>>> I have many of drives, and my S3 instalation use only few of them in
>>>> one time, and i need to improve that.
>>>>
>>>> When i use it as rbd it use all of them.
>>>
>>> Radosgw normally stores most of the data for a single S3-level object
>>> in a single RADOS object, where as RBD stripes disk images across
>>> objects by default in 4MB chunks. If you have only a few S3 objects,
>>> you will see an uneven distribution. It will get more balanced as you
>>> upload more images. Also, if you use multi-part uploads, each part
>>> goes into a separate RADOS object, so that'll spread the load more
>>> evenly.
>>>
>>
>> RGW only does this for small objects — I believe its default chunk size is 
>> also 4MB.
>

Yes, I have a lot of small objects (500k of them), ranging from bytes
to 2-3MB, in the .rgw.buckets pool. They do not even hit multipart.

> Actually no. While the infrastructure is there, currently a regular
> object upload at the moment is not going to create more than 2 rados
> objects. The head object, which ad is capped at 512k and the tail, which
> will contain the rest. As Tommi specified, multipart upload chunks
> depend on the actual upload.
> There's actually no real reason anymore for not striping, and it's
> easy enough to implement, so it might be something that we're going to
> do soon.
>

This could be useful. But in my case the objects are too small, and if
I understand correctly, my only option is to have more PGs to balance
new objects across more drives.
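
As a back-of-the-envelope sketch of PG sizing, the common rule of thumb (roughly 100 PGs per OSD, divided by the replica count, rounded up to a power of two; the constant 100 is a heuristic assumption, not something from this thread) can be computed like this:

```python
def suggest_pg_num(num_osds: int, replicas: int, pgs_per_osd: int = 100) -> int:
    """Round num_osds * pgs_per_osd / replicas up to the next power of two."""
    target = max(1, num_osds * pgs_per_osd // replicas)
    pg_num = 1
    while pg_num < target:
        pg_num *= 2
    return pg_num

# e.g. a 72-OSD, 3-replica cluster like the one earlier in this thread
print(suggest_pg_num(72, 3))  # -> 4096, versus the 8 PGs in .rgw.buckets
```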

My workload looks like this:

- at most 20% are PUTs, with 99% of objects smaller than 4MB,
- 80% are GETs and S3 metadata operations.

When the workload hits the worst-case scenario (a PUT, and then only
one GET), every GET misses the cache in nginx and is served from only
a few drives, and that hurts ;)

> Yehuda
>
> Yehuda


Re: Increase number of PG

2012-07-23 Thread Sławomir Skowron
OK, everything is clear now, thanks. I will try this during planned service maintenance.

Regards

Slawomir Skowron.

On 23 Jul 2012, at 18:00, Tommi Virtanen wrote:

> On Sun, Jul 22, 2012 at 11:57 PM, Sławomir Skowron  wrote:
>> My workload looks like this:
>>
>> - Max 20% are PUTs, with 99% of objects smaller then 4MB,
>> - 80% are GETs, and S3 metadata operations.
>
> Well, the good news is that that's actually the easy to fix part --
> just increase the number of PGs (which you currently have to do the
> awkward way, as explained earlier in this thread), nothing else
> needed.


Re: RadosGW hanging

2012-08-14 Thread Sławomir Skowron
OK, more info.

ceph -w stayed in the same state for many hours:

2012-08-14 10:42:08.060339pg v3530828: 240 pgs: 238 active+clean,
2 active+clean+scrubbing; 634 GB data, 1582 GB used, 18458 GB / 20040
GB avail

Two PGs in active+clean+scrubbing, and a fragment of the ceph -w output:

2012-08-14 10:42:02.753954 osd.8 10.177.66.4:6861/8729 215538 : [WRN]
6 slow requests, 6 included below; oldest blocked for > 39941.540514
secs
2012-08-14 10:42:02.753961 osd.8 10.177.66.4:6861/8729 215539 : [WRN]
slow request 39941.540514 seconds old, received at 2012-08-13
23:36:21.213355: osd_op(client.2997480.0:397
20565.1__shadow_images/pulscms/YjY7MDA_/2dc02bf8fda55367396d4508de7a107f.jpg_IKRO1n3TG9Pnp5ffhj2KXMvgM7ssjlH
[write 524288~187647] 6.d82ee747) v4 currently delayed
2012-08-14 10:42:02.753965 osd.8 10.177.66.4:6861/8729 215540 : [WRN]
slow request 39924.756970 seconds old, received at 2012-08-13
23:36:37.996899: osd_op(client.2997480.0:1480
20565.1__shadow_images/pulscms/NjU7MDA_/e24ba2bc400864c02a34d74d246c2ea5.jpg_Q5ZNBIzJjG65RJccOqagYQp7h3_YSIM
[write 524288~146458] 6.f6a8c297) v4 currently delayed
2012-08-14 10:42:02.753970 osd.8 10.177.66.4:6861/8729 215541 : [WRN]
slow request 39793.329296 seconds old, received at 2012-08-13
23:38:49.424573: osd_op(client.2997480.0:4440
20565.1__shadow_images/pulscms/NWY7MDA_/7dc146588b5c08f00bb0afa81a5d194c.jpg_UAHJCxaQwZ02JFQdRskU1EXCsY-M9uK
[write 524288~203649] 6.99216177) v4 currently delayed
2012-08-14 10:42:02.753973 osd.8 10.177.66.4:6861/8729 215542 : [WRN]
slow request 39737.889310 seconds old, received at 2012-08-13
23:39:44.864559: osd_op(client.2997480.0:5323
20565.1__shadow_images/pulscms/NjU7MDA_/e24ba2bc400864c02a34d74d246c2ea5.jpg_B-P79zBKYklPq1aYlTQAMGo9xmZPVeS
[write 524288~146458] 6.4f2c1caf) v4 currently delayed
2012-08-14 10:42:02.753977 osd.8 10.177.66.4:6861/8729 215543 : [WRN]
slow request 39082.054071 seconds old, received at 2012-08-13
23:50:40.699798: osd_op(client.2997480.0:8887
20565.1_files/pulscms/OTU7MDA_/54ad2076d83bee578bb3fa2919013934
[create 0~0,delete,writefull 0~3515,setxattr user.rgw.acl
(109),setxattr user.rgw.content_type (11),setxattr user.rgw.etag (33)]
6.c39afed7) v4 currently delayed

Using ceph --admin-daemon
/var/run/ceph/ceph-client.radosgw.obs-10-177-66-4.asok
objecter_requests, I found that some requests appear many times in
radosgw, matching the delayed ops from ceph -w:

 11   "pg": "6.25195037",
 24   "pg": "6.cd5a3cd7",
959   "pg": "7.6b5c8bd3",

example:

root@obs-10-177-66-4:~# ceph pg map 6.25195037
osdmap e124908 pg 6.25195037 (6.7) -> up [8,61,35] acting [8,61,35]

After restarting these two OSDs, the delayed operations were gone.

When scrubbing starts on a PG again, the number of waiting objecter
requests in rgw goes up, and in this case the scrubbing does not
finish for many hours, so I have quite a big problem.

Is this a known bug, or maybe a new one?

On Tue, Aug 14, 2012 at 8:33 AM, Sławomir Skowron
 wrote:
> Cluster version 0.47.2-1precise.
>
> I can't say yet what triggers the problem. The cluster loses some
> OSDs, and while it is remapping and rebooting those OSDs and finally
> returning to normal (with a lot of delayed operations on delete),
> radosgw goes wild: every operation that hits the radosgw fcgi returns
> only HTTP code 502 from nginx, for every request.
>
> The radosgw process keeps running, and after restarting radosgw on
> each host everything goes back to normal once the cluster is back to
> normal.
>
> Has anybody seen this kind of behavior?
>
> --
> Regards
>
> Sławek "sZiBis" Skowron



-- 
Regards

Sławek "sZiBis" Skowron


ceph 0.48.1 osd die

2012-08-21 Thread Sławomir Skowron
Ubuntu precise, ceph 0.48.1

After a CRUSH change the whole cluster reorganized, but one machine got
a very high load, and 4 OSDs on this machine died with this in the log.

After that I rebooted the machine and re-initialized these OSDs (I left
one untouched to diagnose if needed) for full stability. Now everything
is OK, but maybe this will be useful.

--- begin dump of recent events ---
   -36> 2012-08-21 20:18:52.286460 7f4c5785a780  5 asok(0x1e4c000)
register_command perfcounters_dump hook 0x1e3f010
   -35> 2012-08-21 20:18:52.286490 7f4c5785a780  5 asok(0x1e4c000)
register_command 1 hook 0x1e3f010
   -34> 2012-08-21 20:18:52.286493 7f4c5785a780  5 asok(0x1e4c000)
register_command perf dump hook 0x1e3f010
   -33> 2012-08-21 20:18:52.286502 7f4c5785a780  5 asok(0x1e4c000)
register_command perfcounters_schema hook 0x1e3f010
   -32> 2012-08-21 20:18:52.286506 7f4c5785a780  5 asok(0x1e4c000)
register_command 2 hook 0x1e3f010
   -31> 2012-08-21 20:18:52.286508 7f4c5785a780  5 asok(0x1e4c000)
register_command perf schema hook 0x1e3f010
   -30> 2012-08-21 20:18:52.286514 7f4c5785a780  5 asok(0x1e4c000)
register_command config show hook 0x1e3f010
   -29> 2012-08-21 20:18:52.286533 7f4c5785a780  5 asok(0x1e4c000)
register_command config set hook 0x1e3f010
   -28> 2012-08-21 20:18:52.286536 7f4c5785a780  5 asok(0x1e4c000)
register_command log flush hook 0x1e3f010
   -27> 2012-08-21 20:18:52.286538 7f4c5785a780  5 asok(0x1e4c000)
register_command log dump hook 0x1e3f010
   -26> 2012-08-21 20:18:52.286541 7f4c5785a780  5 asok(0x1e4c000)
register_command log reopen hook 0x1e3f010
   -25> 2012-08-21 20:18:52.289243 7f4c5785a780  0 ceph version
0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c),
process ceph-osd, pid 24314
   -24> 2012-08-21 20:18:52.290166 7f4c5785a780  1 finished
global_init_daemonize
   -23> 2012-08-21 20:18:52.293285 7f4c5785a780  5 asok(0x1e4c000)
init /var/run/ceph/ceph-osd.30.asok
   -22> 2012-08-21 20:18:52.293343 7f4c5785a780  5 asok(0x1e4c000)
bind_and_listen /var/run/ceph/ceph-osd.30.asok
   -21> 2012-08-21 20:18:52.293377 7f4c5785a780  5 asok(0x1e4c000)
register_command 0 hook 0x1e3e0a0
   -20> 2012-08-21 20:18:52.293401 7f4c5785a780  5 asok(0x1e4c000)
register_command version hook 0x1e3e0a0
   -19> 2012-08-21 20:18:52.293405 7f4c5785a780  5 asok(0x1e4c000)
register_command git_version hook 0x1e3e0a0
   -18> 2012-08-21 20:18:52.293412 7f4c5785a780  5 asok(0x1e4c000)
register_command help hook 0x1e3f0c0
   -17> 2012-08-21 20:18:52.293504 7f4c53914700  5 asok(0x1e4c000) entry start
   -16> 2012-08-21 20:18:52.296295 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount FIEMAP ioctl is supported and
appears to work
   -15> 2012-08-21 20:18:52.296310 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount FIEMAP ioctl is disabled via
'filestore fiemap' config option
   -14> 2012-08-21 20:18:52.296575 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount did NOT detect btrfs
   -13> 2012-08-21 20:18:52.341323 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount syncfs(2) syscall fully supported
(by glibc and kernel)
   -12> 2012-08-21 20:18:52.341448 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount found snaps <>
   -11> 2012-08-21 20:18:52.344103 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount: WRITEAHEAD journal mode explicitly
enabled in conf
   -10> 2012-08-21 20:18:52.520247 7f4c5110f700  1 FileStore::op_tp
worker finish
-9> 2012-08-21 20:18:52.520325 7f4c5010d700  1 FileStore::op_tp
worker finish
-8> 2012-08-21 20:18:52.520399 7f4c4f90c700  1 FileStore::op_tp
worker finish
-7> 2012-08-21 20:18:52.520458 7f4c5090e700  1 FileStore::op_tp
worker finish
-6> 2012-08-21 20:18:52.535297 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount FIEMAP ioctl is supported and
appears to work
-5> 2012-08-21 20:18:52.535316 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount FIEMAP ioctl is disabled via
'filestore fiemap' config option
-4> 2012-08-21 20:18:52.535704 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount did NOT detect btrfs
-3> 2012-08-21 20:18:52.538389 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount syncfs(2) syscall fully supported
(by glibc and kernel)
-2> 2012-08-21 20:18:52.538459 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount found snaps <>
-1> 2012-08-21 20:18:52.540296 7f4c5785a780  0
filestore(/vol0/data/osd.30) mount: WRITEAHEAD journal mode explicitly
enabled in conf
 0> 2012-08-21 20:18:52.805875 7f4c5785a780 -1 *** Caught signal
(Aborted) **
 in thread 7f4c5785a780

 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: /usr/bin/ceph-osd() [0x6edaba]
 2: (()+0xfcb0) [0x7f4c56cf7cb0]
 3: (gsignal()+0x35) [0x7f4c558d3445]
 4: (abort()+0x17b) [0x7f4c558d6bab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f4c5622169d]
 6: (()+0xb5846) [0x7f4c5621f846]
 7: (()+0xb5873) [0x7f4c5621f873]
 8: (()+0xb596e) [0x7f4c5621f96e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0x7a94f7]
 10: (void decode(std::map,
std::

Re: Ceph remap/recovery stuck

2012-08-24 Thread Sławomir Skowron
I have found a workaround.

I changed the CRUSH rule for this pool to replicate across OSDs; after
recovery remapped the data, I changed the same rule back to rack
awareness, and the whole cluster recovered again and went back to normal.
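For reference, the workaround described above boils down to flipping the failure-domain type in the pool's CRUSH rule and re-injecting the map. A hypothetical rule (the name and ruleset number are invented; argonaut-style decompiled crushmap syntax):

```
rule rgw_pool {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take default
        # temporary workaround: replicate across individual OSDs
        # step chooseleaf firstn 0 type osd
        # after recovery completes, back to rack awareness:
        step chooseleaf firstn 0 type rack
        step emit
}
```

Recompile with crushtool -c and load it with ceph osd setcrushmap -i after each flip.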

Is there any way to start refill/recovery in this situation for a
specific OSD?

On Thu, Aug 23, 2012 at 3:52 PM, Sławomir Skowron  wrote:
> 3 OSDs rebuilt OK after the crash, but after rebuilding two more OSDs
> (12 and 30) I can't get the cluster to active+clean.
>
> I did the rebuild as described in the docs:
>
> stop the osd,
> remove it from crush,
> rm it from the map,
> recreate the osd once the cluster is stable
>
> But now all OSDs are in and up, the data won't remap, and some PGs
> have only two OSDs in the chain with replication level 3 for this pool.
>
> 2012-08-23 15:26:46.073685 mon.0 [INF] pgmap v117192: 6472 pgs: 63
> active, 4457 active+clean, 1942 active+remapped, 10 active+degraded;
> 596 GB data, 1650 GB used, 20059 GB / 21710 GB avail; 57815/4705888
> degraded (1.229%)
>
> In attachment output from:
>
> ceph osd dump -o -
>
> I can't find any info in the docs for this situation.
>
> HEALTH_WARN 10 pgs degraded; 2015 pgs stuck unclean; recovery
> 57871/4706179 degraded (1.230%)
> root@s3-10-177-64-6:~# ceph -s
>    health HEALTH_WARN 10 pgs degraded; 2015 pgs stuck unclean;
> recovery 57871/4706179 degraded (1.230%)
>    monmap e4: 3 mons at
> {0=10.177.64.4:6789/0,1=10.177.64.6:6789/0,2=10.177.64.8:6789/0},
> election epoch 16, quorum 0,1,2 0,1,2
>    osdmap e1300: 78 osds: 78 up, 78 in
>     pgmap v117464: 6472 pgs: 63 active, 4457 active+clean, 1942
> active+remapped, 10 active+degraded; 596 GB data, 1651 GB used, 20059
> GB / 21710 GB avail; 57871/4706179 degraded (1.230%)
>    mdsmap e1: 0/0/1 up
>
> Please help; I will try to give you any output you need.
>
>
> And one more thing, a little bug in 0.48.1:
>
> the ceph health blabla command does the same thing as ceph health
> detail; whatever comes after health is treated as detail.
>
> --
> -
> Regards
>
> Sławek "sZiBis" Skowron



-- 
-
Regards

Sławek "sZiBis" Skowron


Re: Ideal hardware spec?

2012-08-24 Thread Sławomir Skowron
On 24 Aug 2012, at 17:05, Mark Nelson wrote:

> On 08/24/2012 09:17 AM, Stephen Perkins wrote:
>> Morning Wido (and all),
>>
>>>> I'd like to see a "best" hardware config as well... however, I'm
>>>> interested in a SAS switching fabric where the nodes do not have any
>>>> storage (except possibly onboard boot drive/USB as listed below).
>>>> Each node would have a SAS HBA that allows it to access a LARGE jbod
>>>> provided by a HA set of SAS Switches
>>>> (http://www.lsi.com/solutions/Pages/SwitchedSAS.aspx). The drives are lun
>>>> masked for each host.
>>>>
>>>> The thought here is that you can add compute nodes, storage shelves,
>>>> and disks all independently.  With proper masking, you could provide
>>>> redundancy to cover drive, node, and shelf failures.  You could also
>>>> add disks "horizontally" if you have spare slots in a shelf, and you
>>>> could add shelves "vertically" and increase the disk count available
>>>> to existing nodes.
>>>>
>>>
>>> What would the benefit be from building such a complex SAS environment?
>>> You'd be spending a lot of money on SAS switch, JBODs and cabling.
>>
>> Density.
>>
>
> Trying to balance between dense solutions with more failure points vs cheap 
> low density solutions is always tough.  Though not the densest solution out 
> there, we are starting to investigate performance on an SC847a chassis with 
> 36 hotswap drives in 4U (along with internal drives for the system).  Our 
> setup doesn't use SAS expanders which is nice bonus, though it does require a 
> lot of controllers.
>
>>> Your SPOF would still be your whole SAS setup.
>>
>> Well... I'm not sure I would consider it a single point of failure...  a
>> pair of cross-connected switches and 3-5 disk shelves.  Shelves can be
>> purchased with fully redundant internals (dual data paths etc to SAS
>> drives).  That is not even that important. If each shelf is just looked at
>> as JBOD, then you can group disks from different shelves into btrfs or
>> hardware RAID groups.  Or... you can look at each disk as its own storage
>> with its own OSD.
>>
>> A SAS switch going offline would have no impact since everything is cross
>> connected.
>>
>> A whole shelf can go offline and it would only appear as a single drive
>> failure in a RAID group (if disks groups are distributed properly).
>>
>> You can then get compute nodes fairly densely packed by purchasing
>> SuperMicro 2uTwin enclosures:
>>   http://www.supermicro.com/products/nfo/2UTwin2.cfm
>>
>> You can get 3 - 4 of those compute enclosure with dual SAS connectors (each
>> enclosure not necessarily fully populated initially). The beauty is that the
>> SAS interconnect is fast.   Much faster than Ethernet.
>>
>> Please bear in mind that I am looking to create a highly available and
>> scalable storage system that will fit in as small an area as possible and
>> draw as little power as possible.  The reasoning is that we co-locate all
>> our equipment at remote data centers.  Each rack (along with its associated
>> power and any needed cross connects) represents a significant ongoing
>> operational expense.  Therefore, for me, density and incremental scalability
>> are important.
>
> There are some pretty interesting solutions on the horizon from various 
> vendors that achieve a pretty decent amount of density.  Should be 
> interesting times ahead. :)

LSI/NetApp have a nice 4U solution with 60 NL-SAS drives on a SAS
backplane, but this is always a balance between price and performance
with elasticity: a balance between low/mid-priced hardware and
midrange/enterprise solutions.

I think Ceph was created to be the cheaper solution: to give us a
chance to use storage servers built from commodity hardware, without a
pricey SAN infrastructure behind them, just fast 10Gb Ethernet. That
gives more scalability and the ability to scale out rather than up.
Software like Ceph does the job in place of hardware solutions.

>
>>
>>> And what is the benefit for having Ceph run on top of that? If you have all
>>> the disks available to all the nodes, why not run ZFS?
>>> ZFS would give you better performance since what you are building would
>>> actually be a local filesystem.
>>
>> There is no high availability here.  Yes... You can try to do old school
>> magic with SAN file systems, complicated clustering, and synchronous
>> replication, but a RAIN approach appeals to me.  That is what I see in Ceph.
>> Don't get me wrong... I love ZFS... but am trying to figure out a scalable
>> HA solution that looks like RAIN. (Am I missing a feature of ZFS)?
>>
>>> For risk spreading you should not interconnect all the nodes.
>>
>> I do understand this.  However, our operational setup will not allow
>> multiple racks at the beginning.  So... given the constraints of 1 rack
>> (with dual power and dual WAN links), I do not see that a pair of cross
>> connected SAS switches is any less reliable than a pair of cross connected
>> ethernet switches...
>>
>> As storage scales and we outgrow the single r

Re: Ceph remap/recovery stuck

2012-08-24 Thread Sławomir Skowron
Nice, thanks.

On 24 Aug 2012, at 18:35, Sage Weil wrote:

> On Fri, 24 Aug 2012, Sławomir Skowron wrote:
>> I have found a workaround.
>>
>> I changed the CRUSH rule for this pool to replicate across OSDs; after
>> recovery remapped the data, I changed the same rule back to rack
>> awareness, and the whole cluster recovered again and went back to normal.
>>
>> Is there any way to start refill/recovery in this situation for a
>> specific OSD?
>
> This sounds like it might be a problem with the crush retry behavior.
> In some cases it would fail to generate the right number of replicas for a
> given input.  We fixed this by adding tunables that disable the old/bad
> behavior, but haven't enabled it by default because support is only now
> showing up in new kernels.  If you aren't using older kernel clients, you
> can enable the new values on your cluster by following the instructions
> at:
>
>    http://ceph.com/docs/master/ops/manage/crush/#tunables
>
> FWIW you can test whether this helps by extracting your crushmap from
> the cluster, making whatever changes you are planning to the map, and then
> running
>
> crushtool -i newmap --test
>
> and verify that you get the right number of results for numrep=3 and
> below.  There are a bunch of options you can pass to adjust the range of
> inputs that are tested (e.g.,  --min-x 1 --max-x 10, --num-rep 3,
> etc.).  crushtool is also used to adjust the tunables to 0, so you can
> then verify that it fixes the problem... all before injecting the new map
> into the cluster and actually triggering any data migration.
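The dry run Sage describes can be scripted; a small wrapper sketch (only the crushtool flags come from his message; the wrapper itself and the file name newmap are just illustration):

```python
# Build and (optionally) run the crushtool dry-run command before
# injecting a modified crush map into the cluster.
import shutil
import subprocess

def crushtool_test_cmd(mapfile, num_rep=3, min_x=1, max_x=10):
    return ["crushtool", "-i", mapfile, "--test",
            "--num-rep", str(num_rep),
            "--min-x", str(min_x), "--max-x", str(max_x)]

def dry_run(mapfile):
    """Run the placement test if crushtool is installed, else print the command."""
    cmd = crushtool_test_cmd(mapfile)
    if shutil.which("crushtool") is None:
        print("would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, capture_output=True, text=True).stdout

dry_run("newmap")
```

Verify that numrep=3 placements actually return three OSDs before loading the new map with ceph osd setcrushmap.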
>
> sage
>
>
>>
>> On Thu, Aug 23, 2012 at 3:52 PM, Sławomir Skowron wrote:
>>> 3 osd after crash rebuilds ok, but rebuild of two more osd (12 and
>>> 30), i can't make cluster to be active+clean
>>>
>>> I do rebuild like in doc:
>>>
>>> stop osd,
>>> remove from crush,
>>> rm from map,
>>> recreate a osd, after cluster get stable
>>>
>>> But now, all osd are in, and up, and data won't remap, and some of PG,
>>> have only two osd in chain with replication level 3 for this pool.
>>>
>>> 2012-08-23 15:26:46.073685 mon.0 [INF] pgmap v117192: 6472 pgs: 63
>>> active, 4457 active+clean, 1942 active+remapped, 10 active+degraded;
>>> 596 GB data, 1650 GB used, 20059 GB / 21710 GB avail; 57815/4705888
>>> degraded (1.229%)
>>>
>>> In attachment output from:
>>>
>>> ceph osd dump -o -
>>>
>>> I can't find any info in doc for this situation.
>>>
>>> HEALTH_WARN 10 pgs degraded; 2015 pgs stuck unclean; recovery
>>> 57871/4706179 degraded (1.230%)
>>> root@s3-10-177-64-6:~# ceph -s
>>>   health HEALTH_WARN 10 pgs degraded; 2015 pgs stuck unclean;
>>> recovery 57871/4706179 degraded (1.230%)
>>>   monmap e4: 3 mons at
>>> {0=10.177.64.4:6789/0,1=10.177.64.6:6789/0,2=10.177.64.8:6789/0},
>>> election epoch 16, quorum 0,1,2 0,1,2
>>>   osdmap e1300: 78 osds: 78 up, 78 in
>>>pgmap v117464: 6472 pgs: 63 active, 4457 active+clean, 1942
>>> active+remapped, 10 active+degraded; 596 GB data, 1651 GB used, 20059
>>> GB / 21710 GB avail; 57871/4706179 degraded (1.230%)
>>>   mdsmap e1: 0/0/1 up
>>>
>>> Please help, i will try to give you any output you need.
>>>
>>>
>>> And one more thing, little bug in 0.48.1:
>>>
>>> ceph health blabla command, does same thing, as ceph health details.
>>> Whatever is after health, means details.
>>>
>>> --
>>> -
>>> Regards
>>>
>>> Sławek "sZiBis" Skowron
>>
>>
>>
>> --
>> -
>> Regards
>>
>> Sławek "sZiBis" Skowron
>>
>>


Re: e: mon memory issue

2012-08-31 Thread Sławomir Skowron
I have this problem too. The mons in my 0.48.1 cluster use 10GB of RAM
each, with 78 OSDs and 2k requests per minute (max) in radosgw.

Now I have run one via valgrind. I will send the output when the mon
grows.
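The top(1) listings quoted below report mon RES in mixed units (26g, 1.6g, 32m, plain KiB). A small helper, assuming top's default column layout as shown in those listings, to normalize the RES column to MiB so growth can be tracked:

```python
# Normalize top(1) RES values ('26g', '523m', plain KiB) to MiB so that
# ceph-mon memory growth can be graphed or alerted on.
def res_to_mb(res):
    """Convert a top RES field like '26g', '523m' or '4028' (KiB) to MiB."""
    res = res.lower()
    if res.endswith("g"):
        return float(res[:-1]) * 1024
    if res.endswith("m"):
        return float(res[:-1])
    return float(res) / 1024  # bare numbers in top are KiB

# One of the quoted top lines: ceph-mon at 26g resident.
line = " 1717 root  20   0 68.8g  26g 2044 S  91.5 84.1   9220:40 ceph-mon"
res = line.split()[5]
print(res_to_mb(res))  # -> 26624.0
```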

On Fri, Aug 31, 2012 at 6:03 PM, Sage Weil  wrote:
> On Fri, 31 Aug 2012, Xiaopong Tran wrote:
>
>> Hi,
>>
>> Is there any known memory issue with mon? We have 3 mons running, and
>> on keeps on crashing after 2 or 3 days, and I think it's because mon
>> sucks up all memory.
>>
>> Here's mon after starting for 10 minutes:
>>
>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
>> 13700 root  20   0  163m  32m 3712 S   4.3  0.1   0:05.15 ceph-mon
>>  2595 root  20   0 1672m 523m0 S   1.7  1.6 954:33.56 ceph-osd
>>  1941 root  20   0 1292m 220m0 S   0.7  0.7 946:40.69 ceph-osd
>>  2316 root  20   0 1169m 198m0 S   0.7  0.6 420:26.74 ceph-osd
>>  2395 root  20   0 1149m 184m0 S   0.7  0.6 364:29.08 ceph-osd
>>  2487 root  20   0 1354m 373m0 S   0.7  1.2 401:13.97 ceph-osd
>>   235 root  20   0 000 S   0.3  0.0   0:37.68 kworker/4:1
>>  1304 root  20   0 000 S   0.3  0.0   0:00.16 jbd2/sda3-8
>>  1327 root  20   0 000 S   0.3  0.0  13:07.00 xfsaild/sdf1
>>  2011 root  20   0 1240m 177m0 S   0.3  0.6 411:52.91 ceph-osd
>>  2153 root  20   0 1095m 166m0 S   0.3  0.5 370:56.01 ceph-osd
>>  2725 root  20   0 1214m 186m0 S   0.3  0.6 378:16.59 ceph-osd
>>
>> Here's the memory situation of mon on another machine, after mon has
>> been running for 3 hours:
>>
>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
>>  1716 root  20   0 1923m 1.6g 4028 S   7.6  5.2   8:45.82 ceph-mon
>>  1923 root  20   0  774m 138m 5052 S   0.7  0.4   1:28.56 ceph-osd
>>  2114 root  20   0  836m 143m 4864 S   0.7  0.4   1:20.14 ceph-osd
>>  2304 root  20   0  863m 176m 4988 S   0.7  0.5   1:13.30 ceph-osd
>>  2578 root  20   0  823m 150m 5056 S   0.7  0.5   1:24.55 ceph-osd
>>  2781 root  20   0  819m 131m 4900 S   0.7  0.4   1:12.14 ceph-osd
>>  2995 root  20   0  863m 179m 5024 S   0.7  0.6   1:41.96 ceph-osd
>>  3474 root  20   0  888m 208m 5608 S   0.7  0.6   7:08.08 ceph-osd
>>  1228 root  20   0 000 S   0.3  0.0   0:07.01 jbd2/sda3-8
>>  1853 root  20   0  859m 176m 4820 S   0.3  0.5   1:17.01 ceph-osd
>>  3373 root  20   0  789m 118m 4916 S   0.3  0.4   1:06.26 ceph-osd
>>
>> And here is the situation on a third node, mon has been running
>> for over a week:
>>
>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
>>  1717 root  20   0 68.8g  26g 2044 S  91.5 84.1   9220:40 ceph-mon
>>  1986 root  20   0 1281m 226m0 S   1.7  0.7   1225:28 ceph-osd
>>  2196 root  20   0 1501m 538m0 S   1.0  1.7   1221:54 ceph-osd
>>  2266 root  20   0 1121m 176m0 S   0.7  0.5 399:23.70 ceph-osd
>>  2056 root  20   0 1072m 167m0 S   0.3  0.5 403:49.76 ceph-osd
>>  2126 root  20   0 1412m 458m0 S   0.3  1.4   1215:48 ceph-osd
>>  2337 root  20   0 1128m 188m0 S   0.3  0.6 408:31.88 ceph-osd
>>
>> So, after a while, sooner or later, mon is going to crash, just
>> a matter of time.
>>
>> Does anyone see anything like this? This is kinda scary.
>>
>> OS: Debian Wheezy 3.2.0-3-amd64
>> Ceph: 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
>
> Can you try with 0.48.1argonaut?
>
> If it still happens, can you run ceph-mon through massif?
>
>  valgrind --tool=massif ceph-mon -i whatever
>
> That'll generate a massif.out file (make sure it's there; you may need to
> specify the output file for valgrind) over time.  Once ceph-mon starts
> eating ram, send us a copy of the file and we can hopefully see what is
> leaking.
>
> Thanks!
> sage
>
>
>>
>> With this issue on hand, I'll have to monitor it closely and
>> restart mon once in a while, or I will get a crash (which is
>> still good enough), or a system that does not respond at
>> all because memory is exhausted, and the whole ceph cluster
>> is unreachable. We had this problem in the morning, mon on one
>> node exhausted the memory, none of the ceph command responds
>> anymore, the only thing left to do is to hard reset the node.
>> The whole cluster was basically done at that time.
>>
>> Here is our usage situation:
>>
>> 1) A few applications which read and write data through
>> librados API, we have about 20-30 connections at any one time.
>> So far, our apps have no such memory issue, we have been
>> monitoring them closely.
>>
>> 2) We have a few scripts which pull data from an old storage
>> system, and use the rados command to put it into ceph.
>> Basically, just shell script. Each rados command is run
>> to write one object (one file), and exit. We run about
>> 25 scripts simultaneously, which means at any one time,
>> there are at most 25 connections.
>>
>> I don't think this is a very busy system. But this

Re: e: mon memory issue

2012-09-04 Thread Sławomir Skowron
Valgrind returns nothing.

valgrind --tool=massif --log-file=ceph_mon_valgrind ceph-mon -i 0 > log.txt

==30491== Massif, a heap profiler
==30491== Copyright (C) 2003-2011, and GNU GPL'd, by Nicholas Nethercote
==30491== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==30491== Command: ceph-mon -i 0
==30491== Parent PID: 4013
==30491==
==30491==

cat massif.out.26201
desc: (none)
cmd: ceph-mon -i 0
time_unit: i
#---
snapshot=0
#---
time=0
mem_heap_B=0
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty

What have I done wrong?
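For reference, a populated massif.out is a sequence of snapshot records whose mem_heap_B= lines carry the heap size. A minimal parser sketch (format as in the empty dump above) that turns a file into a series, so a leaking mon shows up as a rising curve once valgrind actually collects data:

```python
# Pull the heap-size series out of a massif output file. The format is
# the one shown above: snapshot records with 'mem_heap_B=<bytes>' lines.
def heap_series(text):
    """Return the mem_heap_B value of each snapshot, in file order."""
    return [int(line.split("=", 1)[1])
            for line in text.splitlines()
            if line.startswith("mem_heap_B=")]

sample = """desc: (none)
cmd: ceph-mon -i 0
time_unit: i
#---
snapshot=0
#---
time=0
mem_heap_B=0
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
"""
print(heap_series(sample))  # -> [0] for the empty run shown above
```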

On Fri, Aug 31, 2012 at 8:34 PM, Sławomir Skowron  wrote:
> I have this problem too. My mon's in 0.48.1 cluster have 10GB RAM
> each, with 78 osd, and 2k request per minute (max) in radosgw.
>
> Now i have run one via valgrind. I will send output when mon grow up.
>
> On Fri, Aug 31, 2012 at 6:03 PM, Sage Weil  wrote:
>> On Fri, 31 Aug 2012, Xiaopong Tran wrote:
>>
>>> Hi,
>>>
>>> Is there any known memory issue with mon? We have 3 mons running, and
>>> one keeps on crashing after 2 or 3 days, and I think it's because mon
>>> sucks up all memory.
>>>
>>> Here's mon after starting for 10 minutes:
>>>
>>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
>>> 13700 root  20   0  163m  32m 3712 S   4.3  0.1   0:05.15 ceph-mon
>>>  2595 root  20   0 1672m 523m0 S   1.7  1.6 954:33.56 ceph-osd
>>>  1941 root  20   0 1292m 220m0 S   0.7  0.7 946:40.69 ceph-osd
>>>  2316 root  20   0 1169m 198m0 S   0.7  0.6 420:26.74 ceph-osd
>>>  2395 root  20   0 1149m 184m0 S   0.7  0.6 364:29.08 ceph-osd
>>>  2487 root  20   0 1354m 373m0 S   0.7  1.2 401:13.97 ceph-osd
>>>   235 root  20   0 000 S   0.3  0.0   0:37.68 kworker/4:1
>>>  1304 root  20   0 000 S   0.3  0.0   0:00.16 jbd2/sda3-8
>>>  1327 root  20   0 000 S   0.3  0.0  13:07.00 xfsaild/sdf1
>>>  2011 root  20   0 1240m 177m0 S   0.3  0.6 411:52.91 ceph-osd
>>>  2153 root  20   0 1095m 166m0 S   0.3  0.5 370:56.01 ceph-osd
>>>  2725 root  20   0 1214m 186m0 S   0.3  0.6 378:16.59 ceph-osd
>>>
>>> Here's the memory situation of mon on another machine, after mon has
>>> been running for 3 hours:
>>>
>>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
>>>  1716 root  20   0 1923m 1.6g 4028 S   7.6  5.2   8:45.82 ceph-mon
>>>  1923 root  20   0  774m 138m 5052 S   0.7  0.4   1:28.56 ceph-osd
>>>  2114 root  20   0  836m 143m 4864 S   0.7  0.4   1:20.14 ceph-osd
>>>  2304 root  20   0  863m 176m 4988 S   0.7  0.5   1:13.30 ceph-osd
>>>  2578 root  20   0  823m 150m 5056 S   0.7  0.5   1:24.55 ceph-osd
>>>  2781 root  20   0  819m 131m 4900 S   0.7  0.4   1:12.14 ceph-osd
>>>  2995 root  20   0  863m 179m 5024 S   0.7  0.6   1:41.96 ceph-osd
>>>  3474 root  20   0  888m 208m 5608 S   0.7  0.6   7:08.08 ceph-osd
>>>  1228 root  20   0 000 S   0.3  0.0   0:07.01 jbd2/sda3-8
>>>  1853 root  20   0  859m 176m 4820 S   0.3  0.5   1:17.01 ceph-osd
>>>  3373 root  20   0  789m 118m 4916 S   0.3  0.4   1:06.26 ceph-osd
>>>
>>> And here is the situation on a third node, mon has been running
>>> for over a week:
>>>
>>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
>>>  1717 root  20   0 68.8g  26g 2044 S  91.5 84.1   9220:40 ceph-mon
>>>  1986 root  20   0 1281m 226m0 S   1.7  0.7   1225:28 ceph-osd
>>>  2196 root  20   0 1501m 538m0 S   1.0  1.7   1221:54 ceph-osd
>>>  2266 root  20   0 1121m 176m0 S   0.7  0.5 399:23.70 ceph-osd
>>>  2056 root  20   0 1072m 167m0 S   0.3  0.5 403:49.76 ceph-osd
>>>  2126 root  20   0 1412m 458m0 S   0.3  1.4   1215:48 ceph-osd
>>>  2337 root  20   0 1128m 188m0 S   0.3  0.6 408:31.88 ceph-osd
>>>
>>> So, after a while, sooner or later, mon is going to crash, just
>>> a matter of time.
>>>
>>> Does anyone see anything like this? This is kinda scary.
>>>
>>> OS: Debian Wheezy 3.2.0-3-amd64
>>> Ceph: 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
>>
>> Can you try with 0.48.1argonaut?
>>
>> If it still happens, can you run ceph-mon through massif?
>>
>>  valgrind --tool=massif ceph-mon -i whatever
>>
>> That'll generate a massif.out file (make sure it's there; you may nee

Re: e: mon memory issue

2012-09-05 Thread Sławomir Skowron
Unfortunately, here is the problem on my Ubuntu 12.04.1:

--9399-- You may be able to write your own handler.
--9399-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--9399-- Nevertheless we consider this a bug.  Please report
--9399-- it at http://valgrind.org/support/bug_reports.html.
==9399== Warning: noted but unhandled ioctl 0x9408 with no size/direction hints
==9399==This could cause spurious value errors to appear.
==9399==See README_MISSING_SYSCALL_OR_IOCTL for guidance on
writing a proper wrapper.
--9399-- WARNING: unhandled syscall: 306
--9399-- You may be able to write your own handler.
--9399-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--9399-- Nevertheless we consider this a bug.  Please report
--9399-- it at http://valgrind.org/support/bug_reports.html.
==9399== Warning: noted but unhandled ioctl 0x9408 with no size/direction hints
==9399==This could cause spurious value errors to appear.
==9399==See README_MISSING_SYSCALL_OR_IOCTL for guidance on
writing a proper wrapper.
^C2012-09-05 09:13:18.660048 a964700 -1 mon.0@0(leader) e4 *** Got
Signal Interrupt ***
==9399==


On Wed, Sep 5, 2012 at 5:32 AM, Sage Weil  wrote:
> On Tue, 4 Sep 2012, Sławomir Skowron wrote:
>> Valgrind returns nothing.
>>
>> valgrind --tool=massif --log-file=ceph_mon_valgrind ceph-mon -i 0 > log.txt
>
> The fork is probably confusing it.  I usually pass -f to ceph-mon (or
> ceph-osd etc) to keep it in the foreground.  Can you give that a go?
> e.g.,
>
> valgrind --tool=massif ceph-mon -i 0 -f &
>
> and watch for the massif.out.$pid file.
>
> Thanks!
> sage
>
>
>>
>> ==30491== Massif, a heap profiler
>> ==30491== Copyright (C) 2003-2011, and GNU GPL'd, by Nicholas Nethercote
>> ==30491== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
>> ==30491== Command: ceph-mon -i 0
>> ==30491== Parent PID: 4013
>> ==30491==
>> ==30491==
>>
>> cat massif.out.26201
>> desc: (none)
>> cmd: ceph-mon -i 0
>> time_unit: i
>> #---
>> snapshot=0
>> #---
>> time=0
>> mem_heap_B=0
>> mem_heap_extra_B=0
>> mem_stacks_B=0
>> heap_tree=empty
>>
>> What i have done wrong ??
>>
>> On Fri, Aug 31, 2012 at 8:34 PM, Sławomir Skowron wrote:
>> > I have this problem too. My mon's in 0.48.1 cluster have 10GB RAM
>> > each, with 78 osd, and 2k request per minute (max) in radosgw.
>> >
>> > Now i have run one via valgrind. I will send output when mon grow up.
>> >
>> > On Fri, Aug 31, 2012 at 6:03 PM, Sage Weil  wrote:
>> >> On Fri, 31 Aug 2012, Xiaopong Tran wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Is there any known memory issue with mon? We have 3 mons running, and
>> >>> one keeps on crashing after 2 or 3 days, and I think it's because mon
>> >>> sucks up all memory.
>> >>>
>> >>> Here's mon after starting for 10 minutes:
>> >>>
>> >>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
>> >>> 13700 root  20   0  163m  32m 3712 S   4.3  0.1   0:05.15 ceph-mon
>> >>>  2595 root  20   0 1672m 523m0 S   1.7  1.6 954:33.56 ceph-osd
>> >>>  1941 root  20   0 1292m 220m0 S   0.7  0.7 946:40.69 ceph-osd
>> >>>  2316 root  20   0 1169m 198m0 S   0.7  0.6 420:26.74 ceph-osd
>> >>>  2395 root  20   0 1149m 184m0 S   0.7  0.6 364:29.08 ceph-osd
>> >>>  2487 root  20   0 1354m 373m0 S   0.7  1.2 401:13.97 ceph-osd
>> >>>   235 root  20   0 000 S   0.3  0.0   0:37.68 kworker/4:1
>> >>>  1304 root  20   0 000 S   0.3  0.0   0:00.16 jbd2/sda3-8
>> >>>  1327 root  20   0 000 S   0.3  0.0  13:07.00 
>> >>> xfsaild/sdf1
>> >>>  2011 root  20   0 1240m 177m0 S   0.3  0.6 411:52.91 ceph-osd
>> >>>  2153 root  20   0 1095m 166m0 S   0.3  0.5 370:56.01 ceph-osd
>> >>>  2725 root  20   0 1214m 186m0 S   0.3  0.6 378:16.59 ceph-osd
>> >>>
>> >>> Here's the memory situation of mon on another machine, after mon has
>> >>> been running for 3 hours:
>> >>>
>> >>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
>> >>>  1716 root  20   0 1923m 1.6g 4028 S   7.6  5.2   8:45.82 ceph-mon
>> >>>  1923 root  20   0  774m 138m 5052 S   0.7  0.4   1:28.56 ceph-osd
>> >>>  2114 root  20   0  836m 143m 4864 S   0.7  0.4   1:20.14 ceph-osd
>> >>>  2304 root  20   0  863m 176m 4988 S   0.7  0.5   1:13.30 ceph-osd
>> >>>  2578 root  20   0  823m 150m 5056 S   0.7  0.5   1:24.55 ceph-osd
>> >>>  2781 root  20   0  819m 131m 4900 S   0.7  0.4   1:12.14 ceph-osd
>> >>>  2995 root  20   0  863m 179m 5024 S   0.7  0.6   1:41.96 ceph-osd
>> >>>  3474 root  20   0  888m 208m 5608 S   0.7  0.6   7:08.08 ceph-osd
>> >>>  1228 root  20   0 000 S   0.3  0.0   0:07.01 jbd2/sda3-8
>> >>>  1853 root  20   0  859m 176m 4820 S   0.3  0.5   1:17.01 ceph-osd
>> >>>  3373 root  20   0  789m 118m 4916 S   0.3  0.4   1:06.26 ceph-osd
>> >>>
>> >>> And here is the situation on a third node, mon has been running
>> >>> for over a week:
>> >>>
>> >>>

Re: Inject configuration change into cluster

2012-09-05 Thread Sławomir Skowron
OK, but in the general case, when I use chef/puppet or any other tool,
I want to change the configuration in ceph.conf and reload the daemon
to pick up the new settings from ceph.conf; this feature would be very
helpful in ceph administration.

Injecting is fine for testing or debugging.

On Tue, Sep 4, 2012 at 5:34 PM, Sage Weil  wrote:
> On Tue, 4 Sep 2012, Wido den Hollander wrote:
>> On 09/04/2012 10:30 AM, Skowron Sławomir wrote:
>> > Ok, thanks.
>> >
>> > Number of workers used for recover, or numer of disk threads.
>> >
>>
>> I think those can be changed while the OSD is running. You could always give
>> it a try.
>
> The thread pool sizes can't currently be adjusted, unfortunately.  It
> wouldn't be too difficult to change that...
>
> sage
>
>>
>> Wido
>>
>> > -Original Message-
>> > From: Wido den Hollander [mailto:w...@widodh.nl]
>> > Sent: Tuesday, September 04, 2012 10:18 AM
>> > To: Skowron Sławomir
>> > Cc: ceph-devel@vger.kernel.org
>> > Subject: Re: Inject configuration change into cluster
>> >
>> > On 09/04/2012 07:04 AM, Skowron Sławomir wrote:
>> > > Is there any way now, to inject new configuration change, without
>> > > restarting daemons ??
>> > >
>> >
>> > Yes, you can use the injectargs command.
>> >
>> > $ ceph osd tell 0 injectargs '--debug-osd 20'
>> >
>> > What do you want to change? Not everything can be changed while the OSD is
>> > running.
>> >
>> > Wido
>> >
>> > > Regards
>> > >
>> > > Slawomir Skowron
>> > >
>> >
>> >
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
-
Regards

Sławek "sZiBis" Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mon memory issue

2012-09-05 Thread Sławomir Skowron
On Wed, Sep 5, 2012 at 5:51 PM, Sage Weil  wrote:
> On Wed, 5 Sep 2012, Sławomir Skowron wrote:
>> Unfortunately here is the problem in my Ubuntu 12.04.1
>>
>> --9399-- You may be able to write your own handler.
>> --9399-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
>> --9399-- Nevertheless we consider this a bug.  Please report
>> --9399-- it at http://valgrind.org/support/bug_reports.html.
>> ==9399== Warning: noted but unhandled ioctl 0x9408 with no size/direction 
>> hints
>> ==9399==This could cause spurious value errors to appear.
>> ==9399==See README_MISSING_SYSCALL_OR_IOCTL for guidance on
>> writing a proper wrapper.
>> --9399-- WARNING: unhandled syscall: 306
>> --9399-- You may be able to write your own handler.
>> --9399-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
>> --9399-- Nevertheless we consider this a bug.  Please report
>> --9399-- it at http://valgrind.org/support/bug_reports.html.
>> ==9399== Warning: noted but unhandled ioctl 0x9408 with no size/direction 
>> hints
>> ==9399==This could cause spurious value errors to appear.
>> ==9399==See README_MISSING_SYSCALL_OR_IOCTL for guidance on
>> writing a proper wrapper.
>
> These are harmless; it just doesn't recognize syncfs(2) or one of the
> ioctls, but everything else works.
>
>> ^C2012-09-05 09:13:18.660048 a964700 -1 mon.0@0(leader) e4 *** Got
>> Signal Interrupt ***
>> ==9399==
>
> Did you hit control-c?
>
> If you leave it running it should gather the memory utilization info we
> need...

Yes, it's running now; I will see tomorrow how much memory the mon consumes.

>
> sage
>
>>
>>
>> On Wed, Sep 5, 2012 at 5:32 AM, Sage Weil  wrote:
>> > On Tue, 4 Sep 2012, Sławomir Skowron wrote:
>> >> Valgrind returns nothing.
>> >>
>> >> valgrind --tool=massif --log-file=ceph_mon_valgrind ceph-mon -i 0 > 
>> >> log.txt
>> >
>> > The fork is probably confusing it.  I usually pass -f to ceph-mon (or
>> > ceph-osd etc) to keep it in the foreground.  Can you give that a go?
>> > e.g.,
>> >
>> > valgrind --tool=massif ceph-mon -i 0 -f &
>> >
>> > and watch for the massif.out.$pid file.
>> >
>> > Thanks!
>> > sage
>> >
>> >
>> >>
>> >> ==30491== Massif, a heap profiler
>> >> ==30491== Copyright (C) 2003-2011, and GNU GPL'd, by Nicholas Nethercote
>> >> ==30491== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright 
>> >> info
>> >> ==30491== Command: ceph-mon -i 0
>> >> ==30491== Parent PID: 4013
>> >> ==30491==
>> >> ==30491==
>> >>
>> >> cat massif.out.26201
>> >> desc: (none)
>> >> cmd: ceph-mon -i 0
>> >> time_unit: i
>> >> #---
>> >> snapshot=0
>> >> #---
>> >> time=0
>> >> mem_heap_B=0
>> >> mem_heap_extra_B=0
>> >> mem_stacks_B=0
>> >> heap_tree=empty
>> >>
>> >> What i have done wrong ??
>> >>
>> >> On Fri, Aug 31, 2012 at 8:34 PM, Sławomir Skowron  
>> >> wrote:
>> >> > I have this problem too. My mon's in 0.48.1 cluster have 10GB RAM
>> >> > each, with 78 osd, and 2k request per minute (max) in radosgw.
>> >> >
>> >> > Now i have run one via valgrind. I will send output when mon grow up.
>> >> >
>> >> > On Fri, Aug 31, 2012 at 6:03 PM, Sage Weil  wrote:
>> >> >> On Fri, 31 Aug 2012, Xiaopong Tran wrote:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> Is there any known memory issue with mon? We have 3 mons running, and
>> >> >>> on keeps on crashing after 2 or 3 days, and I think it's because mon
>> >> >>> sucks up all memory.
>> >> >>>
>> >> >>> Here's mon after starting for 10 minutes:
>> >> >>>
>> >> >>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
>> >> >>> 13700 root  20   0  163m  32m 3712 S   4.3  0.1   0:05.15 ceph-mon
>> >> >>>  2595 root  20   0 1672m 523m0 S   1.7  1.6 954:33.56 ceph-osd
>> >> >>>  1941 root  20   0 1292m 220m0 S   0.7  0.7 946:40.69 ceph-osd
>> >> >>>  2316 root  20   0 1169m 198m0 S   0.7  0.6 420:26.74 ceph-osd
>> >> >>>  2395 root  20   0 1149m 184m0 S   0.7  0.6 364:29.08 ceph-osd
>> >> >>>  2487 root  20   0 1354m 373m0 S   0.7  1.2 401:13.97 ceph-osd
>> >> >>>   235 root  20   0 000 S   0.3  0.0   0:37.68 
>> >> >>> kworker/4:1
>> >> >>>  1304 root  20   0 000 S   0.3  0.0   0:00.16 
>> >> >>> jbd2/sda3-8
>> >> >>>  1327 root  20   0 000 S   0.3  0.0  13:07.00 
>> >> >>> xfsaild/sdf1
>> >> >>>  2011 root  20   0 1240m 177m0 S   0.3  0.6 411:52.91 ceph-osd
>> >> >>>  2153 root  20   0 1095m 166m0 S   0.3  0.5 370:56.01 ceph-osd
>> >> >>>  2725 root  20   0 1214m 186m0 S   0.3  0.6 378:16.59 ceph-osd
>> >> >>>
>> >> >>> Here's the memory situation of mon on another machine, after mon has
>> >> >>> been running for 3 hours:
>> >> >>>
>> >> >>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEMTIME+  COMMAND
>> >> >>>  1716 root  20   0 1923m 1.6g 4028 S   7.6  5.2   8:45.82 ceph-mon
>> >> >>>  1923 root  20   0  774m 138m 5052 S   0.7  0.4   1:28.56 ceph-osd
>> >> >>>  2114 root  20   0  836m 143m 4864

Re: mon memory issue

2012-09-10 Thread Sławomir Skowron
I tried it with radosgw, and it reports very nice output from
valgrind, but still nothing from the mon.

desc: (none)
cmd: /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c
/etc/ceph/ceph.conf -f
time_unit: i
#---
snapshot=0
#---
time=0
mem_heap_B=0
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
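An all-zero single snapshot like the one above means massif detached before taking any real samples. When it does work, each `snapshot=N` stanza carries `time=` and `mem_heap_B=` lines; a small hypothetical helper for pulling the peak heap out of a massif.out file:

```python
def massif_peak(text):
    """Return (snapshot_index, mem_heap_B) of the largest heap sample."""
    peak = (None, -1)
    snap = None
    for line in text.splitlines():
        if line.startswith("snapshot="):
            snap = int(line.split("=", 1)[1])
        elif line.startswith("mem_heap_B="):
            heap = int(line.split("=", 1)[1])
            if heap > peak[1]:
                peak = (snap, heap)
    return peak

# Made-up sample in the massif.out format shown above.
sample = """desc: (none)
cmd: ceph-mon -i 0 -f
time_unit: i
#-----------
snapshot=0
#-----------
time=0
mem_heap_B=0
#-----------
snapshot=1
#-----------
time=100
mem_heap_B=1048576
"""
print(massif_peak(sample))  # -> (1, 1048576)
```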

On Wed, Sep 5, 2012 at 8:44 PM, Sławomir Skowron  wrote:
> On Wed, Sep 5, 2012 at 5:51 PM, Sage Weil  wrote:
>> On Wed, 5 Sep 2012, Sławomir Skowron wrote:
>>> Unfortunately here is the problem in my Ubuntu 12.04.1
>>>
>>> --9399-- You may be able to write your own handler.
>>> --9399-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
>>> --9399-- Nevertheless we consider this a bug.  Please report
>>> --9399-- it at http://valgrind.org/support/bug_reports.html.
>>> ==9399== Warning: noted but unhandled ioctl 0x9408 with no size/direction 
>>> hints
>>> ==9399==This could cause spurious value errors to appear.
>>> ==9399==See README_MISSING_SYSCALL_OR_IOCTL for guidance on
>>> writing a proper wrapper.
>>> --9399-- WARNING: unhandled syscall: 306
>>> --9399-- You may be able to write your own handler.
>>> --9399-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
>>> --9399-- Nevertheless we consider this a bug.  Please report
>>> --9399-- it at http://valgrind.org/support/bug_reports.html.
>>> ==9399== Warning: noted but unhandled ioctl 0x9408 with no size/direction 
>>> hints
>>> ==9399==This could cause spurious value errors to appear.
>>> ==9399==See README_MISSING_SYSCALL_OR_IOCTL for guidance on
>>> writing a proper wrapper.
>>
>> These are harmless; it just doesn't recognize syncfs(2) or one of the
>> ioctls, but everything else works.
>>
>>> ^C2012-09-05 09:13:18.660048 a964700 -1 mon.0@0(leader) e4 *** Got
>>> Signal Interrupt ***
>>> ==9399==
>>
>> Did you hit control-c?
>>
>> If you leave it running it should gather the memory utilization info we
>> need...
>
> Yes it's running now, and  i will see tomorrow how much memory mon consumes.
>
>>
>> sage
>>
>>>
>>>
>>> On Wed, Sep 5, 2012 at 5:32 AM, Sage Weil  wrote:
>>> > On Tue, 4 Sep 2012, Sławomir Skowron wrote:
>>> >> Valgrind returns nothing.
>>> >>
>>> >> valgrind --tool=massif --log-file=ceph_mon_valgrind ceph-mon -i 0 > 
>>> >> log.txt
>>> >
>>> > The fork is probably confusing it.  I usually pass -f to ceph-mon (or
>>> > ceph-osd etc) to keep it in the foreground.  Can you give that a go?
>>> > e.g.,
>>> >
>>> > valgrind --tool=massif ceph-mon -i 0 -f &
>>> >
>>> > and watch for the massif.out.$pid file.
>>> >
>>> > Thanks!
>>> > sage
>>> >
>>> >
>>> >>
>>> >> ==30491== Massif, a heap profiler
>>> >> ==30491== Copyright (C) 2003-2011, and GNU GPL'd, by Nicholas Nethercote
>>> >> ==30491== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright 
>>> >> info
>>> >> ==30491== Command: ceph-mon -i 0
>>> >> ==30491== Parent PID: 4013
>>> >> ==30491==
>>> >> ==30491==
>>> >>
>>> >> cat massif.out.26201
>>> >> desc: (none)
>>> >> cmd: ceph-mon -i 0
>>> >> time_unit: i
>>> >> #---
>>> >> snapshot=0
>>> >> #---
>>> >> time=0
>>> >> mem_heap_B=0
>>> >> mem_heap_extra_B=0
>>> >> mem_stacks_B=0
>>> >> heap_tree=empty
>>> >>
>>> >> What i have done wrong ??
>>> >>
>>> >> On Fri, Aug 31, 2012 at 8:34 PM, Sławomir Skowron  
>>> >> wrote:
>>> >> > I have this problem too. My mon's in 0.48.1 cluster have 10GB RAM
>>> >> > each, with 78 osd, and 2k request per minute (max) in radosgw.
>>> >> >
>>> >> > Now i have run one via valgrind. I will send output when mon grow up.
>>> >> >
>>> >> > On Fri, Aug 31, 2012 at 6:03 PM, Sage Weil  wrote:
>>> >> >> On Fri, 31 Aug 2012, Xiaopong Tran wrote:
>>> >> >>
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> Is there any known memory issue with mon? We have 3 mons running, a

Access Denied for bucket upload - 403 code

2012-09-11 Thread Sławomir Skowron
Every ACL operation on PUT ends with a 403.

~# s3 -u test oc
 Bucket  Status
  
ocAccess Denied

Does anyone know why, and how to re-enable this bucket? Right now I have
problems with the cluster, because there is no way to upload new files.

~# s3 -u getacl oc

ERROR: ErrorAccessDenied


-- 
-
Regards

Sławek "sZiBis" Skowron


Re: Access Denied for bucket upload - 403 code

2012-09-11 Thread Sławomir Skowron
On Tue, Sep 11, 2012 at 6:48 PM, Yehuda Sadeh  wrote:
> On Tue, Sep 11, 2012 at 9:45 AM, Yehuda Sadeh  wrote:
>> On Tue, Sep 11, 2012 at 7:28 AM, Sławomir Skowron  wrote:
>>> Every acl operation ending with 403 in PUT.
>>>
>>> ~# s3 -u test oc
>>>  Bucket  Status
>>>   
>>> 
>>> ocAccess Denied
>>>
>>> Anyone know why, and how to enable this bucket ?? Now i have problems
>>> with cluster, because there is no way to upload new file
>>>
>>> ~# s3 -u getacl oc
>>>
>>> ERROR: ErrorAccessDenied
>>>
>>
>> User somehow lost bucket ownership (was it actually the owner?). Do
>> you know how to reproduce the issue? any remaining logs?
>>
>> Try getting bucket info:
>>
>> # radosgw-admin bucket stats --bucket=oc
>>
>> If that doesn't fail and actually shows relevant info, try checking
>> whether the user credentials match the s3 tool credentials.
>>
> Oh, and thinking about it some more.. 'oc' is a too short name for a
> bucket (requires min of 3 chars). How did you create it? The failure
> may be related.

Yes, I made a shortened name :))

Right now every bucket in the pool is affected

:~#radosgw-admin bucket stats --bucket=lvstest
{ "bucket": "lvstest",
  "pool": ".rgw.buckets",
  "id": "1142048.1",
  "marker": "1142048.1",
  "owner": "0",
  "usage": { "rgw.main": { "size_kb": 1,
  "size_kb_actual": 4,
  "num_objects": 1}}}
:~# radosgw-admin bucket stats --bucket=ocdn
{ "bucket": "ocdn",
  "pool": ".rgw.buckets",
  "id": "4168.2",
  "marker": "4168.2",
  "owner": "0",
  "usage": { "rgw.main": { "size_kb": 513059717,
  "size_kb_actual": 516402364,
  "num_objects": 1606730}}}

The credentials from radosgw-admin user info match those in the client requests.

Every GET, PUT and HEAD using these credentials works fine; only one
operation fails (403 from radosgw): setting the ACL of an object to
public-read. Setting a canned ACL with PUT for public-read from libs3
works, but getacl/setacl fail.

Listing a bucket's objects works, as does listing buckets via the s3 client.

I can't reproduce it now, but I will dig through the radosgw logs
around the time when this happened.

An example 403 from the radosgw log; before that, the PUT of the object ends with 200:

2012-09-11 19:36:34.346312 7fb25d7fa700  1 == req done
req=0x1435980 http_status=403 ==
2012-09-11 19:37:04.342894 7fb25d7fa700 20 dequeued request req=0x13994c0
2012-09-11 19:37:04.342903 7fb25d7fa700 20 RGWWQ: empty
2012-09-11 19:37:04.342910 7fb25d7fa700  1 == starting new request
req=0x13994c0 =
2012-09-11 19:37:04.342948 7fb25d7fa700  2 req 39665:0.38initializing
2012-09-11 19:37:04.342971 7fb25d7fa700 10
s->object=images/pulscms/ZjM7MDA_/d6d6df3de5afa365d0fb7379fdbd75b8.jpg
s->bucket=ocdn
2012-09-11 19:37:04.342983 7fb25d7fa700 10 meta>> HTTP_X_AMZ_ACL=public-read
2012-09-11 19:37:04.342991 7fb25d7fa700 10 x>> x-amz-acl:public-read
2012-09-11 19:37:04.342996 7fb25d7fa700 20 FCGI_ROLE=RESPONDER
2012-09-11 19:37:04.342997 7fb25d7fa700 20 SCRIPT_FILENAME=/var/www/radosgw.fcgi
2012-09-11 19:37:04.342999 7fb25d7fa700 20 QUERY_STRING=acl
2012-09-11 19:37:04.343001 7fb25d7fa700 20 REQUEST_METHOD=PUT
2012-09-11 19:37:04.343002 7fb25d7fa700 20 CONTENT_TYPE=
2012-09-11 19:37:04.343003 7fb25d7fa700 20 CONTENT_LENGTH=0
2012-09-11 19:37:04.343004 7fb25d7fa700 20 HTTP_CONTENT_LENGTH=0
2012-09-11 19:37:04.343005 7fb25d7fa700 20
SCRIPT_NAME=/ocdn/images/pulscms/ZjM7MDA_/d6d6df3de5afa365d0fb7379fdbd75b8.jpg
2012-09-11 19:37:04.343006 7fb25d7fa700 20
REQUEST_URI=/ocdn/images/pulscms/ZjM7MDA_/d6d6df3de5afa365d0fb7379fdbd75b8.jpg
2012-09-11 19:37:04.343007 7fb25d7fa700 20
DOCUMENT_URI=/ocdn/images/pulscms/ZjM7MDA_/d6d6df3de5afa365d0fb7379fdbd75b8.jpg
2012-09-11 19:37:04.343008 7fb25d7fa700 20 DOCUMENT_ROOT=/var/www
2012-09-11 19:37:04.343009 7fb25d7fa700 20 SERVER_PROTOCOL=HTTP/1.1
2012-09-11 19:37:04.343010 7fb25d7fa700 20 GATEWAY_INTERFACE=CGI/1.1
2012-09-11 19:37:04.343011 7fb25d7fa700 20 SERVER_SOFTWARE=nginx/1.2.0
2012-09-11 19:37:04.343012 7fb25d7fa700 20 REMOTE_ADDR=10.177.62.9
2012-09-11 19:37:04.343013 7fb25d7fa700 20 REMOTE_PORT=56378
2012-09-11 19:37:04.343014 7fb25d7fa700 20 SERVER_ADDR=10.177.0.3
2012-09-11 19:37:04.343015 7fb25d7fa700 20 SERVER_PORT=80
2012-09-11 19:37:04.343016 7fb25d7fa700 20 SERVER_NAME=
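One classic way ACL PUTs alone start returning 403 is a signature mismatch on the subresource: under S3 v2 signing, `?acl` must be appended to the CanonicalizedResource and `x-amz-acl` folded into the canonicalized headers, so a client/library disagreement there breaks only ACL calls while plain GET/PUT keep working. A self-contained sketch of that string-to-sign (the secret, bucket and key here are made up):

```python
import base64, hashlib, hmac

def sign_v2_put_acl(secret, bucket, key, date, canned_acl):
    # StringToSign for: PUT /<bucket>/<key>?acl  with  x-amz-acl: <canned_acl>
    string_to_sign = "\n".join([
        "PUT",                             # HTTP verb
        "",                                # Content-MD5 (empty)
        "",                                # Content-Type (empty)
        date,                              # Date header
        "x-amz-acl:%s" % canned_acl,       # canonicalized x-amz-* headers
        "/%s/%s?acl" % (bucket, key),      # resource incl. the ?acl subresource
    ])
    digest = hmac.new(secret.encode(), string_to_sign.encode(), hashlib.sha1)
    return base64.b64encode(digest.digest()).decode()

sig = sign_v2_put_acl("secret", "ocdn", "test3",
                      "Tue, 11 Sep 2012 21:22:48 GMT", "public-read")
print(sig)  # a 28-char base64-encoded HMAC-SHA1 signature
```

If either side drops the `?acl` suffix or the `x-amz-acl` header from this computation, the signatures stop matching and the gateway answers 403.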

Re: Access Denied for bucket upload - 403 code

2012-09-11 Thread Sławomir Skowron
e request
2012-09-11 22:37:34.712754 7faec76c6700 10 --> Status: 403
2012-09-11 22:37:34.712766 7faec76c6700 10 --> Content-Length: 78
2012-09-11 22:37:34.712769 7faec76c6700 10 --> Accept-Ranges: bytes
2012-09-11 22:37:34.712772 7faec76c6700 10 --> Content-type: application/xml
2012-09-11 22:37:34.712887 7faec76c6700  2 req 71188:0.004554:s3:PUT
/ocdn/files/pulscms/MjU7MDA_/ecebacddde95224b96f46333912049b1:put_obj:http
status=403
2012-09-11 22:37:34.713093 7faec76c6700  1 == req done
req=0x1368860 http_status=403 ==

On Tue, Sep 11, 2012 at 7:46 PM, Sławomir Skowron  wrote:
> On Tue, Sep 11, 2012 at 6:48 PM, Yehuda Sadeh  wrote:
>> On Tue, Sep 11, 2012 at 9:45 AM, Yehuda Sadeh  wrote:
>>> On Tue, Sep 11, 2012 at 7:28 AM, Sławomir Skowron  wrote:
>>>> Every acl operation ending with 403 in PUT.
>>>>
>>>> ~# s3 -u test oc
>>>>  Bucket  Status
>>>>   
>>>> 
>>>> ocAccess Denied
>>>>
>>>> Anyone know why, and how to enable this bucket ?? Now i have problems
>>>> with cluster, because there is no way to upload new file
>>>>
>>>> ~# s3 -u getacl oc
>>>>
>>>> ERROR: ErrorAccessDenied
>>>>
>>>
>>> User somehow lost bucket ownership (was it actually the owner?). Do
>>> you know how to reproduce the issue? any remaining logs?
>>>
>>> Try getting bucket info:
>>>
>>> # radosgw-admin bucket stats --bucket=oc
>>>
>>> If that doesn't fail and actually shows relevant info, try checking
>>> whether the user credentials match the s3 tool credentials.
>>>
>> Oh, and thinking about it some more.. 'oc' is a too short name for a
>> bucket (requires min of 3 chars). How did you create it? The failure
>> may be related.
>
> Yes i made a shortcut of name :))
>
> Right now every bucket in pool, are afected
>
> :~#radosgw-admin bucket stats --bucket=lvstest
> { "bucket": "lvstest",
>   "pool": ".rgw.buckets",
>   "id": "1142048.1",
>   "marker": "1142048.1",
>   "owner": "0",
>   "usage": { "rgw.main": { "size_kb": 1,
>   "size_kb_actual": 4,
>   "num_objects": 1}}}
> :~# radosgw-admin bucket stats --bucket=ocdn
> { "bucket": "ocdn",
>   "pool": ".rgw.buckets",
>   "id": "4168.2",
>   "marker": "4168.2",
>   "owner": "0",
>   "usage": { "rgw.main": { "size_kb": 513059717,
>   "size_kb_actual": 516402364,
>   "num_objects": 1606730}}}
>
> Credentials from radosgw-admin user info match that from clients requests.
>
> Every GET, PUT, HEAD using this credentials works fine, but only one
> operations does not work (403 from radosgw) - setting acl for object
> for a public-read. Setting canned acl with PUT for public-read from
> s3lib work good, but get/set acl failed.
>
> list bucket object works good, and list buckets via s3 client.
>
> Now i can't reproduce, but i will dig logs from radosgw, for related
> time, when this happend.
>
> Example 403 from radosgw log, before that PUT of object ends with 200:
>
> 2012-09-11 19:36:34.346312 7fb25d7fa700  1 == req done
> req=0x1435980 http_status=403 ==
> 2012-09-11 19:37:04.342894 7fb25d7fa700 20 dequeued request req=0x13994c0
> 2012-09-11 19:37:04.342903 7fb25d7fa700 20 RGWWQ: empty
> 2012-09-11 19:37:04.342910 7fb25d7fa700  1 == starting new request
> req=0x13994c0 =
> 2012-09-11 19:37:04.342948 7fb25d7fa700  2 req 39665:0.38initializing
> 2012-09-11 19:37:04.342971 7fb25d7fa700 10
> s->object=images/pulscms/ZjM7MDA_/d6d6df3de5afa365d0fb7379fdbd75b8.jpg
> s->bucket=ocdn
> 2012-09-11 19:37:04.342983 7fb25d7fa700 10 meta>> HTTP_X_AMZ_ACL=public-read
> 2012-09-11 19:37:04.342991 7fb25d7fa700 10 x>> x-amz-acl:public-read
> 2012-09-11 19:37:04.342996 7fb25d7fa700 20 FCGI_ROLE=RESPONDER
> 2012-09-11 19:37:04.342997 7fb25d7fa700 20 
> SCRIPT_FILENAME=/var/www/radosgw.fcgi
> 2012-09-11 19:37:04.342999 7fb25d7fa700 20 QUERY_STRING=acl
> 2012-09-11 19:37:04.343001 7fb25d7fa700 20 REQUEST_METHOD=PUT
> 2012-09-11 19:37:04.343002 7fb25d7fa700 20 CONTENT_TYPE=
> 2012-09-11 19:37:04.343003 7fb25d7fa700 20 CONTENT_LENGTH=0
> 2012-09-11 19

Re: Access Denied for bucket upload - 403 code

2012-09-11 Thread Sławomir Skowron
j=.rgw:ocdn is
not atomic, not appending atomic test
2012-09-11 23:22:48.294892 7f67bd722700 20 rados->read obj-ofs=0
read_ofs=0 read_len=16384
2012-09-11 23:22:48.296079 7f67bd722700 20 rados->read r=0 bl.length=65
2012-09-11 23:22:48.296100 7f67bd722700 20 rgw_get_bucket_info:
bucket=ocdn(@.rgw.buckets[4168.2]) owner 0
2012-09-11 23:22:48.296112 7f67bd722700 20 get_obj_state:
rctx=0x7f6a8800a6f0 obj=ocdn: state=0x7f6a8800b3b8 s->prefetch_data=0
2012-09-11 23:22:48.296129 7f67bd722700 15 Read
AccessControlPolicyhttp://s3.amazonaws.com/doc/2006-03-01/";>0ocdnhttp://www.w3.org/2001/XMLSchema-instance";
xsi:type="Group">http://acs.amazonaws.com/groups/global/AllUsersREADhttp://www.w3.org/2001/XMLSchema-instance";
xsi:type="CanonicalUser">0ocdnFULL_CONTROL
2012-09-11 23:22:48.296142 7f67bd722700  2 req 75477:0.008146:s3:PUT
/ocdn/test3:put_obj:verifying op permissions
2012-09-11 23:22:48.296146 7f67bd722700  5 Searching permissions for
uid=0 mask=2
2012-09-11 23:22:48.296148 7f67bd722700  5 Found permission: 15
2012-09-11 23:22:48.296149 7f67bd722700 10  uid=0 requested perm
(type)=2, policy perm=2, user_perm_mask=2, acl perm=2
2012-09-11 23:22:48.296151 7f67bd722700  2 req 75477:0.008156:s3:PUT
/ocdn/test3:put_obj:verifying op params
2012-09-11 23:22:48.296154 7f67bd722700  2 req 75477:0.008159:s3:PUT
/ocdn/test3:put_obj:executing
2012-09-11 23:22:48.296240 7f67bd722700 10 x>> x-amz-acl:public-read
2012-09-11 23:22:48.296246 7f67bd722700 10 x>> x-amz-date:Tue, 11 Sep
2012 21:22:48 GMT
2012-09-11 23:22:48.296258 7f67bd722700 20 get_obj_state:
rctx=0x7f6a8800a6f0 obj=ocdn:test3 state=0x7f6a8800d758
s->prefetch_data=0
2012-09-11 23:22:48.297443 7f67bd722700 20
prepare_atomic_for_write_impl: state is not atomic.
state=0x7f6a8800d758
2012-09-11 23:22:48.315654 7f67bd722700 10 --> ETag: "a241f32b8cf07f90d36b5199
2012-09-11 23:22:48.315678 7f67bd722700 10 --> Content-Length: 0
2012-09-11 23:22:48.315680 7f67bd722700 10 --> Accept-Ranges: bytes
2012-09-11 23:22:48.315682 7f67bd722700 10 --> Status: 200
2012-09-11 23:22:48.315685 7f67bd722700 10 --> Content-type: application/xml
2012-09-11 23:22:48.315801 7f67bd722700  2 req 75477:0.027806:s3:PUT
/ocdn/test3:put_obj:http status=200
2012-09-11 23:22:48.316010 7f67bd722700  1 == req done
req=0x274a630 http_status=200 ==


On Tue, Sep 11, 2012 at 10:46 PM, Yehuda Sadeh  wrote:
> On Tue, Sep 11, 2012 at 1:41 PM, Sławomir Skowron  wrote:
>> And more logs:
>>
>>
>> 2012-09-11 21:03:38.357304 7faf0bf4f700  1 == req done
>> req=0x141a650 http_status=403 ==
>> 2012-09-11 21:23:54.423185 7faf0bf4f700 20 dequeued request req=0x139a3d0
>> 2012-09-11 21:23:54.423192 7faf0bf4f700 20 RGWWQ: empty
>> 2012-09-11 21:23:54.423198 7faf0bf4f700  1 == starting new request
>> req=0x139a3d0 =
>> 2012-09-11 21:23:54.423237 7faf0bf4f700  2 req 58098:0.39initializing
>> 2012-09-11 21:23:54.423258 7faf0bf4f700 10 s->object= s->bucket=
>> 2012-09-11 21:23:54.423265 7faf0bf4f700 20 FCGI_ROLE=RESPONDER
>> 2012-09-11 21:23:54.423267 7faf0bf4f700 20 
>> SCRIPT_FILENAME=/var/www/radosgw.fcgi
>> 2012-09-11 21:23:54.423269 7faf0bf4f700 20 QUERY_STRING=
>> 2012-09-11 21:23:54.423270 7faf0bf4f700 20 REQUEST_METHOD=GET
>> 2012-09-11 21:23:54.423272 7faf0bf4f700 20 CONTENT_TYPE=
>> 2012-09-11 21:23:54.423273 7faf0bf4f700 20 CONTENT_LENGTH=
>> 2012-09-11 21:23:54.423274 7faf0bf4f700 20 HTTP_CONTENT_LENGTH=
>> 2012-09-11 21:23:54.423276 7faf0bf4f700 20 SCRIPT_NAME=/
>> 2012-09-11 21:23:54.423277 7faf0bf4f700 20 REQUEST_URI=/
>> 2012-09-11 21:23:54.423279 7faf0bf4f700 20 DOCUMENT_URI=/
>> 2012-09-11 21:23:54.423280 7faf0bf4f700 20 DOCUMENT_ROOT=/var/www
>> 2012-09-11 21:23:54.423282 7faf0bf4f700 20 SERVER_PROTOCOL=HTTP/1.0
>> 2012-09-11 21:23:54.423283 7faf0bf4f700 20 GATEWAY_INTERFACE=CGI/1.1
>> 2012-09-11 21:23:54.423284 7faf0bf4f700 20 SERVER_SOFTWARE=nginx/1.2.0
>> 2012-09-11 21:23:54.423286 7faf0bf4f700 20 REMOTE_ADDR=10.177.95.19
>> 2012-09-11 21:23:54.423287 7faf0bf4f700 20 REMOTE_PORT=60477
>> 2012-09-11 21:23:54.423289 7faf0bf4f700 20 SERVER_ADDR=10.177.64.4
>> 2012-09-11 21:23:54.423290 7faf0bf4f700 20 SERVER_PORT=80
>> ...skipping...
>> 2012-09-11 22:23:44.530567 7faf0bf4f700 10
>> s->object=images/pulscms/NjQ7MDMsMWUwLDAsMCwx/0a9915212e85062de6134566905cf252.jpg
>> s->bucket=ocdn
>> 2012-09-11 22:23:44.530586 7faf0bf4f700 20 FCGI_ROLE=RESPONDER
>> 2012-09-11 22:23:44.530588 7faf0bf4f700 20 
>> SCRIPT_FILENAME=/var/www/radosgw.fcgi
>> 2012-09-11 22:23:44.530589 7faf0bf4f700 20 QUERY_STRING=
>> 2012-09-11 22:23:44.530591 7faf0bf4f700 20 REQUEST_METHOD=GET
>> 2012-09-11 22:23:44.530592 7faf0bf4f700 20 CONTENT_TYPE=
>> 2012-09

Re: Access Denied for bucket upload - 403 code

2012-09-11 Thread Sławomir Skowron
Ok, but why did this happen? No new code was deployed before this
problem appeared. Is there any way to recover the cluster to normal
operation, without Access Denied on every S3 ACL operation?

On Tue, Sep 11, 2012 at 11:32 PM, Yehuda Sadeh  wrote:
> On Tue, Sep 11, 2012 at 2:28 PM, Sławomir Skowron  wrote:
>> :~# s3 -u put ocdn/test3 cannedAcl=public-read < /tmp/testdl
>> :~# s3 -u get ocdn/test3 > /tmp/test3
>> :~# HEAD http://127.0.0.1/ocdn/test3
>> 200 OK
>> Connection: close
>> Date: Tue, 11 Sep 2012 21:23:19 GMT
>> Accept-Ranges: bytes
>> ETag: "a241f32b8cf07f90d36b5199629b8829"
>> Server: nginx
>> Content-Length: 6713
>> Last-Modified: Tue, 11 Sep 2012 21:22:48 GMT
>> Client-Date: Tue, 11 Sep 2012 21:23:19 GMT
>> Client-Peer: 127.0.0.1:80
>> Client-Response-Num: 1
>>
>> put with cannedacl forpublic works on this object.
>>
> As it should. I was referring to setacl with canned acl.
>
> Yehuda



-- 
-
Regards

Sławek "sZiBis" Skowron


Re: Access Denied for bucket upload - 403 code

2012-09-11 Thread Sławomir Skowron
Ehh, I'm out of ideas for today.

libs3 2.0-1

s3 -u create foo

Bucket successfully created.

s3 -u getacl foo | s3 -u setacl ocdn < /tmp/acl

ERROR: ErrorAccessDenied


ERROR: ErrorAccessDenied

On Tue, Sep 11, 2012 at 11:44 PM, Yehuda Sadeh  wrote:
> Maybe your s3 library got updated and now uses a newer s3 dialect?
>
> Basically you need to update the bucket acl, e.g.:
>
> # s3 -u create foo
> # s3 -u getacl foo | s3 -u setacl oldbucket < acl
>
> On Tue, Sep 11, 2012 at 2:38 PM, Sławomir Skowron  wrote:
>> Ok, but why this happend. There is no new code started before this
>> problem. Is there any way to recover cluster to normal operation
>> withoud Access Denied in s3 any acl operation ??
>>
>> On Tue, Sep 11, 2012 at 11:32 PM, Yehuda Sadeh  wrote:
>>> On Tue, Sep 11, 2012 at 2:28 PM, Sławomir Skowron  wrote:
>>>> :~# s3 -u put ocdn/test3 cannedAcl=public-read < /tmp/testdl
>>>> :~# s3 -u get ocdn/test3 > /tmp/test3
>>>> :~# HEAD http://127.0.0.1/ocdn/test3
>>>> 200 OK
>>>> Connection: close
>>>> Date: Tue, 11 Sep 2012 21:23:19 GMT
>>>> Accept-Ranges: bytes
>>>> ETag: "a241f32b8cf07f90d36b5199629b8829"
>>>> Server: nginx
>>>> Content-Length: 6713
>>>> Last-Modified: Tue, 11 Sep 2012 21:22:48 GMT
>>>> Client-Date: Tue, 11 Sep 2012 21:23:19 GMT
>>>> Client-Peer: 127.0.0.1:80
>>>> Client-Response-Num: 1
>>>>
>>>> put with cannedacl forpublic works on this object.
>>>>
>>> As it should. I was referring to setacl with canned acl.
>>>
>>> Yehuda
>>
>>
>>
>> --
>> -
>> Pozdrawiam
>>
>> Sławek "sZiBis" Skowron



-- 
-
Regards

Sławek "sZiBis" Skowron


Re: Access Denied for bucket upload - 403 code

2012-09-11 Thread Sławomir Skowron
I grepped only for 7faec76c6700.

Where is the ACL data for a bucket stored in ceph? Maybe the ACLs are
broken for the anonymous user?

Does ceph support a global ACL for a bucket?

On 12 Sep 2012, at 01:27, Yehuda Sadeh  wrote:

> On Tue, Sep 11, 2012 at 1:41 PM, Sławomir Skowron  wrote:
>> And more logs:
>>
>>
>> 2012-09-11 21:03:38.357304 7faf0bf4f700  1 == req done
>> req=0x141a650 http_status=403 ==
>> 2012-09-11 21:23:54.423185 7faf0bf4f700 20 dequeued request req=0x139a3d0
>> 2012-09-11 21:23:54.423192 7faf0bf4f700 20 RGWWQ: empty
>> 2012-09-11 21:23:54.423198 7faf0bf4f700  1 == starting new request
>> req=0x139a3d0 =
>> 2012-09-11 21:23:54.423237 7faf0bf4f700  2 req 58098:0.39initializing
>> 2012-09-11 21:23:54.423258 7faf0bf4f700 10 s->object= s->bucket=
>> 2012-09-11 21:23:54.423265 7faf0bf4f700 20 FCGI_ROLE=RESPONDER
>> 2012-09-11 21:23:54.423267 7faf0bf4f700 20 
>> SCRIPT_FILENAME=/var/www/radosgw.fcgi
>> 2012-09-11 21:23:54.423269 7faf0bf4f700 20 QUERY_STRING=
>> 2012-09-11 21:23:54.423270 7faf0bf4f700 20 REQUEST_METHOD=GET
>> 2012-09-11 21:23:54.423272 7faf0bf4f700 20 CONTENT_TYPE=
>> 2012-09-11 21:23:54.423273 7faf0bf4f700 20 CONTENT_LENGTH=
>> 2012-09-11 21:23:54.423274 7faf0bf4f700 20 HTTP_CONTENT_LENGTH=
>> 2012-09-11 21:23:54.423276 7faf0bf4f700 20 SCRIPT_NAME=/
>> 2012-09-11 21:23:54.423277 7faf0bf4f700 20 REQUEST_URI=/
>> 2012-09-11 21:23:54.423279 7faf0bf4f700 20 DOCUMENT_URI=/
>> 2012-09-11 21:23:54.423280 7faf0bf4f700 20 DOCUMENT_ROOT=/var/www
>> 2012-09-11 21:23:54.423282 7faf0bf4f700 20 SERVER_PROTOCOL=HTTP/1.0
>> 2012-09-11 21:23:54.423283 7faf0bf4f700 20 GATEWAY_INTERFACE=CGI/1.1
>> 2012-09-11 21:23:54.423284 7faf0bf4f700 20 SERVER_SOFTWARE=nginx/1.2.0
>> 2012-09-11 21:23:54.423286 7faf0bf4f700 20 REMOTE_ADDR=10.177.95.19
>> 2012-09-11 21:23:54.423287 7faf0bf4f700 20 REMOTE_PORT=60477
>> 2012-09-11 21:23:54.423289 7faf0bf4f700 20 SERVER_ADDR=10.177.64.4
>> 2012-09-11 21:23:54.423290 7faf0bf4f700 20 SERVER_PORT=80
>> ...skipping...
>> 2012-09-11 22:23:44.530567 7faf0bf4f700 10
>> s->object=images/pulscms/NjQ7MDMsMWUwLDAsMCwx/0a9915212e85062de6134566905cf252.jpg
>> s->bucket=ocdn
>> 2012-09-11 22:23:44.530586 7faf0bf4f700 20 FCGI_ROLE=RESPONDER
>> 2012-09-11 22:23:44.530588 7faf0bf4f700 20 
>> SCRIPT_FILENAME=/var/www/radosgw.fcgi
>> 2012-09-11 22:23:44.530589 7faf0bf4f700 20 QUERY_STRING=
>> 2012-09-11 22:23:44.530591 7faf0bf4f700 20 REQUEST_METHOD=GET
>> 2012-09-11 22:23:44.530592 7faf0bf4f700 20 CONTENT_TYPE=
>> 2012-09-11 22:23:44.530593 7faf0bf4f700 20 CONTENT_LENGTH=
>> 2012-09-11 22:23:44.530594 7faf0bf4f700 20 HTTP_CONTENT_LENGTH=
>> 2012-09-11 22:23:44.530595 7faf0bf4f700 20
>> SCRIPT_NAME=/ocdn/images/pulscms/NjQ7MDMsMWUwLDAsMCwx/0a9915212e85062de6134566905cf252.jpg
>> 2012-09-11 22:23:44.530596 7faf0bf4f700 20
>> REQUEST_URI=/ocdn/images/pulscms/NjQ7MDMsMWUwLDAsMCwx/0a9915212e85062de6134566905cf252.jpg
>> 2012-09-11 22:23:44.530598 7faf0bf4f700 20
>> DOCUMENT_URI=/ocdn/images/pulscms/NjQ7MDMsMWUwLDAsMCwx/0a9915212e85062de6134566905cf252.jpg
>> 2012-09-11 22:23:44.530600 7faf0bf4f700 20 DOCUMENT_ROOT=/var/www
>> 2012-09-11 22:23:44.530603 7faf0bf4f700 20 SERVER_PROTOCOL=HTTP/1.1
>> 2012-09-11 22:23:44.530604 7faf0bf4f700 20 GATEWAY_INTERFACE=CGI/1.1
>> 2012-09-11 22:23:44.530605 7faf0bf4f700 20 SERVER_SOFTWARE=nginx/1.2.0
>> 2012-09-11 22:23:44.530606 7faf0bf4f700 20 REMOTE_ADDR=10.167.14.53
>> 2012-09-11 22:23:44.530607 7faf0bf4f700 20 REMOTE_PORT=62145
>> 2012-09-11 22:23:44.530608 7faf0bf4f700 20 SERVER_ADDR=10.177.64.4
>> 2012-09-11 22:23:44.530609 7faf0bf4f700 20 SERVER_PORT=80
>> 2012-09-11 22:23:44.530610 7faf0bf4f700 20 SERVER_NAME=
>> 2012-09-11 22:23:44.530610 7faf0bf4f700 20 REDIRECT_STATUS=200
>> 2012-09-11 22:23:44.530611 7faf0bf4f700 20 RGW_SHOULD_LOG=no
>> 2012-09-11 22:23:44.530612 7faf0bf4f700 20 HTTP_HOST=10.177.64.4
>> 2012-09-11 22:23:44.530613 7faf0bf4f700 20 HTTP_CONNECTION=keep-alive
>> 2012-09-11 22:23:44.530614 7faf0bf4f700 20 HTTP_USER_AGENT=Mozilla/5.0
>> (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like
>> Gecko) Chrome/21.0.1180.89 Safari/537.1
>> 2012-09-11 22:23:44.530615 7faf0bf4f700 20
>> HTTP_ACCEPT=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>> 2012-09-11 22:23:44.530616 7faf0bf4f700 20
>> HTTP_ACCEPT_ENCODING=gzip,deflate,sdch
>> 2012-09-11 22:23:44.530617 7faf0bf4f700 20 
>> HTTP_ACCEPT_LANGUAGE=en-US,en;q=0.8
>> 2012-09-11 22:23:44.530618 7faf0bf4f700 20
>> HTTP_ACCEPT_CHARSET=ISO-8859-1,utf-8;q=0.7,*;q=0.3
>> 201

Re: Access Denied for bucket upload - 403 code

2012-09-12 Thread Sławomir Skowron
0:51.935424 7f0939ffb700 20 get_obj_state:
rctx=0x7f091400c1f0
obj=ocdn:images/pulscms/YTU7MDYsNWEsM2E_/d4716c54c40099f7df31224486020516.jpg
state=0x7f0914022c48 s->prefetch_data=1
2012-09-12 09:30:51.935426 7f0939ffb700 20 state->obj_tag is empty,
not appending atomic test
2012-09-12 09:30:51.935436 7f0939ffb700 10 --> Content-Length: 2219
2012-09-12 09:30:51.935438 7f0939ffb700 10 --> Accept-Ranges: bytes
2012-09-12 09:30:51.935447 7f0939ffb700 10 --> Last-Modified: Tue, 11 Sep 2012
2012-09-12 09:30:51.935451 7f0939ffb700 10 --> ETag: "cd96f6c7632c9f782967c665
2012-09-12 09:30:51.935456 7f0939ffb700 10 --> Status: 200
2012-09-12 09:30:51.935458 7f0939ffb700 10 --> Content-type:
2012-09-12 09:30:51.935574 7f0939ffb700  2 req 16954:0.005754:s3:GET
/ocdn/images/pulscms/YTU7MDYsNWEsM2E_/d4716c54c40099f7df31224486020516.jpg:get_obj:http
status=200

The first cluster, with the problem:

2012-09-12 09:41:21.438096 7fa257fff700 20 rgw_get_bucket_info:
bucket=ocdn(@.rgw.buckets[4168.2]) owner 0
2012-09-12 09:41:21.438110 7fa257fff700 20 get_obj_state:
rctx=0x7fa11c0137b0 obj=ocdn: state=0x7fa11c00ee18 s->prefetch_data=0
2012-09-12 09:41:21.438126 7fa257fff700 15 Read
AccessControlPolicyhttp://s3.amazonaws.com/doc/2006-03-01/";>0ocdnhttp://www.w3.org/2001/XMLSchema-instance";
xsi:type="Group">http://acs.amazonaws.com/groups/global/AllUsersREADhttp://www.w3.org/2001/XMLSchema-instance";
xsi:type="CanonicalUser">0ocdnFULL_CONTROL
2012-09-12 09:41:21.438148 7fa257fff700 20 get_obj_state:
rctx=0x7fa11c0137b0
obj=ocdn:images/pulscms/ZWI7MDMsMWUwLDAsMSwx/2cbff537b4543942d6571124b9cc3910.jpg
state=0x7fa11c0122f8 s->prefetch_data=1
2012-09-12 09:41:21.440954 7fa257fff700 20 get_obj_state: s->obj_tag
was set empty
2012-09-12 09:41:21.440972 7fa257fff700 15 Read
AccessControlPolicyhttp://s3.amazonaws.com/doc/2006-03-01/";>0ocdnhttp://www.w3.org/2001/XMLSchema-instance";
xsi:type="CanonicalUser">0ocdnFULL_CONTROL
2012-09-12 09:41:21.440984 7fa257fff700  2 req 7430:0.006465:s3:GET
/ocdn/images/pulscms/ZWI7MDMsMWUwLDAsMSwx/2cbff537b4543942d6571124b9cc3910.jpg:get_obj:verifying
op permissions
2012-09-12 09:41:21.440991 7fa257fff700  5 Searching permissions for
uid=anonymous mask=1
2012-09-12 09:41:21.440994 7fa257fff700  5 Permissions for user not found
2012-09-12 09:41:21.440995 7fa257fff700  5 Searching permissions for
group=1 mask=1
2012-09-12 09:41:21.440996 7fa257fff700  5 Permissions for group not found
2012-09-12 09:41:21.440997 7fa257fff700  5 Getting permissions
id=anonymous owner=0 perm=0
2012-09-12 09:41:21.440999 7fa257fff700 10  uid=anonymous requested
perm (type)=1, policy perm=0, user_perm_mask=15, acl perm=0
2012-09-12 09:41:21.441001 7fa257fff700  5 Searching permissions for
uid=anonymous mask=16
2012-09-12 09:41:21.441003 7fa257fff700  5 Permissions for user not found
2012-09-12 09:41:21.441004 7fa257fff700  5 Searching permissions for
group=1 mask=16
2012-09-12 09:41:21.441005 7fa257fff700  5 Found permission: 1
2012-09-12 09:41:21.441006 7fa257fff700  5 Getting permissions
id=anonymous owner=0 perm=0
2012-09-12 09:41:21.441007 7fa257fff700 10  uid=anonymous requested
perm (type)=16, policy perm=0, user_perm_mask=16, acl perm=0
2012-09-12 09:41:21.441016 7fa257fff700 10 --> Status: 403
2012-09-12 09:41:21.441027 7fa257fff700 10 --> Content-Length: 78
2012-09-12 09:41:21.441029 7fa257fff700 10 --> Accept-Ranges: bytes
2012-09-12 09:41:21.441032 7fa257fff700 10 --> Content-type: application/xml
2012-09-12 09:41:21.441138 7fa257fff700  2 req 7430:0.006620:s3:GET
/ocdn/images/pulscms/ZWI7MDMsMWUwLDAsMSwx/2cbff537b4543942d6571124b9cc3910.jpg:get_obj:http
status=403


On Wed, Sep 12, 2012 at 7:07 AM, Sławomir Skowron  wrote:
> I grep only for 7faec76c6700
>
> Where are stored acl data for bucket in ceph?? Maybe acl are broken
> for anonymous user ??
>
> Ceph supporting global acl for bucket??
>
> Dnia 12 wrz 2012 o godz. 01:27 Yehuda Sadeh  napisał(a):
>
>> On Tue, Sep 11, 2012 at 1:41 PM, Sławomir Skowron  wrote:
>>> And more logs:
>>>
>>>
>>> 2012-09-11 21:03:38.357304 7faf0bf4f700  1 == req done
>>> req=0x141a650 http_status=403 ==
>>> 2012-09-11 21:23:54.423185 7faf0bf4f700 20 dequeued request req=0x139a3d0
>>> 2012-09-11 21:23:54.423192 7faf0bf4f700 20 RGWWQ: empty
>>> 2012-09-11 21:23:54.423198 7faf0bf4f700  1 == starting new request
>>> req=0x139a3d0 =
>>> 2012-09-11 21:23:54.423237 7faf0bf4f700  2 req 
>>> 58098:0.39initializing
>>> 2012-09-11 21:23:54.423258 7faf0bf4f700 10 s->object= s->bucket=
>>> 2012-09-11 21:23:54.423265 7faf0bf4f700 20 FCGI_ROLE=RESPONDER
>>> 2012-09-11 21:23:54.423267 7faf0bf4f700 20 
>>> SCRIPT_FILENAME=/var/www/radosgw.fcgi
>>> 2012-09-11 21:23:54.423269 7faf0bf4f700 20 QUERY_ST

[Solved] Re: Access Denied for bucket upload - 403 code

2012-09-12 Thread Sławomir Skowron
Problem solved. The cause was nginx, and the request_uri in the fcgi
params passed to radosgw.

Now request_uri is correct, and the problem has disappeared.
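For readers hitting the same 403s: the fix amounted to passing the unmodified request URI through to radosgw in the fastcgi params, since radosgw uses it when verifying S3 signatures. A minimal sketch of the relevant nginx location block (the socket path and exact param set are assumptions, not the config from this setup):

```nginx
location / {
    # Socket path assumed; must match the --rgw-socket-path radosgw runs with.
    fastcgi_pass  unix:/var/run/radosgw.sock;
    include       fastcgi_params;
    # Pass the raw request URI through; rewriting or normalizing it here
    # breaks radosgw's signature check and yields 403 AccessDenied.
    fastcgi_param REQUEST_URI  $request_uri;
}
```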

Big thanks for help Yehuda.

Regards

Dnia 12 wrz 2012 o godz. 01:27 Yehuda Sadeh  napisał(a):

> On Tue, Sep 11, 2012 at 1:41 PM, Sławomir Skowron  wrote:
>> And more logs:
>>
>>
>> 2012-09-11 21:03:38.357304 7faf0bf4f700  1 == req done
>> req=0x141a650 http_status=403 ==
>> 2012-09-11 21:23:54.423185 7faf0bf4f700 20 dequeued request req=0x139a3d0
>> 2012-09-11 21:23:54.423192 7faf0bf4f700 20 RGWWQ: empty
>> 2012-09-11 21:23:54.423198 7faf0bf4f700  1 == starting new request
>> req=0x139a3d0 =
>> 2012-09-11 21:23:54.423237 7faf0bf4f700  2 req 58098:0.39initializing
>> 2012-09-11 21:23:54.423258 7faf0bf4f700 10 s->object= s->bucket=
>> 2012-09-11 21:23:54.423265 7faf0bf4f700 20 FCGI_ROLE=RESPONDER
>> 2012-09-11 21:23:54.423267 7faf0bf4f700 20 
>> SCRIPT_FILENAME=/var/www/radosgw.fcgi
>> 2012-09-11 21:23:54.423269 7faf0bf4f700 20 QUERY_STRING=
>> 2012-09-11 21:23:54.423270 7faf0bf4f700 20 REQUEST_METHOD=GET
>> 2012-09-11 21:23:54.423272 7faf0bf4f700 20 CONTENT_TYPE=
>> 2012-09-11 21:23:54.423273 7faf0bf4f700 20 CONTENT_LENGTH=
>> 2012-09-11 21:23:54.423274 7faf0bf4f700 20 HTTP_CONTENT_LENGTH=
>> 2012-09-11 21:23:54.423276 7faf0bf4f700 20 SCRIPT_NAME=/
>> 2012-09-11 21:23:54.423277 7faf0bf4f700 20 REQUEST_URI=/
>> 2012-09-11 21:23:54.423279 7faf0bf4f700 20 DOCUMENT_URI=/
>> 2012-09-11 21:23:54.423280 7faf0bf4f700 20 DOCUMENT_ROOT=/var/www
>> 2012-09-11 21:23:54.423282 7faf0bf4f700 20 SERVER_PROTOCOL=HTTP/1.0
>> 2012-09-11 21:23:54.423283 7faf0bf4f700 20 GATEWAY_INTERFACE=CGI/1.1
>> 2012-09-11 21:23:54.423284 7faf0bf4f700 20 SERVER_SOFTWARE=nginx/1.2.0
>> 2012-09-11 21:23:54.423286 7faf0bf4f700 20 REMOTE_ADDR=10.177.95.19
>> 2012-09-11 21:23:54.423287 7faf0bf4f700 20 REMOTE_PORT=60477
>> 2012-09-11 21:23:54.423289 7faf0bf4f700 20 SERVER_ADDR=10.177.64.4
>> 2012-09-11 21:23:54.423290 7faf0bf4f700 20 SERVER_PORT=80
>> ...skipping...
>> 2012-09-11 22:23:44.530567 7faf0bf4f700 10
>> s->object=images/pulscms/NjQ7MDMsMWUwLDAsMCwx/0a9915212e85062de6134566905cf252.jpg
>> s->bucket=ocdn
>> 2012-09-11 22:23:44.530586 7faf0bf4f700 20 FCGI_ROLE=RESPONDER
>> 2012-09-11 22:23:44.530588 7faf0bf4f700 20 
>> SCRIPT_FILENAME=/var/www/radosgw.fcgi
>> 2012-09-11 22:23:44.530589 7faf0bf4f700 20 QUERY_STRING=
>> 2012-09-11 22:23:44.530591 7faf0bf4f700 20 REQUEST_METHOD=GET
>> 2012-09-11 22:23:44.530592 7faf0bf4f700 20 CONTENT_TYPE=
>> 2012-09-11 22:23:44.530593 7faf0bf4f700 20 CONTENT_LENGTH=
>> 2012-09-11 22:23:44.530594 7faf0bf4f700 20 HTTP_CONTENT_LENGTH=
>> 2012-09-11 22:23:44.530595 7faf0bf4f700 20
>> SCRIPT_NAME=/ocdn/images/pulscms/NjQ7MDMsMWUwLDAsMCwx/0a9915212e85062de6134566905cf252.jpg
>> 2012-09-11 22:23:44.530596 7faf0bf4f700 20
>> REQUEST_URI=/ocdn/images/pulscms/NjQ7MDMsMWUwLDAsMCwx/0a9915212e85062de6134566905cf252.jpg
>> 2012-09-11 22:23:44.530598 7faf0bf4f700 20
>> DOCUMENT_URI=/ocdn/images/pulscms/NjQ7MDMsMWUwLDAsMCwx/0a9915212e85062de6134566905cf252.jpg
>> 2012-09-11 22:23:44.530600 7faf0bf4f700 20 DOCUMENT_ROOT=/var/www
>> 2012-09-11 22:23:44.530603 7faf0bf4f700 20 SERVER_PROTOCOL=HTTP/1.1
>> 2012-09-11 22:23:44.530604 7faf0bf4f700 20 GATEWAY_INTERFACE=CGI/1.1
>> 2012-09-11 22:23:44.530605 7faf0bf4f700 20 SERVER_SOFTWARE=nginx/1.2.0
>> 2012-09-11 22:23:44.530606 7faf0bf4f700 20 REMOTE_ADDR=10.167.14.53
>> 2012-09-11 22:23:44.530607 7faf0bf4f700 20 REMOTE_PORT=62145
>> 2012-09-11 22:23:44.530608 7faf0bf4f700 20 SERVER_ADDR=10.177.64.4
>> 2012-09-11 22:23:44.530609 7faf0bf4f700 20 SERVER_PORT=80
>> 2012-09-11 22:23:44.530610 7faf0bf4f700 20 SERVER_NAME=
>> 2012-09-11 22:23:44.530610 7faf0bf4f700 20 REDIRECT_STATUS=200
>> 2012-09-11 22:23:44.530611 7faf0bf4f700 20 RGW_SHOULD_LOG=no
>> 2012-09-11 22:23:44.530612 7faf0bf4f700 20 HTTP_HOST=10.177.64.4
>> 2012-09-11 22:23:44.530613 7faf0bf4f700 20 HTTP_CONNECTION=keep-alive
>> 2012-09-11 22:23:44.530614 7faf0bf4f700 20 HTTP_USER_AGENT=Mozilla/5.0
>> (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like
>> Gecko) Chrome/21.0.1180.89 Safari/537.1
>> 2012-09-11 22:23:44.530615 7faf0bf4f700 20
>> HTTP_ACCEPT=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>> 2012-09-11 22:23:44.530616 7faf0bf4f700 20
>> HTTP_ACCEPT_ENCODING=gzip,deflate,sdch
>> 2012-09-11 22:23:44.530617 7faf0bf4f700 20 
>> HTTP_ACCEPT_LANGUAGE=en-US,en;q=0.8
>> 2012-09-11 22:23:44.530618 7faf0bf4f700 20
>> HTTP_ACCEPT_CHARSET=ISO-8859-1,utf-8;q=0.7,*;q=0.3
>> 201

Re: Inject configuration change into cluster

2012-09-16 Thread Sławomir Skowron
OK, I will try, but I have all-day meetings today and tomorrow.

One more question: is there any way to check the configuration not from
ceph.conf, but from a running daemon in the cluster?

On Fri, Sep 14, 2012 at 9:12 PM, Sage Weil  wrote:
> Hi,
>
> > Dnia 5 wrz 2012 o godz. 17:53 "Sage Weil"  napisał(a):
>>
> > > On Wed, 5 Sep 2012, Sławomir Skowron wrote:
>> >> Ok, but in global case, when i use, a chef/puppet or any other, i wish
>> >> to change configuration in ceph.conf, and reload daemon to get new
>> >> configuration changes from ceph.conf, this feature would be very
>> >> helpful in ceph administration.
>> >>
>> >> Inject is ok, for testing, or debuging.
>> >
>> > Opened http://tracker.newdream.net/issues/3086.  There is already an issue
>> > open for reloading the config file,
>> > http://tracker.newdream.net/issues/2459.
>> >
>>
>> Thanks. This is what i mean.
>> Is this feature have a chance, to be implemented in next months ??
>
> I pushed a branch, wip-tp, that implements this.  Want to give it a spin?
> Just inject the config option change (via the admin socket or injectargs)
> and it'll resize the thread pool.
>
> sage
>
>>
>> > sage
>> >
>> >>
>> >> On Tue, Sep 4, 2012 at 5:34 PM, Sage Weil  wrote:
>> >>> On Tue, 4 Sep 2012, Wido den Hollander wrote:
>>  On 09/04/2012 10:30 AM, Skowron Sławomir wrote:
>> > Ok, thanks.
>> >
>> > Number of workers used for recover, or numer of disk threads.
>> >
>> 
>>  I think those can be changed while the OSD is running. You could always 
>>  give
>>  it a try.
>> >>>
>> >>> The thread pool sizes can't currently be adjusted, unfortunately.  It
>> >>> wouldn't be too difficult to change that...
>> >>>
>> >>> sage
>> >>>
>> 
>>  Wido
>> 
>> > -Original Message-
>> > From: Wido den Hollander [mailto:w...@widodh.nl]
>> > Sent: Tuesday, September 04, 2012 10:18 AM
>> > To: Skowron Sławomir
>> > Cc: ceph-devel@vger.kernel.org
>> > Subject: Re: Inject configuration change into cluster
>> >
>> > On 09/04/2012 07:04 AM, Skowron Sławomir wrote:
>> >> Is there any way now, to inject new configuration change, without
>> >> restarting daemons ??
>> >>
>> >
>> > Yes, you can use the injectargs command.
>> >
>> > $ ceph osd tell 0 injectargs '--debug-osd 20'
>> >
>> > What do you want to change? Not everything can be changed while the 
>> > OSD is
>> > running.
>> >
>> > Wido
>> >
>> >> Regards
>> >>
>> >> Slawomir Skowron--
>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> >> in the body of a message to majord...@vger.kernel.org More majordomo
>> >> info at  http://vger.kernel.org/majordomo-info.html
>> >>
>> >
>> >
>> 
>>  --
>>  To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>  the body of a message to majord...@vger.kernel.org
>>  More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> 
>> >>> --
>> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >>> the body of a message to majord...@vger.kernel.org
>> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >>
>> >>
>> >>
>> >> --
>> >> -
>> >> Pozdrawiam
>> >>
>> >> S?awek "sZiBis" Skowron
>> >>
>> >>
>>



-- 
-
Pozdrawiam

Sławek "sZiBis" Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


unexpected problem with radosgw fcgi

2012-11-07 Thread Sławomir Skowron
I have realized that requests served by radosgw through fastcgi in nginx return:

HTTP/1.1 200, not HTTP/1.1 200 OK

Any other CGI that I run, for example PHP via fastcgi, returns this as
the RFC says, with OK.

Has anyone experienced this problem?

I see in code:

./src/rgw/rgw_rest.cc line 36

const static struct rgw_html_errors RGW_HTML_ERRORS[] = {
{ 0, 200, "" },


What if i change this into:

{ 0, 200, "OK" },
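For what it's worth, the third field above is the S3 error code embedded in the response XML, not the HTTP reason phrase, so changing it would not alter the status line. The reason phrase for 200 comes from the HTTP layer; a quick illustration using Python's standard status table (purely illustrative, not radosgw code):

```python
from http.client import responses

# RFC 2616 treats the reason phrase as advisory, but the conventional
# status line for code 200 includes "OK":
status_line = f"HTTP/1.1 200 {responses[200]}"
print(status_line)  # HTTP/1.1 200 OK
```

Since the phrase is advisory, clients should key off the numeric code; the bare `HTTP/1.1 200` seen here is unusual but not strictly invalid.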

--
-
Regards

Sławek Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: unexpected problem with radosgw fcgi

2012-11-08 Thread Sławomir Skowron
OK, I will dig into nginx, thanks.

Dnia 8 lis 2012 o godz. 22:48 Yehuda Sadeh  napisał(a):

> On Wed, Nov 7, 2012 at 6:16 AM, Sławomir Skowron  wrote:
>> I have realize that requests from fastcgi in nginx from radosgw returning:
>>
>> HTTP/1.1 200, not a HTTP/1.1 200 OK
>>
>> Any other cgi that i run, for example php via fastcgi return this like
>> RFC says, with OK.
>>
>> Is someone experience this problem ??
>
> I have seen a similar issue in the past with nginx. It doesn't happen
> with apache. My guess is that it's either something with the way nginx
> is configured, or some difference in the fastcgi module
> implementation.
>
>>
>> I see in code:
>>
>> ./src/rgw/rgw_rest.cc line 36
>>
>> const static struct rgw_html_errors RGW_HTML_ERRORS[] = {
>>{ 0, 200, "" },
>> 
>>
>> What if i change this into:
>>
>> { 0, 200, "OK" },
>
> The third field there specifies the error code embedded in the
> returned XML with S3, so it wouldn't fix anything.
>
>
> Yehuda
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Many dns domain names in radosgw

2012-11-17 Thread Sławomir Skowron
Welcome,

I have a question. Is there any way to support multiple domain names
in one radosgw with virtual-host-style bucket access in S3?

Regards

SS
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Two questions

2011-07-26 Thread Sławomir Skowron
Hello. I have some questions.

1. Is there any chance to change the default 4MB object size to, for
example, 1MB or less?

2. I have created a cluster of two mons and 32 OSDs (1TB each) on two
machines, with radosgw and apache2 on top for testing. When I put data
from an S3 client into RADOS, everything is OK, but when one of the
OSDs goes down, all S3 clients freeze for some seconds before Ceph
marks the OSD as down; then everything starts working again with the
degraded OSD. How can I tune this down-marking to milliseconds rather
than a freeze of several seconds?

My setup is debian 6.0, kernel 2.6.37.6  x86_64 and ceph version
0.31-10 from stable repo.

Best regards

Slawomir Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Two questions

2011-07-27 Thread Sławomir Skowron
Thanks.

2011/7/27 Wido den Hollander :
> Hi,
>
> On Wed, 2011-07-27 at 07:58 +0200, Sławomir Skowron wrote:
>> Hello. I have some questions.
>>
>> 1. Is there any chance to change default 4MB object size to for
>> example 1MB or less ??
>
> If you are using the filesystem, you can change the stripe-size per
> directory with the "cephfs" tool.

Unfortunately I only use the RADOS layer in this case, and I don't even
have an MDS :) Is there any chance to change this at the RADOS layer?

>>
>> 2. I have create cluster of two mons, and 32 osd (1TB each) on two
>> machines. At this radosgw with apache2 for test. When i putting data
>> from s3 client to rados, everything is ok, but when one of osd is
>> going down all s3 clients going to be freeze for some seconds, before
>> ceph sign this osd as down, and then everything is starting working
>> again wit degraded osd. How can i tune this marking as a down for ms
>> or 0 :) and not for seconds freeze ??
>
> Normally a OSD should be down within about 10 seconds and shouldn't
> affect all I/O operations.
>
> By default it takes 300 seconds before a OSD gets marked as "out", this
> is handled by "mon osd down out interval"
>
> If you want to change the "down" reporting behaviour you could do this
> with "osd min down reporters" and "osd min down reports". I'm not sure
> you really want to do this, as the default values should be ok.

Ok. I will check this.

> Wido
>
>>
>> My setup is debian 6.0, kernel 2.6.37.6  x86_64 and ceph version
>> 0.31-10 from stable repo.
>>
>> Best regards
>>
>> Slawomir Skowron
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>

-- 
-
Pozdrawiam

Sławek "sZiBis" Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Two questions

2011-07-27 Thread Sławomir Skowron
Ok, I will show example:

rados df
pool name       KB          objects   clones   degraded   unfound   rd   rd KB   wr         wr KB
.log            558212      5         0        0          0         0    0       2844888    2844888
.pool           1           1         0        0          0         0    0       8          8
.rgw            0           6         0        0          0         0    0       1          0
.users          1           1         0        0          0         0    0       1          1
.users.email    1           1         0        0          0         0    0       1          1
.users.uid      2           2         0        0          0         1    0       2          2
data            0           0         0        0          0         0    0       0          0
metadata        0           0         0        0          0         0    0       0          0
rbd             0           0         0        0          0         0    0       0          0
sstest          32244226    2841055   0        653353     0         0    0       17066724   32370391
  total used    324792996   2841071
  total avail   31083452176
  total space   33043244460

That means I have almost 3 million objects in sstest.

pg_pool 7 'sstest' pg_pool(rep pg_size 3 crush_ruleset 0 object_hash
rjenkins pg_num 8 pgp_num 8 lpg_num 0 lpgp_num 0 last_change 21 owner
0)

3 copies in this pool.

sstest used 32,244,226 KB + log 558,212 KB = 32,802,438 KB.

Total used is 324,792,996 KB, which is almost 10x more.

2011-07-27 12:57:35.541556pg v54158: 6986 pgs: 8 active, 6978
active+clean; 32104 MB data, 310 GB used, 29642 GB / 31512 GB avail;

I am putting files between 4-50 KB into RADOS via an S3 client and radosgw.


Can you explain that to me with this real-life example?
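A quick check of the arithmetic behind the "almost 10x" figure (a sketch; the 3x factor comes from the pool's pg_size above, and the variable names are just for illustration):

```python
# Figures taken from the rados df output above, all in KB.
sstest_kb = 32_244_226
log_kb = 558_212
total_used_kb = 324_792_996
replicas = 3  # pg_size of the sstest pool

data_kb = sstest_kb + log_kb            # 32,802,438 KB of logical data
ratio = total_used_kb / data_kb         # ~9.9x, the "almost 10x"
overhead = total_used_kb / (data_kb * replicas)  # ~3.3x beyond replication
print(round(ratio, 1), round(overhead, 1))  # 9.9 3.3
```

So even after accounting for 3x replication, roughly 3.3x of the reported usage is still unexplained, which is the gap the rest of the thread tracks down.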


2011/7/27 Wido den Hollander :
> Hi,
>
> On Wed, 2011-07-27 at 12:19 +0200, Sławomir Skowron wrote:
>> Thanks.
>>
>> 2011/7/27 Wido den Hollander :
>> > Hi,
>> >
>> > On Wed, 2011-07-27 at 07:58 +0200, Sławomir Skowron wrote:
>> >> Hello. I have some questions.
>> >>
>> >> 1. Is there any chance to change default 4MB object size to for
>> >> example 1MB or less ??
>> >
>> > If you are using the filesystem, you can change the stripe-size per
>> > directory with the "cephfs" tool.
>>
>> Unfortunately i only use rados layer in this case, and i even don't
>> have any mds :) is there any chance to change this in rados layer ??
>
> The RADOS gateway does NOT stripe over multiple objects. If you upload a
> 1GB object through the RADOS gateway it will be a 1GB object.
>
> Only the RBD (Rados Block Device) and the Ceph filesystem do striping
> over RADOS objects.
>
> RADOS itself doesn't stripe and the RADOS gateway is a almost 1:1
> mapping of pools and objects.
>
> Wido
>
>>
>> >>
>> >> 2. I have create cluster of two mons, and 32 osd (1TB each) on two
>> >> machines. At this radosgw with apache2 for test. When i putting data
>> >> from s3 client to rados, everything is ok, but when one of osd is
>> >> going down all s3 clients going to be freeze for some seconds, before
>> >> ceph sign this osd as down, and then everything is starting working
>> >> again wit degraded osd. How can i tune this marking as a down for ms
>> >> or 0 :) and not for seconds freeze ??
>> >
>> > Normally a OSD should be down within about 10 seconds and shouldn't
>> > affect all I/O operations.
>> >
>> > By default it takes 300 seconds before a OSD gets marked as "out", this
>> > is handled by "mon osd down out interval"
>> >
>> > If you want to change the "down" reporting behaviour you could do this
>> > with "osd min down reporters" and "osd min down reports". I'm not sure
>> > you really want to do this, as the default values should be ok.
>>
>> Ok. I will check this.
>>
>> > Wido
>> >
>> >>
>> >> My setup is debian 6.0, kernel 2.6.37.6  x86_64 and ceph version
>> >> 0.31-10 from stable repo.
>> >>
>> >> Best regards
>> >>
>> >> Slawomir Skowron
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> the body of a message to majord...@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> >
>>
>
>
>



-- 
-
Pozdrawiam

Sławek "sZiBis" Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Two questions

2011-07-28 Thread Sławomir Skowron
Dnia 27 lip 2011 o godz. 18:15 Gregory Farnum
 napisał(a):

> 2011/7/27 Sławomir Skowron :
>> Ok, I will show example:
>>
>> rados df
>> pool name       KB          objects   clones   degraded   unfound   rd   rd KB   wr         wr KB
>> .log            558212      5         0        0          0         0    0       2844888    2844888
>> .pool           1           1         0        0          0         0    0       8          8
>> .rgw            0           6         0        0          0         0    0       1          0
>> .users          1           1         0        0          0         0    0       1          1
>> .users.email    1           1         0        0          0         0    0       1          1
>> .users.uid      2           2         0        0          0         1    0       2          2
>> data            0           0         0        0          0         0    0       0          0
>> metadata        0           0         0        0          0         0    0       0          0
>> rbd             0           0         0        0          0         0    0       0          0
>> sstest          32244226    2841055   0        653353     0         0    0       17066724   32370391
>>   total used    324792996   2841071
>>   total avail   31083452176
>>   total space   33043244460
>>
>> It means I have almost 3mln of objects in sstest.
>>
>> pg_pool 7 'sstest' pg_pool(rep pg_size 3 crush_ruleset 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 lpg_num 0 lpgp_num 0 last_change 21 owner
>> 0)
>>
>> 3 copies in this pool.
>>
>> sstest used 32.244.226 KB + log 558.212 KB = 32.802.438 KB
>>
>> Total used is 324.792.996 KB and it's almost 10x more.
>>
>> 2011-07-27 12:57:35.541556pg v54158: 6986 pgs: 8 active, 6978
>> active+clean; 32104 MB data, 310 GB used, 29642 GB / 31512 GB avail;
>>
>> I'am putting files beetwen 4-50KB on RADOS via s3 clilent, and radosgw.
>>
>>
>> Can you explain that to me on this example from real life ??
>
> Hmm, what underlying filesystem are you using? Do you have any logging
> enabled, and what disk is it logging to? Are all your OSDs running
> under the same OS, or are they in virtual machines?

I use ext4. I have logging enabled in the OSDs and mons for almost
everything at level 20. Every OSD runs the same version of Debian 6,
booted from the network. In my test case I use two machines with the
same configuration and system. They are not VMs.

> If I remember correctly, that "total used" count is generated by
> looking at df or something for the drives in question -- if there's
> other data on the same drive as the OSD, it'll get (admittedly
> incorrectly) counted as part of the "total used" by RADOS even if
> RADOS can't touch it.
> -Greg

Every machine setup is same and looks like this:

2 x 300GB SAS -> hardware RAID1 -> root filesystem and ceph logs in
/var/log/(osd,mon)
12 x 2TB SATA -> hardware RAID5 + 1 Spare -> 16 x 1TB (raid usable
space is bigger) ceph osd mounted as ext4 in /data/osd.(osd id)

For testing only, I run the journal on the same LUN as the OSD, because
two SAS drives are not enough; in a real scenario the system will run on
one SAS drive for the OS, journals on one SSD, and the rest on SATA
drives, or the system and journals on RAID1 SSD drives.

If I count the journals on two machines, that is 32 x 512MB, which is
only 16GB more in the rados df calculation. Where does the rest come from?

With regards
Slawomir Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Two questions

2011-07-28 Thread Sławomir Skowron
Dnia 27 lip 2011 o godz. 17:52 Sage Weil  napisał(a):

> On Wed, 27 Jul 2011, Sławomir Skowron wrote:
>> Hello. I have some questions.
>>
>> 1. Is there any chance to change default 4MB object size to for
>> example 1MB or less ??
>
> Yeah.  You can use the cephfs to set the default layout on the root
> directory, which will apply to any new files.  See 'man cephfs'.

Ok. Thanks.

>> 2. I have create cluster of two mons, and 32 osd (1TB each) on two
>> machines. At this radosgw with apache2 for test. When i putting data
>> from s3 client to rados, everything is ok, but when one of osd is
>> going down all s3 clients going to be freeze for some seconds, before
>> ceph sign this osd as down, and then everything is starting working
>> again wit degraded osd. How can i tune this marking as a down for ms
>> or 0 :) and not for seconds freeze ??
>
> The 'osd heartbeat grace' defualt is 20.  Stick something like
>
>osd heartbeat grace = 5
>
> in the [osd] section of your ceph.conf

Thanks, i will check this in work.

> sage
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Two questions

2011-07-28 Thread Sławomir Skowron
2011/7/28 Sławomir Skowron :
> Dnia 27 lip 2011 o godz. 17:52 Sage Weil  napisał(a):
>
>> On Wed, 27 Jul 2011, Sławomir Skowron wrote:
>>> Hello. I have some questions.
>>>
>>> 1. Is there any chance to change default 4MB object size to for
>>> example 1MB or less ??
>>
>> Yeah.  You can use the cephfs to set the default layout on the root
>> directory, which will apply to any new files.  See 'man cephfs'.
>
> Ok. Thanks.
>
>>> 2. I have create cluster of two mons, and 32 osd (1TB each) on two
>>> machines. At this radosgw with apache2 for test. When i putting data
>>> from s3 client to rados, everything is ok, but when one of osd is
>>> going down all s3 clients going to be freeze for some seconds, before
>>> ceph sign this osd as down, and then everything is starting working
>>> again wit degraded osd. How can i tune this marking as a down for ms
>>> or 0 :) and not for seconds freeze ??
>>
>> The 'osd heartbeat grace' defualt is 20.  Stick something like
>>
>>    osd heartbeat grace = 5
>>
>> in the [osd] section of your ceph.conf
>
> Thanks, i will check this in work.

It works perfect thanks a lot

>
>> sage
>>
>



-- 
-
Pozdrawiam

Sławek "sZiBis" Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Two questions

2011-07-28 Thread Sławomir Skowron
2011/7/28 Sławomir Skowron :
> Dnia 27 lip 2011 o godz. 18:15 Gregory Farnum
>  napisał(a):
>
>> 2011/7/27 Sławomir Skowron :
>>> Ok, I will show example:
>>>
>>> rados df
>>> pool name                 KB      objects       clones     degraded
>>>  unfound           rd        rd KB           wr        wr KB
>>> .log                  558212            5            0            0
>>>       0            0            0      2844888      2844888
>>> .pool                      1            1            0            0
>>>       0            0            0            8            8
>>> .rgw                       0            6            0            0
>>>       0            0            0            1            0
>>> .users                     1            1            0            0
>>>       0            0            0            1            1
>>> .users.email               1            1            0            0
>>>       0            0            0            1            1
>>> .users.uid                 2            2            0            0
>>>       0            1            0            2            2
>>> data                       0            0            0            0
>>>       0            0            0            0            0
>>> metadata                   0            0            0            0
>>>       0            0            0            0            0
>>> rbd                        0            0            0            0
>>>       0            0            0            0            0
>>> sstest              32244226      2841055            0       653353
>>>       0            0            0     17066724     32370391
>>>  total used       324792996      2841071
>>>  total avail    31083452176
>>>  total space    33043244460
>>>
>>> It means I have almost 3mln of objects in sstest.
>>>
>>> pg_pool 7 'sstest' pg_pool(rep pg_size 3 crush_ruleset 0 object_hash
>>> rjenkins pg_num 8 pgp_num 8 lpg_num 0 lpgp_num 0 last_change 21 owner
>>> 0)
>>>
>>> 3 copies in this pool.
>>>
>>> sstest used 32.244.226 KB + log 558.212 KB = 32.802.438 KB
>>>
>>> Total used is 324.792.996 KB and it's almost 10x more.
>>>
>>> 2011-07-27 12:57:35.541556    pg v54158: 6986 pgs: 8 active, 6978
>>> active+clean; 32104 MB data, 310 GB used, 29642 GB / 31512 GB avail;
>>>
>>> I'am putting files beetwen 4-50KB on RADOS via s3 clilent, and radosgw.
>>>
>>>
>>> Can you explain that to me on this example from real life ??
>>
>> Hmm, what underlying filesystem are you using? Do you have any logging
>> enabled, and what disk is it logging to? Are all your OSDs running
>> under the same OS, or are they in virtual machines?
>
> I use ext4. I have loging enabled in osd and mon for most everything in 20.
> Every osd is running on same version of debian 6 booted from network.
> In my testing case i use two machines and they are using the same
> configuration and system.
> They are not a VM's.
>
>> If I remember correctly, that "total used" count is generated by
>> looking at df or something for the drives in question -- if there's
>> other data on the same drive as the OSD, it'll get (admittedly
>> incorrectly) counted as part of the "total used" by RADOS even if
>> RADOS can't touch it.

When you wrote this I checked something, and I think it is what you suggested.

In my earlier test I mounted the ext4 filesystems at /data/osd.(osd
id), but /data was a symlink to /var/data/, so I think the total used
space was inflated by the size of /var, and there are logs, lots of logs
:). Ceph produces many logs at such verbosity. Tell me if I'm wrong.

Now it looks like this, and it looks better :)

2011-07-28 11:44:08.227278pg v110939: 6986 pgs: 8 active, 6978
active+clean; 42441 MB data, 223 GB used, 29457 GB / 31240 GB avail

rados df
pool name       KB         objects   clones   degraded   unfound   rd   rd KB   wr        wr KB
.log            694273     6         0        0          0         0    0       3539909   3539909
.pool           1          1         0        0          0         0    0       8         8
.rgw            0          6         0        0          0         0    0       1         0
.users          1          1         0        0          0         0    0

Re: Two questions

2011-07-29 Thread Sławomir Skowron
Yes, I made a test, and now everything is OK. Thanks for the help.

iSS

Dnia 28 lip 2011 o godz. 18:36 Gregory Farnum
 napisał(a):

> 2011/7/28 Sławomir Skowron :
>> Because of my test before i mount ext4 filesystems in /data/osd.(osd
>> id), but /data was a symlink to /var/data/ and i think total used
>> space was higher by a size of var, and there are logs, lots of logs
>> :). Ceph produce many logs in such verbosity. Tell me if im wrong.
>>
>> Now its look like this, and it's looks better :)
>>
>> 2011-07-28 11:44:08.227278pg v110939: 6986 pgs: 8 active, 6978
>> active+clean; 42441 MB data, 223 GB used, 29457 GB / 31240 GB avail
>>
>> rados df
>> pool name       KB          objects   clones   degraded   unfound   rd   rd KB   wr         wr KB
>> .log            694273      6         0        0          0         0    0       3539909    3539909
>> .pool           1           1         0        0          0         0    0       8          8
>> .rgw            0           6         0        0          0         0    0       1          0
>> .users          1           1         0        0          0         0    0       1          1
>> .users.email    1           1         0        0          0         0    0       1          1
>> .users.uid      2           2         0        0          0         1    0       2          2
>> data            0           0         0        0          0         0    0       0          0
>> metadata        0           0         0        0          0         0    0       0          0
>> rbd             0           0         0        0          0         0    0       0          0
>> sstest          42766318    3483690   0        0          0         0    0       20922736   42892546
>>   total used    234415408   3483707
>>   total avail   30888365376
>>   total space   32757780072
>
> Okay, so now you've got (42766318+694273)*3KB=124GB of data, and 223GB
> used. I guess your OSD journals are 512MB each, so that's another
> 16GB, which still leaves more unexplained space usage than I would
> expect.
> But it's probably just some peculiarity of how your system is set up;
> you could check and see how the numbers change when you add new
> objects to the system to make sure it's just a base case rather than
> something to worry about. :)
> -Greg
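Greg's back-of-the-envelope number can be recomputed from the quoted rados df output (a sketch of the arithmetic only; names are illustrative):

```python
# Figures from the second rados df output, in KB.
sstest_kb = 42_766_318
log_kb = 694_273
replicas = 3

replicated_gb = (sstest_kb + log_kb) * replicas / 1024 / 1024
print(f"{replicated_gb:.0f} GB")  # 124 GB, vs the 223 GB reported used
```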
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: radosgw should support cdmi?

2011-09-22 Thread Sławomir Skowron
Any decision? I would like to see this in RadosGW, especially range
support in the RESTful object API.

On Tue, Sep 20, 2011 at 5:39 AM, Sage Weil  wrote:
> http://www.snia.org/cdmi
>
> There's a cdmi plugfest going on here at SDC.  Also:
>
> https://github.com/scality/Droplet



-- 
-
Regards

Sławek "sZiBis" Skowron


Problem with radosgw in 0.37

2011-11-08 Thread Sławomir Skowron
Maybe I have forgotten something, but there is no doc about that.

I created a configuration with nginx and radosgw for S3.

On top of radosgw stands nginx with cache capability. Everything was OK
in version 0.32 of Ceph. I have created a new filesystem with the
newest 0.37 version, and now I have some problems.

I run radosgw like this:

radosgw --rgw-socket-path=/var/run/radosgw.sock --conf=/etc/ceph/ceph.conf

In nginx I talk to the unix socket of radosgw. Everything looks good. In
radosgw-admin I create a user, and it's OK.

{ "user_id": "0",
  "rados_uid": 0,
  "display_name": "ocd",
  "email": "",
  "suspended": 0,
  "subusers": [],
  "keys": [
{ "user": "0",
  "access_key": "CFLZFEPYUAZV4EZ1P8OJ",
  "secret_key": "HrWN8SNfjXPhUELPLIbRIA3nCfppQjJ5xV6EnhNM"}],
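
For reference, the nginx-to-radosgw wiring described above is normally a
`fastcgi_pass` to the unix socket. This is only a sketch based on the
thread — the socket path comes from this message, but the server name and
the exact fastcgi parameters are assumptions, not the poster's config:

```nginx
# Sketch only: proxy S3 requests to radosgw over its unix socket.
# /var/run/radosgw.sock matches this thread; everything else is assumed.
server {
    listen 80;
    server_name s3.example.com;          # hypothetical domain

    location / {
        fastcgi_pass unix:/var/run/radosgw.sock;
        include fastcgi_params;          # standard nginx FastCGI variables
    }
}
```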

pool name       category          KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
.intent-log     -                  1        1       0         0        0   0      0   1      1
.log            -                 19       18       0         0        0   0      0  39     39
.rgw            -                 20       25       0         0        0   1      0  49     46
.rgw.buckets    -                 19       19       0         0        0  32     15  45     23
.users          -                  2        2       0         0        0   0      0   2      2
.users.email    -                  1        1       0         0        0   0      0   1      1
.users.uid      -                  4        4       0         0        0   3      1   5      5
data            -                  0        0       0         0        0   0      0   0      0
metadata        -                  0        0       0         0        0   0      0   0      0
rbd             -                  0        0       0         0        0   0      0   0      0
  total used            54       71
  total avail    174920736
  total space    175460736

But when I test this with a local s3 lib, it's not working. In the nginx access log:

127.0.0.1 - - - [08/Nov/2011:09:03:48 +0100] "PUT /nodejs/test01
HTTP/1.1" rlength: 377 bsent: 270 rtime: 0.006 urtime: 0.004 status:
403 bbsent: 103 httpref: "-" useragent: "Mozilla/4.0 (Compatible; s3;
libs3 2.0; Linux x86_64)"
127.0.0.1 - - - [08/Nov/2011:09:03:55 +0100] "PUT /nodejs/test01
HTTP/1.1" rlength: 377 bsent: 270 rtime: 0.006 urtime: 0.004 status:
403 bbsent: 103 httpref: "-" useragent: "Mozilla/4.0 (Compatible; s3;
libs3 2.0; Linux x86_64)"

The request gets code 100 and waits for something in radosgw.

s3 -u put nodejs/test01 < /usr/src/libs3-2.0/TODO

ERROR: ErrorAccessDenied

From an external S3 client, I have something like this. Somehow the
bucket was created, but the other operations are not working, failing
with an access denied error.



 AccessDenied


I have some output from s3lib, from many tries:

 Bucket Created
  
nodejs    2011-11-07T13:11:42Z
root@vm-10-177-48-24:/usr/src# s3 -u getacl nodejs
OwnerID 0 ocd
 Type  User Identifier
  Permission
--  
--
 
UserID  0 (ocd)
FULL_CONTROL
root@vm-10-177-48-24:/usr/src# s3 -u test nodejs
 Bucket  Status
  
nodejs    USA
root@vm-10-177-48-24:/usr/src# s3 -u getacl nodejs
OwnerID 0 ocd
 Type  User Identifier
  Permission
--  
--
 
UserID  0 (ocd)
FULL_CONTROL
root@vm-10-177-48-24:/usr/src# s3 -u list
 Bucket Created
  
nodejs    2011-11-07T13:11:42Z


Ceph.conf

; global
[global]
; enable secure authentication
auth supported = cephx
keyring = /etc/ceph/keyring.bin

; monitors
;  You need at least one.  You need at least three if you want to
;  tolerate any no

Re: Problem with radosgw in 0.37

2011-11-08 Thread Sławomir Skowron
Thank you very much. That solves the problem. I was looking in the source
code for that today, and I found what I think is self-documenting code
for it too :)

./src/common/config_opts.h:OPTION(debug_rgw, OPT_INT, 20)
   // log level for the Rados gateway
./src/common/config_opts.h:OPTION(rgw_cache_enabled, OPT_BOOL, false)
 // rgw cache enabled
./src/common/config_opts.h:OPTION(rgw_cache_lru_size, OPT_INT, 1)
 // num of entries in rgw cache
./src/common/config_opts.h:OPTION(rgw_socket_path, OPT_STR, "")   //
path to unix domain socket, if not specified, rgw will not run as
external fcgi
./src/common/config_opts.h:OPTION(rgw_dns_name, OPT_STR, "")
./src/common/config_opts.h:OPTION(rgw_swift_url, OPT_STR, "")  //
./src/common/config_opts.h:OPTION(rgw_swift_url_prefix, OPT_STR, "swift")  //
./src/common/config_opts.h:OPTION(rgw_print_continue, OPT_BOOL, true)
// enable if 100-Continue works
./src/common/config_opts.h:OPTION(rgw_remote_addr_param, OPT_STR,
"REMOTE_ADDR")  // e.g. X-Forwarded-For, if you have a reverse proxy
./src/common/config_opts.h:OPTION(rgw_op_thread_timeout, OPT_INT, 10*60)
./src/common/config_opts.h:OPTION(rgw_op_thread_suicide_timeout, OPT_INT, 60*60)
./src/common/config_opts.h:OPTION(rgw_thread_pool_size, OPT_INT, 100)
./src/common/config_opts.h:OPTION(rgw_maintenance_tick_interval,
OPT_DOUBLE, 10.0)
./src/common/config_opts.h:OPTION(rgw_pools_preallocate_max, OPT_INT, 100)
./src/common/config_opts.h:OPTION(rgw_pools_preallocate_threshold, OPT_INT, 70)
./src/common/config_opts.h:OPTION(rgw_log_nonexistent_bucket, OPT_BOOL, false)
./src/common/config_opts.h:OPTION(rgw_log_object_name, OPT_STR,
"%Y-%m-%d-%H-%i-%n")  // man date to see codes (a subset are
supported)
./src/common/config_opts.h:OPTION(rgw_log_object_name_utc, OPT_BOOL, false)
./src/common/config_opts.h:OPTION(rgw_intent_log_object_name, OPT_STR,
"%Y-%m-%d-%i-%n")  // man date to see codes (a subset are supported)
./src/common/config_opts.h:OPTION(rgw_intent_log_object_name_utc,
OPT_BOOL, false)

Theoretically, 100-continue works with nginx, but I will dig into this
soon. Now development can go on again.

But there is no info in the changelog (or maybe I missed it) about the
rgw cache capability, and maybe about many more features that could be
useful in other pieces of Ceph in the future.

Thanks again for the quick help.
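
For anyone skimming this thread for the fix: the change Yehuda suggested
boils down to one line in ceph.conf. A minimal sketch — the section name
here is an assumption; put the option wherever your rgw client settings
live:

```ini
; sketch: disable 100-continue handling when the front-end proxy
; (nginx in this thread) does not support it
[client.radosgw.gateway]
        rgw print continue = false
```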


2011/11/8 Yehuda Sadeh Weinraub :
> I haven't ran rgw over nginx for quite a while, so I'm not sure
> whether it can actually work. The problem that you're seeing might be
> related to the 100-continue processing, which is now enabled by
> default. Try to turn it off, setting the following under the global
> (or client) section in your ceph.conf:
>
>        rgw print continue = false
>
> If it doesn't help we'll need to dig deeper. Thanks,
> Yehuda
>
> 2011/11/8 Sławomir Skowron :
>> Maybe, i have forgot something, but there is no doc about that.
>>
>> I create a configuration with nginx and radosgw for S3.
>>
>> On top of radosgw standing nginx witch cache capability. Everything
>> was ok in version 0.32 of ceph. A have create a new filesystem with a
>> newest 0.37 version, and now i have some problems.
>>
>> I run radosgw like this:
>>
>> radosgw --rgw-socket-path=/var/run/radosgw.sock --conf=/etc/ceph/ceph.conf
>>
>> In nginx i talk to unix socket of radosgw. Everything looks good. In
>> Radosgw-admin i create user, and its ok.
>>
>> { "user_id": "0",
>>  "rados_uid": 0,
>>  "display_name": "ocd",
>>  "email": "",
>>  "suspended": 0,
>>  "subusers": [],
>>  "keys": [
>>        { "user": "0",
>>          "access_key": "CFLZFEPYUAZV4EZ1P8OJ",
>>          "secret_key": "HrWN8SNfjXPhUELPLIbRIA3nCfppQjJ5xV6EnhNM"}],
>>
>> pool name       category                 KB      objects       clones
>>   degraded      unfound           rd        rd KB           wr
>> wr KB
>> .intent-log     -                          1            1            0
>>           0           0            0            0            1
>>    1
>> .log            -                         19           18            0
>>           0           0            0            0           39
>>   39
>> .rgw            -                         20           25            0
>>           0           0            1            0           49
>>   46
>> .rgw.buckets    -                         19           19            0
>>           0           0           32           15           45
>>   23
>> .users          -               

Problem with attaching rbd device in qemu-kvm

2011-12-01 Thread Sławomir Skowron
I have some problems. Can you help me?

ceph cluster:
ceph 0.38, oneiric, kernel 3.0.0 x86_64 - now only one machine.

kvm hypervisor:
kvm version 1.0-rc4 (0.15.92), libvirt 0.9.2, kernel 3.0.0, Ubuntu
oneiric x86_64.

I created an image with qemu-img on the machine with the KVM VMs, and it works very well:

qemu-img create -f rbd rbd:kvmtest/kvmtest1 10G
Formatting 'rbd:kvmtest/kvmtest1', fmt=rbd size=10737418240 cluster_size=0

in ceph cluster machines:

rados -p kvmtest ls
kvmtest1.rbd
rbd_directory
rbd_info

But when I try to attach a new device to a KVM VM like this:

virsh attach-device one-10 /tmp/kvm.xml

with kvm.xml like this:









output of virsh:

error: Failed to attach device from /tmp/kvm.xml
error: cannot resolve symlink rbd:kvmtest/kvmtest1: No such file or directory

and mon.0 log in ceph cluster.

2011-12-01 20:18:17.142595 7fe02dd04700 mon.0@0(leader) e1
ms_verify_authorizer 10.177.32.66:0/25020608 client protocol 0
2011-12-01 20:18:17.143041 7fe03223b700 -- 10.177.64.4:6789/0 <==
client.? 10.177.32.66:0/25020608 1  auth(proto 0 21 bytes) v1 
47+0+0 (3310669357 0 0) 0x29f6a00 con 0x297adc0
2011-12-01 20:18:17.143095 7fe03223b700 mon.0@0(leader) e1 ms_dispatch
new session MonSession: client.? 10.177.32.66:0/25020608 is open for
client.? 10.177.32.66:0/25020608
2011-12-01 20:18:17.143125 7fe03223b700 mon.0@0(leader).auth v146
preprocess_query auth(proto 0 21 bytes) v1 from client.?
10.177.32.66:0/25020608
2011-12-01 20:18:17.143168 7fe03223b700 -- 10.177.64.4:6789/0 -->
10.177.32.66:0/25020608 -- auth_reply(proto 0 -95 Operation not
supported) v1 -- ?+0 0x2f30600 con 0x297adc0

ceph.conf, and keyring.bin exists on server, and machine with qemu-kvm
in same directory:

; global
[global]
; enable secure authentication

auth supported = cephx
keyring = /etc/ceph/keyring.bin

debug rgw = 1

rgw print continue = false
rgw socket path = /var/run/radosgw.sock

; monitors
;  You need at least one.  You need at least three if you want to
;  tolerate any node failures.  Always create an odd number.
[mon]
mon data = /vol0/data/mon.$id

; some minimal logging (just message traffic) to aid debugging

debug ms = 1 ; see message traffic
debug mon = 0   ; monitor
debug paxos = 0 ; monitor replication
debug auth = 0  ;

mon allowed clock drift = 2

[mon.0]
host = 10-177-64-4
mon addr = 10.177.64.4:6789

; radosgw client list
[client.radosgw.10-177-64-4]

host = 10-177-64-4
log file = /var/log/ceph/$name.log
debug rgw = 1
debug ms = 1

; osd
;  You need at least one.  Two if you want data to be replicated.
;  Define as many as you like.
[osd]
; This is where the btrfs volume will be mounted.

osd data = /vol0/data/osd.$id

; Ideally, make this a separate disk or partition.  A few GB
; is usually enough; more if you have fast disks.  You can use
; a file under the osd data dir if need be
; (e.g. /data/osd$id/journal), but it will be slower than a
; separate disk or partition.

osd journal = /vol0/data/osd.$id/journal

; If the OSD journal is a file, you need to specify the size.
; This is specified in MB.

osd journal size = 512

filestore journal writeahead = 1
osd heartbeat grace = 5

debug ms = 0 ; message traffic
debug osd = 0
debug filestore = 0 ; local object storage
debug journal = 0   ; local journaling
debug monc = 0
debug rados = 1

[osd.0]
host = 10-177-64-4
osd data = /vol0/data/osd.0
keyring = /vol0/data/osd.0/keyring

[osd.1]
host = 10-177-64-4
osd data = /vol0/data/osd.1
keyring = /vol0/data/osd.1/keyring

[osd.2]
host = 10-177-64-4
osd data = /vol0/data/osd.2
keyring = /vol0/data/osd.2/keyring

[osd.3]
host = 10-177-64-4
osd data = /vol0/data/osd.3
keyring = /vol0/data/osd.3/keyring

[osd.4]
host = 10-177-64-4
osd data = /vol0/data/osd.4
keyring = /vol0/data/osd.4/keyring

[osd.5]
host = 10-177-64-4
osd data = /vol0/data/osd.5
keyring = /vol0/data/osd.5/keyring

[osd.6]
host = 10-177-64-4
osd data = /vol0/data/osd.6
keyring = /vol0/data/osd.6/keyring

[osd.7]
host = 10-177-64-4
osd data = /vol0/data/osd.7
keyring = /vol0/data/osd.7/keyring

[osd.8]
host = 10-177-64-4
osd data = /vol0/data/osd.8
keyring = /vol0/data/osd.8/keyring

[osd.9]
host = 10-177-64-4
osd data = /vol0/data/osd.9
keyring = /vol0/data/osd.9/keyring

[osd.10]
host = 10-177-64-4
osd data = /vol0/data/osd.10
keyring = /vol0/data/osd.10/keyring

[osd.11]
host = 10-177-64-4
osd data = /vol0/data/osd.11

Re: Problem with attaching rbd device in qemu-kvm

2011-12-09 Thread Sławomir Skowron
Sorry for my lag, but I was sick.

I handled the AppArmor problem before, and it's not a problem now; even
when I sent the earlier mail, it was already solved.

OK, when I create an image with qemu-img it looks like that and works
perfectly (attachment with a log of this operation).

But when I use virsh, it's still a problem. It's as if it doesn't know
which user is going to be authenticated.

I tried as client.admin (the default, I hope :)) with only id=admin in
kvm.xml, or client=admin, and other combinations, like client.rbd
etc. Still nothing.

I tried to improve the keyring file, adding client.rbd into the keyring
on the machine with the KVM hypervisor, in /etc/ceph/keyring.bin:

[client.rbd]
key = 
auid = 18446744073709551615
caps mds = "allow"
caps mon = "allow r"
caps osd = "allow rw pool=kvmtest"

and the same on the cluster side. I added a [client.rbd] section with
the keyring path, and it's still a problem.

2011-12-09 13:29:11.519096 7fe616b59700 mon.0@0(leader) e1
ms_verify_authorizer 10.177.32.66:0/58020608 client protocol 0
root@s3-10-177-64-4:~# tail -n1 /var/log/ceph/mon.0.log | grep 13:29:11.
2011-12-09 13:29:11.519096 7fe616b59700 mon.0@0(leader) e1
ms_verify_authorizer 10.177.32.66:0/58020608 client protocol 0
2011-12-09 13:29:11.519624 7fe61b090700 -- 10.177.64.4:6789/0 <==
client.? 10.177.32.66:0/58020608 1  auth(proto 0 21 bytes) v1 
47+0+0 (3310669357 0 0) 0x17f2e00 con 0x19d2c80
2011-12-09 13:29:11.519645 7fe61b090700 mon.0@0(leader) e1 have connection
2011-12-09 13:29:11.519655 7fe61b090700 mon.0@0(leader) e1 do not have
session, making new one
2011-12-09 13:29:11.519668 7fe61b090700 mon.0@0(leader) e1 ms_dispatch
new session MonSession: client.? 10.177.32.66:0/58020608 is open for
client.? 10.177.32.66:0/58020608
2011-12-09 13:29:11.519674 7fe61b090700 mon.0@0(leader) e1 setting
timeout on session
2011-12-09 13:29:11.519680 7fe61b090700 mon.0@0(leader) e1  caps
2011-12-09 13:29:11.519688 7fe61b090700 mon.0@0(leader).auth v353
AuthMonitor::update_from_paxos()
2011-12-09 13:29:11.519697 7fe61b090700 mon.0@0(leader).auth v353
preprocess_query auth(proto 0 21 bytes) v1 from client.?
10.177.32.66:0/58020608
2011-12-09 13:29:11.519703 7fe61b090700 mon.0@0(leader).auth v353
prep_auth() blob_size=21
2011-12-09 13:29:11.519735 7fe61b090700 -- 10.177.64.4:6789/0 -->
10.177.32.66:0/58020608 -- auth_reply(proto 0 -95 Operation not
supported) v1 -- ?+0 0x1246200 con 0x19d2c80

from libvirt.log

23:20:56.689: 1309: error : virSecurityDACRestoreSecurityFileLabel:143
: cannot resolve symlink rbd:kvmtest/kvmtest1:name=admin: No such file
or directory
23:20:56.882: 1309: warning : qemuDomainAttachPciDiskDevice:250 :
Unable to restore security label on rbd:kvmtest/kvmtest1:name=admin
23:21:09.172: 1309: error : qemuMonitorTextAddDrive:2418 : operation
failed: open disk image file failed
23:21:09.172: 1309: error : virSecurityDACRestoreSecurityFileLabel:143
: cannot resolve symlink rbd:kvmtest/kvmtest1:name=admin: No such file
or directory
23:21:09.364: 1309: warning : qemuDomainAttachPciDiskDevice:250 :
Unable to restore security label on rbd:kvmtest/kvmtest1:name=admin
23:21:49.645: 1309: error : qemuMonitorTextAddDrive:2418 : operation
failed: open disk image file failed
23:21:49.645: 1309: error : virSecurityDACRestoreSecurityFileLabel:143
: cannot resolve symlink rbd:kvmtest/kvmtest1:id=admin: No such file
or directory
23:21:49.853: 1309: warning : qemuDomainAttachPciDiskDevice:250 :
Unable to restore security label on rbd:kvmtest/kvmtest1:id=admin
13:08:08.924: 1309: error : qemuMonitorTextAddDrive:2418 : operation
failed: open disk image file failed
13:08:08.924: 1309: error : virSecurityDACRestoreSecurityFileLabel:143
: cannot resolve symlink rbd:kvmtest/kvmtest1: No such file or
directory
13:08:09.117: 1309: warning : qemuDomainAttachPciDiskDevice:250 :
Unable to restore security label on rbd:kvmtest/kvmtest1
13:09:45.402: 1309: error : qemuMonitorTextAddDrive:2418 : operation
failed: open disk image file failed
13:29:05.554: 1310: error : qemuMonitorTextAddDrive:2418 : operation
failed: open disk image file failed
13:29:05.554: 1310: error : virSecurityDACRestoreSecurityFileLabel:143
: cannot resolve symlink kvmtest/kvmtest1: No such file or directory
13:29:05.751: 1310: warning : qemuDomainAttachPciDiskDevice:250 :
Unable to restore security label on kvmtest/kvmtest1
13:29:05.751: 1310: warning : virEventPollUpdateHandle:147 : Ignoring
invalid update watch -1

2011/12/1 Josh Durgin :
> On 12/01/2011 11:37 AM, Sławomir Skowron wrote:
>>
>> I have some problems. Can you help me ??
>>
>> ceph cluster:
>> ceph 0.38, oneiric, kernel 3.0.0 x86_64 - now only one machine.
>>
>> kvm hypervisor:
>> kvm version 1.0-rc4 (0.15.92), libvirt 0.9.2, kernel 3.0.0, Ubuntu
>> oneiric x86_64.
>>
>> I create image from qemu-img on machine with kvm VM'

Re: Problem with attaching rbd device in qemu-kvm

2011-12-13 Thread Sławomir Skowron
Finally, i manage the problem with rbd with kvm 1.0, and libvirt
0.9.8, or i think i manage :), but i get stuck with one thing after.

2011-12-13 12:13:31.173+: 21512: error :
qemuMonitorJSONCheckError:318 : internal error unable to execute QEMU
command 'device_add': Bus 'pci.0' does not support hotplugging
2011-12-13 12:13:31.173+: 21512: warning :
qemuDomainAttachPciDiskDevice:244 : qemuMonitorAddDevice failed on
file=rbd:rbd/testdysk:id=admin:key=AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==:auth_supported=cephx
none:mon_host=10.177.64.4\:6789,if=none,id=drive-virtio-disk2,format=raw
(virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-virtio-disk2,id=virtio-disk2)
2011-12-13 12:13:31.173+: 21512: error :
virSecurityDACRestoreSecurityFileLabel:143 : cannot resolve symlink
rbd/testdysk: No such file or directory2011-12-13 12:13:31.173+:
21512: warning : qemuDomainAttachPciDiskDevice:287 : Unable to restore
security label on rbd/testdysk
and I can't find any info about that.

my kvm.xml inject file:


 
 
 
 
 
 
 
 


secret:

cat /tmp/secret.xml

 0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f
 
 rbd/admin
 admin
 


virsh secret-define /tmp/secret.xml
Secret 0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f created

virsh secret-set-value 0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f
AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==
Secret value set
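
The list archiver ate the XML bodies quoted above. For readers, here is a
rough sketch of what a libvirt ceph secret and the matching rbd disk
element look like — the uuid, pool/image name, monitor address, and
`username='admin'` come from this thread, but the tag layout, usage name,
and target dev are assumptions, not the poster's exact files:

```xml
<!-- sketch of secret.xml; the usage name is an assumption -->
<secret ephemeral='no' private='no'>
  <uuid>0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f</uuid>
  <usage type='ceph'>
    <name>rbd/admin</name>
  </usage>
</secret>

<!-- sketch of the disk element passed to virsh attach-device -->
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <auth username='admin'>
    <secret type='ceph' uuid='0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f'/>
  </auth>
  <source protocol='rbd' name='rbd/testdysk'>
    <host name='10.177.64.4' port='6789'/>
  </source>
  <target dev='vdb' bus='virtio'/>
</disk>
```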

2011/12/12 Josh Durgin :
> On 12/09/2011 04:48 AM, Sławomir Skowron wrote:
>>
>> Sorry, for my lag, but i was sick.
>>
>> I handled problem with apparmor before and now it's not a problem,
>> even when i send mail before, it was solved.
>>
>> Ok when i create image from qemu-img it's looks like that, and works
>> perfect, (attachment with log of this opperation.)
>>
>> But when i use virsh, it's still a problem. Just like he don't know
>> what user is going to be auth.
>>
>> I try, as client.admin (default i hope :)) only with id=admin in
>> kvm.xml, or client=admin, and other combinations - like client.rbd
>> etc., Still nothing.
>
>
> admin is the default, and the correct option for client.admin is id=admin,
> or id=foo for client.foo.
>
>
>>
>> I try, improve a keyring file, adding client rbd into keyring on
>> machine with kvm hypervisor in /etc/ceph/keyring.bin
>>
>> [client.rbd]
>>         key = 
>>         auid = 18446744073709551615
>>         caps mds = "allow"
>>         caps mon = "allow r"
>>         caps osd = "allow rw pool=kvmtest"
>>
>> and on cluster site like above. I was added, a section [client.rbd]
>> with keyring path, and it's still a problem.
>>
>
> Did you run 'ceph auth add -i /etc/ceph/keyring.bin'?
> Make sure 'ceph auth list' shows client.rbd with the right key.

Overall i want to use admin id, and i think now it's correctly used in xml.

>
>
>> 2011-12-09 13:29:11.519096 7fe616b59700 mon.0@0(leader) e1
>> ms_verify_authorizer 10.177.32.66:0/58020608 client protocol 0
>> root@s3-10-177-64-4:~# tail -n1 /var/log/ceph/mon.0.log | grep
>> 13:29:11.
>> 2011-12-09 13:29:11.519096 7fe616b59700 mon.0@0(leader) e1
>> ms_verify_authorizer 10.177.32.66:0/58020608 client protocol 0
>> 2011-12-09 13:29:11.519624 7fe61b090700 -- 10.177.64.4:6789/0<==
>> client.? 10.177.32.66:0/58020608 1  auth(proto 0 21 bytes) v1 
>> 47+0+0 (3310669357 0 0) 0x17f2e00 con 0x19d2c80
>> 2011-12-09 13:29:11.519645 7fe61b090700 mon.0@0(leader) e1 have connection
>> 2011-12-09 13:29:11.519655 7fe61b090700 mon.0@0(leader) e1 do not have
>> session, making new one
>> 2011-12-09 13:29:11.519668 7fe61b090700 mon.0@0(leader) e1 ms_dispatch
>> new session MonSession: client.? 10.177.32.66:0/58020608 is open for
>> client.? 10.177.32.66:0/58020608
>> 2011-12-09 13:29:11.519674 7fe61b090700 mon.0@0(leader) e1 setting
>> timeout on session
>> 2011-12-09 13:29:11.519680 7fe61b090700 mon.0@0(leader) e1  caps
>> 2011-12-09 13:29:11.519688 7fe61b090700 mon.0@0(leader).auth v353
>> AuthMonitor::update_from_paxos()
>> 2011-12-09 13:29:11.519697 7fe61b090700 mon.0@0(leader).auth v353
>> preprocess_query auth(proto 0 21 bytes) v1 from client.?
>> 10.177.32.66:0/58020608
>> 2011-12-09 13:29:11.519703 7fe61b090700 mon.0@0(leader).auth v353
>> prep_auth() blob_size=21
>> 2011-12-09 13:29:11.519735 7fe61b090700 -- 10.177.64.4:6789/0 -->
>> 10.177.32.66:0/58020608 -- auth_reply(proto 0 -95 Operation not
>> supported) v1 -- ?+0 0x1246200 con 0x19d2c80
>>
>> from libvirt.log
>>

Re: Problem with attaching rbd device in qemu-kvm

2011-12-16 Thread Sławomir Skowron
2011/12/13 Josh Durgin :
> On 12/13/2011 04:56 AM, Sławomir Skowron wrote:
>>
>> Finally, i manage the problem with rbd with kvm 1.0, and libvirt
>> 0.9.8, or i think i manage :), but i get stuck with one thing after.
>>
>> 2011-12-13 12:13:31.173+: 21512: error :
>> qemuMonitorJSONCheckError:318 : internal error unable to execute QEMU
>> command 'device_add': Bus 'pci.0' does not support hotplugging
>> 2011-12-13 12:13:31.173+: 21512: warning :
>> qemuDomainAttachPciDiskDevice:244 : qemuMonitorAddDevice failed on
>>
>> file=rbd:rbd/testdysk:id=admin:key=AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==:auth_supported=cephx
>> none:mon_host=10.177.64.4\:6789,if=none,id=drive-virtio-disk2,format=raw
>>
>> (virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-virtio-disk2,id=virtio-disk2)
>> 2011-12-13 12:13:31.173+: 21512: error :
>> virSecurityDACRestoreSecurityFileLabel:143 : cannot resolve symlink
>> rbd/testdysk: No such file or directory2011-12-13 12:13:31.173+:
>> 21512: warning : qemuDomainAttachPciDiskDevice:287 : Unable to restore
>> security label on rbd/testdysk
>> and i can't find any info about that.
>>
>> my kvm.xml inject file:
>>
>> 
>>          
>>          
>>                  > uuid="0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f"/>
>>          
>>          
>>                  
>>          
>>          
>> 
>>
>> secret:
>>
>> cat /tmp/secret.xml
>> 
>>      0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f
>>      
>>              rbd/admin
>>              admin
>>      
>> 
>>
>> virsh secret-define /tmp/secret.xml
>> Secret 0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f created
>>
>> virsh secret-set-value 0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f
>> AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==
>> Secret value set
>>
>
> Your xml and configuration are all fine, you're hitting the libvirt bug that
> I mentioned before. The fix isn't in any stable release yet, and 0.9.8 was
> just released yesterday, so I guess you'll have to compile it yourself or
> disable the libvirt security driver.

Yes, libvirt 0.9.8 with this patch solved the problem with rbd; now it's
clean. Thanks for that.

But a second problem has appeared.

error: Failed to attach device from /root/kvm-ceph.xml
error: internal error unable to execute QEMU command 'device_add':
Property 'virtio-blk-pci.drive' can't find value 'drive-virtio-disk3'

/var/log/libvirt/libvirtd.log
2011-12-16 12:03:04.950+: 1070: error :
qemuMonitorJSONCheckError:318 : internal error unable to execute QEMU
command 'device_add': Property 'virtio-blk-pci.drive' can't find value
'drive-virtio-disk3'
2011-12-16 12:03:04.950+: 1070: warning :
qemuDomainAttachPciDiskDevice:244 : qemuMonitorAddDevice failed on
file=rbd:rbd/testdysk:rbd_writeback_window=800:id=admin:key=AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==:auth_supported=cephx
none:mon_host=10.177.64.4\:6789,if=none,id=drive-virtio-disk3,format=raw
(virtio-blk-pci,bus=pci.0,addr=0xd,drive=drive-virtio-disk3,id=virtio-disk3)

If I attach a SCSI device from an img file, everything is OK.

Does anyone have any ideas about this?

>
>
>> 2011/12/12 Josh Durgin:
>>>
>>> On 12/09/2011 04:48 AM, Sławomir Skowron wrote:
>>>>
>>>>
>>>> Sorry, for my lag, but i was sick.
>>>>
>>>> I handled problem with apparmor before and now it's not a problem,
>>>> even when i send mail before, it was solved.
>>>>
>>>> Ok when i create image from qemu-img it's looks like that, and works
>>>> perfect, (attachment with log of this opperation.)
>>>>
>>>> But when i use virsh, it's still a problem. Just like he don't know
>>>> what user is going to be auth.
>>>>
>>>> I try, as client.admin (default i hope :)) only with id=admin in
>>>> kvm.xml, or client=admin, and other combinations - like client.rbd
>>>> etc., Still nothing.
>>>
>>>
>>>
>>> admin is the default, and the correct option for client.admin is
>>> id=admin,
>>> or id=foo for client.foo.
>>>
>>>
>>>>
>>>> I try, improve a keyring file, adding client rbd into keyring on
>>>> machine with kvm hypervisor in /etc/ceph/keyring.bin
>>>>
>>>> [client.rbd]
>>>>         key = 
>>>>         auid = 18446744073709551615
>>>>         caps mds 

Re: Problem with attaching rbd device in qemu-kvm

2011-12-19 Thread Sławomir Skowron
Hi,

Actual setup:

ii  libvirt-bin  0.9.2-4ubuntu15.1
  the programs for the libvirt library
ii  libvirt0 0.9.2-4ubuntu15.1
  library for interfacing with different virtualization systems
ii  qemu-kvm 1.0.0+dfsg+rc2-1~oneiric1
  Full virtualization on i386 and amd64 hardware

I changed many things in the system, and successfully attached rbd drives via libvirt:

virsh attach-device one-888 kvm-ceph.xml
Device attached successfully

A new set of disks appears in the libvirt VM XML, but not in the VM
itself. After a reboot, all the new rbd drives appear in the VM.

Is there any chance to hot-add an rbd device to a running VM without a reboot?


2011/12/16 Sławomir Skowron :
> 2011/12/13 Josh Durgin :
>> On 12/13/2011 04:56 AM, Sławomir Skowron wrote:
>>>
>>> Finally, i manage the problem with rbd with kvm 1.0, and libvirt
>>> 0.9.8, or i think i manage :), but i get stuck with one thing after.
>>>
>>> 2011-12-13 12:13:31.173+: 21512: error :
>>> qemuMonitorJSONCheckError:318 : internal error unable to execute QEMU
>>> command 'device_add': Bus 'pci.0' does not support hotplugging
>>> 2011-12-13 12:13:31.173+: 21512: warning :
>>> qemuDomainAttachPciDiskDevice:244 : qemuMonitorAddDevice failed on
>>>
>>> file=rbd:rbd/testdysk:id=admin:key=AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==:auth_supported=cephx
>>> none:mon_host=10.177.64.4\:6789,if=none,id=drive-virtio-disk2,format=raw
>>>
>>> (virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-virtio-disk2,id=virtio-disk2)
>>> 2011-12-13 12:13:31.173+: 21512: error :
>>> virSecurityDACRestoreSecurityFileLabel:143 : cannot resolve symlink
>>> rbd/testdysk: No such file or directory2011-12-13 12:13:31.173+:
>>> 21512: warning : qemuDomainAttachPciDiskDevice:287 : Unable to restore
>>> security label on rbd/testdysk
>>> and i can't find any info about that.
>>>
>>> my kvm.xml inject file:
>>>
>>> 
>>>          
>>>          
>>>                  >> uuid="0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f"/>
>>>          
>>>          
>>>                  
>>>          
>>>          
>>> 
>>>
>>> secret:
>>>
>>> cat /tmp/secret.xml
>>> 
>>>      0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f
>>>      
>>>              rbd/admin
>>>              admin
>>>      
>>> 
>>>
>>> virsh secret-define /tmp/secret.xml
>>> Secret 0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f created
>>>
>>> virsh secret-set-value 0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f
>>> AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==
>>> Secret value set
>>>
>>
>> Your xml and configuration are all fine, you're hitting the libvirt bug that
>> I mentioned before. The fix isn't in any stable release yet, and 0.9.8 was
>> just released yesterday, so I guess you'll have to compile it yourself or
>> disable the libvirt security driver.
>
> Yes libvirt 0.9.8 with this patch solved problem with rbd, now it's
> clean. Thanks for that.
>
> But, a second problem appear.
>
> error: Failed to attach device from /root/kvm-ceph.xml
> error: internal error unable to execute QEMU command 'device_add':
> Property 'virtio-blk-pci.drive' can't find value 'drive-virtio-disk3'
>
> /var/log/libvirt/libvirtd.log
> 2011-12-16 12:03:04.950+: 1070: error :
> qemuMonitorJSONCheckError:318 : internal error unable to execute QEMU
> command 'device_add': Property 'virtio-blk-pci.drive' can't find value
> 'drive-virtio-disk3'
> 2011-12-16 12:03:04.950+: 1070: warning :
> qemuDomainAttachPciDiskDevice:244 : qemuMonitorAddDevice failed on
> file=rbd:rbd/testdysk:rbd_writeback_window=800:id=admin:key=AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==:auth_supported=cephx
> none:mon_host=10.177.64.4\:6789,if=none,id=drive-virtio-disk3,format=raw
> (virtio-blk-pci,bus=pci.0,addr=0xd,drive=drive-virtio-disk3,id=virtio-disk3)
>
> if i attach a scsi device from img file everything is OK.
>
> Anyone have, any concept with this ??
>
>>
>>
>>> 2011/12/12 Josh Durgin:
>>>>
>>>> On 12/09/2011 04:48 AM, Sławomir Skowron wrote:
>>>>>
>>>>>
>>>>> Sorry, for my lag, but i was sick.
>>>>>
>>>>> I handled problem with apparmor before and now it's not a problem,
>>>>> even when i send mail be

Re: Problem with attaching rbd device in qemu-kvm

2011-12-20 Thread Sławomir Skowron
Ehhh, too long on this :) I forgot to load acpiphp.

Thanks for everything; now it's working beautifully. After 10h of
iozone on the rbd devices, there was no hang or problem.
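
For anyone who hits the same symptom (attach-device succeeds but the
disk only shows up after a reboot): the guest needs the ACPI PCI hotplug
driver loaded. A one-line sketch for a Debian/Ubuntu-style guest
(assuming it uses /etc/modules for boot-time module loading):

```
# /etc/modules (inside the guest): load the ACPI PCI hotplug driver at
# boot, so PCI devices hot-added via virsh attach-device appear without
# a reboot.
acpiphp
```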

2011/12/20 Josh Durgin :
> On 12/19/2011 07:37 AM, Sławomir Skowron wrote:
>>
>> Hi,
>>
>> Actual setup:
>>
>> ii  libvirt-bin                      0.9.2-4ubuntu15.1
>>       the programs for the libvirt library
>> ii  libvirt0                         0.9.2-4ubuntu15.1
>>       library for interfacing with different virtualization systems
>> ii  qemu-kvm                         1.0.0+dfsg+rc2-1~oneiric1
>>       Full virtualization on i386 and amd64 hardware
>>
>> I change many thing in system, and successfully attached rbd drives via
>> libvirt:
>>
>> virsh attach-device one-888 kvm-ceph.xml
>> Device attached successfully
>>
>> A new set of disks, appears in libvirt vm xml, but not in vm machine.
>> After reboot, all new rbd drives appear in VM.
>>
>> Is there any chance to hotadd rbd device to working VM without reboot ??
>
>
> Hotplugging should work, but it does require guest support. Does your guest
> kernel have hotplugging enabled? Does hotplugging work with a raw image file
> using the same configuration?
>
>
>>
>> 2011/12/16 Sławomir Skowron:
>>>
>>> 2011/12/13 Josh Durgin:
>>>>
>>>> On 12/13/2011 04:56 AM, Sławomir Skowron wrote:
>>>>>
>>>>>
>>>>> Finally, i manage the problem with rbd with kvm 1.0, and libvirt
>>>>> 0.9.8, or i think i manage :), but i get stuck with one thing after.
>>>>>
>>>>> 2011-12-13 12:13:31.173+: 21512: error :
>>>>> qemuMonitorJSONCheckError:318 : internal error unable to execute QEMU
>>>>> command 'device_add': Bus 'pci.0' does not support hotplugging
>>>>> 2011-12-13 12:13:31.173+: 21512: warning :
>>>>> qemuDomainAttachPciDiskDevice:244 : qemuMonitorAddDevice failed on
>>>>>
>>>>>
>>>>> file=rbd:rbd/testdysk:id=admin:key=AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==:auth_supported=cephx
>>>>>
>>>>> none:mon_host=10.177.64.4\:6789,if=none,id=drive-virtio-disk2,format=raw
>>>>>
>>>>>
>>>>> (virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-virtio-disk2,id=virtio-disk2)
>>>>> 2011-12-13 12:13:31.173+0000: 21512: error :
>>>>> virSecurityDACRestoreSecurityFileLabel:143 : cannot resolve symlink
>>>>> rbd/testdysk: No such file or directory
>>>>> 2011-12-13 12:13:31.173+0000: 21512: warning :
>>>>> qemuDomainAttachPciDiskDevice:287 : Unable to restore
>>>>> security label on rbd/testdysk
>>>>> and i can't find any info about that.
>>>>>
>>>>> my kvm.xml inject file:
>>>>>
>>>>> 
>>>>>          
>>>>>          
>>>>>                  >>>> uuid="0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f"/>
>>>>>          
>>>>>          
>>>>>                  
>>>>>          
>>>>>          
>>>>> 
>>>>>
>>>>> secret:
>>>>>
>>>>> cat /tmp/secret.xml
>>>>> 
>>>>>      0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f
>>>>>      
>>>>>              rbd/admin
>>>>>              admin
>>>>>      
>>>>> 
>>>>>
>>>>> virsh secret-define /tmp/secret.xml
>>>>> Secret 0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f created
>>>>>
>>>>> virsh secret-set-value 0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f
>>>>> AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==
>>>>> Secret value set
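The secret.xml above lost its tags when the mail was archived. Below is a hedged reconstruction of the whole sequence: only the uuid, the name values, and the key come from the thread; the tag layout is assumed from the libvirt-0.9.x ceph secret format and may not match the original exactly.

```shell
# Reconstruction sketch of the stripped secret.xml plus the virsh calls.
# Tag layout is an assumption; field values are from the thread.
define_rbd_secret() {
  cat > /tmp/secret.xml <<'EOF'
<secret ephemeral='no' private='no'>
  <uuid>0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f</uuid>
  <usage type='ceph'>
    <name>rbd/admin</name>
  </usage>
</secret>
EOF
  virsh secret-define /tmp/secret.xml
  virsh secret-set-value 0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f \
      AQANeNFOiIx9DhAAv76MRXZjWNKn2sSSTQeJog==
}
```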
>>>>>
>>>>
>>>> Your xml and configuration are all fine, you're hitting the libvirt bug
>>>> that
>>>> I mentioned before. The fix isn't in any stable release yet, and 0.9.8
>>>> was
>>>> just released yesterday, so I guess you'll have to compile it yourself
>>>> or
>>>> disable the libvirt security driver.
>>>
>>>
>>> Yes, libvirt 0.9.8 with this patch solved the problem with rbd; now it's
>>> clean. Thanks for that.
>>>
>>> But, a second problem appear.
>>>
>>> error: Failed to attach device from /root/kvm-ceph.xml
>>> error: internal error 

Adding new mon to existing cluster in ceph v0.39(+?)

2012-01-10 Thread Sławomir Skowron
I had some problems adding a new mon to an existing ceph cluster.

The cluster now contains 3 mons, but I started with only one on one
machine, then added a second and third machine with new mons and
OSDs. Adding a new OSD is quite simple, but adding a new mon meant
piecing together bits of the old ceph docs, the new docs, and list
mails.

This -> http://ceph.newdream.net/docs/latest/ops/manage/grow/mon/ -
does not work properly in the section (Adding a monitor).

Maybe this will be useful for someone:

1. Create the new mon's data structure from an existing, working mon
instance, e.g. one created with mkcephfs when the cluster was initialized.

  a) edit ceph.conf, and add new mon definition in mon part of conf in
whole cluster.

  b) ceph auth get mon. -o /tmp/monkey

  c) fsid=`ceph fsid --concise`

  d) ceph-mon -i <id> --mkfs -k /tmp/monkey --fsid $fsid

2. Before you start the new mon (check that the new mon is not running -
in my case it would not start even if I tried :)), a few things must be
done first.
It's based on http://ceph.newdream.net/docs/latest/ops/manage/grow/mon/
(Removing a monitor from an unhealthy or down cluster)

  a) On a surviving monitor node, find the most recent monmap in the mon
dir, as in the doc about removing a monitor.

  b) On a surviving monitor node:

   $ cp $mon_data/monmap/<latest> /tmp/foo
   $ monmaptool /tmp/foo --add <name> <ip>:<port>

  c) Inject a new monmap to working ceph mon.

 ceph-mon -i <id> --inject-monmap /tmp/foo

 ceph -s will show new number of mons.

  d) copy /tmp/foo and inject this monmap into every mon running in the
existing cluster, and also on the machine with the new mon, to update
the monmap in the new mon's directory.

  e) Start new mon:

   service ceph start mon

then mon_status will show the new list of mons in the ceph cluster.

   ceph mon_status

Now the new mon works perfectly.
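The whole procedure above can be condensed into one sketch. The placeholders and the `latest` monmap file name are assumptions you must fill in for your setup; the command sequence itself is the one described in steps 1 and 2:

```shell
# Condensed sketch of the add-a-mon procedure above, after ceph.conf has
# been edited on the whole cluster. Fill in the placeholders; 'latest'
# as the monmap file name is an assumption.
add_mon() {
  local new_id=$1 name=$2 addr=$3 mon_data=$4
  # Step 1: build the new mon's data dir from the existing cluster
  ceph auth get mon. -o /tmp/monkey
  fsid=$(ceph fsid --concise)
  ceph-mon -i "$new_id" --mkfs -k /tmp/monkey --fsid "$fsid"
  # Step 2: grow the monmap on a surviving mon and inject it everywhere
  cp "$mon_data"/monmap/latest /tmp/foo
  monmaptool /tmp/foo --add "$name" "$addr"
  ceph-mon -i "$new_id" --inject-monmap /tmp/foo  # repeat per mon id, as in 2d
  service ceph start mon
  ceph mon_status
}
```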


Maybe this is not a supported way to add a new mon to a cluster, but
for now it's the only way that works for me :)

-- 
-
Regards

Sławek "sZiBis" Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


.rgw expand number of pg's

2012-01-10 Thread Sławomir Skowron
How do I expand the number of pgs in the rgw pool?

-- 
-
Regards

Sławek "sZiBis" Skowron


Re: .rgw expand number of pg's

2012-01-10 Thread Sławomir Skowron
Maybe I misunderstood the problem, but here is what I see.

My setup is a 3-node cluster: 78 osds and 3 mons. On top of the
cluster, radosgw runs on every machine. Every pool has 3 replicas.
The default replica placement policy is by host across racks, and
every machine is in a different rack.

When I do a stress test via an s3 client, writing a lot of new objects
through the balancer to the cluster, I discovered that only 3 osds are
involved. This means that only one osd on each machine is working at
any one time while objects are written via radosgw.

That's why I wrote this mail about increasing the number of pgs in the
radosgw pool. Maybe I am wrong, but how can I make this perform
better, more in parallel, to use the power of many drives (osds)?

For example, when I use rbd in this case, usage of the osd devices is
more random and parallel.
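The bottleneck follows from placement: each object maps to one of the pool's pg_num placement groups, so a pool with only 8 PGs can keep at most 8 primary OSDs busy no matter how many OSDs the cluster has. A toy sketch of this effect (cksum stands in for the object hash; this is not Ceph's real stable_mod/CRUSH placement):

```shell
# Toy model of PG placement: with pg_num=8, a thousand distinct objects
# still land in at most 8 PGs, i.e. at most 8 primary OSDs take writes.
# (cksum is an illustrative stand-in for Ceph's real object hash.)
pg_num=8
declare -A pgs_hit=()
for i in $(seq 1 1000); do
  h=$(printf 'obj-%d' "$i" | cksum | cut -d' ' -f1)
  pgs_hit[$((h % pg_num))]=1
done
echo "1000 objects -> ${#pgs_hit[@]} distinct PGs (pg_num=$pg_num)"
```

Raising pg_num widens that fan-out, which is exactly what made rbd (with many more PGs) look random and parallel by comparison.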

Regards

iSS

On 10 Jan 2012, at 18:11, Samuel Just wrote:

> At the moment, expanding the number of pgs in a pool is not working.
> We hope to get it working in the somewhat near future (probably a few
> months).  Are you attempting to expand the number of osds and running
> out of pgs?
> -Sam
>
> 2012/1/10 Sławomir Skowron :
>> How to expand number of pg's in rgw pool ??
>>
>> --
>> -
>> Pozdrawiam
>>
>> Sławek "sZiBis" Skowron
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Adding new mon to existing cluster in ceph v0.39(+?)

2012-01-18 Thread Sławomir Skowron
I don't remember the exact message now, but it was about the mon's
public addr in the conf, and about the new mon not appearing in the
monmap. My mon section looks like this:

mon data = /vol0/data/mon.$id

; some minimal logging (just message traffic) to aid debugging

debug ms = 1 ; see message traffic
debug mon = 0   ; monitor
debug paxos = 0 ; monitor replication
debug auth = 0  ;

mon allowed clock drift = 2

[mon.0]
host = s3-10-177-64-4
mon addr = 10.177.64.4:6789

[mon.1]
host = s3-10-177-64-6
mon addr = 10.177.64.6:6789

[mon.2]
host = s3-10-177-64-8
mon addr = 10.177.64.8:6789

But as I can see, there is now an updated doc about adding a new mon.

2012/1/10 Samuel Just :
> It looks like in step one you needed to supply a either monmap or
> addresses of existing monitors.  What errors did you encounter?
> -Sam
>
> 2012/1/10 Sławomir Skowron :
>> I have some problem with adding a new mon to existing ceph cluster.
>>
>> Now cluster contains a 3 mon's, but i started with only one in one
>> machine. Then adding a second, and third machine, with new mon's, and
>> OSD. Adding, a new OSD is quiet simple, but adding, a new mon is
>> compilation of some pieces in old doc of ceph, new doc, and a group
>> mails.
>>
>> This -> http://ceph.newdream.net/docs/latest/ops/manage/grow/mon/ -
>> not working properly in section (Adding a monitor)
>>
>> Maybe this will be useful for someone:
>>
>> 1. Create a new mon structure with existing one working mon instance,
>> maybe created with mkcepfs in init of cluster.
>>
>>  a) edit ceph.conf, and add new mon definition in mon part of conf in
>> whole cluster.
>>
>>  b) ceph auth get mon. -o /tmp/monkey
>>
>>  c) fsid=`ceph fsid --concise`
>>
>>  d) ceph-mon -i  --mkfs -k /tmp/monkey --fsid $fsid
>>
>> 2. Before you start new mon (check if new mon is not working - in my
>> case it's not starting even if i try :)), some things musts be done
>> before.
>> It's based on http://ceph.newdream.net/docs/latest/ops/manage/grow/mon/
>> (Removing a monitor from an unhealthy or down cluster)
>>
>>  a) On a surviving monitor node, find the most recent monmap in mon
>> dir, like in doc about removing monitor.
>>
>>  b) On a surviving monitor node:
>>
>>   $ cp $mon_data/monmap/ /tmp/foo
>>   $ monmaptool /tmp/foo --add  :
>>
>>  c) Inject a new monmap to working ceph mon.
>>
>>     ceph-mon -i  --inject-monmap /tmp/foo
>>
>>     ceph -s will show new number of mons.
>>
>>  d) copy /tmp/foo, and inject this monmap, to every mon, that works
>> in existing cluster, even on machine with new mon to update, a monmap
>> in new mon directory.
>>
>>  e) Start new mon:
>>
>>   service ceph start mon
>>
>> then mon_status will show a new list of mon in ceph cluster.
>>
>>   ceph mon_status
>>
>> Now new mon works perfect.
>>
>>
>> Maybe it's not a supported way to insert a new mon to cluster, but for
>> me now it's only way that works :)
>>
>> --
>> -
>> Pozdrawiam
>>
>> Sławek "sZiBis" Skowron
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
-
Regards

Sławek "sZiBis" Skowron


Re: Disable logging for radosgw inside rados pool

2012-02-08 Thread Sławomir Skowron
Excellent, thanks.

Regards

iSS

On 8 Feb 2012, at 15:45, Yehuda Sadeh Weinraub wrote:

> 2012/2/8 Sławomir Skowron :
>> Is there any way to disable logging inside rados for radosgw.
>>
>> pool name       category            KB   objects   clones   degraded   unfound       rd        rd KB        wr       wr KB
>> .intent-log     -                 5952        16        0          0         0        0            0     29288       29288
>> .log            -               356198       497        0          0         0        0            0   1473971     1473971
>> .rgw            -                    1        10        0          0         0        1            0         9           7
>> .rgw.buckets    -            975673365       589        0          0         0   588675   4224301428   1000737  1699523842
>>
>> Disable writes to .log ??
>>
>
> You can set RGW_SHOULD_LOG parameter to 'no' in your http server
> config. Note that we may move that option soon to ceph.conf (just
> opened issue #2040).
>
> Yehuda
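One plausible way to apply Yehuda's RGW_SHOULD_LOG hint in an Apache front end of that era. The RGW_SHOULD_LOG name is from the thread; `SetEnv` and the vhost path are assumptions, since the right env-passing directive depends on your fastcgi module:

```shell
# Sketch: append the RGW_SHOULD_LOG=no knob to a radosgw vhost config.
# 'SetEnv' is an assumption; your fastcgi module may need its own
# env-passing directive instead. Reload Apache after editing.
disable_rgw_logging() {
  local vhost=$1    # e.g. /etc/apache2/sites-available/rgw (illustrative)
  cat >> "$vhost" <<'EOF'
# stop radosgw from logging every op into the .log pool
SetEnv RGW_SHOULD_LOG no
EOF
}
```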


Re: ceph_setattr mask

2012-02-14 Thread Sławomir Skowron
Regards

iSS

On 14 Feb 2012, at 01:05, Noah Watkins wrote:

> Howdy,
>
> It looks like ceph_fs.h contains the mask flags (e.g. CEPH_SETATTR_MODE) used 
> in ceph_setattr, but I do not see these flags in any header installed from 
> .deb files (grep /usr/include/*).
>
> Am I missing a location? Should these flags be part of the installed headers?
>
> Thanks,
> -Noah--
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Tier and MetroCluster

2012-02-17 Thread Sławomir Skowron
I have a question about future plans.

I know ceph is a LAN DFS, not a WAN one in any way, but ;)

1. Are there any plans for tier support? Example:
I have a ceph cluster with fast SAS drives, lots of RAM, SSD
acceleration, and a 10GE network. I use only RBD and RadosGW. The
cluster has relatively small capacity, but it is fast.
I have a second cluster with a different configuration: a large
number of big SATA drives, less RAM, and 1Gb ethernet. Same usage as
above.
All I need is to have 2 physical clusters, but with some tier
function to move old objects from the fast to the slow cluster (for
archive), and back.

If ceph becomes very stable, this could be done inside one cluster,
which would be much simpler, but crush would need to know what is
faster and what is slower.

2. As above, two clusters, this time with the same configuration and
size, but with async replication between them.
Replication could be done by an external replication daemon on top of
rados, or some other solution.

Are there any plans for any of this, or something like it?

Regards

Slawomir Skowron


Re: Tier and MetroCluster

2012-02-17 Thread Sławomir Skowron
On 17 Feb 2012, at 19:06, Tommi Virtanen wrote:

> 2012/2/17 Sławomir Skowron :
>> 1. Is there any plan about tier support. Example:
>> I have ceph cluster with fast SAS drives, and a losts of RAM, and SSD
>> acceleration, and 10GE network. I use only RBD, and RadosGW. Cluster
>> have relative small capacity, but its fast.
>> I have a second cluster with diffrent configuration, with a large
>> number of big SATA drives, and smaller number of RAM, 1Gb ethernet.
>> Same usage as above.
>> All i need is to have phisical 2 clusters, but with some tier
>> function, to move old objects from fast to slow cluster (for archive),
>> and opposite.
>
> You can do this right now, by having just one cluster, specifying
> different crush rulesets for different pools, and then moving your
> objects from one pool to another as they get "old". You'll need to
> manage the migration yourself -- with RADOS, explicitly creating
> objects in the "old" pool, with Ceph DFS, "cephfs PATH set_layout
> --pool MYPOOL" only affects new files. For radosgw, this doesn't
> currently exist, and I'm not sure how it would behave, but it is
> conceivable.
>
>> If ceph will be very stable this can be done inside one cluster it
>> will be much simpler, but crush need to know what is faster, and what
>> is slower.
>
> There are no extra smarts about it. For RADOS, there most likely will
> be no automatic thing here -- after all, we are intentionally avoiding
> any lookup tables -- for the Ceph DFS, I can see that the set_layout
> logic could be extended to migrate existing files too. And radosgw
> might get this as an automatic feature, one day.

Thanks for the advice.

From the economical point of view it's a very nice feature, and I will
be happy to see it in radosgw at some point in the future :)

Yes, you are right, it can be done offline by a tool via RADOS, and
it's a good idea. Maybe storing key=>value counters for each object
inside ceph, or outside; but this is just my brainstorming :)
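Tommi's "manage the migration yourself" suggestion can be sketched with the rados CLI. The pool names here (fastpool, slowpool) are illustrative, not from the thread, and the object round-trips through a local temp file:

```shell
# Sketch of manual tiering at the RADOS level: copy an object from the
# fast pool to the archive pool, then drop the original.
# Pool names are examples, not from the thread.
demote_object() {
  local obj=$1 fast=${2:-fastpool} slow=${3:-slowpool}
  local tmp; tmp=$(mktemp)
  rados -p "$fast" get "$obj" "$tmp"
  rados -p "$slow" put "$obj" "$tmp"
  rados -p "$fast" rm "$obj"
  rm -f "$tmp"
}
```

A cron job walking `rados -p fastpool ls` against some age criterion would be the "tool in offline" mentioned above.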

>
>> 2. Like above two clusters this time same configurations and size, but
>> with async replication between.
>> Replication can be done by the external replication darmon on top of
>> rados, or other solution.
>>
>> Is there any plans with any of this ?? Or something like this ??
>
> I think this is something a lot of people will want, so it probably
> will get done at some point. It is not being developed, currently.

Thanks, and I am waiting for some new exciting features in the ceph project :)


Serious problem after increase pg_num in pool

2012-02-20 Thread Sławomir Skowron
After increasing pg_num from 8 to 100 in .rgw.buckets I have some
serious problems.

pool name       category             KB   objects   clones   degraded   unfound      rd   rd KB        wr       wr KB
.intent-log     -                  4662        19        0          0         0       0       0     26502       26501
.log            -                     0         0        0          0         0       0       0    913732      913342
.rgw            -                     1        10        0          0         0       1       0         9           7
.rgw.buckets    -              39582566     73707        0       8061         0   86594       0    610896    36050541
.rgw.control    -                     0         1        0          0         0       0       0         0           0
.users          -                     1         1        0          0         0       0       0         1           1
.users.uid      -                     1         2        0          0         0       2       1         3           3
data            -                     0         0        0          0         0       0       0         0           0
metadata        -                     0         0        0          0         0       0       0         0           0
rbd             -              21590723      5328        0          1         0      77      75   3013595   378345507
  total used      229514252        79068
  total avail   19685615164
  total space   20980898464

2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384251 mon.0
10.177.64.4:6789/0 36135 : [INF] osd.28 10.177.64.6:6806/824 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384275 mon.0
10.177.64.4:6789/0 36136 : [INF] osd.37 10.177.64.6:6841/29133 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384301 mon.0
10.177.64.4:6789/0 36137 : [INF] osd.7 10.177.64.4:6813/8223 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384327 mon.0
10.177.64.4:6789/0 36138 : [INF] osd.44 10.177.64.6:6859/2370 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384353 mon.0
10.177.64.4:6789/0 36139 : [INF] osd.49 10.177.64.6:6865/29878 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384384 mon.0
10.177.64.4:6789/0 36140 : [INF] osd.17 10.177.64.4:6827/5909 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384410 mon.0
10.177.64.4:6789/0 36141 : [INF] osd.12 10.177.64.4:6810/5410 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384435 mon.0
10.177.64.4:6789/0 36142 : [INF] osd.39 10.177.64.6:6843/12733 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384461 mon.0
10.177.64.4:6789/0 36143 : [INF] osd.42 10.177.64.6:6848/13067 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384485 mon.0
10.177.64.4:6789/0 36144 : [INF] osd.31 10.177.64.6:6840/1233 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384513 mon.0
10.177.64.4:6789/0 36145 : [INF] osd.36 10.177.64.6:6830/12573 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384537 mon.0
10.177.64.4:6789/0 36146 : [INF] osd.38 10.177.64.6:6833/32587 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384567 mon.0
10.177.64.4:6789/0 36147 : [INF] osd.5 10.177.64.4:6873/7842 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384596 mon.0
10.177.64.4:6789/0 36148 : [INF] osd.21 10.177.64.4:6844/11607 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384622 mon.0
10.177.64.4:6789/0 36149 : [INF] osd.23 10.177.64.4:6853/6826 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384661 mon.0
10.177.64.4:6789/0 36150 : [INF] osd.51 10.177.64.6:6858/15894 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384693 mon.0
10.177.64.4:6789/0 36151 : [INF] osd.48 10.177.64.6:6862/13476 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384723 mon.0
10.177.64.4:6789/0 36152 : [INF] osd.32 10.177.64.6:6815/3701 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:06:10.688085   log 2012-02-20 20:06:09.384759 mon.0
10.177.64.4:6789/0 36153 : [INF] osd.41 10.177.64.6:6847/1861 failed
(by osd.55 10.177.64.8:6809/28642)
2012-02-20 20:

Re: Serious problem after increase pg_num in pool

2012-02-20 Thread Sławomir Skowron
and this in ceph -w

2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611270 osd.76
10.177.64.8:6872/5395 49 : [ERR] mkpg 7.e up [76,11] != acting [76]
2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611308 osd.76
10.177.64.8:6872/5395 50 : [ERR] mkpg 7.16 up [76,11] != acting [76]
2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611339 osd.76
10.177.64.8:6872/5395 51 : [ERR] mkpg 7.1e up [76,11] != acting [76]
2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611369 osd.76
10.177.64.8:6872/5395 52 : [ERR] mkpg 7.26 up [76,11] != acting [76]
2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611399 osd.76
10.177.64.8:6872/5395 53 : [ERR] mkpg 7.2e up [76,11] != acting [76]
2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611428 osd.76
10.177.64.8:6872/5395 54 : [ERR] mkpg 7.36 up [76,11] != acting [76]
2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611458 osd.76
10.177.64.8:6872/5395 55 : [ERR] mkpg 7.3e up [76,11] != acting [76]
2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611488 osd.76
10.177.64.8:6872/5395 56 : [ERR] mkpg 7.46 up [76,11] != acting [76]
2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611517 osd.76
10.177.64.8:6872/5395 57 : [ERR] mkpg 7.4e up [76,11] != acting [76]
2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611547 osd.76
10.177.64.8:6872/5395 58 : [ERR] mkpg 7.56 up [76,11] != acting [76]
2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611577 osd.76
10.177.64.8:6872/5395 59 : [ERR] mkpg 7.5e up [76,11] != acting [76]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618816 osd.20
10.177.64.4:6839/6735 54 : [ERR] mkpg 7.f up [51,20,64] != acting
[20,51,64]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618854 osd.20
10.177.64.4:6839/6735 55 : [ERR] mkpg 7.17 up [51,20,64] != acting
[20,51,64]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618883 osd.20
10.177.64.4:6839/6735 56 : [ERR] mkpg 7.1f up [51,20,64] != acting
[20,51,64]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618912 osd.20
10.177.64.4:6839/6735 57 : [ERR] mkpg 7.27 up [51,20,64] != acting
[20,51,64]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618941 osd.20
10.177.64.4:6839/6735 58 : [ERR] mkpg 7.2f up [51,20,64] != acting
[20,51,64]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618970 osd.20
10.177.64.4:6839/6735 59 : [ERR] mkpg 7.37 up [51,20,64] != acting
[20,51,64]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618999 osd.20
10.177.64.4:6839/6735 60 : [ERR] mkpg 7.3f up [51,20,64] != acting
[20,51,64]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619027 osd.20
10.177.64.4:6839/6735 61 : [ERR] mkpg 7.47 up [51,20,64] != acting
[20,51,64]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619056 osd.20
10.177.64.4:6839/6735 62 : [ERR] mkpg 7.4f up [51,20,64] != acting
[20,51,64]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619085 osd.20
10.177.64.4:6839/6735 63 : [ERR] mkpg 7.57 up [51,20,64] != acting
[20,51,64]
2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619113 osd.20
10.177.64.4:6839/6735 64 : [ERR] mkpg 7.5f up [51,20,64] != acting
[20,51,64]

2012/2/20 Sławomir Skowron :
> After increase number pg_num from 8 to 100 in .rgw.buckets i have some
> serious problems.
>
> pool name       category                 KB      objects       clones
>   degraded      unfound           rd        rd KB           wr
> wr KB
> .intent-log     -                       4662           19            0
>           0           0            0            0        26502
> 26501
> .log            -                          0            0            0
>           0           0            0            0       913732
> 913342
> .rgw            -                          1           10            0
>           0           0            1            0            9
>    7
> .rgw.buckets    -                   39582566        73707            0
>        8061           0        86594            0       610896
> 36050541
> .rgw.control    -                          0            1            0
>           0           0            0            0            0
>    0
> .users          -                          1            1            0
>           0           0            0            0            1
>    1
> .users.uid      -                          1            2            0
>           0           0            2            1            3
>    3
> data            -                          0            0            0
>           0           0            0            0            0
>    0
> metadata        -                          0            0            0
>           0           0            0            0            0
>    0
> rbd             -                   21590723         5328            0
>           1           0           77           75      3013595
> 378345507
>  total used       229514252        79068

Re: Serious problem after increase pg_num in pool

2012-02-20 Thread Sławomir Skowron
40 GB in 3 copies in the rgw buckets, and some data in RBD, but that
can be destroyed.

ceph -s reports 224 GB in the normal state.

Regards

iSS

On 20 Feb 2012, at 21:19, Sage Weil wrote:

> Ooh, the pg split functionality is currently broken, and we weren't
> planning on fixing it for a while longer.  I didn't realize it was still
> possible to trigger from the monitor.
>
> I'm looking at how difficult it is to make it work (even inefficiently).
>
> How much data do you have in the cluster?
>
> sage
>
>
>
>
On Mon, 20 Feb 2012, Sławomir Skowron wrote:
>
>> and this in ceph -w
>>
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611270 osd.76
>> 10.177.64.8:6872/5395 49 : [ERR] mkpg 7.e up [76,11] != acting [76]
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611308 osd.76
>> 10.177.64.8:6872/5395 50 : [ERR] mkpg 7.16 up [76,11] != acting [76]
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611339 osd.76
>> 10.177.64.8:6872/5395 51 : [ERR] mkpg 7.1e up [76,11] != acting [76]
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611369 osd.76
>> 10.177.64.8:6872/5395 52 : [ERR] mkpg 7.26 up [76,11] != acting [76]
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611399 osd.76
>> 10.177.64.8:6872/5395 53 : [ERR] mkpg 7.2e up [76,11] != acting [76]
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611428 osd.76
>> 10.177.64.8:6872/5395 54 : [ERR] mkpg 7.36 up [76,11] != acting [76]
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611458 osd.76
>> 10.177.64.8:6872/5395 55 : [ERR] mkpg 7.3e up [76,11] != acting [76]
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611488 osd.76
>> 10.177.64.8:6872/5395 56 : [ERR] mkpg 7.46 up [76,11] != acting [76]
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611517 osd.76
>> 10.177.64.8:6872/5395 57 : [ERR] mkpg 7.4e up [76,11] != acting [76]
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611547 osd.76
>> 10.177.64.8:6872/5395 58 : [ERR] mkpg 7.56 up [76,11] != acting [76]
>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611577 osd.76
>> 10.177.64.8:6872/5395 59 : [ERR] mkpg 7.5e up [76,11] != acting [76]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618816 osd.20
>> 10.177.64.4:6839/6735 54 : [ERR] mkpg 7.f up [51,20,64] != acting
>> [20,51,64]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618854 osd.20
>> 10.177.64.4:6839/6735 55 : [ERR] mkpg 7.17 up [51,20,64] != acting
>> [20,51,64]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618883 osd.20
>> 10.177.64.4:6839/6735 56 : [ERR] mkpg 7.1f up [51,20,64] != acting
>> [20,51,64]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618912 osd.20
>> 10.177.64.4:6839/6735 57 : [ERR] mkpg 7.27 up [51,20,64] != acting
>> [20,51,64]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618941 osd.20
>> 10.177.64.4:6839/6735 58 : [ERR] mkpg 7.2f up [51,20,64] != acting
>> [20,51,64]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618970 osd.20
>> 10.177.64.4:6839/6735 59 : [ERR] mkpg 7.37 up [51,20,64] != acting
>> [20,51,64]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618999 osd.20
>> 10.177.64.4:6839/6735 60 : [ERR] mkpg 7.3f up [51,20,64] != acting
>> [20,51,64]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619027 osd.20
>> 10.177.64.4:6839/6735 61 : [ERR] mkpg 7.47 up [51,20,64] != acting
>> [20,51,64]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619056 osd.20
>> 10.177.64.4:6839/6735 62 : [ERR] mkpg 7.4f up [51,20,64] != acting
>> [20,51,64]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619085 osd.20
>> 10.177.64.4:6839/6735 63 : [ERR] mkpg 7.57 up [51,20,64] != acting
>> [20,51,64]
>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619113 osd.20
>> 10.177.64.4:6839/6735 64 : [ERR] mkpg 7.5f up [51,20,64] != acting
>> [20,51,64]
>>
>> 2012/2/20 Sławomir Skowron:
>>> After increase number pg_num from 8 to 100 in .rgw.buckets i have some
>>> serious problems.
>>>
>>> pool name   category KB  objects   clones
>>>   degraded  unfound   rdrd KB   wr
>>> wr KB
>>> .intent-log -   4662   190
>>>   0   00026502
>>> 26501
>>> .log-  000
>>>   0   000   913732
>>> 913342
>>> .rgw-  1   100
>>>   0   0109
>>>7
>>> .rgw.buckets-   39582566737070
>>>8061   0865940   610896
>>> 36050541
>>> .rgw.control-  010
>>>   0   0000
>>>0
>>> .users  -  110
>>>   0  

Re: Serious problem after increase pg_num in pool

2012-02-20 Thread Sławomir Skowron
If there is no chance to stabilize this cluster, I will try something
like this:

- stop one machine in the cluster
- check that the cluster is still ok and the data is available
- make a new fs on that machine
- migrate the data via obsync through the s3 API
- expand the new cluster with the second and third machines
- change the keys for radosgw etc.
- the new cluster is up with the old data

Can objects in the .rgw.buckets pool be migrated via obsync?
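For the "migrate the data via obsync" step, a per-bucket mirroring sketch. The `s3://` URL form, `-c` (create destination), and the SRC_/DST_ key environment variables follow obsync's documentation of that era; the endpoints and credentials here are placeholders:

```shell
# Sketch: mirror one bucket from the old rgw to the new one with obsync.
# Endpoints and credentials are placeholders; -c creates the destination
# bucket if it does not exist yet (per obsync's docs of the time).
migrate_bucket() {
  local bucket=$1
  SRC_AKEY=OLD_ACCESS_KEY SRC_SKEY=OLD_SECRET_KEY \
  DST_AKEY=NEW_ACCESS_KEY DST_SKEY=NEW_SECRET_KEY \
  obsync -c "s3://old-rgw.example.com/$bucket" \
         "s3://new-rgw.example.com/$bucket"
}
```

Looping this over the bucket list would answer the question above, one bucket at a time.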

On 21 Feb 2012, at 07:46, "Sławomir Skowron" wrote:

> 40 GB in 3 copies in rgw bucket, and some data in RBD, but they can be
> destroyed.
>
> Ceph -s reports 224 GB in normal state.
>
> Pozdrawiam
>
> iSS
>
> On 20 Feb 2012, at 21:19, Sage Weil wrote:
>
>> Ooh, the pg split functionality is currently broken, and we weren't
>> planning on fixing it for a while longer.  I didn't realize it was still
>> possible to trigger from the monitor.
>>
>> I'm looking at how difficult it is to make it work (even inefficiently).
>>
>> How much data do you have in the cluster?
>>
>> sage
>>
>>
>>
>>
>> On Mon, 20 Feb 2012, Sławomir Skowron wrote:
>>
>>> and this in ceph -w
>>>
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611270 osd.76
>>> 10.177.64.8:6872/5395 49 : [ERR] mkpg 7.e up [76,11] != acting [76]
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611308 osd.76
>>> 10.177.64.8:6872/5395 50 : [ERR] mkpg 7.16 up [76,11] != acting [76]
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611339 osd.76
>>> 10.177.64.8:6872/5395 51 : [ERR] mkpg 7.1e up [76,11] != acting [76]
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611369 osd.76
>>> 10.177.64.8:6872/5395 52 : [ERR] mkpg 7.26 up [76,11] != acting [76]
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611399 osd.76
>>> 10.177.64.8:6872/5395 53 : [ERR] mkpg 7.2e up [76,11] != acting [76]
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611428 osd.76
>>> 10.177.64.8:6872/5395 54 : [ERR] mkpg 7.36 up [76,11] != acting [76]
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611458 osd.76
>>> 10.177.64.8:6872/5395 55 : [ERR] mkpg 7.3e up [76,11] != acting [76]
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611488 osd.76
>>> 10.177.64.8:6872/5395 56 : [ERR] mkpg 7.46 up [76,11] != acting [76]
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611517 osd.76
>>> 10.177.64.8:6872/5395 57 : [ERR] mkpg 7.4e up [76,11] != acting [76]
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611547 osd.76
>>> 10.177.64.8:6872/5395 58 : [ERR] mkpg 7.56 up [76,11] != acting [76]
>>> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611577 osd.76
>>> 10.177.64.8:6872/5395 59 : [ERR] mkpg 7.5e up [76,11] != acting [76]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618816 osd.20
>>> 10.177.64.4:6839/6735 54 : [ERR] mkpg 7.f up [51,20,64] != acting
>>> [20,51,64]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618854 osd.20
>>> 10.177.64.4:6839/6735 55 : [ERR] mkpg 7.17 up [51,20,64] != acting
>>> [20,51,64]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618883 osd.20
>>> 10.177.64.4:6839/6735 56 : [ERR] mkpg 7.1f up [51,20,64] != acting
>>> [20,51,64]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618912 osd.20
>>> 10.177.64.4:6839/6735 57 : [ERR] mkpg 7.27 up [51,20,64] != acting
>>> [20,51,64]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618941 osd.20
>>> 10.177.64.4:6839/6735 58 : [ERR] mkpg 7.2f up [51,20,64] != acting
>>> [20,51,64]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618970 osd.20
>>> 10.177.64.4:6839/6735 59 : [ERR] mkpg 7.37 up [51,20,64] != acting
>>> [20,51,64]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618999 osd.20
>>> 10.177.64.4:6839/6735 60 : [ERR] mkpg 7.3f up [51,20,64] != acting
>>> [20,51,64]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619027 osd.20
>>> 10.177.64.4:6839/6735 61 : [ERR] mkpg 7.47 up [51,20,64] != acting
>>> [20,51,64]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619056 osd.20
>>> 10.177.64.4:6839/6735 62 : [ERR] mkpg 7.4f up [51,20,64] != acting
>>> [20,51,64]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619085 osd.20
>>> 10.177.64.4:6839/6735 63 : [ERR] mkpg 7.57 up [51,20,64] != acting
>>> [20,51,64]
>>> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619113 osd.20
>>> 10.177.

Re: Missing required features 2000

2012-02-21 Thread Sławomir Skowron
Ok, sorry for the trouble. This was an old version of Ceph on one
machine: all packages had been updated to 0.42 except the ceph package :(
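For context, "2000" here is a hexadecimal feature-bit mask: the messenger rejects peers whose advertised feature set lacks a required bit, which is exactly what one stale package causes. A toy illustration of the check (the names are hypothetical; only the 0x2000 value comes from the log above):

```python
# Toy illustration of the messenger feature-bit check behind
# "peer missing required features 2000".  Constant names are made up;
# the bit value 0x2000 is taken from the log line.
REQUIRED_FEATURES = 0x2000  # bits this daemon insists its peers support

def missing_features(required: int, peer_supported: int) -> int:
    """Return the bits in `required` that the peer does not advertise."""
    return required & ~peer_supported

# A peer running the old package advertises features without bit 0x2000:
old_peer = 0x1fff
print(hex(missing_features(REQUIRED_FEATURES, old_peer)))  # 0x2000 -> rejected

# After upgrading, the peer gains the bit and the connection is accepted:
new_peer = 0x3fff
print(hex(missing_features(REQUIRED_FEATURES, new_peer)))  # 0x0
```

Upgrading the lagging ceph package, as described above, clears the missing bit and the peers can connect again.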

2012/2/21 Sławomir Skowron :
> What does this mean in the v0.42 mon log:
>
> 2012-02-21 14:31:56.188513 7f4c59cee700 -- 10.177.64.4:6789/0 >>
> 10.177.64.6:6815/11843 pipe(0x133f280 sd=36 pgs=0 cs=0 l=1).peer
> missing required features 2000
> 2012-02-21 14:31:56.215195 7f4c59cee700 -- 10.177.64.4:6789/0 >> :/0
> pipe(0x133f780 sd=36 pgs=0 cs=0 l=0).accept sd=36
> 2012-02-21 14:31:56.215376 7f4c59cee700 -- 10.177.64.4:6789/0 >>
> 10.177.64.6:6809/11390 pipe(0x133f780 sd=36 pgs=0 cs=0 l=1).peer
> missing required features 2000
> 2012-02-21 14:31:56.254106 7f4c59cee700 -- 10.177.64.4:6789/0 >> :/0
> pipe(0x1672a00 sd=36 pgs=0 cs=0 l=0).accept sd=36
> 2012-02-21 14:31:56.254280 7f4c59cee700 -- 10.177.64.4:6789/0 >>
> 10.177.64.6:6866/8497 pipe(0x1672a00 sd=36 pgs=0 cs=0 l=1).peer
> missing required features 2000
> 2012-02-21 14:31:56.254888 7f4c59cee700 -- 10.177.64.4:6789/0 >> :/0
> pipe(0x11e5000 sd=36 pgs=0 cs=0 l=0).accept sd=36
> 2012-02-21 14:31:56.255063 7f4c59cee700 -- 10.177.64.4:6789/0 >>
> 10.177.64.6:6866/8497 pipe(0x11e5000 sd=36 pgs=0 cs=0 l=1).peer
> missing required features 2000
> 2012-02-21 14:31:56.278601 7f4c59cee700 -- 10.177.64.4:6789/0 >> :/0
> pipe(0x11e5280 sd=36 pgs=0 cs=0 l=0).accept sd=36
> 2012-02-21 14:31:56.278775 7f4c59cee700 -- 10.177.64.4:6789/0 >>
> 10.177.64.6:6803/10948 pipe(0x11e5280 sd=36 pgs=0 cs=0 l=1).peer
> missing required features 2000
> 2012-02-21 14:31:56.289903 7f4c59cee700 -- 10.177.64.4:6789/0 >> :/0
> pipe(0x11e5780 sd=36 pgs=0 cs=0 l=0).accept sd=36
> 2012-02-21 14:31:56.290199 7f4c59cee700 -- 10.177.64.4:6789/0 >>
> 10.177.64.6:6830/7347 pipe(0x11e5780 sd=36 pgs=0 cs=0 l=1).peer
> missing required features 2000
> 2012-02-21 14:31:56.290768 7f4c59cee700 -- 10.177.64.4:6789/0 >> :/0
> pipe(0xe95c80 sd=36 pgs=0 cs=0 l=0).accept sd=36
> 2012-02-21 14:31:56.290944 7f4c59cee700 -- 10.177.64.4:6789/0 >>
> 10.177.64.6:6830/7347 pipe(0xe95c80 sd=36 pgs=0 cs=0 l=1).peer missing
> required features 200
>
> After adding new OSDs into the cluster.
>
> --
> -
> Regards
>
> Sławek "sZiBis" Skowron



-- 
-
Regards

Sławek "sZiBis" Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Serious problem after increase pg_num in pool

2012-02-21 Thread Sławomir Skowron
Unfortunately, 3 hours ago I made the decision to re-initialize the cluster :(

Some data was still available via rados, but the cluster was unstable,
and migrating the data was difficult under time pressure from outside :)

After initializing a new cluster on one machine with clean pools, I was
able to increase the number of PGs in the .rgw pools.

The cluster is now stable on version 0.42, and new data is going in.
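Sage notes below that obsync works per bucket, so the migration can be scripted. A sketch that only generates the obsync command lines (the s3:// source/destination URL form and credential handling are assumptions; check `obsync --help` on your version for the exact syntax):

```python
# Sketch: one obsync run per bucket, generated as command lines only
# (a dry run).  The s3:// URL form is an assumption; how credentials
# are passed (e.g. environment variables) depends on the obsync version.
def obsync_commands(buckets, src_host, dst_host):
    cmds = []
    for bucket in buckets:
        cmds.append(["obsync",
                     "s3://%s/%s" % (src_host, bucket),
                     "s3://%s/%s" % (dst_host, bucket)])
    return cmds

# Hypothetical bucket names and gateway hosts for illustration:
for cmd in obsync_commands(["images", "www.onet.pl"],
                           "old-rgw.example.com", "new-rgw.example.com"):
    print(" ".join(cmd))
```

Feeding the generated commands to a shell (or subprocess) one bucket at a time keeps each sync independently restartable.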

On 21 Feb 2012, at 17:00, Sage Weil wrote:

> On Tue, 21 Feb 2012, Sławomir Skowron wrote:
>> If there is no chance to stabilize this cluster, I will try something like
>> this:
>>
>> - stop one machine in the cluster
>> - check if it's still OK and the data is available
>> - make a new fs on that machine
>> - migrate the data via rados using obsync
>> - expand the new cluster with the second and third machines
>> - change the keys for radosgw, etc.
>> - the new cluster is up with the old data
>>
>> Can objects in the .rgw.buckets pool be migrated via obsync?
>
> obsync operates at the s3/swift bucket level, of which many are stored
> in .rgw.buckets.  You'll need to sync each of those buckets individually.
>
> Before you do that, though, I have a pg split branch that is almost ready.
> If you don't mind, I'd be curious if it can handle your semi-broken
> cluster.  I'll have it pushed in about 2 hours, if you can wait!  If not,
> no worries.
>
> sage
>
>
>
>>
>> On 21 Feb 2012, at 07:46, "Sławomir Skowron" wrote:
>>
 40 GB in 3 copies in the rgw bucket, and some data in RBD, but that can be
 destroyed.

 ceph -s reports 224 GB in the normal state.
>>>
>>> Regards
>>>
>>> iSS
>>>
>>> On 20 Feb 2012, at 21:19, Sage Weil wrote:
>>>
 Ooh, the pg split functionality is currently broken, and we weren't
 planning on fixing it for a while longer.  I didn't realize it was still
 possible to trigger from the monitor.

 I'm looking at how difficult it is to make it work (even inefficiently).

 How much data do you have in the cluster?

 sage




 On Mon, 20 Feb 2012, Sławomir Skowron wrote:

> and this in ceph -w
>
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611270 osd.76
> 10.177.64.8:6872/5395 49 : [ERR] mkpg 7.e up [76,11] != acting [76]
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611308 osd.76
> 10.177.64.8:6872/5395 50 : [ERR] mkpg 7.16 up [76,11] != acting [76]
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611339 osd.76
> 10.177.64.8:6872/5395 51 : [ERR] mkpg 7.1e up [76,11] != acting [76]
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611369 osd.76
> 10.177.64.8:6872/5395 52 : [ERR] mkpg 7.26 up [76,11] != acting [76]
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611399 osd.76
> 10.177.64.8:6872/5395 53 : [ERR] mkpg 7.2e up [76,11] != acting [76]
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611428 osd.76
> 10.177.64.8:6872/5395 54 : [ERR] mkpg 7.36 up [76,11] != acting [76]
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611458 osd.76
> 10.177.64.8:6872/5395 55 : [ERR] mkpg 7.3e up [76,11] != acting [76]
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611488 osd.76
> 10.177.64.8:6872/5395 56 : [ERR] mkpg 7.46 up [76,11] != acting [76]
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611517 osd.76
> 10.177.64.8:6872/5395 57 : [ERR] mkpg 7.4e up [76,11] != acting [76]
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611547 osd.76
> 10.177.64.8:6872/5395 58 : [ERR] mkpg 7.56 up [76,11] != acting [76]
> 2012-02-20 20:34:13.531857   log 2012-02-20 20:34:07.611577 osd.76
> 10.177.64.8:6872/5395 59 : [ERR] mkpg 7.5e up [76,11] != acting [76]
> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618816 osd.20
> 10.177.64.4:6839/6735 54 : [ERR] mkpg 7.f up [51,20,64] != acting
> [20,51,64]
> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618854 osd.20
> 10.177.64.4:6839/6735 55 : [ERR] mkpg 7.17 up [51,20,64] != acting
> [20,51,64]
> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618883 osd.20
> 10.177.64.4:6839/6735 56 : [ERR] mkpg 7.1f up [51,20,64] != acting
> [20,51,64]
> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618912 osd.20
> 10.177.64.4:6839/6735 57 : [ERR] mkpg 7.27 up [51,20,64] != acting
> [20,51,64]
> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618941 osd.20
> 10.177.64.4:6839/6735 58 : [ERR] mkpg 7.2f up [51,20,64] != acting
> [20,51,64]
> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618970 osd.20
> 10.177.64.4:6839/6735 59 : [ERR] mkpg 7.37 up [51,20,64] != acting
> [20,51,64]
> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.618999 osd.20
> 10.177.64.4:6839/6735 60 : [ERR] mkpg 7.3f up [51,20,64] != acting
> [20,51,64]
> 2012-02-20 20:34:17.015290   log 2012-02-20 20:34:07.619027 osd.20
> 10.177.64.4:6839/6735 61 : [ERR] mkpg 7.47 up [51,20,64] != acti

RadosGW problems with copy in s3

2012-02-28 Thread Sławomir Skowron
After running some parallel copy commands via boto on many files,
everything slows down, and eventually we get timeouts from nginx in
front of radosgw.

# ceph -s
2012-02-28 12:16:57.818566pg v20743: 8516 pgs: 8516 active+clean;
2154 MB data, 53807 MB used, 20240 GB / 21379 GB avail
2012-02-28 12:16:57.845274   mds e1: 0/0/1 up
2012-02-28 12:16:57.845307   osd e719: 78 osds: 78 up, 78 in
2012-02-28 12:16:57.845512   log 2012-02-28 12:14:41.578889 osd.24
10.177.64.4:6839/2063 20 : [WRN] old request
osd_op(client.62138.0:1162 3_.archive/1330427624.13/37828,serwisy.html
[delete,create 0~0,setxattr user.rgw.acl (62),setxattr
user.rgw.content_type (10),setxattr user.rgw.etag (33),setxattr
user.rgw.idtag,setxattr user.rgw.x-amz-copy-source (33),setxattr
user.rgw.x-amz-meta-checksum (33),setxattr
user.rgw.x-amz-metadata-directive (5),setxattr
user.rgw.x-amz-storage-class (9),setxattr user.rgw.idtag (32),setxattr
user.rgw.shadow_name (74),clonerange 0~17913 from
3_.archive/1330427624.13/37828,serwisy.html_g8DHGgLP7Dhi7YSAzYtu_FGT_96NsiF/head
offset 0] 7.645e07f5) v4 received at 2012-02-28 12:14:10.683511
currently waiting for sub ops
2012-02-28 12:16:57.845622   mon e3: 3 mons at
{0=10.177.64.4:6789/0,1=10.177.64.6:6789/0,2=10.177.64.8:6789/0}

and another one from ceph -s:

2012-02-28 12:32:16.697642pg v21010: 8516 pgs: 8516 active+clean;
2161 MB data, 53839 MB used, 20240 GB / 21379 GB avail
2012-02-28 12:32:16.722796   mds e1: 0/0/1 up
2012-02-28 12:32:16.722938   osd e719: 78 osds: 78 up, 78 in
2012-02-28 12:32:16.723204   log 2012-02-28 12:28:30.015814 osd.24
10.177.64.4:6839/2063 78 : [WRN] old request
osd_op(client.62020.0:3344 .dir.3 [call rgw.bucket_complete_op]
7.ccb26a35) v4 received at 2012-02-28 12:27:59.458121 currently
waiting for sub ops
2012-02-28 12:32:16.723327   mon e3: 3 mons at
{0=10.177.64.4:6789/0,1=10.177.64.6:6789/0,2=10.177.64.8:6789/0}

nginx reports a timeout after 33 seconds:

10.177.8.12 - - - [28/Feb/2012:12:14:19 +0100] "PUT
/www.onet.pl/.archive/1330427624.13/_d/dodatki/887184387c048cb30c88f22169f0f74d%2Cwyslij_belka.gif
HTTP/1.1" rlength: 528 bsent: 0 rtime: 33.001 urtime: - status: 499
bbsent: 0 httpref: "-" useragent: "Boto/2.2.2 (linux2)"
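A quick way to pick out such slow requests from this log format (a sketch; the regex matches the `rtime:` total-request-time field shown in the line above, and the 30 s threshold sits just under the 33 s timeout observed here):

```python
import re

# Flag slow requests in the access-log format shown above.  The \b
# keeps the pattern from also matching the separate "urtime" field.
RTIME_RE = re.compile(r'\brtime: ([\d.]+)')

def slow_requests(lines, threshold=30.0):
    return [line for line in lines
            if (m := RTIME_RE.search(line)) and float(m.group(1)) >= threshold]

# Two abbreviated sample lines in the same format as the log above:
sample = [
    'PUT /a HTTP/1.1" rlength: 528 bsent: 0 rtime: 33.001 urtime: - status: 499',
    'PUT /b HTTP/1.1" rlength: 352 bsent: 157 rtime: 0.013 urtime: 0.013 status: 200',
]
print(len(slow_requests(sample)))  # 1
```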

Inside attachment logs from radosgw in debug mode.

Normal PUT, GET, HEAD, etc. requests all work OK. Only parallel copy
seems to become a mess, I think :).

I see this in the radosgw log:

2012-02-28 12:27:34.774107 7ff99bfef700 can't clone object
www.onet.pl:.archive/1330428452.43/_d/dodatki/887184387c048cb30c88f22169f0f74d,wyslij_belka.gif
to shadow object, tag/shadow_obj haven't been set

and

2012-02-28 12:27:52.798272 7ff99bfef700 prepare_atomic_for_write_impl:
state is not atomic. state=0x7ff93c02cd58

I use Ceph 0.42 with ext4 as the local filesystem; maybe xattrs on ext4
cause this problem?

-- 
-
Regards

Sławek Skowron
2012-02-28 12:27:34.767514 7ff99bfef700 dequeued request req=0x7ff93c08fb10
2012-02-28 12:27:34.767531 7ff99bfef700 RGWWQ: empty
2012-02-28 12:27:34.767544 7ff99bfef700 == starting new request 
req=0x7ff93c08fb10 =
2012-02-28 12:27:34.767593 7ff99bfef700 req 650:0.47initializing
2012-02-28 12:27:34.767601 7ff99bfef700 in url_decode with 
/test/.archive/1330428452.43/_d/dodatki/887184387c048cb30c88f22169f0f74d%2Cwyslij_belka.gif
2012-02-28 12:27:34.767606 7ff99bfef700 
src=/test/.archive/1330428452.43/_d/dodatki/887184387c048cb30c88f22169f0f74d%2Cwyslij_belka.gif
2012-02-28 12:27:34.767612 7ff99bfef700 
dest=/test/.archive/1330428452.43/_d/dodatki/887184387c048cb30c88f22169f0f74d,wyslij_belka.gif
2012-02-28 12:27:34.767630 7ff99bfef700 in url_decode with
2012-02-28 12:27:34.767636 7ff99bfef700 src=
2012-02-28 12:27:34.767640 7ff99bfef700 dest=
2012-02-28 12:27:34.767646 7ff99bfef700 parsed: name= val=
2012-02-28 12:27:34.767662 7ff99bfef700 
s->object=.archive/1330428452.43/_d/dodatki/887184387c048cb30c88f22169f0f74d,wyslij_belka.gif
 s->bucket=test
2012-02-28 12:27:34.767675 7ff99bfef700 meta>> HTTP_X_AMZ_STORAGE_CLASS=STANDARD
2012-02-28 12:27:34.767690 7ff99bfef700 meta>> 
HTTP_X_AMZ_COPY_SOURCE=test/_d/dodatki/887184387c048cb30c88f22169f0f74d%2Cwyslij_belka.gif
2012-02-28 12:27:34.767704 7ff99bfef700 meta>> 
HTTP_X_AMZ_METADATA_DIRECTIVE=COPY
2012-02-28 12:27:34.767717 7ff99bfef700 x>> 
x-amz-copy-source:test/_d/dodatki/887184387c048cb30c88f22169f0f74d%2Cwyslij_belka.gif
2012-02-28 12:27:34.767723 7ff99bfef700 x>> x-amz-metadata-directive:COPY
2012-02-28 12:27:34.767727 7ff99bfef700 x>> x-amz-storage-class:STANDARD
2012-02-28 12:27:34.767736 7ff99bfef700 FCGI_ROLE=RESPONDER
2012-02-28 12:27:34.767741 7ff99bfef700 SCRIPT_FILENAME=/var/www/radosgw.fcgi
2012-02-28 12:27:34.767745 7ff99bfef700 QUERY_STRING=
2012-02-28 12:27:34.767750 7ff99bfef700 REQUEST_METHOD=PUT
2012-02-28 12:27:34.767755 7ff99bfef700 CONTENT_TYPE=
2012-02-28 12:27:34.767759 7ff99bfef700 CONTENT_LENGTH=0
2012-02-28 12:27:34.767764 7ff99bfef700 
SCRIPT_NAME=/test/.archive/1330428452.43/_d/dodatki/887184387c048cb30c88f22169f0f74d,wyslij_belka.gif
2012-02

Re: RadosGW problems with copy in s3

2012-02-29 Thread Sławomir Skowron
Ok, it's intentional.

We check the meta information about the files, then check the md5 of
the file content. In parallel we update objects that have changed, then
archive those objects under another key, and the last step is deleting
objects that have expired.

This happens over and over, because this site changes many times.
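The cycle just described can be condensed into a sketch. This is pure decision logic, with no S3 calls; the rules are paraphrased from the description above:

```python
import hashlib

# Sketch of the update cycle described above: compare each local file's
# md5 against the stored ETag, archive + re-upload what changed, and
# delete what has expired.  No S3 calls are made here.
def plan_sync(local_files, remote_etags, expired_keys):
    """local_files: {key: bytes}; remote_etags: {key: md5 hexdigest};
    expired_keys: keys whose time is up.  Returns (action, key) pairs."""
    actions = []
    for key, content in local_files.items():
        if remote_etags.get(key) != hashlib.md5(content).hexdigest():
            actions.append(("archive", key))  # copy old version to archive key
            actions.append(("put", key))      # upload the changed content
    for key in expired_keys:
        actions.append(("delete", key))
    return actions

print(plan_sync({"a.html": b"new body"}, {"a.html": "stale-etag"}, ["old.html"]))
# [('archive', 'a.html'), ('put', 'a.html'), ('delete', 'old.html')]
```

Note that the archive + put pair is exactly the copy-then-overwrite pattern that triggers the contention discussed in this thread.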

Right now I don't have any idea how to work around this problem without
shutting down this app :(

Another question: why does radosgw write almost everything to only one
OSD, with copies on 2 others, and only those? Can anyone explain this
to me?

/dev/sdu  275G  605M  260G   1% /vol0/data/osd.18
/dev/sdw  275G  608M  260G   1% /vol0/data/osd.21
/dev/sdz  275G  638M  260G   1% /vol0/data/osd.24
/dev/sde  275G  605M  260G   1% /vol0/data/osd.3
/dev/sdr  275G  605M  260G   1% /vol0/data/osd.16
/dev/sdaa 275G  696M  260G   1% /vol0/data/osd.25
/dev/sdp  275G  605M  260G   1% /vol0/data/osd.14
/dev/sdd  275G  605M  260G   1% /vol0/data/osd.2
/dev/sdk  275G  605M  260G   1% /vol0/data/osd.8
/dev/sdh  275G  608M  260G   1% /vol0/data/osd.6
/dev/sds  275G  605M  260G   1% /vol0/data/osd.17
/dev/sdf  275G  638M  260G   1% /vol0/data/osd.4
/dev/sdj  275G  637M  260G   1% /vol0/data/osd.9
/dev/sdc  275G  604M  260G   1% /vol0/data/osd.1
/dev/sdv  275G  2.7G  258G   2% /vol0/data/osd.20
/dev/sdn  275G  607M  260G   1% /vol0/data/osd.12
/dev/sdo  275G  605M  260G   1% /vol0/data/osd.13
/dev/sdg  275G  605M  260G   1% /vol0/data/osd.5
/dev/sdy  275G  633M  260G   1% /vol0/data/osd.23
/dev/sdm  275G  605M  260G   1% /vol0/data/osd.11
/dev/sdx  275G  605M  260G   1% /vol0/data/osd.22
/dev/sdb  275G  608M  260G   1% /vol0/data/osd.0
/dev/sdq  275G  605M  260G   1% /vol0/data/osd.15
/dev/sdl  275G  605M  260G   1% /vol0/data/osd.10
/dev/sdi  275G  605M  260G   1% /vol0/data/osd.7
/dev/sdt  275G  604M  260G   1% /vol0/data/osd.19


2012/2/28 Yehuda Sadeh Weinraub :
> (resending to list)
>
> On Tue, Feb 28, 2012 at 11:53 AM, Sławomir Skowron
>  wrote:
>>
>> 2012/2/28 Yehuda Sadeh Weinraub :
>> > On Tue, Feb 28, 2012 at 3:43 AM, Sławomir Skowron
>> >  wrote:
>> >> After running some parallel copy commands via boto on many files,
>> >> everything slows down, and eventually we get timeouts from nginx in
>> >> front of radosgw.
>
>
> Note that you're overwriting the same object. Is that intentional?
>
>>
>> Reproduced logs in attachment.
>>
>
> I was able to recreate the issue. The problem is specifically related
> to the fact that you're overwriting the same object from 10s of
> parallel threads. What happens is that our race-detection code (that
> is related to the radosgw atomic object write) detects that the
> underlying object has been written and it needs to reread its header
> before overwriting it. This works well when there are a few writers to
> the same object, but doesn't scale very well. I opened issue #2120 to
> track the issue.

Will you join this with task #1956, and when will this task go into development?

> Yehuda
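The race detection Yehuda describes is essentially optimistic concurrency: each writer reads the object's header, a write commits only if the header is unchanged, and losers must re-read and retry. A toy single-threaded simulation (not radosgw's actual code) shows why total work grows quadratically with the number of parallel writers to one object:

```python
# Toy model of optimistic-concurrency writers contending on one object.
# Each round, every still-pending writer has read the same header
# version; only the first commit in the round succeeds, the rest
# conflict and must re-read.  This is NOT radosgw's code, just the
# general pattern it approximates.
def simulate(n_writers):
    version = 0
    pending = n_writers
    attempts = 0                    # each attempt = one header read + write try
    while pending:
        snapshot = version          # all pending writers re-read here
        for _ in range(pending):
            attempts += 1
            if version == snapshot:
                version += 1        # first writer of the round wins
        pending -= 1                # exactly one writer finished this round
    return attempts

# Total attempts grow as n*(n+1)/2 -- fine for a few writers,
# painful for tens of them:
print(simulate(2), simulate(10), simulate(40))  # 3 55 820
```

With tens of threads overwriting one key, each successful write costs proportionally more header re-reads, which matches the slowdown observed in this thread.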

-- 
-
Regards

Sławek "sZiBis" Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RadosGW problems with copy in s3

2012-03-05 Thread Sławomir Skowron
2012/3/1 Sławomir Skowron :
> 2012/2/29 Yehuda Sadeh Weinraub :
>> On Wed, Feb 29, 2012 at 5:06 AM, Sławomir Skowron
>>  wrote:
>>>
>>> Ok, it's intentional.
>>>
>>> We check the meta information about the files, then check the md5 of
>>> the file content. In parallel we update objects that have changed, then
>>> archive those objects under another key, and the last step is deleting
>>> objects that have expired.
>>>
>>> This happens over and over, because this site changes many times.
>>>
>>> Right now I don't have any idea how to work around this problem without
>>> shutting down this app :(
>>
>> I looked at your osd log again, and there are other things that don't
>> look right. I'll also need you to turn on 'debug osd = 20' and 'debug
>> filestore = 20'.
>
> osd.24: almost 10 minutes of debug log, as requested above, in the attachment.
>
>> Other than that, I just pushed a workaround that might improve things.
>> It's on the wip-rgw-atomic-no-retry branch on github (based on
>> 0.42.2), so you might want to give it a spin and let us know whether
>> it actually improved things.
>
> Ok, I will try and let you know soon.

Unfortunately, there is no improvement after upgrading to this version.

Nginx reports growing response times for the radosgw backend, and once
the time reaches 33 seconds it turns into a timeout.

10.177.8.11 - - - [05/Mar/2012:10:14:10 +0100] "PUT
/test/_x/sidebar2/_d/widgets/sym/button_bw.png HTTP/1.1" rlength: 1049
bsent: 232 rtime: 2.487 urtime: 2.485 status: 200 bbsent: 25 httpref:
"-" useragent: "Boto/2.2.2 (linux2)"
10.177.8.12 - - - [05/Mar/2012:10:14:10 +0100] "PUT
/test/_x/sidebar2/_d/widgets/sym/button_bw.png?acl HTTP/1.1" rlength:
353 bsent: 157 rtime: 0.013 urtime: 0.013 status: 200 bbsent: 5
httpref: "-" useragent: "Boto/2.2.2 (linux2)"
10.177.8.11 - - - [05/Mar/2012:10:14:13 +0100] "PUT
/test/_x/sidebar2/_d/widgets/newmail/list.png?acl HTTP/1.1" rlength:
352 bsent: 157 rtime: 0.006 urtime: 0.006 status: 200 bbsent: 5
httpref: "-" useragent: "Boto/2.2.2 (linux2)"
10.177.8.11 - - - [05/Mar/2012:10:14:17 +0100] "PUT
/test/_x/sidebar2/_d/widgets/zumisearch/button.png?acl HTTP/1.1"
rlength: 357 bsent: 157 rtime: 0.006 urtime: 0.006 status: 200 bbsent:
5 httpref: "-" useragent: "Boto/2.2.2 (linux2)"
10.177.8.11 - - - [05/Mar/2012:10:14:21 +0100] "PUT
/test/_g/sidebar/_d/widgets/globaltime/city_point_green.gif HTTP/1.1"
rlength: 1297 bsent: 232 rtime: 13.427 urtime: 13.426 status: 200
bbsent: 25 httpref: "-" useragent: "Boto/2.2.2 (linux2)"
10.177.8.11 - - - [05/Mar/2012:10:14:26 +0100] "PUT
/test/_x/sidebar2/_d/widgets/zum/zumikombw.png HTTP/1.1" rlength: 4278
bsent: 232 rtime: 18.384 urtime: 18.383 status: 200 bbsent: 25
httpref: "-" useragent: "Boto/2.2.2 (linux2)"
10.177.8.12 - - - [05/Mar/2012:10:14:29 +0100] "PUT
/test/_x/sidebar2/_d/widgets/allegro/allegro.png HTTP/1.1" rlength:
4129 bsent: 232 rtime: 21.963 urtime: 21.962 status: 200 bbsent: 25
httpref: "-" useragent: "Boto/2.2.2 (linux2)"
10.177.8.11 - - - [05/Mar/2012:10:14:32 +0100] "PUT
/test/_g/sidebar/_d/popover/mdcbtn.gif HTTP/1.1" rlength: 1532 bsent:
232 rtime: 25.105 urtime: 25.104 status: 200 bbsent: 25 httpref: "-"
useragent: "Boto/2.2.2 (linux2)"
10.177.8.12 - - - [05/Mar/2012:10:14:40 +0100] "PUT
/test/40622%2Cserwisy.html HTTP/1.1" rlength: 18178 bsent: 232 rtime:
32.651 urtime: 32.648 status: 200 bbsent: 25 httpref: "-" useragent:
"Boto/2.2.2 (linux2)"
10.177.8.12 - - - [05/Mar/2012:10:14:40 +0100] "PUT
/test/_x/sidebar/cssb/popover/btn-nrm.gif HTTP/1.1" rlength: 539
bsent: 25 rtime: 33.003 urtime: - status: 499 bbsent: 25 httpref: "-"
useragent: "Boto/2.2.2 (linux2)"
10.177.8.11 - - - [05/Mar/2012:10:14:40 +0100] "PUT
/test/DaneAjax%2CWiadomosciLokalne%2Cajax.json%3Fmiasto%3Dslask%26z%3D0%26v%3D201203051014
HTTP/1.1" rlength: 1572 bsent: 25 rtime: 33.002 urtime: - status: 499
bbsent: 25 httpref: "-" useragent: "Boto/2.2.2 (linux2)"
10.177.8.12 - - - [05/Mar/2012:10:14:40 +0100] "PUT
/test/37828%2Cserwisy.html HTTP/1.1" rlength: 18363 bsent: 25 rtime:
33.003 urtime: - status: 499 bbsent: 25 httpref: "-" useragent:
"Boto/2.2.2 (linux2)"
10.177.8.11 - - - [05/Mar/2012:10:14:40 +0100] "PUT
/test/DaneAjax%2CWiadomosciLokalne%2Cajax.json%3Fmiasto%3Dpoznan%26z%3D0%26v%3D201203051014
HTTP/1.1" rlength: 1602 bsent: 25 rtime: 33.004 urtime: - status: 499
bbsent: 25 httpref: "-" useragent: "Boto/2.2.2 (linux2)"
10.177.8.11 - - - [05/Mar/2012:10:14:40 +0100] &

Re: RadosGW problems with copy in s3

2012-03-05 Thread Sławomir Skowron
On 5 mar 2012, at 19:59, Yehuda Sadeh Weinraub
 wrote:

> On Mon, Mar 5, 2012 at 2:23 AM, Sławomir Skowron
>  wrote:
>> 2012/3/1 Sławomir Skowron :
>>> 2012/2/29 Yehuda Sadeh Weinraub :
>>>> On Wed, Feb 29, 2012 at 5:06 AM, Sławomir Skowron
>>>>  wrote:
>>>>>
>>>>> Ok, it's intentional.
>>>>>
>>>>> We check the meta information about the files, then check the md5 of
>>>>> the file content. In parallel we update objects that have changed, then
>>>>> archive those objects under another key, and the last step is deleting
>>>>> objects that have expired.
>>>>>
>>>>> This happens over and over, because this site changes many times.
>>>>>
>>>>> Right now I don't have any idea how to work around this problem without
>>>>> shutting down this app :(
>>>>
>>>> I looked at your osd log again, and there are other things that don't
>>>> look right. I'll also need you to turn on 'debug osd = 20' and 'debug
>>>> filestore = 20'.
>>>
>>> osd.24: almost 10 minutes of debug log, as requested above, in the attachment.
>>>
>>>> Other than that, I just pushed a workaround that might improve things.
>>>> It's on the wip-rgw-atomic-no-retry branch on github (based on
>>>> 0.42.2), so you might want to give it a spin and let us know whether
>>>> it actually improved things.
>>>
>>> Ok, I will try and let you know soon.
>>
>> Unfortunately, there is no improvement after upgrading to this version.
>>
> It looks like an issue with updating the bucket index, but I'm having
> trouble confirming it, as the log provided (of osd.24) doesn't contain
> any relevant operations. If you could provide a log from the relevant
> osd it may be very helpful.
>
> You can find the relevant osd by looking at an operation that took too
> long, and look for a request like the following:
>
> 2012-02-28 20:20:10.944859 7fb1affb7700 -- 10.177.64.6:0/1020439 -->
> 10.177.64.4:6839/7954 -- osd_op(client.65007.0:587 .dir.3 [call
> rgw.bucket_prepare_op] 7.ccb26a35) v4 -- ?+0 0xf25270 con 0xbcd1c0
>
> It would be easiest looking for the reply to that request as it will
> contain the osd id (search for a line that contains osd_op_reply and
> the client.65007.0:587 request id).
>
> In the mean time, I created issue #2139 for a probable culprit. Having
> the relevant logs will allow us to verify whether you're hitting that
> or another issue.
>
> Thanks,
> Yehuda

Ok, because of the time difference between us, I will try to find this
in the morning at work. If the logs have insufficient verbosity, I will
start all OSDs in debug mode, as you wrote earlier, and then reproduce
the problem.
I will try to send the logs as soon as possible.

Regards
Slawomir Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RadosGW problems with copy in s3

2012-03-06 Thread Sławomir Skowron
2012/3/5 Sławomir Skowron :
> On 5 mar 2012, at 19:59, Yehuda Sadeh Weinraub
>  wrote:
>
>> On Mon, Mar 5, 2012 at 2:23 AM, Sławomir Skowron
>>  wrote:
>>> 2012/3/1 Sławomir Skowron :
>>>> 2012/2/29 Yehuda Sadeh Weinraub :
>>>>> On Wed, Feb 29, 2012 at 5:06 AM, Sławomir Skowron
>>>>>  wrote:
>>>>>>
>>>>>> Ok, it's intentional.
>>>>>>
>>>>>> We check the meta information about the files, then check the md5 of
>>>>>> the file content. In parallel we update objects that have changed, then
>>>>>> archive those objects under another key, and the last step is deleting
>>>>>> objects that have expired.
>>>>>>
>>>>>> This happens over and over, because this site changes many times.
>>>>>>
>>>>>> Right now I don't have any idea how to work around this problem without
>>>>>> shutting down this app :(
>>>>>
>>>>> I looked at your osd log again, and there are other things that don't
>>>>> look right. I'll also need you to turn on 'debug osd = 20' and 'debug
>>>>> filestore = 20'.
>>>>
>>>> osd.24: almost 10 minutes of debug log, as requested above, in the attachment.
>>>>
>>>>> Other than that, I just pushed a workaround that might improve things.
>>>>> It's on the wip-rgw-atomic-no-retry branch on github (based on
>>>>> 0.42.2), so you might want to give it a spin and let us know whether
>>>>> it actually improved things.
>>>>
>>>> Ok, I will try and let you know soon.
>>>
>>> Unfortunately, there is no improvement after upgrading to this version.
>>>
>> It looks like an issue with updating the bucket index, but I'm having
>> trouble confirming it, as the log provided (of osd.24) doesn't contain
>> any relevant operations. If you could provide a log from the relevant
>> osd it may be very helpful.
>>
>> You can find the relevant osd by looking at an operation that took too
>> long, and look for a request like the following:
>>
>> 2012-02-28 20:20:10.944859 7fb1affb7700 -- 10.177.64.6:0/1020439 -->
>> 10.177.64.4:6839/7954 -- osd_op(client.65007.0:587 .dir.3 [call
>> rgw.bucket_prepare_op] 7.ccb26a35) v4 -- ?+0 0xf25270 con 0xbcd1c0
>>
>> It would be easiest looking for the reply to that request as it will
>> contain the osd id (search for a line that contains osd_op_reply and
>> the client.65007.0:587 request id).
>>
>> In the mean time, I created issue #2139 for a probable culprit. Having
>> the relevant logs will allow us to verify whether you're hitting that
>> or another issue.
>>
>> Thanks,
>> Yehuda
>
> Ok, because of the time difference between us, I will try to find this
> in the morning at work. If the logs have insufficient verbosity, I will
> start all OSDs in debug mode, as you wrote earlier, and then reproduce
> the problem.
> I will try to send the logs as soon as possible.
>
> Regards
> Slawomir Skowron

All logs from osd.24, osd.36, and osd.62, with 'debug osd = 20' and
'debug filestore = 20', from 2012-03-06 10:25 onward:

http://217.144.195.170/ceph/osd.24.log.tar.gz   (348MB) - machine 1 - rack1
http://217.144.195.170/ceph/osd.36.log.tar.gz   (26MB) - machine 2 - rack 2
http://217.144.195.170/ceph/osd.62.log.tar.gz   (23MB) - machine3 - rack 3
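For reference, the debug levels used for these captures go into the [osd] section of ceph.conf, roughly like this (a sketch of the 0.4x-era config format, matching the settings Yehuda asked for above):

```ini
; Debug settings requested above; added to ceph.conf on each OSD host
; before restarting the osds (a sketch of the 0.4x-era format).
[osd]
    debug osd = 20
    debug filestore = 20
```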

-- 
-
Regards

Sławek "sZiBis" Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RadosGW problems with copy in s3

2012-03-06 Thread Sławomir Skowron
On 6 mar 2012, at 18:53, Yehuda Sadeh Weinraub
 wrote:

> On Tue, Mar 6, 2012 at 2:08 AM, Sławomir Skowron  wrote:
>
>> All logs from osd.24, osd.62, and osd.36 with osd debug =20 and
>> filestore debug = 20 from 2012-03-06 10:25 and more.
>>
>> http://217.144.195.170/ceph/osd.24.log.tar.gz   (348MB) - machine 1 - rack1
>> http://217.144.195.170/ceph/osd.36.log.tar.gz   (26MB) - machine 2 - rack 2
>> http://217.144.195.170/ceph/osd.62.log.tar.gz   (23MB) - machine3 - rack 3
>>
>
> I looked at the logs, and it seems that you hit the tmap scaling
> issue. Luckily we're fixing that for 0.44, so it's going to be fixed
> in the next version.
>
> Yehuda

That's wonderful news. Thanks for everything.

Regards
Slawomir Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RadosGW problems with copy in s3

2012-03-26 Thread Sławomir Skowron
After some tests: PUT/GET/DELETE via radosgw now works much better in
version 0.44.

End of this topic.

Thanks.

2012/3/6 Sławomir Skowron :
> On 6 mar 2012, at 18:53, Yehuda Sadeh Weinraub
>  wrote:
>
>> On Tue, Mar 6, 2012 at 2:08 AM, Sławomir Skowron  wrote:
>>
>>> All logs from osd.24, osd.62, and osd.36 with osd debug =20 and
>>> filestore debug = 20 from 2012-03-06 10:25 and more.
>>>
>>> http://217.144.195.170/ceph/osd.24.log.tar.gz   (348MB) - machine 1 - rack1
>>> http://217.144.195.170/ceph/osd.36.log.tar.gz   (26MB) - machine 2 - rack 2
>>> http://217.144.195.170/ceph/osd.62.log.tar.gz   (23MB) - machine3 - rack 3
>>>
>>
>> I looked at the logs, and it seems that you hit the tmap scaling
>> issue. Luckily we're fixing that for 0.44, so it's going to be fixed
>> in the next version.
>>
>> Yehuda
>
> That's wonderful news. Thanks for everything.
>
> Regards
> Slawomir Skowron



-- 
-
Regards

Sławek "sZiBis" Skowron
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RBD attach via libvirt to kvm vm - VM kernel hang

2012-03-28 Thread Sławomir Skowron
Dom0 - Ubuntu oneiric, kernel 3.0.0-16-server.

ii  kvm
1:84+dfsg-0ubuntu16+1.0+noroms+0ubuntu10 dummy transitional package
from kvm to qemu-kvm
ii  qemu1.0+noroms-0ubuntu10
  dummy transitional package from qemu to qemu-kvm
ii  qemu-common 1.0+noroms-0ubuntu10
  qemu common functionality (bios, documentation, etc)
ii  qemu-kvm1.0+noroms-0ubuntu10
  Full virtualization on i386 and amd64 hardware
ii  qemu-utils  1.0+noroms-0ubuntu10
  qemu utilities
ii  libvirt-bin 0.9.9~release1-2ubuntu6
  programs for the libvirt library
ii  libvirt00.9.9~release1-2ubuntu6
  library for interfacing with different virtualization
systems
ii  python-libvirt  0.9.9~release1-2ubuntu6
  libvirt Python bindings


 
 
 
 
 
 
 


After:

virsh attach-device on-01 /tmp/rbd.xml
Device attached successfully
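The contents of /tmp/rbd.xml were stripped by the list archive. For reference, a typical RBD `<disk>` element for libvirt of that era looks roughly like this (the pool name, image name, target device, and monitor address are placeholders, and cephx auth elements are omitted for brevity):

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='rbd/on-01-disk'>
    <host name='10.177.64.4' port='6789'/>
  </source>
  <target dev='vdb' bus='virtio'/>
</disk>
```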

The libvirt log is clean.

The VM on-01, with the config from the attachment (dumpxml), hangs after
attaching the RBD device, with the kernel bug shown in the attachment.

With a compiled vanilla KVM 1.0 this bug also occurs, the same as with
the Ubuntu KVM version above.

-- 
-
Regards

Sławek Skowron
[   68.630913] BUG: unable to handle kernel NULL pointer dereference at 
0049
[   68.632016] IP: [] pci_find_capability+0x15/0x60
[   68.632016] PGD 1b537067 PUD 1afed067 PMD 0
[   68.632016] Oops:  [#1] SMP
[   68.632016] CPU 0
[   68.632016] Modules linked in: bonding psmouse virtio_balloon serio_raw 
i2c_piix4 acpiphp lp parport floppy ixgbevf
[   68.632016]
[   68.632016] Pid: 17, comm: kworker/0:1 Not tainted 3.0.0-13-xen #22 Bochs 
Bochs
[   68.632016] RIP: 0010:[]  [] 
pci_find_capability+0x15/0x60
[   68.632016] RSP: :88001daa3b20  EFLAGS: 00010282
[   68.632016] RAX:  RBX: 88001da67000 RCX: 00a4
[   68.632016] RDX:  RSI: 0010 RDI: 
[   68.632016] RBP: 88001daa3b40 R08: 0002 R09: 88001daa3b1c
[   68.632016] R10: 0028 R11:  R12: 
[   68.632016] R13:  R14:  R15: 00a8
[   68.632016] FS:  () GS:88001fc0() 
knlGS:
[   68.632016] CS:  0010 DS:  ES:  CR0: 8005003b
[   68.632016] CR2: 0049 CR3: 1ca7a000 CR4: 06f0
[   68.632016] DR0:  DR1:  DR2: 
[   68.632016] DR3:  DR6: 0ff0 DR7: 0400
[   68.632016] Process kworker/0:1 (pid: 17, threadinfo 88001daa2000, task 
88001da79720)
[   68.632016] Stack:
[   68.632016]  0282 88001da67000 88001da67000 

[   68.632016]  88001daa3ba0 8131a9f7 81c5 
88001da66000
[   68.632016]  81c5dcc0 1da66000 88001daa3ba0 
88001da67000
[   68.632016] Call Trace:
[   68.632016]  [] pci_set_payload+0xa7/0x140
[   68.632016]  [] pci_configure_slot.part.6+0x18/0x100
[   68.632016]  [] pci_configure_slot+0x32/0x40
[   68.632016]  [] enable_device+0x188/0x9a0 [acpiphp]
[   68.632016]  [] ? pci_bus_read_config_dword+0x89/0xa0
[   68.632016]  [] ? acpi_os_wait_events_complete+0x23/0x23
[   68.632016]  [] acpiphp_enable_slot+0x80/0xb0 [acpiphp]
[   68.632016]  [] acpiphp_check_bridge.isra.12+0x64/0xf0 
[acpiphp]
[   68.632016]  [] handle_hotplug_event_func+0x103/0x1b0 
[acpiphp]
[   68.632016]  [] ? acpi_bus_get_device+0x27/0x40
[   68.632016]  [] acpi_ev_notify_dispatch+0x67/0x7e
[   68.632016]  [] acpi_os_execute_deferred+0x27/0x34
[   68.632016]  [] process_one_work+0x11a/0x480
[   68.632016]  [] worker_thread+0x165/0x370
[   68.632016]  [] ? manage_workers.isra.30+0x130/0x130
[   68.632016]  [] kthread+0x8c/0xa0
[   68.632016]  [] kernel_thread_helper+0x4/0x10
[   68.632016]  [] ? flush_kthread_worker+0xa0/0xa0
[   68.632016]  [] ? gs_change+0x13/0x13
[   68.632016] Code: 45 cc 48 83 c4 20 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 80 00 
00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 <0f> b6 
57 49 49 89 fc 89 f3 8b 77 38 48 8b 7f 10 e8 e6 f8 ff ff
[   68.632016] RIP  [] pci_find_capability+0x15/0x60
[   68.632016]  RSP 
[   68.632016] CR2: 0049
[   68.698931] ---[ end trace 4ea71c2b1410e496 ]---
[   68.700365] BUG: unable to handle kernel paging request at fff8
[   68.702051] IP: [] kthread_data+0x11/0x20
[   68.703592] PGD 1c05067 PUD 1c06067 PMD 0
[   68.704312] Oops:  [#2] SMP
[   68.704312] CPU 0
[   68.704312] Modules linked in: bonding psmouse virtio_balloon serio_raw 
i2c_piix4 acpiphp lp parport floppy ixgbevf
[   68.704312]
[   68.704312] Pid: 17, comm: kworker/0:1 Tainted: G  D 3.0.0-13-xen 
#22 Bochs Bochs
[   68.70431

Re: RBD attach via libvirt to kvm vm - VM kernel hang

2012-03-28 Thread Sławomir Skowron
On 28 mar 2012, at 18:24, Josh Durgin  wrote:

> On 03/28/2012 09:13 AM, Tommi Virtanen wrote:
>> 2012/3/28 Sławomir Skowron:
>>> VM on-01 with config like in attachment (dumpxml) - hang, after
>>> attaching rbd device with kernel_bug in attachment.
>>
>> [   68.630913] BUG: unable to handle kernel NULL pointer dereference
>> at 0049
>> [   68.632016] IP: [] pci_find_capability+0x15/0x60
>> ...
>> [   68.632016] Call Trace:
>> [   68.632016]  [] pci_set_payload+0xa7/0x140
>> [   68.632016]  [] pci_configure_slot.part.6+0x18/0x100
>> [   68.632016]  [] pci_configure_slot+0x32/0x40
>> [   68.632016]  [] enable_device+0x188/0x9a0 [acpiphp]
>> [   68.632016]  [] ? pci_bus_read_config_dword+0x89/0xa0
>> ...
>>
>> Well, that sure looks like a bug. I can't tell whether it's in QEmu,
>> the QEmu rbd driver, or what. Josh, have you seen a crash like this?
>
> I've not seen a crash like this before. I'm not aware of RBD being
> treated differently from other block devices in the pci layer in qemu,
> so I'd guess this is a qemu or guest kernel bug.
>
> Does attaching a non-rbd disk (still using the virtio driver) cause the
> same problem?

A file-backed disk via virtio is already present in the VM config, and
that configuration ran stably for some time without rbd.

For testing we used qemu-kvm 1.0.0+dfsg+rc2-1~oneiric1 with the distro
libvirt, but we hit stability problems with the VF functions on the
Intel 10GE cards, and rbd caused some problems too. With the same guest
kernel as now and that 1.0.0-rc2 kvm, attaching rbd went smoothly, but
some time after the rbd device was mounted in the VM the network went
down. Only via console/VNC could we get into the VM and reload the
network; after that everything was fine for a while, and then it
happened again. That's why we tried a newer libvirt and kvm. Now the VF
on the Intel network card works like a charm, but attaching rbd crashes
the VM as you can see. The guest kernel stays the same as described at
the bottom.

We will try building a kernel from the newer stable version in oneiric
and try to reproduce, but my feeling is that the kernel is not the
trigger for this bug.

> If not, what distro and kernel version is the guest
> running?

It's the distro kernel, 3.0.0-13-server from Ubuntu oneiric, compiled
with Xen compatibility; that's why it shows up as 3.0.0-13-xen.


Re: Problem after upgrade to 0.45

2012-04-13 Thread Sławomir Skowron
More info: this happens after I set filestore_xattr_use_omap = 1 in the
conf, and ceph -w looks like the attachment in the previous mail.

I have downgraded to 0.44 and everything is OK now, but why did this
happen?
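
For reference, a minimal sketch of where that option goes (section
placement per the usual ceph.conf layout; the rest of the file is
omitted):

```ini
; hypothetical fragment - only the relevant option shown
[osd]
        filestore_xattr_use_omap = 1
```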

2012/4/13 Sławomir Skowron :
> 2012-04-13 11:03:20.017166 7f63d62b47a0 -- 0.0.0.0:6848/9291
> accepter.bind my_inst.addr is 0.0.0.0:6848/9291 need_addr=1
> 2012-04-13 11:03:20.018576 7f63d62b47a0 filestore(/vol0/data/osd.1)
> basedir /vol0/data/osd.1 journal /vol0/data/osd.1/journal
> 2012-04-13 11:03:20.027918 7f63d62b47a0 filestore(/vol0/data/osd.1)
> limited size xattrs -- enable filestore_xattr_use_omap
> 2012-04-13 11:03:20.028014 7f63d62b47a0  ** ERROR: error converting
> store /vol0/data/osd.1: (95) Operation not supported
> 2012-04-13 11:07:08.904974 7f35299ed7a0 ceph version 0.45
> (commit:0aea1cb1df5c3e5ab783ca6f2ed7649823b613c5), process ceph-osd,
> pid 20052
> 2012-04-13 11:07:08.905164 7f35299ed7a0 -- 0.0.0.0:6800/20052
> accepter.bind my_inst.addr is 0.0.0.0:6800/20052 need_addr=1
> 2012-04-13 11:07:08.905182 7f35299ed7a0 -- 0.0.0.0:6801/20052
> accepter.bind my_inst.addr is 0.0.0.0:6801/20052 need_addr=1
> 2012-04-13 11:07:08.905197 7f35299ed7a0 -- 0.0.0.0:6802/20052
> accepter.bind my_inst.addr is 0.0.0.0:6802/20052 need_addr=1
> 2012-04-13 11:07:08.906418 7f35299ed7a0 filestore(/vol0/data/osd.1)
> basedir /vol0/data/osd.1 journal /vol0/data/osd.1/journal
> 2012-04-13 11:07:08.906565 7f35299ed7a0 filestore(/vol0/data/osd.1)
> limited size xattrs -- enable filestore_xattr_use_omap
> 2012-04-13 11:07:08.906644 7f35299ed7a0  ** ERROR: error converting
> store /vol0/data/osd.1: (95) Operation not supported
> 2012-04-13 11:09:18.894624 7fd2a60157a0 ceph version 0.45
> (commit:0aea1cb1df5c3e5ab783ca6f2ed7649823b613c5), process ceph-osd,
> pid 31803
> 2012-04-13 11:09:18.894859 7fd2a60157a0 -- 0.0.0.0:6800/31803
> accepter.bind my_inst.addr is 0.0.0.0:6800/31803 need_addr=1
> 2012-04-13 11:09:18.894884 7fd2a60157a0 -- 0.0.0.0:6801/31803
> accepter.bind my_inst.addr is 0.0.0.0:6801/31803 need_addr=1
> 2012-04-13 11:09:18.894905 7fd2a60157a0 -- 0.0.0.0:6802/31803
> accepter.bind my_inst.addr is 0.0.0.0:6802/31803 need_addr=1
> 2012-04-13 11:09:18.896173 7fd2a60157a0 filestore(/vol0/data/osd.1)
> basedir /vol0/data/osd.1 journal /vol0/data/osd.1/journal
> 2012-04-13 11:09:18.896326 7fd2a60157a0 filestore(/vol0/data/osd.1)
> limited size xattrs -- enable filestore_xattr_use_omap
>
> My ceph -w output is attached: many failed OSDs, and more.
>
> --
> -
> Regards
>
> Sławek "sZiBis" Skowron



-- 
-
Regards

Sławek "sZiBis" Skowron


Best practice - upgrade ceph cluster

2012-04-20 Thread Sławomir Skowron
Maybe it's a lame question, but does anybody know the simplest
procedure for the least disruptive upgrade of a ceph cluster with a
real workload on it?

This matters most if we want to semi-automate the process with some
tools. Maybe there is a cookbook for this operation? I know that
automating it is not simple, and dangerous, but even for a manual
upgrade it's important to know what to expect.

Regards
Slawomir Skowron


Re: Best practice - upgrade ceph cluster

2012-04-20 Thread Sławomir Skowron
On 20 kwi 2012, at 21:35, Greg Farnum  wrote:

> On Friday, April 20, 2012 at 12:00 PM, Sławomir Skowron wrote:
>> Maybe it's a lame question, but is anybody knows simplest procedure,
>> for most non-disrubtive upgrade of ceph cluster with real workload on
>> it ??
>
> Unfortunate though it is, non-disruptive upgrades aren't a great idea to 
> attempt right now. We've architected the system to make it possible, and we 
> *try* to keep things forward-compatible, but we don't currently do any of the 
> testing necessary to promise something like that.
> It will be possible Very Soon in one form or another, but for now you 
> shouldn't count on it. When you can, you'll hear about it — we'll be proudly 
> sharing that we're testing it, it works, whether it's on our main branch or a 
> new long-term stable, etc etc. ;)
>

I am waiting impatiently :)

>> It's most important if we want semi-automate this process with some
>> tools. Maybe there is a cookbook for this operation ?? I know that
>> automate this is not simple, and dangerous, but even in manual upgrade
>> it's important to know what can we expect.
>
> So, for now what we recommend is shutting down the cluster, upgrading 
> everything all at once, and then starting up the monitors, OSDs, and MDS (in 
> that order). Handling disk changes is a lot easier to write and test than 
> making sure that things are wire-compatible, and has been working well for a 
> long time. If for some reason it makes you feel better you should also be 
> able to upgrade the monitors as a group, then the OSDs as a group, then the 
> MDS. Things will start back up and the OSDs will go through a brief peering 
> period, but since nobody will have extra data or anything it should be fast.
> -Greg
>
>

Ok thanks.
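
The sequence Greg describes (stop everything, upgrade packages
everywhere, then start mon -> osd -> mds) can be sketched as a small
script. The "service ceph -a" and apt-get invocations are assumptions
for a Debian-style setup of that era, not a verified procedure, and a
dry-run guard is included so the plan can be inspected first:

```shell
#!/bin/sh
# Sketch of a whole-cluster upgrade: stop all daemons, upgrade the
# packages on every node, then start monitors, OSDs, and MDS in order.
DRYRUN=${DRYRUN:-1}
PLAN=""

# Record each step; execute it only when DRYRUN is disabled.
plan() {
    PLAN="$PLAN$*; "
    if [ "$DRYRUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

plan service ceph -a stop              # whole cluster down
plan apt-get install -y ceph           # upgrade packages (run on each node)
for daemon in mon osd mds; do          # start order matters
    plan service ceph -a start "$daemon"
done
```

Running it with DRYRUN=1 (the default) only prints the planned
commands, which makes the ordering easy to review before committing.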