Re: [ceph-users] NFS

2019-10-03 Thread Matt Benjamin
Hi Mark,

Here's an example that should work--userx and usery are RGW users
created in different tenants, like so:

radosgw-admin --tenant tnt1 --uid userx --display-name "tnt1-userx" \
 --access_key "userxacc" --secret "test123" user create

radosgw-admin --tenant tnt2 --uid usery --display-name "tnt2-usery" \
 --access_key "useryacc" --secret "test456" user create

Remember that to make use of this feature, you need recent librgw and
matching nfs-ganesha.  In particular, Ceph should have, among other
changes:

commit 65d0ae733defe277f31825364ee52d5102c06ab9
Author: Matt Benjamin 
Date:   Wed Jun 5 07:25:35 2019 -0400

rgw_file: include tenant in hashes of object

Because bucket names are taken as object names in the top
of an export.  Make hashing by tenant general to avoid disjoint
hashing of bucket.

Fixes: http://tracker.ceph.com/issues/40118

Signed-off-by: Matt Benjamin 
(cherry picked from commit 8e0fd5fbfa7c770f6b668e79b772179946027bce)

commit 459b6b2b224953655fd0360e8098ae598e41d3b2
Author: Matt Benjamin 
Date:   Wed May 15 15:53:32 2019 -0400

rgw_file: include tenant when hashing bucket names

Prevent identical paths from distinct tenants from colliding in
RGW NFS handle cache.

Fixes: http://tracker.ceph.com/issues/40118

Signed-off-by: Matt Benjamin 
(cherry picked from commit b800a9de83dff23a150ed7d236cb61c8b7d971ae)
Signed-off-by: Matt Benjamin 


ganesha.conf.deuxtenant:


EXPORT
{
# Export Id (mandatory, each EXPORT must have a unique Export_Id)
Export_Id = 77;

# Exported path (mandatory)
Path = "/";

# Pseudo Path (required for NFS v4)
Pseudo = "/userx";

# Required for access (default is None)
# Could use CLIENT blocks instead
Access_Type = RW;

SecType = "sys";

Protocols = 3,4;
Transports = UDP,TCP;

#Delegations = Readwrite;

Squash = No_Root_Squash;

# Exporting FSAL
FSAL {
Name = RGW;
User_Id = "userx";
Access_Key_Id = "userxacc";
Secret_Access_Key = "test123";
}
}

EXPORT
{
# Export Id (mandatory, each EXPORT must have a unique Export_Id)
Export_Id = 78;

# Exported path (mandatory)
Path = "/";

# Pseudo Path (required for NFS v4)
Pseudo = "/usery";

# Required for access (default is None)
# Could use CLIENT blocks instead
Access_Type = RW;

SecType = "sys";

Protocols = 3,4;
Transports = UDP,TCP;

#Delegations = Readwrite;

Squash = No_Root_Squash;

# Exporting FSAL
FSAL {
Name = RGW;
User_Id = "usery";
Access_Key_Id = "useryacc";
Secret_Access_Key = "test456";
}
}

#mount at bucket case
EXPORT
{
# Export Id (mandatory, each EXPORT must have a unique Export_Id)
Export_Id = 79;

# Exported path (mandatory)
Path = "/buck5";

# Pseudo Path (required for NFS v4)
Pseudo = "/usery_buck5";

# Required for access (default is None)
# Could use CLIENT blocks instead
Access_Type = RW;

SecType = "sys";

Protocols = 3,4;
Transports = UDP,TCP;

#Delegations = Readwrite;

Squash = No_Root_Squash;

# Exporting FSAL
FSAL {
Name = RGW;
User_Id = "usery";
Access_Key_Id = "useryacc";
Secret_Access_Key = "test456";
}
}



RGW {
ceph_conf = "/home/mbenjamin/ceph-noob/build/ceph.conf";
#init_args = "-d --debug-rgw=16";
init_args = "";
}

NFS_Core_Param {
Nb_Worker = 17;
mount_path_pseudo = true;
}

CacheInode {
Chunks_HWMark = 7;
Entries_Hwmark = 200;
}

NFSV4 {
Graceless = true;
Allow_Numeric_Owners = true;
Only_Numeric_Owners = true;
}

LOG {
Components {
#NFS_READDIR = FULL_DEBUG;
#NFS4 = FULL_DEBUG;
#CACHE_INODE = FULL_DEBUG;
#FSAL = FULL_DEBUG;
}
Facility {
name = FILE;
destination = "/tmp/ganesha-rgw.log";
enable = active;
}
}
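
With mount_path_pseudo = true and NFSv4, clients mount the Pseudo paths; a
quick check could look like this (hostname and mount points are placeholders):

mount -t nfs4 nfs-gw.example.com:/userx /mnt/userx
mount -t nfs4 nfs-gw.example.com:/usery_buck5 /mnt/usery_buck5
ls /mnt/userx    # tnt1/userx's buckets show up as top-level directories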

On Thu, Oct 3, 2019 at 10:34 AM Marc Roos  wrote:
>
>
> What should a multi-tenant RGW config look like? I am not able to get this
> working:
>
> EXPORT {
>Export_ID=301;
>Path = "test:test3";
>#Path = "/";
>Pseudo = "/rgwtester";
>
>Protocols = 4;
>FSAL {
>Name = RGW;
>User_Id = "test$tester1";
>Access_Key_Id = "TESTER";
>Secret_Access_Key = "xxx";
>}
>Disable_ACL = TRUE;
>CLIENT { Clients = 192.168.10.0/24; access_type = "RO"; }
> }
>
>
>

Re: [ceph-users] NFS

2019-10-03 Thread Matt Benjamin
RGW NFS can support any NFS style of authentication, but users will
have the RGW access of their nfs-ganesha export.  You can create
exports with disjoint privileges and, since recent Luminous and
Nautilus releases, exports scoped to RGW tenants.
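
For reference, a minimal sketch of what "exports with disjoint privileges"
can look like (IDs, paths, users, and keys are placeholders):

EXPORT
{
Export_Id = 101;
Path = "/";
Pseudo = "/archive";
Access_Type = RO;
SecType = "sys";
Squash = No_Root_Squash;
FSAL {
Name = RGW;
User_Id = "archive_reader";
Access_Key_Id = "archivekey";
Secret_Access_Key = "archivesecret";
}
}

EXPORT
{
Export_Id = 102;
Path = "/";
Pseudo = "/ingest";
Access_Type = RW;
SecType = "sys";
Squash = No_Root_Squash;
FSAL {
Name = RGW;
User_Id = "ingest_writer";
Access_Key_Id = "ingestkey";
Secret_Access_Key = "ingestsecret";
}
}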

Matt

On Tue, Oct 1, 2019 at 8:31 AM Marc Roos  wrote:
>
>  I think you can run into problems
> with a multi-user environment of RGW and nfs-ganesha.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ZeroDivisionError when running ceph osd status

2019-09-11 Thread Benjamin Tayehanpour
Greetings!

I had an OSD down, so I ran ceph osd status and got this:

[root@ceph1 ~]# ceph osd status
Error EINVAL: Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/status/module.py", line 313, in handle_command
    return self.handle_osd_status(cmd)
  File "/usr/lib64/ceph/mgr/status/module.py", line 297, in
handle_osd_status
    self.format_dimless(self.get_rate("osd", osd_id.__str__(), "osd.op_w") +
  File "/usr/lib64/ceph/mgr/status/module.py", line 113, in get_rate
    return (data[-1][1] - data[-2][1]) / float(data[-1][0] - data[-2][0])
ZeroDivisionError: float division by zero
[root@ceph1 ~]#

I could still figure out which OSD it was with systemctl, but I had to
purge the OSD before ceph osd status would run again. Is this normal
behaviour?
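
(For the record, two generic ways to spot a down OSD while ceph osd status is
broken; these are standard commands, not specific to this report:)

ceph osd tree down                  # show only OSDs currently marked down
systemctl --failed 'ceph-osd@*'     # failed OSD units on the local host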

Cordially yours,
Benjamin




signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] memory usage of: radosgw-admin bucket rm

2019-07-11 Thread Matt Benjamin
I don't think one has been created yet.  Eric Ivancich and Mark Kogan
of my team are investigating this behavior.

Matt

On Thu, Jul 11, 2019 at 10:40 AM Paul Emmerich  wrote:
>
> Is there already a tracker issue?
>
> I'm seeing the same problem here. Started deletion of a bucket with a few 
> hundred million objects a week ago or so and I've now noticed that it's also 
> leaking memory and probably going to crash.
> Going to investigate this further...
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Tue, Jul 9, 2019 at 1:26 PM Matt Benjamin  wrote:
>>
>> Hi Harald,
>>
>> Please file a tracker issue, yes.  (Deletes do tend to be slower,
>> presumably due to rocksdb compaction.)
>>
>> Matt
>>
>> On Tue, Jul 9, 2019 at 7:12 AM Harald Staub  wrote:
>> >
>> > Currently removing a bucket with a lot of objects:
>> > radosgw-admin bucket rm --bucket=$BUCKET --bypass-gc --purge-objects
>> >
>> > This process was killed by the out-of-memory killer. Then looking at the
>> > graphs, we see a continuous increase of memory usage for this process,
>> > about +24 GB per day. Removal rate is about 3 M objects per day.
>> >
>> > It is not the fastest hardware, and this index pool is still without
>> > SSDs. The bucket is sharded, 1024 shards. We are on Nautilus 14.2.1, now
>> > about 500 OSDs.
>> >
>> > So with this bucket with 60 M objects, we would need about 480 GB of RAM
>> > to come through. Or is there a workaround? Should I open a tracker issue?
>> >
>> > The killed remove command can just be called again, but it will be
>> > killed again before it finishes. Also, it has to run some time until it
>> > continues to actually remove objects. This "wait time" is also
>> > increasing. Last time, after about 16 M objects already removed, the
>> > wait time was nearly 9 hours. Also during this time, there is a memory
>> > ramp, but not so steep.
>> >
>> > BTW it feels strange that the removal of objects is slower (about 3
>> > times) than adding objects.
>> >
>> >   Harry
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>>
>>
>> --
>>
>> Matt Benjamin
>> Red Hat, Inc.
>> 315 West Huron Street, Suite 140A
>> Ann Arbor, Michigan 48103
>>
>> http://www.redhat.com/en/technologies/storage
>>
>> tel.  734-821-5101
>> fax.  734-769-8938
>> cel.  734-216-5309
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] memory usage of: radosgw-admin bucket rm

2019-07-09 Thread Matt Benjamin
Hi Harald,

Please file a tracker issue, yes.  (Deletes do tend to be slower,
presumably due to rocksdb compaction.)

Matt

On Tue, Jul 9, 2019 at 7:12 AM Harald Staub  wrote:
>
> Currently removing a bucket with a lot of objects:
> radosgw-admin bucket rm --bucket=$BUCKET --bypass-gc --purge-objects
>
> This process was killed by the out-of-memory killer. Then looking at the
> graphs, we see a continuous increase of memory usage for this process,
> about +24 GB per day. Removal rate is about 3 M objects per day.
>
> It is not the fastest hardware, and this index pool is still without
> SSDs. The bucket is sharded, 1024 shards. We are on Nautilus 14.2.1, now
> about 500 OSDs.
>
> So with this bucket with 60 M objects, we would need about 480 GB of RAM
> to come through. Or is there a workaround? Should I open a tracker issue?
>
> The killed remove command can just be called again, but it will be
> killed again before it finishes. Also, it has to run some time until it
> continues to actually remove objects. This "wait time" is also
> increasing. Last time, after about 16 M objects already removed, the
> wait time was nearly 9 hours. Also during this time, there is a memory
> ramp, but not so steep.
>
> BTW it feels strange that the removal of objects is slower (about 3
> times) than adding objects.
>
>   Harry
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RADOSGW S3 - Continuation Token Ignored?

2019-06-28 Thread Matt Benjamin
FYI, this PR just merged.  I would expect to see backports at least as
far as N, and others would be possible.

regards,

Matt

On Fri, Jun 28, 2019 at 3:43 PM  wrote:
>
> Matt;
>
> Yep, that would certainly explain it.
>
> My apologies, I almost searched for that information before sending the email.
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: Matt Benjamin [mailto:mbenj...@redhat.com]
> Sent: Friday, June 28, 2019 9:48 AM
> To: Dominic Hilsbos
> Cc: ceph-users
> Subject: Re: [ceph-users] RADOSGW S3 - Continuation Token Ignored?
>
> Hi Dominic,
>
> The reason is likely that RGW doesn't yet support ListObjectsV2.
>
> Support is nearly here though:  https://github.com/ceph/ceph/pull/28102
>
> Matt
>
>
> On Fri, Jun 28, 2019 at 12:43 PM  wrote:
> >
> > All;
> >
> > I've got a RADOSGW instance setup, backed by my demonstration Ceph cluster. 
> >  I'm using Amazon's S3 SDK, and I've run into an annoying little snag.
> >
> > My code looks like this:
> > amazonS3 = builder.build();
> >
> > ListObjectsV2Request req = new 
> > ListObjectsV2Request().withBucketName("WorkOrder").withMaxKeys(MAX_KEYS);
> > ListObjectsV2Result result;
> >
> > do
> > {
> > result = amazonS3.listObjectsV2(req);
> >
> > for (S3ObjectSummary objectSummary : result.getObjectSummaries())
> > {
> > summaries.add(objectSummary);
> > }
> >
> > String token = result.getNextContinuationToken();
> > req.setContinuationToken(token);
> > }
> > while (result.isTruncated());
> >
> > The problem is, the ContinuationToken seems to be ignored, i.e. every call 
> > to amazonS3.listObjectsV2(req) returns the same set, and the loop never 
> > ends (until the summaries LinkedList overflows).
> >
> > Thoughts?
> >
> > Thank you,
> >
> > Dominic L. Hilsbos, MBA
> > Director - Information Technology
> > Perform Air International Inc.
> > dhils...@performair.com
> > www.PerformAir.com
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RADOSGW S3 - Continuation Token Ignored?

2019-06-28 Thread Matt Benjamin
Hi Dominic,

The reason is likely that RGW doesn't yet support ListObjectsV2.

Support is nearly here though:  https://github.com/ceph/ceph/pull/28102
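
Until then, a client-side workaround sketch is to fall back to the V1 listing
API, which RGW does support. This reuses amazonS3, summaries, and MAX_KEYS
from the quoted code below and paginates with the SDK helper instead of
continuation tokens:

import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;

// Sketch only: V1 listing; listNextBatchOfObjects() handles the marker for us.
ObjectListing listing = amazonS3.listObjects(
        new ListObjectsRequest().withBucketName("WorkOrder").withMaxKeys(MAX_KEYS));
summaries.addAll(listing.getObjectSummaries());
while (listing.isTruncated())
{
    listing = amazonS3.listNextBatchOfObjects(listing);
    summaries.addAll(listing.getObjectSummaries());
}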

Matt


On Fri, Jun 28, 2019 at 12:43 PM  wrote:
>
> All;
>
> I've got a RADOSGW instance setup, backed by my demonstration Ceph cluster.  
> I'm using Amazon's S3 SDK, and I've run into an annoying little snag.
>
> My code looks like this:
> amazonS3 = builder.build();
>
> ListObjectsV2Request req = new 
> ListObjectsV2Request().withBucketName("WorkOrder").withMaxKeys(MAX_KEYS);
> ListObjectsV2Result result;
>
> do
> {
> result = amazonS3.listObjectsV2(req);
>
> for (S3ObjectSummary objectSummary : result.getObjectSummaries())
> {
> summaries.add(objectSummary);
> }
>
> String token = result.getNextContinuationToken();
> req.setContinuationToken(token);
> }
> while (result.isTruncated());
>
> The problem is, the ContinuationToken seems to be ignored, i.e. every call to 
> amazonS3.listObjectsV2(req) returns the same set, and the loop never ends 
> (until the summaries LinkedList overflows).
>
> Thoughts?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


--

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-03 Thread Matt Benjamin
-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:op
>  status=-104
> 2019-05-03 15:37:28.959 7f4a68484700  2 req 23574:41.87s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:http
>  status=206
> 2019-05-03 15:37:28.959 7f4a68484700  1 == req done req=0x55f2fde20970 op 
> status=-104 http_status=206 ==
>
>
> -Mensaje original-
> De: EDH - Manuel Rios Fernandez 
> Enviado el: viernes, 3 de mayo de 2019 15:12
> Para: 'Matt Benjamin' 
> CC: 'ceph-users' 
> Asunto: RE: [ceph-users] RGW Bucket unable to list buckets 100TB bucket
>
> Hi Matt,
>
> Thanks for your help,
>
> We have done the changes plus a reboot of the MONs and RGWs; they looked
> strangely stuck. Now we're able to list 250 directories.
>
> time s3cmd ls s3://datos101 --no-ssl --limit 150
> real2m50.854s
> user0m0.147s
> sys 0m0.042s
>
>
> Is there any recommendation for max_shards?
>
> Our main goal is cold storage; normally our usage is backups or customers
> with tons of files. This causes customers to store millions of objects in a
> single bucket.
>
> It's strange because this issue started on Friday without any warning or
> error in the OSD / RGW logs.
>
> When should we warn customers that they will not be able to list their
> directory if they reach X million objects?
>
> Our current ceph.conf
>
> #Normal-Memory 1/5
> debug rgw = 2
> #Disable
> debug osd = 0
> debug journal = 0
> debug ms = 0
>
> fsid = e1ee8086-7cce-43fd-a252-3d677af22428
> mon_initial_members = CEPH001, CEPH002, CEPH003 mon_host = 
> 172.16.2.10,172.16.2.11,172.16.2.12
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> osd pool default pg num = 128
> osd pool default pgp num = 128
>
> public network = 172.16.2.0/24
> cluster network = 172.16.1.0/24
>
> osd pool default size = 2
> osd pool default min size = 1
>
> rgw_dynamic_resharding = true
> #Increment to 128
> rgw_override_bucket_index_max_shards = 128
>
> #Default: 1000
> rgw list buckets max chunk = 5000
>
>
>
> [osd]
> osd mkfs type = xfs
> osd op threads = 12
> osd disk threads = 12
>
> osd recovery threads = 4
> osd recovery op priority = 1
> osd recovery max active = 2
> osd recovery max single start = 1
>
> osd max backfills = 4
> osd backfill scan max = 16
> osd backfill scan min = 4
> osd client op priority = 63
>
>
> osd_memory_target = 2147483648
>
> osd_scrub_begin_hour = 23
> osd_scrub_end_hour = 6
> osd_scrub_load_threshold = 0.25 #low load scrubbing osd_scrub_during_recovery 
> = false #scrub during recovery
>
> [mon]
> mon allow pool delete = true
> mon osd min down reporters = 3
>
> [mon.a]
> host = CEPH001
> public bind addr = 172.16.2.10
> mon addr = 172.16.2.10:6789
> mon allow pool delete = true
>
> [mon.b]
> host = CEPH002
> public bind addr = 172.16.2.11
> mon addr = 172.16.2.11:6789
> mon allow pool delete = true
>
> [mon.c]
> host = CEPH003
> public bind addr = 172.16.2.12
> mon addr = 172.16.2.12:6789
> mon allow pool delete = true
>
> [client.rgw]
>  rgw enable usage log = true
>
>
> [client.rgw.ceph-rgw01]
>  host = ceph-rgw01
>  rgw enable usage log = true
>  rgw dns name =
>  rgw frontends = "beast port=7480"
>  rgw resolve cname = false
>  rgw thread pool size = 512
>  rgw num rados handles = 1
>  rgw op thread timeout = 600
>
>
> [client.rgw.ceph-rgw03]
>  host = ceph-rgw03
>  rgw enable usage log = true
>  rgw dns name =
>  rgw frontends = "beast port=7480"
>  rgw resolve cname = false
>  rgw thread pool size = 512
>  rgw num rados handles = 1
>  rgw op thread timeout = 600
>
>
> On Thursday the customer told us that listing was instant, and now their
> programs stall until they time out.
>
> Best Regards
>
> Manuel
>
> -Mensaje original-
> De: Matt Benjamin 
> Enviado el: viernes, 3 de mayo de 2019 14:00
> Para: EDH - Manuel Rios Fernandez 
> CC: ceph-users 
> Asunto: Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket
>
> Hi Folks,
>
> Thanks for sharing your ceph.conf along with the behavior.
>
> There are some odd things there.
>
> 1. rgw_num_rados_handles is deprecated--it should be 1 (the default), but 
> changing it may require you to check and retune

Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-03 Thread Matt Benjamin
> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET 
> /[bucketname]/:list_bucket:normalizing buckets and tenants
>
> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET 
> /[bucketname]/:list_bucket:init permissions
>
> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET 
> /[bucketname]/:list_bucket:recalculating target
>
> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET 
> /[bucketname]/:list_bucket:reading permissions
>
> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET 
> /[bucketname]/:list_bucket:init op
>
> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET 
> /[bucketname]/:list_bucket:verifying op mask
>
> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET 
> /[bucketname]/:list_bucket:verifying op permissions
>
> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET 
> /[bucketname]/:list_bucket:verifying op params
>
> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET 
> /[bucketname]/:list_bucket:pre-executing
>
> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET 
> /[bucketname]/:list_bucket:executing
>
> 2019-05-03 10:42:53.026 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:43:15.027 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:43:37.028 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:43:59.027 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:44:21.028 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:44:43.027 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:45:05.027 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:45:18.260 7f660cc0e700  2 object expiration: start
>
> 2019-05-03 10:45:18.779 7f660cc0e700  2 object expiration: stop
>
> 2019-05-03 10:45:27.027 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:45:49.027 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:46:11.027 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:46:33.027 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:46:55.028 7f660e411700  2 
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2019-05-03 10:47:02.092 7f65f33db700  2 req 115:265.652s:s3:GET 
> /[bucketname]/:list_bucket:completing
>
> 2019-05-03 10:47:02.092 7f65f33db700  2 req 115:265.652s:s3:GET 
> /[bucketname]/:list_bucket:op status=0
>
> 2019-05-03 10:47:02.092 7f65f33db700  2 req 115:265.652s:s3:GET 
> /[bucketname]/:list_bucket:http status=200
>
> 2019-05-03 10:47:02.092 7f65f33db700  1 == req done req=0x55eba26e8970 op 
> status=0 http_status=200 ==
>
>
>
>
>
> radosgw-admin bucket limit check
>
>  }
>
> "bucket": "[BUCKETNAME]",
>
> "tenant": "",
>
> "num_objects": 7126133,
>
> "num_shards": 128,
>
> "objects_per_shard": 55672,
>
> "fill_status": "OK"
>
> },
>
>
>
>
>
> We really don't know how to solve that; it looks like a timeout or slow
> performance for that bucket.
>
>
>
> Our RGW section in ceph.conf
>
>
>
> [client.rgw.ceph-rgw01]
>
> host = ceph-rgw01
>
> rgw enable usage log = true
>
> rgw dns name = XX
>
> rgw frontends = "beast port=7480"
>
> rgw resolve cname = false
>
> rgw thread pool size = 128
>
> rgw num rados handles = 1
>
> rgw op thread timeout = 120
>
>
>
>
>
> [client.rgw.ceph-rgw03]
>
> host = ceph-rgw03
>
> rgw enable usage log = true
>
> rgw dns name = 
>
> rgw frontends = "beast port=7480"
>
> rgw resolve cname = false
>
> rgw thread pool size = 640
>
> rgw num rados handles = 16
>
> rgw op thread timeout = 120
>
>
>
>
>
> Best Regards,
>
>
>
> Manuel
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW ops log lag?

2019-04-17 Thread Matt Benjamin
It should not be best effort.  As written, exactly
rgw_usage_log_flush_threshold outstanding log entries will be
buffered.  The default value for this parameter is 1024, which is
probably not high for a sustained workload, but you could experiment
with reducing it.
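
A minimal ceph.conf sketch for that experiment (the section name must match
your RGW instance; 256 is only an example value):

[client.rgw.gateway1]
rgw usage log flush threshold = 256    # default is 1024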

Matt

On Fri, Apr 12, 2019 at 11:21 AM Aaron Bassett
 wrote:
>
> Ok thanks. Is the expectation that events will be available on that socket as
> soon as they occur, or is it more of a best-effort situation? I'm just trying
> to nail down which side of the socket might be lagging. It's pretty difficult
> to recreate this, as I have to hit the cluster very hard to get it to start
> lagging.
>
> Thanks, Aaron
>
> > On Apr 12, 2019, at 11:16 AM, Matt Benjamin  wrote:
> >
> > Hi Aaron,
> >
> > I don't think that exists currently.
> >
> > Matt
> >
> > On Fri, Apr 12, 2019 at 11:12 AM Aaron Bassett
> >  wrote:
> >>
> >> I have an radogw log centralizer that we use to for an audit trail for 
> >> data access in our ceph clusters. We've enabled the ops log socket and 
> >> added logging of the http_authorization header to it:
> >>
> >> rgw log http headers = "http_authorization"
> >> rgw ops log socket path = /var/run/ceph/rgw-ops.sock
> >> rgw enable ops log = true
> >>
> >> We have a daemon that listens on the ops socket, extracts/manipulates some 
> >> information from the ops log, and sends it off to our log aggregator.
> >>
> >> This setup works pretty well for the most part, except when the cluster 
> >> comes under heavy load, it can get _very_ laggy - sometimes up to several 
> >> hours behind. I'm having a hard time nailing down whats causing this lag. 
> >> The daemon is rather naive, basically just some nc with jq in between, but 
> >> the log aggregator has plenty of spare capacity, so I don't think its 
> >> slowing down how fast the daemon is consuming from the socket.
> >>
> >> I was revisiting the documentation about this ops log and noticed the 
> >> following which I hadn't seen previously:
> >>
> >> When specifying a UNIX domain socket, it is also possible to specify the 
> >> maximum amount of memory that will be used to keep the data backlog:
> >> rgw ops log data backlog = 
> >> Any backlogged data in excess to the specified size will be lost, so the 
> >> socket needs to be read constantly.
> >>
> >> I'm wondering if theres a way I can query radosgw for the current size of 
> >> that backlog to help me narrow down where the bottleneck may be occuring.
> >>
> >> Thanks,
> >> Aaron
> >>
> >>
> >>
> >> CONFIDENTIALITY NOTICE
> >> This e-mail message and any attachments are only for the use of the 
> >> intended recipient and may contain information that is privileged, 
> >> confidential or exempt from disclosure under applicable law. If you are 
> >> not the intended recipient, any disclosure, distribution or other use of 
> >> this e-mail message or attachments is prohibited. If you have received 
> >> this e-mail message in error, please delete and notify the sender 
> >> immediately. Thank you.
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwIFaQ=Tpa2GKmmYSmpYS4baANxQwQYqA0vwGXwkJOPBegaiTs=5nKer5huNDFQXjYpOR4o_7t5CRI8wb5Vb_v1pBywbYw=sIK_aBR3PrR2olfXOZWgvPVm7jIoZtvEk2YHofl4TDU=FzFoCJ8qtZ66OKdL1Ph10qjZbCEjvMg9JyS_9LwEpSg=
> >>
> >>
> >
> >
> > --
> >
> > Matt Benjamin
> > Red Hat, Inc.
> > 315 West Huron Street, Suite 140A
> > Ann Arbor, Michigan 48103
> >
> > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.redhat.com_en_technologies_storage=DwIFaQ=Tpa2GKmmYSmpYS4baANxQwQYqA0vwGXwkJOPBegaiTs=5nKer5huNDFQXjYpOR4o_7t5CRI8wb5Vb_v1pBywbYw=sIK_aBR3PrR2olfXOZWgvPVm7jIoZtvEk2YHofl4TDU=hi6_HiZS0D_nzAqKsvJPPfmi8nZSv4lZCRFZ1ru9CxM=
> >
> > tel.  734-821-5101
> > fax.  734-769-8938
> > cel.  734-216-5309
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Try to log the IP in the header X-Forwarded-For with radosgw behind haproxy

2019-04-16 Thread Matt Benjamin
Hi Francois,

Why is using an explicit unix socket problematic for you?  For what it
does, that decision has always seemed sensible.

Matt

On Tue, Apr 16, 2019 at 7:04 PM Francois Lafont
 wrote:
>
> Hi @all,
>
> On 4/9/19 12:43 PM, Francois Lafont wrote:
>
> > I have tried this config:
> >
> > -
> > rgw enable ops log  = true
> > rgw ops log socket path = /tmp/opslog
> > rgw log http headers= http_x_forwarded_for
> > -
> >
> > and I have logs in the socket /tmp/opslog like this:
> >
> > -
> > {"bucket":"test1","time":"2019-04-09 
> > 09:41:18.188350Z","time_local":"2019-04-09 
> > 11:41:18.188350","remote_addr":"10.111.222.51","user":"flaf","operation":"GET","uri":"GET
> >  /?prefix=toto/=%2F 
> > HTTP/1.1","http_status":"200","error_code":"","bytes_sent":832,"bytes_received":0,"object_size":0,"total_time":39,"user_agent":"DragonDisk
> >  1.05 ( http://www.dragondisk.com 
> > )","referrer":"","http_x_headers":[{"HTTP_X_FORWARDED_FOR":"10.111.222.55"}]},
> > -
> >
> > I can see the IP address of the client in the value of 
> > HTTP_X_FORWARDED_FOR, that's cool.
> >
> > But I don't understand why there is a specific socket to log that? I'm 
> > using radosgw in a Docker container (installed via ceph-ansible) and I have 
> > logs of the "radosgw" daemon in the "/var/log/syslog" file of my host (I'm 
> > using the Docker "syslog" log-driver).
> >
> > 1. Why is there a _separate_ log source for that? Indeed, in 
> > "/var/log/syslog" I have already some logs of civetweb. For instance:
> >
> >  2019-04-09 12:33:45.926 7f02e021c700  1 civetweb: 0x55876dc9c000: 
> > 10.111.222.51 - - [09/Apr/2019:12:33:45 +0200] "GET 
> > /?prefix=toto/=%2F HTTP/1.1" 200 1014 - DragonDisk 1.05 ( 
> > http://www.dragondisk.com )
>
> The fact that radosgw uses a separate log source for "ops log" (ie a specific 
> Unix socket) is still very mysterious for me.
>
>
> > 2. In my Docker container context, is it possible to put the logs above in 
> > the file "/var/log/syslog" of my host, in other words is it possible to 
> > make sure to log this in stdout of the daemon "radosgw"?
>
> It seems to me impossible to put ops log in the stdout of the "radosgw" 
> process (or, if it's possible, I have not found). So I have made a 
> workaround. I have set:
>
>  rgw_ops_log_socket_path = /var/run/ceph/rgw-opslog.asok
>
> in my ceph.conf and I have created a daemon (via un systemd unit file) which 
> runs this loop:
>
>  while true;
>  do
>  netcat -U "/var/run/ceph/rgw-opslog.asok" | logger -t "rgwops" -p 
> "local5.notice"
>  done
>
> to retrieve logs in syslog. It's not very satisfying but it's works.
>
> --
> François (flaf)
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW ops log lag?

2019-04-12 Thread Matt Benjamin
Hi Aaron,

I don't think that exists currently.

Matt

On Fri, Apr 12, 2019 at 11:12 AM Aaron Bassett
 wrote:
>
> I have a radosgw log centralizer that we use for an audit trail for data
> access in our ceph clusters. We've enabled the ops log socket and added
> logging of the http_authorization header to it:
>
> rgw log http headers = "http_authorization"
> rgw ops log socket path = /var/run/ceph/rgw-ops.sock
> rgw enable ops log = true
>
> We have a daemon that listens on the ops socket, extracts/manipulates some 
> information from the ops log, and sends it off to our log aggregator.
>
> This setup works pretty well for the most part, except when the cluster comes
> under heavy load, it can get _very_ laggy - sometimes up to several hours
> behind. I'm having a hard time nailing down what's causing this lag. The
> daemon is rather naive, basically just some nc with jq in between, but the
> log aggregator has plenty of spare capacity, so I don't think it's slowing
> down how fast the daemon is consuming from the socket.
>
> I was revisiting the documentation about this ops log and noticed the 
> following which I hadn't seen previously:
>
> When specifying a UNIX domain socket, it is also possible to specify the 
> maximum amount of memory that will be used to keep the data backlog:
> rgw ops log data backlog = 
> Any backlogged data in excess to the specified size will be lost, so the 
> socket needs to be read constantly.
>
> I'm wondering if there's a way I can query radosgw for the current size of
> that backlog to help me narrow down where the bottleneck may be occurring.
>
> Thanks,
> Aaron
>
>
>
> CONFIDENTIALITY NOTICE
> This e-mail message and any attachments are only for the use of the intended 
> recipient and may contain information that is privileged, confidential or 
> exempt from disclosure under applicable law. If you are not the intended 
> recipient, any disclosure, distribution or other use of this e-mail message 
> or attachments is prohibited. If you have received this e-mail message in 
> error, please delete and notify the sender immediately. Thank you.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-07 Thread Matt Benjamin
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.8
> > >
> > > I would assume then that unlike what documentation says, it's safe to
> > > run 'reshard stale-instances rm' on a multi-site setup.
> > >
> > > However it is quite telling if the author of this feature doesn't
> > > trust what they have written to work correctly.
> > >
> > > There are still thousands of stale index objects that 'stale-instances
> > > list' didn't pick up though.  But it appears that radosgw-admin only
> > > looks at 'metadata list bucket' data, and not what is physically
> > > inside the pool.
> > >
> > > --
> > > Iain Buclaw
> > >
> > > *(p < e ? p++ : p) = (c & 0x0f) + '0';
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
>
>
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Rakuten Communications
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v14.2.0 Nautilus released

2019-03-19 Thread Benjamin Cherian
Hi,

I'm getting an error when trying to use the APT repo for Ubuntu bionic.
Does anyone else have this issue? Is the mirror sync actually still in
progress? Or was something set up incorrectly?

E: Failed to fetch
https://download.ceph.com/debian-nautilus/dists/bionic/main/binary-amd64/Packages.bz2
File has unexpected size (15515 != 15488). Mirror sync in progress? [IP:
158.69.68.124 443]
   Hashes of expected file:
- Filesize:15488 [weak]
- SHA256:d5ea08e095eeeaa5cc134b1661bfaf55280fcbf8a265d584a4af80d2a424ec17
- SHA1:6da3a8aa17ed7f828f35f546cdcf923040e8e5b0 [weak]
- MD5Sum:7e5a4ecea4a4edc3f483623d48b6efa4 [weak]
   Release file created at: Mon, 11 Mar 2019 18:44:46 +


Thanks,
Ben


On Tue, Mar 19, 2019 at 7:24 AM Sean Purdy  wrote:

> Hi,
>
>
> Will debian packages be released?  I don't see them in the nautilus repo.
> I thought that Nautilus was going to be debian-friendly, unlike Mimic.
>
>
> Sean
>
> On Tue, 19 Mar 2019 14:58:41 +0100
> Abhishek Lekshmanan  wrote:
>
> >
> > We're glad to announce the first release of Nautilus v14.2.0 stable
> > series. There have been a lot of changes across components from the
> > previous Ceph releases, and we advise everyone to go through the release
> > and upgrade notes carefully.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: OSD service won't stay running - pg incomplete

2019-03-14 Thread Benjamin . Zieglmeier
Would you be willing to elaborate on what configuration specifically is bad? 
That would be helpful for future reference.

Yes, we have tried to access with ceph-objectstore-tool to export the shard. 
The command spits out the tcmalloc lines shown in my previous output and then 
crashes with an 'Abort'.
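
For context, the kind of invocation being attempted looks roughly like this
(OSD id taken from this thread; PG id and file path are placeholders):

systemctl stop ceph-osd@16
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
    --pgid <pgid> --op export --file /root/pg-shard.export
# and on a healthy (stopped) OSD, the matching import:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --op import --file /root/pg-shard.export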

On 3/14/19, 5:22 AM, "Paul Emmerich"  wrote:

You should never run a production cluster with this configuration.

Have you tried to access the disk with ceph-objectstoretool? The goal
would be export the shard of the PG on that disk and import it into
any other OSD.


Paul




Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Mar 13, 2019 at 7:08 PM Benjamin.Zieglmeier
 wrote:
>
> After restarting several OSD daemons in our ceph cluster a couple days 
ago, a couple of our OSDs won’t come online. The services start and crash with 
the below error. We have one pg marked as incomplete, and will not peer. The 
pool is erasure coded, 2+1, currently set to size=3, min_size=2. The incomplete 
pg states it is not peering due to:
>
>
>
> "comment": "not enough complete instances of this PG" and:
>
>"down_osds_we_would_probe": [
>
> 7,
>
> 16
>
> ],
>
> 7 is completely lost, drive dead, 16 will not come online (refer to log 
output below).
>
>
>
> We’ve tried searching user-list and tweaking osd conf settings for 
several days, to no avail. Reaching out here as a last ditch effort before we 
have to give up on the pg.
>
>
>
> tcmalloc: large alloc 1073741824 bytes == 0x560ada35c000 @  
0x7f5c1081e4ef 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 
0x7f5c0e9469df 0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 
0x560a8fdacbb6 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 
0x7f5c0dfc5445 0x560a8f514373
>
> tcmalloc: large alloc 2147483648 bytes == 0x560b1a35c000 @  
0x7f5c1081e4ef 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 
0x7f5c0e9469df 0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 
0x560a8fdacbb6 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 
0x7f5c0dfc5445 0x560a8f514373
>
> tcmalloc: large alloc 4294967296 bytes == 0x560b9a35c000 @  
0x7f5c1081e4ef 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 
0x7f5c0e9469df 0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 
0x560a8fdacbb6 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 
0x7f5c0dfc5445 0x560a8f514373
>
> tcmalloc: large alloc 3840745472 bytes == 0x560a9a334000 @  
0x7f5c1081e4ef 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e945c76 0x7f5c0e94623e 
0x560a8fdea280 0x560a8fda8f36 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 
0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 
0x560a8f514373
>
> tcmalloc: large alloc 2728992768 bytes == 0x560e779ee000 @  
0x7f5c1081e4ef 0x7f5c1083f010 0x560a8faa5674 0x560a8faa7125 0x560a8fa835a7 
0x560a8fa5aa3c 0x560a8fa5c238 0x560a8fa77dcc 0x560a8fe439ef 0x560a8fe43c03 
0x560a8fe5acd4 0x560a8fda75ec 0x560a8fda9260 0x560a8fdaa6b6 0x560a8fdab973 
0x560a8fdacbb6 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 
0x7f5c0dfc5445 0x560a8f514373
>
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In 
function 'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 2019-03-13 
12:46:39.632156
>
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: 
FAILED assert(0 == "unexpected aio error")
>
> 2019-03-13 12:46:39.632132 7f5c0a749700 -1 bdev(0x560a99c05000 
/var/lib/ceph/osd/ceph-16/block) aio to 4817558700032~2728988672 but returned: 
2147479552
>
>  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
(stable)
>
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x110) [0x560a8fadd2a0]
>
>  2: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
>
>  3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
>
>  4: (()+0x7e25) [0x7f5c0efb0e25]
>
>  5: (clone()+0x6d) [0x7f5c0e0a1bad]
>
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed 
to interpret this.
>
> 2019-03-13 12:46:39.633822 7f5c0a749700 -1 
/builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In function 
'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 2019-03-13 
12:46:39.632156
>
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: 
FAILED assert(0 == "unexpected aio error")
>
>
>
>  ceph version 12.2.5 

Re: [ceph-users] Need clarification about RGW S3 Bucket Tagging

2019-03-14 Thread Matt Benjamin
Yes, sorry to misstate that.  I was conflating with lifecycle
configuration support.

Matt

On Thu, Mar 14, 2019 at 10:06 AM Konstantin Shalygin  wrote:
>
> On 3/14/19 8:58 PM, Matt Benjamin wrote:
> > Sorry, object tagging.  There's a bucket tagging question in another thread 
> > :)
>
> Luminous works fine with object tagging, at least on 12.2.11
> getObjectTagging and putObjectTagging.
>
>
> [k0ste@WorkStation]$ curl -s
> https://rwg_civetweb/my_bucket/empty-file.txt?tagging | xq '.Tagging[]'
> "http://s3.amazonaws.com/doc/2006-03-01/;
> {
>"Tag": {
>  "Key": "User",
>  "Value": "Bob"
>}
> }
>
>
>
> k
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Need clarification about RGW S3 Bucket Tagging

2019-03-14 Thread Matt Benjamin
Sorry, object tagging.  There's a bucket tagging question in another thread :)

Matt

On Thu, Mar 14, 2019 at 9:58 AM Matt Benjamin  wrote:
>
> Hi Konstantin,
>
> Luminous does not support bucket tagging--although I've done Luminous
> backports for downstream use, and would be willing to help with
> upstream backports if there is community support.
>
> Matt
>
> On Thu, Mar 14, 2019 at 9:53 AM Konstantin Shalygin  wrote:
> >
> > On 3/14/19 8:36 PM, Casey Bodley wrote:
> > > The bucket policy documentation just lists which actions the policy
> > > engine understands. Bucket tagging isn't supported, so those requests
> > > were misinterpreted as normal PUT requests to create a bucket. I
> > > opened https://github.com/ceph/ceph/pull/26952 to return 405 Method
> > > Not Allowed there instead and update the doc to clarify that it's not
> > > supported.
> >
> > As I understand correct, that:
> >
> > - Luminous: support object tagging.
> >
> > - Mimic+: support object tagging and lifecycle policing on this tags [1].
> >
> > ?
> >
> >
> > Thanks,
> >
> > k
> >
> > [1] https://tracker.ceph.com/issues/24011
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Need clarification about RGW S3 Bucket Tagging

2019-03-14 Thread Matt Benjamin
Hi Konstantin,

Luminous does not support bucket tagging--although I've done Luminous
backports for downstream use, and would be willing to help with
upstream backports if there is community support.

Matt

On Thu, Mar 14, 2019 at 9:53 AM Konstantin Shalygin  wrote:
>
> On 3/14/19 8:36 PM, Casey Bodley wrote:
> > The bucket policy documentation just lists which actions the policy
> > engine understands. Bucket tagging isn't supported, so those requests
> > were misinterpreted as normal PUT requests to create a bucket. I
> > opened https://github.com/ceph/ceph/pull/26952 to return 405 Method
> > Not Allowed there instead and update the doc to clarify that it's not
> > supported.
>
> As I understand correct, that:
>
> - Luminous: support object tagging.
>
> - Mimic+: support object tagging and lifecycle policing on this tags [1].
>
> ?
>
>
> Thanks,
>
> k
>
> [1] https://tracker.ceph.com/issues/24011
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD service won't stay running - pg incomplete

2019-03-13 Thread Benjamin . Zieglmeier
After restarting several OSD daemons in our ceph cluster a couple days ago, a 
couple of our OSDs won’t come online. The services start and crash with the 
below error. We have one pg marked as incomplete, and will not peer. The pool 
is erasure coded, 2+1, currently set to size=3, min_size=2. The incomplete pg 
states it is not peering due to:

"comment": "not enough complete instances of this PG" and:
   "down_osds_we_would_probe": [
7,
16
],
7 is completely lost, drive dead, 16 will not come online (refer to log output 
below).

We’ve tried searching user-list and tweaking osd conf settings for several 
days, to no avail. Reaching out here as a last ditch effort before we have to 
give up on the pg.


tcmalloc: large alloc 1073741824 bytes == 0x560ada35c000 @  0x7f5c1081e4ef 
0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 0x7f5c0e9469df 
0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 
0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 
0x560a8f514373

tcmalloc: large alloc 2147483648 bytes == 0x560b1a35c000 @  0x7f5c1081e4ef 
0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 0x7f5c0e9469df 
0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 
0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 
0x560a8f514373

tcmalloc: large alloc 4294967296 bytes == 0x560b9a35c000 @  0x7f5c1081e4ef 
0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 0x7f5c0e9469df 
0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 
0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 
0x560a8f514373

tcmalloc: large alloc 3840745472 bytes == 0x560a9a334000 @  0x7f5c1081e4ef 
0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e945c76 0x7f5c0e94623e 0x560a8fdea280 
0x560a8fda8f36 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 0x560a8f9f8f88 
0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 0x560a8f514373

tcmalloc: large alloc 2728992768 bytes == 0x560e779ee000 @  0x7f5c1081e4ef 
0x7f5c1083f010 0x560a8faa5674 0x560a8faa7125 0x560a8fa835a7 0x560a8fa5aa3c 
0x560a8fa5c238 0x560a8fa77dcc 0x560a8fe439ef 0x560a8fe43c03 0x560a8fe5acd4 
0x560a8fda75ec 0x560a8fda9260 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 
0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 
0x560a8f514373

/builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In function 
'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 2019-03-13 
12:46:39.632156

/builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: FAILED 
assert(0 == "unexpected aio error")

2019-03-13 12:46:39.632132 7f5c0a749700 -1 bdev(0x560a99c05000 
/var/lib/ceph/osd/ceph-16/block) aio to 4817558700032~2728988672 but returned: 
2147479552

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
(stable)

 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x110) [0x560a8fadd2a0]

 2: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]

 3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]

 4: (()+0x7e25) [0x7f5c0efb0e25]

 5: (clone()+0x6d) [0x7f5c0e0a1bad]

 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

2019-03-13 12:46:39.633822 7f5c0a749700 -1 
/builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In function 
'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 2019-03-13 
12:46:39.632156

/builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: FAILED 
assert(0 == "unexpected aio error")



 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
(stable)

 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x110) [0x560a8fadd2a0]

 2: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]

 3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]

 4: (()+0x7e25) [0x7f5c0efb0e25]

 5: (clone()+0x6d) [0x7f5c0e0a1bad]

 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.



-1> 2019-03-13 12:46:39.632132 7f5c0a749700 -1 bdev(0x560a99c05000 
/var/lib/ceph/osd/ceph-16/block) aio to 4817558700032~2728988672 but returned: 
2147479552

 0> 2019-03-13 12:46:39.633822 7f5c0a749700 -1 
/builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In function 
'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 2019-03-13 
12:46:39.632156

/builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: FAILED 
assert(0 == "unexpected aio error")



 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
(stable)

 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x110) [0x560a8fadd2a0]

 2: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]

 3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]

 4: (()+0x7e25) [0x7f5c0efb0e25]

 5: 

Re: [ceph-users] Ceph block storage - block.db useless? [solved]

2019-03-12 Thread Benjamin Zapiec
Yeah, thank you. xD

You just answered another thread where I asked about the kv-sync thread,
so consider this done; I know what to do now.

Thank you



Am 12.03.19 um 14:43 schrieb Mark Nelson:
> Our default of 4 256MB WAL buffers is arguably already too big. On one
> hand we are making these buffers large to hopefully avoid short lived
> data going into the DB (pglog writes).  IE if a pglog write comes in and
> later a tombstone invalidating it comes in, we really want those to land
> in the same WAL log to avoid that write being propagated into the DB. 
> On the flip side, large buffers mean that there's more work that rocksdb
> has to perform to compare keys to get everything ordered.  This is done
> in the kv_sync_thread where we often bottleneck on small random write
> workloads:
> 
> 
>     | | |   |   |   | + 13.30%
> rocksdb::InlineSkipList const&>::Insert
> 
> So on one hand we want large buffers to avoid short lived data going
> into the DB, and on the other hand we want small buffers to avoid large
> amounts of comparisons eating CPU, especially in CPU limited environments.
> 
> 
> Mark
> 
> 
> 
> On 3/12/19 8:25 AM, Benjamin Zapiec wrote:
>> May I configure the size of WAL to increase block.db usage?
>> For example I configure 20GB I would get an usage of about 48GB on L3.
>>
>> Or should I stay with ceph defaults?
>> Is there a maximal size for WAL that makes sense?
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread Benjamin Zapiec
Sorry, I mean L2.


Am 12.03.19 um 14:25 schrieb Benjamin Zapiec:
> May I configure the size of WAL to increase block.db usage?
> For example I configure 20GB I would get an usage of about 48GB on L3.
> 
> Or should I stay with ceph defaults?
> Is there a maximal size for WAL that makes sense?
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Benjamin Zapiec  (System Engineer)
* GONICUS GmbH * Moehnestrasse 55 (Kaiserhaus) * D-59755 Arnsberg
* Tel.: +49 2932 916-0 * Fax: +49 2932 916-245
* http://www.GONICUS.de

* Registered office: Moehnestrasse 55 * D-59755 Arnsberg
* Managing directors: Rainer Luelsdorf, Alfred Schroeder
* Chairman of the advisory board: Juergen Michels
* Arnsberg Local Court * HRB 1968



We fulfil our duty to provide information under data protection law pursuant
to Articles 13 and 14 GDPR by publishing it on our website at
https://www.gonicus.de/datenschutz or by sending it to you on informal request.



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread Benjamin Zapiec
May I configure the size of the WAL to increase block.db usage?
For example, if I configure 20GB, I would get a usage of about 48GB on L3.

Or should I stay with the Ceph defaults?
Is there a maximum size for the WAL that makes sense?



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread Benjamin Zapiec
Okay, so I think I don't understand the mechanism by which Ceph's RocksDB
decides whether to place data on block.db or not.

So the amount of data in block.db depends on the WAL size?
I thought it depended on the objects saved to the storage.
In that case, say we have a 1GB file, it would have a size
of 10GB in L2.

But if it depends on the WAL, I would get the same benefit using
a block.db with a size of 30GB instead of 250GB. Is that correct?
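
For what it's worth, the arithmetic behind those numbers, using the defaults
quoted below (WAL=1GB, L1=256MB, L2=2560MB, L3=25600MB), works out to:

 1GB WAL + 256MB + 2560MB            ~=  4GB  -> L2 fits on block.db
 1GB WAL + 256MB + 2560MB + 25600MB  ~= 30GB  -> L3 fits on block.db
20GB WAL + 256MB + 2560MB + 25600MB  ~= 48GB  -> the 48GB figure mentioned above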


Best regards

> block.db is very unlikely to ever grow to 250GB with a 6TB data device.
>
> However, there seems to be a funny "issue" with all block.db sizes
> except 4, 30, and 286 GB being useless, because RocksDB puts the data on
> the fast storage only if it thinks the whole LSM level will fit there.
> Ceph's RocksDB options set WAL to 1GB and leave the default
> max_bytes_for_level_base unchanged so it's 256MB. Multiplier is also
> left at 10. So WAL=1GB, L1=256MB, L2=2560MB, L3=25600MB. So RocksDB will
> put L2 to the block.db only if the block.db's size exceeds
> 1GB+256MB+2560MB (which rounds up to 4GB), and it will put L3 to the
> block.db only if its size exceeds 1GB+256MB+2560MB+25600MB = almost 30GB.
>
>> Hello,
>>
>> I was wondering why my Ceph block.db was nearly empty, and I started
>> to investigate.
>>
>> The recommendation from Ceph is that block.db should be at least
>> 4% of the size of block. So my OSD configuration looks like this:
>>
>> wal.db   - not explicitly specified
>> block.db - 250GB of SSD storage
>> block    - 6TB
>>
>> Since the WAL is written to block.db when no separate WAL device is
>> given, I didn't configure one. With a size of 250GB we are slightly above 4%.
>>
>> So everything should be "fine". But the block.db only contains
>> about 10GB of data.
>>
>> I figured out that an object in block.db gets "amplified", so
>> the space consumption is much higher than the object itself
>> would need.
>>
>> I'm using Ceph as the storage backend for OpenStack, and raw images
>> with a size of 10GB and more are common. So if I understand
>> this correctly, I have to expect that a 10GB image may
>> consume 100GB of block.db.
>>
>> Leaving aside the fact that an image may be 100G in size and
>> is only read from initially, until all changed blocks have been
>> written to an SSD-only pool, I was asking myself whether I need a
>> block.db at all, and whether it would be better to save the SSD
>> space used for block.db and just create a 10GB wal.db instead.
>>
>> Has anyone done this before? Anyone who had sufficient SSD space
>> but stuck with only a wal.db to save SSD space?
>>
>> If I'm correct, the block.db will never be used for huge images.
>> And even if it were used for one or two images, would that make
>> sense? The images are only read in full initially, to fetch the
>> unchanged blocks; after a while each VM should access the images pool
>> less and less because of the changes made inside the VM.
>>
>>
>> Any thoughts about this?
>>
>>
>> Best regards
>>
>> --
>> Benjamin Zapiec  (System Engineer)
>> * GONICUS GmbH * Moehnestrasse 55 (Kaiserhaus) * D-59755 Arnsberg
>> * Tel.: +49 2932 916-0 * Fax: +49 2932 916-245
>> * http://www.GONICUS.de
>>
>> * Sitz der Gesellschaft: Moehnestrasse 55 * D-59755 Arnsberg
>> * Geschaeftsfuehrer: Rainer Luelsdorf, Alfred Schroeder
>> * Vorsitzender des Beirats: Juergen Michels
>> * Amtsgericht Arnsberg * HRB 1968
>>
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Benjamin Zapiec  (System Engineer)
* GONICUS GmbH * Moehnestrasse 55 (Kaiserhaus) * D-59755 Arnsberg
* Tel.: +49 2932 916-0 * Fax: +49 2932 916-245
* http://www.GONICUS.de

* Sitz der Gesellschaft: Moehnestrasse 55 * D-59755 Arnsberg
* Geschaeftsfuehrer: Rainer Luelsdorf, Alfred Schroeder
* Vorsitzender des Beirats: Juergen Michels
* Amtsgericht Arnsberg * HRB 1968



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread Benjamin Zapiec
Hello,

I was wondering why my Ceph block.db was nearly empty, and I started
to investigate.

The recommendation from Ceph is that block.db should be at least
4% of the size of block. So my OSD configuration looks like this:

wal.db   - not explicitly specified
block.db - 250GB of SSD storage
block    - 6TB

Since the WAL is written to block.db when no separate WAL device is
given, I didn't configure one. With a size of 250GB we are slightly above 4%.
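
For reference, a layout like this would typically be created with
something along these lines (just a sketch - the device names are
placeholders, not the real ones from this cluster):

ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/nvme0n1p1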

So everything should be "fine". But the block.db only contains
about 10GB of data.

I figured out that an object in block.db gets "amplified", so
the space consumption is much higher than the object itself
would need.

I'm using Ceph as the storage backend for OpenStack, and raw images
with a size of 10GB and more are common. So if I understand
this correctly, I have to expect that a 10GB image may
consume 100GB of block.db.

Leaving aside the fact that an image may be 100G in size and
is only read from initially, until all changed blocks have been
written to an SSD-only pool, I was asking myself whether I need a
block.db at all, and whether it would be better to save the SSD
space used for block.db and just create a 10GB wal.db instead.

Has anyone done this before? Anyone who had sufficient SSD space
but stuck with only a wal.db to save SSD space?

If I'm correct, the block.db will never be used for huge images.
And even if it were used for one or two images, would that make
sense? The images are only read in full initially, to fetch the
unchanged blocks; after a while each VM should access the images pool
less and less because of the changes made inside the VM.


Any thoughts about this?


Best regards

-- 
Benjamin Zapiec  (System Engineer)
* GONICUS GmbH * Moehnestrasse 55 (Kaiserhaus) * D-59755 Arnsberg
* Tel.: +49 2932 916-0 * Fax: +49 2932 916-245
* http://www.GONICUS.de

* Sitz der Gesellschaft: Moehnestrasse 55 * D-59755 Arnsberg
* Geschaeftsfuehrer: Rainer Luelsdorf, Alfred Schroeder
* Vorsitzender des Beirats: Juergen Michels
* Amtsgericht Arnsberg * HRB 1968



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: Multi-Site Cluster RGW Sync issues

2019-02-28 Thread Benjamin . Zieglmeier
The output has 57000 lines (and growing). I’ve uploaded the output to:

https://gist.github.com/zieg8301/7e6952e9964c1e0964fb63f61e7b7be7

Thanks,
Ben

From: Matthew H 
Date: Wednesday, February 27, 2019 at 11:02 PM
To: "Benjamin. Zieglmeier" 
Cc: "ceph-users@lists.ceph.com" 
Subject: [EXTERNAL] Re: Multi-Site Cluster RGW Sync issues

Hey Ben,

Could you include the following?


radosgw-admin mdlog list

Thanks,


From: ceph-users  on behalf of 
Benjamin.Zieglmeier 
Sent: Tuesday, February 26, 2019 9:33 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Multi-Site Cluster RGW Sync issues


Hello,



We have a two-zone multisite Luminous 12.2.5 cluster. The cluster has been
running for about 1 year and has only ~140G of data (~350k objects). We
recently added a third zone to the zonegroup to facilitate a migration out of
an existing site. Sync appears to be working, and running `radosgw-admin sync
status` and `radosgw-admin sync status --rgw-zone=` reflects the
same. The problem we are having is that once the data replication completes,
one of the rgws serving the new zone has its radosgw process consuming all the
CPU, and the rgw log is flooded with “ERROR: failed to read mdlog info with (2)
No such file or directory”, at a rate of about 1000 log entries/sec.



This has been happening for days on end now, and we are concerned about what is
going on between these two zones. Logs are constantly filling up on the rgws
and we are out of ideas. Are they trying to catch up on metadata? After
extensive searching and racking our brains, we are unable to figure out what is
causing all these requests (and errors) between the two zones.



Thanks,

Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multi-Site Cluster RGW Sync issues

2019-02-26 Thread Benjamin . Zieglmeier
Hello,

We have a two-zone multisite Luminous 12.2.5 cluster. The cluster has been
running for about 1 year and has only ~140G of data (~350k objects). We
recently added a third zone to the zonegroup to facilitate a migration out of
an existing site. Sync appears to be working, and running `radosgw-admin sync
status` and `radosgw-admin sync status --rgw-zone=` reflects the
same. The problem we are having is that once the data replication completes,
one of the rgws serving the new zone has its radosgw process consuming all the
CPU, and the rgw log is flooded with “ERROR: failed to read mdlog info with (2)
No such file or directory”, at a rate of about 1000 log entries/sec.

This has been happening for days on end now, and we are concerned about what is
going on between these two zones. Logs are constantly filling up on the rgws
and we are out of ideas. Are they trying to catch up on metadata? After
extensive searching and racking our brains, we are unable to figure out what is
causing all these requests (and errors) between the two zones.
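
For reference, the sort of commands typically used to dig into this
(only a sketch - zone names are placeholders):

radosgw-admin sync status
radosgw-admin metadata sync status
radosgw-admin data sync status --source-zone=<other-zone>
radosgw-admin sync error list
radosgw-admin mdlog status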

Thanks,
Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] thread bstore_kv_sync - high disk utilization

2019-02-22 Thread Benjamin Zapiec
Hello,

I couldn't find anything that clearly describes
what this thread does.

I also couldn't find out whether an average I/O wait of about 60%
on the block device is normal for an SSD, even when there is
little or no client workload.

Output from iotop:

---

9890 be/4 ceph0.00 B/s  817.11 K/s  0.00 % 71.91 % ceph-osd -f
--cluster ceph --id~--setgroup ceph [bstore_kv_sync]

---


I'm using Ceph block storage to connect to an OpenStack
environment. The Ceph cluster consists of three
"identical" machines driving 12 OSDs:


ID CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
-1   41.62500 root default
-7   13.87500 host storage1
 7   hdd  5.26299 osd.7 up  1.0 1.0 (6TB)
 9   hdd  5.45799 osd.9 up  1.0 1.0 (6TB)
 2   ssd  1.81898 osd.2 up  1.0 1.0 (2TB)
 4   ssd  1.33499 osd.4 up  1.0 1.0 (2TB)
-3   13.87500 host storage2
 8   hdd  5.26299 osd.8 up  1.0 1.0 (6TB)
10   hdd  5.45799 osd.10    up  1.0 1.0 (6TB)
 1   ssd  1.81898 osd.1 up  1.0 1.0 (2TB)
 5   ssd  1.33499 osd.5 up  1.0 1.0 (2TB)
-5   13.87500 host storage3
 6   hdd  5.26299 osd.6 up  1.0 1.0 (6TB)
11   hdd  5.45799 osd.11    up  1.0 1.0 (6TB)
 0   ssd  1.81898 osd.0 up  1.0 1.0 (2TB)
 3   ssd  1.33499 osd.3 up  1.0 1.0 (2TB)


The network configuration looks like this:

2x 10Gbit(eth1/eth2) -> Bond0 -> cluster network/backend
2x 10Gbit(eth3/eth4) -> Bond1 -> public network/mons and openstack

Users report that the virtual machines have poor performance,
so I started investigating. Using "top" I saw that I/O wait was very high
(above wa: 50.0).
To be sure that the load on the cluster was as low as
possible, I did my measurements in the evening when all
of the VMs were idle (but still up and running).
Even then, the disk utilization of the first SSD OSD on
all hosts didn't drop as I would have expected.

Even with "ceph -s" telling me that there was just <2Mbit/s of read/write
and <1k IOPS of client traffic, the utilization on the SSD was
very high.

For performance reasons I have created a replicated CRUSH rule
that only uses SSDs for the vms pool. And to save some space
I reduced replication from 3 to 2.

But without that rule I could see the same behaviour; the
only difference was that it wasn't the SSDs being utilised but
the HDDs instead.

One possibly interesting detail is that the monitors run
next to the OSDs on the same hosts.

In general the SSD OSDs don't use a separate block.wal or block.db.
The HDD OSDs do have a separate block.db partition on
SSD (250GB - but block.db only contains about 5GB; good question
why it isn't used more ;-) ).

Any suggestions on the high disk utilization by bstore_kv_sync?
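
Two things that usually help to see where that thread spends its time
(a sketch - the OSD id and device name are placeholders):

# per-OSD internal counters; look at the bluestore/rocksdb latencies
ceph daemon osd.2 perf dump

# raw device-level view of the SSD that holds the RocksDB
iostat -x 1 /dev/sdX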



Best regards

-- 
Benjamin Zapiec  (System Engineer)
* GONICUS GmbH * Moehnestrasse 55 (Kaiserhaus) * D-59755 Arnsberg
* Tel.: +49 2932 916-0 * Fax: +49 2932 916-245
* http://www.GONICUS.de

* Sitz der Gesellschaft: Moehnestrasse 55 * D-59755 Arnsberg
* Geschaeftsfuehrer: Rainer Luelsdorf, Alfred Schroeder
* Vorsitzender des Beirats: Juergen Michels
* Amtsgericht Arnsberg * HRB 1968



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw s3 subuser permissions

2019-01-24 Thread Matt Benjamin
Hi Marc,

I'm not actually certain whether the traditional ACLs permit any
solution for that, but I believe with bucket policy, you can achieve
precise control within and across tenants, for any set of desired
resources (buckets).
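
For example, a policy along these lines (only a sketch - tenant, user
and bucket names are placeholders, and I haven't verified the exact
principal form for subusers) can scope a principal to one prefix:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam::Company:user/testuser"]},
    "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
    "Resource": [
      "arn:aws:s3:::bucket",
      "arn:aws:s3:::bucket/folder2/*"
    ]
  }]
}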

Matt

On Thu, Jan 24, 2019 at 3:18 PM Marc Roos  wrote:
>
>
> It is correct that it is NOT possible for s3 subusers to have different
> permissions on folders created by the parent account?
> Thus the --access=[ read | write | readwrite | full ] is for everything
> the parent has created, and it is not possible to change that for
> specific folders/buckets?
>
> radosgw-admin subuser create --uid='Company$archive' --subuser=testuser
> --key-type=s3
>
> Thus if archive created this bucket/folder structure.
> └── bucket
> ├── folder1
> ├── folder2
> └── folder3
> └── folder4
>
> It is not possible to allow testuser to only write in folder2?
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Swift metadata dropped when S3 bucket versioning enabled

2018-12-05 Thread Matt Benjamin
Agree, please file a tracker issue with the info, we'll prioritize
reproducing it.

Cheers,

Matt
On Wed, Dec 5, 2018 at 11:42 AM Florian Haas  wrote:
>
> On 05/12/2018 17:35, Maxime Guyot wrote:
> > Hi Florian,
> >
> > Thanks for the help. I did further testing and narrowed it down to
> > objects that have been uploaded when the bucket has versioning enabled.
> > Objects created before that are not affected: all metadata operations
> > are still possible.
> >
> > Here is a simple way to reproduce
> > this: http://paste.openstack.org/show/736713/
> > And here is the snippet to easily turn on/off S3 versioning on a given
> > bucket: https://gist.github.com/Miouge1/b8ae19b71411655154e74e609b61f24e
> >
> > Cheers,
> > Maxime
>
> All right, by my reckoning this would very much look like a bug then.
> You probably want to chuck an issue for this into
> https://tracker.ceph.com/projects/rgw.
>
> Out of curiosity, are you also seeing Swift metadata getting borked when
> you're enabling *Swift* versioning? (Wholly different animal, I know,
> but still worth taking a look I think.)
>
> Cheers
> Florian
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multi tenanted radosgw and existing Keystone users/tenants

2018-12-05 Thread Matt Benjamin
This capability is stable and should merge to master shortly.

Matt
On Wed, Dec 5, 2018 at 11:24 AM Florian Haas  wrote:
>
> Hi Mark,
>
> On 04/12/2018 04:41, Mark Kirkwood wrote:
> > Hi,
> >
> > I've set up a Luminous RGW with Keystone integration, and subsequently set
> >
> > rgw keystone implicit tenants = true
> >
> > So now all newly created users/tenants (or old ones that never accessed
> > RGW) get their own namespaces. However there are some pre-existing users
> > that have created buckets and objects - and these are in the global
> > namespace. Is there any way to move the existing buckets and objects to
> > private namespaces and change these users to use said private namespaces?
>
> It looks like you're running into the issue described in this PR:
> https://github.com/ceph/ceph/pull/23994
>
> Sooo... bit complicated, fix still pending.
>
> Cheers,
> Florian
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] inexplicably slow bucket listing at top level

2018-11-05 Thread Matt Benjamin
Hi,

I just did some testing to confirm, and can report that "mc ls -r"
does appear to induce extra latency related to its Unix path
emulation.
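
One way to separate the client-side path handling from RGW/index
latency is to time a plain S3 listing of the same bucket, e.g.
(a sketch - the endpoint is a placeholder):

time aws s3api list-objects --endpoint-url http://rgw.example.com \
    --bucket friedlab --prefix impute/ --max-items 1000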

Matt

On Mon, Nov 5, 2018 at 3:10 PM, J. Eric Ivancich  wrote:
> I did make an inquiry and someone here does have some experience w/ the
> mc command -- minio client. We're curious how "ls -r" is implemented
> under mc. Does it need to get a full listing and then do some path
> parsing to produce nice output? If so, it may be playing a role in the
> delay as well.
>
> Eric
>
> On 9/26/18 5:27 PM, Graham Allan wrote:
>> I have one user bucket, where inexplicably (to me), the bucket takes an
>> eternity to list, though only on the top level. There are two
>> subfolders, each of which lists individually at a completely normal
>> speed...
>>
>> eg (using minio client):
>>
>>> [~] % time ./mc ls fried/friedlab/
>>> [2018-09-26 16:15:48 CDT] 0B impute/
>>> [2018-09-26 16:15:48 CDT] 0B wgs/
>>>
>>> real1m59.390s
>>>
>>> [~] % time ./mc ls -r fried/friedlab/
>>> ...
>>> real 3m18.013s
>>>
>>> [~] % time ./mc ls -r fried/friedlab/impute
>>> ...
>>> real 0m13.512s
>>>
>>> [~] % time ./mc ls -r fried/friedlab/wgs
>>> ...
>>> real 0m6.437s
>>
>> The bucket has about 55k objects total, with 32 index shards on a
>> replicated ssd pool. It shouldn't be taking this long but I can't
>> imagine what could be causing this. I haven't found any others behaving
>> this way. I'd think it has to be some problem with the bucket index, but
>> what...?
>>
>> I did naively try some "radosgw-admin bucket check [--fix]" commands
>> with no change.
>>
>> Graham
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous 12.2.5 - crushable RGW

2018-10-24 Thread Matt Benjamin
 buffer
>>0/ 1 timer
>>0/ 1 filer
>>0/ 1 striper
>>0/ 1 objecter
>>0/ 5 rados
>>0/ 5 rbd
>>0/ 5 rbd_mirror
>>0/ 5 rbd_replay
>>0/ 5 journaler
>>0/ 5 objectcacher
>>0/ 5 client
>>1/ 5 osd
>>0/ 5 optracker
>>0/ 5 objclass
>>1/ 3 filestore
>>1/ 3 journal
>>0/ 5 ms
>>1/ 5 mon
>>0/10 monc
>>1/ 5 paxos
>>0/ 5 tp
>>1/ 5 auth
>>1/ 5 crypto
>>1/ 1 finisher
>>1/ 1 reserver
>>1/ 5 heartbeatmap
>>1/ 5 perfcounter
>>1/ 5 rgw
>>1/10 civetweb
>>1/ 5 javaclient
>>1/ 5 asok
>>1/ 1 throttle
>>0/ 0 refs
>>1/ 5 xio
>>1/ 5 compressor
>>1/ 5 bluestore
>>1/ 5 bluefs
>>1/ 3 bdev
>>1/ 5 kstore
>>4/ 5 rocksdb
>>4/ 5 leveldb
>>4/ 5 memdb
>>1/ 5 kinetic
>>1/ 5 fuse
>>1/ 5 mgr
>>1/ 5 mgrc
>>1/ 5 dpdk
>>1/ 5 eventtrace
>>   -2/-2 (syslog threshold)
>>   -1/-1 (stderr threshold)
>>   max_recent 1
>>   max_new 1000
>>   log_file /var/log/ceph/ceph-client.rgw.ceph-node-01.log
>> --- end dump of recent events ---
>> 2018-07-13 05:22:16.176189 7fcce6575700 -1 *** Caught signal (Aborted) **
>>  in thread 7fcce6575700 thread_name:civetweb-worker
>>
>>
>>
>> Any trick to enable rgw_dynamic_resharding only during offpeah hours? Is
>> 'rgw_dynamic_resharding' a injectable setting with no need to restart RGWs?
>> So far, I only mananged to change it via configuration file with RGW service
>> restart.
>>
>> Regarding `ERROR: flush_read_list(): d->client_cb->handle_data() returned
>> -5`, I'll try to increaste timeouts on nginx side.
>>
>> Any help with this error `/build/ceph-12.2.5/src/common/buffer.cc: 1967:
>> FAILED assert(len+off <= bp.length())` which seems to directly impact RGW
>> service state ? Could it be caused by lot of GET requests for big files ? It
>> happens only after the flood of `ERROR: flush_read_list()`.
>>
>> Thanks
>> Jakub
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Apply bucket policy to bucket for LDAP user: what is the correct identifier for principal

2018-10-11 Thread Matt Benjamin
Right, the user can be the DN component or something else projected
from the entry; details are in the docs.
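
For reference, the relevant ceph.conf settings look roughly like this
(a sketch - the URI, DNs and section name are placeholders; the
rgw_ldap_dnattr attribute is what becomes the RGW user name):

[client.rgw.gateway1]
rgw_s3_auth_use_ldap = true
rgw_ldap_uri = ldaps://ldap.example.com:636
rgw_ldap_binddn = "uid=rgw,cn=users,dc=example,dc=com"
rgw_ldap_searchdn = "cn=users,dc=example,dc=com"
rgw_ldap_dnattr = "uid"
rgw_ldap_secret = "/etc/ceph/ldap.secret"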

Matt

On Thu, Oct 11, 2018 at 1:26 PM, Adam C. Emerson  wrote:
> Ha Son Hai  wrote:
>> Hello everyone,
>> I try to apply the bucket policy to my bucket for LDAP user but it doesn't 
>> work.
>> For user created by radosgw-admin, the policy works fine.
>>
>> {
>>
>>   "Version": "2012-10-17",
>>
>>   "Statement": [{
>>
>> "Effect": "Allow",
>>
>> "Principal": {"AWS": ["arn:aws:iam:::user/radosgw-user"]},
>>
>> "Action": "s3:*",
>>
>> "Resource": [
>>
>>   "arn:aws:s3:::shared-tenant-test",
>>
>>   "arn:aws:s3:::shared-tenant-test/*"
>>
>> ]
>>
>>   }]
>>
>> }
>
> LDAP users essentially are RGW users, so it should be this same
> format. As I understand RGW's LDAP interface (I have not worked with
> LDAP personally), every LDAP user gets a corresponding RGW user whose
> name is derived from rgw_ldap_dnattr, often 'uid' or 'cn', but this is
> dependent on the site.
>
> If you can check that part of the configuration, and if that doesn't
> work, send some logs and I'll take a look. If something fishy is going
> on we can try opening a bug.
>
> Thank you.
>
> --
> Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
> IRC: Aemerson@OFTC, Actinic@Freenode
> 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OMAP size on disk

2018-10-09 Thread Matt Benjamin
Hi Luis,

There are currently open issues with space reclamation after dynamic
bucket index resharding, esp. http://tracker.ceph.com/issues/34307

Changes are being worked on to address this, and to permit
administratively reclaiming space.
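
In the meantime, a quick way to see which index objects are carrying
the omap data is something along these lines (a sketch; adjust the
pool name to your cluster):

for obj in $(rados -p default.rgw.buckets.index ls); do
    echo "$obj $(rados -p default.rgw.buckets.index listomapkeys "$obj" | wc -l)"
done | sort -k2 -n | tail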

Matt

On Tue, Oct 9, 2018 at 5:50 AM, Luis Periquito  wrote:
> Hi all,
>
> I have several clusters, all running Luminous (12.2.7) proving S3
> interface. All of them have enabled dynamic resharding and is working.
>
> One of the newer clusters is starting to give warnings on the used
> space for the OMAP directory. The default.rgw.buckets.index pool is
> replicated with 3x copies of the data.
>
> I created a new crush ruleset to only use a few well known SSDs, and
> the OMAP directory size changed as expected: if I set the OSD as out
> and them tell to compact, the size of the OMAP will shrink. If I set
> the OSD as in the OMAP will grow to its previous state. And while the
> backfill is going we get loads of key recoveries.
>
> Total physical space for OMAP in the OSDs that have them is ~1TB, so
> given a 3x replica ~330G before replication.
>
> The data size for the default.rgw.buckets.data is just under 300G.
> There is one bucket who has ~1.7M objects and 22 shards.
>
> After deleting that bucket the size of the database didn't change -
> even after running gc process and telling the OSD to compact its
> database.
>
> This is not happening in older clusters, i.e created with hammer.
> Could this be a bug?
>
> I looked at getting all the OMAP keys and sizes
> (https://ceph.com/geen-categorie/get-omap-keyvalue-size/) and they add
> up to close the value I expected them to take, looking at the physical
> storage.
>
> Any ideas where to look next?
>
> thanks for all the help.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] omap vs. xattr in librados

2018-09-12 Thread Benjamin Cherian
Greg, Paul,

Thank you for the feedback. This has been very enlightening. One last
question (for now at least). Are there any expected performance impacts
from having I/O to multiple pools from the same client? (Given how RGW and
CephFS store metadata, I would hope not, but I thought I'd ask.) Based on
everything that has been described, it makes sense to keep metadata-heavy
objects (i.e., objects with a large fraction of kv data) in a
replicated pool while putting the large blobs in an EC pool.
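
As a concrete sketch of that split (pool names are placeholders and it
assumes the python-rados bindings):

import rados

cluster = rados.Rados(conffile="ceph.conf")
cluster.connect()

data_io = cluster.open_ioctx("app.data.ec")    # EC pool for the large blobs
meta_io = cluster.open_ioctx("app.meta.rep")   # replicated pool for kv metadata

# stand-in for the ~3-5 MB FP array
blob_bytes = b"\x00" * (3 * 1024 * 1024)
data_io.write_full("sample-0001.blob", blob_bytes)

# metadata goes into omap on a small companion object
with rados.WriteOpCtx() as op:
    meta_io.set_omap(op, ("created", "source"), (b"2018-09-12", b"run-42"))
    meta_io.operate_write_op(op, "sample-0001.meta")

meta_io.close()
data_io.close()
cluster.shutdown()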

Thanks again,
Ben

On Wed, Sep 12, 2018 at 1:05 PM Gregory Farnum  wrote:

> On Tue, Sep 11, 2018 at 5:32 PM Benjamin Cherian <
> benjamin.cher...@gmail.com> wrote:
>
>> Ok, that’s good to know. I was planning on using an EC pool. Maybe I'll
>> store some of the larger kv pairs in their own objects or move the metadata
>> into it's own replicated pool entirely. If the storage mechanism is the
>> same, is there a reason xattrs are supported and omap is not? (Or is there
>> some hidden cost to storing kv pairs in an EC pool I’m unaware of, e.g.,
>> does the kv data get replicated across all OSDs being used for a PG or
>> something?)
>>
>
> Yeah, if you're on an EC pool there isn't a good way to erasure-code
> key-value data. So we willingly replicate xattrs across all the nodes
> (since they are presumed to be small and limited in number — I think we
> actually have total limits, but not sure?) but don't support omap at all
> (as it's presumed to be a lot of data).
>
> Do note that if small objects are a large proportion of your data you
> might prefer to put them in a replicated pool — in an EC pool you'd need
> very small chunk sizes to get any non-replication happening anyway, and for
> something in the 10KB range at a reasonable k+m you'd be dominated by
> metadata size anyway.
> -Greg
>
>
>>
>> Thanks,
>> Ben
>>
>> On Tue, Sep 11, 2018 at 1:46 PM Patrick Donnelly 
>> wrote:
>>
>>> On Tue, Sep 11, 2018 at 12:43 PM, Benjamin Cherian
>>>  wrote:
>>> > On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum 
>>> wrote:
>>> >>
>>> >> 
>>> >> In general, if the key-value storage is of unpredictable or
>>> non-trivial
>>> >> size, you should use omap.
>>> >>
>>> >> At the bottom layer where the data is actually stored, they're likely
>>> to
>>> >> be in the same places (if using BlueStore, they are the same — in
>>> FileStore,
>>> >> a rados xattr *might* be in the local FS xattrs, or it might not). It
>>> is
>>> >> somewhat more likely that something stored in an xattr will get
>>> pulled into
>>> >> memory at the same time as the object's internal metadata, but that
>>> only
>>> >> happens if it's quite small (think the xfs or ext4 xattr rules).
>>> >
>>> >
>>> > Based on this description, if I'm planning on using Bluestore, there
>>> is no
>>> > particular reason to ever prefer using xattrs over omap (outside of
>>> ease of
>>> > use in the API), correct?
>>>
>>> You may prefer xattrs on bluestore if the metadata is small and you
>>> may need to store the xattrs on an EC pool. omap is not supported on
>>> ecpools.
>>>
>>> --
>>> Patrick Donnelly
>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] omap vs. xattr in librados

2018-09-11 Thread Benjamin Cherian
Ok, that’s good to know. I was planning on using an EC pool. Maybe I'll
store some of the larger kv pairs in their own objects or move the metadata
into its own replicated pool entirely. If the storage mechanism is the
same, is there a reason xattrs are supported and omap is not? (Or is there
some hidden cost to storing kv pairs in an EC pool I’m unaware of, e.g.,
does the kv data get replicated across all OSDs being used for a PG or
something?)

Thanks,
Ben

On Tue, Sep 11, 2018 at 1:46 PM Patrick Donnelly 
wrote:

> On Tue, Sep 11, 2018 at 12:43 PM, Benjamin Cherian
>  wrote:
> > On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum 
> wrote:
> >>
> >> 
> >> In general, if the key-value storage is of unpredictable or non-trivial
> >> size, you should use omap.
> >>
> >> At the bottom layer where the data is actually stored, they're likely to
> >> be in the same places (if using BlueStore, they are the same — in
> FileStore,
> >> a rados xattr *might* be in the local FS xattrs, or it might not). It is
> >> somewhat more likely that something stored in an xattr will get pulled
> into
> >> memory at the same time as the object's internal metadata, but that only
> >> happens if it's quite small (think the xfs or ext4 xattr rules).
> >
> >
> > Based on this description, if I'm planning on using Bluestore, there is
> no
> > particular reason to ever prefer using xattrs over omap (outside of ease
> of
> > use in the API), correct?
>
> You may prefer xattrs on bluestore if the metadata is small and you
> may need to store the xattrs on an EC pool. omap is not supported on
> ecpools.
>
> --
> Patrick Donnelly
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] omap vs. xattr in librados

2018-09-11 Thread Benjamin Cherian
On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum  wrote:

> 
> In general, if the key-value storage is of unpredictable or non-trivial
> size, you should use omap.
>
> At the bottom layer where the data is actually stored, they're likely to
> be in the same places (if using BlueStore, they are the same — in
> FileStore, a rados xattr *might* be in the local FS xattrs, or it might
> not). It is somewhat more likely that something stored in an xattr will get
> pulled into memory at the same time as the object's internal metadata, but
> that only happens if it's quite small (think the xfs or ext4 xattr rules).
>

Based on this description, if I'm planning on using Bluestore, there is no
particular reason to ever prefer using xattrs over omap (outside of ease of
use in the API), correct?

Thanks,
Ben

On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum  wrote:

> On Tue, Sep 11, 2018 at 7:48 AM Benjamin Cherian <
> benjamin.cher...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm interested in writing a relatively simple application that would use
>> librados for storage. Are there recommendations for when to use the omap as
>> opposed to an xattr? In theory, you could use either a set of xattrs or an
>> omap as a kv store associated with a specific object. Are there
>> recommendations for what kind of data xattrs and omaps are intended to
>> store?
>>
>
> In general, if the key-value storage is of unpredictable or non-trivial
> size, you should use omap.
>
> At the bottom layer where the data is actually stored, they're likely to
> be in the same places (if using BlueStore, they are the same — in
> FileStore, a rados xattr *might* be in the local FS xattrs, or it might
> not). It is somewhat more likely that something stored in an xattr will get
> pulled into memory at the same time as the object's internal metadata, but
> that only happens if it's quite small (think the xfs or ext4 xattr rules).
>
> If you have 250KB of key-value data, omap is definitely the place for it.
> -Greg
>
>
>> Just for background, I have some metadata i'd like to associate with each
>> object (total size of all kv pairs in object metadata is ~250k, some values
>> a few bytes, while others are 10-20k.) The object will store actual data (a
>> relatively large FP array) as a binary blob (~3-5 MB).
>>
>>
>> Thanks,
>> Ben
>> --
>> Regards,
>>
>> Benjamin Cherian
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] omap vs. xattr in librados

2018-09-11 Thread Benjamin Cherian
Hi,

I'm interested in writing a relatively simple application that would use
librados for storage. Are there recommendations for when to use the omap as
opposed to an xattr? In theory, you could use either a set of xattrs or an
omap as a kv store associated with a specific object. Are there
recommendations for what kind of data xattrs and omaps are intended to
store?

Just for background, I have some metadata i'd like to associate with each
object (total size of all kv pairs in object metadata is ~250k, some values
a few bytes, while others are 10-20k.) The object will store actual data (a
relatively large FP array) as a binary blob (~3-5 MB).


Thanks,
Ben
-- 
Regards,

Benjamin Cherian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] omap vs. xattr in librados

2018-09-10 Thread Benjamin Cherian
Hi,

I'm interested in writing a relatively simple application that would use
librados for storage. Are there recommendations for when to use the omap as
opposed to an xattr? In theory, you could use either a set of xattrs or an
omap as a kv store associated with a specific object. Are there
recommendations for what kind of data xattrs and omaps are intended to
store?

Just for background, I have some metadata i'd like to associate with each
object (total size of all kv pairs in object metadata is ~250k, some values
a few bytes, while others are 10-20k.) The object will store actual data (a
relatively large FP array) as a binary blob (~3-5 MB).

Thanks,
Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OMAP warning ( again )

2018-09-01 Thread Matt Benjamin
",
> "bucket_id": "default.7320.3",
> "tenant": "",
> "explicit_placement": {
> "data_pool": ".rgw.buckets",
> "data_extra_pool": ".rgw.buckets.extra",
> "index_pool": ".rgw.buckets.index"
> }
> },
> "creation_time": "2016-03-09 17:23:50.00Z",
> "owner": "zz",
> "flags": 0,
> "zonegroup": "default",
> "placement_rule": "default-placement",
> "has_instance_obj": "true",
> "quota": {
> "enabled": false,
> "check_on_raw": false,
> "max_size": -1024,
> "max_size_kb": 0,
> "max_objects": -1
> },
> "num_shards": 0,
> "bi_shard_hash_type": 0,
> "requester_pays": "false",
> "has_website": "false",
> "swift_versioning": "false",
> "swift_ver_location": "",
> "index_type": 0,
> "mdsearch_config": [],
> "reshard_status": 0,
> "new_bucket_instance_id": ""
>
> When I run that shard setting to change the number of shards:
> "radosgw-admin reshard add --bucket=BKTEST --num-shards=2"
>
> Then run to get the status:
> "radosgw-admin reshard list"
>
> [
> {
> "time": "2018-08-01 21:58:13.306381Z",
> "tenant": "",
> "bucket_name": "BKTEST",
> "bucket_id": "default.7320.3",
> "new_instance_id": "",
> "old_num_shards": 1,
> "new_num_shards": 2
> }
> ]
>
> If it was 0, why does it say old_num_shards was 1?
>
> -Brent
>
> -Original Message-
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Tuesday, July 31, 2018 9:07 PM
> To: Brent Kennedy 
> Cc: ceph-users 
> Subject: Re: [ceph-users] OMAP warning ( again )
>
> Search the cluster log for 'Large omap object found' for more details.
>
> On Wed, Aug 1, 2018 at 3:50 AM, Brent Kennedy  wrote:
>> Upgraded from 12.2.5 to 12.2.6, got a “1 large omap objects” warning
>> message, then upgraded to 12.2.7 and the message went away.  I just
>> added four OSDs to balance out the cluster ( we had some servers with
>> fewer drives in them; jbod config ) and now the “1 large omap objects”
>> warning message is back.  I did some googlefoo to try to figure out
>> what it means and then how to correct it, but the how to correct it is a bit 
>> vague.
>>
>>
>>
>> We use rados gateways for all storage, so everything is in the
>> .rgw.buckets pool, which I gather from research is why we are getting
>> the warning message ( there are millions of objects in there ).
>>
>>
>>
>> Is there an if/then process to clearing this error message?
>>
>>
>>
>> Regards,
>>
>> -Brent
>>
>>
>>
>> Existing Clusters:
>>
>> Test: Luminous 12.2.7 with 3 osd servers, 1 mon/man, 1 gateway ( all
>> virtual
>> )
>>
>> US Production: Firefly with 4 osd servers, 3 mons, 3 gateways behind
>> haproxy LB
>>
>> UK Production: Luminous 12.2.7 with 8 osd servers, 3 mons/man, 3
>> gateways behind haproxy LB
>>
>>
>>
>>
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Cheers,
> Brad
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> 
>
> NOTICE AND DISCLAIMER
> This e-mail (including any attachments) is intended for the above-named 
> person(s). If you are not the intended recipient, notify the sender 
> immediately, delete this email from your system and do not disclose or use 
> for any purpose. We may monitor all incoming and outgoing emails in line with 
> current legislation. We have taken steps to ensure that this email and 
> attachments are free from any virus, but it remains your responsibility to 
> ensure that viruses do not adversely affect you
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Librados Keyring Issues

2018-08-20 Thread Benjamin Cherian
Ok... after a bit more searching I realized you can specify the username
directly in the constructor of the "Rados" object. I'm still not entirely
clear how one would do it through the config file, but this works for me as
well.

import rados
cluster = rados.Rados(conffile="python_ceph.conf", rados_id="dms")
cluster.connect() # No exception when using keyring containing key for dms
user!
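
As far as I can tell the config file itself doesn't select the client
name - that still has to come from rados_id or name in the constructor -
but you can at least keep the keyring path in a per-client section
instead of [global], e.g.:

[client.dms]
keyring = /etc/ceph/ceph.client.dms.keyring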

Regards,

Benjamin Cherian

On Sun, Aug 19, 2018 at 9:55 PM, Benjamin Cherian <
benjamin.cher...@gmail.com> wrote:

> Hi David,
>
> Thanks for the reply...I had thought there might be something simple like
> this, do you know what key I should use in the config file to specify the
> user? I didn't see anything related to user specification in the
> documentation.
>
> Thanks,
> Ben
>
> On Sun, Aug 19, 2018 at 8:02 PM, David Turner 
> wrote:
>
>> You are not specifying which user you are using. Your config file
>> specifies the keyring, but it's still trying to use the default user admin.
>> If you specify that in your python you'll be good to go.
>>
>> On Sun, Aug 19, 2018, 9:17 PM Benjamin Cherian <
>> benjamin.cher...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to write a simple test application using the Python 3 RADOS
>>> API. I've made a separate keyring for my application with the same
>>> permissions as the admin keyring, but I can't seem to use it to connect to
>>> my cluster. The only keyring that seems to work is the client.admin
>>> keyring. Does anyone have any experience with this issue?
>>>
>>> Cluster info:
>>> OS: Ubuntu 18.04.1
>>> Ceph: Mimic 13.2.1 (from Ceph repository)
>>>
>>> Attempting to connect to the cluster using python3-rados or python-rados
>>> results in the following error:
>>>
>>> >>> import rados
>>> >>> cluster = rados.Rados(conffile="python_ceph.conf")
>>> >>> cluster.connect()
>>> Traceback (most recent call last):
>>>   File "", line 1, in 
>>>   File "rados.pyx", line 895, in rados.Rados.connect
>>>   File "rados.pyx", line 474, in rados.make_ex
>>> TypeError: InvalidArgumentError does not take keyword arguments
>>>
>>> Contents of python_ceph.conf:
>>>
>>> [global]
>>> cluster network = 0.0.0.0/0
>>> fsid = 518403a0-6b6f-42b8-be99-e58788bee5c2
>>> mon host = 
>>> mon initial members = 
>>> mon_allow_pool_delete = True
>>> osd crush chooseleaf type = 0
>>> public network = 0.0.0.0/0
>>>
>>> keyring = /etc/ceph/ceph.client.dms.keyring # Everything works ok if i
>>> use client.adming.keyring
>>>
>>>
>>>
>>> Output of ceph auth ls
>>>
>>> ...
>>> client.admin
>>> key: 
>>> caps: [mds] allow *
>>> caps: [mgr] allow *
>>> caps: [mon] allow *
>>> caps: [osd] allow *
>>> ...
>>> client.dms
>>> key: 
>>> caps: [mgr] allow *
>>> caps: [mon] allow *
>>> caps: [osd] allow *
>>>
>>> ...
>>>
>>>
>>> What's even more odd is that I can use client.dms keyring with the ceph
>>> command line program without issues...
>>> e.g., "ceph --user dms status" does not result in any errors has the
>>> same output as "ceph --user admin status"
>>>
>>>
>>> Does anyone have any thoughts on what could be causing this issue?
>>>
>>>
>>> Thanks,
>>> Ben
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Librados Keyring Issues

2018-08-19 Thread Benjamin Cherian
Hi David,

Thanks for the reply... I had thought there might be something simple like
this. Do you know what key I should use in the config file to specify the
user? I didn't see anything related to user specification in the
documentation.

Thanks,
Ben

On Sun, Aug 19, 2018 at 8:02 PM, David Turner  wrote:

> You are not specifying which user you are using. Your config file
> specifies the keyring, but it's still trying to use the default user admin.
> If you specify that in your python you'll be good to go.
>
> On Sun, Aug 19, 2018, 9:17 PM Benjamin Cherian 
> wrote:
>
>> Hi,
>>
>> I'm trying to write a simple test application using the Python 3 RADOS
>> API. I've made a separate keyring for my application with the same
>> permissions as the admin keyring, but I can't seem to use it to connect to
>> my cluster. The only keyring that seems to work is the client.admin
>> keyring. Does anyone have any experience with this issue?
>>
>> Cluster info:
>> OS: Ubuntu 18.04.1
>> Ceph: Mimic 13.2.1 (from Ceph repository)
>>
>> Attempting to connect to the cluster using python3-rados or python-rados
>> results in the following error:
>>
>> >>> import rados
>> >>> cluster = rados.Rados(conffile="python_ceph.conf")
>> >>> cluster.connect()
>> Traceback (most recent call last):
>>   File "", line 1, in 
>>   File "rados.pyx", line 895, in rados.Rados.connect
>>   File "rados.pyx", line 474, in rados.make_ex
>> TypeError: InvalidArgumentError does not take keyword arguments
>>
>> Contents of python_ceph.conf:
>>
>> [global]
>> cluster network = 0.0.0.0/0
>> fsid = 518403a0-6b6f-42b8-be99-e58788bee5c2
>> mon host = 
>> mon initial members = 
>> mon_allow_pool_delete = True
>> osd crush chooseleaf type = 0
>> public network = 0.0.0.0/0
>>
>> keyring = /etc/ceph/ceph.client.dms.keyring # Everything works ok if i
>> use client.adming.keyring
>>
>>
>>
>> Output of ceph auth ls
>>
>> ...
>> client.admin
>> key: 
>> caps: [mds] allow *
>> caps: [mgr] allow *
>> caps: [mon] allow *
>> caps: [osd] allow *
>> ...
>> client.dms
>> key: 
>> caps: [mgr] allow *
>> caps: [mon] allow *
>> caps: [osd] allow *
>>
>> ...
>>
>>
>> What's even more odd is that I can use client.dms keyring with the ceph
>> command line program without issues...
>> e.g., "ceph --user dms status" does not result in any errors has the
>> same output as "ceph --user admin status"
>>
>>
>> Does anyone have any thoughts on what could be causing this issue?
>>
>>
>> Thanks,
>> Ben
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Librados Keyring Issues

2018-08-19 Thread Benjamin Cherian
Hi,

I'm trying to write a simple test application using the Python 3 RADOS API.
I've made a separate keyring for my application with the same permissions
as the admin keyring, but I can't seem to use it to connect to my cluster.
The only keyring that seems to work is the client.admin keyring. Does
anyone have any experience with this issue?

Cluster info:
OS: Ubuntu 18.04.1
Ceph: Mimic 13.2.1 (from Ceph repository)

Attempting to connect to the cluster using python3-rados or python-rados
results in the following error:

>>> import rados
>>> cluster = rados.Rados(conffile="python_ceph.conf")
>>> cluster.connect()
Traceback (most recent call last):
  File "", line 1, in 
  File "rados.pyx", line 895, in rados.Rados.connect
  File "rados.pyx", line 474, in rados.make_ex
TypeError: InvalidArgumentError does not take keyword arguments

Contents of python_ceph.conf:

[global]
cluster network = 0.0.0.0/0
fsid = 518403a0-6b6f-42b8-be99-e58788bee5c2
mon host = 
mon initial members = 
mon_allow_pool_delete = True
osd crush chooseleaf type = 0
public network = 0.0.0.0/0

keyring = /etc/ceph/ceph.client.dms.keyring # Everything works ok if i use
client.adming.keyring



Output of ceph auth ls

...
client.admin
key: 
caps: [mds] allow *
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *
...
client.dms
key: 
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *

...


What's even more odd is that I can use client.dms keyring with the ceph
command line program without issues...
e.g., "ceph --user dms status" does not result in any errors has the same
output as "ceph --user admin status"


Does anyone have any thoughts on what could be causing this issue?


Thanks,
Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to set the DB and WAL partition size in Ceph-Ansible?

2018-08-19 Thread Benjamin Cherian
Hi Cody,

AFAIK, Ceph-ansible will not create separate partitions for the
non-collocated scenario (at least in the stable branches). Given that
ceph-volume is now the recommended way of creating OSDs, you would want to
create all the logical volumes and volume groups you intend to use for
data, DB and WAL prior to running ceph-ansible. I believe there may be some
new stuff just added in master related to LVM management. However, it
appears to be targeted at clusters which still use Filestore journals
instead of Bluestore.
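
For example, something along these lines before the playbook run
(only a sketch - device names, sizes and the exact lvm_volumes layout
depend on your ceph-ansible version and hardware):

vgcreate ceph-block /dev/sdb
vgcreate ceph-db /dev/nvme0n1
lvcreate -l 100%FREE -n block-osd0 ceph-block
lvcreate -L 30G -n db-osd0 ceph-db
lvcreate -L 2G -n wal-osd0 ceph-db

and then reference them from group_vars, roughly:

lvm_volumes:
  - data: block-osd0
    data_vg: ceph-block
    db: db-osd0
    db_vg: ceph-db
    wal: wal-osd0
    wal_vg: ceph-db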
Regards,

Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Benjamin Naber
Hi Wido,

After adding the hosts back to the monmap, the following error shows up in the
ceph-mon log:

e5 ms_verify_authorizer bad authorizer from mon 10.111.73.3:6789/0

I tried to copy the mon keyring to all the other nodes, but the problem still exists.

kind regards

Ben 
> Benjamin Naber  hat am 26. Juli 2018 um 12:29 
> geschrieben: 
> 
> hi Wido, 
> 
> I now have one monitor online. I have removed the two others from the monmap.
> How do I proceed to reset those mon hosts and add them back as new monitors to
> the monmap?
> 
> Kind regards
> 
> Ben 
> 
> > Wido den Hollander  hat am 26. Juli 2018 um 11:52 
> > geschrieben: 
> > 
> > 
> > 
> > 
> > On 07/26/2018 11:50 AM, Benjamin Naber wrote: 
> > > hi Wido, 
> > > 
> > > got the folowing outputt since ive changed the debug setting: 
> > > 
> > 
> > This is only debug_ms it seems? 
> > 
> > debug_mon = 10 
> > debug_ms = 10 
> > 
> > Those two shoud be set where debug_mon will tell more about the election 
> > process. 
> > 
> > Wido 
> > 
> > > 2018-07-26 11:46:21.004490 7f819e968700 10 -- 10.111.73.1:6789/0 >> 
> > > 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71 
> > > cs=1 l=1)._try_send sent bytes 9 remaining bytes 0 
> > > 2018-07-26 11:46:21.004520 7f81a196e700 10 -- 10.111.73.1:6789/0 
> > > dispatch_throttle_release 60 to dispatch throttler 60/104857600 
> > > 2018-07-26 11:46:23.058057 7f81a4173700 1 -- 10.111.73.1:6789/0 >> 
> > > 10.111.73.2:0/3994280291 conn(0x55aa46c46000 :6789 s=STATE_OPEN pgs=77 
> > > cs=1 l=1).mark_down 
> > > 2018-07-26 11:46:23.058084 7f81a4173700 2 -- 10.111.73.1:6789/0 >> 
> > > 10.111.73.2:0/3994280291 conn(0x55aa46c46000 :6789 s=STATE_OPEN pgs=77 
> > > cs=1 l=1)._stop 
> > > 2018-07-26 11:46:23.058094 7f81a4173700 10 -- 10.111.73.1:6789/0 >> 
> > > 10.111.73.2:0/3994280291 conn(0x55aa46c46000 :6789 s=STATE_OPEN pgs=77 
> > > cs=1 l=1).discard_out_queue started 
> > > 2018-07-26 11:46:23.058120 7f81a4173700 1 -- 10.111.73.1:6789/0 >> 
> > > 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71 
> > > cs=1 l=1).mark_down 
> > > 2018-07-26 11:46:23.058131 7f81a4173700 2 -- 10.111.73.1:6789/0 >> 
> > > 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71 
> > > cs=1 l=1)._stop 
> > > 2018-07-26 11:46:23.058143 7f81a4173700 10 -- 10.111.73.1:6789/0 >> 
> > > 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71 
> > > cs=1 l=1).discard_out_queue started 
> > > 2018-07-26 11:46:23.962796 7f819d966700 10 Processor -- accept 
> > > listen_fd=22 
> > > 2018-07-26 11:46:23.962845 7f819d966700 10 Processor -- accept accepted 
> > > incoming on sd 23 
> > > 2018-07-26 11:46:23.962858 7f819d966700 10 -- 10.111.73.1:6789/0 >> - 
> > > conn(0x55aa46afd800 :-1 s=STATE_NONE pgs=0 cs=0 l=0).accept sd=23 
> > > 2018-07-26 11:46:23.962929 7f819e167700 1 -- 10.111.73.1:6789/0 >> - 
> > > conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING pgs=0 cs=0 
> > > l=0)._process_connection sd=23 - 
> > > 2018-07-26 11:46:23.963022 7f819e167700 10 -- 10.111.73.1:6789/0 >> - 
> > > conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._try_send 
> > > sent bytes 281 remaining bytes 0 
> > > 2018-07-26 11:46:23.963045 7f819e167700 10 -- 10.111.73.1:6789/0 >> - 
> > > conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0 
> > > l=0)._process_connection write banner and addr done: - 
> > > 2018-07-26 11:46:23.963091 7f819e167700 10 -- 10.111.73.1:6789/0 >> - 
> > > conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0 
> > > l=0)._process_connection accept peer addr is 10.111.73.1:0/1745436331 
> > > 2018-07-26 11:46:23.963190 7f819e167700 10 -- 10.111.73.1:6789/0 >> 
> > > 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 
> > > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 
> > > l=1)._process_connection accept of host_type 8, policy.lossy=1 
> > > policy.server=1 policy.standby=0 policy.resetcheck=0 
> > > 2018-07-26 11:46:23.963216 7f819e167700 10 -- 10.111.73.1:6789/0 >> 
> > > 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 
> > > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 
> > > l=1).handle_connect_msg accept my proto 15, their proto 15 
> > > 2018-07-26 11:46:23.963232 7f819e167700 10 -- 10.111.73.1:6789/0 >>

Re: [ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Benjamin Naber
Hi Wido,

I now have one monitor online. I have removed the two others from the monmap.
How do I proceed to reset those mon hosts and add them back as new monitors
to the monmap?
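
The rough sequence I would expect, based on the standard procedure for
adding monitors (a sketch, not yet tried here):

# on the surviving mon, once it has quorum on its own
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/mon.keyring

# on each node being re-added (old mon data directory wiped first)
ceph-mon -i mon02 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
systemctl start ceph-mon@mon02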

Kind regards

Ben

> Wido den Hollander  hat am 26. Juli 2018 um 11:52 geschrieben:
>
>
>
>
> On 07/26/2018 11:50 AM, Benjamin Naber wrote:
> > hi Wido,
> >
> > got the folowing outputt since ive changed the debug setting:
> >
>
> This is only debug_ms it seems?
>
> debug_mon = 10
> debug_ms = 10
>
> Those two shoud be set where debug_mon will tell more about the election
> process.
>
> Wido
>
> > 2018-07-26 11:46:21.004490 7f819e968700 10 -- 10.111.73.1:6789/0 >>
> > 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71
> > cs=1 l=1)._try_send sent bytes 9 remaining bytes 0
> > 2018-07-26 11:46:21.004520 7f81a196e700 10 -- 10.111.73.1:6789/0
> > dispatch_throttle_release 60 to dispatch throttler 60/104857600
> > 2018-07-26 11:46:23.058057 7f81a4173700 1 -- 10.111.73.1:6789/0 >>
> > 10.111.73.2:0/3994280291 conn(0x55aa46c46000 :6789 s=STATE_OPEN pgs=77
> > cs=1 l=1).mark_down
> > 2018-07-26 11:46:23.058084 7f81a4173700 2 -- 10.111.73.1:6789/0 >>
> > 10.111.73.2:0/3994280291 conn(0x55aa46c46000 :6789 s=STATE_OPEN pgs=77
> > cs=1 l=1)._stop
> > 2018-07-26 11:46:23.058094 7f81a4173700 10 -- 10.111.73.1:6789/0 >>
> > 10.111.73.2:0/3994280291 conn(0x55aa46c46000 :6789 s=STATE_OPEN pgs=77
> > cs=1 l=1).discard_out_queue started
> > 2018-07-26 11:46:23.058120 7f81a4173700 1 -- 10.111.73.1:6789/0 >>
> > 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71
> > cs=1 l=1).mark_down
> > 2018-07-26 11:46:23.058131 7f81a4173700 2 -- 10.111.73.1:6789/0 >>
> > 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71
> > cs=1 l=1)._stop
> > 2018-07-26 11:46:23.058143 7f81a4173700 10 -- 10.111.73.1:6789/0 >>
> > 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71
> > cs=1 l=1).discard_out_queue started
> > 2018-07-26 11:46:23.962796 7f819d966700 10 Processor -- accept listen_fd=22
> > 2018-07-26 11:46:23.962845 7f819d966700 10 Processor -- accept accepted
> > incoming on sd 23
> > 2018-07-26 11:46:23.962858 7f819d966700 10 -- 10.111.73.1:6789/0 >> -
> > conn(0x55aa46afd800 :-1 s=STATE_NONE pgs=0 cs=0 l=0).accept sd=23
> > 2018-07-26 11:46:23.962929 7f819e167700 1 -- 10.111.73.1:6789/0 >> -
> > conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING pgs=0 cs=0
> > l=0)._process_connection sd=23 -
> > 2018-07-26 11:46:23.963022 7f819e167700 10 -- 10.111.73.1:6789/0 >> -
> > conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._try_send
> > sent bytes 281 remaining bytes 0
> > 2018-07-26 11:46:23.963045 7f819e167700 10 -- 10.111.73.1:6789/0 >> -
> > conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0
> > l=0)._process_connection write banner and addr done: -
> > 2018-07-26 11:46:23.963091 7f819e167700 10 -- 10.111.73.1:6789/0 >> -
> > conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0
> > l=0)._process_connection accept peer addr is 10.111.73.1:0/1745436331
> > 2018-07-26 11:46:23.963190 7f819e167700 10 -- 10.111.73.1:6789/0 >>
> > 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
> > l=1)._process_connection accept of host_type 8, policy.lossy=1
> > policy.server=1 policy.standby=0 policy.resetcheck=0
> > 2018-07-26 11:46:23.963216 7f819e167700 10 -- 10.111.73.1:6789/0 >>
> > 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
> > l=1).handle_connect_msg accept my proto 15, their proto 15
> > 2018-07-26 11:46:23.963232 7f819e167700 10 -- 10.111.73.1:6789/0 >>
> > 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
> > l=1).handle_connect_msg accept setting up session_security.
> > 2018-07-26 11:46:23.963248 7f819e167700 10 -- 10.111.73.1:6789/0 >>
> > 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
> > l=1).handle_connect_msg accept new session
> > 2018-07-26 11:46:23.963256 7f819e167700 10 -- 10.111.73.1:6789/0 >>
> > 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=87 cs=1
> > l=1).handle_connect_msg accept success, connect_seq = 1 in_seq=0,
> > sending READY
> > 2018-07-26 11:46:23.963264 7f819e167700 10 -- 10.111.

Re: [ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Benjamin Naber
15403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN_KEEPALIVE2 
pgs=74 cs=1 l=1)._append_keepalive_or_ack
2018-07-26 11:46:24.004828 7f819e167700 10 -- 10.111.73.1:6789/0 >> 
10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 
s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=74 cs=1 l=1).process wants 60 bytes 
from policy throttler 180/104857600
2018-07-26 11:46:24.004847 7f819e167700 10 -- 10.111.73.1:6789/0 >> 
10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74 cs=1 l=1).process aborted 
= 0
2018-07-26 11:46:24.004873 7f819e167700 5 -- 10.111.73.1:6789/0 >> 
10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74 cs=1 l=1). rx client.? seq 
1 0x55aa46be4fc0 auth(proto 0 30 bytes epoch 0) v1
2018-07-26 11:46:24.004914 7f819e167700 10 -- 10.111.73.1:6789/0 >> 
10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN pgs=74 cs=1 
l=1).handle_write
2018-07-26 11:46:24.004921 7f81a196e700 1 -- 10.111.73.1:6789/0 <== client.? 
10.111.73.3:0/1033315403 1  auth(proto 0 30 bytes epoch 0) v1  60+0+0 
(2547518125 0 0) 0x55aa46be4fc0 con 0x55aa46bc1000
2018-07-26 11:46:24.004954 7f819e167700 10 -- 10.111.73.1:6789/0 >> 
10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN pgs=74 cs=1 
l=1)._try_send sent bytes 9 remaining bytes 0
2018-07-26 11:46:24.004965 7f81a196e700 10 -- 10.111.73.1:6789/0 
dispatch_throttle_release 60 to dispatch throttler 60/104857600
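For reference, monitor and messenger debug output of this sort can be raised
at runtime without restarting the daemons. A rough sketch, assuming default
admin socket paths and that the commands are run on the monitor host itself:

# raise mon and messenger debug on mon01 via its local admin socket
# (works even while the cluster has no quorum)
ceph daemon mon.mon01 config set debug_mon 20/20
ceph daemon mon.mon01 config set debug_ms 10/10

# "ceph tell ... injectargs" can push the same change, but it needs a
# reachable, quorate cluster, so prefer the admin socket in this situation
ceph tell mon.* injectargs '--debug_mon 20/20 --debug_ms 10/10'

# revert to the defaults once enough log has been captured
ceph daemon mon.mon01 config set debug_mon 1/5
ceph daemon mon.mon01 config set debug_ms 0/5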

kind regards

Ben

> Wido den Hollander  hat am 26. Juli 2018 um 11:07 geschrieben:
>
>
>
>
> On 07/26/2018 10:33 AM, Benjamin Naber wrote:
> > hi Wido,
> >
> > thx for your reply.
> > time is also in sync. i forced time sync again to be sure.
> >
>
> Try setting debug_mon to 10 or even 20 and check the logs about what the
> MONs are saying.
>
> debug_ms = 10 might also help to get some more information about the
> Messenger Traffic.
>
> Wido
>
> > kind regards
> >
> > Ben
> >
> >> Wido den Hollander  hat am 26. Juli 2018 um 10:18
> > geschrieben:
> >>
> >>
> >>
> >>
> >> On 07/26/2018 10:12 AM, Benjamin Naber wrote:
> >> > Hi together,
> >> >
> >> > we currently have some problems with monitor quorum after shutting
> > down all cluster nodes for migration to another location.
> >> >
> >> > mon_status gives uns the following outputt:
> >> >
> >> > {
> >> > "name": "mon01",
> >> > "rank": 0,
> >> > "state": "electing",
> >> > "election_epoch": 20345,
> >> > "quorum": [],
> >> > "features": {
> >> > "required_con": "153140804152475648",
> >> > "required_mon": [
> >> > "kraken",
> >> > "luminous"
> >> > ],
> >> > "quorum_con": "0",
> >> > "quorum_mon": []
> >> > },
> >> > "outside_quorum": [],
> >> > "extra_probe_peers": [],
> >> > "sync_provider": [],
> >> > "monmap": {
> >> > "epoch": 1,
> >> > "fsid": "c1e3c489-67a4-47a2-a3ca-98816d1c9d44",
> >> > "modified": "2018-06-21 13:48:58.796939",
> >> > "created": "2018-06-21 13:48:58.796939",
> >> > "features": {
> >> > "persistent": [
> >> > "kraken",
> >> > "luminous"
> >> > ],
> >> > "optional": []
> >> > },
> >> > "mons": [
> >> > {
> >> > "rank": 0,
> >> > "name": "mon01",
> >> > "addr": "10.111.73.1:6789/0",
> >> > "public_addr": "10.111.73.1:6789/0"
> >> > },
> >> > {
> >> > "rank": 1,
> >> > "name": "mon02",
> >> > "addr": "10.111.73.2:6789/0",
> >> > "public_addr": "10.111.73.2:6789/0"
> >> > },
> >> > {
> >> > "rank": 2,
> >> > "name": "mon03",
> >> > "addr": "10.111.73.3:6789/0",
> >> > "public_addr": "10.111.73.3:6789/0"
> >> > }
> >> > ]
> >> > },
> >> > "feature_map": {
> >> > "mon": {
> >> > "group": {
> >> > "features": "0x3ffddff8eea4fffb",
> >> > "release": "luminous",
> >> > "num": 1
> >> > }
> >> > }
> >> > }
> >> > }
> >> >
> >> > ceph ping mon.id gives us also just dosent work. monitoring nodes
> > have full network connectivity. firewall rules are also ok.
> >> >
> >> > what cloud be the reson for stucking quorum election ?
> >> >
> >>
> >> Is the time in sync between the nodes?
> >>
> >> Wido
> >>
> >> > kind regards
> >> >
> >> > Ben
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Benjamin Naber
Hi Wido,

Thanks for your reply.
The time is in sync as well; I forced a time sync again to be sure.
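For anyone checking the same thing, clock health between the monitors can be
double-checked with something along these lines (a sketch, assuming ntpd or
chrony on the mon hosts):

# per-host view of NTP peers and offsets
ntpq -p             # when ntpd is in use
chronyc sources -v  # when chrony is in use

# Ceph flags significant monitor clock skew in its health output,
# but this needs a reachable, quorate cluster to respond
ceph health detail | grep -i skew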

kind regards

Ben

> Wido den Hollander  hat am 26. Juli 2018 um 10:18 geschrieben:
>
>
>
>
> On 07/26/2018 10:12 AM, Benjamin Naber wrote:
> > Hi together,
> >
> > we currently have some problems with monitor quorum after shutting down all 
> > cluster nodes for migration to another location.
> >
> > mon_status gives uns the following outputt:
> >
> > {
> > "name": "mon01",
> > "rank": 0,
> > "state": "electing",
> > "election_epoch": 20345,
> > "quorum": [],
> > "features": {
> > "required_con": "153140804152475648",
> > "required_mon": [
> > "kraken",
> > "luminous"
> > ],
> > "quorum_con": "0",
> > "quorum_mon": []
> > },
> > "outside_quorum": [],
> > "extra_probe_peers": [],
> > "sync_provider": [],
> > "monmap": {
> > "epoch": 1,
> > "fsid": "c1e3c489-67a4-47a2-a3ca-98816d1c9d44",
> > "modified": "2018-06-21 13:48:58.796939",
> > "created": "2018-06-21 13:48:58.796939",
> > "features": {
> > "persistent": [
> > "kraken",
> > "luminous"
> > ],
> > "optional": []
> > },
> > "mons": [
> > {
> > "rank": 0,
> > "name": "mon01",
> > "addr": "10.111.73.1:6789/0",
> > "public_addr": "10.111.73.1:6789/0"
> > },
> > {
> > "rank": 1,
> > "name": "mon02",
> > "addr": "10.111.73.2:6789/0",
> > "public_addr": "10.111.73.2:6789/0"
> > },
> > {
> > "rank": 2,
> > "name": "mon03",
> > "addr": "10.111.73.3:6789/0",
> > "public_addr": "10.111.73.3:6789/0"
> > }
> > ]
> > },
> > "feature_map": {
> > "mon": {
> > "group": {
> > "features": "0x3ffddff8eea4fffb",
> > "release": "luminous",
> > "num": 1
> > }
> > }
> > }
> > }
> >
> > ceph ping mon.id gives us also just dosent work. monitoring nodes have full 
> > network connectivity. firewall rules are also ok.
> >
> > what cloud be the reson for stucking quorum election ?
> >
>
> Is the time in sync between the nodes?
>
> Wido
>
> > kind regards
> >
> > Ben
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Benjamin Naber
Hi all,

We currently have some problems with monitor quorum after shutting down all
cluster nodes for a migration to another location.

mon_status gives us the following output:

{
 "name": "mon01",
 "rank": 0,
 "state": "electing",
 "election_epoch": 20345,
 "quorum": [],
 "features": {
 "required_con": "153140804152475648",
 "required_mon": [
 "kraken",
 "luminous"
 ],
 "quorum_con": "0",
 "quorum_mon": []
 },
 "outside_quorum": [],
 "extra_probe_peers": [],
 "sync_provider": [],
 "monmap": {
 "epoch": 1,
 "fsid": "c1e3c489-67a4-47a2-a3ca-98816d1c9d44",
 "modified": "2018-06-21 13:48:58.796939",
 "created": "2018-06-21 13:48:58.796939",
 "features": {
 "persistent": [
 "kraken",
 "luminous"
 ],
 "optional": []
 },
 "mons": [
 {
 "rank": 0,
 "name": "mon01",
 "addr": "10.111.73.1:6789/0",
 "public_addr": "10.111.73.1:6789/0"
 },
 {
 "rank": 1,
 "name": "mon02",
 "addr": "10.111.73.2:6789/0",
 "public_addr": "10.111.73.2:6789/0"
 },
 {
 "rank": 2,
 "name": "mon03",
 "addr": "10.111.73.3:6789/0",
 "public_addr": "10.111.73.3:6789/0"
 }
 ]
 },
 "feature_map": {
 "mon": {
 "group": {
 "features": "0x3ffddff8eea4fffb",
 "release": "luminous",
 "num": 1
 }
 }
 }
}
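(For reference, output like the above can be pulled from each monitor locally,
even while there is no quorum, via the admin socket. A sketch, assuming the
default socket location:)

# ask a monitor for its own view directly over the admin socket
ceph daemon mon.mon01 mon_status

# long form, if the daemon name is not picked up automatically
ceph --admin-daemon /var/run/ceph/ceph-mon.mon01.asok mon_status

# quorum_status gives the complementary view once the mons start agreeing
ceph daemon mon.mon01 quorum_status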

ceph ping mon.<id> also just doesn't work. The monitor nodes have full
network connectivity, and the firewall rules are fine.

What could be the reason for the stuck quorum election?

kind regards

Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow requests

2018-07-04 Thread Benjamin Naber
Hi Caspar,

Thanks for the reply. I've updated all SSDs to the latest firmware and am
still seeing the same error. The strange thing is that the issue moves from
node to node and from OSD to OSD.

HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
    1 ops are blocked > 1048.58 sec
    3 ops are blocked > 262.144 sec
    osds 20,21 have blocked requests > 262.144 sec
    osd.12 has blocked requests > 1048.58 sec

There is also no symptom that gives me any idea what the problem could be;
syslog etc. shows nothing either.
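One thing that can narrow it down is to look at the flagged OSDs directly; a
rough sketch (default admin socket paths assumed, run on the host that
carries the OSD):

# requests currently stuck on osd.12, including the stage they are waiting in
ceph daemon osd.12 dump_ops_in_flight

# recently completed slow operations, with per-event timing
ceph daemon osd.12 dump_historic_ops

# cluster-wide summary of which OSDs are reporting blocked requests
ceph health detail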

Any other ideas?

Kind regards and thanks for your help

Ben

> Caspar Smit  hat am 4. Juli 2018 um 11:32 
> geschrieben: 
> 
> Hi Ben,
> 
> At first glance i would say the CPU's are a bit weak for this setup. 
> Recommended is to have at least 1 core per OSD. Since you have 8 cores and 10 
> OSD's there isn't much left for other processes.
> 
> Furthermore, did you upgrade the firmware of those DC S4500's to the latest 
> firmware? ( SCV10142)
> If not, you can upgrade them via the Intel Data Center upgrade tool: 
> https://downloadcenter.intel.com/download/27863?v=t
> 
> Kind regards,
> 
> Caspar 
> 
> 2018-07-04 10:26 GMT+02:00 Benjamin Naber : 
> 
> > Hi @all, 
> > 
> >  im currently in testing for setup an production environment based on the 
> > following OSD Nodes: 
> > 
> >  CEPH Version: luminous 12.2.5 
> > 
> >  5x OSD Nodes with following specs: 
> > 
> >  - 8 Core Intel Xeon 2,0 GHZ 
> > 
> >  - 96GB Ram 
> > 
> >  - 10x 1,92 TB Intel DC S4500 connectet via SATA 
> > 
> >  - 4x 10 Gbit NIC 2 bonded via LACP for Backend Network 2 bonded via LACP 
> > for Backend Network. 
> > 
> >  if i run some fio benchmark via a VM that ist running on a RBD Device on a 
> > KVM testing Host. the cluster always runs into slow request warning. Also 
> > the performance goes heavy down. 
> > 
> >  If i dump the osd that stucks, i get the following output: 
> > 
> >  { 
> >      "ops": [ 
> >      { 
> >      "description": "osd_op(client.141944.0: 359346834 13.1da 
> > 13:5b8b7fd3:::rbd_data. 170a3238e1f29. 00be:head [write 
> > 2097152~1048576] snapc 0=[] ondisk+write+known_if_ redirected e2755)", 
> >      "initiated_at": "2018-07-04 10:00:49.475879", 
> >      "age": 287.180328, 
> >      "duration": 287.180355, 
> >      "type_data": { 
> >      "flag_point": "waiting for sub ops", 
> >      "client_info": { 
> >      "client": "client.141944", 
> >      "client_addr": " 
> > [10.111.90.1:0/3532639465](http://10.111.90.1:0/3532639465)", 
> >      "tid": 359346834 
> >      }, 
> >      "events": [ 
> >      { 
> >      "time": "2018-07-04 10:00:49.475879", 
> >      "event": "initiated" 
> >      }, 
> >      { 
> >      "time": "2018-07-04 10:00:49.476935", 
> >      "event": "queued_for_pg" 
> >      }, 
> >      { 
> >      "time": "2018-07-04 10:00:49.477547", 
> >      "event": "reached_pg" 
> >      }, 
> >      { 
> >      "time": "2018-07-04 10:00:49.477578", 
> >      "event": "started" 
> >      }, 
> >      { 
> >      "time": "2018-07-04 10:00:49.477614", 
> >      "event": "waiting for subops from 5,26" 
> >      }, 
> >      { 
> >      "time": "2018-07-04 10:00:49.484679", 
> >      "event": "op_commit" 
> >      }, 
> >      { 
> >      "time": "2018-07-04 10:00:49.484681", 
> >      "event": "op_applied" 
> >      }, 
> >    

[ceph-users] Slow requests

2018-07-04 Thread Benjamin Naber
Hi all,

I'm currently testing a setup for a production environment based on the
following OSD nodes:

Ceph version: Luminous 12.2.5

5x OSD nodes with the following specs:

- 8-core Intel Xeon, 2.0 GHz

- 96 GB RAM

- 10x 1.92 TB Intel DC S4500, connected via SATA

- 4x 10 Gbit NIC, bonded in two LACP pairs (one pair dedicated to the backend
network)

If I run a fio benchmark from a VM running on an RBD device on a KVM test
host, the cluster always runs into slow request warnings and performance
drops heavily.
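For context, a fio run of this kind might look like the following
(hypothetical parameters, not the exact job used):

# sequential 4M writes from inside the VM against its RBD-backed disk
fio --name=rbd-seq-write --filename=/mnt/test/fio.dat --size=10G \
    --rw=write --bs=4M --ioengine=libaio --direct=1 --iodepth=32 \
    --numjobs=1 --runtime=300 --time_based --group_reporting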

If I dump the stuck OSD's ops, I get the following output:

{
    "ops": [
    {
    "description": "osd_op(client.141944.0:359346834 13.1da 
13:5b8b7fd3:::rbd_data.170a3238e1f29.00be:head [write 
2097152~1048576] snapc 0=[] ondisk+write+known_if_redirected e2755)",
    "initiated_at": "2018-07-04 10:00:49.475879",
    "age": 287.180328,
    "duration": 287.180355,
    "type_data": {
    "flag_point": "waiting for sub ops",
    "client_info": {
    "client": "client.141944",
    "client_addr": "10.111.90.1:0/3532639465",
    "tid": 359346834
    },
    "events": [
    {
    "time": "2018-07-04 10:00:49.475879",
    "event": "initiated"
    },
    {
    "time": "2018-07-04 10:00:49.476935",
    "event": "queued_for_pg"
    },
    {
    "time": "2018-07-04 10:00:49.477547",
    "event": "reached_pg"
    },
    {
    "time": "2018-07-04 10:00:49.477578",
    "event": "started"
    },
    {
    "time": "2018-07-04 10:00:49.477614",
    "event": "waiting for subops from 5,26"
    },
    {
    "time": "2018-07-04 10:00:49.484679",
    "event": "op_commit"
    },
    {
    "time": "2018-07-04 10:00:49.484681",
    "event": "op_applied"
    },
    {
    "time": "2018-07-04 10:00:49.485588",
    "event": "sub_op_commit_rec from 5"
    }
    ]
    }
    },
    {
    "description": "osd_op(client.141944.0:359346835 13.1da 
13:5b8b7fd3:::rbd_data.170a3238e1f29.00be:head [write 
3145728~1048576] snapc 0=[] ondisk+write+known_if_redirected e2755)",
    "initiated_at": "2018-07-04 10:00:49.477065",
    "age": 287.179143,
    "duration": 287.179221,
    "type_data": {
    "flag_point": "waiting for sub ops",
    "client_info": {
    "client": "client.141944",
    "client_addr": "10.111.90.1:0/3532639465",
    "tid": 359346835
    },
    "events": [
    {
    "time": "2018-07-04 10:00:49.477065",
    "event": "initiated"
    },
    {
    "time": "2018-07-04 10:00:49.478116",
    "event": "queued_for_pg"
    },
    {
    "time": "2018-07-04 10:00:49.478178",
    "event": "reached_pg"
    },
    {
    "time": "2018-07-04 10:00:49.478201",
    "event": "started"
    },
    {
    "time": "2018-07-04 10:00:49.478232",
    "event": "waiting for subops from 5,26"
    },
    {
    "time": "2018-07-04 10:00:49.484695",
    "event": "op_commit"
    },
    {
    "time": "2018-07-04 10:00:49.484696",
    "event": "op_applied"
    },
    {
    "time": "2018-07-04 10:00:49.485621",
    "event": "sub_op_commit_rec from 5"
    }
    ]
    }
    },
    {
    "description": "osd_op(client.141944.0:359346440 13.11d 
13:b8afbe4a:::rbd_data.170a3238e1f29.005c:head [write 0~1048576] 
snapc 0=[] ondisk+write+known_if_redirected e2755)",
    "initiated_at": "2018-07-04 10:00:49.091127",
    "age": 287.565080,
    "duration": 287.565196,
    "type_data": {
    "flag_point": "waiting for sub ops",
    "client_info": {
    "client": 

Re: [ceph-users] RGW bucket sharding in Jewel

2018-06-19 Thread Matt Benjamin
The increased time to list sharded buckets is currently expected, yes.
In turn other operations such as put and delete should be faster in
proportion to two factors, the number of shards on independent PGs
(serialization by PG), and the spread of shards onto independent OSD
devices (speedup from scaling onto more OSD devices, presuming
available iops on those devices).

New bucket index formats are coming in the future to help with listing
workloads.  Also, as of recent master (and probably Jewel and Luminous
at this point, modulo some latency for the backports) we have added an
"allow-unordered" option to S3 and Swift listing arguments that should
remove the penalty from sharding.  This causes results to be returned
in partial order, rather than the total order most applications
expect.
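For anyone experimenting along these lines, per-bucket shard counts can be
inspected and changed with radosgw-admin. A sketch, assuming Luminous-era
tooling and a placeholder bucket name:

# current shard count and index stats for a bucket
radosgw-admin bucket stats --bucket=mybucket
radosgw-admin bucket limit check

# explicitly reshard an existing bucket's index to 10 shards
radosgw-admin bucket reshard --bucket=mybucket --num-shards=10

# watch pending / in-progress reshard operations
radosgw-admin reshard list
radosgw-admin reshard status --bucket=mybucket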

Matt

On Tue, Jun 19, 2018 at 9:34 AM, Matthew Vernon  wrote:
> Hi,
>
> Some of our users have Quite Large buckets (up to 20M objects in a
> bucket), and AIUI best practice would be to have sharded indexes for
> those buckets (of the order of 1 shard per 100k objects).
>
> On a trivial test case (make a 1M-object bucket, shard index to 10
> shards, s3cmd ls s3://bucket >/dev/null), sharding makes the bucket
> listing slower (not a lot, but a bit).
>
> Are there simple(ish) workflows we could use to demonstrate an
> improvement from index sharding?
>
> Thanks,
>
> Matthew
>
> [I understand that Luminous has dynamic resharding, but it seems a bit
> unstable for production use; is that still the case?]
>
>
> --
>  The Wellcome Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NFS-ganesha with RGW

2018-05-30 Thread Matt Benjamin
Hi Josef,

The main thing to make sure is that you have set up the host/vm
running nfs-ganesha exactly as if it were going to run radosgw.  For
example, you need an appropriate keyring and ceph config.  If radosgw
starts and services requests, nfs-ganesha should too.

With the debug settings you've described, you should be able to see a
bunch of output when you run ganesha.nfsd with -F.  You should see the
FSAL starting up with lots of debug output.
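Concretely, that kind of foreground run might look like the following (a
sketch; paths and the log level are assumptions, adjust to your packaging):

# run ganesha in the foreground with verbose logging to the console
ganesha.nfsd -F -f /etc/ganesha/ganesha.conf -L /dev/stdout -N NIV_FULL_DEBUG

# sanity check first: plain radosgw should start with the same ceph.conf/keyring
radosgw -f --cluster ceph --name client.radosgw.radosgw-s2 --debug-rgw=16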

Matt

On Wed, May 30, 2018 at 8:19 AM, Josef Zelenka
 wrote:
> Hi, thanks for the quick reply. As for 1. I mentioned that i'm running
> ubuntu 16.04, kernel 4.4.0-121 - as it seems the platform
> package(nfs-ganesha-ceph) does not include the rgw fsal.
>
> 2. Nfsd was running - after rebooting i managed to get ganesha to bind,
> rpcbind is running, though i still can't mount the rgw due to timeouts. I
> suspect my conf might be wrong, but i'm not sure how to make sure it is.
> I've set up my ganesha.conf with the FSAL and RGW block - do i need anything
> else?
>
> EXPORT
> {
>  Export_ID=1;
>  Path = "/";
>  Pseudo = "/";
>  Access_Type = RW;
>  SecType = "sys";
>  NFS_Protocols = 4;
>  Transport_Protocols = TCP;
>
>  # optional, permit unsquashed access by client "root" user
>  #Squash = No_Root_Squash;
>
> FSAL {
>  Name = RGW;
>  User_Id =  key/secret>;
>  Access_Key_Id = "";
>  Secret_Access_Key = "";
>  }
>
> RGW {
> cluster = "ceph";
> name = "client.radosgw.radosgw-s2";
> ceph_conf = "/etc/ceph/ceph.conf";
> init_args = "-d --debug-rgw=16";
> }
> }
> Josef
>
>
>
>
>
> On 30/05/18 13:18, Matt Benjamin wrote:
>>
>> Hi Josef,
>>
>> 1. You do need the Ganesha fsal driver to be present;  I don't know
>> your platform and os version, so I couldn't look up what packages you
>> might need to install (or if the platform package does not build the
>> RGW fsal)
>> 2. The most common reason for ganesha.nfsd to fail to bind to a port
>> is that a Linux kernel nfsd is already running--can you make sure
>> that's not the case;  meanwhile you -do- need rpcbind to be running
>>
>> Matt
>>
>> On Wed, May 30, 2018 at 6:03 AM, Josef Zelenka
>>  wrote:
>>>
>>> Hi everyone, i'm currently trying to set up a NFS-ganesha instance that
>>> mounts a RGW storage, however i'm not succesful in this. I'm running Ceph
>>> Luminous 12.2.4 and ubuntu 16.04. I tried compiling ganesha from
>>> source(latest version), however i didn't manage to get the mount running
>>> with that, as ganesha refused to bind to the ipv6 interface - i assume
>>> this
>>> is a ganesha issue, but i didn't find any relevant info on what might
>>> cause
>>> this - my network setup should allow for that. Then i installed
>>> ganesha-2.6
>>> from the official repos, set up the config for RGW as per the official
>>> howto
>>> http://docs.ceph.com/docs/master/radosgw/nfs/, but i'm getting:
>>> Could not dlopen module:/usr/lib/x86_64-linux-gnu/ganesha/libfsalrgw.so
>>> Error:/usr/lib/x86_64-linux-gnu/ganesha/libfsalrgw.so: cannot open shared
>>> object file: No such file or directory
>>> and lo and behold, the libfsalrgw.so isn't present in the folder. I
>>> installed the nfs-ganesha and nfs-ganesha-fsal packages. I tried googling
>>> around, but i didn't find any relevant info or walkthroughs for this
>>> setup,
>>> so i'm asking - was anyone succesful in setting this up? I can see that
>>> even
>>> the redhat solution is still in progress, so i'm not sure if this even
>>> works. Thanks for any help,
>>>
>>> Josef
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NFS-ganesha with RGW

2018-05-30 Thread Matt Benjamin
Hi Josef,

1. You do need the Ganesha fsal driver to be present;  I don't know
your platform and os version, so I couldn't look up what packages you
might need to install (or if the platform package does not build the
RGW fsal)
2. The most common reason for ganesha.nfsd to fail to bind to a port
is that a Linux kernel nfsd is already running--can you make sure
that's not the case;  meanwhile you -do- need rpcbind to be running
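On a systemd-based distribution that usually boils down to something like the
following (a sketch; unit names vary between distributions):

# make sure the kernel NFS server is not holding the NFS ports
systemctl stop nfs-server       # "nfs-kernel-server" on Debian/Ubuntu
systemctl disable nfs-server

# confirm nothing is registered for the NFS program any more
rpcinfo -p | grep -w nfs

# rpcbind itself has to stay up for ganesha
systemctl enable --now rpcbind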

Matt

On Wed, May 30, 2018 at 6:03 AM, Josef Zelenka
 wrote:
> Hi everyone, i'm currently trying to set up a NFS-ganesha instance that
> mounts a RGW storage, however i'm not succesful in this. I'm running Ceph
> Luminous 12.2.4 and ubuntu 16.04. I tried compiling ganesha from
> source(latest version), however i didn't manage to get the mount running
> with that, as ganesha refused to bind to the ipv6 interface - i assume this
> is a ganesha issue, but i didn't find any relevant info on what might cause
> this - my network setup should allow for that. Then i installed ganesha-2.6
> from the official repos, set up the config for RGW as per the official howto
> http://docs.ceph.com/docs/master/radosgw/nfs/, but i'm getting:
> Could not dlopen module:/usr/lib/x86_64-linux-gnu/ganesha/libfsalrgw.so
> Error:/usr/lib/x86_64-linux-gnu/ganesha/libfsalrgw.so: cannot open shared
> object file: No such file or directory
> and lo and behold, the libfsalrgw.so isn't present in the folder. I
> installed the nfs-ganesha and nfs-ganesha-fsal packages. I tried googling
> around, but i didn't find any relevant info or walkthroughs for this setup,
> so i'm asking - was anyone succesful in setting this up? I can see that even
> the redhat solution is still in progress, so i'm not sure if this even
> works. Thanks for any help,
>
> Josef
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous radosgw S3/Keystone integration issues

2018-05-04 Thread Matt Benjamin
Hi Dan,

We agreed in upstream RGW to make this change.  Do you intend to
submit this as a PR?

regards

Matt

On Fri, May 4, 2018 at 10:57 AM, Dan van der Ster <d...@vanderster.com> wrote:
> Hi Valery,
>
> Did you eventually find a workaround for this? I *think* we'd also
> prefer rgw to fallback to external plugins, rather than checking them
> before local. But I never understood the reasoning behind the change
> from jewel to luminous.
>
> I saw that there is work towards a cache for ldap [1] and I assume a
> similar approach would be useful for keystone as well.
>
> In the meantime, would a patch like [2] work?
>
> Cheers, Dan
>
> [1] https://github.com/ceph/ceph/pull/20624
>
> [2] diff --git a/src/rgw/rgw_auth_s3.h b/src/rgw/rgw_auth_s3.h
> index 6bcdebaf1c..3c343adf66 100644
> --- a/src/rgw/rgw_auth_s3.h
> +++ b/src/rgw/rgw_auth_s3.h
> @@ -129,20 +129,17 @@ public:
>add_engine(Control::SUFFICIENT, anonymous_engine);
>  }
>
> +/* The local auth. */
> +if (cct->_conf->rgw_s3_auth_use_rados) {
> +  add_engine(Control::SUFFICIENT, local_engine);
> +}
> +
>  /* The external auth. */
>  Control local_engine_mode;
>  if (! external_engines.is_empty()) {
>add_engine(Control::SUFFICIENT, external_engines);
> -
> -  local_engine_mode = Control::FALLBACK;
> -} else {
> -  local_engine_mode = Control::SUFFICIENT;
>  }
>
> -/* The local auth. */
> -if (cct->_conf->rgw_s3_auth_use_rados) {
> -  add_engine(local_engine_mode, local_engine);
> -}
>}
>
>const char* get_name() const noexcept override {
>
>
> On Thu, Feb 1, 2018 at 4:44 PM, Valery Tschopp <valery.tsch...@switch.ch> 
> wrote:
>> Hi,
>>
>> We are operating a Luminous 12.2.2 radosgw, with the S3 Keystone
>> authentication enabled.
>>
>> Some customers are uploading millions of objects per bucket at once,
>> therefore the radosgw is doing millions of s3tokens POST requests to the
>> Keystone. All those s3tokens requests to Keystone are the same (same
>> customer, same EC2 credentials). But because there is no cache in radosgw
>> for the EC2 credentials, every incoming S3 operation generates a call to the
>> external auth Keystone. It can generate hundreds of s3tokens requests per
>> second to Keystone.
>>
>> We had already this problem with Jewel, but we implemented a workaround. The
>> EC2 credentials of the customer were added directly in the local auth engine
>> of radosgw. So for this particular heavy user, the radosgw local
>> authentication was checked first, and no external auth request to Keystone
>> was necessary.
>>
>> But the default behavior for the S3 authentication have change in Luminous.
>>
>> In Luminous, if you enable the S3 Keystone authentication, every incoming S3
>> operation will first check for anonymous authentication, then external
>> authentication (Keystone and/or LDAP), and only then local authentication.
>> See https://github.com/ceph/ceph/blob/master/src/rgw/rgw_auth_s3.h#L113-L141
>>
>> Is there a way to get the old authentication behavior (anonymous -> local ->
>> external) to work again?
>>
>> Or is it possible to implement a caching mechanism (similar to the Token
>> cache) for the EC2 credentials?
>>
>> Cheers,
>> Valery
>>
>> --
>> SWITCH
>> Valéry Tschopp, Software Engineer
>> Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
>> email: valery.tsch...@switch.ch phone: +41 44 268 1544
>>
>> 30 years of pioneering the Swiss Internet.
>> Celebrate with us at https://swit.ch/30years
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW GC Processing Stuck

2018-04-24 Thread Matt Benjamin
process: removing
> .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_12
>
>
>
> We seem completely unable to get this deleted, and nothing else of immediate
> concern is flagging up as a potential cause of all RGWs become unresponsive
> at the same time. On the bucket containing this object (the one we
> originally tried to purge), I have attempted a further purge passing the
> “—bypass-gc” parameter to it, but this also resulted in all rgws becoming
> unresponsive within 30 minutes and so I terminated the operation and
> restarted the rgws again.
>
>
>
> The bucket we attempted to remove has no shards and I have attached the
> details below. 90% of the contents of the bucket have already been
> successfully removed to our knowledge, and the bucket had no sharding (old
> bucket, sharding is now active for new buckets).
>
>
>
> root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bucket stats
> --bucket=
>
> {
>
> "bucket": "",
>
> "pool": ".rgw.buckets",
>
> "index_pool": ".rgw.buckets.index",
>
> "id": "default.290071.4",
>
> "marker": "default.290071.4",
>
> "owner": "yy",
>
> "ver": "0#107938549",
>
> "master_ver": "0#0",
>
> "mtime": "2014-10-24 14:58:48.955805",
>
> "max_marker": "0#",
>
> "usage": {
>
> "rgw.none": {
>
> "size_kb": 0,
>
> "size_kb_actual": 0,
>
> "num_objects": 0
>
> },
>
> "rgw.main": {
>
> "size_kb": 186685939,
>
> "size_kb_actual": 189914068,
>
> "num_objects": 1419528
>
> },
>
> "rgw.multimeta": {
>
> "size_kb": 0,
>
> "size_kb_actual": 0,
>
> "num_objects": 24
>
> }
>
> },
>
> "bucket_quota": {
>
> "enabled": false,
>
> "max_size_kb": -1,
>
> "max_objects": -1
>
> }
>
> }
>
>
>
> If anyone has any thoughts, they’d be greatly appreciated!
>
>
>
> Kind Regards,
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fixing bad radosgw index

2018-04-23 Thread Matt Benjamin
Mimic (and higher) contain a new async gc mechanism, which should
handle this workload internally.

Matt

On Mon, Apr 23, 2018 at 2:55 PM, David Turner <drakonst...@gmail.com> wrote:
> When figuring out why space is not freeing up after deleting buckets and
> objects in RGW, look towards the RGW Garbage Collection.  This has come up
> on the ML several times in the past.  I am almost finished catching up on a
> GC of 200 Million objects that was taking up a substantial amount of space
> in my cluster.  I did this by running about 30 screens with the command
> `while true; do radosgw-admin gc process; sleep 10; done` in each of them.
> It appears that there are 32 available sockets for the gc to be processed
> and this helped us catch up on 200M objects in under 2 months.
>
> On Mon, Apr 16, 2018 at 12:01 PM Robert Stanford <rstanford8...@gmail.com>
> wrote:
>>
>>
>>  This doesn't work for me:
>>
>> for i in `radosgw-admin bucket list`; do radosgw-admin bucket unlink
>> --bucket=$i --uid=myuser; done   (tried with and without '=')
>>
>>  Errors for each bucket:
>>
>> failure: (2) No such file or directory2018-04-16 15:37:54.022423
>> 7f7c250fbc80  0 could not get bucket info for bucket="bucket5",
>>
>> On Mon, Apr 16, 2018 at 8:30 AM, Casey Bodley <cbod...@redhat.com> wrote:
>>>
>>>
>>>
>>> On 04/14/2018 12:54 PM, Robert Stanford wrote:
>>>
>>>
>>>  I deleted my default.rgw.buckets.data and default.rgw.buckets.index
>>> pools in an attempt to clean them out.  I brought this up on the list and
>>> received replies telling me essentially, "You shouldn't do that."  There was
>>> however no helpful advice on recovering.
>>>
>>>  When I run 'radosgw-admin bucket list' I get a list of all my old
>>> buckets (I thought they'd be cleaned out when I deleted and recreated
>>> default.rgw.buckets.index, but I was wrong.)  Deleting them with s3cmd and
>>> radosgw-admin does nothing; they still appear (though s3cmd will give a
>>> '404' error.)  Running radosgw-admin with 'bucket check' and '--fix' does
>>> nothing as well.  So, how do I get myself out of this mess.
>>>
>>>  On another, semi-related note, I've been deleting (existing) buckets and
>>> their contents with s3cmd (and --recursive); the space is never freed from
>>> ceph and the bucket still appears in s3cmd ls.  Looks like my radosgw has
>>> several issues, maybe all related to deleting and recreating the pools.
>>>
>>>  Thanks
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>> The 'bucket list' command takes a user and prints the list of buckets
>>> they own - this list is read from the user object itself. You can remove
>>> these entries with the 'bucket unlink' command.
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [rgw] civetweb behind haproxy doesn't work with absolute URI

2018-03-31 Thread Matt Benjamin
I think if you haven't defined it in the Ceph config, it's disabled?
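One way to confirm what the running gateway actually picked up is to ask it
directly. A sketch, where <name> stands for your rgw instance id and the
socket path is the packaging default:

# effective value on the running radosgw
ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config show | grep rgw_dns_name

# hostnames may also be configured at the zonegroup level, worth checking too
radosgw-admin zonegroup get | grep -A 3 hostnames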

Matt

On Sat, Mar 31, 2018 at 4:59 PM, Rudenko Aleksandr <arude...@croc.ru> wrote:
> Hi, Sean.
>
> Thank you for the reply.
>
> What does it mean: “We had to disable "rgw dns name" in the end”?
>
> "rgw_dns_name": “”, has no effect for me.
>
>
>
> On 29 Mar 2018, at 11:23, Sean Purdy <s.pu...@cv-library.co.uk> wrote:
>
> We had something similar recently.  We had to disable "rgw dns name" in the
> end
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Getting a public file from radosgw

2018-03-28 Thread Matt Benjamin
niqueid
>>> <https://radosgw.example.com/uniqueid>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ganesha-rgw export with LDAP auth

2018-03-09 Thread Matt Benjamin
Hi Benjeman,

It is -intended- to work, identically to the standalone radosgw
server.  I can try to verify whether there could be a bug affecting
this path.
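For reference, the Access_Key_Id in an export like that one appears to be the
base64 token produced by radosgw-token. A sketch of generating one (the
credentials are placeholders):

# build an RGW LDAP token from the directory credentials
export RGW_ACCESS_KEY_ID="ldapuser"
export RGW_SECRET_ACCESS_KEY="ldappassword"
radosgw-token --encode --ttype=ldap
# the printed token then goes into Access_Key_Id of the FSAL block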

Matt

On Fri, Mar 9, 2018 at 12:01 PM, Benjeman Meekhof <bmeek...@umich.edu> wrote:
> I'm having issues exporting a radosgw bucket if the configured user is
> authenticated using the rgw ldap connectors.  I've verified that this
> same ldap token works ok for other clients, and as I'll note below it
> seems like the rgw instance is contacting the LDAP server and
> successfully authenticating the user.  Details:
>
> Ganesha export:
>  FSAL {
> Name = RGW;
> User_Id = "";
>
> Access_Key_Id =
> "eyJSR1dfVE9LRU4iOnsidmVyc2lvbiI6MSwidHlwZSI6ImxkYXAiLCJpZCI6ImJtZWVraG9mX29zaXJpc2FkbWluIiwia2V$
>
> # Secret_Access_Key =
> "eyJSR1dfVE9LRU4iOnsidmVyc2lvbiI6MSwidHlwZSI6ImxkYXAiLCJpZCI6ImJtZWVraG9mX29zaXJpc2FkbWluI$
> # Secret_Access_Key = "weW\/XGiHfcVhtH3chUTyoF+uz9Ldz3Hz";
>
> }
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Civetweb log format

2018-03-08 Thread Matt Benjamin
Hi Yehuda,

I did add support for logging arbitrary headers, but not a
configurable log record a-la webservers.  To level set, David, are you
speaking about a file or pipe log sink on the RGW host?
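If memory serves, the header logging hangs off the rgw ops log rather than
the civetweb access log. A rough sketch of the knobs involved (option names
from memory, please verify them against your release; the section name is a
placeholder):

# hypothetical ceph.conf fragment for the rgw instance
cat >> /etc/ceph/ceph.conf <<'EOF'
[client.rgw.gateway1]
rgw enable ops log = true
rgw ops log socket path = /var/run/ceph/rgw-opslog.sock
rgw log http headers = http_authorization, http_x_forwarded_for
EOF
# restart the rgw instance afterwards and read the ops log from that socket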

Matt

On Thu, Mar 8, 2018 at 7:55 PM, Yehuda Sadeh-Weinraub <yeh...@redhat.com> wrote:
> On Thu, Mar 8, 2018 at 2:22 PM, David Turner <drakonst...@gmail.com> wrote:
>> I remember some time ago Yehuda had commented on a thread like this saying
>> that it would make sense to add a logging/auditing feature like this to RGW.
>> I haven't heard much about it since then, though.  Yehuda, do you remember
>> that and/or think that logging like this might become viable.
>
> I vaguely remember Matt was working on this. Matt?
>
> Yehuda
>
>>
>>
>> On Thu, Mar 8, 2018 at 4:17 PM Aaron Bassett <aaron.bass...@nantomics.com>
>> wrote:
>>>
>>> Yea thats what I was afraid of. I'm looking at possibly patching to add
>>> it, but i really dont want to support my own builds. I suppose other
>>> alternatives are to use proxies to log stuff, but that makes me sad.
>>>
>>> Aaron
>>>
>>>
>>> On Mar 8, 2018, at 12:36 PM, David Turner <drakonst...@gmail.com> wrote:
>>>
>>> Setting radosgw debug logging to 10/10 is the only way I've been able to
>>> get the access key in the logs for requests.  It's very unfortunate as it
>>> DRASTICALLY increases the amount of log per request, but it's what we needed
>>> to do to be able to have the access key in the logs along with the request.
>>>
>>> On Tue, Mar 6, 2018 at 3:09 PM Aaron Bassett <aaron.bass...@nantomics.com>
>>> wrote:
>>>>
>>>> Hey all,
>>>> I'm trying to get something of an audit log out of radosgw. To that end I
>>>> was wondering if theres a mechanism to customize the log format of 
>>>> civetweb.
>>>> It's already writing IP, HTTP Verb, path, response and time, but I'm hoping
>>>> to get it to print the Authorization header of the request, which 
>>>> containers
>>>> the access key id which we can tie back into the systems we use to issue
>>>> credentials. Any thoughts?
>>>>
>>>> Thanks,
>>>> Aaron
>>>> CONFIDENTIALITY NOTICE
>>>> This e-mail message and any attachments are only for the use of the
>>>> intended recipient and may contain information that is privileged,
>>>> confidential or exempt from disclosure under applicable law. If you are not
>>>> the intended recipient, any disclosure, distribution or other use of this
>>>> e-mail message or attachments is prohibited. If you have received this
>>>> e-mail message in error, please delete and notify the sender immediately.
>>>> Thank you.
>>>>
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] swift capabilities support in radosgw

2018-01-26 Thread Matt Benjamin
Hi Syed,

RGW supports Swift /info in Luminous.

By default iirc those aren't at the root of the URL hierarchy, but
there's an option to change that, since last year, see
https://github.com/ceph/ceph/pull/10280.

Matt

On Fri, Jan 26, 2018 at 5:10 AM, Syed Armani <syed.arm...@hastexo.com> wrote:
> Hello folks,
>
>
> I am getting this error "Capabilities GET failed: https://SWIFT:8080/info 404 
> Not Found",
> when executing a "$ swift capabilities" command against a radosgw cluster.
>
>
> I was wondering whether radosgw supports the listing of activated 
> capabilities[0] via Swift API?
> Something a user can see with "$ swift capabilities" in a native swift 
> cluster.
>
>
> [0] 
> https://developer.openstack.org/api-ref/object-store/index.html#list-activated-capabilities
>
> Thanks!
>
> Cheers,
> Syed
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't make LDAP work

2018-01-26 Thread Matt Benjamin
Hi Theofilos,

I'm not sure what's going wrong offhand, I see all the pieces in your writeup.

The first thing I would verify is that "CN=cephs3,OU=Users,OU=Organic
Units,DC=example,DC=com" can see the users in
ldaps://ldap.example.com:636, and that "cn=myuser..." can itself
simple-bind using standard tools.
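A quick way to test the second part is a simple bind as that end user with
the stock OpenLDAP tools, for example (the user DN is an assumption based on
your search base):

# should return the bound DN if the credentials and LDAPS endpoint are good
ldapwhoami -x -H ldaps://ldap.example.com:636 \
  -W -D "cn=myuser,OU=Users,OU=Organic Units,DC=example,DC=com"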

What Ceph version are you running?

Matt

On Fri, Jan 26, 2018 at 5:27 AM, Theofilos Mouratidis
<mtheofi...@gmail.com> wrote:
> They gave me a ldap server working with users inside, and I want to create
> tokens for these users
>  to use s3 from their ldap credentials.
> I tried using the sanity check and I got this one working:
>
> ldapsearch -x -D "CN=cephs3,OU=Users,OU=Organic Units,DC=example,DC=com" -W
> -H ldaps://ldap.example.com:636 -b 'OU=Users,OU=Organic
> Units,DC=example,DC=com' 'cn=*' dn
>
> My config is like this:
> [global]
> rgw_ldap_binddn = "CN=cephs3,OU=Users,OU=Organic Units,DC=example,DC=com"
> rgw_ldap_dnattr = "cn"
> rgw_ldap_searchdn = "OU=Users,OU=Organic Units,DC=example,DC=com"
> rgw_ldap_secret = "plaintext_pass"
> rgw_ldap_uri = ldaps://ldap.example.com:636
> rgw_s3_auth_use_ldap = true
>
> I create my token to test the ldap feature:
>
> export RGW_ACCESS_KEY_ID="myuser" #where "dn: cn=myuser..." is in
> ldap.example.com
> export RGW_SECRET_ACCESS_KEY="mypass"
> radosgw-token --encode --ttype=ad
> abcad=
> radosgw-token --encode --ttype=ldap
> abcldap=
>
> Now I go to s3cmd and in config I have something like this:
> acess_key = abcad=
> secret_key =
> use_https = false
> host_base = ceph_rgw.example.com:8080
> host_bucket = ceph_rgw.example.com:8080
>
>
> I get access denied,
> then I try with the ldap key and I get the same problem.
> I created a local user out of curiosity and I put in s3cmd acess and secret
> and I could create a bucket. What am I doing wrong?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mapping data and metadata between rados and cephfs

2017-06-28 Thread Matt Benjamin
Hi,

That's true, sure.  We hope to support async mounts and more normal workflows 
in the future, but those are important caveats.  Editing objects in place doesn't
work with RGW NFS.

Matt

- Original Message -
> From: "Gregory Farnum" <gfar...@redhat.com>
> To: "Matt Benjamin" <mbenja...@redhat.com>, "David Turner" 
> <drakonst...@gmail.com>
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, June 28, 2017 4:14:39 PM
> Subject: Re: [ceph-users] Mapping data and metadata between rados and cephfs
> 
> On Wed, Jun 28, 2017 at 2:10 PM Matt Benjamin <mbenja...@redhat.com> wrote:
> 
> > Hi,
> >
> > A supported way to access S3 objects from a filesystem mount is with RGW
> > NFS.  That is, RGW now exports the S3 namespace directly as files and
> > directories, one consumer is an nfs-ganesha NFS driver.
> >
> 
> This supports a very specific subset of use cases/fs operations though,
> right? You can use it if you're just doing bulk file shuffling but it's not
> a way to upload via S3 and then perform filesystem update-in-place
> operations in any reasonable fashion (which is what I think was described
> in the original query).
> -Greg
> 
> 
> >
> > Regards,
> >
> > Matt
> >
> > - Original Message -
> > > From: "David Turner" <drakonst...@gmail.com>
> > > To: "Jonathan Lefman" <jonathan.lef...@intel.com>,
> > ceph-users@lists.ceph.com
> > > Sent: Wednesday, June 28, 2017 2:59:12 PM
> > > Subject: Re: [ceph-users] Mapping data and metadata between rados and
> > cephfs
> > >
> > > CephFS is very different from RGW. You may be able to utilize s3fs-fuse
> > to
> > > interface with RGW, but I haven't heard of anyone using that on the ML
> > > before.
> > >
> > > On Wed, Jun 28, 2017 at 2:57 PM Lefman, Jonathan <
> > jonathan.lef...@intel.com
> > > > wrote:
> > >
> > >
> > >
> > >
> > >
> > > Thanks for the prompt reply. I was hoping that there would be an s3fs (
> > > https://github.com/s3fs-fuse/s3fs-fuse ) equivalent for Ceph since
> > there are
> > > numerous functional similarities. Ideally one would be able to upload
> > data
> > > to a bucket and have the file synced to the local filesystem mount of
> > that
> > > bucket. This is similar to the idea of uploading data through RadosGW and
> > > have the data be available in CephFS.
> > >
> > >
> > >
> > > -Jon
> > >
> > >
> > >
> > > From: David Turner [mailto: drakonst...@gmail.com ]
> > > Sent: Wednesday, June 28, 2017 2:51 PM
> > >
> > >
> > >
> > > To: Lefman, Jonathan < jonathan.lef...@intel.com >;
> > ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Mapping data and metadata between rados and
> > cephfs
> > >
> > >
> > >
> > >
> > >
> > > CephFS and RGW store data differently. I have never heard of, nor do I
> > > believe that it's possible, to have CephFS and RGW sharing the same data
> > > pool.
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Jun 28, 2017 at 2:48 PM Lefman, Jonathan <
> > jonathan.lef...@intel.com
> > > > wrote:
> > >
> > >
> > >
> > >
> > >
> > > Yes, sorry. I meant the RadosGW. I still do not know what the mechanism
> > is to
> > > enable the mapping between data inserted by the rados component and the
> > > cephfs component. I hope that makes sense.
> > >
> > >
> > >
> > > -Jon
> > >
> > >
> > >
> > > From: David Turner [mailto: drakonst...@gmail.com ]
> > > Sent: Wednesday, June 28, 2017 2:46 PM
> > > To: Lefman, Jonathan < jonathan.lef...@intel.com >;
> > ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Mapping data and metadata between rados and
> > cephfs
> > >
> > >
> > >
> > >
> > >
> > > You want to access the same data via a rados API and via cephfs? Are you
> > > thinking RadosGW?
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Jun 28, 2017 at 1:54 PM Lefman, Jonathan <
> > jonathan.lef...@intel.com
> > > > wrote:
> > >
> > >
> > >
> > >
> > >
> &

Re: [ceph-users] Mapping data and metadata between rados and cephfs

2017-06-28 Thread Matt Benjamin
Hi,

A supported way to access S3 objects from a filesystem mount is with RGW NFS.  
That is, RGW now exports the S3 namespace directly as files and directories, 
one consumer is an nfs-ganesha NFS driver.

Regards,

Matt

- Original Message -
> From: "David Turner" <drakonst...@gmail.com>
> To: "Jonathan Lefman" <jonathan.lef...@intel.com>, ceph-users@lists.ceph.com
> Sent: Wednesday, June 28, 2017 2:59:12 PM
> Subject: Re: [ceph-users] Mapping data and metadata between rados and cephfs
> 
> CephFS is very different from RGW. You may be able to utilize s3fs-fuse to
> interface with RGW, but I haven't heard of anyone using that on the ML
> before.
> 
> On Wed, Jun 28, 2017 at 2:57 PM Lefman, Jonathan < jonathan.lef...@intel.com
> > wrote:
> 
> 
> 
> 
> 
> Thanks for the prompt reply. I was hoping that there would be an s3fs (
> https://github.com/s3fs-fuse/s3fs-fuse ) equivalent for Ceph since there are
> numerous functional similarities. Ideally one would be able to upload data
> to a bucket and have the file synced to the local filesystem mount of that
> bucket. This is similar to the idea of uploading data through RadosGW and
> have the data be available in CephFS.
> 
> 
> 
> -Jon
> 
> 
> 
> From: David Turner [mailto: drakonst...@gmail.com ]
> Sent: Wednesday, June 28, 2017 2:51 PM
> 
> 
> 
> To: Lefman, Jonathan < jonathan.lef...@intel.com >; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Mapping data and metadata between rados and cephfs
> 
> 
> 
> 
> 
> CephFS and RGW store data differently. I have never heard of, nor do I
> believe that it's possible, to have CephFS and RGW sharing the same data
> pool.
> 
> 
> 
> 
> 
> On Wed, Jun 28, 2017 at 2:48 PM Lefman, Jonathan < jonathan.lef...@intel.com
> > wrote:
> 
> 
> 
> 
> 
> Yes, sorry. I meant the RadosGW. I still do not know what the mechanism is to
> enable the mapping between data inserted by the rados component and the
> cephfs component. I hope that makes sense.
> 
> 
> 
> -Jon
> 
> 
> 
> From: David Turner [mailto: drakonst...@gmail.com ]
> Sent: Wednesday, June 28, 2017 2:46 PM
> To: Lefman, Jonathan < jonathan.lef...@intel.com >; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Mapping data and metadata between rados and cephfs
> 
> 
> 
> 
> 
> You want to access the same data via a rados API and via cephfs? Are you
> thinking RadosGW?
> 
> 
> 
> 
> 
> On Wed, Jun 28, 2017 at 1:54 PM Lefman, Jonathan < jonathan.lef...@intel.com
> > wrote:
> 
> 
> 
> 
> 
> Hi all,
> 
> 
> 
> I would like to create a 1-to-1 mapping between rados and cephfs. Here's the
> usage scenario:
> 
> 
> 
> 1. Upload file via rest api through rados compatible APIs
> 
> 2. Run "local" operations on the file delivered via rados on the linked
> cephfs mount
> 
> 3. Retrieve/download file via rados API on newly created data available on
> the cephfs mount
> 
> 
> 
> I would like to know whether this is possible out-of-the-box; this will never
> work; or this may work with a bit of effort. If this is possible, can this
> be achieved in a scalable manner to accommodate multiple (10s to 100s) users
> on the same system?
> 
> 
> 
> I asked this question in #ceph and #ceph-devel. So far, there have not been
> replies with a way to accomplish this. Thank you.
> 
> 
> 
> -Jon
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Any librados C API users out there?

2017-01-12 Thread Matt Benjamin
Hi,

- Original Message -
> From: "Yehuda Sadeh-Weinraub" <ysade...@redhat.com>
> To: "Sage Weil" <sw...@redhat.com>
> Cc: "Gregory Farnum" <gfar...@redhat.com>, "Jason Dillaman" 
> <dilla...@redhat.com>, "Piotr Dałek"
> <piotr.da...@corp.ovh.com>, "ceph-devel" <ceph-de...@vger.kernel.org>, 
> "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Thursday, January 12, 2017 3:22:06 PM
> Subject: Re: [ceph-users] Any librados C API users out there?
> 
> On Thu, Jan 12, 2017 at 12:08 PM, Sage Weil <sw...@redhat.com> wrote:
> > On Thu, 12 Jan 2017, Gregory Farnum wrote:
> >> On Thu, Jan 12, 2017 at 5:54 AM, Jason Dillaman <jdill...@redhat.com>
> >> wrote:
> >> > There is option (3) which is to have a new (or modified)
> >> > "buffer::create_static" take an optional callback to invoke when the
> >> > buffer::raw object is destructed. The raw pointer would be destructed
> >> > when the last buffer::ptr / buffer::list containing it is destructed,
> >> > so you know it's no longer being referenced.
> >> >
> >> > You could then have the new C API methods that wrap the C buffer in a
> >> > bufferlist and set a new flag in the librados::AioCompletion to delay
> >> > its completion until after it's both completed and the memory is
> >> > released. When the buffer is freed, the callback would unblock the
> >> > librados::AioCompltion completion callback.
> >>
> >> I much prefer an approach like this: it's zero-copy; it's not a lot of
> >> user overhead; but it requires them to explicitly pass memory off to
> >> Ceph and keep it immutable until Ceph is done (at which point they are
> >> told so explicitly).
> >
> > Yeah, this is simpler.  I still feel like we should provide a way to
> > revoke buffers, though, because otherwise it's possible for calls to block
> > semi-indefinitey if, say, an old MOSDOp is quueed for another OSD and that
> > OSD is not reading data off the socket but has not failed (e.g., due to
> > it's rx throttling).
> >
> 
> We need to provide some way to cancel requests (at least from the
> client's aspect), that would guarantee that buffers are not going to
> be used (and no completion callback is going to be called).

Is the client/consumer cancellation async wrt completion?  A cancellation in
that case could ensure that, if it succeeds, those guarantees are met, or else
fail (because the callback and completion have raced the cancellation)?

Matt

> 
> Yehuda
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help on RGW NFS function

2016-09-21 Thread Matt Benjamin
Hi,

- Original Message -
> From: "yiming xie" <plato...@gmail.com>
> To: ceph-users@lists.ceph.com
> Sent: Wednesday, September 21, 2016 3:53:35 AM
> Subject: [ceph-users] Help on RGW NFS function
> 
> Hi,
> I have some question about rgw nfs.
> 
> ceph release notes: You can now access radosgw buckets via NFS
> (experimental).
> In addition to the sentence, ceph documents does not do any explanation
> I don't understand the experimental implications.
> 
> 1. RGW nfs functional integrity of it? If nfs function is not complete, which
> features missing?

NFSv4 only initially (but NFS3 support just added on master).  The I/O model is 
simplified.  Objects in RGW cannot be mutated in place, and the NFS client 
always overwrites.  Clients are currently expected to write sequentially from
offset 0--on Linux, you should mount with -osync.
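On the client side that typically looks something like this (a sketch; the
server name and export path are placeholders):

# mount the ganesha-exported RGW namespace over NFSv4 with synchronous writes
mount -t nfs -o nfsvers=4,sync,noatime ganesha-host:/ /mnt/rgw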

The RGW/S3 namespace is an emulation of a posix one using substring search, so 
we impose some limitations.  You cannot move directories, for one.  There will
likely be published limits on bucket/object listing.

Some bugfixes are still in backport to Jewel.  That release supports NFSv4 and 
not NFS3.

> 2. How stable is the RGW nfs?

Some features are still being backported to Jewel.  I've submitted one 
important bugfix on master this week.  We are aiming for "general usability"
over the next 1-2 months (NFSv4).

> 3. RGW nfs latest version can be used in a production environment yet?

If you're conservative, it's probably not "ready."  Now would be a good time to 
experiment with the feature and see whether it is potentially useful to you.

Matt

> 
> Please reply to my question as soon as possible. Very grateful, thank you!
> 
> plato.xie
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RADOSGW and LDAP

2016-09-16 Thread Matt Benjamin
Hi Brian,

This issue is fixed upstream in commit 08d54291435e.  It looks like this did 
not make it to Jewel, we're prioritizing this, and will follow up when this and 
any related LDAP and NFS commits make it there.
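Until that lands, one possible stop-gap on the OpenLDAP side is to allow
legacy LDAPv2 binds. A sketch, assuming the cn=config backend (slapd.conf
setups would use "allow bind_v2" instead):

# permit LDAPv2 bind requests globally
ldapmodify -Y EXTERNAL -H ldapi:/// <<'EOF'
dn: cn=config
changetype: modify
add: olcAllows
olcAllows: bind_v2
EOF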

Thanks for bringing this to our attention!

Matt

- Original Message -
> From: "Brian Contractor Andrus" <bdand...@nps.edu>
> To: ceph-users@lists.ceph.com
> Sent: Thursday, September 15, 2016 12:56:29 PM
> Subject: [ceph-users] RADOSGW and LDAP
> 
> 
> 
> All,
> 
> I have been making some progress on troubleshooting this.
> 
> I am seeing that when rgw is configured for LDAP, I am getting an error in my
> slapd log:
> 
> 
> 
> Sep 14 06:56:21 mgmt1 slapd[23696]: conn=1762 op=0 RESULT tag=97 err=2
> text=historical protocol version requested, use LDAPv3 instead
> 
> 
> 
> Am I correct in interpreting that rgw does not do LDAPv3?
> Is there a way to enable this, or must I allow older versions in my OpenLDAP
> configuration?
> 
> 
> 
> Brian Andrus
> 
> ITACS/Research Computing
> 
> Naval Postgraduate School
> 
> Monterey, California
> 
> voice: 831-656-6238
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Any Docs to configure NFS to access RADOSGW buckets on Jewel

2016-04-27 Thread Matt Benjamin
Hi WD,

No, it's not the same.  The new mechanism uses an nfs-ganesha server to export 
the RGW namespace.  Some upstream documentation will be forthcoming...

Regards,

Matt

- Original Message -
> From: "WD Hwang" <wd_hw...@wistron.com>
> To: "a jazdzewski" <a.jazdzew...@googlemail.com>
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, April 27, 2016 5:03:12 AM
> Subject: Re: [ceph-users] Any Docs to configure NFS to access RADOSGW buckets 
> on Jewel
> 
> Hi Ansgar,
>   Thanks for your information.
>   I have tried 's3fs-fuse' to mount RADOSGW buckets on an Ubuntu client node. It
>   works.
>   But I am not sure this is the technique that accesses RADOSGW buckets via NFS
>   on Jewel.
> 
> Best Regards,
> WD
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Ansgar Jazdzewski
> Sent: Wednesday, April 27, 2016 4:32 PM
> To: WD Hwang/WHQ/Wistron
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Any Docs to configure NFS to access RADOSGW buckets
> on Jewel
> 
> all the information I have so far is from the FOSDEM talk
> 
> https://fosdem.org/2016/schedule/event/virt_iaas_ceph_rados_gateway_overview/attachments/audio/1079/export/events/attachments/virt_iaas_ceph_rados_gateway_overview/audio/1079/Fosdem_RGW.pdf
> 
> Cheers,
> Ansgar
> 
> 2016-04-27 2:28 GMT+02:00  <wd_hw...@wistron.com>:
> > Hello:
> >
> >   Are there any documents or examples to explain the configuration of
> > NFS to access RADOSGW buckets on Jewel?
> >
> > Thanks a lot.
> >
> >
> >
> > Best Regards,
> >
> > WD
> >
> > --
> > --
> > ---
> >
> > This email contains confidential or legally privileged information and
> > is for the sole use of its intended recipient.
> >
> > Any unauthorized review, use, copying or distribution of this email or
> > the content of this email is strictly prohibited.
> >
> > If you are not the intended recipient, you may reply to the sender and
> > should delete this e-mail immediately.
> >
> > --
> > --
> > ---
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] About the NFS on RGW

2016-03-22 Thread Matt Benjamin
Hi Xusangdi,

NFS on RGW is not intended as an alternative to CephFS.  The basic idea is to 
expose the S3 namespace using Amazon's prefix+delimiter convention (delimiter 
currently limited to '/').  We use opens for atomicity, which implies NFSv4 (or 
4.1).  In addition to limitations by design, there are some limitations in 
Jewel.  For example, clients should use (or emulate) sync mount behavior.  
Also, I/O is proxied--that restriction should be lifted in future releases.  
I'll post here when we have some usage documentation ready.
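
To illustrate the prefix+delimiter mapping (the bucket and object names below are 
made-up examples), an S3 key written with '/' delimiters shows up as a directory 
path under the export:

  s3cmd put bigfile.bin s3://mybucket/dir1/bigfile.bin   # S3 side
  ls /mnt/rgw/mybucket/dir1/                             # NFS side: bigfile.bin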

Matt

- Original Message -
> From: "Xusangdi" <xu.san...@h3c.com>
> To: mbenja...@redhat.com, ceph-us...@ceph.com
> Cc: ceph-de...@vger.kernel.org
> Sent: Tuesday, March 22, 2016 8:12:41 AM
> Subject: About the NFS on RGW
> 
> Hi Matt & Cephers,
> 
> I am looking for advice on setting up a file system based on Ceph. As CephFS
> is not yet production ready (or have I missed some breakthroughs?), the new NFS on
> RadosGW should be a promising alternative, especially for large files, which
> is what we are most interested in. However, after searching around the Ceph
> documentation (http://docs.ceph.com/docs/master/) and recent community
> mails, I cannot find much information about it. Could you please provide
> some introduction about the new NFS, and (if possible) a raw way to try it?
> Thank you!
> 
> Regards,
> ---Sandy
> -
> This e-mail and its attachments contain confidential information from H3C,
> which is
> intended only for the person or entity whose address is listed above. Any use
> of the
> information contained herein in any way (including, but not limited to, total
> or partial
> disclosure, reproduction, or dissemination) by persons other than the
> intended
> recipient(s) is prohibited. If you receive this e-mail in error, please
> notify the sender
> by phone or email immediately and delete it!
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Benjamin
Hey there,

I've set up a small VirtualBox cluster of Ceph VMs. I have one
ceph-admin0 node, and three ceph0,ceph1,ceph2 nodes for a total of 4.

I've been following this guide:
http://ceph.com/docs/master/start/quick-ceph-deploy/ to the letter.

At the end of the guide, it calls for you to run ceph health... this is
what happens when I do.

HEALTH_ERR 64 pgs stale; 64 pgs stuck stale; 2 full osd(s); 2/2 in osds
are down

Additionally I would like to build and run Calamari to have an overview of
the cluster once it's up and running. I followed all the directions here:
http://calamari.readthedocs.org/en/latest/development/building_packages.html

but the calamari-client package refuses to properly build under
trusty-package for some reason. This is the output at the end of salt-call:

Summary

Succeeded: 3 (changed=4)
Failed:    3

Here is the full (verbose!) output: http://pastebin.com/WJwCxxxK

The machines each have Ubuntu 14.04 64-bit, with 1GB of RAM and 8GB of
disk. They have between 10% and 30% disk utilization, but common to all
of them is that they *have free disk space*, meaning I have no idea what the
heck is causing Ceph to complain.

Help? :(

~ Benjamin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Benjamin
Aha, excellent suggestion! I'll try that as soon as I get back, thank you.
- B
On Dec 15, 2014 5:06 PM, Craig Lewis cle...@centraldesktop.com wrote:


 On Sun, Dec 14, 2014 at 6:31 PM, Benjamin zor...@gmail.com wrote:

 The machines each have Ubuntu 14.04 64-bit, with 1GB of RAM and 8GB of
 disk. They have between 10% and 30% disk utilization but common between all
 of them is that they *have free disk space* meaning I have no idea what
 the heck is causing Ceph to complain.


 Each OSD is 8GB?  You need to make them at least 10 GB.

 Ceph weights each disk as its size in TiB, and it truncates to two
 decimal places.  So your 8 GiB disks have a weight of 0.00.  Bump it up to
 10 GiB, and it'll get a weight of 0.01.

 You should have 3 OSDs, one for each of ceph0,ceph1,ceph2.

 If that doesn't fix the problem, go ahead and post the things Udo
 mentioned.
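
(For reference, once a disk has been grown, the CRUSH weight can also be adjusted by 
hand; the osd id and value below are placeholders:)

  ceph osd crush reweight osd.0 0.01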

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Benjamin
I increased the OSDs to 10.5GB each and now I have a different issue...

cephy@ceph-admin0:~/ceph-cluster$ echo {Test-data} > testfile.txt
cephy@ceph-admin0:~/ceph-cluster$ rados put test-object-1 testfile.txt
--pool=data
error opening pool data: (2) No such file or directory
cephy@ceph-admin0:~/ceph-cluster$ ceph osd lspools
0 rbd,

Here's ceph -w:
cephy@ceph-admin0:~/ceph-cluster$ ceph -w
cluster b3e15af-SNIP
 health HEALTH_WARN mon.ceph0 low disk space; mon.ceph1 low disk space;
mon.ceph2 low disk space; clock skew detected on mon.ceph0, mon.ceph1,
mon.ceph2
 monmap e3: 4 mons at {ceph-admin0=
10.0.1.10:6789/0,ceph0=10.0.1.11:6789/0,ceph1=10.0.1.12:6789/0,ceph2=10.0.1.13:6789/0},
election epoch 10, quorum 0,1,2,3 ceph-admin0,ceph0,ceph1,ceph2
 osdmap e17: 3 osds: 3 up, 3 in
  pgmap v36: 64 pgs, 1 pools, 0 bytes data, 0 objects
19781 MB used, 7050 MB / 28339 MB avail
  64 active+clean

Any other commands to run that would be helpful? Is it safe to simply
manually create the data and metadata pools myself?

On Mon, Dec 15, 2014 at 5:07 PM, Benjamin zor...@gmail.com wrote:

 Aha, excellent suggestion! I'll try that as soon as I get back, thank you.
 - B
 On Dec 15, 2014 5:06 PM, Craig Lewis cle...@centraldesktop.com wrote:


 On Sun, Dec 14, 2014 at 6:31 PM, Benjamin zor...@gmail.com wrote:

 The machines each have Ubuntu 14.04 64-bit, with 1GB of RAM and 8GB of
 disk. They have between 10% and 30% disk utilization but common between all
 of them is that they *have free disk space* meaning I have no idea what
 the heck is causing Ceph to complain.


 Each OSD is 8GB?  You need to make them at least 10 GB.

 Ceph weights each disk as its size in TiB, and it truncates to two
 decimal places.  So your 8 GiB disks have a weight of 0.00.  Bump it up to
 10 GiB, and it'll get a weight of 0.01.

 You should have 3 OSDs, one for each of ceph0,ceph1,ceph2.

 If that doesn't fix the problem, go ahead and post the things Udo
 mentioned.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Benjamin
Hi Udo,

Thanks! Creating the MDS did not add a data and metadata pool for me but I
was able to simply create them myself.

The tutorials also suggest you make new pools, cephfs_data and
cephfs_metadata - would simply using data and metadata work better?
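
For reference, creating them by hand is just a sketch like the following (a PG count 
of 64 matches this tiny test cluster; the pool names are whatever the tutorial expects):

  ceph osd pool create data 64
  ceph osd pool create metadata 64
  # only needed if CephFS itself is the goal (requires a running MDS)
  ceph fs new cephfs metadata data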

- B

On Mon, Dec 15, 2014, 10:37 PM Udo Lembke ulem...@polarzone.de wrote:

  Hi,
 see here:
 https://www.mail-archive.com/ceph-users@lists.ceph.com/msg15546.html

 Udo


 On 16.12.2014 05:39, Benjamin wrote:

 I increased the OSDs to 10.5GB each and now I have a different issue...

 cephy@ceph-admin0:~/ceph-cluster$ echo {Test-data} > testfile.txt
 cephy@ceph-admin0:~/ceph-cluster$ rados put test-object-1 testfile.txt
 --pool=data
 error opening pool data: (2) No such file or directory
 cephy@ceph-admin0:~/ceph-cluster$ ceph osd lspools
 0 rbd,

  Here's ceph -w:
 cephy@ceph-admin0:~/ceph-cluster$ ceph -w
 cluster b3e15af-SNIP
  health HEALTH_WARN mon.ceph0 low disk space; mon.ceph1 low disk
 space; mon.ceph2 low disk space; clock skew detected on mon.ceph0,
 mon.ceph1, mon.ceph2
  monmap e3: 4 mons at {ceph-admin0=
 10.0.1.10:6789/0,ceph0=10.0.1.11:6789/0,ceph1=10.0.1.12:6789/0,ceph2=10.0.1.13:6789/0},
 election epoch 10, quorum 0,1,2,3 ceph-admin0,ceph0,ceph1,ceph2
  osdmap e17: 3 osds: 3 up, 3 in
   pgmap v36: 64 pgs, 1 pools, 0 bytes data, 0 objects
 19781 MB used, 7050 MB / 28339 MB avail
   64 active+clean

  Any other commands to run that would be helpful? Is it safe to simply
 manually create the data and metadata pools myself?

 On Mon, Dec 15, 2014 at 5:07 PM, Benjamin zor...@gmail.com wrote:

 Aha, excellent suggestion! I'll try that as soon as I get back, thank you.
 - B
  On Dec 15, 2014 5:06 PM, Craig Lewis cle...@centraldesktop.com
 wrote:


 On Sun, Dec 14, 2014 at 6:31 PM, Benjamin zor...@gmail.com wrote:

 The machines each have Ubuntu 14.04 64-bit, with 1GB of RAM and 8GB of
 disk. They have between 10% and 30% disk utilization but common between all
 of them is that they *have free disk space* meaning I have no idea
 what the heck is causing Ceph to complain.


 Each OSD is 8GB?  You need to make them at least 10 GB.

  Ceph weights each disk as its size in TiB, and it truncates to two
 decimal places.  So your 8 GiB disks have a weight of 0.00.  Bump it up to
 10 GiB, and it'll get a weight of 0.01.

  You should have 3 OSDs, one for each of ceph0,ceph1,ceph2.

  If that doesn't fix the problem, go ahead and post the things Udo
 mentioned.



 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-08-29 Thread Matt W. Benjamin
Hi Mark,

Yeah.  The application defines portals, which are actively threaded; the 
transport layer then services the portals with EPOLL.

Matt

- Mark Nelson mark.nel...@inktank.com wrote:

 Excellent, I've been meaning to check into how the TCP transport is 
 going.  Are you using a hybrid threadpool/epoll approach?  That I 
 suspect would be very effective at reducing context switching, 
 especially compared to what we do now.
 
 Mark
 
 On 08/28/2014 10:40 PM, Matt W. Benjamin wrote:
  Hi,
 
  There's also an early-stage TCP transport implementation for
 Accelio, also EPOLL-based.  (We haven't attempted to run Ceph
 protocols over it yet, to my knowledge, but it should be
 straightforward.)
 
  Regards,
 
  Matt
 
  - Haomai Wang haomaiw...@gmail.com wrote:
 
  Hi Roy,
 
 
  As for messenger level, I have some very early works on
  it(https://github.com/yuyuyu101/ceph/tree/msg-event), it contains
 a
  new messenger implementation which support different event
 mechanism.
  It looks like at least one more week to make it work.
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel.  734-761-4689 
fax.  734-769-8938 
cel.  734-216-5309 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-08-28 Thread Matt W. Benjamin
Hi,

There's also an early-stage TCP transport implementation for Accelio, also 
EPOLL-based.  (We haven't attempted to run Ceph protocols over it yet, to my 
knowledge, but it should be straightforward.)

Regards,

Matt

- Haomai Wang haomaiw...@gmail.com wrote:

 Hi Roy,
 
 
 As for messenger level, I have some very early works on
 it(https://github.com/yuyuyu101/ceph/tree/msg-event), it contains a
 new messenger implementation which support different event mechanism.
 It looks like at least one more week to make it work.
 


-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel.  734-761-4689 
fax.  734-769-8938 
cel.  734-216-5309 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Prioritize Heartbeat packets

2014-08-27 Thread Matt W. Benjamin

- Sage Weil sw...@redhat.com wrote:
 
 What would be the best way for us to mark which sockets are heartbeat
 related?  
 Is there some setsockopt() type call we should be using, or should we
 
 perhaps use a different port range for heartbeat traffic?

Would it be plausible to have hb messengers identify themselves to a bus as 
such, so that external tools (here, the ts scripts) could introspect them? 
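
As a purely illustrative sketch of the port-range idea (the range below is made up, 
not a real Ceph default), marking heartbeat traffic from outside the daemons could 
look like:

  # tag outbound heartbeat packets with a high-priority DSCP class
  iptables -t mangle -A OUTPUT -p tcp --dport 6830:6840 -j DSCP --set-dscp-class CS6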

Matt

-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel.  734-761-4689 
fax.  734-769-8938 
cel.  734-216-5309 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Using large SSD cache tier instead of SSD journals?

2014-07-08 Thread Somhegyi Benjamin
Hi,

Consider a ceph cluster with one IO-intensive pool (e.g. VM storage) plus a few 
not-so-IO-intensive ones.
I'm wondering whether it makes sense to use the available SSDs in the cluster 
nodes (1 SSD for 4 HDDs) as part of a writeback cache pool in front of the 
IO-intensive pool, instead of using them as journal SSDs. With this method, the 
OSD journals would be co-located on the HDDs, or the SSD:HDD ratio could be 
reduced from 1:4 to something like 1:10.
The write operations would still hit SSDs first (though latency would increase 
compared to writing to dedicated SSD partitions local to the server), and as 
far as I understand, cache flush operations happen in a coalesced fashion.
Plus a definite advantage would be that besides functioning as a 'write log' 
(aka. journal), the SSDs would be serving as a read cache for hot data.
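
For concreteness, the wiring I have in mind would be roughly the following (pool names 
are placeholders, and the cache pool would sit on a CRUSH rule restricted to the SSDs):

  ceph osd tier add rbd rbd-cache
  ceph osd tier cache-mode rbd-cache writeback
  ceph osd tier set-overlay rbd rbd-cache
  ceph osd pool set rbd-cache hit_set_type bloom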

What do you think?

Cheers,
Benjamin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using large SSD cache tier instead of SSD journals?

2014-07-08 Thread Somhegyi Benjamin
Hi James,
  
Yes, I've checked bcache, but as far as I can tell you need to manually 
configure and register the backing devices and attach them to the cache device, 
which is not really suitable for a dynamic environment (like RBD devices for 
cloud VMs).
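
(The manual steps I mean are roughly the following; device names and the cache-set 
UUID are placeholders:)

  make-bcache -B /dev/sdb                      # format the backing device (OSD disk)
  make-bcache -C /dev/sdc                      # format the cache device (SSD)
  echo /dev/sdb > /sys/fs/bcache/register
  echo /dev/sdc > /sys/fs/bcache/register
  echo <cset-uuid> > /sys/block/sdb/bcache/attach   # attach the backing dev to the cache set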

Benjamin


 -Original Message-
 From: James Harper [mailto:ja...@ejbdigital.com.au]
 Sent: Tuesday, July 08, 2014 10:17 AM
 To: Somhegyi Benjamin; ceph-users@lists.ceph.com
 Subject: RE: Using large SSD cache tier instead of SSD journals?
 
 Have you considered bcache? It's in the kernel since 3.10 I think.
 
 It would be interesting to see comparisons between no ssd, journal on
 ssd, and bcache with ssd (with journal on same fs as osd)
 
 James
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using large SSD cache tier instead of SSD journals?

2014-07-08 Thread Somhegyi Benjamin
Hi Arne and James,

Ah, I misunderstood James' suggestion. Using bcache w/ SSDs can be another 
viable alternative to SSD journal partitions indeed.
I think ultimately I will need to test the options since very few people have 
experience with cache tiering or bcache.

Thanks,
Benjamin

From: Arne Wiebalck [mailto:arne.wieba...@cern.ch]
Sent: Tuesday, July 08, 2014 11:27 AM
To: Somhegyi Benjamin
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Using large SSD cache tier instead of SSD journals?

Hi Benjamin,

Unless I misunderstood, I think the suggestion was to use bcache devices on the 
OSDs
(not on the clients), so what you use it for in the end doesn't really matter.

The setup of bcache devices is pretty similar to a mkfs and once set up, bcache 
devices
come up and can be mounted as any other device.

Cheers,
 Arne

--
Arne Wiebalck
CERN IT

On 08 Jul 2014, at 11:01, Somhegyi Benjamin 
somhegyi.benja...@wigner.mta.hu wrote:


Hi James,

Yes, I've checked bcache, but as far as I can tell you need to manually 
configure and register the backing devices and attach them to the cache device, 
which is not really suitable for a dynamic environment (like RBD devices for 
cloud VMs).

Benjamin



-Original Message-
From: James Harper [mailto:ja...@ejbdigital.com.au]
Sent: Tuesday, July 08, 2014 10:17 AM
To: Somhegyi Benjamin; 
ceph-users@lists.ceph.com
Subject: RE: Using large SSD cache tier instead of SSD journals?

Have you considered bcache? It's in the kernel since 3.10 I think.

It would be interesting to see comparisons between no ssd, journal on
ssd, and bcache with ssd (with journal on same fs as osd)

James
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw ERROR: can't get key: ret=-2

2014-06-29 Thread Benjamin Lynch
Thanks JC.

- Ben

On Fri, Jun 27, 2014 at 5:05 PM, Jean-Charles LOPEZ
jc.lo...@inktank.com wrote:
 Hi Benjamin,

 code extract

 sync_all_users() erroring is the sync of user stats

 /*
  * thread, full sync all users stats periodically
  *
  * only sync non idle users or ones that never got synced before, this is
  * needed so that users that didn't have quota turned on before (or existed
  * before the user objclass tracked stats) need to get their backend stats
  * up to date.
  */

 Nothing to really worry about if it is a brand new install, as there is 
 nothing to synchronize in terms of stats.
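
 If you ever do want to force a sync for a particular user by hand, something 
 like this should do it (the uid is a placeholder):

   radosgw-admin user stats --uid=someuser --sync-stats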

 JC



 On Jun 27, 2014, at 12:35, Benjamin Lynch bly...@umn.edu wrote:

 Hello Ceph users,

 Has anyone seen a radosgw error like this:

 2014-06-27 14:02:39.254210 7f06b11587c0  0 ceph version 0.80.1
 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process radosgw, pid 15471
 2014-06-27 14:02:39.341198 7f06955ea700  0 ERROR: can't get key: ret=-2
 2014-06-27 14:02:39.341212 7f06955ea700  0 ERROR: sync_all_users()
 returned ret=-2

 This is a new install of radosgw.  It created the default pools after
 I started it up, so I would assume the client keyring is set
 correctly. However, it's having trouble getting a key of some sort (from
 the error message).  Any idea which key it's looking for?

 Thanks.

 - Ben
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw ERROR: can't get key: ret=-2

2014-06-27 Thread Benjamin Lynch
Hello Ceph users,

Has anyone seen a radosgw error like this:

2014-06-27 14:02:39.254210 7f06b11587c0  0 ceph version 0.80.1
(a38fe1169b6d2ac98b427334c12d7cf81f809b74), process radosgw, pid 15471
2014-06-27 14:02:39.341198 7f06955ea700  0 ERROR: can't get key: ret=-2
2014-06-27 14:02:39.341212 7f06955ea700  0 ERROR: sync_all_users()
returned ret=-2

This is a new install of radosgw.  It created the default pools after
I started it up, so I would assume the client keyring is set
correctly. However, it's having trouble getting a key of some sort (from
the error message).  Any idea which key it's looking for?

Thanks.

- Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD server alternatives to choose

2014-06-03 Thread Benjamin Somhegyi
Hi,

We are at the end of the process of designing and purchasing storage to provide 
a Ceph-based backend for VM images, VM boot (ephemeral) disks, persistent volumes 
(and possibly object storage) for our future OpenStack cloud. We considered 
many options and chose to prefer commodity storage server vendors over 
'brand vendors'. There are 3 (+1 extra) types of storage server options we are 
considering at the moment:

1.) 2U, 12x 3.5 bay storage server, 24 pieces
  - Dual Intel E5-2620v2, 128GB RAM
 - Integrated LSI 2308 controller, cabled with 4-lane iPass-to-iPass cable to 
backplane (single LSI expander chip)
 - 2x Intel DC S3700 200GB SSDs for journal, 10x4TB Seagate Constellation ES.3
 - 2x10GbE connectivity (in LAG, client and cluster network in separate VLANs)

2.) 3U, 16x 3.5 bay storage server, 18 pieces
 - Dual Intel E5-2630v2, 128GB RAM
  - Integrated LSI 2308 controller, cabled with 4-lane iPass-to-iPass cable to 
backplane (single LSI expander chip)
  - 3x Intel DC S3700 200GB SSDs for journal, 13x4TB Seagate Constellation ES.3
 - 2x10GbE connectivity
 - Possibly use 1 out of 13 bays with 400 GB SSD in cache tier in front of RBD 
pools.
3.)  4U 24x 3.5 bay, 12 pieces
  - Dual Intel E5-2630v2, 256GB RAM
 - Integrated LSI 2308 controller, single 4-lane iPass can be a limiting factor 
here with SSD journals!
 - 4x journal SSD, 20x4TB HDD
  - 2x10GbE for client, 2x10GbE for replication
 - Possibly use 1 out of 20 bays with 400 GB SSD in cache tier in front of RBD 
pools.

3.extra) 8 pieces
 - 3U 36x 3.5 bay, supermicro recommended: 
http://www.supermicro.com/products/nfo/storage_ceph.cfm
  - Dual Intel E5-2630v2, 256GB RAM
 - 24 front disks served by onboard LSI 2308, 12 rear served by HBA with LSI 
2308
 - 30 HDDs (24 in front, 6 in rear) + 6 SSDs for journal
 - 2x10GbE for client, 2x10GbE for replication
  - Possibly use 1-2 out of 30 bays with 400 GB SSD in cache tier in front of 
 RBD pools.

So, the question is: which one would you prefer? Of course the best would be 
1.) in terms of performance and reliability, but we'd rather avoid that if 
possible due to budget constraints (48x Intel CPUs is pricey). Or do you maybe 
have alternative suggestions for this cluster size?
Many thanks for the tips!

Cheers,
Ben

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD server alternatives to choose

2014-06-03 Thread Benjamin Somhegyi
Hello Robert  Christian,

First, thank you for the general considerations; 3 and 3.extra have been ruled 
out. 


 A simple way to make 1) and 2) cheaper is to use AMD CPUs, they will do
 just fine at half the price with these loads.
 If you're that tight on budget, 64GB RAM will do fine, too.
 
 I assume you're committed to 10GbE in your environment, at least when it
 comes to the public side.
 I have found Infiniband cheaper (especially when it comes to switches)
 and faster that 10GbE.
 

We decided to go with 10GbE on the storage side to consolidate the 10GbE 
external network connectivity requirement with the storage networking, and not 
use two separate technologies/switches/NICs in the compute and storage nodes.

 Looking purely at bandwidth (sequential writes), your proposals are all
 underpowered when it comes to the ratio of SSD journals to HDDs and the
 available network bandwidth.
 For example with 1) you have up to 2GB/s of inbound writes from the
 network and about 1.7GB/s worth on your HDDs, but just 700MB/s on your
 SSDs.
 Even if you're more interested in IOPS (as you probably should), it
 feels like a waste.
 2) with 4 SSDs (or bigger ones that are faster) would make a decent
 storage node in my book.

This is a very good point that I totally overlooked. I concentrated more on the 
IOPS alignment plus write durability, and forgot to check the sequential write 
bandwidth.
The 400GB Intel S3700 is a lot faster, but double the price (around $950) 
compared to the 200GB. Maybe I would be better off using enterprise SLC SSDs 
for journals? 
For example, the OCZ Deneva 2 C 60GB SLC costs around $640 and has 75K write IOPS 
and ~510MB/s write bandwidth by spec.
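
Putting rough numbers on option 1) (assuming roughly 365 MB/s sequential write per 
200GB DC S3700 and roughly 170 MB/s per 4TB Constellation ES.3):

  2 x 10GbE (LAG)          ~ 2.0 GB/s inbound
  10 x HDD  x ~170 MB/s    ~ 1.7 GB/s aggregate
  2  x SSD  x ~365 MB/s    ~ 0.73 GB/s  <- journal bottleneck
  4  x SSD  x ~365 MB/s    ~ 1.46 GB/s  <- close to the HDD aggregate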


Cheers,
Benjamin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD server alternatives to choose

2014-06-03 Thread Benjamin Somhegyi
  This is a very good point that I totally overlooked. I concentrated
  more on the IOPS alignment plus write durability, and forgot to check
  the sequential write bandwidth. The 400GB Intel S3700 is a lot more
  faster but double the price (around $950) compared to the 200GB.
 Indeed, thus my suggestion of 4 200GBs (at about 1.4GB/s).
 Still not a total match, but a lot closer. Also gives you a nice 1:3
 ratio of SSDs to HDDs, meaning that the load and thus endurance is
 balanced.
 With uneven numbers of HDDs one of your journal SSDs will wear
 noticeably earlier than the others.
 A dead SSD will also just bring down 3 OSDs in that case (of course the
 HDDs are much more likely to fail, but a number to remember).
 

Thanks, that 1:3 ratio with 200GB SSDs may still fit into our budget. Also, 
good point on the unbalanced journal partitions.

 There's one thing I forgot to mention which makes 2) a bit inferior to
 1) and that is density, as in 3U cases are less dense than 2U or 4U
 ones.
 For example 12U of 2) will give you 64 drive bays instead of 72 for 1).
 
  Maybe I would
  be better off using enterprise SLC SSDs for journals? For example OCZ
  Deneva 2 C 60GB SLC costs around $640, and have 75K write IOPS and
  ~510MB/s write bandwidth by spec.
 
 The fact that they don't have power-loss protection will result in loud
 cries of woe and doom in this ML. ^o^
 

Whoa, I didn't know that! It would be "funny" to lose the entire Ceph cluster's data 
in case of a power cut due to corruption of the majority of journal filesystems.


 As they don't give that data on their homepage, I would try to find
 benchmarks that include what its latency and variances are, the DC 3700s
 deliver their IOPS without any stutters.
 

The eMLC version of the OCZ Deneva 2 didn't perform that well during stress 
testing; the actual results were well below expectations:
http://www.storagereview.com/ocz_deneva_2_enterprise_ssd_review



Regards,
Benjamin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Infiniband: was: Red Hat to acquire Inktank

2014-05-01 Thread Matt W. Benjamin
Hi,

The XioMessenger work provides native support for Infiniband, if I understand
you correctly.

Early testing is now possible, on a codebase pulled up to Firefly.  See 
discussion
from earlier this week.

Regards,

Matt

- Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote:

 2014-04-30 14:18 GMT+02:00 Sage Weil s...@inktank.com:
  Today we are announcing some very big news: Red Hat is acquiring
 Inktank.
 
 Great news.
 Any changes to get native Infiniband support in ceph like in GlusterFS
 ?
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel.  734-761-4689 
fax.  734-769-8938 
cel.  734-216-5309 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Red Hat to acquire Inktank

2014-05-01 Thread Matt W. Benjamin
Hi,

Sure, that's planned for integration in Giant (see Blueprints).

Matt

- Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote:

 2014-05-01 0:11 GMT+02:00 Mark Nelson inktank.com:
  Usable is such a vague word.  I imagine it's testable after a
 fashion. :D
 
 Ok but I prefere an official support with IB integrated in main ceph
 repo
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel.  734-761-4689 
fax.  734-769-8938 
cel.  734-216-5309 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Red Hat to acquire Inktank

2014-05-01 Thread Matt W. Benjamin
Hi,

I should have been more careful.  Our efforts are aimed at Giant.  We're
serious about meeting delivery targets.  There's lots of shakedown, and
of course further integration work, still to go.

Regards,

Matt

- Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote:

 2014-05-01 0:20 GMT+02:00 Matt W. Benjamin m...@linuxbox.com:
  Hi,
 
  Sure, that's planned for integration in Giant (see Blueprints).
 
 Great. Any ETA? Firefly was planned for February :)

-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel.  734-761-4689 
fax.  734-769-8938 
cel.  734-216-5309 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com