Re: [ceph-users] Having problem to start Radosgw
----- Original Message -----
From: B L <super.itera...@gmail.com>
To: ceph-users@lists.ceph.com
Sent: Friday, February 13, 2015 11:55:22 PM
Subject: [ceph-users] Having problem to start Radosgw

> Hi all,
>
> I'm having a problem starting radosgw; it gives me an error that I can't diagnose:
>
> $ radosgw -c ceph.conf -d
> 2015-02-14 07:46:58.435802 7f9d739557c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 27609
> 2015-02-14 07:46:58.437284 7f9d739557c0 -1 asok(0x7f9d74da80a0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) File exists
> 2015-02-14 07:46:58.499004 7f9d739557c0  0 framework: fastcgi
> 2015-02-14 07:46:58.499016 7f9d739557c0  0 starting handler: fastcgi
> 2015-02-14 07:46:58.501160 7f9d477fe700  0 ERROR: FCGX_Accept_r returned -9
> 2015-02-14 07:46:58.594271 7f9d648ab700 -1 failed to list objects pool_iterate returned r=-2
> 2015-02-14 07:46:58.594276 7f9d648ab700  0 ERROR: lists_keys_next(): ret=-2
> 2015-02-14 07:46:58.594278 7f9d648ab700  0 ERROR: sync_all_users() returned ret=-2
> ^C2015-02-14 07:47:29.119185 7f9d47fff700  1 handle_sigterm
> 2015-02-14 07:47:29.119214 7f9d47fff700  1 handle_sigterm set alarm for 120
> 2015-02-14 07:47:29.119222 7f9d739557c0 -1 shutting down
> 2015-02-14 07:47:29.142726 7f9d739557c0  1 final shutdown
>
> Since it complains that this file exists: /var/run/ceph/ceph-client.admin.asok, I removed it, but now I get this error:
>
> $ radosgw -c ceph.conf -d
> 2015-02-14 07:47:55.140276 7f31cc0637c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 27741
> 2015-02-14 07:47:55.201561 7f31cc0637c0  0 framework: fastcgi
> 2015-02-14 07:47:55.201567 7f31cc0637c0  0 starting handler: fastcgi
> 2015-02-14 07:47:55.203443 7f319effd700  0 ERROR: FCGX_Accept_r returned -9

Error 9 is EBADF (bad file number). It looks like there's an issue with the socket created for the FastCGI communication. How did you configure it?

Yehuda

> 2015-02-14 07:47:55.304048 7f319700 -1 failed to list objects pool_iterate returned r=-2
> 2015-02-14 07:47:55.304054 7f319700  0 ERROR: lists_keys_next(): ret=-2
> 2015-02-14 07:47:55.304060 7f319700  0 ERROR: sync_all_users() returned ret=-2
>
> Can somebody help me figure out where to start fixing this? Thanks!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
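[Editor's note: the "File exists" error on the admin socket usually means another Ceph client, or a stale process, is already bound to the default client.admin socket. One hedged workaround, assuming the gateway is configured in a client.radosgw.gateway section (the section name and path below are assumptions, not from the thread), is to give radosgw its own admin socket rather than deleting the shared one:]

```ini
; Sketch only -- section name and socket path are assumptions; adjust to your setup.
[client.radosgw.gateway]
; Give radosgw a dedicated admin socket so it does not collide with
; the default /var/run/ceph/ceph-client.admin.asok used by other clients.
admin socket = /var/run/ceph/ceph-client.radosgw.gateway.asok
```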
Re: [ceph-users] Having problem to start Radosgw
----- Original Message -----
From: B L <super.itera...@gmail.com>
To: Yehuda Sadeh-Weinraub <yeh...@redhat.com>
Cc: ceph-users@lists.ceph.com
Sent: Saturday, February 14, 2015 11:03:42 AM
Subject: Re: [ceph-users] Having problem to start Radosgw

> Hello Yehuda,
>
> The strace command you referred me to shows this:
> https://gist.github.com/anonymous/8e9f1ced485996a263bb
>
> Additionally, I traced this log file: /var/log/radosgw/ceph-client.radosgw.gateway. It has the following:
>
> 2015-02-12 18:23:32.247679 7fecca5257c0 -1 did not load config file, using default settings.
> 2015-02-12 18:23:32.247745 7fecca5257c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20477
> 2015-02-12 18:23:32.251192 7fecca5257c0 -1 Couldn't init storage provider (RADOS)
> 2015-02-12 18:23:58.494026 7faab31377c0 -1 did not load config file, using default settings.
> 2015-02-12 18:23:58.494092 7faab31377c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20509
> 2015-02-12 18:23:58.497420 7faab31377c0 -1 Couldn't init storage provider (RADOS)
> 2015-02-14 17:13:03.478688 7f86f09567c0 -1 did not load config file, using default settings.
> 2015-02-14 17:13:03.478778 7f86f09567c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 2989
> 2015-02-14 17:13:03.482850 7f86f09567c0 -1 Couldn't init storage provider (RADOS)
> 2015-02-14 17:13:29.477530 7ff18226a7c0 -1 did not load config file, using default settings.
> 2015-02-14 17:13:29.477595 7ff18226a7c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3033
> 2015-02-14 17:13:29.481173 7ff18226a7c0 -1 Couldn't init storage provider (RADOS)
> 2015-02-14 17:21:00.950847 7ffee3a3b7c0 -1 did not load config file, using default settings.
> 2015-02-14 17:21:00.950916 7ffee3a3b7c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3086
> 2015-02-14 17:21:00.954085 7ffee3a3b7c0 -1 Couldn't init storage provider (RADOS)
>
> It turns out that the last line of the logs is emitted by this piece of code in rgw_main.cc:
>
>   FCGX_Init();
>
>   RGWStoreManager store_manager;
>   if (!store_manager.init("rados", g_ceph_context)) {
>     derr << "Couldn't init storage provider (RADOS)" << dendl;
>     return EIO;
>   }
>
>   RGWProcess process(g_ceph_context, 20);
>   process.run();
>
>   return 0;
>
> N.B. you can find it in http://workbench.dachary.org/ceph/ceph/raw/8d63e140777bbdd061baa6845d57e6c3cc771f76/src/rgw/rgw_main.cc, 10th line from the bottom.
>
> Is that by any means related to the problem?

Not related. This actually means that it couldn't connect to the RADOS backend, so there's a different issue now. The strace log doesn't provide much with regard to the original issue, as it didn't get to that part now. You can try bumping up the debug level (debug rgw = 20, debug ms = 1). I assume the issue you're seeing is that the wrong rados user and/or wrong cephx keys are being used. Run it again as you usually do, note the params that are normally passed when starting radosgw, and use those when running the strace command.

Yehuda

> On Feb 14, 2015, at 7:24 PM, Yehuda Sadeh-Weinraub <yeh...@redhat.com> wrote:
>
> sudo strace -F -T -tt -o/tmp/strace.out radosgw -c ceph.conf -f

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Shadow files
----- Original Message -----
From: Ben <b@benjackson.email>
To: Yehuda Sadeh-Weinraub <yeh...@redhat.com>
Cc: Craig Lewis <cle...@centraldesktop.com>, ceph-users <ceph-us...@ceph.com>
Sent: Tuesday, March 17, 2015 7:28:28 PM
Subject: Re: [ceph-users] Shadow files

> None of this helps with trying to remove defunct shadow files, which number in the tens of millions.

Did it at least reflect that the garbage collection system works?

> Is there a quick way to see which shadow files are safe to delete easily?

There's no easy process. If you know that a lot of the removed data is on buckets that shouldn't exist anymore, then you could start by trying to identify those. You could do that by:

$ radosgw-admin metadata list bucket

then, for each bucket:

$ radosgw-admin metadata get bucket:<bucket name>

This will give you the bucket markers of all existing buckets. Each data object (head and shadow objects) is prefixed by a bucket marker. Objects that don't have valid bucket markers can be removed. Note that I would first list all objects, then get the list of valid bucket markers, as the operation is racy and new buckets can be created in the meantime.

We did discuss a new garbage cleanup tool that will address your specific issue, and we have a design for it, but it's not there yet.

Yehuda

> Remember that there are MILLIONS of objects. We have a 320TB cluster which is 272TB full. Of this, we should only actually be seeing 190TB. There is 80TB of shadow files that should no longer exist.
>
> On 2015-03-18 02:00, Yehuda Sadeh-Weinraub wrote:
>
> > ----- Original Message -----
> > From: Ben <b@benjackson.email>
> > To: Craig Lewis <cle...@centraldesktop.com>
> > Cc: Yehuda Sadeh-Weinraub <yeh...@redhat.com>, ceph-users <ceph-us...@ceph.com>
> > Sent: Monday, March 16, 2015 3:38:42 PM
> > Subject: Re: [ceph-users] Shadow files
> >
> > That's the thing: the peaks and troughs are in USERS' BUCKETS only. The actual cluster usage does not go up and down; it just goes up, up, up.
> > I would expect to see peaks and troughs in the overall cluster disk usage, much the same as the user buckets' peaks and troughs. But this is not the case.
> >
> > We upgraded the cluster and radosgws to GIANT (0.87.1) yesterday, and now we are seeing a large number of misplaced(??) objects being moved around. Does this mean it has found all the shadow files that shouldn't exist anymore and is deleting them? If so, I would expect to start seeing overall cluster usage drop, but this hasn't happened yet.

No, I don't think so. It sounds like your cluster is recovering, and that happens in a completely different layer.

> > Any ideas?

Try running:

$ radosgw-admin gc list --include-all

This should show all the shadow objects that are pending deletion. Note that if you have a non-default radosgw configuration, make sure you run radosgw-admin with the same user and config that radosgw runs with (e.g., add -n client.user appropriately); otherwise it might not look at the correct zone data.

You could create an object, identify the shadow objects for that object, remove it, and check that the gc list command shows those shadow objects. Then wait the configured time (2 hours?) and see if they were removed.

Yehuda

> > On 2015-03-17 06:12, Craig Lewis wrote:
> >
> > Out of curiosity, what's the frequency of the peaks and troughs?
> >
> > RadosGW has configs for how long it should wait after deleting before garbage collecting, how long between GC runs, and how many objects it can GC per run. The defaults are 2 hours, 1 hour, and 32, respectively. Search http://docs.ceph.com/docs/master/radosgw/config-ref/ [2] for "rgw gc".
> >
> > If your peaks and troughs have a frequency of less than 1 hour, then GC is going to delay and alias the disk usage w.r.t. the object count. If you have millions of objects, you probably need to tweak those values. If RGW is only GCing 32 objects an hour, it's never going to catch up.
> >
> > Now that I think about it, I bet I'm having issues here too. I delete more than (32*24) objects per day...
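[Editor's note: Craig's back-of-the-envelope check can be made explicit. A small sketch, assuming only the default `rgw gc` settings he quotes (32 objects per GC run, one run per hour); the function name is mine:]

```python
# Defaults quoted above: GC processes at most 32 objects per run, one run per hour.
GC_MAX_OBJS_PER_RUN = 32
GC_RUNS_PER_DAY = 24

def gc_keeps_up(deletes_per_day, max_objs=GC_MAX_OBJS_PER_RUN, runs=GC_RUNS_PER_DAY):
    """True if the default GC settings can drain a day's worth of deletes."""
    return deletes_per_day <= max_objs * runs

# At the defaults, anything above 32*24 = 768 deletes/day falls behind forever,
# which is exactly Craig's point about needing to tune the rgw gc settings.
```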
> > On Sun, Mar 15, 2015 at 4:41 PM, Ben <b@benjackson.email> wrote:
> >
> > It is either a problem with Ceph, Civetweb, or something else in our configuration. But deletes in user buckets are still leaving a high number of old shadow files. Since we have millions and millions of objects, it is hard to reconcile what should and shouldn't exist.
> >
> > Looking at our cluster usage, there are no troughs; it is just a rising peak. But when looking at users' data usage, we can see peaks and troughs as you would expect as data is deleted and added.
> >
> > Our ceph version is 0.80.9.
> >
> > Any ideas, please?
> >
> > On 2015-03-13 02:25, Yehuda Sadeh-Weinraub wrote:
> >
> > > ----- Original Message -----
> > > From: Ben <b@benjackson.email>
> > > To: ceph-us...@ceph.com
> > > Sent: Wednesday, March 11, 2015 8:46:25 PM
> > > Subject: Re: [ceph-users] Shadow files
> > >
> > > Anyone
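[Editor's note: Yehuda's marker-matching procedure from this thread can be sketched as a small filter. This is a hypothetical helper, not a supported tool: `object_names` would come from listing the RGW data pool with `rados ls`, and `valid_markers` from `radosgw-admin metadata get bucket:<name>` for each bucket; per his caveat, list the objects before collecting the markers, since the check is racy against newly created buckets.]

```python
def orphan_candidates(object_names, valid_markers):
    """Return objects whose names are not prefixed by any live bucket marker.

    Head and shadow objects are prefixed by their bucket's marker
    (e.g. 'default.7573587.55__shadow_...'), so anything not matching
    a valid marker is a candidate for removal.
    """
    return [o for o in object_names
            if not any(o.startswith(m) for m in valid_markers)]
```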
Re: [ceph-users] RadosGW Direct Upload Limitation
----- Original Message -----
From: Craig Lewis <cle...@centraldesktop.com>
To: Gregory Farnum <g...@gregs42.com>
Cc: ceph-users@lists.ceph.com
Sent: Monday, March 16, 2015 11:48:15 AM
Subject: Re: [ceph-users] RadosGW Direct Upload Limitation

> > Maybe, but I'm not sure if Yehuda would want to take it upstream or not. This limit is present because it's part of the S3 spec. For larger objects you should use multi-part upload, which can get much bigger.
> > -Greg
>
> Note that the multi-part upload has a lower limit of 4MiB per part, and the direct upload has an upper limit of 5GiB.

The limit is 10MB, but it does not apply to the last part, so basically you could upload any object size with it. I would still recommend using the plain upload for smaller object sizes; it is faster, and the resulting object might be more efficient (for really small sizes).

Yehuda

> So you have to use both methods: direct upload for small files, and multi-part upload for big files. Your best bet is to use the Amazon S3 libraries; they have functions that take care of it for you.
>
> I'd like to see this mentioned in the Ceph documentation someplace. When I first encountered the issue, I couldn't find a limit in the RadosGW documentation anywhere. I only found the 5GiB limit in the Amazon API documentation, which led me to test on RadosGW. Now that I know it was done to preserve Amazon compatibility, I don't want to override the value anymore.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
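[Editor's note: the rule of thumb from this thread can be sketched as a tiny size-based dispatcher. The 5 GiB single-PUT ceiling comes from the S3 spec as discussed above; the function name is mine, and real clients would hand the actual transfer to an S3 library's plain or multipart upload call.]

```python
GiB = 1024 ** 3
DIRECT_UPLOAD_MAX = 5 * GiB  # S3-compatible ceiling for a single PUT

def pick_upload_method(size_bytes):
    """Return 'plain' up to the single-PUT limit, 'multipart' above it.

    Per the thread, plain upload is also faster and can produce a more
    efficient object for really small sizes, so prefer it when allowed.
    """
    return "plain" if size_bytes <= DIRECT_UPLOAD_MAX else "multipart"
```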
Re: [ceph-users] FastCGI and RadosGW issue?
----- Original Message -----
From: Potato Farmer <potato_far...@outlook.com>
To: ceph-users@lists.ceph.com
Sent: Thursday, March 19, 2015 12:26:41 PM
Subject: [ceph-users] FastCGI and RadosGW issue?

> Hi,
>
> I am running into an issue uploading to a bucket over an S3 connection to Ceph. I can create buckets just fine; I just can't create a key and copy data to it.
>
> Command that causes the error:
>
> key.set_contents_from_string("testing from string")
>
> I encounter the following error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1424, in set_contents_from_string
>     encrypt_key=encrypt_key)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1291, in set_contents_from_file
>     chunked_transfer=chunked_transfer, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 748, in send_file
>     chunked_transfer=chunked_transfer, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 949, in _send_file_internal
>     query_args=query_args
>   File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 664, in make_request
>     retry_handler=retry_handler
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 1068, in make_request
>     retry_handler=retry_handler)
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 1025, in _mexe
>     raise BotoServerError(response.status, response.reason, body)
> boto.exception.BotoServerError: BotoServerError: 500 Internal Server Error
> None
>
> In the Apache logs I see the following:
>
> [Thu Mar 19 12:03:13 2015] [error] [] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Thu Mar 19 12:03:13 2015] [error] [] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
> [Thu Mar 19 12:03:32 2015] [error] [] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Thu Mar 19 12:03:32 2015] [error] [] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>
> I do not get any data in the radosgw logs; they are empty. I have turned off FastCgiWrapper and set "rgw print continue" to false in ceph.conf. I am using the version of FastCGI provided by the Ceph repo.

In this case you don't need to have 'rgw print continue' set to false; either remove that line, or set it to true.

Yehuda

> Has anyone run into this before? Any suggestions?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
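[Editor's note: a minimal ceph.conf sketch of the fix Yehuda suggests; the section name is an assumption. With a FastCGI module that handles HTTP 100-continue, the option can simply be left at its default:]

```ini
[client.radosgw.gateway]
; 'rgw print continue = false' is only needed for FastCGI frontends that
; cannot handle HTTP 100-continue; here either remove the line or set it to true.
rgw print continue = true
```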
Re: [ceph-users] Shadow files
----- Original Message -----
From: Abhishek L <abhishek.lekshma...@gmail.com>
To: Yehuda Sadeh-Weinraub <yeh...@redhat.com>
Cc: Ben <b@benjackson.email>, ceph-users <ceph-us...@ceph.com>
Sent: Wednesday, March 18, 2015 10:54:37 AM
Subject: Re: [ceph-users] Shadow files

> Yehuda Sadeh-Weinraub writes:
>
> > > Is there a quick way to see which shadow files are safe to delete easily?
> >
> > There's no easy process. If you know that a lot of the removed data is on buckets that shouldn't exist anymore, then you could start by trying to identify those. You could do that by:
> >
> > $ radosgw-admin metadata list bucket
> >
> > then, for each bucket:
> >
> > $ radosgw-admin metadata get bucket:<bucket name>
> >
> > This will give you the bucket markers of all existing buckets. Each data object (head and shadow objects) is prefixed by a bucket marker. Objects that don't have valid bucket markers can be removed. Note that I would first list all objects, then get the list of valid bucket markers, as the operation is racy and new buckets can be created in the meantime.
> >
> > We did discuss a new garbage cleanup tool that will address your specific issue, and we have a design for it, but it's not there yet.
>
> Could you share the design/ideas for the cleanup tool? After an initial search I could only find two issues:
>
> [1] http://tracker.ceph.com/issues/10342

It is sketched in there (#10342); it probably needs to be better formatted and documented.

Yehuda

> [2] http://tracker.ceph.com/issues/9604
>
> though there aren't many details there to get started with.
>
> --
> Abhishek

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
----- Original Message -----
From: Steffen Winther <ceph.u...@siimnet.dk>
To: ceph-users@lists.ceph.com
Sent: Monday, March 9, 2015 12:43:58 AM
Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP

> Steffen W Sørensen <stefws@...> writes:
>
> > Response:
> >
> > HTTP/1.1 200 OK
> > Date: Fri, 06 Mar 2015 10:41:14 GMT
> > Server: Apache/2.2.22 (Fedora)
> > Connection: close
> > Transfer-Encoding: chunked
> > Content-Type: application/xml
>
> This response makes the App say:
> S3.createBucket, class S3, code UnexpectedContent, message "Inconsistency in S3 response. error response is not a valid xml message"
>
> Is our S3 GW not responding properly? Why doesn't the radosgw return a Content-Length: 0 header when the body is empty?

If you're using Apache, then it filters out zero Content-Length. There's nothing much radosgw can do about it.

> http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html
>
> Maybe this is confusing my App into expecting some XML in the body.

You can try using the radosgw civetweb frontend and see if it changes anything.

Yehuda

> 2. At every create-bucket OP the GW creates what look like new containers for ACLs in the .rgw pool. Is this normal, or how do I avoid such multiple objects cluttering the GW pools? Is there something wrong, since I get multiple ACL objects for this bucket every time my App tries to recreate the same bucket, or is this a feature/bug in radosGW?
>
> # rados -p .rgw ls
> .bucket.meta.mssCl:default.6309817.1
> .bucket.meta.mssCl:default.6187712.3
> .bucket.meta.mssCl:default.6299841.7
> .bucket.meta.mssCl:default.6309817.5
> .bucket.meta.mssCl:default.6187712.2
> .bucket.meta.mssCl:default.6187712.19
> .bucket.meta.mssCl:default.6187712.12
> mssCl
> ...
>
> # rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12
> ceph.objclass.version
> user.rgw.acl
>
> /Steffen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
----- Original Message -----
From: Steffen Winther <ceph.u...@siimnet.dk>
To: ceph-users@lists.ceph.com
Sent: Monday, March 9, 2015 1:25:43 PM
Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP

> Yehuda Sadeh-Weinraub <yehuda@...> writes:
>
> > If you're using Apache, then it filters out zero Content-Length. There's nothing much radosgw can do about it.
> > You can try using the radosgw civetweb frontend and see if it changes anything.
>
> Thanks, only no difference...
>
> Req:
> PUT /mssCl/ HTTP/1.1
> Host: rgw.gsp.sprawl.dk:7480
> Authorization: AWS <auth id>
> Date: Mon, 09 Mar 2015 20:18:16 GMT
> Content-Length: 0
>
> Response:
> HTTP/1.1 200 OK
> Content-type: application/xml
> Content-Length: 0
>
> App still says:
> S3.createBucket, class S3, code UnexpectedContent, message "Inconsistency in S3 response. error response is not a valid xml message" :/

According to the API specified at http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUT.html, there's no response body expected. I can only assume that the application tries to decode the XML if an XML content type is returned. What kind of application is that?

Yehuda

> Any comments on the 2nd issue below?
>
> > 2. At every create-bucket OP the GW creates what look like new containers for ACLs in the .rgw pool. Is this normal, or how do I avoid such multiple objects cluttering the GW pools? Is there something wrong, since I get multiple ACL objects for this bucket every time my App tries to recreate the same bucket, or is this a feature/bug in radosGW?

That's a bug.

Yehuda

> > # rados -p .rgw ls
> > .bucket.meta.mssCl:default.6309817.1
> > .bucket.meta.mssCl:default.6187712.3
> > .bucket.meta.mssCl:default.6299841.7
> > .bucket.meta.mssCl:default.6309817.5
> > .bucket.meta.mssCl:default.6187712.2
> > .bucket.meta.mssCl:default.6187712.19
> > .bucket.meta.mssCl:default.6187712.12
> > mssCl
> > ...
> >
> > # rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12
> > ceph.objclass.version
> > user.rgw.acl
>
> /Steffen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rgw admin api - users
The metadata API can do it:

GET /admin/metadata/user

Yehuda

----- Original Message -----
From: Joshua Weaver <joshua.wea...@ctl.io>
To: ceph-us...@ceph.com
Sent: Thursday, March 5, 2015 1:43:33 PM
Subject: [ceph-users] rgw admin api - users

> According to the docs at http://docs.ceph.com/docs/master/radosgw/adminops/#get-user-info, I should be able to invoke /admin/user without a uid specified and get a list of users. No matter what I try, I get a 403. After looking at the source on GitHub (ceph/ceph), it appears that there isn't any code path that would result in a collection of users being generated from that resource. Am I missing something?
>
> TIA,
> _josh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
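[Editor's note: the admin metadata endpoint is called like any other S3-signed request. A hedged sketch of the signature-v2 step, assuming an admin user whose caps allow metadata reads; the request layout in the comment is illustrative, and exact canonical-resource handling may differ in your deployment:]

```python
import base64
import hashlib
import hmac

def sign_v2(secret_key, method, date_str, canonical_resource):
    """AWS signature v2: base64(HMAC-SHA1(secret, string-to-sign)).

    The two empty lines in the string-to-sign stand for the Content-MD5
    and Content-Type headers, which a bare GET does not send.
    """
    string_to_sign = "{}\n\n\n{}\n{}".format(method, date_str, canonical_resource)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical request (access key and secret are placeholders):
#   GET /admin/metadata/user HTTP/1.1
#   Date: <date_str>
#   Authorization: AWS <access_key>:<sign_v2(secret, "GET", date_str, "/admin/metadata/user")>
```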
Re: [ceph-users] not existing key from s3 list
----- Original Message -----
From: Dominik Mostowiec <dominikmostow...@gmail.com>
To: ceph-users@lists.ceph.com
Sent: Friday, March 13, 2015 4:50:18 PM
Subject: [ceph-users] not existing key from s3 list

> Hi,
> I found a strange problem with a non-existing file in S3. The object exists in the list:
>
> # s3 -u list bucketimages | grep 'files/fotoobject_83884@2/55673'
> files/fotoobject_83884@2/55673.JPG 2014-03-26T22:25:59Z 349K
>
> but:
>
> # s3 -u head 'bucketimages/files/fotoobject_83884@2/55673.JPG'
> ERROR: HttpErrorNotFound
>
> After a little digging:
>
> # radosgw-admin --bucket=bucketimages bucket stats | grep marker
>   marker: default.7573587.55,
> # rados listomapkeys .dir.default.7573587.55 -p .rgw.buckets.index | grep 'files/fotoobject'
> files/fotoobject_83884@2/55673.JPG
> # rados -p .rgw.buckets.index getomapval .dir.default.7573587.55 'files/fotoobject_83884@2/55673.JPG'
> No such key: .rgw.buckets.index/.dir.default.7573587.55/files/fotoobject_83884@2/55673.JPG
>
> What is wrong?

It is likely that this object failed to upload and we returned an error for that, but there was a bug (fixed recently) where we didn't clear the bucket index entry correctly.

Yehuda

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Shadow files
----- Original Message -----
From: Ben <b@benjackson.email>
To: ceph-us...@ceph.com
Sent: Wednesday, March 11, 2015 8:46:25 PM
Subject: Re: [ceph-users] Shadow files

> Anyone got any info on this? Is it safe to delete shadow files?

It depends. Shadow files are badly named objects that hold part of an object's data. They are only safe to remove if you know that the corresponding objects no longer exist.

Yehuda

> On 2015-03-11 10:03, Ben wrote:
>
> We have a large number of shadow files in our cluster that aren't being deleted automatically as data is deleted. Is it safe to delete these files? Is there something we need to be aware of when deleting them? Is there a script we can run that will delete them safely? Is there something wrong with our cluster such that it isn't deleting these files when it should be?
>
> We are using civetweb with radosgw, with a tengine SSL proxy in front of it.
>
> Any advice please.
> Thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
----- Original Message -----
From: Steffen Winther <ceph.u...@siimnet.dk>
To: ceph-users@lists.ceph.com
Sent: Tuesday, March 10, 2015 12:06:38 AM
Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP

> Yehuda Sadeh-Weinraub <yehuda@...> writes:
>
> > According to the API specified at http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUT.html, there's no response expected. I can only assume that the application tries to decode the XML if an XML content type is returned.
>
> That's also what I hinted to the App vendor.
>
> > What kind of application is that?
>
> A commercial email platform from Openwave.com.

Maybe it could be worked around using an Apache rewrite rule. In any case, I opened issue #11091.

> > > 2. At every create-bucket OP the GW creates what look like new containers for ACLs in the .rgw pool. Is this normal, or how do I avoid such multiple objects cluttering the GW pools? Is there something wrong, since I get multiple ACL objects for this bucket every time my App tries to recreate the same bucket, or is this a feature/bug in radosGW?
> >
> > That's a bug.
>
> Ok, any resolution/workaround for this?

Not at the moment. There's already issue #6961; I bumped its priority higher, and we'll take a look at it.

Thanks,
Yehuda

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Auth URL not found when using object gateway
----- Original Message -----
From: Greg Meier <greg.me...@nyriad.com>
To: ceph-users@lists.ceph.com
Sent: Tuesday, March 24, 2015 4:24:16 PM
Subject: [ceph-users] Auth URL not found when using object gateway

> Hi,
>
> I'm having trouble setting up an object gateway on an existing cluster. The cluster I'm trying to add the gateway to is running on a Precise 12.04 virtual machine. The cluster is up and running, with a monitor, two OSDs, and a metadata server. It returns HEALTH_OK and active+clean, so I am somewhat assured that it is running correctly.
>
> I've:
> - set up an apache2 webserver with the fastcgi mod installed
> - created an rgw.conf file
> - added an s3gw.fcgi script
> - enabled the rgw.conf site and disabled the default
> - created a keyring and gateway user with appropriate caps
> - restarted ceph, apache2, and the radosgw daemon
> - created a user and subuser
> - tested both s3 and swift calls
>
> Unfortunately, both s3 and swift fail to authorize. An attempt to create a new bucket with s3 using a Python script returns:
>
> Traceback (most recent call last):
>   File "s3test.py", line 13, in <module>
>     bucket = conn.create_bucket('my-new-bucket')
>   File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 422, in create_bucket
>     response.status, response.reason, body)
> boto.exception.S3ResponseError: S3ResponseError: 404 Not Found
> None
>
> And an attempt to post a container using python-swiftclient from the command line with command:
>
> swift --debug --info -A http://localhost/auth/1.0 -U gatewayuser:swift -K key post new_container
>
> returns:
>
> INFO:urllib3.connectionpool:Starting new HTTP connection (1): localhost
> DEBUG:urllib3.connectionpool:"GET /auth/1.0 HTTP/1.1" 404 180
> INFO:swiftclient:REQ: curl -i http://localhost/auth/1.0 -X GET
> INFO:swiftclient:RESP STATUS: 404 Not Found
> INFO:swiftclient:RESP HEADERS: [('content-length', '180'), ('content-encoding', 'gzip'), ('date', 'Tue, 24 Mar 2015 23:19:50 GMT'), ('content-type', 'text/html; charset=iso-8859-1'), ('vary', 'Accept-Encoding'), ('server', 'Apache/2.2.22 (Ubuntu)')]
> INFO:swiftclient:RESP BODY: [garbled gzip-compressed HTML error page]
> ERROR:swiftclient:Auth GET failed: http://localhost/auth/1.0 404 Not Found
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1181, in _retry
>     self.url, self.token = self.get_auth()
>   File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1155, in get_auth
>     insecure=self.insecure)
>   File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 318, in get_auth
>     insecure=insecure)
>   File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 241, in get_auth_1_0
>     http_reason=resp.reason)
> ClientException: Auth GET failed: http://localhost/auth/1.0 404 Not Found
>
> [the client retries and logs the same 404 connection, response, and traceback a second time]
>
> Auth GET failed: http://localhost/auth/1.0 404 Not Found
>
> I'm not at all sure why it doesn't work when I've followed the documentation for setting it up. Please find attached the config files for rgw.conf, ceph.conf, and apache2.conf.

What does the rgw log show? (Please add 'debug rgw = 20' and 'debug ms = 1'.)

Yehuda

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
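[Editor's note: a minimal ceph.conf sketch of the debug settings Yehuda asks for; the section name is an assumption, and the log path in the comment is just the location this thread already uses:]

```ini
[client.radosgw.gateway]
debug rgw = 20   ; verbose radosgw-level tracing of each request
debug ms = 1     ; messenger-level tracing of traffic to the monitors/OSDs
; Output goes to the configured radosgw log, e.g.
; /var/log/radosgw/ceph-client.radosgw.gateway.log
```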
Re: [ceph-users] Radosgw authorization failed
- Original Message - From: Neville neville.tay...@hotmail.co.uk To: ceph-users@lists.ceph.com Sent: Wednesday, March 25, 2015 8:16:39 AM Subject: [ceph-users] Radosgw authorization failed Hi all, I'm testing backup product which supports Amazon S3 as target for Archive storage and I'm trying to setup a Ceph cluster configured with the S3 API to use as an internal target for backup archives instead of AWS. I've followed the online guide for setting up Radosgw and created a default region and zone based on the AWS naming convention US-East-1. I'm not sure if this is relevant but since I was having issues I thought it might need to be the same. I've tested the radosgw using boto.s3 and it seems to work ok i.e. I can create a bucket, create a folder, list buckets etc. The problem is when the backup software tries to create an object I get an authorization failure. It's using the same user/access/secret as I'm using from boto.s3 and I'm sure the creds are right as it lets me create the initial connection, it just fails when trying to create an object (backup folder). 
Here's the extract from the radosgw log: - 2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET /:list_bucket:init op 2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET /:list_bucket:verifying op mask 2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1 user.op_mask=7 2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET /:list_bucket:verifying op permissions 2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for uid=test mask=49 2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for group=1 mask=49 2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for group=2 mask=49 2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test owner=test perm=1 2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm (type)=1, policy perm=1, user_perm_mask=1, acl perm=1 2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET /:list_bucket:verifying op params 2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET /:list_bucket:executing 2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2]) start num 1001 2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET /:list_bucket:http status=200 2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done req=0x7f107000e2e0 http_status=200 == 2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request req=0x7f107000f0e0 2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ: 2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0 2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request req=0x7f107000f6b0 2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request req=0x7f107000f0e0 2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty 2015-03-25 
15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88 2015-03-25 15:07:26.517084 7f1058dd7700 20 CONTENT_TYPE=application/octet-stream 2015-03-25 15:07:26.517085 7f1058dd7700 20 CONTEXT_DOCUMENT_ROOT=/var/www 2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX= 2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www 2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER 2015-03-25 15:07:26.517089 7f1058dd7700 20 GATEWAY_INTERFACE=CGI/1.1 2015-03-25 15:07:26.517090 7f1058dd7700 20 HTTP_AUTHORIZATION=AWS F79L68W19B3GCLOSE3F8:AcXqtvlBzBMpwdL+WuhDRoLT/Bs= 2015-03-25 15:07:26.517091 7f1058dd7700 20 HTTP_CONNECTION=Keep-Alive 2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_DATE=Wed, 25 Mar 2015 15:07:26 GMT 2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_EXPECT=100-continue 2015-03-25 15:07:26.517093 7f1058dd7700 20 HTTP_HOST=test1.devops-os-cog01.devops.local 2015-03-25 15:07:26.517094 7f1058dd7700 20 HTTP_USER_AGENT=aws-sdk-java/unknown-version Windows_Server_2008_R2/6.1 Java_HotSpot(TM)_Client_VM/24.55-b03 2015-03-25 15:07:26.517096 7f1058dd7700 20 HTTP_X_AMZ_META_CREATIONTIME=2015-03-25T15:07:26 2015-03-25 15:07:26.517097 7f1058dd7700 20 HTTP_X_AMZ_META_SIZE=88 2015-03-25 15:07:26.517098 7f1058dd7700 20 HTTP_X_AMZ_STORAGE_CLASS=STANDARD 2015-03-25 15:07:26.517099 7f1058dd7700 20 HTTPS=on 2015-03-25 15:07:26.517100 7f1058dd7700 20 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2015-03-25 15:07:26.517100 7f1058dd7700 20 QUERY_STRING= 2015-03-25 15:07:26.517101 7f1058dd7700 20 REMOTE_ADDR=10.40.41.106 2015-03-25 15:07:26.517102 7f1058dd7700 20 REMOTE_PORT=55439 2015-03-25 15:07:26.517103 7f1058dd7700 20
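The HTTP_AUTHORIZATION value in the CGI dump above ("AWS access_key:signature") is an AWS signature-v2 header, and a common cause of this kind of failure is a mismatch between what the client and the gateway each compute as the string-to-sign — e.g. the x-amz-* metadata headers the backup software adds, or a Host header that doesn't match the configured rgw dns name when virtual-hosted bucket addressing is used. As a rough illustration of the v2 scheme (this is a sketch, not RGW's actual code; the secret key and resource path are invented, the other values mirror the dump above):

```python
import base64
import hashlib
import hmac

def s3_v2_signature(secret_key, method, content_md5, content_type, date, amz_headers, resource):
    # Canonicalize x-amz-* headers: lowercase names, sorted, one "name:value\n" each.
    canon = "".join("%s:%s\n" % (k, v)
                    for k, v in sorted((k.lower(), v) for k, v in amz_headers.items()))
    string_to_sign = "\n".join([method, content_md5, content_type, date]) + "\n" + canon + resource
    # Signature is base64(HMAC-SHA1(secret, string_to_sign)).
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical secret and resource path; headers taken from the dump above.
sig = s3_v2_signature(
    "secret", "PUT", "", "application/octet-stream",
    "Wed, 25 Mar 2015 15:07:26 GMT",
    {"x-amz-meta-creationtime": "2015-03-25T15:07:26",
     "x-amz-meta-size": "88",
     "x-amz-storage-class": "STANDARD"},
    "/test1/")
auth_header = "AWS F79L68W19B3GCLOSE3F8:" + sig
```

If any of the x-amz-* headers or the canonicalized resource differ between the two sides, the signatures diverge and the request is rejected even though the credentials are valid.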
Re: [ceph-users] Radosgw authorization failed
- Original Message - From: Neville neville.tay...@hotmail.co.uk To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Monday, March 30, 2015 6:49:29 AM Subject: Re: [ceph-users] Radosgw authorization failed Date: Wed, 25 Mar 2015 11:43:44 -0400 From: yeh...@redhat.com To: neville.tay...@hotmail.co.uk CC: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Radosgw authorization failed - Original Message - From: Neville neville.tay...@hotmail.co.uk To: ceph-users@lists.ceph.com Sent: Wednesday, March 25, 2015 8:16:39 AM Subject: [ceph-users] Radosgw authorization failed Hi all, I'm testing backup product which supports Amazon S3 as target for Archive storage and I'm trying to setup a Ceph cluster configured with the S3 API to use as an internal target for backup archives instead of AWS. I've followed the online guide for setting up Radosgw and created a default region and zone based on the AWS naming convention US-East-1. I'm not sure if this is relevant but since I was having issues I thought it might need to be the same. I've tested the radosgw using boto.s3 and it seems to work ok i.e. I can create a bucket, create a folder, list buckets etc. The problem is when the backup software tries to create an object I get an authorization failure. It's using the same user/access/secret as I'm using from boto.s3 and I'm sure the creds are right as it lets me create the initial connection, it just fails when trying to create an object (backup folder). 
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
- Original Message - From: Steffen W Sørensen ste...@me.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Friday, February 27, 2015 9:39:46 AM Subject: Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed On 27/02/2015, at 17.20, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: I'd look at two things first. One is the '{fqdn}' string, which I'm not sure whether that's the actual string that you have, or whether you just replaced it for the sake of anonymity. The second is the port number, which should be fine, but maybe the fact that it appears as part of the script uri triggers some issue. When launching radosgw it logs this: ...
2015-02-27 18:33:58.663960 7f200b67a8a0 20 rados->read obj-ofs=0 read_ofs=0 read_len=524288
2015-02-27 18:33:58.675821 7f200b67a8a0 20 rados->read r=0 bl.length=678
2015-02-27 18:33:58.676532 7f200b67a8a0 10 cache put: name=.rgw.root+zone_info.default
2015-02-27 18:33:58.676573 7f200b67a8a0 10 moving .rgw.root+zone_info.default to cache LRU end
2015-02-27 18:33:58.677415 7f200b67a8a0 2 zone default is master
2015-02-27 18:33:58.677666 7f200b67a8a0 20 get_obj_state: rctx=0x2a85cd0 obj=.rgw.root:region_map state=0x2a86498 s->prefetch_data=0
2015-02-27 18:33:58.677760 7f200b67a8a0 10 cache get: name=.rgw.root+region_map : miss
2015-02-27 18:33:58.709411 7f200b67a8a0 10 cache put: name=.rgw.root+region_map
2015-02-27 18:33:58.709846 7f200b67a8a0 10 adding .rgw.root+region_map to cache LRU end
2015-02-27 18:33:58.957336 7f1ff17f2700 2 garbage collection: start
2015-02-27 18:33:58.959189 7f1ff0df1700 20 BucketsSyncThread: start
2015-02-27 18:33:58.985486 7f200b67a8a0 0 framework: fastcgi
2015-02-27 18:33:58.985778 7f200b67a8a0 0 framework: civetweb
2015-02-27 18:33:58.985879 7f200b67a8a0 0 framework conf key: port, val: 7480
2015-02-27 18:33:58.986462 7f200b67a8a0 0 starting handler: civetweb
2015-02-27 18:33:59.032173 7f1fc3fff700 20 UserSyncThread: start
2015-02-27 18:33:59.214739 7f200b67a8a0 0 starting handler: fastcgi
2015-02-27 18:33:59.286723 7f1fb59e8700 10 allocated request req=0x2aa1b20
2015-02-27 18:34:00.533188 7f1fc3fff700 20 RGWRados::pool_iterate: got {my user name}
2015-02-27 18:34:01.038190 7f1ff17f2700 2 garbage collection: stop
2015-02-27 18:34:01.670780 7f1fc3fff700 20 RGWUserStatsCache: sync user={my user name}
2015-02-27 18:34:01.687730 7f1fc3fff700 0 ERROR: can't read user header: ret=-2
2015-02-27 18:34:01.689734 7f1fc3fff700 0 ERROR: sync_user() failed, user={my user name} ret=-2
Why does it seem to find my radosgw defined user name as a pool and what might bring it to fail to read user header? That's just a red herring. It tries to sync the user stats, but it can't because quota is not enabled (iirc). We should probably get rid of these messages as they're pretty confusing. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
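The startup log above shows both frontends coming up (fastcgi, and civetweb on port 7480). When debugging a 405 like this, it can help to rule out the Apache/FastCGI layer entirely by pointing the client straight at the civetweb port. In configs of this era that frontend is enabled with something like the following sketch (option name as per the Ceph docs of the period; the section name matches the poster's setup):

```ini
[client.radosgw.owmblob]
    rgw frontends = civetweb port=7480
```

If requests succeed against civetweb but fail through Apache, the problem is in the vhost/FastCGI wiring rather than in radosgw itself.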
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
- Original Message - From: Steffen W Sørensen ste...@me.com To: ceph-users@lists.ceph.com Sent: Friday, February 27, 2015 6:40:01 AM Subject: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed Hi, Newbie to RadosGW+Ceph, but learning... Got a running Ceph Cluster working with rbd+CephFS clients. Now I'm trying to verify a RadosGW S3 api, but seems to have an issue with RadosGW access. I get the error (not found anything searching so far...): S3ResponseError: 405 Method Not Allowed when trying to access the rgw. Apache vhost access log file says: 10.20.0.29 - - [27/Feb/2015:14:09:04 +0100] GET / HTTP/1.1 405 27 - Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64 and Apache's general error_log file says: [Fri Feb 27 14:09:04 2015] [warn] FastCGI: 10.20.0.29 GET http://{fqdn}:8005/ auth AWS WL4EJJYTLVYXEHNR6QSA:X6XR4z7Gr9qTMNDphTNlRUk3gfc= RadosGW seems to launch and run fine, though /var/log/messages at launches says: Feb 27 14:12:34 rgw kernel: radosgw[14985]: segfault at e0 ip 003fb36cb1dc sp 7fffde221410 error 4 in librados.so.2.0.0[3fb320+6d] # ps -fuapache UIDPID PPID C STIME TTY TIME CMD apache 15113 15111 0 14:07 ?00:00:00 /usr/sbin/fcgi- apache 15114 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15115 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15116 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15117 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15118 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15119 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15120 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15121 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15224 1 1 14:12 ?00:00:25 /usr/bin/radosgw -n client.radosgw.owmblob RadosGW create my FastCGI socket and a default .asok, (not sure why/what default socket are meant for) as well as the configured log file though it never logs anything... 
# tail -18 /etc/ceph/ceph.conf:
[client.radosgw.owmblob]
keyring = /etc/ceph/ceph.client.radosgw.keyring
host = rgw
rgw data = /var/lib/ceph/radosgw/ceph-rgw
log file = /var/log/radosgw/client.radosgw.owmblob.log
debug rgw = 20
rgw enable log rados = true
rgw enable ops log = true
rgw enable apis = s3
rgw cache enabled = true
rgw cache lru size = 1
rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock
;#rgw host = localhost
;#rgw port = 8004
rgw dns name = {fqdn}
rgw print continue = true
rgw thread pool size = 20
It turned out /etc/init.d/ceph-radosgw didn't chown the log file to $USER when it didn't already exist; radosgw creates the log file when opening it, only it creates it as root, not as $USER, hence no output. Manually chowning it and restarting the GW gives output like:
2015-02-27 15:25:14.464112 7fef463e9700 20 enqueued request req=0x25dea40
2015-02-27 15:25:14.465750 7fef463e9700 20 RGWWQ:
2015-02-27 15:25:14.465786 7fef463e9700 20 req: 0x25dea40
2015-02-27 15:25:14.465864 7fef463e9700 10 allocated request req=0x25e3050
2015-02-27 15:25:14.466214 7fef431e4700 20 dequeued request req=0x25dea40
2015-02-27 15:25:14.466677 7fef431e4700 20 RGWWQ: empty
2015-02-27 15:25:14.467888 7fef431e4700 20 CONTENT_LENGTH=0
2015-02-27 15:25:14.467922 7fef431e4700 20 DOCUMENT_ROOT=/var/www/html
2015-02-27 15:25:14.467941 7fef431e4700 20 FCGI_ROLE=RESPONDER
2015-02-27 15:25:14.467958 7fef431e4700 20 GATEWAY_INTERFACE=CGI/1.1
2015-02-27 15:25:14.467976 7fef431e4700 20 HTTP_ACCEPT_ENCODING=identity
2015-02-27 15:25:14.469476 7fef431e4700 20 HTTP_AUTHORIZATION=AWS WL4EJJYTLVYXEHNR6QSA:OAT0zVItGyp98T5mALeHz4p1fcg=
2015-02-27 15:25:14.469516 7fef431e4700 20 HTTP_DATE=Fri, 27 Feb 2015 14:25:14 GMT
2015-02-27 15:25:14.469533 7fef431e4700 20 HTTP_HOST={fqdn}:8005
2015-02-27 15:25:14.469550 7fef431e4700 20 HTTP_USER_AGENT=Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64
2015-02-27 15:25:14.469571 7fef431e4700 20 PATH=/sbin:/usr/sbin:/bin:/usr/bin
2015-02-27 15:25:14.469589 7fef431e4700 20 QUERY_STRING=
2015-02-27 15:25:14.469607 7fef431e4700 20 REMOTE_ADDR=10.20.0.29
2015-02-27 15:25:14.469624 7fef431e4700 20 REMOTE_PORT=34386
2015-02-27 15:25:14.469641 7fef431e4700 20 REQUEST_METHOD=GET
2015-02-27 15:25:14.469658 7fef431e4700 20 REQUEST_URI=/
2015-02-27 15:25:14.469677 7fef431e4700 20 SCRIPT_FILENAME=/var/www/html/s3gw.fcgi
2015-02-27 15:25:14.469694 7fef431e4700 20 SCRIPT_NAME=/
2015-02-27 15:25:14.469711 7fef431e4700 20 SCRIPT_URI=http://{fqdn}:8005/
2015-02-27 15:25:14.469730 7fef431e4700 20 SCRIPT_URL=/
2015-02-27 15:25:14.469748 7fef431e4700 20 SERVER_ADDR=10.20.0.29
2015-02-27 15:25:14.469765 7fef431e4700 20 SERVER_ADMIN={email}
2015-02-27 15:25:14.469782 7fef431e4700
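For the Apache/FastCGI path, a 405 on GET / usually points at the vhost not routing requests into the gateway script. A minimal mod_fastcgi vhost for a setup like this, roughly following the radosgw install docs of the period (treat it as a sketch, not a drop-in config; the fqdn, port, and socket path are the poster's values):

```apache
<VirtualHost *:8005>
    ServerName {fqdn}
    DocumentRoot /var/www/html

    RewriteEngine On
    # Hand every request to the gateway script, preserving the query string
    # and the Authorization header (FastCGI strips it otherwise).
    RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

    # The -socket path must match "rgw socket path" in ceph.conf.
    FastCgiExternalServer /var/www/html/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock

    AllowEncodedSlashes On
    ServerSignature Off
</VirtualHost>
```

If the RewriteRule is missing or never matches, Apache serves / itself rather than proxying to radosgw, which can surface exactly as a 405 on methods the static handler doesn't allow.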
Re: [ceph-users] Hammer sharded radosgw bucket indexes question
- Original Message - From: Ben Hines bhi...@gmail.com To: ceph-users ceph-users@lists.ceph.com Sent: Wednesday, March 4, 2015 1:03:16 PM Subject: [ceph-users] Hammer sharded radosgw bucket indexes question Hi, These questions were asked previously but perhaps lost: We have some large buckets.
- When upgrading to Hammer (0.93 or later), is it necessary to recreate the buckets to get a sharded index?
- What parameters does the system use for deciding when to shard the index?
The system does not re-shard the bucket index; sharding only affects newly created buckets. There is a per-zone configurable that specifies the number of shards for buckets created in that zone (by default it's disabled). There's also a ceph.conf configurable that can be set to override that value. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
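For reference, the ceph.conf override mentioned in the reply is the rgw_override_bucket_index_max_shards option (the zone-level counterpart is the zone's bucket_index_max_shards field). A sketch of the ceph.conf form, assuming the Hammer-era option name:

```ini
[client.radosgw.gateway]
    # Applies only to buckets created after this setting is in place;
    # existing bucket indexes are not re-sharded.
    rgw override bucket index max shards = 8
```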
Re: [ceph-users] Understand RadosGW logs
- Original Message - From: Daniel Schneller daniel.schnel...@centerdevice.com To: ceph-users@lists.ceph.com Sent: Tuesday, March 3, 2015 2:54:13 AM Subject: [ceph-users] Understand RadosGW logs Hi! After realizing the problem with log rotation (see http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708) and fixing it, I now for the first time have some meaningful (and recent) logs to look at. While from an application perspective there seem to be no issues, I would like to understand some messages I find with relatively high frequency in the logs:
Exhibit 1 -
2015-03-03 11:14:53.685361 7fcf4bfef700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:15:57.476059 7fcf39ff3700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:17:43.570986 7fcf25fcb700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:22:00.881640 7fcf39ff3700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:22:48.147011 7fcf35feb700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:27:40.572723 7fcf50ff9700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:29:40.082954 7fcf36fed700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:30:32.204492 7fcf4dff3700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
It means that returning data to the client got some error, usually means that the client disconnected before completion. I cannot find anything relevant by Googling for that, apart from the actual line of code that produces this line. What does that mean? Is it an indication of data corruption or are there more benign reasons for this line? 
Exhibit 2 -- Several of these blocks
2015-03-03 07:06:17.805772 7fcf36fed700 1 == starting new request req=0x7fcf5800f3b0 =
2015-03-03 07:06:17.836671 7fcf36fed700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
2015-03-03 07:06:17.836758 7fcf36fed700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
2015-03-03 07:06:17.836918 7fcf36fed700 0 RGWObjManifest::operator++(): result: ofs=13055243 stripe_ofs=13055243 part_ofs=0 rule->part_size=0
2015-03-03 07:06:18.263126 7fcf36fed700 1 == req done req=0x7fcf5800f3b0 http_status=200 ==
...
2015-03-03 09:27:29.855001 7fcf28fd1700 1 == starting new request req=0x7fcf580102a0 =
2015-03-03 09:27:29.866718 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866778 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866852 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866917 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.875466 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.884434 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.906155 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.914364 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.940653 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=38273024 stripe_ofs=38273024 part_ofs=0 rule->part_size=0
2015-03-03 09:27:30.272816 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=42467328 stripe_ofs=42467328 part_ofs=0 rule->part_size=0
2015-03-03 09:27:31.125773 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=46661632 stripe_ofs=46661632 part_ofs=0 rule->part_size=0
2015-03-03 09:27:31.192661 7fcf28fd1700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 09:27:31.194481 7fcf28fd1700 1 == req done req=0x7fcf580102a0 http_status=200 ==
...
2015-03-03 09:28:43.008517 7fcf2a7d4700 1 == starting new request req=0x7fcf580102a0 =
2015-03-03 09:28:43.016414 7fcf2a7d4700 0 RGWObjManifest::operator++(): result: ofs=887579 stripe_ofs=887579 part_ofs=0 rule->part_size=0
2015-03-03 09:28:43.022387 7fcf2a7d4700 1 == req done req=0x7fcf580102a0 http_status=200 ==
First, what is the req= line? Is that a thread-id? I am asking, because the same id is used over and over in the same file over time. It's the request id (within the current radosgw instance) More
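The ofs values in the RGWObjManifest::operator++() lines above are not arbitrary: with the default rgw_max_chunk_size (a 512 KiB head object) and rgw_obj_stripe_size (4 MiB stripes), the manifest iterator advances at 512 KiB + k * 4 MiB, with the final step clamped at the object size. A small sketch reconstructing the logged boundaries under those assumed defaults:

```python
HEAD_SIZE = 512 * 1024          # assumed rgw_max_chunk_size default (head object)
STRIPE_SIZE = 4 * 1024 * 1024   # assumed rgw_obj_stripe_size default

def manifest_offsets(obj_size):
    """Offsets at which the manifest iterator logs 'result: ofs=...'."""
    offsets = []
    k = 1
    while HEAD_SIZE + k * STRIPE_SIZE < obj_size:
        offsets.append(HEAD_SIZE + k * STRIPE_SIZE)
        k += 1
    offsets.append(obj_size)  # final step clamps to the end of the object
    return offsets

# Reproduces the first block above: a ~12.5 MB object.
print(manifest_offsets(13055243))  # → [4718592, 8912896, 13055243]
```

So a block of these lines simply traces one GET walking a multi-stripe object; small objects (like the 887579-byte one in the last request) produce a single line.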
Re: [ceph-users] RadosGW - multiple dns names
- Original Message - From: Shinji Nakamoto shinji.nakam...@mgo.com To: ceph-us...@ceph.com Sent: Friday, February 20, 2015 3:58:39 PM Subject: [ceph-users] RadosGW - multiple dns names We have multiple interfaces on our Rados gateway node, each of which is assigned to one of our many VLANs with a unique IP address. Is it possible to set multiple DNS names for a single Rados GW, so it can handle the request to each of the VLAN specific IP address DNS names? Not yet, however, the upcoming hammer release will support that (hostnames will be configured as part of the region). Yehuda eg. rgw dns name = prd-apiceph001 rgw dns name = prd-backendceph001 etc. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
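Once on hammer, the hostnames mentioned in the reply live in the region configuration and can be edited through radosgw-admin; roughly (command names per the radosgw-admin man page, JSON abbreviated):

```sh
radosgw-admin region get > region.json
# edit region.json so the hostnames array lists every DNS name, e.g.
#   "hostnames": ["prd-apiceph001", "prd-backendceph001"],
radosgw-admin region set < region.json
radosgw-admin regionmap update
# restart radosgw afterwards so it picks up the new region map
```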
Re: [ceph-users] mixed ceph versions
- Original Message - From: Gregory Farnum g...@gregs42.com To: Tom Deneau tom.den...@amd.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, February 25, 2015 3:20:07 PM Subject: Re: [ceph-users] mixed ceph versions On Wed, Feb 25, 2015 at 3:11 PM, Deneau, Tom tom.den...@amd.com wrote: I need to set up a cluster where the rados client (for running rados bench) may be on a different architecture and hence running a different ceph version from the osd/mon nodes. Is there a list of which ceph versions work together for a situation like this? The RADOS protocol is architecture-independent, and while we don't test across a huge version divergence (mostly between LTS releases) the client should also be compatible with pretty much anything you have server-side. Client stuff like rgw usually requires that the backend runs a version at least as new (for objclass functionality). Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RADOS Gateway quota management
Great, I opened issue # 11323. Thanks, Yehuda - Original Message - From: Sergey Arkhipov sarkhi...@asdco.ru To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Friday, April 3, 2015 1:00:02 AM Subject: Re: [ceph-users] RADOS Gateway quota management Hi, Thank you for your answer! Meanwhile I did some investigations and found the reason: quota works on PUTs perfectly, but there are no checks on POSTs. I've made a pull-request: https://github.com/ceph/ceph/pull/4240 2015-04-02 18:40 GMT+03:00 Yehuda Sadeh-Weinraub yeh...@redhat.com : From: Sergey Arkhipov sarkhi...@asdco.ru To: ceph-users@lists.ceph.com Sent: Monday, March 30, 2015 2:55:33 AM Subject: [ceph-users] RADOS Gateway quota management Hi, Currently I am trying to figure out how to work with RADOS Gateway (ceph 0.87) limits and I've managed to produce the following strange behavior:
{ "bucket": "test1-8",
  "pool": ".rgw.buckets",
  "index_pool": ".rgw.buckets.index",
  "id": "default.17497.14",
  "marker": "default.17497.14",
  "owner": "cb254310-8b24-4622-93fb-640ca4a45998",
  "ver": 21,
  "master_ver": 0,
  "mtime": 1427705802,
  "max_marker": "",
  "usage": { "rgw.main": { "size_kb": 16000, "size_kb_actual": 16020, "num_objects": 9 } },
  "bucket_quota": { "enabled": true, "max_size_kb": -1, "max_objects": 3 } }
Steps to reproduce: create a bucket, set its quota like that (max_objects = 3, enabled) and successfully upload 9 files. User quota is also defined:
  "bucket_quota": { "enabled": true, "max_size_kb": -1, "max_objects": 3 },
  "user_quota": { "enabled": true, "max_size_kb": 1048576, "max_objects": 5 },
Could someone please help me to understand how to limit users? -- The question is whether the user is able to continue writing objects at this point. The quota system is working asynchronously, so it's possible to get into edge cases where users exceeded it a bit (it looks a whole lot better with larger numbers). The question is whether it's working for you at all. 
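For reference, the quota knobs discussed here are set with radosgw-admin (commands per the Ceph admin guide; the uid is a placeholder):

```sh
# bucket-scope quota applied to one user's buckets
radosgw-admin quota set --quota-scope=bucket --uid=<uid> --max-objects=3
radosgw-admin quota enable --quota-scope=bucket --uid=<uid>

# user-scope quota
radosgw-admin quota set --quota-scope=user --uid=<uid> --max-size-kb=1048576 --max-objects=5
radosgw-admin quota enable --quota-scope=user --uid=<uid>

# force the asynchronous stats to sync when testing enforcement edge cases
radosgw-admin user stats --uid=<uid> --sync-stats
```

Because enforcement is asynchronous, expect the observed limit to lag by a few objects, as described in the reply above.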
Yehuda -- Sergey Arkhipov Software Engineer, ASD Technologies Phone: +7 920 018 9404 Skype: serge.arkhipov sarkhi...@asdco.ru asdtech.co ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Purpose of the s3gw.fcgi script?
- Original Message - From: Francois Lafont flafdiv...@free.fr To: ceph-users@lists.ceph.com Sent: Monday, April 13, 2015 5:17:47 PM Subject: Re: [ceph-users] Purpose of the s3gw.fcgi script? Hi, Yehuda Sadeh-Weinraub wrote: You're not missing anything. The script was only needed when we used the process manager of the fastcgi module, but it has been very long since we stopped using it. Just to be sure, so if I understand well, these parts of the documentation: 1. http://docs.ceph.com/docs/master/radosgw/config/#create-a-cgi-wrapper-script 2. http://docs.ceph.com/docs/master/radosgw/config/#adjust-cgi-wrapper-script-permission can be completely skipped. Is it correct? Yes. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Swift and Ceph
Sounds like you're hitting a known issue that was fixed a while back (although might not be fixed on the specific version you're running). Can you try creating a second subuser for the same user, see if that one works? Yehuda - Original Message - From: alistair whittle alistair.whit...@barclays.com To: ceph-users@lists.ceph.com Sent: Thursday, April 23, 2015 8:38:44 AM Subject: [ceph-users] Swift and Ceph All, I was hoping for some advice. I have recently built a Ceph cluster on RHEL 6.5 and have configured RGW. I want to test Swift API access, and as a result have created a user, swift subuser and swift keys as per the output below: 1. Create user radosgw-admin user create --uid=testuser1 --display-name=Test User1 { user_id: testuser1, display_name: Test User1, email: , suspended: 0, max_buckets: 1000, auid: 0, subusers: [], keys: [ { user: testuser1, access_key: MJBEZLJ7BYG8XODXT71V, secret_key: tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p}], swift_keys: [], caps: [], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, temp_url_keys: []} 2. Create subuser. radosgw-admin subuser create --uid=testuser1 --subuser=testuser1:swift --access=full { user_id: testuser1, display_name: Test User1, email: , suspended: 0, max_buckets: 1000, auid: 0, subusers: [ { id: testuser1:swift, permissions: full-control}], keys: [ { user: testuser1:swift, access_key: HX9Q30EJWCZG825AT7B0, secret_key: }, { user: testuser1, access_key: MJBEZLJ7BYG8XODXT71V, secret_key: tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p}], swift_keys: [], caps: [], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, temp_url_keys: []} 3. 
Create key radosgw-admin key create --subuser=testuser1:swift --key-type=swift --gen-secret { user_id: testuser1, display_name: Test User1, email: , suspended: 0, max_buckets: 1000, auid: 0, subusers: [ { id: testuser1:swift, permissions: full-control}], keys: [ { user: testuser1:swift, access_key: HX9Q30EJWCZG825AT7B0, secret_key: }, { user: testuser1, access_key: MJBEZLJ7BYG8XODXT71V, secret_key: tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p}], swift_keys: [ { user: testuser1:swift, secret_key: KpQCfPLstJhSMsR9qUzY9WfA1ebO4x7VRXkr1KSf}], caps: [], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, temp_url_keys: []} When I try and do anything using the credentials above, I get “Account not found” errors as per the example below: swift -A https://FQDN/auth/1.0 -U testuser1:swift -K KpQCfPLstJhSMsR9qUzY9WfA1ebO4x7VRXkr1KSf list That’s the first thing. Secondly, when I follow the process above to create a second user “testuser2”, the user and subuser is created, however, when I try and generate a swift key for it, I get the following error: radosgw-admin key create --subuser=testuser2:swift --key-type=swift --gen-secret could not create key: unable to add access key, unable to store user info 2015-04-23 15:42:38.897090 7f38e157d820 0 WARNING: can't store user info, swift id () already mapped to another user (testuser2) This suggests there is something wrong with the users or the configuration of the gateway somewhere. Can someone provide some advice on what might be wrong, or where I can look to find out. I have gone through whatever log files I can and don’t see anything of any use at the moment. Any help appreciated. 
Thanks Alistair ___ This message is for information purposes only, it is not a recommendation, advice, offer or solicitation to buy or sell a product or service nor an official confirmation of any transaction. It is directed at persons who are professionals and is not intended for retail customer use. Intended for recipient only. This message is subject to the terms at: www.barclays.com/emaildisclaimer . For important disclosures, please see: www.barclays.com/salesandtradingdisclaimer regarding market commentary from Barclays Sales and/or Trading, who are active market participants; and in respect of Barclays Research, including disclosures relating to specific issuers, please see http://publicresearch.barclays.com . ___ ___
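The check suggested in the reply — creating a second subuser for the same user and generating its swift key — mirrors the commands already used above:

```sh
radosgw-admin subuser create --uid=testuser1 --subuser=testuser1:swift2 --access=full
radosgw-admin key create --subuser=testuser1:swift2 --key-type=swift --gen-secret
# then try the new subuser's secret:
swift -A https://FQDN/auth/1.0 -U testuser1:swift2 -K <new-secret> list
```

If the second subuser authenticates while the first does not, that points at the known secret-mapping bug rather than a gateway misconfiguration.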
Re: [ceph-users] Shadow Files
These ones: http://tracker.ceph.com/issues/10295 http://tracker.ceph.com/issues/11447 - Original Message - From: Ben Jackson b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 3:06:02 PM Subject: Re: [ceph-users] Shadow Files We were firefly, then we upgraded to giant, now we are on hammer. What issues? On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What version are you running? There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - From: Ben b@benjackson.email To: ceph-users ceph-us...@ceph.com Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com Sent: Thursday, April 23, 2015 7:42:06 PM Subject: [ceph-users] Shadow Files We are still experiencing a problem with our gateway not properly clearing out shadow files. I have done numerous tests where I have:
- Uploaded a file of 1.5GB in size using the s3browser application
- Done an object stat on the file to get its prefix
- Done rados ls -p .rgw.buckets | grep prefix to count the number of shadow files associated (in this case it is around 290 shadow files)
- Deleted said file with s3browser
- Performed a gc list, which shows the ~290 files listed
- Waited 24 hours to redo the rados ls -p .rgw.buckets | grep prefix to recount the shadow files, only to be left with 290 files still there
From log output /var/log/ceph/radosgw.log, I can see the following when clicking DELETE (this appears 290 times):
2015-04-24 10:43:29.996523 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996557 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996564 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996570 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996576 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996581 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996586 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996592 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule->part_size=0
In this same log, I also see the gc process saying it is removing said file (these records appear 290 times too):
2015-04-23 14:16:27.926952 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.928572 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.929636 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.930448 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.931226 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.932103 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.933470 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
So even though it appears that the GC is processing its removal, the shadow files remain! Please help! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
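For anyone debugging the same symptom, the relevant garbage-collection state can be inspected and the collector run by hand with radosgw-admin (the associated tuning options are the rgw_gc_* settings):

```sh
# objects scheduled for deletion, including entries whose grace period
# has not yet expired
radosgw-admin gc list --include-all

# run a garbage-collection pass immediately instead of waiting for the timer
radosgw-admin gc process
```

If entries persist in the gc list across manual processing, that points at a gateway-side bug (as in the tracker issues cited above) rather than at a timing problem.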
Re: [ceph-users] Shadow Files
What version are you running? There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - From: Ben b@benjackson.email To: ceph-users ceph-us...@ceph.com Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com Sent: Thursday, April 23, 2015 7:42:06 PM Subject: [ceph-users] Shadow Files We are still experiencing a problem with out gateway not properly clearing out shadow files. I have done numerous tests where I have: -Uploaded a file of 1.5GB in size using s3browser application -Done an object stat on the file to get its prefix -Done rados ls -p .rgw.buckets | grep prefix to count the number of shadow files associated (in this case it is around 290 shadow files) -Deleted said file with s3browser -Performed a gc list, which shows the ~290 files listed -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep prefix to recount the shadow files only to be left with 290 files still there From log output /var/log/ceph/radosgw.log, I can see the following when clicking DELETE (this appears 290 times) 2015-04-24 10:43:29.996523 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996557 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996564 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996570 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996576 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996581 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996586 7f0b0afb5700 0 RGWObjManifest::operator++(): result: 
ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996592 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule-part_size=0 In this same log, I also see the gc process saying it is removing said file (these records appear 290 times too) 2015-04-23 14:16:27.926952 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.928572 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.929636 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.930448 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.931226 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.932103 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.933470 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname So even though it appears that the GC is processing its removal, the shadow files remain! Please help! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Shadow Files
Yeah, that's definitely something that we'd address soon. Yehuda - Original Message - From: Ben b@benjackson.email To: Ben Hines bhi...@gmail.com, Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 5:14:11 PM Subject: Re: [ceph-users] Shadow Files Definitely need something to help clear out these old shadow files. I'm sure our cluster has around 100TB of these shadow files. I've written a script to go through known objects to get prefixes of objects that should exist to compare to ones that shouldn't, but the time it takes to do this over millions and millions of objects is just too long. On 25/04/15 09:53, Ben Hines wrote: When these are fixed it would be great to get good steps for listing / cleaning up any orphaned objects. I have suspicions this is affecting us. thanks- -Ben On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: These ones: http://tracker.ceph.com/issues/10295 http://tracker.ceph.com/issues/11447 - Original Message - From: Ben Jackson b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 3:06:02 PM Subject: Re: [ceph-users] Shadow Files We were firefly, then we upgraded to giant, now we are on hammer. What issues? On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What version are you running? There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - From: Ben b@benjackson.email To: ceph-users ceph-us...@ceph.com Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com Sent: Thursday, April 23, 2015 7:42:06 PM Subject: [ceph-users] Shadow Files We are still experiencing a problem with out gateway not properly clearing out shadow files. 
I have done numerous tests where I have: -Uploaded a file of 1.5GB in size using s3browser application -Done an object stat on the file to get its prefix -Done rados ls -p .rgw.buckets | grep prefix to count the number of shadow files associated (in this case it is around 290 shadow files) -Deleted said file with s3browser -Performed a gc list, which shows the ~290 files listed -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep prefix to recount the shadow files only to be left with 290 files still there From log output /var/log/ceph/radosgw.log, I can see the following when clicking DELETE (this appears 290 times) 2015-04-24 10:43:29.996523 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996557 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996564 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996570 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996576 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996581 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996586 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996592 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule-part_size=0 In this same log, I also see the gc process saying it is removing said file (these records appear 290 times too) 2015-04-23 14:16:27.926952 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.928572 7f15be0ee700 0 
gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.929636 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.930448 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.931226 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.932103 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.933470 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname So even though it appears that the GC is processing its removal, the shadow files remain! Please help! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation
- Original Message - From: Sean seapasu...@uchicago.edu To: ceph-users@lists.ceph.com Sent: Tuesday, April 28, 2015 2:52:35 PM Subject: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation Hey yall! I have a weird issue and I am not sure where to look, so any help would be appreciated. I have a large ceph giant cluster that has been stable and healthy almost entirely since its inception. We have stored over 1.5PB into the cluster through RGW and everything seems to be functioning great. We have downloaded smaller objects without issue, but last night we did a test on our largest file (almost 1 terabyte) and it continuously times out at almost the exact same place. Investigating further, it looks like Civetweb/RGW is reporting that the uploads completed even though the objects are truncated. At least when we download the objects they seem to be truncated. I have tried searching through the mailing list archives to see what may be going on, but it looks like the mailing list DB may be going through some maintenance: Unable to read word database file '/dh/mailman/dap/archives/private/ceph-users-ceph.com/htdig/db.words.db' After checking through the gzipped logs I see that civetweb just stops logging after a rotation for some reason as well, and my last log is from the 28th of March. I tried manually running /etc/init.d/radosgw reload but this didn't seem to work. As running the download again could take all day to error out, we instead use a range request to try and pull the missing bytes. https://gist.github.com/MurphyMarkW/8e356823cfe00de86a48 -- here is the code we are using to download via S3 / boto, as well as the returned size report and an overview of our issue. http://pastebin.com/cVLdQBMF -- here is some of the log from the civetweb server they are hitting.
Here is our current config :: http://pastebin.com/2SGfSDYG Current output of ceph health :: http://pastebin.com/3f6iJEbu I am thinking that this must be a civetweb/radosgw bug of some kind. My questions are: 1.) Is there a way to try and download the object via rados directly? I am guessing I will need to find the prefix and then just cat all of the pieces together and hope I get it right. 2.) Why would ceph say the upload went fine but then return a smaller object? Note that the returned http response is 206 (partial content): /var/log/radosgw/client.radosgw.log:2015-04-28 16:08:26.525268 7f6e93fff700 2 req 0:1.067030:s3:GET /tcga_cghub_protected/ff9b730c-d303-4d49-b28f-e0bf9d8f1c84/759366461d2bf8bb0583d5b9566ce947.bam:get_obj:http status=206 It'll only return that if partial content is requested (through the http Range header). It's really hard to tell from these logs whether there's any actual problem. I suggest bumping up the log level (debug ms = 1, debug rgw = 20) and taking a look at an entire request (one that includes all the request http headers). Yehuda
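For reference, the log levels suggested above can be set in the gateway's section of ceph.conf and the radosgw process restarted. The section name below is an assumption — use whatever your gateway instance is actually called:

```ini
; hypothetical gateway section name -- match your own instance
[client.radosgw.gateway]
    debug ms = 1
    debug rgw = 20
```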
Re: [ceph-users] Shadow Files
It will get to the ceph mainline eventually. We're still reviewing and testing the fix, and there's more work to be done on the cleanup tool. Yehuda - Original Message - From: Ben b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Sunday, April 26, 2015 11:02:23 PM Subject: Re: [ceph-users] Shadow Files Are these fixes going to make it into the repository versions of ceph, or will we be required to compile and install manually? On 2015-04-26 02:29, Yehuda Sadeh-Weinraub wrote: Yeah, that's definitely something that we'd address soon. Yehuda - Original Message - From: Ben b@benjackson.email To: Ben Hines bhi...@gmail.com, Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 5:14:11 PM Subject: Re: [ceph-users] Shadow Files Definitely need something to help clear out these old shadow files. I'm sure our cluster has around 100TB of these shadow files. I've written a script to go through known objects to get prefixes of objects that should exist to compare to ones that shouldn't, but the time it takes to do this over millions and millions of objects is just too long. On 25/04/15 09:53, Ben Hines wrote: When these are fixed it would be great to get good steps for listing / cleaning up any orphaned objects. I have suspicions this is affecting us. thanks- -Ben On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: These ones: http://tracker.ceph.com/issues/10295 http://tracker.ceph.com/issues/11447 - Original Message - From: Ben Jackson b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 3:06:02 PM Subject: Re: [ceph-users] Shadow Files We were firefly, then we upgraded to giant, now we are on hammer. What issues? On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What version are you running? 
There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - From: Ben b@benjackson.email To: ceph-users ceph-us...@ceph.com Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com Sent: Thursday, April 23, 2015 7:42:06 PM Subject: [ceph-users] Shadow Files We are still experiencing a problem with out gateway not properly clearing out shadow files. I have done numerous tests where I have: -Uploaded a file of 1.5GB in size using s3browser application -Done an object stat on the file to get its prefix -Done rados ls -p .rgw.buckets | grep prefix to count the number of shadow files associated (in this case it is around 290 shadow files) -Deleted said file with s3browser -Performed a gc list, which shows the ~290 files listed -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep prefix to recount the shadow files only to be left with 290 files still there From log output /var/log/ceph/radosgw.log, I can see the following when clicking DELETE (this appears 290 times) 2015-04-24 10:43:29.996523 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996557 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996564 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996570 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996576 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996581 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996586 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=29884416 
stripe_ofs=29884416 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996592 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule-part_size=0 In this same log, I also see the gc process saying it is removing said file (these records appear 290 times too) 2015-04-23 14:16:27.926952 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.928572 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.929636 7f15be0ee700 0 gc
Re: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation
- Original Message - From: Sean seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Friday, May 1, 2015 6:47:09 PM Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hey there, Sorry for the delay. I have been moving apartments UGH. Our dev team found out how to quickly identify these files that are downloading a smaller size:: iterate through all of the objects in a bucket and call for a key.size in each item and compare it to conn.get_bucket().get_key().size of each key and the sizes differ. If the sizes differ these correspond exactly to any object that seems to have missing objects in ceph. The objects always seem to be intervals of 512k as well which is really odd. == http://pastebin.com/R34wF7PB == My main question is why are these sizes different at all? Shouldn't they be exactly the same? Why are they off by multiples of 512k as well? Finally I need a way to rule out that this is a ceph issue and the only way I can think of is grabbing a list of all of the data files and concatenating them together in order in hopes that the manifest is wrong and I get the whole file. For example:: implicit size 7745820218 explicit size 7744771642. Absolute 1048576; name = 86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam I explicitly called one of the gateways and then piped the output to a text file while downloading this bam: https://drive.google.com/file/d/0B16pfLB7yY6GcTZXalBQM3RHT0U/view?usp=sharing (25 Mb of text) As we can see above. Ceph is saying that the size is 7745820218 bytes somewhere but when we download it we get 7744771642 bytes. If I download There are two different things: the bucket index, and the object manifest. The bucket index has the former, and the object manifest specifies the latter. the object I get a 7744771642 byte file. 
Finally, if I do a range request of all of the bytes from 7744771642 to the end, I get a "cannot complete request" error:: http://pastebin.com/CVvmex4m -- traceback of the python range request. http://pastebin.com/4sd1Jc0G -- the rados log of the range request If I request the file with a shorter range (say 7744771642 - 2 bytes (7744771640)) I am left with just a 2 byte file:: http://pastebin.com/Sn7Y0t9G -- range request of file - 2 bytes to end of file. lacadmin@kh10-9:~$ ls -lhab 7gtest-range.bam -rw-r--r-- 1 lacadmin lacadmin 2 Feb 24 01:00 7gtest-range.bam I think that rados-gw may not be keeping track of the multipart chunk errors, possibly? How did rados get the original and correct file size, and why is it short when it returns the actual chunks? Finally, why are the corrupt / missing chunks always a multiple of 512K? I do not see anything obvious that is set to 512K on the configuration/user side. Sorry for the questions and babbling, but I am at a loss as to how to address this. So, the question is which is correct, the index, or the object itself. Do you have any way to know which one is the correct one? Also, does it only happen to you with very large objects? Does it happen with every such object (e.g., 4GBs)? Here's some extra information you could gather: - Get the object manifest: $ radosgw-admin object stat --bucket=<bucket> --object=<object> - Get status for each rados object corresponding to the logical rgw object: First, identify the object names that correspond to this specific rgw object. From the manifest you'd get a 'prefix', which is a random hash that all tail objects should contain. Then you should do something like: $ rados -p <data pool, e.g., .rgw.buckets> ls | grep $prefix And then, for each object: $ rados -p <data pool, e.g., .rgw.buckets> stat $object There's also the head object that you'd want to inspect (named after the actual rgw object name, grep for it too).
HTH, Yehuda On 04/28/2015 05:03 PM, Yehuda Sadeh-Weinraub wrote: - Original Message - From: Sean seapasu...@uchicago.edu To: ceph-users@lists.ceph.com Sent: Tuesday, April 28, 2015 2:52:35 PM Subject: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hey yall! I have a weird issue and I am not sure where to look so any help would be appreciated. I have a large ceph giant cluster that has been stable and healthy almost entirely since its inception. We have stored over 1.5PB into the cluster currently through RGW and everything seems to be functioning great. We have downloaded smaller objects without issue but last night we did a test on our largest file (almost 1 terabyte) and it continuously times out at almost the exact same place. Investigating further it looks like Civetweb/RGW is returning that the uploads completed even though the objects are truncated. At least when we download
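The size discrepancy discussed above (index size vs. downloaded size, always off by multiples of 512K) can be checked mechanically. A small sketch using the two sizes quoted in the message; the 512 KiB chunk unit is the thread's own observation, not a documented RGW constant:

```python
# Check whether the gap between the size recorded in the bucket index
# and the size the download actually returned is a whole number of
# 512 KiB chunks -- the pattern reported in this thread.

CHUNK = 512 * 1024  # 512 KiB, the suspected unit of missing data

def missing_chunks(index_size, downloaded_size):
    """Return how many 512 KiB chunks are unaccounted for, or None if
    the difference is not a clean multiple of 512 KiB."""
    gap = index_size - downloaded_size
    return gap // CHUNK if gap % CHUNK == 0 else None

# Sizes quoted in the message above (7745820218 vs 7744771642):
print(missing_chunks(7745820218, 7744771642))  # 2  (exactly 1 MiB missing)
```

If `missing_chunks` returned None for some objects, the 512K theory would be out; here the gap is exactly two chunks, consistent with the report.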
Re: [ceph-users] Shadow Files
I've been working on a new tool that would detect leaked rados objects. It will take some time for it to be merged into an official release, or even into the master branch, but if anyone likes to play with it, it is in the wip-rgw-orphans branch. At the moment I recommend not removing any object that the tool reports, but rather moving it to a different pool for backup (using the rados tool's cp command). The tool works in a few stages: (1) list all the rados objects in the specified pool, store in repository (2) list all bucket instances in the system, store in repository (3) iterate through bucket instances in the repository, list (logical) objects, and for each object store the expected rados objects that build it (4) compare data from (1) and (3); for each object that is in (1), but not in (3), stat it, and if it is older than $start_time - $stale_period, report it There can be lots of things that can go wrong with this, so we really need to be careful here. The tool can be run by the following command: $ radosgw-admin orphans find --pool=<data pool> --job-id=<name> [--num-shards=<num shards>] [--orphan-stale-secs=<seconds>] The tool can be stopped and restarted, and it will continue from the stage where it stopped. Note that some of the stages will restart from the beginning (of the stage), due to system limitations (specifically 1, 2). In order to clean up a job's data: $ radosgw-admin orphans finish --job-id=<name> Note that the jobs run in the radosgw-admin process context; it does not schedule a job on the radosgw process. Please let me know of any issue you find. Thanks, Yehuda - Original Message - From: Ben Hines bhi...@gmail.com To: Ben b@benjackson.email Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com, ceph-users ceph-us...@ceph.com Sent: Thursday, April 30, 2015 3:00:16 PM Subject: Re: [ceph-users] Shadow Files Going to hold off on our 94.1 update for this issue Hopefully this can make it into a 94.2 or a v95 git release.
-Ben On Mon, Apr 27, 2015 at 2:32 PM, Ben b@benjackson.email wrote: How long are you thinking here? We added more storage to our cluster to overcome these issues, and we can't keep throwing storage at it until the issues are fixed. On 28/04/15 01:49, Yehuda Sadeh-Weinraub wrote: It will get to the ceph mainline eventually. We're still reviewing and testing the fix, and there's more work to be done on the cleanup tool. Yehuda - Original Message - From: Ben b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Sunday, April 26, 2015 11:02:23 PM Subject: Re: [ceph-users] Shadow Files Are these fixes going to make it into the repository versions of ceph, or will we be required to compile and install manually? On 2015-04-26 02:29, Yehuda Sadeh-Weinraub wrote: Yeah, that's definitely something that we'd address soon. Yehuda - Original Message - From: Ben b@benjackson.email To: Ben Hines bhi...@gmail.com , Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 5:14:11 PM Subject: Re: [ceph-users] Shadow Files Definitely need something to help clear out these old shadow files. I'm sure our cluster has around 100TB of these shadow files. I've written a script to go through known objects to get prefixes of objects that should exist to compare to ones that shouldn't, but the time it takes to do this over millions and millions of objects is just too long. On 25/04/15 09:53, Ben Hines wrote: When these are fixed it would be great to get good steps for listing / cleaning up any orphaned objects. I have suspicions this is affecting us. 
thanks- -Ben On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: These ones: http://tracker.ceph.com/issues/10295 http://tracker.ceph.com/issues/11447 - Original Message - From: Ben Jackson b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 3:06:02 PM Subject: Re: [ceph-users] Shadow Files We were firefly, then we upgraded to giant, now we are on hammer. What issues? On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What version are you running? There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - From: Ben b@benjackson.email To: ceph-users ceph-us...@ceph.com Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com Sent: Thursday, April 23, 2015 7:42:06 PM Subject: [ceph-users] Shadow Files We are still experiencing a problem with out gateway not properly clearing out shadow files. I have done numerous tests where I have: -Uploaded a file of 1.5GB in size using s3browser application -Done an object stat on the file to get its prefix -Done
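Stage (4) of the orphan scan described earlier in this thread is essentially a set difference plus an mtime cutoff. A minimal sketch of that comparison — function and field names here are illustrative, not the tool's actual internals:

```python
import time

def find_orphans(listed, expected, mtimes, stale_secs, start_time=None):
    """listed: set of rados object names seen in the pool (stage 1).
    expected: set of names reconstructed from bucket manifests (stage 3).
    mtimes: dict name -> mtime (epoch seconds), e.g. from `rados stat`.
    Only candidates older than start_time - stale_secs are reported, so
    uploads in flight during the scan are not flagged as orphans."""
    if start_time is None:
        start_time = time.time()
    cutoff = start_time - stale_secs
    return sorted(name for name in listed - expected
                  if mtimes.get(name, start_time) < cutoff)

listed = {"obj_a", "obj_b", "obj_c"}
expected = {"obj_a"}
mtimes = {"obj_b": 1000.0, "obj_c": 9000.0}
# With start_time=10000 and a 2000s stale window, only obj_b qualifies
# (obj_c is too recent and might still be mid-upload):
print(find_orphans(listed, expected, mtimes, 2000, start_time=10000.0))
# ['obj_b']
```

This is why the stale window matters: without it, any object whose manifest has not yet been written would look leaked.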
Re: [ceph-users] RGW - Can't download complete object
- Original Message - From: Sean seapasu...@uchicago.edu To: ceph-users@lists.ceph.com Sent: Thursday, May 7, 2015 3:35:14 PM Subject: [ceph-users] RGW - Can't download complete object I have another thread goign on about truncation of objects and I believe this is a separate but equally bad issue in civetweb/radosgw. My cluster is completely healthy I have one (possibly more) objects stored in ceph rados gateway that will return a different size every time I Try to download it:: http://pastebin.com/hK1iqXZH --- ceph -s http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object The two interesting things that I see here is: - the multipart upload size for each part is on the big side (is it 1GB for each part?) - it seems that there are a lot of parts that suffered from retries, could be a source for the 512k issue http://pastebin.com/5TnvgMrX --- python download code The weird part is every time I download the file it is of a different size. I am grabbing the individual objects of the 14g file and will update this email once I have them all statted out. Currently I am getting, on average, 1.5G to 2Gb files when the total object should be 14G in size. lacadmin@kh10-9:~$ python corruptpull.py the download failed. The filesize = 2125988202. The actual size is 14577056082. Attempts = 1 the download failed. The filesize = 2071462250. The actual size is 14577056082. Attempts = 2 the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 3 the download failed. The filesize = 1643643242. The actual size is 14577056082. Attempts = 4 the download failed. The filesize = 1597505898. The actual size is 14577056082. Attempts = 5 the download failed. The filesize = 2075656554. The actual size is 14577056082. Attempts = 6 the download failed. The filesize = 650117482. The actual size is 14577056082. Attempts = 7 the download failed. The filesize = 1987576170. The actual size is 14577056082. Attempts = 8 the download failed. 
The filesize = 2109210986. The actual size is 14577056082. Attempts = 9 the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 10 the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 11 the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 12 the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 13 the download failed. The filesize = 1467482474. The actual size is 14577056082. Attempts = 14 the download failed. The filesize = 2046296426. The actual size is 14577056082. Attempts = 15 the download failed. The filesize = 2021130602. The actual size is 14577056082. Attempts = 16 the download failed. The filesize = 177366. The actual size is 14577056082. Attempts = 17 the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 18 the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 19 the download failed. The filesize = 1983381866. The actual size is 14577056082. Attempts = 20 the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 21 Notice it is always different. Once the rados -p .rgw.buckets ls | grep finishes I will return the listing of objects as well but this is quite odd and I think this is a separate issue. Has anyone seen this before? Why wouldn't radosgw return an error and why am I getting different file sizes? Usually that means that there was some error in the middle of the download, maybe client to radosgw communication issue. What does the radosgw show when this happens? I would post the log from radosgw but I don't see any err|wrn|fatal mentions in the log and the client completes without issue every time. 
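One way to narrow down the failure mode in the report above: if repeated downloads stop at the *same* short size, the truncation likely lives in the stored object or its manifest; if the size *varies* per attempt (as it does here), the stream is being cut mid-transfer. A rough heuristic sketch, not a diagnosis:

```python
def classify_short_downloads(observed_sizes, expected_size):
    """observed_sizes: byte counts from repeated download attempts.
    Heuristic only: consistent truncation points at the stored
    data/manifest, varying truncation at the transport."""
    if all(s == expected_size for s in observed_sizes):
        return "complete"
    if len(set(observed_sizes)) == 1:
        return "consistent-truncation (suspect object/manifest)"
    return "varying-truncation (suspect transport/connection)"

# A few of the sizes from the report above, against the expected
# 14577056082 bytes:
sizes = [2125988202, 2071462250, 2016936298, 650117482]
print(classify_short_downloads(sizes, 14577056082))
# varying-truncation (suspect transport/connection)
```

The varying result is consistent with Yehuda's suggestion that the connection between client and radosgw is dropping mid-download rather than the object itself being short.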
Re: [ceph-users] Shadow Files
Yes, so it seems. The librados::nobjects_begin() call expects at least a Hammer (0.94) backend. Probably need to add a try/catch there to catch this issue, and maybe see if using a different api would be better compatible with older backends. Yehuda - Original Message - From: Anthony Alba ascanio.al...@gmail.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Ben b@benjackson.email, ceph-users ceph-us...@ceph.com Sent: Tuesday, May 5, 2015 10:14:38 AM Subject: Re: [ceph-users] Shadow Files Unfortunately it immediately aborted (running against a 0.80.9 Ceph). Does Ceph also have to be a 0.94 level? last error was -3 2015-05-06 01:11:11.710947 7f311dd15880 0 run(): building index of all objects in pool -2 2015-05-06 01:11:11.710995 7f311dd15880 1 -- 10.200.3.92:0/1001510 -- 10.200.3.32:6800/1870 -- osd_op(client.4065115.0:27 ^A/ [pgnls start_epoch 0] 11.0 ack+read +known_if_redirected e952) v5 -- ?+0 0x39a4e80 con 0x39a4aa0 -1 2015-05-06 01:11:11.712125 7f31026f4700 1 -- 10.200.3.92:0/1001510 == osd.1 10.200.3.32:6800/1870 1 osd_op_reply(27 [pgnls start_epoch 0] v934'6252 uv6252 ondisk = -22 ((22) Invalid argument)) v6 167+0+0 (3260127617 0 0) 0x7f30c4000a90 con 0x39a4aa0 0 2015-05-06 01:11:11.712652 7f311dd15880 -1 *** Caught signal (Aborted) ** in thread 7f311dd15880 2015-05-06 01:11:11.710947 7f311dd15880 0 run(): building index of all objects in pool terminate called after throwing an instance of 'std::runtime_error' what(): rados returned (22) Invalid argument *** Caught signal (Aborted) ** in thread 7f311dd15880 ceph version 0.94-1339-gc905d51 (c905d517c2c778a88b006302996591b60d167cb6) 1: radosgw-admin() [0x61e604] 2: (()+0xf130) [0x7f311a59f130] 3: (gsignal()+0x37) [0x7f31195d85d7] 4: (abort()+0x148) [0x7f31195d9cc8] 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f3119edc9b5] 6: (()+0x5e926) [0x7f3119eda926] 7: (()+0x5e953) [0x7f3119eda953] 8: (()+0x5eb73) [0x7f3119edab73] 9: (()+0x4d116) [0x7f311b606116] 10: (librados::IoCtx::nobjects_begin()+0x2e) 
[0x7f311b60c60e] 11: (RGWOrphanSearch::build_all_oids_index()+0x62) [0x516a02] 12: (RGWOrphanSearch::run()+0x1e3) [0x51ad23] 13: (main()+0xa430) [0x4fbc30] 14: (__libc_start_main()+0xf5) [0x7f31195c4af5] 15: radosgw-admin() [0x5028d9] 2015-05-06 01:11:11.712652 7f311dd15880 -1 *** Caught signal (Aborted) ** in thread 7f311dd15880 ceph version 0.94-1339-gc905d51 (c905d517c2c778a88b006302996591b60d167cb6) 1: radosgw-admin() [0x61e604] 2: (()+0xf130) [0x7f311a59f130] On Tue, May 5, 2015 at 10:41 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: Can you try creating the .log pool? Yehda - Original Message - From: Anthony Alba ascanio.al...@gmail.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Ben b@benjackson.email, ceph-users ceph-us...@ceph.com Sent: Tuesday, May 5, 2015 3:37:15 AM Subject: Re: [ceph-users] Shadow Files ...sorry clicked send to quickly /opt/ceph/bin/radosgw-admin orphans find --pool=.rgw.buckets --job-id=abcd ERROR: failed to open log pool ret=-2 job not found On Tue, May 5, 2015 at 6:36 PM, Anthony Alba ascanio.al...@gmail.com wrote: Hi Yehuda, First run: /opt/ceph/bin/radosgw-admin --pool=.rgw.buckets --job-id=testing ERROR: failed to open log pool ret=-2 job not found Do I have to precreate some pool? On Tue, May 5, 2015 at 8:17 AM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: I've been working on a new tool that would detect leaked rados objects. It will take some time for it to be merged into an official release, or even into the master branch, but if anyone likes to play with it, it is in the wip-rgw-orphans branch. At the moment I recommend to not remove any object that the tool reports, but rather move it to a different pool for backup (using the rados tool cp command). 
The tool works in a few stages:
(1) list all the rados objects in the specified pool, and store them in a repository
(2) list all bucket instances in the system, and store them in a repository
(3) iterate through the bucket instances in the repository, list their (logical) objects, and for each object store the expected rados objects that build it
(4) compare the data from (1) and (3); for each object that is in (1) but not in (3), stat it, and if it is older than $start_time - $stale_period, report it
A lot can go wrong with this, so we really need to be careful here. The tool can be run with the following command:
$ radosgw-admin orphans find --pool=data pool --job-id=name [--num-shards=num shards] [--orphan-stale-secs=seconds]
The tool can be stopped and restarted, and it will continue from the stage where it stopped. Note that some of the stages will restart from the beginning (of that stage), due to system limitations (specifically 1, 2
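A minimal sketch of the stage (4) comparison described above, with hypothetical names and inputs; this is not the actual radosgw-admin implementation, just an illustration of the orphan test:

```python
import time

def find_orphans(rados_objects, expected_objects, mtimes, stale_secs, now=None):
    """Stage (4): anything listed in the pool (stage 1) but not referenced by
    any bucket instance (stage 3) is an orphan candidate; only report it if it
    is older than the stale window, so in-flight uploads are not flagged."""
    now = now if now is not None else time.time()
    orphans = []
    for name in rados_objects - expected_objects:
        # stat the candidate: check its age against the stale window
        if now - mtimes[name] > stale_secs:
            orphans.append(name)
    return sorted(orphans)

# Example: 'leaked' is old enough to report, 'in-flight' is not.
pool = {"head.1", "shadow.1", "leaked", "in-flight"}
expected = {"head.1", "shadow.1"}
ages = {"leaked": 0, "in-flight": 990}      # mtimes (seconds since epoch)
print(find_orphans(pool, expected, ages, stale_secs=900, now=1000))  # prints ['leaked']
```

The stale window is the same safeguard `--orphan-stale-secs` provides on the command line.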
Re: [ceph-users] Shadow Files
- Original Message - From: Daniel Hoffman daniel.hoff...@13andrew.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Ben b@benjackson.email, ceph-users ceph-us...@ceph.com Sent: Sunday, May 10, 2015 5:03:22 PM Subject: Re: [ceph-users] Shadow Files Any updates on when this is going to be released? Daniel On Wed, May 6, 2015 at 3:51 AM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: Yes, so it seems. The librados::nobjects_begin() call expects at least a Hammer (0.94) backend. Probably need to add a try/catch there to catch this issue, and maybe see if using a different api would be more compatible with older backends. Yehuda I cleaned up the commits a bit, but it needs to be reviewed, and it'll be nice to get some more testing on it before it goes into an official release. There's still the issue of running it against a firefly backend. I looked at backporting it to firefly, but it's not going to be trivial work, so I think a better use of time would be to get the hammer one to work against a firefly backend. There are some librados api quirks that we need to flush out first. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Shadow Files
It's the wip-rgw-orphans branch. - Original Message - From: Daniel Hoffman daniel.hoff...@13andrew.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Ben b@benjackson.email, David Zafman dzaf...@redhat.com, ceph-users ceph-us...@ceph.com Sent: Monday, May 11, 2015 4:30:11 PM Subject: Re: [ceph-users] Shadow Files Thanks. Can you please let me know the suitable/best git version/tree to be pulling to compile and use this feature/patch? Thanks On Tue, May 12, 2015 at 4:38 AM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: From: Daniel Hoffman daniel.hoff...@13andrew.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Ben b@benjackson.email, ceph-users ceph-us...@ceph.com Sent: Sunday, May 10, 2015 5:03:22 PM Subject: Re: [ceph-users] Shadow Files Any updates on when this is going to be released? Daniel On Wed, May 6, 2015 at 3:51 AM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: Yes, so it seems. The librados::nobjects_begin() call expects at least a Hammer (0.94) backend. Probably need to add a try/catch there to catch this issue, and maybe see if using a different api would be better compatible with older backends. Yehuda I cleaned up the commits a bit, but it needs to be reviewed, and it'll be nice to get some more testing to it before it goes on an official release. There's still the issue of running it against a firefly backend. I looked at backporting it to firefly, but it's not going to be a trivial work, so I think the better time usage would be to get the hammer one to work against a firefly backend. There are some librados api quirks that we need to flush out first. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation
- Original Message - From: Sean seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Tuesday, May 5, 2015 12:14:19 PM Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hello Yehuda and the rest of the mailing list. My main question currently is why are the bucket index and the object manifest ever different? Based on how we are uploading data I do not think that the rados gateway should ever know the full file size without having all of the objects within ceph at one point in time. So after the multipart is marked as completed Rados gateway should cat through all of the objects and make a complete part, correct? That's what *should* happen, but obviously there's some bug there. Secondly, I think I am not understanding the process to grab all of the parts correctly. To continue to use my example file 86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam in bucket tcga_cghub_protected. I would be using the following to grab the prefix: prefix=$(radosgw-admin object stat --bucket=tcga_cghub_protected --object=86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam | grep -iE 'prefix' | awk -F\ '{print $4}') Which should take everything between quotes for the prefix key and give me the value. In this case:: prefix: 86b6fad8-3c53-465f-8758-2009d6df01e9\/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam.2\/YAROhWaAm9LPwCHeP55cD4CKlLC0B4S, So lacadmin@kh10-9:~$ echo ${prefix} 86b6fad8-3c53-465f-8758-2009d6df01e9\/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam.2\/YAROhWaAm9LPwCHeP55cD4CKlLC0B4S From here I list all of the objects in the .rgw.buckets pool and grep for that said prefix which yields 1335 objects. From here if I cat all of these objects together I only end up with a 5468160 byte file which is 2G short of what the object manifest says it should be. 
If I grab the file and tail the Rados gateway log I end up with 1849 objects, and when I sum them all up I end up with 7744771642, which is the same size that the manifest reports. How are these objects named? I understand that this does nothing other than verify the manifest's accuracy but I still find it interesting. The missing chunks may still exist in ceph outside of the object manifest and tagged with the same prefix, correct? Or am I misunderstanding something? Either it's missing a chunk, or one of the objects is truncated. Can you stat all the parts? I expect most of the objects to have two different sizes (e.g., 4MB, 1MB), but it is likely that the last part is smaller, and maybe another object is missing 512k. We have over 40384 files in the tcga_cghub_protected bucket and only 66 of these files are suffering from this truncation issue. What I need to know is: is this happening on the gateway side or on the client side? Next I need to know what possible actions can occur where the bucket index and the object manifest would be mismatched like this, as 40318 out of 40384 are working without issue. The truncated files are of all different sizes (5 megabytes - 980 gigabytes) and the truncation seems to be all over. By all over I mean some files are missing the first few bytes that should read bam and some are missing parts in the middle. Can you give an example of an object manifest for a broken object, and all the rados objects that build it (e.g., the output of 'rados stat' on these objects)? A smaller object might be easier. So our upload code is using mmap to stream chunks of the file to the Rados gateway via a multipart upload, but nowhere on the client side do we have a direct reference to the files we are using, nor do we specify the size in any way. So where is the gateway getting the correct complete file size from, and how is the bucket index showing the intended file size? 
This implies that, at some point in time, ceph was able to see all of the parts of the file and calculate the correct total size. This to me seems like a rados gateway bug regardless of how the file is being uploaded. I think that the RGW should be able to be fuzzed and still store the data correctly. Why is the bucket list not matching the bucket index, and how can I verify that the data is not being corrupted by the RGW, or worse, after it is committed to ceph? That's what we're trying to find out. Thanks, Yehuda
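Yehuda's suggestion to stat all the parts can be turned into a quick consistency check: sum the sizes of the rados objects (from 'rados stat') and compare against the manifest size. A sketch, with a hypothetical helper name and example inputs:

```python
def check_manifest(manifest_size, part_sizes, chunk=512 * 1024):
    """Compare the total of the rados objects' stat'ed sizes with the size
    recorded in the object manifest. A deficit that is an exact multiple of
    512KB matches the missing-chunk pattern discussed in this thread."""
    total = sum(part_sizes)
    deficit = manifest_size - total
    return {
        "total": total,
        "deficit": deficit,
        "whole_512k_chunks": deficit > 0 and deficit % chunk == 0,
    }

# A 9MB object whose parts only add up to 8.5MB: short by exactly one 512KB chunk.
result = check_manifest(9 * 1024 * 1024,
                        [4 * 1024 * 1024, 4 * 1024 * 1024, 512 * 1024])
print(result["deficit"], result["whole_512k_chunks"])  # prints: 524288 True
```

A deficit of zero means the parts account for the whole manifest; a non-zero deficit that is not a 512KB multiple would point at ordinary truncation instead.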
Re: [ceph-users] civetweb lockups
- Original Message - From: Daniel Hoffman daniel.hoff...@13andrew.com To: ceph-users ceph-us...@ceph.com Sent: Sunday, May 10, 2015 10:54:21 PM Subject: [ceph-users] civetweb lockups Hi All. We have a weird issue where civetweb just locks up: it fails to respond to HTTP, and a restart resolves the problem. This happens anywhere from every 60 seconds to every 4 hours with no reason behind it. We have run the gateway in full debug mode and there is nothing there that seems to be an issue. We run 2 gateways on 6-core machines; there is no load, CPU or memory wise, and the machines seem fine. They are load balanced behind HAProxy. We run 12 data nodes at the moment with ~170 disks. We see around 40-60MB/s into the array. Is this just too much for civetweb to handle? Should we look at virtual machines on the hardware/mode nodes?

[client.radosgw.ceph-obj02]
host = ceph-obj02
keyring = /etc/ceph/keyring.radosgw.ceph-obj02
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log
rgw data = /var/lib/ceph/radosgw/ceph-obj02
rgw thread pool size = 1024
rgw print continue = False
debug rgw = 0
debug ms = 0
rgw enable ops log = False
log to stderr = False
rgw enable usage log = False

Advice appreciated. Not sure what would be the issue. I'd look at the number of threads; maybe try reducing it and see if it makes any difference? Also, try to see how many open fds there are when it hangs. Yehuda
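Yehuda's last suggestion, watching the open fd count when civetweb hangs, can be scripted against /proc on Linux. A rough sketch; looking the radosgw pid up with pidof is an assumption about the deployment:

```python
import os

def count_open_fds(pid):
    """Count file descriptors currently open by a process, via /proc (Linux)."""
    return len(os.listdir("/proc/%d/fd" % pid))

# Demonstration: count our own open fds.
print(count_open_fds(os.getpid()))

# For the gateway (hypothetical; requires radosgw to be running), something like:
#   pid = int(subprocess.check_output(["pidof", "radosgw"]).split()[0])
#   print(count_open_fds(pid))
```

Polling this in a loop and logging the count alongside timestamps would show whether the lockups correlate with fd exhaustion (e.g. the 1024 thread pool each holding sockets open).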
Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation
Hi, Thank you for a very thorough investigation. See my comments below: - Original Message - From: Mark Murphy murphyma...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Sean Sullivan seapasu...@uchicago.edu, ceph-users@lists.ceph.com Sent: Tuesday, May 12, 2015 10:50:49 AM Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hey Yehuda, I work with Sean on the dev side. We thought we should put together a short report on what we’ve been seeing in the hopes that the behavior might make some sense to you. We had originally noticed these issues a while ago with our first iteration of this particular Ceph deployment. The issues we had seen were characterized by two different behaviors:
• Some objects would appear truncated, returning different sizes for each request. Repeated attempts would eventually result in a successful retrieval if the second behavior doesn’t apply.
This really sounds like some kind of networking issue, maybe a load balancer along the way that clobbers things?
• Some objects would always appear truncated, missing an integer multiple of 512KB. This is where the report that we are encountering ‘truncation’ came from, which is slightly misleading.
We recently verified that we are indeed encountering the first behavior, for which I believe Sean has supplied or will be supplying Ceph logs showcasing the server-side errors, and which is true truncation. However, the second behavior is not really truncation, but missing 512KB chunks, as Sean has brought up. We’ve had some luck identifying some of the patterns that seem related to this issue. Without going into too great detail, we’ve found the following appear to hold true for all objects affected by the second behavior:
• The amount of data missing is always in integer multiples of 512KB.
• The expected file size is always found via the bucket index.
• Ceph objects do not appear to be missing chunks or have holes in them. 
• The missing 512KB chunks are always at the beginning of multipart segments (1GB in our case).
This matches some of my original suspicions. Here's some basic background that might help clarify things: this looks like some kind of rgw bug. A radosgw object is usually composed of two different parts: the object head and the object tail. The head is usually composed of the first 512k of data of the object (and never more than that), and the tail has the rest of the object's data. However, the head data part is optional, and it can be zero. For example, in the case of multipart upload, after combining the parts, the head will not have any data, and the tail will be compiled out of the different parts' data. However, when dealing with multipart parts, the parts do not really have a head (due to their immutability), so the part object sizes are expected to be 4MB. So it seems that for some reason these specific parts were treated as if they had a head, although they shouldn't have. Now, that brings me to the issue, where I noticed that some of the parts were retried. When this happens, the part name is different from the default part name, so there's a note in the manifest, and special handling that starts at specific offsets. It might be that this is related, and the code that handles the retries generates bad object parts.
• For large files missing multiple chunks, the segments affected appear to be clustered and contiguous.
That would point at a cluster of retries, maybe due to networking issues around the time these were created. The first pattern was identified when we noticed that the bucket index and the object manifest differed in reported size. This is useful as a quick method of identifying affected objects. We’ve used this to avoid having to pull down and check each object individually. In total, we have 108 affected objects, which translates to approximately 0.25% of our S3 objects. 
We noticed that the bucket index always reports the object size that would be expected had the upload gone correctly. Since we only ever report the segment sizes to the gateway, this would suggest that the segment sizes were reported accurately and aggregated correctly server side. Sean identified the Ceph objects that compose one of our affected S3 objects. We thought we might see the first Ceph object missing some data, but found it to be a full 4MB. Retrieving the first Ceph object and comparing it to the bytes in the corresponding file, it appears that the Ceph object matches the 4MB of the file after the first 512KB. We took this as evidence that the data was never getting to Ceph in the first place. However, in our testing, we were unable to get the gateway to accept segments with less data than reported. Dissecting some of the affected objects, we were
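Yehuda's head/tail description can be sketched as a size calculator, under the assumptions stated in the thread (a head of at most 512KB, 4MB stripes, and multipart parts written with no head); the function name and defaults are hypothetical:

```python
def expected_rados_sizes(obj_size, has_head=True,
                         head_max=512 * 1024, stripe=4 * 1024 * 1024):
    """Sketch of the layout described in the thread: an optional head object
    holding at most the first 512KB, then tail (shadow) objects striped at
    4MB. Multipart parts have no head (has_head=False), so their rados
    objects should all be full 4MB stripes except the last."""
    sizes = []
    remaining = obj_size
    if has_head:
        head = min(head_max, remaining)
        sizes.append(head)
        remaining -= head
    while remaining > 0:
        chunk = min(stripe, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes

# A 9MB plain object: 512KB head, two 4MB shadows, then the remainder.
print(expected_rados_sizes(9 * 1024 * 1024))
# prints [524288, 4194304, 4194304, 524288]

# The same data as a multipart part (no head): 4MB stripes only.
print(expected_rados_sizes(9 * 1024 * 1024, has_head=False))
# prints [4194304, 4194304, 1048576]
```

Comparing such an expected list against the actual 'rados stat' output for each part is one way to spot a part that was wrongly written as if it had a head.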
Re: [ceph-users] RGW - Can't download complete object
That's another interesting issue. Note that for part 12_80 the manifest specifies (I assume, by the messenger log) this part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14'), whereas it seems that you do have the original part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80 (note the '2/...'). The part that the manifest specifies does not exist, which makes me think that there is some weird upload sequence, something like:
- client uploads part, upload finishes but client does not get an ack for it
- client retries (second upload)
- client gets the ack for the first upload and gives up on the second one
But I'm not sure if it would explain the manifest; I'll need to take a look at the code. Could such a sequence happen with the client that you're using to upload? Yehuda - Original Message - From: Sean Sullivan seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 2:07:22 PM Subject: Re: [ceph-users] RGW - Can't download complete object Sorry for the delay. It took me a while to figure out how to do a range request and append the data to a single file. The good news is that the end file seems to be 14G in size, which matches the file's manifest size. The bad news is that the file is completely corrupt and the radosgw log has errors. 
I am using the following code to perform the download:: https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py Here is a clip of the log file:: -- 2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.11 10.64.64.101:6809/942707 5 osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004 2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.45 10.64.64.101:6845/944590 2 osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304 2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2 2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.21 10.64.64.102:6856/1133473 16 osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316 2015-05-11 15:28:52.504072 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.82 10.64.64.103:6857/88524 2 osd_op_reply(74566283 
default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6 303+0+4194304 (1474509283 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420 2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12917407744 len=4194304 I couldn't really find any good documentation on how fragments/files are laid out on the object file system, so I am not sure where the file will be. How could the 4MB object have issues but the cluster be completely healthy? I did do the rados stat of each object inside ceph and they all appear to be there:: http://paste.ubuntu.com/8561/ The sum of all of the objects:: 14584887282 The stat of the object inside ceph:: 14577056082 So for some reason I have more data in objects than the key manifest. We easily identified this object via the same method as the other thread I have::

for key in keys:
    if key.name == 'b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam':
        implicit = key.size
        explicit = conn.get_bucket(bucket).get_key(key.name).size
        absolute = abs(implicit - explicit)
        print key.name
        print implicit
        print explicit

b235040a-46b6-42b3-b134-962b1f8813d5
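The interactive loop above can be generalized into a small helper that scans a whole bucket for index/stat size mismatches. The helper itself is hypothetical; the commented boto calls mirror the ones used in the thread:

```python
def find_size_mismatches(listed_sizes, statted_sizes):
    """Return object names whose size in the bucket listing (the index)
    differs from the size returned by a per-key stat (HEAD request)."""
    return sorted(
        name for name, implicit in listed_sizes.items()
        if statted_sizes.get(name) != implicit
    )

# With a live connection (boto, as in the snippet above) the two dicts would
# be built roughly like this:
#   bucket = conn.get_bucket('tcga_cghub_protected')
#   listed = {k.name: k.size for k in bucket.list()}
#   statted = {name: bucket.get_key(name).size for name in listed}
#   print(find_size_mismatches(listed, statted))

print(find_size_mismatches({"a.bam": 100, "b.bam": 90},
                           {"a.bam": 100, "b.bam": 80}))  # prints ['b.bam']
```

This is the same "implicit vs explicit" comparison as in the thread, just applied to every key so affected objects can be found without downloading them.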
Re: [ceph-users] RGW - Can't download complete object
Ok, I dug a bit more, and it seems to me that the problem is with the manifest that was created. I was able to reproduce a similar issue (opened ceph bug #11622), for which I also have a fix. I created new tests to cover this issue, and we'll get those recent fixes as soon as we can, after we test for any regressions. Thanks, Yehuda - Original Message - From: Yehuda Sadeh-Weinraub yeh...@redhat.com To: Sean Sullivan seapasu...@uchicago.edu Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 2:33:07 PM Subject: Re: [ceph-users] RGW - Can't download complete object That's another interesting issue. Note that for part 12_80 the manifest specifies (I assume, by the messenger log) this part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14') whereas it seems that you do have the original part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80 (note the '2/...') The part that the manifest specifies does not exist, which makes me think that there is some weird upload sequence, something like: - client uploads part, upload finishes but client does not get ack for it - client retries (second upload) - client gets ack for the first upload and gives up on the second one But I'm not sure if it would explain the manifest, I'll need to take a look at the code. Could such a sequence happen with the client that you're using to upload? Yehuda - Original Message - From: Sean Sullivan seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 2:07:22 PM Subject: Re: [ceph-users] RGW - Can't download complete object Sorry for the delay. It took me a while to figure out how to do a range request and append the data to a single file. 
The good news is that the end file seems to be 14G in size which matches the files manifest size. The bad news is that the file is completely corrupt and the radosgw log has errors. I am using the following code to perform the download:: https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py Here is a clip of the log file:: -- 2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.11 10.64.64.101:6809/942707 5 osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004 2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.45 10.64.64.101:6845/944590 2 osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304 2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2 2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.21 10.64.64.102:6856/1133473 16 osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316 2015-05-11 15:28:52.504072 7f570db7d700 1 -- 
10.64.64.126:0/108 == osd.82 10.64.64.103:6857/88524 2 osd_op_reply(74566283 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6 303+0+4194304 (1474509283 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420 2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12917407744 len=4194304 I couldn't really find any good documentation on how fragments/files are layed out on the object file system so I am not sure on where the file will be. How could the 4mb object have issues but the cluster be completely health okay? I did do the rados stat of each object inside ceph and they all appear to be there:: http://paste.ubuntu.com/8561/ The sum of all of the objects
Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation
I opened issue #11604, and have a fix for the issue. I updated our test suite to cover the specific issue that you were hitting. We'll backport the fix to both hammer and firefly soon. Thanks! Yehuda - Original Message - From: Yehuda Sadeh-Weinraub yeh...@redhat.com To: Mark Murphy murphyma...@uchicago.edu Cc: ceph-users@lists.ceph.com, Sean Sullivan seapasu...@uchicago.edu Sent: Tuesday, May 12, 2015 12:59:48 PM Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hi, Thank you for a very thorough investigation. See my comments below: - Original Message - From: Mark Murphy murphyma...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Sean Sullivan seapasu...@uchicago.edu, ceph-users@lists.ceph.com Sent: Tuesday, May 12, 2015 10:50:49 AM Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hey Yehuda, I work with Sean on the dev side. We thought we should put together a short report on what we’ve been seeing in the hopes that the behavior might make some sense to you. We had originally noticed these issues a while ago with our first iteration of this particular Ceph deployment. The issues we had seen were characterized by two different behaviors: • Some objects would appear truncated, returning different sizes for each request. Repeated attempts would eventually result in a successful retrieval if the second behavior doesn’t apply. This really sound like some kind of networking issue, maybe a load balancer that is on the way that clobbers things? • Some objects would always appear truncated, missing an integer multiple of 512KB. This is where the report that we are encountering ‘truncation’ came from, which is slightly misleading. We recently verified that we are indeed encountering the first behavior, for which I believe Sean has supplied or will be supplying Ceph logs showcasing the server-side errors, and is true truncation. 
However, the second behavior is not really truncation, but missing 512KB chunks, as Sean has brought up. We’ve had some luck with identifying some of the patterns that are seemingly related to this issue. Without going into too great of detail, we’ve found the following appear to hold true for all objects affected by the second behavior: • The amount of data missing is always in integer multiples of 512KB. • The expected file size is always found via the bucket index. • Ceph objects do not appear to be missing chunks or have holes in them. • The missing 512KB chunks are always at the beginning of multipart segments (1GB in our case). This matches some of my original suspicions. Here's some basic background that might help clarify things: This looks like some kind of rgw bug. A radosgw object is usually composed of two different parts: the object head, and the object tail. The head is usually composed of the first 512k of data of the object (and never more than that), and the tail has the rest of the object's data. However, the head data part is optional, and it can be zero. For example, in the case of multipart upload, after combining the parts, the head will not have any data, and the tail will be compiled out of the different parts data. However, when dealing with multipart parts, the parts do not really have a head (due to their immutability), so it is expected that the part object sizes to be 4MB. So it seems that for some reason these specific parts were treated as if they had a head, although they shouldn't have. Now, that brings me to the issue, where I noticed that some of the parts were retried. When this happens, the part name is different than the default part name, so there's a note in the manifest, and a special handling that start at specific offsets. It might be that this is related, and the code that handles the retries generate bad object parts. • For large files missing multiple chunks, the segments affected appear to be clustered and contiguous. 
That would point at a cluster of retries, maybe due to networking issues around the time these were created. The first pattern was identified when we noticed that the bucket index and the object manifest differed in reported size. This is useful as an quick method of identifying affected objects. We’ve used this to avoid having to pull down and check each object individually. In total, we have 108 affected objects, which translates to approximately 0.25% of our S3 objects. We noticed that the bucket index always reports the object size that would be expected had the upload gone correctly. Since we only ever report the segment sizes to the gateway, this would suggest that the segment sizes were reported accurately and aggregated correctly server
Re: [ceph-users] RGW - Can't download complete object
The code is in wip-11620, and it's currently on top of the next branch. We'll get it through the tests, then get it into hammer and firefly. I wouldn't recommend installing it in production without proper testing first. Yehuda - Original Message - From: Sean Sullivan seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 7:22:10 PM Subject: Re: [ceph-users] RGW - Can't download complete object Thank you so much Yehuda! I look forward to testing these. Is there a way for me to pull this code in? Is it in master? On May 13, 2015 7:08:44 PM Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: Ok, I dug a bit more, and it seems to me that the problem is with the manifest that was created. I was able to reproduce a similar issue (opened ceph bug #11622), for which I also have a fix. I created new tests to cover this issue, and we'll get those recent fixes as soon as we can, after we test for any regressions. Thanks, Yehuda - Original Message - From: Yehuda Sadeh-Weinraub yeh...@redhat.com To: Sean Sullivan seapasu...@uchicago.edu Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 2:33:07 PM Subject: Re: [ceph-users] RGW - Can't download complete object That's another interesting issue. 
Note that for part 12_80 the manifest specifies (I assume, by the messenger log) this part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14') whereas it seems that you do have the original part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80 (note the '2/...') The part that the manifest specifies does not exist, which makes me think that there is some weird upload sequence, something like: - client uploads part, upload finishes but client does not get ack for it - client retries (second upload) - client gets ack for the first upload and gives up on the second one But I'm not sure if it would explain the manifest, I'll need to take a look at the code. Could such a sequence happen with the client that you're using to upload? Yehuda - Original Message - From: Sean Sullivan seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 2:07:22 PM Subject: Re: [ceph-users] RGW - Can't download complete object Sorry for the delay. It took me a while to figure out how to do a range request and append the data to a single file. The good news is that the end file seems to be 14G in size which matches the files manifest size. The bad news is that the file is completely corrupt and the radosgw log has errors. 
I am using the following code to perform the download:: https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py Here is a clip of the log file:: -- 2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.11 10.64.64.101:6809/942707 5 osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004 2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.45 10.64.64.101:6845/944590 2 osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304 2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2 2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.21 10.64.64.102:6856/1133473 16 osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316 2015-05
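The range-request approach Sean describes can be sketched in Python (a hypothetical helper of my own, not the s3-mp-download.py script linked above): split the object length into inclusive windows, issue one `Range: bytes=start-end` GET per window, and append the responses in offset order.

```python
def byte_ranges(total_size, chunk_size):
    """Split an object of total_size bytes into inclusive HTTP Range pairs."""
    ranges = []
    ofs = 0
    while ofs < total_size:
        end = min(ofs + chunk_size, total_size) - 1
        ranges.append((ofs, end))
        ofs = end + 1
    return ranges

# Each pair becomes a header like "Range: bytes=0-4194303" on a GET for the
# same key; the response bodies are concatenated into one file in order.
print(byte_ranges(10, 4))  # [(0, 3), (4, 7), (8, 9)]
```

Note that if one ranged GET lands on a missing shadow object (the ENOENT/-2 above), the assembled file can still match the manifest size yet be corrupt at that offset, which is consistent with what Sean observed.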
Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket
- Original Message - From: Francois Lafont flafdiv...@free.fr To: ceph-users@lists.ceph.com Sent: Sunday, April 12, 2015 8:47:40 PM Subject: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket Hi, On a testing cluster, I have a radosgw on Firefly and the other nodes, OSDs and monitors, are on Hammer. The nodes are installed with puppet in personal VMs, so I can reproduce the problem. Generally, I use s3cmd to check the radosgw. While radosgw is on Firefly, I can create a bucket, no problem. Then, I upgrade the radosgw (it's a Ubuntu Trusty): sed -i 's/firefly/hammer/g' /etc/apt/sources.list.d/ceph.list; apt-get update; apt-get dist-upgrade -y; service apache2 stop; stop radosgw-all; start radosgw-all; service apache2 start After that, impossible to create a bucket with s3cmd: -- ~# s3cmd -d mb s3://bucket-2 DEBUG: ConfigParser: Reading file '/root/.s3cfg' DEBUG: ConfigParser: bucket_location-US DEBUG: ConfigParser: cloudfront_host-cloudfront.amazonaws.com DEBUG: ConfigParser: default_mime_type-binary/octet-stream DEBUG: ConfigParser: delete_removed-False DEBUG: ConfigParser: dry_run-False DEBUG: ConfigParser: enable_multipart-True DEBUG: ConfigParser: encoding-UTF-8 DEBUG: ConfigParser: encrypt-False DEBUG: ConfigParser: follow_symlinks-False DEBUG: ConfigParser: force-False DEBUG: ConfigParser: get_continue-False DEBUG: ConfigParser: gpg_command-/usr/bin/gpg DEBUG: ConfigParser: gpg_decrypt-%(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s DEBUG: ConfigParser: gpg_encrypt-%(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s DEBUG: ConfigParser: gpg_passphrase-...-3_chars... 
DEBUG: ConfigParser: guess_mime_type-True DEBUG: ConfigParser: host_base-ostore.athome.priv DEBUG: ConfigParser: access_key-5R...17_chars...Y DEBUG: ConfigParser: secret_key-Ij...37_chars...I DEBUG: ConfigParser: host_bucket-%(bucket)s.ostore.athome.priv DEBUG: ConfigParser: human_readable_sizes-False DEBUG: ConfigParser: invalidate_on_cf-False DEBUG: ConfigParser: list_md5-False DEBUG: ConfigParser: log_target_prefix- DEBUG: ConfigParser: mime_type- DEBUG: ConfigParser: multipart_chunk_size_mb-15 DEBUG: ConfigParser: preserve_attrs-True DEBUG: ConfigParser: progress_meter-True DEBUG: ConfigParser: proxy_host- DEBUG: ConfigParser: proxy_port-0 DEBUG: ConfigParser: recursive-False DEBUG: ConfigParser: recv_chunk-4096 DEBUG: ConfigParser: reduced_redundancy-False DEBUG: ConfigParser: send_chunk-4096 DEBUG: ConfigParser: simpledb_host-sdb.amazonaws.com DEBUG: ConfigParser: skip_existing-False DEBUG: ConfigParser: socket_timeout-300 DEBUG: ConfigParser: urlencoding_mode-normal DEBUG: ConfigParser: use_https-False DEBUG: ConfigParser: verbosity-WARNING DEBUG: ConfigParser: website_endpoint-http://%(bucket)s.s3-website-%(location)s.amazonaws.com/ DEBUG: ConfigParser: website_error- DEBUG: ConfigParser: website_index-index.html DEBUG: Updating Config.Config encoding - UTF-8 DEBUG: Updating Config.Config follow_symlinks - False DEBUG: Updating Config.Config verbosity - 10 DEBUG: Unicodising 'mb' using UTF-8 DEBUG: Unicodising 's3://bucket-2' using UTF-8 DEBUG: Command: mb DEBUG: SignHeaders: 'PUT\n\n\n\nx-amz-date:Mon, 13 Apr 2015 03:32:23 +\n/bucket-2/' DEBUG: CreateRequest: resource[uri]=/ DEBUG: SignHeaders: 'PUT\n\n\n\nx-amz-date:Mon, 13 Apr 2015 03:32:23 +\n/bucket-2/' DEBUG: Processing request, please wait... 
DEBUG: get_hostname(bucket-2): bucket-2.ostore.athome.priv DEBUG: format_uri(): / DEBUG: Sending request method_string='PUT', uri='/', headers={'content-length': '0', 'Authorization': 'AWS 5RUS0Z3SBG6IK263PLFY:3V1MdXoCGFrJKrO2LSJaBpNMcK4=', 'x-amz-date': 'Mon, 13 Apr 2015 03:32:23 +0000'}, body=(0 bytes) DEBUG: Response: {'status': 405, 'headers': {'date': 'Mon, 13 Apr 2015 03:32:23 GMT', 'accept-ranges': 'bytes', 'content-type': 'application/xml', 'content-length': '82', 'server': 'Apache/2.4.7 (Ubuntu)'}, 'reason': 'Method Not Allowed', 'data': '<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code></Error>'} DEBUG: S3Error: 405 (Method Not Allowed) DEBUG: HttpHeader: date: Mon, 13 Apr 2015 03:32:23 GMT DEBUG: HttpHeader: accept-ranges: bytes DEBUG: HttpHeader: content-type: application/xml DEBUG: HttpHeader: content-length: 82 DEBUG: HttpHeader: server: Apache/2.4.7 (Ubuntu) DEBUG: ErrorXML: Code: 'MethodNotAllowed' ERROR: S3 error: 405 (MethodNotAllowed): -- But before the upgrade, the same command worked fine. I see nothing in the log. Here is my ceph.conf: -- [global] auth client required = cephx auth cluster required = cephx auth
Re: [ceph-users] Purpose of the s3gw.fcgi script?
You're not missing anything. The script was only needed when we used the process manager of the fastcgi module, but it has been a very long time since we stopped using it. Yehuda - Original Message - From: Greg Meier greg.me...@nyriad.com To: ceph-users@lists.ceph.com Sent: Saturday, April 11, 2015 10:54:27 PM Subject: [ceph-users] Purpose of the s3gw.fcgi script? From my observation, the s3gw.fcgi script seems to be completely superfluous in the operation of Ceph. With or without the script, swift requests execute correctly, as long as a radosgw daemon is running. Is there something I'm missing here? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket
- Original Message - From: Francois Lafont flafdiv...@free.fr To: ceph-users@lists.ceph.com Sent: Monday, April 13, 2015 7:11:49 PM Subject: Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket Hi, Yehuda Sadeh-Weinraub wrote: The 405 in this case usually means that rgw failed to translate the http hostname header into a bucket name. Do you have 'rgw dns name' set correctly? Ah, I have found it, and indeed it concerned rgw dns name, as Karan also thought. ;) But it's a little curious. Explanations: My s3cmd client uses these hostnames (which resolve to the IP address of the radosgw host): bucket-name.ostore.athome.priv And in the configuration of my radosgw, I had: --- [client.radosgw.gw1] host = ceph-radosgw1 rgw dns name = ostore ... --- ie just the *short* name of the radosgw's fqdn (its fqdn is ostore.athome.priv). And with Firefly, it worked well, I never had a problem with this configuration! But with Hammer, it doesn't work anymore (I don't know why). Now, with Hammer, I notice that I have to put the fqdn in rgw dns name, not the short name: --- [client.radosgw.gw1] host = ceph-radosgw1 rgw dns name = ostore.athome.priv ... --- And with this configuration, it works. Is that normal? In fact, maybe my configuration with the short name (instead of the fqdn) was not valid and I was just lucky it worked well so far. Is that the right conclusion of the story? In fact, I think I have never really understood the meaning of the rgw dns name parameter. Can you confirm (or not) this: This parameter is *only* used when an S3 client accesses a bucket with the method http://bucket-name.radosgw-address. If we don't set this parameter, such access will not work and an S3 client can access a bucket only with the method http://radosgw-address/bucket-name Is that correct? Yes. Not sure why it *was* working in firefly. We did do some work around this in hammer, might have changed the behavior inadvertently. 
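A minimal sketch of the translation Yehuda describes (my own illustration, not rgw's actual code): with virtual-host style access, rgw strips "." plus the configured rgw dns name off the Host header to recover the bucket name; if the suffix doesn't match, no bucket is resolved and the bucket-level PUT becomes a PUT on "/", which fails with 405.

```python
def bucket_from_host(host, rgw_dns_name):
    """Map a vhost-style Host header to a bucket name, or None on no match."""
    suffix = "." + rgw_dns_name
    if host.endswith(suffix):
        return host[: -len(suffix)]
    return None

# The fqdn in 'rgw dns name' matches the client's bucket hostnames...
print(bucket_from_host("bucket-2.ostore.athome.priv", "ostore.athome.priv"))
# ...while the short name does not, so no bucket is found.
print(bucket_from_host("bucket-2.ostore.athome.priv", "ostore"))
```

This is why the short name silently breaks vhost-style requests while path-style access (http://radosgw-address/bucket-name) keeps working.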
Yehuda Thx Yehuda and thx to Karan (who pointed out the real problem, in fact ;)). -- François Lafont ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RADOS Gateway quota management
- Original Message - From: Sergey Arkhipov sarkhi...@asdco.ru To: ceph-users@lists.ceph.com Sent: Monday, March 30, 2015 2:55:33 AM Subject: [ceph-users] RADOS Gateway quota management Hi, Currently I am trying to figure out how to work with RADOS Gateway (ceph 0.87) limits and I've managed to produce such strange behavior: { "bucket": "test1-8", "pool": ".rgw.buckets", "index_pool": ".rgw.buckets.index", "id": "default.17497.14", "marker": "default.17497.14", "owner": "cb254310-8b24-4622-93fb-640ca4a45998", "ver": 21, "master_ver": 0, "mtime": 1427705802, "max_marker": "", "usage": { "rgw.main": { "size_kb": 16000, "size_kb_actual": 16020, "num_objects": 9}}, "bucket_quota": { "enabled": true, "max_size_kb": -1, "max_objects": 3}} Steps to reproduce: create bucket, set quota like that (max_objects = 3 and enable) and successfully upload 9 files. User quota is also defined: "bucket_quota": { "enabled": true, "max_size_kb": -1, "max_objects": 3}, "user_quota": { "enabled": true, "max_size_kb": 1048576, "max_objects": 5}, Could someone please help me to understand how to limit users? -- The question is whether the user is able to continue writing objects at this point. The quota system is working asynchronously, so it's possible to get into edge cases where users exceeded it a bit (it looks a whole lot better with larger numbers). The question is whether it's working for you at all. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
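The asynchronous behavior Yehuda mentions can be illustrated with a toy model (entirely hypothetical class and field names, not RGW internals): writes are checked against a cached stats snapshot that is only synced periodically, so a burst of small uploads can slip past max_objects before the cache catches up — exactly the "exceeded it a bit" edge case.

```python
class AsyncQuota:
    """Toy model of a quota checked against periodically refreshed stats."""
    def __init__(self, max_objects, refresh_every):
        self.max_objects = max_objects
        self.refresh_every = refresh_every
        self.actual = 0    # true object count in the cluster
        self.cached = 0    # stale count the gateway checks against
        self.ops = 0

    def put_object(self):
        if self.ops % self.refresh_every == 0:
            self.cached = self.actual      # periodic stats sync
        self.ops += 1
        if self.cached >= self.max_objects:
            return False                   # quota finally rejects the write
        self.actual += 1
        return True

q = AsyncQuota(max_objects=3, refresh_every=4)
results = [q.put_object() for _ in range(9)]
print(q.actual)  # 4 -- one more than max_objects got through
```

The longer the refresh interval relative to the upload rate, the more objects overshoot the limit, which is one plausible reading of 9 objects landing in a bucket capped at 3.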
Re: [ceph-users] Radosgw authorization failed
- Original Message - From: Neville neville.tay...@hotmail.co.uk To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, April 1, 2015 11:45:09 AM Subject: Re: [ceph-users] Radosgw authorization failed On 31 Mar 2015, at 11:38, Neville neville.tay...@hotmail.co.uk wrote: Date: Mon, 30 Mar 2015 12:17:48 -0400 From: yeh...@redhat.com To: neville.tay...@hotmail.co.uk CC: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Radosgw authorization failed - Original Message - From: Neville neville.tay...@hotmail.co.uk To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Monday, March 30, 2015 6:49:29 AM Subject: Re: [ceph-users] Radosgw authorization failed Date: Wed, 25 Mar 2015 11:43:44 -0400 From: yeh...@redhat.com To: neville.tay...@hotmail.co.uk CC: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Radosgw authorization failed - Original Message - From: Neville neville.tay...@hotmail.co.uk To: ceph-users@lists.ceph.com Sent: Wednesday, March 25, 2015 8:16:39 AM Subject: [ceph-users] Radosgw authorization failed Hi all, I'm testing backup product which supports Amazon S3 as target for Archive storage and I'm trying to setup a Ceph cluster configured with the S3 API to use as an internal target for backup archives instead of AWS. I've followed the online guide for setting up Radosgw and created a default region and zone based on the AWS naming convention US-East-1. I'm not sure if this is relevant but since I was having issues I thought it might need to be the same. I've tested the radosgw using boto.s3 and it seems to work ok i.e. I can create a bucket, create a folder, list buckets etc. The problem is when the backup software tries to create an object I get an authorization failure. It's using the same user/access/secret as I'm using from boto.s3 and I'm sure the creds are right as it lets me create the initial connection, it just fails when trying to create an object (backup folder). 
Here's the extract from the radosgw log: - 2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET /:list_bucket:init op 2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET /:list_bucket:verifying op mask 2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1 user.op_mask=7 2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET /:list_bucket:verifying op permissions 2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for uid=test mask=49 2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for group=1 mask=49 2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for group=2 mask=49 2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test owner=test perm=1 2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm (type)=1, policy perm=1, user_perm_mask=1, acl perm=1 2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET /:list_bucket:verifying op params 2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET /:list_bucket:executing 2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2]) start num 1001 2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET /:list_bucket:http status=200 2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done req=0x7f107000e2e0 http_status=200 == 2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request req=0x7f107000f0e0 2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ: 2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0 2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request req=0x7f107000f6b0 2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request req=0x7f107000f0e0 2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty 2015-03-25 
15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88 2015-03-25 15:07:26.517084 7f1058dd7700 20 CONTENT_TYPE=application/octet-stream 2015-03-25 15:07:26.517085
Re: [ceph-users] radosgw crash within libfcgi
- Original Message - From: GuangYang yguan...@outlook.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com Sent: Wednesday, June 24, 2015 2:12:23 PM Subject: RE: radosgw crash within libfcgi Date: Wed, 24 Jun 2015 17:04:05 -0400 From: yeh...@redhat.com To: yguan...@outlook.com CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com Subject: Re: radosgw crash within libfcgi - Original Message - From: GuangYang yguan...@outlook.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com Sent: Wednesday, June 24, 2015 1:53:20 PM Subject: RE: radosgw crash within libfcgi Thanks Yehuda for the response. We already patched libfcgi to use poll instead of select to overcome the limitation. Thanks, Guang Date: Wed, 24 Jun 2015 14:40:25 -0400 From: yeh...@redhat.com To: yguan...@outlook.com CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com Subject: Re: radosgw crash within libfcgi - Original Message - From: GuangYang yguan...@outlook.com To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com, yeh...@redhat.com Sent: Wednesday, June 24, 2015 10:09:58 AM Subject: radosgw crash within libfcgi Hello Cephers, Recently we have had several radosgw daemon crashes with the same following kernel log: Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip 7ffa069996f2 sp 7ff55c432710 error 6 in error 6 is sigabrt, right? With invalid pointer I'd expect to get segfault. Is the pointer actually invalid? With (ip - {address_load_the_shared_library}) to get the instruction which caused this crash, the objdump shows the crash happened at instruction 46f2 (see below), which assigns -1 to FCGX_Request::ipcFd, but I don't quite understand how/why it could crash there. 
4690 FCGX_Free: 4690: 48 89 5c 24 f0 mov %rbx,-0x10(%rsp) 4695: 48 89 6c 24 f8 mov %rbp,-0x8(%rsp) 469a: 48 83 ec 18 sub $0x18,%rsp 469e: 48 85 ff test %rdi,%rdi 46a1: 48 89 fb mov %rdi,%rbx 46a4: 89 f5 mov %esi,%ebp 46a6: 74 28 je 46d0 FCGX_Free+0x40 46a8: 48 8d 7f 08 lea 0x8(%rdi),%rdi 46ac: e8 67 e3 ff ff callq 2a18 FCGX_FreeStream@plt 46b1: 48 8d 7b 10 lea 0x10(%rbx),%rdi 46b5: e8 5e e3 ff ff callq 2a18 FCGX_FreeStream@plt 46ba: 48 8d 7b 18 lea 0x18(%rbx),%rdi 46be: e8 55 e3 ff ff callq 2a18 FCGX_FreeStream@plt 46c3: 48 8d 7b 28 lea 0x28(%rbx),%rdi 46c7: e8 d4 f4 ff ff callq 3ba0 FCGX_PutS+0x40 46cc: 85 ed test %ebp,%ebp 46ce: 75 10 jne 46e0 FCGX_Free+0x50 46d0: 48 8b 5c 24 08 mov 0x8(%rsp),%rbx 46d5: 48 8b 6c 24 10 mov 0x10(%rsp),%rbp 46da: 48 83 c4 18 add $0x18,%rsp 46de: c3 retq 46df: 90 nop 46e0: 31 f6 xor %esi,%esi 46e2: 83 7b 4c 00 cmpl $0x0,0x4c(%rbx) 46e6: 8b 7b 30 mov 0x30(%rbx),%edi 46e9: 40 0f 94 c6 sete %sil 46ed: e8 86 e6 ff ff callq 2d78 OS_IpcClose@plt 46f2: c7 43 30 ff ff ff ff movl $0x,0x30(%rbx) info registers? Not too familiar with the specific message, but it could be that OS_IpcClose() aborts (not highly unlikely) and it only dumps the return address of the current function (shouldn't be referenced as ip though). What's rbx? Is the memory at %rbx + 0x30 valid? Also, did you by any chance upgrade the binaries while the code was running? is the code running over nfs? Yehuda Yehuda libfcgi.so.0.0.0[7ffa06995000+a000] in libfcgi.so.0.0.0[7ffa06995000+a000] Looking at the assembly, it seems crashing at this point - http://github.com/sknown/fcgi/blob/master/libfcgi/fcgiapp.c#L2035, which confused me. I tried to see if there is any other reference holding the FCGX_Request which release the handle without any luck. There are also other observations: 1 Several radosgw daemon across different hosts crashed around the same time. 2 Apache's error log has some fcgi error complaining ##idle timeout## during the time. Does anyone experience similar issue? 
In the past we've had issues with libfcgi that were related to the number of open fds on the process (> 1024). The issue was a buggy libfcgi that was using select
Re: [ceph-users] radosgw crash within libfcgi
Also, looking at the code, I see an extra call to FCGX_Finish_r(): diff --git a/src/rgw/rgw_main.cc b/src/rgw/rgw_main.cc index 9a8aa5f..0aa7ded 100644 --- a/src/rgw/rgw_main.cc +++ b/src/rgw/rgw_main.cc @@ -669,8 +669,6 @@ void RGWFCGXProcess::handle_request(RGWRequest *r) dout(20) process_request() returned ret dendl; } - FCGX_Finish_r(fcgx); - delete req; } Maybe this is a problem on the specific libfcgi version that you're using? - Original Message - From: Yehuda Sadeh-Weinraub yeh...@redhat.com To: GuangYang yguan...@outlook.com Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com Sent: Wednesday, June 24, 2015 2:21:04 PM Subject: Re: radosgw crash within libfcgi - Original Message - From: GuangYang yguan...@outlook.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com Sent: Wednesday, June 24, 2015 2:12:23 PM Subject: RE: radosgw crash within libfcgi Date: Wed, 24 Jun 2015 17:04:05 -0400 From: yeh...@redhat.com To: yguan...@outlook.com CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com Subject: Re: radosgw crash within libfcgi - Original Message - From: GuangYang yguan...@outlook.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com Sent: Wednesday, June 24, 2015 1:53:20 PM Subject: RE: radosgw crash within libfcgi Thanks Yehuda for the response. We already patched libfcgi to use poll instead of select to overcome the limitation. 
Thanks, Guang Date: Wed, 24 Jun 2015 14:40:25 -0400 From: yeh...@redhat.com To: yguan...@outlook.com CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com Subject: Re: radosgw crash within libfcgi - Original Message - From: GuangYang yguan...@outlook.com To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com, yeh...@redhat.com Sent: Wednesday, June 24, 2015 10:09:58 AM Subject: radosgw crash within libfcgi Hello Cephers, Recently we have had several radosgw daemon crashes with the same following kernel log: Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip 7ffa069996f2 sp 7ff55c432710 error 6 in error 6 is sigabrt, right? With invalid pointer I'd expect to get segfault. Is the pointer actually invalid? With (ip - {address_load_the_shared_library}) to get the instruction which caused this crash, the objdump shows the crash happened at instruction 46f2 (see below), which assigns -1 to FCGX_Request::ipcFd, but I don't quite understand how/why it could crash there. 
4690 FCGX_Free: 4690: 48 89 5c 24 f0 mov %rbx,-0x10(%rsp) 4695: 48 89 6c 24 f8 mov %rbp,-0x8(%rsp) 469a: 48 83 ec 18 sub $0x18,%rsp 469e: 48 85 ff test %rdi,%rdi 46a1: 48 89 fb mov %rdi,%rbx 46a4: 89 f5 mov %esi,%ebp 46a6: 74 28 je 46d0 FCGX_Free+0x40 46a8: 48 8d 7f 08 lea 0x8(%rdi),%rdi 46ac: e8 67 e3 ff ff callq 2a18 FCGX_FreeStream@plt 46b1: 48 8d 7b 10 lea 0x10(%rbx),%rdi 46b5: e8 5e e3 ff ff callq 2a18 FCGX_FreeStream@plt 46ba: 48 8d 7b 18 lea 0x18(%rbx),%rdi 46be: e8 55 e3 ff ff callq 2a18 FCGX_FreeStream@plt 46c3: 48 8d 7b 28 lea 0x28(%rbx),%rdi 46c7: e8 d4 f4 ff ff callq 3ba0 FCGX_PutS+0x40 46cc: 85 ed test %ebp,%ebp 46ce: 75 10 jne 46e0 FCGX_Free+0x50 46d0: 48 8b 5c 24 08 mov 0x8(%rsp),%rbx 46d5: 48 8b 6c 24 10 mov 0x10(%rsp),%rbp 46da: 48 83 c4 18 add $0x18,%rsp 46de: c3 retq 46df: 90 nop 46e0: 31 f6 xor %esi,%esi 46e2: 83 7b 4c 00 cmpl $0x0,0x4c(%rbx) 46e6: 8b 7b 30 mov 0x30(%rbx),%edi 46e9: 40 0f 94 c6 sete %sil 46ed: e8 86 e6 ff ff callq 2d78 OS_IpcClose@plt 46f2: c7 43 30 ff ff ff ff movl $0x,0x30(%rbx) info registers? Not too familiar with the specific message, but it could be that OS_IpcClose() aborts (not highly unlikely) and it only dumps the return address of the current function (shouldn't be referenced as ip though). What's rbx? Is the memory at %rbx + 0x30 valid? Also, did you by any chance upgrade the binaries while the code was running
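To illustrate why the extra FCGX_Finish_r() call in the diff above is suspect, here is a toy Python model (hypothetical names, not a real libfcgi binding): the first finish frees the request's streams and stores -1 into ipcFd — the store at offset 46f2 in the disassembly — so a second finish operates on already-freed state, which in C is undefined behavior rather than the clean exception shown here.

```python
class FakeFcgxRequest:
    """Toy stand-in for FCGX_Request; not libfcgi's actual structure."""
    def __init__(self, ipc_fd):
        self.ipc_fd = ipc_fd
        self.freed = False

    def finish(self):
        if self.freed:
            # In C this is a double free / use-after-free; Python lets us
            # surface it as an error instead of a segfault.
            raise RuntimeError("double finish on request")
        self.freed = True      # stands in for the FCGX_FreeStream() calls
        self.ipc_fd = -1       # stands in for the ipcFd = -1 store

req = FakeFcgxRequest(ipc_fd=7)
req.finish()                   # normal teardown: fine
try:
    req.finish()               # the redundant second finish
    crashed = False
except RuntimeError:
    crashed = True
print(crashed, req.ipc_fd)
```

Whether the deployed libfcgi version actually tolerates a second finish is exactly the question Yehuda raises.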
Re: [ceph-users] radosgw crash within libfcgi
- Original Message - From: GuangYang yguan...@outlook.com To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com, yeh...@redhat.com Sent: Wednesday, June 24, 2015 10:09:58 AM Subject: radosgw crash within libfcgi Hello Cephers, Recently we have had several radosgw daemon crashes with the same following kernel log: Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip 7ffa069996f2 sp 7ff55c432710 error 6 in libfcgi.so.0.0.0[7ffa06995000+a000] Looking at the assembly, it seems to be crashing at this point - http://github.com/sknown/fcgi/blob/master/libfcgi/fcgiapp.c#L2035, which confused me. I tried to see if there is any other reference holding the FCGX_Request which releases the handle, without any luck. There are also other observations: 1. Several radosgw daemons across different hosts crashed around the same time. 2. Apache's error log has some fcgi errors complaining about an ##idle timeout## during the time. Does anyone experience a similar issue? In the past we've had issues with libfcgi that were related to the number of open fds on the process (> 1024). The issue was a buggy libfcgi that was using select() instead of poll(), so this might be the issue you're noticing. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
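The select()-vs-poll() distinction is easy to demonstrate through Python's thin wrappers over the same syscalls: select() only accepts descriptors below FD_SETSIZE (1024 on Linux), while poll() registers arbitrary fd numbers individually — which is why a gateway holding more than ~1024 open fds needs the poll()-based libfcgi patch Guang describes.

```python
import os
import select

r, w = os.pipe()

# poll() has no FD_SETSIZE ceiling: fds are registered one at a time.
p = select.poll()
p.register(r, select.POLLIN)
print(p.poll(0))        # [] -- nothing readable yet

os.write(w, b"x")
print(p.poll(0))        # now reports the read end as POLLIN-ready

# select(), by contrast, raises "filedescriptor out of range in select()"
# for any fd number at or above FD_SETSIZE, no matter how few fds are
# actually being watched -- the failure mode a busy radosgw can hit.
```

So the crash threshold is the fd *number*, not the count of fds passed to a single select() call.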
Re: [ceph-users] radosgw get quota
On Thu, Oct 29, 2015 at 11:29 AM, Derek Yarnell wrote: > Sorry, the information is in the headers. So I think the valid question > to follow up is why is this information in the headers and not the body > of the request. I think this is a bug, but maybe I am not aware of a > subtlety. It would seem this json comes from this line[0]. > > [0] - > https://github.com/ceph/ceph/blob/83e10f7e2df0a71bd59e6ef2aa06b52b186fddaa/src/rgw/rgw_rest_user.cc#L697 > > For example the information is returned in what seems to be the > Content-type header as follows. Maybe the missing : in the json > encoding would explain something? It's definitely a bug. It looks like we fail to call end_header() before it, so everything is dumped before we close the http header. Can you open a ceph tracker issue with the info you provided here? Thanks, Yehuda > > INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS > connection (1): ceph.umiacs.umd.edu > DEBUG:requests.packages.urllib3.connectionpool:"GET > /admin/user?quota=json=foo1209=user HTTP/1.1" 200 0 > INFO:rgwadmin.rgw:[('date', 'Thu, 29 Oct 2015 18:28:45 GMT'), > ('{"enabled"', 'true,"max_size_kb":12345,"max_objects":-1}Content-type: > application/json'), ('content-length', '0'), ('server', 'Apache/2.4.6 > (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5')] > > On 10/28/15 11:15 PM, Derek Yarnell wrote: >> I have had this issue before, and I don't think I have resolved it. I >> have been using the RGW admin api to set quota based on the docs[0]. >> But I can't seem to be able to get it to cough up and show me the quota >> now. Any ideas? I get a 200 back but no body. I have tested this on a >> Firefly (0.80.5-9) and Hammer (0.87.2-0) cluster. The latter is what >> the logs are for. 
>> >> [0] - http://docs.ceph.com/docs/master/radosgw/adminops/#quotas >> >> DEBUG:rgwadmin.rgw:URL: >> http://ceph.umiacs.umd.edu/admin/user?quota=derek=user >> DEBUG:rgwadmin.rgw:Access Key: RTJ1TL13CH613JRU2PJD >> DEBUG:rgwadmin.rgw:Verify: True CA Bundle: None >> INFO:requests.packages.urllib3.connectionpool:Starting new HTTP >> connection (1): ceph.umiacs.umd.edu >> DEBUG:requests.packages.urllib3.connectionpool:"GET >> /admin/user?quota=derek=user HTTP/1.1" 200 0 >> INFO:rgwadmin.rgw:No JSON object could be decoded >> >> >> 2015-10-28 23:02:46.445367 7f444cff1700 1 civetweb: 0x7f445c026d00: >> 127.0.0.1 - - [28/Oct/2015:23:02:46 -0400] "GET /admin/user HTTP/1.1" -1 >> 0 - python-requests/2.7.0 CPython/2.7.5 Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:02.063755 7f447ace2700 2 >> RGWDataChangesLog::ChangesRenewThread: start >> 2015-10-28 23:03:17.139339 7f443cfd1700 20 RGWEnv::set(): HTTP_HOST: >> localhost:7480 >> 2015-10-28 23:03:17.139357 7f443cfd1700 20 RGWEnv::set(): >> HTTP_ACCEPT_ENCODING: gzip, deflate >> 2015-10-28 23:03:17.139358 7f443cfd1700 20 RGWEnv::set(): HTTP_ACCEPT: */* >> 2015-10-28 23:03:17.139364 7f443cfd1700 20 RGWEnv::set(): >> HTTP_USER_AGENT: python-requests/2.7.0 CPython/2.7.5 >> Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:17.139375 7f443cfd1700 20 RGWEnv::set(): HTTP_DATE: >> Thu, 29 Oct 2015 03:03:17 GMT >> 2015-10-28 23:03:17.139377 7f443cfd1700 20 RGWEnv::set(): >> HTTP_AUTHORIZATION: AWS RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= >> 2015-10-28 23:03:17.139381 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_FOR: 128.8.132.4 >> 2015-10-28 23:03:17.139383 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_HOST: ceph.umiacs.umd.edu >> 2015-10-28 23:03:17.139385 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_SERVER: cephproxy00.umiacs.umd.edu >> 2015-10-28 23:03:17.139387 7f443cfd1700 20 RGWEnv::set(): >> HTTP_CONNECTION: Keep-Alive >> 2015-10-28 23:03:17.139392 7f443cfd1700 20 RGWEnv::set(): >> 
REQUEST_METHOD: GET >> 2015-10-28 23:03:17.139394 7f443cfd1700 20 RGWEnv::set(): REQUEST_URI: >> /admin/user >> 2015-10-28 23:03:17.139397 7f443cfd1700 20 RGWEnv::set(): QUERY_STRING: >> quota=derek=user >> 2015-10-28 23:03:17.139401 7f443cfd1700 20 RGWEnv::set(): REMOTE_USER: >> 2015-10-28 23:03:17.139403 7f443cfd1700 20 RGWEnv::set(): SCRIPT_URI: >> /admin/user >> 2015-10-28 23:03:17.139408 7f443cfd1700 20 RGWEnv::set(): SERVER_PORT: 7480 >> 2015-10-28 23:03:17.139409 7f443cfd1700 20 HTTP_ACCEPT=*/* >> 2015-10-28 23:03:17.139410 7f443cfd1700 20 HTTP_ACCEPT_ENCODING=gzip, >> deflate >> 2015-10-28 23:03:17.139411 7f443cfd1700 20 HTTP_AUTHORIZATION=AWS >> RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= >> 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_CONNECTION=Keep-Alive >> 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_DATE=Thu, 29 Oct 2015 >> 03:03:17 GMT >> 2015-10-28 23:03:17.139413 7f443cfd1700 20 HTTP_HOST=localhost:7480 >> 2015-10-28 23:03:17.139413 7f443cfd1700 20 >> HTTP_USER_AGENT=python-requests/2.7.0 CPython/2.7.5 >> Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:17.139414 7f443cfd1700 20 HTTP_X_FORWARDED_FOR=128.8.132.4 >> 2015-10-28 23:03:17.139415
Re: [ceph-users] Missing bucket
On Fri, Nov 13, 2015 at 12:53 PM, Łukasz Jagiełło wrote: > Hi all, > > Recently I've noticed a problem with one of our buckets: > > I cannot list or stat a bucket: > #v+ > root@ceph-s1:~# radosgw-admin bucket stats --bucket=problematic_bucket > error getting bucket stats ret=-22 That's EINVAL, not ENOENT. It could mean lots of things, e.g., a radosgw-admin version mismatch vs. the version that the osds are running. Try to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit more info about the source of this error. > ➜ ~ s3cmd -c /etc/s3cmd/prod.cfg ls > s3://problematic_bucket/images/e/e0/file.png > ERROR: S3 error: None > #v- > > ,but a direct request for an object works perfectly fine: > #v+ > ➜ ~ curl -svo /dev/null > http://ceph-s1/problematic_bucket/images/e/e0/file.png > […] > < HTTP/1.1 200 OK > < Content-Type: image/png > < Content-Length: 379906 > […] > #v- > > Any solution how to fix it? We're still running ceph 0.67.11 > You're really behind. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Missing bucket
On Fri, Nov 13, 2015 at 1:37 PM, Łukasz Jagiełłowrote: >> >> > Recently I've noticed a problem with one of our buckets: >> >> > >> >> > I cannot list or stats on a bucket: >> >> > #v+ >> >> > root@ceph-s1:~# radosgw-admin bucket stats >> >> > --bucket=problematic_bucket >> >> > error getting bucket stats ret=-22 >> >> >> >> That's EINVAL, not ENOENT. It could mean lot's of things, e.g., >> >> radosgw-admin version mismatch vs. version that osds are running. Try >> >> to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit >> >> more info about the source of this error. >> > >> > >> > https://gist.github.com/ljagiello/06a4dd1f34a776e38f77 >> > >> > Result of more verbose debug. >> > >> 2015-11-13 21:10:19.160420 7fd9f91be7c0 1 -- 10.8.68.78:0/1007616 --> >> 10.8.42.35:6800/26514 -- osd_op(client.44897323.0:30 >> .dir.default.5457.9 [call rgw.bucket_list] 16.2f979b1a e172956) v4 -- >> ?+0 0x15f3740 con 0x15daa60 >> 2015-11-13 21:10:19.161058 7fd9ef8a7700 1 -- 10.8.68.78:0/1007616 <== >> osd.12 10.8.42.35:6800/26514 6 osd_op_reply(30 >> .dir.default.5457.9 [call] ondisk = -22 (Invalid argument)) v4 >> 118+0+0 (3885840820 0 0) 0x7fd9c8000d50 con 0x15daa60 >> error getting bucket stats ret=-22 >> >> You can try taking a look at osd.12 logs. Any chance osd.12 and >> radosgw-admin aren't running the same major version? (more likely >> radosgw-admin running a newer version). > > > From last 12h it's just deep-scrub info > #v+ > 2015-11-13 08:23:00.690076 7fc4c62ee700 0 log [INF] : 15.621 deep-scrub ok > #v- This is unrelated. > > But yesterday there was a big rebalance and a host with that osd was > rebuilding from scratch. > > We're running the same version (ceph, rados) across entire cluster just > double check it. > what does 'radosgw-admin --version' return? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Missing bucket
On Fri, Nov 13, 2015 at 1:14 PM, Łukasz Jagiełło <jagiello.luk...@gmail.com> wrote: > On Fri, Nov 13, 2015 at 1:07 PM, Yehuda Sadeh-Weinraub <yeh...@redhat.com> > wrote: >> >> > Recently I've noticed a problem with one of our buckets: >> > >> > I cannot list or stats on a bucket: >> > #v+ >> > root@ceph-s1:~# radosgw-admin bucket stats --bucket=problematic_bucket >> > error getting bucket stats ret=-22 >> >> That's EINVAL, not ENOENT. It could mean lot's of things, e.g., >> radosgw-admin version mismatch vs. version that osds are running. Try >> to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit >> more info about the source of this error. > > > https://gist.github.com/ljagiello/06a4dd1f34a776e38f77 > > Result of more verbose debug. > 2015-11-13 21:10:19.160420 7fd9f91be7c0 1 -- 10.8.68.78:0/1007616 --> 10.8.42.35:6800/26514 -- osd_op(client.44897323.0:30 .dir.default.5457.9 [call rgw.bucket_list] 16.2f979b1a e172956) v4 -- ?+0 0x15f3740 con 0x15daa60 2015-11-13 21:10:19.161058 7fd9ef8a7700 1 -- 10.8.68.78:0/1007616 <== osd.12 10.8.42.35:6800/26514 6 osd_op_reply(30 .dir.default.5457.9 [call] ondisk = -22 (Invalid argument)) v4 118+0+0 (3885840820 0 0) 0x7fd9c8000d50 con 0x15daa60 error getting bucket stats ret=-22 You can try taking a look at osd.12 logs. Any chance osd.12 and radosgw-admin aren't running the same major version? (more likely radosgw-admin running a newer version). >> >> You're really behind. > > > I know, we've got scheduled update for 2016 it's a big project to ensure > everything is fine. > > -- > Łukasz Jagiełło > lukaszjagielloorg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw keystone accepted roles not matching
On Thu, Oct 15, 2015 at 8:34 AM, Mike Lowe wrote: > I’m having some trouble with radosgw and keystone integration; I always get > the following error: > > user does not hold a matching role; required roles: Member,user,_member_,admin > > Despite my token clearly having one of the roles: > > "user": { > "id": "401375297eb540bbb1c32432439827b0", > "name": "jomlowe", > "roles": [ > { > "id": "8adcf7413cd3469abe4ae13cf259be6e", > "name": "user" > } > ], > "roles_links": [], > "username": "jomlowe" > } > > Does anybody have any hints? Does the user have these roles assigned in keystone? Yehuda
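The "required roles" list in that error is the gateway's configured accepted-roles set. A hedged ceph.conf sketch (the section name is an assumption; adjust it to your gateway's id):

```ini
[client.radosgw.gateway]
# Comma-separated keystone roles the gateway will accept; the error
# above prints exactly this list, so a token carrying "user" should match.
rgw keystone accepted roles = Member,user,_member_,admin
```

If the token shown really carries "user", the next thing to check is whether the role is assigned in the keystone project the request is scoped to.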
Re: [ceph-users] How to setup Ceph radosgw to support multi-tenancy?
On Thu, Oct 8, 2015 at 1:55 PM, Christian Sarrasinwrote: > After discovering this excellent blog post [1], I thought that taking > advantage of users' "default_placement" feature would be a preferable way to > achieve my multi-tenancy requirements (see previous post). > > Alas I seem to be hitting a snag. Any attempt to create a bucket with a user > setup with a non-empty default_placement results in a 400 error thrown back > to the client and the following msg in the radosgw logs: > > "could not find placement rule placement-user2 within region" > > (The pools exist, I reloaded the radosgw service and ran 'radosgw-admin > regionmap update' as suggested in the blog post before running the client > test) > > Here's the setup. What am I doing wrong? Any insight is really > appreciated! Not sure. Did you run 'radosgw-admin regionmap update'? > > radosgw-admin region get > { "name": "default", > "api_name": "", > "is_master": "true", > "endpoints": [], > "master_zone": "", > "zones": [ > { "name": "default", > "endpoints": [], > "log_meta": "false", > "log_data": "false"}], > "placement_targets": [ > { "name": "default-placement", > "tags": []}, > { "name": "placement-user2", > "tags": []}], > "default_placement": "default-placement"} > > radosgw-admin zone get default > { "domain_root": ".rgw", > "control_pool": ".rgw.control", > "gc_pool": ".rgw.gc", > "log_pool": ".log", > "intent_log_pool": ".intent-log", > "usage_log_pool": ".usage", > "user_keys_pool": ".users", > "user_email_pool": ".users.email", > "user_swift_pool": ".users.swift", > "user_uid_pool": ".users.uid", > "system_key": { "access_key": "", > "secret_key": ""}, > "placement_pools": [ > { "key": "default-placement", > "val": { "index_pool": ".rgw.buckets.index", > "data_pool": ".rgw.buckets", > "data_extra_pool": ".rgw.buckets.extra"}}, > { "key": "placement-user2", > "val": { "index_pool": ".rgw.index.user2", > "data_pool": ".rgw.buckets.user2", > "data_extra_pool": ".rgw.buckets.extra"}}]} > > 
radosgw-admin user info --uid=user2 > { "user_id": "user2", > "display_name": "User2", > "email": "", > "suspended": 0, > "max_buckets": 1000, > "auid": 0, > "subusers": [], > "keys": [ > { "user": "user2", > "access_key": "VYM2EEU1X5H6Y82D0K4F", > "secret_key": "vEeJ9+yadvtqZrb2xoCAEuM2AlVyZ7UTArbfIEek"}], > "swift_keys": [], > "caps": [], > "op_mask": "read, write, delete", > "default_placement": "placement-user2", > "placement_tags": [], > "bucket_quota": { "enabled": false, > "max_size_kb": -1, > "max_objects": -1}, > "user_quota": { "enabled": false, > "max_size_kb": -1, > "max_objects": -1}, > "temp_url_keys": []} > > [1] http://cephnotes.ksperis.com/blog/2014/11/28/placement-pools-on-rados-gw > > > On 03/10/15 19:48, Christian Sarrasin wrote: >> >> What are the best options to setup the Ceph radosgw so it supports >> separate/independent "tenants"? What I'm after: >> >> 1. Ensure isolation between tenants, ie: no overlap/conflict in bucket >> namespace; something separate radosgw "users" doesn't achieve >> 2. Ability to backup/restore tenants' pools individually >> >> Referring to the docs [1], it seems this could possibly be achieved with >> zones; one zone per tenant and leave out synchronization. Seems a little >> heavy handed and presumably the overhead is non-negligible. >> >> Is this "supported"? Is there a better way? >> >> I'm running Firefly. I'm also rather new to Ceph so apologies if this is >> already covered somewhere; kindly send pointers if so... >> >> Cheers, >> Christian >> >> PS: cross-posted from [2] >> >> [1] http://docs.ceph.com/docs/v0.80/radosgw/federated-config/ >> [2] >> >> http://serverfault.com/questions/726491/how-to-setup-ceph-radosgw-to-support-multi-tenancy >> > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
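A "could not find placement rule ... within region" error usually means the region's placement_targets and the zone's placement_pools disagree. A small sketch that cross-checks the two, using abbreviated JSON like the output above:

```python
import json

# Abbreviated versions of the 'region get' / 'zone get' output above.
region = json.loads('{"placement_targets": ['
                    '{"name": "default-placement", "tags": []},'
                    '{"name": "placement-user2", "tags": []}]}')
zone = json.loads('{"placement_pools": ['
                  '{"key": "default-placement"},'
                  '{"key": "placement-user2"}]}')

targets = {t["name"] for t in region["placement_targets"]}
pools = {p["key"] for p in zone["placement_pools"]}
missing = targets - pools  # placement targets with no backing pools
print(sorted(missing))     # an empty list means the zone covers every target
```

In this setup both sides list placement-user2, so the mismatch has to be elsewhere (e.g., in which region map the running gateway actually loaded).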
Re: [ceph-users] How to setup Ceph radosgw to support multi-tenancy?
When you start radosgw, do you explicitly state the name of the region that gateway belongs to? On Thu, Oct 8, 2015 at 2:19 PM, Christian Sarrasin <c.n...@cleansafecloud.com> wrote: > Hi Yehuda, > > Yes I did run "radosgw-admin regionmap update" and the regionmap appears to > know about my custom placement_target. Any other idea? > > Thanks a lot > Christian > > radosgw-admin region-map get > { "regions": [ > { "key": "default", > "val": { "name": "default", > "api_name": "", > "is_master": "true", > "endpoints": [], > "master_zone": "", > "zones": [ > { "name": "default", > "endpoints": [], > "log_meta": "false", > "log_data": "false"}], > "placement_targets": [ > { "name": "default-placement", > "tags": []}, > { "name": "placement-user2", > "tags": []}], > "default_placement": "default-placement"}}], > "master_region": "default", > "bucket_quota": { "enabled": false, > "max_size_kb": -1, > "max_objects": -1}, > "user_quota": { "enabled": false, > "max_size_kb": -1, > "max_objects": -1}} > > On 08/10/15 23:02, Yehuda Sadeh-Weinraub wrote: > >>> Here's the setup. What am I doing wrong? Any insight is really >>> appreciated! >> >> >> Not sure. Did you run 'radosgw-admin regionmap update'? > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
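Yehuda's question matters because the gateway only consults the intended region map entry if it knows which region and zone it belongs to. A hedged ceph.conf sketch (firefly-era option names; the section name is assumed):

```ini
[client.radosgw.gateway]
rgw region = default
rgw zone = default
# After 'radosgw-admin regionmap update', restart the gateway so it
# re-reads the placement targets.
```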
Re: [ceph-users] S3:Permissions of access-key
On Fri, Aug 28, 2015 at 2:17 AM, Zhengqiankun zheng.qian...@h3c.com wrote: Hi Yehuda, I have a question and hope that you can help me answer it. A swift subuser can be given specific permissions, so why can't specific permissions be set for an S3 access key? Probably because no one ever asked for it. It shouldn't be hard to do; it sounds like an easy starter project if anyone wants to get their hands dirty in the rgw code. Note that the canonical way to do it in S3 is through user policies, which we don't (yet?) support. Yehuda
Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3
As long as you're 100% sure that the prefix is only being used for the specific bucket that was previously removed, then it is safe to remove these objects. But please do double check and make sure that there's no other bucket that matches this prefix somehow. Yehuda On Mon, Aug 31, 2015 at 2:42 PM, Ben Hineswrote: > No input, eh? (or maybe TL,DR for everyone) > > Short version: Presuming the bucket index shows blank/empty, which it > does and is fine, would me manually deleting the rados objects with > the prefix matching the former bucket's ID cause any problems? > > thanks, > > -Ben > > On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines wrote: >> Ceph 0.93->94.2->94.3 >> >> I noticed my pool used data amount is about twice the bucket used data count. >> >> This bucket was emptied long ago. It has zero objects: >> "globalcache01", >> { >> "bucket": "globalcache01", >> "pool": ".rgw.buckets", >> "index_pool": ".rgw.buckets.index", >> "id": "default.8873277.32", >> "marker": "default.8873277.32", >> "owner": "...", >> "ver": "0#12348839", >> "master_ver": "0#0", >> "mtime": "2015-03-08 11:44:11.00", >> "max_marker": "0#", >> "usage": { >> "rgw.none": { >> "size_kb": 0, >> "size_kb_actual": 0, >> "num_objects": 0 >> }, >> "rgw.main": { >> "size_kb": 0, >> "size_kb_actual": 0, >> "num_objects": 0 >> } >> }, >> "bucket_quota": { >> "enabled": false, >> "max_size_kb": -1, >> "max_objects": -1 >> } >> }, >> >> >> >> bucket check shows nothing: >> >> 16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check >> --bucket=globalcache01 --fix >> [] >> 16:07:27 root@sm-cephrgw4 ~ $ radosgw-admin bucket check >> --check-head-obj-locator --bucket=globalcache01 --fix >> { >> "bucket": "globalcache01", >> "check_objects": [ >> ] >> } >> >> >> However, i see a lot of data for it on an OSD (all shadow files with >> escaped underscores) >> >> [root@sm-cld-mtl-008 current]# find . 
-name default.8873277.32* -print >> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/default.8873277.32\u\ushadow\u.Tos2Ms8w2BiEG7YJAZeE6zrrc\uwcHPN\u1__head_D886E961__c >> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_1/default.8873277.32\u\ushadow\u.Aa86mlEMvpMhRaTDQKHZmcxAReFEo2J\u1__head_4A71E961__c >> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_5/default.8873277.32\u\ushadow\u.KCiWEa4YPVaYw2FPjqvpd9dKTRBu8BR\u17__head_00B5E961__c >> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_8/default.8873277.32\u\ushadow\u.A2K\u2H1XKR8weiSwKGmbUlsCmEB9GDF\u32__head_42E8E961__c >> >> >> -bash-4.1$ rados -p .rgw.buckets ls | egrep '8873277\.32.+' >> default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47 >> default.8873277.32__shadow_.Wr_dGMxdSRHpoeu4gsQZXJ8t0I3JI7l_6 >> default.8873277.32__shadow_.WjijDxYhLFMUYdrMjeH7GvTL1LOwcqo_3 >> default.8873277.32__shadow_.3lRIhNePLmt1O8VVc2p5X9LtAVfdgUU_1 >> default.8873277.32__shadow_.VqF8n7PnmIm3T9UEhorD5OsacvuHOOy_16 >> default.8873277.32__shadow_.Jrh59XT01rIIyOdNPDjCwl5Pe1LDanp_2 >> >> >> Is there still a bug in the fix obj locator command perhaps? I suppose >> can just do something like: >> >>rados -p .rgw.buckets cleanup --prefix default.8873277.32 >> >> Since i want to destroy the bucket anyway, but if this affects other >> buckets, i may want to clean those a better way. >> >> -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3
Make sure you use the underscore also, e.g., "default.8873277.32_". Otherwise you could potentially erase objects you didn't intend to, such as ones that start with "default.8873277.320". On Mon, Aug 31, 2015 at 3:20 PM, Ben Hines <bhi...@gmail.com> wrote: > Ok. I'm not too familiar with the inner workings of RGW, but i would > assume that for a bucket with these parameters: > >"id": "default.8873277.32", >"marker": "default.8873277.32", > > That it would be the only bucket using the files that start with > "default.8873277.32" > > default.8873277.32__shadow_.OkYjjANx6-qJOrjvdqdaHev-LHSvPhZ_15 > default.8873277.32__shadow_.a2qU3qodRf_E5b9pFTsKHHuX2RUC12g_2 > > > > On Mon, Aug 31, 2015 at 2:51 PM, Yehuda Sadeh-Weinraub > <yeh...@redhat.com> wrote: >> As long as you're 100% sure that the prefix is only being used for the >> specific bucket that was previously removed, then it is safe to remove >> these objects. But please do double check and make sure that there's >> no other bucket that matches this prefix somehow. >> >> Yehuda >> >> On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines <bhi...@gmail.com> wrote: >>> No input, eh? (or maybe TL;DR for everyone) >>> >>> Short version: Presuming the bucket index shows blank/empty, which it >>> does and is fine, would my manually deleting the rados objects with >>> the prefix matching the former bucket's ID cause any problems? >>> >>> thanks, >>> >>> -Ben >>> >>> On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines <bhi...@gmail.com> wrote: >>>> Ceph 0.93->94.2->94.3 >>>> >>>> I noticed my pool used data amount is about twice the bucket used data >>>> count. >>>> >>>> This bucket was emptied long ago.
It has zero objects: >>>> "globalcache01", >>>> { >>>> "bucket": "globalcache01", >>>> "pool": ".rgw.buckets", >>>> "index_pool": ".rgw.buckets.index", >>>> "id": "default.8873277.32", >>>> "marker": "default.8873277.32", >>>> "owner": "...", >>>> "ver": "0#12348839", >>>> "master_ver": "0#0", >>>> "mtime": "2015-03-08 11:44:11.00", >>>> "max_marker": "0#", >>>> "usage": { >>>> "rgw.none": { >>>> "size_kb": 0, >>>> "size_kb_actual": 0, >>>> "num_objects": 0 >>>> }, >>>> "rgw.main": { >>>> "size_kb": 0, >>>> "size_kb_actual": 0, >>>> "num_objects": 0 >>>> } >>>> }, >>>> "bucket_quota": { >>>> "enabled": false, >>>> "max_size_kb": -1, >>>> "max_objects": -1 >>>> } >>>> }, >>>> >>>> >>>> >>>> bucket check shows nothing: >>>> >>>> 16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check >>>> --bucket=globalcache01 --fix >>>> [] >>>> 16:07:27 root@sm-cephrgw4 ~ $ radosgw-admin bucket check >>>> --check-head-obj-locator --bucket=globalcache01 --fix >>>> { >>>> "bucket": "globalcache01", >>>> "check_objects": [ >>>> ] >>>> } >>>> >>>> >>>> However, i see a lot of data for it on an OSD (all shadow files with >>>> escaped underscores) >>>> >>>> [root@sm-cld-mtl-008 current]# find . 
-name default.8873277.32* -print >>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/default.8873277.32\u\ushadow\u.Tos2Ms8w2BiEG7YJAZeE6zrrc\uwcHPN\u1__head_D886E961__c >>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_1/default.8873277.32\u\ushadow\u.Aa86mlEMvpMhRaTDQKHZmcxAReFEo2J\u1__head_4A71E961__c >>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_5/default.8873277.32\u\ushadow\u.KCiWEa4YPVaYw2FPjqvpd9dKTRBu8BR\u17__head_00B5E961__c >>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_8/default.8873277.32\u\ushadow\u.A2K\u2H1XKR8weiSwKGmbUlsCmEB9GDF\u32__head_42E8E961__c >>>> >>>> >>>> -bash-4.1$ rados -p .rgw.buckets ls | egrep '8873277\.32.+' >>>> default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47 >>>> default.8873277.32__shadow_.Wr_dGMxdSRHpoeu4gsQZXJ8t0I3JI7l_6 >>>> default.8873277.32__shadow_.WjijDxYhLFMUYdrMjeH7GvTL1LOwcqo_3 >>>> default.8873277.32__shadow_.3lRIhNePLmt1O8VVc2p5X9LtAVfdgUU_1 >>>> default.8873277.32__shadow_.VqF8n7PnmIm3T9UEhorD5OsacvuHOOy_16 >>>> default.8873277.32__shadow_.Jrh59XT01rIIyOdNPDjCwl5Pe1LDanp_2 >>>> >>>> >>>> Is there still a bug in the fix obj locator command perhaps? I suppose >>>> can just do something like: >>>> >>>>rados -p .rgw.buckets cleanup --prefix default.8873277.32 >>>> >>>> Since i want to destroy the bucket anyway, but if this affects other >>>> buckets, i may want to clean those a better way. >>>> >>>> -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
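The underscore caveat from this thread can be made mechanical. A sketch of filtering a 'rados ls' listing down to exactly one bucket's objects (the marker value is taken from the thread; the second object name is hypothetical):

```python
MARKER = "default.8873277.32"

def belongs_to_bucket(obj_name, marker):
    # The trailing underscore is essential: a plain startswith(marker)
    # would also match "default.8873277.320..." from a different bucket.
    return obj_name.startswith(marker + "_")

names = [
    "default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47",
    "default.8873277.320__shadow_.objectFromAnotherBucket_1",  # hypothetical
]
print([n for n in names if belongs_to_bucket(n, MARKER)])  # only the first matches
```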
Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3
The bucket index objects are most likely in the .rgw.buckets.index pool. Yehuda On Mon, Aug 31, 2015 at 3:27 PM, Ben Hines <bhi...@gmail.com> wrote: > Good call, thanks! > > Is there any risk of also deleting parts of the bucket index? I'm not > sure what the objects for the index itself look like, or if they are > in the .rgw.buckets pool. > > > On Mon, Aug 31, 2015 at 3:23 PM, Yehuda Sadeh-Weinraub > <yeh...@redhat.com> wrote: >> Make sure you use the underscore also, e.g., "default.8873277.32_". >> Otherwise you could potentially erase objects you did't intend to, >> like ones who start with "default.8873277.320" and such. >> >> On Mon, Aug 31, 2015 at 3:20 PM, Ben Hines <bhi...@gmail.com> wrote: >>> Ok. I'm not too familiar with the inner workings of RGW, but i would >>> assume that for a bucket with these parameters: >>> >>>"id": "default.8873277.32", >>>"marker": "default.8873277.32", >>> >>> Tha it would be the only bucket using the files that start with >>> "default.8873277.32" >>> >>> default.8873277.32__shadow_.OkYjjANx6-qJOrjvdqdaHev-LHSvPhZ_15 >>> default.8873277.32__shadow_.a2qU3qodRf_E5b9pFTsKHHuX2RUC12g_2 >>> >>> >>> >>> On Mon, Aug 31, 2015 at 2:51 PM, Yehuda Sadeh-Weinraub >>> <yeh...@redhat.com> wrote: >>>> As long as you're 100% sure that the prefix is only being used for the >>>> specific bucket that was previously removed, then it is safe to remove >>>> these objects. But please do double check and make sure that there's >>>> no other bucket that matches this prefix somehow. >>>> >>>> Yehuda >>>> >>>> On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines <bhi...@gmail.com> wrote: >>>>> No input, eh? (or maybe TL,DR for everyone) >>>>> >>>>> Short version: Presuming the bucket index shows blank/empty, which it >>>>> does and is fine, would me manually deleting the rados objects with >>>>> the prefix matching the former bucket's ID cause any problems? 
>>>>> >>>>> thanks, >>>>> >>>>> -Ben >>>>> >>>>> On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines <bhi...@gmail.com> wrote: >>>>>> Ceph 0.93->94.2->94.3 >>>>>> >>>>>> I noticed my pool used data amount is about twice the bucket used data >>>>>> count. >>>>>> >>>>>> This bucket was emptied long ago. It has zero objects: >>>>>> "globalcache01", >>>>>> { >>>>>> "bucket": "globalcache01", >>>>>> "pool": ".rgw.buckets", >>>>>> "index_pool": ".rgw.buckets.index", >>>>>> "id": "default.8873277.32", >>>>>> "marker": "default.8873277.32", >>>>>> "owner": "...", >>>>>> "ver": "0#12348839", >>>>>> "master_ver": "0#0", >>>>>> "mtime": "2015-03-08 11:44:11.00", >>>>>> "max_marker": "0#", >>>>>> "usage": { >>>>>> "rgw.none": { >>>>>> "size_kb": 0, >>>>>> "size_kb_actual": 0, >>>>>> "num_objects": 0 >>>>>> }, >>>>>> "rgw.main": { >>>>>> "size_kb": 0, >>>>>> "size_kb_actual": 0, >>>>>> "num_objects": 0 >>>>>> } >>>>>> }, >>>>>> "bucket_quota": { >>>>>> "enabled": false, >>>>>> "max_size_kb": -1, >>>>>> "max_objects": -1 >>>>>> } >>>>>> }, >>>>>> >>>>>> >>>>>> >>>>>> bucket check shows nothing: >>>>>> >>>>>> 16:07:09 root
Re: [ceph-users] Troubleshooting rgw bucket list
I assume you filtered the log by thread? I don't see the response messages. For the bucket check you can run radosgw-admin with --log-to-stderr. Can you also set 'debug objclass = 20' on the osds? You can do it by: $ ceph tell osd.\* injectargs --debug-objclass 20 Also, it'd be interesting to get the following: $ radosgw-admin bi list --bucket= --object=abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5 Thanks, Yehuda On Tue, Sep 1, 2015 at 10:44 AM, Sam Wouters <s...@ericom.be> wrote: > not sure where I can find the logs for the bucket check, I can't really > filter them out in the radosgw log. > > -Sam > > On 01-09-15 19:25, Sam Wouters wrote: >> It looks like it, this is what shows in the logs after bumping the debug >> and requesting a bucket list. >> >> 2015-09-01 17:14:53.008620 7fccb17ca700 10 cls_bucket_list >> aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) >> start >> abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] >> num_entries 1 >> 2015-09-01 17:14:53.008629 7fccb17ca700 20 reading from >> .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 >> 2015-09-01 17:14:53.008636 7fccb17ca700 20 get_obj_state: >> rctx=0x7fccb17c84d0 >> obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 >> state=0x7fcde01a4060 s->prefetch_data=0 >> 2015-09-01 17:14:53.008640 7fccb17ca700 10 cache get: >> name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit >> 2015-09-01 17:14:53.008645 7fccb17ca700 20 get_obj_state: s->obj_tag was >> set empty >> 2015-09-01 17:14:53.008647 7fccb17ca700 10 cache get: >> name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit >> 2015-09-01 17:14:53.008675 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> >> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435874 >> ... 
>> .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 >> ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870 >> >> On 01-09-15 17:11, Yehuda Sadeh-Weinraub wrote: >>> Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the >>> operations (bucket listing and bucket check) go into some kind of >>> infinite loop? >>> >>> Yehuda >>> >>> On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouters <s...@ericom.be> wrote: >>>> Hi, I've started the bucket --check --fix on friday evening and it's >>>> still running. 'ceph -s' shows the cluster health as OK, I don't know if >>>> there is anything else I could check? Is there a way of finding out if >>>> its actually doing something? >>>> >>>> We only have this issue on the one bucket with versioning enabled, I >>>> can't get rid of the feeling it has something todo with that. The >>>> "underscore bug" is also still present on that bucket >>>> (http://tracker.ceph.com/issues/12819). Not sure if thats related in any >>>> way. >>>> Are there any alternatives, as for example copy all the objects into a >>>> new bucket without versioning? Simple way would be to list the objects >>>> and copy them to a new bucket, but bucket listing is not working so... >>>> >>>> -Sam >>>> >>>> >>>> On 31-08-15 10:47, Gregory Farnum wrote: >>>>> This generally shouldn't be a problem at your bucket sizes. Have you >>>>> checked that the cluster is actually in a healthy state? The sleeping >>>>> locks are normal but should be getting woken up; if they aren't it >>>>> means the object access isn't working for some reason. A down PG or >>>>> something would be the simplest explanation. >>>>> -Greg >>>>> >>>>> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters <s...@ericom.be> wrote: >>>>>> Ok, maybe I'm to impatient. It would be great if there were some verbose >>>>>> or progress logging of the radosgw-admin tool. >>>>>> I will start a check and let it run over the weekend. 
>>>>>> >>>>>> tnx, >>>>>> Sam >>>>>> >>>>>> On 28-08-15 18:16, Sam Wouters wrote: >>>>>>> Hi, >>>>>>> >>>>>>> this bucket only has 13389 objects, so the index size shouldn't be a >>>>>>> problem. Also, on the same cluster we have an other bucket with 1200543 >>>>>>> objects (but no versioning configured), which has no issues. >>
Re: [ceph-users] Troubleshooting rgw bucket list
Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the operations (bucket listing and bucket check) go into some kind of infinite loop? Yehuda On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouterswrote: > Hi, I've started the bucket --check --fix on friday evening and it's > still running. 'ceph -s' shows the cluster health as OK, I don't know if > there is anything else I could check? Is there a way of finding out if > its actually doing something? > > We only have this issue on the one bucket with versioning enabled, I > can't get rid of the feeling it has something todo with that. The > "underscore bug" is also still present on that bucket > (http://tracker.ceph.com/issues/12819). Not sure if thats related in any > way. > Are there any alternatives, as for example copy all the objects into a > new bucket without versioning? Simple way would be to list the objects > and copy them to a new bucket, but bucket listing is not working so... > > -Sam > > > On 31-08-15 10:47, Gregory Farnum wrote: >> This generally shouldn't be a problem at your bucket sizes. Have you >> checked that the cluster is actually in a healthy state? The sleeping >> locks are normal but should be getting woken up; if they aren't it >> means the object access isn't working for some reason. A down PG or >> something would be the simplest explanation. >> -Greg >> >> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters wrote: >>> Ok, maybe I'm to impatient. It would be great if there were some verbose >>> or progress logging of the radosgw-admin tool. >>> I will start a check and let it run over the weekend. >>> >>> tnx, >>> Sam >>> >>> On 28-08-15 18:16, Sam Wouters wrote: Hi, this bucket only has 13389 objects, so the index size shouldn't be a problem. Also, on the same cluster we have an other bucket with 1200543 objects (but no versioning configured), which has no issues. when we run a radosgw-admin bucket --check (--fix), nothing seems to be happening. 
Putting an strace on the process shows a lot of lines like these: [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 [pid 99385] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 99371] <... futex resumed> ) = 0 but no errors in the ceph logs or health warnings. r, Sam On 28-08-15 17:49, Ben Hines wrote: > How many objects in the bucket? > > RGW has problems with index size once number of objects gets into the > 90+ level. The buckets need to be recreated with 'sharded bucket > indexes' on: > > rgw override bucket index max shards = 23 > > You could also try repairing the index with: > > radosgw-admin bucket check --fix --bucket= > > -Ben > > On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters wrote: >> Hi, >> >> we have a rgw bucket (with versioning) where PUT and GET operations for >> specific objects succeed, but retrieving an object list fails. >> Using python-boto, after a timeout just gives us an 500 internal error; >> radosgw-admin just hangs. >> Also a radosgw-admin bucket check just seems to hang... >> >> ceph version is 0.94.3 but this also was happening with 0.94.2, we >> quietly hoped upgrading would fix but it didn't... >> >> r, >> Sam >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
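The sharding option Ben mentions goes in ceph.conf; a sketch (section name assumed). Note that it only applies to buckets created after the change — existing bucket indexes keep their original layout:

```ini
[client.radosgw.gateway]
rgw override bucket index max shards = 23
```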
Re: [ceph-users] How to observe civetweb.
You can increase the civetweb logs by adding 'debug civetweb = 10' in your ceph.conf. The output will go into the rgw logs. Yehuda On Tue, Sep 8, 2015 at 2:24 AM, Vickie ch wrote: > Dear cephers, > Just upgraded radosgw from apache to civetweb. > It's really simple to install and use. But I can't find any parameters or > logs to adjust (or observe) civetweb (like the apache log). I'm really confused. > Any ideas? > > > Best wishes, > Mika
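As a ceph.conf sketch of the setting described above (section name assumed):

```ini
[client.radosgw.gateway]
debug civetweb = 10   # civetweb messages then show up in the rgw log
```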
Re: [ceph-users] radosgw and keystone version 3 domains
At the moment radosgw just doesn't support v3 (so it seems). I created issue #13303. If anyone wants to pick this up (or provide some information as to what it would require to support that) it would be great. Thanks, Yehuda On Wed, Sep 30, 2015 at 3:32 AM, Robert Duncanwrote: > Yes, but it always results in 401 from horizon and cli > > swift --debug --os-auth-url http://172.25.60.2:5000/v3 --os-username ldapuser > --os-user-domain-name ldapdomain --os-project-name someproject > --os-project-domain-name ldapdomain --os-password password123 -V 3 post > containerV3 > DEBUG:keystoneclient.auth.identity.v3:Making authentication request to > http://172.25.60.2:5000/v3/auth/tokens > INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2 > DEBUG:urllib3.connectionpool:Setting read timeout to None > DEBUG:urllib3.connectionpool:"POST /v3/auth/tokens HTTP/1.1" 201 8366 > DEBUG:iso8601.iso8601:Parsed 2015-09-30T11:20:46.053177Z into {'tz_sign': > None, 'second_fraction': u'053177', 'hour': u'11', 'daydash': u'30', > 'tz_hour': None, 'month': None, 'timezone': u'Z', 'second': u'46', > 'tz_minute': None, 'year': u'2015', 'separator': u'T', 'monthdash': u'09', > 'day': None, 'minute': u'20'} with default timezone object at 0x1736f50> > DEBUG:iso8601.iso8601:Got u'2015' for 'year' with default None > DEBUG:iso8601.iso8601:Got u'09' for 'monthdash' with default None > DEBUG:iso8601.iso8601:Got 9 for 'month' with default 9 > DEBUG:iso8601.iso8601:Got u'30' for 'daydash' with default None > DEBUG:iso8601.iso8601:Got 30 for 'day' with default 30 > DEBUG:iso8601.iso8601:Got u'11' for 'hour' with default None > DEBUG:iso8601.iso8601:Got u'20' for 'minute' with default None > DEBUG:iso8601.iso8601:Got u'46' for 'second' with default None > INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2 > DEBUG:urllib3.connectionpool:Setting read timeout to 0x7f193dc590b0> > DEBUG:urllib3.connectionpool:"POST /swift/v1/containerV3 HTTP/1.1" 401 None > 
INFO:swiftclient:REQ: curl -i http://172.25.60.2:8080/swift/v1/containerV3 -X > POST -H "Content-Length: 0" -H "X-Auth-Token: > 30fd924774bf480d8814c61c7fdf128e" > INFO:swiftclient:RESP STATUS: 401 Unauthorized > INFO:swiftclient:RESP HEADERS: [('content-encoding', 'gzip'), > ('transfer-encoding', 'chunked'), ('accept-ranges', 'bytes'), ('vary', > 'Accept-Encoding'), ('server', 'Apache/2.2.15 (CentOS)'), ('date', 'Wed, 30 > Sep 2015 10:20:46 GMT'), ('content-type', 'text/plain; charset=utf-8')] > INFO:swiftclient:RESP BODY: AccessDenied > > DEBUG:keystoneclient.auth.identity.v3:Making authentication request to > http://172.25.60.2:5000/v3/auth/tokens > INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2 > DEBUG:urllib3.connectionpool:Setting read timeout to None > DEBUG:urllib3.connectionpool:"POST /v3/auth/tokens HTTP/1.1" 201 8366 > DEBUG:iso8601.iso8601:Parsed 2015-09-30T11:20:47.839422Z into {'tz_sign': > None, 'second_fraction': u'839422', 'hour': u'11', 'daydash': u'30', > 'tz_hour': None, 'month': None, 'timezone': u'Z', 'second': u'47', > 'tz_minute': None, 'year': u'2015', 'separator': u'T', 'monthdash': u'09', > 'day': None, 'minute': u'20'} with default timezone object at 0x1736f50> > DEBUG:iso8601.iso8601:Got u'2015' for 'year' with default None > DEBUG:iso8601.iso8601:Got u'09' for 'monthdash' with default None > DEBUG:iso8601.iso8601:Got 9 for 'month' with default 9 > DEBUG:iso8601.iso8601:Got u'30' for 'daydash' with default None > DEBUG:iso8601.iso8601:Got 30 for 'day' with default 30 > DEBUG:iso8601.iso8601:Got u'11' for 'hour' with default None > DEBUG:iso8601.iso8601:Got u'20' for 'minute' with default None > DEBUG:iso8601.iso8601:Got u'47' for 'second' with default None > INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2 > DEBUG:urllib3.connectionpool:Setting read timeout to 0x7f193dc590b0> > DEBUG:urllib3.connectionpool:"POST /swift/v1/containerV3 HTTP/1.1" 401 None > INFO:swiftclient:REQ: curl -i 
http://172.25.60.2:8080/swift/v1/containerV3 -X > POST -H "Content-Length: 0" -H "X-Auth-Token: > fc7bb4a07baf41058546d8a85b2cd2b8" > INFO:swiftclient:RESP STATUS: 401 Unauthorized > INFO:swiftclient:RESP HEADERS: [('content-encoding', 'gzip'), > ('transfer-encoding', 'chunked'), ('accept-ranges', 'bytes'), ('vary', > 'Accept-Encoding'), ('server', 'Apache/2.2.15 (CentOS)'), ('date', 'Wed, 30 > Sep 2015 10:20:47 GMT'), ('content-type', 'text/plain; charset=utf-8')] > INFO:swiftclient:RESP BODY: AccessDenied > > ERROR:swiftclient:Container POST failed: > http://172.25.60.2:8080/swift/v1/containerV3 401 Unauthorized AccessDenied > Traceback (most recent call last): > File "/usr/lib/python2.6/site-packages/swiftclient/client.py", line 1243, > in _retry > rv = func(self.url, self.token, *args, **kwargs) > File "/usr/lib/python2.6/site-packages/swiftclient/client.py", line 771, in > post_container > http_response_content=body)
Re: [ceph-users] radosgw Storage policies
On Mon, Sep 28, 2015 at 4:00 AM, Luis Periquito wrote: > Hi All, > > I was hearing the ceph talk about radosgw and Yehuda talks about storage > policies. I started looking for it in the documentation, on how to > implement/use and couldn't find much information: > http://docs.ceph.com/docs/master/radosgw/s3/ says it doesn't currently > support it, and http://docs.ceph.com/docs/master/radosgw/swift/ doesn't > mention it. > > From the release notes it seems to be for the swift interface, not S3. Is > this correct? Can we create them for S3 interface, or only Swift? > > You can create buckets in both swift and s3 that utilize this feature. You need to define different placement targets in the zone configuration. In S3, when you create a bucket, you need to specify a location constraint that specifies this policy. The location constraint should be specified as follows: [region][:policy]. So if you're creating a bucket in the current region using your 'gold' policy that you defined, you'll need to set it to ':gold'. In Swift, the API requires sending it through a special HTTP header (X-Storage-Policy). Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
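As a concrete sketch of the [region][:policy] convention described above — the 'gold' policy name is only an example and must match a placement target actually defined in the zone configuration — a client-side helper might build the request pieces like this:

```python
# Sketch (not rgw code): composing the S3 location constraint and the
# Swift header used to select a placement policy at bucket creation.
# Policy/region names here are hypothetical.

def s3_location_constraint(region="", policy=""):
    """Build the LocationConstraint value in the form '[region][:policy]'."""
    return "%s:%s" % (region, policy) if policy else region

def s3_create_bucket_body(region="", policy=""):
    """XML body a client could send with an S3 PUT-bucket request."""
    return (
        "<CreateBucketConfiguration>"
        "<LocationConstraint>%s</LocationConstraint>"
        "</CreateBucketConfiguration>" % s3_location_constraint(region, policy)
    )

def swift_create_container_headers(policy):
    """Headers for a Swift PUT-container request selecting a policy."""
    return {"X-Storage-Policy": policy}

# Bucket in the current region using the 'gold' policy:
print(s3_location_constraint(policy="gold"))   # -> :gold
print(swift_create_container_headers("gold"))
```

The same strings can then be fed to whatever S3/Swift client library is in use.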
Re: [ceph-users] Rados gateway / no socket server point defined
On Thu, Sep 24, 2015 at 8:59 AM, Mikaël Guichardwrote: > Hi, > > I encounter this error : > >> /usr/bin/radosgw -d --keyring /etc/ceph/ceph.client.radosgw.keyring -n >> client.radosgw.myhost > 2015-09-24 17:41:18.223206 7f427f074880 0 ceph version 0.94.3 > (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process radosgw, pid 4570 > 2015-09-24 17:41:18.349037 7f427f074880 0 framework: fastcgi > 2015-09-24 17:41:18.349044 7f427f074880 0 framework: civetweb > 2015-09-24 17:41:18.349048 7f427f074880 0 framework conf key: port, val: > 7480 > 2015-09-24 17:41:18.349056 7f427f074880 0 starting handler: civetweb > 2015-09-24 17:41:18.351852 7f427f074880 0 starting handler: fastcgi > 2015-09-24 17:41:18.351921 7f41fc7a0700 0 ERROR: no socket server point > defined, cannot start fcgi frontend > > I can force the socket file with the followed option and it works : > --rgw-socket-path=/var/run/ceph/ceph.radosgw.gateway.fastcgi.sock > but why the ceph.conf parameter is ignored ? > > I look in the radosgw code, it should work : > > conf->get_val("socket_path", "", _path); > conf->get_val("socket_port", g_conf->rgw_port, _port); > conf->get_val("socket_host", g_conf->rgw_host, _host); > > if (socket_path.empty() && socket_port.empty() && socket_host.empty()) { > socket_path = g_conf->rgw_socket_path; > if (socket_path.empty()) { > dout(0) << "ERROR: no socket server point defined, cannot start fcgi > frontend" << dendl; > return; > } > } > > > > My ceph.conf content : > > [client.radosgw.gateway] You're using a different user for starting rgw (client.radosgw.myhost), so this config section doesn't get used. Either rename this section, or use the client.radosgw.gateway user. 
Yehuda > host = myhost > keyring = /etc/ceph/ceph.client.radosgw.keyring > rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock > rgw print continue = false > rgw enable usage log = true > rgw enable ops log = true > log file = /var/log/radosgw/client.radosgw.gateway.log > rgw usage log tick interval = 30 > rgw usage log flush threshold = 1024 > rgw usage max shards = 32 > rgw usage max user shards = 1 > > thanks for your response. > > regards > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
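To illustrate the fix suggested above, a minimal ceph.conf sketch (host, keyring, and socket path copied from the thread) — rgw only reads the section whose name matches the --name/-n it was started with, so pick one of the two options:

```ini
# Option 1: rename the section to match the daemon's startup name,
# so `radosgw -n client.radosgw.myhost` actually reads it:
[client.radosgw.myhost]
host = myhost
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

# Option 2: keep [client.radosgw.gateway] as-is and start the daemon
# with `radosgw -n client.radosgw.gateway` instead.
```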
Re: [ceph-users] s3cmd --disable-multipart
On Thu, Dec 10, 2015 at 11:10 AM, Deneau, Tom wrote: > If using s3cmd to radosgw and using s3cmd's --disable-multipart option, is > there any limit to the size of the object that can be stored thru radosgw? > rgw limits plain uploads to 5GB > Also, is there a recommendation for multipart chunk size for radosgw? > Having it as a multiple of the underlying rgw stripe size (default is 4MB) might be a good idea. Yehuda
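A small sketch of that sizing advice, assuming the default 4 MB stripe size; s3cmd takes the chunk size in MB via its --multipart-chunk-size-mb option:

```python
# Sketch: round a desired multipart chunk size down to a whole multiple
# of the rgw stripe size (4 MB by default), per the advice above.

RGW_STRIPE_MB = 4  # default rgw object stripe size, in MB

def aligned_chunk_mb(desired_mb, stripe_mb=RGW_STRIPE_MB):
    """Largest multiple of stripe_mb not exceeding desired_mb (min one stripe)."""
    return max(stripe_mb, (desired_mb // stripe_mb) * stripe_mb)

print(aligned_chunk_mb(15))  # -> 12
print(aligned_chunk_mb(3))   # -> 4
```

The result would then be passed as, e.g., `s3cmd put --multipart-chunk-size-mb=12 ...`.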
Re: [ceph-users] about federated gateway
On Sun, Dec 13, 2015 at 7:27 AM, 孙方臣 wrote: > Hi, All, > > I'm setting up federated gateway. One is master zone, the other is slave > zone. Radosgw-agent is running in slave zone. I have encountered some > problems, can anybody help answering this: > > 1. When putting an object to radosgw, two bilog entries are generated. One is in the > "pending" state, the other in the "complete" state. The entry should be ignored when > it is in the "pending" state, otherwise the same object will be copied > twice. I have a pull request that is at > https://github.com/ceph/radosgw-agent/pull/39, please give some suggestions > about it. > > 2. When "rgw_num_rados_handles" is set to 16, the radosgw-agent cannot > unlock, the error code is 404. The log follows: > .. > 2015-12-13 21:52:33,373 26594 [radosgw_agent.lock][WARNING] failed to unlock > shard 115 in zone zone-a: Http error code 404 content Not Found > .. > 2015-12-13 21:53:00,732 26594 [radosgw_agent.lock][ERROR ] locking shard 116 > in zone zone-a failed: Http error code 423 content > .. > > I can find the locker with the "rados lock info" command, and can break the > lock with "rados lock break" command. > I finally found the reason: the lock request from > radosgw-agent is processed by one rados client and the unlock request from > radosgw-agent is processed by another rados client. When > "rgw_num_rados_handles" is set to 1, the warning message did not appear. > Can anybody help giving some suggestions about this, and can the warning > message be ignored? Hi, it certainly seems like a bug. Can you open an issue at tracker.ceph.com? Thanks, Yehuda
Re: [ceph-users] rgw pool names
On Fri, Jun 10, 2016 at 11:44 AM, Deneau, Tom wrote: > When I start radosgw, I create the pool .rgw.buckets manually to control > whether it is replicated or erasure coded and I let the other pools be > created automatically. > > However, I have noticed that sometimes the pools get created with the > "default" > prefix, thus > rados lspools > .rgw.root > default.rgw.control > default.rgw.data.root > default.rgw.gc > default.rgw.log > .rgw.buckets # the one I created > default.rgw.users.uid > default.rgw.users.keys > default.rgw.meta > default.rgw.buckets.index > default.rgw.buckets.data # the one actually being used > > What controls whether these pools have the "default" prefix or not? > The prefix is the name of the zone ('default' by default). This was added for the jewel release, as well as dropping the requirement of having the pool names start with a dot. Yehuda
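The jewel-era naming scheme described above can be sketched as a simple rule; the zone name here other than 'default' is a hypothetical example:

```python
# Illustration only: jewel prefixes rgw data pools with the zone name
# ('default' unless configured otherwise) and no longer requires the
# leading dot.

def rgw_pool_name(zone, suffix):
    return "%s.rgw.%s" % (zone, suffix)

for suffix in ("control", "data.root", "gc", "log",
               "buckets.index", "buckets.data"):
    print(rgw_pool_name("default", suffix))
```

So a zone named 'us-east' would get pools like us-east.rgw.buckets.data instead of the 'default' ones listed above.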
Re: [ceph-users] rgw s3website issue
On Sun, May 29, 2016 at 4:47 AM, Gaurav Bafnawrote: > Hi Cephers, > > I am unable to create bucket hosting a webstite in my vstart cluster. > > When I do this in boto : > > website_bucket.configure_website('index.html','error.html') > > I get : > > boto.exception.S3ResponseError: S3ResponseError: 405 Method Not Allowed > > > Here is my ceph.conf for radosgw: > > rgw frontends = fastcgi, civetweb port=8010 > > rgw enable static website = true > > rgw dns name = 10.140.13.22 > > rgw dns s3website name = 10.140.13.22 > > > Here are the logs in rgw : > > 2016-05-29 00:00:47.191297 7ff404ff9700 1 == starting new request > req=0x7ff404ff37d0 = > > 2016-05-29 00:00:47.191325 7ff404ff9700 2 req 1:0.28::PUT > /s3website/::initializing for trans_id = > tx1-005749967f-101f-default > > 2016-05-29 00:00:47.191330 7ff404ff9700 10 host=10.140.13.22 > > 2016-05-29 00:00:47.191338 7ff404ff9700 20 subdomain= > domain=10.140.13.22 in_hosted_domain=1 in_hosted_domain_s3website=1 > Could it be that the endpoint is configured to serve both S3 and static websites? Yehuda > 2016-05-29 00:00:47.191350 7ff404ff9700 5 the op is PUT > > 2016-05-29 00:00:47.191395 7ff404ff9700 20 get_handler > handler=32RGWHandler_REST_Bucket_S3Website > > 2016-05-29 00:00:47.191399 7ff404ff9700 10 > handler=32RGWHandler_REST_Bucket_S3Website > > 2016-05-29 00:00:47.191401 7ff404ff9700 2 req 1:0.000104:s3:PUT > /s3website/::getting op 1 > > 2016-05-29 00:00:47.191410 7ff404ff9700 10 > RGWHandler_REST_S3Website::error_handler err_no=-2003 http_ret=405 > > 2016-05-29 00:00:47.191412 7ff404ff9700 20 No special error handling today! 
> > 2016-05-29 00:00:47.191415 7ff404ff9700 20 handler->ERRORHANDLER: > err_no=-2003 new_err_no=-2003 > > 2016-05-29 00:00:47.191504 7ff404ff9700 2 req 1:0.000207:s3:PUT > /s3website/::op status=0 > > 2016-05-29 00:00:47.191510 7ff404ff9700 2 req 1:0.000213:s3:PUT > /s3website/::http status=405 > > 2016-05-29 00:00:47.191511 7ff404ff9700 1 == req done > req=0x7ff404ff37d0 op status=0 http_status=405 == > > > Code wise I see that put_op is not defined for > RGWHandler_REST_S3Website class but is defined for > RGWHandler_REST_Bucket_S3 class . > > Can somebody please help me out ? > > > > > -- > Gaurav Bafna > 9540631400 > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
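One way to test the theory above (the endpoint serving both S3 and static websites, so `in_hosted_domain_s3website=1` routes the bucket-API PUT to the website handler) is to give the two roles distinct hostnames. A sketch of the relevant ceph.conf pieces — the hostnames and section name are placeholders, not taken from the thread:

```ini
[client.rgw.gateway]
rgw frontends = civetweb port=8010
rgw enable static website = true
; distinct names, so a request matches only one handler:
rgw dns name = s3.example.com
rgw dns s3website name = web.example.com
```

With both settings pointing at the same IP, as in the original config, every request looks like a website request.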
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
On Thu, Jan 14, 2016 at 10:51 PM, seapasu...@uchicago.eduwrote: > It looks like the gateway is experiencing a similar race condition to what > we reported before. > > The rados object has a size of 0 bytes but the bucket index shows the object > listed and the object metadata shows a size of > 7147520 bytes. > > I have a lot of logs but I don't think any of them have the full data from > the upload of this object. > > I thought this bug was fixed back in firefly/giant > > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg19971.html > > -- > > root@kg34-33:/srv/nfs/griffin_temp# rados -p .rgw.buckets stat > default.384153.1_2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar > ..rgw.buckets/default.384153.1_2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar > mtime 1446672570, size 0 > > -- > > SError: [Errno 2] No such file or directory: > '/srv/nfs/griffin_tempnoaa-nexrad-l2/2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar' > > In [13]: print(key.size) > 7147520 > > We are currently using 94.5 and the file were uploaded to hammer as well > > lacadmin@kh28-10:~$ ceph --version > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) > lacadmin@kh28-10:~$ radosgw --version > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) > > > The cluster is health_ok and was ok during the upload. I need to confirm > with the person who uploaded the data but I think they did it with s3cmd. > Has anyone seen this before? I think I need to file a bug :-( > What does 'radosgw-admin object stat --bucket= --object=' show? Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
On Fri, Jan 15, 2016 at 9:36 AM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > Hello Yehuda, > > Here it is:: > > radosgw-admin object stat --bucket="noaa-nexrad-l2" > --object="2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar" > { > "name": > "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", > "size": 7147520, > "policy": { > "acl": { > "acl_user_map": [ > { > "user": "b05f707271774dbd89674a0736c9406e", > "acl": 15 > } > ], > "acl_group_map": [ > { > "group": 1, > "acl": 1 > } > ], > "grant_map": [ > { > "id": "", > "grant": { > "type": { > "type": 2 > }, > "id": "", > "email": "", > "permission": { > "flags": 1 > }, > "name": "", > "group": 1 > } > }, > { > "id": "b05f707271774dbd89674a0736c9406e", > "grant": { > "type": { > "type": 0 > }, > "id": "b05f707271774dbd89674a0736c9406e", > "email": "", > "permission": { > "flags": 15 > }, > "name": "noaa-commons", > "group": 0 > } > } > ] > }, > "owner": { > "id": "b05f707271774dbd89674a0736c9406e", > "display_name": "noaa-commons" > } > }, > "etag": "b91b6f1650350965c5434c547b3c38ff-1\u", > "tag": "_cWrvEa914Gy1AeyzIhRlUdp1wJnek3E\u", > "manifest": { > "objs": [], > "obj_size": 7147520, > "explicit_objs": "false", > "head_obj": { > "bucket": { > "name": "noaa-nexrad-l2", > "pool": ".rgw.buckets", > "data_extra_pool": ".rgw.buckets.extra", > "index_pool": ".rgw.buckets.index", > "marker": "default.384153.1", > "bucket_id": "default.384153.1" > }, > "key": "", > "ns": "", > "object": > "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", > "instance": "" > }, > "head_size": 0, > "max_head_size": 0, > "prefix": > "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar.2~pcu5Hz6foFXjlSxBat22D8YMcHlQOBD", Try running: $ rados -p .rgw.buckets ls | grep pcu5Hz6 Yehuda > "tail_bucket": { > "name": "noaa-nexrad-l2", > "pool": ".rgw.buckets", > "data_extra_pool": ".rgw.buckets.extra", > "index_pool": ".rgw.buckets.index", > 
"marker": "default.384153.1", > "bucket_id": "default.384153.1" > }, > "rules": [ > { > "key": 0, > "val": { > "start_part_num": 1, > "start_ofs": 0, > "part_size": 0, > "stripe_max_size": 4194304, > "override_prefix": "" > } > } > ] > }, > "attrs": {} > > } > > On 1/15/16 11:17 AM, Yehuda Sadeh-Weinraub wrote: >> >> radosgw-admin object stat --bucket= --object=' > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
That's interesting, and might point at the underlying issue that caused it. Could be a racing upload that somehow ended up with the wrong object head. The 'multipart' object should be 4M in size, and the 'shadow' one should have the remainder of the data. You can run 'rados stat -p .rgw.buckets ' to validate that. If that's the case, you can copy these to the expected object names: $ src_uploadid=wksHvto9gRgHUJbhm_TZPXJTZUPXLT2 $ dest_uploadid=pcu5Hz6foFXjlSxBat22D8YMcHlQOBD $ rados -p .rgw.buckets cp default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_uploadid}.1 default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_uploadid}.1 $ rados -p .rgw.buckets cp default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_uploadid}.1_1 default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_uploadid}.1_1 Yehuda On Fri, Jan 15, 2016 at 1:02 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'pcu5Hz6' > lacadmin@kh28-10:~$ > > Nothing was found. 
That said when I run the command with another prefix > snippet:: > lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'wksHvto' > default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1_1 > default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1 > > > > > On 1/15/16 12:05 PM, Yehuda Sadeh-Weinraub wrote: >> >> On Fri, Jan 15, 2016 at 9:36 AM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> Hello Yehuda, >>> >>> Here it is:: >>> >>> radosgw-admin object stat --bucket="noaa-nexrad-l2" >>> >>> --object="2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar" >>> { >>> "name": >>> >>> "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", >>> "size": 7147520, >>> "policy": { >>> "acl": { >>> "acl_user_map": [ >>> { >>> "user": "b05f707271774dbd89674a0736c9406e", >>> "acl": 15 >>> } >>> ], >>> "acl_group_map": [ >>> { >>> "group": 1, >>> "acl": 1 >>> } >>> ], >>> "grant_map": [ >>> { >>> "id": "", >>> "grant": { >>> "type": { >>> "type": 2 >>> }, >>> "id": "", >>> "email": "", >>> "permission": { >>> "flags": 1 >>> }, >>> "name": "", >>> "group": 1 >>> } >>> }, >>> { >>> "id": "b05f707271774dbd89674a0736c9406e", >>> "grant": { >>> "type": { >>> "type": 0 >>> }, >>> "id": "b05f707271774dbd89674a0736c9406e", >>> "email": "", >>> "permission": { >>> "flags": 15 >>> }, >>> "name": "noaa-commons", >>> "group": 0 >>> } >>> } >>> ] >>> }, >>> "owner": { >>> "id": "b05f707271774dbd89674a0736c9406e", >>> "display_n
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
Ah, I see. Misread that and the object names were very similar. No, don't copy it. You can try to grep for the specific object name and see if there are pieces of it lying around under a different upload id. Yehuda On Fri, Jan 15, 2016 at 1:44 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > Sorry I am a bit confused. The successful list that I provided is from a > different object of the same size to show that I could indeed get a list. > Are you saying to copy the working object to the missing object? Sorry for > the confusion. > > > On 1/15/16 3:20 PM, Yehuda Sadeh-Weinraub wrote: >> >> That's interesting, and might point at the underlying issue that >> caused it. Could be a racing upload that somehow ended up with the >> wrong object head. The 'multipart' object should be 4M in size, and >> the 'shadow' one should have the remainder of the data. You can run >> 'rados stat -p .rgw.buckets ' to validate that. If that's the >> case, you can copy these to the expected object names: >> >> $ src_uploadid=wksHvto9gRgHUJbhm_TZPXJTZUPXLT2 >> $ dest_uploadid=pcu5Hz6foFXjlSxBat22D8YMcHlQOBD >> >> $ rados -p .rgw.buckets cp >> >> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_uploadid}.1 >> >> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_uploadid}.1 >> >> $ rados -p .rgw.buckets cp >> >> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_upload_id}.1_1 >> >> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_upload_id}.1_1 >> >> Yehuda >> >> >> On Fri, Jan 15, 2016 at 1:02 PM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'pcu5Hz6' >>> lacadmin@kh28-10:~$ >>> >>> Nothing was found. 
That said when I run the command with another prefix >>> snippet:: >>> lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'wksHvto' >>> >>> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1_1 >>> >>> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1 >>> >>> >>> >>> >>> On 1/15/16 12:05 PM, Yehuda Sadeh-Weinraub wrote: >>>> >>>> On Fri, Jan 15, 2016 at 9:36 AM, seapasu...@uchicago.edu >>>> <seapasu...@uchicago.edu> wrote: >>>>> >>>>> Hello Yehuda, >>>>> >>>>> Here it is:: >>>>> >>>>> radosgw-admin object stat --bucket="noaa-nexrad-l2" >>>>> >>>>> >>>>> --object="2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar" >>>>> { >>>>> "name": >>>>> >>>>> >>>>> "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", >>>>> "size": 7147520, >>>>> "policy": { >>>>> "acl": { >>>>> "acl_user_map": [ >>>>> { >>>>> "user": "b05f707271774dbd89674a0736c9406e", >>>>> "acl": 15 >>>>> } >>>>> ], >>>>> "acl_group_map": [ >>>>> { >>>>> "group": 1, >>>>> "acl": 1 >>>>> } >>>>> ], >>>>> "grant_map": [ >>>>> { >>>>> "id": "", >>>>> "grant": { >>>>> "type": { >>>>> "type": 2 >>>>> }, >>>>> "id": "", >>>>> "email": "", >>>>> "permission": { >
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
The head object of a multipart object has 0 size, so it's expected. What's missing is the tail of the object. I don't suppose you have any logs from when the object was uploaded? Yehuda On Fri, Jan 15, 2016 at 2:12 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > Sorry for the confusion:: > > When I grepped for the prefix of the missing object:: > "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar.2~pcu5Hz6foFXjlSxBat22D8YMcHlQOBD" > > I am not able to find any chunks of the object:: > > lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'pcu5Hz6' > lacadmin@kh28-10:~$ > > The only piece of the object that I can seem to find is the original one I > posted:: > lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep > 'NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959' > default.384153.1_2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar > > And when we stat this object it is 0 bytes as shown earlier:: > lacadmin@kh28-10:~$ rados -p .rgw.buckets stat > 'default.384153.1_2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar' > .rgw.buckets/default.384153.1_2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar > mtime 2015-11-04 15:29:30.00, size 0 > > Sorry again for the confusion. > > > > On 1/15/16 3:58 PM, Yehuda Sadeh-Weinraub wrote: >> >> Ah, I see. Misread that and the object names were very similar. No, >> don't copy it. You can try to grep for the specific object name and >> see if there are pieces of it lying around under a different upload >> id. >> >> Yehuda >> >> On Fri, Jan 15, 2016 at 1:44 PM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> Sorry I am a bit confused. The successful list that I provided is from a >>> different object of the same size to show that I could indeed get a list. >>> Are you saying to copy the working object to the missing object? Sorry >>> for >>> the confusion. 
>>> >>> >>> On 1/15/16 3:20 PM, Yehuda Sadeh-Weinraub wrote: >>>> >>>> That's interesting, and might point at the underlying issue that >>>> caused it. Could be a racing upload that somehow ended up with the >>>> wrong object head. The 'multipart' object should be 4M in size, and >>>> the 'shadow' one should have the remainder of the data. You can run >>>> 'rados stat -p .rgw.buckets ' to validate that. If that's the >>>> case, you can copy these to the expected object names: >>>> >>>> $ src_uploadid=wksHvto9gRgHUJbhm_TZPXJTZUPXLT2 >>>> $ dest_uploadid=pcu5Hz6foFXjlSxBat22D8YMcHlQOBD >>>> >>>> $ rados -p .rgw.buckets cp >>>> >>>> >>>> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_uploadid}.1 >>>> >>>> >>>> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_uploadid}.1 >>>> >>>> $ rados -p .rgw.buckets cp >>>> >>>> >>>> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_upload_id}.1_1 >>>> >>>> >>>> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_upload_id}.1_1 >>>> >>>> Yehuda >>>> >>>> >>>> On Fri, Jan 15, 2016 at 1:02 PM, seapasu...@uchicago.edu >>>> <seapasu...@uchicago.edu> wrote: >>>>> >>>>> lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'pcu5Hz6' >>>>> lacadmin@kh28-10:~$ >>>>> >>>>> Nothing was found. 
That said when I run the command with another prefix >>>>> snippet:: >>>>> lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'wksHvto' >>>>> >>>>> >>>>> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1_1 >>>>> >>>>> >>>>> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1 >>>>> >>>>> >>>>> >>>>> >>>>> On 1/15/16 12:05 PM, Yehuda Sadeh-Weinraub wrote: >>>>>> >>>>>> On Fri, Jan 15, 2016 at 9:
Re: [ceph-users] v10.0.2 released
On Thu, Jan 14, 2016 at 7:37 AM, Sage Weil wrote: > This development release includes a raft of changes and improvements for > Jewel. Key additions include CephFS scrub/repair improvements, an AIX and > Solaris port of librados, many librbd journaling additions and fixes, > extended per-pool options, and NBD driver for RBD (rbd-nbd) that allows > librbd to present a kernel-level block device on Linux, multitenancy > support for RGW, RGW bucket lifecycle support, RGW support for Swift RGW bucket lifecycle isn't there yet; it still has some way to go before we merge it in. Yehuda > static large objects (SLO), and RGW support for Swift bulk delete. > > There are also lots of smaller optimizations and performance fixes going > in all over the tree, particular in the OSD and common code. > > Notable Changes > --- > > See > > http://ceph.com/releases/v10-0-2-released/ > > [I'd include the changelog here but I'm missing a oneliner that renders > the rst in email-suitable form...] > > Getting Ceph > > > * Git at git://github.com/ceph/ceph.git > * Tarball at http://download.ceph.com/tarballs/ceph-10.0.2.tar.gz > * For packages, see http://ceph.com/docs/master/install/get-packages > * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ceph-users] radosgw anonymous write
On Tue, Feb 9, 2016 at 5:15 AM, Jacek Jarosiewiczwrote: > Hi list, > > My setup is: ceph 0.94.5, ubuntu 14.04, tengine (patched nginx). > > I'm trying to migrate from our old file storage (MogileFS) to the new ceph > radosgw. The problem is that the old storage had no access control - no > authorization, so the access to read and/or write was controlled by the web > server (ie per IP/network). > > I want to keep the clients using old storage, but get rid of the MogileFS so > I don't have to maintain two different storage solutions. > > Basically MogileFS http API is similar to S3, except for the authorization > part - so the methods are the same (PUT, GET, DELETE..). > > I've created a bucket with public-read-write access and tried to connect > MogileFS client to it - the uploads work fine, and the files get acl > public-read so are readable, but they don't have an owner. > > So after upload I can't manage them (ie modify acl) - I can only remove > objects. > > Is there a way to force files that are uploaded anonymously to have an > owner? Is there a way maybe to have them inherit owner from the bucket? > Currently there's no way to change it. I'm not sure though that we're doing the correct thing. Did you try it with Amazon S3 by any chance? > Cheers, > J > > -- > Jacek Jarosiewicz > Administrator Systemów Informatycznych > > > SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie > ul. Senatorska 13/15, 00-075 Warszawa > Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego Rejestru > Sądowego, > nr KRS 029537; kapitał zakładowy 42.756.000 zł > NIP: 957-05-49-503 > Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa > > > SUPERMEDIA -> http://www.supermedia.pl > dostep do internetu - hosting - kolokacja - lacza - telefonia > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RGW: oddity when creating users via admin api
On Wed, Jan 27, 2016 at 4:20 PM, seapasu...@uchicago.edu wrote: > So when I create a new user with the admin api, if the user already exists > it just generates a new keypair for that user. Shouldn't the admin api > report that the user already exists? I ask because I can end up with > multiple keypairs for the same user unintentionally, which could be an issue. > I was not sure if this was a feature or a bug so I thought I would ask here > prior to filing a bug. It's definitely a bug. But note that it sounds familiar, and we might have already fixed it for the next major version. Yehuda
Re: [ceph-users] RGW :: bucket quota not enforced below 1
On Wed, Jan 27, 2016 at 4:18 PM, seapasu...@uchicago.edu wrote: > if you set a RGW user to have a bucket quota of 0 buckets you can still > create buckets. The only way I have found to prevent a user from being able > to create buckets is to set the op_mask to read. 1.) it looks like > bucket_policy is not enforced when you have it set to anything below 1. It > looks like the only way to prevent a user from creating buckets is to set > the op_mask but this is not documented. How would I set the op_mask via the > radosgw admin api? I keep getting a 200 success code but the op_mask of the > user stays the same. > > relevant pastebins: > http://pastebin.com/Rbzdy52c -- shows user info with bucket quota set but > shows ability to create buckets. > http://pastebin.com/J9K3dgdF -- shows inability to set op_mask from admin > api (that or I don't know how) > > > 1.) does anyone know how to set the op_mask via the admin api? 2.) why can I > create what seems like an infinite amount of buckets when my bucket quota is > set to 0 objects and 0 size? Shouldn't it be enforced for anything above -1? That's not bucket quota, that's the user's max_buckets param. When this value is set to '0' it means the user has no limit on the number of buckets. Sadly, due to backward compatibility issues, having 0 mean something else is a bit problematic. We can probably add a new bool param that will specify whether bucket creation is allowed at all. Yehuda
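The max_buckets semantics described above can be sketched as follows — note this only models the described behavior, it is not the actual rgw code, and forbidding bucket creation outright still requires op_mask as the poster found:

```python
# Model (not rgw code) of pre-Jewel max_buckets semantics:
# 0 means "no limit", a positive value caps bucket creation,
# and no value forbids creating buckets outright.

def can_create_bucket(current_buckets, max_buckets):
    if max_buckets == 0:   # 0 == unlimited, not "zero buckets allowed"
        return True
    return current_buckets < max_buckets

print(can_create_bucket(5, 0))   # True: "quota" of 0 is unlimited
print(can_create_bucket(2, 2))   # False: cap reached
```

This is why setting the value to 0 looked like an unenforced quota: it is the opposite, an explicit "unlimited".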
Re: [ceph-users] 411 Content-Length required error
On Wed, Jan 27, 2016 at 3:31 PM, John Hogenmillerwrote: > I did end up switching to civetweb and I also found that rgw content length > compat, which I set to true. I am still getting the 411 Length required > issue. > > I have had more discussions with our testing team, and I am still trying to > ascertain how valid this issue is. > > With AWS Sig v4, you use a different method to do chunked transfers. With > the sigv2, you do it as a "Transfer-Encoding: chunked" (as detailed in my > s3curl example). However, that v2 method may only apply to the > implementation we have have (we have a proprietary implementation of s3 that > I am hoping to replace with Ceph, if I can match our acceptance testing). > > The reason I think that this is a valid issue is because of this commit > > http://tracker.ceph.com/projects/ceph/repository/revisions/14fa77d9277b5ef5d0c6683504b368773b39ccc4 > >> Fixes: #2878 >> We now allow complete multipart upload to use chunked encoding >> when sending request data. With chunked encoding the HTTP_LENGTH >> header is not required. > > > What I would like to see is the test code for this (ideally in a curl or > s3curl format) so that I can compare locally to see if we're saying the same > thing, or if that commit from 3 years ago is still valid. > I don't think it's related. Try bumping up the rgw debug log, (debug rgw = 20), and see what are the http header fields that are being sent for the specific request. It could be that apache is not passing on the Transfer-Encoding header, or does something else to it. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
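A minimal ceph.conf sketch for the debugging suggestion above; the section name is an assumption and must match the --name/-n the gateway is started with. At log level 20 rgw dumps the request environment, so you can check which HTTP_* header variables (e.g. Transfer-Encoding, Content-Length) the frontend actually passed through:

```ini
[client.radosgw.gateway]
debug rgw = 20
; optionally also raise messenger logging:
debug ms = 1
```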
Re: [ceph-users] Problem create user RGW
try running: $ radosgw-admin --name client.rgw.servergw001 metadata list user Yehuda On Wed, Feb 24, 2016 at 8:41 AM, Andrea Annoè wrote: > I don't see any user created in RGW > > > > sudo radosgw-admin metadata list user > > [ > > ] > > > > > > sudo radosgw-admin user create --uid="user1site1" --display-name="User test > replica site1" --name client.rgw.servergw001 --access-key=user1site1 > --secret=pwd1 > > { > > "user_id": "user1site1", > > "display_name": "User test replica site1", > > "email": "", > > "suspended": 0, > > "max_buckets": 1000, > > "auid": 0, > > "subusers": [], > > "keys": [ > > { > > "user": "user1site1", > > "access_key": "user1site1", > > "secret_key": "pwd1" > > } > > ], > > "swift_keys": [], > > "caps": [], > > "op_mask": "read, write, delete", > > "default_placement": "", > > "placement_tags": [], > > "bucket_quota": { > > "enabled": false, > > "max_size_kb": -1, > > "max_objects": -1 > > }, > > "user_quota": { > > "enabled": false, > > "max_size_kb": -1, > > "max_objects": -1 > > }, > > "temp_url_keys": [] > > } > > > > sudo radosgw-admin metadata list user > > [ > > ] > > > > > > The list of users doesn't change… what's the problem? Command, keyring… ?? > > The user create command doesn't report an error even if I retry it multiple times. > > > > Please help me. > > > > Best regards. > > Andrea
Re: [ceph-users] radosgw flush_read_list(): d->client_c->handle_data() returned -5
On Wed, Feb 24, 2016 at 5:48 PM, Ben Hineswrote: > Any idea what is going on here? I get these intermittently, especially with > very large file. > > The client is doing RANGE requests on this >51 GB file, incrementally > fetching later chunks. > > 2016-02-24 16:30:59.669561 7fd33b7fe700 1 == starting new request > req=0x7fd32c0879c0 = > 2016-02-24 16:30:59.669675 7fd33b7fe700 2 req 3648804:0.000114::GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::initializing for > trans_id = tx00037ad24-0056ce4b43-259914b-default > 2016-02-24 16:30:59.669687 7fd33b7fe700 10 host= > 2016-02-24 16:30:59.669757 7fd33b7fe700 10 > s->object=/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg > s->bucket= > 2016-02-24 16:30:59.669767 7fd33b7fe700 2 req 3648804:0.000206:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::getting op > 2016-02-24 16:30:59.669776 7fd33b7fe700 2 req 3648804:0.000215:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:authorizing > 2016-02-24 16:30:59.669785 7fd33b7fe700 2 req 3648804:0.000224:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:reading > permissions > 2016-02-24 16:30:59.673797 7fd33b7fe700 10 manifest: total_size = > 50346000384 > 2016-02-24 16:30:59.673841 7fd33b7fe700 2 req 3648804:0.004280:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:init op > 2016-02-24 16:30:59.673867 7fd33b7fe700 10 cache get: > name=.users.uid+ : hit > 2016-02-24 16:30:59.673881 7fd33b7fe700 10 cache get: > name=.users.uid+ : hit > 2016-02-24 16:30:59.673921 7fd33b7fe700 2 req 3648804:0.004360:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying > op mask > 2016-02-24 16:30:59.673929 7fd33b7fe700 2 req 3648804:0.004369:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying > op permissions > 2016-02-24 16:30:59.673941 7fd33b7fe700 5 Searching permissions for > uid=anonymous mask=49 > 2016-02-24 16:30:59.673944 7fd33b7fe700 5 Permissions for user not found > 2016-02-24 
16:30:59.673946 7fd33b7fe700 5 Searching permissions for group=1 > mask=49 > 2016-02-24 16:30:59.673949 7fd33b7fe700 5 Found permission: 1 > 2016-02-24 16:30:59.673951 7fd33b7fe700 5 Searching permissions for group=2 > mask=49 > 2016-02-24 16:30:59.673953 7fd33b7fe700 5 Permissions for group not found > 2016-02-24 16:30:59.673955 7fd33b7fe700 5 Getting permissions id=anonymous > owner= perm=1 > 2016-02-24 16:30:59.673957 7fd33b7fe700 10 uid=anonymous requested perm > (type)=1, policy perm=1, user_perm_mask=15, acl perm=1 > 2016-02-24 16:30:59.673961 7fd33b7fe700 2 req 3648804:0.004400:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying > op params > 2016-02-24 16:30:59.673965 7fd33b7fe700 2 req 3648804:0.004404:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:executing > 2016-02-24 16:30:59.674107 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=130023424 stripe_ofs=130023424 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:30:59.674193 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=134217728 stripe_ofs=134217728 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:30:59.674317 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=138412032 stripe_ofs=138412032 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:30:59.674433 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=142606336 stripe_ofs=142606336 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:31:00.046110 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=146800640 stripe_ofs=146800640 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:31:00.150966 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=150994944 stripe_ofs=150994944 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:31:00.151118 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=155189248 stripe_ofs=155189248 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 
16:31:00.161000 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=157286400 stripe_ofs=157286400 part_ofs=157286400 > rule->part_size=52428800 > 2016-02-24 16:31:00.199553 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=161480704 stripe_ofs=161480704 part_ofs=157286400 > rule->part_size=52428800 > 2016-02-24 16:31:00.278308 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=165675008 stripe_ofs=165675008 part_ofs=157286400 > rule->part_size=52428800 > 2016-02-24 16:31:00.312306 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=169869312 stripe_ofs=169869312 part_ofs=157286400 > rule->part_size=52428800 > 2016-02-24 16:31:00.751626 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=174063616 stripe_ofs=174063616 part_ofs=157286400 > rule->part_size=52428800 > 2016-02-24 16:31:00.833570 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=178257920 stripe_ofs=178257920 part_ofs=157286400 >
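For context, the -5 returned by handle_data() is EIO, which in this path usually indicates the client side closed the connection mid-transfer rather than a storage error. The RGWObjManifest::operator++() lines in the log can also be sanity-checked by hand: part_ofs and stripe_ofs are just the offset rounded down to the part and stripe boundaries. A sketch that reproduces the logged values, assuming the default 4 MiB rgw stripe size (the 52428800-byte part size is taken from the log itself):

```python
PART_SIZE = 52428800            # rule->part_size from the log (50 MiB parts)
STRIPE_SIZE = 4 * 1024 * 1024   # assumed default rgw object stripe size

def manifest_position(ofs, part_size=PART_SIZE, stripe_size=STRIPE_SIZE):
    """Round an object offset down to the stripe and multipart part it
    falls in, mirroring the RGWObjManifest::operator++() log output."""
    return {
        "ofs": ofs,
        "stripe_ofs": (ofs // stripe_size) * stripe_size,
        "part_ofs": (ofs // part_size) * part_size,
    }

print(manifest_position(130023424))
```

This is a simplification of the real manifest iterator, but it matches the offsets printed above, e.g. ofs=130023424 falling in the part starting at 104857600.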
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
On Fri, Jan 15, 2016 at 5:04 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > I have looked all over and I do not see any explicit mention of > "NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959" in the logs nor do I > see a timestamp from November 4th although I do see log rotations dating > back to october 15th. I don't think it's possible it wasn't logged so I am > going through the bucket logs from the 'radosgw-admin log show --object' > side and I found the following:: > > 4604932 { > 4604933 "bucket": "noaa-nexrad-l2", > 4604934 "time": "2015-11-04 21:29:27.346509Z", > 4604935 "time_local": "2015-11-04 15:29:27.346509", > 4604936 "remote_addr": "", > 4604937 "object_owner": "b05f707271774dbd89674a0736c9406e", > 4604938 "user": "b05f707271774dbd89674a0736c9406e", > 4604939 "operation": "PUT", I'd expect a multipart upload completion to be done with a POST, not a PUT. > 4604940 "uri": > "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", > 4604941 "http_status": "200", > 4604942 "error_code": "", > 4604943 "bytes_sent": 19, > 4604944 "bytes_received": 0, > 4604945 "object_size": 0, Do you see a zero object_size for other multipart uploads? Yehuda > 4604946 "total_time": 142640400, > 4604947 "user_agent": "Boto\/2.38.0 Python\/2.7.7 > Linux\/2.6.32-573.7.1.el6.x86_64", > 4604948 "referrer": "" > 4604949 } > > Does this help at all. The total time seems exceptionally high. Would it be > possible that there is a timeout issue where the put request started a > multipart upload with the correct header and then timed out but the radosgw > took the data anyway? > > I am surprised the radosgw returned a 200 let alone placed the key in the > bucket listing. 
> > > That said here is another object (different object) that 404s: > 1650873 { > 1650874 "bucket": "noaa-nexrad-l2", > 1650875 "time": "2015-11-05 04:50:42.606838Z", > 1650876 "time_local": "2015-11-04 22:50:42.606838", > 1650877 "remote_addr": "", > 1650878 "object_owner": "b05f707271774dbd89674a0736c9406e", > 1650879 "user": "b05f707271774dbd89674a0736c9406e", > 1650880 "operation": "PUT", > 1650881 "uri": > "\/noaa-nexrad-l2\/2015\/02\/25\/KVBX\/NWS_NEXRAD_NXL2DP_KVBX_2015022516_20150225165959.tar", > 1650882 "http_status": "200", > 1650883 "error_code": "", > 1650884 "bytes_sent": 19, > 1650885 "bytes_received": 0, > 1650886 "object_size": 0, > 1650887 "total_time": 0, > 1650888 "user_agent": "Boto\/2.38.0 Python\/2.7.7 > Linux\/2.6.32-573.7.1.el6.x86_64", > 1650889 "referrer": "" > 1650890 } > > And this one fails with a 404 as well. Does this help at all? Here is a > successful object (different object) log entry as well just in case:: > > 17462367 { > 17462368 "bucket": "noaa-nexrad-l2", > 17462369 "time": "2015-11-04 21:16:44.148603Z", > 17462370 "time_local": "2015-11-04 15:16:44.148603", > 17462371 "remote_addr": "", > 17462372 "object_owner": "b05f707271774dbd89674a0736c9406e", > 17462373 "user": "b05f707271774dbd89674a0736c9406e", > 17462374 "operation": "PUT", > 17462375 "uri": > "\/noaa-nexrad-l2\/2015\/01\/01\/KAKQ\/NWS_NEXRAD_NXL2DP_KAKQ_2015010108_20150101085959.tar", > 17462376 "http_status": "200", > 17462377 "error_code": "", > 17462378 "bytes_sent": 19, > 17462379 "bytes_received": 0, > 17462380 "object_size": 0, > 17462381 "total_time": 0, >
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
On Thu, Jan 21, 2016 at 4:02 PM, seapasu...@uchicago.edu wrote: > I haven't been able to reproduce the issue on my end but I do not fully > understand how the bug exists or why it is happening. I was finally given > the code they are using to upload the files:: > > http://pastebin.com/N0j86NQJ > > I don't know if this helps at all :-(. The other thing is that I have only > experienced this bug on this 'noaa-nexrad-l2' bucket. The other buckets have > substantially less data and objects though. > > Right now I am trying to trigger this bug using python requests-aws and I > keep getting a 403 while trying to authenticate. I am not a developer by any > means and a piss-poor sysadmin haha. My plan is to start a multipart upload > and initiate a put for the first part but hang when placing the data inside. > Then try to complete the multipart upload in another session. The reproduction I had in mind would be something like: init a multipart upload, upload a part, then run multiple operations *concurrently* that complete the upload; also try to complete and abort concurrently. Yehuda > > I guess please stand by while I figure this out :/ Thanks for all of your > help!
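The concurrency harness for the reproduction outlined above could be driven as in the sketch below. `complete_upload` here is a stand-in: in a real run each call would issue the boto CompleteMultipartUpload request for the same uploadId, so the completions overlap server-side:

```python
import concurrent.futures

ATTEMPTS = 4  # number of racing CompleteMultipartUpload calls

def complete_upload(attempt):
    """Placeholder for mp.complete_upload() from boto; in the real
    reproducer every call would POST the same uploadId to radosgw
    (and one thread could call cancel_upload() instead, to race a
    complete against an abort)."""
    return ("completed", attempt)

# Fire all completions at once so they would overlap on the server.
with concurrent.futures.ThreadPoolExecutor(max_workers=ATTEMPTS) as pool:
    results = list(pool.map(complete_upload, range(ATTEMPTS)))
print(results)
```

This only shows the harness shape — whether the race actually reproduces the 404s depends on the server-side timing discussed later in the thread.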
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
On Wed, Jan 20, 2016 at 10:43 AM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > > > On 1/19/16 4:00 PM, Yehuda Sadeh-Weinraub wrote: >> >> On Fri, Jan 15, 2016 at 5:04 PM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> I have looked all over and I do not see any explicit mention of >>> "NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959" in the logs nor do >>> I >>> see a timestamp from November 4th although I do see log rotations dating >>> back to october 15th. I don't think it's possible it wasn't logged so I >>> am >>> going through the bucket logs from the 'radosgw-admin log show --object' >>> side and I found the following:: >>> >>> 4604932 { >>> 4604933 "bucket": "noaa-nexrad-l2", >>> 4604934 "time": "2015-11-04 21:29:27.346509Z", >>> 4604935 "time_local": "2015-11-04 15:29:27.346509", >>> 4604936 "remote_addr": "", >>> 4604937 "object_owner": "b05f707271774dbd89674a0736c9406e", >>> 4604938 "user": "b05f707271774dbd89674a0736c9406e", >>> 4604939 "operation": "PUT", >> >> I'd expect a multipart upload completion to be done with a POST, not a >> PUT. > > Indeed it seems really weird. >> >> >>> 4604940 "uri": >>> >>> "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", >>> 4604941 "http_status": "200", >>> 4604942 "error_code": "", >>> 4604943 "bytes_sent": 19, >>> 4604944 "bytes_received": 0, >>> 4604945 "object_size": 0, >> >> Do you see a zero object_size for other multipart uploads? > > I think so. I still don't know how to tell for certain if a radosgw object > is a multipart object or not. 
I think all of the objects in noaa-nexrad-l2 > bucket are multipart:: > > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-{ > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bucket": > "noaa-nexrad-l2", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time": "2015-10-16 > 19:49:30.579738Z", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time_local": > "2015-10-16 14:49:30.579738", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "remote_addr": "", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user": > "b05f707271774dbd89674a0736c9406e", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out: "operation": "POST", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "uri": > "\/noaa-nexrad-l2\/2015\/01\/13\/KGRK\/NWS_NEXRAD_NXL2DP_KGRK_2015011304_20150113045959.tar", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "http_status": "200", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "error_code": "", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_sent": 331, > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_received": 152, > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "object_size": 0, > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "total_time": 0, > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user_agent": > "Boto\/2.38.0 Python\/2.7.7 Linux\/2.6.32-573.7.1.el6.x86_64", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "referrer": "" > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-} > > The objects above (NWS_NEXRAD_NXL2DP_KGRK_2015011304_20150113045959.tar) > pulls down without an issue though. 
Below is a paste for object > "NWS_NEXRAD_NXL2DP_KVBX_2015022516_20150225165959.tar" which 404's:: > http://pastebin.com/Jtw8z7G4 Sadly the log doesn't provide all the input, but I can guess what the operations were: - POST (init multipart upload) - PUT (upload part) - GET (list parts) - POST (complete multipart) <-- took > 57 seconds to process - POST (complete multipart) - HEAD (stat object) For some reason the complete multipart operation took too long, which I think triggered a client retry (either that, or an abort). Then there were two completions racing (or a complete and an abort), which might have caused the issue we're seeing. E.g., with two completions, the second completion might have noticed that it's overwriting an existing object (the one we just created) and sent the 'old' object to be garbage collected, when that object's tail is actually its own tail. > > I see two POSTs for this object, one recorded a minute before, both with 0 > size though. Does this help at all? Yes, very much. Thanks, Yehuda
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
Keep in mind that if the problem is that the tail is being sent to garbage collection, you'll only see the 404 after a few hours. A shorter way to check it would be by listing the gc entries (with --include-all). Yehuda On Wed, Jan 20, 2016 at 1:52 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > I'm working on getting the code they used and trying different timeouts in > my multipart upload code. Right now I have not created any new 404 keys > though :-( > > > On 1/20/16 3:44 PM, Yehuda Sadeh-Weinraub wrote: >> >> We'll need to confirm that this is the actual issue, and then have it >> fixed. It would be nice to have some kind of a unit test that reproduces >> it. >> >> Yehuda >> >> On Wed, Jan 20, 2016 at 1:34 PM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> So is there any way to prevent this from happening going forward? I mean >>> ideally this should never be possible, right? Even with a complete object >>> that is 0 bytes it should be downloaded as 0 bytes and have a different >>> md5sum and not report as 7mb? >>> >>> >>> >>> On 1/20/16 1:30 PM, Yehuda Sadeh-Weinraub wrote: >>>> >>>> On Wed, Jan 20, 2016 at 10:43 AM, seapasu...@uchicago.edu >>>> <seapasu...@uchicago.edu> wrote: >>>>> >>>>> >>>>> On 1/19/16 4:00 PM, Yehuda Sadeh-Weinraub wrote: >>>>>> >>>>>> On Fri, Jan 15, 2016 at 5:04 PM, seapasu...@uchicago.edu >>>>>> <seapasu...@uchicago.edu> wrote: >>>>>>> >>>>>>> I have looked all over and I do not see any explicit mention of >>>>>>> "NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959" in the logs >>>>>>> nor >>>>>>> do >>>>>>> I >>>>>>> see a timestamp from November 4th although I do see log rotations >>>>>>> dating >>>>>>> back to October 15th.
I don't think it's possible it wasn't logged so >>>>>>> I >>>>>>> am >>>>>>> going through the bucket logs from the 'radosgw-admin log show >>>>>>> --object' >>>>>>> side and I found the following:: >>>>>>> >>>>>>> 4604932 { >>>>>>> 4604933 "bucket": "noaa-nexrad-l2", >>>>>>> 4604934 "time": "2015-11-04 21:29:27.346509Z", >>>>>>> 4604935 "time_local": "2015-11-04 15:29:27.346509", >>>>>>> 4604936 "remote_addr": "", >>>>>>> 4604937 "object_owner": >>>>>>> "b05f707271774dbd89674a0736c9406e", >>>>>>> 4604938 "user": "b05f707271774dbd89674a0736c9406e", >>>>>>> 4604939 "operation": "PUT", >>>>>> >>>>>> I'd expect a multipart upload completion to be done with a POST, not a >>>>>> PUT. >>>>> >>>>> Indeed it seems really weird. >>>>>> >>>>>> >>>>>>> 4604940 "uri": >>>>>>> >>>>>>> >>>>>>> >>>>>>> "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", >>>>>>> 4604941 "http_status": "200", >>>>>>> 4604942 "error_code": "", >>>>>>> 4604943 "bytes_sent": 19, >>>>>>> 4604944 "bytes_received": 0, >>>>>>> 4604945 "object_size": 0, >>>>>> >>>>>> Do you see a zero object_size for other multipart uploads? >>>>> >>>>> I think so. I still don't know how to tell for certain if a radosgw >>>>> object >>>>> is a multipart object or not. I think all of the objects in >>>>> noaa-nexrad-l2 >>>>> bucket are multipart:: >>>>> >>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-{ >>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bucket": >>>>> "noaa-nexrad-l
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
We'll need to confirm that this is the actual issue, and then have it fixed. It would be nice to have some kind of a unit test that reproduces it. Yehuda On Wed, Jan 20, 2016 at 1:34 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > So is there any way to prevent this from happening going forward? I mean > ideally this should never be possible, right? Even with a complete object > that is 0 bytes it should be downloaded as 0 bytes and have a different > md5sum and not report as 7mb? > > > > On 1/20/16 1:30 PM, Yehuda Sadeh-Weinraub wrote: >> >> On Wed, Jan 20, 2016 at 10:43 AM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> >>> On 1/19/16 4:00 PM, Yehuda Sadeh-Weinraub wrote: >>>> >>>> On Fri, Jan 15, 2016 at 5:04 PM, seapasu...@uchicago.edu >>>> <seapasu...@uchicago.edu> wrote: >>>>> >>>>> I have looked all over and I do not see any explicit mention of >>>>> "NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959" in the logs nor >>>>> do >>>>> I >>>>> see a timestamp from November 4th although I do see log rotations >>>>> dating >>>>> back to October 15th. I don't think it's possible it wasn't logged so I >>>>> am >>>>> going through the bucket logs from the 'radosgw-admin log show >>>>> --object' >>>>> side and I found the following:: >>>>> >>>>> 4604932 { >>>>> 4604933 "bucket": "noaa-nexrad-l2", >>>>> 4604934 "time": "2015-11-04 21:29:27.346509Z", >>>>> 4604935 "time_local": "2015-11-04 15:29:27.346509", >>>>> 4604936 "remote_addr": "", >>>>> 4604937 "object_owner": "b05f707271774dbd89674a0736c9406e", >>>>> 4604938 "user": "b05f707271774dbd89674a0736c9406e", >>>>> 4604939 "operation": "PUT", >>>> >>>> I'd expect a multipart upload completion to be done with a POST, not a >>>> PUT. >>> >>> Indeed it seems really weird. 
>>>> >>>> >>>>> 4604940 "uri": >>>>> >>>>> >>>>> "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", >>>>> 4604941 "http_status": "200", >>>>> 4604942 "error_code": "", >>>>> 4604943 "bytes_sent": 19, >>>>> 4604944 "bytes_received": 0, >>>>> 4604945 "object_size": 0, >>>> >>>> Do you see a zero object_size for other multipart uploads? >>> >>> I think so. I still don't know how to tell for certain if a radosgw >>> object >>> is a multipart object or not. I think all of the objects in >>> noaa-nexrad-l2 >>> bucket are multipart:: >>> >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-{ >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bucket": >>> "noaa-nexrad-l2", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time": "2015-10-16 >>> 19:49:30.579738Z", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time_local": >>> "2015-10-16 14:49:30.579738", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "remote_addr": "", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user": >>> "b05f707271774dbd89674a0736c9406e", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out: "operation": "POST", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "uri": >>> >>> "\/noaa-nexrad-l2\/2015\/01\/13\/KGRK\/NWS_NEXRAD_NXL2DP_KGRK_2015011304_20150113045959.tar", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "http_status": >>> "200", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "error_code": "", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_sent": 331, >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_received": >>> 152,
Re: [ceph-users] How-to doc: hosting a static website on radosgw
On Tue, Jan 26, 2016 at 2:37 PM, Florian Haas wrote: > On Tue, Jan 26, 2016 at 8:56 PM, Wido den Hollander wrote: >> On 01/26/2016 08:29 PM, Florian Haas wrote: >>> Hi everyone, >>> >>> we recently worked a bit on running a full static website just on >>> radosgw (akin to >>> http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html), >>> and didn't find a good how-to writeup out there. So we did a bit of >>> fiddling with radosgw and HAproxy, and wrote one: >>> https://www.hastexo.com/resources/hints-and-kinks/hosting-website-radosgw/#.VqfGx99vFhG >>> >>> Hopefully some of you find this useful. If you spot errors or >>> omissions, just let us know in the comments at the bottom of the page. >>> Thanks! >>> >> >> Thanks! >> >> Were you aware of this work going on: >> https://github.com/ceph/ceph/tree/wip-static-website >> >> This might be in the RADOS Gateway soon and then you don't need HAProxy >> anymore. > > The moment this lands in a release, we'll be more than happy to ditch > the HAProxy request/response mangling bits. But that WIP branch hasn't > seen commits in 4 months, so we took it as an exercise in coming up Here's a more up-to-date branch: https://github.com/ceph/ceph/tree/wip-rgw-static-website-yehuda We're currently testing it, and the plan is to get it in before jewel. One caveat though: the error page handling still has some issues, so the feature will be disabled by default for now. Yehuda > with something workable as an interim solution. :) > > Cheers, > Florian
Re: [ceph-users] Idea for speedup RadosGW for buckets with many objects.
On Wed, Feb 17, 2016 at 12:51 PM, Krzysztof Księżyk wrote: > Hi, > > I'm experiencing a problem with poor performance of RadosGW while operating on > a bucket with many objects. That's a known issue with LevelDB and can be > partially resolved using sharding, but I have one more idea. As I see in ceph > osd logs, all slow requests are while making a call to rgw.bucket_list: > > 2016-02-17 03:17:56.846694 7f5396f63700 0 log_channel(cluster) log [WRN] : > slow request 30.272904 seconds old, received at 2016-02-17 03:17:26.573742: > osd_op(client.12611484.0:15137332 .dir.default.4162.3 [call rgw.bucket_list] > 9.2955279 ack+read+known_if_redirected e3252) currently started > > I don't know exactly how Ceph internally works, but maybe data required to > return results for rgw.bucket_list could be cached for some time. Cache TTL > would be parametrized and could be disabled to keep the same behaviour as > the current one. There can be 3 cases when there's a call to rgw.bucket_list: > 1. no cached data > 2. up-to-date cache > 3. outdated cache > > Ad 1. The first call starts generating the full list. All new requests are put on > hold. When the list is ready, it's saved to cache. > Ad 2. All calls are served from cache. > Ad 3. The first request starts generating the full list. All new requests are served > from the outdated cache until new cached data is ready. > > This can be even optimized by periodically generating fresh cache, even if > it's not expired yet, to reduce cases when the cache is outdated. Where is the cache going to live? Note that for it to be on rgw, it would need to be shared among all rgw instances (serving the same zone). On the other hand, I'm not exactly sure how the osd could cache it (there's no mechanism at the moment that would allow that). And the cache itself would need to be part of the osd that serves the specific bucket index, otherwise you'd need to go to multiple osds for that operation, which would slow things down for the general case. 
Note that we need things to be durable, otherwise we might end up with inconsistencies when things don't go as expected (e.g., when rgw / osd went down). We did some thinking recently around the bucket index area, to see how things can be improved. One way would be (for some use cases) to drop it altogether. This could work in environments where 1. you don't need to list objects in the bucket, and 2. there's no multi-zone sync. Another possible mechanism would be to relax the bucket index update, and replace it with some kind of a lazy update (maybe similar to what you suggested), and some way to rebuild the index out of the raw pool data (maybe combining it with rados namespaces). > > Maybe this idea is stupid, maybe not, but if it's doable it would be nice to > have the choice. Thanks for the suggestions! Yehuda
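The three cases enumerated in the original mail amount to a stale-while-revalidate cache. A minimal single-process sketch of that policy (with an injectable clock for testing; it deliberately ignores the cross-instance sharing and durability concerns raised in the reply):

```python
import time

class StaleWhileRevalidateCache:
    """Case 1: no cached data -> generate and store.
    Case 2: cache up to date -> serve the cached list.
    Case 3: cache outdated -> serve the stale copy while refreshing.
    A real rgw-side cache would also need to be shared across instances."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self.loader, self.ttl, self.clock = loader, ttl, clock
        self.value, self.stamp = None, None

    def get(self):
        now = self.clock()
        if self.stamp is None:                       # case 1: miss
            self.value, self.stamp = self.loader(), now
            return self.value
        if now - self.stamp <= self.ttl:             # case 2: fresh
            return self.value
        stale = self.value                           # case 3: serve stale...
        self.value, self.stamp = self.loader(), now  # ...refresh synchronously
        return stale

# Demo with a stand-in for the expensive rgw.bucket_list call.
cache = StaleWhileRevalidateCache(lambda: ["obj1", "obj2"], ttl=30)
print(cache.get())
```

In a real implementation case 3 would hand the refresh to a background worker instead of doing it inline; the sketch keeps it synchronous for brevity.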
Re: [ceph-users] rgw bucket deletion woes
On Tue, Mar 15, 2016 at 11:36 PM, Pavan Rallabhandi wrote: > Hi, > > I've seen this discussed here before, but couldn't find any solution, > hence the mail. In RGW, for a bucket holding objects in the range of ~ > millions, one can find it to take forever to delete the bucket (via > radosgw-admin). I understand the gc (and its parameters) that would reclaim > the space eventually, but am looking more at the bucket deletion options > that can possibly speed up the operation. > > I realize that currently rgw_remove_bucket() does it 1000 objects at a time, > serially. Wanted to know if there is a reason (that I am possibly missing and > was discussed) for this to be left that way, otherwise I was considering a > patch to make it happen better. > There is no real reason. You might want to have a version of that command that doesn't schedule the removal to gc, but rather removes all the object parts by itself. Otherwise, you're just going to flood the gc. You'll need to iterate through all the objects, and for each object you'll need to remove all of its rados objects (starting with the tail, then the head). Removal of each rados object can be done asynchronously, but you'll need to throttle the operations, not send everything to the osds at once (which will be impossible anyway, as the objecter will throttle the requests, leading to high memory consumption). Thanks, Yehuda
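The throttling described above (a bounded number of in-flight deletes, tail objects before the head) can be sketched with a semaphore. `delete_rados_object` is a placeholder for the actual asynchronous librados remove, not the real call:

```python
import threading

MAX_IN_FLIGHT = 32          # throttle: cap on concurrent delete ops
inflight = threading.Semaphore(MAX_IN_FLIGHT)
deleted = []
lock = threading.Lock()

def delete_rados_object(name):
    """Stand-in for removing one rados object (a tail piece or the head)."""
    with lock:
        deleted.append(name)

def throttled_delete(names):
    """Issue deletes concurrently but never more than MAX_IN_FLIGHT at once."""
    threads = []
    for name in names:       # caller orders names: tail objects first, head last
        inflight.acquire()   # blocks when the in-flight window is full
        def run(n=name):
            try:
                delete_rados_object(n)
            finally:
                inflight.release()
        t = threading.Thread(target=run)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

throttled_delete(["obj.tail.1", "obj.tail.2", "obj.head"])
```

The window keeps memory bounded instead of queueing every request in the objecter at once, which is the failure mode the reply warns about.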
Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests
On Thu, Feb 25, 2016 at 7:17 AM, Ritter Sławomir wrote: > Hi, > > > > We have two CEPH clusters running on Dumpling 0.67.11 and some of our > "multipart objects" are incomplete. It seems that some slow requests could > cause corruption of the related S3 objects. Moreover, GETs for those objects are > working without any error messages. There are only HTTP 200s in the logs, as well > as no information about problems from popular client tools/libs. > > > > The situation looks very similar to the one described in bug #8269, but we are > using the fixed 0.67.11 version: http://tracker.ceph.com/issues/8269 > > > > Regards, > > > > Sławomir Ritter > > > > > > > > EXAMPLE#1 > > > > slow_request > > > > 2016-02-23 13:49:58.818640 osd.260 10.176.67.27:6800/688083 2119 : [WRN] 4 > slow requests, 4 included below; oldest blocked for > 30.727096 secs > > 2016-02-23 13:49:58.818673 osd.260 10.176.67.27:6800/688083 2120 : [WRN] > slow request 30.727096 seconds old, received at 2016-02-23 13:49:28.091460: > osd_op(client.47792965.0:185007087 > default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2 > [writefull 0~524288] 10.ce729ebe e107594) v4 currently waiting for subops from > [469,9] > Did these requests ever finish? 
> > > > > HTTP_500 in apache.log > > == > > 127.0.0.1 - - [23/Feb/2016:13:49:27 +0100] "PUT > /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z=56 > HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 > Linux/3.13.0-39-generic(syncworker)" > > 127.0.0.1 - - [23/Feb/2016:13:49:28 +0100] "PUT > /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z=57 > HTTP/1.0" 500 751 "-" "Boto/2.31.1 Python/2.7.3 > Linux/3.13.0-39-generic(syncworker)" > > 127.0.0.1 - - [23/Feb/2016:13:49:58 +0100] "PUT > /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z=57 > HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 > Linux/3.13.0-39-generic(syncworker)" > > 127.0.0.1 - - [23/Feb/2016:13:49:59 +0100] "PUT > /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z=58 > HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 > Linux/3.13.0-39-generic(syncworker)" > > > > > > Empty RADOS object (real size = 0 bytes), list generated basis on MANIFEST > > == > > found > default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.56_2 > 2097152 ok 2097152 10.7acc9476 (10.1476) [278,142,436] > [278,142,436] > > found > default.14654.445__multipart_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57 > 0 diff4194304 10.4f5be025 (10.25) [57,310,428] > [57,310,428] > > found > default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_1 > 4194304 ok 4194304 10.81191602 (10.1602) [441,109,420] > [441,109,420] > > found > default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2 > 2097152 ok 2097152 10.ce729ebe (10.1ebe) [260,469,9] > [260,469,9] > > > > > 
> "Silent" GETs > > = > > # object size from headers > > $ s3 -u head > video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > Content-Type: binary/octet-stream > > Content-Length: 641775701 > > Server: nginx > > > > # but GETs only 637581397 (641775701 - missing 4194304 = 637581397) > > $ s3 -u get > video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > > /tmp/test > > $ ls -al /tmp/test > > -rw-r--r-- 1 root root 637581397 Feb 23 17:05 /tmp/test > > > > # no error in logs > > 127.0.0.1 - - [23/Feb/2016:17:05:00 +0100] "GET > /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > HTTP/1.0" 200 637581711 "-" "Mozilla/4.0 (Compatible; s3; libs3 2.0; Linux > x86_64)" > > > > # wget - retry for missing part, but there is no missing part, so it GETs > head/tail of the file again > > $ wget > http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > > --2016-02-23 17:10:11-- > http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > > Connecting to 127.0.0.1:88... connected. > > HTTP request sent, awaiting response... 200 OK > > Length: 641775701 (612M) [binary/octet-stream] > > Saving to: `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv' > > > > 99% > [==> > ] 637,581,397 63.9M/s in 9.5s > > > > 2016-02-23 17:10:20
Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests
On Fri, Mar 4, 2016 at 7:26 AM, Ritter Sławomir wrote: >> From: Robin H. Johnson [mailto:robb...@gentoo.org] >> Sent: Friday, March 04, 2016 12:40 AM >> To: Ritter Sławomir >> Cc: ceph-us...@ceph.com; ceph-devel >> Subject: Re: [ceph-users] Problem: silently corrupted RadosGW objects caused >> by slow requests >> >> On Thu, Mar 03, 2016 at 01:55:13PM +0100, Ritter Sławomir wrote: >> > Hi, >> > >> > I think this is a really serious problem - again: >> > >> > - we silently lost S3/RGW objects in clusters >> > >> > Moreover, our situation looks very similar to the one described in >> > the unfixed bug #13764 (Hammer) and the fixed #8269 (Dumpling). >> FYI the fix in #8269 _is_ present in Hammer: >> commit bd8e026f88b rgw: don't allow multiple writers to same multiobject part >> >> -- >> Robin Hugh Johnson >> Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee >> E-Mail : robb...@gentoo.org >> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 > Yes, > > the fix for #8269 has also been included in our version: Dumpling 0.67.11. > The guys from #13764 are using a patched Hammer version. I didn't notice that you were actually running Dumpling (which we haven't supported or backported fixes to for a while). Here's one issue that you might have hit: http://tracker.ceph.com/issues/11604 Yehuda > > Both situations with corrupted files are very similar to the one described in > #8269. > There was a problem with two threads writing to the same RADOS objects. > > Maybe there is another, still unknown, specific exception to fix? > > Cheers, > SR > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
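The RADOS listing in the earlier report already encodes the corruption: parts flagged `diff` have an on-disk size that differs from the size the RGW manifest records, while healthy parts are flagged `ok`. A minimal sketch of a checker for that listing (the `found <name> <size> ok|diff<expected> ...` layout is assumed from the excerpt; it is not the output of a standard tool):

```python
# Scan a listing like the one in the report above and flag RADOS
# multipart/shadow objects whose actual size differs from the size the
# RGW manifest expects. Lines look like:
#   found <object> <actual-size> ok <expected-size> <pg> ...
#   found <object> <actual-size> diff<expected-size> <pg> ...
def find_truncated_parts(listing_lines):
    bad = []
    for line in listing_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] != "found":
            continue
        name, actual, status = fields[1], int(fields[2]), fields[3]
        if status.startswith("diff"):
            expected = int(status[len("diff"):])
            bad.append((name, actual, expected))
    return bad
```

Any part reported with a zero actual size but a non-zero expected size corresponds to the silently lost 4194304-byte chunk visible in the apache.log excerpt.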
Re: [ceph-users] Can Jewel read Hammer radosgw buckets?
(sorry for resubmission, adding ceph-users) On Mon, Apr 25, 2016 at 9:47 AM, Richard Chan wrote: > Hi Yehuda > > I created a test 3xVM setup with Hammer and one radosgw on the (separate) > admin node; creating one user and buckets. > > I upgraded the VMs to Jewel and created a new radosgw on one of the nodes. > > The object store didn't seem to survive the upgrade: > > # radosgw-admin user info --uid=testuser > 2016-04-26 00:41:50.713069 7fcdcc6fca40 0 RGWZoneParams::create(): error > creating default zone params: (17) File exists > could not fetch user info: no user info saved > > rados lspools > rbd > .rgw.root > .rgw.control > .rgw > .rgw.gc > .users.uid > .users > .rgw.buckets.index > .rgw.buckets > default.rgw.control > default.rgw.data.root > default.rgw.gc > default.rgw.log > default.rgw.users.uid > default.rgw.users.keys > > Do I have to configure radosgw to use the pools with default.*? No. We need to get it to play along nicely with the old pools. > How do you actually do that? What does 'radosgw-admin zone get' return? Yehuda
Re: [ceph-users] Can Jewel read Hammer radosgw buckets?
I managed to reproduce the issue, and there seem to be multiple problems. Specifically, we have an issue when upgrading a default cluster that hasn't had a zone (and region) explicitly configured before. There is another bug that I found (http://tracker.ceph.com/issues/15597) that makes things even a bit more complicated. I created the following script that might be able to fix things for you: https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone For future reference, this script shouldn't be used if there are any zones configured other than the default one. It also does some ninja patching to the zone config because of a bug that exists currently, but that will probably not apply to later versions. Please let me know if you have any issues, or if this actually does its magic. Thanks, Yehuda On Mon, Apr 25, 2016 at 4:10 PM, Richard Chan wrote: > >> > How do you actually do that? >> >> What does 'radosgw-admin zone get' return? >> >> Yehuda > > > > [root@node1 ceph]# radosgw-admin zone get > unable to initialize zone: (2) No such file or directory > > (I don't have any rgw configuration in /etc/ceph/ceph.conf; this is from a > clean > > ceph-deploy rgw create node1 > > ## user created under Hammer > [root@node1 ceph]# radosgw-admin user info --uid=testuser > 2016-04-26 07:07:06.159497 7f410c33ca40 0 RGWZoneParams::create(): error > creating default zone params: (17) File exists > could not fetch user info: no user info saved > > "rgw_max_chunk_size": "524288", > "rgw_max_put_size": "5368709120", > "rgw_override_bucket_index_max_shards": "0", > "rgw_bucket_index_max_aio": "8", > "rgw_enable_quota_threads": "true", > "rgw_enable_gc_threads": "true", > "rgw_data": "\/var\/lib\/ceph\/radosgw\/ceph-rgw.node1", > "rgw_enable_apis": "s3, s3website, swift, swift_auth, admin", > "rgw_cache_enabled": "true", > "rgw_cache_lru_size": "1", > "rgw_socket_path": "", > "rgw_host": "", > "rgw_port": "", > "rgw_dns_name": "", > "rgw_dns_s3website_name": "", > 
"rgw_content_length_compat": "false", > "rgw_script_uri": "", > "rgw_request_uri": "", > "rgw_swift_url": "", > "rgw_swift_url_prefix": "swift", > "rgw_swift_auth_url": "", > "rgw_swift_auth_entry": "auth", > "rgw_swift_tenant_name": "", > "rgw_swift_account_in_url": "false", > "rgw_swift_enforce_content_length": "false", > "rgw_keystone_url": "", > "rgw_keystone_admin_token": "", > "rgw_keystone_admin_user": "", > "rgw_keystone_admin_password": "", > "rgw_keystone_admin_tenant": "", > "rgw_keystone_admin_project": "", > "rgw_keystone_admin_domain": "", > "rgw_keystone_api_version": "2", > "rgw_keystone_accepted_roles": "Member, admin", > "rgw_keystone_token_cache_size": "1", > "rgw_keystone_revocation_interval": "900", > "rgw_keystone_verify_ssl": "true", > "rgw_keystone_implicit_tenants": "false", > "rgw_s3_auth_use_rados": "true", > "rgw_s3_auth_use_keystone": "false", > "rgw_ldap_uri": "ldaps:\/\/", > "rgw_ldap_binddn": "uid=admin,cn=users,dc=example,dc=com", > "rgw_ldap_searchdn": "cn=users,cn=accounts,dc=example,dc=com", > "rgw_ldap_dnattr": "uid", > "rgw_ldap_secret": "\/etc\/openldap\/secret", > "rgw_s3_auth_use_ldap": "false", > "rgw_admin_entry": "admin", > "rgw_enforce_swift_acls": "true", > "rgw_swift_token_expiration": "86400", > "rgw_print_continue": "true", > "rgw_remote_addr_param": "REMOTE_ADDR", > "rgw_op_thread_timeout": "600", > "rgw_op_thread_suicide_timeout": "0", > "rgw_thread_pool_size": "100", > "rgw_num_control_oids": "8", > "rgw_num_rados_handles": "1", > "rgw_nfs_lru_lanes": "5", > "rgw_nfs_lru_lane_hiwat": "911", > "rgw_nfs_fhcache_partitions": "3", > "rgw_nfs_fhcache_size": "2017", > "rgw_zone": "", > "rgw_zone_root_pool": ".rgw.root", > "rgw_default_zone_info_oid": "default.zone", > "rgw_region": "", > "rgw_default_region_info_oid": "default.region", > "rgw_zonegroup": "", > "rgw_zonegroup_root_pool": ".rgw.root", > "rgw_default_zonegroup_info_oid": "default.zonegroup", > "rgw_realm": "", > "rgw_realm_root_pool": ".rgw.root", > 
"rgw_default_realm_info_oid": "default.realm", > "rgw_period_root_pool": ".rgw.root", > "rgw_period_latest_epoch_info_oid": ".latest_epoch", > "rgw_log_nonexistent_bucket": "false", > "rgw_log_object_name": "%Y-%m-%d-%H-%i-%n", > "rgw_log_object_name_utc": "false", > "rgw_usage_max_shards": "32", > "rgw_usage_max_user_shards": "1", > "rgw_enable_ops_log": "false", > "rgw_enable_usage_log": "false", > "rgw_ops_log_rados": "true", > "rgw_ops_log_socket_path": "", > "rgw_ops_log_data_backlog": "5242880", > "rgw_usage_log_flush_threshold": "1024", > "rgw_usage_log_tick_interval": "30", >
Re: [ceph-users] Can Jewel read Hammer radosgw buckets?
On Sat, Apr 23, 2016 at 6:22 AM, Richard Chan wrote: > Hi Cephers, > > I upgraded to Jewel and noted there is a massive radosgw multisite rework > in the release notes. > > Can Jewel radosgw be configured to present existing Hammer buckets? > On a test system, Jewel didn't recognise my Hammer buckets; > > Hammer used pools .rgw.* > Jewel created by default: .rgw.root and default.rgw* > > > Yes, Jewel should be able to read Hammer buckets. If it detects that there's an old config, it should migrate the existing setup into the new config. It seems that something didn't work as expected here. One way to fix it would be to create a new zone and set its pools to point at the old config's pools. We'll need to figure out what went wrong though. Yehuda
Re: [ceph-users] RadosGW not start after upgrade to Jewel
On Tue, Apr 26, 2016 at 6:50 AM, Abhishek Lekshmanan wrote: > > Ansgar Jazdzewski writes: > >> Hi, >> >> After playing with the setup I got some output that looks wrong: >> >> # radosgw-admin zone get >> >> "placement_pools": [ >> { >> "key": "default-placement", >> "val": { >> "index_pool": ".eu-qa.rgw.buckets.inde", >> "data_pool": ".eu-qa.rgw.buckets.dat", >> "data_extra_pool": ".eu-qa.rgw.buckets.non-e", >> "index_type": 0 >> } >> } >> ], >> >> I think it should be: >> >> index_pool = .eu-qa.rgw.buckets.index. >> data_pool = .eu-qa.rgw.buckets >> data_extra_pool = .eu-qa.rgw.buckets.extra >> >> how can I fix it? > > Not sure how it reached this state, but given a zone get json, you can There's an issue currently when doing radosgw-admin zone set and the pool names start with a period (http://tracker.ceph.com/issues/15597): the pool name is getting truncated by one character. We will have this fixed for the next point release, but the workaround for now would be to add an extra character to each pool name before running the zone set command. Yehuda > edit this and set it back using zone set, e.g.: > # radosgw-admin zone get > zone.json # now edit this file > # radosgw-admin zone set --rgw-zone="eu-qa" < zone.json >> >> Thanks >> Ansgar >> >> 2016-04-26 13:07 GMT+02:00 Ansgar Jazdzewski : >>> Hi all, >>> >>> I got an answer that pointed me to: >>> https://github.com/ceph/ceph/blob/master/doc/radosgw/multisite.rst >>> >>> 2016-04-25 16:02 GMT+02:00 Karol Mroz : On Mon, Apr 25, 2016 at 02:23:28PM +0200, Ansgar Jazdzewski wrote: > Hi, > > we test Jewel in our QA environment (from Infernalis to Hammer) the > upgrade went fine but the Radosgw did not start.
> > the error appears also with radosgw-admin > > # radosgw-admin user info --uid="images" --rgw-region=eu --rgw-zone=eu-qa > 2016-04-25 12:13:33.425481 7fc757fad900 0 error in read_id for id : > (2) No such file or directory > 2016-04-25 12:13:33.425494 7fc757fad900 0 failed reading zonegroup > info: ret -2 (2) No such file or directory > couldn't init storage provider > > do i have to change some settings, also for upgrade of the radosgw? Hi, Testing a recent master build (with only default region and zone), I'm able to successfully run the command you specified: % ./radosgw-admin user info --uid="testid" --rgw-region=default --rgw-zone=default ... { "user_id": "testid", "display_name": "M. Tester", ... } Are you certain the region and zone you specified exist? What do the following report: radosgw-admin zone list radosgw-admin region list -- Regards, Karol >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > -- > Abhishek Lekshmanan > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nürnberg) > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
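The workaround Yehuda describes above (append one extra character to every pool name before `zone set`, since the bug strips the last character of names starting with a period) can be scripted as a small JSON rewrite. A sketch, assuming the zone JSON uses the pool keys seen in these threads; the helper name and file names are mine:

```python
import json
import sys

# Keys in "radosgw-admin zone get" output that hold pool names.
POOL_KEYS = {
    "domain_root", "control_pool", "gc_pool", "log_pool",
    "intent_log_pool", "usage_log_pool", "user_keys_pool",
    "user_email_pool", "user_swift_pool", "user_uid_pool",
    "index_pool", "data_pool", "data_extra_pool",
}

def pad_pools(node):
    # Append one throwaway character to every pool name starting with a
    # period; http://tracker.ceph.com/issues/15597 strips it again on
    # "zone set", leaving the intended name behind.
    if isinstance(node, dict):
        return {k: (v + "_" if k in POOL_KEYS and isinstance(v, str)
                    and v.startswith(".") else pad_pools(v))
                for k, v in node.items()}
    if isinstance(node, list):
        return [pad_pools(v) for v in node]
    return node

if __name__ == "__main__" and len(sys.argv) > 1:
    # Intended use:
    #   radosgw-admin zone get > zone.json
    #   python pad_pools.py zone.json > zone.padded.json
    #   radosgw-admin zone set --rgw-zone=eu-qa < zone.padded.json
    with open(sys.argv[1]) as f:
        print(json.dumps(pad_pools(json.load(f)), indent=4))
```

Only apply this against a radosgw-admin build that still has the truncation bug; on a fixed build the padding character would persist in the pool names.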
Re: [ceph-users] radosgw hammer -> jewel upgrade (default zone & region config)
On Fri, May 20, 2016 at 9:03 AM, Jonathan D. Proulx wrote: > Hi All, > > I saw the previous thread on this related to > http://tracker.ceph.com/issues/15597 > > and Yehuda's fix script > https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone > > Running this seems to have landed me in a weird state. > > I can create and get new buckets and objects but I've "lost" all my > old buckets. I'm fairly confident the "lost" data is in the > .rgw.buckets pool but my current zone is set to use .rgw.buckets_ > > > > root@ceph-mon0:~# radosgw-admin zone get > { > "id": "default", > "name": "default", > "domain_root": ".rgw_", > "control_pool": ".rgw.control_", > "gc_pool": ".rgw.gc_", > "log_pool": ".log_", > "intent_log_pool": ".intent-log_", > "usage_log_pool": ".usage_", > "user_keys_pool": ".users_", > "user_email_pool": ".users.email_", > "user_swift_pool": ".users.swift_", > "user_uid_pool": ".users.uid_", > "system_key": { > "access_key": "", > "secret_key": "" > }, > "placement_pools": [ > { > "key": "default-placement", > "val": { > "index_pool": ".rgw.buckets.index_", > "data_pool": ".rgw.buckets_", > "data_extra_pool": ".rgw.buckets.extra_", > "index_type": 0 > } > } > ], > "metadata_heap": "default.rgw.meta", > "realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be" > } > > > root@ceph-mon0:~# ceph osd pool ls |grep rgw|sort > default.rgw.meta > .rgw > .rgw_ > .rgw.buckets > .rgw.buckets_ > .rgw.buckets.index > .rgw.buckets.index_ > .rgw.control > .rgw.control_ > .rgw.gc > .rgw.gc_ > .rgw.root > .rgw.root.backup > > Should I just adjust the zone to use the pools without trailing > underscores? I'm a bit lost. the last I could see from running the Yes. The trailing underscores were needed when upgrading to 10.2.0, as there was another bug, and I needed to add these to compensate for it. I should update the script now to reflect that fix. You should just update the json and set the zone appropriately. 
Yehuda > script didn't seem to indicate any errors (though I lost the to to > scroll back buffer before i noticed the issue) > > Tail of output from running script: > https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone > > + radosgw-admin zone set --rgw-zone=default > zone id default{ > "id": "default", > "name": "default", > "domain_root": ".rgw_", > "control_pool": ".rgw.control_", > "gc_pool": ".rgw.gc_", > "log_pool": ".log_", > "intent_log_pool": ".intent-log_", > "usage_log_pool": ".usage_", > "user_keys_pool": ".users_", > "user_email_pool": ".users.email_", > "user_swift_pool": ".users.swift_", > "user_uid_pool": ".users.uid_", > "system_key": { > "access_key": "", > "secret_key": "" > }, > "placement_pools": [ > { > "key": "default-placement", > "val": { > "index_pool": ".rgw.buckets.index_", > "data_pool": ".rgw.buckets_", > "data_extra_pool": ".rgw.buckets.extra_", > "index_type": 0 > } > } > ], > "metadata_heap": "default.rgw.meta", > "realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be" > } > + radosgw-admin zonegroup default --rgw-zonegroup=default > + radosgw-admin zone default --rgw-zone=default > root@ceph-mon0:~# radosgw-admin region get --rgw-zonegroup=default > { > "id": "default", > "name": "default", > "api_name": "", > "is_master": "true", > "endpoints": [], > "hostnames": [], > "hostnames_s3website": [], > "master_zone": "default", > "zones": [ > { > "id": "default", > "name": "default", > "endpoints": [], > "log_meta": "false", > "log_data": "false", > "bucket_index_max_shards": 0, > "read_only": "false"} > ], > "placement_targets": [ > { > "name": "default-placement", > "tags": [] > } > ], > "default_placement": "default-placement", > "realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be"} > > root@ceph-mon0:~# ceph -v > ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269) > > Thanks, > -Jon > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
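Yehuda's advice above ("update the json and set the zone appropriately") amounts to the inverse edit: strip the trailing `_` that the fix-zone script appended to each pool name, then feed the result back through `radosgw-admin zone set`. A sketch, assuming (as in the zone dump above) that only padded pool names both start with a period and end with an underscore:

```python
def strip_pool_padding(node):
    # Drop the trailing "_" from padded pool names such as ".rgw.buckets_".
    # Only strings that look like padded pool names (leading period plus
    # trailing underscore) are touched, so values like "default.rgw.meta"
    # or the realm_id are left alone.
    if isinstance(node, dict):
        return {k: strip_pool_padding(v) for k, v in node.items()}
    if isinstance(node, list):
        return [strip_pool_padding(v) for v in node]
    if isinstance(node, str) and node.startswith(".") and node.endswith("_"):
        return node[:-1]
    return node
```

This assumes the `zone set` truncation bug is already fixed in the running binary (10.2.1 here); on a still-buggy build, stripping the padding and then running `zone set` would truncate the real names again.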
Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time
On Thu, May 12, 2016 at 12:29 AM, Saverio Proto wrote: >> While I'm usually not fond of blaming the client application, this is >> really a swift command-line tool issue. It tries to be smart by >> comparing the md5sum of the object's content with the object's etag, >> and it breaks with multipart objects. A multipart object's etag is calculated >> differently (the md5sum of the concatenated md5sums of its parts). I think the swift >> tool has special handling for swift large objects (which are not the >> same as S3 multipart objects), so that's why it works in that specific >> use case. > > Well, but I tried also with rclone and I have the same issue. > > Clients I tried: > rclone (both SWIFT and S3) > s3cmd (S3) > python-swiftclient (SWIFT) > > I can reproduce the issue with different clients. > Once a multipart object is uploaded via S3 (with rclone or s3cmd) I > cannot read it anymore via SWIFT (either with rclone or > python-swiftclient). > > Are you saying that all SWIFT client implementations are wrong? Yes. > > Or should the radosgw be configured with only one API active? > > Saverio
Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time
While I'm usually not fond of blaming the client application, this is really a swift command-line tool issue. It tries to be smart by comparing the md5sum of the object's content with the object's etag, and it breaks with multipart objects. A multipart object's etag is calculated differently (the md5sum of the concatenated md5sums of its parts). I think the swift tool has special handling for swift large objects (which are not the same as S3 multipart objects), so that's why it works in that specific use case. Yehuda On Wed, May 11, 2016 at 7:15 AM, Saverio Proto wrote: > It does not work the other way around either: > > If I upload a file with the swift client with the -S option to force > swift to make multipart: > > swift upload -S 100 multipart 180.mp4 > > Then I am not able to read the file with S3: > > s3cmd get s3://multipart/180.mp4 > download: 's3://multipart/180.mp4' -> './180.mp4' [1 of 1] > download: 's3://multipart/180.mp4' -> './180.mp4' [1 of 1] > 38818503 of 38818503 100% in 1s 27.32 MB/s done > WARNING: MD5 signatures do not match: > computed=961f154cc78c7bf1be3b4009c29e5a68, > received=d41d8cd98f00b204e9800998ecf8427e > > Saverio > > > 2016-05-11 16:07 GMT+02:00 Saverio Proto : >> Thank you. >> >> It is exactly a problem with multipart. >> >> So I tried two clients (s3cmd and rclone). When you upload a file to >> S3 using multipart, you are not able to read this object anymore with >> the SWIFT API because the md5 check fails. >> >> Saverio >> >> >> >> 2016-05-09 12:00 GMT+02:00 Xusangdi : >>> Hi, >>> >>> I'm not running a cluster like yours, but I don't think the issue is caused >>> by you using 2 APIs at the same time. >>> IIRC the dash thing is appended by S3 multipart upload, with a following >>> digit indicating the number of parts. 
>>> You may want to check this bug reported in the s3cmd community: >>> https://sourceforge.net/p/s3tools/bugs/123/ >>> >>> and some basic info from Amazon: >>> http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html >>> >>> Hope this helps :D >>> >>> Regards, >>> ---Sandy >>> -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Saverio Proto Sent: Monday, May 09, 2016 4:42 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time I try to simplify the question to get some feedback. Is anyone running the RadosGW in production with the S3 and SWIFT APIs active at the same time? thank you! Saverio 2016-05-06 11:39 GMT+02:00 Saverio Proto : > Hello, > > We have been running the Rados GW with the S3 API and we did not have > problems for more than a year. > > We recently also enabled the SWIFT API for our users. > > radosgw --version > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) > > The idea is that each user of the system is free to choose the S3 > client or the SWIFT client to access the same container/buckets. > > Please tell us if this is possible by design or if we are doing > something wrong. > > We now have a problem where some files written in the past with S3 > cannot be read with the SWIFT API because the md5sum check always fails. > > I am able to reproduce the bug in this way: > > We have this file googlebooks-fre-all-2gram-20120701-ts.gz and we know > the correct md5 is 1c8113d2bd21232688221ec74dccff3a You can download > the same file here: > https://www.dropbox.com/s/auq16vdv2maw4p7/googlebooks-fre-all-2gram-20 > 120701-ts.gz?dl=0 > > rclone mkdir lss3:bugreproduce > rclone copy googlebooks-fre-all-2gram-20120701-ts.gz lss3:bugreproduce > > The file is successfully uploaded. 
> > At this point I can successfully download the file again: > rclone copy lss3:bugreproduce/googlebooks-fre-all-2gram-20120701-ts.gz > test.gz > > but not with swift: > > swift download googlebooks-ngrams-gz > fre/googlebooks-fre-all-2gram-20120701-ts.gz > Error downloading object > 'googlebooks-ngrams-gz/fre/googlebooks-fre-all-2gram-20120701-ts.gz': > u'Error downloading fre/googlebooks-fre-all-2gram-20120701-ts.gz: > md5sum != etag, 1c8113d2bd21232688221ec74dccff3a != > 1a209a31b4ac3eb923fac5e8d194d9d3-2' > > I also found the dash character '-' at the end of the md5 it is trying > to compare strange. > > Of course, uploading a file with the swift client and re-downloading the > same file just works. > > Should I open a bug for the radosgw on http://tracker.ceph.com/ ? > > thank you > > Saverio
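The `-2` suffix that trips the swift client's check comes straight from how S3 computes multipart ETags: the ETag is not the MD5 of the object's content, but the MD5 of the concatenated binary MD5 digests of the parts, followed by a dash and the part count. A small illustration:

```python
import hashlib

def simple_etag(data):
    # ETag of a single-part upload: plain MD5 of the content.
    return hashlib.md5(data).hexdigest()

def multipart_etag(parts):
    # ETag of a multipart upload: MD5 over the concatenated *binary* MD5
    # digests of the parts, plus "-<number of parts>".
    concatenated = b"".join(hashlib.md5(part).digest() for part in parts)
    return "%s-%d" % (hashlib.md5(concatenated).hexdigest(), len(parts))

data = b"x" * (6 * 1024 * 1024)                 # pretend 6 MiB object
parts = [data[:5 * 1024 * 1024], data[5 * 1024 * 1024:]]  # two parts

# Same bytes, different ETags -- so "md5sum != etag" is expected behaviour
# for an object uploaded via S3 multipart, not proof of corruption.
assert simple_etag(data) != multipart_etag(parts)
assert multipart_etag(parts).endswith("-2")
```

This matches the `1a209a31b4ac3eb923fac5e8d194d9d3-2` etag in Saverio's report: a two-part multipart upload. Genuine truncation (as in the slow-request thread earlier) has to be detected by comparing sizes, not by this checksum.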