Re: [ceph-users] Having problem to start Radosgw
----- Original Message -----
From: B L <super.itera...@gmail.com>
To: ceph-users@lists.ceph.com
Sent: Friday, February 13, 2015 11:55:22 PM
Subject: [ceph-users] Having problem to start Radosgw

> Hi all,
>
> I'm having a problem starting radosgw; it gives me an error that I can't diagnose:
>
> $ radosgw -c ceph.conf -d
> 2015-02-14 07:46:58.435802 7f9d739557c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 27609
> 2015-02-14 07:46:58.437284 7f9d739557c0 -1 asok(0x7f9d74da80a0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) File exists
> 2015-02-14 07:46:58.499004 7f9d739557c0  0 framework: fastcgi
> 2015-02-14 07:46:58.499016 7f9d739557c0  0 starting handler: fastcgi
> 2015-02-14 07:46:58.501160 7f9d477fe700  0 ERROR: FCGX_Accept_r returned -9
> 2015-02-14 07:46:58.594271 7f9d648ab700 -1 failed to list objects pool_iterate returned r=-2
> 2015-02-14 07:46:58.594276 7f9d648ab700  0 ERROR: lists_keys_next(): ret=-2
> 2015-02-14 07:46:58.594278 7f9d648ab700  0 ERROR: sync_all_users() returned ret=-2
> ^C2015-02-14 07:47:29.119185 7f9d47fff700  1 handle_sigterm
> 2015-02-14 07:47:29.119214 7f9d47fff700  1 handle_sigterm set alarm for 120
> 2015-02-14 07:47:29.119222 7f9d739557c0 -1 shutting down
> 2015-02-14 07:47:29.142726 7f9d739557c0  1 final shutdown
>
> Since it complains that this file exists: /var/run/ceph/ceph-client.admin.asok, I removed it, but now I get this error:
>
> $ radosgw -c ceph.conf -d
> 2015-02-14 07:47:55.140276 7f31cc0637c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 27741
> 2015-02-14 07:47:55.201561 7f31cc0637c0  0 framework: fastcgi
> 2015-02-14 07:47:55.201567 7f31cc0637c0  0 starting handler: fastcgi
> 2015-02-14 07:47:55.203443 7f319effd700  0 ERROR: FCGX_Accept_r returned -9

Error 9 is EBADF (bad file number). It looks like there's an issue with the socket created for the FastCGI communication. How did you configure it?

Yehuda

> 2015-02-14 07:47:55.304048 7f319700 -1 failed to list objects pool_iterate returned r=-2
> 2015-02-14 07:47:55.304054 7f319700  0 ERROR: lists_keys_next(): ret=-2
> 2015-02-14 07:47:55.304060 7f319700  0 ERROR: sync_all_users() returned ret=-2
>
> Can somebody help me figure out where to start fixing this? Thanks!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
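[Editor's note: the "File exists" error on the admin socket usually means another Ceph client, or a stale process, is already bound to the default client.admin socket. One hedged workaround, assuming the gateway is configured in a client.radosgw.gateway section (the section name and path below are assumptions, not from the thread), is to give radosgw its own admin socket rather than deleting the shared one:]

```ini
; Sketch only -- section name and socket path are assumptions; adjust to your setup.
[client.radosgw.gateway]
; Give radosgw a dedicated admin socket so it does not collide with
; the default /var/run/ceph/ceph-client.admin.asok used by other clients.
admin socket = /var/run/ceph/ceph-client.radosgw.gateway.asok
```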
Re: [ceph-users] Having problem to start Radosgw
----- Original Message -----
From: B L <super.itera...@gmail.com>
To: Yehuda Sadeh-Weinraub <yeh...@redhat.com>
Cc: ceph-users@lists.ceph.com
Sent: Saturday, February 14, 2015 11:03:42 AM
Subject: Re: [ceph-users] Having problem to start Radosgw

> Hello Yehuda,
>
> The strace command you referred me to shows this:
> https://gist.github.com/anonymous/8e9f1ced485996a263bb
>
> Additionally, I traced this log file: /var/log/radosgw/ceph-client.radosgw.gateway. It has the following:
>
> 2015-02-12 18:23:32.247679 7fecca5257c0 -1 did not load config file, using default settings.
> 2015-02-12 18:23:32.247745 7fecca5257c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20477
> 2015-02-12 18:23:32.251192 7fecca5257c0 -1 Couldn't init storage provider (RADOS)
> 2015-02-12 18:23:58.494026 7faab31377c0 -1 did not load config file, using default settings.
> 2015-02-12 18:23:58.494092 7faab31377c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20509
> 2015-02-12 18:23:58.497420 7faab31377c0 -1 Couldn't init storage provider (RADOS)
> 2015-02-14 17:13:03.478688 7f86f09567c0 -1 did not load config file, using default settings.
> 2015-02-14 17:13:03.478778 7f86f09567c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 2989
> 2015-02-14 17:13:03.482850 7f86f09567c0 -1 Couldn't init storage provider (RADOS)
> 2015-02-14 17:13:29.477530 7ff18226a7c0 -1 did not load config file, using default settings.
> 2015-02-14 17:13:29.477595 7ff18226a7c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3033
> 2015-02-14 17:13:29.481173 7ff18226a7c0 -1 Couldn't init storage provider (RADOS)
> 2015-02-14 17:21:00.950847 7ffee3a3b7c0 -1 did not load config file, using default settings.
> 2015-02-14 17:21:00.950916 7ffee3a3b7c0  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3086
> 2015-02-14 17:21:00.954085 7ffee3a3b7c0 -1 Couldn't init storage provider (RADOS)
>
> It turns out that the last line of the logs is emitted by this piece of code in rgw_main.cc:
>
>   FCGX_Init();
>
>   RGWStoreManager store_manager;
>   if (!store_manager.init("rados", g_ceph_context)) {
>     derr << "Couldn't init storage provider (RADOS)" << dendl;
>     return EIO;
>   }
>
>   RGWProcess process(g_ceph_context, 20);
>   process.run();
>
>   return 0;
>
> N.B. you can find it in http://workbench.dachary.org/ceph/ceph/raw/8d63e140777bbdd061baa6845d57e6c3cc771f76/src/rgw/rgw_main.cc, 10th line from the bottom.
>
> Is that by any means related to the problem?

Not related. This actually means that it couldn't connect to the RADOS backend, so there's a different issue now. The strace log doesn't provide much with regard to the original issue, as it didn't get to that part now. You can try bumping up the debug level (debug rgw = 20, debug ms = 1). I assume the issue you're seeing is that the wrong rados user and/or wrong cephx keys are being used. Run it again as you usually do, note the params that are normally passed when starting radosgw, and use those when running the strace command.

Yehuda

> On Feb 14, 2015, at 7:24 PM, Yehuda Sadeh-Weinraub <yeh...@redhat.com> wrote:
>
> sudo strace -F -T -tt -o/tmp/strace.out radosgw -c ceph.conf -f

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Shadow files
----- Original Message -----
From: Ben <b@benjackson.email>
To: Yehuda Sadeh-Weinraub <yeh...@redhat.com>
Cc: Craig Lewis <cle...@centraldesktop.com>, ceph-users <ceph-us...@ceph.com>
Sent: Tuesday, March 17, 2015 7:28:28 PM
Subject: Re: [ceph-users] Shadow files

> None of this helps with trying to remove defunct shadow files, which number in the tens of millions.

Did it at least reflect that the garbage collection system works?

> Is there a quick way to see which shadow files are safe to delete easily?

There's no easy process. If you know that a lot of the removed data is on buckets that shouldn't exist anymore, then you could start by trying to identify those. You could do that by:

$ radosgw-admin metadata list bucket

then, for each bucket:

$ radosgw-admin metadata get bucket:<bucket name>

This will give you the bucket markers of all existing buckets. Each data object (head and shadow objects) is prefixed by a bucket marker. Objects that don't have valid bucket markers can be removed. Note that I would first list all objects, then get the list of valid bucket markers, as the operation is racy and new buckets can be created in the meantime.

We did discuss a new garbage cleanup tool that will address your specific issue, and we have a design for it, but it's not there yet.

Yehuda

> Remember that there are MILLIONS of objects. We have a 320TB cluster which is 272TB full. Of this, we should only actually be seeing 190TB. There is 80TB of shadow files that should no longer exist.
>
> On 2015-03-18 02:00, Yehuda Sadeh-Weinraub wrote:
>
> > ----- Original Message -----
> > From: Ben <b@benjackson.email>
> > To: Craig Lewis <cle...@centraldesktop.com>
> > Cc: Yehuda Sadeh-Weinraub <yeh...@redhat.com>, ceph-users <ceph-us...@ceph.com>
> > Sent: Monday, March 16, 2015 3:38:42 PM
> > Subject: Re: [ceph-users] Shadow files
> >
> > That's the thing: the peaks and troughs are in USERS' BUCKETS only. The actual cluster usage does not go up and down; it just goes up, up, up.
> > I would expect to see peaks and troughs in the overall cluster disk usage, much the same as the user buckets' peaks and troughs. But this is not the case.
> >
> > We upgraded the cluster and radosgws to GIANT (0.87.1) yesterday, and now we are seeing a large number of misplaced(??) objects being moved around. Does this mean it has found all the shadow files that shouldn't exist anymore and is deleting them? If so, I would expect to start seeing overall cluster usage drop, but this hasn't happened yet.

No, I don't think so. It sounds like your cluster is recovering, and that happens in a completely different layer.

> > Any ideas?

Try running:

$ radosgw-admin gc list --include-all

This should show all the shadow objects that are pending deletion. Note that if you have a non-default radosgw configuration, make sure you run radosgw-admin with the same user and config that radosgw runs with (e.g., add -n client.user appropriately); otherwise it might not look at the correct zone data.

You could create an object, identify the shadow objects for that object, remove it, and check that the gc list command shows those shadow objects. Then wait the configured time (2 hours?) and see if they were removed.

Yehuda

> > On 2015-03-17 06:12, Craig Lewis wrote:
> >
> > Out of curiosity, what's the frequency of the peaks and troughs?
> >
> > RadosGW has configs for how long it should wait after deleting before garbage collecting, how long between GC runs, and how many objects it can GC per run. The defaults are 2 hours, 1 hour, and 32, respectively. Search http://docs.ceph.com/docs/master/radosgw/config-ref/ [2] for "rgw gc".
> >
> > If your peaks and troughs have a frequency of less than 1 hour, then GC is going to delay and alias the disk usage w.r.t. the object count. If you have millions of objects, you probably need to tweak those values. If RGW is only GCing 32 objects an hour, it's never going to catch up.
> >
> > Now that I think about it, I bet I'm having issues here too. I delete more than (32*24) objects per day...
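[Editor's note: Craig's back-of-the-envelope check can be made explicit. A small sketch, assuming only the default `rgw gc` settings he quotes (32 objects per GC run, one run per hour); the function name is mine:]

```python
# Defaults quoted above: GC processes at most 32 objects per run, one run per hour.
GC_MAX_OBJS_PER_RUN = 32
GC_RUNS_PER_DAY = 24

def gc_keeps_up(deletes_per_day, max_objs=GC_MAX_OBJS_PER_RUN, runs=GC_RUNS_PER_DAY):
    """True if the default GC settings can drain a day's worth of deletes."""
    return deletes_per_day <= max_objs * runs

# At the defaults, anything above 32*24 = 768 deletes/day falls behind forever,
# which is exactly Craig's point about needing to tune the rgw gc settings.
```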
> > On Sun, Mar 15, 2015 at 4:41 PM, Ben <b@benjackson.email> wrote:
> >
> > It is either a problem with Ceph, Civetweb, or something else in our configuration. But deletes in user buckets are still leaving a high number of old shadow files. Since we have millions and millions of objects, it is hard to reconcile what should and shouldn't exist.
> >
> > Looking at our cluster usage, there are no troughs; it is just a rising peak. But when looking at users' data usage, we can see peaks and troughs as you would expect as data is deleted and added.
> >
> > Our ceph version is 0.80.9.
> >
> > Any ideas, please?
> >
> > On 2015-03-13 02:25, Yehuda Sadeh-Weinraub wrote:
> >
> > > ----- Original Message -----
> > > From: Ben <b@benjackson.email>
> > > To: ceph-us...@ceph.com
> > > Sent: Wednesday, March 11, 2015 8:46:25 PM
> > > Subject: Re: [ceph-users] Shadow files
> > >
> > > Anyone
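[Editor's note: Yehuda's marker-matching procedure from this thread can be sketched as a small filter. This is a hypothetical helper, not a supported tool: `object_names` would come from listing the RGW data pool with `rados ls`, and `valid_markers` from `radosgw-admin metadata get bucket:<name>` for each bucket; per his caveat, list the objects before collecting the markers, since the check is racy against newly created buckets.]

```python
def orphan_candidates(object_names, valid_markers):
    """Return objects whose names are not prefixed by any live bucket marker.

    Head and shadow objects are prefixed by their bucket's marker
    (e.g. 'default.7573587.55__shadow_...'), so anything not matching
    a valid marker is a candidate for removal.
    """
    return [o for o in object_names
            if not any(o.startswith(m) for m in valid_markers)]
```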
Re: [ceph-users] RadosGW Direct Upload Limitation
----- Original Message -----
From: Craig Lewis <cle...@centraldesktop.com>
To: Gregory Farnum <g...@gregs42.com>
Cc: ceph-users@lists.ceph.com
Sent: Monday, March 16, 2015 11:48:15 AM
Subject: Re: [ceph-users] RadosGW Direct Upload Limitation

> > Maybe, but I'm not sure if Yehuda would want to take it upstream or not. This limit is present because it's part of the S3 spec. For larger objects you should use multi-part upload, which can get much bigger.
> > -Greg
>
> Note that the multi-part upload has a lower limit of 4MiB per part, and the direct upload has an upper limit of 5GiB.

The limit is 10MB, but it does not apply to the last part, so basically you could upload any object size with it. I would still recommend using the plain upload for smaller object sizes; it is faster, and the resulting object might be more efficient (for really small sizes).

Yehuda

> So you have to use both methods: direct upload for small files, and multi-part upload for big files. Your best bet is to use the Amazon S3 libraries; they have functions that take care of it for you.
>
> I'd like to see this mentioned in the Ceph documentation someplace. When I first encountered the issue, I couldn't find a limit in the RadosGW documentation anywhere. I only found the 5GiB limit in the Amazon API documentation, which led me to test on RadosGW. Now that I know it was done to preserve Amazon compatibility, I don't want to override the value anymore.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
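[Editor's note: the rule of thumb from this thread can be sketched as a tiny size-based dispatcher. The 5 GiB single-PUT ceiling comes from the S3 spec as discussed above; the function name is mine, and real clients would hand the actual transfer to an S3 library's plain or multipart upload call.]

```python
GiB = 1024 ** 3
DIRECT_UPLOAD_MAX = 5 * GiB  # S3-compatible ceiling for a single PUT

def pick_upload_method(size_bytes):
    """Return 'plain' up to the single-PUT limit, 'multipart' above it.

    Per the thread, plain upload is also faster and can produce a more
    efficient object for really small sizes, so prefer it when allowed.
    """
    return "plain" if size_bytes <= DIRECT_UPLOAD_MAX else "multipart"
```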
Re: [ceph-users] FastCGI and RadosGW issue?
----- Original Message -----
From: Potato Farmer <potato_far...@outlook.com>
To: ceph-users@lists.ceph.com
Sent: Thursday, March 19, 2015 12:26:41 PM
Subject: [ceph-users] FastCGI and RadosGW issue?

> Hi,
>
> I am running into an issue uploading to a bucket over an S3 connection to Ceph. I can create buckets just fine; I just can't create a key and copy data to it.
>
> Command that causes the error:
>
> key.set_contents_from_string("testing from string")
>
> I encounter the following error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1424, in set_contents_from_string
>     encrypt_key=encrypt_key)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1291, in set_contents_from_file
>     chunked_transfer=chunked_transfer, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 748, in send_file
>     chunked_transfer=chunked_transfer, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 949, in _send_file_internal
>     query_args=query_args
>   File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 664, in make_request
>     retry_handler=retry_handler
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 1068, in make_request
>     retry_handler=retry_handler)
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 1025, in _mexe
>     raise BotoServerError(response.status, response.reason, body)
> boto.exception.BotoServerError: BotoServerError: 500 Internal Server Error
> None
>
> In the Apache logs I see the following:
>
> [Thu Mar 19 12:03:13 2015] [error] [] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Thu Mar 19 12:03:13 2015] [error] [] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
> [Thu Mar 19 12:03:32 2015] [error] [] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Thu Mar 19 12:03:32 2015] [error] [] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
>
> I do not get any data in the radosgw logs; they are empty. I have turned off FastCgiWrapper and set "rgw print continue" to false in ceph.conf. I am using the version of FastCGI provided by the Ceph repo.

In this case you don't need to have 'rgw print continue' set to false; either remove that line, or set it to true.

Yehuda

> Has anyone run into this before? Any suggestions?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
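[Editor's note: a minimal ceph.conf sketch of the fix Yehuda suggests; the section name is an assumption. With a FastCGI module that handles HTTP 100-continue, the option can simply be left at its default:]

```ini
[client.radosgw.gateway]
; 'rgw print continue = false' is only needed for FastCGI frontends that
; cannot handle HTTP 100-continue; here either remove the line or set it to true.
rgw print continue = true
```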
Re: [ceph-users] Shadow files
----- Original Message -----
From: Abhishek L <abhishek.lekshma...@gmail.com>
To: Yehuda Sadeh-Weinraub <yeh...@redhat.com>
Cc: Ben <b@benjackson.email>, ceph-users <ceph-us...@ceph.com>
Sent: Wednesday, March 18, 2015 10:54:37 AM
Subject: Re: [ceph-users] Shadow files

> Yehuda Sadeh-Weinraub writes:
>
> > > Is there a quick way to see which shadow files are safe to delete easily?
> >
> > There's no easy process. If you know that a lot of the removed data is on buckets that shouldn't exist anymore, then you could start by trying to identify those. You could do that by:
> >
> > $ radosgw-admin metadata list bucket
> >
> > then, for each bucket:
> >
> > $ radosgw-admin metadata get bucket:<bucket name>
> >
> > This will give you the bucket markers of all existing buckets. Each data object (head and shadow objects) is prefixed by a bucket marker. Objects that don't have valid bucket markers can be removed. Note that I would first list all objects, then get the list of valid bucket markers, as the operation is racy and new buckets can be created in the meantime.
> >
> > We did discuss a new garbage cleanup tool that will address your specific issue, and we have a design for it, but it's not there yet.
>
> Could you share the design/ideas for the cleanup tool? After an initial search I could only find two issues:
>
> [1] http://tracker.ceph.com/issues/10342

It is sketched in there (#10342); it probably needs to be better formatted and documented.

Yehuda

> [2] http://tracker.ceph.com/issues/9604
>
> though there aren't many details there to get started with.
>
> --
> Abhishek

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
----- Original Message -----
From: Steffen Winther <ceph.u...@siimnet.dk>
To: ceph-users@lists.ceph.com
Sent: Monday, March 9, 2015 12:43:58 AM
Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP

> Steffen W Sørensen <stefws@...> writes:
>
> > Response:
> >
> > HTTP/1.1 200 OK
> > Date: Fri, 06 Mar 2015 10:41:14 GMT
> > Server: Apache/2.2.22 (Fedora)
> > Connection: close
> > Transfer-Encoding: chunked
> > Content-Type: application/xml
>
> This response makes the App say:
> S3.createBucket, class S3, code UnexpectedContent, message "Inconsistency in S3 response. error response is not a valid xml message"
>
> Is our S3 GW not responding properly? Why doesn't the radosgw return a Content-Length: 0 header when the body is empty?

If you're using Apache, then it filters out zero Content-Length. There's nothing much radosgw can do about it.

> http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html
>
> Maybe this is confusing my App into expecting some XML in the body.

You can try using the radosgw civetweb frontend and see if it changes anything.

Yehuda

> 2. At every create-bucket OP the GW creates what look like new containers for ACLs in the .rgw pool. Is this normal, or how do I avoid such multiple objects cluttering the GW pools? Is there something wrong, since I get multiple ACL objects for this bucket every time my App tries to recreate the same bucket, or is this a feature/bug in radosGW?
>
> # rados -p .rgw ls
> .bucket.meta.mssCl:default.6309817.1
> .bucket.meta.mssCl:default.6187712.3
> .bucket.meta.mssCl:default.6299841.7
> .bucket.meta.mssCl:default.6309817.5
> .bucket.meta.mssCl:default.6187712.2
> .bucket.meta.mssCl:default.6187712.19
> .bucket.meta.mssCl:default.6187712.12
> mssCl
> ...
>
> # rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12
> ceph.objclass.version
> user.rgw.acl
>
> /Steffen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
----- Original Message -----
From: Steffen Winther <ceph.u...@siimnet.dk>
To: ceph-users@lists.ceph.com
Sent: Monday, March 9, 2015 1:25:43 PM
Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP

> Yehuda Sadeh-Weinraub <yehuda@...> writes:
>
> > If you're using Apache, then it filters out zero Content-Length. There's nothing much radosgw can do about it.
> > You can try using the radosgw civetweb frontend and see if it changes anything.
>
> Thanks, only no difference...
>
> Req:
> PUT /mssCl/ HTTP/1.1
> Host: rgw.gsp.sprawl.dk:7480
> Authorization: AWS <auth id>
> Date: Mon, 09 Mar 2015 20:18:16 GMT
> Content-Length: 0
>
> Response:
> HTTP/1.1 200 OK
> Content-type: application/xml
> Content-Length: 0
>
> App still says:
> S3.createBucket, class S3, code UnexpectedContent, message "Inconsistency in S3 response. error response is not a valid xml message" :/

According to the API specified at http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUT.html, there's no response body expected. I can only assume that the application tries to decode the XML if an XML content type is returned. What kind of application is that?

Yehuda

> Any comments on the 2nd issue below?
>
> > 2. At every create-bucket OP the GW creates what look like new containers for ACLs in the .rgw pool. Is this normal, or how do I avoid such multiple objects cluttering the GW pools? Is there something wrong, since I get multiple ACL objects for this bucket every time my App tries to recreate the same bucket, or is this a feature/bug in radosGW?

That's a bug.

Yehuda

> > # rados -p .rgw ls
> > .bucket.meta.mssCl:default.6309817.1
> > .bucket.meta.mssCl:default.6187712.3
> > .bucket.meta.mssCl:default.6299841.7
> > .bucket.meta.mssCl:default.6309817.5
> > .bucket.meta.mssCl:default.6187712.2
> > .bucket.meta.mssCl:default.6187712.19
> > .bucket.meta.mssCl:default.6187712.12
> > mssCl
> > ...
> >
> > # rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12
> > ceph.objclass.version
> > user.rgw.acl
>
> /Steffen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rgw admin api - users
The metadata API can do it:

GET /admin/metadata/user

Yehuda

----- Original Message -----
From: Joshua Weaver <joshua.wea...@ctl.io>
To: ceph-us...@ceph.com
Sent: Thursday, March 5, 2015 1:43:33 PM
Subject: [ceph-users] rgw admin api - users

> According to the docs at http://docs.ceph.com/docs/master/radosgw/adminops/#get-user-info, I should be able to invoke /admin/user without a uid specified and get a list of users. No matter what I try, I get a 403. After looking at the source on GitHub (ceph/ceph), it appears that there isn't any code path that would result in a collection of users being generated from that resource. Am I missing something?
>
> TIA,
> _josh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
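[Editor's note: the admin metadata endpoint is called like any other S3-signed request. A hedged sketch of the signature-v2 step, assuming an admin user whose caps allow metadata reads; the request layout in the comment is illustrative, and exact canonical-resource handling may differ in your deployment:]

```python
import base64
import hashlib
import hmac

def sign_v2(secret_key, method, date_str, canonical_resource):
    """AWS signature v2: base64(HMAC-SHA1(secret, string-to-sign)).

    The two empty lines in the string-to-sign stand for the Content-MD5
    and Content-Type headers, which a bare GET does not send.
    """
    string_to_sign = "{}\n\n\n{}\n{}".format(method, date_str, canonical_resource)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical request (access key and secret are placeholders):
#   GET /admin/metadata/user HTTP/1.1
#   Date: <date_str>
#   Authorization: AWS <access_key>:<sign_v2(secret, "GET", date_str, "/admin/metadata/user")>
```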
Re: [ceph-users] not existing key from s3 list
----- Original Message -----
From: Dominik Mostowiec <dominikmostow...@gmail.com>
To: ceph-users@lists.ceph.com
Sent: Friday, March 13, 2015 4:50:18 PM
Subject: [ceph-users] not existing key from s3 list

> Hi,
> I found a strange problem with a non-existing file in S3. The object exists in the list:
>
> # s3 -u list bucketimages | grep 'files/fotoobject_83884@2/55673'
> files/fotoobject_83884@2/55673.JPG 2014-03-26T22:25:59Z 349K
>
> but:
>
> # s3 -u head 'bucketimages/files/fotoobject_83884@2/55673.JPG'
> ERROR: HttpErrorNotFound
>
> After a little digging:
>
> # radosgw-admin --bucket=bucketimages bucket stats | grep marker
>   marker: default.7573587.55,
> # rados listomapkeys .dir.default.7573587.55 -p .rgw.buckets.index | grep 'files/fotoobject'
> files/fotoobject_83884@2/55673.JPG
> # rados -p .rgw.buckets.index getomapval .dir.default.7573587.55 'files/fotoobject_83884@2/55673.JPG'
> No such key: .rgw.buckets.index/.dir.default.7573587.55/files/fotoobject_83884@2/55673.JPG
>
> What is wrong?

It is likely that this object failed to upload and we returned an error for that, but there was a bug (fixed recently) where we didn't clear the bucket index entry correctly.

Yehuda

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Shadow files
----- Original Message -----
From: Ben <b@benjackson.email>
To: ceph-us...@ceph.com
Sent: Wednesday, March 11, 2015 8:46:25 PM
Subject: Re: [ceph-users] Shadow files

> Anyone got any info on this? Is it safe to delete shadow files?

It depends. Shadow files are badly named objects that hold part of an object's data. They are only safe to remove if you know that the corresponding objects no longer exist.

Yehuda

> On 2015-03-11 10:03, Ben wrote:
>
> We have a large number of shadow files in our cluster that aren't being deleted automatically as data is deleted. Is it safe to delete these files? Is there something we need to be aware of when deleting them? Is there a script we can run that will delete them safely? Is there something wrong with our cluster such that it isn't deleting these files when it should be?
>
> We are using civetweb with radosgw, with a tengine SSL proxy in front of it.
>
> Any advice please.
> Thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
----- Original Message -----
From: Steffen Winther <ceph.u...@siimnet.dk>
To: ceph-users@lists.ceph.com
Sent: Tuesday, March 10, 2015 12:06:38 AM
Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP

> Yehuda Sadeh-Weinraub <yehuda@...> writes:
>
> > According to the API specified at http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUT.html, there's no response expected. I can only assume that the application tries to decode the XML if an XML content type is returned.
>
> That's also what I hinted to the App vendor.
>
> > What kind of application is that?
>
> A commercial email platform from Openwave.com.

Maybe it could be worked around using an Apache rewrite rule. In any case, I opened issue #11091.

> > > 2. At every create-bucket OP the GW creates what look like new containers for ACLs in the .rgw pool. Is this normal, or how do I avoid such multiple objects cluttering the GW pools? Is there something wrong, since I get multiple ACL objects for this bucket every time my App tries to recreate the same bucket, or is this a feature/bug in radosGW?
> >
> > That's a bug.
>
> Ok, any resolution/workaround for this?

Not at the moment. There's already issue #6961; I bumped its priority higher, and we'll take a look at it.

Thanks,
Yehuda

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Auth URL not found when using object gateway
----- Original Message -----
From: Greg Meier <greg.me...@nyriad.com>
To: ceph-users@lists.ceph.com
Sent: Tuesday, March 24, 2015 4:24:16 PM
Subject: [ceph-users] Auth URL not found when using object gateway

> Hi,
>
> I'm having trouble setting up an object gateway on an existing cluster. The cluster I'm trying to add the gateway to is running on a Precise 12.04 virtual machine. The cluster is up and running, with a monitor, two OSDs, and a metadata server. It returns HEALTH_OK and active+clean, so I am somewhat assured that it is running correctly.
>
> I've:
> - set up an apache2 webserver with the fastcgi mod installed
> - created an rgw.conf file
> - added an s3gw.fcgi script
> - enabled the rgw.conf site and disabled the default
> - created a keyring and gateway user with appropriate caps
> - restarted ceph, apache2, and the radosgw daemon
> - created a user and subuser
> - tested both s3 and swift calls
>
> Unfortunately, both s3 and swift fail to authorize. An attempt to create a new bucket with s3 using a Python script returns:
>
> Traceback (most recent call last):
>   File "s3test.py", line 13, in <module>
>     bucket = conn.create_bucket('my-new-bucket')
>   File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 422, in create_bucket
>     response.status, response.reason, body)
> boto.exception.S3ResponseError: S3ResponseError: 404 Not Found
> None
>
> And an attempt to post a container using python-swiftclient from the command line with command:
>
> swift --debug --info -A http://localhost/auth/1.0 -U gatewayuser:swift -K key post new_container
>
> returns:
>
> INFO:urllib3.connectionpool:Starting new HTTP connection (1): localhost
> DEBUG:urllib3.connectionpool:"GET /auth/1.0 HTTP/1.1" 404 180
> INFO:swiftclient:REQ: curl -i http://localhost/auth/1.0 -X GET
> INFO:swiftclient:RESP STATUS: 404 Not Found
> INFO:swiftclient:RESP HEADERS: [('content-length', '180'), ('content-encoding', 'gzip'), ('date', 'Tue, 24 Mar 2015 23:19:50 GMT'), ('content-type', 'text/html; charset=iso-8859-1'), ('vary', 'Accept-Encoding'), ('server', 'Apache/2.2.22 (Ubuntu)')]
> INFO:swiftclient:RESP BODY: [garbled gzip-compressed HTML error page]
> ERROR:swiftclient:Auth GET failed: http://localhost/auth/1.0 404 Not Found
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1181, in _retry
>     self.url, self.token = self.get_auth()
>   File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1155, in get_auth
>     insecure=self.insecure)
>   File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 318, in get_auth
>     insecure=insecure)
>   File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 241, in get_auth_1_0
>     http_reason=resp.reason)
> ClientException: Auth GET failed: http://localhost/auth/1.0 404 Not Found
>
> [the client retries and logs the same 404 connection, response, and traceback a second time]
>
> Auth GET failed: http://localhost/auth/1.0 404 Not Found
>
> I'm not at all sure why it doesn't work when I've followed the documentation for setting it up. Please find attached the config files for rgw.conf, ceph.conf, and apache2.conf.

What does the rgw log show? (Please add 'debug rgw = 20' and 'debug ms = 1'.)

Yehuda

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
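[Editor's note: a minimal ceph.conf sketch of the debug settings Yehuda asks for; the section name is an assumption, and the log path in the comment is just the location this thread already uses:]

```ini
[client.radosgw.gateway]
debug rgw = 20   ; verbose radosgw-level tracing of each request
debug ms = 1     ; messenger-level tracing of traffic to the monitors/OSDs
; Output goes to the configured radosgw log, e.g.
; /var/log/radosgw/ceph-client.radosgw.gateway.log
```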
Re: [ceph-users] Radosgw authorization failed
- Original Message - From: Neville neville.tay...@hotmail.co.uk To: ceph-users@lists.ceph.com Sent: Wednesday, March 25, 2015 8:16:39 AM Subject: [ceph-users] Radosgw authorization failed Hi all, I'm testing backup product which supports Amazon S3 as target for Archive storage and I'm trying to setup a Ceph cluster configured with the S3 API to use as an internal target for backup archives instead of AWS. I've followed the online guide for setting up Radosgw and created a default region and zone based on the AWS naming convention US-East-1. I'm not sure if this is relevant but since I was having issues I thought it might need to be the same. I've tested the radosgw using boto.s3 and it seems to work ok i.e. I can create a bucket, create a folder, list buckets etc. The problem is when the backup software tries to create an object I get an authorization failure. It's using the same user/access/secret as I'm using from boto.s3 and I'm sure the creds are right as it lets me create the initial connection, it just fails when trying to create an object (backup folder). 
Here's the extract from the radosgw log: - 2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET /:list_bucket:init op 2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET /:list_bucket:verifying op mask 2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1 user.op_mask=7 2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET /:list_bucket:verifying op permissions 2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for uid=test mask=49 2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for group=1 mask=49 2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for group=2 mask=49 2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test owner=test perm=1 2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm (type)=1, policy perm=1, user_perm_mask=1, acl perm=1 2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET /:list_bucket:verifying op params 2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET /:list_bucket:executing 2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2]) start num 1001 2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET /:list_bucket:http status=200 2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done req=0x7f107000e2e0 http_status=200 == 2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request req=0x7f107000f0e0 2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ: 2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0 2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request req=0x7f107000f6b0 2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request req=0x7f107000f0e0 2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty 2015-03-25 
15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88 2015-03-25 15:07:26.517084 7f1058dd7700 20 CONTENT_TYPE=application/octet-stream 2015-03-25 15:07:26.517085 7f1058dd7700 20 CONTEXT_DOCUMENT_ROOT=/var/www 2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX= 2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www 2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER 2015-03-25 15:07:26.517089 7f1058dd7700 20 GATEWAY_INTERFACE=CGI/1.1 2015-03-25 15:07:26.517090 7f1058dd7700 20 HTTP_AUTHORIZATION=AWS F79L68W19B3GCLOSE3F8:AcXqtvlBzBMpwdL+WuhDRoLT/Bs= 2015-03-25 15:07:26.517091 7f1058dd7700 20 HTTP_CONNECTION=Keep-Alive 2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_DATE=Wed, 25 Mar 2015 15:07:26 GMT 2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_EXPECT=100-continue 2015-03-25 15:07:26.517093 7f1058dd7700 20 HTTP_HOST=test1.devops-os-cog01.devops.local 2015-03-25 15:07:26.517094 7f1058dd7700 20 HTTP_USER_AGENT=aws-sdk-java/unknown-version Windows_Server_2008_R2/6.1 Java_HotSpot(TM)_Client_VM/24.55-b03 2015-03-25 15:07:26.517096 7f1058dd7700 20 HTTP_X_AMZ_META_CREATIONTIME=2015-03-25T15:07:26 2015-03-25 15:07:26.517097 7f1058dd7700 20 HTTP_X_AMZ_META_SIZE=88 2015-03-25 15:07:26.517098 7f1058dd7700 20 HTTP_X_AMZ_STORAGE_CLASS=STANDARD 2015-03-25 15:07:26.517099 7f1058dd7700 20 HTTPS=on 2015-03-25 15:07:26.517100 7f1058dd7700 20 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2015-03-25 15:07:26.517100 7f1058dd7700 20 QUERY_STRING= 2015-03-25 15:07:26.517101 7f1058dd7700 20 REMOTE_ADDR=10.40.41.106 2015-03-25 15:07:26.517102 7f1058dd7700 20 REMOTE_PORT=55439 2015-03-25 15:07:26.517103 7f1058dd7700 20
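The HTTP_AUTHORIZATION value in the CGI dump above ("AWS access_key:signature") is an AWS signature-v2 header, and a common cause of this kind of failure is a mismatch between what the client and the gateway each compute as the string-to-sign — e.g. the x-amz-* metadata headers the backup software adds, or a Host header that doesn't match the configured rgw dns name when virtual-hosted bucket addressing is used. As a rough illustration of the v2 scheme (this is a sketch, not RGW's actual code; the secret key and resource path are invented, the other values mirror the dump above):

```python
import base64
import hashlib
import hmac

def s3_v2_signature(secret_key, method, content_md5, content_type, date, amz_headers, resource):
    # Canonicalize x-amz-* headers: lowercase names, sorted, one "name:value\n" each.
    canon = "".join("%s:%s\n" % (k, v)
                    for k, v in sorted((k.lower(), v) for k, v in amz_headers.items()))
    string_to_sign = "\n".join([method, content_md5, content_type, date]) + "\n" + canon + resource
    # Signature is base64(HMAC-SHA1(secret, string_to_sign)).
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical secret and resource path; headers taken from the dump above.
sig = s3_v2_signature(
    "secret", "PUT", "", "application/octet-stream",
    "Wed, 25 Mar 2015 15:07:26 GMT",
    {"x-amz-meta-creationtime": "2015-03-25T15:07:26",
     "x-amz-meta-size": "88",
     "x-amz-storage-class": "STANDARD"},
    "/test1/")
auth_header = "AWS F79L68W19B3GCLOSE3F8:" + sig
```

If any of the x-amz-* headers or the canonicalized resource differ between the two sides, the signatures diverge and the request is rejected even though the credentials are valid.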
Re: [ceph-users] Radosgw authorization failed
- Original Message - From: Neville neville.tay...@hotmail.co.uk To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Monday, March 30, 2015 6:49:29 AM Subject: Re: [ceph-users] Radosgw authorization failed Date: Wed, 25 Mar 2015 11:43:44 -0400 From: yeh...@redhat.com To: neville.tay...@hotmail.co.uk CC: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Radosgw authorization failed - Original Message - From: Neville neville.tay...@hotmail.co.uk To: ceph-users@lists.ceph.com Sent: Wednesday, March 25, 2015 8:16:39 AM Subject: [ceph-users] Radosgw authorization failed Hi all, I'm testing backup product which supports Amazon S3 as target for Archive storage and I'm trying to setup a Ceph cluster configured with the S3 API to use as an internal target for backup archives instead of AWS. I've followed the online guide for setting up Radosgw and created a default region and zone based on the AWS naming convention US-East-1. I'm not sure if this is relevant but since I was having issues I thought it might need to be the same. I've tested the radosgw using boto.s3 and it seems to work ok i.e. I can create a bucket, create a folder, list buckets etc. The problem is when the backup software tries to create an object I get an authorization failure. It's using the same user/access/secret as I'm using from boto.s3 and I'm sure the creds are right as it lets me create the initial connection, it just fails when trying to create an object (backup folder). 
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
- Original Message - From: Steffen W Sørensen ste...@me.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Friday, February 27, 2015 9:39:46 AM Subject: Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed On 27/02/2015, at 17.20, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: I'd look at two things first. One is the '{fqdn}' string, which I'm not sure whether that's the actual string that you have, or whether you just replaced it for the sake of anonymity. The second is the port number, which should be fine, but maybe the fact that it appears as part of the script uri triggers some issue. When launching radosgw it logs this: ...
2015-02-27 18:33:58.663960 7f200b67a8a0 20 rados->read obj-ofs=0 read_ofs=0 read_len=524288
2015-02-27 18:33:58.675821 7f200b67a8a0 20 rados->read r=0 bl.length=678
2015-02-27 18:33:58.676532 7f200b67a8a0 10 cache put: name=.rgw.root+zone_info.default
2015-02-27 18:33:58.676573 7f200b67a8a0 10 moving .rgw.root+zone_info.default to cache LRU end
2015-02-27 18:33:58.677415 7f200b67a8a0 2 zone default is master
2015-02-27 18:33:58.677666 7f200b67a8a0 20 get_obj_state: rctx=0x2a85cd0 obj=.rgw.root:region_map state=0x2a86498 s->prefetch_data=0
2015-02-27 18:33:58.677760 7f200b67a8a0 10 cache get: name=.rgw.root+region_map : miss
2015-02-27 18:33:58.709411 7f200b67a8a0 10 cache put: name=.rgw.root+region_map
2015-02-27 18:33:58.709846 7f200b67a8a0 10 adding .rgw.root+region_map to cache LRU end
2015-02-27 18:33:58.957336 7f1ff17f2700 2 garbage collection: start
2015-02-27 18:33:58.959189 7f1ff0df1700 20 BucketsSyncThread: start
2015-02-27 18:33:58.985486 7f200b67a8a0 0 framework: fastcgi
2015-02-27 18:33:58.985778 7f200b67a8a0 0 framework: civetweb
2015-02-27 18:33:58.985879 7f200b67a8a0 0 framework conf key: port, val: 7480
2015-02-27 18:33:58.986462 7f200b67a8a0 0 starting handler: civetweb
2015-02-27 18:33:59.032173 7f1fc3fff700 20 UserSyncThread: start
2015-02-27 18:33:59.214739 7f200b67a8a0 0 starting handler: fastcgi
2015-02-27 18:33:59.286723 7f1fb59e8700 10 allocated request req=0x2aa1b20
2015-02-27 18:34:00.533188 7f1fc3fff700 20 RGWRados::pool_iterate: got {my user name}
2015-02-27 18:34:01.038190 7f1ff17f2700 2 garbage collection: stop
2015-02-27 18:34:01.670780 7f1fc3fff700 20 RGWUserStatsCache: sync user={my user name}
2015-02-27 18:34:01.687730 7f1fc3fff700 0 ERROR: can't read user header: ret=-2
2015-02-27 18:34:01.689734 7f1fc3fff700 0 ERROR: sync_user() failed, user={my user name} ret=-2
Why does it seem to find my radosgw defined user name as a pool and what might bring it to fail to read user header? That's just a red herring. It tries to sync the user stats, but it can't because quota is not enabled (iirc). We should probably get rid of these messages as they're pretty confusing. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
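The startup log above shows both frontends coming up (fastcgi, and civetweb on port 7480). When debugging a 405 like this, it can help to rule out the Apache/FastCGI layer entirely by pointing the client straight at the civetweb port. In configs of this era that frontend is enabled with something like the following sketch (option name as per the Ceph docs of the period; the section name matches the poster's setup):

```ini
[client.radosgw.owmblob]
    rgw frontends = civetweb port=7480
```

If requests succeed against civetweb but fail through Apache, the problem is in the vhost/FastCGI wiring rather than in radosgw itself.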
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
- Original Message - From: Steffen W Sørensen ste...@me.com To: ceph-users@lists.ceph.com Sent: Friday, February 27, 2015 6:40:01 AM Subject: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed Hi, Newbie to RadosGW+Ceph, but learning... Got a running Ceph Cluster working with rbd+CephFS clients. Now I'm trying to verify a RadosGW S3 api, but seems to have an issue with RadosGW access. I get the error (not found anything searching so far...): S3ResponseError: 405 Method Not Allowed when trying to access the rgw. Apache vhost access log file says: 10.20.0.29 - - [27/Feb/2015:14:09:04 +0100] GET / HTTP/1.1 405 27 - Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64 and Apache's general error_log file says: [Fri Feb 27 14:09:04 2015] [warn] FastCGI: 10.20.0.29 GET http://{fqdn}:8005/ auth AWS WL4EJJYTLVYXEHNR6QSA:X6XR4z7Gr9qTMNDphTNlRUk3gfc= RadosGW seems to launch and run fine, though /var/log/messages at launches says: Feb 27 14:12:34 rgw kernel: radosgw[14985]: segfault at e0 ip 003fb36cb1dc sp 7fffde221410 error 4 in librados.so.2.0.0[3fb320+6d] # ps -fuapache UIDPID PPID C STIME TTY TIME CMD apache 15113 15111 0 14:07 ?00:00:00 /usr/sbin/fcgi- apache 15114 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15115 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15116 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15117 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15118 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15119 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15120 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15121 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15224 1 1 14:12 ?00:00:25 /usr/bin/radosgw -n client.radosgw.owmblob RadosGW create my FastCGI socket and a default .asok, (not sure why/what default socket are meant for) as well as the configured log file though it never logs anything... 
# tail -18 /etc/ceph/ceph.conf:
[client.radosgw.owmblob]
keyring = /etc/ceph/ceph.client.radosgw.keyring
host = rgw
rgw data = /var/lib/ceph/radosgw/ceph-rgw
log file = /var/log/radosgw/client.radosgw.owmblob.log
debug rgw = 20
rgw enable log rados = true
rgw enable ops log = true
rgw enable apis = s3
rgw cache enabled = true
rgw cache lru size = 1
rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock
;#rgw host = localhost
;#rgw port = 8004
rgw dns name = {fqdn}
rgw print continue = true
rgw thread pool size = 20
It turned out /etc/init.d/ceph-radosgw didn't chown the log file to $USER when it didn't already exist; radosgw creates the log file when opening it, only it creates it as root, not as $USER, hence no output. Manually chowning it and restarting the GW gives output like:
2015-02-27 15:25:14.464112 7fef463e9700 20 enqueued request req=0x25dea40
2015-02-27 15:25:14.465750 7fef463e9700 20 RGWWQ:
2015-02-27 15:25:14.465786 7fef463e9700 20 req: 0x25dea40
2015-02-27 15:25:14.465864 7fef463e9700 10 allocated request req=0x25e3050
2015-02-27 15:25:14.466214 7fef431e4700 20 dequeued request req=0x25dea40
2015-02-27 15:25:14.466677 7fef431e4700 20 RGWWQ: empty
2015-02-27 15:25:14.467888 7fef431e4700 20 CONTENT_LENGTH=0
2015-02-27 15:25:14.467922 7fef431e4700 20 DOCUMENT_ROOT=/var/www/html
2015-02-27 15:25:14.467941 7fef431e4700 20 FCGI_ROLE=RESPONDER
2015-02-27 15:25:14.467958 7fef431e4700 20 GATEWAY_INTERFACE=CGI/1.1
2015-02-27 15:25:14.467976 7fef431e4700 20 HTTP_ACCEPT_ENCODING=identity
2015-02-27 15:25:14.469476 7fef431e4700 20 HTTP_AUTHORIZATION=AWS WL4EJJYTLVYXEHNR6QSA:OAT0zVItGyp98T5mALeHz4p1fcg=
2015-02-27 15:25:14.469516 7fef431e4700 20 HTTP_DATE=Fri, 27 Feb 2015 14:25:14 GMT
2015-02-27 15:25:14.469533 7fef431e4700 20 HTTP_HOST={fqdn}:8005
2015-02-27 15:25:14.469550 7fef431e4700 20 HTTP_USER_AGENT=Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64
2015-02-27 15:25:14.469571 7fef431e4700 20 PATH=/sbin:/usr/sbin:/bin:/usr/bin
2015-02-27 15:25:14.469589 7fef431e4700 20 QUERY_STRING=
2015-02-27 15:25:14.469607 7fef431e4700 20 REMOTE_ADDR=10.20.0.29
2015-02-27 15:25:14.469624 7fef431e4700 20 REMOTE_PORT=34386
2015-02-27 15:25:14.469641 7fef431e4700 20 REQUEST_METHOD=GET
2015-02-27 15:25:14.469658 7fef431e4700 20 REQUEST_URI=/
2015-02-27 15:25:14.469677 7fef431e4700 20 SCRIPT_FILENAME=/var/www/html/s3gw.fcgi
2015-02-27 15:25:14.469694 7fef431e4700 20 SCRIPT_NAME=/
2015-02-27 15:25:14.469711 7fef431e4700 20 SCRIPT_URI=http://{fqdn}:8005/
2015-02-27 15:25:14.469730 7fef431e4700 20 SCRIPT_URL=/
2015-02-27 15:25:14.469748 7fef431e4700 20 SERVER_ADDR=10.20.0.29
2015-02-27 15:25:14.469765 7fef431e4700 20 SERVER_ADMIN={email}
2015-02-27 15:25:14.469782 7fef431e4700
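For the Apache/FastCGI path, a 405 on GET / usually points at the vhost not routing requests into the gateway script. A minimal mod_fastcgi vhost for a setup like this, roughly following the radosgw install docs of the period (treat it as a sketch, not a drop-in config; the fqdn, port, and socket path are the poster's values):

```apache
<VirtualHost *:8005>
    ServerName {fqdn}
    DocumentRoot /var/www/html

    RewriteEngine On
    # Hand every request to the gateway script, preserving the query string
    # and the Authorization header (FastCGI strips it otherwise).
    RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

    # The -socket path must match "rgw socket path" in ceph.conf.
    FastCgiExternalServer /var/www/html/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock

    AllowEncodedSlashes On
    ServerSignature Off
</VirtualHost>
```

If the RewriteRule is missing or never matches, Apache serves / itself rather than proxying to radosgw, which can surface exactly as a 405 on methods the static handler doesn't allow.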
Re: [ceph-users] Hammer sharded radosgw bucket indexes question
- Original Message - From: Ben Hines bhi...@gmail.com To: ceph-users ceph-users@lists.ceph.com Sent: Wednesday, March 4, 2015 1:03:16 PM Subject: [ceph-users] Hammer sharded radosgw bucket indexes question Hi, These questions were asked previously but perhaps lost: We have some large buckets.
- When upgrading to Hammer (0.93 or later), is it necessary to recreate the buckets to get a sharded index?
- What parameters does the system use for deciding when to shard the index?
The system does not re-shard the bucket index; sharding only affects newly created buckets. There is a per-zone configurable that specifies the number of shards for buckets created in that zone (by default it's disabled). There's also a ceph.conf configurable that can be set to override that value. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
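For reference, the ceph.conf override mentioned in the reply is the rgw_override_bucket_index_max_shards option (the zone-level counterpart is the zone's bucket_index_max_shards field). A sketch of the ceph.conf form, assuming the Hammer-era option name:

```ini
[client.radosgw.gateway]
    # Applies only to buckets created after this setting is in place;
    # existing bucket indexes are not re-sharded.
    rgw override bucket index max shards = 8
```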
Re: [ceph-users] Understand RadosGW logs
- Original Message - From: Daniel Schneller daniel.schnel...@centerdevice.com To: ceph-users@lists.ceph.com Sent: Tuesday, March 3, 2015 2:54:13 AM Subject: [ceph-users] Understand RadosGW logs Hi! After realizing the problem with log rotation (see http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708) and fixing it, I now for the first time have some meaningful (and recent) logs to look at. While from an application perspective there seem to be no issues, I would like to understand some messages I find with relatively high frequency in the logs:
Exhibit 1 -
2015-03-03 11:14:53.685361 7fcf4bfef700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:15:57.476059 7fcf39ff3700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:17:43.570986 7fcf25fcb700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:22:00.881640 7fcf39ff3700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:22:48.147011 7fcf35feb700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:27:40.572723 7fcf50ff9700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:29:40.082954 7fcf36fed700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 11:30:32.204492 7fcf4dff3700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
It means that returning data to the client got some error, usually means that the client disconnected before completion. I cannot find anything relevant by Googling for that, apart from the actual line of code that produces this line. What does that mean? Is it an indication of data corruption or are there more benign reasons for this line? 
Exhibit 2 -- Several of these blocks
2015-03-03 07:06:17.805772 7fcf36fed700 1 == starting new request req=0x7fcf5800f3b0 =
2015-03-03 07:06:17.836671 7fcf36fed700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
2015-03-03 07:06:17.836758 7fcf36fed700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
2015-03-03 07:06:17.836918 7fcf36fed700 0 RGWObjManifest::operator++(): result: ofs=13055243 stripe_ofs=13055243 part_ofs=0 rule->part_size=0
2015-03-03 07:06:18.263126 7fcf36fed700 1 == req done req=0x7fcf5800f3b0 http_status=200 ==
...
2015-03-03 09:27:29.855001 7fcf28fd1700 1 == starting new request req=0x7fcf580102a0 =
2015-03-03 09:27:29.866718 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866778 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866852 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866917 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.875466 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.884434 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.906155 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.914364 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.940653 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=38273024 stripe_ofs=38273024 part_ofs=0 rule->part_size=0
2015-03-03 09:27:30.272816 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=42467328 stripe_ofs=42467328 part_ofs=0 rule->part_size=0
2015-03-03 09:27:31.125773 7fcf28fd1700 0 RGWObjManifest::operator++(): result: ofs=46661632 stripe_ofs=46661632 part_ofs=0 rule->part_size=0
2015-03-03 09:27:31.192661 7fcf28fd1700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -1
2015-03-03 09:27:31.194481 7fcf28fd1700 1 == req done req=0x7fcf580102a0 http_status=200 ==
...
2015-03-03 09:28:43.008517 7fcf2a7d4700 1 == starting new request req=0x7fcf580102a0 =
2015-03-03 09:28:43.016414 7fcf2a7d4700 0 RGWObjManifest::operator++(): result: ofs=887579 stripe_ofs=887579 part_ofs=0 rule->part_size=0
2015-03-03 09:28:43.022387 7fcf2a7d4700 1 == req done req=0x7fcf580102a0 http_status=200 ==
First, what is the req= line? Is that a thread-id? I am asking, because the same id is used over and over in the same file over time. It's the request id (within the current radosgw instance) More
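The ofs values in the RGWObjManifest::operator++() lines above are not arbitrary: with the default rgw_max_chunk_size (a 512 KiB head object) and rgw_obj_stripe_size (4 MiB stripes), the manifest iterator advances at 512 KiB + k * 4 MiB, with the final step clamped at the object size. A small sketch reconstructing the logged boundaries under those assumed defaults:

```python
HEAD_SIZE = 512 * 1024          # assumed rgw_max_chunk_size default (head object)
STRIPE_SIZE = 4 * 1024 * 1024   # assumed rgw_obj_stripe_size default

def manifest_offsets(obj_size):
    """Offsets at which the manifest iterator logs 'result: ofs=...'."""
    offsets = []
    k = 1
    while HEAD_SIZE + k * STRIPE_SIZE < obj_size:
        offsets.append(HEAD_SIZE + k * STRIPE_SIZE)
        k += 1
    offsets.append(obj_size)  # final step clamps to the end of the object
    return offsets

# Reproduces the first block above: a ~12.5 MB object.
print(manifest_offsets(13055243))  # → [4718592, 8912896, 13055243]
```

So a block of these lines simply traces one GET walking a multi-stripe object; small objects (like the 887579-byte one in the last request) produce a single line.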
Re: [ceph-users] RadosGW - multiple dns names
- Original Message - From: Shinji Nakamoto shinji.nakam...@mgo.com To: ceph-us...@ceph.com Sent: Friday, February 20, 2015 3:58:39 PM Subject: [ceph-users] RadosGW - multiple dns names We have multiple interfaces on our Rados gateway node, each of which is assigned to one of our many VLANs with a unique IP address. Is it possible to set multiple DNS names for a single Rados GW, so it can handle the request to each of the VLAN specific IP address DNS names? Not yet, however, the upcoming hammer release will support that (hostnames will be configured as part of the region). Yehuda eg. rgw dns name = prd-apiceph001 rgw dns name = prd-backendceph001 etc. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
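Once on hammer, the hostnames mentioned in the reply live in the region configuration and can be edited through radosgw-admin; roughly (command names per the radosgw-admin man page, JSON abbreviated):

```sh
radosgw-admin region get > region.json
# edit region.json so the hostnames array lists every DNS name, e.g.
#   "hostnames": ["prd-apiceph001", "prd-backendceph001"],
radosgw-admin region set < region.json
radosgw-admin regionmap update
# restart radosgw afterwards so it picks up the new region map
```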
Re: [ceph-users] mixed ceph versions
- Original Message - From: Gregory Farnum g...@gregs42.com To: Tom Deneau tom.den...@amd.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, February 25, 2015 3:20:07 PM Subject: Re: [ceph-users] mixed ceph versions On Wed, Feb 25, 2015 at 3:11 PM, Deneau, Tom tom.den...@amd.com wrote: I need to set up a cluster where the rados client (for running rados bench) may be on a different architecture and hence running a different ceph version from the osd/mon nodes. Is there a list of which ceph versions work together for a situation like this? The RADOS protocol is architecture-independent, and while we don't test across a huge version divergence (mostly between LTS releases) the client should also be compatible with pretty much anything you have server-side. Client stuff like rgw usually requires that the backend runs a version at least as new (for objclass functionality). Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RADOS Gateway quota management
Great, I opened issue # 11323. Thanks, Yehuda - Original Message - From: Sergey Arkhipov sarkhi...@asdco.ru To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Friday, April 3, 2015 1:00:02 AM Subject: Re: [ceph-users] RADOS Gateway quota management Hi, Thank you for your answer! Meanwhile I did some investigations and found the reason: quota works on PUTs perfectly, but there are no checks on POSTs. I've made a pull-request: https://github.com/ceph/ceph/pull/4240 2015-04-02 18:40 GMT+03:00 Yehuda Sadeh-Weinraub yeh...@redhat.com : From: Sergey Arkhipov sarkhi...@asdco.ru To: ceph-users@lists.ceph.com Sent: Monday, March 30, 2015 2:55:33 AM Subject: [ceph-users] RADOS Gateway quota management Hi, Currently I am trying to figure out how to work with RADOS Gateway (ceph 0.87) limits and I've managed to produce the following strange behavior:
{ "bucket": "test1-8",
  "pool": ".rgw.buckets",
  "index_pool": ".rgw.buckets.index",
  "id": "default.17497.14",
  "marker": "default.17497.14",
  "owner": "cb254310-8b24-4622-93fb-640ca4a45998",
  "ver": 21,
  "master_ver": 0,
  "mtime": 1427705802,
  "max_marker": "",
  "usage": { "rgw.main": { "size_kb": 16000, "size_kb_actual": 16020, "num_objects": 9 } },
  "bucket_quota": { "enabled": true, "max_size_kb": -1, "max_objects": 3 } }
Steps to reproduce: create a bucket, set its quota like that (max_objects = 3, enabled) and successfully upload 9 files. User quota is also defined:
  "bucket_quota": { "enabled": true, "max_size_kb": -1, "max_objects": 3 },
  "user_quota": { "enabled": true, "max_size_kb": 1048576, "max_objects": 5 },
Could someone please help me to understand how to limit users? -- The question is whether the user is able to continue writing objects at this point. The quota system is working asynchronously, so it's possible to get into edge cases where users exceeded it a bit (it looks a whole lot better with larger numbers). The question is whether it's working for you at all. 
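For reference, the quota knobs discussed here are set with radosgw-admin (commands per the Ceph admin guide; the uid is a placeholder):

```sh
# bucket-scope quota applied to one user's buckets
radosgw-admin quota set --quota-scope=bucket --uid=<uid> --max-objects=3
radosgw-admin quota enable --quota-scope=bucket --uid=<uid>

# user-scope quota
radosgw-admin quota set --quota-scope=user --uid=<uid> --max-size-kb=1048576 --max-objects=5
radosgw-admin quota enable --quota-scope=user --uid=<uid>

# force the asynchronous stats to sync when testing enforcement edge cases
radosgw-admin user stats --uid=<uid> --sync-stats
```

Because enforcement is asynchronous, expect the observed limit to lag by a few objects, as described in the reply above.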
Yehuda -- Sergey Arkhipov Software Engineer, ASD Technologies Phone: +7 920 018 9404 Skype: serge.arkhipov sarkhi...@asdco.ru asdtech.co ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Purpose of the s3gw.fcgi script?
- Original Message - From: Francois Lafont flafdiv...@free.fr To: ceph-users@lists.ceph.com Sent: Monday, April 13, 2015 5:17:47 PM Subject: Re: [ceph-users] Purpose of the s3gw.fcgi script? Hi, Yehuda Sadeh-Weinraub wrote: You're not missing anything. The script was only needed when we used the process manager of the fastcgi module, but it has been very long since we stopped using it. Just to be sure, so if I understand well, these parts of the documentation: 1. http://docs.ceph.com/docs/master/radosgw/config/#create-a-cgi-wrapper-script 2. http://docs.ceph.com/docs/master/radosgw/config/#adjust-cgi-wrapper-script-permission can be completely skipped. Is it correct? Yes. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Swift and Ceph
Sounds like you're hitting a known issue that was fixed a while back (although might not be fixed on the specific version you're running). Can you try creating a second subuser for the same user, see if that one works? Yehuda - Original Message - From: alistair whittle alistair.whit...@barclays.com To: ceph-users@lists.ceph.com Sent: Thursday, April 23, 2015 8:38:44 AM Subject: [ceph-users] Swift and Ceph All, I was hoping for some advice. I have recently built a Ceph cluster on RHEL 6.5 and have configured RGW. I want to test Swift API access, and as a result have created a user, swift subuser and swift keys as per the output below: 1. Create user radosgw-admin user create --uid=testuser1 --display-name=Test User1 { user_id: testuser1, display_name: Test User1, email: , suspended: 0, max_buckets: 1000, auid: 0, subusers: [], keys: [ { user: testuser1, access_key: MJBEZLJ7BYG8XODXT71V, secret_key: tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p}], swift_keys: [], caps: [], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, temp_url_keys: []} 2. Create subuser. radosgw-admin subuser create --uid=testuser1 --subuser=testuser1:swift --access=full { user_id: testuser1, display_name: Test User1, email: , suspended: 0, max_buckets: 1000, auid: 0, subusers: [ { id: testuser1:swift, permissions: full-control}], keys: [ { user: testuser1:swift, access_key: HX9Q30EJWCZG825AT7B0, secret_key: }, { user: testuser1, access_key: MJBEZLJ7BYG8XODXT71V, secret_key: tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p}], swift_keys: [], caps: [], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, temp_url_keys: []} 3. 
Create key radosgw-admin key create --subuser=testuser1:swift --key-type=swift --gen-secret { user_id: testuser1, display_name: Test User1, email: , suspended: 0, max_buckets: 1000, auid: 0, subusers: [ { id: testuser1:swift, permissions: full-control}], keys: [ { user: testuser1:swift, access_key: HX9Q30EJWCZG825AT7B0, secret_key: }, { user: testuser1, access_key: MJBEZLJ7BYG8XODXT71V, secret_key: tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p}], swift_keys: [ { user: testuser1:swift, secret_key: KpQCfPLstJhSMsR9qUzY9WfA1ebO4x7VRXkr1KSf}], caps: [], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, temp_url_keys: []} When I try and do anything using the credentials above, I get “Account not found” errors as per the example below: swift -A https://FQDN/auth/1.0 -U testuser1:swift -K KpQCfPLstJhSMsR9qUzY9WfA1ebO4x7VRXkr1KSf list That’s the first thing. Secondly, when I follow the process above to create a second user “testuser2”, the user and subuser is created, however, when I try and generate a swift key for it, I get the following error: radosgw-admin key create --subuser=testuser2:swift --key-type=swift --gen-secret could not create key: unable to add access key, unable to store user info 2015-04-23 15:42:38.897090 7f38e157d820 0 WARNING: can't store user info, swift id () already mapped to another user (testuser2) This suggests there is something wrong with the users or the configuration of the gateway somewhere. Can someone provide some advice on what might be wrong, or where I can look to find out. I have gone through whatever log files I can and don’t see anything of any use at the moment. Any help appreciated. 
Thanks Alistair ___ This message is for information purposes only, it is not a recommendation, advice, offer or solicitation to buy or sell a product or service nor an official confirmation of any transaction. It is directed at persons who are professionals and is not intended for retail customer use. Intended for recipient only. This message is subject to the terms at: www.barclays.com/emaildisclaimer . For important disclosures, please see: www.barclays.com/salesandtradingdisclaimer regarding market commentary from Barclays Sales and/or Trading, who are active market participants; and in respect of Barclays Research, including disclosures relating to specific issuers, please see http://publicresearch.barclays.com . ___ ___
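The check suggested in the reply — creating a second subuser for the same user and generating its swift key — mirrors the commands already used above:

```sh
radosgw-admin subuser create --uid=testuser1 --subuser=testuser1:swift2 --access=full
radosgw-admin key create --subuser=testuser1:swift2 --key-type=swift --gen-secret
# then try the new subuser's secret:
swift -A https://FQDN/auth/1.0 -U testuser1:swift2 -K <new-secret> list
```

If the second subuser authenticates while the first does not, that points at the known secret-mapping bug rather than a gateway misconfiguration.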
Re: [ceph-users] Shadow Files
These ones: http://tracker.ceph.com/issues/10295 http://tracker.ceph.com/issues/11447 - Original Message - From: Ben Jackson b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 3:06:02 PM Subject: Re: [ceph-users] Shadow Files We were firefly, then we upgraded to giant, now we are on hammer. What issues? On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What version are you running? There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - From: Ben b@benjackson.email To: ceph-users ceph-us...@ceph.com Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com Sent: Thursday, April 23, 2015 7:42:06 PM Subject: [ceph-users] Shadow Files We are still experiencing a problem with our gateway not properly clearing out shadow files. I have done numerous tests where I have:
- Uploaded a file of 1.5GB in size using the s3browser application
- Done an object stat on the file to get its prefix
- Done rados ls -p .rgw.buckets | grep prefix to count the number of shadow files associated (in this case it is around 290 shadow files)
- Deleted said file with s3browser
- Performed a gc list, which shows the ~290 files listed
- Waited 24 hours to redo the rados ls -p .rgw.buckets | grep prefix to recount the shadow files, only to be left with 290 files still there
From log output /var/log/ceph/radosgw.log, I can see the following when clicking DELETE (this appears 290 times):
2015-04-24 10:43:29.996523 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996557 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996564 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996570 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996576 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996581 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996586 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule->part_size=0
2015-04-24 10:43:29.996592 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule->part_size=0
In this same log, I also see the gc process saying it is removing said file (these records appear 290 times too):
2015-04-23 14:16:27.926952 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.928572 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.929636 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.930448 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.931226 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.932103 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
2015-04-23 14:16:27.933470 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname
So even though it appears that the GC is processing its removal, the shadow files remain! Please help! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
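For anyone debugging the same symptom, the relevant garbage-collection state can be inspected and the collector run by hand with radosgw-admin (the associated tuning options are the rgw_gc_* settings):

```sh
# objects scheduled for deletion, including entries whose grace period
# has not yet expired
radosgw-admin gc list --include-all

# run a garbage-collection pass immediately instead of waiting for the timer
radosgw-admin gc process
```

If entries persist in the gc list across manual processing, that points at a gateway-side bug (as in the tracker issues cited above) rather than at a timing problem.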
Re: [ceph-users] Shadow Files
What version are you running? There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - From: Ben b@benjackson.email To: ceph-users ceph-us...@ceph.com Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com Sent: Thursday, April 23, 2015 7:42:06 PM Subject: [ceph-users] Shadow Files We are still experiencing a problem with out gateway not properly clearing out shadow files. I have done numerous tests where I have: -Uploaded a file of 1.5GB in size using s3browser application -Done an object stat on the file to get its prefix -Done rados ls -p .rgw.buckets | grep prefix to count the number of shadow files associated (in this case it is around 290 shadow files) -Deleted said file with s3browser -Performed a gc list, which shows the ~290 files listed -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep prefix to recount the shadow files only to be left with 290 files still there From log output /var/log/ceph/radosgw.log, I can see the following when clicking DELETE (this appears 290 times) 2015-04-24 10:43:29.996523 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996557 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996564 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996570 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996576 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996581 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996586 7f0b0afb5700 0 RGWObjManifest::operator++(): result: 
ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996592 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule-part_size=0 In this same log, I also see the gc process saying it is removing said file (these records appear 290 times too) 2015-04-23 14:16:27.926952 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.928572 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.929636 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.930448 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.931226 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.932103 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.933470 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname So even though it appears that the GC is processing its removal, the shadow files remain! Please help! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Shadow Files
Yeah, that's definitely something that we'd address soon. Yehuda - Original Message - From: Ben b@benjackson.email To: Ben Hines bhi...@gmail.com, Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 5:14:11 PM Subject: Re: [ceph-users] Shadow Files Definitely need something to help clear out these old shadow files. I'm sure our cluster has around 100TB of these shadow files. I've written a script to go through known objects to get prefixes of objects that should exist to compare to ones that shouldn't, but the time it takes to do this over millions and millions of objects is just too long. On 25/04/15 09:53, Ben Hines wrote: When these are fixed it would be great to get good steps for listing / cleaning up any orphaned objects. I have suspicions this is affecting us. thanks- -Ben On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: These ones: http://tracker.ceph.com/issues/10295 http://tracker.ceph.com/issues/11447 - Original Message - From: Ben Jackson b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 3:06:02 PM Subject: Re: [ceph-users] Shadow Files We were firefly, then we upgraded to giant, now we are on hammer. What issues? On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What version are you running? There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - From: Ben b@benjackson.email To: ceph-users ceph-us...@ceph.com Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com Sent: Thursday, April 23, 2015 7:42:06 PM Subject: [ceph-users] Shadow Files We are still experiencing a problem with out gateway not properly clearing out shadow files. 
I have done numerous tests where I have: -Uploaded a file of 1.5GB in size using s3browser application -Done an object stat on the file to get its prefix -Done rados ls -p .rgw.buckets | grep prefix to count the number of shadow files associated (in this case it is around 290 shadow files) -Deleted said file with s3browser -Performed a gc list, which shows the ~290 files listed -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep prefix to recount the shadow files only to be left with 290 files still there From log output /var/log/ceph/radosgw.log, I can see the following when clicking DELETE (this appears 290 times) 2015-04-24 10:43:29.996523 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996557 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996564 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996570 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996576 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996581 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996586 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996592 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule-part_size=0 In this same log, I also see the gc process saying it is removing said file (these records appear 290 times too) 2015-04-23 14:16:27.926952 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.928572 7f15be0ee700 0 
gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.929636 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.930448 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.931226 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.932103 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.933470 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname So even though it appears that the GC is processing its removal, the shadow files remain! Please help! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation
- Original Message - From: Sean seapasu...@uchicago.edu To: ceph-users@lists.ceph.com Sent: Tuesday, April 28, 2015 2:52:35 PM Subject: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation Hey yall! I have a weird issue and I am not sure where to look, so any help would be appreciated. I have a large ceph giant cluster that has been stable and healthy almost entirely since its inception. We have stored over 1.5PB into the cluster through RGW and everything seems to be functioning great. We have downloaded smaller objects without issue, but last night we did a test on our largest file (almost 1 terabyte) and it continuously times out at almost the exact same place. Investigating further, it looks like Civetweb/RGW is reporting that the uploads completed even though the objects are truncated. At least when we download the objects they seem to be truncated. I have tried searching through the mailing list archives to see what may be going on, but it looks like the mailing list DB may be going through some maintenance: Unable to read word database file '/dh/mailman/dap/archives/private/ceph-users-ceph.com/htdig/db.words.db' After checking through the gzipped logs I see that civetweb just stops logging after a rotation for some reason as well, and my last log is from the 28th of March. I tried manually running /etc/init.d/radosgw reload but this didn't seem to work. As running the download again could take all day to error out, we instead use a range request to try and pull the missing bytes. https://gist.github.com/MurphyMarkW/8e356823cfe00de86a48 -- here is the code we are using to download via S3 / boto, as well as the returned size report and an overview of our issue. http://pastebin.com/cVLdQBMF -- here is some of the log from the civetweb server they are hitting.
Here is our current config :: http://pastebin.com/2SGfSDYG Current output of ceph health :: http://pastebin.com/3f6iJEbu I am thinking that this must be a civetweb/radosgw bug of some kind. My questions are: 1.) Is there a way to try and download the object via rados directly? I am guessing I will need to find the prefix and then just cat all of the pieces together and hope I get it right. 2.) Why would ceph say the upload went fine but then return a smaller object? Note that the returned http response is 206 (partial content): /var/log/radosgw/client.radosgw.log:2015-04-28 16:08:26.525268 7f6e93fff700 2 req 0:1.067030:s3:GET /tcga_cghub_protected/ff9b730c-d303-4d49-b28f-e0bf9d8f1c84/759366461d2bf8bb0583d5b9566ce947.bam:get_obj:http status=206 It'll only return that if partial content is requested (through the http Range header). It's really hard to tell from these logs whether there's any actual problem. I suggest bumping up the log level (debug ms = 1, debug rgw = 20) and taking a look at an entire request (one that includes all the request http headers). Yehuda
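For reference, the log levels suggested above can be set in the gateway's section of ceph.conf and the radosgw process restarted. The section name below is an assumption — use whatever your gateway instance is actually called:

```ini
; hypothetical gateway section name -- match your own instance
[client.radosgw.gateway]
    debug ms = 1
    debug rgw = 20
```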
Re: [ceph-users] Shadow Files
It will get to the ceph mainline eventually. We're still reviewing and testing the fix, and there's more work to be done on the cleanup tool. Yehuda - Original Message - From: Ben b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Sunday, April 26, 2015 11:02:23 PM Subject: Re: [ceph-users] Shadow Files Are these fixes going to make it into the repository versions of ceph, or will we be required to compile and install manually? On 2015-04-26 02:29, Yehuda Sadeh-Weinraub wrote: Yeah, that's definitely something that we'd address soon. Yehuda - Original Message - From: Ben b@benjackson.email To: Ben Hines bhi...@gmail.com, Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 5:14:11 PM Subject: Re: [ceph-users] Shadow Files Definitely need something to help clear out these old shadow files. I'm sure our cluster has around 100TB of these shadow files. I've written a script to go through known objects to get prefixes of objects that should exist to compare to ones that shouldn't, but the time it takes to do this over millions and millions of objects is just too long. On 25/04/15 09:53, Ben Hines wrote: When these are fixed it would be great to get good steps for listing / cleaning up any orphaned objects. I have suspicions this is affecting us. thanks- -Ben On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: These ones: http://tracker.ceph.com/issues/10295 http://tracker.ceph.com/issues/11447 - Original Message - From: Ben Jackson b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 3:06:02 PM Subject: Re: [ceph-users] Shadow Files We were firefly, then we upgraded to giant, now we are on hammer. What issues? On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What version are you running? 
There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - From: Ben b@benjackson.email To: ceph-users ceph-us...@ceph.com Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com Sent: Thursday, April 23, 2015 7:42:06 PM Subject: [ceph-users] Shadow Files We are still experiencing a problem with out gateway not properly clearing out shadow files. I have done numerous tests where I have: -Uploaded a file of 1.5GB in size using s3browser application -Done an object stat on the file to get its prefix -Done rados ls -p .rgw.buckets | grep prefix to count the number of shadow files associated (in this case it is around 290 shadow files) -Deleted said file with s3browser -Performed a gc list, which shows the ~290 files listed -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep prefix to recount the shadow files only to be left with 290 files still there From log output /var/log/ceph/radosgw.log, I can see the following when clicking DELETE (this appears 290 times) 2015-04-24 10:43:29.996523 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996557 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996564 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996570 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996576 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996581 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996586 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=29884416 
stripe_ofs=29884416 part_ofs=0 rule-part_size=0 2015-04-24 10:43:29.996592 7f0b0afb5700 0 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule-part_size=0 In this same log, I also see the gc process saying it is removing said file (these records appear 290 times too) 2015-04-23 14:16:27.926952 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.928572 7f15be0ee700 0 gc::process: removing .rgw.buckets:objectname 2015-04-23 14:16:27.929636 7f15be0ee700 0 gc
Re: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation
- Original Message - From: Sean seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Friday, May 1, 2015 6:47:09 PM Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hey there, Sorry for the delay. I have been moving apartments UGH. Our dev team found out how to quickly identify these files that are downloading a smaller size:: iterate through all of the objects in a bucket and call for a key.size in each item and compare it to conn.get_bucket().get_key().size of each key and the sizes differ. If the sizes differ these correspond exactly to any object that seems to have missing objects in ceph. The objects always seem to be intervals of 512k as well which is really odd. == http://pastebin.com/R34wF7PB == My main question is why are these sizes different at all? Shouldn't they be exactly the same? Why are they off by multiples of 512k as well? Finally I need a way to rule out that this is a ceph issue and the only way I can think of is grabbing a list of all of the data files and concatenating them together in order in hopes that the manifest is wrong and I get the whole file. For example:: implicit size 7745820218 explicit size 7744771642. Absolute 1048576; name = 86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam I explicitly called one of the gateways and then piped the output to a text file while downloading this bam: https://drive.google.com/file/d/0B16pfLB7yY6GcTZXalBQM3RHT0U/view?usp=sharing (25 Mb of text) As we can see above. Ceph is saying that the size is 7745820218 bytes somewhere but when we download it we get 7744771642 bytes. If I download There are two different things: the bucket index, and the object manifest. The bucket index has the former, and the object manifest specifies the latter. the object I get a 7744771642 byte file. 
Finally, if I do a range request of all of the bytes from 7744771642 to the end, I get a "cannot complete request" error:: http://pastebin.com/CVvmex4m -- traceback of the python range request. http://pastebin.com/4sd1Jc0G -- the rados log of the range request If I request the file with a shorter range (say 7744771642 - 2 bytes (7744771640)) I am left with just a 2 byte file:: http://pastebin.com/Sn7Y0t9G -- range request of file - 2 bytes to end of file. lacadmin@kh10-9:~$ ls -lhab 7gtest-range.bam -rw-r--r-- 1 lacadmin lacadmin 2 Feb 24 01:00 7gtest-range.bam I think that rados-gw may not be keeping track of the multipart chunk errors, possibly? How did rados get the original and correct file size, and why is it short when it returns the actual chunks? Finally, why are the corrupt / missing chunks always a multiple of 512K? I do not see anything obvious that is set to 512K on the configuration/user side. Sorry for the questions and babbling, but I am at a loss as to how to address this. So, the question is which is correct, the index, or the object itself. Do you have any way to know which one is the correct one? Also, does it only happen to you with very large objects? Does it happen with every such object (e.g., 4GBs)? Here's some extra information you could gather: - Get the object manifest: $ radosgw-admin object stat --bucket=<bucket> --object=<object> - Get status for each rados object corresponding to the logical rgw object: First, identify the object names that correspond to this specific rgw object. From the manifest you'd get a 'prefix', which is a random hash that all tail objects should contain. Then you should do something like: $ rados -p <data pool, e.g., .rgw.buckets> ls | grep $prefix And then, for each object: $ rados -p <data pool, e.g., .rgw.buckets> stat $object There's also the head object that you'd want to inspect (named after the actual rgw object name, grep for it too).
HTH, Yehuda On 04/28/2015 05:03 PM, Yehuda Sadeh-Weinraub wrote: - Original Message - From: Sean seapasu...@uchicago.edu To: ceph-users@lists.ceph.com Sent: Tuesday, April 28, 2015 2:52:35 PM Subject: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hey yall! I have a weird issue and I am not sure where to look so any help would be appreciated. I have a large ceph giant cluster that has been stable and healthy almost entirely since its inception. We have stored over 1.5PB into the cluster currently through RGW and everything seems to be functioning great. We have downloaded smaller objects without issue but last night we did a test on our largest file (almost 1 terabyte) and it continuously times out at almost the exact same place. Investigating further it looks like Civetweb/RGW is returning that the uploads completed even though the objects are truncated. At least when we download
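The size discrepancy discussed above (index size vs. downloaded size, always off by multiples of 512K) can be checked mechanically. A small sketch using the two sizes quoted in the message; the 512 KiB chunk unit is the thread's own observation, not a documented RGW constant:

```python
# Check whether the gap between the size recorded in the bucket index
# and the size the download actually returned is a whole number of
# 512 KiB chunks -- the pattern reported in this thread.

CHUNK = 512 * 1024  # 512 KiB, the suspected unit of missing data

def missing_chunks(index_size, downloaded_size):
    """Return how many 512 KiB chunks are unaccounted for, or None if
    the difference is not a clean multiple of 512 KiB."""
    gap = index_size - downloaded_size
    return gap // CHUNK if gap % CHUNK == 0 else None

# Sizes quoted in the message above (7745820218 vs 7744771642):
print(missing_chunks(7745820218, 7744771642))  # 2  (exactly 1 MiB missing)
```

If `missing_chunks` returned None for some objects, the 512K theory would be out; here the gap is exactly two chunks, consistent with the report.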
Re: [ceph-users] Shadow Files
I've been working on a new tool that would detect leaked rados objects. It will take some time for it to be merged into an official release, or even into the master branch, but if anyone likes to play with it, it is in the wip-rgw-orphans branch. At the moment I recommend not removing any object that the tool reports, but rather moving it to a different pool for backup (using the rados tool's cp command). The tool works in a few stages: (1) list all the rados objects in the specified pool, store in repository (2) list all bucket instances in the system, store in repository (3) iterate through bucket instances in the repository, list (logical) objects, and for each object store the expected rados objects that build it (4) compare data from (1) and (3); for each object that is in (1), but not in (3), stat it, and if it is older than $start_time - $stale_period, report it There can be lots of things that can go wrong with this, so we really need to be careful here. The tool can be run by the following command: $ radosgw-admin orphans find --pool=<data pool> --job-id=<name> [--num-shards=<num shards>] [--orphan-stale-secs=<seconds>] The tool can be stopped and restarted, and it will continue from the stage where it stopped. Note that some of the stages will restart from the beginning (of the stage), due to system limitations (specifically 1, 2). In order to clean up a job's data: $ radosgw-admin orphans finish --job-id=<name> Note that the jobs run in the radosgw-admin process context; it does not schedule a job on the radosgw process. Please let me know of any issue you find. Thanks, Yehuda - Original Message - From: Ben Hines bhi...@gmail.com To: Ben b@benjackson.email Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com, ceph-users ceph-us...@ceph.com Sent: Thursday, April 30, 2015 3:00:16 PM Subject: Re: [ceph-users] Shadow Files Going to hold off on our 94.1 update for this issue Hopefully this can make it into a 94.2 or a v95 git release.
-Ben On Mon, Apr 27, 2015 at 2:32 PM, Ben b@benjackson.email wrote: How long are you thinking here? We added more storage to our cluster to overcome these issues, and we can't keep throwing storage at it until the issues are fixed. On 28/04/15 01:49, Yehuda Sadeh-Weinraub wrote: It will get to the ceph mainline eventually. We're still reviewing and testing the fix, and there's more work to be done on the cleanup tool. Yehuda - Original Message - From: Ben b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Sunday, April 26, 2015 11:02:23 PM Subject: Re: [ceph-users] Shadow Files Are these fixes going to make it into the repository versions of ceph, or will we be required to compile and install manually? On 2015-04-26 02:29, Yehuda Sadeh-Weinraub wrote: Yeah, that's definitely something that we'd address soon. Yehuda - Original Message - From: Ben b@benjackson.email To: Ben Hines bhi...@gmail.com , Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 5:14:11 PM Subject: Re: [ceph-users] Shadow Files Definitely need something to help clear out these old shadow files. I'm sure our cluster has around 100TB of these shadow files. I've written a script to go through known objects to get prefixes of objects that should exist to compare to ones that shouldn't, but the time it takes to do this over millions and millions of objects is just too long. On 25/04/15 09:53, Ben Hines wrote: When these are fixed it would be great to get good steps for listing / cleaning up any orphaned objects. I have suspicions this is affecting us. 
thanks- -Ben On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: These ones: http://tracker.ceph.com/issues/10295 http://tracker.ceph.com/issues/11447 - Original Message - From: Ben Jackson b@benjackson.email To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users ceph-us...@ceph.com Sent: Friday, April 24, 2015 3:06:02 PM Subject: Re: [ceph-users] Shadow Files We were firefly, then we upgraded to giant, now we are on hammer. What issues? On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What version are you running? There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - From: Ben b@benjackson.email To: ceph-users ceph-us...@ceph.com Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com Sent: Thursday, April 23, 2015 7:42:06 PM Subject: [ceph-users] Shadow Files We are still experiencing a problem with out gateway not properly clearing out shadow files. I have done numerous tests where I have: -Uploaded a file of 1.5GB in size using s3browser application -Done an object stat on the file to get its prefix -Done
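Stage (4) of the orphan scan described earlier in this thread is essentially a set difference plus an mtime cutoff. A minimal sketch of that comparison — function and field names here are illustrative, not the tool's actual internals:

```python
import time

def find_orphans(listed, expected, mtimes, stale_secs, start_time=None):
    """listed: set of rados object names seen in the pool (stage 1).
    expected: set of names reconstructed from bucket manifests (stage 3).
    mtimes: dict name -> mtime (epoch seconds), e.g. from `rados stat`.
    Only candidates older than start_time - stale_secs are reported, so
    uploads in flight during the scan are not flagged as orphans."""
    if start_time is None:
        start_time = time.time()
    cutoff = start_time - stale_secs
    return sorted(name for name in listed - expected
                  if mtimes.get(name, start_time) < cutoff)

listed = {"obj_a", "obj_b", "obj_c"}
expected = {"obj_a"}
mtimes = {"obj_b": 1000.0, "obj_c": 9000.0}
# With start_time=10000 and a 2000s stale window, only obj_b qualifies
# (obj_c is too recent and might still be mid-upload):
print(find_orphans(listed, expected, mtimes, 2000, start_time=10000.0))
# ['obj_b']
```

This is why the stale window matters: without it, any object whose manifest has not yet been written would look leaked.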
Re: [ceph-users] RGW - Can't download complete object
- Original Message - From: Sean seapasu...@uchicago.edu To: ceph-users@lists.ceph.com Sent: Thursday, May 7, 2015 3:35:14 PM Subject: [ceph-users] RGW - Can't download complete object I have another thread goign on about truncation of objects and I believe this is a separate but equally bad issue in civetweb/radosgw. My cluster is completely healthy I have one (possibly more) objects stored in ceph rados gateway that will return a different size every time I Try to download it:: http://pastebin.com/hK1iqXZH --- ceph -s http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object The two interesting things that I see here is: - the multipart upload size for each part is on the big side (is it 1GB for each part?) - it seems that there are a lot of parts that suffered from retries, could be a source for the 512k issue http://pastebin.com/5TnvgMrX --- python download code The weird part is every time I download the file it is of a different size. I am grabbing the individual objects of the 14g file and will update this email once I have them all statted out. Currently I am getting, on average, 1.5G to 2Gb files when the total object should be 14G in size. lacadmin@kh10-9:~$ python corruptpull.py the download failed. The filesize = 2125988202. The actual size is 14577056082. Attempts = 1 the download failed. The filesize = 2071462250. The actual size is 14577056082. Attempts = 2 the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 3 the download failed. The filesize = 1643643242. The actual size is 14577056082. Attempts = 4 the download failed. The filesize = 1597505898. The actual size is 14577056082. Attempts = 5 the download failed. The filesize = 2075656554. The actual size is 14577056082. Attempts = 6 the download failed. The filesize = 650117482. The actual size is 14577056082. Attempts = 7 the download failed. The filesize = 1987576170. The actual size is 14577056082. Attempts = 8 the download failed. 
The filesize = 2109210986. The actual size is 14577056082. Attempts = 9 the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 10 the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 11 the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 12 the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 13 the download failed. The filesize = 1467482474. The actual size is 14577056082. Attempts = 14 the download failed. The filesize = 2046296426. The actual size is 14577056082. Attempts = 15 the download failed. The filesize = 2021130602. The actual size is 14577056082. Attempts = 16 the download failed. The filesize = 177366. The actual size is 14577056082. Attempts = 17 the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 18 the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 19 the download failed. The filesize = 1983381866. The actual size is 14577056082. Attempts = 20 the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 21 Notice it is always different. Once the rados -p .rgw.buckets ls | grep finishes I will return the listing of objects as well but this is quite odd and I think this is a separate issue. Has anyone seen this before? Why wouldn't radosgw return an error and why am I getting different file sizes? Usually that means that there was some error in the middle of the download, maybe client to radosgw communication issue. What does the radosgw show when this happens? I would post the log from radosgw but I don't see any err|wrn|fatal mentions in the log and the client completes without issue every time. 
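One way to narrow down the failure mode in the report above: if repeated downloads stop at the *same* short size, the truncation likely lives in the stored object or its manifest; if the size *varies* per attempt (as it does here), the stream is being cut mid-transfer. A rough heuristic sketch, not a diagnosis:

```python
def classify_short_downloads(observed_sizes, expected_size):
    """observed_sizes: byte counts from repeated download attempts.
    Heuristic only: consistent truncation points at the stored
    data/manifest, varying truncation at the transport."""
    if all(s == expected_size for s in observed_sizes):
        return "complete"
    if len(set(observed_sizes)) == 1:
        return "consistent-truncation (suspect object/manifest)"
    return "varying-truncation (suspect transport/connection)"

# A few of the sizes from the report above, against the expected
# 14577056082 bytes:
sizes = [2125988202, 2071462250, 2016936298, 650117482]
print(classify_short_downloads(sizes, 14577056082))
# varying-truncation (suspect transport/connection)
```

The varying result is consistent with Yehuda's suggestion that the connection between client and radosgw is dropping mid-download rather than the object itself being short.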
Re: [ceph-users] Shadow Files
Yes, so it seems. The librados::nobjects_begin() call expects at least a Hammer (0.94) backend. Probably need to add a try/catch there to catch this issue, and maybe see if using a different api would be better compatible with older backends. Yehuda - Original Message - From: Anthony Alba ascanio.al...@gmail.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Ben b@benjackson.email, ceph-users ceph-us...@ceph.com Sent: Tuesday, May 5, 2015 10:14:38 AM Subject: Re: [ceph-users] Shadow Files Unfortunately it immediately aborted (running against a 0.80.9 Ceph). Does Ceph also have to be a 0.94 level? last error was -3 2015-05-06 01:11:11.710947 7f311dd15880 0 run(): building index of all objects in pool -2 2015-05-06 01:11:11.710995 7f311dd15880 1 -- 10.200.3.92:0/1001510 -- 10.200.3.32:6800/1870 -- osd_op(client.4065115.0:27 ^A/ [pgnls start_epoch 0] 11.0 ack+read +known_if_redirected e952) v5 -- ?+0 0x39a4e80 con 0x39a4aa0 -1 2015-05-06 01:11:11.712125 7f31026f4700 1 -- 10.200.3.92:0/1001510 == osd.1 10.200.3.32:6800/1870 1 osd_op_reply(27 [pgnls start_epoch 0] v934'6252 uv6252 ondisk = -22 ((22) Invalid argument)) v6 167+0+0 (3260127617 0 0) 0x7f30c4000a90 con 0x39a4aa0 0 2015-05-06 01:11:11.712652 7f311dd15880 -1 *** Caught signal (Aborted) ** in thread 7f311dd15880 2015-05-06 01:11:11.710947 7f311dd15880 0 run(): building index of all objects in pool terminate called after throwing an instance of 'std::runtime_error' what(): rados returned (22) Invalid argument *** Caught signal (Aborted) ** in thread 7f311dd15880 ceph version 0.94-1339-gc905d51 (c905d517c2c778a88b006302996591b60d167cb6) 1: radosgw-admin() [0x61e604] 2: (()+0xf130) [0x7f311a59f130] 3: (gsignal()+0x37) [0x7f31195d85d7] 4: (abort()+0x148) [0x7f31195d9cc8] 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f3119edc9b5] 6: (()+0x5e926) [0x7f3119eda926] 7: (()+0x5e953) [0x7f3119eda953] 8: (()+0x5eb73) [0x7f3119edab73] 9: (()+0x4d116) [0x7f311b606116] 10: (librados::IoCtx::nobjects_begin()+0x2e) 
[0x7f311b60c60e] 11: (RGWOrphanSearch::build_all_oids_index()+0x62) [0x516a02] 12: (RGWOrphanSearch::run()+0x1e3) [0x51ad23] 13: (main()+0xa430) [0x4fbc30] 14: (__libc_start_main()+0xf5) [0x7f31195c4af5] 15: radosgw-admin() [0x5028d9] 2015-05-06 01:11:11.712652 7f311dd15880 -1 *** Caught signal (Aborted) ** in thread 7f311dd15880 ceph version 0.94-1339-gc905d51 (c905d517c2c778a88b006302996591b60d167cb6) 1: radosgw-admin() [0x61e604] 2: (()+0xf130) [0x7f311a59f130] On Tue, May 5, 2015 at 10:41 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: Can you try creating the .log pool? Yehda - Original Message - From: Anthony Alba ascanio.al...@gmail.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Ben b@benjackson.email, ceph-users ceph-us...@ceph.com Sent: Tuesday, May 5, 2015 3:37:15 AM Subject: Re: [ceph-users] Shadow Files ...sorry clicked send to quickly /opt/ceph/bin/radosgw-admin orphans find --pool=.rgw.buckets --job-id=abcd ERROR: failed to open log pool ret=-2 job not found On Tue, May 5, 2015 at 6:36 PM, Anthony Alba ascanio.al...@gmail.com wrote: Hi Yehuda, First run: /opt/ceph/bin/radosgw-admin --pool=.rgw.buckets --job-id=testing ERROR: failed to open log pool ret=-2 job not found Do I have to precreate some pool? On Tue, May 5, 2015 at 8:17 AM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: I've been working on a new tool that would detect leaked rados objects. It will take some time for it to be merged into an official release, or even into the master branch, but if anyone likes to play with it, it is in the wip-rgw-orphans branch. At the moment I recommend to not remove any object that the tool reports, but rather move it to a different pool for backup (using the rados tool cp command). 
The tool works in a few stages:
(1) list all the rados objects in the specified pool, and store them in a repository
(2) list all bucket instances in the system, and store them in a repository
(3) iterate through the bucket instances in the repository, list their (logical) objects, and for each object store the expected rados objects that build it
(4) compare the data from (1) and (3); for each object that is in (1) but not in (3), stat it, and if it is older than $start_time - $stale_period, report it
A lot can go wrong with this, so we really need to be careful here. The tool can be run with the following command:
$ radosgw-admin orphans find --pool=data pool --job-id=name [--num-shards=num shards] [--orphan-stale-secs=seconds]
The tool can be stopped and restarted, and it will continue from the stage where it stopped. Note that some of the stages will restart from the beginning (of that stage), due to system limitations (specifically 1, 2
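A minimal sketch of the stage (4) comparison described above, with hypothetical names and inputs; this is not the actual radosgw-admin implementation, just an illustration of the orphan test:

```python
import time

def find_orphans(rados_objects, expected_objects, mtimes, stale_secs, now=None):
    """Stage (4): anything listed in the pool (stage 1) but not referenced by
    any bucket instance (stage 3) is an orphan candidate; only report it if it
    is older than the stale window, so in-flight uploads are not flagged."""
    now = now if now is not None else time.time()
    orphans = []
    for name in rados_objects - expected_objects:
        # stat the candidate: check its age against the stale window
        if now - mtimes[name] > stale_secs:
            orphans.append(name)
    return sorted(orphans)

# Example: 'leaked' is old enough to report, 'in-flight' is not.
pool = {"head.1", "shadow.1", "leaked", "in-flight"}
expected = {"head.1", "shadow.1"}
ages = {"leaked": 0, "in-flight": 990}      # mtimes (seconds since epoch)
print(find_orphans(pool, expected, ages, stale_secs=900, now=1000))  # prints ['leaked']
```

The stale window is the same safeguard `--orphan-stale-secs` provides on the command line.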
Re: [ceph-users] Shadow Files
- Original Message - From: Daniel Hoffman daniel.hoff...@13andrew.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Ben b@benjackson.email, ceph-users ceph-us...@ceph.com Sent: Sunday, May 10, 2015 5:03:22 PM Subject: Re: [ceph-users] Shadow Files Any updates on when this is going to be released? Daniel On Wed, May 6, 2015 at 3:51 AM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: Yes, so it seems. The librados::nobjects_begin() call expects at least a Hammer (0.94) backend. Probably need to add a try/catch there to catch this issue, and maybe see if using a different api would be more compatible with older backends. Yehuda I cleaned up the commits a bit, but it needs to be reviewed, and it'll be nice to get some more testing on it before it goes into an official release. There's still the issue of running it against a firefly backend. I looked at backporting it to firefly, but it's not going to be trivial work, so I think a better use of time would be to get the hammer one to work against a firefly backend. There are some librados api quirks that we need to flush out first. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Shadow Files
It's the wip-rgw-orphans branch. - Original Message - From: Daniel Hoffman daniel.hoff...@13andrew.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Ben b@benjackson.email, David Zafman dzaf...@redhat.com, ceph-users ceph-us...@ceph.com Sent: Monday, May 11, 2015 4:30:11 PM Subject: Re: [ceph-users] Shadow Files Thanks. Can you please let me know the suitable/best git version/tree to be pulling to compile and use this feature/patch? Thanks On Tue, May 12, 2015 at 4:38 AM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: From: Daniel Hoffman daniel.hoff...@13andrew.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Ben b@benjackson.email, ceph-users ceph-us...@ceph.com Sent: Sunday, May 10, 2015 5:03:22 PM Subject: Re: [ceph-users] Shadow Files Any updates on when this is going to be released? Daniel On Wed, May 6, 2015 at 3:51 AM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: Yes, so it seems. The librados::nobjects_begin() call expects at least a Hammer (0.94) backend. Probably need to add a try/catch there to catch this issue, and maybe see if using a different api would be better compatible with older backends. Yehuda I cleaned up the commits a bit, but it needs to be reviewed, and it'll be nice to get some more testing to it before it goes on an official release. There's still the issue of running it against a firefly backend. I looked at backporting it to firefly, but it's not going to be a trivial work, so I think the better time usage would be to get the hammer one to work against a firefly backend. There are some librados api quirks that we need to flush out first. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation
- Original Message - From: Sean seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Tuesday, May 5, 2015 12:14:19 PM Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hello Yehuda and the rest of the mailing list. My main question currently is why are the bucket index and the object manifest ever different? Based on how we are uploading data I do not think that the rados gateway should ever know the full file size without having all of the objects within ceph at one point in time. So after the multipart is marked as completed Rados gateway should cat through all of the objects and make a complete part, correct? That's what *should* happen, but obviously there's some bug there. Secondly, I think I am not understanding the process to grab all of the parts correctly. To continue to use my example file 86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam in bucket tcga_cghub_protected. I would be using the following to grab the prefix: prefix=$(radosgw-admin object stat --bucket=tcga_cghub_protected --object=86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam | grep -iE 'prefix' | awk -F\ '{print $4}') Which should take everything between quotes for the prefix key and give me the value. In this case:: prefix: 86b6fad8-3c53-465f-8758-2009d6df01e9\/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam.2\/YAROhWaAm9LPwCHeP55cD4CKlLC0B4S, So lacadmin@kh10-9:~$ echo ${prefix} 86b6fad8-3c53-465f-8758-2009d6df01e9\/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam.2\/YAROhWaAm9LPwCHeP55cD4CKlLC0B4S From here I list all of the objects in the .rgw.buckets pool and grep for that said prefix which yields 1335 objects. From here if I cat all of these objects together I only end up with a 5468160 byte file which is 2G short of what the object manifest says it should be. 
If I grab the file and tail the Rados gateway log I end up with 1849 objects, and when I sum them all up I end up with 7744771642, which is the same size that the manifest reports. How are these objects named? I understand that this does nothing other than verify the manifest's accuracy but I still find it interesting. The missing chunks may still exist in ceph outside of the object manifest and tagged with the same prefix, correct? Or am I misunderstanding something? Either it's missing a chunk, or one of the objects is truncated. Can you stat all the parts? I expect most of the objects to have two different sizes (e.g., 4MB, 1MB), but it is likely that the last part is smaller, and maybe another object is missing 512k. We have over 40384 files in the tcga_cghub_protected bucket and only 66 of these files are suffering from this truncation issue. What I need to know is: is this happening on the gateway side or on the client side? Next I need to know what possible actions can occur where the bucket index and the object manifest would be mismatched like this, as 40318 out of 40384 are working without issue. The truncated files are of all different sizes (5 megabytes - 980 gigabytes) and the truncation seems to be all over. By all over I mean some files are missing the first few bytes that should read bam and some are missing parts in the middle. Can you give an example of an object manifest for a broken object, and all the rados objects that build it (e.g., the output of 'rados stat' on these objects)? A smaller object might be easier. So our upload code is using mmap to stream chunks of the file to the Rados gateway via a multipart upload, but nowhere on the client side do we have a direct reference to the files we are using, nor do we specify the size in any way. So where is the gateway getting the correct complete file size from, and how is the bucket index showing the intended file size? 
This implies that, at some point in time, ceph was able to see all of the parts of the file and calculate the correct total size. This to me seems like a rados gateway bug regardless of how the file is being uploaded. I think that the RGW should be able to be fuzzed and still store the data correctly. Why is the bucket list not matching the bucket index, and how can I verify that the data is not being corrupted by the RGW, or worse, after it is committed to ceph? That's what we're trying to find out. Thanks, Yehuda
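Yehuda's suggestion to stat all the parts can be turned into a quick consistency check: sum the sizes of the rados objects (from 'rados stat') and compare against the manifest size. A sketch, with a hypothetical helper name and example inputs:

```python
def check_manifest(manifest_size, part_sizes, chunk=512 * 1024):
    """Compare the total of the rados objects' stat'ed sizes with the size
    recorded in the object manifest. A deficit that is an exact multiple of
    512KB matches the missing-chunk pattern discussed in this thread."""
    total = sum(part_sizes)
    deficit = manifest_size - total
    return {
        "total": total,
        "deficit": deficit,
        "whole_512k_chunks": deficit > 0 and deficit % chunk == 0,
    }

# A 9MB object whose parts only add up to 8.5MB: short by exactly one 512KB chunk.
result = check_manifest(9 * 1024 * 1024,
                        [4 * 1024 * 1024, 4 * 1024 * 1024, 512 * 1024])
print(result["deficit"], result["whole_512k_chunks"])  # prints: 524288 True
```

A deficit of zero means the parts account for the whole manifest; a non-zero deficit that is not a 512KB multiple would point at ordinary truncation instead.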
Re: [ceph-users] civetweb lockups
- Original Message - From: Daniel Hoffman daniel.hoff...@13andrew.com To: ceph-users ceph-us...@ceph.com Sent: Sunday, May 10, 2015 10:54:21 PM Subject: [ceph-users] civetweb lockups Hi All. We have a weird issue where civetweb just locks up: it fails to respond to HTTP, and a restart resolves the problem. This happens anywhere from every 60 seconds to every 4 hours with no reason behind it. We have run the gateway in full debug mode and there is nothing there that seems to be an issue. We run 2 gateways on 6-core machines; there is no load, CPU or memory wise, and the machines seem fine. They are load balanced behind HAProxy. We run 12 data nodes at the moment with ~170 disks. We see around 40-60MB/s into the array. Is this just too much for civetweb to handle? Should we look at virtual machines on the hardware/mode nodes?

[client.radosgw.ceph-obj02]
host = ceph-obj02
keyring = /etc/ceph/keyring.radosgw.ceph-obj02
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log
rgw data = /var/lib/ceph/radosgw/ceph-obj02
rgw thread pool size = 1024
rgw print continue = False
debug rgw = 0
debug ms = 0
rgw enable ops log = False
log to stderr = False
rgw enable usage log = False

Advice appreciated. Not sure what would be the issue. I'd look at the number of threads; maybe try reducing it and see if it makes any difference? Also, try to see how many open fds there are when it hangs. Yehuda
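Yehuda's last suggestion, watching the open fd count when civetweb hangs, can be scripted against /proc on Linux. A rough sketch; looking the radosgw pid up with pidof is an assumption about the deployment:

```python
import os

def count_open_fds(pid):
    """Count file descriptors currently open by a process, via /proc (Linux)."""
    return len(os.listdir("/proc/%d/fd" % pid))

# Demonstration: count our own open fds.
print(count_open_fds(os.getpid()))

# For the gateway (hypothetical; requires radosgw to be running), something like:
#   pid = int(subprocess.check_output(["pidof", "radosgw"]).split()[0])
#   print(count_open_fds(pid))
```

Polling this in a loop and logging the count alongside timestamps would show whether the lockups correlate with fd exhaustion (e.g. the 1024 thread pool each holding sockets open).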
Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation
Hi, Thank you for a very thorough investigation. See my comments below: - Original Message - From: Mark Murphy murphyma...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Sean Sullivan seapasu...@uchicago.edu, ceph-users@lists.ceph.com Sent: Tuesday, May 12, 2015 10:50:49 AM Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hey Yehuda, I work with Sean on the dev side. We thought we should put together a short report on what we’ve been seeing in the hopes that the behavior might make some sense to you. We had originally noticed these issues a while ago with our first iteration of this particular Ceph deployment. The issues we had seen were characterized by two different behaviors:
• Some objects would appear truncated, returning different sizes for each request. Repeated attempts would eventually result in a successful retrieval if the second behavior doesn’t apply.
This really sounds like some kind of networking issue, maybe a load balancer along the way that clobbers things?
• Some objects would always appear truncated, missing an integer multiple of 512KB. This is where the report that we are encountering ‘truncation’ came from, which is slightly misleading.
We recently verified that we are indeed encountering the first behavior, for which I believe Sean has supplied or will be supplying Ceph logs showcasing the server-side errors, and which is true truncation. However, the second behavior is not really truncation, but missing 512KB chunks, as Sean has brought up. We’ve had some luck identifying some of the patterns that seem related to this issue. Without going into too great detail, we’ve found the following appear to hold true for all objects affected by the second behavior:
• The amount of data missing is always in integer multiples of 512KB.
• The expected file size is always found via the bucket index.
• Ceph objects do not appear to be missing chunks or have holes in them. 
• The missing 512KB chunks are always at the beginning of multipart segments (1GB in our case).
This matches some of my original suspicions. Here's some basic background that might help clarify things: this looks like some kind of rgw bug. A radosgw object is usually composed of two different parts: the object head and the object tail. The head is usually composed of the first 512k of data of the object (and never more than that), and the tail has the rest of the object's data. However, the head data part is optional, and it can be zero. For example, in the case of multipart upload, after combining the parts, the head will not have any data, and the tail will be compiled out of the different parts' data. However, when dealing with multipart parts, the parts do not really have a head (due to their immutability), so the part object sizes are expected to be 4MB. So it seems that for some reason these specific parts were treated as if they had a head, although they shouldn't have. Now, that brings me to the issue, where I noticed that some of the parts were retried. When this happens, the part name is different from the default part name, so there's a note in the manifest, and special handling that starts at specific offsets. It might be that this is related, and the code that handles the retries generates bad object parts.
• For large files missing multiple chunks, the segments affected appear to be clustered and contiguous.
That would point at a cluster of retries, maybe due to networking issues around the time these were created. The first pattern was identified when we noticed that the bucket index and the object manifest differed in reported size. This is useful as a quick method of identifying affected objects. We’ve used this to avoid having to pull down and check each object individually. In total, we have 108 affected objects, which translates to approximately 0.25% of our S3 objects. 
We noticed that the bucket index always reports the object size that would be expected had the upload gone correctly. Since we only ever report the segment sizes to the gateway, this would suggest that the segment sizes were reported accurately and aggregated correctly server side. Sean identified the Ceph objects that compose one of our affected S3 objects. We thought we might see the first Ceph object missing some data, but found it to be a full 4MB. Retrieving the first Ceph object and comparing it to the bytes in the corresponding file, it appears that the Ceph object matches the 4MB of the file after the first 512KB. We took this as evidence that the data was never getting to Ceph in the first place. However, in our testing, we were unable to get the gateway to accept segments with less data than reported. Dissecting some of the affected objects, we were
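Yehuda's head/tail description can be sketched as a size calculator, under the assumptions stated in the thread (a head of at most 512KB, 4MB stripes, and multipart parts written with no head); the function name and defaults are hypothetical:

```python
def expected_rados_sizes(obj_size, has_head=True,
                         head_max=512 * 1024, stripe=4 * 1024 * 1024):
    """Sketch of the layout described in the thread: an optional head object
    holding at most the first 512KB, then tail (shadow) objects striped at
    4MB. Multipart parts have no head (has_head=False), so their rados
    objects should all be full 4MB stripes except the last."""
    sizes = []
    remaining = obj_size
    if has_head:
        head = min(head_max, remaining)
        sizes.append(head)
        remaining -= head
    while remaining > 0:
        chunk = min(stripe, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes

# A 9MB plain object: 512KB head, two 4MB shadows, then the remainder.
print(expected_rados_sizes(9 * 1024 * 1024))
# prints [524288, 4194304, 4194304, 524288]

# The same data as a multipart part (no head): 4MB stripes only.
print(expected_rados_sizes(9 * 1024 * 1024, has_head=False))
# prints [4194304, 4194304, 1048576]
```

Comparing such an expected list against the actual 'rados stat' output for each part is one way to spot a part that was wrongly written as if it had a head.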
Re: [ceph-users] RGW - Can't download complete object
That's another interesting issue. Note that for part 12_80 the manifest specifies (I assume, by the messenger log) this part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14'), whereas it seems that you do have the original part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80 (note the '2/...'). The part that the manifest specifies does not exist, which makes me think that there is some weird upload sequence, something like:
- client uploads part, upload finishes but client does not get an ack for it
- client retries (second upload)
- client gets the ack for the first upload and gives up on the second one
But I'm not sure if it would explain the manifest; I'll need to take a look at the code. Could such a sequence happen with the client that you're using to upload? Yehuda - Original Message - From: Sean Sullivan seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 2:07:22 PM Subject: Re: [ceph-users] RGW - Can't download complete object Sorry for the delay. It took me a while to figure out how to do a range request and append the data to a single file. The good news is that the end file seems to be 14G in size, which matches the file's manifest size. The bad news is that the file is completely corrupt and the radosgw log has errors. 
I am using the following code to perform the download:: https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py Here is a clip of the log file:: -- 2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.11 10.64.64.101:6809/942707 5 osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004 2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.45 10.64.64.101:6845/944590 2 osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304 2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2 2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.21 10.64.64.102:6856/1133473 16 osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316 2015-05-11 15:28:52.504072 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.82 10.64.64.103:6857/88524 2 osd_op_reply(74566283 
default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6 303+0+4194304 (1474509283 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420 2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12917407744 len=4194304 I couldn't really find any good documentation on how fragments/files are laid out on the object file system, so I am not sure where the file will be. How could the 4MB object have issues but the cluster be completely healthy? I did do the rados stat of each object inside ceph and they all appear to be there:: http://paste.ubuntu.com/8561/ The sum of all of the objects:: 14584887282 The stat of the object inside ceph:: 14577056082 So for some reason I have more data in objects than the key manifest. We easily identified this object via the same method as the other thread I have::

for key in keys:
    if key.name == 'b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam':
        implicit = key.size
        explicit = conn.get_bucket(bucket).get_key(key.name).size
        absolute = abs(implicit - explicit)
        print key.name
        print implicit
        print explicit

b235040a-46b6-42b3-b134-962b1f8813d5
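The interactive loop above can be generalized into a small helper that scans a whole bucket for index/stat size mismatches. The helper itself is hypothetical; the commented boto calls mirror the ones used in the thread:

```python
def find_size_mismatches(listed_sizes, statted_sizes):
    """Return object names whose size in the bucket listing (the index)
    differs from the size returned by a per-key stat (HEAD request)."""
    return sorted(
        name for name, implicit in listed_sizes.items()
        if statted_sizes.get(name) != implicit
    )

# With a live connection (boto, as in the snippet above) the two dicts would
# be built roughly like this:
#   bucket = conn.get_bucket('tcga_cghub_protected')
#   listed = {k.name: k.size for k in bucket.list()}
#   statted = {name: bucket.get_key(name).size for name in listed}
#   print(find_size_mismatches(listed, statted))

print(find_size_mismatches({"a.bam": 100, "b.bam": 90},
                           {"a.bam": 100, "b.bam": 80}))  # prints ['b.bam']
```

This is the same "implicit vs explicit" comparison as in the thread, just applied to every key so affected objects can be found without downloading them.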
Re: [ceph-users] RGW - Can't download complete object
Ok, I dug a bit more, and it seems to me that the problem is with the manifest that was created. I was able to reproduce a similar issue (opened ceph bug #11622), for which I also have a fix. I created new tests to cover this issue, and we'll get those recent fixes as soon as we can, after we test for any regressions. Thanks, Yehuda - Original Message - From: Yehuda Sadeh-Weinraub yeh...@redhat.com To: Sean Sullivan seapasu...@uchicago.edu Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 2:33:07 PM Subject: Re: [ceph-users] RGW - Can't download complete object That's another interesting issue. Note that for part 12_80 the manifest specifies (I assume, by the messenger log) this part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14') whereas it seems that you do have the original part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80 (note the '2/...') The part that the manifest specifies does not exist, which makes me think that there is some weird upload sequence, something like: - client uploads part, upload finishes but client does not get ack for it - client retries (second upload) - client gets ack for the first upload and gives up on the second one But I'm not sure if it would explain the manifest, I'll need to take a look at the code. Could such a sequence happen with the client that you're using to upload? Yehuda - Original Message - From: Sean Sullivan seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 2:07:22 PM Subject: Re: [ceph-users] RGW - Can't download complete object Sorry for the delay. It took me a while to figure out how to do a range request and append the data to a single file. 
The good news is that the end file seems to be 14G in size which matches the files manifest size. The bad news is that the file is completely corrupt and the radosgw log has errors. I am using the following code to perform the download:: https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py Here is a clip of the log file:: -- 2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.11 10.64.64.101:6809/942707 5 osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004 2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.45 10.64.64.101:6845/944590 2 osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304 2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2 2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.21 10.64.64.102:6856/1133473 16 osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316 2015-05-11 15:28:52.504072 7f570db7d700 1 -- 
10.64.64.126:0/108 == osd.82 10.64.64.103:6857/88524 2 osd_op_reply(74566283 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6 303+0+4194304 (1474509283 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420 2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12917407744 len=4194304 I couldn't really find any good documentation on how fragments/files are layed out on the object file system so I am not sure on where the file will be. How could the 4mb object have issues but the cluster be completely health okay? I did do the rados stat of each object inside ceph and they all appear to be there:: http://paste.ubuntu.com/8561/ The sum of all of the objects
Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation
I opened issue #11604, and have a fix for the issue. I updated our test suite to cover the specific issue that you were hitting. We'll backport the fix to both hammer and firefly soon. Thanks! Yehuda - Original Message - From: Yehuda Sadeh-Weinraub yeh...@redhat.com To: Mark Murphy murphyma...@uchicago.edu Cc: ceph-users@lists.ceph.com, Sean Sullivan seapasu...@uchicago.edu Sent: Tuesday, May 12, 2015 12:59:48 PM Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hi, Thank you for a very thorough investigation. See my comments below: - Original Message - From: Mark Murphy murphyma...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: Sean Sullivan seapasu...@uchicago.edu, ceph-users@lists.ceph.com Sent: Tuesday, May 12, 2015 10:50:49 AM Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation Hey Yehuda, I work with Sean on the dev side. We thought we should put together a short report on what we’ve been seeing in the hopes that the behavior might make some sense to you. We had originally noticed these issues a while ago with our first iteration of this particular Ceph deployment. The issues we had seen were characterized by two different behaviors: • Some objects would appear truncated, returning different sizes for each request. Repeated attempts would eventually result in a successful retrieval if the second behavior doesn’t apply. This really sound like some kind of networking issue, maybe a load balancer that is on the way that clobbers things? • Some objects would always appear truncated, missing an integer multiple of 512KB. This is where the report that we are encountering ‘truncation’ came from, which is slightly misleading. We recently verified that we are indeed encountering the first behavior, for which I believe Sean has supplied or will be supplying Ceph logs showcasing the server-side errors, and is true truncation. 
However, the second behavior is not really truncation, but missing 512KB chunks, as Sean has brought up. We’ve had some luck with identifying some of the patterns that are seemingly related to this issue. Without going into too great of detail, we’ve found the following appear to hold true for all objects affected by the second behavior: • The amount of data missing is always in integer multiples of 512KB. • The expected file size is always found via the bucket index. • Ceph objects do not appear to be missing chunks or have holes in them. • The missing 512KB chunks are always at the beginning of multipart segments (1GB in our case). This matches some of my original suspicions. Here's some basic background that might help clarify things: This looks like some kind of rgw bug. A radosgw object is usually composed of two different parts: the object head, and the object tail. The head is usually composed of the first 512k of data of the object (and never more than that), and the tail has the rest of the object's data. However, the head data part is optional, and it can be zero. For example, in the case of multipart upload, after combining the parts, the head will not have any data, and the tail will be compiled out of the different parts data. However, when dealing with multipart parts, the parts do not really have a head (due to their immutability), so it is expected that the part object sizes to be 4MB. So it seems that for some reason these specific parts were treated as if they had a head, although they shouldn't have. Now, that brings me to the issue, where I noticed that some of the parts were retried. When this happens, the part name is different than the default part name, so there's a note in the manifest, and a special handling that start at specific offsets. It might be that this is related, and the code that handles the retries generate bad object parts. • For large files missing multiple chunks, the segments affected appear to be clustered and contiguous. 
That would point at a cluster of retries, maybe due to networking issues around the time these were created. The first pattern was identified when we noticed that the bucket index and the object manifest differed in reported size. This is useful as an quick method of identifying affected objects. We’ve used this to avoid having to pull down and check each object individually. In total, we have 108 affected objects, which translates to approximately 0.25% of our S3 objects. We noticed that the bucket index always reports the object size that would be expected had the upload gone correctly. Since we only ever report the segment sizes to the gateway, this would suggest that the segment sizes were reported accurately and aggregated correctly server
Re: [ceph-users] RGW - Can't download complete object
The code is in wip-11620, and it's currently on top of the next branch. We'll get it through the tests, then get it into hammer and firefly. I wouldn't recommend installing it in production without proper testing first. Yehuda - Original Message - From: Sean Sullivan seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 7:22:10 PM Subject: Re: [ceph-users] RGW - Can't download complete object Thank you so much Yehuda! I look forward to testing these. Is there a way for me to pull this code in? Is it in master? On May 13, 2015 7:08:44 PM Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: Ok, I dug a bit more, and it seems to me that the problem is with the manifest that was created. I was able to reproduce a similar issue (opened ceph bug #11622), for which I also have a fix. I created new tests to cover this issue, and we'll get those recent fixes as soon as we can, after we test for any regressions. Thanks, Yehuda - Original Message - From: Yehuda Sadeh-Weinraub yeh...@redhat.com To: Sean Sullivan seapasu...@uchicago.edu Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 2:33:07 PM Subject: Re: [ceph-users] RGW - Can't download complete object That's another interesting issue. 
Note that for part 12_80 the manifest specifies (I assume, by the messenger log) this part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14') whereas it seems that you do have the original part: default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80 (note the '2/...') The part that the manifest specifies does not exist, which makes me think that there is some weird upload sequence, something like: - client uploads part, upload finishes but client does not get ack for it - client retries (second upload) - client gets ack for the first upload and gives up on the second one But I'm not sure if it would explain the manifest, I'll need to take a look at the code. Could such a sequence happen with the client that you're using to upload? Yehuda - Original Message - From: Sean Sullivan seapasu...@uchicago.edu To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, May 13, 2015 2:07:22 PM Subject: Re: [ceph-users] RGW - Can't download complete object Sorry for the delay. It took me a while to figure out how to do a range request and append the data to a single file. The good news is that the end file seems to be 14G in size which matches the files manifest size. The bad news is that the file is completely corrupt and the radosgw log has errors. 
I am using the following code to perform the download:: https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py Here is a clip of the log file:: -- 2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.11 10.64.64.101:6809/942707 5 osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004 2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.45 10.64.64.101:6845/944590 2 osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304 2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2 2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.21 10.64.64.102:6856/1133473 16 osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316 2015-05
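The range-request approach Sean describes can be sketched in Python (a hypothetical helper of my own, not the s3-mp-download.py script linked above): split the object length into inclusive windows, issue one `Range: bytes=start-end` GET per window, and append the responses in offset order.

```python
def byte_ranges(total_size, chunk_size):
    """Split an object of total_size bytes into inclusive HTTP Range pairs."""
    ranges = []
    ofs = 0
    while ofs < total_size:
        end = min(ofs + chunk_size, total_size) - 1
        ranges.append((ofs, end))
        ofs = end + 1
    return ranges

# Each pair becomes a header like "Range: bytes=0-4194303" on a GET for the
# same key; the response bodies are concatenated into one file in order.
print(byte_ranges(10, 4))  # [(0, 3), (4, 7), (8, 9)]
```

Note that if one ranged GET lands on a missing shadow object (the ENOENT/-2 above), the assembled file can still match the manifest size yet be corrupt at that offset, which is consistent with what Sean observed.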
Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket
- Original Message - From: Francois Lafont flafdiv...@free.fr To: ceph-users@lists.ceph.com Sent: Sunday, April 12, 2015 8:47:40 PM Subject: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket Hi, On a testing cluster, I have a radosgw on Firefly and the other nodes, OSDs and monitors, are on Hammer. The nodes are installed with puppet in personal VMs, so I can reproduce the problem. Generally, I use s3cmd to check the radosgw. While radosgw is on Firefly, I can create a bucket, no problem. Then, I upgrade the radosgw (it's a Ubuntu Trusty): sed -i 's/firefly/hammer/g' /etc/apt/sources.list.d/ceph.list; apt-get update; apt-get dist-upgrade -y; service apache2 stop; stop radosgw-all; start radosgw-all; service apache2 start After that, impossible to create a bucket with s3cmd: -- ~# s3cmd -d mb s3://bucket-2 DEBUG: ConfigParser: Reading file '/root/.s3cfg' DEBUG: ConfigParser: bucket_location-US DEBUG: ConfigParser: cloudfront_host-cloudfront.amazonaws.com DEBUG: ConfigParser: default_mime_type-binary/octet-stream DEBUG: ConfigParser: delete_removed-False DEBUG: ConfigParser: dry_run-False DEBUG: ConfigParser: enable_multipart-True DEBUG: ConfigParser: encoding-UTF-8 DEBUG: ConfigParser: encrypt-False DEBUG: ConfigParser: follow_symlinks-False DEBUG: ConfigParser: force-False DEBUG: ConfigParser: get_continue-False DEBUG: ConfigParser: gpg_command-/usr/bin/gpg DEBUG: ConfigParser: gpg_decrypt-%(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s DEBUG: ConfigParser: gpg_encrypt-%(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s DEBUG: ConfigParser: gpg_passphrase-...-3_chars... 
DEBUG: ConfigParser: guess_mime_type-True DEBUG: ConfigParser: host_base-ostore.athome.priv DEBUG: ConfigParser: access_key-5R...17_chars...Y DEBUG: ConfigParser: secret_key-Ij...37_chars...I DEBUG: ConfigParser: host_bucket-%(bucket)s.ostore.athome.priv DEBUG: ConfigParser: human_readable_sizes-False DEBUG: ConfigParser: invalidate_on_cf-False DEBUG: ConfigParser: list_md5-False DEBUG: ConfigParser: log_target_prefix- DEBUG: ConfigParser: mime_type- DEBUG: ConfigParser: multipart_chunk_size_mb-15 DEBUG: ConfigParser: preserve_attrs-True DEBUG: ConfigParser: progress_meter-True DEBUG: ConfigParser: proxy_host- DEBUG: ConfigParser: proxy_port-0 DEBUG: ConfigParser: recursive-False DEBUG: ConfigParser: recv_chunk-4096 DEBUG: ConfigParser: reduced_redundancy-False DEBUG: ConfigParser: send_chunk-4096 DEBUG: ConfigParser: simpledb_host-sdb.amazonaws.com DEBUG: ConfigParser: skip_existing-False DEBUG: ConfigParser: socket_timeout-300 DEBUG: ConfigParser: urlencoding_mode-normal DEBUG: ConfigParser: use_https-False DEBUG: ConfigParser: verbosity-WARNING DEBUG: ConfigParser: website_endpoint-http://%(bucket)s.s3-website-%(location)s.amazonaws.com/ DEBUG: ConfigParser: website_error- DEBUG: ConfigParser: website_index-index.html DEBUG: Updating Config.Config encoding - UTF-8 DEBUG: Updating Config.Config follow_symlinks - False DEBUG: Updating Config.Config verbosity - 10 DEBUG: Unicodising 'mb' using UTF-8 DEBUG: Unicodising 's3://bucket-2' using UTF-8 DEBUG: Command: mb DEBUG: SignHeaders: 'PUT\n\n\n\nx-amz-date:Mon, 13 Apr 2015 03:32:23 +\n/bucket-2/' DEBUG: CreateRequest: resource[uri]=/ DEBUG: SignHeaders: 'PUT\n\n\n\nx-amz-date:Mon, 13 Apr 2015 03:32:23 +\n/bucket-2/' DEBUG: Processing request, please wait... 
DEBUG: get_hostname(bucket-2): bucket-2.ostore.athome.priv DEBUG: format_uri(): / DEBUG: Sending request method_string='PUT', uri='/', headers={'content-length': '0', 'Authorization': 'AWS 5RUS0Z3SBG6IK263PLFY:3V1MdXoCGFrJKrO2LSJaBpNMcK4=', 'x-amz-date': 'Mon, 13 Apr 2015 03:32:23 +0000'}, body=(0 bytes) DEBUG: Response: {'status': 405, 'headers': {'date': 'Mon, 13 Apr 2015 03:32:23 GMT', 'accept-ranges': 'bytes', 'content-type': 'application/xml', 'content-length': '82', 'server': 'Apache/2.4.7 (Ubuntu)'}, 'reason': 'Method Not Allowed', 'data': '<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code></Error>'} DEBUG: S3Error: 405 (Method Not Allowed) DEBUG: HttpHeader: date: Mon, 13 Apr 2015 03:32:23 GMT DEBUG: HttpHeader: accept-ranges: bytes DEBUG: HttpHeader: content-type: application/xml DEBUG: HttpHeader: content-length: 82 DEBUG: HttpHeader: server: Apache/2.4.7 (Ubuntu) DEBUG: ErrorXML: Code: 'MethodNotAllowed' ERROR: S3 error: 405 (MethodNotAllowed): -- But before the upgrade, the same command worked fine. I see nothing in the log. Here is my ceph.conf: -- [global] auth client required = cephx auth cluster required = cephx auth
Re: [ceph-users] Purpose of the s3gw.fcgi script?
You're not missing anything. The script was only needed when we used the process manager of the fastcgi module, but it has been a very long time since we stopped using it. Yehuda - Original Message - From: Greg Meier greg.me...@nyriad.com To: ceph-users@lists.ceph.com Sent: Saturday, April 11, 2015 10:54:27 PM Subject: [ceph-users] Purpose of the s3gw.fcgi script? From my observation, the s3gw.fcgi script seems to be completely superfluous in the operation of Ceph. With or without the script, swift requests execute correctly, as long as a radosgw daemon is running. Is there something I'm missing here? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket
- Original Message - From: Francois Lafont flafdiv...@free.fr To: ceph-users@lists.ceph.com Sent: Monday, April 13, 2015 7:11:49 PM Subject: Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket Hi, Yehuda Sadeh-Weinraub wrote: The 405 in this case usually means that rgw failed to translate the http hostname header into a bucket name. Do you have 'rgw dns name' set correctly? Ah, I have found it, and indeed it concerned rgw dns name, as Karan also thought. ;) But it's a little curious. Explanations: My s3cmd client uses these hostnames (which resolve to the IP address of the radosgw host): bucket-name.ostore.athome.priv And in the configuration of my radosgw, I had: --- [client.radosgw.gw1] host = ceph-radosgw1 rgw dns name = ostore ... --- ie just the *short* name of the radosgw's fqdn (its fqdn is ostore.athome.priv). And with Firefly, it worked well, I never had a problem with this configuration! But with Hammer, it doesn't work anymore (I don't know why). Now, with Hammer, I notice that I have to put the fqdn in rgw dns name, not the short name: --- [client.radosgw.gw1] host = ceph-radosgw1 rgw dns name = ostore.athome.priv ... --- And with this configuration, it works. Is that normal? In fact, maybe my configuration with the short name (instead of the fqdn) was not valid and I was just lucky it worked well so far. Is that the right conclusion of the story? In fact, I think I have never really understood the meaning of the rgw dns name parameter. Can you confirm (or not) this: This parameter is *only* used when an S3 client accesses a bucket with the method http://bucket-name.radosgw-address. If we don't set this parameter, such access will not work and an S3 client can access a bucket only with the method http://radosgw-address/bucket-name Is that correct? Yes. Not sure why it *was* working in firefly. We did do some work around this in hammer, might have changed the behavior inadvertently. 
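A minimal sketch of the translation Yehuda describes (my own illustration, not rgw's actual code): with virtual-host style access, rgw strips "." plus the configured rgw dns name off the Host header to recover the bucket name; if the suffix doesn't match, no bucket is resolved and the bucket-level PUT becomes a PUT on "/", which fails with 405.

```python
def bucket_from_host(host, rgw_dns_name):
    """Map a vhost-style Host header to a bucket name, or None on no match."""
    suffix = "." + rgw_dns_name
    if host.endswith(suffix):
        return host[: -len(suffix)]
    return None

# The fqdn in 'rgw dns name' matches the client's bucket hostnames...
print(bucket_from_host("bucket-2.ostore.athome.priv", "ostore.athome.priv"))
# ...while the short name does not, so no bucket is found.
print(bucket_from_host("bucket-2.ostore.athome.priv", "ostore"))
```

This is why the short name silently breaks vhost-style requests while path-style access (http://radosgw-address/bucket-name) keeps working.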
Yehuda Thx Yehuda and thx to Karan (who pointed out the real problem, in fact ;)). -- François Lafont ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RADOS Gateway quota management
- Original Message - From: Sergey Arkhipov sarkhi...@asdco.ru To: ceph-users@lists.ceph.com Sent: Monday, March 30, 2015 2:55:33 AM Subject: [ceph-users] RADOS Gateway quota management Hi, Currently I am trying to figure out how to work with RADOS Gateway (ceph 0.87) limits and I've managed to produce such strange behavior: { "bucket": "test1-8", "pool": ".rgw.buckets", "index_pool": ".rgw.buckets.index", "id": "default.17497.14", "marker": "default.17497.14", "owner": "cb254310-8b24-4622-93fb-640ca4a45998", "ver": 21, "master_ver": 0, "mtime": 1427705802, "max_marker": "", "usage": { "rgw.main": { "size_kb": 16000, "size_kb_actual": 16020, "num_objects": 9}}, "bucket_quota": { "enabled": true, "max_size_kb": -1, "max_objects": 3}} Steps to reproduce: create bucket, set quota like that (max_objects = 3 and enable) and successfully upload 9 files. User quota is also defined: "bucket_quota": { "enabled": true, "max_size_kb": -1, "max_objects": 3}, "user_quota": { "enabled": true, "max_size_kb": 1048576, "max_objects": 5}, Could someone please help me to understand how to limit users? -- The question is whether the user is able to continue writing objects at this point. The quota system is working asynchronously, so it's possible to get into edge cases where users exceeded it a bit (it looks a whole lot better with larger numbers). The question is whether it's working for you at all. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
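The asynchronous behavior Yehuda mentions can be illustrated with a toy model (entirely hypothetical class and field names, not RGW internals): writes are checked against a cached stats snapshot that is only synced periodically, so a burst of small uploads can slip past max_objects before the cache catches up — exactly the "exceeded it a bit" edge case.

```python
class AsyncQuota:
    """Toy model of a quota checked against periodically refreshed stats."""
    def __init__(self, max_objects, refresh_every):
        self.max_objects = max_objects
        self.refresh_every = refresh_every
        self.actual = 0    # true object count in the cluster
        self.cached = 0    # stale count the gateway checks against
        self.ops = 0

    def put_object(self):
        if self.ops % self.refresh_every == 0:
            self.cached = self.actual      # periodic stats sync
        self.ops += 1
        if self.cached >= self.max_objects:
            return False                   # quota finally rejects the write
        self.actual += 1
        return True

q = AsyncQuota(max_objects=3, refresh_every=4)
results = [q.put_object() for _ in range(9)]
print(q.actual)  # 4 -- one more than max_objects got through
```

The longer the refresh interval relative to the upload rate, the more objects overshoot the limit, which is one plausible reading of 9 objects landing in a bucket capped at 3.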
Re: [ceph-users] Radosgw authorization failed
- Original Message - From: Neville neville.tay...@hotmail.co.uk To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, April 1, 2015 11:45:09 AM Subject: Re: [ceph-users] Radosgw authorization failed On 31 Mar 2015, at 11:38, Neville neville.tay...@hotmail.co.uk wrote: Date: Mon, 30 Mar 2015 12:17:48 -0400 From: yeh...@redhat.com To: neville.tay...@hotmail.co.uk CC: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Radosgw authorization failed - Original Message - From: Neville neville.tay...@hotmail.co.uk To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Monday, March 30, 2015 6:49:29 AM Subject: Re: [ceph-users] Radosgw authorization failed Date: Wed, 25 Mar 2015 11:43:44 -0400 From: yeh...@redhat.com To: neville.tay...@hotmail.co.uk CC: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Radosgw authorization failed - Original Message - From: Neville neville.tay...@hotmail.co.uk To: ceph-users@lists.ceph.com Sent: Wednesday, March 25, 2015 8:16:39 AM Subject: [ceph-users] Radosgw authorization failed Hi all, I'm testing backup product which supports Amazon S3 as target for Archive storage and I'm trying to setup a Ceph cluster configured with the S3 API to use as an internal target for backup archives instead of AWS. I've followed the online guide for setting up Radosgw and created a default region and zone based on the AWS naming convention US-East-1. I'm not sure if this is relevant but since I was having issues I thought it might need to be the same. I've tested the radosgw using boto.s3 and it seems to work ok i.e. I can create a bucket, create a folder, list buckets etc. The problem is when the backup software tries to create an object I get an authorization failure. It's using the same user/access/secret as I'm using from boto.s3 and I'm sure the creds are right as it lets me create the initial connection, it just fails when trying to create an object (backup folder). 
Here's the extract from the radosgw log: - 2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET /:list_bucket:init op 2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET /:list_bucket:verifying op mask 2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1 user.op_mask=7 2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET /:list_bucket:verifying op permissions 2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for uid=test mask=49 2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for group=1 mask=49 2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for group=2 mask=49 2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15 2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test owner=test perm=1 2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm (type)=1, policy perm=1, user_perm_mask=1, acl perm=1 2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET /:list_bucket:verifying op params 2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET /:list_bucket:executing 2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2]) start num 1001 2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET /:list_bucket:http status=200 2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done req=0x7f107000e2e0 http_status=200 == 2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request req=0x7f107000f0e0 2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ: 2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0 2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request req=0x7f107000f6b0 2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request req=0x7f107000f0e0 2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty 2015-03-25 
15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88 2015-03-25 15:07:26.517084 7f1058dd7700 20 CONTENT_TYPE=application/octet-stream 2015-03-25 15:07:26.517085
Re: [ceph-users] radosgw crash within libfcgi
- Original Message - From: GuangYang yguan...@outlook.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com Sent: Wednesday, June 24, 2015 2:12:23 PM Subject: RE: radosgw crash within libfcgi Date: Wed, 24 Jun 2015 17:04:05 -0400 From: yeh...@redhat.com To: yguan...@outlook.com CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com Subject: Re: radosgw crash within libfcgi - Original Message - From: GuangYang yguan...@outlook.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com Sent: Wednesday, June 24, 2015 1:53:20 PM Subject: RE: radosgw crash within libfcgi Thanks Yehuda for the response. We already patched libfcgi to use poll instead of select to overcome the limitation. Thanks, Guang Date: Wed, 24 Jun 2015 14:40:25 -0400 From: yeh...@redhat.com To: yguan...@outlook.com CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com Subject: Re: radosgw crash within libfcgi - Original Message - From: GuangYang yguan...@outlook.com To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com, yeh...@redhat.com Sent: Wednesday, June 24, 2015 10:09:58 AM Subject: radosgw crash within libfcgi Hello Cephers, Recently we have had several radosgw daemon crashes with the same following kernel log: Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip 7ffa069996f2 sp 7ff55c432710 error 6 in error 6 is sigabrt, right? With invalid pointer I'd expect to get segfault. Is the pointer actually invalid? With (ip - {address_load_the_shared_library}) to get the instruction which caused this crash, the objdump shows the crash happened at instruction 46f2 (see below), which assigns -1 to FCGX_Request::ipcFd, but I don't quite understand how/why it could crash there. 
4690 FCGX_Free: 4690: 48 89 5c 24 f0 mov %rbx,-0x10(%rsp) 4695: 48 89 6c 24 f8 mov %rbp,-0x8(%rsp) 469a: 48 83 ec 18 sub $0x18,%rsp 469e: 48 85 ff test %rdi,%rdi 46a1: 48 89 fb mov %rdi,%rbx 46a4: 89 f5 mov %esi,%ebp 46a6: 74 28 je 46d0 FCGX_Free+0x40 46a8: 48 8d 7f 08 lea 0x8(%rdi),%rdi 46ac: e8 67 e3 ff ff callq 2a18 FCGX_FreeStream@plt 46b1: 48 8d 7b 10 lea 0x10(%rbx),%rdi 46b5: e8 5e e3 ff ff callq 2a18 FCGX_FreeStream@plt 46ba: 48 8d 7b 18 lea 0x18(%rbx),%rdi 46be: e8 55 e3 ff ff callq 2a18 FCGX_FreeStream@plt 46c3: 48 8d 7b 28 lea 0x28(%rbx),%rdi 46c7: e8 d4 f4 ff ff callq 3ba0 FCGX_PutS+0x40 46cc: 85 ed test %ebp,%ebp 46ce: 75 10 jne 46e0 FCGX_Free+0x50 46d0: 48 8b 5c 24 08 mov 0x8(%rsp),%rbx 46d5: 48 8b 6c 24 10 mov 0x10(%rsp),%rbp 46da: 48 83 c4 18 add $0x18,%rsp 46de: c3 retq 46df: 90 nop 46e0: 31 f6 xor %esi,%esi 46e2: 83 7b 4c 00 cmpl $0x0,0x4c(%rbx) 46e6: 8b 7b 30 mov 0x30(%rbx),%edi 46e9: 40 0f 94 c6 sete %sil 46ed: e8 86 e6 ff ff callq 2d78 OS_IpcClose@plt 46f2: c7 43 30 ff ff ff ff movl $0x,0x30(%rbx) info registers? Not too familiar with the specific message, but it could be that OS_IpcClose() aborts (not highly unlikely) and it only dumps the return address of the current function (shouldn't be referenced as ip though). What's rbx? Is the memory at %rbx + 0x30 valid? Also, did you by any chance upgrade the binaries while the code was running? is the code running over nfs? Yehuda Yehuda libfcgi.so.0.0.0[7ffa06995000+a000] in libfcgi.so.0.0.0[7ffa06995000+a000] Looking at the assembly, it seems crashing at this point - http://github.com/sknown/fcgi/blob/master/libfcgi/fcgiapp.c#L2035, which confused me. I tried to see if there is any other reference holding the FCGX_Request which release the handle without any luck. There are also other observations: 1 Several radosgw daemon across different hosts crashed around the same time. 2 Apache's error log has some fcgi error complaining ##idle timeout## during the time. Does anyone experience similar issue? 
In the past we've had issues with libfcgi that were related to the number of open fds on the process (> 1024). The issue was a buggy libfcgi that was using select
Re: [ceph-users] radosgw crash within libfcgi
Also, looking at the code, I see an extra call to FCGX_Finish_r(): diff --git a/src/rgw/rgw_main.cc b/src/rgw/rgw_main.cc index 9a8aa5f..0aa7ded 100644 --- a/src/rgw/rgw_main.cc +++ b/src/rgw/rgw_main.cc @@ -669,8 +669,6 @@ void RGWFCGXProcess::handle_request(RGWRequest *r) dout(20) process_request() returned ret dendl; } - FCGX_Finish_r(fcgx); - delete req; } Maybe this is a problem on the specific libfcgi version that you're using? - Original Message - From: Yehuda Sadeh-Weinraub yeh...@redhat.com To: GuangYang yguan...@outlook.com Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com Sent: Wednesday, June 24, 2015 2:21:04 PM Subject: Re: radosgw crash within libfcgi - Original Message - From: GuangYang yguan...@outlook.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com Sent: Wednesday, June 24, 2015 2:12:23 PM Subject: RE: radosgw crash within libfcgi Date: Wed, 24 Jun 2015 17:04:05 -0400 From: yeh...@redhat.com To: yguan...@outlook.com CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com Subject: Re: radosgw crash within libfcgi - Original Message - From: GuangYang yguan...@outlook.com To: Yehuda Sadeh-Weinraub yeh...@redhat.com Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com Sent: Wednesday, June 24, 2015 1:53:20 PM Subject: RE: radosgw crash within libfcgi Thanks Yehuda for the response. We already patched libfcgi to use poll instead of select to overcome the limitation. 
Thanks, Guang Date: Wed, 24 Jun 2015 14:40:25 -0400 From: yeh...@redhat.com To: yguan...@outlook.com CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com Subject: Re: radosgw crash within libfcgi - Original Message - From: GuangYang yguan...@outlook.com To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com, yeh...@redhat.com Sent: Wednesday, June 24, 2015 10:09:58 AM Subject: radosgw crash within libfcgi Hello Cephers, Recently we have had several radosgw daemon crashes with the same following kernel log: Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip 7ffa069996f2 sp 7ff55c432710 error 6 in error 6 is sigabrt, right? With invalid pointer I'd expect to get segfault. Is the pointer actually invalid? With (ip - {address_load_the_shared_library}) to get the instruction which caused this crash, the objdump shows the crash happened at instruction 46f2 (see below), which assigns -1 to FCGX_Request::ipcFd, but I don't quite understand how/why it could crash there. 
4690 FCGX_Free: 4690: 48 89 5c 24 f0 mov %rbx,-0x10(%rsp) 4695: 48 89 6c 24 f8 mov %rbp,-0x8(%rsp) 469a: 48 83 ec 18 sub $0x18,%rsp 469e: 48 85 ff test %rdi,%rdi 46a1: 48 89 fb mov %rdi,%rbx 46a4: 89 f5 mov %esi,%ebp 46a6: 74 28 je 46d0 FCGX_Free+0x40 46a8: 48 8d 7f 08 lea 0x8(%rdi),%rdi 46ac: e8 67 e3 ff ff callq 2a18 FCGX_FreeStream@plt 46b1: 48 8d 7b 10 lea 0x10(%rbx),%rdi 46b5: e8 5e e3 ff ff callq 2a18 FCGX_FreeStream@plt 46ba: 48 8d 7b 18 lea 0x18(%rbx),%rdi 46be: e8 55 e3 ff ff callq 2a18 FCGX_FreeStream@plt 46c3: 48 8d 7b 28 lea 0x28(%rbx),%rdi 46c7: e8 d4 f4 ff ff callq 3ba0 FCGX_PutS+0x40 46cc: 85 ed test %ebp,%ebp 46ce: 75 10 jne 46e0 FCGX_Free+0x50 46d0: 48 8b 5c 24 08 mov 0x8(%rsp),%rbx 46d5: 48 8b 6c 24 10 mov 0x10(%rsp),%rbp 46da: 48 83 c4 18 add $0x18,%rsp 46de: c3 retq 46df: 90 nop 46e0: 31 f6 xor %esi,%esi 46e2: 83 7b 4c 00 cmpl $0x0,0x4c(%rbx) 46e6: 8b 7b 30 mov 0x30(%rbx),%edi 46e9: 40 0f 94 c6 sete %sil 46ed: e8 86 e6 ff ff callq 2d78 OS_IpcClose@plt 46f2: c7 43 30 ff ff ff ff movl $0x,0x30(%rbx) info registers? Not too familiar with the specific message, but it could be that OS_IpcClose() aborts (not highly unlikely) and it only dumps the return address of the current function (shouldn't be referenced as ip though). What's rbx? Is the memory at %rbx + 0x30 valid? Also, did you by any chance upgrade the binaries while the code was running
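To illustrate why the extra FCGX_Finish_r() call in the diff above is suspect, here is a toy Python model (hypothetical names, not a real libfcgi binding): the first finish frees the request's streams and stores -1 into ipcFd — the store at offset 46f2 in the disassembly — so a second finish operates on already-freed state, which in C is undefined behavior rather than the clean exception shown here.

```python
class FakeFcgxRequest:
    """Toy stand-in for FCGX_Request; not libfcgi's actual structure."""
    def __init__(self, ipc_fd):
        self.ipc_fd = ipc_fd
        self.freed = False

    def finish(self):
        if self.freed:
            # In C this is a double free / use-after-free; Python lets us
            # surface it as an error instead of a segfault.
            raise RuntimeError("double finish on request")
        self.freed = True      # stands in for the FCGX_FreeStream() calls
        self.ipc_fd = -1       # stands in for the ipcFd = -1 store

req = FakeFcgxRequest(ipc_fd=7)
req.finish()                   # normal teardown: fine
try:
    req.finish()               # the redundant second finish
    crashed = False
except RuntimeError:
    crashed = True
print(crashed, req.ipc_fd)
```

Whether the deployed libfcgi version actually tolerates a second finish is exactly the question Yehuda raises.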
Re: [ceph-users] radosgw crash within libfcgi
- Original Message - From: GuangYang yguan...@outlook.com To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com, yeh...@redhat.com Sent: Wednesday, June 24, 2015 10:09:58 AM Subject: radosgw crash within libfcgi Hello Cephers, Recently we have had several radosgw daemon crashes with the same following kernel log: Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip 7ffa069996f2 sp 7ff55c432710 error 6 in libfcgi.so.0.0.0[7ffa06995000+a000] Looking at the assembly, it seems to be crashing at this point - http://github.com/sknown/fcgi/blob/master/libfcgi/fcgiapp.c#L2035, which confused me. I tried to see if there is any other reference holding the FCGX_Request which releases the handle, without any luck. There are also other observations: 1. Several radosgw daemons across different hosts crashed around the same time. 2. Apache's error log has some fcgi errors complaining about an ##idle timeout## during the time. Does anyone experience a similar issue? In the past we've had issues with libfcgi that were related to the number of open fds on the process (> 1024). The issue was a buggy libfcgi that was using select() instead of poll(), so this might be the issue you're noticing. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
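The select()-vs-poll() distinction is easy to demonstrate through Python's thin wrappers over the same syscalls: select() only accepts descriptors below FD_SETSIZE (1024 on Linux), while poll() registers arbitrary fd numbers individually — which is why a gateway holding more than ~1024 open fds needs the poll()-based libfcgi patch Guang describes.

```python
import os
import select

r, w = os.pipe()

# poll() has no FD_SETSIZE ceiling: fds are registered one at a time.
p = select.poll()
p.register(r, select.POLLIN)
print(p.poll(0))        # [] -- nothing readable yet

os.write(w, b"x")
print(p.poll(0))        # now reports the read end as POLLIN-ready

# select(), by contrast, raises "filedescriptor out of range in select()"
# for any fd number at or above FD_SETSIZE, no matter how few fds are
# actually being watched -- the failure mode a busy radosgw can hit.
```

So the crash threshold is the fd *number*, not the count of fds passed to a single select() call.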
Re: [ceph-users] radosgw get quota
On Thu, Oct 29, 2015 at 11:29 AM, Derek Yarnell wrote: > Sorry, the information is in the headers. So I think the valid question > to follow up is why is this information in the headers and not the body > of the request. I think this is a bug, but maybe I am not aware of a > subtlety. It would seem this json comes from this line[0]. > > [0] - > https://github.com/ceph/ceph/blob/83e10f7e2df0a71bd59e6ef2aa06b52b186fddaa/src/rgw/rgw_rest_user.cc#L697 > > For example the information is returned in what seems to be the > Content-type header as follows. Maybe the missing : in the json > encoding would explain something? It's definitely a bug. It looks like we fail to call end_header() before it, so everything is dumped before we close the http header. Can you open a ceph tracker issue with the info you provided here? Thanks, Yehuda > > INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS > connection (1): ceph.umiacs.umd.edu > DEBUG:requests.packages.urllib3.connectionpool:"GET > /admin/user?quota=json=foo1209=user HTTP/1.1" 200 0 > INFO:rgwadmin.rgw:[('date', 'Thu, 29 Oct 2015 18:28:45 GMT'), > ('{"enabled"', 'true,"max_size_kb":12345,"max_objects":-1}Content-type: > application/json'), ('content-length', '0'), ('server', 'Apache/2.4.6 > (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5')] > > On 10/28/15 11:15 PM, Derek Yarnell wrote: >> I have had this issue before, and I don't think I have resolved it. I >> have been using the RGW admin api to set quota based on the docs[0]. >> But I can't seem to be able to get it to cough up and show me the quota >> now. Any ideas? I get a 200 back but no body. I have tested this on a >> Firefly (0.80.5-9) and Hammer (0.87.2-0) cluster. The latter is what >> the logs are for. 
>> >> [0] - http://docs.ceph.com/docs/master/radosgw/adminops/#quotas >> >> DEBUG:rgwadmin.rgw:URL: >> http://ceph.umiacs.umd.edu/admin/user?quota=derek=user >> DEBUG:rgwadmin.rgw:Access Key: RTJ1TL13CH613JRU2PJD >> DEBUG:rgwadmin.rgw:Verify: True CA Bundle: None >> INFO:requests.packages.urllib3.connectionpool:Starting new HTTP >> connection (1): ceph.umiacs.umd.edu >> DEBUG:requests.packages.urllib3.connectionpool:"GET >> /admin/user?quota=derek=user HTTP/1.1" 200 0 >> INFO:rgwadmin.rgw:No JSON object could be decoded >> >> >> 2015-10-28 23:02:46.445367 7f444cff1700 1 civetweb: 0x7f445c026d00: >> 127.0.0.1 - - [28/Oct/2015:23:02:46 -0400] "GET /admin/user HTTP/1.1" -1 >> 0 - python-requests/2.7.0 CPython/2.7.5 Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:02.063755 7f447ace2700 2 >> RGWDataChangesLog::ChangesRenewThread: start >> 2015-10-28 23:03:17.139339 7f443cfd1700 20 RGWEnv::set(): HTTP_HOST: >> localhost:7480 >> 2015-10-28 23:03:17.139357 7f443cfd1700 20 RGWEnv::set(): >> HTTP_ACCEPT_ENCODING: gzip, deflate >> 2015-10-28 23:03:17.139358 7f443cfd1700 20 RGWEnv::set(): HTTP_ACCEPT: */* >> 2015-10-28 23:03:17.139364 7f443cfd1700 20 RGWEnv::set(): >> HTTP_USER_AGENT: python-requests/2.7.0 CPython/2.7.5 >> Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:17.139375 7f443cfd1700 20 RGWEnv::set(): HTTP_DATE: >> Thu, 29 Oct 2015 03:03:17 GMT >> 2015-10-28 23:03:17.139377 7f443cfd1700 20 RGWEnv::set(): >> HTTP_AUTHORIZATION: AWS RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= >> 2015-10-28 23:03:17.139381 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_FOR: 128.8.132.4 >> 2015-10-28 23:03:17.139383 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_HOST: ceph.umiacs.umd.edu >> 2015-10-28 23:03:17.139385 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_SERVER: cephproxy00.umiacs.umd.edu >> 2015-10-28 23:03:17.139387 7f443cfd1700 20 RGWEnv::set(): >> HTTP_CONNECTION: Keep-Alive >> 2015-10-28 23:03:17.139392 7f443cfd1700 20 RGWEnv::set(): >> 
REQUEST_METHOD: GET >> 2015-10-28 23:03:17.139394 7f443cfd1700 20 RGWEnv::set(): REQUEST_URI: >> /admin/user >> 2015-10-28 23:03:17.139397 7f443cfd1700 20 RGWEnv::set(): QUERY_STRING: >> quota=derek=user >> 2015-10-28 23:03:17.139401 7f443cfd1700 20 RGWEnv::set(): REMOTE_USER: >> 2015-10-28 23:03:17.139403 7f443cfd1700 20 RGWEnv::set(): SCRIPT_URI: >> /admin/user >> 2015-10-28 23:03:17.139408 7f443cfd1700 20 RGWEnv::set(): SERVER_PORT: 7480 >> 2015-10-28 23:03:17.139409 7f443cfd1700 20 HTTP_ACCEPT=*/* >> 2015-10-28 23:03:17.139410 7f443cfd1700 20 HTTP_ACCEPT_ENCODING=gzip, >> deflate >> 2015-10-28 23:03:17.139411 7f443cfd1700 20 HTTP_AUTHORIZATION=AWS >> RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= >> 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_CONNECTION=Keep-Alive >> 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_DATE=Thu, 29 Oct 2015 >> 03:03:17 GMT >> 2015-10-28 23:03:17.139413 7f443cfd1700 20 HTTP_HOST=localhost:7480 >> 2015-10-28 23:03:17.139413 7f443cfd1700 20 >> HTTP_USER_AGENT=python-requests/2.7.0 CPython/2.7.5 >> Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:17.139414 7f443cfd1700 20 HTTP_X_FORWARDED_FOR=128.8.132.4 >> 2015-10-28 23:03:17.139415
Re: [ceph-users] Missing bucket
On Fri, Nov 13, 2015 at 12:53 PM, Łukasz Jagiełło wrote: > Hi all, > > Recently I've noticed a problem with one of our buckets: > > I cannot list or stat a bucket: > #v+ > root@ceph-s1:~# radosgw-admin bucket stats --bucket=problematic_bucket > error getting bucket stats ret=-22 That's EINVAL, not ENOENT. It could mean lots of things, e.g., a radosgw-admin version mismatch vs. the version that the osds are running. Try to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit more info about the source of this error. > ➜ ~ s3cmd -c /etc/s3cmd/prod.cfg ls > s3://problematic_bucket/images/e/e0/file.png > ERROR: S3 error: None > #v- > > ,but a direct request for an object works perfectly fine: > #v+ > ➜ ~ curl -svo /dev/null > http://ceph-s1/problematic_bucket/images/e/e0/file.png > […] > < HTTP/1.1 200 OK > < Content-Type: image/png > < Content-Length: 379906 > […] > #v- > > Any solution how to fix it? We're still running ceph 0.67.11 > You're really behind. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Missing bucket
On Fri, Nov 13, 2015 at 1:37 PM, Łukasz Jagiełłowrote: >> >> > Recently I've noticed a problem with one of our buckets: >> >> > >> >> > I cannot list or stats on a bucket: >> >> > #v+ >> >> > root@ceph-s1:~# radosgw-admin bucket stats >> >> > --bucket=problematic_bucket >> >> > error getting bucket stats ret=-22 >> >> >> >> That's EINVAL, not ENOENT. It could mean lot's of things, e.g., >> >> radosgw-admin version mismatch vs. version that osds are running. Try >> >> to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit >> >> more info about the source of this error. >> > >> > >> > https://gist.github.com/ljagiello/06a4dd1f34a776e38f77 >> > >> > Result of more verbose debug. >> > >> 2015-11-13 21:10:19.160420 7fd9f91be7c0 1 -- 10.8.68.78:0/1007616 --> >> 10.8.42.35:6800/26514 -- osd_op(client.44897323.0:30 >> .dir.default.5457.9 [call rgw.bucket_list] 16.2f979b1a e172956) v4 -- >> ?+0 0x15f3740 con 0x15daa60 >> 2015-11-13 21:10:19.161058 7fd9ef8a7700 1 -- 10.8.68.78:0/1007616 <== >> osd.12 10.8.42.35:6800/26514 6 osd_op_reply(30 >> .dir.default.5457.9 [call] ondisk = -22 (Invalid argument)) v4 >> 118+0+0 (3885840820 0 0) 0x7fd9c8000d50 con 0x15daa60 >> error getting bucket stats ret=-22 >> >> You can try taking a look at osd.12 logs. Any chance osd.12 and >> radosgw-admin aren't running the same major version? (more likely >> radosgw-admin running a newer version). > > > From last 12h it's just deep-scrub info > #v+ > 2015-11-13 08:23:00.690076 7fc4c62ee700 0 log [INF] : 15.621 deep-scrub ok > #v- This is unrelated. > > But yesterday there was a big rebalance and a host with that osd was > rebuilding from scratch. > > We're running the same version (ceph, rados) across entire cluster just > double check it. > what does 'radosgw-admin --version' return? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Missing bucket
On Fri, Nov 13, 2015 at 1:14 PM, Łukasz Jagiełło <jagiello.luk...@gmail.com> wrote: > On Fri, Nov 13, 2015 at 1:07 PM, Yehuda Sadeh-Weinraub <yeh...@redhat.com> > wrote: >> >> > Recently I've noticed a problem with one of our buckets: >> > >> > I cannot list or stats on a bucket: >> > #v+ >> > root@ceph-s1:~# radosgw-admin bucket stats --bucket=problematic_bucket >> > error getting bucket stats ret=-22 >> >> That's EINVAL, not ENOENT. It could mean lot's of things, e.g., >> radosgw-admin version mismatch vs. version that osds are running. Try >> to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit >> more info about the source of this error. > > > https://gist.github.com/ljagiello/06a4dd1f34a776e38f77 > > Result of more verbose debug. > 2015-11-13 21:10:19.160420 7fd9f91be7c0 1 -- 10.8.68.78:0/1007616 --> 10.8.42.35:6800/26514 -- osd_op(client.44897323.0:30 .dir.default.5457.9 [call rgw.bucket_list] 16.2f979b1a e172956) v4 -- ?+0 0x15f3740 con 0x15daa60 2015-11-13 21:10:19.161058 7fd9ef8a7700 1 -- 10.8.68.78:0/1007616 <== osd.12 10.8.42.35:6800/26514 6 osd_op_reply(30 .dir.default.5457.9 [call] ondisk = -22 (Invalid argument)) v4 118+0+0 (3885840820 0 0) 0x7fd9c8000d50 con 0x15daa60 error getting bucket stats ret=-22 You can try taking a look at osd.12 logs. Any chance osd.12 and radosgw-admin aren't running the same major version? (more likely radosgw-admin running a newer version). >> >> You're really behind. > > > I know, we've got scheduled update for 2016 it's a big project to ensure > everything is fine. > > -- > Łukasz Jagiełło > lukaszjagielloorg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw keystone accepted roles not matching
On Thu, Oct 15, 2015 at 8:34 AM, Mike Lowe wrote: > I’m having some trouble with radosgw and keystone integration; I always get > the following error: > > user does not hold a matching role; required roles: Member,user,_member_,admin > > Despite my token clearly having one of the roles: > > "user": { > "id": "401375297eb540bbb1c32432439827b0", > "name": "jomlowe", > "roles": [ > { > "id": "8adcf7413cd3469abe4ae13cf259be6e", > "name": "user" > } > ], > "roles_links": [], > "username": "jomlowe" > } > > Does anybody have any hints? Does the user have these roles assigned in keystone? Yehuda
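The "required roles" list in that error is the gateway's configured accepted-roles set. A hedged ceph.conf sketch (the section name is an assumption; adjust it to your gateway's id):

```ini
[client.radosgw.gateway]
# Comma-separated keystone roles the gateway will accept; the error
# above prints exactly this list, so a token carrying "user" should match.
rgw keystone accepted roles = Member,user,_member_,admin
```

If the token shown really carries "user", the next thing to check is whether the role is assigned in the keystone project the request is scoped to.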
Re: [ceph-users] How to setup Ceph radosgw to support multi-tenancy?
On Thu, Oct 8, 2015 at 1:55 PM, Christian Sarrasinwrote: > After discovering this excellent blog post [1], I thought that taking > advantage of users' "default_placement" feature would be a preferable way to > achieve my multi-tenancy requirements (see previous post). > > Alas I seem to be hitting a snag. Any attempt to create a bucket with a user > setup with a non-empty default_placement results in a 400 error thrown back > to the client and the following msg in the radosgw logs: > > "could not find placement rule placement-user2 within region" > > (The pools exist, I reloaded the radosgw service and ran 'radosgw-admin > regionmap update' as suggested in the blog post before running the client > test) > > Here's the setup. What am I doing wrong? Any insight is really > appreciated! Not sure. Did you run 'radosgw-admin regionmap update'? > > radosgw-admin region get > { "name": "default", > "api_name": "", > "is_master": "true", > "endpoints": [], > "master_zone": "", > "zones": [ > { "name": "default", > "endpoints": [], > "log_meta": "false", > "log_data": "false"}], > "placement_targets": [ > { "name": "default-placement", > "tags": []}, > { "name": "placement-user2", > "tags": []}], > "default_placement": "default-placement"} > > radosgw-admin zone get default > { "domain_root": ".rgw", > "control_pool": ".rgw.control", > "gc_pool": ".rgw.gc", > "log_pool": ".log", > "intent_log_pool": ".intent-log", > "usage_log_pool": ".usage", > "user_keys_pool": ".users", > "user_email_pool": ".users.email", > "user_swift_pool": ".users.swift", > "user_uid_pool": ".users.uid", > "system_key": { "access_key": "", > "secret_key": ""}, > "placement_pools": [ > { "key": "default-placement", > "val": { "index_pool": ".rgw.buckets.index", > "data_pool": ".rgw.buckets", > "data_extra_pool": ".rgw.buckets.extra"}}, > { "key": "placement-user2", > "val": { "index_pool": ".rgw.index.user2", > "data_pool": ".rgw.buckets.user2", > "data_extra_pool": ".rgw.buckets.extra"}}]} > > 
radosgw-admin user info --uid=user2 > { "user_id": "user2", > "display_name": "User2", > "email": "", > "suspended": 0, > "max_buckets": 1000, > "auid": 0, > "subusers": [], > "keys": [ > { "user": "user2", > "access_key": "VYM2EEU1X5H6Y82D0K4F", > "secret_key": "vEeJ9+yadvtqZrb2xoCAEuM2AlVyZ7UTArbfIEek"}], > "swift_keys": [], > "caps": [], > "op_mask": "read, write, delete", > "default_placement": "placement-user2", > "placement_tags": [], > "bucket_quota": { "enabled": false, > "max_size_kb": -1, > "max_objects": -1}, > "user_quota": { "enabled": false, > "max_size_kb": -1, > "max_objects": -1}, > "temp_url_keys": []} > > [1] http://cephnotes.ksperis.com/blog/2014/11/28/placement-pools-on-rados-gw > > > On 03/10/15 19:48, Christian Sarrasin wrote: >> >> What are the best options to setup the Ceph radosgw so it supports >> separate/independent "tenants"? What I'm after: >> >> 1. Ensure isolation between tenants, ie: no overlap/conflict in bucket >> namespace; something separate radosgw "users" doesn't achieve >> 2. Ability to backup/restore tenants' pools individually >> >> Referring to the docs [1], it seems this could possibly be achieved with >> zones; one zone per tenant and leave out synchronization. Seems a little >> heavy handed and presumably the overhead is non-negligible. >> >> Is this "supported"? Is there a better way? >> >> I'm running Firefly. I'm also rather new to Ceph so apologies if this is >> already covered somewhere; kindly send pointers if so... >> >> Cheers, >> Christian >> >> PS: cross-posted from [2] >> >> [1] http://docs.ceph.com/docs/v0.80/radosgw/federated-config/ >> [2] >> >> http://serverfault.com/questions/726491/how-to-setup-ceph-radosgw-to-support-multi-tenancy >> > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
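A "could not find placement rule ... within region" error usually means the region's placement_targets and the zone's placement_pools disagree. A small sketch that cross-checks the two, using abbreviated JSON like the output above:

```python
import json

# Abbreviated versions of the 'region get' / 'zone get' output above.
region = json.loads('{"placement_targets": ['
                    '{"name": "default-placement", "tags": []},'
                    '{"name": "placement-user2", "tags": []}]}')
zone = json.loads('{"placement_pools": ['
                  '{"key": "default-placement"},'
                  '{"key": "placement-user2"}]}')

targets = {t["name"] for t in region["placement_targets"]}
pools = {p["key"] for p in zone["placement_pools"]}
missing = targets - pools  # placement targets with no backing pools
print(sorted(missing))     # an empty list means the zone covers every target
```

In this setup both sides list placement-user2, so the mismatch has to be elsewhere (e.g., in which region map the running gateway actually loaded).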
Re: [ceph-users] How to setup Ceph radosgw to support multi-tenancy?
When you start radosgw, do you explicitly state the name of the region that gateway belongs to? On Thu, Oct 8, 2015 at 2:19 PM, Christian Sarrasin <c.n...@cleansafecloud.com> wrote: > Hi Yehuda, > > Yes I did run "radosgw-admin regionmap update" and the regionmap appears to > know about my custom placement_target. Any other idea? > > Thanks a lot > Christian > > radosgw-admin region-map get > { "regions": [ > { "key": "default", > "val": { "name": "default", > "api_name": "", > "is_master": "true", > "endpoints": [], > "master_zone": "", > "zones": [ > { "name": "default", > "endpoints": [], > "log_meta": "false", > "log_data": "false"}], > "placement_targets": [ > { "name": "default-placement", > "tags": []}, > { "name": "placement-user2", > "tags": []}], > "default_placement": "default-placement"}}], > "master_region": "default", > "bucket_quota": { "enabled": false, > "max_size_kb": -1, > "max_objects": -1}, > "user_quota": { "enabled": false, > "max_size_kb": -1, > "max_objects": -1}} > > On 08/10/15 23:02, Yehuda Sadeh-Weinraub wrote: > >>> Here's the setup. What am I doing wrong? Any insight is really >>> appreciated! >> >> >> Not sure. Did you run 'radosgw-admin regionmap update'? > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
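Yehuda's question matters because the gateway only consults the intended region map entry if it knows which region and zone it belongs to. A hedged ceph.conf sketch (firefly-era option names; the section name is assumed):

```ini
[client.radosgw.gateway]
rgw region = default
rgw zone = default
# After 'radosgw-admin regionmap update', restart the gateway so it
# re-reads the placement targets.
```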
Re: [ceph-users] S3:Permissions of access-key
On Fri, Aug 28, 2015 at 2:17 AM, Zhengqiankun zheng.qian...@h3c.com wrote: Hi Yehuda, I have a question and hope that you can help me answer it. A swift subuser can be given specific permissions, so why can't specific permissions be set for an S3 access key? Probably because no one ever asked for it. It shouldn't be hard to do; it sounds like an easy starter project if anyone wants to get their hands dirty in the rgw code. Note that the canonical way to do it in S3 is through user policies, which we don't (yet?) support. Yehuda
Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3
As long as you're 100% sure that the prefix is only being used for the specific bucket that was previously removed, then it is safe to remove these objects. But please do double check and make sure that there's no other bucket that matches this prefix somehow. Yehuda On Mon, Aug 31, 2015 at 2:42 PM, Ben Hineswrote: > No input, eh? (or maybe TL,DR for everyone) > > Short version: Presuming the bucket index shows blank/empty, which it > does and is fine, would me manually deleting the rados objects with > the prefix matching the former bucket's ID cause any problems? > > thanks, > > -Ben > > On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines wrote: >> Ceph 0.93->94.2->94.3 >> >> I noticed my pool used data amount is about twice the bucket used data count. >> >> This bucket was emptied long ago. It has zero objects: >> "globalcache01", >> { >> "bucket": "globalcache01", >> "pool": ".rgw.buckets", >> "index_pool": ".rgw.buckets.index", >> "id": "default.8873277.32", >> "marker": "default.8873277.32", >> "owner": "...", >> "ver": "0#12348839", >> "master_ver": "0#0", >> "mtime": "2015-03-08 11:44:11.00", >> "max_marker": "0#", >> "usage": { >> "rgw.none": { >> "size_kb": 0, >> "size_kb_actual": 0, >> "num_objects": 0 >> }, >> "rgw.main": { >> "size_kb": 0, >> "size_kb_actual": 0, >> "num_objects": 0 >> } >> }, >> "bucket_quota": { >> "enabled": false, >> "max_size_kb": -1, >> "max_objects": -1 >> } >> }, >> >> >> >> bucket check shows nothing: >> >> 16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check >> --bucket=globalcache01 --fix >> [] >> 16:07:27 root@sm-cephrgw4 ~ $ radosgw-admin bucket check >> --check-head-obj-locator --bucket=globalcache01 --fix >> { >> "bucket": "globalcache01", >> "check_objects": [ >> ] >> } >> >> >> However, i see a lot of data for it on an OSD (all shadow files with >> escaped underscores) >> >> [root@sm-cld-mtl-008 current]# find . 
-name default.8873277.32* -print >> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/default.8873277.32\u\ushadow\u.Tos2Ms8w2BiEG7YJAZeE6zrrc\uwcHPN\u1__head_D886E961__c >> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_1/default.8873277.32\u\ushadow\u.Aa86mlEMvpMhRaTDQKHZmcxAReFEo2J\u1__head_4A71E961__c >> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_5/default.8873277.32\u\ushadow\u.KCiWEa4YPVaYw2FPjqvpd9dKTRBu8BR\u17__head_00B5E961__c >> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_8/default.8873277.32\u\ushadow\u.A2K\u2H1XKR8weiSwKGmbUlsCmEB9GDF\u32__head_42E8E961__c >> >> >> -bash-4.1$ rados -p .rgw.buckets ls | egrep '8873277\.32.+' >> default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47 >> default.8873277.32__shadow_.Wr_dGMxdSRHpoeu4gsQZXJ8t0I3JI7l_6 >> default.8873277.32__shadow_.WjijDxYhLFMUYdrMjeH7GvTL1LOwcqo_3 >> default.8873277.32__shadow_.3lRIhNePLmt1O8VVc2p5X9LtAVfdgUU_1 >> default.8873277.32__shadow_.VqF8n7PnmIm3T9UEhorD5OsacvuHOOy_16 >> default.8873277.32__shadow_.Jrh59XT01rIIyOdNPDjCwl5Pe1LDanp_2 >> >> >> Is there still a bug in the fix obj locator command perhaps? I suppose >> can just do something like: >> >>rados -p .rgw.buckets cleanup --prefix default.8873277.32 >> >> Since i want to destroy the bucket anyway, but if this affects other >> buckets, i may want to clean those a better way. >> >> -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3
Make sure you use the underscore also, e.g., "default.8873277.32_". Otherwise you could potentially erase objects you didn't intend to, such as ones that start with "default.8873277.320". On Mon, Aug 31, 2015 at 3:20 PM, Ben Hines <bhi...@gmail.com> wrote: > Ok. I'm not too familiar with the inner workings of RGW, but i would > assume that for a bucket with these parameters: > >"id": "default.8873277.32", >"marker": "default.8873277.32", > > That it would be the only bucket using the files that start with > "default.8873277.32" > > default.8873277.32__shadow_.OkYjjANx6-qJOrjvdqdaHev-LHSvPhZ_15 > default.8873277.32__shadow_.a2qU3qodRf_E5b9pFTsKHHuX2RUC12g_2 > > > > On Mon, Aug 31, 2015 at 2:51 PM, Yehuda Sadeh-Weinraub > <yeh...@redhat.com> wrote: >> As long as you're 100% sure that the prefix is only being used for the >> specific bucket that was previously removed, then it is safe to remove >> these objects. But please do double check and make sure that there's >> no other bucket that matches this prefix somehow. >> >> Yehuda >> >> On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines <bhi...@gmail.com> wrote: >>> No input, eh? (or maybe TL;DR for everyone) >>> >>> Short version: Presuming the bucket index shows blank/empty, which it >>> does and is fine, would my manually deleting the rados objects with >>> the prefix matching the former bucket's ID cause any problems? >>> >>> thanks, >>> >>> -Ben >>> >>> On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines <bhi...@gmail.com> wrote: >>>> Ceph 0.93->94.2->94.3 >>>> >>>> I noticed my pool used data amount is about twice the bucket used data >>>> count. >>>> >>>> This bucket was emptied long ago.
It has zero objects: >>>> "globalcache01", >>>> { >>>> "bucket": "globalcache01", >>>> "pool": ".rgw.buckets", >>>> "index_pool": ".rgw.buckets.index", >>>> "id": "default.8873277.32", >>>> "marker": "default.8873277.32", >>>> "owner": "...", >>>> "ver": "0#12348839", >>>> "master_ver": "0#0", >>>> "mtime": "2015-03-08 11:44:11.00", >>>> "max_marker": "0#", >>>> "usage": { >>>> "rgw.none": { >>>> "size_kb": 0, >>>> "size_kb_actual": 0, >>>> "num_objects": 0 >>>> }, >>>> "rgw.main": { >>>> "size_kb": 0, >>>> "size_kb_actual": 0, >>>> "num_objects": 0 >>>> } >>>> }, >>>> "bucket_quota": { >>>> "enabled": false, >>>> "max_size_kb": -1, >>>> "max_objects": -1 >>>> } >>>> }, >>>> >>>> >>>> >>>> bucket check shows nothing: >>>> >>>> 16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check >>>> --bucket=globalcache01 --fix >>>> [] >>>> 16:07:27 root@sm-cephrgw4 ~ $ radosgw-admin bucket check >>>> --check-head-obj-locator --bucket=globalcache01 --fix >>>> { >>>> "bucket": "globalcache01", >>>> "check_objects": [ >>>> ] >>>> } >>>> >>>> >>>> However, i see a lot of data for it on an OSD (all shadow files with >>>> escaped underscores) >>>> >>>> [root@sm-cld-mtl-008 current]# find . 
-name default.8873277.32* -print >>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/default.8873277.32\u\ushadow\u.Tos2Ms8w2BiEG7YJAZeE6zrrc\uwcHPN\u1__head_D886E961__c >>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_1/default.8873277.32\u\ushadow\u.Aa86mlEMvpMhRaTDQKHZmcxAReFEo2J\u1__head_4A71E961__c >>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_5/default.8873277.32\u\ushadow\u.KCiWEa4YPVaYw2FPjqvpd9dKTRBu8BR\u17__head_00B5E961__c >>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_8/default.8873277.32\u\ushadow\u.A2K\u2H1XKR8weiSwKGmbUlsCmEB9GDF\u32__head_42E8E961__c >>>> >>>> >>>> -bash-4.1$ rados -p .rgw.buckets ls | egrep '8873277\.32.+' >>>> default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47 >>>> default.8873277.32__shadow_.Wr_dGMxdSRHpoeu4gsQZXJ8t0I3JI7l_6 >>>> default.8873277.32__shadow_.WjijDxYhLFMUYdrMjeH7GvTL1LOwcqo_3 >>>> default.8873277.32__shadow_.3lRIhNePLmt1O8VVc2p5X9LtAVfdgUU_1 >>>> default.8873277.32__shadow_.VqF8n7PnmIm3T9UEhorD5OsacvuHOOy_16 >>>> default.8873277.32__shadow_.Jrh59XT01rIIyOdNPDjCwl5Pe1LDanp_2 >>>> >>>> >>>> Is there still a bug in the fix obj locator command perhaps? I suppose >>>> can just do something like: >>>> >>>>rados -p .rgw.buckets cleanup --prefix default.8873277.32 >>>> >>>> Since i want to destroy the bucket anyway, but if this affects other >>>> buckets, i may want to clean those a better way. >>>> >>>> -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
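The underscore caveat from this thread can be made mechanical. A sketch of filtering a 'rados ls' listing down to exactly one bucket's objects (the marker value is taken from the thread; the second object name is hypothetical):

```python
MARKER = "default.8873277.32"

def belongs_to_bucket(obj_name, marker):
    # The trailing underscore is essential: a plain startswith(marker)
    # would also match "default.8873277.320..." from a different bucket.
    return obj_name.startswith(marker + "_")

names = [
    "default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47",
    "default.8873277.320__shadow_.objectFromAnotherBucket_1",  # hypothetical
]
print([n for n in names if belongs_to_bucket(n, MARKER)])  # only the first matches
```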
Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3
The bucket index objects are most likely in the .rgw.buckets.index pool. Yehuda On Mon, Aug 31, 2015 at 3:27 PM, Ben Hines <bhi...@gmail.com> wrote: > Good call, thanks! > > Is there any risk of also deleting parts of the bucket index? I'm not > sure what the objects for the index itself look like, or if they are > in the .rgw.buckets pool. > > > On Mon, Aug 31, 2015 at 3:23 PM, Yehuda Sadeh-Weinraub > <yeh...@redhat.com> wrote: >> Make sure you use the underscore also, e.g., "default.8873277.32_". >> Otherwise you could potentially erase objects you did't intend to, >> like ones who start with "default.8873277.320" and such. >> >> On Mon, Aug 31, 2015 at 3:20 PM, Ben Hines <bhi...@gmail.com> wrote: >>> Ok. I'm not too familiar with the inner workings of RGW, but i would >>> assume that for a bucket with these parameters: >>> >>>"id": "default.8873277.32", >>>"marker": "default.8873277.32", >>> >>> Tha it would be the only bucket using the files that start with >>> "default.8873277.32" >>> >>> default.8873277.32__shadow_.OkYjjANx6-qJOrjvdqdaHev-LHSvPhZ_15 >>> default.8873277.32__shadow_.a2qU3qodRf_E5b9pFTsKHHuX2RUC12g_2 >>> >>> >>> >>> On Mon, Aug 31, 2015 at 2:51 PM, Yehuda Sadeh-Weinraub >>> <yeh...@redhat.com> wrote: >>>> As long as you're 100% sure that the prefix is only being used for the >>>> specific bucket that was previously removed, then it is safe to remove >>>> these objects. But please do double check and make sure that there's >>>> no other bucket that matches this prefix somehow. >>>> >>>> Yehuda >>>> >>>> On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines <bhi...@gmail.com> wrote: >>>>> No input, eh? (or maybe TL,DR for everyone) >>>>> >>>>> Short version: Presuming the bucket index shows blank/empty, which it >>>>> does and is fine, would me manually deleting the rados objects with >>>>> the prefix matching the former bucket's ID cause any problems? 
>>>>> >>>>> thanks, >>>>> >>>>> -Ben >>>>> >>>>> On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines <bhi...@gmail.com> wrote: >>>>>> Ceph 0.93->94.2->94.3 >>>>>> >>>>>> I noticed my pool used data amount is about twice the bucket used data >>>>>> count. >>>>>> >>>>>> This bucket was emptied long ago. It has zero objects: >>>>>> "globalcache01", >>>>>> { >>>>>> "bucket": "globalcache01", >>>>>> "pool": ".rgw.buckets", >>>>>> "index_pool": ".rgw.buckets.index", >>>>>> "id": "default.8873277.32", >>>>>> "marker": "default.8873277.32", >>>>>> "owner": "...", >>>>>> "ver": "0#12348839", >>>>>> "master_ver": "0#0", >>>>>> "mtime": "2015-03-08 11:44:11.00", >>>>>> "max_marker": "0#", >>>>>> "usage": { >>>>>> "rgw.none": { >>>>>> "size_kb": 0, >>>>>> "size_kb_actual": 0, >>>>>> "num_objects": 0 >>>>>> }, >>>>>> "rgw.main": { >>>>>> "size_kb": 0, >>>>>> "size_kb_actual": 0, >>>>>> "num_objects": 0 >>>>>> } >>>>>> }, >>>>>> "bucket_quota": { >>>>>> "enabled": false, >>>>>> "max_size_kb": -1, >>>>>> "max_objects": -1 >>>>>> } >>>>>> }, >>>>>> >>>>>> >>>>>> >>>>>> bucket check shows nothing: >>>>>> >>>>>> 16:07:09 root
Re: [ceph-users] Troubleshooting rgw bucket list
I assume you filtered the log by thread? I don't see the response messages. For the bucket check you can run radosgw-admin with --log-to-stderr. Can you also set 'debug objclass = 20' on the osds? You can do it by: $ ceph tell osd.\* injectargs --debug-objclass 20 Also, it'd be interesting to get the following: $ radosgw-admin bi list --bucket= --object=abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5 Thanks, Yehuda On Tue, Sep 1, 2015 at 10:44 AM, Sam Wouters <s...@ericom.be> wrote: > not sure where I can find the logs for the bucket check, I can't really > filter them out in the radosgw log. > > -Sam > > On 01-09-15 19:25, Sam Wouters wrote: >> It looks like it, this is what shows in the logs after bumping the debug >> and requesting a bucket list. >> >> 2015-09-01 17:14:53.008620 7fccb17ca700 10 cls_bucket_list >> aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1]) >> start >> abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[] >> num_entries 1 >> 2015-09-01 17:14:53.008629 7fccb17ca700 20 reading from >> .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 >> 2015-09-01 17:14:53.008636 7fccb17ca700 20 get_obj_state: >> rctx=0x7fccb17c84d0 >> obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1 >> state=0x7fcde01a4060 s->prefetch_data=0 >> 2015-09-01 17:14:53.008640 7fccb17ca700 10 cache get: >> name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit >> 2015-09-01 17:14:53.008645 7fccb17ca700 20 get_obj_state: s->obj_tag was >> set empty >> 2015-09-01 17:14:53.008647 7fccb17ca700 10 cache get: >> name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit >> 2015-09-01 17:14:53.008675 7fccb17ca700 1 -- 10.11.4.105:0/1109243 --> >> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435874 >> ... 
>> .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84 >> ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870 >> >> On 01-09-15 17:11, Yehuda Sadeh-Weinraub wrote: >>> Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the >>> operations (bucket listing and bucket check) go into some kind of >>> infinite loop? >>> >>> Yehuda >>> >>> On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouters <s...@ericom.be> wrote: >>>> Hi, I've started the bucket --check --fix on friday evening and it's >>>> still running. 'ceph -s' shows the cluster health as OK, I don't know if >>>> there is anything else I could check? Is there a way of finding out if >>>> its actually doing something? >>>> >>>> We only have this issue on the one bucket with versioning enabled, I >>>> can't get rid of the feeling it has something todo with that. The >>>> "underscore bug" is also still present on that bucket >>>> (http://tracker.ceph.com/issues/12819). Not sure if thats related in any >>>> way. >>>> Are there any alternatives, as for example copy all the objects into a >>>> new bucket without versioning? Simple way would be to list the objects >>>> and copy them to a new bucket, but bucket listing is not working so... >>>> >>>> -Sam >>>> >>>> >>>> On 31-08-15 10:47, Gregory Farnum wrote: >>>>> This generally shouldn't be a problem at your bucket sizes. Have you >>>>> checked that the cluster is actually in a healthy state? The sleeping >>>>> locks are normal but should be getting woken up; if they aren't it >>>>> means the object access isn't working for some reason. A down PG or >>>>> something would be the simplest explanation. >>>>> -Greg >>>>> >>>>> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters <s...@ericom.be> wrote: >>>>>> Ok, maybe I'm to impatient. It would be great if there were some verbose >>>>>> or progress logging of the radosgw-admin tool. >>>>>> I will start a check and let it run over the weekend. 
>>>>>> >>>>>> tnx, >>>>>> Sam >>>>>> >>>>>> On 28-08-15 18:16, Sam Wouters wrote: >>>>>>> Hi, >>>>>>> >>>>>>> this bucket only has 13389 objects, so the index size shouldn't be a >>>>>>> problem. Also, on the same cluster we have an other bucket with 1200543 >>>>>>> objects (but no versioning configured), which has no issues. >>
Re: [ceph-users] Troubleshooting rgw bucket list
Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the operations (bucket listing and bucket check) go into some kind of infinite loop? Yehuda On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouterswrote: > Hi, I've started the bucket --check --fix on friday evening and it's > still running. 'ceph -s' shows the cluster health as OK, I don't know if > there is anything else I could check? Is there a way of finding out if > its actually doing something? > > We only have this issue on the one bucket with versioning enabled, I > can't get rid of the feeling it has something todo with that. The > "underscore bug" is also still present on that bucket > (http://tracker.ceph.com/issues/12819). Not sure if thats related in any > way. > Are there any alternatives, as for example copy all the objects into a > new bucket without versioning? Simple way would be to list the objects > and copy them to a new bucket, but bucket listing is not working so... > > -Sam > > > On 31-08-15 10:47, Gregory Farnum wrote: >> This generally shouldn't be a problem at your bucket sizes. Have you >> checked that the cluster is actually in a healthy state? The sleeping >> locks are normal but should be getting woken up; if they aren't it >> means the object access isn't working for some reason. A down PG or >> something would be the simplest explanation. >> -Greg >> >> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters wrote: >>> Ok, maybe I'm to impatient. It would be great if there were some verbose >>> or progress logging of the radosgw-admin tool. >>> I will start a check and let it run over the weekend. >>> >>> tnx, >>> Sam >>> >>> On 28-08-15 18:16, Sam Wouters wrote: Hi, this bucket only has 13389 objects, so the index size shouldn't be a problem. Also, on the same cluster we have an other bucket with 1200543 objects (but no versioning configured), which has no issues. when we run a radosgw-admin bucket --check (--fix), nothing seems to be happening. 
Putting an strace on the process shows a lot of lines like these: [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 [pid 99385] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) [pid 99371] <... futex resumed> ) = 0 but no errors in the ceph logs or health warnings. r, Sam On 28-08-15 17:49, Ben Hines wrote: > How many objects in the bucket? > > RGW has problems with index size once number of objects gets into the > 90+ level. The buckets need to be recreated with 'sharded bucket > indexes' on: > > rgw override bucket index max shards = 23 > > You could also try repairing the index with: > > radosgw-admin bucket check --fix --bucket= > > -Ben > > On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters wrote: >> Hi, >> >> we have a rgw bucket (with versioning) where PUT and GET operations for >> specific objects succeed, but retrieving an object list fails. >> Using python-boto, after a timeout just gives us an 500 internal error; >> radosgw-admin just hangs. >> Also a radosgw-admin bucket check just seems to hang... >> >> ceph version is 0.94.3 but this also was happening with 0.94.2, we >> quietly hoped upgrading would fix but it didn't... >> >> r, >> Sam >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
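The sharding option Ben mentions goes in ceph.conf; a sketch (section name assumed). Note that it only applies to buckets created after the change — existing bucket indexes keep their original layout:

```ini
[client.radosgw.gateway]
rgw override bucket index max shards = 23
```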
Re: [ceph-users] How to observe civetweb.
You can increase the civetweb logs by adding 'debug civetweb = 10' in your ceph.conf. The output will go into the rgw logs. Yehuda On Tue, Sep 8, 2015 at 2:24 AM, Vickie ch wrote: > Dear cephers, > Just upgraded radosgw from apache to civetweb. > It's really simple to install and use. But I can't find any parameters or > logs to adjust (or observe) civetweb (like the apache log). I'm really confused. > Any ideas? > > > Best wishes, > Mika
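As a ceph.conf sketch of the setting described above (section name assumed):

```ini
[client.radosgw.gateway]
debug civetweb = 10   # civetweb messages then show up in the rgw log
```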
Re: [ceph-users] radosgw and keystone version 3 domains
At the moment radosgw just doesn't support v3 (so it seems). I created issue #13303. If anyone wants to pick this up (or provide some information as to what it would require to support that) it would be great. Thanks, Yehuda On Wed, Sep 30, 2015 at 3:32 AM, Robert Duncanwrote: > Yes, but it always results in 401 from horizon and cli > > swift --debug --os-auth-url http://172.25.60.2:5000/v3 --os-username ldapuser > --os-user-domain-name ldapdomain --os-project-name someproject > --os-project-domain-name ldapdomain --os-password password123 -V 3 post > containerV3 > DEBUG:keystoneclient.auth.identity.v3:Making authentication request to > http://172.25.60.2:5000/v3/auth/tokens > INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2 > DEBUG:urllib3.connectionpool:Setting read timeout to None > DEBUG:urllib3.connectionpool:"POST /v3/auth/tokens HTTP/1.1" 201 8366 > DEBUG:iso8601.iso8601:Parsed 2015-09-30T11:20:46.053177Z into {'tz_sign': > None, 'second_fraction': u'053177', 'hour': u'11', 'daydash': u'30', > 'tz_hour': None, 'month': None, 'timezone': u'Z', 'second': u'46', > 'tz_minute': None, 'year': u'2015', 'separator': u'T', 'monthdash': u'09', > 'day': None, 'minute': u'20'} with default timezone object at 0x1736f50> > DEBUG:iso8601.iso8601:Got u'2015' for 'year' with default None > DEBUG:iso8601.iso8601:Got u'09' for 'monthdash' with default None > DEBUG:iso8601.iso8601:Got 9 for 'month' with default 9 > DEBUG:iso8601.iso8601:Got u'30' for 'daydash' with default None > DEBUG:iso8601.iso8601:Got 30 for 'day' with default 30 > DEBUG:iso8601.iso8601:Got u'11' for 'hour' with default None > DEBUG:iso8601.iso8601:Got u'20' for 'minute' with default None > DEBUG:iso8601.iso8601:Got u'46' for 'second' with default None > INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2 > DEBUG:urllib3.connectionpool:Setting read timeout to 0x7f193dc590b0> > DEBUG:urllib3.connectionpool:"POST /swift/v1/containerV3 HTTP/1.1" 401 None > 
INFO:swiftclient:REQ: curl -i http://172.25.60.2:8080/swift/v1/containerV3 -X > POST -H "Content-Length: 0" -H "X-Auth-Token: > 30fd924774bf480d8814c61c7fdf128e" > INFO:swiftclient:RESP STATUS: 401 Unauthorized > INFO:swiftclient:RESP HEADERS: [('content-encoding', 'gzip'), > ('transfer-encoding', 'chunked'), ('accept-ranges', 'bytes'), ('vary', > 'Accept-Encoding'), ('server', 'Apache/2.2.15 (CentOS)'), ('date', 'Wed, 30 > Sep 2015 10:20:46 GMT'), ('content-type', 'text/plain; charset=utf-8')] > INFO:swiftclient:RESP BODY: AccessDenied > > DEBUG:keystoneclient.auth.identity.v3:Making authentication request to > http://172.25.60.2:5000/v3/auth/tokens > INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2 > DEBUG:urllib3.connectionpool:Setting read timeout to None > DEBUG:urllib3.connectionpool:"POST /v3/auth/tokens HTTP/1.1" 201 8366 > DEBUG:iso8601.iso8601:Parsed 2015-09-30T11:20:47.839422Z into {'tz_sign': > None, 'second_fraction': u'839422', 'hour': u'11', 'daydash': u'30', > 'tz_hour': None, 'month': None, 'timezone': u'Z', 'second': u'47', > 'tz_minute': None, 'year': u'2015', 'separator': u'T', 'monthdash': u'09', > 'day': None, 'minute': u'20'} with default timezone object at 0x1736f50> > DEBUG:iso8601.iso8601:Got u'2015' for 'year' with default None > DEBUG:iso8601.iso8601:Got u'09' for 'monthdash' with default None > DEBUG:iso8601.iso8601:Got 9 for 'month' with default 9 > DEBUG:iso8601.iso8601:Got u'30' for 'daydash' with default None > DEBUG:iso8601.iso8601:Got 30 for 'day' with default 30 > DEBUG:iso8601.iso8601:Got u'11' for 'hour' with default None > DEBUG:iso8601.iso8601:Got u'20' for 'minute' with default None > DEBUG:iso8601.iso8601:Got u'47' for 'second' with default None > INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2 > DEBUG:urllib3.connectionpool:Setting read timeout to 0x7f193dc590b0> > DEBUG:urllib3.connectionpool:"POST /swift/v1/containerV3 HTTP/1.1" 401 None > INFO:swiftclient:REQ: curl -i 
http://172.25.60.2:8080/swift/v1/containerV3 -X > POST -H "Content-Length: 0" -H "X-Auth-Token: > fc7bb4a07baf41058546d8a85b2cd2b8" > INFO:swiftclient:RESP STATUS: 401 Unauthorized > INFO:swiftclient:RESP HEADERS: [('content-encoding', 'gzip'), > ('transfer-encoding', 'chunked'), ('accept-ranges', 'bytes'), ('vary', > 'Accept-Encoding'), ('server', 'Apache/2.2.15 (CentOS)'), ('date', 'Wed, 30 > Sep 2015 10:20:47 GMT'), ('content-type', 'text/plain; charset=utf-8')] > INFO:swiftclient:RESP BODY: AccessDenied > > ERROR:swiftclient:Container POST failed: > http://172.25.60.2:8080/swift/v1/containerV3 401 Unauthorized AccessDenied > Traceback (most recent call last): > File "/usr/lib/python2.6/site-packages/swiftclient/client.py", line 1243, > in _retry > rv = func(self.url, self.token, *args, **kwargs) > File "/usr/lib/python2.6/site-packages/swiftclient/client.py", line 771, in > post_container > http_response_content=body)
Re: [ceph-users] radosgw Storage policies
On Mon, Sep 28, 2015 at 4:00 AM, Luis Periquito wrote: > Hi All, > > I was hearing the ceph talk about radosgw and Yehuda talks about storage > policies. I started looking for it in the documentation, on how to > implement/use and couldn't find much information: > http://docs.ceph.com/docs/master/radosgw/s3/ says it doesn't currently > support it, and http://docs.ceph.com/docs/master/radosgw/swift/ doesn't > mention it. > > From the release notes it seems to be for the swift interface, not S3. Is > this correct? Can we create them for S3 interface, or only Swift? > > You can create buckets in both swift and s3 that utilize this feature. You need to define different placement targets in the zone configuration. In S3, when you create a bucket, you need to specify a location constraint that specifies this policy. The location constraint should be specified as follows: [region][:policy]. So if you're creating a bucket in the current region using your 'gold' policy that you defined, you'll need to set it to ':gold'. In Swift, the API requires sending it through a special HTTP header (X-Storage-Policy). Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
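As a concrete sketch of the [region][:policy] convention described above — the 'gold' policy name is only an example and must match a placement target actually defined in the zone configuration — a client-side helper might build the request pieces like this:

```python
# Sketch (not rgw code): composing the S3 location constraint and the
# Swift header used to select a placement policy at bucket creation.
# Policy/region names here are hypothetical.

def s3_location_constraint(region="", policy=""):
    """Build the LocationConstraint value in the form '[region][:policy]'."""
    return "%s:%s" % (region, policy) if policy else region

def s3_create_bucket_body(region="", policy=""):
    """XML body a client could send with an S3 PUT-bucket request."""
    return (
        "<CreateBucketConfiguration>"
        "<LocationConstraint>%s</LocationConstraint>"
        "</CreateBucketConfiguration>" % s3_location_constraint(region, policy)
    )

def swift_create_container_headers(policy):
    """Headers for a Swift PUT-container request selecting a policy."""
    return {"X-Storage-Policy": policy}

# Bucket in the current region using the 'gold' policy:
print(s3_location_constraint(policy="gold"))   # -> :gold
print(swift_create_container_headers("gold"))
```

The same strings can then be fed to whatever S3/Swift client library is in use.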
Re: [ceph-users] Rados gateway / no socket server point defined
On Thu, Sep 24, 2015 at 8:59 AM, Mikaël Guichardwrote: > Hi, > > I encounter this error : > >> /usr/bin/radosgw -d --keyring /etc/ceph/ceph.client.radosgw.keyring -n >> client.radosgw.myhost > 2015-09-24 17:41:18.223206 7f427f074880 0 ceph version 0.94.3 > (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process radosgw, pid 4570 > 2015-09-24 17:41:18.349037 7f427f074880 0 framework: fastcgi > 2015-09-24 17:41:18.349044 7f427f074880 0 framework: civetweb > 2015-09-24 17:41:18.349048 7f427f074880 0 framework conf key: port, val: > 7480 > 2015-09-24 17:41:18.349056 7f427f074880 0 starting handler: civetweb > 2015-09-24 17:41:18.351852 7f427f074880 0 starting handler: fastcgi > 2015-09-24 17:41:18.351921 7f41fc7a0700 0 ERROR: no socket server point > defined, cannot start fcgi frontend > > I can force the socket file with the followed option and it works : > --rgw-socket-path=/var/run/ceph/ceph.radosgw.gateway.fastcgi.sock > but why the ceph.conf parameter is ignored ? > > I look in the radosgw code, it should work : > > conf->get_val("socket_path", "", _path); > conf->get_val("socket_port", g_conf->rgw_port, _port); > conf->get_val("socket_host", g_conf->rgw_host, _host); > > if (socket_path.empty() && socket_port.empty() && socket_host.empty()) { > socket_path = g_conf->rgw_socket_path; > if (socket_path.empty()) { > dout(0) << "ERROR: no socket server point defined, cannot start fcgi > frontend" << dendl; > return; > } > } > > > > My ceph.conf content : > > [client.radosgw.gateway] You're using a different user for starting rgw (client.radosgw.myhost), so this config section doesn't get used. Either rename this section, or use the client.radosgw.gateway user. 
Yehuda > host = myhost > keyring = /etc/ceph/ceph.client.radosgw.keyring > rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock > rgw print continue = false > rgw enable usage log = true > rgw enable ops log = true > log file = /var/log/radosgw/client.radosgw.gateway.log > rgw usage log tick interval = 30 > rgw usage log flush threshold = 1024 > rgw usage max shards = 32 > rgw usage max user shards = 1 > > thanks for your response. > > regards > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
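To illustrate the fix suggested above, a minimal ceph.conf sketch (host, keyring, and socket path copied from the thread) — rgw only reads the section whose name matches the --name/-n it was started with, so pick one of the two options:

```ini
# Option 1: rename the section to match the daemon's startup name,
# so `radosgw -n client.radosgw.myhost` actually reads it:
[client.radosgw.myhost]
host = myhost
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

# Option 2: keep [client.radosgw.gateway] as-is and start the daemon
# with `radosgw -n client.radosgw.gateway` instead.
```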
Re: [ceph-users] s3cmd --disable-multipart
On Thu, Dec 10, 2015 at 11:10 AM, Deneau, Tom wrote: > If using s3cmd to radosgw and using s3cmd's --disable-multipart option, is > there any limit to the size of the object that can be stored thru radosgw? > rgw limits plain uploads to 5GB > Also, is there a recommendation for multipart chunk size for radosgw? > Having it as a multiple of the underlying rgw stripe size (default is 4MB) might be a good idea. Yehuda
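A small sketch of that sizing advice, assuming the default 4 MB stripe size; s3cmd takes the chunk size in MB via its --multipart-chunk-size-mb option:

```python
# Sketch: round a desired multipart chunk size down to a whole multiple
# of the rgw stripe size (4 MB by default), per the advice above.

RGW_STRIPE_MB = 4  # default rgw object stripe size, in MB

def aligned_chunk_mb(desired_mb, stripe_mb=RGW_STRIPE_MB):
    """Largest multiple of stripe_mb not exceeding desired_mb (min one stripe)."""
    return max(stripe_mb, (desired_mb // stripe_mb) * stripe_mb)

print(aligned_chunk_mb(15))  # -> 12
print(aligned_chunk_mb(3))   # -> 4
```

The result would then be passed as, e.g., `s3cmd put --multipart-chunk-size-mb=12 ...`.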
Re: [ceph-users] about federated gateway
On Sun, Dec 13, 2015 at 7:27 AM, 孙方臣 wrote: > Hi, All, > > I'm setting up federated gateway. One is master zone, the other is slave > zone. Radosgw-agent is running in slave zone. I have encountered some > problems, can anybody help answering this: > > 1. When putting an object to radosgw, two bilog entries are generated. One is in the > "pending" state, the other in the "complete" state. The entry should be ignored when > it is in the "pending" state, otherwise the same object will be copied > twice. I have a pull request that is at > https://github.com/ceph/radosgw-agent/pull/39, please give some suggestions > about it. > > 2. When "rgw_num_rados_handles" is set to 16, the radosgw-agent cannot > unlock, the error code is 404. The log follows: > .. > 2015-12-13 21:52:33,373 26594 [radosgw_agent.lock][WARNING] failed to unlock > shard 115 in zone zone-a: Http error code 404 content Not Found > .. > 2015-12-13 21:53:00,732 26594 [radosgw_agent.lock][ERROR ] locking shard 116 > in zone zone-a failed: Http error code 423 content > .. > > I can find the locker with the "rados lock info" command, and can break the > lock with "rados lock break" command. > I finally found the reason: the lock request from > radosgw-agent is processed by one rados client and the unlock request from > radosgw-agent is processed by another rados client. When > "rgw_num_rados_handles" is set to 1, the warning message did not appear. > Can anybody help giving some suggestions about this, and can the warning > message be ignored? Hi, it certainly seems like a bug. Can you open an issue at tracker.ceph.com? Thanks, Yehuda
Re: [ceph-users] rgw pool names
On Fri, Jun 10, 2016 at 11:44 AM, Deneau, Tom wrote: > When I start radosgw, I create the pool .rgw.buckets manually to control > whether it is replicated or erasure coded and I let the other pools be > created automatically. > > However, I have noticed that sometimes the pools get created with the > "default" > prefix, thus > rados lspools > .rgw.root > default.rgw.control > default.rgw.data.root > default.rgw.gc > default.rgw.log > .rgw.buckets # the one I created > default.rgw.users.uid > default.rgw.users.keys > default.rgw.meta > default.rgw.buckets.index > default.rgw.buckets.data # the one actually being used > > What controls whether these pools have the "default" prefix or not? > The prefix is the name of the zone ('default' by default). This was added for the jewel release, as well as dropping the requirement of having the pool names start with a dot. Yehuda
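The jewel-era naming scheme described above can be sketched as a simple rule; the zone name here other than 'default' is a hypothetical example:

```python
# Illustration only: jewel prefixes rgw data pools with the zone name
# ('default' unless configured otherwise) and no longer requires the
# leading dot.

def rgw_pool_name(zone, suffix):
    return "%s.rgw.%s" % (zone, suffix)

for suffix in ("control", "data.root", "gc", "log",
               "buckets.index", "buckets.data"):
    print(rgw_pool_name("default", suffix))
```

So a zone named 'us-east' would get pools like us-east.rgw.buckets.data instead of the 'default' ones listed above.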
Re: [ceph-users] rgw s3website issue
On Sun, May 29, 2016 at 4:47 AM, Gaurav Bafnawrote: > Hi Cephers, > > I am unable to create bucket hosting a webstite in my vstart cluster. > > When I do this in boto : > > website_bucket.configure_website('index.html','error.html') > > I get : > > boto.exception.S3ResponseError: S3ResponseError: 405 Method Not Allowed > > > Here is my ceph.conf for radosgw: > > rgw frontends = fastcgi, civetweb port=8010 > > rgw enable static website = true > > rgw dns name = 10.140.13.22 > > rgw dns s3website name = 10.140.13.22 > > > Here are the logs in rgw : > > 2016-05-29 00:00:47.191297 7ff404ff9700 1 == starting new request > req=0x7ff404ff37d0 = > > 2016-05-29 00:00:47.191325 7ff404ff9700 2 req 1:0.28::PUT > /s3website/::initializing for trans_id = > tx1-005749967f-101f-default > > 2016-05-29 00:00:47.191330 7ff404ff9700 10 host=10.140.13.22 > > 2016-05-29 00:00:47.191338 7ff404ff9700 20 subdomain= > domain=10.140.13.22 in_hosted_domain=1 in_hosted_domain_s3website=1 > Could it be that the endpoint is configured to serve both S3 and static websites? Yehuda > 2016-05-29 00:00:47.191350 7ff404ff9700 5 the op is PUT > > 2016-05-29 00:00:47.191395 7ff404ff9700 20 get_handler > handler=32RGWHandler_REST_Bucket_S3Website > > 2016-05-29 00:00:47.191399 7ff404ff9700 10 > handler=32RGWHandler_REST_Bucket_S3Website > > 2016-05-29 00:00:47.191401 7ff404ff9700 2 req 1:0.000104:s3:PUT > /s3website/::getting op 1 > > 2016-05-29 00:00:47.191410 7ff404ff9700 10 > RGWHandler_REST_S3Website::error_handler err_no=-2003 http_ret=405 > > 2016-05-29 00:00:47.191412 7ff404ff9700 20 No special error handling today! 
> > 2016-05-29 00:00:47.191415 7ff404ff9700 20 handler->ERRORHANDLER: > err_no=-2003 new_err_no=-2003 > > 2016-05-29 00:00:47.191504 7ff404ff9700 2 req 1:0.000207:s3:PUT > /s3website/::op status=0 > > 2016-05-29 00:00:47.191510 7ff404ff9700 2 req 1:0.000213:s3:PUT > /s3website/::http status=405 > > 2016-05-29 00:00:47.191511 7ff404ff9700 1 == req done > req=0x7ff404ff37d0 op status=0 http_status=405 == > > > Code wise I see that put_op is not defined for > RGWHandler_REST_S3Website class but is defined for > RGWHandler_REST_Bucket_S3 class . > > Can somebody please help me out ? > > > > > -- > Gaurav Bafna > 9540631400 > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
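One way to test the theory above (the endpoint serving both S3 and static websites, so `in_hosted_domain_s3website=1` routes the bucket-API PUT to the website handler) is to give the two roles distinct hostnames. A sketch of the relevant ceph.conf pieces — the hostnames and section name are placeholders, not taken from the thread:

```ini
[client.rgw.gateway]
rgw frontends = civetweb port=8010
rgw enable static website = true
; distinct names, so a request matches only one handler:
rgw dns name = s3.example.com
rgw dns s3website name = web.example.com
```

With both settings pointing at the same IP, as in the original config, every request looks like a website request.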
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
On Thu, Jan 14, 2016 at 10:51 PM, seapasu...@uchicago.eduwrote: > It looks like the gateway is experiencing a similar race condition to what > we reported before. > > The rados object has a size of 0 bytes but the bucket index shows the object > listed and the object metadata shows a size of > 7147520 bytes. > > I have a lot of logs but I don't think any of them have the full data from > the upload of this object. > > I thought this bug was fixed back in firefly/giant > > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg19971.html > > -- > > root@kg34-33:/srv/nfs/griffin_temp# rados -p .rgw.buckets stat > default.384153.1_2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar > ..rgw.buckets/default.384153.1_2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar > mtime 1446672570, size 0 > > -- > > SError: [Errno 2] No such file or directory: > '/srv/nfs/griffin_tempnoaa-nexrad-l2/2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar' > > In [13]: print(key.size) > 7147520 > > We are currently using 94.5 and the file were uploaded to hammer as well > > lacadmin@kh28-10:~$ ceph --version > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) > lacadmin@kh28-10:~$ radosgw --version > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) > > > The cluster is health_ok and was ok during the upload. I need to confirm > with the person who uploaded the data but I think they did it with s3cmd. > Has anyone seen this before? I think I need to file a bug :-( > What does 'radosgw-admin object stat --bucket= --object=' show? Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
On Fri, Jan 15, 2016 at 9:36 AM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > Hello Yehuda, > > Here it is:: > > radosgw-admin object stat --bucket="noaa-nexrad-l2" > --object="2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar" > { > "name": > "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", > "size": 7147520, > "policy": { > "acl": { > "acl_user_map": [ > { > "user": "b05f707271774dbd89674a0736c9406e", > "acl": 15 > } > ], > "acl_group_map": [ > { > "group": 1, > "acl": 1 > } > ], > "grant_map": [ > { > "id": "", > "grant": { > "type": { > "type": 2 > }, > "id": "", > "email": "", > "permission": { > "flags": 1 > }, > "name": "", > "group": 1 > } > }, > { > "id": "b05f707271774dbd89674a0736c9406e", > "grant": { > "type": { > "type": 0 > }, > "id": "b05f707271774dbd89674a0736c9406e", > "email": "", > "permission": { > "flags": 15 > }, > "name": "noaa-commons", > "group": 0 > } > } > ] > }, > "owner": { > "id": "b05f707271774dbd89674a0736c9406e", > "display_name": "noaa-commons" > } > }, > "etag": "b91b6f1650350965c5434c547b3c38ff-1\u", > "tag": "_cWrvEa914Gy1AeyzIhRlUdp1wJnek3E\u", > "manifest": { > "objs": [], > "obj_size": 7147520, > "explicit_objs": "false", > "head_obj": { > "bucket": { > "name": "noaa-nexrad-l2", > "pool": ".rgw.buckets", > "data_extra_pool": ".rgw.buckets.extra", > "index_pool": ".rgw.buckets.index", > "marker": "default.384153.1", > "bucket_id": "default.384153.1" > }, > "key": "", > "ns": "", > "object": > "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", > "instance": "" > }, > "head_size": 0, > "max_head_size": 0, > "prefix": > "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar.2~pcu5Hz6foFXjlSxBat22D8YMcHlQOBD", Try running: $ rados -p .rgw.buckets ls | grep pcu5Hz6 Yehuda > "tail_bucket": { > "name": "noaa-nexrad-l2", > "pool": ".rgw.buckets", > "data_extra_pool": ".rgw.buckets.extra", > "index_pool": ".rgw.buckets.index", > 
"marker": "default.384153.1", > "bucket_id": "default.384153.1" > }, > "rules": [ > { > "key": 0, > "val": { > "start_part_num": 1, > "start_ofs": 0, > "part_size": 0, > "stripe_max_size": 4194304, > "override_prefix": "" > } > } > ] > }, > "attrs": {} > > } > > On 1/15/16 11:17 AM, Yehuda Sadeh-Weinraub wrote: >> >> radosgw-admin object stat --bucket= --object=' > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
That's interesting, and might point at the underlying issue that caused it. Could be a racing upload that somehow ended up with the wrong object head. The 'multipart' object should be 4M in size, and the 'shadow' one should have the remainder of the data. You can run 'rados stat -p .rgw.buckets ' to validate that. If that's the case, you can copy these to the expected object names: $ src_uploadid=wksHvto9gRgHUJbhm_TZPXJTZUPXLT2 $ dest_uploadid=pcu5Hz6foFXjlSxBat22D8YMcHlQOBD $ rados -p .rgw.buckets cp default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_uploadid}.1 default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_uploadid}.1 $ rados -p .rgw.buckets cp default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_uploadid}.1_1 default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_uploadid}.1_1 Yehuda On Fri, Jan 15, 2016 at 1:02 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'pcu5Hz6' > lacadmin@kh28-10:~$ > > Nothing was found. 
That said when I run the command with another prefix > snippet:: > lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'wksHvto' > default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1_1 > default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1 > > > > > On 1/15/16 12:05 PM, Yehuda Sadeh-Weinraub wrote: >> >> On Fri, Jan 15, 2016 at 9:36 AM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> Hello Yehuda, >>> >>> Here it is:: >>> >>> radosgw-admin object stat --bucket="noaa-nexrad-l2" >>> >>> --object="2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar" >>> { >>> "name": >>> >>> "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", >>> "size": 7147520, >>> "policy": { >>> "acl": { >>> "acl_user_map": [ >>> { >>> "user": "b05f707271774dbd89674a0736c9406e", >>> "acl": 15 >>> } >>> ], >>> "acl_group_map": [ >>> { >>> "group": 1, >>> "acl": 1 >>> } >>> ], >>> "grant_map": [ >>> { >>> "id": "", >>> "grant": { >>> "type": { >>> "type": 2 >>> }, >>> "id": "", >>> "email": "", >>> "permission": { >>> "flags": 1 >>> }, >>> "name": "", >>> "group": 1 >>> } >>> }, >>> { >>> "id": "b05f707271774dbd89674a0736c9406e", >>> "grant": { >>> "type": { >>> "type": 0 >>> }, >>> "id": "b05f707271774dbd89674a0736c9406e", >>> "email": "", >>> "permission": { >>> "flags": 15 >>> }, >>> "name": "noaa-commons", >>> "group": 0 >>> } >>> } >>> ] >>> }, >>> "owner": { >>> "id": "b05f707271774dbd89674a0736c9406e", >>> "display_n
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
Ah, I see. Misread that and the object names were very similar. No, don't copy it. You can try to grep for the specific object name and see if there are pieces of it lying around under a different upload id. Yehuda On Fri, Jan 15, 2016 at 1:44 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > Sorry I am a bit confused. The successful list that I provided is from a > different object of the same size to show that I could indeed get a list. > Are you saying to copy the working object to the missing object? Sorry for > the confusion. > > > On 1/15/16 3:20 PM, Yehuda Sadeh-Weinraub wrote: >> >> That's interesting, and might point at the underlying issue that >> caused it. Could be a racing upload that somehow ended up with the >> wrong object head. The 'multipart' object should be 4M in size, and >> the 'shadow' one should have the remainder of the data. You can run >> 'rados stat -p .rgw.buckets ' to validate that. If that's the >> case, you can copy these to the expected object names: >> >> $ src_uploadid=wksHvto9gRgHUJbhm_TZPXJTZUPXLT2 >> $ dest_uploadid=pcu5Hz6foFXjlSxBat22D8YMcHlQOBD >> >> $ rados -p .rgw.buckets cp >> >> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_uploadid}.1 >> >> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_uploadid}.1 >> >> $ rados -p .rgw.buckets cp >> >> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_upload_id}.1_1 >> >> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_upload_id}.1_1 >> >> Yehuda >> >> >> On Fri, Jan 15, 2016 at 1:02 PM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'pcu5Hz6' >>> lacadmin@kh28-10:~$ >>> >>> Nothing was found. 
That said when I run the command with another prefix >>> snippet:: >>> lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'wksHvto' >>> >>> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1_1 >>> >>> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1 >>> >>> >>> >>> >>> On 1/15/16 12:05 PM, Yehuda Sadeh-Weinraub wrote: >>>> >>>> On Fri, Jan 15, 2016 at 9:36 AM, seapasu...@uchicago.edu >>>> <seapasu...@uchicago.edu> wrote: >>>>> >>>>> Hello Yehuda, >>>>> >>>>> Here it is:: >>>>> >>>>> radosgw-admin object stat --bucket="noaa-nexrad-l2" >>>>> >>>>> >>>>> --object="2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar" >>>>> { >>>>> "name": >>>>> >>>>> >>>>> "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", >>>>> "size": 7147520, >>>>> "policy": { >>>>> "acl": { >>>>> "acl_user_map": [ >>>>> { >>>>> "user": "b05f707271774dbd89674a0736c9406e", >>>>> "acl": 15 >>>>> } >>>>> ], >>>>> "acl_group_map": [ >>>>> { >>>>> "group": 1, >>>>> "acl": 1 >>>>> } >>>>> ], >>>>> "grant_map": [ >>>>> { >>>>> "id": "", >>>>> "grant": { >>>>> "type": { >>>>> "type": 2 >>>>> }, >>>>> "id": "", >>>>> "email": "", >>>>> "permission": { >
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
The head object of a multipart object has 0 size, so it's expected. What's missing is the tail of the object. I don't suppose you have any logs from when the object was uploaded? Yehuda On Fri, Jan 15, 2016 at 2:12 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > Sorry for the confusion:: > > When I grepped for the prefix of the missing object:: > "2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar.2~pcu5Hz6foFXjlSxBat22D8YMcHlQOBD" > > I am not able to find any chunks of the object:: > > lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'pcu5Hz6' > lacadmin@kh28-10:~$ > > The only piece of the object that I can seem to find is the original one I > posted:: > lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep > 'NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959' > default.384153.1_2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar > > And when we stat this object it is 0 bytes as shown earlier:: > lacadmin@kh28-10:~$ rados -p .rgw.buckets stat > 'default.384153.1_2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar' > .rgw.buckets/default.384153.1_2015/01/01/PAKC/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar > mtime 2015-11-04 15:29:30.00, size 0 > > Sorry again for the confusion. > > > > On 1/15/16 3:58 PM, Yehuda Sadeh-Weinraub wrote: >> >> Ah, I see. Misread that and the object names were very similar. No, >> don't copy it. You can try to grep for the specific object name and >> see if there are pieces of it lying around under a different upload >> id. >> >> Yehuda >> >> On Fri, Jan 15, 2016 at 1:44 PM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> Sorry I am a bit confused. The successful list that I provided is from a >>> different object of the same size to show that I could indeed get a list. >>> Are you saying to copy the working object to the missing object? Sorry >>> for >>> the confusion. 
>>> >>> >>> On 1/15/16 3:20 PM, Yehuda Sadeh-Weinraub wrote: >>>> >>>> That's interesting, and might point at the underlying issue that >>>> caused it. Could be a racing upload that somehow ended up with the >>>> wrong object head. The 'multipart' object should be 4M in size, and >>>> the 'shadow' one should have the remainder of the data. You can run >>>> 'rados stat -p .rgw.buckets ' to validate that. If that's the >>>> case, you can copy these to the expected object names: >>>> >>>> $ src_uploadid=wksHvto9gRgHUJbhm_TZPXJTZUPXLT2 >>>> $ dest_uploadid=pcu5Hz6foFXjlSxBat22D8YMcHlQOBD >>>> >>>> $ rados -p .rgw.buckets cp >>>> >>>> >>>> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_uploadid}.1 >>>> >>>> >>>> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_uploadid}.1 >>>> >>>> $ rados -p .rgw.buckets cp >>>> >>>> >>>> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${src_upload_id}.1_1 >>>> >>>> >>>> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~${dest_upload_id}.1_1 >>>> >>>> Yehuda >>>> >>>> >>>> On Fri, Jan 15, 2016 at 1:02 PM, seapasu...@uchicago.edu >>>> <seapasu...@uchicago.edu> wrote: >>>>> >>>>> lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'pcu5Hz6' >>>>> lacadmin@kh28-10:~$ >>>>> >>>>> Nothing was found. 
That said when I run the command with another prefix >>>>> snippet:: >>>>> lacadmin@kh28-10:~$ rados -p .rgw.buckets ls | grep 'wksHvto' >>>>> >>>>> >>>>> default.384153.1__shadow_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1_1 >>>>> >>>>> >>>>> default.384153.1__multipart_2015/01/01/KABR/NWS_NEXRAD_NXL2DP_KABR_2015010113_20150101135959.tar.2~wksHvto9gRgHUJbhm_TZPXJTZUPXLT2.1 >>>>> >>>>> >>>>> >>>>> >>>>> On 1/15/16 12:05 PM, Yehuda Sadeh-Weinraub wrote: >>>>>> >>>>>> On Fri, Jan 15, 2016 at 9:
Re: [ceph-users] v10.0.2 released
On Thu, Jan 14, 2016 at 7:37 AM, Sage Weil wrote: > This development release includes a raft of changes and improvements for > Jewel. Key additions include CephFS scrub/repair improvements, an AIX and > Solaris port of librados, many librbd journaling additions and fixes, > extended per-pool options, and NBD driver for RBD (rbd-nbd) that allows > librbd to present a kernel-level block device on Linux, multitenancy > support for RGW, RGW bucket lifecycle support, RGW support for Swift RGW bucket lifecycle isn't there yet; it still has some way to go before we merge it in. Yehuda > static large objects (SLO), and RGW support for Swift bulk delete. > > There are also lots of smaller optimizations and performance fixes going > in all over the tree, particular in the OSD and common code. > > Notable Changes > --- > > See > > http://ceph.com/releases/v10-0-2-released/ > > [I'd include the changelog here but I'm missing a oneliner that renders > the rst in email-suitable form...] > > Getting Ceph > > > * Git at git://github.com/ceph/ceph.git > * Tarball at http://download.ceph.com/tarballs/ceph-10.0.2.tar.gz > * For packages, see http://ceph.com/docs/master/install/get-packages > * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ceph-users] radosgw anonymous write
On Tue, Feb 9, 2016 at 5:15 AM, Jacek Jarosiewiczwrote: > Hi list, > > My setup is: ceph 0.94.5, ubuntu 14.04, tengine (patched nginx). > > I'm trying to migrate from our old file storage (MogileFS) to the new ceph > radosgw. The problem is that the old storage had no access control - no > authorization, so the access to read and/or write was controlled by the web > server (ie per IP/network). > > I want to keep the clients using old storage, but get rid of the MogileFS so > I don't have to maintain two different storage solutions. > > Basically MogileFS http API is similar to S3, except for the authorization > part - so the methods are the same (PUT, GET, DELETE..). > > I've created a bucket with public-read-write access and tried to connect > MogileFS client to it - the uploads work fine, and the files get acl > public-read so are readable, but they don't have an owner. > > So after upload I can't manage them (ie modify acl) - I can only remove > objects. > > Is there a way to force files that are uploaded anonymously to have an > owner? Is there a way maybe to have them inherit owner from the bucket? > Currently there's no way to change it. I'm not sure though that we're doing the correct thing. Did you try it with Amazon S3 by any chance? > Cheers, > J > > -- > Jacek Jarosiewicz > Administrator Systemów Informatycznych > > > SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie > ul. Senatorska 13/15, 00-075 Warszawa > Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego Rejestru > Sądowego, > nr KRS 029537; kapitał zakładowy 42.756.000 zł > NIP: 957-05-49-503 > Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa > > > SUPERMEDIA -> http://www.supermedia.pl > dostep do internetu - hosting - kolokacja - lacza - telefonia > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RGW: oddity when creating users via admin api
On Wed, Jan 27, 2016 at 4:20 PM, seapasu...@uchicago.edu wrote: > So when I create a new user with the admin api, if the user already exists > it just generates a new keypair for that user. Shouldn't the admin api > report that the user already exists? I ask because I can end up with > multiple keypairs for the same user unintentionally, which could be an issue. > I was not sure if this was a feature or a bug so I thought I would ask here > prior to filing a bug. It's definitely a bug. But note that it sounds familiar, and we might have already fixed it for the next major version. Yehuda
Re: [ceph-users] RGW :: bucket quota not enforced below 1
On Wed, Jan 27, 2016 at 4:18 PM, seapasu...@uchicago.edu wrote: > if you set a RGW user to have a bucket quota of 0 buckets you can still > create buckets. The only way I have found to prevent a user from being able > to create buckets is to set the op_mask to read. 1.) it looks like > bucket_policy is not enforced when you have it set to anything below 1. It > looks like the only way to prevent a user from creating buckets is to set > the op_mask but this is not documented. How would I set the op_mask via the > radosgw admin api? I keep getting a 200 success code but the op_mask of the > user stays the same. > > relevant pastebins: > http://pastebin.com/Rbzdy52c -- shows user info with bucket quota set but > shows ability to create buckets. > http://pastebin.com/J9K3dgdF -- shows inability to set op_mask from admin > api (that or I don't know how) > > > 1.) does anyone know how to set the op_mask via the admin api? 2.) why can I > create what seems like an infinite amount of buckets when my bucket quota is > set to 0 objects and 0 size? Shouldn't it be enforced for anything above -1? That's not bucket quota, that's the user's max_buckets param. When this value is set to '0' it means the user has no limit on the number of buckets. Sadly, due to backward compatibility issues, having 0 mean something else is a bit problematic. We can probably add a new bool param that will specify whether bucket creation is allowed at all. Yehuda
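The max_buckets semantics described above can be sketched as follows — note this only models the described behavior, it is not the actual rgw code, and forbidding bucket creation outright still requires op_mask as the poster found:

```python
# Model (not rgw code) of pre-Jewel max_buckets semantics:
# 0 means "no limit", a positive value caps bucket creation,
# and no value forbids creating buckets outright.

def can_create_bucket(current_buckets, max_buckets):
    if max_buckets == 0:   # 0 == unlimited, not "zero buckets allowed"
        return True
    return current_buckets < max_buckets

print(can_create_bucket(5, 0))   # True: "quota" of 0 is unlimited
print(can_create_bucket(2, 2))   # False: cap reached
```

This is why setting the value to 0 looked like an unenforced quota: it is the opposite, an explicit "unlimited".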
Re: [ceph-users] 411 Content-Length required error
On Wed, Jan 27, 2016 at 3:31 PM, John Hogenmillerwrote: > I did end up switching to civetweb and I also found that rgw content length > compat, which I set to true. I am still getting the 411 Length required > issue. > > I have had more discussions with our testing team, and I am still trying to > ascertain how valid this issue is. > > With AWS Sig v4, you use a different method to do chunked transfers. With > the sigv2, you do it as a "Transfer-Encoding: chunked" (as detailed in my > s3curl example). However, that v2 method may only apply to the > implementation we have have (we have a proprietary implementation of s3 that > I am hoping to replace with Ceph, if I can match our acceptance testing). > > The reason I think that this is a valid issue is because of this commit > > http://tracker.ceph.com/projects/ceph/repository/revisions/14fa77d9277b5ef5d0c6683504b368773b39ccc4 > >> Fixes: #2878 >> We now allow complete multipart upload to use chunked encoding >> when sending request data. With chunked encoding the HTTP_LENGTH >> header is not required. > > > What I would like to see is the test code for this (ideally in a curl or > s3curl format) so that I can compare locally to see if we're saying the same > thing, or if that commit from 3 years ago is still valid. > I don't think it's related. Try bumping up the rgw debug log, (debug rgw = 20), and see what are the http header fields that are being sent for the specific request. It could be that apache is not passing on the Transfer-Encoding header, or does something else to it. Yehuda ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
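A minimal ceph.conf sketch for the debugging suggestion above; the section name is an assumption and must match the --name/-n the gateway is started with. At log level 20 rgw dumps the request environment, so you can check which HTTP_* header variables (e.g. Transfer-Encoding, Content-Length) the frontend actually passed through:

```ini
[client.radosgw.gateway]
debug rgw = 20
; optionally also raise messenger logging:
debug ms = 1
```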
Re: [ceph-users] Problem create user RGW
try running: $ radosgw-admin --name client.rgw.servergw001 metadata list user Yehuda On Wed, Feb 24, 2016 at 8:41 AM, Andrea Annoè wrote: > I don't see any user created in RGW > > > > sudo radosgw-admin metadata list user > > [ > > ] > > > > > > sudo radosgw-admin user create --uid="user1site1" --display-name="User test > replica site1" --name client.rgw.servergw001 --access-key=user1site1 > --secret=pwd1 > > { > > "user_id": "user1site1", > > "display_name": "User test replica site1", > > "email": "", > > "suspended": 0, > > "max_buckets": 1000, > > "auid": 0, > > "subusers": [], > > "keys": [ > > { > > "user": "user1site1", > > "access_key": "user1site1", > > "secret_key": "pwd1" > > } > > ], > > "swift_keys": [], > > "caps": [], > > "op_mask": "read, write, delete", > > "default_placement": "", > > "placement_tags": [], > > "bucket_quota": { > > "enabled": false, > > "max_size_kb": -1, > > "max_objects": -1 > > }, > > "user_quota": { > > "enabled": false, > > "max_size_kb": -1, > > "max_objects": -1 > > }, > > "temp_url_keys": [] > > } > > > > sudo radosgw-admin metadata list user > > [ > > ] > > > > > > The list of users doesn't change… what's the problem? Command, keyring… ?? > > The user create command doesn't report an error even if I retry it multiple times. > > > > Please help me. > > > > Best regards. > > Andrea
Re: [ceph-users] radosgw flush_read_list(): d->client_c->handle_data() returned -5
On Wed, Feb 24, 2016 at 5:48 PM, Ben Hineswrote: > Any idea what is going on here? I get these intermittently, especially with > very large file. > > The client is doing RANGE requests on this >51 GB file, incrementally > fetching later chunks. > > 2016-02-24 16:30:59.669561 7fd33b7fe700 1 == starting new request > req=0x7fd32c0879c0 = > 2016-02-24 16:30:59.669675 7fd33b7fe700 2 req 3648804:0.000114::GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::initializing for > trans_id = tx00037ad24-0056ce4b43-259914b-default > 2016-02-24 16:30:59.669687 7fd33b7fe700 10 host= > 2016-02-24 16:30:59.669757 7fd33b7fe700 10 > s->object=/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg > s->bucket= > 2016-02-24 16:30:59.669767 7fd33b7fe700 2 req 3648804:0.000206:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::getting op > 2016-02-24 16:30:59.669776 7fd33b7fe700 2 req 3648804:0.000215:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:authorizing > 2016-02-24 16:30:59.669785 7fd33b7fe700 2 req 3648804:0.000224:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:reading > permissions > 2016-02-24 16:30:59.673797 7fd33b7fe700 10 manifest: total_size = > 50346000384 > 2016-02-24 16:30:59.673841 7fd33b7fe700 2 req 3648804:0.004280:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:init op > 2016-02-24 16:30:59.673867 7fd33b7fe700 10 cache get: > name=.users.uid+ : hit > 2016-02-24 16:30:59.673881 7fd33b7fe700 10 cache get: > name=.users.uid+ : hit > 2016-02-24 16:30:59.673921 7fd33b7fe700 2 req 3648804:0.004360:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying > op mask > 2016-02-24 16:30:59.673929 7fd33b7fe700 2 req 3648804:0.004369:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying > op permissions > 2016-02-24 16:30:59.673941 7fd33b7fe700 5 Searching permissions for > uid=anonymous mask=49 > 2016-02-24 16:30:59.673944 7fd33b7fe700 5 Permissions for user not found > 2016-02-24 
16:30:59.673946 7fd33b7fe700 5 Searching permissions for group=1 > mask=49 > 2016-02-24 16:30:59.673949 7fd33b7fe700 5 Found permission: 1 > 2016-02-24 16:30:59.673951 7fd33b7fe700 5 Searching permissions for group=2 > mask=49 > 2016-02-24 16:30:59.673953 7fd33b7fe700 5 Permissions for group not found > 2016-02-24 16:30:59.673955 7fd33b7fe700 5 Getting permissions id=anonymous > owner= perm=1 > 2016-02-24 16:30:59.673957 7fd33b7fe700 10 uid=anonymous requested perm > (type)=1, policy perm=1, user_perm_mask=15, acl perm=1 > 2016-02-24 16:30:59.673961 7fd33b7fe700 2 req 3648804:0.004400:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying > op params > 2016-02-24 16:30:59.673965 7fd33b7fe700 2 req 3648804:0.004404:s3:GET > //int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:executing > 2016-02-24 16:30:59.674107 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=130023424 stripe_ofs=130023424 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:30:59.674193 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=134217728 stripe_ofs=134217728 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:30:59.674317 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=138412032 stripe_ofs=138412032 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:30:59.674433 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=142606336 stripe_ofs=142606336 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:31:00.046110 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=146800640 stripe_ofs=146800640 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:31:00.150966 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=150994944 stripe_ofs=150994944 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 16:31:00.151118 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=155189248 stripe_ofs=155189248 part_ofs=104857600 > rule->part_size=52428800 > 2016-02-24 
16:31:00.161000 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=157286400 stripe_ofs=157286400 part_ofs=157286400 > rule->part_size=52428800 > 2016-02-24 16:31:00.199553 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=161480704 stripe_ofs=161480704 part_ofs=157286400 > rule->part_size=52428800 > 2016-02-24 16:31:00.278308 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=165675008 stripe_ofs=165675008 part_ofs=157286400 > rule->part_size=52428800 > 2016-02-24 16:31:00.312306 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=169869312 stripe_ofs=169869312 part_ofs=157286400 > rule->part_size=52428800 > 2016-02-24 16:31:00.751626 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=174063616 stripe_ofs=174063616 part_ofs=157286400 > rule->part_size=52428800 > 2016-02-24 16:31:00.833570 7fd33b7fe700 0 RGWObjManifest::operator++(): > result: ofs=178257920 stripe_ofs=178257920 part_ofs=157286400 >
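For context, the -5 returned by handle_data() is EIO, which in this path usually indicates the client side closed the connection mid-transfer rather than a storage error. The RGWObjManifest::operator++() lines in the log can also be sanity-checked by hand: part_ofs and stripe_ofs are just the offset rounded down to the part and stripe boundaries. A sketch that reproduces the logged values, assuming the default 4 MiB rgw stripe size (the 52428800-byte part size is taken from the log itself):

```python
PART_SIZE = 52428800            # rule->part_size from the log (50 MiB parts)
STRIPE_SIZE = 4 * 1024 * 1024   # assumed default rgw object stripe size

def manifest_position(ofs, part_size=PART_SIZE, stripe_size=STRIPE_SIZE):
    """Round an object offset down to the stripe and multipart part it
    falls in, mirroring the RGWObjManifest::operator++() log output."""
    return {
        "ofs": ofs,
        "stripe_ofs": (ofs // stripe_size) * stripe_size,
        "part_ofs": (ofs // part_size) * part_size,
    }

print(manifest_position(130023424))
```

This is a simplification of the real manifest iterator, but it matches the offsets printed above, e.g. ofs=130023424 falling in the part starting at 104857600.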
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
On Fri, Jan 15, 2016 at 5:04 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > I have looked all over and I do not see any explicit mention of > "NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959" in the logs nor do I > see a timestamp from November 4th although I do see log rotations dating > back to october 15th. I don't think it's possible it wasn't logged so I am > going through the bucket logs from the 'radosgw-admin log show --object' > side and I found the following:: > > 4604932 { > 4604933 "bucket": "noaa-nexrad-l2", > 4604934 "time": "2015-11-04 21:29:27.346509Z", > 4604935 "time_local": "2015-11-04 15:29:27.346509", > 4604936 "remote_addr": "", > 4604937 "object_owner": "b05f707271774dbd89674a0736c9406e", > 4604938 "user": "b05f707271774dbd89674a0736c9406e", > 4604939 "operation": "PUT", I'd expect a multipart upload completion to be done with a POST, not a PUT. > 4604940 "uri": > "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", > 4604941 "http_status": "200", > 4604942 "error_code": "", > 4604943 "bytes_sent": 19, > 4604944 "bytes_received": 0, > 4604945 "object_size": 0, Do you see a zero object_size for other multipart uploads? Yehuda > 4604946 "total_time": 142640400, > 4604947 "user_agent": "Boto\/2.38.0 Python\/2.7.7 > Linux\/2.6.32-573.7.1.el6.x86_64", > 4604948 "referrer": "" > 4604949 } > > Does this help at all. The total time seems exceptionally high. Would it be > possible that there is a timeout issue where the put request started a > multipart upload with the correct header and then timed out but the radosgw > took the data anyway? > > I am surprised the radosgw returned a 200 let alone placed the key in the > bucket listing. 
> > > That said here is another object (different object) that 404s: > 1650873 { > 1650874 "bucket": "noaa-nexrad-l2", > 1650875 "time": "2015-11-05 04:50:42.606838Z", > 1650876 "time_local": "2015-11-04 22:50:42.606838", > 1650877 "remote_addr": "", > 1650878 "object_owner": "b05f707271774dbd89674a0736c9406e", > 1650879 "user": "b05f707271774dbd89674a0736c9406e", > 1650880 "operation": "PUT", > 1650881 "uri": > "\/noaa-nexrad-l2\/2015\/02\/25\/KVBX\/NWS_NEXRAD_NXL2DP_KVBX_2015022516_20150225165959.tar", > 1650882 "http_status": "200", > 1650883 "error_code": "", > 1650884 "bytes_sent": 19, > 1650885 "bytes_received": 0, > 1650886 "object_size": 0, > 1650887 "total_time": 0, > 1650888 "user_agent": "Boto\/2.38.0 Python\/2.7.7 > Linux\/2.6.32-573.7.1.el6.x86_64", > 1650889 "referrer": "" > 1650890 } > > And this one fails with a 404 as well. Does this help at all? Here is a > successful object (different object) log entry as well just in case:: > > 17462367 { > 17462368 "bucket": "noaa-nexrad-l2", > 17462369 "time": "2015-11-04 21:16:44.148603Z", > 17462370 "time_local": "2015-11-04 15:16:44.148603", > 17462371 "remote_addr": "", > 17462372 "object_owner": "b05f707271774dbd89674a0736c9406e", > 17462373 "user": "b05f707271774dbd89674a0736c9406e", > 17462374 "operation": "PUT", > 17462375 "uri": > "\/noaa-nexrad-l2\/2015\/01\/01\/KAKQ\/NWS_NEXRAD_NXL2DP_KAKQ_2015010108_20150101085959.tar", > 17462376 "http_status": "200", > 17462377 "error_code": "", > 17462378 "bytes_sent": 19, > 17462379 "bytes_received": 0, > 17462380 "object_size": 0, > 17462381 "total_time": 0, >
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
On Thu, Jan 21, 2016 at 4:02 PM, seapasu...@uchicago.edu wrote: > I haven't been able to reproduce the issue on my end but I do not fully > understand how the bug exists or why it is happening. I was finally given > the code they are using to upload the files:: > > http://pastebin.com/N0j86NQJ > > I don't know if this helps at all :-(. The other thing is that I have only > experienced this bug on this 'noaa-nexrad-l2' bucket. The other buckets have > substantially less data and objects though. > > Right now I am trying to trigger this bug using python requests-aws and I > keep getting a 403 while trying to authenticate. I am not a developer by any > means and a piss-poor sysadmin haha. My plan is to start a multipart upload > and initiate a put for the first part but hang when placing the data inside. > Then try to complete the multipart upload in another session. The reproduction I had in mind would be something like: init a multipart upload, upload a part, then run multiple operations *concurrently* that complete the upload; also try to complete and abort concurrently. Yehuda > > I guess please stand by while I figure this out :/ Thanks for all of your > help!
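The concurrency harness for the reproduction outlined above could be driven as in the sketch below. `complete_upload` here is a stand-in: in a real run each call would issue the boto CompleteMultipartUpload request for the same uploadId, so the completions overlap server-side:

```python
import concurrent.futures

ATTEMPTS = 4  # number of racing CompleteMultipartUpload calls

def complete_upload(attempt):
    """Placeholder for mp.complete_upload() from boto; in the real
    reproducer every call would POST the same uploadId to radosgw
    (and one thread could call cancel_upload() instead, to race a
    complete against an abort)."""
    return ("completed", attempt)

# Fire all completions at once so they would overlap on the server.
with concurrent.futures.ThreadPoolExecutor(max_workers=ATTEMPTS) as pool:
    results = list(pool.map(complete_upload, range(ATTEMPTS)))
print(results)
```

This only shows the harness shape — whether the race actually reproduces the 404s depends on the server-side timing discussed later in the thread.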
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
On Wed, Jan 20, 2016 at 10:43 AM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > > > On 1/19/16 4:00 PM, Yehuda Sadeh-Weinraub wrote: >> >> On Fri, Jan 15, 2016 at 5:04 PM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> I have looked all over and I do not see any explicit mention of >>> "NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959" in the logs nor do >>> I >>> see a timestamp from November 4th although I do see log rotations dating >>> back to october 15th. I don't think it's possible it wasn't logged so I >>> am >>> going through the bucket logs from the 'radosgw-admin log show --object' >>> side and I found the following:: >>> >>> 4604932 { >>> 4604933 "bucket": "noaa-nexrad-l2", >>> 4604934 "time": "2015-11-04 21:29:27.346509Z", >>> 4604935 "time_local": "2015-11-04 15:29:27.346509", >>> 4604936 "remote_addr": "", >>> 4604937 "object_owner": "b05f707271774dbd89674a0736c9406e", >>> 4604938 "user": "b05f707271774dbd89674a0736c9406e", >>> 4604939 "operation": "PUT", >> >> I'd expect a multipart upload completion to be done with a POST, not a >> PUT. > > Indeed it seems really weird. >> >> >>> 4604940 "uri": >>> >>> "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", >>> 4604941 "http_status": "200", >>> 4604942 "error_code": "", >>> 4604943 "bytes_sent": 19, >>> 4604944 "bytes_received": 0, >>> 4604945 "object_size": 0, >> >> Do you see a zero object_size for other multipart uploads? > > I think so. I still don't know how to tell for certain if a radosgw object > is a multipart object or not. 
I think all of the objects in noaa-nexrad-l2 > bucket are multipart:: > > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-{ > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bucket": > "noaa-nexrad-l2", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time": "2015-10-16 > 19:49:30.579738Z", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time_local": > "2015-10-16 14:49:30.579738", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "remote_addr": "", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user": > "b05f707271774dbd89674a0736c9406e", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out: "operation": "POST", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "uri": > "\/noaa-nexrad-l2\/2015\/01\/13\/KGRK\/NWS_NEXRAD_NXL2DP_KGRK_2015011304_20150113045959.tar", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "http_status": "200", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "error_code": "", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_sent": 331, > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_received": 152, > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "object_size": 0, > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "total_time": 0, > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user_agent": > "Boto\/2.38.0 Python\/2.7.7 Linux\/2.6.32-573.7.1.el6.x86_64", > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "referrer": "" > ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-} > > The objects above (NWS_NEXRAD_NXL2DP_KGRK_2015011304_20150113045959.tar) > pulls down without an issue though. 
Below is a paste for object > "NWS_NEXRAD_NXL2DP_KVBX_2015022516_20150225165959.tar" which 404's:: > http://pastebin.com/Jtw8z7G4 Sadly the log doesn't provide all the input, but I can guess what the operations were: - POST (init multipart upload) - PUT (upload part) - GET (list parts) - POST (complete multipart) <-- took > 57 seconds to process - POST (complete multipart) - HEAD (stat object) For some reason the complete multipart operation took too long, which I think triggered a client retry (either that, or an abort). Then there were two completions racing (or a complete and an abort), which might have caused the issue we're seeing. E.g., with two completions, the second completion might have noticed that it's overwriting an existing object (the one we just created) and sent the 'old' object to be garbage collected, when that object's tail is actually its own tail. > > I see two POSTs for this object, one recorded a minute before, both with 0 > size though. Does this help at all? Yes, very much. Thanks, Yehuda
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
Keep in mind that if the problem is that the tail is being sent to garbage collection, you'll only see the 404 after a few hours. A shorter way to check it would be by listing the gc entries (with --include-all). Yehuda On Wed, Jan 20, 2016 at 1:52 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > I'm working on getting the code they used and trying different timeouts in > my multipart upload code. Right now I have not created any new 404 keys > though :-( > > > On 1/20/16 3:44 PM, Yehuda Sadeh-Weinraub wrote: >> >> We'll need to confirm that this is the actual issue, and then have it >> fixed. It would be nice to have some kind of a unit test that reproduces >> it. >> >> Yehuda >> >> On Wed, Jan 20, 2016 at 1:34 PM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> So is there any way to prevent this from happening going forward? I mean >>> ideally this should never be possible, right? Even with a complete object >>> that is 0 bytes it should be downloaded as 0 bytes and have a different >>> md5sum and not report as 7mb? >>> >>> >>> >>> On 1/20/16 1:30 PM, Yehuda Sadeh-Weinraub wrote: >>>> >>>> On Wed, Jan 20, 2016 at 10:43 AM, seapasu...@uchicago.edu >>>> <seapasu...@uchicago.edu> wrote: >>>>> >>>>> >>>>> On 1/19/16 4:00 PM, Yehuda Sadeh-Weinraub wrote: >>>>>> >>>>>> On Fri, Jan 15, 2016 at 5:04 PM, seapasu...@uchicago.edu >>>>>> <seapasu...@uchicago.edu> wrote: >>>>>>> >>>>>>> I have looked all over and I do not see any explicit mention of >>>>>>> "NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959" in the logs >>>>>>> nor >>>>>>> do >>>>>>> I >>>>>>> see a timestamp from November 4th although I do see log rotations >>>>>>> dating >>>>>>> back to October 15th.
I don't think it's possible it wasn't logged so >>>>>>> I >>>>>>> am >>>>>>> going through the bucket logs from the 'radosgw-admin log show >>>>>>> --object' >>>>>>> side and I found the following:: >>>>>>> >>>>>>> 4604932 { >>>>>>> 4604933 "bucket": "noaa-nexrad-l2", >>>>>>> 4604934 "time": "2015-11-04 21:29:27.346509Z", >>>>>>> 4604935 "time_local": "2015-11-04 15:29:27.346509", >>>>>>> 4604936 "remote_addr": "", >>>>>>> 4604937 "object_owner": >>>>>>> "b05f707271774dbd89674a0736c9406e", >>>>>>> 4604938 "user": "b05f707271774dbd89674a0736c9406e", >>>>>>> 4604939 "operation": "PUT", >>>>>> >>>>>> I'd expect a multipart upload completion to be done with a POST, not a >>>>>> PUT. >>>>> >>>>> Indeed it seems really weird. >>>>>> >>>>>> >>>>>>> 4604940 "uri": >>>>>>> >>>>>>> >>>>>>> >>>>>>> "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", >>>>>>> 4604941 "http_status": "200", >>>>>>> 4604942 "error_code": "", >>>>>>> 4604943 "bytes_sent": 19, >>>>>>> 4604944 "bytes_received": 0, >>>>>>> 4604945 "object_size": 0, >>>>>> >>>>>> Do you see a zero object_size for other multipart uploads? >>>>> >>>>> I think so. I still don't know how to tell for certain if a radosgw >>>>> object >>>>> is a multipart object or not. I think all of the objects in >>>>> noaa-nexrad-l2 >>>>> bucket are multipart:: >>>>> >>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-{ >>>>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bucket": >>>>> "noaa-nexrad-l
Re: [ceph-users] RGW -- 404 on keys in bucket.list() thousands of multipart ids listed as well.
We'll need to confirm that this is the actual issue, and then have it fixed. It would be nice to have some kind of a unit test that reproduces it. Yehuda On Wed, Jan 20, 2016 at 1:34 PM, seapasu...@uchicago.edu <seapasu...@uchicago.edu> wrote: > So is there any way to prevent this from happening going forward? I mean > ideally this should never be possible, right? Even with a complete object > that is 0 bytes it should be downloaded as 0 bytes and have a different > md5sum and not report as 7mb? > > > > On 1/20/16 1:30 PM, Yehuda Sadeh-Weinraub wrote: >> >> On Wed, Jan 20, 2016 at 10:43 AM, seapasu...@uchicago.edu >> <seapasu...@uchicago.edu> wrote: >>> >>> >>> On 1/19/16 4:00 PM, Yehuda Sadeh-Weinraub wrote: >>>> >>>> On Fri, Jan 15, 2016 at 5:04 PM, seapasu...@uchicago.edu >>>> <seapasu...@uchicago.edu> wrote: >>>>> >>>>> I have looked all over and I do not see any explicit mention of >>>>> "NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959" in the logs nor >>>>> do >>>>> I >>>>> see a timestamp from November 4th although I do see log rotations >>>>> dating >>>>> back to October 15th. I don't think it's possible it wasn't logged so I >>>>> am >>>>> going through the bucket logs from the 'radosgw-admin log show >>>>> --object' >>>>> side and I found the following:: >>>>> >>>>> 4604932 { >>>>> 4604933 "bucket": "noaa-nexrad-l2", >>>>> 4604934 "time": "2015-11-04 21:29:27.346509Z", >>>>> 4604935 "time_local": "2015-11-04 15:29:27.346509", >>>>> 4604936 "remote_addr": "", >>>>> 4604937 "object_owner": "b05f707271774dbd89674a0736c9406e", >>>>> 4604938 "user": "b05f707271774dbd89674a0736c9406e", >>>>> 4604939 "operation": "PUT", >>>> >>>> I'd expect a multipart upload completion to be done with a POST, not a >>>> PUT. >>> >>> Indeed it seems really weird. 
>>>> >>>> >>>>> 4604940 "uri": >>>>> >>>>> >>>>> "\/noaa-nexrad-l2\/2015\/01\/01\/PAKC\/NWS_NEXRAD_NXL2DP_PAKC_2015010111_20150101115959.tar", >>>>> 4604941 "http_status": "200", >>>>> 4604942 "error_code": "", >>>>> 4604943 "bytes_sent": 19, >>>>> 4604944 "bytes_received": 0, >>>>> 4604945 "object_size": 0, >>>> >>>> Do you see a zero object_size for other multipart uploads? >>> >>> I think so. I still don't know how to tell for certain if a radosgw >>> object >>> is a multipart object or not. I think all of the objects in >>> noaa-nexrad-l2 >>> bucket are multipart:: >>> >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out-{ >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bucket": >>> "noaa-nexrad-l2", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time": "2015-10-16 >>> 19:49:30.579738Z", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "time_local": >>> "2015-10-16 14:49:30.579738", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "remote_addr": "", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "user": >>> "b05f707271774dbd89674a0736c9406e", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out: "operation": "POST", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "uri": >>> >>> "\/noaa-nexrad-l2\/2015\/01\/13\/KGRK\/NWS_NEXRAD_NXL2DP_KGRK_2015011304_20150113045959.tar", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "http_status": >>> "200", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "error_code": "", >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_sent": 331, >>> ./2015-10-16-14-default.384153.1-noaa-nexrad-l2.out- "bytes_received": >>> 152,
Re: [ceph-users] How-to doc: hosting a static website on radosgw
On Tue, Jan 26, 2016 at 2:37 PM, Florian Haas wrote: > On Tue, Jan 26, 2016 at 8:56 PM, Wido den Hollander wrote: >> On 01/26/2016 08:29 PM, Florian Haas wrote: >>> Hi everyone, >>> >>> we recently worked a bit on running a full static website just on >>> radosgw (akin to >>> http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html), >>> and didn't find a good how-to writeup out there. So we did a bit of >>> fiddling with radosgw and HAproxy, and wrote one: >>> https://www.hastexo.com/resources/hints-and-kinks/hosting-website-radosgw/#.VqfGx99vFhG >>> >>> Hopefully some of you find this useful. If you spot errors or >>> omissions, just let us know in the comments at the bottom of the page. >>> Thanks! >>> >> >> Thanks! >> >> Were you aware of this work going on: >> https://github.com/ceph/ceph/tree/wip-static-website >> >> This might be in the RADOS Gateway soon and then you don't need HAProxy >> anymore. > > The moment this lands in a release, we'll be more than happy to ditch > the HAProxy request/response mangling bits. But that WIP branch hasn't > seen commits in 4 months, so we took it as an exercise in coming up Here's a more up-to-date branch: https://github.com/ceph/ceph/tree/wip-rgw-static-website-yehuda We're currently testing it, and the plan is to get it in before jewel. One caveat though: the error page handling still has some issues, so the feature will be disabled by default for now. Yehuda > with something workable as an interim solution. :) > > Cheers, > Florian
Re: [ceph-users] Idea for speedup RadosGW for buckets with many objects.
On Wed, Feb 17, 2016 at 12:51 PM, Krzysztof Księżyk wrote: > Hi, > > I'm experiencing a problem with poor performance of RadosGW while operating on > a bucket with many objects. That's a known issue with LevelDB and can be > partially resolved using sharding, but I have one more idea. As I see in ceph > osd logs, all slow requests are while making a call to rgw.bucket_list: > > 2016-02-17 03:17:56.846694 7f5396f63700 0 log_channel(cluster) log [WRN] : > slow request 30.272904 seconds old, received at 2016-02-17 03:17:26.573742: > osd_op(client.12611484.0:15137332 .dir.default.4162.3 [call rgw.bucket_list] > 9.2955279 ack+read+known_if_redirected e3252) currently started > > I don't know exactly how Ceph internally works, but maybe data required to > return results for rgw.bucket_list could be cached for some time. Cache TTL > would be parametrized and could be disabled to keep the same behaviour as > the current one. There can be 3 cases when there's a call to rgw.bucket_list: > 1. no cached data > 2. up-to-date cache > 3. outdated cache > > Ad 1. The first call starts generating the full list. All new requests are put on > hold. When the list is ready, it's saved to cache. > Ad 2. All calls are served from cache. > Ad 3. The first request starts generating the full list. All new requests are served > from the outdated cache until new cached data is ready. > > This can be even optimized by periodically generating fresh cache, even if > it's not expired yet, to reduce cases when the cache is outdated. Where is the cache going to live? Note that for it to be on rgw, it would need to be shared among all rgw instances (serving the same zone). On the other hand, I'm not exactly sure how the osd could cache it (there's no mechanism at the moment that would allow that). And the cache itself would need to be part of the osd that serves the specific bucket index, otherwise you'd need to go to multiple osds for that operation, which would slow things down for the general case. 
Note that we need things to be durable, otherwise we might end up with inconsistencies when things don't go as expected (e.g., when rgw / osd went down). We did some thinking recently around the bucket index area, to see how things can be improved. One way would be (for some use cases) to drop it altogether. This could work in environments where 1. you don't need to list objects in the bucket, and 2. there's no multi-zone sync. Another possible mechanism would be to relax the bucket index update, and replace it with some kind of a lazy update (maybe similar to what you suggested), and some way to rebuild the index out of the raw pool data (maybe combining it with rados namespaces). > > Maybe this idea is stupid, maybe not, but if it's doable it would be nice to > have the choice. Thanks for the suggestions! Yehuda
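The three cases enumerated in the original mail amount to a stale-while-revalidate cache. A minimal single-process sketch of that policy (with an injectable clock for testing; it deliberately ignores the cross-instance sharing and durability concerns raised in the reply):

```python
import time

class StaleWhileRevalidateCache:
    """Case 1: no cached data -> generate and store.
    Case 2: cache up to date -> serve the cached list.
    Case 3: cache outdated -> serve the stale copy while refreshing.
    A real rgw-side cache would also need to be shared across instances."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self.loader, self.ttl, self.clock = loader, ttl, clock
        self.value, self.stamp = None, None

    def get(self):
        now = self.clock()
        if self.stamp is None:                       # case 1: miss
            self.value, self.stamp = self.loader(), now
            return self.value
        if now - self.stamp <= self.ttl:             # case 2: fresh
            return self.value
        stale = self.value                           # case 3: serve stale...
        self.value, self.stamp = self.loader(), now  # ...refresh synchronously
        return stale

# Demo with a stand-in for the expensive rgw.bucket_list call.
cache = StaleWhileRevalidateCache(lambda: ["obj1", "obj2"], ttl=30)
print(cache.get())
```

In a real implementation case 3 would hand the refresh to a background worker instead of doing it inline; the sketch keeps it synchronous for brevity.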
Re: [ceph-users] rgw bucket deletion woes
On Tue, Mar 15, 2016 at 11:36 PM, Pavan Rallabhandi wrote: > Hi, > > I've seen this discussed here before, but couldn't find any solution, > hence the mail. In RGW, for a bucket holding objects in the range of ~ > millions, one can find it to take forever to delete the bucket (via > radosgw-admin). I understand the gc (and its parameters) that would reclaim > the space eventually, but am looking more at the bucket deletion options > that can possibly speed up the operation. > > I realize that currently rgw_remove_bucket() does it 1000 objects at a time, > serially. Wanted to know if there is a reason (that I am possibly missing and > was discussed) for this to be left that way, otherwise I was considering a > patch to make it happen better. > There is no real reason. You might want to have a version of that command that doesn't schedule the removal to gc, but rather removes all the object parts by itself. Otherwise, you're just going to flood the gc. You'll need to iterate through all the objects, and for each object you'll need to remove all of its rados objects (starting with the tail, then the head). Removal of each rados object can be done asynchronously, but you'll need to throttle the operations, not send everything to the osds at once (which will be impossible anyway, as the objecter will throttle the requests, leading to high memory consumption). Thanks, Yehuda
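The throttling described above (a bounded number of in-flight deletes, tail objects before the head) can be sketched with a semaphore. `delete_rados_object` is a placeholder for the actual asynchronous librados remove, not the real call:

```python
import threading

MAX_IN_FLIGHT = 32          # throttle: cap on concurrent delete ops
inflight = threading.Semaphore(MAX_IN_FLIGHT)
deleted = []
lock = threading.Lock()

def delete_rados_object(name):
    """Stand-in for removing one rados object (a tail piece or the head)."""
    with lock:
        deleted.append(name)

def throttled_delete(names):
    """Issue deletes concurrently but never more than MAX_IN_FLIGHT at once."""
    threads = []
    for name in names:       # caller orders names: tail objects first, head last
        inflight.acquire()   # blocks when the in-flight window is full
        def run(n=name):
            try:
                delete_rados_object(n)
            finally:
                inflight.release()
        t = threading.Thread(target=run)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

throttled_delete(["obj.tail.1", "obj.tail.2", "obj.head"])
```

The window keeps memory bounded instead of queueing every request in the objecter at once, which is the failure mode the reply warns about.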
Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests
On Thu, Feb 25, 2016 at 7:17 AM, Ritter Sławomir wrote: > Hi, > > > > We have two CEPH clusters running on Dumpling 0.67.11 and some of our > "multipart objects" are incomplete. It seems that some slow requests could > cause corruption of the related S3 objects. Moreover, GETs for those objects are > working without any error messages. There are only HTTP 200s in the logs, as well > as no information about problems from popular client tools/libs. > > > > The situation looks very similar to the one described in bug #8269, but we are > using the fixed 0.67.11 version: http://tracker.ceph.com/issues/8269 > > > > Regards, > > > > Sławomir Ritter > > > > > > > > EXAMPLE#1 > > > > slow_request > > > > 2016-02-23 13:49:58.818640 osd.260 10.176.67.27:6800/688083 2119 : [WRN] 4 > slow requests, 4 included below; oldest blocked for > 30.727096 secs > > 2016-02-23 13:49:58.818673 osd.260 10.176.67.27:6800/688083 2120 : [WRN] > slow request 30.727096 seconds old, received at 2016-02-23 13:49:28.091460: > osd_op(client.47792965.0:185007087 > default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2 > [writefull 0~524288] 10.ce729ebe e107594) v4 currently waiting for subops from > [469,9] > Did these requests ever finish? 
> > > > > HTTP_500 in apache.log > > == > > 127.0.0.1 - - [23/Feb/2016:13:49:27 +0100] "PUT > /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z=56 > HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 > Linux/3.13.0-39-generic(syncworker)" > > 127.0.0.1 - - [23/Feb/2016:13:49:28 +0100] "PUT > /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z=57 > HTTP/1.0" 500 751 "-" "Boto/2.31.1 Python/2.7.3 > Linux/3.13.0-39-generic(syncworker)" > > 127.0.0.1 - - [23/Feb/2016:13:49:58 +0100] "PUT > /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z=57 > HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 > Linux/3.13.0-39-generic(syncworker)" > > 127.0.0.1 - - [23/Feb/2016:13:49:59 +0100] "PUT > /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z=58 > HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 > Linux/3.13.0-39-generic(syncworker)" > > > > > > Empty RADOS object (real size = 0 bytes), list generated basis on MANIFEST > > == > > found > default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.56_2 > 2097152 ok 2097152 10.7acc9476 (10.1476) [278,142,436] > [278,142,436] > > found > default.14654.445__multipart_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57 > 0 diff4194304 10.4f5be025 (10.25) [57,310,428] > [57,310,428] > > found > default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_1 > 4194304 ok 4194304 10.81191602 (10.1602) [441,109,420] > [441,109,420] > > found > default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2 > 2097152 ok 2097152 10.ce729ebe (10.1ebe) [260,469,9] > [260,469,9] > > > > > 
> "Silent" GETs > > = > > # object size from headers > > $ s3 -u head > video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > Content-Type: binary/octet-stream > > Content-Length: 641775701 > > Server: nginx > > > > # but GETs only 637581397 (641775701 - missing 4194304 = 637581397) > > $ s3 -u get > video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > > /tmp/test > > $ ls -al /tmp/test > > -rw-r--r-- 1 root root 637581397 Feb 23 17:05 /tmp/test > > > > # no error in logs > > 127.0.0.1 - - [23/Feb/2016:17:05:00 +0100] "GET > /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > HTTP/1.0" 200 637581711 "-" "Mozilla/4.0 (Compatible; s3; libs3 2.0; Linux > x86_64)" > > > > # wget - retry for missing part, but there is no missing part, so it GETs > head/tail of the file again > > $ wget > http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > > --2016-02-23 17:10:11-- > http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > > Connecting to 127.0.0.1:88... connected. > > HTTP request sent, awaiting response... 200 OK > > Length: 641775701 (612M) [binary/octet-stream] > > Saving to: `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv' > > > > 99% > [==> > ] 637,581,397 63.9M/s in 9.5s > > > > 2016-02-23 17:10:20
Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests
On Fri, Mar 4, 2016 at 7:26 AM, Ritter Sławomir wrote: >> From: Robin H. Johnson [mailto:robb...@gentoo.org] >> Sent: Friday, March 04, 2016 12:40 AM >> To: Ritter Sławomir >> Cc: ceph-us...@ceph.com; ceph-devel >> Subject: Re: [ceph-users] Problem: silently corrupted RadosGW objects caused >> by slow requests >> >> On Thu, Mar 03, 2016 at 01:55:13PM +0100, Ritter Sławomir wrote: >> > Hi, >> > >> > I think this is a really serious problem - again: >> > >> > - we silently lost S3/RGW objects in clusters >> > >> > Moreover, our situation looks very similar to the one described in >> > the unfixed bug #13764 (Hammer) and the fixed #8269 (Dumpling). >> FYI the fix in #8269 _is_ present in Hammer: >> commit bd8e026f88b rgw: don't allow multiple writers to same multiobject part >> >> -- >> Robin Hugh Johnson >> Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee >> E-Mail : robb...@gentoo.org >> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 > Yes, > > the fix for #8269 has also been included in our version: Dumpling 0.67.11. > The guys from #13764 are using a patched Hammer version. I didn't notice that you were actually running Dumpling (which we haven't supported or backported fixes to for a while). Here's one issue that you might have hit: http://tracker.ceph.com/issues/11604 Yehuda > > Both situations with corrupted files are very similar to the one described in > #8269. > There was a problem with two threads writing to the same RADOS objects. > > Maybe there is another, still unknown, specific exception to fix? > > Cheers, > SR > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
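The RADOS listing in the earlier report already encodes the corruption: parts flagged `diff` have an on-disk size that differs from the size the RGW manifest records, while healthy parts are flagged `ok`. A minimal sketch of a checker for that listing (the `found <name> <size> ok|diff<expected> ...` layout is assumed from the excerpt; it is not the output of a standard tool):

```python
# Scan a listing like the one in the report above and flag RADOS
# multipart/shadow objects whose actual size differs from the size the
# RGW manifest expects. Lines look like:
#   found <object> <actual-size> ok <expected-size> <pg> ...
#   found <object> <actual-size> diff<expected-size> <pg> ...
def find_truncated_parts(listing_lines):
    bad = []
    for line in listing_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] != "found":
            continue
        name, actual, status = fields[1], int(fields[2]), fields[3]
        if status.startswith("diff"):
            expected = int(status[len("diff"):])
            bad.append((name, actual, expected))
    return bad
```

Any part reported with a zero actual size but a non-zero expected size corresponds to the silently lost 4194304-byte chunk visible in the apache.log excerpt.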
Re: [ceph-users] Can Jewel read Hammer radosgw buckets?
(sorry for resubmission, adding ceph-users) On Mon, Apr 25, 2016 at 9:47 AM, Richard Chan wrote: > Hi Yehuda > > I created a test 3xVM setup with Hammer and one radosgw on the (separate) > admin node; creating one user and buckets. > > I upgraded the VMs to Jewel and created a new radosgw on one of the nodes. > > The object store didn't seem to survive the upgrade: > > # radosgw-admin user info --uid=testuser > 2016-04-26 00:41:50.713069 7fcdcc6fca40 0 RGWZoneParams::create(): error > creating default zone params: (17) File exists > could not fetch user info: no user info saved > > rados lspools > rbd > .rgw.root > .rgw.control > .rgw > .rgw.gc > .users.uid > .users > .rgw.buckets.index > .rgw.buckets > default.rgw.control > default.rgw.data.root > default.rgw.gc > default.rgw.log > default.rgw.users.uid > default.rgw.users.keys > > Do I have to configure radosgw to use the pools with default.*? No. We need to get it to play along nicely with the old pools. > How do you actually do that? What does 'radosgw-admin zone get' return? Yehuda
Re: [ceph-users] Can Jewel read Hammer radosgw buckets?
I managed to reproduce the issue, and there seem to be multiple problems. Specifically, we have an issue when upgrading a default cluster that hasn't had a zone (and region) explicitly configured before. There is another bug that I found (http://tracker.ceph.com/issues/15597) that makes things even a bit more complicated. I created the following script that might be able to fix things for you: https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone For future reference, this script shouldn't be used if there are any zones configured other than the default one. It also does some ninja patching to the zone config because of a bug that exists currently, but that will probably not apply to later versions. Please let me know if you have any issues, or if this actually does its magic. Thanks, Yehuda On Mon, Apr 25, 2016 at 4:10 PM, Richard Chan wrote: > >> > How do you actually do that? >> >> What does 'radosgw-admin zone get' return? >> >> Yehuda > > > > [root@node1 ceph]# radosgw-admin zone get > unable to initialize zone: (2) No such file or directory > > (I don't have any rgw configuration in /etc/ceph/ceph.conf; this is from a > clean > > ceph-deploy rgw create node1 > > ## user created under Hammer > [root@node1 ceph]# radosgw-admin user info --uid=testuser > 2016-04-26 07:07:06.159497 7f410c33ca40 0 RGWZoneParams::create(): error > creating default zone params: (17) File exists > could not fetch user info: no user info saved > > "rgw_max_chunk_size": "524288", > "rgw_max_put_size": "5368709120", > "rgw_override_bucket_index_max_shards": "0", > "rgw_bucket_index_max_aio": "8", > "rgw_enable_quota_threads": "true", > "rgw_enable_gc_threads": "true", > "rgw_data": "\/var\/lib\/ceph\/radosgw\/ceph-rgw.node1", > "rgw_enable_apis": "s3, s3website, swift, swift_auth, admin", > "rgw_cache_enabled": "true", > "rgw_cache_lru_size": "1", > "rgw_socket_path": "", > "rgw_host": "", > "rgw_port": "", > "rgw_dns_name": "", > "rgw_dns_s3website_name": "", > 
"rgw_content_length_compat": "false", > "rgw_script_uri": "", > "rgw_request_uri": "", > "rgw_swift_url": "", > "rgw_swift_url_prefix": "swift", > "rgw_swift_auth_url": "", > "rgw_swift_auth_entry": "auth", > "rgw_swift_tenant_name": "", > "rgw_swift_account_in_url": "false", > "rgw_swift_enforce_content_length": "false", > "rgw_keystone_url": "", > "rgw_keystone_admin_token": "", > "rgw_keystone_admin_user": "", > "rgw_keystone_admin_password": "", > "rgw_keystone_admin_tenant": "", > "rgw_keystone_admin_project": "", > "rgw_keystone_admin_domain": "", > "rgw_keystone_api_version": "2", > "rgw_keystone_accepted_roles": "Member, admin", > "rgw_keystone_token_cache_size": "1", > "rgw_keystone_revocation_interval": "900", > "rgw_keystone_verify_ssl": "true", > "rgw_keystone_implicit_tenants": "false", > "rgw_s3_auth_use_rados": "true", > "rgw_s3_auth_use_keystone": "false", > "rgw_ldap_uri": "ldaps:\/\/", > "rgw_ldap_binddn": "uid=admin,cn=users,dc=example,dc=com", > "rgw_ldap_searchdn": "cn=users,cn=accounts,dc=example,dc=com", > "rgw_ldap_dnattr": "uid", > "rgw_ldap_secret": "\/etc\/openldap\/secret", > "rgw_s3_auth_use_ldap": "false", > "rgw_admin_entry": "admin", > "rgw_enforce_swift_acls": "true", > "rgw_swift_token_expiration": "86400", > "rgw_print_continue": "true", > "rgw_remote_addr_param": "REMOTE_ADDR", > "rgw_op_thread_timeout": "600", > "rgw_op_thread_suicide_timeout": "0", > "rgw_thread_pool_size": "100", > "rgw_num_control_oids": "8", > "rgw_num_rados_handles": "1", > "rgw_nfs_lru_lanes": "5", > "rgw_nfs_lru_lane_hiwat": "911", > "rgw_nfs_fhcache_partitions": "3", > "rgw_nfs_fhcache_size": "2017", > "rgw_zone": "", > "rgw_zone_root_pool": ".rgw.root", > "rgw_default_zone_info_oid": "default.zone", > "rgw_region": "", > "rgw_default_region_info_oid": "default.region", > "rgw_zonegroup": "", > "rgw_zonegroup_root_pool": ".rgw.root", > "rgw_default_zonegroup_info_oid": "default.zonegroup", > "rgw_realm": "", > "rgw_realm_root_pool": ".rgw.root", > 
"rgw_default_realm_info_oid": "default.realm", > "rgw_period_root_pool": ".rgw.root", > "rgw_period_latest_epoch_info_oid": ".latest_epoch", > "rgw_log_nonexistent_bucket": "false", > "rgw_log_object_name": "%Y-%m-%d-%H-%i-%n", > "rgw_log_object_name_utc": "false", > "rgw_usage_max_shards": "32", > "rgw_usage_max_user_shards": "1", > "rgw_enable_ops_log": "false", > "rgw_enable_usage_log": "false", > "rgw_ops_log_rados": "true", > "rgw_ops_log_socket_path": "", > "rgw_ops_log_data_backlog": "5242880", > "rgw_usage_log_flush_threshold": "1024", > "rgw_usage_log_tick_interval": "30", >
Re: [ceph-users] Can Jewel read Hammer radosgw buckets?
On Sat, Apr 23, 2016 at 6:22 AM, Richard Chan wrote: > Hi Cephers, > > I upgraded to Jewel and noted there is a massive radosgw multisite rework > in the release notes. > > Can Jewel radosgw be configured to present existing Hammer buckets? > On a test system, Jewel didn't recognise my Hammer buckets; > > Hammer used pools .rgw.* > Jewel created by default: .rgw.root and default.rgw* > > > Yes, Jewel should be able to read Hammer buckets. If it detects that there's an old config, it should migrate the existing setup into the new config. It seems that something didn't work as expected here. One way to fix it would be to create a new zone and set its pools to point at the old config's pools. We'll need to figure out what went wrong though. Yehuda
Re: [ceph-users] RadosGW not start after upgrade to Jewel
On Tue, Apr 26, 2016 at 6:50 AM, Abhishek Lekshmanan wrote: > > Ansgar Jazdzewski writes: > >> Hi, >> >> After playing with the setup I got some output that looks wrong: >> >> # radosgw-admin zone get >> >> "placement_pools": [ >> { >> "key": "default-placement", >> "val": { >> "index_pool": ".eu-qa.rgw.buckets.inde", >> "data_pool": ".eu-qa.rgw.buckets.dat", >> "data_extra_pool": ".eu-qa.rgw.buckets.non-e", >> "index_type": 0 >> } >> } >> ], >> >> I think it should be: >> >> index_pool = .eu-qa.rgw.buckets.index. >> data_pool = .eu-qa.rgw.buckets >> data_extra_pool = .eu-qa.rgw.buckets.extra >> >> how can I fix it? > > Not sure how it reached this state, but given a zone get json, you can There's an issue currently when doing radosgw-admin zone set and the pool names start with a period (http://tracker.ceph.com/issues/15597): the pool name is getting truncated by one character. We will have this fixed for the next point release, but the workaround for now would be to add an extra character to each pool name before running the zone set command. Yehuda > edit this and set it back using zone set, e.g.: > # radosgw-admin zone get > zone.json # now edit this file > # radosgw-admin zone set --rgw-zone="eu-qa" < zone.json >> >> Thanks >> Ansgar >> >> 2016-04-26 13:07 GMT+02:00 Ansgar Jazdzewski : >>> Hi all, >>> >>> I got an answer that pointed me to: >>> https://github.com/ceph/ceph/blob/master/doc/radosgw/multisite.rst >>> >>> 2016-04-25 16:02 GMT+02:00 Karol Mroz : On Mon, Apr 25, 2016 at 02:23:28PM +0200, Ansgar Jazdzewski wrote: > Hi, > > we test Jewel in our QA environment (from Infernalis to Hammer) the > upgrade went fine but the Radosgw did not start.
> > the error appears also with radosgw-admin > > # radosgw-admin user info --uid="images" --rgw-region=eu --rgw-zone=eu-qa > 2016-04-25 12:13:33.425481 7fc757fad900 0 error in read_id for id : > (2) No such file or directory > 2016-04-25 12:13:33.425494 7fc757fad900 0 failed reading zonegroup > info: ret -2 (2) No such file or directory > couldn't init storage provider > > do i have to change some settings, also for upgrade of the radosgw? Hi, Testing a recent master build (with only default region and zone), I'm able to successfully run the command you specified: % ./radosgw-admin user info --uid="testid" --rgw-region=default --rgw-zone=default ... { "user_id": "testid", "display_name": "M. Tester", ... } Are you certain the region and zone you specified exist? What do the following report: radosgw-admin zone list radosgw-admin region list -- Regards, Karol >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > -- > Abhishek Lekshmanan > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nürnberg) > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
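The workaround Yehuda describes above (append one extra character to every pool name before `zone set`, since the bug strips the last character of names starting with a period) can be scripted as a small JSON rewrite. A sketch, assuming the zone JSON uses the pool keys seen in these threads; the helper name and file names are mine:

```python
import json
import sys

# Keys in "radosgw-admin zone get" output that hold pool names.
POOL_KEYS = {
    "domain_root", "control_pool", "gc_pool", "log_pool",
    "intent_log_pool", "usage_log_pool", "user_keys_pool",
    "user_email_pool", "user_swift_pool", "user_uid_pool",
    "index_pool", "data_pool", "data_extra_pool",
}

def pad_pools(node):
    # Append one throwaway character to every pool name starting with a
    # period; http://tracker.ceph.com/issues/15597 strips it again on
    # "zone set", leaving the intended name behind.
    if isinstance(node, dict):
        return {k: (v + "_" if k in POOL_KEYS and isinstance(v, str)
                    and v.startswith(".") else pad_pools(v))
                for k, v in node.items()}
    if isinstance(node, list):
        return [pad_pools(v) for v in node]
    return node

if __name__ == "__main__" and len(sys.argv) > 1:
    # Intended use:
    #   radosgw-admin zone get > zone.json
    #   python pad_pools.py zone.json > zone.padded.json
    #   radosgw-admin zone set --rgw-zone=eu-qa < zone.padded.json
    with open(sys.argv[1]) as f:
        print(json.dumps(pad_pools(json.load(f)), indent=4))
```

Only apply this against a radosgw-admin build that still has the truncation bug; on a fixed build the padding character would persist in the pool names.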
Re: [ceph-users] radosgw hammer -> jewel upgrade (default zone & region config)
On Fri, May 20, 2016 at 9:03 AM, Jonathan D. Proulx wrote: > Hi All, > > I saw the previous thread on this related to > http://tracker.ceph.com/issues/15597 > > and Yehuda's fix script > https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone > > Running this seems to have landed me in a weird state. > > I can create and get new buckets and objects but I've "lost" all my > old buckets. I'm fairly confident the "lost" data is in the > .rgw.buckets pool but my current zone is set to use .rgw.buckets_ > > > > root@ceph-mon0:~# radosgw-admin zone get > { > "id": "default", > "name": "default", > "domain_root": ".rgw_", > "control_pool": ".rgw.control_", > "gc_pool": ".rgw.gc_", > "log_pool": ".log_", > "intent_log_pool": ".intent-log_", > "usage_log_pool": ".usage_", > "user_keys_pool": ".users_", > "user_email_pool": ".users.email_", > "user_swift_pool": ".users.swift_", > "user_uid_pool": ".users.uid_", > "system_key": { > "access_key": "", > "secret_key": "" > }, > "placement_pools": [ > { > "key": "default-placement", > "val": { > "index_pool": ".rgw.buckets.index_", > "data_pool": ".rgw.buckets_", > "data_extra_pool": ".rgw.buckets.extra_", > "index_type": 0 > } > } > ], > "metadata_heap": "default.rgw.meta", > "realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be" > } > > > root@ceph-mon0:~# ceph osd pool ls |grep rgw|sort > default.rgw.meta > .rgw > .rgw_ > .rgw.buckets > .rgw.buckets_ > .rgw.buckets.index > .rgw.buckets.index_ > .rgw.control > .rgw.control_ > .rgw.gc > .rgw.gc_ > .rgw.root > .rgw.root.backup > > Should I just adjust the zone to use the pools without trailing > underscores? I'm a bit lost. the last I could see from running the Yes. The trailing underscores were needed when upgrading to 10.2.0, as there was another bug, and I needed to add these to compensate for it. I should update the script now to reflect that fix. You should just update the json and set the zone appropriately. 
Yehuda > script didn't seem to indicate any errors (though I lost the to to > scroll back buffer before i noticed the issue) > > Tail of output from running script: > https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone > > + radosgw-admin zone set --rgw-zone=default > zone id default{ > "id": "default", > "name": "default", > "domain_root": ".rgw_", > "control_pool": ".rgw.control_", > "gc_pool": ".rgw.gc_", > "log_pool": ".log_", > "intent_log_pool": ".intent-log_", > "usage_log_pool": ".usage_", > "user_keys_pool": ".users_", > "user_email_pool": ".users.email_", > "user_swift_pool": ".users.swift_", > "user_uid_pool": ".users.uid_", > "system_key": { > "access_key": "", > "secret_key": "" > }, > "placement_pools": [ > { > "key": "default-placement", > "val": { > "index_pool": ".rgw.buckets.index_", > "data_pool": ".rgw.buckets_", > "data_extra_pool": ".rgw.buckets.extra_", > "index_type": 0 > } > } > ], > "metadata_heap": "default.rgw.meta", > "realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be" > } > + radosgw-admin zonegroup default --rgw-zonegroup=default > + radosgw-admin zone default --rgw-zone=default > root@ceph-mon0:~# radosgw-admin region get --rgw-zonegroup=default > { > "id": "default", > "name": "default", > "api_name": "", > "is_master": "true", > "endpoints": [], > "hostnames": [], > "hostnames_s3website": [], > "master_zone": "default", > "zones": [ > { > "id": "default", > "name": "default", > "endpoints": [], > "log_meta": "false", > "log_data": "false", > "bucket_index_max_shards": 0, > "read_only": "false"} > ], > "placement_targets": [ > { > "name": "default-placement", > "tags": [] > } > ], > "default_placement": "default-placement", > "realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be"} > > root@ceph-mon0:~# ceph -v > ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269) > > Thanks, > -Jon > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
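Yehuda's advice above ("update the json and set the zone appropriately") amounts to the inverse edit: strip the trailing `_` that the fix-zone script appended to each pool name, then feed the result back through `radosgw-admin zone set`. A sketch, assuming (as in the zone dump above) that only padded pool names both start with a period and end with an underscore:

```python
def strip_pool_padding(node):
    # Drop the trailing "_" from padded pool names such as ".rgw.buckets_".
    # Only strings that look like padded pool names (leading period plus
    # trailing underscore) are touched, so values like "default.rgw.meta"
    # or the realm_id are left alone.
    if isinstance(node, dict):
        return {k: strip_pool_padding(v) for k, v in node.items()}
    if isinstance(node, list):
        return [strip_pool_padding(v) for v in node]
    if isinstance(node, str) and node.startswith(".") and node.endswith("_"):
        return node[:-1]
    return node
```

This assumes the `zone set` truncation bug is already fixed in the running binary (10.2.1 here); on a still-buggy build, stripping the padding and then running `zone set` would truncate the real names again.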
Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time
On Thu, May 12, 2016 at 12:29 AM, Saverio Proto wrote: >> While I'm usually not fond of blaming the client application, this is >> really a swift command-line tool issue. It tries to be smart by >> comparing the md5sum of the object's content with the object's etag, >> and it breaks with multipart objects. A multipart object's etag is calculated >> differently (the md5sum of the concatenated md5sums of its parts). I think the swift >> tool has special handling for swift large objects (which are not the >> same as S3 multipart objects), so that's why it works in that specific >> use case. > > Well, but I tried also with rclone and I have the same issue. > > Clients I tried: > rclone (both SWIFT and S3) > s3cmd (S3) > python-swiftclient (SWIFT) > > I can reproduce the issue with different clients. > Once a multipart object is uploaded via S3 (with rclone or s3cmd) I > cannot read it anymore via SWIFT (either with rclone or > python-swiftclient). > > Are you saying that all SWIFT client implementations are wrong? Yes. > > Or should the radosgw be configured with only one API active? > > Saverio
Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time
While I'm usually not fond of blaming the client application, this is really a swift command-line tool issue. It tries to be smart by comparing the md5sum of the object's content with the object's etag, and it breaks with multipart objects. A multipart object's etag is calculated differently (the md5sum of the concatenated md5sums of its parts). I think the swift tool has special handling for swift large objects (which are not the same as S3 multipart objects), so that's why it works in that specific use case. Yehuda On Wed, May 11, 2016 at 7:15 AM, Saverio Proto wrote: > It does not work the other way around either: > > If I upload a file with the swift client with the -S option to force > swift to make multipart: > > swift upload -S 100 multipart 180.mp4 > > Then I am not able to read the file with S3: > > s3cmd get s3://multipart/180.mp4 > download: 's3://multipart/180.mp4' -> './180.mp4' [1 of 1] > download: 's3://multipart/180.mp4' -> './180.mp4' [1 of 1] > 38818503 of 38818503 100% in 1s 27.32 MB/s done > WARNING: MD5 signatures do not match: > computed=961f154cc78c7bf1be3b4009c29e5a68, > received=d41d8cd98f00b204e9800998ecf8427e > > Saverio > > > 2016-05-11 16:07 GMT+02:00 Saverio Proto : >> Thank you. >> >> It is exactly a problem with multipart. >> >> So I tried two clients (s3cmd and rclone). When you upload a file to >> S3 using multipart, you are not able to read this object anymore with >> the SWIFT API because the md5 check fails. >> >> Saverio >> >> >> >> 2016-05-09 12:00 GMT+02:00 Xusangdi : >>> Hi, >>> >>> I'm not running a cluster like yours, but I don't think the issue is caused >>> by you using 2 APIs at the same time. >>> IIRC the dash thing is appended by S3 multipart upload, with a following >>> digit indicating the number of parts. 
>>> You may want to check this bug reported in the s3cmd community: >>> https://sourceforge.net/p/s3tools/bugs/123/ >>> >>> and some basic info from Amazon: >>> http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html >>> >>> Hope this helps :D >>> >>> Regards, >>> ---Sandy >>> -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Saverio Proto Sent: Monday, May 09, 2016 4:42 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time I try to simplify the question to get some feedback. Is anyone running the RadosGW in production with the S3 and SWIFT APIs active at the same time? thank you! Saverio 2016-05-06 11:39 GMT+02:00 Saverio Proto : > Hello, > > We have been running the Rados GW with the S3 API and we did not have > problems for more than a year. > > We recently also enabled the SWIFT API for our users. > > radosgw --version > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) > > The idea is that each user of the system is free to choose the S3 > client or the SWIFT client to access the same container/buckets. > > Please tell us if this is possible by design or if we are doing > something wrong. > > We now have a problem where some files written in the past with S3 > cannot be read with the SWIFT API because the md5sum check always fails. > > I am able to reproduce the bug in this way: > > We have this file googlebooks-fre-all-2gram-20120701-ts.gz and we know > the correct md5 is 1c8113d2bd21232688221ec74dccff3a You can download > the same file here: > https://www.dropbox.com/s/auq16vdv2maw4p7/googlebooks-fre-all-2gram-20 > 120701-ts.gz?dl=0 > > rclone mkdir lss3:bugreproduce > rclone copy googlebooks-fre-all-2gram-20120701-ts.gz lss3:bugreproduce > > The file is successfully uploaded. 
> > At this point I can successfully download the file again: > rclone copy lss3:bugreproduce/googlebooks-fre-all-2gram-20120701-ts.gz > test.gz > > but not with swift: > > swift download googlebooks-ngrams-gz > fre/googlebooks-fre-all-2gram-20120701-ts.gz > Error downloading object > 'googlebooks-ngrams-gz/fre/googlebooks-fre-all-2gram-20120701-ts.gz': > u'Error downloading fre/googlebooks-fre-all-2gram-20120701-ts.gz: > md5sum != etag, 1c8113d2bd21232688221ec74dccff3a != > 1a209a31b4ac3eb923fac5e8d194d9d3-2' > > I also found the dash character '-' at the end of the md5 it is trying > to compare strange. > > Of course, uploading a file with the swift client and re-downloading the > same file just works. > > Should I open a bug for the radosgw on http://tracker.ceph.com/ ? > > thank you > > Saverio
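The `-2` suffix that trips the swift client's check comes straight from how S3 computes multipart ETags: the ETag is not the MD5 of the object's content, but the MD5 of the concatenated binary MD5 digests of the parts, followed by a dash and the part count. A small illustration:

```python
import hashlib

def simple_etag(data):
    # ETag of a single-part upload: plain MD5 of the content.
    return hashlib.md5(data).hexdigest()

def multipart_etag(parts):
    # ETag of a multipart upload: MD5 over the concatenated *binary* MD5
    # digests of the parts, plus "-<number of parts>".
    concatenated = b"".join(hashlib.md5(part).digest() for part in parts)
    return "%s-%d" % (hashlib.md5(concatenated).hexdigest(), len(parts))

data = b"x" * (6 * 1024 * 1024)                 # pretend 6 MiB object
parts = [data[:5 * 1024 * 1024], data[5 * 1024 * 1024:]]  # two parts

# Same bytes, different ETags -- so "md5sum != etag" is expected behaviour
# for an object uploaded via S3 multipart, not proof of corruption.
assert simple_etag(data) != multipart_etag(parts)
assert multipart_etag(parts).endswith("-2")
```

This matches the `1a209a31b4ac3eb923fac5e8d194d9d3-2` etag in Saverio's report: a two-part multipart upload. Genuine truncation (as in the slow-request thread earlier) has to be detected by comparing sizes, not by this checksum.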