[ceph-users] osd down after server failure
Hi,
I had a server failure that started with one disk failing:

Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023986] sd 4:2:26:0: [sdaa] Unhandled error code
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023990] sd 4:2:26:0: [sdaa] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023995] sd 4:2:26:0: [sdaa] CDB: Read(10): 28 00 00 00 00 d0 00 00 10 00
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.024005] end_request: I/O error, dev sdaa, sector 208
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.024744] XFS (sdaa): metadata I/O error: block 0xd0 (xfs_trans_read_buf) error 5 buf count 8192
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.025879] XFS (sdaa): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
Oct 14 03:25:28 s3-10-177-64-6 kernel: [1027260.820288] XFS (sdaa): metadata I/O error: block 0xd0 (xfs_trans_read_buf) error 5 buf count 8192
Oct 14 03:25:28 s3-10-177-64-6 kernel: [1027260.821194] XFS (sdaa): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
Oct 14 03:25:32 s3-10-177-64-6 kernel: [1027264.667851] XFS (sdaa): metadata I/O error: block 0xd0 (xfs_trans_read_buf) error 5 buf count 8192

This made the server unresponsive. After the server restart, 3 of its 26 OSDs are down. With "debug osd = 10" set, the ceph-osd log after a restart shows:

2013-10-14 06:21:23.141936 7fdeb4872700 -1 osd.47 43203 *** Got signal Terminated ***
2013-10-14 06:21:23.142141 7fdeb4872700 -1 osd.47 43203 pausing thread pools
2013-10-14 06:21:23.142146 7fdeb4872700 -1 osd.47 43203 flushing io
2013-10-14 06:21:25.406187 7f02690f9780 0 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is supported and appears to work
2013-10-14 06:21:25.406204 7f02690f9780 0 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-10-14 06:21:25.406557 7f02690f9780 0 filestore(/vol0/data/osd.47) mount did NOT detect btrfs
2013-10-14 06:21:25.412617 7f02690f9780 0 filestore(/vol0/data/osd.47) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-10-14 06:21:25.412831 7f02690f9780 0 filestore(/vol0/data/osd.47) mount found snaps
2013-10-14 06:21:25.415798 7f02690f9780 0 filestore(/vol0/data/osd.47) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-10-14 06:21:26.078377 7f02690f9780 2 osd.47 0 mounting /vol0/data/osd.47 /vol0/data/osd.47/journal
2013-10-14 06:21:26.080872 7f02690f9780 0 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is supported and appears to work
2013-10-14 06:21:26.080885 7f02690f9780 0 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-10-14 06:21:26.081289 7f02690f9780 0 filestore(/vol0/data/osd.47) mount did NOT detect btrfs
2013-10-14 06:21:26.087524 7f02690f9780 0 filestore(/vol0/data/osd.47) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-10-14 06:21:26.087582 7f02690f9780 0 filestore(/vol0/data/osd.47) mount found snaps
2013-10-14 06:21:26.089614 7f02690f9780 0 filestore(/vol0/data/osd.47) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-10-14 06:21:26.726676 7f02690f9780 2 osd.47 0 boot
2013-10-14 06:21:26.726773 7f02690f9780 10 osd.47 0 read_superblock sb(16773c25-5054-4451-bf9f-efc1f7f21b89 osd.47 63cf7d70-99cb-0ab1-4006-002f e43203 [41261,43203] lci=[43194,43203])
2013-10-14 06:21:26.726862 7f02690f9780 10 osd.47 0 add_map_bl 43203 82622 bytes
2013-10-14 06:21:26.727184 7f02690f9780 10 osd.47 43203 load_pgs
2013-10-14 06:21:26.727643 7f02690f9780 10 osd.47 43203 load_pgs ignoring unrecognized meta
2013-10-14 06:21:26.727681 7f02690f9780 10 osd.47 43203 load_pgs 3.df1_TEMP clearing temp

osd.47 is still down, so I marked it out of the cluster:

47 1 osd.47 down 0

How can I check what is wrong?

ceph -v
ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)

--
Regards
Dominik
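For reference, one way to dig further on an OSD that stays down like this - assuming the default log locations and the sysvinit scripts that ship with 0.56.x - is to raise the debug level for that OSD and watch its log during the next start attempt:

  # on the OSD host, in /etc/ceph/ceph.conf
  [osd.47]
      debug osd = 20
      debug filestore = 20
      debug journal = 20

  # restart just that daemon and follow its log
  /etc/init.d/ceph start osd.47
  tail -f /var/log/ceph/ceph-osd.47.log

  # cluster-side view of the OSD's state
  ceph osd tree | grep osd.47
  ceph health detail

If the filestore sits on the disk that threw the XFS errors above, the log will usually make that obvious right after the mount lines.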
[ceph-users] 2013-10-14 14:42:23 auto-saved draft
hi all,
I followed this mail to configure Ceph with Hadoop (http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/1809).

1. Install the additional packages libcephfs-java and libcephfs-jni using the commands:
   ./configure --enable-cephfs-java
   make
   make install
   cp /src/java/libcephfs.jar /usr/hadoop/lib/
2. Download http://ceph.com/download/hadoop-cephfs.jar
   cp hadoop-cephfs.jar /usr/hadoop/lib
3. Symlink the JNI library:
   cd /usr/hadoop/lib/native/Linux-amd64-64
   ln -s /usr/local/lib/libcephfs_jni.so .
4. vim core-site.xml
   fs.default.name=ceph://192.168.22.158:6789/
   fs.ceph.impl=org.apache.hadoop.fs.ceph.CephFileSystem
   ceph.conf.file=/etc/ceph/ceph.conf

And then:

# hadoop fs -ls
ls: cannot access . : no such file or directory
# hadoop dfsadmin -report
report: FileSystem ceph://192.168.22.158:6789 is not a distributed file system
Usage: java DFSAdmin [-report]

thanks
pengft
Re: [ceph-users] osd down after server failure
Hi,
I have found something. After the restart the time on the server was wrong (+2 hours) until NTP fixed it. I restarted these 3 OSDs - it did not help. Is it possible that ceph has banned these OSDs? Or could starting with the wrong time have broken the OSDs' filestores?

--
Regards
Dominik

2013/10/14 Dominik Mostowiec dominikmostow...@gmail.com:
> [original report quoted in full - see the first message in this thread]
Re: [ceph-users] radosgw-admin doesn't list user anymore
We upgraded from 0.61.8 to 0.67.4.

The metadata commands work for the users and the buckets:

root@ineri ~$ radosgw-admin metadata list bucket
[
  a4mesh,
  61a75c04-34a5-11e3-9bea-8f8d15b5cf20,
  6e22de72-34a5-11e3-afc4-d3f70b676c52,
  ...

root@ineri ~$ radosgw-admin metadata list user
[
  cloudbroker,
  a4mesh,
  valery,
  ...

Cheers,
Valery

On 11/10/13 18:27, Yehuda Sadeh wrote:
> On Fri, Oct 11, 2013 at 7:46 AM, Valery Tschopp valery.tsch...@switch.ch wrote:
>> Hi,
>> Since we upgraded ceph to 0.67.4, radosgw-admin doesn't list all the users anymore:
>>
>> root@ineri:~# radosgw-admin user info
>> could not fetch user info: no user info saved
>>
>> But it still works for a single user:
>>
>> root@ineri:~# radosgw-admin user info --uid=valery
>> { user_id: valery,
>>   display_name: Valery Tschopp,
>>   email: valery.tsch...@switch.ch,
>>   ...
>>
>> The debug log file is too big for the mailing list, but here it is on pastebin: http://pastebin.com/cFypJ2Qd
>
> What version did you upgrade from?
>
> You can try using the following:
> $ radosgw-admin metadata list bucket
>
> Thanks,
> Yehuda

--
SWITCH
Valery Tschopp, Software Engineer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
email: valery.tsch...@switch.ch phone: +41 44 268 1544
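As a side note, the metadata interface in 0.67.x can also fetch a single record, which is handy when "user info" behaves differently across versions - a quick sketch, assuming a user id of "valery":

  radosgw-admin metadata list user
  radosgw-admin metadata get user:valery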
[ceph-users] using ceph with hadoop
hi all,
I followed this mail to configure Ceph with Hadoop (http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/1809).

1. Install the additional packages libcephfs-java and libcephfs-jni using the commands:
   ./configure --enable-cephfs-java
   make
   make install
   cp /src/java/libcephfs.jar /usr/hadoop/lib/
2. Download http://ceph.com/download/hadoop-cephfs.jar
   cp hadoop-cephfs.jar /usr/hadoop/lib
3. Symlink the JNI library:
   cd /usr/hadoop/lib/native/Linux-amd64-64
   ln -s /usr/local/lib/libcephfs_jni.so .
4. vim core-site.xml
   fs.default.name=ceph://192.168.22.158:6789/
   fs.ceph.impl=org.apache.hadoop.fs.ceph.CephFileSystem
   ceph.conf.file=/etc/ceph/ceph.conf

And then:

# hadoop fs -ls
ls: cannot access . : no such file or directory
# hadoop dfsadmin -report
report: FileSystem ceph://192.168.22.158:6789 is not a distributed file system
Usage: java DFSAdmin [-report]

# /usr/hadoop/bin/stop-all.sh
# /usr/hadoop/bin/start-all.sh
hadoop: Exception in thread IPC Client (47) Connection to 192.168.58.129:6789 from rwt
java.lang.RuntimeException: readObject can't find class

thanks
pengft
Re: [ceph-users] Speed limit on RadosGW?
Hi, sorry, I missed this mail.

> During writes, does the CPU usage on your RadosGW node go way up?

No, CPU usage stays the same, very low (< 10%).

When uploading small files (300KB/file) over RadosGW:
- using 1 process: upload bandwidth ~ 3MB/s
- using 100 processes: upload bandwidth ~ 15MB/s

When uploading big files (3GB/file) over RadosGW:
- using 1 process: upload bandwidth ~ 70MB/s
(Therefore I don't upload big files using multiple processes any more :D)

Maybe RadosGW has a problem when writing many small files. Or is it a problem in Ceph when simultaneously writing many small files into a bucket that already holds millions of files?

On Wed, Sep 25, 2013 at 7:24 PM, Mark Nelson mark.nel...@inktank.com wrote:
> On 09/25/2013 02:49 AM, Chu Duc Minh wrote:
>> I have a CEPH cluster with 9 nodes (6 data nodes and 3 mon/mds nodes), and I set up 4 separate nodes to test the performance of Rados-GW:
>> - 2 nodes run Rados-GW
>> - 2 nodes run multi-process put file to [multi] Rados-GW
>>
>> Result:
>> a) When I use 1 RadosGW node and 1 upload-node: upload speed = 50MB/s per upload-node, Rados-GW input/output speed = 50MB/s
>> b) When I use 2 RadosGW nodes and 1 upload-node: upload speed = 50MB/s per upload-node; each RadosGW has input/output = 25MB/s => sum input/output of 2 Rados-GW = 50MB/s
>> c) When I use 1 RadosGW node and 2 upload-nodes: upload speed = 25MB/s per upload-node => sum output of 2 upload-nodes = 50MB/s; RadosGW has input/output = 50MB/s
>> d) When I use 2 RadosGW nodes and 2 upload-nodes: upload speed = 25MB/s per upload-node => sum output of 2 upload-nodes = 50MB/s; each RadosGW has input/output = 25MB/s => sum input/output of 2 Rados-GW = 50MB/s
>>
>> Problem: I can't pass the 50MB/s limit when putting files over Rados-GW, regardless of the number of Rados-GW nodes and upload-nodes. When I use this CEPH cluster over librados (openstack/kvm), I can easily achieve 300MB/s.
>> I don't know why the performance of RadosGW is so low. What's the bottleneck?
>
> During writes, does the CPU usage on your RadosGW node go way up? If this is a test cluster, you might want to try the wip-6286 build from our gitbuilder site. There is a fix that, depending on the size of your objects, could have a big impact on performance. We're currently investigating some other radosgw performance issues as well, so stay tuned. :)
>
> Mark
>
>> Thank you very much!
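One way to separate radosgw overhead from the RADOS layer here - a rough check, assuming the default .rgw.buckets data pool - is to benchmark small writes directly with rados bench and compare against large ones:

  # 60-second write test with 4 KB objects, 16 concurrent ops
  rados bench -p .rgw.buckets 60 write -b 4096 -t 16
  # same test with 4 MB objects
  rados bench -p .rgw.buckets 60 write -b 4194304 -t 16

If small objects are slow even at this layer, the bottleneck is below radosgw itself.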
[ceph-users] radosgw can still get the object even if this object's physical file is removed on OSDs
Hi ceph-users,

I uploaded an object successfully to radosgw with 3 replicas, and I located the physical paths of all 3 replicas on the OSDs. One of the 3 physical paths is:

/var/lib/ceph/osd/ceph-2/current/3.5_head/DIR_D/default.4896.65\\u20131014\\u1__head_0646563D__3

Then I manually deleted all 3 replica files on the OSDs, but the object can still be fetched from radosgw with HTTP code 200, even though I cleaned all the caches on both radosgw and the OSDs with 'echo 3 > /proc/sys/vm/drop_caches'. Only after I restarted the 3 OSDs did the GET request return 404.

What did I miss? Is it not right to clean the cache that way?

Thanks.
--
Regards,
Zhi
Re: [ceph-users] Using ceph with hadoop error
On Sun, Oct 13, 2013 at 8:28 PM, 鹏 wkp4...@126.com wrote:
> hi all:
> Exception in thread main java.lang.NoClassDefFoundError: com/ceph/fs/cephFileAlreadyExisteException
>     at java.lang.class.forName0(Native Method)

This looks like a bug, which I'll fix up today. But it shouldn't be related to the problems you are seeing.

> Caused by: java.lang.classNotFoundException: com.ceph.fs.CephFileAlreadyExistsException
>     at java.net.URLClassLoader$1.run(URLClassLoader.jar:202)
>     at

This looks like you don't have the CephFS Java bindings in a place where Hadoop can locate them. Typically you can stick the libcephfs.jar file into the lib directory of Hadoop, or add it to your classpath.
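For illustration, putting the bindings on Hadoop's classpath usually comes down to one of these - the paths are assumptions based on a default "make install" prefix and a Hadoop tree under /usr/hadoop:

  # either drop the jar into Hadoop's lib directory...
  cp /usr/local/share/java/libcephfs.jar /usr/hadoop/lib/
  # ...or export it in conf/hadoop-env.sh
  export HADOOP_CLASSPATH=/usr/local/share/java/libcephfs.jar:$HADOOP_CLASSPATH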
Re: [ceph-users] using ceph with hadoop
The error below seems to indicate that Hadoop isn't aware of the `ceph://` file system. You'll need to manually add this to your core-site.xml:

  <property>
    <name>fs.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
  </property>

> report: FileSystem ceph://192.168.22.158:6789 is not a distributed file system
> Usage: java DFSAdmin [-report]
>
> # /usr/hadoop/bin/stop-all.sh
> # /usr/hadoop/bin/start-all.sh
> hadoop: Exception in thread IPC Client (47) Connection to 192.168.58.129:6789 from rwt
> java.lang.RuntimeException: readObject can't find class
>
> thanks
> pengft
Re: [ceph-users] 2013-10-14 14:42:23 auto-saved draft
Do you have the following in your core-site.xml?

  <property>
    <name>fs.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
  </property>

On Sun, Oct 13, 2013 at 11:55 PM, 鹏 wkp4...@126.com wrote:
> [original message quoted in full - see "2013-10-14 14:42:23 auto-saved draft" above]
Re: [ceph-users] radosgw can still get the object even if this object's physical file is removed on OSDs
On Mon, Oct 14, 2013 at 4:04 AM, david zhang zhang.david2...@gmail.com wrote:
> Hi ceph-users,
> I uploaded an object successfully to radosgw with 3 replicas, and I located the physical paths of all 3 replicas on the OSDs. One of the 3 physical paths is:
> /var/lib/ceph/osd/ceph-2/current/3.5_head/DIR_D/default.4896.65\\u20131014\\u1__head_0646563D__3
> Then I manually deleted all 3 replica files on the OSDs, but the object can still be fetched from radosgw with HTTP code 200, even though I cleaned all the caches on both radosgw and the OSDs with 'echo 3 > /proc/sys/vm/drop_caches'. Only after I restarted the 3 OSDs did the GET request return 404.
> What did I miss? Is it not right to clean the cache that way?

I'm not too sure what you're trying to achieve. You should never ever access the osd objects directly like that. The reason you're still able to read the objects is probably that the osd keeps open fds for recently opened files and still holds a reference to them.

If you need to remove objects from the rados backend, you should use the rados tool to do that. However, since you created the objects via radosgw, you're going to have some radosgw consistency issues, so in that case the way to go would be through radosgw-admin (or through the radosgw RESTful API).

Yehuda
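For illustration, a RADOS-level removal with the rados tool would look roughly like this - the pool name assumes the default radosgw data pool, and the object names are placeholders taken from the listing:

  rados -p .rgw.buckets ls | grep <part-of-object-name>
  rados -p .rgw.buckets rm <full-object-name>

As noted above, doing this behind radosgw leaves the bucket index out of sync, so for gateway-owned data the S3/Swift API or radosgw-admin is the safer path.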
Re: [ceph-users] Speed limit on RadosGW?
I've personally saturated 1Gbps links on multiple radosgw nodes on a large cluster; if I remember correctly, Yehuda has tested it up into the 7Gbps range with 10Gbps gear. Could you describe your cluster's hardware and connectivity?

On Mon, Oct 14, 2013 at 3:34 AM, Chu Duc Minh chu.ducm...@gmail.com wrote:
> [earlier exchange quoted in full - trimmed; see the reply above in this thread]

--
Kyle
[ceph-users] xfs log device and osd journal specifications in ceph.conf
3 questions:

1. I'd like to use xfs devices with a separate log device in a ceph cluster. What's the best way to do this? Is it possible to specify xfs log devices in the [osd.x] sections of ceph.conf? E.g.:

[osd.0]
    host = delta
    devs = /dev/sdx
    osd mkfs options xfs = -d su=131072,sw=8 -i size=1024 -l logdev=/dev/sdq1,su=131072

[osd.1]
    host = epsilon
    devs = /dev/sdy
    osd mkfs options xfs = -d su=131072,sw=8 -i size=1024 -l logdev=/dev/sdq2,su=131072

2. Is this the correct syntax for the line without the log device options?

    osd mkfs options xfs = -d su=131072,sw=8 -i size=1024

3. For osd journal devices, I assume there's a 1:1 relationship between osds and journal devices. The section in sample.ceph.conf seems to imply a single entry. Should there be an "osd journal" entry in each [osd.x] section of ceph.conf?

[osd]
    ; This is where the osd expects its data
    osd data = /data/$name

    ; Ideally, make the journal a separate disk or partition.
    ; 1-10GB should be enough; more if you have fast or many
    ; disks. You can use a file under the osd data dir if need be
    ; (e.g. /data/$name/journal), but it will be slower than a
    ; separate disk or partition.
    ; This is an example of a file-based journal.
    osd journal = /data/$name/journal
    osd journal size = 1000 ; journal size, in megabytes

On my cluster (deployed with ceph-deploy) the data is in /var/lib/ceph/osd, not /data/$name as in the sample file. Directory organization on my cluster:

/var/lib/ceph/osd/:
ceph-0 ceph-10 ceph-12 ceph-14 ceph-16 ceph-18 ceph-2  ceph-21 ceph-3 ceph-5 ceph-7 ceph-9
ceph-1 ceph-11 ceph-13 ceph-15 ceph-17 ceph-19 ceph-20 ceph-22 ceph-4 ceph-6 ceph-8

/var/lib/ceph/osd/ceph-0:
/var/lib/ceph/osd/ceph-1:

ls /data
ls: cannot access /data: No such file or directory

Thanks,
Tim
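On question 3, per-OSD "osd journal" lines are valid, so one journal partition per OSD can be spelled out like this - a sketch only, with illustrative device paths:

  [osd]
      osd journal size = 1000

  [osd.0]
      host = delta
      devs = /dev/sdx
      osd journal = /dev/sdq3

  [osd.1]
      host = epsilon
      devs = /dev/sdy
      osd journal = /dev/sdq4

When the journal points at a raw partition, the whole partition is typically used; the size setting matters mainly for file-based journals.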
[ceph-users] Production locked: OSDs down
Hi,

I have a pretty big problem here... my OSDs are marked down (except one?!). I am running ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b). I recently had full monitors, so I had to remove them, but that seemed to work.

# id    weight  type name               up/down reweight
-1      15      root default
-3      6       datacenter xxx
-2      0               host cloud-1
-4      0               host cloud-2
-7      3               host xxx-1
7       1                       osd.7   down    1
8       1                       osd.8   down    1
9       1                       osd.9   down    1
-8      3               host xxx-2
3       1                       osd.3   down    1
4       1                       osd.4   down    1
5       1                       osd.5   up      1

I see this in the logs when I try to restart them:

2013-10-15 06:54:32.651951 7fa5db16b780 1 journal _open /dev/ssd/osd_3_jrn fd 26: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-10-15 06:54:36.321235 7fa5ac741700 0 -- 192.168.242.2:6801/29193 >> 192.168.242.1:6811/12764 pipe(0x7fa588002490 sd=28 :0 s=1 pgs=0 cs=0 l=0).fault with nothing to send, going to standby
2013-10-15 06:54:36.321256 7fa59c2f3700 0 -- 192.168.242.2:6801/29193 >> 192.168.242.1:6801/12362 pipe(0x7fa588001490 sd=27 :0 s=1 pgs=0 cs=0 l=0).fault with nothing to send, going to standby
2013-10-15 06:54:36.321267 7fa5ac13b700 0 -- 192.168.242.2:6801/29193 >> 192.168.242.1:6814/13354 pipe(0x7fa588001970 sd=30 :0 s=1 pgs=0 cs=0 l=0).fault with nothing to send, going to standby

Any idea? Thanks!
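A few generic first checks for a situation like this (nothing here is specific to that cluster):

  ceph health detail
  ceph osd dump | grep down
  # clock problems can keep daemons from rejoining after monitor changes
  ntpq -p
  # confirm the ceph-osd processes are actually running on the host
  ps aux | grep ceph-osd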
[ceph-users] kvm live migrate with ceph
Hello,

I would like to live migrate a VM between two hypervisors. Is it possible to do this with an rbd disk, or should the VM disks be created as qcow images on a CephFS/NFS share (is it possible to do clvm over rbds? or GlusterFS over rbds?) and point kvm at the network directory?

As I understand it, rbds aren't cluster aware, so you can't mount an rbd on multiple hosts at once, but maybe libvirt has a way to handle the transfer...?

I like the idea of master or golden images where guests write any changes to a new image. I don't think rbds are able to handle copy-on-write in the same way kvm does, so maybe a clustered filesystem approach is the ideal way to go.

Thanks for your input. I think I'm just missing some piece... I just don't grok...

Best Regards,
Jon A
Re: [ceph-users] Full OSD with 29% free
How fragmented is that file system?

Sent from my iPad

On Oct 14, 2013, at 5:44 PM, Bryan Stillwell bstillw...@photobucket.com wrote:
> This appears to be more of an XFS issue than a ceph issue, but I've run into a problem where some of my OSDs failed because the filesystem was reported as full even though there was 29% free:
>
> [root@den2ceph001 ceph-1]# touch blah
> touch: cannot touch `blah': No space left on device
> [root@den2ceph001 ceph-1]# df .
> Filesystem     1K-blocks      Used  Available Use% Mounted on
> /dev/sdc1      486562672 342139340  144423332  71% /var/lib/ceph/osd/ceph-1
> [root@den2ceph001 ceph-1]# df -i .
> Filesystem      Inodes   IUsed    IFree IUse% Mounted on
> /dev/sdc1     60849984 4097408 56752576    7% /var/lib/ceph/osd/ceph-1
> [root@den2ceph001 ceph-1]#
>
> I've tried remounting the filesystem with the inode64 option like a few people recommended, but that didn't help (probably because it doesn't appear to be running out of inodes).
>
> This happened while I was on vacation and I'm pretty sure it was caused by another OSD failing on the same node. I've been able to recover from the situation by bringing the failed OSD back online, but it's only a matter of time until I'll be running into this issue again since my cluster is still being populated.
>
> Any ideas on things I can try the next time this happens?
>
> Thanks,
> Bryan
Re: [ceph-users] kvm live migrate with ceph
I live migrate all the time using the rbd driver in qemu, no problems. Qemu will issue a flush as part of the migration, so everything is consistent. It's the right way to use ceph to back VMs. I would strongly recommend against a network file system approach. You may want to look into format 2 rbd images; the cloning and writable snapshots may be what you are looking for.

Sent from my iPad

On Oct 14, 2013, at 5:37 AM, Jon three1...@gmail.com wrote:
> [original question quoted in full - see the first message in this thread]
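As a sketch of that golden-image workflow - format 2 images support copy-on-write clones; the pool, image and domain names below are placeholders:

  # create a format 2 golden image, snapshot it and protect the snapshot
  rbd create --format 2 --size 20480 rbd/golden
  rbd snap create rbd/golden@base
  rbd snap protect rbd/golden@base
  # each guest gets a thin copy-on-write clone
  rbd clone rbd/golden@base rbd/guest1-disk

  # live migration itself is ordinary libvirt/qemu:
  virsh migrate --live guest1 qemu+ssh://other-hypervisor/system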
Re: [ceph-users] Full OSD with 29% free
The filesystem isn't as full now, but the fragmentation is pretty low:

[root@den2ceph001 ~]# df /dev/sdc1
Filesystem     1K-blocks      Used  Available Use% Mounted on
/dev/sdc1      486562672 270845628  215717044  56% /var/lib/ceph/osd/ceph-1
[root@den2ceph001 ~]# xfs_db -c frag -r /dev/sdc1
actual 3481543, ideal 3447443, fragmentation factor 0.98%

Bryan

On Mon, Oct 14, 2013 at 4:35 PM, Michael Lowe j.michael.l...@gmail.com wrote:
> How fragmented is that file system?
> [rest of the earlier exchange trimmed - see above]
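File-level fragmentation looks fine there; on XFS, ENOSPC with plenty of free blocks can also come from badly fragmented free space, which can be inspected read-only with xfs_db - a quick check, using the same device as above:

  xfs_db -r -c "freesp -s" /dev/sdc1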
[ceph-users] Using Hadoop With Cephfs
Hi,

I have a 4-node Ceph cluster (2 mon, 1 mds, 2 osd) and a Hadoop node. Currently, I'm trying to replace HDFS with CephFS. I followed the instructions in "USING HADOOP WITH CEPHFS", but every time I run bin/start-all.sh to start Hadoop, it fails with:

starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-namenode-ceph-srv1.out
localhost: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-datanode-ceph-srv1.out
localhost: Exception in thread IPC Client (47) connection to /172.29.84.56:6789 from hduser java.lang.RuntimeException: readObject can't find class
localhost:     at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:185)
localhost:     at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
localhost:     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:851)
localhost:     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
localhost: Caused by: java.lang.ClassNotFoundException:
localhost:     at java.lang.Class.forName0(Native Method)
localhost:     at java.lang.Class.forName(Class.java:249)
localhost:     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:802)
localhost:     at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:183)

My core-site.xml:

<configuration>
  <property>
    <name>ceph.conf.file</name>
    <value>/etc/ceph/ceph.conf</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>ceph://172.29.84.56:6789/</value>
  </property>
  <property>
    <name>ceph.mon.address</name>
    <value>172.29.84.56:6789</value>
  </property>
  <property>
    <name>ceph.auth.keyring</name>
    <value>/etc/ceph/ceph.client.admin.keyring</value>
  </property>
  <property>
    <name>ceph.data.pools</name>
    <value>hadoop1</value>
  </property>
</configuration>

Here is my ceph -s output:

ceph -s
  cluster 942afa43-9a92-434b-9dfa-e893d4e5d565
   health HEALTH_WARN 16 pgs degraded; 16 pgs stuck unclean; recovery 505/1719 degraded (29.378%); clock skew detected on mon.ceph-srv3
   monmap e1: 2 mons at {ceph-srv2=172.29.84.56:6789/0,ceph-srv3=172.29.84.57:6789/0}, election epoch 12, quorum 0,1 ceph-srv2,ceph-srv3
   osdmap e52: 2 osds: 2 up, 2 in
    pgmap v33521: 372 pgs: 356 active+clean, 16 active+degraded; 4097 MB data, 8277 MB used, 1384 GB / 1392 GB avail; 505/1719 degraded (29.378%)
   mdsmap e17: 1/1/1 up {0=ceph-srv2=up:active}

Can anyone show me how to use Hadoop with CephFS correctly?

Thanks,
Kai
Re: [ceph-users] Using Hadoop With Cephfs
Hi Kai,

It doesn't look like there is anything Ceph-specific in the Java backtrace you posted. Does your installation work with HDFS? Are there any logs where an error is occurring with the Ceph plugin?

Thanks,
Noah

On Mon, Oct 14, 2013 at 4:34 PM, log1024 log1...@yeah.net wrote:
> [original message quoted in full - see "Using Hadoop With Cephfs" above]
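To confirm the Ceph pieces are actually visible to Hadoop, something along these lines can help - the paths are the ones used earlier in this digest and may differ on your install, and the classpath subcommand assumes a Hadoop version that provides it:

  hadoop classpath | tr ':' '\n' | grep -i ceph
  ls -l /usr/hadoop/lib/hadoop-cephfs.jar /usr/hadoop/lib/libcephfs.jar
  ls -l /usr/hadoop/lib/native/Linux-amd64-64/libcephfs_jni.so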
Re: [ceph-users] qemu-kvm with rbd mem slow leak
On 10/13/2013 07:43 PM, alan.zhang wrote:
> CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz *2
> MEM: 32GB
> KVM: qemu-kvm-0.12.1.2-2.355.el6.2.cuttlefish.async.x86_64
> Host: CentOS 6.4, kernel 2.6.32-358.14.1.el6.x86_64
> Guest: CentOS 6.4, kernel 2.6.32-279.14.1.el6.x86_64
> Ceph: ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
> Opennebula: 4.2
>
> top -M info:
> top - 10:35:31 up 7 days, 9:19, 1 user, load average: 0.85, 1.63, 1.40
> Tasks: 454 total, 2 running, 452 sleeping, 0 stopped, 0 zombie
> Cpu(s): 8.5%us, 6.6%sy, 0.0%ni, 84.2%id, 0.6%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 32865800k total, 32191072k used, 674728k free, 59984k buffers
> Swap: 10485752k total, 10134076k used, 351676k free, 3474176k cached
>
>   PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
> 20135 oneadmin 20  0 6381m 3.4g 9120 S  2.3 10.8 104:00.48 qemu-kvm
> 29171 oneadmin 20  0 6452m 3.2g 9072 S  2.0 10.2 168:02.06 qemu-kvm
>  8857 oneadmin 20  0 6338m 2.9g 4504 S  2.3  9.3 289:14.48 qemu-kvm
> 12283 oneadmin 20  0 6591m 2.9g 4464 S  1.3  9.2 268:57.30 qemu-kvm
>  6612 oneadmin 20  0 5050m 2.0g 4472 S 12.9  6.3 191:23.51 qemu-kvm
> 12006 oneadmin 20  0 5532m 1.9g 4468 S  4.3  6.1 236:43.50 qemu-kvm
>  7216 oneadmin 20  0 3600m 1.9g 4680 S  1.3  6.1 159:40.53 qemu-kvm
> 10602 oneadmin 20  0 5333m 1.6g 4636 S  1.3  5.1 208:54.52 qemu-kvm
> 13162 oneadmin 20  0 3400m 989m 4528 S 50.3  3.1  4151:19 qemu-kvm
>  5273 oneadmin 20  0 5168m 842m 4464 S  5.3  2.6 468:20.65 qemu-kvm
>  6287 oneadmin 20  0 3150m 761m 4472 S 37.4  2.4 150:32.89 qemu-kvm
>  6081 root     20  0 1732m 504m 5744 S  6.3  1.6 243:17.00 ceph-osd
> 11729 oneadmin 20  0 3541m 498m 4468 S  0.7  1.6  66:48.52 qemu-kvm
> 12503 oneadmin 20  0 3832m 428m 9336 S  0.3  1.3  19:58.78 qemu-kvm
>
> For example, the command line of process 20135:
> ps -ef | grep 20135
> oneadmin 20135 1 2 Oct11 ? 01:44:01 /usr/libexec/qemu-kvm -name one-18 -S -M rhel6.4.0 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid c40fe8a4-f4fa-9e02-cf2d-6eaaf5062440 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-18.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=rbd:one/one-0-18-0:auth_supported=none,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=rbd:one/one-2:auth_supported=none,if=none,id=drive-virtio-disk1,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/var/lib/one/datastores/0/18/disk.1,if=none,media=cdrom,id=drive-ide0-0-0,readonly=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:c0:a8:0a:3b,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:18 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
>
> I have only given it 2GB, but as you can see, VIRT/RES is 6381m/3.4g.

Does the resident memory continue increasing, or does it stay constant? How does this compare with using only local files instead of rbd with that qemu package?

> I think it must be a memory leak. Could anyone give me a hand?

If you do observe continued increasing memory usage with rbd, but not with local files, gathering some heap snapshots via massif would help figure out what's leaking. (http://tracker.ceph.com/issues/6494 is a good example of getting massif output.)

Josh
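For reference, a massif run against a test guest looks roughly like this - the guest arguments are whatever the VM normally uses, and the guest will run noticeably slower under valgrind:

  valgrind --tool=massif /usr/libexec/qemu-kvm <normal guest arguments>
  # after the guest exits, summarize the recorded heap snapshots
  ms_print massif.out.<pid>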
Re: [ceph-users] radosgw-admin doesn't list user anymore
> root@ineri:~# radosgw-admin user info
> could not fetch user info: no user info saved

Hi Valery,

You need to use: radosgw-admin metadata list user

Thanks,
derek

--
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies