Call for jenkins slaves to improve multi operating system support

2015-04-08 Thread Loic Dachary
Hi Ceph,

When a contribution is proposed to Ceph [1], a bot compiles and runs tests with 
it to provide feedback to the developer [2]. When something goes wrong, the 
failure can be repeated on the developer machine [3] for debugging. This also 
helps the reviewer, who knows the code compiles and does not break anything 
that would be detected by make check.

The bot runs on CentOS 7 and Ubuntu 14.04 only, and problems related to older 
operating systems (headers, compiler version, etc.) may be detected later, when 
building packages [4] and after the pull request has been merged in master. 
This is rare but requires extra attention from the reviewer and needs to be 
dealt with urgently when it happens.

If you can spare a machine to help expand the operating systems on which tests 
can run, it would be a great help. The minimum hardware configuration to run a 
slave is:

*  x86_64 architecture for CentOS 6, Fedora 21, OpenSUSE 13.2, Debian GNU/Linux 
Jessie, Ubuntu 14.02

  32 GB RAM
  200 GB SSD
  8 cores @ 2.5 GHz

*  i386 architecture for CentOS 7, CentOS 6, Fedora 21, Debian GNU/Linux 
Jessie, Ubuntu 14.04, Ubuntu 14.02

  4 GB RAM
  200 GB disk
  2 core

*  armv7, armv8 architecture for Ubuntu 14.04

  4 GB RAM
  200 GB disk
  2 core 

Note that since the make check bot can run in a docker container, x86_64 
machines can be used to run any of the operating systems for which a docker 
file has been prepared [5].
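
For example, something along these lines (a sketch only; the image tag and
source path are illustrative and assume an image has already been built from
one of the docker files in [5]):

  # build and test a checkout inside a CentOS 6 container on an x86_64 host
  docker run --rm -v /path/to/ceph:/ceph -w /ceph ceph-centos-6 ./run-make-check.sh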

Cheers

[1] pull requests https://github.com/ceph/ceph/pulls
[2] make check bot feedback 
https://github.com/ceph/ceph/pull/4296#issuecomment-90812064
[3] run-make-check.sh 
https://github.com/ceph/ceph/blob/master/run-make-check.sh#L44
[4] gitbuilder http://ceph.com/gitbuilder.cgi
[5] https://ceph.com/git/?p=ceph.git;a=blob;f=src/test/Makefile.am;hb=hammer#l91

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Initial newstore vs filestore results

2015-04-08 Thread Haomai Wang
On Wed, Apr 8, 2015 at 10:58 AM, Sage Weil s...@newdream.net wrote:
 On Tue, 7 Apr 2015, Mark Nelson wrote:
 On 04/07/2015 02:16 PM, Mark Nelson wrote:
  On 04/07/2015 09:57 AM, Mark Nelson wrote:
   Hi Guys,
  
   I ran some quick tests on Sage's newstore branch.  So far given that
   this is a prototype, things are looking pretty good imho.  The 4MB
   object rados bench read/write and small read performance looks
   especially good.  Keep in mind that this is not using the SSD journals
   in any way, so 640MB/s sequential writes is actually really good
   compared to filestore without SSD journals.
  
   small write performance appears to be fairly bad, especially in the RBD
   case where it's small writes to larger objects.  I'm going to sit down
   and see if I can figure out what's going on.  It's bad enough that I
   suspect there's just something odd going on.
  
   Mark
 
  Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those
  interested:
 
  http://nhm.ceph.com/newstore/
 
  Interestingly small object write/read performance with 4 OSDs was about
  1/3-1/4 the speed of the same cluster with 36 OSDs.
 
  Note: Thanks Dan for fixing the directory column width!
 
  Mark

 New fio/librbd results using Sage's latest code that attempts to keep small
 overwrite extents in the db.  This is 4 OSD so not directly comparable to the
 36 OSD tests above, but does include seekwatcher graphs.  Results in MB/s:

          write   read    randw   randr
  4MB     57.9    319.6   55.2    285.9
  128KB   2.5     230.6   2.4     125.4
  4KB     0.46    55.65   1.11    3.56

 What would be very interesting would be to see the 4KB performance
 with the defaults (newstore overlay max = 32) vs overlays disabled
 (newstore overlay max = 0) and see if/how much it is helping.

 The latest branch also has open-by-handle.  It's on by default (newstore
 open by handle = true).  I think for most workloads it won't be very
 noticeable... I think there are two questions we need to answer though:

 1) Does it have any impact on a creation workload (say, 4kb objects).  It
 shouldn't, but we should confirm.

 2) Does it impact small object random reads with a cold cache.  I think to
 see the effect we'll probably need to pile a ton of objects into the
 store, drop caches, and then do random reads.  In the best case the
 effect will be small, but hopefully noticeable: we should go from
 a directory lookup (1+ seeks) + inode lookup (1+ seek) + data
 read, to inode lookup (1+ seek) + data read.  So, 3 - 2 seeks best case?
 I'm not really sure what XFS is doing under the covers here..

Wow, this is a much cooler implementation than I had imagined from the
blueprint. The handle, overlay_map and data_map look very flexible and
should make small I/O cheaper in theory. Right now there is only one
element in data_map and I'm not sure what your goal is for its future
use. I have a vague idea that it could strengthen the role of NewStore
and reduce the local filesystem to little more than a block space
allocator: if NewStore owned something like an FTL (File Translation
Layer), many cool features could be added. What's your plan for data_map?

My main concern is still the WAL that runs after fsync and kv commit.
The fsync path is probably fine because we rarely hit that case with
rbd, but submitting a synchronous kv transaction is not a low-latency
job, I think. Maybe we could run the WAL in parallel with the kv
commit? (Yes, I really do care about the latency of a single op :-))

Then, for an actual rados write op, it will add setattr and
omap_setkeys ops. The current NewStore seems to handle setattr badly:
it always re-encodes all xattrs (and other not-so-tiny fields) and
writes them again (is this true?), although it can batch multiple
transactions' onode writes over a short time.

NewStore also puts much more load on the KeyValueDB than FileStore
does, so maybe we need to reconsider the workload mix. FileStore uses
leveldb mainly for writes, which leveldb handles well, but now overlay
keys (reads) and onodes (reads) will become a main latency source on
the normal IO path, I think. The default kvdbs, leveldb and rocksdb,
both perform poorly on random read workloads, so this may become a
problem. Looking for another kv db might be an option.

Also, does it still lack the journal code for the WAL?

Anyway, NewStore should cover more workloads than FileStore. Good job!


 sage
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards,

Wheat
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ceph-deploy with monitor port number?

2015-04-08 Thread Amon Ott
Hello all,

AFAICS, ceph-deploy does not yet support specifying the monitor port
number for the new command. It would be great if new could take
another parameter like

ceph-deploy --cluster mycluster new --monport 6790 node1 node2 node3 ...

Our current workaround is to add the port number to the mon_host line in the
.conf file by hand, after new and before mon create-initial.
With that change, all commands work as expected.
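
For illustration, the hand edit amounts to something like this in the
generated .conf (a sketch using the node names from the command above; 6789
is the default port being replaced):

  [global]
  mon_host = node1:6790,node2:6790,node3:6790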

Thanks,

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH   Tel: +49 30 24342334
Werner-Voß-Damm 62   Fax: +49 30 99296856
12101 Berlin http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


init script bug with multiple clusters

2015-04-08 Thread Amon Ott
Hello Ceph!

The Ceph init script (src/init-ceph.in) creates pid files without
cluster names. This means that only one cluster can run at a time. The
solution is simple and works fine here; a patch against 0.94 is attached.

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH   Tel: +49 30 24342334
Werner-Voß-Damm 62   Fax: +49 30 99296856
12101 Berlin http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649

--- ceph-0.93/src/init-ceph.in	2015-02-27 19:47:15.0 +0100
+++ ceph-0.93/src/init-ceph.in.mp	2015-04-07 13:29:47.127067864 +0200
@@ -227,7 +237,7 @@
 
 get_conf run_dir /var/run/ceph run dir
 
-get_conf pid_file $run_dir/$type.$id.pid pid file
+get_conf pid_file $run_dir/$cluster-$type.$id.pid pid file
 
 if [ $command = start ]; then
 	if [ -n $pid_file ]; then


Re: [ceph-users] OSD auto-mount after server reboot

2015-04-08 Thread Loic Dachary


On 08/04/2015 02:16, shiva rkreddy wrote:
 We didn't do a upgrade from 0.80.7 to 0.80.9. It was a fresh install with 
 ceph 0.80.9 with ceph-deploy version 1.15.12.
 
 May be  should have used ceph-deploy version 1.15.16  
 https://github.com/ceph/ceph-deploy/pull/240/files

Ok, mystery solved then, right ?
 
 
 On Tue, Apr 7, 2015 at 4:31 PM, Loic Dachary l...@dachary.org 
 mailto:l...@dachary.org wrote:
 
 
 
 On 07/04/2015 23:03, shiva rkreddy wrote:
  It turns out that ceph service is set ceph to off. Following defect 
 talks about it..Once its set to on, everything worked fine after host reboot.
 
  http://tracker.ceph.com/issues/9090
 
 Interesting :-) Does that mean that the ceph service was somehow turned 
 off when upgrading from v0.80.7 to v0.80.9 ?
 
 
  On Tue, Apr 7, 2015 at 2:20 PM, Loic Dachary l...@dachary.org 
 mailto:l...@dachary.org mailto:l...@dachary.org 
 mailto:l...@dachary.org wrote:
 
  [cc'ing ceph-devel for archival purposes]
 
  Hi,
 
  On 07/04/2015 19:55, shiva rkreddy wrote:
   Hi Loic,
   I've looked at the way our cluster provisioning partition and 
 label the journal device. Its done using /*parted*/ command with gpt label. 
  Following is the parted output for /dev/sdr that we were looking yesterday.
  
   # parted /dev/sdr
   GNU Parted 2.1
   Using /dev/sdr
   Welcome to GNU Parted! Type 'help' to view a list of commands.
   (parted) p
   Model: ASR7160 JBOD-R (scsi)
   Disk /dev/sdr: 960GB
   Sector size (logical/physical): 512B/512B
   Partition Table: *gpt*
  
   Number  Start   EndSize   File system  Name Flags
1  9599MB  240GB  230GB   primary
2  250GB   480GB  230GB   primary
3  490GB   720GB  230GB   primary
4  730GB   960GB  230GB   primary
  
   (parted)
  
   *# sgdisk --info 1 /dev/sdr*
   Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 
 (Microsoft basic data)
   Partition unique GUID: 0A45987B-DC77-4816-B506-DEE0DC60AE9E
   First sector: 18747392 (at 8.9 GiB)
   Last sector: 468707327 (at 223.5 GiB)
   Partition size: 449959936 sectors (214.6 GiB)
   Attribute flags: 
   Partition name: 'primary'
  
  
   We stared to put label on the disk based on a blog at 
 http://blog.zhaw.ch/icclab/deploy-ceph-and-start-using-it-end-to-end-tutorial-installation-part-13/
  
   I'm not sure why sgdisk is getting  Microsoft basics data.
 
  It's not a big deal as long as it's what ceph-disk expects:
 
  https://github.com/ceph/ceph/blob/firefly/src/ceph-disk#L76
 
  The more interesting question would be to figure out if 
 ceph-disk-udev is called from
 
  https://github.com/ceph/ceph/blob/firefly/udev/95-ceph-osd-alt.rules
 
  We have tried udevadm trigger --sysname-match=sd* yesterday and 
 verified it is called when a udev event is sent by the kernel. And you have 
 added that to /etc/rc.local to make sure it is called at least once at boot 
 time. Would you have time to experiment as described at 
 https://wiki.ubuntu.com/DebuggingUdev to get more information about what 
 happens during the boot phase of your machine ?
 
  Cheers
 
  
   Thanks,
   Shiva
  
   On Mon, Apr 6, 2015 at 2:29 PM, shiva rkreddy 
 shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com 
 mailto:shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com 
 mailto:shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com 
 mailto:shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com wrote:
  
   Hi,
  
   I'm on IRC as *shivark*
  
   # sgdisk --info 1 /dev/sdb
   Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D 
 (Unknown)
   Partition unique GUID: 1618223E-B8C9-4C4A-B5D2-EBFF6D64CB12
   First sector: 2048 (at 1024.0 KiB)
   Last sector: 7811870686 (at 3.6 TiB)
   Partition size: 7811868639 sectors (3.6 TiB)
   Attribute flags: 
   Partition name: 'ceph data'
   #
  
  
   On Mon, Apr 6, 2015 at 2:09 PM, Loic Dachary 
 l...@dachary.org mailto:l...@dachary.org mailto:l...@dachary.org 
 mailto:l...@dachary.org mailto:l...@dachary.org mailto:l...@dachary.org 
 mailto:l...@dachary.org mailto:l...@dachary.org wrote:
  
   Hi,
  
   lrwxrwxrwx 1 root root 10 Apr  4 16:27 
 1618223e-b8c9-4c4a-b5d2-ebff6d64cb12 - ../../sdb1
  
   looks encouraging. Could you
  
  

Re: Preliminary RDMA vs TCP numbers

2015-04-08 Thread Andrey Korolyov
On Wed, Apr 8, 2015 at 11:17 AM, Somnath Roy somnath@sandisk.com wrote:

 Hi,
 Please find the preliminary performance numbers of TCP Vs RDMA (XIO) 
 implementation (on top of SSDs) in the following link.

 http://www.slideshare.net/somnathroy7568/ceph-on-rdma

 The attachment didn't go through it seems, so, I had to use slideshare.

 Mark,
 If we have time, I can present it in tomorrow's performance meeting.

 Thanks  Regards
 Somnath


Those numbers are really impressive (for the small numbers at least)! What
TCP settings are you using? For example, the difference can be lowered at
scale due to less intensive per-connection acceleration with CUBIC on a
larger number of nodes, though I do not believe that was the main
reason for the observed TCP catch-up on a relatively flat workload such
as the one fio generates.
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Preliminary RDMA vs TCP numbers

2015-04-08 Thread Somnath Roy

Hi,
Please find the preliminary performance numbers of TCP Vs RDMA (XIO) 
implementation (on top of SSDs) in the following link.

http://www.slideshare.net/somnathroy7568/ceph-on-rdma

The attachment didn't go through, it seems, so I had to use slideshare.

Mark,
If we have time, I can present it in tomorrow's performance meeting.

Thanks & Regards
Somnath



PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Backporting to Firefly

2015-04-08 Thread Loic Dachary
Hi,

I see you have been busy backporting issues to Firefly today, this is great :-) 

https://github.com/ceph/ceph/pulls/xinxinsh

It would be helpful if you could update the pull requests (and the 
corresponding issues) as explained at 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_backport_commits. 

Once it's done I propose we move to the next step, as explained at 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO: merging your pull 
requests in the integration branch ( step 5 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_populate_the_integration_branch
 ) and running tests on them ( step 6 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_run_integration_and_upgrade_tests).

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


RE: Preliminary RDMA vs TCP numbers

2015-04-08 Thread Somnath Roy
I used the default TCP setting in Ubuntu 14.04.

-Original Message-
From: Andrey Korolyov [mailto:and...@xdel.ru]
Sent: Wednesday, April 08, 2015 1:28 AM
To: Somnath Roy
Cc: ceph-us...@lists.ceph.com; ceph-devel
Subject: Re: Preliminary RDMA vs TCP numbers

On Wed, Apr 8, 2015 at 11:17 AM, Somnath Roy somnath@sandisk.com wrote:

 Hi,
 Please find the preliminary performance numbers of TCP Vs RDMA (XIO) 
 implementation (on top of SSDs) in the following link.

 http://www.slideshare.net/somnathroy7568/ceph-on-rdma

 The attachment didn't go through it seems, so, I had to use slideshare.

 Mark,
 If we have time, I can present it in tomorrow's performance meeting.

 Thanks  Regards
 Somnath


Those numbers are really impressive (for small numbers at least)! What are TCP 
settings you using?For example, difference can be lowered on scale due to less 
intensive per-connection acceleration on CUBIC on a larger number of nodes, 
though I do not believe that it was a main reason for an observed TCP catchup 
on a relatively flat workload such as fio generates.



PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).


[ANN] ceph-deploy 1.5.23 released

2015-04-08 Thread Travis Rhoden
Hi All,

This is a new release of ceph-deploy that includes a new feature for
Hammer and bugfixes.  ceph-deploy can be installed from the ceph.com
hosted repos for Firefly, Giant, Hammer, or testing, and is also
available on PyPI.

ceph-deploy now defaults to installing the Hammer release. If you need
to install a different release, use the --release flag.
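
For example, to stay on Giant rather than the new Hammer default (host names
illustrative):

  ceph-deploy install --release giant node1 node2 node3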

To go along with the Hammer release, ceph-deploy now includes support
for a drastically simplified deployment for RGW.  See further details
at [1] and [2].
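
For example, once a node has been prepared, an RGW instance can be created
with something along these lines (see [1] and [2] for the exact syntax; the
host name is illustrative):

  ceph-deploy rgw create node1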

This release also fixes an issue where keyrings pushed to remote nodes
ended up with world-readable permissions.

The full changelog can be seen at [3].

Please update!

Cheers,

 - Travis

[1] http://ceph.com/docs/master/start/quick-ceph-deploy/#add-an-rgw-instance
[2] http://ceph.com/ceph-deploy/docs/rgw.html
[3] http://ceph.com/ceph-deploy/docs/changelog.html#id2
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Initial newstore vs filestore results

2015-04-08 Thread Gregory Farnum
On Wed, Apr 8, 2015 at 9:49 AM, Sage Weil s...@newdream.net wrote:
 On Wed, 8 Apr 2015, Haomai Wang wrote:
 On Wed, Apr 8, 2015 at 10:58 AM, Sage Weil s...@newdream.net wrote:
  On Tue, 7 Apr 2015, Mark Nelson wrote:
  On 04/07/2015 02:16 PM, Mark Nelson wrote:
   On 04/07/2015 09:57 AM, Mark Nelson wrote:
Hi Guys,
   
I ran some quick tests on Sage's newstore branch.  So far given that
this is a prototype, things are looking pretty good imho.  The 4MB
object rados bench read/write and small read performance looks
especially good.  Keep in mind that this is not using the SSD journals
in any way, so 640MB/s sequential writes is actually really good
compared to filestore without SSD journals.
   
small write performance appears to be fairly bad, especially in the 
RBD
case where it's small writes to larger objects.  I'm going to sit down
and see if I can figure out what's going on.  It's bad enough that I
suspect there's just something odd going on.
   
Mark
  
   Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those
   interested:
  
   http://nhm.ceph.com/newstore/
  
   Interestingly small object write/read performance with 4 OSDs was about
   1/3-1/4 the speed of the same cluster with 36 OSDs.
  
   Note: Thanks Dan for fixing the directory column width!
  
   Mark
 
  New fio/librbd results using Sage's latest code that attempts to keep 
  small
  overwrite extents in the db.  This is 4 OSD so not directly comparable to 
  the
  36 OSD tests above, but does include seekwatcher graphs.  Results in MB/s:
 
          write   read    randw   randr
  4MB     57.9    319.6   55.2    285.9
  128KB   2.5     230.6   2.4     125.4
  4KB     0.46    55.65   1.11    3.56
 
  What would be very interesting would be to see the 4KB performance
  with the defaults (newstore overlay max = 32) vs overlays disabled
  (newstore overlay max = 0) and see if/how much it is helping.
 
  The latest branch also has open-by-handle.  It's on by default (newstore
  open by handle = true).  I think for most workloads it won't be very
  noticeable... I think there are two questions we need to answer though:
 
  1) Does it have any impact on a creation workload (say, 4kb objects).  It
  shouldn't, but we should confirm.
 
  2) Does it impact small object random reads with a cold cache.  I think to
  see the effect we'll probably need to pile a ton of objects into the
  store, drop caches, and then do random reads.  In the best case the
  effect will be small, but hopefully noticeable: we should go from
  a directory lookup (1+ seeks) + inode lookup (1+ seek) + data
  read, to inode lookup (1+ seek) + data read.  So, 3 - 2 seeks best case?
  I'm not really sure what XFS is doing under the covers here..

 WOW, it's really a cool implementation beyond my original mind
 according to blueprint. Handler, overlay_map and data_map looks so
 flexible and make small io cheaper in theory. Now we only have 1
 element in data_map and I'm not sure your goal about the future's
 usage. Although I have a unclearly idea that it could enhance the role
 of NewStore and make local filesystem just as a block space allocator.
 Let NewStore own a variable of FTL(File Translation Layer), so many
 cool features could be added. What's your idea about data_map?

 Exactly, that is one option.  The other is that we'd treat the data_map
 similar to overlay_map with a fixed or max extent size so that a large
 partial overwrite will mostly go to a new file instead of doing the
 slow WAL.

 My concern currently still is WAL after fsync and kv commiting, maybe
 fsync process is just fine because mostly we won't meet this case in
 rbd. But submit sync kv transaction isn't a low latency job I think,
 maybe we could let WAL parallel with kv commiting?(yes, I really
 concern the latency of one op :-) )

 The WAL has to come after kv commit.  But the fsync after the wal
 completion sucks, especially since we are always dispatching a single
 fsync at a time so it's kind of worst-case seek behavior.  We could throw
 these into another parallel fsync queue so that the fs can batch them up,
 but I'm not sure we will enough parallelism.  What would really be nice is
 a batch fsync syscall, but in leiu of that maybe we wait until we have a
 bunch of fsyncs pending and then throw them at the kernel together in a
 bunch of threads?  Not sure.  These aren't normally time sensitive
 unless a read comes along (which is pretty rare), but they have to be done
 for correctness.

Couldn't we write both the log entry and the data in parallel and only
acknowledge to the client once both commit?
If we replay the log without the data we'll know it didn't get
committed, and we can collect the data after replay if it's not
referenced by the log (I'm speculating, as I haven't looked at the
code or how it's actually choosing names).
-Greg


 Then from the actual rados write op, it will add setattr and
 omap_setkeys ops. Current 

Re: Call for jenkins slaves to improve multi operating system support

2015-04-08 Thread Loic Dachary
Hi Sage,

On 08/04/2015 18:59, Sage Weil wrote:
 On Wed, 8 Apr 2015, Loic Dachary wrote:
 Hi Ceph,

 When a contribution is proposed to Ceph [1], a bot compiles and run 
 tests with it to provide feedback to the developer [2]. When something 
 goes wrong the failure can be repeated on the developer machine [3] for 
 debug. This also helps the reviewer who knows the code compiles and does 
 not break anything that would be detected by make check.

 The bot runs on CentOS 7 and Ubuntu 14.04 only, and problems related to 
 older operating systems (headers, compiler version, etc.) may be 
 detected later, when building packages [4] and after the pull request 
 has been merged in master. This is rare but requires extra attention 
 from the reviewer and needs to be dealt with urgently when it happens.
 
 Do additional slaves block the message from appearing on the pull 
 request?  I.e., what happens if a slave is very slow (e.g., armv7) or 
 broken (network issue)?

I will make it so a comment is posted as soon as the first slave succeeds or 
fails (it currently waits for all of them to finish, which is inconvenient). The 
first slave will always be a CentOS 7 slave running on a fast machine, so the 
worst that can happen is that it's the only one to run. If slow slaves lag behind 
too much, it would be nice to have a jenkins plugin that discards jobs randomly 
to prevent the queue from growing out of proportion on that specific slave.

 What are the connectivity requirements?  

Nothing more than the ability to git pull from a ceph repository. 

 Can slaves exist on other 
 (private) networks?

Yes. In that case an ssh -f -n -L tunnel to the jenkins master will be 
established to allow it to probe the slave when necessary.

 
 Thanks!
 sage
 

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Initial newstore vs filestore results

2015-04-08 Thread Sage Weil
On Wed, 8 Apr 2015, Gregory Farnum wrote:
 On Wed, Apr 8, 2015 at 9:49 AM, Sage Weil s...@newdream.net wrote:
  On Wed, 8 Apr 2015, Haomai Wang wrote:
  On Wed, Apr 8, 2015 at 10:58 AM, Sage Weil s...@newdream.net wrote:
   On Tue, 7 Apr 2015, Mark Nelson wrote:
   On 04/07/2015 02:16 PM, Mark Nelson wrote:
On 04/07/2015 09:57 AM, Mark Nelson wrote:
 Hi Guys,

 I ran some quick tests on Sage's newstore branch.  So far given that
 this is a prototype, things are looking pretty good imho.  The 4MB
 object rados bench read/write and small read performance looks
 especially good.  Keep in mind that this is not using the SSD 
 journals
 in any way, so 640MB/s sequential writes is actually really good
 compared to filestore without SSD journals.

 small write performance appears to be fairly bad, especially in the 
 RBD
 case where it's small writes to larger objects.  I'm going to sit 
 down
 and see if I can figure out what's going on.  It's bad enough that I
 suspect there's just something odd going on.

 Mark
   
Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for 
those
interested:
   
http://nhm.ceph.com/newstore/
   
Interestingly small object write/read performance with 4 OSDs was 
about
1/3-1/4 the speed of the same cluster with 36 OSDs.
   
Note: Thanks Dan for fixing the directory column width!
   
Mark
  
   New fio/librbd results using Sage's latest code that attempts to keep 
   small
   overwrite extents in the db.  This is 4 OSD so not directly comparable 
   to the
   36 OSD tests above, but does include seekwatcher graphs.  Results in 
   MB/s:
  
          write   read    randw   randr
  4MB     57.9    319.6   55.2    285.9
  128KB   2.5     230.6   2.4     125.4
  4KB     0.46    55.65   1.11    3.56
  
   What would be very interesting would be to see the 4KB performance
   with the defaults (newstore overlay max = 32) vs overlays disabled
   (newstore overlay max = 0) and see if/how much it is helping.
  
   The latest branch also has open-by-handle.  It's on by default (newstore
   open by handle = true).  I think for most workloads it won't be very
   noticeable... I think there are two questions we need to answer though:
  
   1) Does it have any impact on a creation workload (say, 4kb objects).  It
   shouldn't, but we should confirm.
  
   2) Does it impact small object random reads with a cold cache.  I think 
   to
   see the effect we'll probably need to pile a ton of objects into the
   store, drop caches, and then do random reads.  In the best case the
   effect will be small, but hopefully noticeable: we should go from
   a directory lookup (1+ seeks) + inode lookup (1+ seek) + data
   read, to inode lookup (1+ seek) + data read.  So, 3 - 2 seeks best case?
   I'm not really sure what XFS is doing under the covers here..
 
  WOW, it's really a cool implementation beyond my original mind
  according to blueprint. Handler, overlay_map and data_map looks so
  flexible and make small io cheaper in theory. Now we only have 1
  element in data_map and I'm not sure your goal about the future's
  usage. Although I have a unclearly idea that it could enhance the role
  of NewStore and make local filesystem just as a block space allocator.
  Let NewStore own a variable of FTL(File Translation Layer), so many
  cool features could be added. What's your idea about data_map?
 
  Exactly, that is one option.  The other is that we'd treat the data_map
  similar to overlay_map with a fixed or max extent size so that a large
  partial overwrite will mostly go to a new file instead of doing the
  slow WAL.
 
  My concern currently still is WAL after fsync and kv commiting, maybe
  fsync process is just fine because mostly we won't meet this case in
  rbd. But submit sync kv transaction isn't a low latency job I think,
  maybe we could let WAL parallel with kv commiting?(yes, I really
  concern the latency of one op :-) )
 
  The WAL has to come after kv commit.  But the fsync after the wal
  completion sucks, especially since we are always dispatching a single
  fsync at a time so it's kind of worst-case seek behavior.  We could throw
  these into another parallel fsync queue so that the fs can batch them up,
  but I'm not sure we will enough parallelism.  What would really be nice is
  a batch fsync syscall, but in leiu of that maybe we wait until we have a
  bunch of fsyncs pending and then throw them at the kernel together in a
  bunch of threads?  Not sure.  These aren't normally time sensitive
  unless a read comes along (which is pretty rare), but they have to be done
  for correctness.
 
 Couldn't we write both the log entry and the data in parallel and only
 acknowledge to the client once both commit?
 If we replay the log without the data we'll know it didn't get
 committed, and we can collect the data after replay if it's not
 referenced by the log (I'm speculating, 

04/08/2015 Weekly Ceph Performance Meeting IS ON!

2015-04-08 Thread Mark Nelson
8AM PST as usual!  Please add an agenda item if there is something you 
want to talk about.  So far we'll be discussing Sandisk's xio messenger 
performance results, newstore performance, and civetweb vs 
mod-proxy-fastcgi.  Be there or be square!


Here's the links:

Etherpad URL:
http://pad.ceph.com/p/performance_weekly

To join the Meeting:
https://bluejeans.com/268261044

To join via Browser:
https://bluejeans.com/268261044/browser

To join with Lync:
https://bluejeans.com/268261044/lync


To join via Room System:
Video Conferencing System: bjn.vc -or- 199.48.152.152
Meeting ID: 268261044

To join via Phone:
1) Dial:
  +1 408 740 7256
  +1 888 240 2560(US Toll Free)
  +1 408 317 9253(Alternate Number)
  (see all numbers - http://bluejeans.com/numbers)
2) Enter Conference ID: 268261044

Mark
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Call for jenkins slaves to improve multi operating system support

2015-04-08 Thread Duan, Jiangang
Loic,

do you mean we need to give the servers to you, or do we just set up the 
testing inside our own server room and run all the tests there?

-jiangang

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Loic Dachary
Sent: Wednesday, April 08, 2015 6:56 PM
To: Ceph Development
Subject: Call for jenkins slaves to improve multi operating system support

Hi Ceph,

When a contribution is proposed to Ceph [1], a bot compiles and run tests with 
it to provide feedback to the developer [2]. When something goes wrong the 
failure can be repeated on the developer machine [3] for debug. This also helps 
the reviewer who knows the code compiles and does not break anything that would 
be detected by make check.

The bot runs on CentOS 7 and Ubuntu 14.04 only, and problems related to older 
operating systems (headers, compiler version, etc.) may be detected later, when 
building packages [4] and after the pull request has been merged in master. 
This is rare but requires extra attention from the reviewer and needs to be 
dealt with urgently when it happens.

If you can spare a machine to help expand the operating systems on which tests 
can run, it would be a great help. The minimum hardware configuration to run a 
slave is:

*  x86_64 architecture for CentOS 6, Fedora 21, OpenSUSE 13.2, Debian GNU/Linux 
Jessie, Ubuntu 14.02

  32 GB RAM
  200 GB SSD
  8 core  2.5Ghz

*  i386 architecture for CentOS 7, CentOS 6, Fedora 21, Debian GNU/Linux 
Jessie, Ubuntu 14.04, Ubuntu 14.02

  4 GB RAM
  200 GB disk
  2 core

*  armv7, armv8 architecture for Ubuntu 14.04

  4 GB RAM
  200 GB disk
  2 core 

Note that since the make check bot can run in a docker container, x86_64 
machines can be used to run any of the operating systems for which a docker 
file has been prepared [5].

Cheers

[1] pull requests https://github.com/ceph/ceph/pulls
[2] make check bot feedback 
https://github.com/ceph/ceph/pull/4296#issuecomment-90812064
[3] run-make-check.sh 
https://github.com/ceph/ceph/blob/master/run-make-check.sh#L44
[4] gitbuilder http://ceph.com/gitbuilder.cgi [5] 
https://ceph.com/git/?p=ceph.git;a=blob;f=src/test/Makefile.am;hb=hammer#l91

--
Loïc Dachary, Artisan Logiciel Libre



Re: Call for jenkins slaves to improve multi operating system support

2015-04-08 Thread Loic Dachary


On 08/04/2015 15:21, Duan, Jiangang wrote:
 Loric,
 
 do you mean we need give the servers to you or we just build the testing 
 inside our own server room to do all the testing?

Thanks for asking, I realize that was not clear. 

The idea is not to donate hardware, because that would require manpower and 
extra costs to connect to the net. 

What would be useful is a machine connected to the net and dedicated to running 
a jenkins slave. It receives a build from the jenkins master 
(http://jenkins.ceph.dachary.org/) via ssh (possibly with a tunnel if behind a 
NAT), clones http://github.com/ceph/ceph, executes the run-make-check.sh script 
found at the root of the repository, and reports failure / success back 
to the jenkins master.

Does that make sense ?
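
To make it concrete, what runs on the slave boils down to roughly the
following (a sketch; the actual commands are driven by the jenkins master):

  git clone http://github.com/ceph/ceph
  cd ceph
  ./run-make-check.sh   # the exit status is what gets reported back to the master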

 
 -jiangang
 
 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Loic Dachary
 Sent: Wednesday, April 08, 2015 6:56 PM
 To: Ceph Development
 Subject: Call for jenkins slaves to improve multi operating system support
 
 Hi Ceph,
 
 When a contribution is proposed to Ceph [1], a bot compiles and run tests 
 with it to provide feedback to the developer [2]. When something goes wrong 
 the failure can be repeated on the developer machine [3] for debug. This also 
 helps the reviewer who knows the code compiles and does not break anything 
 that would be detected by make check.
 
 The bot runs on CentOS 7 and Ubuntu 14.04 only, and problems related to older 
 operating systems (headers, compiler version, etc.) may be detected later, 
 when building packages [4] and after the pull request has been merged in 
 master. This is rare but requires extra attention from the reviewer and needs 
 to be dealt with urgently when it happens.
 
 If you can spare a machine to help expand the operating systems on which 
 tests can run, it would be a great help. The minimum hardware configuration 
 to run a slave is:
 
 *  x86_64 architecture for CentOS 6, Fedora 21, OpenSUSE 13.2, Debian 
 GNU/Linux Jessie, Ubuntu 14.02
 
   32 GB RAM
   200 GB SSD
   8 core  2.5Ghz
 
 *  i386 architecture for CentOS 7, CentOS 6, Fedora 21, Debian GNU/Linux 
 Jessie, Ubuntu 14.04, Ubuntu 14.02
 
   4 GB RAM
   200 GB disk
   2 core
 
 *  armv7, armv8 architecture for Ubuntu 14.04
 
   4 GB RAM
   200 GB disk
   2 core 
 
 Note that since the make check bot can run in a docker container, x86_64 
 machines can be used to run any of the operating systems for which a docker 
 file has been prepared [5].
 
 Cheers
 
 [1] pull requests https://github.com/ceph/ceph/pulls
 [2] make check bot feedback 
 https://github.com/ceph/ceph/pull/4296#issuecomment-90812064
 [3] run-make-check.sh 
 https://github.com/ceph/ceph/blob/master/run-make-check.sh#L44
 [4] gitbuilder http://ceph.com/gitbuilder.cgi [5] 
 https://ceph.com/git/?p=ceph.git;a=blob;f=src/test/Makefile.am;hb=hammer#l91
 
 --
 Loïc Dachary, Artisan Logiciel Libre
 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Initial newstore vs filestore results

2015-04-08 Thread Milosz Tanski
On Wed, Apr 8, 2015 at 12:49 PM, Sage Weil s...@newdream.net wrote:
 On Wed, 8 Apr 2015, Haomai Wang wrote:
 On Wed, Apr 8, 2015 at 10:58 AM, Sage Weil s...@newdream.net wrote:
  On Tue, 7 Apr 2015, Mark Nelson wrote:
  On 04/07/2015 02:16 PM, Mark Nelson wrote:
   On 04/07/2015 09:57 AM, Mark Nelson wrote:
Hi Guys,
   
I ran some quick tests on Sage's newstore branch.  So far given that
this is a prototype, things are looking pretty good imho.  The 4MB
object rados bench read/write and small read performance looks
especially good.  Keep in mind that this is not using the SSD journals
in any way, so 640MB/s sequential writes is actually really good
compared to filestore without SSD journals.
   
small write performance appears to be fairly bad, especially in the 
RBD
case where it's small writes to larger objects.  I'm going to sit down
and see if I can figure out what's going on.  It's bad enough that I
suspect there's just something odd going on.
   
Mark
  
   Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those
   interested:
  
   http://nhm.ceph.com/newstore/
  
   Interestingly small object write/read performance with 4 OSDs was about
   1/3-1/4 the speed of the same cluster with 36 OSDs.
  
   Note: Thanks Dan for fixing the directory column width!
  
   Mark
 
  New fio/librbd results using Sage's latest code that attempts to keep 
  small
  overwrite extents in the db.  This is 4 OSD so not directly comparable to 
  the
  36 OSD tests above, but does include seekwatcher graphs.  Results in MB/s:
 
          write   read    randw   randr
  4MB     57.9    319.6   55.2    285.9
  128KB   2.5     230.6   2.4     125.4
  4KB     0.46    55.65   1.11    3.56
 
  What would be very interesting would be to see the 4KB performance
  with the defaults (newstore overlay max = 32) vs overlays disabled
  (newstore overlay max = 0) and see if/how much it is helping.
 
  The latest branch also has open-by-handle.  It's on by default (newstore
  open by handle = true).  I think for most workloads it won't be very
  noticeable... I think there are two questions we need to answer though:
 
  1) Does it have any impact on a creation workload (say, 4kb objects).  It
  shouldn't, but we should confirm.
 
  2) Does it impact small object random reads with a cold cache.  I think to
  see the effect we'll probably need to pile a ton of objects into the
  store, drop caches, and then do random reads.  In the best case the
  effect will be small, but hopefully noticeable: we should go from
  a directory lookup (1+ seeks) + inode lookup (1+ seek) + data
  read, to inode lookup (1+ seek) + data read.  So, 3 - 2 seeks best case?
  I'm not really sure what XFS is doing under the covers here..

 WOW, it's really a cool implementation beyond my original mind
 according to blueprint. Handler, overlay_map and data_map looks so
 flexible and make small io cheaper in theory. Now we only have 1
 element in data_map and I'm not sure your goal about the future's
 usage. Although I have a unclearly idea that it could enhance the role
 of NewStore and make local filesystem just as a block space allocator.
 Let NewStore own a variable of FTL(File Translation Layer), so many
 cool features could be added. What's your idea about data_map?

 Exactly, that is one option.  The other is that we'd treat the data_map
 similar to overlay_map with a fixed or max extent size so that a large
 partial overwrite will mostly go to a new file instead of doing the
 slow WAL.

 My concern currently still is WAL after fsync and kv commiting, maybe
 fsync process is just fine because mostly we won't meet this case in
 rbd. But submit sync kv transaction isn't a low latency job I think,
 maybe we could let WAL parallel with kv commiting?(yes, I really
 concern the latency of one op :-) )

 The WAL has to come after kv commit.  But the fsync after the wal
 completion sucks, especially since we are always dispatching a single
 fsync at a time so it's kind of worst-case seek behavior.  We could throw
 these into another parallel fsync queue so that the fs can batch them up,
 but I'm not sure we will enough parallelism.  What would really be nice is
 a batch fsync syscall, but in leiu of that maybe we wait until we have a
 bunch of fsyncs pending and then throw them at the kernel together in a
 bunch of threads?  Not sure.  These aren't normally time sensitive
 unless a read comes along (which is pretty rare), but they have to be done
 for correctness.

 Then from the actual rados write op, it will add setattr and
 omap_setkeys ops. Current NewStore looks plays badly for setattr. It
 always encode all xattrs(and other not so tiny fields) and write again
 (Is this true?) though it could batch multi transaction's onode write
 in short time.

 Yeah, this could be optimized so that we only unpack and repack the
 bufferlist, or do a single walk through the buffer to do the updates
 (similar to what TMAP 

Re: Initial newstore vs filestore results

2015-04-08 Thread Sage Weil
On Wed, 8 Apr 2015, Haomai Wang wrote:
 On Wed, Apr 8, 2015 at 10:58 AM, Sage Weil s...@newdream.net wrote:
  On Tue, 7 Apr 2015, Mark Nelson wrote:
  On 04/07/2015 02:16 PM, Mark Nelson wrote:
   On 04/07/2015 09:57 AM, Mark Nelson wrote:
Hi Guys,
   
I ran some quick tests on Sage's newstore branch.  So far given that
this is a prototype, things are looking pretty good imho.  The 4MB
object rados bench read/write and small read performance looks
especially good.  Keep in mind that this is not using the SSD journals
in any way, so 640MB/s sequential writes is actually really good
compared to filestore without SSD journals.
   
small write performance appears to be fairly bad, especially in the RBD
case where it's small writes to larger objects.  I'm going to sit down
and see if I can figure out what's going on.  It's bad enough that I
suspect there's just something odd going on.
   
Mark
  
   Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those
   interested:
  
   http://nhm.ceph.com/newstore/
  
   Interestingly small object write/read performance with 4 OSDs was about
   1/3-1/4 the speed of the same cluster with 36 OSDs.
  
   Note: Thanks Dan for fixing the directory column width!
  
   Mark
 
  New fio/librbd results using Sage's latest code that attempts to keep small
  overwrite extents in the db.  This is 4 OSD so not directly comparable to 
  the
  36 OSD tests above, but does include seekwatcher graphs.  Results in MB/s:
 
          write   read    randw   randr
  4MB     57.9    319.6   55.2    285.9
  128KB   2.5     230.6   2.4     125.4
  4KB     0.46    55.65   1.11    3.56
 
  What would be very interesting would be to see the 4KB performance
  with the defaults (newstore overlay max = 32) vs overlays disabled
  (newstore overlay max = 0) and see if/how much it is helping.
 
  The latest branch also has open-by-handle.  It's on by default (newstore
  open by handle = true).  I think for most workloads it won't be very
  noticeable... I think there are two questions we need to answer though:
 
  1) Does it have any impact on a creation workload (say, 4kb objects).  It
  shouldn't, but we should confirm.
 
  2) Does it impact small object random reads with a cold cache.  I think to
  see the effect we'll probably need to pile a ton of objects into the
  store, drop caches, and then do random reads.  In the best case the
  effect will be small, but hopefully noticeable: we should go from
  a directory lookup (1+ seeks) + inode lookup (1+ seek) + data
  read, to inode lookup (1+ seek) + data read.  So, 3 - 2 seeks best case?
  I'm not really sure what XFS is doing under the covers here..
 
 WOW, it's really a cool implementation beyond my original mind
 according to blueprint. Handler, overlay_map and data_map looks so
 flexible and make small io cheaper in theory. Now we only have 1
 element in data_map and I'm not sure your goal about the future's
 usage. Although I have a unclearly idea that it could enhance the role
 of NewStore and make local filesystem just as a block space allocator.
 Let NewStore own a variable of FTL(File Translation Layer), so many
 cool features could be added. What's your idea about data_map?

Exactly, that is one option.  The other is that we'd treat the data_map 
similar to overlay_map with a fixed or max extent size so that a large 
partial overwrite will mostly go to a new file instead of doing the 
slow WAL.

 My concern currently still is WAL after fsync and kv commiting, maybe
 fsync process is just fine because mostly we won't meet this case in
 rbd. But submit sync kv transaction isn't a low latency job I think,
 maybe we could let WAL parallel with kv commiting?(yes, I really
 concern the latency of one op :-) )

The WAL has to come after kv commit.  But the fsync after the WAL 
completion sucks, especially since we are always dispatching a single 
fsync at a time, so it's kind of worst-case seek behavior.  We could throw 
these into another parallel fsync queue so that the fs can batch them up, 
but I'm not sure we will get enough parallelism.  What would really be nice is 
a batch fsync syscall, but in lieu of that maybe we wait until we have a 
bunch of fsyncs pending and then throw them at the kernel together in a 
bunch of threads?  Not sure.  These aren't normally time sensitive 
unless a read comes along (which is pretty rare), but they have to be done 
for correctness.
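
As a rough illustration of that last idea (gather pending fsyncs and hand
them to the kernel from several threads at once), here is a minimal
standalone sketch; it is not NewStore code, and the names, batch threshold
and thread count are invented for illustration:

  // batch_fsync.cc -- illustrative sketch only, not NewStore code.
  // Instead of issuing one fsync at a time, collect the fds of completed
  // WAL items and flush them from a small pool of threads, giving the
  // filesystem a chance to batch the underlying work.
  #include <unistd.h>

  #include <thread>
  #include <vector>

  // fsync every fd in 'fds', spreading the calls over up to 'nthreads' threads
  void batch_fsync(const std::vector<int>& fds, unsigned nthreads = 4) {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
      workers.emplace_back([&fds, t, nthreads] {
        for (size_t i = t; i < fds.size(); i += nthreads)
          ::fsync(fds[i]);   // a real version must check and handle errors
      });
    }
    for (auto& w : workers)
      w.join();
  }

  int main() {
    std::vector<int> pending;
    // ... as WAL items complete, do: pending.push_back(fd);
    if (pending.size() >= 8) {   // arbitrary batch threshold for the sketch
      batch_fsync(pending);
      pending.clear();
    }
    return 0;
  }

A real implementation would also need a timeout or an explicit kick (e.g. when
a read depends on one of the pending fsyncs) so that a half-full batch does not
sit around indefinitely.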

 Then from the actual rados write op, it will add setattr and
 omap_setkeys ops. Current NewStore looks plays badly for setattr. It
 always encode all xattrs(and other not so tiny fields) and write again
 (Is this true?) though it could batch multi transaction's onode write
 in short time.

Yeah, this could be optimized so that we only unpack and repack the 
bufferlist, or do a single walk through the buffer to do the updates 
(similar to what TMAP used to do).

 NewStore also employ much more workload to KeyValueDB compared 

Re: Call for jenkins slaves to improve multi operating system support

2015-04-08 Thread Sage Weil
On Wed, 8 Apr 2015, Loic Dachary wrote:
 Hi Ceph,
 
 When a contribution is proposed to Ceph [1], a bot compiles and run 
 tests with it to provide feedback to the developer [2]. When something 
 goes wrong the failure can be repeated on the developer machine [3] for 
 debug. This also helps the reviewer who knows the code compiles and does 
 not break anything that would be detected by make check.
 
 The bot runs on CentOS 7 and Ubuntu 14.04 only, and problems related to 
 older operating systems (headers, compiler version, etc.) may be 
 detected later, when building packages [4] and after the pull request 
 has been merged in master. This is rare but requires extra attention 
 from the reviewer and needs to be dealt with urgently when it happens.

Do additional slaves block the message from appearing on the pull 
request?  I.e., what happens if a slave is very slow (e.g., armv7) or 
broken (network issue)?

What are the connectivity requirements?  Can slaves exist on other 
(private) networks?

Thanks!
sage
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lttng broken on i386 trusty

2015-04-08 Thread Josh Durgin

Ah, I didn't notice Loic disabled it for the tarball gitbuilder (it
makes sense to disable there since it's not needed, and speeds up build
+ tests that that gitbuilder does).

I'm not sure why it's failing --with-lttng yet. Odd that it doesn't
fail in the deb gitbuilder, which still has lttng autodetected.

Josh

On 04/08/2015 04:25 PM, kernel neophyte wrote:

Thats because its being run with --without-lttng

as per the log in
http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582:

./configure --with-debug --with-radosgw --with-libatomic-ops
--without-lttng --disable-static --without-cryptopp --with-tcmalloc

if you change it to --with-lttng it will fail

-Neo

On Wed, Apr 8, 2015 at 4:20 PM, Josh Durgin jdur...@redhat.com wrote:

Are you still seeing the same error on the latest master?
The gitbuilder is building successfully now:

http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582


On 04/08/2015 04:07 PM, kernel neophyte wrote:


Need help! This is still broken!

-Neo

On Fri, Mar 27, 2015 at 2:55 AM, Loic Dachary l...@dachary.org wrote:


Hi,

In case you did not notice:


http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=dca722ec7b2a7fc9214844ec92310074b5cb2faa

CXXLD radosgw-admin
CXXLD rados
/usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so:
undefined reference to `dlclose'
/usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so:
undefined reference to `dlsym'
/usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so:
undefined reference to `dlopen'

  error: collect2: ld returned 1 exit status

make[3]: *** [ceph_tpbench] Error 1
make[3]: *** Waiting for unfinished jobs
CXXLD ceph-objectstore-tool
make[3]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'

  make: *** [all-recursive] Error 1

Cheers

--
Loïc Dachary, Artisan Logiciel Libre


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ms_crc_data false

2015-04-08 Thread Deneau, Tom
With 0.93, I tried 
ceph tell 'osd.*' injectargs '--ms_crc_data=false' '--ms_crc_header=false'

and saw the changes reflected in ceph admin-daemon

But having done that, perf top still shows time being spent in crc32 routines.
Is there some other parameter that needs changing?

-- Tom Deneau

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lttng broken on i386 trusty

2015-04-08 Thread Josh Durgin

Are you still seeing the same error on the latest master?
The gitbuilder is building successfully now:

http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582

On 04/08/2015 04:07 PM, kernel neophyte wrote:

Need help! This is still broken!

-Neo

On Fri, Mar 27, 2015 at 2:55 AM, Loic Dachary l...@dachary.org wrote:

Hi,

In case you did not notice:

http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=dca722ec7b2a7fc9214844ec92310074b5cb2faa

CXXLD radosgw-admin
CXXLD rados
/usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: 
undefined reference to `dlclose'
/usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: 
undefined reference to `dlsym'
/usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: 
undefined reference to `dlopen'

 error: collect2: ld returned 1 exit status

make[3]: *** [ceph_tpbench] Error 1
make[3]: *** Waiting for unfinished jobs
CXXLD ceph-objectstore-tool
make[3]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'

 make: *** [all-recursive] Error 1

Cheers

--
Loïc Dachary, Artisan Logiciel Libre


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lttng broken on i386 trusty

2015-04-08 Thread kernel neophyte
That's because it's being run with --without-lttng

as per the log in
http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582:

./configure --with-debug --with-radosgw --with-libatomic-ops
--without-lttng --disable-static --without-cryptopp --with-tcmalloc

if you change it to --with-lttng it will fail

-Neo

On Wed, Apr 8, 2015 at 4:20 PM, Josh Durgin jdur...@redhat.com wrote:
 Are you still seeing the same error on the latest master?
 The gitbuilder is building successfully now:

 http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582


 On 04/08/2015 04:07 PM, kernel neophyte wrote:

 Need help! This is still broken!

 -Neo

 On Fri, Mar 27, 2015 at 2:55 AM, Loic Dachary l...@dachary.org wrote:

 Hi,

 In case you did not notice:


 http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=dca722ec7b2a7fc9214844ec92310074b5cb2faa

 CXXLD radosgw-admin
 CXXLD rados
 /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so:
 undefined reference to `dlclose'
 /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so:
 undefined reference to `dlsym'
 /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so:
 undefined reference to `dlopen'

  error: collect2: ld returned 1 exit status

 make[3]: *** [ceph_tpbench] Error 1
 make[3]: *** Waiting for unfinished jobs
 CXXLD ceph-objectstore-tool
 make[3]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
 make[2]: *** [all-recursive] Error 1
 make[2]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
 make[1]: *** [all] Error 2
 make[1]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'

  make: *** [all-recursive] Error 1

 Cheers

 --
 Loïc Dachary, Artisan Logiciel Libre

 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ms_crc_data false

2015-04-08 Thread Gregory Farnum
On Wed, Apr 8, 2015 at 3:38 PM, Deneau, Tom tom.den...@amd.com wrote:
 With 0.93, I tried
 ceph tell 'osd.*' injectargs '--ms_crc_data=false' '--ms_crc_header=false'

 and saw the changes reflected in ceph admin-daemon

 But having done that, perf top still shows time being spent in crc32 routines.
 Is there some other parameter that needs changing?

You can change this config value, but unfortunately it won't have any
effect on a running daemon. You'll need to specify it in a config and
restart.
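
In other words, something like this in ceph.conf (under [global] so both the
OSDs and the bench client pick it up):

[global]
    ms_crc_data = false
    ms_crc_header = false

and then restart the OSDs so the messengers read the new values at startup,
e.g. with upstart on Ubuntu (adjust to whatever init system the OSD hosts
use):

sudo restart ceph-osd-all
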
-Greg


Re: Initial newstore vs filestore results

2015-04-08 Thread Mark Nelson

On 04/07/2015 09:58 PM, Sage Weil wrote:

On Tue, 7 Apr 2015, Mark Nelson wrote:

On 04/07/2015 02:16 PM, Mark Nelson wrote:

On 04/07/2015 09:57 AM, Mark Nelson wrote:

Hi Guys,

I ran some quick tests on Sage's newstore branch.  So far given that
this is a prototype, things are looking pretty good imho.  The 4MB
object rados bench read/write and small read performance looks
especially good.  Keep in mind that this is not using the SSD journals
in any way, so 640MB/s sequential writes is actually really good
compared to filestore without SSD journals.

small write performance appears to be fairly bad, especially in the RBD
case where it's small writes to larger objects.  I'm going to sit down
and see if I can figure out what's going on.  It's bad enough that I
suspect there's just something odd going on.

Mark


Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those
interested:

http://nhm.ceph.com/newstore/

Interestingly small object write/read performance with 4 OSDs was about
1/3-1/4 the speed of the same cluster with 36 OSDs.

Note: Thanks Dan for fixing the directory column width!

Mark


New fio/librbd results using Sage's latest code that attempts to keep small
overwrite extents in the db.  This is 4 OSD so not directly comparable to the
36 OSD tests above, but does include seekwatcher graphs.  Results in MB/s:

        write   read    randw   randr
4MB     57.9    319.6   55.2    285.9
128KB   2.5     230.6   2.4     125.4
4KB     0.46    55.65   1.11    3.56


What would be very interesting would be to see the 4KB performance
with the defaults (newstore overlay max = 32) vs overlays disabled
(newstore overlay max = 0) and see if/how much it is helping.


And here we go.  1 OSD, 1X replication.  16GB RBD volume.

4MB               write   read    randw   randr
default overlay   36.13   106.61  34.49   92.69
no overlay        36.29   105.61  34.49   93.55

128KB             write   read    randw   randr
default overlay   1.71    97.90   1.65    25.79
no overlay        1.72    97.80   1.66    25.78

4KB               write   read    randw   randr
default overlay   0.40    61.88   1.29    1.11
no overlay        0.05    61.26   0.05    1.10

seekwatcher movies generating now, but I'm going to bed soon so I'll 
have to wait until tomorrow morning to post them. :)
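
For reference, the comparison amounts to toggling one OSD option between
otherwise identical fio/librbd runs; a rough sketch (pool/image names,
runtime and queue depth are placeholders, not the exact jobs used here):

# ceph.conf on the OSD, switched between the two runs
[osd]
    # "default overlay" rows:
    newstore overlay max = 32
    # "no overlay" rows used this instead:
    # newstore overlay max = 0

and an fio job along these lines, re-run with bs=4m/128k/4k and
rw=write/read/randwrite/randread:

[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio-test
time_based=1
runtime=300

[rbd-job]
rw=randwrite
bs=4k
iodepth=32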




The latest branch also has open-by-handle.  It's on by default (newstore
open by handle = true).  I think for most workloads it won't be very
noticeable... I think there are two questions we need to answer though:

1) Does it have any impact on a creation workload (say, 4kb objects).  It
shouldn't, but we should confirm.

2) Does it impact small object random reads with a cold cache.  I think to
see the effect we'll probably need to pile a ton of objects into the
store, drop caches, and then do random reads.  In the best case the
effect will be small, but hopefully noticeable: we should go from
a directory lookup (1+ seeks) + inode lookup (1+ seek) + data
read, to inode lookup (1+ seek) + data read.  So, 3 -> 2 seeks best case?
I'm not really sure what XFS is doing under the covers here...

sage
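
A rough recipe for (2), assuming a dedicated test pool (pool name, durations
and thread counts are placeholders), run once with 'newstore open by handle =
true' and once with 'false':

# 1) pile a lot of small objects into the store and leave them in place
rados -p newstore-test bench 600 write -b 4096 -t 16 --no-cleanup

# 2) drop the page/dentry/inode caches on the OSD hosts so the directory
#    and inode lookups really go to disk
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

# 3) cold-cache random small reads (fall back to 'seq' if this build's
#    rados bench has no 'rand' mode)
rados -p newstore-test bench 300 rand -t 16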


Re: lttng broken on i386 trusty

2015-04-08 Thread kernel neophyte
Strange! I see the build succeeds for deb
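
One way to narrow that down might be to rebuild the failing target verbosely
on both builders and compare the actual link lines (V=1 just defeats the
automake silent rules; the grep is purely illustrative):

make -C src V=1 ceph_tpbench 2>&1 | tee tpbench-link.log
grep -E 'ceph_tpbench|lttng|-ldl' tpbench-link.log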

-Neo

On Wed, Apr 8, 2015 at 5:00 PM, Josh Durgin jdur...@redhat.com wrote:
 Ah, I didn't notice Loic disabled it for the tarball gitbuilder (it
 makes sense to disable there since it's not needed, and speeds up build
 + tests that that gitbuilder does).

 I'm not sure why it's failing --with-lttng yet. Odd that it doesn't
 fail in the deb gitbuilder, which still has lttng autodetected.

 Josh


 On 04/08/2015 04:25 PM, kernel neophyte wrote:

 That's because it's being run with --without-lttng

 as per the log in

 http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582:

 ./configure --with-debug --with-radosgw --with-libatomic-ops
 --without-lttng --disable-static --without-cryptopp --with-tcmalloc

 if you change it to --with-lttng it will fail

 -Neo

 On Wed, Apr 8, 2015 at 4:20 PM, Josh Durgin jdur...@redhat.com wrote:

 Are you still seeing the same error on the latest master?
 The gitbuilder is building successfully now:


 http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582


 On 04/08/2015 04:07 PM, kernel neophyte wrote:


 Need help! This is still broken!

 -Neo

 On Fri, Mar 27, 2015 at 2:55 AM, Loic Dachary l...@dachary.org wrote:


 Hi,

 In case you did not notice:



 http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=dca722ec7b2a7fc9214844ec92310074b5cb2faa

 CXXLD radosgw-admin
 CXXLD rados

 /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so:
 undefined reference to `dlclose'

 /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so:
 undefined reference to `dlsym'

 /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so:
 undefined reference to `dlopen'

   error: collect2: ld returned 1 exit status

 make[3]: *** [ceph_tpbench] Error 1
 make[3]: *** Waiting for unfinished jobs
 CXXLD ceph-objectstore-tool
 make[3]: Leaving directory
 `/srv/autobuild-ceph/gitbuilder.git/build/src'
 make[2]: *** [all-recursive] Error 1
 make[2]: Leaving directory
 `/srv/autobuild-ceph/gitbuilder.git/build/src'
 make[1]: *** [all] Error 2
 make[1]: Leaving directory
 `/srv/autobuild-ceph/gitbuilder.git/build/src'

   make: *** [all-recursive] Error 1

 Cheers

 --
 Loïc Dachary, Artisan Logiciel Libre



Re: lttng broken on i386 trusty

2015-04-08 Thread kernel neophyte
Need help! This is still broken!

-Neo

On Fri, Mar 27, 2015 at 2:55 AM, Loic Dachary l...@dachary.org wrote:
 Hi,

 In case you did not notice:

 http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=dca722ec7b2a7fc9214844ec92310074b5cb2faa

 CXXLD radosgw-admin
 CXXLD rados
 /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: 
 undefined reference to `dlclose'
 /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: 
 undefined reference to `dlsym'
 /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: 
 undefined reference to `dlopen'

 error: collect2: ld returned 1 exit status

 make[3]: *** [ceph_tpbench] Error 1
 make[3]: *** Waiting for unfinished jobs
 CXXLD ceph-objectstore-tool
 make[3]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
 make[2]: *** [all-recursive] Error 1
 make[2]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
 make[1]: *** [all] Error 2
 make[1]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'

 make: *** [all-recursive] Error 1

 Cheers

 --
 Loïc Dachary, Artisan Logiciel Libre



RE: ms_crc_data false

2015-04-08 Thread Deneau, Tom
 -Original Message-
 From: Sage Weil [mailto:s...@newdream.net]
 Sent: Wednesday, April 08, 2015 5:40 PM
 To: Deneau, Tom
 Cc: ceph-devel
 Subject: Re: ms_crc_data false
 
 On Wed, 8 Apr 2015, Deneau, Tom wrote:
  With 0.93, I tried
  ceph tell 'osd.*' injectargs '--ms_crc_data=false' '--
 ms_crc_header=false'
 
  and saw the changes reflected in ceph admin-daemon
 
  But having done that, perf top still shows time being spent in crc32
 routines.
  Is there some other parameter that needs changing?
 
 The osd still does a CRC for the purposes of write journaling.  This can't be
 disabled currently.  You shouldn't see this come up on reads...
 
 sage

Hmm, I am doing rados bench seq so only reads.

Is this in 0.93 or do I need 0.94?
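
If it helps narrow it down, a call-graph sample should show which path the
crc32 cycles are coming from; something like this (pids and duration are
placeholders):

# sample the OSDs with call graphs while the seq read benchmark is running
sudo perf record -g -p $(pgrep -d, ceph-osd) -- sleep 30
sudo perf report --stdio | grep -i -B2 -A10 crc32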

-- Tom


Re: tcmalloc issue

2015-04-08 Thread James Page

On 01/04/15 18:54, Somnath Roy wrote:
 the tcmalloc bug is reported in the following link.
 
 http://code.google.com/p/gperftools/issues/detail?id=585
 
 So, do you want us to give the steps on how we are hitting the
 following trace, or to reproduce the bug reported in the above link?
 
 If you are asking about hitting the stack trace I mentioned below,
 it is just running ceph with lots of IOs for a longer period of time.

A minimal test case showing how to reproduce the bug is sufficient; we can
of course complete the second activity as well, as soon as we have the
update available.
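
For example, if the load amounts to sustained small writes, something along
these lines (pool name, runtime and client count are all placeholders) would
already be enough for us to work from:

# a few parallel clients doing sustained 4KB writes for a couple of hours
for i in 1 2 3 4; do
    rados -p tcmalloc-test bench 7200 write -b 4096 -t 64 --no-cleanup &
done
wait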

Cheers

James


--
James Page
Ubuntu and Debian Developer
james.p...@ubuntu.com
jamesp...@debian.org