RE: Very slow recovery/peering with latest master

2015-09-24 Thread Podoski, Igor
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Thursday, September 24, 2015 3:32 AM
> To: Handzik, Joe
> Cc: Somnath Roy; Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-
> devel
> Subject: Re: Very slow recovery/peering with latest master
> 
> On Wed, 23 Sep 2015, Handzik, Joe wrote:
> > Ok. When configuring with ceph-disk, it does something nifty and
> > actually gives the OSD the uuid of the disk's partition as its fsid. I
> > bootstrap off that to get an argument to pass into the function you
> > have identified as the bottleneck. I ran it by Sage and we both
> > realized there would be cases where it wouldn't work... I'm sure
> > neither of us realized the failure would take three minutes, though.
> >
> > In the short term, it makes sense to create an option to disable or
> > short-circuit the blkid code. I would prefer that the default be left
> > with the code enabled, but I'm open to default disabled if others
> > think this will be a widespread problem. You could also make sure your
> > OSD fsids are set to match your disk partition uuids for now too, if
> > that's a faster workaround for you (it'll get rid of the failure).
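> >
> > For example (the path and device below are only examples; adjust them to
> > your OSDs), comparing the two would look roughly like:
> >
> >   cat /var/lib/ceph/osd/ceph-0/fsid        # the OSD's fsid
> >   blkid -s PARTUUID -o value /dev/sdb1     # uuid of the data partition
> >
> > If the two don't match, that is presumably the case where the lookup fails.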
> 
> I think we should try to figure out where it is hanging.  Can you strace the
> blkid process to see what it is up to?
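>
> For instance, attaching to the hanging process with something along the
> lines of
>
>   strace -f -tt -T -p <pid> -o /tmp/blkid.trace
>
> (<pid> being a placeholder for the stuck process) should show which syscall
> it is sitting in for those ~3 minutes.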
> 
> I opened http://tracker.ceph.com/issues/13219
> 
> I think as long as it behaves reliably with ceph-disk OSDs then we can have it
> on by default.
> 
> sage
> 
> 
> >
> > Joe
> >
> > > On Sep 23, 2015, at 6:26 PM, Somnath Roy wrote:
> > >
> > > -Original Message-
> > > From: Handzik, Joe [mailto:joseph.t.hand...@hpe.com]
> > > Sent: Wednesday, September 23, 2015 4:20 PM
> > > To: Samuel Just
> > > Cc: Somnath Roy; Samuel Just (sam.j...@inktank.com); Sage Weil
> > > (s...@newdream.net); ceph-devel
> > > Subject: Re: Very slow recovery/peering with latest master
> > >
> > > I added that; there is code up the stack in Calamari that consumes the
> > > provided path, which is intended to facilitate disk monitoring and
> > > management in the future.
> > >
> > > [Somnath] Ok
> > >
> > > Somnath, what does your disk configuration look like (filesystem,
> > > SSD/HDD, anything else you think could be relevant)? Did you configure
> > > your disks with ceph-disk, or by hand? I never saw this while testing my
> > > code; has anyone else heard of this behavior on master? The code has been
> > > in master for 2-3 months now, I believe.
> > > [Somnath] All SSD. I use mkcephfs to create the cluster, and I partitioned
> > > the disks with fdisk beforehand. I am using XFS. Are you testing with the
> > > Ubuntu 3.16.* kernel? It could be Linux distribution/kernel specific.

Somnath, maybe it is GPT-related; what partition table do you have? I think
parted and gdisk can create GPT partitions, but not fdisk (definitely not the
version that I use).

You could back up and clear the blkid cache (/etc/blkid/blkid.tab); maybe
there is a mess in it.
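
A rough sketch of both checks (the device names below are just examples;
adjust them to your OSD disks):

# show which partition table the disk carries (gpt vs msdos)
parted /dev/sdb print | grep 'Partition Table'

# back up the blkid cache, then remove it so it gets rebuilt on the next run
cp /etc/blkid/blkid.tab /etc/blkid/blkid.tab.bak
rm /etc/blkid/blkid.tab

# re-probe a partition and repopulate the cache
blkid /dev/sdb1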

Regards,
Igor.


> > >
> > > It would be nice to not need to disable this, but if this behavior
> > > exists and can't be explained by a misconfiguration or something else,
> > > I'll need to figure out a different implementation.
> > >
> > > Joe
> > >
> > >> On Sep 23, 2015, at 6:07 PM, Samuel Just  wrote:
> > >>
> > >> Wow.  Why would that take so long?  I think you are correct that
> > >> it's only used for metadata, we could just add a config value to
> > >> disable it.
> > >> -Sam
> > >>
> > >>> On Wed, Sep 23, 2015 at 3:48 PM, Somnath Roy wrote:
> > >>> Sam/Sage,
> > >>> I debugged it down and found that the
> > >>> get_device_by_uuid->blkid_find_dev_with_tag() call within
> > >>> FileStore::collect_metadata() hangs for ~3 minutes before returning
> > >>> EINVAL. I see this portion was newly added after hammer.
> > >>> Commenting it out resolves the issue. BTW, I saw this value is stored
> > >>> as metadata but not used anywhere; am I missing anything?
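> > >>> (A rough way to time the same lookup outside of ceph, assuming the tag
> > >>> being queried is PARTUUID and using a placeholder uuid:
> > >>>
> > >>> time blkid -t PARTUUID=00000000-0000-0000-0000-000000000000 -o device
> > >>>
> > >>> If that also takes minutes, the problem is in libblkid/the blkid cache
> > >>> rather than in the OSD code.)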
> > >>> Here are my Linux details:
> > >>>
> > >>> root@emsnode5:~/wip-write-path-optimization/src# uname -a
> > >>> Linux emsnode5 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8
> > >>> 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> > >>>
> > >>>
> > >>> root@emsnode5:~/wip-write-path-optimization/src# lsb_release -a
> > >>> No LSB modules are available.
> > >>> Distributor ID: Ubuntu
> > >>> Description:    Ubuntu 14.04.2 LTS
> > >>> Release:        14.04
> > >>> Codename:       trusty
> > >>>
> > >>> Thanks & Regards
> > >>> Somnath
> > >>>
> > >>> -Original Message-
> > >>> From: Somnath Roy
> > >>> Sent: Wednesday, September 16, 2015 2:20 PM
> > >>> To: 'Gregory Farnum'
> > >>> Cc: 'ceph-devel'
> > >>> Subject: RE: Very slow recovery/peering with latest master
> > >>>
> > >>>
> > >>> Sage/Greg,
> > >>>
> > >>> Yeah, as we expected, it is not happening probably because of
> recovery settings. I reverted it back in my ceph.conf , 

RE: vstart runner for cephfs tests

2015-07-23 Thread Podoski, Igor
 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
 ow...@vger.kernel.org] On Behalf Of Mark Nelson
 Sent: Thursday, July 23, 2015 2:51 PM
 To: John Spray; ceph-devel@vger.kernel.org
 Subject: Re: vstart runner for cephfs tests
 
 
 
 On 07/23/2015 07:37 AM, John Spray wrote:
 
 
  On 23/07/15 12:56, Mark Nelson wrote:
  I had similar thoughts on the benchmarking side, which is why I
  started writing cbt a couple years ago.  I needed the ability to
  quickly spin up clusters and run benchmarks on arbitrary sets of
  hardware.  The outcome isn't perfect, but it's been extremely useful
  for running benchmarks and sort of exists as a half-way point between
  vstart and teuthology.
 
  The basic idea is that you give it a yaml file that looks a little
  bit like a teuthology yaml file and cbt will (optionally) build a
  cluster across a number of user defined nodes with pdsh, start
  various monitoring tools (this is ugly right now, I'm working on
  making it modular), and then sweep through user defined benchmarks
  and sets of parameter spaces.  I have a separate tool that will sweep
  through ceph parameters, create ceph.conf files for each space, and
  run cbt with each one, but the eventual goal is to integrate that into cbt
 itself.
 
   Though I never really intended it to run functional tests, I just
   added something that looks very similar to the rados suite so I can
  benchmark ceph_test_rados for the new community lab hardware. I
  already had a mechanism to inject OSD down/out up/in events, so with
  a bit of squinting it can give you a very rough approximation of a
  workload using the osd thrasher.  If you are interested, I'd be game
  to see if we could integrate your cephfs tests as well (I eventually
  wanted to add cephfs benchmark capabilities anyway).
 
  Cool - my focus is very much on tightening the code-build-test loop
  for developers, but I can see us needing to extend that into a
  code-build-test-bench loop as we do performance work on cephfs in the
  future.  Does cbt rely on having ceph packages built, or does it blast
  the binaries directly from src/ onto the test nodes?
 
  cbt doesn't handle builds/installs at all, so it's probably not particularly
  helpful in this regard.  By default it assumes binaries are in /usr/bin, but
  you can
 optionally override that in the yaml.  My workflow is usually to:
 
 1a) build ceph from src and distribute to other nodes (manually)
 1b) run a shell script that installs a given release from gitbuilder on all 
 nodes
 2) run a cbt yaml file that targets /usr/local, the build dir, /usr/bin, etc.
 
 Definitely would be useful to have something that makes 1a) better.
 Probably not cbt's job though.

About 1a)

In my test cluster I have an NFS server (on one node) sharing /home/ceph with
the other nodes, with many ceph versions in it. In every version subdirectory I
run make install with DESTDIR pointing to a newly created BIN subdirectory.

So it looks like this:
/home/ceph/ceph-0.94.1/BIN 
ls BIN
etc
sbin
usr
var

Then I remove var and run stow on every node to link the binaries and libs
from the shared /home/ceph/ceph-version/BIN into '/', with 'ldconfig' at the
end. Basically I can make changes on only one node and switch between ceph
versions very quickly. So there is no ceph installed on any node; the only
local ceph data lives under /var.

Of course, when the NFS node fails, everything fails... but I'm aware of that.

Check out stow.
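
A minimal sketch of that flow (the paths are just examples):

# on the build node: install the build into a per-version BIN tree
cd /home/ceph/ceph-0.94.1
make install DESTDIR=/home/ceph/ceph-0.94.1/BIN
rm -rf /home/ceph/ceph-0.94.1/BIN/var    # keep /var local on every node

# on every node: symlink the shared tree into / and refresh the linker cache
stow -d /home/ceph/ceph-0.94.1 -t / BIN
ldconfig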

 
  John


Regards,
Igor.



RE: building just src/tools/rados

2015-07-22 Thread Podoski, Igor
Hi Tom,

Have you tried cd src; make rados?

Regards,
Igor.


-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Deneau, Tom
Sent: Wednesday, July 22, 2015 10:13 PM
To: ceph-devel
Subject: building just src/tools/rados

Is there a make command that would build just src/tools or even just
src/tools/rados?

-- Tom Deneau



RE: deleting objects from a pool

2015-06-25 Thread Podoski, Igor
Hi David,

You're right; now I see that adding --run-name will clean all benchmark data
from the specified namespace, so you can run the command only once.

rados -p poolname -N namespace cleanup --prefix  --run-name 
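
For example (the pool, namespace, prefix and run name here are only
placeholders):

rados -p mypool -N myns cleanup --prefix mybench_ --run-name myrun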

Regards,
Igor.


-Original Message-
From: David Zafman [mailto:dzaf...@redhat.com] 
Sent: Friday, June 26, 2015 3:46 AM
To: Podoski, Igor; Deneau, Tom; Dałek, Piotr; ceph-devel
Subject: Re: deleting objects from a pool


If you have rados bench data around, you'll need to run cleanup a second time 
because the first time the benchmark_last_metadata object will be consulted 
to find what objects to remove.

Also, using cleanup this way will only remove objects from the default 
namespace unless a namespace is specified with the -N option.

rados -p poolname -N namespace cleanup --prefix 

David

On 6/24/15 11:06 PM, Podoski, Igor wrote:
 Hi,

 It appears that cleanup can be used as a purge:

 rados -p poolname cleanup  --prefix 

 Regards,
 Igor.


 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Deneau, Tom
 Sent: Wednesday, June 24, 2015 10:22 PM
 To: Dałek, Piotr; ceph-devel
 Subject: RE: deleting objects from a pool

 I've noticed that deleting objects from a basic k=2 m=1 erasure-coded pool is
 much, much slower than deleting a similar number of objects from a replicated
 size 3 pool (so the same number of files to be deleted). It looked like the EC
 pool object deletion was almost 20x slower. Is there a lot more work to be
 done to delete an EC pool object?

 -- Tom



 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel- 
 ow...@vger.kernel.org] On Behalf Of Dałek, Piotr
 Sent: Wednesday, June 24, 2015 11:56 AM
 To: ceph-devel
 Subject: Re: deleting objects from a pool

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel- 
 ow...@vger.kernel.org] On Behalf Of Deneau, Tom
 Sent: Wednesday, June 24, 2015 6:44 PM

 I have benchmarking situations where I want to leave a pool around 
 but delete a lot of objects from the pool.  Is there any really fast 
 way to do
 that?
 I noticed rados rmpool is fast but I don't want to remove the pool.

 I have been spawning multiple threads, each deleting a subset of the objects
 (which I believe is what rados bench write does), but even that can be very
 slow.
 For now, apart from rados -p poolname cleanup (which doesn't purge the
 pool, but merely removes objects written during the last benchmark run), the
 only option is brute force:

 for i in $(rados -p poolname ls); do (rados -p poolname rm $i > /dev/null); done;
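
 A rough way to parallelize that same brute-force approach (assuming GNU
 xargs; the concurrency of 8 is arbitrary):

 rados -p poolname ls | xargs -P 8 -I{} rados -p poolname rm {}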

 There's no purge pool command in rados -- not yet, at least. I was 
 thinking about one, but never really had time to implement one.

 With best regards / Pozdrawiam
 Piotr Dałek


RE: deleting objects from a pool

2015-06-25 Thread Podoski, Igor
Hi,

It appears that cleanup can be used as a purge:

rados -p poolname cleanup  --prefix 

Regards,
Igor.


-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Deneau, Tom
Sent: Wednesday, June 24, 2015 10:22 PM
To: Dałek, Piotr; ceph-devel
Subject: RE: deleting objects from a pool

I've noticed that deleting objects from a basic k=2 m=1 erasure-coded pool is
much, much slower than deleting a similar number of objects from a replicated
size 3 pool (so the same number of files to be deleted). It looked like the EC
pool object deletion was almost 20x slower. Is there a lot more work to be done
to delete an EC pool object?

-- Tom



 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel- 
 ow...@vger.kernel.org] On Behalf Of Dałek, Piotr
 Sent: Wednesday, June 24, 2015 11:56 AM
 To: ceph-devel
 Subject: Re: deleting objects from a pool
 
  -Original Message-
  From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel- 
  ow...@vger.kernel.org] On Behalf Of Deneau, Tom
  Sent: Wednesday, June 24, 2015 6:44 PM
 
  I have benchmarking situations where I want to leave a pool around 
  but delete a lot of objects from the pool.  Is there any really fast 
  way to do
 that?
  I noticed rados rmpool is fast but I don't want to remove the pool.
 
  I have been spawning multiple threads, each deleting a subset of the objects
  (which I believe is what rados bench write does), but even that can be very
  slow.
 
 For now, apart from rados -p poolname cleanup (which doesn't purge the
 pool, but merely removes objects written during the last benchmark run), the
 only option is brute force:

 for i in $(rados -p poolname ls); do (rados -p poolname rm $i > /dev/null); done;
 
 There's no purge pool command in rados -- not yet, at least. I was 
 thinking about one, but never really had time to implement one.
 
 With best regards / Pozdrawiam
 Piotr Dałek