[ceph-users] Ceph and hadoop

2014-10-24 Thread Matan Safriel
Hi,

Given HDFS is far from ideal for small files, I am examining the
possibility of using Hadoop on top of Ceph. I found mainly one online resource
about it (https://ceph.com/docs/v0.79/cephfs/hadoop/). I am wondering whether
there is any reference implementation or blog post you are aware of about
Hadoop on top of Ceph. Likewise, I would be happy to have any pointers about
why _not_ to attempt just that.

Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and my use case - is it a fit?

2014-08-02 Thread Matan Safriel
Thanks John,

I really mean my files are too small for HDFS, as the majority of them will
be under 64M, which I think is (still?) the default HDFS block size, *and
also,* they will be very numerous.

As such, they would quickly consume a huge aggregate amount of RAM on the
HDFS name node, which keeps a fixed amount of metadata in memory for every
file.
In that sense, it seems the name node was initially designed to manage a
collection of huge files, not a huge collection of small files; or at least
the documentation suggests it is not optimized for the latter.
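To make the name node concern concrete, here is a rough back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of name node heap per namespace object (one object per file plus one per block), ignores directories and replica bookkeeping, and uses the 64M block size mentioned above. The function name `namenode_heap_bytes` is purely illustrative.

```python
# Assumption: commonly quoted rule of thumb, ~150 bytes of name node heap
# per namespace object (file inode or block). Not an exact HDFS figure.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files: int, avg_file_bytes: int,
                        block_bytes: int = 64 * 2**20) -> int:
    """Rough name node heap estimate for num_files files of avg_file_bytes each.

    Directories and per-replica bookkeeping are ignored.
    """
    # Ceiling division: any non-empty file needs at least one block entry,
    # even when it is much smaller than the block size.
    blocks_per_file = -(-avg_file_bytes // block_bytes)
    # One inode object plus one object per block, for every file.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT
```

With these assumptions, ten million files of about 50M each come to roughly 3 GB of heap, while files of 100M need two blocks each and about 4.5 GB. The estimate grows with the number of files and blocks rather than with the bytes stored, which is why a huge collection of small files weighs on the name node more than a few large ones.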

A constructive approach might suggest simply allocating a large server
instance for the HDFS name node, but that may just be the first step on a path
towards discovering the next bottleneck of using HDFS for such files, the hard
(and long) way.

Yes, I am aware HDFS has some dedicated APIs for handling small files, and
that there are some community wrappers for managing small files, but they
seem a bit hackish, or feel like "too many moving parts" for a simple
scenario.

What do you think, and what do you think about Ceph for this scenario?

Thanks in advance!
Matan

On Thu, Jul 31, 2014 at 7:20 PM, John Spray  wrote:

> On Wed, Jul 30, 2014 at 5:08 PM, Matan Safriel 
> wrote:
> > I'm looking for a distributed file system for large JSON documents. My
> > file sizes are roughly between 20M and 100M, so they are too large for
> > Couchbase, MongoDB, even possibly Riak, but too small (by an order of
> > magnitude) for HDFS. Would you recommend Ceph for this kind of scenario?
>
> When you say they're too small for HDFS, do you really mean they're
> too numerous?  How many are we talking about?
>
> If your use case calls for just puts and gets of named serialized
> blobs, you may be best off with the RGW or librados object store
> interfaces to Ceph, rather than the file system per se.
>
> > Additional question: will it also install and behave gracefully as a
> > single-node cluster running on a single Linux machine, in a dev scenario
> > and/or a unit test machine scenario?
>
> Yes, that's how some of the Ceph tests themselves operate.
>
> Cheers,
> John
>
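John's suggestion of storing named serialized blobs through librados, rather than through the file system, can be sketched as follows. This is a minimal, hypothetical wrapper around the python-rados bindings (`Rados`, `open_ioctx`, `Ioctx.write_full`, `Ioctx.stat`, `Ioctx.read`). The helper names `put_json`, `get_json` and `open_ioctx`, as well as any pool name, are illustrative assumptions. Each document is stored as a single RADOS object, which may not suit very large objects on every cluster configuration.

```python
import json
from typing import Any


def put_json(ioctx: Any, name: str, doc: Any) -> None:
    """Serialize doc to JSON and store it as one RADOS object called name."""
    data = json.dumps(doc).encode("utf-8")
    # write_full replaces any existing content of the object with data.
    ioctx.write_full(name, data)


def get_json(ioctx: Any, name: str) -> Any:
    """Read the whole object called name back and deserialize it."""
    size, _mtime = ioctx.stat(name)
    # Ioctx.read defaults to a small length, so request the full size explicitly.
    data = ioctx.read(name, length=size, offset=0)
    return json.loads(data.decode("utf-8"))


def open_ioctx(conffile: str, pool: str):
    """Connect to a cluster and open an I/O context on pool (hypothetical helper)."""
    import rados  # python-rados bindings; only needed against a real cluster

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    return cluster, cluster.open_ioctx(pool)
```

Against a real cluster one would call `cluster, ioctx = open_ioctx("/etc/ceph/ceph.conf", "json-docs")` (pool name hypothetical), pass `ioctx` to `put_json`/`get_json`, and finish with `ioctx.close()` and `cluster.shutdown()`. Taking the I/O context as a parameter also lets unit tests substitute an in-memory stand-in instead of a running cluster.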


[ceph-users] Ceph and my use case - is it a fit?

2014-07-31 Thread Matan Safriel
Hi,

I'm looking for a distributed file system for large JSON documents. My
file sizes are roughly between 20M and 100M, so they are too large for
Couchbase, MongoDB, even possibly Riak, but too small (by an order of
magnitude) for HDFS. Would you recommend Ceph for this kind of scenario?

Additional question: will it also install and behave gracefully as a
single-node cluster running on a single Linux machine, in a dev scenario
and/or a unit test machine scenario?

Thanks,
Matan