Re: Kudu on top of Alluxio

2017-03-27 Thread Mike Percy
+1, thanks for adding that, Todd.

Mike


On Mon, Mar 27, 2017 at 9:55 AM, Todd Lipcon wrote:

> On Sat, Mar 25, 2017 at 2:54 PM, Mike Percy wrote:
>
>> Kudu currently relies on local storage on a POSIX file system. Right now
>> there is no support for S3, which would be interesting but is non-trivial
>> in certain ways (particularly if we wanted to rely on S3's replication and
>> disable Kudu's app-level replication).
>>
>> I would suggest using only either EXT4 or XFS file systems for production
>> deployments as of Kudu 1.3, in a JBOD configuration, with one SSD per
>> machine for the WAL and with the data disks on either SATA or SSD drives
>> depending on the workload. Anything else is untested AFAIK.
>>
>
> I would amend this and say that SSD for the WAL is nice to have, but not a
> requirement. We do lots of testing on non-SSD test clusters and I'm aware
> of many production clusters which also do not have SSD.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: Kudu on top of Alluxio

2017-03-27 Thread Todd Lipcon
On Sat, Mar 25, 2017 at 2:54 PM, Mike Percy wrote:

> Kudu currently relies on local storage on a POSIX file system. Right now
> there is no support for S3, which would be interesting but is non-trivial
> in certain ways (particularly if we wanted to rely on S3's replication and
> disable Kudu's app-level replication).
>
> I would suggest using only either EXT4 or XFS file systems for production
> deployments as of Kudu 1.3, in a JBOD configuration, with one SSD per
> machine for the WAL and with the data disks on either SATA or SSD drives
> depending on the workload. Anything else is untested AFAIK.
>

I would amend this and say that SSD for the WAL is nice to have, but not a
requirement. We do lots of testing on non-SSD test clusters and I'm aware
of many production clusters which also do not have SSD.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
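
A minimal sketch, not taken from the thread, of how one might check that the
directories intended for Kudu's WAL and data sit on ext4 or XFS, the two file
systems suggested above for production as of Kudu 1.3. It parses mount
information in the /proc/mounts format, so it is Linux-only. The helper names
(parse_mounts, fs_type_for, check_dirs) and the example paths are hypothetical
and are not part of Kudu.

```python
import posixpath

# File systems the thread suggests for production as of Kudu 1.3.
RECOMMENDED_FS = {"ext4", "xfs"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Field layout: device, mount point, fs type, options, ...
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point containing path."""
    path = posixpath.normpath(path) + "/"
    best_mount, best_type = "", None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later entry for the same mount point win,
        # matching the order in which mounts are stacked.
        if path.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def check_dirs(dirs, mounts_text):
    """Map each directory to (fs_type, is_recommended)."""
    mounts = parse_mounts(mounts_text)
    report = {}
    for d in dirs:
        fs = fs_type_for(d, mounts)
        report[d] = (fs, fs in RECOMMENDED_FS)
    return report


if __name__ == "__main__":
    # Hypothetical layout: one WAL directory plus two JBOD data disks.
    candidate_dirs = ["/data/wal/kudu", "/data/1/kudu", "/data/2/kudu"]
    with open("/proc/mounts") as f:
        for d, (fs, ok) in check_dirs(candidate_dirs, f.read()).items():
            print(d, fs, "ok" if ok else "untested per the thread")
```

The check only reports the file system type; it says nothing about whether a
disk is an SSD, which the reply above describes as nice to have for the WAL
rather than required.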


Re: Kudu on top of Alluxio

2017-03-25 Thread Mike Percy
Yeah. I think the reason HBase can use something like Alluxio or S3 fairly 
easily, while Kudu can't, is that HBase already relied on external storage 
(HDFS) for replication, so substituting another storage system with similar 
properties doesn't really amount to an architectural change for them.

Mike

> On Mar 25, 2017, at 3:43 PM, Benjamin Kim wrote:
> 
> Mike,
> 
> Thanks for the informative answer. I asked this question because I saw that 
> Alluxio can be used to handle storage for HBase. Plus, we could keep our 
> cluster size to a minimum and not need to add more nodes based on storage 
> capacity. We would only need to size our clusters based on load (cores, 
> memory, bandwidth) instead.
> 
> Cheers,
> Ben
> 
> 
>> On Mar 25, 2017, at 2:54 PM, Mike Percy wrote:
>> 
>> Kudu currently relies on local storage on a POSIX file system. Right now 
>> there is no support for S3, which would be interesting but is non-trivial in 
>> certain ways (particularly if we wanted to rely on S3's replication and 
>> disable Kudu's app-level replication).
>> 
>> I would suggest using only either EXT4 or XFS file systems for production 
>> deployments as of Kudu 1.3, in a JBOD configuration, with one SSD per 
>> machine for the WAL and with the data disks on either SATA or SSD drives 
>> depending on the workload. Anything else is untested AFAIK.
>> 
>> As for Alluxio, I haven't heard of people using it for permanent storage, and 
>> since Kudu has its own block cache I don't think it would really help with 
>> caching. Also, I don't recall Tachyon (Alluxio's former name) providing POSIX 
>> semantics.
>> 
>> Mike
>> 
>>> On Mar 25, 2017, at 9:50 AM, Benjamin Kim wrote:
>>> 
>>> Hi,
>>> 
>>> Does anyone know of a way to use AWS S3 or 
>> 
> 



Re: Kudu on top of Alluxio

2017-03-25 Thread Benjamin Kim
Mike,

Thanks for the informative answer. I asked this question because I saw that 
Alluxio can be used to handle storage for HBase. Plus, we could keep our 
cluster size to a minimum and not need to add more nodes based on storage 
capacity. We would only need to size our clusters based on load (cores, memory, 
bandwidth) instead.

Cheers,
Ben


> On Mar 25, 2017, at 2:54 PM, Mike Percy wrote:
> 
> Kudu currently relies on local storage on a POSIX file system. Right now 
> there is no support for S3, which would be interesting but is non-trivial in 
> certain ways (particularly if we wanted to rely on S3's replication and 
> disable Kudu's app-level replication).
> 
> I would suggest using only either EXT4 or XFS file systems for production 
> deployments as of Kudu 1.3, in a JBOD configuration, with one SSD per machine 
> for the WAL and with the data disks on either SATA or SSD drives depending on 
> the workload. Anything else is untested AFAIK.
> 
> As for Alluxio, I haven't heard of people using it for permanent storage, and 
> since Kudu has its own block cache I don't think it would really help with 
> caching. Also, I don't recall Tachyon (Alluxio's former name) providing POSIX 
> semantics.
> 
> Mike
> 
>> On Mar 25, 2017, at 9:50 AM, Benjamin Kim wrote:
>> 
>> Hi,
>> 
>> Does anyone know of a way to use AWS S3 or 
> 



Kudu on top of Alluxio

2017-03-25 Thread Benjamin Kim
Hi,

Does anyone know of a way to use AWS S3 or