Bulk Loading Recommendations: Files over 25 GB

2011-10-18 Thread Mike Rapuano
We are not live yet, but we are testing Cassandra. I'm looking for
recommendations on the most efficient way to load text files larger than
25 GB into Cassandra (version 0.8.6). Our application may need to load
two or three text files of 25-40 GB each, a few times a week, into our
3-node cluster. I was reading this article on DataStax:
http://www.datastax.com/dev/blog/bulk-loading

Is it most efficient to create the SSTables and then use sstableloader, or
does anyone have other recommendations for bulk loading data? We are new to
Cassandra and are trying to stay within generally accepted practices.

Thanks
Mike


Re: Bulk Loading Recommendations: Files over 25 GB

2011-10-18 Thread aaron morton
At that scale of data, and given that it's a batch job, I would go with the
bulk loading tool.
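
Whichever tool ends up writing the SSTables, a 25-40 GB text file has to be read as a stream rather than loaded into memory in one go. The following is only a minimal sketch of that preprocessing step, not something taken from this thread or from the linked article. It assumes a tab-delimited file with the row key in the first field (the thread does not say what the files look like), and the names iter_rows, iter_batches, load_file and handle_batch are hypothetical. In a real job, handle_batch would pass each batch to whatever code writes the SSTables.

```python
from itertools import islice


def iter_rows(lines, delimiter="\t"):
    """Yield (row_key, columns) pairs from an iterable of text lines.

    Assumption: one row per line, row key in the first field.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if not fields[0]:
            continue  # skip blank lines and rows without a key
        yield fields[0], fields[1:]


def iter_batches(rows, batch_size):
    """Group an iterator of rows into lists of at most batch_size items."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def load_file(path, handle_batch, batch_size=10000):
    """Stream one file through handle_batch without reading it whole.

    handle_batch is a caller-supplied function (hypothetical here) that
    receives a list of (row_key, columns) pairs.
    """
    total = 0
    with open(path, encoding="utf-8") as fh:
        for batch in iter_batches(iter_rows(fh), batch_size):
            handle_batch(batch)
            total += len(batch)
    return total
```

Only one batch is held in memory at a time, so the batch size, rather than the file size, bounds memory use on the loading machine.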

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/10/2011, at 3:32 AM, Mike Rapuano wrote:

> We are not live yet, but we are testing Cassandra. I'm looking for
> recommendations on the most efficient way to load text files larger than
> 25 GB into Cassandra (version 0.8.6). Our application may need to load
> two or three text files of 25-40 GB each, a few times a week, into our
> 3-node cluster. I was reading this article on DataStax:
> http://www.datastax.com/dev/blog/bulk-loading
>
> Is it most efficient to create the SSTables and then use sstableloader, or
> does anyone have other recommendations for bulk loading data? We are new to
> Cassandra and are trying to stay within generally accepted practices.
> 
> Thanks
> Mike