Re: Using CQLSSTableWriter to batch load data from Spark to Cassandra.

2014-07-11 Thread Rohit Rai
Hi Gerard, This has been on my to-do list for a long time... I just published a Calliope snapshot built against Hadoop 2.2.x. Take it for a spin if you get a chance. You can get the jars from here - -

Re: Using CQLSSTableWriter to batch load data from Spark to Cassandra.

2014-06-27 Thread Gerard Maas
I got an answer on SO to this question, basically confirming that the CQLSSTableWriter cannot be used in Spark (at least in the form shown in the code snippet). DataStax filed a bug on that, and it might get solved in a future version. As you have observed, a single writer can only be used in serial

Re: Using CQLSSTableWriter to batch load data from Spark to Cassandra.

2014-06-27 Thread Gerard Maas
Hi Rohit, Thanks for your message. We are currently on Spark 0.9.1, Cassandra 2.0.6 and Calliope GA (would love to try the pre-release version if you want beta testers :-) Our Hadoop version is CDH4.4 and of course our Spark assembly is compiled against it. We have got really interesting

Re: Using CQLSSTableWriter to batch load data from Spark to Cassandra.

2014-06-26 Thread Rohit Rai
Hi Gerard, Which versions of Spark, Hadoop, Cassandra and Calliope are you using? We never built Calliope against Hadoop 2, as we and/or our clients don't use Hadoop in their deployments, or use it only as the infrastructure component for Spark, in which case H1/H2 doesn't make a difference for them. I know

Using CQLSSTableWriter to batch load data from Spark to Cassandra.

2014-06-25 Thread Gerard Maas
Hi, (Apologies for the cross-post from SO.) I'm trying to create Cassandra SSTables from the results of a batch computation in Spark. Ideally, each partition should create the SSTable for the data it holds in order to parallelize the process as much as possible (and probably even stream it to

Re: Using CQLSSTableWriter to batch load data from Spark to Cassandra.

2014-06-25 Thread Nick Pentreath
Can you not use a Cassandra OutputFormat? It seems they have a BulkOutputFormat. An example of using it with Hadoop is here: http://shareitexploreit.blogspot.com/2012/03/bulkloadto-cassandra-with-hadoop.html Using it with Spark will be similar to the examples:

Re: Using CQLSSTableWriter to batch load data from Spark to Cassandra.

2014-06-25 Thread Gerard Maas
Thanks Nick. We used the CassandraOutputFormat through Calliope. The Calliope API makes the CassandraOutputFormat quite accessible and is cool to work with. It worked fine at the prototype level, but we had Hadoop version conflicts when we put it in our Spark environment (using our Spark assembly

Re: Using CQLSSTableWriter to batch load data from Spark to Cassandra.

2014-06-25 Thread Nick Pentreath
Right, ok. I can't say I've used the Cassandra OutputFormats before. But perhaps if you use it directly (instead of via Calliope) you may be able to get it to work, albeit with less concise code? Or perhaps you may be able to build Cassandra from source with Hadoop 2 / CDH4 support: