Hi Gerard,
This has been on my to-do list for a long time. I just published a Calliope
snapshot built against Hadoop 2.2.x. Take it for a spin if you get a chance.
You can get the jars from here -
-
I got an answer on SO to this question, basically confirming that the
CQLSSTableWriter cannot be used in Spark (at least in the form shown in the
code snippet). DataStax filed a bug for that, and it might get fixed in a
future version.
As you have observed, a single writer can only be used in serial
Hi Rohit,
Thanks for your message. We are currently on Spark 0.9.1, Cassandra 2.0.6
and Calliope GA (would love to try the pre-release version if you want
beta testers :-). Our Hadoop version is CDH4.4 and of course our Spark
assembly is compiled against it.
We have got really interesting
Hi Gerard,
Which versions of Spark, Hadoop, Cassandra and Calliope are you using?
We never built Calliope against Hadoop 2, as we and our clients either don't
use Hadoop in our deployments or use it only as an infrastructure component
for Spark, in which case H1/H2 doesn't make a difference for them.
I know
Hi,
(My apologies for the cross-post from SO)
I'm trying to create Cassandra SSTables from the results of a batch
computation in Spark. Ideally, each partition should create the SSTable for
the data it holds in order to parallelize the process as much as possible
(and probably even stream it to
Can you not use a Cassandra OutputFormat? It seems they have a BulkOutputFormat.
An example of using it with Hadoop is here:
http://shareitexploreit.blogspot.com/2012/03/bulkloadto-cassandra-with-hadoop.html
Using it with Spark will be similar to the examples:
Thanks Nick.
We used the CassandraOutputFormat through Calliope. The Calliope API makes
the CassandraOutputFormat quite accessible and is cool to work with. It
worked fine at prototype level, but we had Hadoop version conflicts when we
put it in our Spark environment (Using our Spark assembly
Right, ok.
I can't say I've used the Cassandra OutputFormats before. But perhaps if
you use it directly (instead of via Calliope) you may be able to get it to
work, albeit with less concise code?
Or perhaps you may be able to build Cassandra from source with Hadoop 2 /
CDH4 support: