Hi all,

We use Cassandra to log event data and process it every 15 minutes with Spark, using the Cassandra Java Connector for Spark.
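A minimal sketch of how a job that runs every 15 minutes might choose its input window. It assumes, hypothetically, that each run processes the last complete 15-minute interval as a half-open range [start, end). The class and method names (`BatchWindow`, `lastCompleteWindow`) are illustrative only and not part of the connector API.

```java
import java.time.Duration;
import java.time.Instant;

public class BatchWindow {
    // Hypothetical batch interval matching the 15-minute schedule.
    static final Duration INTERVAL = Duration.ofMinutes(15);

    /** Returns {start, end} of the last complete window before 'now', as a half-open range [start, end). */
    static Instant[] lastCompleteWindow(Instant now) {
        long intervalMs = INTERVAL.toMillis();
        // Round 'now' down to the nearest interval boundary; that boundary is the window end.
        long endMs = Math.floorDiv(now.toEpochMilli(), intervalMs) * intervalMs;
        Instant end = Instant.ofEpochMilli(endMs);
        return new Instant[] { end.minus(INTERVAL), end };
    }

    public static void main(String[] args) {
        Instant[] w = lastCompleteWindow(Instant.parse("2016-01-20T10:37:12Z"));
        System.out.println(w[0] + " -> " + w[1]);
    }
}
```

With this scheme, a run started at 10:37:12 would read events from 10:15:00 up to, but not including, 10:30:00.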

At random, some of our Spark runs produce too few output records because Cassandra returns no data for a window of several minutes of input data. When we query that data with cqlsh, it eventually becomes available after multiple tries.

To solve the problem, we tried reading the data in Spark with consistency level ALL. We use the CassandraJavaUtil.javafunctions().cassandraTable() method and set "spark.cassandra.input.consistency.level"="ALL" in the configuration when creating the Spark context. The problem persists, even though, according to http://stackoverflow.com/a/25043599, writing with consistency level ONE (which we do) and reading with ALL should be sufficient for data consistency.
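The reasoning behind that answer is the usual replica-overlap rule: if W replicas must acknowledge a write and R replicas must answer a read, then R + W > RF guarantees that every read contacts at least one replica holding any write that was acknowledged before the read started. The sketch below checks this for the setup described (RF = 2). The helper names are hypothetical, and the level-to-count mapping is a simplified, single-data-center model that covers only a few consistency levels.

```java
public class ConsistencyOverlap {
    /** Replicas that must respond for a consistency level (simplified, single data center). */
    static int replicasRequired(String level, int replicationFactor) {
        switch (level) {
            case "ONE":    return 1;
            case "TWO":    return 2;
            case "QUORUM": return replicationFactor / 2 + 1;
            case "ALL":    return replicationFactor;
            default: throw new IllegalArgumentException("Unsupported level: " + level);
        }
    }

    /** True if every read replica set must intersect every write replica set, i.e. R + W > RF. */
    static boolean readSeesLatestWrite(String writeLevel, String readLevel, int rf) {
        return replicasRequired(writeLevel, rf) + replicasRequired(readLevel, rf) > rf;
    }

    public static void main(String[] args) {
        int rf = 2; // replication factor from the setup described
        System.out.println("W=ONE, R=ALL: " + readSeesLatestWrite("ONE", "ALL", rf));
        System.out.println("W=ONE, R=ONE: " + readSeesLatestWrite("ONE", "ONE", rf));
    }
}
```

For RF = 2, writing with ONE and reading with ALL gives 1 + 2 > 2, so the rule is satisfied, whereas ONE on both sides gives 1 + 1, which is not greater than 2.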

I would really appreciate it if someone could give me a hint on how to fix this problem. Thanks!

Regards,
Dennis

P.S.
Some information about our setup:
Cassandra 2.1.12 in a two-node configuration with a replication factor of 2
Spark 1.5.1
Cassandra Java Driver 2.2.0-rc3
Spark Cassandra Java Connector 2.10-1.5.0-M2

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
