We have an 8-node Cassandra cluster. Replication factor: 3. Consistency
level: QUORUM. Data spread: I can let you know once I get access to our
production cluster.
The use case for a simple count is more for internal use than for end
clients/customers; however, there are many use cases from customers.
I am not sure what use case you want to demonstrate with SELECT COUNT in
general. Maybe you can elaborate more on what your use case is.
Aside from this: this is a Cassandra issue. What is the setup of Cassandra?
Dedicated nodes? How many? Replication strategy? Consistency configuration?
How is the data spread?
Some accurate numbers here: it took me 1 hr 30 min to count 698,705,723
rows (~700 million), and my code is just this:
sc.cassandraTable("cuneiform", "blocks").cassandraCount
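For perspective, the throughput implied by those numbers can be worked out directly (a back-of-the-envelope sketch, assuming the 1 hr 30 min figure is wall-clock time):

```scala
// Rough throughput implied by the count above.
val rows = 698705723L
val seconds = 90L * 60            // 1 hr 30 min
val rowsPerSec = rows / seconds
println(rowsPerSec)               // roughly 129k rows/s
```

That is a fairly low rate for a cluster of this size, which suggests the count is not running with much parallelism.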
On Thu, Nov 24, 2016 at 10:48 AM, kant kodali wrote:
Take a look at this https://github.com/brianmhess/cassandra-count
Now it is just a matter of incorporating it into spark-cassandra-connector, I
guess.
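For context, the trick cassandra-count uses (per its README) is to split the partitioner's token range into slices and run one ranged COUNT per slice in parallel. A minimal sketch of the splitting arithmetic, assuming Murmur3Partitioner's full signed 64-bit range and a made-up split count:

```scala
// Split the full Murmur3 token range [Long.MinValue, Long.MaxValue]
// into equal slices; each slice becomes one ranged COUNT query.
val numSplits = 4  // hypothetical; real runs use many more splits
val width = (BigInt(Long.MaxValue) - BigInt(Long.MinValue) + 1) / numSplits
val ranges = (0 until numSplits).map { i =>
  val start = BigInt(Long.MinValue) + width * i
  (start, start + width - 1)
}
// Each (start, end) pair maps to a query like:
//   SELECT COUNT(*) FROM ks.tbl WHERE token(pk) >= start AND token(pk) <= end
ranges.foreach(println)
```

Because each slice touches a disjoint token range, the per-slice counts can simply be summed at the end.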
On Thu, Nov 24, 2016 at 1:01 AM, kant kodali wrote:
According to this link
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md
I tried the following, but it still looks like it is taking forever:
sc.cassandraTable(keyspace, table).cassandraCount
On Thu, Nov 24, 2016 at 12:56 AM, kant kodali wrote:
I would be glad if SELECT COUNT(*) FROM hello could return any value for that
size :) I can say for sure it didn't return anything for 30 mins, and I
probably need to build more patience to sit for a few more hours after that!
Cassandra recommends using ColumnFamilyStats via nodetool cfstats, which
only reports an estimated partition count, not an exact row count.
How fast is Cassandra without Spark on the count operation?
cqlsh> SELECT COUNT(*) FROM hello;
(this is not equivalent to what you are doing, but it might help you find the
root cause)
On Thu, Nov 24, 2016 at 9:03 AM, kant kodali wrote:
I have the following code
I invoke spark-shell as follows:
./spark-shell --conf spark.cassandra.connection.host=170.99.99.134 \
  --executor-memory 15G --executor-cores 12 \
  --conf spark.cassandra.input.split.size_in_mb=67108864
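One thing worth double-checking in that invocation: spark.cassandra.input.split.size_in_mb is, as the name says, measured in megabytes. 67108864 happens to be 64 * 1024 * 1024, i.e. 64 MB written in bytes; interpreted as megabytes it asks for 64 TB splits, which would likely collapse the table into a single partition and serialize the count. A quick arithmetic check (the guess that 64 MB was intended is mine):

```scala
// Unit sanity check (assumption: 64 MB was intended, but passed in bytes).
val passed = 67108864L                 // connector reads this as megabytes
println(passed / (1024L * 1024))       // 64 -> i.e. 64 TB per split
println(64L * 1024 * 1024 == passed)   // true: 67108864 bytes == 64 MB
```

If that reading is right, setting the flag to 64 (or just omitting it to use the connector's default) should restore parallelism.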
Code:
scala> val df = spark.sql("SELECT test from hello") //