Re: Five Questions for Cassandra Users

2019-03-28 Thread Abhishek Singh
1.   Do the same people where you work operate the cluster and write
the code to develop the application?

   Different teams. Infra separate, Dev separate.

2.   Do you have a metrics stack that allows you to see graphs of
various metrics with all the nodes displayed together?

   We use a third-party APM tool to monitor the cluster.

3.   Do you have a log stack that allows you to see the logs for all
the nodes together?

   No. Would like to.

4.   Do you regularly repair your clusters - such as by using Reaper?

Yes

5.   Do you use artificial intelligence to help manage your clusters?

   No

On Thu, 28 Mar, 2019, 2:33 PM Kenneth Brotman, 
wrote:

> I’m looking to get a better feel for how people use Cassandra in
> practice.  I thought others would benefit as well so may I ask you the
> following five questions:
>
>
>
> 1.   Do the same people where you work operate the cluster and write
> the code to develop the application?
>
>
>
> 2.   Do you have a metrics stack that allows you to see graphs of
> various metrics with all the nodes displayed together?
>
>
>
> 3.   Do you have a log stack that allows you to see the logs for all
> the nodes together?
>
>
>
> 4.   Do you regularly repair your clusters - such as by using Reaper?
>
>
>
> 5.   Do you use artificial intelligence to help manage your clusters?
>
>
>
>
>
> Thank you for taking your time to share this information!
>
>
>
> Kenneth Brotman
>


Re: Tombstone

2018-06-19 Thread Abhishek Singh
   The partition key is made of a datetime (basically the date
truncated to the hour) and a bucket. I think your RCA may be correct, since
we are deleting the partition's rows one by one rather than in a batch, so
files may be overlapping for that particular partition. A scheduled thread
picks the rows for a partition based on the current datetime and bucket
number and checks whether each row's entry is past due; if yes, we trigger
an event and remove the entry.
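If the whole partition is always drained together, the row-by-row deletes could be collapsed into a single partition-level delete, which leaves one partition tombstone instead of one tombstone per row. A minimal sketch, assuming a hypothetical events table keyed by ((hour, bucket), id):

```cql
-- Hypothetical schema: PRIMARY KEY ((hour, bucket), id)

-- Current approach: one row tombstone per deleted entry
DELETE FROM events WHERE hour = '2018-06-19 10:00:00' AND bucket = 3 AND id = 42;

-- Alternative: one partition tombstone covering every row in the batch
DELETE FROM events WHERE hour = '2018-06-19 10:00:00' AND bucket = 3;
```

Compaction can drop a single partition tombstone far more cheaply than thousands of row tombstones scattered across overlapping SSTables.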



On Tue 19 Jun, 2018, 7:58 PM Jeff Jirsa,  wrote:

> The most likely explanation is tombstones in files that won’t be collected
> as they potentially overlap data in other files with a lower timestamp
> (especially true if your partition key doesn’t change and you’re writing
> and deleting data within a partition)
>
> --
> Jeff Jirsa
>
>
> > On Jun 19, 2018, at 3:28 AM, Abhishek Singh  wrote:
> >
> > Hi all,
> > We are using Cassandra for storing time-series events for batch
> > processing. Once a particular hour-based batch is processed, we delete
> > the entries, but we are left with almost 18% of deletes marked as
> > tombstones.
> >  I ran compaction on the particular CF, but the tombstones
> > didn't come down.
> > Can anyone suggest the optimal tuning/recommended practice for
> > the compaction strategy and gc_grace period with 100k entries and
> > deletes every hour?
> >
> > Warm Regards
> > Abhishek Singh
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Tombstone

2018-06-19 Thread Abhishek Singh
Hi all,
   We are using Cassandra for storing time-series events for batch
processing. Once a particular hour-based batch is processed, we delete the
entries, but we are left with almost 18% of deletes marked as tombstones.
 I ran compaction on the particular CF, but the tombstones didn't
come down.
Can anyone suggest the optimal tuning/recommended practice for the
compaction strategy and gc_grace period with 100k entries and deletes every
hour?
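To make the question concrete, the kind of tuning commonly suggested for hourly time-series deletes looks like this (table name hypothetical, not our actual settings):

```cql
-- Sketch: TimeWindowCompactionStrategy groups SSTables into hour-sized
-- windows so deleted/expired data can be dropped together, and a lower
-- gc_grace_seconds lets tombstones be purged sooner. gc_grace_seconds
-- must stay longer than the interval between repairs, or deleted data
-- can resurrect.
ALTER TABLE events
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': 1
  }
  AND gc_grace_seconds = 86400;  -- 1 day instead of the default 10 days
```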

Warm Regards
Abhishek Singh


Avoiding Data Duplication

2015-06-05 Thread Abhishek Singh Bailoo
Hello!

I have a column family to log data coming from my GPS devices.

CREATE TABLE log(
  imei ascii,
  date ascii,
  dtime timestamp,
  data ascii,
  stime timestamp,
  PRIMARY KEY ((imei, date), dtime))
  WITH CLUSTERING ORDER BY (dtime DESC)
;

It is the standard schema for modeling time series data where
imei is the unique ID associated with each GPS device
date is the date taken from dtime
dtime is the date-time coming from the device
data is all the latitude, longitude etc that the device is sending us
stime is the date-time stamp of the server

The reason why I put dtime in the primary key as the clustering column is
because most of our queries are done on device time. There can be a delay
of a few minutes to a few hours (or a few days! in rare cases) between
dtime and stime if the network is not available.

However, now we want to query on server time as well, for debugging
purposes. These queries will not be as common as queries on device time:
say, for every 100 queries on dtime there will be just 1 query on stime.

What options do I have?

1. Secondary index - not possible, because stime is a timestamp and CQL
does not allow me to put < or > in the query for a secondary index

2. Data duplication - I can build another column family indexed by stime,
but that means I am storing twice as much data. I know everyone says that
write operations are cheap and storage is cheap, but how? If I have to buy
twice as many machines on AWS EC2, each with its own ephemeral storage,
then my bill doubles!
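For concreteness, the duplicated table in option 2 might look like this (a sketch mirroring the schema above, with sdate derived from stime the way date is derived from dtime):

```cql
-- Sketch: a second table for the rare server-time queries, clustered
-- on stime instead of dtime. sdate is the date taken from stime so the
-- partitions stay bounded, just as (imei, date) bounds the original.
CREATE TABLE log_by_stime (
  imei ascii,
  sdate ascii,
  stime timestamp,
  dtime timestamp,
  data ascii,
  PRIMARY KEY ((imei, sdate), stime)
) WITH CLUSTERING ORDER BY (stime DESC);
```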

Any other ideas I can try?

Many Thanks,
Abhishek


query contains IN on the partition key and an ORDER BY

2015-05-02 Thread Abhishek Singh Bailoo
Hi

I have run into the following issue
https://issues.apache.org/jira/browse/CASSANDRA-6722 when running a query
(an IN on the partition key combined with an ORDER BY) using the DataStax
driver for Java.

However, I am able to run this query alright in cqlsh.

cqlsh> show version;
[cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native protocol v3]

cqlsh:gps> select * from log where imeih in ('862170011627815@2015-01-29
@03','862170011627815@2015-01-30@21','862170011627815@2015-01-30@04') and
dtime < '2015-01-30 23:59:59' order by dtime desc limit 1;

The same query when run via datastax Java driver gives the following error:

Exception in thread main
com.datastax.driver.core.exceptions.InvalidQueryException: Cannot page
queries with both ORDER BY and a IN restriction on the partition key; you
must either remove the ORDER BY or the IN and sort client side, or disable
paging for this query
at
com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)

Any ideas?
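One workaround the error message itself points at is to drop the IN and sort client side: run one single-partition query per key and merge the results in the application. A single-partition query may combine ORDER BY with paging. A sketch against the same table:

```cql
-- Sketch: replace the multi-partition IN + ORDER BY with one query per
-- partition key; the application merges the result sets and keeps the
-- newest row overall.
SELECT * FROM log WHERE imeih = '862170011627815@2015-01-29@03'
  AND dtime < '2015-01-30 23:59:59' ORDER BY dtime DESC LIMIT 1;
SELECT * FROM log WHERE imeih = '862170011627815@2015-01-30@21'
  AND dtime < '2015-01-30 23:59:59' ORDER BY dtime DESC LIMIT 1;
SELECT * FROM log WHERE imeih = '862170011627815@2015-01-30@04'
  AND dtime < '2015-01-30 23:59:59' ORDER BY dtime DESC LIMIT 1;
```

The other option the driver suggests is to disable paging for this one statement, e.g. by setting its fetch size to Integer.MAX_VALUE, at the cost of materializing the full result set at once.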

Thanks,
Abhishek.