Cassandra 2.0 Batch Statement for timeseries schema

2015-11-05 Thread Sachin Nikam
I currently have a keyspace with table definition that looks like this.


CREATE TABLE orders (
  order_id bigint PRIMARY KEY,
  order_blob text
);

This table will have a write load of ~40-100 tps and a read load of
~200-400 tps.

We are now considering adding another table definition which closely
resembles a timeseries table.

CREATE TABLE order_sequence (
  // shard_and_date is generated as order_id % (number of nodes in the
  // Cassandra ring), suffixed with the current date; an example would
  // be '2-Nov-11-2015'
  shard_and_date text,

  // a simple flake-generated long
  sequence_id bigint,

  PRIMARY KEY (shard_and_date, sequence_id)
) WITH CLUSTERING ORDER BY (sequence_id DESC);


The goal of this table is to answer queries like "Get me the count of
orders changed in a given sequence_id range". This query will be issued
once every 5 seconds.
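
For example, a count over one shard/day partition might look like this
(a sketch; the key and range values are made up):

SELECT COUNT(*) FROM order_sequence
WHERE shard_and_date = '2-Nov-11-2015'
  AND sequence_id >= 1000 AND sequence_id <= 2000;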

The plan is to write to both of these tables in a single BATCH statement.
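
Roughly like this (a sketch; the values are made up):

BEGIN BATCH
  INSERT INTO orders (order_id, order_blob) VALUES (1001, '{...}');
  INSERT INTO order_sequence (shard_and_date, sequence_id)
  VALUES ('2-Nov-11-2015', 1000123);
APPLY BATCH;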

1. Will this impact the write latency?

2. Also, will it impact the read latency of the "orders" table?

3. Will it impact the overall stability of the cluster?


Re: ScyllaDB, a new open source, Cassandra-compatible NoSQL

2015-09-22 Thread Sachin Nikam
Tzach,
Can you point to any documentation on the ScyllaDB site which talks about
how/why ScyllaDB performs better than Cassandra while using the same
architecture?
Regards
Sachin

On Tue, Sep 22, 2015 at 9:18 AM, Tzach Livyatan 
wrote:

> Hello Cassandra users,
>
> We are pleased to announce a new member of the Cassandra Ecosystem -
> ScyllaDB
> ScyllaDB is a new, open source, Cassandra-compatible NoSQL data store,
> written with the goal of delivering superior performance and consistent low
> latency.  Today, ScyllaDB runs 1M tps per server with sub 1ms latency.
>
> ScyllaDB  supports CQL, is compatible with Cassandra drivers, and works
> out of the box with Cassandra tools like cqlsh, Spark connector, nodetool
> and cassandra-stress. ScyllaDB is a drop-in replacement solution for the
> Cassandra server side packages.
>
> Scylla is implemented using the new shared-nothing Seastar framework for
> extreme performance on modern multicore hardware, and the Data Plane
> Development Kit (DPDK) for high-speed low-latency networking.
>
> Try Scylla Now - http://www.scylladb.com
>
> We will be at Cassandra Summit 2015; you are welcome to visit our booth to
> hear more and see a demo.
> Avi Kivity, our CTO, will host a session on Scylla on Thursday, 1:50 PM -
> 2:30 PM in rooms M1 - M3.
>
> Regards
> Tzach
> scylladb
>
>


Re: CQL 3.x Update ...USING TIMESTAMP...

2015-09-12 Thread Sachin Nikam
@Tyler,
Going back to your earlier proposal, i.e.:
--
Instead, make the version part of the primary key:

CREATE TABLE document_store (document_id bigint, version int, document
text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER BY (version
desc)
---
My concern with this approach was having to save multiple versions of the
huge documents. You suggested I could delete the older versions.

So can I use a BATCH statement to make sure that when I write version 2, I
also delete the previous version 1? Is this a legitimate use of BATCH
statements?
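
Something like this, for example (a sketch; document_id 42 and the values
are made up):

BEGIN BATCH
  INSERT INTO document_store (document_id, version, document)
  VALUES (42, 2, '...');
  DELETE FROM document_store WHERE document_id = 42 AND version = 1;
APPLY BATCH;

Since both statements hit the same partition (document_id 42), my
understanding is this would be a single-partition batch.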
Does using BATCH impact read latency?
Regards
Sachin


On Tue, Apr 21, 2015 at 9:57 AM, Tyler Hobbs  wrote:

>
> On Mon, Apr 20, 2015 at 4:02 PM, Sachin Nikam  wrote:
>
>> #1. We have 2 data centers located close by, with plans to expand to more
>> data centers that are even further away geographically.
>> #2. How will this impact lightweight transactions when there is a high
>> level of network contention for cross-data-center traffic?
>>
>
> If you are only expecting updates to a given document from one DC, then
> you could use LOCAL_SERIAL for the LWT operations.  If you can't do that,
> then LWT are probably not a great option for you.
>
>
>> #3. Do you know of any real examples where companies have used lightweight
>> transactions in a multi-data-center setting?
>>
>
> I don't know who's doing that off the top of my head, but I imagine
> they're using LOCAL_SERIAL.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


Re: Adding New Nodes/Data Center to an existing Cluster.

2015-09-04 Thread Sachin Nikam
Neha/Sebastian,
Sorry for the typo. We use DSE 4.7, which ships with Cassandra 2.1.
Regards
Sachin

On Tue, Sep 1, 2015 at 10:04 PM, Neha Trivedi 
wrote:

> Sachin,
> Hope you are not using Cassandra 2.2 in production?
> regards
> Neha
>
> On Tue, Sep 1, 2015 at 11:20 PM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> DSE 4.7 ships with Cassandra 2.1 for stability.
>>
>> All the best,
>>
>>
>> Sebastián Estévez
>>
>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>
>> On Tue, Sep 1, 2015 at 12:53 PM, Sachin Nikam  wrote:
>>
>>> @Neha,
>>> We are using DSE 4.7 & Cassandra 2.2
>>>
>>> @Alain,
>>> I will check with our OPS team about repair vs rebuild and get back to
>>> you.
>>> Regards
>>> Sachin
>>>
>>> On Tue, Sep 1, 2015 at 5:59 AM, Alain RODRIGUEZ 
>>> wrote:
>>>
>>>> Hi Sachin,
>>>>
>>>> You are speaking about a repair, when the proper command to do this is
>>>> "rebuild"?
>>>>
>>>> Did you try adding your DC this way?
>>>> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
>>>>
>>>>
>>>> 2015-09-01 5:32 GMT+02:00 Neha Trivedi :
>>>>
>>>>> Hi,
>>>>> Can you specify which version of Cassandra you are using?
>>>>> Can you provide the Error Stack ?
>>>>>
>>>>> regards
>>>>> Neha
>>>>>
>>>>> On Tue, Sep 1, 2015 at 2:56 AM, Sebastian Estevez <
>>>>> sebastian.este...@datastax.com> wrote:
>>>>>
>>>>>> or https://issues.apache.org/jira/browse/CASSANDRA-8611 perhaps
>>>>>>
>>>>>> All the best,
>>>>>>
>>>>>>
>>>>>> Sebastián Estévez
>>>>>>
>>>>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>>>>
>>>>>> On Mon, Aug 31, 2015 at 5:24 PM, Eric Evans 
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 31, 2015 at 1:32 PM, Sachin Nikam 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> When we add 3 more nodes in Data Center B, the repair tool starts
>>>>>>>> syncing the data between 2 data centers and then gives up after ~2 
>>>>>>>> days.
>>>>>>>>
>>>>>>>> Has anybody run into a similar issue before? If so, what is the
>>>>>>>> solution?
>>>>>>>>
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-9624, maybe?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Eric Evans
>>>>>>> eev...@wikimedia.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Adding New Nodes/Data Center to an existing Cluster.

2015-09-01 Thread Sachin Nikam
@Neha,
We are using DSE 4.7 & Cassandra 2.2

@Alain,
I will check with our OPS team about repair vs rebuild and get back to you.
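If rebuild is the way to go, my understanding of the documented procedure is
that we would run something like this on each new Data Center B node (the
existing data center name "DC_A" here is just a placeholder):

nodetool rebuild -- DC_A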
Regards
Sachin

On Tue, Sep 1, 2015 at 5:59 AM, Alain RODRIGUEZ  wrote:

> Hi Sachin,
>
> You are speaking about a repair, when the proper command to do this is
> "rebuild"?
>
> Did you try adding your DC this way?
> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
>
>
> 2015-09-01 5:32 GMT+02:00 Neha Trivedi :
>
>> Hi,
>> Can you specify which version of Cassandra you are using?
>> Can you provide the Error Stack ?
>>
>> regards
>> Neha
>>
>> On Tue, Sep 1, 2015 at 2:56 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> or https://issues.apache.org/jira/browse/CASSANDRA-8611 perhaps
>>>
>>> All the best,
>>>
>>>
>>> Sebastián Estévez
>>>
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> On Mon, Aug 31, 2015 at 5:24 PM, Eric Evans 
>>> wrote:
>>>
>>>>
>>>> On Mon, Aug 31, 2015 at 1:32 PM, Sachin Nikam 
>>>> wrote:
>>>>
>>>>> When we add 3 more nodes in Data Center B, the repair tool starts
>>>>> syncing the data between 2 data centers and then gives up after ~2 days.
>>>>>
>>>>> Has anybody run into a similar issue before? If so, what is the solution?
>>>>>
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-9624, maybe?
>>>>
>>>>
>>>> --
>>>> Eric Evans
>>>> eev...@wikimedia.org
>>>>
>>>
>>>
>>
>


Data Size on each node

2015-09-01 Thread Sachin Nikam
We currently have a Cassandra cluster spread over 2 DCs. The data size on
each node of the cluster is 1.2TB on spinning disk. Minor and major
compactions are slowing down our read queries. It has been suggested that
replacing the spinning disks with SSDs might help. Has anybody done
something similar? If so, what were the results?
Also, if we go with SSDs, how big can each node get with commercially
available SSDs?
Regards
Sachin


Adding New Nodes/Data Center to an existing Cluster.

2015-08-31 Thread Sachin Nikam
Here is the situation.
We have 3 nodes in Data Center A with Replication Factor of 2.
We want to add 3 more nodes in Data Center B with Replication Factor of 2.
Each node in Data Center A has about 150GB of data.

When we add 3 more nodes in Data Center B, the repair tool starts syncing
the data between 2 data centers and then gives up after ~2 days.

Has anybody run into a similar issue before? If so, what is the solution?
Regards
Sachin


Re: Cassandra Data Stax java driver & Snappy Compression library

2015-08-04 Thread Sachin Nikam
Janne,
A little clarification: I found snappy-java-1.0.4.1.jar on the class path,
but the other questions still remain.

On Tue, Aug 4, 2015 at 8:24 PM, Sachin Nikam  wrote:

> Janne,
> Thanks for continuing to take the time to answer my queries. We noticed
> that the write latency (tp99) from Services S1 and S2 is 50% of the write
> latency (tp99) for Service S3. I also noticed that S1 and S2, which use
> the Astyanax client library, also have compress-lzf.jar on their class
> path, although the table is defined to use Snappy compression. Is this
> compression library, or some other transitive dependency pulled in by
> Astyanax, enabling compression of the payload sent over the wire, and
> could that account for the difference in tp99?
> Regards
> Sachin
>
> On Mon, Aug 3, 2015 at 12:14 AM, Janne Jalkanen 
> wrote:
>
>>
>> Correct. Note that you may lose some performance this way though; in a
>> typical case saving bandwidth by increasing CPU usage is good. However, it
>> always depends on your usecase and whether you’re running your cluster to
>> the max. It’s a good, low-hanging optimization to keep in mind though for
>> production environments, if you choose not to enable compression now.
>>
>> /Janne
>>
>> On 3 Aug 2015, at 08:40, Sachin Nikam  wrote:
>>
>> Thanks Janne...
>> To clarify: Service S3 should not run into any issues, and I may choose
>> not to fix the issue?
>> Regards
>> Sachin
>>
>> On Sat, Aug 1, 2015 at 11:50 PM, Janne Jalkanen > > wrote:
>>
>>> No, this just tells that your client (S3 using Datastax driver) cannot
>>> communicate to the Cassandra cluster using a compressed protocol, since the
>>> necessary libraries are missing on the client side.  Servers will still
>>> compress the data they receive when they write it to disk.
>>>
>>> In other words
>>>
>>> Client  <- [uncompressed data] -> Server <- [compressed data] -> Disk.
>>>
>>> To fix, make sure that the Snappy libraries are in the classpath of your
>>> S3 service application.  As always, there’s no guarantee that this improves
>>> your performance, since if your app is already CPU-heavy, the extra CPU
>>> overhead of compression *may* be a problem.  So measure :-)
>>>
>>> /Janne
>>>
>>> On 02 Aug 2015, at 02:17 , Sachin Nikam  wrote:
>>>
>>> I am currently running a Cassandra 1.2 cluster. This cluster has 2
>>> tables i.e.
>>> TableA and TableB.
>>>
>>> TableA is read and written to by Services S1 and S2 which use Astyanax
>>> client library.
>>>
>>> TableB is read and written by Service S3 which uses the datastax java
>>> driver 2.1. S3 also reads data from TableA.
>>>
>>> Both TableA and TableB are defined on the Cassandra nodes to use
>>> SnappyCompressor.
>>>
>>> On start-up, Service S3 logs the following WARN message. The service
>>> is able to continue its normal operation thereafter:
>>>
>>> **
>>> [main] WARN loggerClass=com.datastax.driver.core.FrameCompressor;Cannot
>>> find Snappy class, you should make sure the Snappy library is in the
>>> classpath if you intend to use it. Snappy compression will not be
>>> available for the protocol.
>>> ***
>>>
>>>
>>> My questions are as follows:
>>> #1. Does the compression happen on the Cassandra client side or on the
>>> Cassandra server side itself?
>>> #2. Does Service S3 need to pull in additional dependencies for Snappy
>>> compression, as mentioned here --
>>>
>>> http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
>>> #3. What happens if this additional library is not present on the class
>>> path of Service S3? Will data that S3 writes to TableB not be
>>> compressed?
>>> Regards
>>> Sachin
>>>
>>>
>>>
>>
>>
>


Re: Cassandra Data Stax java driver & Snappy Compression library

2015-08-04 Thread Sachin Nikam
Janne,
Thanks for continuing to take the time to answer my queries. We noticed
that the write latency (tp99) from Services S1 and S2 is 50% of the write
latency (tp99) for Service S3. I also noticed that S1 and S2, which use the
Astyanax client library, also have compress-lzf.jar on their class path,
although the table is defined to use Snappy compression. Is this compression
library, or some other transitive dependency pulled in by Astyanax, enabling
compression of the payload sent over the wire, and could that account for
the difference in tp99?
Regards
Sachin

On Mon, Aug 3, 2015 at 12:14 AM, Janne Jalkanen 
wrote:

>
> Correct. Note that you may lose some performance this way though; in a
> typical case saving bandwidth by increasing CPU usage is good. However, it
> always depends on your usecase and whether you’re running your cluster to
> the max. It’s a good, low-hanging optimization to keep in mind though for
> production environments, if you choose not to enable compression now.
>
> /Janne
>
> On 3 Aug 2015, at 08:40, Sachin Nikam  wrote:
>
> Thanks Janne...
> To clarify: Service S3 should not run into any issues, and I may choose
> not to fix the issue?
> Regards
> Sachin
>
> On Sat, Aug 1, 2015 at 11:50 PM, Janne Jalkanen 
> wrote:
>
>> No, this just tells that your client (S3 using Datastax driver) cannot
>> communicate to the Cassandra cluster using a compressed protocol, since the
>> necessary libraries are missing on the client side.  Servers will still
>> compress the data they receive when they write it to disk.
>>
>> In other words
>>
>> Client  <- [uncompressed data] -> Server <- [compressed data] -> Disk.
>>
>> To fix, make sure that the Snappy libraries are in the classpath of your
>> S3 service application.  As always, there’s no guarantee that this improves
>> your performance, since if your app is already CPU-heavy, the extra CPU
>> overhead of compression *may* be a problem.  So measure :-)
>>
>> /Janne
>>
>> On 02 Aug 2015, at 02:17 , Sachin Nikam  wrote:
>>
>> I am currently running a Cassandra 1.2 cluster. This cluster has 2 tables
>> i.e.
>> TableA and TableB.
>>
>> TableA is read and written to by Services S1 and S2 which use Astyanax
>> client library.
>>
>> TableB is read and written by Service S3 which uses the datastax java
>> driver 2.1. S3 also reads data from TableA.
>>
>> Both TableA and TableB are defined on the Cassandra nodes to use
>> SnappyCompressor.
>>
>> On start-up, Service S3 logs the following WARN message. The service
>> is able to continue its normal operation thereafter:
>>
>> **
>> [main] WARN loggerClass=com.datastax.driver.core.FrameCompressor;Cannot
>> find Snappy class, you should make sure the Snappy library is in the
>> classpath if you intend to use it. Snappy compression will not be
>> available for the protocol.
>> ***
>>
>>
>> My questions are as follows:
>> #1. Does the compression happen on the Cassandra client side or on the
>> Cassandra server side itself?
>> #2. Does Service S3 need to pull in additional dependencies for Snappy
>> compression, as mentioned here --
>>
>> http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
>> #3. What happens if this additional library is not present on the class
>> path of Service S3? Will data that S3 writes to TableB not be
>> compressed?
>> Regards
>> Sachin
>>
>>
>>
>
>


Re: Cassandra Data Stax java driver & Snappy Compression library

2015-08-02 Thread Sachin Nikam
Thanks Janne...
To clarify: Service S3 should not run into any issues, and I may choose
not to fix the issue?
Regards
Sachin

On Sat, Aug 1, 2015 at 11:50 PM, Janne Jalkanen 
wrote:

> No, this just tells that your client (S3 using Datastax driver) cannot
> communicate to the Cassandra cluster using a compressed protocol, since the
> necessary libraries are missing on the client side.  Servers will still
> compress the data they receive when they write it to disk.
>
> In other words
>
> Client  <- [uncompressed data] -> Server <- [compressed data] -> Disk.
>
> To fix, make sure that the Snappy libraries are in the classpath of your
> S3 service application.  As always, there’s no guarantee that this improves
> your performance, since if your app is already CPU-heavy, the extra CPU
> overhead of compression *may* be a problem.  So measure :-)
>
> /Janne
>
> On 02 Aug 2015, at 02:17 , Sachin Nikam  wrote:
>
> I am currently running a Cassandra 1.2 cluster. This cluster has 2 tables
> i.e.
> TableA and TableB.
>
> TableA is read and written to by Services S1 and S2 which use Astyanax
> client library.
>
> TableB is read and written by Service S3 which uses the datastax java
> driver 2.1. S3 also reads data from TableA.
>
> Both TableA and TableB are defined on the Cassandra nodes to use
> SnappyCompressor.
>
> On start-up, Service S3 logs the following WARN message. The service is
> able to continue its normal operation thereafter:
>
> **
> [main] WARN loggerClass=com.datastax.driver.core.FrameCompressor;Cannot
> find Snappy class, you should make sure the Snappy library is in the
> classpath if you intend to use it. Snappy compression will not be
> available for the protocol.
> ***
>
>
> My questions are as follows:
> #1. Does the compression happen on the Cassandra client side or on the
> Cassandra server side itself?
> #2. Does Service S3 need to pull in additional dependencies for Snappy
> compression, as mentioned here --
>
> http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
> #3. What happens if this additional library is not present on the class
> path of Service S3? Will data that S3 writes to TableB not be
> compressed?
> Regards
> Sachin
>
>
>


Cassandra Data Stax java driver & Snappy Compression library

2015-08-01 Thread Sachin Nikam
I am currently running a Cassandra 1.2 cluster. This cluster has 2 tables
i.e.
TableA and TableB.

TableA is read and written to by Services S1 and S2 which use Astyanax
client library.

TableB is read and written by Service S3 which uses the datastax java
driver 2.1. S3 also reads data from TableA.

Both TableA and TableB are defined on the Cassandra nodes to use
SnappyCompressor.
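
That is, the table definitions carry a compression option along these lines
(a sketch; the column names here are made up):

CREATE TABLE TableB (
  key bigint PRIMARY KEY,
  value text
) WITH compression = { 'sstable_compression' : 'SnappyCompressor' };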

On start-up, Service S3 logs the following WARN message. The service is
able to continue its normal operation thereafter:

**
[main] WARN loggerClass=com.datastax.driver.core.FrameCompressor;Cannot find
Snappy class, you should make sure the Snappy library is in the classpath if
you intend to use it. Snappy compression will not be available for the
protocol.
***


My questions are as follows:
#1. Does the compression happen on the Cassandra client side or on the
Cassandra server side itself?
#2. Does Service S3 need to pull in additional dependencies for Snappy
compression, as mentioned here --
http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
#3. What happens if this additional library is not present on the class path
of Service S3? Will data that S3 writes to TableB not be compressed?
Regards
Sachin


Re: CQL 3.x Update ...USING TIMESTAMP...

2015-04-20 Thread Sachin Nikam
Tyler,
I can consider trying out lightweight transactions, but here are my
concerns:
#1. We have 2 data centers located close by, with plans to expand to more
data centers that are even further away geographically.
#2. How will this impact lightweight transactions when there is a high
level of network contention for cross-data-center traffic?
#3. Do you know of any real examples where companies have used lightweight
transactions in a multi-data-center setting?
Regards
Sachin

On Tue, Mar 24, 2015 at 10:56 AM, Tyler Hobbs  wrote:

> do you just mean that it's easy to forget to always set your timestamp
>> correctly, and if you goof it up, it makes it difficult to recover from
>> (i.e. you issue a delete with system timestamp instead of document version,
>> and that's way larger than your document version would ever be, so you can
>> never write that document again)?
>
>
> Yes, that's basically what I meant.  Plus, if you need to make a manual
> correction to a document, you'll need to increment the version, which would
> presumably cause problems for your application.  It's possible to handle
> all of this correctly if you take care, but I wouldn't trust myself to
> always get this right.
>
>
>> @Tyler
>> With your recommendation, won't I end up saving all the version(s) of the
>> document. In my case the document is pretty huge (~5mb) and each document
>> has up to 10 versions. And you already highlighted that light weight
>> transactions are very expensive.
>>
>
> You can always delete older versions to free up space.
>
> Using lightweight transactions may be a decent option if you don't have
> really high write throughput and aren't expecting high contention (which I
> don't think you are).  I recommend testing this out with your application
> to see how it performs for you.
>
>
> On Sun, Mar 22, 2015 at 7:02 PM, Sachin Nikam  wrote:
>
>> @Eric Stevens
>> Thanks for representing my position while I came back to this thread.
>>
>> @Tyler
>> With your recommendation, won't I end up saving all the version(s) of the
>> document. In my case the document is pretty huge (~5mb) and each document
>> has up to 10 versions. And you already highlighted that light weight
>> transactions are very expensive.
>>
>> Also as Eric mentions, can you elaborate on what kind of problems could
>> happen when we try to overwrite or delete data?
>> Regards
>> Sachin
>>
>> On Fri, Mar 13, 2015 at 4:23 AM, Brice Dutheil 
>> wrote:
>>
>>> I agree with Tyler: in the normal run of a live application I would not
>>> recommend the use of the timestamp, and would use other ways to *version*
>>> *inserts*. Otherwise you may fall into the *upsert* pitfalls that Tyler
>>> mentions.
>>>
>>> However, I find there's a legitimate use of the USING TIMESTAMP trick when
>>> migrating data from another datastore.
>>>
>>> The trick is, at some point, to enable the application to start writing to
>>> Cassandra *without* any timestamp setting on the statements. ⇐ for
>>> fresh data
>>> Then start a migration batch that uses a write time with an older
>>> date (i.e. when there's *no* possible *collision* with other data). ⇐
>>> for older data
>>>
>>> *This trick has been used in prod with billions of records.*
>>>
>>> -- Brice
>>>
>>> On Thu, Mar 12, 2015 at 10:42 PM, Eric Stevens 
>>> wrote:
>>>
>>>> Ok, but if you're using a system of time that isn't server clock
>>>> oriented (Sachin's document revision ID, and my fixed and necessarily
>>>> consistent base timestamp [B's always know their parent A's exact recorded
>>>> timestamp]), isn't the principle of using timestamps to force a particular
>>>> update out of several to win still sound?
>>>>
>>>> > as using the clocks is only valid if clocks are perfectly sync'ed,
>>>> which they are not
>>>>
>>>> Clock skew is a problem which doesn't seem to be a factor in either use
>>>> case given that both have a consistent external source of truth for
>>>> timestamp.
>>>>
>>>> On Thu, Mar 12, 2015 at 12:58 PM, Jonathan Haddad 
>>>> wrote:
>>>>
>>>>> In most datacenters you're going to see significant variance in your
>>>>> server times.  Likely > 20ms between servers in the same rack.  Even
>>>>> google, using atomic clocks, has 1-7ms variance. 

Re: CQL 3.x Update ...USING TIMESTAMP...

2015-03-22 Thread Sachin Nikam
>>>>
>>>> Assuming that we don't run afoul of related antipatterns such as
>>>> repeatedly overwriting the same value indefinitely, this strikes me as
>>>> sound if unorthodox practice, as long as conflict resolution in Cassandra
>>>> isn't broken in some subtle way.  We also designed this to be safe from
>>>> getting write timestamps greatly out of sync with clock time so that
>>>> non-timestamped operations (especially delete) if done accidentally will
>>>> still have a reasonable chance of having the expected results.
>>>>
>>>> So while it may not be the intended use case for write timestamps, and
>>>> there are definitely gotchas if you are not careful or misunderstand the
>>>> consequences, as far as I can see the logic behind it is sound but does
>>>> rely on correct conflict resolution in Cassandra.  I'm curious if I'm
>>>> missing or misunderstanding something important.
>>>>
>>>> On Wed, Mar 11, 2015 at 4:11 PM, Tyler Hobbs 
>>>> wrote:
>>>>
>>>>> Don't use the version as your timestamp.  It's possible, but you'll
>>>>> end up with problems when attempting to overwrite or delete entries.
>>>>>
>>>>> Instead, make the version part of the primary key:
>>>>>
>>>>> CREATE TABLE document_store (document_id bigint, version int, document
>>>>> text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER BY 
>>>>> (version
>>>>> desc)
>>>>>
>>>>> That way you don't have to worry about overwriting higher versions
>>>>> with a lower one, and to read the latest version, you only have to do:
>>>>>
>>>>> SELECT * FROM document_store WHERE document_id = ? LIMIT 1;
>>>>>
>>>>> Another option is to use lightweight transactions (i.e. UPDATE ... SET
>>>>> document = ?, version = ? WHERE document_id = ? IF version < ?), but
>>>>> that's going to make writes much more expensive.
>>>>>
>>>>> On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam 
>>>>> wrote:
>>>>>
>>>>>> I am planning to use the Update...USING TIMESTAMP... statement to
>>>>>> make sure that I do not overwrite fresh data with stale data, while
>>>>>> avoiding having to do at least LOCAL_QUORUM writes.
>>>>>>
>>>>>> Here is my table structure.
>>>>>>
>>>>>> Table=DocumentStore
>>>>>> DocumentID (primaryKey, bigint)
>>>>>> Document(text)
>>>>>> Version(int)
>>>>>>
>>>>>> If the service receives 2 write requests with Version=1 and
>>>>>> Version=2, regardless of the order of arrival, the business requirement 
>>>>>> is
>>>>>> that we end up with Version=2 in the database.
>>>>>>
>>>>>> Can I use the following CQL Statement?
>>>>>>
>>>>>> UPDATE DocumentStore USING TIMESTAMP <latest version number>
>>>>>> SET Document = <document>,
>>>>>>     Version = <version>
>>>>>> WHERE DocumentID = <document id>;
>>>>>>
>>>>>> Has anybody used something like this? If so was the behavior as
>>>>>> expected?
>>>>>>
>>>>>> Regards
>>>>>> Sachin
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Tyler Hobbs
>>>>> DataStax <http://datastax.com/>
>>>>>
>>>>
>>>>
>>
>


CQL 3.x Update ...USING TIMESTAMP...

2015-03-10 Thread Sachin Nikam
I am planning to use the Update...USING TIMESTAMP... statement to make sure
that I do not overwrite fresh data with stale data, while avoiding having to
do at least LOCAL_QUORUM writes.

Here is my table structure.

Table=DocumentStore
DocumentID (primaryKey, bigint)
Document(text)
Version(int)

If the service receives 2 write requests with Version=1 and Version=2,
regardless of the order of arrival, the business requirement is that we end
up with Version=2 in the database.

Can I use the following CQL Statement?

UPDATE DocumentStore USING TIMESTAMP <latest version number>
SET Document = <document>,
    Version = <version>
WHERE DocumentID = <document id>;

Has anybody used something like this? If so, was the behavior as expected?
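
For example, with made-up literal values, my expectation is that the write
carrying the higher timestamp wins regardless of arrival order:

UPDATE DocumentStore USING TIMESTAMP 2
SET Document = 'doc v2', Version = 2 WHERE DocumentID = 42;

UPDATE DocumentStore USING TIMESTAMP 1
SET Document = 'doc v1', Version = 1 WHERE DocumentID = 42;

-- a subsequent read should return Version = 2, because conflict
-- resolution keeps the cell with the highest write timestamp
SELECT Document, Version FROM DocumentStore WHERE DocumentID = 42;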

Regards
Sachin


error building cassandra trunk

2011-06-01 Thread sachin nikam
I synced up cassandra-trunk and am trying an ant build. I am getting the
following error. Any ideas?

 [java] error(208):
/home/sknikam/cassandra/dev/cassandra-trunk/src/java/org/apache/cassandra/cql/Cql.g:568:1:
The following token definitions can never be matched because prior
tokens match the same input:
T__88,T__89,T__92,T__94,K_WITH,K_USING,K_USE,K_FIRST,K_COUNT,K_SET,K_APPLY,K_BATCH,K_IN,K_CREATE,K_KEYSPACE,K_COLUMNFAMILY,K_INDEX,K_ON,K_DROP,K_INTO,K_TIMESTAMP,K_TTL,FLOAT,COMPIDENT,UUID,MULTILINE_COMMENT