COPY command with WHERE condition

2020-01-16 Thread adrien ruffie
Hello all,

In my company we want to export a large dataset from our Cassandra ring.
We are looking at the COPY command, but I can't find whether and how a WHERE
condition can be used.

We need to export only a subset of the data, selected by a WHERE clause,
unfortunately with ALLOW FILTERING, because of several old tables which were
poorly designed...

Do you know of a way to do that, please?
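
As far as I can tell, cqlsh's COPY TO does not take a WHERE clause, so a small
driver-based export may be the workaround. A rough sketch with the Java driver
4.x; the contact point, data center, keyspace, table and column names below are
only placeholders:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

import java.io.PrintWriter;
import java.net.InetSocketAddress;

public class FilteredExport {
    public static void main(String[] args) throws Exception {
        try (CqlSession session = CqlSession.builder()
                     .addContactPoint(new InetSocketAddress("10.0.0.1", 9042))
                     .withLocalDatacenter("dc1")
                     .withKeyspace("my_keyspace")
                     .build();
             PrintWriter out = new PrintWriter("export.csv")) {

            // The driver pages through the result set automatically.
            for (Row row : session.execute(
                    "SELECT id, status, created_at FROM my_table "
                            + "WHERE status = 'ACTIVE' ALLOW FILTERING")) {
                out.println(row.getString("id") + ","
                        + row.getString("status") + ","
                        + row.getInstant("created_at"));
            }
        }
    }
}

ALLOW FILTERING still makes the cluster scan the table, so it is worth running
an export like this off-peak.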

Thanks to all and best regards

Adrian


Re: *URGENT* Migration across different Cassandra clusters, a few having the same keyspace/table names

2020-01-16 Thread Vova Shelgunov
Loader*

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader



Re: *URGENT* Migration across different Cassandra clusters, a few having the same keyspace/table names

2020-01-16 Thread Vova Shelgunov
The DataStax Bulk Loader can be an option if the data is large.



Re: *URGENT* Migration across different Cassandra clusters, a few having the same keyspace/table names

2020-01-16 Thread Nitan Kainth
If the keyspace already exists, use the COPY command or sstableloader to merge
the data. If the data volume is too big, consider Spark or a custom Java program.
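
As a rough illustration of the custom-Java-program route (only a sketch;
contact points, data centers, keyspace, table and columns are placeholders,
and a real migration of ~70 GB would want parallelism and asynchronous writes):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;

import java.net.InetSocketAddress;

public class CopyTableBetweenClusters {
    public static void main(String[] args) {
        try (CqlSession source = CqlSession.builder()
                     .addContactPoint(new InetSocketAddress("cluster-a.example.com", 9042))
                     .withLocalDatacenter("dc-a")
                     .withKeyspace("my_keyspace")
                     .build();
             CqlSession target = CqlSession.builder()
                     .addContactPoint(new InetSocketAddress("cluster-b.example.com", 9042))
                     .withLocalDatacenter("dc-b")
                     .withKeyspace("my_keyspace")
                     .build()) {

            PreparedStatement insert = target.prepare(
                    "INSERT INTO my_table (id, payload) VALUES (?, ?)");

            // Page through the source table and replay every row into the target.
            // Plain INSERTs mean rows already present in the target are reconciled
            // by write timestamp, as with any Cassandra write.
            for (Row row : source.execute("SELECT id, payload FROM my_table")) {
                target.execute(insert.bind(row.getString("id"), row.getString("payload")));
            }
        }
    }
}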


Regards,
Nitan
Cell: 510 449 9629



Re: *URGENT* Migration across different Cassandra clusters, a few having the same keyspace/table names

2020-01-16 Thread Ankit Gadhiya
Any leads on this?

— Ankit

*Thanks & Regards,*
*Ankit Gadhiya*


Re: *URGENT* Migration across different Cassandra clusters, a few having the same keyspace/table names

2020-01-16 Thread Ankit Gadhiya
Hi Arvinder,

Thanks for your response.

Yes - Cluster B already has some data. Table/keyspace names are identical; as
for the data, I still don't have clarity on whether it is identical or not - I
am assuming not, since it's for different customers, but I need confirmation.

*Thanks & Regards,*
*Ankit Gadhiya*





Re: *URGENT* Migration across different Cassandra clusters, a few having the same keyspace/table names

2020-01-16 Thread Arvinder Dhillon
So as I understand it, Cluster B already has some data and is not an empty
cluster.

When you say the clusters share the same keyspace and table names, do you mean
both clusters have identical data in those keyspaces/tables?


-Arvi



*URGENT* Migration across different Cassandra clusters, a few having the same keyspace/table names

2020-01-16 Thread Ankit Gadhiya
Hello Group,

I have a requirement in one of the production systems where I need to be
able to migrate the entire dataset from Cluster A (Azure Region A) to Cluster B
(Azure Region B).

Each cluster has 3 Cassandra nodes (RF=3) running, used by different
applications. A few of the applications are common to Cluster A and Cluster B
and therefore share the same keyspace/table names.
I need suggestions for the best possible migration strategy here, considering:
1. No application code changes are possible - minor config/infra changes can
be considered.
2. Zero data loss.
3. No/minimal downtime.

It'd be great to hear ideas from all of you based on your experiences.

Cassandra Version - Cassandra 3.0.13 on both sides.
Total Data size - Cluster A: 70 GB, Cluster B: 15 GB

*Thanks & Regards,*
*Ankit Gadhiya*


Unified DataStax drivers

2020-01-16 Thread Chris Splinter
Hi all,

Last September, Jonathan Ellis announced at ApacheCon NA that DataStax was
going to unify the drivers that we develop for Apache Cassandra and DataStax
Enterprise into a single open-source, Apache v2.0 licensed driver. Yesterday,
we released this new version of the drivers across our C++, C#, Java, Node.js
and Python drivers. See the blog post for links to the source code and
documentation.

With this unified driver, we are committing to developing all of our new
functionality in this single driver going forward, available for all
Cassandra users and not just DataStax customers. This means that the
following are now available for all users:


Java: Spring Boot Starter

This starter is currently available in DataStax Labs; our goal is to get it
into the Spring Boot project. Also of note, Mark Paluch and the team that
works on Spring Data Cassandra recently completed their upgrade to the 4.x
line of the Java Driver (DATACASS-656).

Java: Built-in support for Reactive programming

This new version of the Java Driver (v4.4.0) now has an executeReactive
method on CqlSession for those working with Reactive Streams. See the
documentation for details.
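
For example, a minimal sketch of executeReactive (assuming Project Reactor is
on the classpath; the session setup and query are placeholders):

import com.datastax.oss.driver.api.core.CqlSession;
import reactor.core.publisher.Flux;

public class ReactiveQueryExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // executeReactive returns a Reactive Streams Publisher of rows,
            // so it can be wrapped directly by Reactor, RxJava, etc.
            Flux.from(session.executeReactive(
                            "SELECT release_version FROM system.local"))
                    .map(row -> row.getString("release_version"))
                    .doOnNext(version -> System.out.println("Cassandra " + version))
                    .blockLast();
        }
    }
}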

Java, Node.js: New Load Balancing Policy

The Java and Node.js drivers now have a new load balancing policy that uses
the in-flight request count for each node to drive the Power of 2 Choices
decision and takes into account the dequeuing rate of the in-flight requests
and takes into account the dequeuing rate of the in-flight requests to
avoid slow nodes. In addition, the amount of time that a node has been UP
is also considered when creating the query plan to only send requests to
nodes when they are ready. We are also working to get this into the C++, C#
and Python drivers soon.

Python: Pre-Built Wheels

Previously we only had pre-built wheels for the DSE driver, but now they are
available for everyone to use in this new version of the driver (v3.21.0).


Along with the bulk loader and Kafka connector that we made available for
use with Apache Cassandra in December last year, we hope that this helps
simplify the picture for those that use our drivers.

Best,

Chris


Re: How to assure data consistency in switch over to standby dc

2020-01-16 Thread Oleksandr Shulgin
>

Is your read SLO more sensitive than your write SLO? Maybe you can spend a bit
more time on the read path by using a non-local read CL, while keeping the
write CL local?

--
Alex


Re: How to assure data consistency in switch over to standby dc

2020-01-16 Thread Jean Carlo
Hello Laxmikant,

Your application has to deal with eventual consistency if you are using
Cassandra. Ensure that you have

R + W > RF

(for example, with RF=3, quorum reads and writes give 2 + 2 > 3) and have
repairs running periodically. This is the best way to be as consistent and
coherent as possible.

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


>>


Re: How to assure data consistency in switch over to standby dc

2020-01-16 Thread Laxmikant Upadhyay
Hi Alex,

You are right, that will solve the problem, but unfortunately I won't be
able to meet my SLA with EACH_QUORUM writes. I am using LOCAL_QUORUM for
both reads and writes.
Is there any other way?


>


Re: How to assure data consistency in switch over to standby dc

2020-01-16 Thread Oleksandr Shulgin

What are the consistency levels used by your application(s)?

E.g. for strong consistency across multiple DCs you could use EACH_QUORUM
for the write requests and LOCAL_QUORUM for reads, with a replication
factor >= 3 per DC.
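
A small sketch of what that looks like per statement with the Java driver 4.x
(the keyspace, table, columns and session setup are placeholders; the same
levels can also be set globally in the driver configuration):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class MixedConsistencyExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Writes wait for a quorum in every data center.
            SimpleStatement write = SimpleStatement
                    .builder("INSERT INTO my_keyspace.my_table (id, val) VALUES (?, ?)")
                    .addPositionalValues("k1", "v1")
                    .setConsistencyLevel(DefaultConsistencyLevel.EACH_QUORUM)
                    .build();
            session.execute(write);

            // Reads only consult a quorum in the local data center.
            SimpleStatement read = SimpleStatement
                    .builder("SELECT val FROM my_keyspace.my_table WHERE id = ?")
                    .addPositionalValues("k1")
                    .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_QUORUM)
                    .build();
            session.execute(read);
        }
    }
}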

--
Alex


Re: How to assure data consistency in switch over to standby dc

2020-01-16 Thread Laxmikant Upadhyay
Hi,
What I meant by the active/standby model is that even though data is being
replicated (asynchronously) to the standby DC, clients will only access the
data from the active DC (let's say using LOCAL_QUORUM).

> You have "to switch" your clients without any issues since your writes
> are replicated on all DCs.
--> That is not true, because there is a chance of mutation drops. (Hints and
read repair may help to some extent, but data consistency is not guaranteed
unless you run anti-entropy repair.)



>


Re: How to assure data consistency in switch over to standby dc

2020-01-16 Thread Ahmed Eljami
Hello,

What do you mean by an active/standby model? Cassandra is designed to be
active/active across DCs, so you just have "to switch" your clients, without
any issues, since your writes are replicated to all DCs.

Or do you mean by active/standby that the keyspace is not replicated
on the second DC?


-- 
Regards,

Ahmed ELJAMI


How to assure data consistency in switch over to standby dc

2020-01-16 Thread Laxmikant Upadhyay
We have 2 DCs in an active/standby model. If at any given point we want to
switch to the standby DC, how can we make sure that its data is consistent
with the active site? Note that repair runs at its scheduled time.


I am thinking of the approaches below:

1. Before switching, run a repair (this mostly assures consistency, but the
repair itself may take a long time to complete).

2. Monitor the dropped-message bean: if no messages have been dropped since
the last successful repair, then it is safe to switch without running a repair
(a rough polling sketch follows this list).

3. Monitor the hints backlog (files in the hints directory): if there is no
backlog, then it is safe to switch without running a repair.
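
For approach 2, a rough sketch of polling that bean over JMX (the host and
port are placeholders; the metric name is the standard Cassandra DroppedMessage
metric, and the counter is cumulative since node start, so it has to be
compared against a value recorded at the last successful repair):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DroppedMutationsCheck {
    public static void main(String[] args) throws Exception {
        // Default Cassandra JMX endpoint; host and port are placeholders.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://10.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Dropped MUTATION messages, cumulative since the node started.
            ObjectName dropped = new ObjectName(
                    "org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=Dropped");
            Number count = (Number) mbs.getAttribute(dropped, "Count");
            System.out.println("Dropped mutations on this node: " + count);
        }
    }
}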



I am interested to know how other people are solving this issue and making a
fast switch-over while assuring consistency.

-- 

regards,
Laxmikant Upadhyay