Re: Cassandra 3.1 - Aggregation query failure

2015-12-21 Thread Jonathan Haddad
Even if you get this to work for now, I really recommend using a different
tool, like Spark.  Personally I wouldn't use UDAs (user-defined aggregates)
outside of a single partition.
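
For example, restricting the aggregate to a single partition keeps the work
bounded. A minimal sketch using a built-in aggregate (the keyspace, table,
and column names here are hypothetical):

    -- One partition per sensor; rows clustered by time.
    CREATE TABLE IF NOT EXISTS metrics.readings (
        sensor_id    text,
        reading_time timestamp,
        value        double,
        PRIMARY KEY (sensor_id, reading_time)
    );

    -- Bounded: aggregates over a single partition.
    SELECT avg(value) FROM metrics.readings WHERE sensor_id = 'sensor-42';

    -- Unbounded: a full range scan across the cluster, which is the
    -- pattern that times out on large tables.
    -- SELECT avg(value) FROM metrics.readings;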

On Mon, Dec 21, 2015 at 1:50 AM Dinesh Shanbhag <
dinesh.shanb...@isanasystems.com> wrote:

>
> Thanks for the pointers!  I edited $CASSANDRA_HOME/conf/jvm.options to
> increase -Xms and -Xmx to 1536M.  The result is the same.
>
> And in $CASSANDRA_HOME/logs, running 'grep GC system.log' produces this
> (from before jvm.options was changed):
>
> INFO  [Service Thread] 2015-12-18 15:26:31,668 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 296ms.  CMS Old Gen: 18133664 -> 15589256;
> Code Cache: 5650880 -> 8122304; Compressed Class Space: 2530064 ->
> 3345624; Metaspace: 21314000 -> 28040984; Par Eden Space: 7019256 ->
> 164070848;
> INFO  [Service Thread] 2015-12-18 15:48:39,736 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 379ms.  CMS Old Gen: 649257416 -> 84190176;
> Code Cache: 20772224 -> 20726848; Par Eden Space: 2191408 -> 52356736;
> Par Survivor Space: 2378448 -> 2346840
> INFO  [Service Thread] 2015-12-18 15:58:35,118 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 406ms.  CMS Old Gen: 648847808 -> 86954856;
> Code Cache: 21182080 -> 21188032; Par Eden Space: 1815696 -> 71525744;
> Par Survivor Space: 2388648 -> 2364696
> INFO  [Service Thread] 2015-12-18 16:13:45,821 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 211ms.  CMS Old Gen: 648343768 -> 73135720;
> Par Eden Space: 3224880 -> 7957464; Par Survivor Space: 2379912 -> 2414520
> INFO  [Service Thread] 2015-12-18 16:32:46,419 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 387ms.  CMS Old Gen: 648476072 -> 6832;
> Par Eden Space: 2006624 -> 64263360; Par Survivor Space: 2403792 -> 2387664
> INFO  [Service Thread] 2015-12-18 16:42:38,648 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 365ms.  CMS Old Gen: 649126336 -> 137359384;
> Code Cache: 22972224 -> 22979840; Metaspace: 41374464 -> 41375104; Par
> Eden Space: 4286080 -> 154449480; Par Survivor Space: 1575440 -> 2310768
> INFO  [Service Thread] 2015-12-18 16:51:57,538 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 322ms.  CMS Old Gen: 648338928 -> 79783856;
> Par Eden Space: 2058968 -> 56931312; Par Survivor Space: 2342760 -> 2400336
> INFO  [Service Thread] 2015-12-18 17:02:49,543 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 212ms.  CMS Old Gen: 648702008 -> 122954344;
> Par Eden Space: 3269032 -> 61433328; Par Survivor Space: 2395824 -> 3448760
> INFO  [Service Thread] 2015-12-18 17:11:54,090 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 306ms.  CMS Old Gen: 648748576 -> 70965096;
> Par Eden Space: 2174840 -> 27074432; Par Survivor Space: 2365992 -> 2373984
> INFO  [Service Thread] 2015-12-18 17:22:28,949 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 350ms.  CMS Old Gen: 648243024 -> 90897272;
> Par Eden Space: 2150168 -> 43487192; Par Survivor Space: 2401872 -> 2410728
>
>
> After modifying jvm.options to increase -Xms & -Xmx (to 1536M):
>
> INFO  [Service Thread] 2015-12-21 11:39:24,918 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 342ms.  CMS Old Gen: 18579136 -> 16305144;
> Code Cache: 8600128 -> 10898752; Compressed Class Space: 3431288 ->
> 3761496; Metaspace: 29551832 -> 33307352; Par Eden Space: 4822000 ->
> 94853272;
> INFO  [Service Thread] 2015-12-21 11:39:30,710 GCInspector.java:284 -
> ParNew GC in 206ms.  CMS Old Gen: 22932208 -> 41454520; Par Eden Space:
> 167772160 -> 0; Par Survivor Space: 13144872 -> 20971520
> INFO  [Service Thread] 2015-12-21 13:08:14,922 GCInspector.java:284 -
> ConcurrentMarkSweep GC in 468ms.  CMS Old Gen: 21418016 -> 16146528;
> Code Cache: 11693888 -> 11744704; Compressed Class Space: 4331224 ->
> 4344192; Metaspace: 37191144 -> 37249960; Par Eden Space: 146089224 ->
> 148476848;
> INFO  [Service Thread] 2015-12-21 13:08:53,068 GCInspector.java:284 -
> ParNew GC in 216ms.  CMS Old Gen: 16146528 -> 26858568; Par Eden Space:
> 167772160 -> 0;
>
>
> Earlier the node had OpenJDK 8.  For today's tests I installed and used
> Oracle Java 8.
>
> Do the above messages provide any clue? Is there any debug logging I can
> enable to investigate further?
> Thanks,
> Dinesh.
>
> On 12/18/2015 9:56 PM, Tyler Hobbs wrote:
> >
> > On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan wrote:
> >
> > Cassandra will perform a full table scan and fetch all the data in
> > memory to apply the aggregate function.
> >
> >
> > Just to clarify for others on the list: when executing aggregation
> > functions, Cassandra /will/ use paging internally, so at most one page
> > worth of data will be held in memory at a time.  However, if your
> > aggregation function retains a large amount of data, this may
> > contribute to heap pressure.
> >
> >
> > --
> > Tyler Hobbs
> > DataStax 
>
>


Re: Timestamp Query

2015-12-21 Thread Eric Stevens
Generally speaking (both for Cassandra as well as for many other projects),
timestamps don't carry a timezone directly.  A single point in time has a
consistent value for timestamp regardless of the timezone, and when you
convert a timestamp to a human-friendly value, you can attach a timezone to
see what the local time in that timezone was as of that timestamp.

Cassandra, like many projects, uses timestamps counted as a number of
fixed-size intervals since 1970-01-01T00:00:00Z (the Unix Epoch).  That
instant is timestamp 0.

In Cassandra, timestamp fields use millisecond precision (milliseconds
since timestamp 0), while writetime() (or USING TIMESTAMP) uses
microseconds since timestamp 0.
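
For example (a sketch; the keyspace, table, and column names are
hypothetical):

    CREATE TABLE IF NOT EXISTS app.events (
        id          uuid PRIMARY KEY,
        occurred_at timestamp   -- milliseconds since the epoch
    );

    -- USING TIMESTAMP takes microseconds since the epoch:
    INSERT INTO app.events (id, occurred_at)
    VALUES (uuid(), '2015-12-21 10:00:00+0000')
    USING TIMESTAMP 1450692000000000;

    -- writetime() reports back on the same microsecond scale:
    SELECT occurred_at, writetime(occurred_at) FROM app.events;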

cqlsh prefers the system timezone when displaying timestamps in friendly
format. If you want to display timestamps in a different timezone, you can
set the TZ environment variable to accomplish this (see
http://stackoverflow.com/questions/26595649/specify-cqlsh-output-timezone).


If you need to track a timezone with your timestamp (i.e. it's not good
enough to know the moment in time, but you also need to know the local
offset that the timestamp was written under), you'll need to track that in
a separate field, since Cassandra does not provide a TimestampWithTimezone
data type.
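
A minimal sketch of that separate-field pattern, reusing the hypothetical
table above:

    -- The timestamp pins the instant; the text column records the local
    -- offset (or zone name) it was written under:
    ALTER TABLE app.events ADD occurred_tz text;

    INSERT INTO app.events (id, occurred_at, occurred_tz)
    VALUES (uuid(), '2015-12-21 15:30:00+0530', '+05:30');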

On Sun, Dec 20, 2015 at 11:09 AM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> https://datastax.github.io/java-driver/features/query_timestamps/
>
> On Sun, Dec 20, 2015 at 9:48 PM, Harikrishnan A  wrote:
>
>> Hello,
>>
>> How do I set a timestamp value with a specific timezone in Cassandra? I
>> understand that it captures the timezone of the coordinator node while
>> inserting.
>> What if I want to insert and display the timezone that I prefer instead
>> of the default coordinator timezone?
>>
>> Thanks & Regards,
>> Hari
>>
>
>


[RELEASE] Apache Cassandra 3.1.1 released

2015-12-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.1.1.

There has been some understandable confusion about our new Tick-Tock
release style; this thread should help explain it [4]. Since a critical
bug was discovered just after 3.1, we are releasing 3.1.1 to address it
before 3.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: https://goo.gl/etxSuG (CHANGES.txt)
[2]: https://goo.gl/gP7B3J (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: http://www.mail-archive.com/user@cassandra.apache.org/msg45119.html


Re: OpsCenter metrics growth can relates to compactions?

2015-12-21 Thread Sebastian Estevez
>
> We do have a lot of keyspaces and column families.


Be careful, as C* itself (not just OpsCenter) will not run well with too
many tables. Usually 200 to 300 is a good upper bound, though I've seen
folks throw money at the problem and run more on special hardware (lots of
RAM).

> Most importantly, I truncated all rollups early this morning and during a
> big compaction (with hundreds of pending tasks at one point), the metrics
> grew to ~13G. Can I say compaction activities can increase the metric disk
> usage growth significantly? I have seen this behavior quite often with
> compaction.


This is normal, especially with size-tiered compaction.


Since it's metrics data, you can always decrease the TTL on the OpsCenter
tables, blacklist some keyspaces or tables, or keep truncating. See:

http://docs.datastax.com/en/opscenter/5.2/opsc/configure/opscMetricsConfig_r.html
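
For example, truncating the rollup tables from cqlsh (a sketch assuming the
default "OpsCenter" keyspace and rollup table names; verify the names
against your own schema first):

    TRUNCATE "OpsCenter".rollups60;
    TRUNCATE "OpsCenter".rollups300;
    TRUNCATE "OpsCenter".rollups7200;
    TRUNCATE "OpsCenter".rollups86400;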

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Sun, Dec 20, 2015 at 3:45 PM, John Wong  wrote:

> Hi.
>
> We are using the open source version of OpsCenter. We find it useful, but
> the disk space for OpsCenter metrics has been increasing and can sometimes
> grow to 30-50G in a matter of a day or two. We do have a lot of
> keyspaces and column families.
>
> Usually this dev cluster is quiet on the weekend except for some QA jobs
> or our weekend primary read-repair. Most importantly, I truncated all
> rollups early this morning and during a big compaction (with hundreds of
> pending tasks at one point), the metrics grew to ~13G. Can I say compaction
> activities can increase the metric disk usage growth significantly? I have
> seen this behavior quite often with compaction.
>
> Thanks.
>
> John
>


What is the ideal way to merge two Cassandra clusters with same keyspace into one?

2015-12-21 Thread Noorul Islam K M

Hello all,

We have two clusters X and Y with the same keyspaces but distinct data sets.
We are planning to merge these into a single cluster. What would be the
ideal steps to achieve this without downtime for applications? We have a
time series data stream continuously writing to Cassandra.

We have ruled out export/import, as that would make us lose data written
during the time of the copy.

We have also ruled out sstableloader, as it is not reliable. It fails often
and there is no way to resume from where it failed.

Any suggestions will help.

Thanks and Regards
Noorul


[RELEASE] Apache Cassandra 3.0.2 released

2015-12-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: https://goo.gl/swRjp9 (CHANGES.txt)
[2]: https://goo.gl/ipA763 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Would data be lost by nodetool removenode force

2015-12-21 Thread Carlos Alonso
Why is the old node not able to restart?

If you're about to bring a new one to replace the old dead one, it may be
simpler to just replace it
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html

Hope it helps.

Carlos Alonso | Software Engineer | @calonso 

On 18 December 2015 at 02:00, Shuo Chen  wrote:

> The version is 2.0.7
>
> Why we want to add a new node is that somebody created an index on a
> certain table and the cluster became slow, so that guy tried to restart
> the nodes. However, only 3 out of 4 nodes could restart; 1 node cannot
> restart. So we intended to add a new node, but it shows an error that a
> certain cf-id cannot be found, and it cannot be started.
>
> So we decided to remove this newly added node with nodetool removenode.
> Here is the thing... We have already started removing this newly added
> node. Only a little data (80 KB) has been migrated to this newly added
> node. The process of removenode seems to be hanging.
>
> So how do we resolve this situation: can we add another node, or use
> removenode force on this newly added node?
>
> On Fri, Dec 18, 2015 at 9:09 AM, Robert Coli  wrote:
>
>> On Thu, Dec 17, 2015 at 4:44 PM, Shuo Chen  wrote:
>>
>>> I have a 4-node cluster with status 3 UN and 1 DN. I am trying to add
>>> a new node into the cluster but it is also dead. So the cluster is now 3 UN
>>> and 2 DN. However, I did not run nodetool cleanup on any nodes, and just
>>> several KB of data has been migrated to this newly added node.
>>>
>>> All the keyspaces have a replication factor of 3. I am trying to remove
>>> this newly added node using nodetool removenode. But it has been hanging
>>> for 24 hours in removal status:
>>>
>>> RemovalStatus: Removing token (-9104065154580789913). Waiting for
>>> replication confirmation from [/192.168.148.29, /192.168.148.24,
>>> /192.168.148.23].
>>>
>>> Is it safe to use removenode force to remove this newly added node? Will
>>> any data be lost? Thanks!
>>>
>>
>> In general it is not "safe" to use "removenode" to remove a node from a
>> cluster, because removenode by definition reduces distinct replica count.
>>
>> That's why a departing node should always use "decommission," so that it
>> streams the data it holds, including any data for which it may be the only
>> replica, to the nodes newly responsible for its range.
>>
>> Your email prompts some questions. It doesn't seem like your "real"
>> question is about removenode at all?
>>
>> Why, for example, is the new node you are adding into the cluster "dead"?
>>
>> Are you trying to replace the original DN with your new node? If so, use
>> "replace_address"?
>>
>> =Rob
>> PS - also, what version of cassandra?
>>
>
>
>
> --
> *陈硕* *Shuo Chen*
> chenatu2...@gmail.com
> chens...@whaty.com
>


Re: What is the ideal way to merge two Cassandra clusters with same keyspace into one?

2015-12-21 Thread George Sigletos
Hello,

We had a similar problem where we needed to migrate data from one cluster
to another.

We ended up using Spark to accomplish this. It is fast and reliable but
some downtime was required after all.

We minimized the downtime by doing a first run, and then running
incremental updates.

Kind regards,
George



On Mon, Dec 21, 2015 at 10:12 AM, Noorul Islam K M 
wrote:

>
> Hello all,
>
> We have two clusters X and Y with the same keyspaces but distinct data sets.
> We are planning to merge these into a single cluster. What would be the
> ideal steps to achieve this without downtime for applications? We have a
> time series data stream continuously writing to Cassandra.
>
> We have ruled out export/import, as that would make us lose data written
> during the time of the copy.
>
> We have also ruled out sstableloader, as it is not reliable. It fails often
> and there is no way to resume from where it failed.
>
> Any suggestions will help.
>
> Thanks and Regards
> Noorul
>


Re: Cassandra 3.1 - Aggregation query failure

2015-12-21 Thread Dinesh Shanbhag


Thanks for the pointers!  I edited $CASSANDRA_HOME/conf/jvm.options to
increase -Xms and -Xmx to 1536M.  The result is the same.


And in $CASSANDRA_HOME/logs, running 'grep GC system.log' produces this
(from before jvm.options was changed):


INFO  [Service Thread] 2015-12-18 15:26:31,668 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 296ms.  CMS Old Gen: 18133664 -> 15589256; 
Code Cache: 5650880 -> 8122304; Compressed Class Space: 2530064 -> 
3345624; Metaspace: 21314000 -> 28040984; Par Eden Space: 7019256 -> 
164070848;
INFO  [Service Thread] 2015-12-18 15:48:39,736 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 379ms.  CMS Old Gen: 649257416 -> 84190176; 
Code Cache: 20772224 -> 20726848; Par Eden Space: 2191408 -> 52356736; 
Par Survivor Space: 2378448 -> 2346840
INFO  [Service Thread] 2015-12-18 15:58:35,118 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 406ms.  CMS Old Gen: 648847808 -> 86954856; 
Code Cache: 21182080 -> 21188032; Par Eden Space: 1815696 -> 71525744; 
Par Survivor Space: 2388648 -> 2364696
INFO  [Service Thread] 2015-12-18 16:13:45,821 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 211ms.  CMS Old Gen: 648343768 -> 73135720; 
Par Eden Space: 3224880 -> 7957464; Par Survivor Space: 2379912 -> 2414520
INFO  [Service Thread] 2015-12-18 16:32:46,419 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 387ms.  CMS Old Gen: 648476072 -> 6832; 
Par Eden Space: 2006624 -> 64263360; Par Survivor Space: 2403792 -> 2387664
INFO  [Service Thread] 2015-12-18 16:42:38,648 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 365ms.  CMS Old Gen: 649126336 -> 137359384; 
Code Cache: 22972224 -> 22979840; Metaspace: 41374464 -> 41375104; Par 
Eden Space: 4286080 -> 154449480; Par Survivor Space: 1575440 -> 2310768
INFO  [Service Thread] 2015-12-18 16:51:57,538 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 322ms.  CMS Old Gen: 648338928 -> 79783856; 
Par Eden Space: 2058968 -> 56931312; Par Survivor Space: 2342760 -> 2400336
INFO  [Service Thread] 2015-12-18 17:02:49,543 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 212ms.  CMS Old Gen: 648702008 -> 122954344; 
Par Eden Space: 3269032 -> 61433328; Par Survivor Space: 2395824 -> 3448760
INFO  [Service Thread] 2015-12-18 17:11:54,090 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 306ms.  CMS Old Gen: 648748576 -> 70965096; 
Par Eden Space: 2174840 -> 27074432; Par Survivor Space: 2365992 -> 2373984
INFO  [Service Thread] 2015-12-18 17:22:28,949 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 350ms.  CMS Old Gen: 648243024 -> 90897272; 
Par Eden Space: 2150168 -> 43487192; Par Survivor Space: 2401872 -> 2410728



After modifying jvm.options to increase -Xms & -Xmx (to 1536M):

INFO  [Service Thread] 2015-12-21 11:39:24,918 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 342ms.  CMS Old Gen: 18579136 -> 16305144; 
Code Cache: 8600128 -> 10898752; Compressed Class Space: 3431288 -> 
3761496; Metaspace: 29551832 -> 33307352; Par Eden Space: 4822000 -> 
94853272;
INFO  [Service Thread] 2015-12-21 11:39:30,710 GCInspector.java:284 - 
ParNew GC in 206ms.  CMS Old Gen: 22932208 -> 41454520; Par Eden Space: 
167772160 -> 0; Par Survivor Space: 13144872 -> 20971520
INFO  [Service Thread] 2015-12-21 13:08:14,922 GCInspector.java:284 - 
ConcurrentMarkSweep GC in 468ms.  CMS Old Gen: 21418016 -> 16146528; 
Code Cache: 11693888 -> 11744704; Compressed Class Space: 4331224 -> 
4344192; Metaspace: 37191144 -> 37249960; Par Eden Space: 146089224 -> 
148476848;
INFO  [Service Thread] 2015-12-21 13:08:53,068 GCInspector.java:284 - 
ParNew GC in 216ms.  CMS Old Gen: 16146528 -> 26858568; Par Eden Space: 
167772160 -> 0;



Earlier the node had OpenJDK 8.  For today's tests I installed and used 
Oracle Java 8.


Do the above messages provide any clue? Is there any debug logging I can
enable to investigate further?

Thanks,
Dinesh.

On 12/18/2015 9:56 PM, Tyler Hobbs wrote:


> On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan wrote:
>
>> Cassandra will perform a full table scan and fetch all the data in
>> memory to apply the aggregate function.
>
> Just to clarify for others on the list: when executing aggregation
> functions, Cassandra /will/ use paging internally, so at most one page
> worth of data will be held in memory at a time.  However, if your
> aggregation function retains a large amount of data, this may
> contribute to heap pressure.
>
> --
> Tyler Hobbs
> DataStax




Re: What is the ideal way to merge two Cassandra clusters with same keyspace into one?

2015-12-21 Thread Noorul Islam K M
George Sigletos  writes:

> Hello,
>
> We had a similar problem where we needed to migrate data from one cluster
> to another.
>
> We ended up using Spark to accomplish this. It is fast and reliable but
> some downtime was required after all.
>
> We minimized the downtime by doing a first run, and then running
> incremental updates.
>

How much data are you talking about?

How did you achieve the incremental run? We are using kairosdb, and some of
the other schemas do not have a way to filter based on date.

Thanks and Regards
Noorul

> Kind regards,
> George
>
>
>
> On Mon, Dec 21, 2015 at 10:12 AM, Noorul Islam K M 
> wrote:
>
>>
>> Hello all,
>>
>> We have two clusters X and Y with the same keyspaces but distinct data sets.
>> We are planning to merge these into a single cluster. What would be the
>> ideal steps to achieve this without downtime for applications? We have a
>> time series data stream continuously writing to Cassandra.
>>
>> We have ruled out export/import, as that would make us lose data written
>> during the time of the copy.
>>
>> We have also ruled out sstableloader, as it is not reliable. It fails often
>> and there is no way to resume from where it failed.
>>
>> Any suggestions will help.
>>
>> Thanks and Regards
>> Noorul
>>


Re: What is the ideal way to merge two Cassandra clusters with same keyspace into one?

2015-12-21 Thread George Sigletos
Roughly half a TB of data.

There is a timestamp column in the tables we migrated and we did use that
to achieve incremental updates.
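
As a rough sketch of that incremental pass (the schema below is
hypothetical, and in our case the filtering actually happened inside the
Spark job rather than through a plain CQL query):

    -- Assuming a table keyed by entity with a time clustering column:
    -- CREATE TABLE ks.measurements (
    --     device_id  text,
    --     event_time timestamp,
    --     value      double,
    --     PRIMARY KEY (device_id, event_time)
    -- );

    -- The first run copies everything; each incremental run reads only
    -- rows newer than the previous run's start time, per partition:
    SELECT * FROM ks.measurements
    WHERE device_id = 'dev-1'
      AND event_time > '2015-12-20 00:00:00+0000';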

I don't know anything about kairosdb, but I can see from the docs that
there exists a row timestamp column. Could you maybe use that one?

Kind regards,
George

On Mon, Dec 21, 2015 at 12:53 PM, Noorul Islam K M 
wrote:

> George Sigletos  writes:
>
> > Hello,
> >
> > We had a similar problem where we needed to migrate data from one cluster
> > to another.
> >
> > We ended up using Spark to accomplish this. It is fast and reliable but
> > some downtime was required after all.
> >
> > We minimized the downtime by doing a first run, and then running
> > incremental updates.
> >
>
> How much data are you talking about?
>
> How did you achieve the incremental run? We are using kairosdb, and some of
> the other schemas do not have a way to filter based on date.
>
> Thanks and Regards
> Noorul
>
> > Kind regards,
> > George
> >
> >
> >
> > On Mon, Dec 21, 2015 at 10:12 AM, Noorul Islam K M 
> > wrote:
> >
> >>
> >> Hello all,
> >>
> >> We have two clusters X and Y with the same keyspaces but distinct data
> >> sets. We are planning to merge these into a single cluster. What would
> >> be the ideal steps to achieve this without downtime for applications?
> >> We have a time series data stream continuously writing to Cassandra.
> >>
> >> We have ruled out export/import, as that would make us lose data
> >> written during the time of the copy.
> >>
> >> We have also ruled out sstableloader, as it is not reliable. It fails
> >> often and there is no way to resume from where it failed.
> >>
> >> Any suggestions will help.
> >>
> >> Thanks and Regards
> >> Noorul
> >>
>


Re: What is the ideal way to merge two Cassandra clusters with same keyspace into one?

2015-12-21 Thread DuyHai Doan
For cross-cluster operation with the Spark/Cassandra connector, you can
look at this trick:
http://www.slideshare.net/doanduyhai/fast-track-to-getting-started-with-dse-max-ing/64

On Mon, Dec 21, 2015 at 1:14 PM, George Sigletos 
wrote:

> Roughly half a TB of data.
>
> There is a timestamp column in the tables we migrated and we did use that
> to achieve incremental updates.
>
> I don't know anything about kairosdb, but I can see from the docs that
> there exists a row timestamp column. Could you maybe use that one?
>
> Kind regards,
> George
>
> On Mon, Dec 21, 2015 at 12:53 PM, Noorul Islam K M 
> wrote:
>
>> George Sigletos  writes:
>>
>> > Hello,
>> >
>> > We had a similar problem where we needed to migrate data from one
>> cluster
>> > to another.
>> >
>> > We ended up using Spark to accomplish this. It is fast and reliable but
>> > some downtime was required after all.
>> >
>> > We minimized the downtime by doing a first run, and then running
>> > incremental updates.
>> >
>>
>> How much data are you talking about?
>>
>> How did you achieve the incremental run? We are using kairosdb, and some of
>> the other schemas do not have a way to filter based on date.
>>
>> Thanks and Regards
>> Noorul
>>
>> > Kind regards,
>> > George
>> >
>> >
>> >
>> > On Mon, Dec 21, 2015 at 10:12 AM, Noorul Islam K M 
>> > wrote:
>> >
>> >>
>> >> Hello all,
>> >>
>> >> We have two clusters X and Y with the same keyspaces but distinct data
>> >> sets. We are planning to merge these into a single cluster. What would
>> >> be the ideal steps to achieve this without downtime for applications?
>> >> We have a time series data stream continuously writing to Cassandra.
>> >>
>> >> We have ruled out export/import, as that would make us lose data
>> >> written during the time of the copy.
>> >>
>> >> We have also ruled out sstableloader, as it is not reliable. It fails
>> >> often and there is no way to resume from where it failed.
>> >>
>> >> Any suggestions will help.
>> >>
>> >> Thanks and Regards
>> >> Noorul
>> >>
>>
>
>