RE: Row Cache hit issue

2017-09-19 Thread Peng Xiao
Thanks All.




-- Original --
From: "Steinmaurer, Thomas";
Date: Wed, Sep 20, 2017 1:38
To: "user@cassandra.apache.org";
Subject: RE: Row Cache hit issue



  

RE: Row Cache hit issue

2017-09-19 Thread Steinmaurer, Thomas
Hi,

additionally, with saved (key) caches, we had some sort of corruption (I think, 
for whatever reason) once. So, if you see something like that upon Cassandra 
startup:

INFO [main] 2017-01-04 15:38:58,772 AutoSavingCache.java (line 114) reading 
saved cache /var/opt/xxx/cassandra/saved_caches/ks-cf-KeyCache-b.db
ERROR [main] 2017-01-04 15:38:58,891 CassandraDaemon.java (line 571) Exception 
encountered during startup
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:152)
at 
org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:132)
at 
org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
at 
org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
at 
org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:276)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:435)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:406)
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:322)
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:268)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:364)
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:554)
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:643)

resulting in Cassandra going OOM, with a “reading saved cache” log entry close 
before the OOM, you may have hit some sort of corruption. Workaround is to 
physically delete the saved cache file and Cassandra will start up just fine.
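For reference, a minimal sketch of that workaround (the file name is the one from the log above; the service commands are illustrative and depend on how Cassandra is run):

# with the node stopped, remove the corrupted saved cache, then start again
sudo service cassandra stop
sudo rm /var/opt/xxx/cassandra/saved_caches/ks-cf-KeyCache-b.db
sudo service cassandra start

Saved caches are only a warm-up optimization, so removing the file costs nothing beyond a cold cache after the restart.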

Regards,
Thomas


From: Dikang Gu [mailto:dikan...@gmail.com]
Sent: Mittwoch, 20. September 2017 06:06
To: cassandra 
Subject: Re: Row Cache hit issue

Hi Peng,

C* periodically saves cache to disk, to solve cold start problem. If 
row_cache_save_period=0, it means C* does not save cache to disk. But the cache 
is still working, if it's enabled in table schema, just the cache will be empty 
after restart.

--Dikang.

On Tue, Sep 19, 2017 at 8:27 PM, Peng Xiao 
<2535...@qq.com> wrote:
And we are using C* 2.1.18.


-- Original --
From:  "我自己的邮箱";<2535...@qq.com>;
Date:  Wed, Sep 20, 2017 11:27 AM
To:  "user";
Subject:  Row Cache hit issue

Dear All,

The default row_cache_save_period=0; does that mean the Row Cache does not work
in this situation?
But we can still see row cache hits.

Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds

Could anyone please explain this?

Thanks,
Peng Xiao



--
Dikang



unsubscribe

2017-09-19 Thread marlon hendred
unsubscribe


Re: Row Cache hit issue

2017-09-19 Thread Dikang Gu
Hi Peng,

C* periodically saves cache to disk, to solve cold start problem. If
row_cache_save_period=0, it means C* does not save cache to disk. But the
cache is still working, if it's enabled in table schema, just the cache
will be empty after restart.
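To illustrate where the two knobs live, a hedged sketch (keyspace/table names are placeholders; the caching syntax is, as far as I recall, the C* 2.1 string form, and row_cache_size_in_mb must be non-zero in cassandra.yaml or there is no row cache at all):

# cassandra.yaml (per node): row cache capacity and how often it is persisted
#   row_cache_size_in_mb: 100
#   row_cache_save_period: 0   # 0 = never written to disk, so it starts empty after a restart

# per table: opt the table into the row cache
cqlsh -e "ALTER TABLE my_ks.my_cf WITH caching = '{\"keys\":\"ALL\", \"rows_per_partition\":\"100\"}';"

With that in place, the hit/request counters grow as soon as reads are served from memory, regardless of the save period.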

--Dikang.

On Tue, Sep 19, 2017 at 8:27 PM, Peng Xiao <2535...@qq.com> wrote:

> And we are using C* 2.1.18.
>
>
> -- Original --
> *From: * "我自己的邮箱";<2535...@qq.com>;
> *Date: * Wed, Sep 20, 2017 11:27 AM
> *To: * "user";
> *Subject: * Row Cache hit issue
>
> Dear All,
>
> The default row_cache_save_period=0; does that mean the Row Cache does not work
> in this situation?
> But we can still see row cache hits.
>
> Row Cache  : entries 202787, size 100 MB, capacity 100 MB,
> 3095293 hits, 6796801 requests, 0.455 recent hit rate, 0 save period in
> seconds
>
> Could anyone please explain this?
>
> Thanks,
> Peng Xiao
>



-- 
Dikang


Re: Row Cache hit issue

2017-09-19 Thread Peng Xiao
And we are using C* 2.1.18.




-- Original --
From:  "";<2535...@qq.com>;
Date:  Wed, Sep 20, 2017 11:27 AM
To:  "user";

Subject:  Row Cache hit issue



Dear All,


The default row_cache_save_period=0; does that mean the Row Cache does not work
in this situation?
But we can still see row cache hits.


Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds


Could anyone please explain this?


Thanks,
Peng Xiao

Row Cache hit issue

2017-09-19 Thread Peng Xiao
Dear All,


The default row_cache_save_period=0; does that mean the Row Cache does not work
in this situation?
But we can still see row cache hits.


Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds
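
For context, that line appears to be the Row Cache section of nodetool info output; it can be re-checked at any time with, for example:

nodetool info | grep "Row Cache"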


Could anyone please explain this?


Thanks,
Peng Xiao

Commitlog without header

2017-09-19 Thread Dikang Gu
Hello,

In our production cluster, we have had multiple occasions where, after an *unclean*
shutdown, the Cassandra server cannot start due to commit log exceptions:

2017-09-17_06:06:32.49830 ERROR 06:06:32 [main]: Exiting due to error while
processing commit log during initialization.
2017-09-17_06:06:32.49831
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
Could not read commit log descriptor in file
/data/cassandra/commitlog/CommitLog-5-1503088780367.log
2017-09-17_06:06:32.49831 at
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:634)
[apache-cassandra-2.2.5+git20170612.e1857fa.jar:2.2.5+git20170612.e1857fa]
2017-09-17_06:06:32.49831 at
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:303)
[apache-cassandra-2.2.5+git20170612.e1857fa.jar:2.2.5+git20170612.e1857fa]
2017-09-17_06:06:32.49831 at
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:147)
[apache-cassandra-2.2.5+git20170612.e1857fa.jar:2.2.5+git20170612.e1857fa]
2017-09-17_06:06:32.49832 at
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189)
[apache-cassandra-2.2.5+git20170612.e1857fa.jar:2.2.5+git20170612.e1857fa]
2017-09-17_06:06:32.49832 at
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169)
[apache-cassandra-2.2.5+git20170612.e1857fa.jar:2.2.5+git20170612.e1857fa]
2017-09-17_06:06:32.49832 at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302)
[apache-cassandra-2.2.5+git20170612.e1857fa.jar:2.2.5+git20170612.e1857fa]
2017-09-17_06:06:32.49832 at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:544)
[apache-cassandra-2.2.5+git20170612.e1857fa.jar:2.2.5+git20170612.e1857fa]
2017-09-17_06:06:32.49832 at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:607)
[apache-cassandra-2.2.5+git20170612.e1857fa.jar:2.2.5+git20170612.e1857fa]

I added some logging to CommitLogDescriptor.readHeader(), and found that the
header is empty in the failure case. By empty, I mean all the fields in the
header are 0:

2017-09-19_22:43:02.22112 INFO  22:43:02 [main]: Dikang: crc: 0, checkcrc:
2077607535
2017-09-19_22:43:02.22130 INFO  22:43:02 [main]: Dikang: version: 0, id: 0,
parametersLength: 0

As a result, it did not pass the CRC check, and the commit log replay failed.

My question is: is it a known issue that some race condition can cause an
empty header in the commit log? If so, it should be safe to just skip the last
commit log with an empty header, right?

As you can see, we are using Cassandra 2.2.5.

Thanks
Dikang.
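
A hedged operational workaround, not an upstream recommendation: if the last segment really has an all-zero header, moving it out of the commitlog directory lets the node start, at the cost of whatever unflushed writes that segment held (the path is the one from the error above):

# with the node stopped
mkdir -p /data/cassandra/commitlog.bad
mv /data/cassandra/commitlog/CommitLog-5-1503088780367.log /data/cassandra/commitlog.bad/
# start Cassandra, then repair the node to re-sync anything that was lost
nodetool repair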


unsubscribe

2017-09-19 Thread Anthony P. Scism
unsubscribe

Anthony P. Scism
Info Tech-Risk Mgmt/Client Sys - Capacity Planning
Work: 402-544-0361 Mobile: 402-707-4446



From:   kurt greaves 
To: User 
Date:   09/19/2017 04:53 PM
Subject:RE: Multi-node repair fails after upgrading to 3.0.14



You're right of course. Part of the reason it's changing so frequently is 
to try and improve repairs so that they at least actually work reliably. 
C* 3 hasn't been the smoothest ride for repairs. Incremental repairs 
wasn't really ready for 3.0 so it was a mistake to make it a default. 
Unfortunately it's hard to change that back now as it will just lead to 
more confusion and problems for users unaware of the change.

On 20 Sep. 2017 00:25, "Durity, Sean R"  
wrote:
Required maintenance for a cluster should not be this complicated and 
should not be changing so often. To me, this is a major flaw in Cassandra.
 
 
Sean Durity
 
From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com] 
Sent: Tuesday, September 19, 2017 2:33 AM
To: user@cassandra.apache.org
Subject: RE: Multi-node repair fails after upgrading to 3.0.14
 
Hi Kurt,
 
thanks for the link!
 
Honestly, a pity, that in 3.0, we can’t get the simple, reliable and 
predictable way back to run a full repair for very low data volume CFs 
being kicked off on all nodes in parallel, without all the magic behind 
the scene introduced by incremental repairs, even if not used, as 
anticompaction even with –full has been introduced with 2.2+ :-)
 
 
Regards,
Thomas
 
From: kurt greaves [mailto:k...@instaclustr.com] 
Sent: Dienstag, 19. September 2017 06:24
To: User 
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
 
https://issues.apache.org/jira/browse/CASSANDRA-13153 implies full repairs
still trigger anti-compaction on non-repaired SSTables (if I'm reading
that right), so you might need to make sure you don't run multiple repairs at
the same time across your nodes (if you're using vnodes), otherwise you could
still end up trying to run anti-compaction on the same SSTable from 2
repairs.
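
A rough sketch of what running them one node at a time can look like (host names, keyspace and table are placeholders):

# full, primary-range repair, strictly sequential across the cluster
for host in node1 node2 node3; do
  ssh "$host" nodetool repair -full -pr my_keyspace my_cf
done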
 
Anyone else feel free to jump in and correct me if my interpretation is 
wrong.
 
On 18 September 2017 at 17:11, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:
Jeff,
 
what should be the expected outcome when running with 3.0.14:
 
nodetool repair –full –pr keyspace cfs
 
· Should –full trigger anti-compaction?
· Should this be the same operation as nodetool repair –pr 
keyspace cfs in 2.1?
· Should I be able to  run this on several nodes in parallel as in 
2.1 without troubles, where incremental repair was not the default?
 
Still confused if I’m missing something obvious. Sorry about that. :-)
 
Thanks,
Thomas
 
From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Montag, 18. September 2017 16:10

To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
 
Sorry I may be wrong about the cause - didn't see -full
 
Mea culpa, its early here and I'm not awake


-- 
Jeff Jirsa
 

On Sep 18, 2017, at 7:01 AM, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:
Hi Jeff,
 
understood. That’s quite a change then coming from 2.1 from an operational 
POV.
 
Thanks again.
 
Thomas
 
From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Montag, 18. September 2017 15:56
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
 
The command you're running will cause anticompaction and the range borders 
for all instances at the same time
 
Since only one repair session can anticompact any given sstable, it's 
almost guaranteed to fail
 
Run it on one instance at a time


-- 
Jeff Jirsa
 

On Sep 18, 2017, at 1:11 AM, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:
Hi Alex,
 
I now ran nodetool repair –full –pr keyspace cfs on all nodes in parallel 
and this may pop up now:
 
0.176.38.128 (progress: 1%)
[2017-09-18 07:59:17,145] Some repair failed
[2017-09-18 07:59:17,151] Repair command #3 finished in 0 seconds
error: Repair job has failed with the error message: [2017-09-18 
07:59:17,145] Some repair failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: 
[2017-09-18 07:59:17,145] Some repair failed
at 
org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
at 
org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
at 

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-19 Thread kurt greaves
You're right of course. Part of the reason it's changing so frequently is
to try and improve repairs so that they at least actually work reliably. C*
3 hasn't been the smoothest ride for repairs. Incremental repairs wasn't
really ready for 3.0 so it was a mistake to make it a default.
Unfortunately it's hard to change that back now as it will just lead to
more confusion and problems for users unaware of the change.

On 20 Sep. 2017 00:25, "Durity, Sean R"  wrote:

> Required maintenance for a cluster should not be this complicated and
> should not be changing so often. To me, this is a major flaw in Cassandra.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
> *Sent:* Tuesday, September 19, 2017 2:33 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> Hi Kurt,
>
>
>
> thanks for the link!
>
>
>
> Honestly, a pity, that in 3.0, we can’t get the simple, reliable and
> predictable way back to run a full repair for very low data volume CFs
> being kicked off on all nodes in parallel, without all the magic behind the
> scene introduced by incremental repairs, even if not used, as
> anticompaction even with –full has been introduced with 2.2+ :-)
>
>
>
>
>
> Regards,
>
> Thomas
>
>
>
> *From:* kurt greaves [mailto:k...@instaclustr.com ]
> *Sent:* Dienstag, 19. September 2017 06:24
> *To:* User 
> *Subject:* Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-13153
> 
> implies full repairs still triggers anti-compaction on non-repaired
> SSTables (if I'm reading that right), so might need to make sure you don't
> run multiple repairs at the same time across your nodes (if your using
> vnodes), otherwise could still end up trying to run anti-compaction on the
> same SSTable from 2 repairs.
>
>
>
> Anyone else feel free to jump in and correct me if my interpretation is
> wrong.
>
>
>
> On 18 September 2017 at 17:11, Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
> Jeff,
>
>
>
> what should be the expected outcome when running with 3.0.14:
>
>
>
> nodetool repair –full –pr keyspace cfs
>
>
>
> · Should –full trigger anti-compaction?
>
> · Should this be the same operation as nodetool repair –pr
> keyspace cfs in 2.1?
>
> · Should I be able to  run this on several nodes in parallel as
> in 2.1 without troubles, where incremental repair was not the default?
>
>
>
> Still confused if I’m missing something obvious. Sorry about that. :-)
>
>
>
> Thanks,
>
> Thomas
>
>
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* Montag, 18. September 2017 16:10
>
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> Sorry I may be wrong about the cause - didn't see -full
>
>
>
> Mea culpa, its early here and I'm not awake
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Sep 18, 2017, at 7:01 AM, Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
> Hi Jeff,
>
>
>
> understood. That’s quite a change then coming from 2.1 from an operational
> POV.
>
>
>
> Thanks again.
>
>
>
> Thomas
>
>
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com ]
> *Sent:* Montag, 18. September 2017 15:56
> *To:* user@cassandra.apache.org
> *Subject:* Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> The command you're running will cause anticompaction and the range borders
> for all instances at the same time
>
>
>
> Since only one repair session can anticompact any given sstable, it's
> almost guaranteed to fail
>
>
>
> Run it on one instance at a time
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Sep 18, 2017, at 1:11 AM, Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
> Hi Alex,
>
>
>
> I now ran nodetool repair –full –pr keyspace cfs on all nodes in parallel
> and this may pop up now:
>
>
>
> 0.176.38.128 (progress: 1%)
>
> [2017-09-18 07:59:17,145] Some repair failed
>
> [2017-09-18 07:59:17,151] Repair command #3 finished in 0 seconds
>
> error: Repair job has failed with the error message: [2017-09-18
> 07:59:17,145] Some repair failed
>
> -- StackTrace --
>
> java.lang.RuntimeException: Repair job has failed with the error message:
> [2017-09-18 07:59:17,145] Some repair failed
>
> at org.apache.cassandra.tools.RepairRunner.progress(
> RepairRunner.java:115)
>
> at org.apache.cassandra.utils.progress.jmx.
> JMXNotificationProgressListener.handleNotification(
> JMXNotificationProgressListener.java:77)
>
> at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.
> 

RE: Reg:- Install / Configure Cassandra on 2 DCs with 3 nodes

2017-09-19 Thread Steinmaurer, Thomas
Nandan,

you may find the following useful.

Slideshare:
https://www.slideshare.net/DataStax/apache-cassandra-multidatacenter-essentials-julien-anguenot-iland-internet-solutions-c-summit-2016

Youtube:
https://www.youtube.com/watch?v=G6od16YKSsA

From a client perspective, if you are targeting quorum requests, be aware that 
there is LOCAL_QUORUM and QUORUM.
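
For example, from cqlsh the difference is only the consistency setting; LOCAL_QUORUM needs a quorum of replicas in the coordinator's DC, while QUORUM counts replicas across all DCs (keyspace/table below are placeholders):

cqlsh> CONSISTENCY LOCAL_QUORUM;   -- quorum within the coordinator's DC only
cqlsh> SELECT * FROM my_ks.my_cf LIMIT 1;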

Regards,
Thomas

From: @Nandan@ [mailto:nandanpriyadarshi...@gmail.com]
Sent: Dienstag, 19. September 2017 18:58
To: user 
Subject: Reg:- Install / Configure Cassandra on 2 DCs with 3 nodes

Hi Techies,

I need to configure Apache Cassandra for my upcoming project on 2 DCs.
Each DC should have 3 nodes.
Details are :-
DC1 nodes --
Node 1 ->10.0.0.1
Node 2 -> 10.0.0.2
Node 3 -> 10.0.0.3
DC2 nodes --
Node 1 -> 10.0.0.4
Node 2 -> 10.0.0.5
Node 3 -> 10.0.0.6

On all nodes , I want to use UBUNTU 16.04 .
Please suggest the best way to configure my DCs, keeping in mind that I may
extend them further in the future.

Best Regards,
Nandan Priyadarshi


Re: Reg:- Install / Configure Cassandra on 2 DCs with 3 nodes

2017-09-19 Thread Nitan Kainth
Nandan,

Use one node from each DC in the seeds parameter on all nodes.
Use the right DC names and DC suffix so that nodes are identified correctly.
Use NetworkTopologyStrategy and an appropriate RF on all keyspaces for both DCs
(see the sketch below).
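
A minimal sketch of the last two points (keyspace name and replication factors are placeholders; the DC names must match what the snitch reports, e.g. in nodetool status):

# cassandra.yaml, seed_provider parameters, on every node: at least one seed per DC, e.g.
#   seeds: "10.0.0.1,10.0.0.4"
cqlsh -e "CREATE KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"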

Please post specific questions if you have any.



Regards,
Nitan K.
Cassandra and Oracle Architect/SME
Datastax Certified Cassandra expert
Oracle 10g Certified

On Tue, Sep 19, 2017 at 11:58 AM, @Nandan@ 
wrote:

> Hi Techies,
>
> I need to configure Apache Cassandra for my upcoming project on 2 DCs.
> Each DC should have 3 nodes.
> Details are :-
> DC1 nodes --
> Node 1 ->10.0.0.1
> Node 2 -> 10.0.0.2
> Node 3 -> 10.0.0.3
> DC2 nodes --
> Node 1 -> 10.0.0.4
> Node 2 -> 10.0.0.5
> Node 3 -> 10.0.0.6
>
> On all nodes , I want to use UBUNTU 16.04 .
> Please suggest the best way to configure my DCs, keeping in mind that I may
> extend them further in the future.
>
> Best Regards,
> Nandan Priyadarshi
>


Reg:- Install / Configure Cassandra on 2 DCs with 3 nodes

2017-09-19 Thread @Nandan@
Hi Techies,

I need to configure Apache Cassandra for my upcoming project on 2 DCs.
Each DC should have 3 nodes.
Details are :-
DC1 nodes --
Node 1 ->10.0.0.1
Node 2 -> 10.0.0.2
Node 3 -> 10.0.0.3
DC2 nodes --
Node 1 -> 10.0.0.4
Node 2 -> 10.0.0.5
Node 3 -> 10.0.0.6

On all nodes , I want to use UBUNTU 16.04 .
Please suggest the best way to configure my DCs, keeping in mind that I may
extend them further in the future.

Best Regards,
Nandan Priyadarshi


RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-19 Thread Anthony P. Scism
unsubscribe

Anthony P. Scism
Info Tech-Risk Mgmt/Client Sys - Capacity Planning
Work: 402-544-0361 Mobile: 402-707-4446



From:   "Durity, Sean R" 
To: "user@cassandra.apache.org" 
Date:   09/19/2017 09:25 AM
Subject:RE: Multi-node repair fails after upgrading to 3.0.14



Required maintenance for a cluster should not be this complicated and 
should not be changing so often. To me, this is a major flaw in Cassandra.
 
 
Sean Durity
 
From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com] 
Sent: Tuesday, September 19, 2017 2:33 AM
To: user@cassandra.apache.org
Subject: RE: Multi-node repair fails after upgrading to 3.0.14
 
Hi Kurt,
 
thanks for the link!
 
Honestly, a pity, that in 3.0, we can’t get the simple, reliable and 
predictable way back to run a full repair for very low data volume CFs 
being kicked off on all nodes in parallel, without all the magic behind 
the scene introduced by incremental repairs, even if not used, as 
anticompaction even with –full has been introduced with 2.2+ :-)
 
 
Regards,
Thomas
 
From: kurt greaves [mailto:k...@instaclustr.com] 
Sent: Dienstag, 19. September 2017 06:24
To: User 
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
 
https://issues.apache.org/jira/browse/CASSANDRA-13153 implies full repairs
still trigger anti-compaction on non-repaired SSTables (if I'm reading
that right), so you might need to make sure you don't run multiple repairs at
the same time across your nodes (if you're using vnodes), otherwise you could
still end up trying to run anti-compaction on the same SSTable from 2
repairs.
 
Anyone else feel free to jump in and correct me if my interpretation is 
wrong.
 
On 18 September 2017 at 17:11, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:
Jeff,
 
what should be the expected outcome when running with 3.0.14:
 
nodetool repair –full –pr keyspace cfs
 
· Should –full trigger anti-compaction?
· Should this be the same operation as nodetool repair –pr 
keyspace cfs in 2.1?
· Should I be able to  run this on several nodes in parallel as in 
2.1 without troubles, where incremental repair was not the default?
 
Still confused if I’m missing something obvious. Sorry about that. :-)
 
Thanks,
Thomas
 
From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Montag, 18. September 2017 16:10

To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
 
Sorry I may be wrong about the cause - didn't see -full
 
Mea culpa, its early here and I'm not awake


-- 
Jeff Jirsa
 

On Sep 18, 2017, at 7:01 AM, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:
Hi Jeff,
 
understood. That’s quite a change then coming from 2.1 from an operational 
POV.
 
Thanks again.
 
Thomas
 
From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Montag, 18. September 2017 15:56
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
 
The command you're running will cause anticompaction and the range borders 
for all instances at the same time
 
Since only one repair session can anticompact any given sstable, it's 
almost guaranteed to fail
 
Run it on one instance at a time


-- 
Jeff Jirsa
 

On Sep 18, 2017, at 1:11 AM, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:
Hi Alex,
 
I now ran nodetool repair –full –pr keyspace cfs on all nodes in parallel 
and this may pop up now:
 
0.176.38.128 (progress: 1%)
[2017-09-18 07:59:17,145] Some repair failed
[2017-09-18 07:59:17,151] Repair command #3 finished in 0 seconds
error: Repair job has failed with the error message: [2017-09-18 
07:59:17,145] Some repair failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: 
[2017-09-18 07:59:17,145] Some repair failed
at 
org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
at 
org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
at 
com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
 
2017-09-18 07:59:17 repair finished
 
 
If running the above nodetool call sequentially on all nodes, repair 
finishes without printing a stack trace.
 
The error message and stack trace isn’t really useful here. Any further 
ideas/experiences?
 
Thanks,
Thomas
 
From: 

Re: Re[6]: Modify keyspace replication strategy and rebalance the nodes

2017-09-19 Thread Jeff Jirsa
The replicas will exist; they just may not have the data we expect them to
have. But the read queries will still get the right data through read
repair, so it'll be fine.


On Tue, Sep 19, 2017 at 5:29 AM, Myron A. Semack 
wrote:

> But if you use ALL, and RF=3, it will be expecting 3 replicas.  After you
> switch to NetworkTopologyStrategy, you potentially only have 1 replica
> until the repair is done.  Won’t the query fail until the repair is done
> because the other two replicas won’t be ready yet (the ALL condition can’t
> be satisfied)?
>
>
>
> Sincerely,
>
> Myron A. Semack
>
>
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* Monday, September 18, 2017 6:02 PM
> *To:* cassandra 
>
> *Subject:* Re: Re[6]: Modify keyspace replication strategy and rebalance
> the nodes
>
>
>
> Using CL:ALL basically forces you to always include the first replica in
> the query.
>
>
>
> The first replica will be the same for both SimpleStrategy/SimpleSnitch
> and NetworkTopologyStrategy/EC2Snitch.
>
>
>
> It's basically the only way we can guarantee we're not going to lose a row
> because it's only written to the second and third replicas while the first
> replica is down, in case the second and third replicas change to different
> hosts (racks / availability zones) during the ALTER.
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Mon, Sep 18, 2017 at 1:57 PM, Myron A. Semack 
> wrote:
>
> How would setting the consistency to ALL help?  Wouldn’t that just cause
> EVERY read/write to fail after the ALTER until the repair is complete?
>
>
>
> Sincerely,
>
> Myron A. Semack
>
>
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* Monday, September 18, 2017 2:42 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Re[6]: Modify keyspace replication strategy and rebalance
> the nodes
>
>
>
> The hard part here is nobody's going to be able to tell you exactly what's
> involved in fixing this because nobody sees your ring
>
>
>
> And since you're using vnodes and have a nontrivial number of instances,
> sharing that ring (and doing anything actionable with it) is nontrivial.
>
>
>
> If you weren't using vnodes, you could just fix the distribution and decom
> extra nodes afterward.
>
>
>
> I thought - but don't have time or energy to check - that the ec2snitch
> would be rack aware even when using simple strategy - if that's not the
> case (as you seem to indicate), then you're in a weird spot - you can't go
> to NTS trivially because doing so will reassign your replicas to be rack/as
> aware, certainly violating your consistency guarantees.
>
>
>
> If you can change your app to temporarily write with ALL and read with
> ALL, and then run repair, then immediately ALTER the keyspace, then run
> repair again, then drop back to whatever consistency you're using, you can
> probably get through it. The challenge is that ALL gets painful if you lose
> any instance.
>
>
>
> But please test in a lab, and note that this is inherently dangerous, I'm
> not advising you to do it, though I do believe it can be made to work.
>
>
>
>
>
>
>
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Sep 18, 2017, at 11:18 AM, Dominik Petrovic  INVALID> wrote:
>
> @jeff what do you think is the best approach here to fix this problem?
> Thank you all for helping me.
>
> Thursday, September 14, 2017 3:28 PM -07:00 from kurt greaves <
> k...@instaclustr.com>:
>
> Sorry, that only applies if you're using NTS. You're right that simple
> strategy won't work very well in this case. To migrate you'll likely need
> to do a DC migration to ensure no downtime, as replica placement will
> change even if RF stays the same.
>
>
>
> On 15 Sep. 2017 08:26, "kurt greaves"  wrote:
>
> If you have racks configured and lose nodes you should replace the node
> with one from the same rack. You then need to repair, and definitely don't
> decommission until you do.
>
>
>
> Also 40 nodes with 256 vnodes is not a fun time for repair.
>
>
>
> On 15 Sep. 2017 03:36, "Dominik Petrovic" 
> wrote:
>
> @jeff,
> I'm using 3 availability zones, during the life of the cluster we lost
> nodes, retired others and we end up having some of the data
> written/replicated on a single availability zone. We saw it with nodetool
> getendpoints.
> Regards
>
> Thursday, September 14, 2017 9:23 AM -07:00 from Jeff Jirsa <
> jji...@gmail.com>:
>
> With one datacenter/region, what did you discover in an outage you think
> you'll solve with network topology strategy? It should be equivalent for a
> single D.C.
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Sep 14, 2017, at 8:47 AM, Dominik Petrovic  INVALID> wrote:
>
> Thank you for the replies!
>
> @jeff my current cluster details are:
> 1 datacenter
> 40 nodes, with vnodes=256
> RF=3
> What is your advice? is it a production cluster, so I need to be very
> careful about it.
> Regards
>
> Thu, 14 

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-19 Thread Steinmaurer, Thomas
Paulo,

as requested: https://issues.apache.org/jira/browse/CASSANDRA-13885

Feel free to adjust any properties of the ticket. Hopefully it gets proper 
attention. Thanks.

Thomas

-Original Message-
From: Paulo Motta [mailto:pauloricard...@gmail.com]
Sent: Dienstag, 19. September 2017 08:56
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14

In 4.0 anti-compaction is no longer run after full repairs, so we should 
probably backport this behavior to 3.0, given there are known limitations with 
incremental repair on 3.0 and non-incremental users may want to keep
running full repairs without the additional cost of anti-compaction.

Would you mind opening a ticket for this?

2017-09-19 1:33 GMT-05:00 Steinmaurer, Thomas
:
> Hi Kurt,
>
>
>
> thanks for the link!
>
>
>
> Honestly, a pity, that in 3.0, we can’t get the simple, reliable and
> predictable way back to run a full repair for very low data volume CFs
> being kicked off on all nodes in parallel, without all the magic
> behind the scene introduced by incremental repairs, even if not used,
> as anticompaction even with –full has been introduced with 2.2+ :-)
>
>
>
>
>
> Regards,
>
> Thomas
>
>
>
> From: kurt greaves [mailto:k...@instaclustr.com]
> Sent: Dienstag, 19. September 2017 06:24
> To: User 
>
>
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-13153 implies full
> repairs still triggers anti-compaction on non-repaired SSTables (if
> I'm reading that right), so might need to make sure you don't run
> multiple repairs at the same time across your nodes (if your using
> vnodes), otherwise could still end up trying to run anti-compaction on the 
> same SSTable from 2 repairs.
>
>
>
> Anyone else feel free to jump in and correct me if my interpretation
> is wrong.
>
>
>
> On 18 September 2017 at 17:11, Steinmaurer, Thomas
>  wrote:
>
> Jeff,
>
>
>
> what should be the expected outcome when running with 3.0.14:
>
>
>
> nodetool repair –full –pr keyspace cfs
>
>
>
> · Should –full trigger anti-compaction?
>
> · Should this be the same operation as nodetool repair –pr keyspace
> cfs in 2.1?
>
> · Should I be able to  run this on several nodes in parallel as in
> 2.1 without troubles, where incremental repair was not the default?
>
>
>
> Still confused if I’m missing something obvious. Sorry about that. :-)
>
>
>
> Thanks,
>
> Thomas
>
>
>
> From: Jeff Jirsa [mailto:jji...@gmail.com]
> Sent: Montag, 18. September 2017 16:10
>
>
> To: user@cassandra.apache.org
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> Sorry I may be wrong about the cause - didn't see -full
>
>
>
> Mea culpa, its early here and I'm not awake
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Sep 18, 2017, at 7:01 AM, Steinmaurer, Thomas
>  wrote:
>
> Hi Jeff,
>
>
>
> understood. That’s quite a change then coming from 2.1 from an
> operational POV.
>
>
>
> Thanks again.
>
>
>
> Thomas
>
>
>
> From: Jeff Jirsa [mailto:jji...@gmail.com]
> Sent: Montag, 18. September 2017 15:56
> To: user@cassandra.apache.org
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> The command you're running will cause anticompaction and the range
> borders for all instances at the same time
>
>
>
> Since only one repair session can anticompact any given sstable, it's
> almost guaranteed to fail
>
>
>
> Run it on one instance at a time
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Sep 18, 2017, at 1:11 AM, Steinmaurer, Thomas
>  wrote:
>
> Hi Alex,
>
>
>
> I now ran nodetool repair –full –pr keyspace cfs on all nodes in
> parallel and this may pop up now:
>
>
>
> 0.176.38.128 (progress: 1%)
>
> [2017-09-18 07:59:17,145] Some repair failed
>
> [2017-09-18 07:59:17,151] Repair command #3 finished in 0 seconds
>
> error: Repair job has failed with the error message: [2017-09-18
> 07:59:17,145] Some repair failed
>
> -- StackTrace --
>
> java.lang.RuntimeException: Repair job has failed with the error message:
> [2017-09-18 07:59:17,145] Some repair failed
>
> at
> org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115
> )
>
> at
> org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListene
> r.handleNotification(JMXNotificationProgressListener.java:77)
>
> at
> com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatch
> Notification(ClientNotifForwarder.java:583)
>
> at
> com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(Cl
> ientNotifForwarder.java:533)
>
> at
> com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(Clie
> ntNotifForwarder.java:452)
>
> at
> com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(
> ClientNotifForwarder.java:108)
>
>
>

RE: Re[6]: Modify keyspace replication strategy and rebalance the nodes

2017-09-19 Thread Myron A. Semack
But if you use ALL, and RF=3, it will be expecting 3 replicas.  After you 
switch to NetworkTopologyStrategy, you potentially only have 1 replica until 
the repair is done.  Won’t the query fail until the repair is done because the 
other two replicas won’t be ready yet (the ALL condition can’t be satisfied)?

Sincerely,
Myron A. Semack


From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Monday, September 18, 2017 6:02 PM
To: cassandra 
Subject: Re: Re[6]: Modify keyspace replication strategy and rebalance the nodes

Using CL:ALL basically forces you to always include the first replica in the 
query.

The first replica will be the same for both SimpleStrategy/SimpleSnitch and 
NetworkTopologyStrategy/EC2Snitch.

It's basically the only way we can guarantee we're not going to lose a row 
because it's only written to the second and third replicas while the first 
replica is down, in case the second and third replicas change to different 
hosts (racks / availability zones) during the ALTER.






On Mon, Sep 18, 2017 at 1:57 PM, Myron A. Semack 
> wrote:
How would setting the consistency to ALL help?  Wouldn’t that just cause EVERY 
read/write to fail after the ALTER until the repair is complete?

Sincerely,
Myron A. Semack

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Monday, September 18, 2017 2:42 PM
To: user@cassandra.apache.org
Subject: Re: Re[6]: Modify keyspace replication strategy and rebalance the nodes

The hard part here is nobody's going to be able to tell you exactly what's 
involved in fixing this because nobody sees your ring

And since you're using vnodes and have a nontrivial number of instances, 
sharing that ring (and doing anything actionable with it) is nontrivial.

If you weren't using vnodes, you could just fix the distribution and decom 
extra nodes afterward.

I thought - but don't have time or energy to check - that the ec2snitch would 
be rack aware even when using simple strategy - if that's not the case (as you 
seem to indicate), then you're in a weird spot - you can't go to NTS trivially 
because doing so will reassign your replicas to be rack/as aware, certainly 
violating your consistency guarantees.

If you can change your app to temporarily write with ALL and read with ALL, and 
then run repair, then immediately ALTER the keyspace, then run repair again, 
then drop back to whatever consistency you're using, you can probably get 
through it. The challenge is that ALL gets painful if you lose any instance.

But please test in a lab, and note that this is inherently dangerous, I'm not 
advising you to do it, though I do believe it can be made to work.
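
A hedged sketch of that sequence (keyspace, DC name and replication factor are placeholders; switching the application to CL ALL and back happens on the client side and is not shown):

# 1. with the app temporarily reading and writing at ALL, repair first
#    (add the -full flag on versions where incremental repair is the default)
nodetool repair my_keyspace
# 2. switch the replication strategy
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};"
# 3. repair again so the newly assigned replicas get their data, then drop the
#    client back to its normal consistency level
nodetool repair my_keyspace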





--
Jeff Jirsa


On Sep 18, 2017, at 11:18 AM, Dominik Petrovic 
> 
wrote:
@jeff what do you think is the best approach here to fix this problem?
Thank you all for helping me.
Thursday, September 14, 2017 3:28 PM -07:00 from kurt greaves 
>:
Sorry, that only applies if you're using NTS. You're right that simple strategy
won't work very well in this case. To migrate you'll likely need to do a DC
migration to ensure no downtime, as replica placement will change even if RF
stays the same.

On 15 Sep. 2017 08:26, "kurt greaves" 
> wrote:
If you have racks configured and lose nodes you should replace the node with 
one from the same rack. You then need to repair, and definitely don't 
decommission until you do.

Also 40 nodes with 256 vnodes is not a fun time for repair.

On 15 Sep. 2017 03:36, "Dominik Petrovic" 
.invalid> wrote:
@jeff,
I'm using 3 availability zones, during the life of the cluster we lost nodes, 
retired others and we end up having some of the data written/replicated on a 
single availability zone. We saw it with nodetool getendpoints.
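That check is presumably along these lines (keyspace, table and partition key are placeholders); it prints the IPs of the replicas owning the given key:

nodetool getendpoints my_keyspace my_table some_partition_key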
Regards
Thursday, September 14, 2017 9:23 AM -07:00 from Jeff Jirsa 
>:
With one datacenter/region, what did you discover in an outage you think you'll 
solve with network topology strategy? It should be equivalent for a single D.C.

--
Jeff Jirsa


On Sep 14, 2017, at 8:47 AM, Dominik Petrovic 
> 
wrote:
Thank you for the replies!

@jeff my current cluster details are:
1 datacenter
40 nodes, with vnodes=256
RF=3
What is your advice? is it a production cluster, so I need to be very careful 
about it.
Regards
Thu, 14 Sep 2017 -2:47:52 -0700 from Jeff Jirsa 
>:
The token distribution isn't going to change - the way Cassandra maps replicas 
will change.

How many data centers/regions will you have when you're done? What's your RF 
now? You definitely need to run 

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-19 Thread Paulo Motta
In 4.0 anti-compaction is no longer run after full repairs, so we
should probably backport this behavior to 3.0, given there are known
limitations with incremental repair on 3.0 and non-incremental users
may want to keep running full repairs without the additional cost
of anti-compaction.

Would you mind opening a ticket for this?

2017-09-19 1:33 GMT-05:00 Steinmaurer, Thomas
:
> Hi Kurt,
>
>
>
> thanks for the link!
>
>
>
> Honestly, a pity, that in 3.0, we can’t get the simple, reliable and
> predictable way back to run a full repair for very low data volume CFs being
> kicked off on all nodes in parallel, without all the magic behind the scene
> introduced by incremental repairs, even if not used, as anticompaction even
> with –full has been introduced with 2.2+ :-)
>
>
>
>
>
> Regards,
>
> Thomas
>
>
>
> From: kurt greaves [mailto:k...@instaclustr.com]
> Sent: Dienstag, 19. September 2017 06:24
> To: User 
>
>
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-13153 implies full repairs
> still triggers anti-compaction on non-repaired SSTables (if I'm reading that
> right), so might need to make sure you don't run multiple repairs at the
> same time across your nodes (if your using vnodes), otherwise could still
> end up trying to run anti-compaction on the same SSTable from 2 repairs.
>
>
>
> Anyone else feel free to jump in and correct me if my interpretation is
> wrong.
>
>
>
> On 18 September 2017 at 17:11, Steinmaurer, Thomas
>  wrote:
>
> Jeff,
>
>
>
> what should be the expected outcome when running with 3.0.14:
>
>
>
> nodetool repair –full –pr keyspace cfs
>
>
>
> · Should –full trigger anti-compaction?
>
> · Should this be the same operation as nodetool repair –pr keyspace
> cfs in 2.1?
>
> · Should I be able to  run this on several nodes in parallel as in
> 2.1 without troubles, where incremental repair was not the default?
>
>
>
> Still confused if I’m missing something obvious. Sorry about that. :-)
>
>
>
> Thanks,
>
> Thomas
>
>
>
> From: Jeff Jirsa [mailto:jji...@gmail.com]
> Sent: Montag, 18. September 2017 16:10
>
>
> To: user@cassandra.apache.org
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> Sorry I may be wrong about the cause - didn't see -full
>
>
>
> Mea culpa, its early here and I'm not awake
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Sep 18, 2017, at 7:01 AM, Steinmaurer, Thomas
>  wrote:
>
> Hi Jeff,
>
>
>
> understood. That’s quite a change then coming from 2.1 from an operational
> POV.
>
>
>
> Thanks again.
>
>
>
> Thomas
>
>
>
> From: Jeff Jirsa [mailto:jji...@gmail.com]
> Sent: Montag, 18. September 2017 15:56
> To: user@cassandra.apache.org
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
>
>
> The command you're running will cause anticompaction and the range borders
> for all instances at the same time
>
>
>
> Since only one repair session can anticompact any given sstable, it's almost
> guaranteed to fail
>
>
>
> Run it on one instance at a time
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Sep 18, 2017, at 1:11 AM, Steinmaurer, Thomas
>  wrote:
>
> Hi Alex,
>
>
>
> I now ran nodetool repair –full –pr keyspace cfs on all nodes in parallel
> and this may pop up now:
>
>
>
> 0.176.38.128 (progress: 1%)
>
> [2017-09-18 07:59:17,145] Some repair failed
>
> [2017-09-18 07:59:17,151] Repair command #3 finished in 0 seconds
>
> error: Repair job has failed with the error message: [2017-09-18
> 07:59:17,145] Some repair failed
>
> -- StackTrace --
>
> java.lang.RuntimeException: Repair job has failed with the error message:
> [2017-09-18 07:59:17,145] Some repair failed
>
> at
> org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
>
> at
> org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
>
> at
> com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
>
> at
> com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
>
> at
> com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
>
> at
> com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
>
>
>
> 2017-09-18 07:59:17 repair finished
>
>
>
>
>
> If running the above nodetool call sequentially on all nodes, repair
> finishes without printing a stack trace.
>
>
>
> The error message and stack trace isn’t really useful here. Any further
> ideas/experiences?
>
>
>
> Thanks,
>
> Thomas
>
>
>
> From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
> Sent: Freitag, 15. September 2017 11:30
> To: 

RE: ConsitencyLevel and Mutations : Behaviour if the update of the commitlog fails

2017-09-19 Thread Leleu Eric
OK, thank you.

From: kurt greaves [mailto:k...@instaclustr.com]
Sent: Tuesday, 19 September 2017 06:35
To: User
Objet : Re: ConsitencyLevel and Mutations : Behaviour if the update of the 
commitlog fails


​Does the coordinator "cancel" the mutation on the "committed" nodes (and how)?
No. Those mutations are applied on those nodes.
Is it a heuristic case where two nodes have the data even though they shouldn't,
and we hope that HintedHandoff will replay the mutation?
Yes. But really you should make sure you recover from this error in your 
client. Hinted handoff might work, but you have no way of knowing if it has 
taken place so if ALL is important you should retry/resolve the failed query 
accordingly.
