Default values are dclocal_read_repair_chance = 0.1 and read_repair_chance = 0. Should I set both to 0?

2019-10-26 Thread Sergio
I have a COLUMN FAMILY in a Keyspace with Replication Factor = 3.
The client reads it with LOCAL_QUORUM. Does this mean that every read
should trigger a read repair or not?
Are these parameters only meaningful with LOCAL_ONE or ONE consistency, then?

I also have an application that translates some data into SSTABLE format and
prepares it to be streamed to the cluster with SSTABLELOADER.
This operation is done to UPDATE the COLUMN FAMILY mentioned above.
Can I avoid repairing the COLUMN FAMILY since the clients are using the
LOCAL_QUORUM consistency level?
If I use LOCAL_ONE, should I repair the table with REAPER, or can I skip
the repair as long as all the nodes are up and running?
Reading the most up-to-date data is not a strict requirement, and I believe a
stale read should be really unlikely, so even with LOCAL_ONE I should not have
concerns and could avoid performing a REPAIR with REAPER.
I would like to achieve consistency while, if possible, avoiding an
expensive repair cycle with REAPER.
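If the decision is to turn the probabilistic read repair off entirely, here is a
minimal sketch of the table-level change (hypothetical keyspace/table names; these
options exist in Cassandra 3.x and were removed in 4.0). Note that a LOCAL_QUORUM
read still repairs any mismatching replicas it actually contacts; these settings
only control the extra background read repair chance.

  -- Hypothetical names (my_ks.my_cf), shown only as an illustration.
  -- Valid table options in Cassandra 3.x; removed in Cassandra 4.0.
  ALTER TABLE my_ks.my_cf
    WITH read_repair_chance = 0.0
     AND dclocal_read_repair_chance = 0.0;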

What do you think about it?

Reference:
https://stackoverflow.com/questions/33240674/reparing-inconsistency-when-read-repair-chance-0
https://www.slideshare.net/DataStax/real-world-tales-of-repair-alexander-dejanovski-the-last-pickle-cassandra-summit-2016
Slide 85: don't repair everything.

Thanks everyone!

Have a great weekend!


Re: Decommissioned Node UNREACHABLE in describecluster but LEFT in gossipinfo

2019-10-26 Thread Sergio Bilello
It disappeared from describecluster after 1 day. It is only in gossipinfo now 
and this looks to be ok :)

On 2019/10/25 04:01:03, Sergio  wrote: 
> Hi guys,
> 
> Cassandra 3.11.4
> 
> nodetool gossipinfo
> /10.1.20.49
>   generation:1571694191
>   heartbeat:279800
>   STATUS:279798:LEFT,-1013739435631815991,1572225050446
>   LOAD:279791:3.4105213781E11
>   SCHEMA:12:5cad59d2-c3d0-3a12-ad10-7578d225b082
>   DC:8:live
>   RACK:10:us-east-1a
>   RELEASE_VERSION:4:3.11.4
>   INTERNAL_IP:6:10.1.20.49
>   RPC_ADDRESS:3:10.1.20.49
>   NET_VERSION:1:11
>   HOST_ID:2:be5a0193-56e7-4d42-8cc8-5d2141ab4872
>   RPC_READY:29:true
>   TOKENS:15:
> 
> The node is not shown in nodetool status
> 
> and it is displayed as UNREACHABLE in nodetool describecluster
> 
> I found this old conversation
> https://grokbase.com/t/cassandra/user/162gwp6pz6/decommissioned-nodes-shows-up-in-nodetool-describecluster-as-unreachable-in-2-1-12-version
> 
> Is there something that I should do to fix this?
> 
> Best,
> 
> Sergio
> 




Re: TWCS and gc_grace_seconds

2019-10-26 Thread Jon Haddad
My coworker Radovan wrote up a post on the relationship between gc grace
and hinted handoff:
https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html
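The short version of that post, as a hedged sketch with a hypothetical table: hints
are stored with a TTL equal to the target table's gc_grace_seconds, so lowering
gc_grace_seconds below max_hint_window_in_ms (3 hours by default) shortens how long
hints for that table can actually be replayed.

  -- Hypothetical table, for illustration only.
  -- With gc_grace_seconds = 3600, hints written for this table expire after
  -- one hour, so a node that is down longer than that must be repaired
  -- rather than relying on hinted handoff.
  ALTER TABLE my_ks.events WITH gc_grace_seconds = 3600;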

Jon

On Sat, Oct 26, 2019 at 6:45 AM Hossein Ghiyasi Mehr 
wrote:

> gc_grace_seconds needs to be changed carefully because it has a side effect
> on hinted handoff.
>
> On Fri, Oct 18, 2019 at 5:04 PM Paul Chandler  wrote:
>
>> Hi Adarsh,
>>
>> You will have problems if you manually delete data when using TWCS.
>>
>> To fully understand why, I recommend reading this The Last Pickle post:
>> https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
>> And this post I wrote that dives deeper into the problems with deletes:
>> http://www.redshots.com/cassandra-twcs-must-have-ttls/
>>
>> Thanks
>>
>> Paul
>>
>> On 18 Oct 2019, at 14:22, Adarsh Kumar  wrote:
>>
>> Thanks Jeff,
>>
>>
>> I just checked with the business and we have differences about having a TTL, so it
>> will always be manual purging. We do not want to use LCS due to high IOs.
>> So:
>>
>> 1. As the use case is a time-series data model, will TWCS give
>> some benefit (without a TTL) and with frequently deleted data?
>> 2. Are there any best practices/recommendations for handling a high number
>> of tombstones?
>> 3. Can we handle this use case with STCS as well (with some
>> configuration)?
>>
>>
>> Thanks in advance
>>
>> Adarsh Kumar
>>
>> On Fri, Oct 18, 2019 at 11:46 AM Jeff Jirsa  wrote:
>>
>>> Is everything in the table TTL’d?
>>>
>>> Do you do explicit deletes before the data is expected to expire ?
>>>
>>> Generally speaking, gcgs exists to prevent data resurrection. But ttl’d
>>> data can’t be resurrected once it expires, so gcgs has no purpose unless
>>> you’re deleting it before the ttl expires. If you’re doing that, twcs won’t
>>> be able to drop whole sstables anyway, so maybe LCS will be less disk usage
>>> (but much higher IO)
>>>
>>> On Oct 17, 2019, at 10:36 PM, Adarsh Kumar  wrote:
>>>
>>> 
>>> Hi,
>>>
>>> We have a use case of time series data with TTL where we want to use
>>> TimeWindowCompactionStrategy because of its better management for TTL and
>>> tombstones. In this case, data we have is frequently deleted so we want to
>>> reduce gc_grace_seconds to reduce the tombstones' life and reduce pressure
>>> on storage. I have following questions:
>>>
>>> 1. Do we always need to run repair for the table within the reduced
>>> gc_grace_seconds window, or is there any other way to manage repairs in this case?
>>> 2. Do we have any other strategy (or combination of strategies) to
>>> manage frequently deleted time-series data?
>>>
>>> Thanks in advance.
>>>
>>> Adarsh Kumar
>>>
>>>
>>
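For reference, a minimal sketch of the kind of table discussed above (hypothetical
names, not from the thread): TWCS with a table-level TTL so whole SSTables can
expire, plus a reduced gc_grace_seconds. As the linked posts explain, explicit
deletes into such a table undermine whole-SSTable expiry.

  -- Hypothetical keyspace/table, shown only as an illustration.
  CREATE TABLE my_ks.sensor_readings (
      sensor_id uuid,
      ts        timestamp,
      value     double,
      PRIMARY KEY (sensor_id, ts)
  ) WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                       'compaction_window_unit': 'DAYS',
                       'compaction_window_size': '1'}
    AND default_time_to_live = 2592000  -- 30 days; lets TWCS drop whole expired SSTables
    AND gc_grace_seconds = 10800;       -- 3 hours; keep repairs within this window if you delete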


Re: Repair Issues

2019-10-26 Thread Ben Mills
Thanks Ghiyasi.

On Sat, Oct 26, 2019 at 9:17 AM Hossein Ghiyasi Mehr 
wrote:

> If the problem still exists and all nodes are up, restart them one by one.
> Then try to repair one node. After that, repair the other nodes one by one.
>
> On Fri, Oct 25, 2019 at 12:56 AM Ben Mills  wrote:
>
>>
>> Thanks Jon!
>>
>> This is very helpful - allow me to follow-up and ask a question.
>>
>> (1) Yes, incremental repairs will never be used (unless it becomes viable
>> in Cassandra 4.x someday).
>> (2) I hear you on the JVM - will look into that.
>> (3) Been looking at Cassandra version 3.11.x though was unaware that 3.7
>> is considered non-viable for production use.
>>
>> For (4) - Question/Request:
>>
>> Note that with:
>>
>> -XX:MaxRAMFraction=2
>>
>> the actual amount of memory allocated for heap space is effectively 2Gi
>> (i.e. half of the 4Gi allocated on the machine type). We can definitely
>> increase memory (for heap and nonheap), though can you expand a bit on your
>> heap comment to help my understanding (as this is such a small cluster with
>> such a small amount of data at rest)?
>>
>> Thanks again.
>>
>> On Thu, Oct 24, 2019 at 5:11 PM Jon Haddad  wrote:
>>
>>> There are some major warning signs for me with your environment. A 4GB heap
>>> is too low, and Cassandra 3.7 isn't something I would put into production.
>>>
>>> Your surface area for problems is massive right now.  Things I'd do:
>>>
>>> 1. Never use incremental repair.  Seems like you've already stopped
>>> doing them, but it's worth mentioning.
>>> 2. Upgrade to the latest JVM, that version's way out of date.
>>> 3. Upgrade to Cassandra 3.11.latest (we're voting on 3.11.5 right now).
>>> 4. Increase memory to 8GB minimum, preferably 12.
>>>
>>> I usually don't like making a bunch of changes without knowing the root
>>> cause of a problem, but in your case there are so many potential problems I
>>> don't think it's worth digging into, especially since the problem might be
>>> one of the 500 or so bugs that were fixed since this release.
>>>
>>> Once you've done those things it'll be easier to narrow down the problem.
>>>
>>> Jon
>>>
>>>
>>> On Thu, Oct 24, 2019 at 4:59 PM Ben Mills  wrote:
>>>
 Hi Sergio,

 No, not at this time.

 It was in use with this cluster previously, and while there were no
 reaper-specific issues, it was removed to help simplify investigation of
 the underlying repair issues I've described.

 Thanks.

 On Thu, Oct 24, 2019 at 4:21 PM Sergio 
 wrote:

> Are you using Cassandra reaper?
>
> On Thu, Oct 24, 2019, 12:31 PM Ben Mills  wrote:
>
>> Greetings,
>>
>> Inherited a small Cassandra cluster with some repair issues and need
>> some advice on recommended next steps. Apologies in advance for a long
>> email.
>>
>> Issue:
>>
>> Intermittent repair failures on two non-system keyspaces.
>>
>> - platform_users
>> - platform_management
>>
>> Repair Type:
>>
>> Full, parallel repairs are run on each of the three nodes every five
>> days.
>>
>> Repair command output for a typical failure:
>>
>> [2019-10-18 00:22:09,109] Starting repair command #46, repairing
>> keyspace platform_users with repair options (parallelism: parallel, 
>> primary
>> range: false, incremental: false, job threads: 1, ColumnFamilies: [],
>> dataCenters: [], hosts: [], # of ranges: 12)
>> [2019-10-18 00:22:09,242] Repair session
>> 5282be70-f13d-11e9-9b4e-7f6db768ba9a for range
>> [(-1890954128429545684,2847510199483651721],
>> (8249813014782655320,-8746483007209345011],
>> (4299912178579297893,6811748355903297393],
>> (-8746483007209345011,-8628999431140554276],
>> (-5865769407232506956,-4746990901966533744],
>> (-4470950459111056725,-1890954128429545684],
>> (4001531392883953257,4299912178579297893],
>> (6811748355903297393,6878104809564599690],
>> (6878104809564599690,8249813014782655320],
>> (-4746990901966533744,-4470950459111056725],
>> (-8628999431140554276,-5865769407232506956],
>> (2847510199483651721,4001531392883953257]] failed with error [repair
>> #5282be70-f13d-11e9-9b4e-7f6db768ba9a on platform_users/access_tokens_v2,
>> [(-1890954128429545684,2847510199483651721],
>> (8249813014782655320,-8746483007209345011],
>> (4299912178579297893,6811748355903297393],
>> (-8746483007209345011,-8628999431140554276],
>> (-5865769407232506956,-4746990901966533744],
>> (-4470950459111056725,-1890954128429545684],
>> (4001531392883953257,4299912178579297893],
>> (6811748355903297393,6878104809564599690],
>> (6878104809564599690,8249813014782655320],
>> (-4746990901966533744,-4470950459111056725],
>> (-8628999431140554276,-5865769407232506956],
>> (2847510199483651721,4001531392883953257]]] Validation failed in 
>> /10.x.x.x
>> (progress: 26%)
>> [2019-10-18 00:22:09,246] Some repair failed

Re: Select statement in batch

2019-10-26 Thread Hossein Ghiyasi Mehr
Hello,
A batch isn't for SELECT queries; it's for transactional (write) queries. If you
want to read data, you should use a plain SELECT query (prepared or simple, etc.).
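A small sketch of the distinction, using hypothetical tables: a CQL BATCH may only
contain INSERT, UPDATE and DELETE statements, so reads always run as standalone
SELECTs.

  -- Hypothetical tables, for illustration only.
  -- Related writes can be grouped so they are applied together:
  BEGIN BATCH
      INSERT INTO my_ks.users (user_id, name) VALUES (123, 'allen');
      INSERT INTO my_ks.users_by_name (name, user_id) VALUES ('allen', 123);
  APPLY BATCH;

  -- Reads are issued on their own, outside any batch:
  SELECT user_id, name FROM my_ks.users WHERE user_id = 123;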

On Fri, Oct 11, 2019 at 5:50 PM Inquistive allen 
wrote:

> Hello Team,
>
> Wanted to understand the impact of using a select statement inside a
> batch.
> I keep seeing some slow queries in the logs frequently.
>
> Please comment on what the impact of this may be. Is it the right
> practice? Will a select statement in a batch lead to higher read
> latency than a normal prepared select statement?
>
> Thanks,
> Allen
>


Re: TWCS and gc_grace_seconds

2019-10-26 Thread Hossein Ghiyasi Mehr
gc_grace_seconds needs to be changed carefully because it has a side effect on
hinted handoff.

On Fri, Oct 18, 2019 at 5:04 PM Paul Chandler  wrote:

> Hi Adarsh,
>
> You will have problems if you manually delete data when using TWCS.
>
> To fully understand why, I recommend reading this The Last Pickle post:
> https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
> And this post I wrote that dives deeper into the problems with deletes:
> http://www.redshots.com/cassandra-twcs-must-have-ttls/
>
> Thanks
>
> Paul
>
> On 18 Oct 2019, at 14:22, Adarsh Kumar  wrote:
>
> Thanks Jeff,
>
>
> I just checked with the business and we have differences about having a TTL, so it
> will always be manual purging. We do not want to use LCS due to high IOs.
> So:
>
> 1. As the use case is a time-series data model, will TWCS give
> some benefit (without a TTL) and with frequently deleted data?
> 2. Are there any best practices/recommendations for handling a high number
> of tombstones?
> 3. Can we handle this use case with STCS as well (with some
> configuration)?
>
>
> Thanks in advance
>
> Adarsh Kumar
>
> On Fri, Oct 18, 2019 at 11:46 AM Jeff Jirsa  wrote:
>
>> Is everything in the table TTL’d?
>>
>> Do you do explicit deletes before the data is expected to expire ?
>>
>> Generally speaking, gcgs exists to prevent data resurrection. But ttl’d
>> data can’t be resurrected once it expires, so gcgs has no purpose unless
>> you’re deleting it before the ttl expires. If you’re doing that, twcs won’t
>> be able to drop whole sstables anyway, so maybe LCS will be less disk usage
>> (but much higher IO)
>>
>> On Oct 17, 2019, at 10:36 PM, Adarsh Kumar  wrote:
>>
>> 
>> Hi,
>>
>> We have a use case of time series data with TTL where we want to use
>> TimeWindowCompactionStrategy because of its better management for TTL and
>> tombstones. In this case, data we have is frequently deleted so we want to
>> reduce gc_grace_seconds to reduce the tombstones' life and reduce pressure
>> on storage. I have following questions:
>>
>> 1. Do we always need to run repair for the table within the reduced
>> gc_grace_seconds window, or is there any other way to manage repairs in this case?
>> 2. Do we have any other strategy (or combination of strategies) to
>> manage frequently deleted time-series data?
>>
>> Thanks in advance.
>>
>> Adarsh Kumar
>>
>>
>


Re: Repair Issues

2019-10-26 Thread Hossein Ghiyasi Mehr
If the problem still exists and all nodes are up, restart them one by one.
Then try to repair one node. After that, repair the other nodes one by one.

On Fri, Oct 25, 2019 at 12:56 AM Ben Mills  wrote:

>
> Thanks Jon!
>
> This is very helpful - allow me to follow-up and ask a question.
>
> (1) Yes, incremental repairs will never be used (unless it becomes viable
> in Cassandra 4.x someday).
> (2) I hear you on the JVM - will look into that.
> (3) Been looking at Cassandra version 3.11.x though was unaware that 3.7
> is considered non-viable for production use.
>
> For (4) - Question/Request:
>
> Note that with:
>
> -XX:MaxRAMFraction=2
>
> the actual amount of memory allocated for heap space is effectively 2Gi
> (i.e. half of the 4Gi allocated on the machine type). We can definitely
> increase memory (for heap and nonheap), though can you expand a bit on your
> heap comment to help my understanding (as this is such a small cluster with
> such a small amount of data at rest)?
>
> Thanks again.
>
> On Thu, Oct 24, 2019 at 5:11 PM Jon Haddad  wrote:
>
>> There are some major warning signs for me with your environment. A 4GB heap
>> is too low, and Cassandra 3.7 isn't something I would put into production.
>>
>> Your surface area for problems is massive right now.  Things I'd do:
>>
>> 1. Never use incremental repair.  Seems like you've already stopped doing
>> them, but it's worth mentioning.
>> 2. Upgrade to the latest JVM, that version's way out of date.
>> 3. Upgrade to Cassandra 3.11.latest (we're voting on 3.11.5 right now).
>> 4. Increase memory to 8GB minimum, preferably 12.
>>
>> I usually don't like making a bunch of changes without knowing the root
>> cause of a problem, but in your case there's so many potential problems I
>> don't think it's worth digging into, especially since the problem might be
>> one of the 500 or so bugs that were fixed since this release.
>>
>> Once you've done those things it'll be easier to narrow down the problem.
>>
>> Jon
>>
>>
>> On Thu, Oct 24, 2019 at 4:59 PM Ben Mills  wrote:
>>
>>> Hi Sergio,
>>>
>>> No, not at this time.
>>>
>>> It was in use with this cluster previously, and while there were no
>>> reaper-specific issues, it was removed to help simplify investigation of
>>> the underlying repair issues I've described.
>>>
>>> Thanks.
>>>
>>> On Thu, Oct 24, 2019 at 4:21 PM Sergio 
>>> wrote:
>>>
 Are you using Cassandra reaper?

 On Thu, Oct 24, 2019, 12:31 PM Ben Mills  wrote:

> Greetings,
>
> Inherited a small Cassandra cluster with some repair issues and need
> some advice on recommended next steps. Apologies in advance for a long
> email.
>
> Issue:
>
> Intermittent repair failures on two non-system keyspaces.
>
> - platform_users
> - platform_management
>
> Repair Type:
>
> Full, parallel repairs are run on each of the three nodes every five
> days.
>
> Repair command output for a typical failure:
>
> [2019-10-18 00:22:09,109] Starting repair command #46, repairing
> keyspace platform_users with repair options (parallelism: parallel, 
> primary
> range: false, incremental: false, job threads: 1, ColumnFamilies: [],
> dataCenters: [], hosts: [], # of ranges: 12)
> [2019-10-18 00:22:09,242] Repair session
> 5282be70-f13d-11e9-9b4e-7f6db768ba9a for range
> [(-1890954128429545684,2847510199483651721],
> (8249813014782655320,-8746483007209345011],
> (4299912178579297893,6811748355903297393],
> (-8746483007209345011,-8628999431140554276],
> (-5865769407232506956,-4746990901966533744],
> (-4470950459111056725,-1890954128429545684],
> (4001531392883953257,4299912178579297893],
> (6811748355903297393,6878104809564599690],
> (6878104809564599690,8249813014782655320],
> (-4746990901966533744,-4470950459111056725],
> (-8628999431140554276,-5865769407232506956],
> (2847510199483651721,4001531392883953257]] failed with error [repair
> #5282be70-f13d-11e9-9b4e-7f6db768ba9a on platform_users/access_tokens_v2,
> [(-1890954128429545684,2847510199483651721],
> (8249813014782655320,-8746483007209345011],
> (4299912178579297893,6811748355903297393],
> (-8746483007209345011,-8628999431140554276],
> (-5865769407232506956,-4746990901966533744],
> (-4470950459111056725,-1890954128429545684],
> (4001531392883953257,4299912178579297893],
> (6811748355903297393,6878104809564599690],
> (6878104809564599690,8249813014782655320],
> (-4746990901966533744,-4470950459111056725],
> (-8628999431140554276,-5865769407232506956],
> (2847510199483651721,4001531392883953257]]] Validation failed in /10.x.x.x
> (progress: 26%)
> [2019-10-18 00:22:09,246] Some repair failed
> [2019-10-18 00:22:09,248] Repair command #46 finished in 0 seconds
>
> Additional Notes:
>
> Repairs encounter above failures more often than not. Sometimes on one
> node only,