RE: losing data when saving data from Java

2019-10-18 Thread adrien ruffie
Thanks Jeff 🙂

But if the client saves data too fast through the Cassandra repository, and Cassandra
cannot keep up and inserts more slowly, what is the behavior? Does Cassandra store the
overflow in an additional buffer? Is it guaranteed that no data is lost on Cassandra's side?

Thanks a lot.

Adrian

From: Jeff Jirsa
Sent: Saturday, 19 October 2019 00:41
To: cassandra
Subject: Re: losing data when saving data from Java

There is no buffer in cassandra that is known to (or suspected to) lose 
acknowledged writes if it's overwhelmed.

There may be a client bug where you send so many async writes that they 
overwhelm a bounded queue, or otherwise get dropped or timeout, but those would 
be client bugs, and I'm not sure this list can help you with them.



On Fri, Oct 18, 2019 at 3:16 PM adrien ruffie
<adriennolar...@hotmail.fr> wrote:
Hello all,

I have a Cassandra table into which I quickly insert Java entities,
about 15,000 entries per minute. But when the process ends, I only
have, for example, 199,921 entries instead of 312,212.
If I truncate the table and relaunch the process, I get a different count each
time, for example 199,354 or 189,012 entries ... never a fixed number of saved entries ...

Several coworkers told me they had heard about a buffer which can sometimes be
overwhelmed, losing several entities queued for insertion ...
Is that right?
I don't understand why these inserts are being lost ...
My Java code is very simple, like below:

myEntitiesList.forEach(myEntity -> {
    try {
        myEntitiesRepository.save(myEntity).subscribe();
    } catch (Exception e) {
        e.printStackTrace();
    }
});

And the repository is:
public interface MyEntityRepository extends
ReactiveCassandraRepository<MyEntity, String> {
}


Has someone already heard about this problem?

Thank you very much and best regards

Adrian


Re: Cassandra Repair question

2019-10-18 Thread Krish Donald
Thanks Manish,

What is the best and fastest way to repair a table using nodetool repair?
We are using 256 vnodes.


On Fri, Oct 18, 2019 at 10:05 PM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> No, it will only cover the primary ranges of the nodes in that single rack. Repair
> with the -pr option has to be run on all nodes, in a rolling manner.
>
> Regards
> Manish
>
> On 19 Oct 2019 10:03, "Krish Donald"  wrote:
>
>> Hi Cassandra experts,
>>
>>
>> We are on Cassandra 3.11.1.
>>
>> We have to run repairs for a big cluster.
>>
>> We have 2 DCs.
>>
>> 3 RACs in each DC.
>>
>> Replication factor is 3 for each datacenter .
>>
>> So if I run repair on all nodes of a single  RAC with "pr" option then
>> ideally it will cover all the ranges.
>>
>> Please correct my understanding.
>>
>>
>> Thanks
>>
>>
>>


Re: Cassandra Repair question

2019-10-18 Thread manish khandelwal
No, it will only cover the primary ranges of the nodes in that single rack. Repair
with the -pr option has to be run on all nodes, in a rolling manner.
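
For example, a sketch of that rolling procedure, assuming an illustrative
keyspace named my_keyspace; -full is shown only to force a full rather than
incremental repair. Run it on one node, wait for it to finish, then move on to
the next node until every node in both DCs has run it:

nodetool repair -full -pr my_keyspace

-pr restricts each node to the token ranges it owns as a primary replica, which
is why the ring is only fully covered once every node has run it.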

Regards
Manish

On 19 Oct 2019 10:03, "Krish Donald"  wrote:

> Hi Cassandra experts,
>
>
> We are on Cassandra 3.11.1.
>
> We have to run repairs for a big cluster.
>
> We have 2 DCs.
>
> 3 RACs in each DC.
>
> Replication factor is 3 for each datacenter .
>
> So if I run repair on all nodes of a single  RAC with "pr" option then
> ideally it will cover all the ranges.
>
> Please correct my understanding.
>
>
> Thanks
>
>
>


Cassandra Repair question

2019-10-18 Thread Krish Donald
Hi Cassandra experts,


We are on Cassandra 3.11.1.

We have to run repairs for a big cluster.

We have 2 DCs.

3 RACs in each DC.

Replication factor is 3 for each datacenter .

So if I run repair on all nodes of a single RAC with the "pr" option, then
ideally it will cover all the ranges.

Please correct my understanding.


Thanks


GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

2019-10-18 Thread Sergio Bilello
Hello!

Is it still better to use ParNew + CMS than G1GC these days?

Any recommendations for a read-heavy workload on i3.xlarge nodes?


Thanks,

Sergio




Cassandra Recommended System Settings

2019-10-18 Thread Sergio Bilello
Hello everyone!

Do you have any setting that you would change or tweak from the below list?

sudo cat /proc/4379/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            unlimited            unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             32768                32768                processes
Max open files            1048576              1048576              files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       unlimited            unlimited            signals
Max msgqueue size         unlimited            unlimited            bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

These are the sysctl settings
default['cassandra']['sysctl'] = {
  'net.ipv4.tcp_keepalive_time' => 60,
  'net.ipv4.tcp_keepalive_probes' => 3,
  'net.ipv4.tcp_keepalive_intvl' => 10,
  'net.core.rmem_max' => 16777216,
  'net.core.wmem_max' => 16777216,
  'net.core.rmem_default' => 16777216,
  'net.core.wmem_default' => 16777216,
  'net.core.optmem_max' => 40960,
  'net.ipv4.tcp_rmem' => '4096 87380 16777216',
  'net.ipv4.tcp_wmem' => '4096 65536 16777216',
  'net.ipv4.ip_local_port_range' => '1 65535',
  'net.ipv4.tcp_window_scaling' => 1,
  'net.core.netdev_max_backlog' => 2500,
  'net.core.somaxconn' => 65000,
  'vm.max_map_count' => 1048575,
  'vm.swappiness' => 0
}

Am I missing something else?

Do you have any experience configuring CentOS 7 for the following?

Check Java Hugepages settings
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html#CheckJavaHugepagessettings

Optimize SSDs
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html#OptimizeSSDs

https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html

We are using AWS i3.xlarge instances

Thanks,

Sergio




Re: losing data when saving data from Java

2019-10-18 Thread Jeff Jirsa
There is no buffer in cassandra that is known to (or suspected to)
lose acknowledged writes if it's overwhelmed.

There may be a client bug where you send so many async writes that they
overwhelm a bounded queue, or otherwise get dropped or timeout, but those
would be client bugs, and I'm not sure this list can help you with them.
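
For illustration, a minimal sketch of one way to rule the client side out,
assuming the stack implied by the quoted code below (Spring Data's
ReactiveCassandraRepository on top of Project Reactor): bound the number of
in-flight writes and wait for all of them to complete, so that a dropped or
timed-out insert surfaces as an error instead of silently disappearing. The
names come from the quoted code; the concurrency limit of 16 is illustrative.

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Sketch only: myEntitiesList / myEntitiesRepository are the names used below.
Flux.fromIterable(myEntitiesList)
    .flatMap(myEntity -> myEntitiesRepository.save(myEntity)
            .doOnError(Throwable::printStackTrace)   // print the failure, like the original code
            .onErrorResume(ex -> Mono.empty()),      // skip the failed entity and keep going
        16)                                          // at most 16 writes in flight
    .then()
    .block();                                        // wait until every save has completed

With the original fire-and-forget subscribe(), nothing waits for the writes to
finish and an asynchronous failure never reaches the surrounding try/catch,
which would match the symptoms described.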



On Fri, Oct 18, 2019 at 3:16 PM adrien ruffie 
wrote:

> Hello all,
>
> I have a Cassandra table into which I quickly insert Java entities,
> about 15,000 entries per minute. But when the process ends, I only
> have, for example, 199,921 entries instead of 312,212.
> If I truncate the table and relaunch the process, I get a different count
> each time, for example 199,354 or 189,012 entries ... never a fixed number
> of saved entries ...
>
> Several coworkers told me they had heard about a buffer which can sometimes
> be overwhelmed, losing several entities queued for insertion ...
> Is that right?
> I don't understand why these inserts are being lost ...
> My Java code is very simple, like below:
>
> myEntitiesList.forEach(myEntity -> {
>     try {
>         myEntitiesRepository.save(myEntity).subscribe();
>     } catch (Exception e) {
>         e.printStackTrace();
>     }
> });
>
> And the repository is:
> public interface MyEntityRepository extends ReactiveCassandraRepository<MyEntity, String> {
> }
>
>
> Has someone already heard about this problem?
>
> Thank you very much and best regards
>
> Adrian
>


losing data when saving data from Java

2019-10-18 Thread adrien ruffie
Hello all,

I have a Cassandra table into which I quickly insert Java entities,
about 15,000 entries per minute. But when the process ends, I only
have, for example, 199,921 entries instead of 312,212.
If I truncate the table and relaunch the process, I get a different count each
time, for example 199,354 or 189,012 entries ... never a fixed number of saved entries ...

Several coworkers told me they had heard about a buffer which can sometimes be
overwhelmed, losing several entities queued for insertion ...
Is that right?
I don't understand why these inserts are being lost ...
My Java code is very simple, like below:

myEntitiesList.forEach(myEntity -> {
    try {
        myEntitiesRepository.save(myEntity).subscribe();
    } catch (Exception e) {
        e.printStackTrace();
    }
});

And the repository is:
public interface MyEntityRepository extends
ReactiveCassandraRepository<MyEntity, String> {
}


Has someone already heard about this problem?

Thank you very much and best regards

Adrian


Re: TWCS and gc_grace_seconds

2019-10-18 Thread Paul Chandler
Hi Adarsh,

You will have problems if you manually delete data when using TWCS.

To fully understand why, I recommend reading this post from The Last Pickle: 
https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
And this post I wrote that dives deeper into the problems with deletes: 
http://www.redshots.com/cassandra-twcs-must-have-ttls/
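
For reference, a minimal sketch of the TTL-driven pattern those posts describe,
where data expires via a table-level TTL instead of being deleted explicitly;
the table name, columns, window size, TTL and gc_grace_seconds values are
purely illustrative:

CREATE TABLE sensor_readings (
    sensor_id    text,
    reading_time timestamp,
    value        double,
    PRIMARY KEY (sensor_id, reading_time)
) WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': 1 }
  AND default_time_to_live = 2592000  -- 30 days: rows expire, no explicit DELETEs
  AND gc_grace_seconds = 10800;       -- only safe to lower if nothing is ever deleted explicitly

Once every row in an SSTable has passed its TTL, TWCS can drop the whole file,
which is what the approach in those posts relies on.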

Thanks 

Paul

> On 18 Oct 2019, at 14:22, Adarsh Kumar  wrote:
> 
> Thanks Jeff,
> 
> 
> I just checked with the business and we have differences of opinion about
> having a TTL, so it will always be manual purging. We do not want to use LCS
> due to its high IO.
> So:
> 1. As the use case is a time-series data model, will TWCS still give some
> benefit without TTL and with frequently deleted data?
> 2. Are there any best practices/recommendations for handling a high number of
> tombstones?
> 3. Can we handle this use case with STCS as well (with some configuration)?
> 
> Thanks in advance
> 
> Adarsh Kumar
> 
> On Fri, Oct 18, 2019 at 11:46 AM Jeff Jirsa  wrote:
> Is everything in the table TTL’d? 
> 
> Do you do explicit deletes before the data is expected to expire ? 
> 
> Generally speaking, gcgs exists to prevent data resurrection. But ttl’d data 
> can’t be resurrected once it expires, so gcgs has no purpose unless you’re 
> deleting it before the ttl expires. If you’re doing that, twcs won’t be able 
> to drop whole sstables anyway, so maybe LCS will be less disk usage (but much 
> higher IO)
> 
>> On Oct 17, 2019, at 10:36 PM, Adarsh Kumar  wrote:
>> 
>> 
>> Hi,
>> 
>> We have a use case of time series data with TTL where we want to use 
>> TimeWindowCompactionStrategy because of its better management for TTL and 
>> tombstones. In this case, data we have is frequently deleted so we want to 
>> reduce gc_grace_seconds to reduce the tombstones' life and reduce pressure 
>> on storage. I have the following questions:
>> 1. Do we always need to run repair for the table within the reduced
>> gc_grace_seconds, or is there another way to manage repairs in this case?
>> 2. Do we have any other strategy (or combination of strategies) to manage
>> frequently deleted time-series data?
>> Thanks in advance.
>> 
>> Adarsh Kumar



Re: TWCS and gc_grace_seconds

2019-10-18 Thread Adarsh Kumar
Thanks Jeff,


I just checked with the business and we have differences of opinion about
having a TTL, so it will always be manual purging. We do not want to use LCS
due to its high IO.
So:

   1. As the use case is a time-series data model, will TWCS still give some
   benefit without TTL and with frequently deleted data?
   2. Are there any best practices/recommendations for handling a high number
   of tombstones?
   3. Can we handle this use case with STCS as well (with some configuration)?


Thanks in advance

Adarsh Kumar

On Fri, Oct 18, 2019 at 11:46 AM Jeff Jirsa  wrote:

> Is everything in the table TTL’d?
>
> Do you do explicit deletes before the data is expected to expire ?
>
> Generally speaking, gcgs exists to prevent data resurrection. But ttl’d
> data can’t be resurrected once it expires, so gcgs has no purpose unless
> you’re deleting it before the ttl expires. If you’re doing that, twcs won’t
> be able to drop whole sstables anyway, so maybe LCS will be less disk usage
> (but much higher IO)
>
> On Oct 17, 2019, at 10:36 PM, Adarsh Kumar  wrote:
>
> 
> Hi,
>
> We have a use case of time series data with TTL where we want to use
> TimeWindowCompactionStrategy because of its better management for TTL and
> tombstones. In this case, data we have is frequently deleted so we want to
> reduce gc_grace_seconds to reduce the tombstones' life and reduce pressure
> on storage. I have the following questions:
>
>    1. Do we always need to run repair for the table within the reduced
>    gc_grace_seconds, or is there another way to manage repairs in this case?
>    2. Do we have any other strategy (or combination of strategies) to
>    manage frequently deleted time-series data?
>
> Thanks in advance.
>
> Adarsh Kumar
>
>