Re: consultant recommendations

2018-06-29 Thread Evelyn Smith
Hey Randy,

Instaclustr provides consulting services for Cassandra as well as managed 
services if you are looking to offload the admin burden.

https://www.instaclustr.com/services/cassandra-consulting/ 


Alternatively, send me an email at evelyn.ba...@instaclustr.com and I’d be 
happy to chase this up on Monday with the head of consulting (it’s Friday night 
my time).

Cheers,
Evelyn.

> On 30 Jun 2018, at 2:26 am, Randy Lynn  wrote:
> 
> Having some OOM issues. Would love to get feedback from the group on what 
> companies/consultants you might use?
> 
> -- 
> Randy Lynn 
> rl...@getavail.com
> 
> office: 859.963.1616 ext 202
> 163 East Main Street - Lexington, KY 40507 - USA
> 
> getavail.com


Re: Tombstone

2018-06-19 Thread Evelyn Smith
If you're using TimeWindowCompactionStrategy and don’t delete the data, you 
should be relying on Cassandra to drop the SSTables once the data inside has 
expired.

That 18% is probably waiting on gc_grace. This shouldn’t be an issue if you are 
letting TWCS drop the data rather than running deletes.
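
For reference, here’s a rough sketch of the kind of table definition I mean 
(table and column names are made up, and the window/TTL values are placeholders 
to adjust to your batch size). With a default TTL on the table and no explicit 
deletes, TWCS can drop whole expired SSTables instead of leaving tombstones:

-- Sketch only: one partition per hourly batch, rows expire via TTL instead of deletes.
CREATE TABLE events_by_hour (
    bucket    text,        -- e.g. '2018-06-19T08'
    event_id  timeuuid,
    payload   text,
    PRIMARY KEY (bucket, event_id)
) WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'HOURS',
        'compaction_window_size': '1' }
  AND default_time_to_live = 86400     -- 24h; whole SSTables are dropped once expired
  AND gc_grace_seconds = 10800;        -- only lower this if you really stop issuing deletes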

Regards,
Evelyn.

> On 19 Jun 2018, at 8:28 pm, Abhishek Singh  wrote:
> 
> Hi all,
>We are using Cassandra for storing time-series events for batch 
> processing. Once a particular hourly batch is processed we delete the 
> entries, but we were left with almost 18% of deletes marked as tombstones.
>  I ran compaction on the particular CF but the tombstone count 
> didn't come down.
> Can anyone suggest the optimal tuning/recommended practice for 
> compaction strategy and gc_grace period with 100k entries and deletes every 
> hour?
> 
> Warm Regards
> Abhishek Singh





Re: Size of a single Data Row?

2018-06-10 Thread Evelyn Smith
Hi Ralph,

Yes, having partitions of 100 MB will seriously hit your performance, but 
usually that issue matters most for people handling large numbers of 
transactions and aiming for low latency. My understanding is that 2 GB is the 
maximum for a column value: beyond that the system would start to fail, but 
well before that you are going to see a significant performance hit (for most 
use cases).

I think an important question for you is: are you going to be reading these 
files from Cassandra regularly? It sounds like something S3 or Hadoop might be 
more appropriate for.

The other option, if your XML files have some structure, is to extract the 
data from them and store it that way.

One final point: I’m pretty sure a TEXT type won’t hold a 10 MB file, let alone 
a 1 GB file; I think the max size is something like 64K characters.
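
If you do go the S3/Hadoop route, here’s a rough sketch of what I’d keep in 
Cassandra instead (table and column names are just made up for illustration): 
store a pointer to the object plus some metadata rather than the file itself, 
so your existence checks never touch a large cell.

CREATE TABLE documents_by_day (
    created    text,     -- partition key, e.g. '2018-06-10'
    id         text,     -- clustering key, unique per document
    size_bytes bigint,
    s3_key     text,     -- pointer to the object in S3/HDFS, not the file itself
    PRIMARY KEY (created, id)
);

-- "Is there data for this date?" stays cheap because no large cells are read:
SELECT id, s3_key FROM documents_by_day WHERE created = '2018-06-10';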

Regards,
Eevee.

> On 10 Jun 2018, at 7:54 pm, Ralph Soika  wrote:
> 
> Hi, 
> I have a general question concerning the Cassandra technology. I have already 
> read 2 books but I am more and more confused about whether Cassandra is the 
> right technology. My goal is to store business data from a workflow engine 
> into Cassandra. I want to use Cassandra as a kind of archive service because 
> of its fault-tolerant and decentralized approach. 
> 
> But here are two things which are confusing me. On the one hand the project 
> claims that a single column value can be 2 GB (1 MB is recommended). On the 
> other hand people explain that a partition should not be larger than 100MB. 
> 
> I plan only one single simple table: 
> 
> CREATE TABLE documents ( 
>created text, 
>id text, 
>data text, 
>PRIMARY KEY (created,id) 
> ); 
> 
> 'created' is the partition key holding the date in ISO format (YYYY-MM-DD). 
> The 'id' is a clustering key and is unique. 
> 
> But my 'data' column holds an XML document with business data. This cell 
> contains a lot of unstructured data and also media data. The data cell will 
> be between 1 and 10 MB, BUT it can also hold more than 100 MB and less than 
> 2 GB in some cases. 
> 
> Is Cassandra able to handle this kind of table? Or is Cassandra at the end 
> not recommended for this kind of data? 
> 
> For example I would like to ask if data for a specific date is available : 
> 
> SELECT created,id WHERE created = '2018-06-10' 
> 
> I select without the data column and just ask if data exists. Is the 
> performance automatically poor only because the data cell (not a primary key) 
> of some rows is greater than 100 MB? Or is Cassandra running out of heap 
> space in any case? It is perfectly clear that it makes no sense to select 
> multiple cells which each contain over 100 MB of data in one single query. 
> But this is a fundamental problem and has nothing to do with Cassandra. My 
> Java application running in WildFly would also not be able to handle a data 
> result with multiple GB of data. But I would expect that I can select a set 
> of keys just to decide whether to load one single data cell. 
> 
> Cassandra seems like a great system. But many people seem to claim that it is 
> only suitable for mapping a user status list a la Facebook. Is this true? 
> Thanks for your comments in advance. 
> 
> 
> 
> 
> === 
> Ralph 
> 



Re: Single Host: Fix "Unknown CF" issue

2018-06-07 Thread Evelyn Smith
Hey Michael,

For future reference: in the case that you have a production cluster set up 
with multiple nodes, assuming you have RF > 1, it’s easier to just replace the 
broken node and restore its data.

I wasn’t sure at the time whether “view” was referring to a materialised view, 
although Pradeep’s comment along with your own suggests it might be (I didn’t 
get a chance to look through the code to confirm whether the view was an MV or 
something else, and I’m not that familiar with the code base).

As for the choice of using materialised views: they aren’t being deprecated, 
but they are currently marked as experimental and most people strongly advise 
against using them. If you can avoid them, do so. They’re associated with a lot 
of bugs and scalability issues, and they’re just hard to get right if you 
aren’t exceptionally familiar with Cassandra.
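
To illustrate the alternative (a rough sketch with invented names, not a 
drop-in for your schema): rather than a materialised view, keep a second table 
keyed the other way and have the application write to both, e.g. in a logged 
batch so both writes eventually land together.

CREATE TABLE users_by_id (
    user_id uuid PRIMARY KEY,
    email   text,
    name    text
);

-- Hand-maintained lookup table in place of a materialised view:
CREATE TABLE users_by_email (
    email   text PRIMARY KEY,
    user_id uuid,
    name    text
);

-- The application writes both copies itself (same values to each table):
BEGIN BATCH
    INSERT INTO users_by_id (user_id, email, name)
        VALUES (0b6f0ab2-6a2f-4a5b-9a2e-2f8d0c7f1a11, 'alice@example.com', 'Alice');
    INSERT INTO users_by_email (email, user_id, name)
        VALUES ('alice@example.com', 0b6f0ab2-6a2f-4a5b-9a2e-2f8d0c7f1a11, 'Alice');
APPLY BATCH;

A logged batch guarantees both writes are eventually applied, but it isn’t 
isolation, and you take on keeping the two tables in sync on updates and 
deletes; that’s exactly the work the MV was doing for you, just now visible and 
debuggable.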

Regards,
Evelyn.

> On 7 Jun 2018, at 3:05 am, Pradeep Chhetri  wrote:
> 
> Hi Michael,
> 
> We have faced the same situation as yours in our production environment where 
> we suddenly got "Unknown CF Exception" for materialized views too. We are 
> using Lagom apps with cassandra for persistence. In our case, since these 
> views can be regenerated from the original events, we were able to safely 
> recover.
> 
> Few suggestions from my operations experience:
> 
> 1) Upgrade your cassandra cluster to 3.11.2 because there are lots of bug 
> fixes specific to materialized views.
> 2) Never let your application create/update/delete Cassandra tables or 
> materialized views. Always create them manually to make sure that only one 
> connection is doing the operation.
> 
> Regards,
> Pradeep
> 
> 
> 
> On Wed, Jun 6, 2018 at 9:44 PM, m...@vis.at wrote:
> Hi Evelyn,
> 
> thanks a lot for your detailed response message.
> 
> The data is not important. We've already wiped the data and created a new 
> cassandra installation. The data re-import task is already running. We've 
> lost the data for a couple of months but in this case this does not matter.
> 
> Nevertheless we will try what you told us - just to be smarter/faster if this 
> happens in production (where we will setup a cassandra cluster with multiple 
> cassandra nodes anyway). I will drop you a note when we are done.
> 
> Hmmm... the problem is within a "View". Are these the materialized views?
> 
> I'm asking this because:
> * Someone on the internet (Stack Overflow if I recall correctly) mentioned 
> that materialized views are going to be deprecated.
> * I was at a DataStax workshop in Zurich a couple of days ago where a 
> DataStax employee told me that we should not use materialized views - it is 
> better to create & fill all tables directly.
> 
> Would you also recommend not to use materialized views? As this problem is 
> related to a view - maybe we could avoid this problem simply by following 
> this recommendation.
> 
> Thanks a lot again!
> 
> Greetings,
> Michael
> 
> 
> 
> 
> On 06.06.2018 16:48, Evelyn Smith wrote:
> Hi Michael,
> 
> So I looked at the code, here are some stages of your error message:
> 1. at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292)
> [apache-cassandra-3.11.0.jar:3.11.0
>  At this step Cassandra is running through the keyspaces in it’s
> schema turning off compactions for all tables before it starts
> rerunning the commit log (so it isn’t an issue with the commit log).
> 2. at org.apache.cassandra.db.Keyspace.open(Keyspace.java:127)
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>  Loading key space related to the column family that is erroring out
> 3. at org.apache.cassandra.db.Keyspace.(Keyspace.java:324)
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>  Cassandra has initialised the column family and is reloading the view
> 4. at
> org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:204)
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>  At this point I haven’t had enough time to tell if Cassandra is
> requesting info on a column specifically or still requesting
> information on a column family. Regardless, given we already rule out
> issues with the SSTables and their directory and Cassandra is yet to
> start processing the commit log this to me suggests it’s something
> wrong in one of the system keyspaces storing the schema information.
> 
> There should definitely be a way to resolve this with zero data loss
> by either:
> 1. Fixing the issue in the system keyspace SSTables (hard)
> 2. Rerunning the commit log on a new Cassandra node that has been
> restored from the current one (I’m not sure if this is possible but
> I’ll figure it out tomorrow)
> 
> The alternative is if you are ok with losing the commitlog then you
> can back up the data and restore it to a new node.

Re: Single Host: Fix "Unknown CF" issue

2018-06-06 Thread Evelyn Smith
Hi Michael,

So I looked at the code, here are some stages of your error message:
1. at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
[apache-cassandra-3.11.0.jar:3.11.0
At this step Cassandra is running through the keyspaces in its schema, 
turning off compactions for all tables before it starts replaying the commit 
log (so it isn’t an issue with the commit log).
2. at org.apache.cassandra.db.Keyspace.open(Keyspace.java:127) 
~[apache-cassandra-3.11.0.jar:3.11.0]
Loading the keyspace related to the column family that is erroring out.
3. at org.apache.cassandra.db.Keyspace.(Keyspace.java:324) 
~[apache-cassandra-3.11.0.jar:3.11.0]
Cassandra has initialised the column family and is reloading the view
4. at org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:204) 
~[apache-cassandra-3.11.0.jar:3.11.0]
At this point I haven’t had enough time to tell if Cassandra is 
requesting info on a column specifically or still requesting information on a 
column family. Regardless, given we’ve already ruled out issues with the 
SSTables and their directory, and Cassandra is yet to start processing the 
commit log, this suggests to me that something is wrong in one of the system 
keyspaces storing the schema information.
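
If you want to check that yourself, a read-only thing to try (assuming 3.x, 
where the schema lives in the system_schema keyspace; I’m writing the column 
names from memory so double-check them) is to look up which table or view the 
unknown id belongs to:

-- Does the unknown CF id belong to a regular table or to a view?
SELECT keyspace_name, table_name, id FROM system_schema.tables;
SELECT keyspace_name, view_name, base_table_name, id FROM system_schema.views;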

There should definitely be a way to resolve this with zero data loss by either:
1. Fixing the issue in the system keyspace SSTables (hard)
2. Rerunning the commit log on a new Cassandra node that has been restored from 
the current one (I’m not sure if this is possible but I’ll figure it out 
tomorrow)

The alternative, if you are OK with losing the commitlog, is to back up the 
data and restore it to a new node (or the same node but with everything blown 
away). This isn’t a trivial process, though I’ve done it a few times.

How important is the data?

Happy to come back to this tomorrow (need some sleep)

Regards,
Eevee.




> On 5 Jun 2018, at 7:32 pm, m...@vis.at wrote:
> 
> Keyspace.getColumnFamilyStore



Re: Single Host: Fix "Unknown CF" issue

2018-06-05 Thread Evelyn Smith
Hey Michael,

I have a hunch.

If the system doesn’t recognise the column family that is stopping the node 
from starting, perhaps try copying the column family directory to a backup and 
then deleting it.

Then restart Cassandra. If it starts I’ll assume the schema didn’t have the 
column family:
* Create the column family again (be careful to create it exactly how it was in 
the original schema);
* Stop Cassandra again;
* Move the SSTables from the column family backup into the new column family 
folder (you have to do this as the column family folder will have a UUID in 
its name that will have changed); and
* Restart Cassandra.
You should now have Cassandra running without losing your data.

If Cassandra doesn’t restart after deleting the column family directory then 
just restore it from the backup and you are back to square one.

Regards,
Evelyn.

> On 5 Jun 2018, at 7:32 pm, m...@vis.at wrote:
> 
> Hi all!
> 
> We've been using Cassandra for a couple of months to get familiar with it. 
> We're currently using only 1 node. Yesterday our server had to be restarted 
> and now Cassandra does not start anymore.
> 
> It reports:
> INFO  [main] 2018-06-05 09:50:43,030 ColumnFamilyStore.java:406 - 
> Initializing system_schema.indexes
> INFO  [main] 2018-06-05 09:50:43,036 ViewManager.java:137 - Not submitting 
> build tasks for views in keyspace system_schema as storage service is not 
> initialized
> INFO  [main] 2018-06-05 09:50:43,283 ColumnFamilyStore.java:406 - 
> Initializing system_traces.events
> INFO  [main] 2018-06-05 09:50:43,286 ColumnFamilyStore.java:406 - 
> Initializing system_traces.sessions
> INFO  [main] 2018-06-05 09:50:43,287 ViewManager.java:137 - Not submitting 
> build tasks for views in keyspace system_traces as storage service is not 
> initialized
> INFO  [main] 2018-06-05 09:50:43,300 ColumnFamilyStore.java:406 - 
> Initializing m2m_auth.user
> INFO  [main] 2018-06-05 09:50:43,302 ColumnFamilyStore.java:406 - 
> Initializing m2m_auth.eventsbytag1
> INFO  [main] 2018-06-05 09:50:43,306 ColumnFamilyStore.java:406 - 
> Initializing m2m_auth.mail2user
> ERROR [main] 2018-06-05 09:50:43,311 CassandraDaemon.java:706 - Exception 
> encountered during startup
> java.lang.IllegalArgumentException: Unknown CF 
> 0f6c8b36-5f34-11e8-a476-c93745f84272
>at 
> org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:204) 
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>at 
> org.apache.cassandra.db.view.ViewManager.addView(ViewManager.java:152) 
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>at 
> org.apache.cassandra.db.view.ViewManager.reload(ViewManager.java:125) 
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>at org.apache.cassandra.db.Keyspace.(Keyspace.java:324) 
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>at org.apache.cassandra.db.Keyspace.open(Keyspace.java:127) 
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>at org.apache.cassandra.db.Keyspace.open(Keyspace.java:104) 
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
> [apache-cassandra-3.11.0.jar:3.11.0]
>at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600)
>  [apache-cassandra-3.11.0.jar:3.11.0]
>at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) 
> [apache-cassandra-3.11.0.jar:3.11.0]
> 
> We do not know how to bring it up again, and e.g. Stack Overflow only 
> recommends deleting the node and letting it rebuild (which we cannot do as we 
> currently have only 1 node).
> Can anybody drop us a hint ?
> 
> What can we do if our node does not start?
> 
> We've found an entry in the data directory with the given UUID, but at the 
> time we restarted the node the whole keyspace had been idle for a couple of 
> hours.
> Given this problem we're now _very_ concerned about data safety, as we read 
> that it can be a problem if e.g. tables are created/deleted/updated 
> concurrently. But in our case we did not create/update tables concurrently 
> and got this problem anyway.
> 
> Thanks for any help!
> 
> greetings,
> Michael
> 
> 
> 
> 





Re: Adding new nodes to cluster to speedup pending compactions

2018-04-27 Thread Evelyn Smith
Hi Mikhail,

There are a few ways to speed up compactions in the short term:
- nodetool setcompactionthroughput 0
This will unthrottle compactions but obviously unthrottling compactions puts 
you at risk of high latency while compactions are running.
- nodetool setconcurrentcompactors 2
You usually want to set this to the lower of the number of disks or cores. If 
you are using SSDs you want to use the number of cores, which for d2.xlarge 
looks like 2 virtual cores.
- nodetool disablebinary
You can use this to stop an individual node from acting as coordinator.
This will let the node focus on catching up on compactions, and you can use it 
if one or two nodes have significantly higher pending compactions than the rest 
of the cluster.
- nodetool disablegossip / disablethrift
Same logic as above except for accepting writes, and you can only do it for 
~2-2.5 hours or you risk inconsistent data by missing the hinted handoff 
period.

Long term solutions:
- Consider switching instance type
The nodes you are using are storage optimised. They have very little processing 
power, which is what compactions need. Also, the AWS documentation seems to 
suggest this instance type has HDDs, not SSDs. Are you sure you actually have 
SSDs? That makes a big difference.
- Add nodes
The data will redistribute over more nodes and each node will be responsible 
for fewer compactions (less data ~= less compaction).
- If it’s a batch load make Spark do it
My impression is that you want to batch load from Cassandra to Elasticsearch 
after batch loading from Spark to Cassandra. If that is the case, why not get 
Spark to do the batch load if it already has the data (maybe I’m 
misinterpreting what you are doing).
- Consider throttling Spark when it batch loads to Cassandra
If Cassandra gets overwhelmed it can start acting up. Keep an eye out for lots 
of undersized SSTables; they might be a sign that Cassandra is running out of 
memory during the batch load and flushing lots of little memtables to disk as 
SSTables to conserve memory.

Some final follow up questions:
- What is the purpose of this cluster?
Is it to support BAU, run daily analytics, or even an occasional one-time 
cluster spun up for some analysis before being spun down? This info helps a 
lot in understanding where you can make concessions.
 - What is the flow of data and what are the timing requirements?

Cheers,
Eevee.

> On 28 Apr 2018, at 3:54 am, Mikhail Tsaplin  wrote:
> 
> The cluster has 5 nodes of d2.xlarge AWS type (32GB RAM, Attached SSD disks), 
> Cassandra 3.0.9.
> Increased compaction throughput from 16 to 200 - active compaction remaining 
> time decreased.
> What will happen if another node joins the cluster? Will the existing nodes 
> move part of their SSTables to the new node unchanged, so that compaction 
> time is reduced?
> 
> 
> 
> $ nodetool cfstats -H  dump_es
>   
>  
> Keyspace: table_b
> Read Count: 0
> Read Latency: NaN ms.
> Write Count: 0
> Write Latency: NaN ms.
> Pending Flushes: 0
> Table: table_b
> SSTable count: 18155
> Space used (live): 1.2 TB
> Space used (total): 1.2 TB
> Space used by snapshots (total): 0 bytes
> Off heap memory used (total): 3.62 GB
> SSTable Compression Ratio: 0.20371982719658258
> Number of keys (estimate): 712032622
> Memtable cell count: 0
> Memtable data size: 0 bytes
> Memtable off heap memory used: 0 bytes
> Memtable switch count: 0
> Local read count: 0
> Local read latency: NaN ms
> Local write count: 0
> Local write latency: NaN ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 2.22 GB
> Bloom filter off heap memory used: 2.56 GB
> Index summary off heap memory used: 357.51 MB
> Compression metadata off heap memory used: 724.97 MB
> Compacted partition minimum bytes: 771 bytes
> Compacted partition maximum bytes: 1.55 MB
> Compacted partition mean bytes: 3.47 KB
> Average live cells per slice (last five minutes): NaN
> Maximum live cells per slice (last five minutes): 0
> Average tombstones per slice (last five minutes): NaN
> Maximum tombstones per slice (last five minutes): 0
> 
> 
> 2018-04-27 22:21 GMT+07:00 Nicolas Guyomar  >:
> Hi Mikhail,
> 
> Could you please provide :
> - your cluster version/topology (number of nodes, cpu, r

Re: Solr/DSE Spark

2018-04-12 Thread Evelyn Smith
Cassandra tends to be used in a lot of web applications. Its loads are more 
natural and evenly distributed, like people logging on throughout the day, and 
the people operating it tend to be latency sensitive.

Spark, on the other hand, will try to complete its tasks as quickly as 
possible. This might mean bulk reading from Cassandra at 10 times the usual 
operations load, but for only, say, 5 minutes every half hour (however long it 
takes to read in the data for a job, and whenever that job is run). In this 
case, during those 5 minutes your normal operations work (customers) is going 
to experience a lot of latency.

This even happens with streaming jobs: every time Spark goes to interact with 
Cassandra it does so very quickly, hammers it for reads, and then does its own 
stuff until it needs to write things out. This might equate to intermittent 
latency spikes.

In theory, you can throttle your reads and writes but I don’t know much about 
this and don’t see people actually doing it.

Regards,
Evelyn.

> On 12 Apr 2018, at 4:30 pm, sha p  wrote:
> 
> Evelyn,
> Can you please elaborate on below
> Spark is notorious for causing latency spikes in Cassandra, which is not great 
> if you are sensitive to that. 
> 
> 
> On Thu, 12 Apr 2018, 10:46 Evelyn Smith <u5015...@gmail.com> wrote:
> Are you building a search engine -> Solr
> Are you building an analytics function -> Spark
> 
> I feel they are used in significantly different use cases, what are you 
> trying to build?
> 
> If it’s an analytics functionality that’s separate from your operations 
> functionality I’d build it in its own DC. Spark is notorious for causing 
> latency spikes in Cassandra, which is not great if you are sensitive to 
> that. 
> 
> Regards,
> Evelyn.
>> On 12 Apr 2018, at 6:55 am, kooljava2 <koolja...@yahoo.com.INVALID> wrote:
>> 
>> Hello,
>> 
>> We are exploring configuring Solr/Spark. Wanted to get input on this.
>> 1) How do we decide which one to use?
>> 2) Do we run this on a DC where there is less workload?
>> 
>> Any other suggestion or comments are appreciated.
>> 
>> Thank you.
>> 
> 



Re: Solr/DSE Spark

2018-04-11 Thread Evelyn Smith
Are you building a search engine -> Solr
Are you building an analytics function -> Spark

I feel they are used in significantly different use cases, what are you trying 
to build?

If it’s an analytics functionality that’s separate from your operations 
functionality I’d build it in its own DC. Spark is notorious for causing 
latency spikes in Cassandra, which is not great if you are sensitive to that. 
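
In practice that’s just a keyspace replicated into a second DC (DC names below 
are made up; use whatever your snitch reports), with the Spark/Solr workers 
connecting only to the analytics DC so their bulk reads stay off the 
operational nodes:

-- Sketch: DC_OPS serves the latency-sensitive application traffic,
-- DC_ANALYTICS is the only DC the Spark/Solr workers connect to.
CREATE KEYSPACE my_app WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC_OPS': 3,
    'DC_ANALYTICS': 2
};

For an existing keyspace the same replication map goes into an ALTER KEYSPACE, 
followed by rebuilding the new DC’s nodes so they stream in the data.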

Regards,
Evelyn.
> On 12 Apr 2018, at 6:55 am, kooljava2  wrote:
> 
> Hello,
> 
> We are exploring configuring Solr/Spark. Wanted to get input on this.
> 1) How do we decide which one to use?
> 2) Do we run this on a DC where there is less workload?
> 
> Any other suggestion or comments are appreciated.
> 
> Thank you.
> 



Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Not sure if it differs for SASI secondary indexes, but my understanding is that 
it’s a bad idea to use high-cardinality columns for secondary indexes. 
Not sure what your data model looks like, but I’d assume a UUID would have very 
high cardinality.

If that’s the case it pretty much guarantees any query on the secondary index 
will hit all the nodes, which is what you want to avoid.

Also, secondary indexes are generally bad for Cassandra; if you don’t need 
them, or there's a way around using them, I’d go with that.
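
If the UUID lookup for matching rows from other systems is all the index is 
doing, one hedged alternative (names invented for illustration) is a small 
manually maintained lookup table, so the occasional debugging query reads one 
partition instead of fanning out to every node:

-- Main table, keyed however the application reads it day to day:
CREATE TABLE rows_by_pk (
    pk          uuid PRIMARY KEY,
    external_id uuid,
    payload     text
);

-- Reverse mapping maintained by the application instead of a SASI index:
CREATE TABLE pk_by_external_id (
    external_id uuid PRIMARY KEY,
    pk          uuid
);

-- The manual lookup becomes two cheap single-partition reads:
SELECT pk FROM pk_by_external_id WHERE external_id = 8e2f6c2a-91a4-4f0e-b0d4-0d6f5a2c3b77;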

Regards,
Eevee.

> On 5 Apr 2018, at 11:27 pm, Zsolt Pálmai  wrote:
> 
> Tried both (although with the biggest table) and the result is the same. 
> 
> I stumbled upon this jira issue: 
> https://issues.apache.org/jira/browse/CASSANDRA-12662 
> <https://issues.apache.org/jira/browse/CASSANDRA-12662>
> Since the SASI indexes I use are only helping in debugging (for now) I 
> dropped them and it seems the tables get compacted now (at least it made it 
> further than before and the JVM metrics look healthy). 
> 
> Still this is not ideal as it would be nice to have those secondary indexes 
> :/ . 
> 
> The columns I indexed are basically uuids (so I can match the rows from other 
> systems but this is usually triggered manually so performance loss is 
> acceptable). 
> Is there a recommended index to use here? Or setting the 
> max_compaction_flush_memory_in_mb value? I saw that it can cause different 
> kinds of problems... Or the default secondary index?
> 
> Thanks
> 
> 
> 
> 2018-04-05 15:14 GMT+02:00 Evelyn Smith <u5015...@gmail.com>:
> Probably a dumb question but it’s good to clarify.
> 
> Are you compacting the whole keyspace or are you compacting tables one at a 
> time?
> 
> 
>> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai <zpal...@gmail.com> wrote:
>> 
>> Hi!
>> 
>> I have a setup with 4 AWS nodes (m4xlarge - 4 cpu, 16gb ram, 1TB ssd each) 
>> and when running the nodetool compact command on any of the servers I get 
>> out of memory exception after a while.
>> 
>> - Before calling the compact first I did a repair and before that there was 
>> a bigger update on a lot of entries so I guess a lot of sstables were 
>> created. The repair created around ~250 pending compaction tasks, 2 of the 
>> nodes I managed to finish with upgrading to a 2xlarge machine and twice the 
>> heap (but running the compact on them manually also killed one :/ so this 
>> isn't an ideal solution)
>> 
>> Some more info: 
>> - Version is the newest 3.11.2 with java8u116
>> - Using LeveledCompactionStrategy (we have mostly reads)
>> - Heap size is set to 8GB
>> - Using G1GC
>> - I tried moving the memtable out of the heap. It helped but I still got an 
>> OOM last night
>> - Concurrent compactors is set to 1 but it still happens and also tried 
>> setting throughput between 16 and 128, no changes.
>> - Storage load is 127Gb/140Gb/151Gb/155Gb
>> - 1 keyspace, 16 tables but there are a few SASI indexes on big tables.
>> - The biggest partition I found was 90Mb but that table has only 2 sstables 
>> attached and compacts in seconds. The rest is mostly 1 line partition with a 
>> few 10KB of data.
>> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0, 
>> 0, 0, 0]
>> 
>> In the metrics it looks something like this before dying: 
>> https://ibb.co/kLhdXH <https://ibb.co/kLhdXH>
>> 
>> What the heap dump looks like of the top objects: https://ibb.co/ctkyXH 
>> <https://ibb.co/ctkyXH>
>> 
>> The load is usually pretty low, the nodes are almost idling (avg 500 
>> reads/sec, 30-40 writes/sec with occasional few second spikes with >100 
>> writes) and the pending tasks is also around 0 usually.
>> 
>> Any ideas? I'm starting to run out of ideas. Maybe the secondary indexes 
>> cause problems? I could finish some bigger compactions where there was no 
>> index attached but I'm not sure 100% if this is the cause.
>> 
>> Thanks,
>> Zsolt
>> 
>> 
>> 
> 
> 



Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Oh, and second: are you attempting a major compaction while you have all those 
pending compactions?

Try letting the cluster catch up on compactions. Having that many pending is 
bad.

If you have a replication factor of 3 and quorum consistency you could go node 
by node: disable binary, raise concurrent compactors to 4, and unthrottle 
compactions by setting throughput to zero. This can help it catch up on those 
compactions. Then you can deal with trying a major compaction.

Regards,
Evelyn.

> On 5 Apr 2018, at 11:14 pm, Evelyn Smith  wrote:
> 
> Probably a dumb question but it’s good to clarify.
> 
> Are you compacting the whole keyspace or are you compacting tables one at a 
> time?
> 
>> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai <zpal...@gmail.com> wrote:
>> 
>> Hi!
>> 
>> I have a setup with 4 AWS nodes (m4xlarge - 4 cpu, 16gb ram, 1TB ssd each) 
>> and when running the nodetool compact command on any of the servers I get 
>> out of memory exception after a while.
>> 
>> - Before calling the compact first I did a repair and before that there was 
>> a bigger update on a lot of entries so I guess a lot of sstables were 
>> created. The repair created around ~250 pending compaction tasks, 2 of the 
>> nodes I managed to finish with upgrading to a 2xlarge machine and twice the 
>> heap (but running the compact on them manually also killed one :/ so this 
>> isn't an ideal solution)
>> 
>> Some more info: 
>> - Version is the newest 3.11.2 with java8u116
>> - Using LeveledCompactionStrategy (we have mostly reads)
>> - Heap size is set to 8GB
>> - Using G1GC
>> - I tried moving the memtable out of the heap. It helped but I still got an 
>> OOM last night
>> - Concurrent compactors is set to 1 but it still happens and also tried 
>> setting throughput between 16 and 128, no changes.
>> - Storage load is 127Gb/140Gb/151Gb/155Gb
>> - 1 keyspace, 16 tables but there are a few SASI indexes on big tables.
>> - The biggest partition I found was 90Mb but that table has only 2 sstables 
>> attached and compacts in seconds. The rest is mostly 1 line partition with a 
>> few 10KB of data.
>> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0, 
>> 0, 0, 0]
>> 
>> In the metrics it looks something like this before dying: 
>> https://ibb.co/kLhdXH <https://ibb.co/kLhdXH>
>> 
>> What the heap dump looks like of the top objects: https://ibb.co/ctkyXH 
>> <https://ibb.co/ctkyXH>
>> 
>> The load is usually pretty low, the nodes are almost idling (avg 500 
>> reads/sec, 30-40 writes/sec with occasional few second spikes with >100 
>> writes) and the pending tasks is also around 0 usually.
>> 
>> Any ideas? I'm starting to run out of ideas. Maybe the secondary indexes 
>> cause problems? I could finish some bigger compactions where there was no 
>> index attached but I'm not sure 100% if this is the cause.
>> 
>> Thanks,
>> Zsolt
>> 
>> 
>> 
> 



Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Probably a dumb question but it’s good to clarify.

Are you compacting the whole keyspace or are you compacting tables one at a 
time?

> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai  wrote:
> 
> Hi!
> 
> I have a setup with 4 AWS nodes (m4xlarge - 4 cpu, 16gb ram, 1TB ssd each) 
> and when running the nodetool compact command on any of the servers I get out 
> of memory exception after a while.
> 
> - Before calling the compact first I did a repair and before that there was a 
> bigger update on a lot of entries so I guess a lot of sstables were created. 
> The repair created around ~250 pending compaction tasks, 2 of the nodes I 
> managed to finish with upgrading to a 2xlarge machine and twice the heap (but 
> running the compact on them manually also killed one :/ so this isn't an 
> ideal solution)
> 
> Some more info: 
> - Version is the newest 3.11.2 with java8u116
> - Using LeveledCompactionStrategy (we have mostly reads)
> - Heap size is set to 8GB
> - Using G1GC
> - I tried moving the memtable out of the heap. It helped but I still got an 
> OOM last night
> - Concurrent compactors is set to 1 but it still happens and also tried 
> setting throughput between 16 and 128, no changes.
> - Storage load is 127Gb/140Gb/151Gb/155Gb
> - 1 keyspace, 16 tables but there are a few SASI indexes on big tables.
> - The biggest partition I found was 90Mb but that table has only 2 sstables 
> attached and compacts in seconds. The rest is mostly 1 line partition with a 
> few 10KB of data.
> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0, 
> 0, 0, 0]
> 
> In the metrics it looks something like this before dying: 
> https://ibb.co/kLhdXH 
> 
> What the heap dump looks like of the top objects: https://ibb.co/ctkyXH 
> 
> 
> The load is usually pretty low, the nodes are almost idling (avg 500 
> reads/sec, 30-40 writes/sec with occasional few second spikes with >100 
> writes) and the pending tasks is also around 0 usually.
> 
> Any ideas? I'm starting to run out of ideas. Maybe the secondary indexes 
> cause problems? I could finish some bigger compactions where there was no 
> index attached but I'm not sure 100% if this is the cause.
> 
> Thanks,
> Zsolt
> 
> 
> 



Re: Many SSTables only on one node

2018-04-05 Thread Evelyn Smith
It might not be what caused it here, but check your logs for anti-compactions.

> On 5 Apr 2018, at 8:35 pm, Dmitry Simonov  wrote:
> 
> Thank you!
> I'll check this out.
> 
> 2018-04-05 15:00 GMT+05:00 Alexander Dejanovski  >:
> 40 pending compactions is pretty high and you should have way less than that 
> most of the time, otherwise it means that compaction is not keeping up with 
> your write rate.
> 
> If you indeed have SSDs for data storage, increase your compaction throughput 
> to 100 or 200 (depending on how the CPUs handle the load). You can experiment 
> with compaction throughput using : nodetool setcompactionthroughput 100
> 
> You can raise the number of concurrent compactors as well and set it to a 
> value between 4 and 6 if you have at least 8 cores and CPUs aren't 
> overwhelmed.
> 
> I'm not sure why you ended up with only one node having 6k SSTables and not 
> the others, but you should apply the above changes so that you can lower the 
> number of pending compactions and see if it prevents the issue from happening 
> again.
> 
> Cheers,
> 
> 
> On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov  > wrote:
> Hi, Alexander!
> 
> SizeTieredCompactionStrategy is used for all CFs in problematic keyspace.
> Current compaction throughput is 16 MB/s (default value).
> 
> We always have about 40 pending and 2 active "CompactionExecutor" tasks in 
> "tpstats".
> Mostly because of another (bigger) keyspace in this cluster.
> But the situation is the same on each node.
> 
> According to "nodetool compactionhistory", compactions on this CF run 
> (sometimes several times per day, sometimes one time per day, the last run 
> was yesterday).
> We run "repair -full" regularly for this keyspace (every 24 hours on each 
> node), because gc_grace_seconds is set to 24 hours.
> 
> Should we consider increasing compaction throughput and 
> "concurrent_compactors" (as recommended for SSDs) to keep 
> "CompactionExecutor" pending tasks low?
> 
> 2018-04-05 14:09 GMT+05:00 Alexander Dejanovski  >:
> Hi Dmitry,
> 
> could you tell us which compaction strategy that table is currently using ?
> Also, what is the compaction max throughput and is auto-compaction correctly 
> enabled on that node ?
> 
> Did you recently run repair ?
> 
> Thanks,
> 
> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov  > wrote:
> Hello!
> 
> Could you please give some ideas on the following problem?
> 
> We have a cluster with 3 nodes, running Cassandra 2.2.11.
> 
> We've recently discovered high CPU usage on one cluster node; after some 
> investigation we found that the number of SSTables for one CF on it is very 
> big: 5800 SSTables, while on the other nodes it is 3 SSTables.
> 
> Data size in this keyspace was not very big ~100-200Mb per node.
> 
> There is no such problem with other CFs of that keyspace.
> 
> nodetool compact solved the issue as a quick-fix.
> 
> But I'm wondering, what was the cause? How prevent it from repeating?
> 
> -- 
> Best Regards,
> Dmitry Simonov
> -- 
> -
> Alexander Dejanovski
> France
> @alexanderdeja
> 
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com 
> 
> 
> -- 
> Best Regards,
> Dmitry Simonov
> -- 
> -
> Alexander Dejanovski
> France
> @alexanderdeja
> 
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com 
> 
> 
> -- 
> Best Regards,
> Dmitry Simonov