Re: Should replica placement change after a topology change?

2015-09-11 Thread Robert Coli
On Fri, Sep 11, 2015 at 7:24 AM, Richard Dawe 
wrote:

> Thanks, Nate and Rob. We are going to have to migrate some installations
> from SimpleSnitch to Ec2Snitch, others to GossipingPropertyFileSnitch. Your
> help is much appreciated!
>

If I were operating in a hybrid ec2/non-ec2 environment, I'd use GPFS
everywhere, FWIW.

=Rob


Tag filtering data model

2015-09-11 Thread Artur Siekielski

I store documents submitted by users, with optional tags (lists of strings):

CREATE TABLE doc (
  user_id uuid,
  date text, // part of partition key, to distribute data better
  doc_id uuid,
  tags list<text>,
  contents text,
  PRIMARY KEY((user_id, date), doc_id)
);

What is the best way to implement tag filtering? A user can select a 
list of tags and get documents with the tags. I thought about:


1) Full denormalization - include tags in the primary key and insert a 
doc for each subset of specified tags. This will however lead to large 
disk space usage, because there are 2**n subsets (for 10 tags and a 1MB 
doc 1000MB would be written).


2) Secondary index on 'tags' collection, and using queries like:
SELECT * FROM doc WHERE user_id=? AND date=? AND tags CONTAINS ? AND 
tags CONTAINS ? ...


Since I will supply partition key value, I assume there will be no 
problems with contacting multiple nodes. But how well will it work for 
hundreds of thousands of results? I think intersection of tag matches 
needs to be performed in memory so it will not scale well.


3) Partial denormalization - do inserts for each single tag and then 
manually compute intersection. However in the worst case it can lead to 
scanning almost the whole table.


4) Full denormalization but without contents. I would get correct 
doc_ids fast, then I would need to use '... WHERE doc_id IN ?' with 
potentially a very large list of doc_ids.
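
To put rough numbers on options 1 and 4, here is a small sketch. It assumes one row is written per non-empty tag subset and ignores per-row overhead and the size of the tags themselves; the helper names are made up for illustration.

```python
from itertools import combinations


def tag_subsets(tags):
    """Return every non-empty subset of the given tags as a sorted tuple."""
    ordered = sorted(set(tags))
    return [
        combo
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


def bytes_written(num_tags, payload_bytes):
    """Bytes written when one payload copy is stored per non-empty subset."""
    return (2 ** num_tags - 1) * payload_bytes


MB = 1024 * 1024
full_copy = bytes_written(10, MB)  # option 1: contents repeated in every row
ids_only = bytes_written(10, 16)   # option 4: only a 16-byte uuid per row
print(full_copy // MB, "MB with contents;", ids_only, "bytes with doc_id only")
```

So for a 10-tag document the row count is the same in both options (1023 rows); only the payload per row differs.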



What's Cassandra's way to implement this?
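
For option 3, the client-side intersection could look roughly like the sketch below. It assumes the doc_ids for each single tag have already been fetched into a dict; `ids_by_tag` and `docs_with_all_tags` are hypothetical names, not part of any driver.

```python
def docs_with_all_tags(ids_by_tag, wanted_tags):
    """Intersect per-tag doc_id sets, starting with the smallest set."""
    sets = [set(ids_by_tag.get(tag, ())) for tag in wanted_tags]
    if not sets:
        return set()
    sets.sort(key=len)  # the smallest set bounds the size of the result
    result = sets[0]
    for other in sets[1:]:
        result = result & other
        if not result:
            break  # nothing can match any more, stop early
    return result


ids_by_tag = {"red": {1, 2, 3}, "big": {2, 3}, "old": {3, 4}}
print(docs_with_all_tags(ids_by_tag, ["red", "big", "old"]))
```

Starting from the rarest tag keeps the working set small, but the worst case I mentioned remains: if every selected tag is common, all of their rows still have to be read.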


Re: High CPU usage on some of nodes

2015-09-11 Thread Roman Tkachenko
I have another datapoint from our monitoring system that shows huge
outbound network traffic increase for the affected boxes during these
spikes:

[image: Inline image 1]

Looking at inbound traffic, it is increased on nodes other than these
(purple, yellow and blue) so it does look like some kind of excessive
internode communication is going on between these 3 nodes and the rest of
the cluster.

What could these network spikes be a sign of?


On Thu, Sep 10, 2015 at 12:00 PM, Graham Sanderson  wrote:

> Haven’t been following this thread, but we run beefy machines with 8gig
> new gen, 12 gig old gen (down from 16g since moving memtables off heap, we
> can probably go lower)…
>
> Apart from making sure you have all the latest -XX: flags from
> cassandra-env.sh (and MALLOC_ARENA_MAX), I personally would recommend
> running latest 2.1.x with
>
> memory_allocator: JEMallocAllocator
> memtable_allocation_type: offheap_objects
>
> Some people will probably disagree, but it works great for us (rare long
> pauses sub 2 secs), and if you’re seeing slow GC because of promotion
> failure of objects 131074 dwords big, then I definitely suggest you give it
> a try.
>
> On Sep 10, 2015, at 1:43 PM, Robert Coli  wrote:
>
> On Thu, Sep 10, 2015 at 10:54 AM, Roman Tkachenko 
> wrote:
>>
>> [5 second CMS GC] Is my best shot to play with JVM settings trying to
>> tune garbage collection then?
>>
>
> Yep. As a minor note, if the machines are that beefy, they probably have a
> lot of RAM, you might wish to consider trying G1 GC and a larger heap.
>
> =Rob
>
>
>
>
>


Subscribe again?

2015-09-11 Thread Ahamed, Aadil
I suddenly stopped receiving mails from this mailing list. Do I need to 
subscribe again?

Thanks,
Aadil


Re: High CPU usage on some of nodes

2015-09-11 Thread Graham Sanderson
Again, I haven’t read this thread from the beginning, so I don’t know which node 
is which, but if nodes pause for longish GC, then other nodes will likely be 
saving hints (assuming you are writing at the time), which will then be 
delivered once the machines become responsive again. I’m just guessing though. 
Take a look at the hinting metrics.
> On Sep 11, 2015, at 2:45 PM, Roman Tkachenko  wrote:
> 
> I have another datapoint from our monitoring system that shows huge outbound 
> network traffic increase for the affected boxes during these spikes:
> 
> 
> 
> Looking at inbound traffic, it is increased on nodes other than these 
> (purple, yellow and blue) so it does look like some kind of excessive 
> internode communication is going on between these 3 nodes and the rest of the 
> cluster.
> 
> What could these network spikes be a sign of?
> 
> 
> On Thu, Sep 10, 2015 at 12:00 PM, Graham Sanderson wrote:
> Haven’t been following this thread, but we run beefy machines with 8gig new 
> gen, 12 gig old gen (down from 16g since moving memtables off heap, we can 
> probably go lower)…
> 
> Apart from making sure you have all the latest -XX: flags from 
> cassandra-env.sh (and MALLOC_ARENA_MAX), I personally would recommend running 
> latest 2.1.x with
> 
> memory_allocator: JEMallocAllocator
> memtable_allocation_type: offheap_objects
> 
> Some people will probably disagree, but it works great for us (rare long 
> pauses sub 2 secs), and if you’re seeing slow GC because of promotion failure 
> of objects 131074 dwords big, then I definitely suggest you give it a try.
> 
>> On Sep 10, 2015, at 1:43 PM, Robert Coli wrote:
>> 
>> On Thu, Sep 10, 2015 at 10:54 AM, Roman Tkachenko wrote:
>> [5 second CMS GC] Is my best shot to play with JVM settings trying to tune 
>> garbage collection then?
>> 
>> Yep. As a minor note, if the machines are that beefy, they probably have a 
>> lot of RAM, you might wish to consider trying G1 GC and a larger heap.
>> 
>> =Rob
>> 
>>  
> 
> 





Re: Subscribe again?

2015-09-11 Thread Gene
Check your spam folder.

On Fri, Sep 11, 2015 at 11:55 AM, Ahamed, Aadil  wrote:

> I suddenly stopped receiving mails from this mailing list. Do I need to
> subscribe again?
>
> Thanks,
> Aadil
>


Re: confusion about nodetool cfstats

2015-09-11 Thread Otis Gospodnetić
It's a single node, metric value showing just a single moment in time.
Something with historical data, aggregation across the whole cluster, etc.
may be better.  See SPM for Cassandra or OpsCenter - they come with out of
the box reports for Cassandra.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Sep 10, 2015 at 10:38 PM, Shuo Chen  wrote:

> Sorry to send the previous message.
>
> I want to monitor columnfamily space used with nodetool cfstats. The
> document says,
> Space used (live), bytes: 9592399 ("Space that is measured depends on
> operating system")
>
> Does this metric show space used on one node or on the whole cluster?
>
> If it is just one node, is there a method to retrieve load info on the
> whole cluster?
>
> 
> Shuo Chen
>
>
> On Fri, Sep 11, 2015 at 10:36 AM, Shuo Chen  wrote:
>
>> Hi!
>>
>> I want to monitor columnfamily space used with nodetool cfstats. The
>> document says,
>> Space used (live), bytes: 9592399 ("Space that is measured depends on
>> operating system")
>>
>
>


Re: High CPU usage on some of nodes

2015-09-11 Thread Otis Gospodnetić
A quick and dirty way is to run jstack a few times and see if you can spot
some common methods where code is spending time.
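
A minimal sketch of that approach, assuming `jstack` is on the PATH and that the dumps have the usual `java.lang.Thread.State:` line followed by `at ...` frames. The function names are made up for illustration; it only counts the topmost frame of each RUNNABLE thread across several dumps.

```python
import re
import subprocess
import time
from collections import Counter

# Matches a stack frame line such as "\tat org.foo.Bar.baz(Bar.java:42)".
FRAME = re.compile(r"^\s+at (\S+?)\(")


def top_frames(dump):
    """Yield the topmost frame of every RUNNABLE thread in one jstack dump."""
    runnable = False
    for line in dump.splitlines():
        if "java.lang.Thread.State:" in line:
            runnable = "RUNNABLE" in line
            continue
        match = FRAME.match(line)
        if match and runnable:
            yield match.group(1)
            runnable = False  # skip the deeper frames of this thread


def hot_methods(dumps, limit=5):
    """Count top frames over several dumps and return the most common ones."""
    counts = Counter()
    for dump in dumps:
        counts.update(top_frames(dump))
    return counts.most_common(limit)


def take_dumps(pid, samples=5, pause=2.0):
    """Run `jstack <pid>` a few times, pausing between samples."""
    dumps = []
    for _ in range(samples):
        done = subprocess.run(
            ["jstack", str(pid)], capture_output=True, text=True, check=True
        )
        dumps.append(done.stdout)
        time.sleep(pause)
    return dumps
```

Usage would be something like `hot_methods(take_dumps(cassandra_pid))`, run as the same user as the Cassandra process. Methods that stay at the top across most samples are the first place to look.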

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Sep 10, 2015 at 1:05 AM, Roman Tkachenko 
wrote:

> Hey guys,
>
> We've been having issues in the past couple of days with CPU usage / load
> average suddenly skyrocketing on some nodes of the cluster, affecting
> performance significantly so majority of requests start timing out. It can
> go on for several hours, with CPU spiking through the roof then coming back
> down to norm and so on. Weirdly, it affects only a subset of nodes and it's
> always the same ones. The boxes Cassandra is running on are pretty beefy,
> 24 cores, and these CPU spikes go up to >1000%.
>
> What is the best way to debug such kind of issues and find out what
> Cassandra is doing during spikes like this? Doesn't seem to be compaction
> related as sometimes during these spikes "nodetool compactionstats" says no
> compactions are running.
>
> Thanks!
>
>


Re: Network / GC / Latency spike

2015-09-11 Thread Otis Gospodnetić
Hi Alain,

Nice charts! ;)  (attachments came through the list).

Since you're using SPM for monitoring Cassandra, you may want to have a
look at https://sematext.atlassian.net/wiki/display/PUBSPM/Network+Map
which I think would have shown which nodes were talking to which nodes and
how much. Don't have a screenshot to share, but it looks a bit like the one
on http://blog.sematext.com/2015/08/06/introducing-appmap/

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Sep 10, 2015 at 11:43 AM, Alain RODRIGUEZ 
wrote:

> Hi, just wanted to drop the follow up here.
>
> I finally figured out that the bigdata guys were basically hammering the
> cluster by reading 2 months of data as fast as possible from one table at
> boot time to cache it. As this table stores 12 MB blobs (Bloom Filters),
> even though the number of reads was not very high, each row is really big, so
> reads + read repairs were putting too much pressure on Cassandra. Those
> reads were mixed with much higher workloads so I was not seeing any burst
> in reads, making this harder to troubleshoot. Local reads (from Sematext /
> Opscenter) helped in finding this out.
>
> Given the use case (no random reads, write once, no update) and the data
> size for each element, we will get this out of Cassandra to some HDFS or S3
> storage, basically. We do not need any database for this kind of job.
> Meanwhile we just disabled this feature as it is not something critical.
>
> @Fabien, Thank you for your help.
>
> C*heers,
>
> Alain
>
> 2015-09-02 0:43 GMT+02:00 Fabien Rousseau :
>
>> Hi Alain,
>>
>> Maybe it's possible to confirm this by testing on a small cluster:
>> - create a cluster of 2 nodes (using https://github.com/pcmanus/ccm for
>> example)
>> - create a fake wide row of a few mb (using the python driver for example)
>> - drain and stop one of the two nodes
>> - remove the sstables of the stopped node (to provoke inconsistencies)
>> - start it again
>> - select a small portion of the wide row (many times, use nodetool
>> tpstats to know when a read repair has been triggered)
>> - nodetool flush (on the previously stopped node)
>> - check the size of the sstable (if a few kb, then only the selected
>> slice was repaired, but if a few mb then the whole row was repaired)
>>
>> The wild guess was: if a read repair was triggered when reading a small
>> portion of a wide row and if it resulted in streaming the whole wide row,
>> it could explain a network burst. (But on second thought, it makes more
>> sense to only repair the small portion being read...)
>>
>>
>>
>> 2015-09-01 12:05 GMT+02:00 Alain RODRIGUEZ :
>>
>>> Hi Fabien, thanks for your help.
>>>
>>> I did not mention it but I indeed saw a correlation between latency and
>>> read repairs spikes. Though this is like going from 5 RR per second to 10
>>> per sec cluster wide according to opscenter: http://img42.com/L6gx1
>>>
>>> I have indeed some wide rows and this explanation looks reasonable to
>>> me, I mean this makes sense. Yet isn't this amount of Read Repair too low
>>> to induce such a "shitstorm" (even if it spikes x2, I got network x10) ?
>>> Also wide rows are present on heavily used tables (sadly...), so I should be
>>> using more network all the time (why only a few spikes per day, like 2 / 3
>>> max) ?
>>>
>>> How could I confirm this, without removing RR and waiting a week I mean,
>>> is there a way to see the size of the data being repaired through this
>>> mechanism ?
>>>
>>> C*heers
>>>
>>> Alain
>>>
>>> 2015-09-01 0:11 GMT+02:00 Fabien Rousseau :
>>>
>>>> Hi Alain,
>>>>
>>>> Could it be wide rows + read repair ? (Let's suppose the "read repair"
>>>> repairs the full row, and it may not be subject to stream throughput limit)
>>>>
>>>> Best Regards
>>>> Fabien
>>>>
>>>> 2015-08-31 15:56 GMT+02:00 Alain RODRIGUEZ :
>>>>
>>>>> I just realised that I have no idea about how this mailing list handle
>>>>> attached files.
>>>>>
>>>>> Please find screenshots there --> http://img42.com/collection/y2KxS
>>>>>
>>>>> Alain
>>>>>
>>>>> 2015-08-31 15:48 GMT+02:00 Alain RODRIGUEZ :
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Running a 2.0.16 C* on AWS (private VPC, 2 DC).
>>>>>>
>>>>>> I am facing an issue on our EU DC where I have a network burst
>>>>>> (alongside with GC and latency increase).
>>>>>>
>>>>>> My first thought was a sudden application burst, though, I see no
>>>>>> corresponding evolution on reads / write or even CPU.
>>>>>>
>>>>>> So I thought that this might come from the node themselves as IN
>>>>>> almost equal OUT Network. I tried lowering stream throughput on the whole
>>>>>> DC to 1 Mbps, with ~30 nodes --> 30 Mbps --> ~4 MB/s max. My network went
>>>>>> a lot higher about 30 M in both sides (see screenshots attached).
>>>>>>
>>>>>> I have tried to use iftop to see where this 

Re: How to run any application on Cassandra cluster in high availability mode

2015-09-11 Thread Otis Gospodnetić
Hi Vikram,

Running a monitor somewhere other than on the Cassandra node itself... hmm,
then you'd miss out on JVM metrics, OS metrics, the ability to do transaction
tracing, on-demand profiling, etc., which are all nice things to have when
you are troubleshooting issues or performance, doing stress tests, tuning,
and optimizing...

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Aug 18, 2015 at 2:45 PM, Vikram Kone  wrote:

> Hi John,
> I have posted the same Q on azkaban google group but there is no response
> so far :(
> If i want to do the old school way of monitor, alert and start the process
> somewhere else..how can I do this? Are there some ready made tools to do
> this kind of general purpose monitoring and alerting for services on linux?
>
> On Sun, Aug 16, 2015 at 9:38 AM, Prem Yadav  wrote:
>
>> The MySQL is there just to save the state of things. I suppose it is very
>> lightweight. Why not just install MySQL on one of the nodes or on a VM
>> somewhere?
>>
>>
>> On Sun, Aug 16, 2015 at 3:39 PM, John Wong  wrote:
>>
>>> Sorry i meant integration with Cassandra (based on the docs by default
>>> it suggests MySQL)
>>>
>>>
>>> On Sunday, August 16, 2015, John Wong  wrote:
>>>
>>>> There is no leader in cassandra. I suggest you ask Azkaban community
>>>> about integration with Azkaban and Azkaban HA.
>>>>
>>>> On Sunday, August 16, 2015, Vikram Kone  wrote:
>>>>
>>>>> Can't we use zoo keeper for leader election in Cassandra and based on
>>>>> who is leader ..run azkaban or any app instance for that matter on that
>>>>> Cassandra server. I'm thinking that I can copy the application folder to
>>>>> all nodes and then determine which one to run using zookeeper. Is that
>>>>> possible ?
>>>>>
>>>>> Sent from Outlook 
>>>>>
>>>>> On Sun, Aug 16, 2015 at 6:47 AM -0700, "John Wong" <
>>>>> gokoproj...@gmail.com> wrote:
>>>>>
>>>>> Hi
>>>>>>
>>>>>> I am not familiar with Azkaban and probably a better question to the
>>>>>> Azkaban community IMO. But there seem to be two modes (
>>>>>> http://azkaban.github.io/azkaban/docs/2.5/) one is solo and one is
>>>>>> two-server mode, but either way I think still SPOF? If there is no
>>>>>> election, just based on process, my 2 cents would be monitor, alert, and
>>>>>> start the process somewhere else. Better yet, don't install the process
>>>>>> on Cassandra node. Keep your instance for one purpose only. If you run
>>>>>> cloud like AWS you will be able to autoscale min1 max1 easily.
>>>>>>
>>>>>>
>>>>>> Note: In peer-to-peer architecture, there is simply no concept of
>>>>>> master. You can start with some seed nodes for discovery. It depends how
>>>>>> you design discovery.
>>>>>>
>>>>>> On Sat, Aug 15, 2015 at 11:49 AM, Vikram Kone wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> We are planning to install Azkaban in solo server mode on a 24
>>>>>>> node cassandra cluster to be able to schedule spark jobs with intricate
>>>>>>> dependency chain. The problem is, since Cassandra has a no-SPOF
>>>>>>> architecture, i.e. any node can become the master for the cluster, it
>>>>>>> creates the problem for Azkaban master since it's not a peer-to-peer
>>>>>>> architecture where any node can become the master. Only a single node
>>>>>>> has to be master at any given time.
>>>>>>>
>>>>>>> What are our options here? Are there any frameworks or tools out
>>>>>>> there that would allow any application to run on a cluster of machines
>>>>>>> with high availability?
>>>>>>> Should I be looking at something like zookeeper for this ? Or Mesos
>>>>>>> may be?
>>>>>>
>>>>
>>>> --
>>>> Sent from Jeff Dean's printf() mobile console

>>>
>>>
>>> --
>>> Sent from Jeff Dean's printf() mobile console
>>>
>>
>>
>


Re: Best strategy for hiring from OSS communities.

2015-09-11 Thread Otis Gospodnetić
Hey Kevin - I think there is j...@apache.org

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Aug 13, 2015 at 6:02 PM, Kevin Burton  wrote:

> Mildly off topic but we are looking to hire someone with Cassandra
> experience..
>
> I don’t necessarily want to spam the list though.  We’d like someone from
> the community who contributes to Open Source, etc.
>
> Are there forums for Apache / Cassandra, etc for jobs? I couldn’t find one.
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
>
>


Re: Should replica placement change after a topology change?

2015-09-11 Thread Richard Dawe
Thanks, Nate and Rob. We are going to have to migrate some installations from 
SimpleSnitch to Ec2Snitch, others to GossipingPropertyFileSnitch. Your help is 
much appreciated!

Best regards, Rich

On 10/09/2015 20:33, "Nate McCall" wrote:


So if you have a topology that would change if you switched from SimpleStrategy 
to NetworkTopologyStrategy plus multiple racks, it sounds like a different 
migration strategy would be needed?

I am imagining:

  1.  Switch to a different snitch, and the keyspace from SimpleStrategy to NTS,
      but keep it all in one rack. So effectively the same topology, but with a
      different snitch.
  2.  Set up a new data centre with the desired topology.
  3.  Change the keyspace to have replicas in the new DC.
  4.  Rebuild all the nodes in the new DC.
  5.  Flip all your clients over to the new DC.
  6.  Decommission your original DC.

That would work, yes. I would add :

- 4.5. Repair all nodes.

I can confirm that the above process works (definitely include Rob's repair 
suggestion, though). It is really the only way we've found to safely go from 
SimpleSnitch to rack-aware NTS.

The same process works/is required for SimpleSnitch to Ec2Snitch fwiw.
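
For anyone following the steps above, here is a rough sketch of the keyspace changes and nodetool commands in order. The keyspace name, data centre names, and replication factors are placeholders, and the helper is only illustrative; check everything against your own topology before running it.

```python
def alter_keyspace_cql(keyspace, replicas_by_dc):
    """Build an ALTER KEYSPACE statement that uses NetworkTopologyStrategy."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {rf}" for dc, rf in sorted(replicas_by_dc.items())]
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{', '.join(options)}}};"


# Placeholder names: keyspace "docs", existing DC "dc_old", new DC "dc_new".
plan = [
    # Step 1: same topology as before, but expressed with NTS.
    ("cqlsh", alter_keyspace_cql("docs", {"dc_old": 3})),
    # Step 3: add replicas in the new data centre.
    ("cqlsh", alter_keyspace_cql("docs", {"dc_old": 3, "dc_new": 3})),
    # Step 4: on every node of the new DC, stream data from the old DC.
    ("each new-DC node", "nodetool rebuild -- dc_old"),
    # Step 4.5: repair all nodes.
    ("each node", "nodetool repair"),
]

for where, command in plan:
    print(f"[{where}] {command}")
```

The snitch change itself, the client cutover, and the decommissioning in steps 5 and 6 are operational steps and are not covered here.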