Re: CVE-2020-13946 Apache Cassandra RMI Rebind Vulnerability

2020-09-11 Thread Jeremiah D Jordan
This vulnerability is only exposed if someone can access your JMX port.  If you 
lock down access to JMX ports then you can avoid it.
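
For anyone wanting to sanity-check their lockdown, this is roughly what an authenticated
JMX client connection looks like (a minimal sketch; the host, port 7199 and the credentials
below are placeholders). If it succeeds without the credentials map, the JMX port is not
actually locked down:

import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxAuthCheck {
    public static void main(String[] args) throws Exception {
        // Cassandra's default JMX port is 7199; host and credentials are placeholders.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");

        // When JMX authentication is enabled, the connection fails without credentials.
        Map<String, Object> env = new HashMap<String, Object>();
        env.put(JMXConnector.CREDENTIALS, new String[] { "cassandra", "secret" });

        JMXConnector connector = JMXConnectorFactory.connect(url, env);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            System.out.println("Authenticated; MBeans visible: " + mbs.getMBeanCount());
        } finally {
            connector.close();
        }
    }
}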

-Jeremiah

> On Sep 2, 2020, at 3:36 AM, Sam Tunnicliffe  wrote:
> 
> Hi Manish,
> 
> Unfortunately, I'm afraid that, as far as I'm aware, there is not.
> 
> Thanks,
> Sam
> 
>> On 2 Sep 2020, at 04:14, manish khandelwal wrote:
>> 
>> Hi Sam
>> 
>> Is there any alternative to avoid this vulnerability? Like upgrade to 
>> specific JVM version.
>> 
>> Regards
>> Manish
>> 
>> On Tue, Sep 1, 2020 at 8:03 PM Sam Tunnicliffe wrote:
>> CVE-2020-13946 Apache Cassandra RMI Rebind Vulnerability
>> 
>> Versions Affected:
>> All versions prior to: 2.1.22, 2.2.18, 3.0.22, 3.11.8 and 4.0-beta2
>> 
>> Description:
>> It is possible for a local attacker without access to the Apache Cassandra 
>> process or configuration files to manipulate the RMI registry to perform a 
>> man-in-the-middle attack and capture user names and passwords used to access 
>> the JMX interface. The attacker can then use these credentials to access the 
>> JMX interface and perform unauthorised operations.
>> Users should also be aware of CVE-2019-2684, a JRE vulnerability that 
>> enables this issue to be exploited remotely.
>> 
>> Mitigation:
>> 2.1.x users should upgrade to 2.1.22
>> 2.2.x users should upgrade to 2.2.18
>> 3.0.x users should upgrade to 3.0.22
>> 3.11.x users should upgrade to 3.11.8
>> 4.0-beta1 users should upgrade to 4.0-beta2
>> 
>> 
> 



Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-04 Thread Jeremiah D Jordan
Just FYI, if you want to be able to operationally do things to many nodes at a time, you
should look at setting up racks.  With num racks = RF you can take down all
nodes in a given rack at once without affecting LOCAL_QUORUM.  Your single
token example has the same functionality in this respect as a vnodes cluster
using racks (and in fact, if you set up a single token cluster using racks, you
would have set up nodes N1 and N4 to be in the same rack).
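
To make that concrete, here is a small, self-contained sketch (a toy model of
SimpleStrategy-style placement, not Cassandra code) of the 6-node, RF=3, single-token
ring from Max's example below.  It prints which pairs of nodes can be down at the same
time while every range still has 2 of its 3 replicas, i.e. QUORUM still succeeds:

import java.util.ArrayList;
import java.util.List;

// Toy model: 6 single-token nodes, RF=3, SimpleStrategy-style placement
// (each range lives on its primary node plus the next RF-1 nodes clockwise).
public class RingQuorumCheck {
    static final int NODES = 6;
    static final int RF = 3;

    // Replica set for the range whose primary is node p.
    static List<Integer> replicas(int p) {
        List<Integer> r = new ArrayList<Integer>();
        for (int i = 0; i < RF; i++) {
            r.add((p + i) % NODES);
        }
        return r;
    }

    public static void main(String[] args) {
        for (int a = 0; a < NODES; a++) {
            for (int b = a + 1; b < NODES; b++) {
                boolean quorumOk = true;
                for (int p = 0; p < NODES && quorumOk; p++) {
                    int alive = 0;
                    for (int r : replicas(p)) {
                        if (r != a && r != b) alive++;
                    }
                    quorumOk = alive >= 2;   // need 2 of 3 replicas for QUORUM
                }
                if (quorumOk) {
                    System.out.println("N" + (a + 1) + " and N" + (b + 1) + " can be down together");
                }
            }
        }
    }
}

It prints exactly the pairs N1/N4, N2/N5 and N3/N6; with 256 random vnodes per node the
same check would almost certainly find no safe pair, since every node ends up sharing
some range with nearly every other node.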

> On Feb 3, 2020, at 11:07 PM, Max C.  wrote:
> 
> Let’s say you have a 6 node cluster, with RF=3, and no vnodes.  In that case 
> each piece of data is stored as follows:
> 
> Primary: Other replicas
> N1: N2 N3
> N2: N3 N4
> N3: N4 N5
> N4: N5 N6
> N5: N6 N1
> N6: N1 N2
> 
> With this setup, there are some circumstances where you could lose 2 nodes 
> (ex: N1 & N4) and still be able to maintain CL=quorum.  If your cluster is 
> very large, then you could lose even more — and that’s a good thing, because 
> if you have hundreds/thousands of nodes then you don’t want the world to come 
> tumbling down if  > 1 node is down.  Or maybe you want to upgrade the OS on 
> your nodes, and want to (with very careful planning!) do it by taking down 
> more than 1 node at a time.
> 
> … but if you have a large number of vnodes, then a given node will share a 
> small segment of data with LOTS of other nodes, which destroys this property. 
>  The more vnodes, the less likely you’re able to handle > 1 node down.
> 
> For example, see this diagram in the Datastax docs —
> 
> https://docs.datastax.com/en/dse/5.1/dse-arch/datastax_enterprise/dbArch/archDataDistributeVnodesUsing.html#Distributingdatausingvnodes
>  
> 
> 
> In that bottom picture, you can’t knock out 2 nodes and still maintain 
> CL=quorum.  Ex:  If you knock out node 1 & 4, then ranges B & L would no 
> longer meet CL=quorum;  but you can do that in the top diagram, since there 
> are no ranges shared between node 1 & 4.
> 
> Hope that helps.
> 
> - Max
> 
> 
>> On Feb 3, 2020, at 8:39 pm, onmstester onmstester <onmstes...@zoho.com.INVALID> wrote:
>> 
>> Sorry if it's trivial, but I do not understand how num_tokens affects
>> availability. With RF=3 and CLW,CLR=quorum, the cluster could tolerate the loss of
>> at most one node, and all of the tokens assigned to that node would also be
>> assigned to two other nodes no matter what num_tokens is, right?
>> 
>> 
>> 
>>  Forwarded message 
>> From: Jon Haddad mailto:j...@jonhaddad.com>>
>> To: mailto:d...@cassandra.apache.org>>
>> Date: Tue, 04 Feb 2020 01:15:21 +0330
>> Subject: Re: [Discuss] num_tokens default in Cassandra 4.0
>>  Forwarded message 
>> 
>> I think it's a good idea to take a step back and get a high level view of 
>> the problem we're trying to solve. 
>> 
>> First, high token counts result in decreased availability as each node has
>> data overlap with more nodes in the cluster. Specifically, a node can
>> share data with up to (RF-1) * 2 * num_tokens other nodes. So a 256 token cluster at
>> RF=3 is going to almost always share data with every other node in the cluster that
>> isn't in the same rack, unless you're doing something wild like using more
>> than a thousand nodes in a cluster. We advertise 
>> 
>> With 16 tokens, that is vastly improved, but you still have up to 64 nodes
>> that each node needs to query against, so you're, again, hitting every node
>> unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs). I 
>> wouldn't use 16 here, and I doubt any of you would either. I've advocated 
>> for 4 tokens because you'd have overlap with only 16 nodes, which works 
>> well for small clusters as well as large. Assuming I was creating a new 
>> cluster for myself (in a hypothetical brand new application I'm building) I 
>> would put this in production. I have worked with several teams where I 
>> helped them put 4 token clusters in prod and it has worked very well. We 
>> didn't see any wild imbalance issues. 
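>> 
>> As a rough back-of-the-envelope check of that overlap bound (purely illustrative,
>> assuming the (RF-1) * 2 * num_tokens upper bound above, RF=3 and 3 racks; the cluster
>> sizes are made up):
>> 
>> // Rough upper bound on how many distinct nodes a single node shares data with,
>> // capped at the number of nodes outside its own rack.
>> public class TokenOverlap {
>>     public static void main(String[] args) {
>>         int rf = 3;
>>         int racks = 3;
>>         int[] tokenCounts = { 256, 16, 4 };
>>         int[] clusterSizes = { 12, 96, 1000 };
>> 
>>         for (int nodes : clusterSizes) {
>>             int outsideRack = nodes - nodes / racks;   // nodes not in this node's rack
>>             for (int tokens : tokenCounts) {
>>                 int bound = (rf - 1) * 2 * tokens;
>>                 System.out.printf("nodes=%4d num_tokens=%3d -> shares data with up to %d of %d other nodes%n",
>>                         nodes, tokens, Math.min(bound, outsideRack), outsideRack);
>>             }
>>         }
>>     }
>> }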
>> 
>> As Mick's pointed out, our current method of using random token assignment
>> for the default number is problematic for 4 tokens. I fully agree with 
>> this, and I think if we were to try to use 4 tokens, we'd want to address 
>> this in tandem. We can discuss how to better allocate tokens by default 
>> (something more predictable than random), but I'd like to avoid the 
>> specifics of that for the sake of this email. 
>> 
>> To Alex's point, repairs are problematic with lower token counts due to
>> over-streaming. I think this is a pretty serious issue and we'd have to
>> address it before going all the way down to 4. This, in my opinion, is a 
>> more complex problem to solve and I think trying to fix it here could make 
>> shipping 4.0 take even longer, something none of us want. 
>> 
>> For the sake of shi

Re: Cluster Maintenance Mishap

2016-10-20 Thread Jeremiah D Jordan
The easiest way to figure out what happened is to examine the system log; it
will tell you exactly what went on.  But I’m pretty sure your nodes got new tokens
during that time.

If you want to get back the data inserted during the 2 hours you could use 
sstableloader to send all the data from the /var/data/cassandra_new/cassandra/* 
folders back into the cluster if you still have it.

-Jeremiah


> On Oct 20, 2016, at 3:58 PM, Branton Davis  wrote:
> 
> Howdy folks.  I asked some about this in IRC yesterday, but we're looking to 
> hopefully confirm a couple of things for our sanity.
> 
> Yesterday, I was performing an operation on a 21-node cluster (vnodes, 
> replication factor 3, NetworkTopologyStrategy, and the nodes are balanced 
> across 3 AZs on AWS EC2).  The plan was to swap each node's existing 1TB 
> volume (where all cassandra data, including the commitlog, is stored) with a 
> 2TB volume.  The plan for each node (one at a time) was basically:
> 1. rsync while the node is live (repeated until there were only minor
>    differences from new data)
> 2. stop cassandra on the node
> 3. rsync again
> 4. replace the old volume with the new
> 5. start cassandra
> However, there was a bug in the rsync command.  Instead of copying the 
> contents of /var/data/cassandra to /var/data/cassandra_new, it copied it to 
> /var/data/cassandra_new/cassandra.  So, when cassandra was started after the 
> volume swap, there was some behavior that was similar to bootstrapping a new 
> node (data started streaming in from other nodes).  But there was also some 
> behavior that was similar to a node replacement (nodetool status showed the 
> same IP address, but a different host ID).  This happened with 3 nodes (one 
> from each AZ).  The nodes had received 1.4GB, 1.2GB, and 0.6GB of data 
> (whereas the normal load for a node is around 500-600GB).
> 
> The cluster was in this state for about 2 hours, at which point cassandra was 
> stopped on them.  Later, I moved the data from the original volumes back into 
> place (so, should be the original state before the operation) and started 
> cassandra back up.
> 
> Finally, the questions.  We've accepted the potential loss of new data within 
> the two hours, but our primary concern now is what was happening with the 
> bootstrapping nodes.  Would they have taken on the token ranges of the 
> original nodes or acted like new nodes and got new token ranges?  If the 
> latter, is it possible that any data moved from the healthy nodes to the 
> "new" nodes or would restarting them with the original data (and repairing) 
> put the cluster's token ranges back into a normal state?
> 
> Hopefully that was all clear.  Thanks in advance for any info!



Re: Lucene index plugin for Apache Cassandra

2015-06-16 Thread Jeremiah D Jordan
Just an FYI.  DSE Search does not run in its own JVM, it runs in the same JVM 
that Cassandra is running in.  DSE Search also has integration with Spark 
map/reduce out of the box.


> On Jun 16, 2015, at 9:42 AM, Andres de la Peña  wrote:
> 
> Thanks for your interest. 
> 
> I am not familiar with DSE Search internals, so I can only express some 
> impressions. In my opinion, both projects have similarities, but there are 
> several key differences:
> - DSE Solr, if I'm not wrong, runs in a separate JVM preserving its APIs and
>   interfaces, while Stratio's Lucene index is embedded inside Cassandra and
>   tightly integrated with it. Each has its own set of pros and cons.
> - DSE Search provides several search engine features that are not yet provided
>   by Stratio's Lucene index, such as faceting, highlighting, etc. We are
>   working to bring as many of these features as we can to Apache Cassandra.
> - Stratio's Lucene index filters can be used in conjunction with Cassandra's
>   Spark/Hadoop support in order to speed up table mapping. Perhaps Apache Solr
>   has good integration with these MapReduce frameworks; I don't know if DSE
>   provides this kind of feature out of the box.
> - Stratio's Lucene index is open source, which is always a good thing.
> 
> Finally, I think that they are not mutually exclusive tools and they can be
> used together and separately depending on the scenario.
> 
> I hope it helps,
> 
> 2015-06-15 18:08 GMT+02:00 Matthew Johnson :
> Hi Andres,
> 
>  
> This looks awesome, many thanks for your work on this. Just out of curiosity, 
> how does this compare to the DSE Cassandra with embedded Solr? Do they 
> provide very similar functionality? Is there a list of obvious pros and cons 
> of one versus the other?
> 
>  
> Thanks!
> 
> Matthew
> 
>  
>  
> From: Andres de la Peña [mailto:adelap...@stratio.com]
> Sent: 13 June 2015 13:20
> 
> 
> To: user@cassandra.apache.org 
> Subject: Re: Lucene index plugin for Apache Cassandra
> 
>  
> Thanks for showing interest. 
> 
>  
> Faceting is not yet supported, but it is in our roadmap. Our goal is to add 
> to Cassandra as many Lucene features as possible.
> 
>  
> 2015-06-12 18:21 GMT+02:00 Mohammed Guller :
> 
> The plugin looks cool. Thank you for open sourcing it.
> 
>  
> Does it support faceting and other Solr functionality?
> 
>  
> Mohammed
> 
>  
> From: Andres de la Peña [mailto:adelap...@stratio.com]
> Sent: Friday, June 12, 2015 3:43 AM
> To: user@cassandra.apache.org 
> Subject: Re: Lucene index plugin for Apache Cassandra
> 
>  
> I really appreciate your interest
> 
>  
> Well, the first recommendation is to not use it unless you need it, because a
> properly denormalized Cassandra model is almost always preferable to
> indexing. Lucene indexing is a good option when there is no viable
> denormalization alternative. This is the case for range queries over multiple
> dimensions, full-text search, or maybe complex boolean predicates. It's also
> appropriate for Spark/Hadoop jobs mapping a small fraction of the total
> number of rows in a certain table, if you can pay the cost of indexing.
> 
>  
> Lucene indexes run inside C*, so users should closely monitor the amount of
> memory used. It's also a good idea to put the Lucene directory files on a
> separate disk from those used by C* itself. Additionally, you should consider
> that write throughput on indexed tables will be appreciably reduced, maybe to a
> few thousand rows per second.
> 
>  
> It's really hard to estimate the amount of resources needed by the index due
> to the great variety of indexing and querying options that Lucene offers, so the
> only thing we can suggest is to empirically find the optimal setup for your
> use case.
> 
>  
> 2015-06-12 12:00 GMT+02:00 Carlos Rolo :
> 
> Seems like an interesting tool!
> 
> What operational recommendations would you make to users of this tool (Extra 
> hardware capacity, extra metrics to monitor, etc)?
> 
> 
> 
> Regards,
> 
>  
> Carlos Juzarte Rolo
> 
> Cassandra Consultant
> 
>  
> Pythian - Love your data
> 
>  
> rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo 
> 
> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
> 
> www.pythian.com 
>  
> On Fri, Jun 12, 2015 at 11:07 AM, Andres de la Peña wrote:
> 
> Unfortunately, we haven't published any benchmarks yet, but we plan
> to do so as soon as possible. However, you can expect behavior similar to
> that of Elasticsearch or Solr, with some overhead due to the need to
> index both Cassandra's row key and the partition's token. You can also
> take a look at this presentation 
> 

Re: CQLSSTableWriter memory leak

2014-06-09 Thread Jeremiah D Jordan
You probably want to re-think your data model here.  50 million rows per 
partition is not going to be optimal.  You will be much better off keeping that 
down to hundreds of thousands per partition in a worst case.
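
One common way to do that, sketched below using the same CQLSSTableWriter calls as the
snippet further down, is to fold a synthetic bucket into the partition key so the 50
million IDs land in, say, 500 partitions of roughly 100k rows each (the table, column
and bucket count are just illustrative):

import java.util.UUID;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

class BucketedWriter {
    public static void main(String[] args) throws Exception {
        // Partition key is (x, bucket), so rows for one x are spread across
        // many smaller partitions instead of one giant one.
        String schema = "create table test.t (x uuid, bucket int, y uuid, "
                      + "primary key ((x, bucket), y))";
        String insert = "insert into test.t (x, bucket, y) values (?, ?, ?)";

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory("/tmp/test/t")
                .forTable(schema)
                .withBufferSizeInMB(32)
                .using(insert)
                .build();

        UUID x = UUID.randomUUID();
        int buckets = 500;                       // ~100k rows per partition for 50M rows
        for (int i = 0; i < 50000000; i++) {
            writer.addRow(x, i % buckets, UUID.randomUUID());
        }
        writer.close();
    }
}

Reads then fetch a logical partition bucket by bucket (or with an IN on the bucket
column), which is usually far cheaper than reading one 50-million-row partition.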

-Jeremiah


On Jun 5, 2014, at 8:29 PM, Xu Zhongxing  wrote:

> Is writing too many rows to a single partition the cause of memory 
> consumption?
> 
> What I want to achieve is this: say I have 5 partition IDs, each corresponding
> to 50 million IDs.  Given a partition ID, I need to get its corresponding 50
> million IDs. Is there another way to design the schema to avoid such a
> compound primary key?
> 
> I could use the clustering IDs as the primary key, and create an index on the
> partition ID. But is that equivalent to creating another table with compound
> keys?
> 
> At 2014-06-06 00:16:13, "Jack Krupansky"  wrote:
> How many rows (primary key values) are you writing for each partition of the 
> primary key? I mean, are there relatively few, or are these very wide 
> partitions?
>  
> Oh, I see! You’re writing 50,000,000 rows to a single partition! My, that IS 
> ambitious.
>  
> -- Jack Krupansky
>  
> From: Xu Zhongxing
> Sent: Thursday, June 5, 2014 3:34 AM
> To: user@cassandra.apache.org
> Subject: CQLSSTableWriter memory leak
>  
> I am using Cassandra's CQLSSTableWriter to import a large amount of data into 
> Cassandra. When I use CQLSSTableWriter to write to a table with compound 
> primary key, the memory consumption keeps growing. The GC of JVM cannot 
> collect any used memory. When writing to tables with no compound primary key, 
> the JVM GC works fine.
> 
> My Cassandra version is 2.0.5. The OS is Ubuntu 14.04 x86-64. JVM parameters 
> are -Xms1g -Xmx2g. This is sufficient for all other non-compound primary key 
> cases.
> 
> The problem can be reproduced by the following test case:
> 
> import org.apache.cassandra.io.sstable.CQLSSTableWriter;
> import org.apache.cassandra.exceptions.InvalidRequestException;
> 
> import java.io.IOException;
> import java.util.UUID;
> 
> class SS {
>     public static void main(String[] args) {
>         String schema = "create table test.t (x uuid, y uuid, primary key (x, y))";
>         String insert = "insert into test.t (x, y) values (?, ?)";
>         CQLSSTableWriter writer = CQLSSTableWriter.builder()
>                 .inDirectory("/tmp/test/t")
>                 .forTable(schema)
>                 .withBufferSizeInMB(32)
>                 .using(insert)
>                 .build();
> 
>         UUID id = UUID.randomUUID();
>         try {
>             // All 50 million rows go into the single partition keyed by id.
>             for (int i = 0; i < 50000000; i++) {
>                 UUID id2 = UUID.randomUUID();
>                 writer.addRow(id, id2);
>             }
> 
>             writer.close();
>         } catch (Exception e) {
>             System.err.println("hell");
>         }
>     }
> }



Re: Cassandra 2.0 unbalanced ring with vnodes after adding new node

2014-06-09 Thread Jeremiah D Jordan
That looks like you started the initial nodes with num_tokens=1, then later
switched to vnodes by setting num_tokens to 256, and then added the new node with
256 vnodes from the start.  Am I right?

Since you don't have very much data, the easiest way out of this will be to 
decommission the original nodes one at a time.  Wipe all the data off.  Then 
bootstrap them back into the cluster.

-Jeremiah

On Jun 4, 2014, at 11:52 AM, Владимир Рудев  wrote:

> Hello to everyone!
> 
> Please, can someone explain where we made a mistake?
> 
> We have a cluster with 4 nodes which uses vnodes (256 per node, default
> settings); the snitch is the default on every node: SimpleSnitch.
> These four nodes have been in the cluster from the beginning.
> In this cluster we have a keyspace with these options:
> Keyspace: K:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
> Options: [replication_factor:3]
> 
> All was normal, and nodetool status K showed that each node owned 75% of the whole key
> range. All 4 nodes are located in the same datacenter and have the same first two
> bytes in their IP addresses (the others are different).
> 
> Then we bought a new server in a different datacenter and added it to the cluster with
> the same settings as the previous four nodes (the only difference being listen_address),
> assuming that the effective ownership of each node for this keyspace would be
> 300/5=60% or near that. But 3-5 minutes after startup, nodetool status K shows
> this:
> nodetool status K;
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load     Tokens  Owns (effective)  Host ID                               Rack
> UN  N1       6,06 GB  256     50.0%             62f295b3-0da6-4854-a53a-f03d6b424b03  rack1
> UN  N2       5,89 GB  256     50.0%             af4e4a23-2610-44dd-9061-09c7a6512a54  rack1
> UN  N3       6,02 GB  256     50.0%             0f0e4e78-6fb2-479f-ad76-477006f76795  rack1
> UN  N4       5,8 GB   256     50.0%             670344c0-9856-48cf-9ec9-1a98f9a89460  rack1
> UN  N5       7,51 GB  256     100.0%            82473d14-9e36-4ae7-86d2-a3e526efb53f  rack1
> 
> N5 is the newly added node.
> 
> nodetool repair -pr on N5 doesn't change anything.
> 
> nodetool describering K shows that the new node N5 participates in EACH range.
> This is not what we want at all.
> 
> It looks like Cassandra added the new node to each range because it is located in a
> different datacenter, but all the settings and output indicate this should not happen.
> 
> Another interesting point is that, while the snitch is defined as
> SimpleSnitch in all config files, the output of the command nodetool describecluster is:
> Cluster Information:
> Name: Some Cluster Name
> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> 26b8fa37-e666-31ed-aa3b-85be75f2aa1a: [N1, N2, N3, N4, N5]
> 
> We use Cassandra 2.0.6
> 
> Questions we have at this moment:
> 1. How do we rebalance the ring so all nodes own 60% of the range?
>    1a. Is removing the node from the cluster and adding it again a solution?
> 2. Where did we possibly make a mistake when adding the new node?
> 3. If we add a new 6th node to the ring, will it take 50% from N5 or some portion
> from each node?
> 
> Thanks in advance!
> 
> -- 
> С уважением, 
> Владимир Рудев
> (With regards, Vladimir Rudev)
> vladimir.ru...@gmail.com
> 
> 



Re: How to rebalance a cluster?

2014-05-13 Thread Jeremiah D Jordan
Unless the issue is "I have some giant partitions mixed in with non-giant ones",
the usual reason for "data size imbalance" is that STCS is being used.

You can look at nodetool cfhistograms and cfstats to get info about partition 
sizes.

If you copy the data off to a test node and run "nodetool compact", does the
data size drop down a bunch?  That will tell you whether it's "compaction didn't
merge yet" or "there actually is more data in that token range".

Now "where did the data come from?".  If its the compaction thing, most likely 
repair over streaming on wide partitions.  It could also be that you did a 
bunch of deletes, and the tombstones have been compacted with the "Data to be 
deleted" on some nodes and not others.

Probably not these, but here are a few operational things that could also cause
this:
- Ran "rebuild" on one of the nodes after it already had data.
- Wiped a node and put the data back using "repair", not bootstrap or rebuild.
- sstableloaded a backup over the top of existing data.

Now "how do I fix it", if your use case is such that LCS makes sense
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/tabProp.html?scroll=tabProp__moreCompaction
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

Then using LCS will make the data compact together more quickly, at the expense 
of a bunch of extra disk io.

-Jeremiah

On May 12, 2014, at 9:19 AM, Oleg Dulin  wrote:

> I keep asking same question it seems -- sign of insanity.
> 
> Cassandra version 1.2, not using vnodes (legacy).
> 
> On 2014-03-07 19:37:48 +, Robert Coli said:
> 
>> On Fri, Mar 7, 2014 at 6:00 AM, Oleg Dulin  wrote:
>> I have the following situation:
>> 10.194.2.5  RAC1  Up  Normal  378.6 GB   50.00%  0
>> 10.194.2.4  RAC1  Up  Normal  427.5 GB   50.00%  127605887595351923798765477786913079295
>> 10.194.2.7  RAC1  Up  Normal  350.63 GB  50.00%  85070591730234615865843651857942052864
>> 10.194.2.6  RAC1  Up  Normal  314.42 GB  50.00%  42535295865117307932921825928971026432
>> As you can see, the 2.4 node has over 100 G more data than 2.6 . You can 
>> definitely see the imbalance. It also happens to be the heaviest loaded node 
>> by CPU usage.
>> The first step is to understand why.
>> Are you using vnodes? What version of Cassandra?
>>  
>> What would be a clean way to rebalance ? If I use move operation follwoed by 
>> cleanup, would it require a repair afterwards ?
>> Move is not, as I understand it, subject to CASSANDRA-2434, so should not 
>> require a post-move repair.
>> =Rob
>>  
> 
> 
> 



Re: C* 1.2.15 Decommission issues

2014-04-14 Thread Jeremiah D Jordan
Russell,
The hinted handoff manager is checking for hints to see if it needs to hand
those off during the decommission so that the hints don't get lost.  You most
likely have a lot of hints, or a bunch of tombstones, or something in the table
causing the query to time out.  You aren't seeing any other exceptions in your
logs before the timeout, are you?  Raising the read timeout period on your nodes
before you decommission them, or manually deleting the hints CF, should most
likely get you past this.  If you delete them, you would then want to make
sure you run a full cluster repair when you are done with all of your
decommissions, to propagate the data from any hints you deleted.

-Jeremiah Jordan

On Apr 10, 2014, at 1:08 PM, Russell Bradberry  wrote:

> We have about a 30 node cluster running the latest C* 1.2 series DSE.  One
> datacenter uses VNodes and the other datacenter has VNodes disabled (because
> it is running DSE Search).
> 
> We have been replacing nodes in the VNode datacenter with faster ones and we 
> have yet to have a successful decommission.  Every time we attempt to 
> decommission a node we get an “Operation Timed Out” error and the 
> decommission fails.  We keep retrying it and sometimes it will work and other 
> times we will just give up and force the node removal.  It seems though, that 
> all the data has streamed out of the node before the decommission fails.
> 
> What exactly does it need to read before leaving that would cause this?  We 
> also have noticed that in several nodes after the removal that there are 
> ghost entries for the removed node in the system.peers table and this doesn’t 
> get removed until we restart Cassandra on that node.
> 
> Also, we have noticed that running repairs with VNodes is considerably 
> slower. Is this a misconfiguration? Or is it expected that VNodes repairs 
> will be slow?
> 
> 
> Here is the stack trace from the decommission failure:
> 
> Exception in thread "main" java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
> received only 0 responses.
> at 
> org.apache.cassandra.db.HintedHandOffManager.getHintsSlice(HintedHandOffManager.java:578)
> at 
> org.apache.cassandra.db.HintedHandOffManager.listEndpointsPendingHints(HintedHandOffManager.java:528)
> at 
> org.apache.cassandra.service.StorageService.streamHints(StorageService.java:2925)
> at 
> org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2905)
> at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2866)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
> at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
> at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
> at sun.rmi.transport.Transport$1.run(Transport.java:177)
> at sun.rmi.transport.Transport$1.run(Transport.java:174)
> 

Re: DSE Hadoop support for provisioning hardware

2014-03-11 Thread Jeremiah D Jordan
Ariel,
DSE lets you specify an "Analytics" virtual data center.  You can then 
replicate your keyspaces over to that data center, and run your Analytics jobs 
against it, and as long as they are using the LOCAL_ consistency levels, they 
won't be hitting your real time nodes, and vice versa.  So the Cassandra
"multiple data center" capabilities are used to keep your OLTP and
Analytics workloads from interfering with each other, but the data in each is
seamlessly replicated so that both are always up to date, without you having to
write ETL code.

Does that answer your question?

-Jeremiah


On Mar 11, 2014, at 10:27 AM, Ariel Weisberg  wrote:

> Hi,
> 
> I am doing a presentation at Big Data Boston about how people are
> bridging the gap between OLTP and ingest side databases and their
> analytic storage and queries. One class of systems I am talking about
> are things like HBase and DSE that let you run map reduce against your
> OLTP dataset.
> 
> I remember reading at some point that DSE allows you to provision
> dedicated hardware for map reduce, but the docs didn't seem to fully
> explain how that works. I looked at
> http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/ana/anaStrt.html
> 
> My question is what kind of provisioning can I do? Can I provision
> dedicated hardware for just the filesystem or can I also provision
> replicas that are dedicated to the file system and also serving reads
> for map reduce jobs. What kind of support is there for keeping OLTP
> reads from hitting the Hadoop storage nodes and how does this relate to
> doing quorum reads and writes?
> 
> Thanks,
> Ariel



Re: How expensive are additional keyspaces?

2014-03-11 Thread Jeremiah D Jordan
Also, in terms of overhead, server side the overhead is pretty much all at the 
Column Family (CF)/Table level, so 100 keyspaces, 1 CF each, is the same as 1 
keyspace, 100 CF's.

-Jeremiah

On Mar 11, 2014, at 10:36 AM, Jeremiah D Jordan  
wrote:

> The use of more than one keyspace is not uncommon.  Using 100's of them is.  
> That being said, different keyspaces let you specify different replication 
> and different authentication.  If you are not going to be doing one of those 
> things, then there really is no point to multiple keyspaces.  If you do want 
> to do one of those things, then go for it, make multiple keyspaces.
> 
> 
> -Jeremiah
> 
> On Mar 11, 2014, at 10:17 AM, Edward Capriolo  wrote:
> 
>> I am not sure. As stated the only benefit of multiple keyspaces is if you 
>> need:
>>  
>> 1) different replication per keyspace
>> 2) different multiple data center configurations per keyspace
>> 
>> Unless you have one of these cases you do not need to do this. I would 
>> always tackle this problem at the application level using something like:
>> 
>> http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html
>> 
>> Client issues aside, it is not a very common case and I would advise against
>> uncommon setups.
>> 
>> 
>> 
>> On Tue, Mar 11, 2014 at 11:08 AM, Keith Wright  wrote:
>> Does this hold true for the native protocol?  I’ve noticed that you can 
>> create a session object in the datastax driver without specifying a keyspace 
>> and so long as you include the keyspace in all queries instead of just table 
>> name, it works fine.  In that case, I assume there’s only one connection 
>> pool for all keyspaces.
>> 
>> From: Edward Capriolo 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Tuesday, March 11, 2014 at 11:05 AM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: How expensive are additional keyspaces?
>> 
>> The biggest expense of them is that you need to be authenticated to a
>> keyspace to perform an operation. Thus connection pools are bound to
>> keyspaces. Switching a keyspace is an RPC operation. In the thrift client,
>> if you have 100 keyspaces you need 100 connection pools, which starts to be a
>> pain very quickly. 
>> 
>> I suggest keeping everything in one keyspace unless you really need 
>> different replication factors and or network replication settings per 
>> keyspace.
>> 
>> 
>> On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer  wrote:
>> Hey all -
>> 
>> My company is working on introducing a configuration service system to
>> provide config data to several of our applications, to be backed by
>> Cassandra. We're already using Cassandra for other services, and at
>> the moment our pending design just puts all the new tables (9 of them,
>> I believe) in one of our pre-existing keyspaces.
>> 
>> I've got a few questions about keyspaces that I'm hoping for input on.
>> Some Google hunting didn't turn up obvious answers, at least not for
>> recent versions of Cassandra.
>> 
>> 1) What trade offs are being made by using a new keyspace versus
>> re-purposing an existing one (that is in active use by another
>> application)? Organization is the obvious answer, I'm looking for any
>> technical reasons.
>> 
>> 2) Is there any per-keyspace overhead incurred by the cluster?
>> 
>> 3) Does it impact on-disk layout at all for tables to be in a
>> different keyspace from others? Is any sort of file fragmentation
>> potentially introduced just by doing this in a new keyspace as opposed
>> to an exiting one?
>> 
>> 4) Does it add any metadata overhead to the system keyspace?
>> 
>> 5) Why might we *not* want to make a separate keyspace for this?
>> 
>> 6) Does anyone have experience with creating additional keyspaces to
>> the point that Cassandra can no longer handle it? Note that we're
>> *not* planning to do this, I'm just curious.
>> 
>> Cheers,
>> Martin
>> 
>> 
> 



Re: How expensive are additional keyspaces?

2014-03-11 Thread Jeremiah D Jordan
The use of more than one keyspace is not uncommon.  Using 100's of them is.  
That being said, different keyspaces let you specify different replication and 
different authentication.  If you are not going to be doing one of those 
things, then there really is no point to multiple keyspaces.  If you do want to 
do one of those things, then go for it, make multiple keyspaces.


-Jeremiah

On Mar 11, 2014, at 10:17 AM, Edward Capriolo  wrote:

> I am not sure. As stated the only benefit of multiple keyspaces is if you 
> need:
>  
> 1) different replication per keyspace
> 2) different multiple data center configurations per keyspace
> 
> Unless you have one of these cases you do not need to do this. I would always 
> tackle this problem at the application level using something like:
> 
> http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html
> 
> Client issues aside, it is not a very common case and I would advise against
> uncommon setups.
> 
> 
> 
> On Tue, Mar 11, 2014 at 11:08 AM, Keith Wright  wrote:
> Does this hold true for the native protocol?  I’ve noticed that you can 
> create a session object in the datastax driver without specifying a keyspace 
> and so long as you include the keyspace in all queries instead of just table 
> name, it works fine.  In that case, I assume there’s only one connection pool 
> for all keyspaces.
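> 
> For instance, with the DataStax Java driver that pattern looks roughly like this (a
> minimal sketch; the contact point, keyspaces and table names are placeholders):
> 
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.ResultSet;
> import com.datastax.driver.core.Session;
> 
> public class MultiKeyspaceClient {
>     public static void main(String[] args) {
>         Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
>         // No keyspace passed to connect(): one session and one pool serve every keyspace.
>         Session session = cluster.connect();
> 
>         // Queries simply use keyspace-qualified table names.
>         ResultSet users  = session.execute("SELECT * FROM ks1.users LIMIT 1");
>         ResultSet events = session.execute("SELECT * FROM ks2.events LIMIT 1");
>         System.out.println(users.one());
>         System.out.println(events.one());
> 
>         cluster.close();
>     }
> }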
> 
> From: Edward Capriolo 
> Reply-To: "user@cassandra.apache.org" 
> Date: Tuesday, March 11, 2014 at 11:05 AM
> To: "user@cassandra.apache.org" 
> Subject: Re: How expensive are additional keyspaces?
> 
> The biggest expense of them is that you need to be authenticated to a
> keyspace to perform an operation. Thus connection pools are bound to
> keyspaces. Switching a keyspace is an RPC operation. In the thrift client, if
> you have 100 keyspaces you need 100 connection pools, which starts to be a pain
> very quickly. 
> 
> I suggest keeping everything in one keyspace unless you really need different 
> replication factors and or network replication settings per keyspace.
> 
> 
> On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer  wrote:
> Hey all -
> 
> My company is working on introducing a configuration service system to
> provide config data to several of our applications, to be backed by
> Cassandra. We're already using Cassandra for other services, and at
> the moment our pending design just puts all the new tables (9 of them,
> I believe) in one of our pre-existing keyspaces.
> 
> I've got a few questions about keyspaces that I'm hoping for input on.
> Some Google hunting didn't turn up obvious answers, at least not for
> recent versions of Cassandra.
> 
> 1) What trade offs are being made by using a new keyspace versus
> re-purposing an existing one (that is in active use by another
> application)? Organization is the obvious answer, I'm looking for any
> technical reasons.
> 
> 2) Is there any per-keyspace overhead incurred by the cluster?
> 
> 3) Does it impact on-disk layout at all for tables to be in a
> different keyspace from others? Is any sort of file fragmentation
> potentially introduced just by doing this in a new keyspace as opposed
> to an exiting one?
> 
> 4) Does it add any metadata overhead to the system keyspace?
> 
> 5) Why might we *not* want to make a separate keyspace for this?
> 
> 6) Does anyone have experience with creating additional keyspaces to
> the point that Cassandra can no longer handle it? Note that we're
> *not* planning to do this, I'm just curious.
> 
> Cheers,
> Martin
> 
> 



Re: GCInspector GC for ConcurrentMarkSweep running every 15 seconds

2014-03-10 Thread Jeremiah D Jordan
Also it might be
https://issues.apache.org/jira/browse/CASSANDRA-6541
that is causing the high heap.
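
For reference, the GCInspector lines quoted below have the heap sitting at roughly
61-63% of the 3 GB max right after each CMS cycle, which is consistent with an
occupancy-fraction trigger near 60; a trivial check of those numbers:

// Heap utilisation after each CMS cycle, using the figures from the GCInspector lines below.
public class HeapFraction {
    public static void main(String[] args) {
        long max = 3200253952L;
        long[] used = { 1964940024L, 1983269488L, 1998214480L, 2013065592L, 2028069232L };
        for (long u : used) {
            System.out.printf("%.1f%% of max heap used%n", 100.0 * u / max);
        }
    }
}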

-Jeremiah

On Feb 18, 2014, at 5:01 PM, Jonathan Ellis  wrote:

> Sounds like you have CMSInitiatingOccupancyFraction set close to 60.
> You can raise that and/or figure out how to use less heap.
> 
> On Mon, Feb 17, 2014 at 5:06 PM, John Pyeatt  
> wrote:
>> I have a 6 node cluster running on AWS. We are using m1.large instances with
>> heap size set to 3G.
>> 
>> 5 of the 6 nodes seem quite healthy. The 6th one, however, is running
>> GCInspector GC for ConcurrentMarkSweep every 15 seconds or so. There is
>> nothing going on on this box. No repairs and almost no user activity. But
>> the CPU is almost continuously at 50% or more.
>> 
>> The only message in the log at all is the
>> INFO 2014-02-17 22:58:53,429 [ScheduledTasks:1] GCInspector GC for
>> ConcurrentMarkSweep: 213 ms for 1 collections, 1964940024 used; max is
>> 3200253952
>> INFO 2014-02-17 22:59:07,431 [ScheduledTasks:1] GCInspector GC for
>> ConcurrentMarkSweep: 250 ms for 1 collections, 1983269488 used; max is
>> 3200253952
>> INFO 2014-02-17 22:59:21,522 [ScheduledTasks:1] GCInspector GC for
>> ConcurrentMarkSweep: 280 ms for 1 collections, 1998214480 used; max is
>> 3200253952
>> INFO 2014-02-17 22:59:36,527 [ScheduledTasks:1] GCInspector GC for
>> ConcurrentMarkSweep: 305 ms for 1 collections, 2013065592 used; max is
>> 3200253952
>> INFO 2014-02-17 22:59:50,529 [ScheduledTasks:1] GCInspector GC for
>> ConcurrentMarkSweep: 334 ms for 1 collections, 2028069232 used; max is
>> 3200253952
>> 
>> We don't see any of these messages on the other nodes in the cluster.
>> 
>> We are seeing similar behaviour for both our production and QA clusters.
>> Production is running cassandra 1.2.9 and QA is running 1.2.13.
>> 
>> Here are some of the cassandra settings that I would think might be
>> relevant.
>> 
>> flush_largest_memtables_at: 0.75
>> reduce_cache_sizes_at: 0.85
>> reduce_cache_capacity_to: 0.6
>> in_memory_compaction_limit_in_mb: 64
>> 
>> Does anyone have any ideas why we are seeing this so selectively on one box?
>> 
>> Any cures???
>> --
>> John Pyeatt
>> Singlewire Software, LLC
>> www.singlewire.com
>> --
>> 608.661.1184
>> john.pye...@singlewire.com
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced



Re: Data loss when swapping out cluster

2013-11-26 Thread Jeremiah D Jordan
TL;DR you need to run repair in between doing those two things.

Full explanation:
https://issues.apache.org/jira/browse/CASSANDRA-2434
https://issues.apache.org/jira/browse/CASSANDRA-5901

Thanks,
-Jeremiah Jordan

On Nov 25, 2013, at 11:00 AM, Christopher J. Bottaro 
 wrote:

> Hello,
> 
> We recently experienced (pretty severe) data loss after moving our 4 node 
> Cassandra cluster from one EC2 availability zone to another.  Our strategy 
> for doing so was as follows:
> 1. One at a time, bring up new nodes in the new availability zone and have them
>    join the cluster.
> 2. One at a time, decommission the old nodes in the old availability zone and
>    turn them off (stop the Cassandra process).
> 
> Everything seemed to work as expected.  As we decommissioned each node, we 
> checked the logs for messages indicating "yes, this node is done 
> decommissioning" before turning the node off.
> 
> Pretty quickly after the old nodes left the cluster, we started getting 
> client calls about data missing.
> 
> We immediately turned the old nodes back on and when they rejoined the 
> cluster *most* of the reported missing data returned.  For the rest of the 
> missing data, we had to spin up a new cluster from EBS snapshots and copy it 
> over.
> 
> What did we do wrong?
> 
> In hindsight, we noticed a few things which may be clues...
> - The new nodes had much lower load after joining the cluster than the old ones
>   (3-4 gb as opposed to 10 gb).
> - We have EC2Snitch turned on, although we're using SimpleStrategy for
>   replication.
> - The new nodes showed even ownership (via nodetool status) after joining the
>   cluster.
> 
> Here's more info about our cluster...
> - Cassandra 1.2.10
> - Replication factor of 3
> - Vnodes with 256 tokens
> - All tables made via CQL
> - Data dirs on EBS (yes, we are aware of the performance implications)
> 
> Thanks for the help.



Re: Virtual node support for Hadoop workloads

2013-10-18 Thread Jeremiah D Jordan
Paulo,
If you have large data sizes then the vnodes with hadoop issue is moot.  You 
will get that many splits with or without vnodes.  The issues come when you 
don't have a lot of data, so all the extra splits slow everything down to a 
crawl because there are 256 times as many tasks created as you actually needed 
for your job.
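
As a rough illustration of that task blow-up (plain arithmetic, assuming Hadoop creates
at least one input split per token range; the cluster and table sizes are made up):

// Back-of-the-envelope: minimum number of map tasks with and without vnodes.
public class SplitCount {
    public static void main(String[] args) {
        int nodes = 12;            // illustrative cluster size
        long rows = 10000000L;     // illustrative small table

        int[] tokensPerNode = { 1, 256 };
        for (int tokens : tokensPerNode) {
            long splits = (long) nodes * tokens;   // at least one split per token range
            System.out.printf("num_tokens=%3d -> >= %4d map tasks, ~%d rows each%n",
                    tokens, splits, rows / splits);
        }
    }
}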

So for large data sets, there is no issue.  For small data sets, you can run 
jobs, they will just be slower than if you didn't have vnodes.

-Jeremiah

On Oct 17, 2013, at 3:49 PM, Paulo Motta  wrote:

> Hello,
> 
> According to DSE3.1 documentation [1], "DataStax recommends using virtual 
> nodes only on data centers running purely Cassandra workloads. You should 
> disable virtual nodes on data centers running either Hadoop or Solr workloads 
> by setting num_tokens to 1.".
> 
> There was a thread in this mailing list earlier this year [2], where it was 
> suggested a workaround to the problem of having a minimum of one map task per 
> token (unfeasible with vnodes). This suggestion involved implementing a new 
> Hadoop InputSplitFormat that could combine many tokens from a single node, 
> thus reducing the overhead of having too many tasks per node. 
> 
> Is there any JIRA ticket around this issue yet, or something being worked on 
> to support VNodes for Hadoop workloads, or the suggestion remains to avoid 
> VNodes for analytics workloads (hadoop, solr)?
> 
> Thanks, 
> 
> -- 
> Paulo
> 
> [1] 
> http://www.datastax.com/docs/datastax_enterprise3.1/deploy/configuring_replication
> [2] 
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201302.mbox/%3CCAJV_UYdqYmfStn5OetWrozQqbi+-yP3X-Ew9xtW=QY=2zgy...@mail.gmtokenail.com%3E



Re: I don't understand shuffle progress

2013-09-19 Thread Jeremiah D Jordan
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/configuration/configVnodesProduction_t.html

On Sep 18, 2013, at 9:41 AM, Chris Burroughs  wrote:

> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html
> 
> This is a basic outline.
> 
> 
> On 09/18/2013 10:32 AM, Juan Manuel Formoso wrote:
>> I really like this idea. I can create a new cluster and have it replicate
>> the old one, after it finishes I can remove the original.
>> 
>> Any good resource that explains how to add a new datacenter to a live
>> single dc cluster that anybody can recommend?
>> 
>> 
>> On Wed, Sep 18, 2013 at 9:58 AM, Chris Burroughs
>> wrote:
>> 
>>> On 09/17/2013 09:41 PM, Paulo Motta wrote:
>>> 
 So you're saying the only feasible way of enabling VNodes on an upgraded
 C*
 1.2 is by doing fork writes to a brand new cluster + bulk load of sstables
 from the old cluster? Or is it possible to succeed on shuffling, even if
 that means waiting some weeks for the shuffle to complete?
 
>>> 
>>> In a multi "DC" cluster situation you *should* be able to bring up a new
>>> DC with vnodes, bootstrap it, and then decommission the old cluster.
>>> 
>> 
>> 
>> 
> 



Re: [RELEASE] Apache Cassandra 2.0 released

2013-09-03 Thread Jeremiah D Jordan
Thanks for everyone's work on this release!

-Jeremiah

On Sep 3, 2013, at 8:48 AM, Sylvain Lebresne  wrote:

> The Cassandra team is very pleased to announce the release of Apache Cassandra
> version 2.0.0. Cassandra 2.0.0 is a new major release that adds numerous
> improvements[1,2], including:
>   - Lightweight transactions[4] that offers linearizable consistency.
>   - Experimental Triggers Support[5].
>   - Numerous enhancements to CQL as well as a new and better version of the
> native protocol[6].
>   - Compaction improvements[7] (including a hybrid strategy that combines 
> leveled and size-tiered compaction).
>   - A new faster Thrift Server implementation based on LMAX Disruptor[8].
>   - Eager retries: avoids query timeout by sending data requests to other
> replicas if too much time passes on the original request.
> 
> See the full changelog[1] for more and please make sure to check the release
> notes[2] for upgrading details.
> 
> Both source and binary distributions of Cassandra 2.0.0 can be downloaded at:
> 
>  http://cassandra.apache.org/download/
> 
> As usual, a debian package is available from the project APT repository[3]
> (you will need to use the 20x series).
> 
> The Cassandra team
> 
> [1]: http://goo.gl/zU4sWv (CHANGES.txt)
> [2]: http://goo.gl/MrR6Qn (NEWS.txt)
> [3]: http://wiki.apache.org/cassandra/DebianPackaging
> [4]: 
> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
> [5]: 
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support
> [6]: http://www.datastax.com/dev/blog/cql-in-cassandra-2-0
> [7]: https://issues.apache.org/jira/browse/CASSANDRA-5371
> [8]: https://issues.apache.org/jira/browse/CASSANDRA-5582
> 



Re: Upgrade from 1.0.9 to 1.2.8

2013-09-02 Thread Jeremiah D Jordan
> 1.0.9 -> 1.0.12 -> 1.1.12 -> 1.2.x?

Because of this fix in 1.0.11:
* fix 1.0.x node join to mixed version cluster, other nodes >= 1.1 
(CASSANDRA-4195)

-Jeremiah

On Aug 30, 2013, at 2:00 PM, Mike Neir  wrote:

> Is there anything that you can link that describes the pitfalls you mention? 
> I'd like a bit more information. Just for clarity's sake, are you 
> recommending 1.0.9 -> 1.0.12 -> 1.1.12 -> 1.2.x? Or would  1.0.9 -> 1.1.12 -> 
> 1.2.x suffice?
> 
> Regarding the placement strategy mentioned in a different post, I'm using the 
> Simple placement strategy, with the RackInferringSnitch. How does that play 
> into the bugs mentioned previously about cross-DC replication?
> 
> MN
> 
> On 08/30/2013 01:28 PM, Jeremiah D Jordan wrote:
>> You probably want to go to 1.0.11/12 first no matter what.  If you want the
>> least chance of issues you should then go to 1.1.12.  While there is a high
>> probability that going from 1.0.X->1.2 will work, you have the best chance
>> of no failures if you go through 1.1.12.  There are some edge cases that can
>> cause errors if you don't do that.
>> 
>> -Jeremiah
>> 
>> 



Re: successful use of "shuffle"?

2013-08-30 Thread Jeremiah D Jordan
You need to introduce the new "vnode enabled" nodes in a new DC.  Or you will 
have similar issues to https://issues.apache.org/jira/browse/CASSANDRA-5525

Add vnode DC:
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html

Point clients to new DC

Remove non vnode DC:
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_decomission_dc_t.html

-Jeremiah

On Aug 30, 2013, at 3:04 AM, Alain RODRIGUEZ  wrote:

> +1.
> 
> I am still afraid of this step. Yet you can avoid it by introducing new 
> nodes, with vnodes enabled, and then remove old ones. This should work.
> 
> My problem is that I am not really confident in vnodes either...
> 
> Any share, on this transition, and then of the use of vnodes would be great 
> indeed.
> 
> Alain
> 
> 
> 2013/8/29 Robert Coli 
> Hi!
> 
> I've been wondering... is there anyone in the cassandra-user audience who has 
> used "shuffle" feature successfully on a non-toy-or-testing cluster? If so, 
> could you describe the experience you had and any problems you encountered?
> 
> Thanks!
> 
> =Rob
> 



Re: is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?

2013-08-30 Thread Jeremiah D Jordan
FYI: 
http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html

-Jeremiah

On Aug 30, 2013, at 9:21 AM, "Hiller, Dean"  wrote:

> is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses 
> thrift)?
> 
> We are not worried about repeated reads since we are idempotent but would 
> rather have the direct speed (even if we had to read from a snapshot, it 
> would be fine).
> 
> (We would most likely run our M/R on 4 nodes of the 12 nodes we have since we 
> have RF=3 right now).
> 
> Thanks,
> Dean



Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jeremiah D Jordan
You probably want to go to 1.0.11/12 first no matter what.  If you want the
least chance of issues you should then go to 1.1.12.  While there is a high
probability that going from 1.0.X->1.2 will work, you have the best chance of
no failures if you go through 1.1.12.  There are some edge cases that can cause
errors if you don't do that.

-Jeremiah


On Aug 30, 2013, at 11:41 AM, Mike Neir  wrote:

> In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is 
> no need to do streaming operations (move/repair/bootstrap/etc). The reading 
> I've done confirms that 1.2.x should be network-compatible with 1.0.x, sans 
> streaming operations. Datastax seems to indicate here that doing a rolling 
> upgrade from 1.0.x to 1.2.x is viable:
> 
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/#upgrade/upgradeC_c.html#concept_ds_nht_czr_ck
> 
> See the second bullet point in the Prerequisites section.
> 
> I'll look into 1.2.9. It wasn't available when I started my testing.
> 
> MN
> 
> On 08/30/2013 12:15 PM, Robert Coli wrote:
>> On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir > > wrote:
>> 
>>I'm faced with the need to update a 36 node cluster with roughly 25T of 
>> data
>>on disk to a version of cassandra in the 1.2.x series. While it seems that
>>1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling
>>upgrade, I'd still like to have a roll-back plan in case the rolling 
>> upgrade
>>goes sideways.
>> 
>> 
>> Upgrading two major versions online is an unsupported operation. I would not
>> expect it to work. Is there a detailed reason you believe it should work 
>> between
>> these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released
>> yesterday. Everyone headed to 2.0 has to pass through 1.2.9.
>> 
>> =Rob
> 
> -- 
> 
> 
> 
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator
> 



Re: Node tokens / data move

2013-07-13 Thread Jeremiah D Jordan
Pretty sure you can put the list in the yaml file too.

-Jeremiah

On Jul 12, 2013, at 3:09 AM, aaron morton  wrote:

>>  Can he not specify all 256 tokens in the YAML of the new 
>> cluster and then copy sstables? 
>> I know it is a bit ugly but should work.
> You can pass a comma separated list of tokens to the 
> -Dcassandra.replace_token JVM param. 
> 
> AFAIK it's not possible to provide the list in the yaml file. 
> 
> Cheers
> A
> 
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 11/07/2013, at 5:07 AM, Baskar Duraikannu  
> wrote:
> 
>> 
>> I copied the sstables and then ran a repair. It worked. Looks like export 
>> and import may have been much faster given that we had very little data.
>> 
>> Thanks everyone.
>> 
>> 
>> 
>> 
>> On Tue, Jul 9, 2013 at 1:34 PM, sankalp kohli  wrote:
>> Hi Aaron,
>>  Can he not specify all 256 tokens in the YAML of the new 
>> cluster and then copy sstables? 
>> I know it is a bit ugly but should work.
>> 
>> Sankalp
>> 
>> 
>> On Tue, Jul 9, 2013 at 3:19 AM, Baskar Duraikannu 
>>  wrote:
>> Thanks Aaron
>> 
>> On 7/9/13, aaron morton  wrote:
>> >> Can I just copy data files for the required keyspaces, create schema
>> >> manually and run repair?
>> > If you have something like RF 3 and 3 nodes then yes, you can copy the data
>> > from one node in the source cluster to all nodes in the dest cluster and 
>> > use
>> > cleanup to remove the unneeded data. Because each node in the source 
>> > cluster
>> > has a full copy of the data.
>> >
>> > If that's not the case you cannot copy the data files, even if they have 
>> > the
>> > same number of nodes, because the nodes in the dest cluster will have
>> > different tokens. AFAIK you need to export the full data set from the 
>> > source
>> > DC and then import it into the dest system.
>> >
>> > The Bulk Load utility may be of help
>> > http://www.datastax.com/docs/1.2/references/bulkloader . You could copy the
>> > SSTables from every node in the source system and bulk load them into the
>> > dest system. That process will ensure rows are sent to nodes that are
>> > replicas.
>> >
>> > Cheers
>> >
>> > -
>> > Aaron Morton
>> > Freelance Cassandra Consultant
>> > New Zealand
>> >
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> >
>> > On 9/07/2013, at 12:45 PM, Baskar Duraikannu
>> >  wrote:
>> >
>> >> We have two clusters used by two different groups with vnodes enabled. Now
>> >> there is a need to move some of the keyspaces from cluster 1 to cluster 2.
>> >>
>> >>
>> >> Can I just copy data files for the required keyspaces, create schema
>> >> manually and run repair?
>> >>
>> >> Anything else required?  Please help.
>> >> --
>> >> Thanks,
>> >> Baskar Duraikannu
>> >
>> >
>> 
>> 
> 



Re: Ranged Tombstones causing timeouts, not removed during compaction. How to remove?

2013-07-03 Thread Jeremiah D Jordan
To force clean out a tombstone.

1. Stop doing deletes on the CF, or switch to performing all deletes at CL ALL.
2. Run a full repair of the cluster for that CF.
3. Change GC grace to be small, like 5 seconds or something, for that CF.
Either:
4. Find all sstables which have that row key in them using sstablekeys/sstable2json.
5. Use JMX to force those sstables to compact with each other (see the sketch below for
   locating the right CompactionManager operation).
Or
4. Do a major compaction on that CF.
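
For step 5, a small JMX sketch like the one below (plain JDK APIs; host and port are
placeholders) can be used to locate the CompactionManager MBean and list its operations,
so you can pick out the user-defined compaction call for your version before invoking it
from jconsole or from code:

import javax.management.MBeanOperationInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListCompactionOps {
    public static void main(String[] args) throws Exception {
        // 7199 is Cassandra's default JMX port; adjust the host/port as needed.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url, null);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            // Print every operation; the user-defined compaction one takes the sstable
            // file names, and its exact name/signature varies between versions.
            for (MBeanOperationInfo op : mbs.getMBeanInfo(name).getOperations()) {
                System.out.println(op.getReturnType() + " " + op.getName());
            }
        } finally {
            connector.close();
        }
    }
}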

-Jeremiah

On Jul 3, 2013, at 5:58 PM, Jeff House  wrote:

> 
> We are on 1.2.5 with a 4 node cluster (RF 3) and have a cql3 wide row table;
> each row has about 2000 columns.   While running some test data through it,
> it started throwing rpc_timeout errors when returning a couple of specific rows
> (with Consistency ONE).
> 
> After hunting through sstable2json results and looking at the source for it, 
> it looks like these are Ranged Tombstones.  I see there's a bug filed (and a 
> patch) for this, but is there a way to clear out the tombstones?   I have 
> 'nodetool cleanup'ed, 'nodetool repair'ed and 'nodetool scrub'bed the table, 
> but they just seem to linger as does the problem reading the rows in question.
> 
> Is there a way I can clear this data out and move forward? 
> 
> Thanks,
> 
> -Jeff



Re: Problem with libcassandra

2013-07-02 Thread Jeremiah D Jordan
If you are using 1.2, I would checkout https://github.com/mstump/libcql

-Jeremiah

On Jul 2, 2013, at 5:18 AM, Shubham Mittal  wrote:

> I am trying to run the code below, but it gives this error. It compiles without
> any errors.  Kindly help me.
> (source of the code : 
> http://posulliv.github.io/2011/02/27/libcassandra-sec-indexes/)
> 
> terminate called after throwing an instance of 
> 'org::apache::cassandra::InvalidRequestException'
>   what():  Default TException.
> Aborted
> 
> 
> #include <iostream>
> #include <string>
> #include <sstream>
> #include <set>
> #include <map>
> #include <vector>
> #include <tr1/memory>
> 
> #include <libcassandra/cassandra_factory.h>
> #include <libcassandra/cassandra.h>
> #include <libcassandra/keyspace_definition.h>
> #include <libcassandra/column_family_definition.h>
> #include <libcassandra/column_definition.h>
> 
> using namespace std;
> using namespace libcassandra;
> 
> static string host("127.0.0.1");
> static int port = 9160;
> 
> int main()
> {
> 
>     CassandraFactory cf(host, port);
>     tr1::shared_ptr<Cassandra> c(cf.create());
> 
>     KeyspaceDefinition ks_def;
>     ks_def.setName("demo");
>     // Throws InvalidRequestException if the keyspace cannot be created
>     // (for example, if "demo" already exists).
>     c->createKeyspace(ks_def);
> 
>     ColumnFamilyDefinition cf_def;
>     cf_def.setName("users");
>     cf_def.setKeyspaceName(ks_def.getName());
> 
>     ColumnDefinition name_col;
>     name_col.setName("full_name");
>     name_col.setValidationClass("UTF8Type");
> 
>     ColumnDefinition sec_col;
>     sec_col.setName("birth_date");
>     sec_col.setValidationClass("LongType");
>     sec_col.setIndexType(org::apache::cassandra::IndexType::KEYS);
> 
>     ColumnDefinition third_col;
>     third_col.setName("state");
>     third_col.setValidationClass("UTF8Type");
>     third_col.setIndexType(org::apache::cassandra::IndexType::KEYS);
> 
>     cf_def.addColumnMetadata(name_col);
>     cf_def.addColumnMetadata(sec_col);
>     cf_def.addColumnMetadata(third_col);
> 
>     c->setKeyspace(ks_def.getName());
>     c->createColumnFamily(cf_def);
> 
>     return 0;
> }
>