Re: Sharing Cassandra with Solandra
On 6/27/2011 3:39 PM, David Strauss wrote:

On Mon, 2011-06-27 at 15:06 -0600, AJ wrote: Would anyone care to talk about their experiences with using Solandra alongside another application that uses Cassandra (also on the same node)? I'm curious about any resource contention issues, and about compatibility between C* versions and Solandra. Also, I read the developer say somewhere that you have to run Solandra on every C* node in the ring; I'm not sure if I interpreted that correctly. Also, what index-size-to-data-size ratio should one expect (ballpark)? How does it perform? Any caveats?

We're currently keeping the clusters separate at Pantheon Systems because our core API (which runs on standard Cassandra) is often ready for the next Cassandra version at a different time than Solandra. Solandra recently gained dual 0.7/0.8 support, but we're still opting to use the version of Cassandra that Solandra is primarily being built and tested on (currently 0.8).

Thanks. But I'm finally cluing in that Solandra is also developed by DataStax, so I feel safer about future compatibility.
Ec2 snitch with network topology strategy
I was thinking of leveraging the EC2 snitch. But my question is then: how do I give replica placement options? Or can I set the snitch to Ec2Snitch, list the nodes in cassandra-topology.properties, and give the locator strategy as NetworkTopologyStrategy at the time of creating the keyspace? But will it work?

And for those who are struggling to deploy Cassandra across EC2 regions:

1. One approach is to use Milind's patch; it works but has some limitations. https://issues.apache.org/jira/browse/CASSANDRA-2362
2. OpenVPN is a good option, but is largely redundant now that encryption is available in Cassandra 0.8.0.
3. Vijay has come up with a patch, and so far in testing I have not seen any hiccups. https://issues.apache.org/jira/browse/CASSANDRA-2452 - it's marked to be in the 0.8.2 release.

-pankaj

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Ec2-snitch-with-network-topology-strategy-tp6528188p6528188.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
question on capacity planning
If I'm planning to store 20TB of new data per week, and expire all data every 2 weeks, with a replication factor of 3, do I only need approximately 120TB of disk? I'm going to use TTL in my column values to automatically expire data. Or would I need more capacity to handle SSTable merges? Given this amount of data, would you recommend node storage at 2TB per node or more? This application will have a heavy write/moderate read use profile. -- Arun
Re: Ec2 snitch with network topology strategy
Hmm... Just tested the config. It works; I got confused with the options, my bad.

On Wed, Jun 29, 2011 at 2:26 PM, pankajsoni0126 <pankajsoni0...@gmail.com> wrote:
> I was thinking of leveraging ec2 snitch. But my question is then how do I give replica placement options? [...]
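For reference, a sketch of the combination being discussed (the keyspace name and replica counts are examples). With Ec2Snitch, the data-center name comes from the EC2 region and the rack from the availability zone, so no cassandra-topology.properties is needed - that file belongs to PropertyFileSnitch:

```
# cassandra.yaml (each node)
endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch

# cassandra-cli, at keyspace creation ('MyKS' and the counts are examples;
# the strategy_options keys are EC2 region names as reported by the snitch)
create keyspace MyKS
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{us-east:2, us-west:1}];
```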
Cannot set column value to zero
I had a strange problem recently where I was unable to set the value of a column to '0' (it always returned '1'), but setting it to other values worked fine:

[default@Test] set Urls['rowkey']['status']='1';
Value inserted.
[default@Test] get Urls['rowkey'];
=> (column=status, value=1, timestamp=1309189541891000)
Returned 1 results.
[default@Test] set Urls['rowkey']['status']='0';
Value inserted.
[default@Test] get Urls['rowkey'];
=> (column=status, value=1, timestamp=1309189551407616)
Returned 1 results.

This was on a one-node test cluster (v0.7.6) with no other clients; setting other values (e.g. '9') worked fine. However, attempting to set the value back to '0' always resulted in a value of '1'. I noticed this shortly after truncating the CF. The column family description is shown below. One thing that looks odd is that on other test clusters the Column Name is followed by what looks like a reference to the index, e.g. Column Name: status (737461747573) - but here it isn't. I was wondering if there was some interaction between truncating the CF and the use of a KEYS index? (Presumably it would be safer to delete all data directories in order to wipe the cluster during experimentation, rather than truncating?) Unfortunately I'm not sure how to recreate the situation, as this was a test machine on which I played around with various configurations - but maybe someone has seen a similar problem elsewhere? In the end I had to wipe the data and start again, and all seemed fine, although the index reference is still absent as mentioned above.

[default@Test] describe keyspace;
Keyspace: Test:
...
ColumnFamily: Foo
  default_validation_class: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 0.0/14400
  Memtable thresholds: 0.5/128/60 (millions of ops/minutes/MB)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: [Foo.737461747573]
  Column Metadata:
    Column Name: status
    Validation Class: org.apache.cassandra.db.marshal.UTF8Type
    Index Type: KEYS
...

This message was sent using IMP, the Internet Messaging Program.

This email and any attachments to it may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient of this email, you must neither take any action based upon its contents, nor copy or show it to anyone. Please contact the sender if you believe you have received this email in error. QinetiQ may monitor email traffic data and also the content of email for the purposes of security. QinetiQ Limited (Registered in England & Wales: Company Number: 3796233) Registered office: Cody Technology Park, Ively Road, Farnborough, Hampshire, GU14 0LX http://www.qinetiq.com.
Re: question on capacity planning
On Wed, Jun 29, 2011 at 5:36 AM, Jacob, Arun <arun.ja...@disney.com> wrote:
> if I'm planning to store 20TB of new data per week, and expire all data every 2 weeks, with a replication factor of 3, do I only need approximately 120 TB of disk? [...]

You'll need extra space for both compaction and the overhead of the storage format. As for the amount of storage per node, that depends on your latency and throughput requirements. -ryan
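The arithmetic can be sketched out as below. The 2x compaction headroom is an assumption, not a Cassandra constant - a worst-case major compaction can temporarily need space for a full copy of the data being compacted, and TTL'd columns linger on disk until compaction after gc_grace - so treat the factor as a knob, not a fact:

```python
# Back-of-envelope capacity estimate for the question above.
# Assumptions: 2-week retention, RF=3, and a 2x disk headroom factor
# to cover compaction scratch space and not-yet-purged TTL'd data.
new_data_per_week_tb = 20
retention_weeks = 2
replication_factor = 3

live_data_tb = new_data_per_week_tb * retention_weeks * replication_factor

compaction_headroom = 2.0          # assumed worst case, tune for your workload
raw_disk_tb = live_data_tb * compaction_headroom

per_node_tb = 2                    # the 2 TB/node figure from the question
nodes = raw_disk_tb / per_node_tb
print(live_data_tb, raw_disk_tb, int(nodes))   # 120 240.0 120
```

So the 120 TB figure only covers live replicated data; with the assumed headroom you would budget roughly double, which at 2 TB/node is on the order of 120 nodes.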
Data storage security
Are there any options to encrypt column families when they are stored in the database? Say in a given keyspace some CF has sensitive info, and I don't want a 'select *' of that CF to lay out the data in plain text. Thanks.
Re: Data storage security
On Wed, Jun 29, 2011 at 12:37 PM, A J <s5a...@gmail.com> wrote:
> Are there any options to encrypt the column families when they are stored in the database? [...]

I think this is an application-layer issue - just encrypt/decrypt there. The data stored within the column value can be any arbitrary bytes, and since column data is not indexed it won't affect how you can access the data with Cassandra in any way. -Eric
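The application-layer pattern Eric describes might look like the sketch below. Everything here is illustrative: the FakeCF class stands in for a real client's column family handle, and the base64 calls are placeholders only - base64 is NOT encryption. A real client would substitute an authenticated cipher (e.g. AES-GCM via a crypto library) with keys managed outside Cassandra.

```python
import base64

def encrypt(plaintext: bytes) -> bytes:
    # Placeholder: swap in a real cipher's encrypt() here.
    return base64.b64encode(plaintext)

def decrypt(ciphertext: bytes) -> bytes:
    # Placeholder: swap in the matching decrypt() here.
    return base64.b64decode(ciphertext)

def insert_sensitive(cf, row_key, column, value):
    # Cassandra stores arbitrary bytes, so ciphertext round-trips as-is.
    cf.insert(row_key, {column: encrypt(value)})

def read_sensitive(cf, row_key, column):
    return decrypt(cf.get(row_key)[column])

# Toy in-memory stand-in for a column family client, for the demo only:
class FakeCF(dict):
    def insert(self, key, cols):
        self.setdefault(key, {}).update(cols)
    def get(self, key):
        return self[key]

cf = FakeCF()
insert_sensitive(cf, 'acct1', 'ssn', b'123-45-6789')
print(read_sensitive(cf, 'acct1', 'ssn'))   # b'123-45-6789'
```

The key point is that the sensitive bytes never leave the application unencrypted, so a raw dump of the CF shows only ciphertext.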
Re: custom reconciling columns? (improve performance of long rows)
I hacked around the code. At first I thought that the cost of the map put and get was due to synchronization, so I tried replacing ConcurrentSkipListMap with TreeMap: I created a subclass of ColumnFamily and used the subclass only in the pure read path. Interestingly, on the read path no more than one thread accesses the returned CF at any time, so we can remove the concurrency control. But it did not offer any significant change in speed.

Then I tried changing TreeMap to HashMap; this time it took only half the time. But the problem is how to keep the output sorted - doing a sort on every return is going to be even slower...

On Tue, Jun 28, 2011 at 10:07 PM, Yang <tedd...@gmail.com> wrote: btw, I use only one box now just because I'm running it in a dev JUnit test, not that it's going to be that way in production.

On Tue, Jun 28, 2011 at 10:06 PM, Yang <tedd...@gmail.com> wrote: OK, here is the profiling result. I think this is consistent (having been trying to recover how to effectively use YourKit...); see attached picture. Since I actually do not use the Thrift interface, but directly use thrift.CassandraServer and run my code in the same JVM as Cassandra, and was running the whole thing on a single box, there is no message serialization/deserialization cost. But more columns did add more time, and the time was spent in the ConcurrentSkipListMap operations that implement the memtable.

Regarding breaking up the row: I'm not sure it would reduce my run time, since our requirement is to read the entire rolling-window history (we already have TTL enabled, so the history is limited to a certain length, but it is quite long: over 1000, in some cases 5000 or more). I think accessing roughly 1000 items is not an uncommon requirement for many applications. In our case, each column has about 30 bytes of data, besides metadata such as TTL and timestamp.
At a history length of 3000, the read takes about 12ms (remember this is completely in-memory, no disk access). I just took a look at the expiring-column logic; it looks like expiration does not come into play until CassandraServer.internal_get() / thriftifyColumns() gets called, so the memtable access time above is still spent. Yes, breaking up the row is going to be helpful, but only to the degree of preventing access to expired columns (btw, if this were actually built into the Cassandra code it would be nicer: instead of spending multiple key lookups, I locate the row once, and within the row there are different generation buckets, so old generation buckets beyond expiration are not read). Currently just accessing the 3000 live columns is already quite slow. I'm trying to see whether there are some easy magic bullets for a drop-in replacement for ConcurrentSkipListMap... Yang

On Tue, Jun 28, 2011 at 4:18 PM, Nate McCall <n...@datastax.com> wrote: I agree with Aaron's suggestion on data model and query here. Since there is a time component, you can split the row on a fixed duration for a given user, so the row key would become userId_[timestamp rounded to day]. This provides an easy way to roll up the information for the date ranges you need, since the key suffix can be created without a read. This also benefits from spreading the read load over the cluster instead of just the replicas, since you have 30 rows in this case instead of one.

On Tue, Jun 28, 2011 at 5:55 PM, aaron morton <aa...@thelastpickle.com> wrote: Can you provide some more info:
- how big are the rows, e.g. number of columns and column size?
- how much data are you asking for?
- what sort of read query are you using?
- what sort of numbers are you seeing?
- are you deleting columns or using TTL?
I would consider issues with the data churn, data model and query before looking at serialisation.
Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29 Jun 2011, at 10:37, Yang wrote: I can see that as my user history grows, the read time grows proportionally (or faster than linearly). If my business requirements ask me to keep a month's history for each user, it could become too slow. I was suspecting that it's actually the serializing and deserializing that's taking time (I can definitely see it's CPU bound).

On Tue, Jun 28, 2011 at 3:04 PM, aaron morton <aa...@thelastpickle.com> wrote: There is no facility to do custom reconciliation for a column. An append-style operation would run into many of the same problems as the Counter type, e.g. not every node may get an append, and there is a chance of lost appends unless you go to all the trouble Counters do. I would go with using a row for the user and columns for each item. Then you can have fast no-look writes. What problems are you seeing with the reads?
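The row-splitting scheme suggested in this thread (userId_[timestamp rounded to day]) can be sketched as below. The key format and helper names are illustrative, not part of any Cassandra client API:

```python
# Sketch of day-bucketed row keys for a per-user time series: each user
# gets one row per day, so a rolling-window read touches only the buckets
# inside the window and never scans expired generations.
SECONDS_PER_DAY = 86400

def row_key(user_id, ts):
    # Round the epoch-seconds timestamp down to the start of its day.
    day = int(ts) // SECONDS_PER_DAY * SECONDS_PER_DAY
    return "%s_%d" % (user_id, day)

def keys_for_window(user_id, now, window_days):
    # Row keys are derivable without a read; returned oldest-first.
    start = int(now) // SECONDS_PER_DAY * SECONDS_PER_DAY
    return ["%s_%d" % (user_id, start - d * SECONDS_PER_DAY)
            for d in range(window_days - 1, -1, -1)]

now = 1309392000  # 2011-06-30 00:00:00 UTC
print(row_key("user42", now))                    # user42_1309392000
print(len(keys_for_window("user42", now, 30)))   # 30
```

A month's history then becomes a multiget over 30 derivable keys instead of one scan over a single ever-growing row.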
hadoop results
I'll start with my question: given a CF with comparator TimeUUIDType, what is the most efficient way to get the greatest column's value?

Context: I've been running Cassandra for a couple of months now, so obviously it's time to start layering more on top :-) In my test environment I managed to get Pig/Hadoop running, and developed a few scripts to collect metrics I've been missing since I switched from MySQL to Cassandra (including the ever-useful select count(*) from table equivalent). I was hoping to dump the results of this processing back into Cassandra for use in other tools/processes.

My initial thought was: a new CF called stats with comparator TimeUUIDType. The basic idea being I'd store: stat_name -> time stat was computed (as UUID) -> value. That way I can also see a historical perspective of any given stat for auditing (and, for cumulative stats, to see trends). The stat_name itself is a URI composed of the what and any constraints on the what (including an optional time range, if the stat supports it), e.g. ClassOfSomething/ID/MetricName/OptionalTimeRange (or something; still deciding on the format of the URI).

But right now, the only way I know to get the current stat value would be to iterate over all columns (the TimeUUIDs) and then return the last one.

Thanks for any tips,
will
CQL injection attacks?
Someone asked a while ago whether Cassandra was vulnerable to injection attacks: http://stackoverflow.com/questions/5998838/nosql-injection-php-phpcassa-cassandra With Thrift, the answer was 'no'. With CQL, presumably the situation is different, at least until prepared statements are possible (CASSANDRA-2475)? Has there been any discussion on this already that someone could point me to, please? I couldn't see anything on JIRA (searching for CQL AND injection, CQL AND security, etc). Thanks.
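Pending prepared statements, any string interpolated into CQL has to be escaped by the application. A minimal sketch, assuming CQL follows the SQL convention of doubling single quotes inside a string literal (verify this against your driver, which may already ship a quoting helper you should prefer):

```python
# Hypothetical escaping helper for CQL string literals: wrap the value in
# single quotes and double any embedded single quotes so user input cannot
# terminate the literal.
def cql_quote(value: str) -> str:
    return "'" + value.replace("'", "''") + "'"

user_input = "x'; DROP COLUMNFAMILY Users; --"
query = "SELECT * FROM Users WHERE KEY = %s" % cql_quote(user_input)
print(query)
# The injected quote stays inside the literal:
# SELECT * FROM Users WHERE KEY = 'x''; DROP COLUMNFAMILY Users; --'
```

Once CASSANDRA-2475 lands, bound parameters in prepared statements make this escaping unnecessary, as with SQL.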
RE: RAID or no RAID
With multiple data dirs you are still limited by the space free on any one drive. So if you have two data dirs with 40GB free on each, and you have 50GB to be compacted, it won't work; but if you had a RAID, you would have 80GB free and could compact...

-----Original Message-----
From: mcasandra [mailto:mohitanch...@gmail.com]
Sent: Tuesday, June 28, 2011 7:55 PM
To: cassandra-u...@incubator.apache.org
Subject: Re: RAID or no RAID

aaron morton wrote: "Not sure what the intended purpose is, but we've mostly used it as an emergency disk-capacity-increase option. That's what I've used it for. Cheers"

How does compaction work in terms of utilizing multiple data dirs? Also, is there a reference on the wiki somewhere that says not to use multiple data dirs?
Chunking if size > 64MB
From what I read, Cassandra allows a single column value to be up to 2GB, but would chunk the data if greater than 64MB. Is the chunking transparent to the application, or does the app need to know if/how/when the chunking happened for a specific column value that happened to be greater than 64MB? Thank you.
api to extract gossiper results
Cassandra uses an accrual failure detector to interpret gossip. Is it somehow possible to extract these (gossip values and the results of the failure detector) in an external system? Thanks
Cassandra client loses connectivity to cluster
In reviewing client logs as part of our Cassandra testing, I noticed several Hector "All host pools marked down" exceptions in the logs. Further investigation showed a consistent pattern of "java.net.SocketException: Broken pipe" and "java.net.SocketException: Connection reset" messages. These errors occur for all 36 hosts in the cluster over a period of seconds, as Hector tries to find a working host to connect to. Failing to find a host results in the "All host pools marked down" messages. These messages recur for a period ranging from several seconds up to almost 15 minutes, clustering around two to three minutes. Then connectivity returns, and when Hector tries to reconnect it succeeds.

The clients are instances of a JBoss 5 web application. We use Hector 0.7.0-29 (plus a patch that was pulled in advance of -30). The Cassandra cluster has 72 nodes split between two datacenters. It's running 0.7.5 plus a couple of bug fixes pulled in advance of 0.7.6. The keyspace uses NetworkTopologyStrategy and RF=6 (3 in each datacenter). The clients are reading and writing at LOCAL_QUORUM to the 36 nodes in their own datacenter. Right now the second datacenter is for failover only, so there are no clients actually writing there.

There's nothing else obvious in the JBoss logs at around the same time, e.g. other application errors or GC events. The Cassandra system.log files at INFO level show nothing out of the ordinary. I have a capture of one of the incidents at DEBUG level where again I see nothing abnormal-looking, but there's so much data that it would be easy to miss something.

Other observations:
* It only happens on weekdays (our weekends are much lower load).
* It has occurred every weekday for the last month except for Monday May 30, the Memorial Day holiday in the US.
* Most days it occurs only once, but six times it has occurred twice, never more often than that.
* It generally happens in the late afternoon, but there have been occurrences earlier in the afternoon and twice in the late morning. Earliest occurrence is 11:19, latest is 18:11. Our peak loads are between 10:00 and 14:00, so most occurrences do *not* correspond with peak load times.
* It only happens on a single client JBoss instance at a time.
* Generally it affects a different host each day, but the same host was affected on consecutive days once.
* Out of 40 clients, one has been affected three times, seven have been affected twice, 11 have been affected once, and 21 have not been affected.
* The cluster is lightly loaded.

Given that the problem affects a single client machine at a time and that machine loses the ability to connect to the entire cluster, it seems unlikely that the problem is on the C* server side. Even a network problem seems hard to explain: given that the clients are on the same subnet, I would expect all of them to fail if it were a network issue. I'm hoping that perhaps someone has seen a similar issue or can suggest things to try. Thanks in advance for any help!

Jim
Re: api to extract gossiper results
A simple solution is to set up log4j at DEBUG level on gossip events. You can also use the StorageProxy/fat client and then participate in gossip yourself. Each system has its own converging view of the ring; thus what your local gossip thinks the topology is may not be the same across the cluster.

Edward

On Wed, Jun 29, 2011 at 5:20 PM, A J <s5a...@gmail.com> wrote: Cassandra uses an accrual failure detector to interpret gossip. Is it somehow possible to extract these (gossip values and results of the failure detector) in an external system? Thanks
Re: No Transactions: An Example
On 6/22/2011 9:18 AM, Trevor Smith wrote: Right -- that's the part that I am more interested in fleshing out in this post.

Here is one way: use MVCC (http://en.wikipedia.org/wiki/Multiversion_concurrency_control). A single global clean-up process would be acceptable, since it's not a single point of failure, only a single point of accumulating back-logged work. It will not affect availability as long as you are notified when that process terminates and restart it in a reasonable amount of time, and it will not affect the validity of subsequent reads.

So, you would have a balance column, and each update would create a balance_[timestamp] column with a positive or negative value indicating a credit or debit. Subsequent clients read the latest value by doing a slice from balance to balance_~ (i.e. all balance* columns). (You would have to work out your column naming conventions so that your slices return only the pertinent columns.) Then the clients apply all the credits and debits to the balance to get the current balance. This handles the lost-update problem.

For the dirty-read and incorrect-summary problems - others reading data that is in the middle of a transaction that hasn't committed yet - I would add a final transaction column to a Transactions CF. The key would be cf.key.column, e.g. Accounts.1234.balance (1234 being the account # and Accounts being the CF owning the balance column). Then a new column would be added for each successful transaction (e.g., after debiting and crediting the two accounts) using the same timestamp used in balance_[timestamp]. So now, a client wanting the current balance would do a slice for all of the transactions for that column and only apply the balance updates up to the latest transaction. Note, you might have to do something else with the transaction naming scheme to make sure the names are guaranteed to be unique, but you get the idea.
If the transaction fails, the client simply does not add a transaction column to Transactions, and deletes any balance_[timestamp] columns it added to the Accounts CF (or lets the clean-up process do it... carefully). This should avoid the need for locks, and as long as each account doesn't have a crazy amount of updates, the slices shouldn't be so large as to be a significant perf hit.

A note about the updates: you have to make sure the clean-up process processes the updates in order and only one time. If you can't guarantee these, then you'll have to make sure your updates are idempotent and commutative. Oh yeah, and you must use QUORUM reads/writes, of course.

Any critiques?

aj
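The read path of this MVCC scheme can be sketched as below. The dicts stand in for slice results, and the column naming is illustrative only: a base balance column plus per-transaction delta columns, with only the deltas whose timestamps appear in the committed-transactions set applied.

```python
# Reconstruct a balance from a base value plus delta columns, skipping
# deltas whose transactions have not committed (the dirty-read guard).
def current_balance(row, committed):
    balance = row["balance"]
    for name, delta in row.items():
        if name.startswith("balance_"):
            ts = int(name[len("balance_"):])
            if ts in committed:        # apply committed deltas only
                balance += delta
    return balance

# Toy slice results: base balance 100, a committed -30 debit at ts=1001,
# and a +50 credit at ts=1002 whose transaction has not committed yet.
accounts_row = {"balance": 100, "balance_1001": -30, "balance_1002": +50}
committed = {1001}
print(current_balance(accounts_row, committed))   # 70
```

A reader therefore never sees the in-flight +50 until its transaction column lands; once ts=1002 commits, the same read returns 120 with no coordination.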
Re: Cannot set column value to zero
The extra () in the describe keyspace output is only there if the column comparator is BytesType - the client tries to format the data as UTF8. Don't forget truncate does snapshots, so check the snapshots dir and delete things if you are using it a lot for testing. The 0 == 1 thing does not ring any bells; let us know if it happens again.

Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 30 Jun 2011, at 02:13, dnalls...@taz.qinetiq.com wrote:
> I had a strange problem recently where I was unable to set the value of a column to '0' (it always returned '1') but setting it to other values worked fine. [...]
Re: hadoop results
How about get_slice() with reversed == true and count == 1 to get the highest TimeUUID? Or you could also store a column with a magic name whose value is the TimeUUID of the current metric to use.

Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 30 Jun 2011, at 06:35, William Oberman wrote:
> I'll start with my question: given a CF with comparator TimeUUIDType, what is the most efficient way to get the greatest column's value? [...]
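For intuition, the comparison the comparator performs can be sketched client-side. TimeUUIDType sorts primarily by the 60-bit timestamp embedded in a version-1 UUID, so picking the max by the `.time` field agrees with it (ignoring the clock_seq/node tie-breaking the comparator also applies). The helper below builds deterministic version-1 UUIDs purely for the demo:

```python
import uuid

def timeuuid(ts):
    # Build a version-1 UUID whose embedded timestamp is ts
    # (ts counts 100-ns intervals since the UUID epoch, 1582-10-15).
    return uuid.UUID(fields=(ts & 0xFFFFFFFF,            # time_low
                             (ts >> 32) & 0xFFFF,        # time_mid
                             ((ts >> 48) & 0x0FFF) | 0x1000,  # version 1
                             0x80, 0, 0))                # RFC variant, demo node

columns = {timeuuid(100): "v-old", timeuuid(300): "v-latest", timeuuid(200): "v-mid"}
latest = max(columns, key=lambda u: u.time)
print(columns[latest])   # v-latest
```

Server-side, the reversed slice with count=1 suggested above is of course cheaper: it reads one column instead of shipping the whole row to the client.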
Re: Chunking if size > 64MB
AFAIK there is no server-side chunking of column values. This link: http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage is just suggesting that in the app you do not store more than 64MB per column.

Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 30 Jun 2011, at 07:25, A J wrote:
> From what I read, Cassandra allows a single column value to be up to 2GB but would chunk the data if greater than 64MB. [...]
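Since chunking is the application's job, a minimal sketch of the pattern: split a large blob into fixed-size chunk columns plus a chunk count, and reassemble on read. The chunk size and column-naming scheme here are arbitrary choices, not anything Cassandra mandates.

```python
CHUNK_SIZE = 8 * 1024 * 1024   # 8 MB per column; pick what suits your cluster

def to_columns(blob: bytes) -> dict:
    # Split the blob into zero-padded chunk columns plus a count column,
    # ready to be written to one row.
    chunks = [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]
    cols = {"chunk_%05d" % n: c for n, c in enumerate(chunks)}
    cols["num_chunks"] = str(len(chunks)).encode()
    return cols

def from_columns(cols: dict) -> bytes:
    # Reassemble the blob in chunk order.
    n = int(cols["num_chunks"])
    return b"".join(cols["chunk_%05d" % i] for i in range(n))

blob = b"x" * (CHUNK_SIZE * 2 + 17)   # 2 full chunks plus a remainder
cols = to_columns(blob)
print(int(cols["num_chunks"]), from_columns(cols) == blob)   # 3 True
```

Zero-padding the chunk index keeps the columns in read order under a bytes/ASCII comparator, so a whole-row slice returns the chunks in sequence.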