Re: How to come up with a predefined topology
Thanks. Some follow up questions :

1. How do the reads use strategy/snitch information ? I am assuming the reads can go to any of the replicas. Will it also use the snitch/strategy info to find next 'R' replicas 'closest' to coordinator-node ?

2. In a single DC ( with n racks and r replicas ) what algorithm cassandra uses to write its replicas in following scenarios :
a. n > r : I am assuming, have 1 replica in each rack.
b. n < r : ?? I am assuming, try to equally distribute replicas across in each racks.

-Thanks, Prasenjit

On Thu, Jul 12, 2012 at 11:24 AM, Tyler Hobbs ty...@datastax.com wrote:

I highly recommend specifying the same rack for all nodes (using cassandra-topology.properties) unless you really have a good reason not to (and you probably don't). The way that replicas are chosen when multiple racks are in play can be fairly confusing and lead to a data imbalance if you don't catch it.

On Wed, Jul 11, 2012 at 10:53 PM, prasenjit mukherjee prasen@gmail.com wrote:

> As far as I know there isn't any way to use the rack name in the strategy_options for a keyspace. You might want to look at the code to dig into that, perhaps.

Aha, I was wondering if I could do that as well ( specify rack options ) :) Thanks for the pointer, I will dig into the code.

-Thanks, Prasenjit

On Thu, Jul 12, 2012 at 5:33 AM, Richard Lowe richard.l...@arkivum.com wrote:

If you then specify the parameters for the keyspace to use these, you can control exactly which set of nodes replicas end up on. For example, in cassandra-cli:

    create keyspace ks1 with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 };

As far as I know there isn't any way to use the rack name in the strategy_options for a keyspace. You might want to look at the code to dig into that, perhaps.

Whichever snitch you use, the nodes are sorted in order of proximity to the client node. How this is determined depends on the snitch that's used but most (the ones that ship with Cassandra) will use the default ordering of same-node, then same-rack, then same-datacenter, then different-datacenter. Each snitch has methods to tell Cassandra which rack and DC a node is in, so it always knows which node is closest. Used with the Bloom filters this can tell us where the nearest replica is.

-Original Message-
From: prasenjit mukherjee [mailto:prasen@gmail.com]
Sent: 11 July 2012 06:33
To: user
Subject: How to come up with a predefined topology

Quoting from http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy :

"Asymmetrical replication groupings are also possible depending on your use case. For example, you may want to have three replicas per data center to serve real-time application requests, and then have a single replica in a separate data center designated to running analytics."

Have 2 questions :

1. Any example how to configure a topology with 3 replicas in one DC ( with 2 in 1 rack + 1 in another rack ) and one replica in another DC ? The default networktopologystrategy with rackinferringsnitch will only give me equal distribution ( 2+2 )

2. I am assuming the reads can go to any of the replicas. Is there a client which will send query to a node ( in cassandra ring ) which is closest to the client ?

-Thanks, Prasenjit

-- Tyler Hobbs DataStax
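The proximity ordering Richard describes (same node, then same rack, then same datacenter, then a different datacenter) can be illustrated with a small sort. This is only a sketch of the idea: `Node`, `proximity_rank` and `sort_by_proximity` are hypothetical names made up for illustration, not part of any Cassandra snitch API, and a real snitch may also take measured latency into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical description of a node: its name, rack and datacenter.
    name: str
    rack: str
    dc: str


def proximity_rank(origin, other):
    """Return a smaller number for nodes that are 'closer' to origin."""
    if other == origin:
        return 0  # same node
    if other.dc == origin.dc and other.rack == origin.rack:
        return 1  # same rack
    if other.dc == origin.dc:
        return 2  # same datacenter, different rack
    return 3      # different datacenter


def sort_by_proximity(origin, replicas):
    """Order replicas from closest to farthest, as seen from origin."""
    return sorted(replicas, key=lambda node: proximity_rank(origin, node))


origin = Node("a", "rack1", "DC1")
replicas = [
    Node("d", "rack9", "DC2"),
    Node("c", "rack2", "DC1"),
    Node("b", "rack1", "DC1"),
    origin,
]
ordered_names = [node.name for node in sort_by_proximity(origin, replicas)]
```

With the sample data, `ordered_names` lists the origin node first and the node in the other datacenter last.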
Re: Why is our range query failing in Cassandra 0.8.10 Client
When executing a query like:

    get events WHERE Firm=434550 AND ds_timestamp>=1341955958200 AND ds_timestamp<=1341955958200;

what the 2ndary index implementation will do is:

1) it queries the index for Firm for the row with key 434550 (because that's the only one restricted by an equal clause, and that is why you need at least one equal clause).
2) the query from 1 will return a bunch of event row keys for those events whose Firm=434550. So for each of those row keys it queries the corresponding event.
3) if a given queried event matches the remaining clauses (here ds_timestamp>=1341955958200 AND ds_timestamp<=1341955958200), it adds it to the result, otherwise it skips it.

So what I suspect is happening is that you have *lots* of events matching 'Firm=434550' but only one matches 'ds_timestamp>=1341955958200 AND ds_timestamp<=1341955958200'. And given that by default it tries to find 100 results, it will scan all the events having 'Firm=434550' before returning. Which it probably cannot do within the timeout.

But when you do

    get events WHERE Firm=434550 AND ds_timestamp>=1341955958200

given that lots of events having 'Firm=434550' probably match ds_timestamp>=1341955958200, it is able to find 100 of them quickly.

Lastly, when you do

    get events WHERE Firm=434550 AND ds_timestamp=1341955958200;

the implementation now has two equality clauses, and based on internal stats it is able to determine that querying the ds_timestamp index will discriminate potential results more efficiently. So it will query the ds_timestamp index instead of the Firm one, which will yield all events whose timestamp is 1341955958200, but since there aren't many such events, it quickly finds the one matching Firm=434550.

In other words, you are not doing something wrong, you are just hitting a limitation/weakness of the 2ndary index implementation. So if the query you really want to do is for a specific timestamp, you definitely want to use an equal rather than two non-strict inequalities.
But if what you want is to query events in a very small but non-discrete window of time, then using 2ndary indexes might just not fit the bill. In that case, one option would be to do a custom/specialized index yourself. If you construct an index where the row key is the Firm and the column name is the ds_timestamp (and the column value is the event identifier), then finding events having a specific firm for a time window will be (always) efficient. However that's more work on your side since unfortunately Cassandra is not able to automatically create such a specialized index itself (at least not yet).

-- Sylvain

On Wed, Jul 11, 2012 at 10:01 PM, JohnB j...@tiac.net wrote:

Hi: We are currently using Cassandra 0.8.10 and have run into some strange issues surrounding querying for a range of data. I ran a couple of get statements via the Cassandra client and found some interesting results. Consider the following Column Family Definition:

    ColumnFamily: events
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds: 0.0/0
      Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
      Key cache size / save period in seconds: 20.0/14400
      Memtable thresholds: 0.2953125/1440/63 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Built indexes: [events.events_Firm_idx, events.events_OrdType_idx, events.events_OrderID_idx, events.events_OrderQty_idx, events.events_Price_idx, events.events_Symbol_idx, events.events_ds_timestamp_idx]
      Column Metadata:
        Column Name: Firm
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: events_Firm_idx
          Index Type: KEYS
        Column Name: OrdType
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: events_OrdType_idx
          Index Type: KEYS
        Column Name: OrderID
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: events_OrderID_idx
          Index Type: KEYS
        Column Name: OrderQty
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: events_OrderQty_idx
          Index Type: KEYS
        Column Name: Price
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: events_Price_idx
          Index Type: KEYS
        Column Name: Symbol
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: events_Symbol_idx
        Column Name: ds_timestamp
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name:
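The hand-built index Sylvain suggests (row key = Firm, column name = ds_timestamp, column value = event identifier) can be modelled in memory to show why a time-window lookup stays cheap. The sketch below is only an illustration under stated assumptions: `FirmTimeIndex` and its methods are hypothetical names, a sorted Python list stands in for a Cassandra row whose columns are kept sorted by name, and timestamps are assumed to be integers.

```python
import bisect
from collections import defaultdict


class FirmTimeIndex:
    """In-memory stand-in for an index row per Firm, sorted by ds_timestamp."""

    def __init__(self):
        # firm -> sorted list of (ds_timestamp, event_id) pairs
        self._rows = defaultdict(list)

    def add(self, firm, ds_timestamp, event_id):
        # Keep each row sorted, like columns sorted by name within a row.
        bisect.insort(self._rows[firm], (ds_timestamp, event_id))

    def events_between(self, firm, start, end):
        """Return event ids with start <= ds_timestamp <= end for one firm."""
        row = self._rows.get(firm, [])
        lo = bisect.bisect_left(row, (start,))    # first entry with ts >= start
        hi = bisect.bisect_left(row, (end + 1,))  # first entry with ts > end
        return [event_id for _, event_id in row[lo:hi]]


index = FirmTimeIndex()
index.add(434550, 1341955958200, "event-1")
index.add(434550, 1341955958100, "event-0")
index.add(434550, 1341955958300, "event-2")
index.add(999999, 1341955958200, "other-firm-event")

exact = index.events_between(434550, 1341955958200, 1341955958200)
window = index.events_between(434550, 1341955958100, 1341955958300)
```

Because the lookup is a slice of one sorted row, its cost depends on the size of the window rather than on how many events the firm has in total, which is the property the 2ndary index scan lacks.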
Re: failed to delete commitlog, cassandra can't accept writes
On Tue, 10 Jul 2012 14:35:23 -0700, Frank Hsueh wrote:

> after reading the JIRA, I decided to use Java 6.

It has nothing to do with the JDK. I can reproduce it with either JDK6 or JDK7 as well.

> anybody seen this before? is this related to 4337 ?

It's exactly that.

-h
Increased replication factor not evident in CLI
We recently increased the replication factor of a keyspace in our cassandra 1.1.1 cluster from 2 to 4. This was done by setting the replication factor to 4 in cassandra-cli, and then running a repair on each node. Everything seems to have worked; the commands completed successfully and disk usage increased significantly. However, if I perform a describe on the keyspace, it still shows replication_factor:2. So, it appears that the replication factor might be 4, but it reports as 2. I'm not entirely sure how to confirm one or the other. Since then, I've stopped and restarted the cluster, and even ran an upgradesstables on each node. The replication factor still doesn't report as I would expect. Am I missing something here? - .Dustin
How to speed up data loading
I am loading a large set of data into a CF with composite key. The load is going pretty slow, hundreds or even thousands times slower than it would do in RDBMS. I have a choice of how granular my physical key (the first component of the primary key) is, this way I can balance between smaller rows and too many keys vs. wide rows and fewer keys. What are the guidelines about this? How the width of the physical row affects the speed of load? I see that Cassandra is doing a lot of processing behind the scene, even when I kill the client, the server is still consuming a lot of CPU for a long time. What else should I look at ? Anything in configuration?
Re: Connected file list in Cassandra
Can pages appear in many documents ? If not try this

Document CF:
    row_key: doc_id
    column: page_number:page_id

page_number is the order of pages, page_id is the row key for below

Page CF:
    row_key: page_id
    columns:
    - doc_id
    - page_data

If you know the page_id, read the doc_id from Page CF, then iterate over the Document CF and read from Page CF.

Hope that helps.

- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 12/07/2012, at 7:47 AM, David Brosius wrote:

why not just hold the pages as different columns in the same row? columns are automatically sorted such that if the column name was associated with the page number it would automatically flow the way you wanted.

- Original Message -
From: Tomek Hankus tom...@gmail.com
Sent: Wed, July 11, 2012 14:34
Subject: Connected file list in Cassandra

Hi, at the moment I'm doing research about keeping a linked/connected file list in Cassandra, e.g. a PDF file cut into pages (multiple PDFs) where the first page is connected to the second, the second to the third etc. This file connection/link is not specified. The main goal is to be able to get all linked files (the whole PDF / all pages) while having only the key to the first file (page). Is there any Cassandra tool/feature which could help me to do that, or is the only way to create some wrapper holding key relations?

Tom H
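Aaron's two-column-family layout can be walked through with plain dictionaries. This is a rough sketch only: the dictionaries stand in for the Document CF and Page CF, and the sample keys, page contents and the `pages_of_document` helper are all made up for illustration; a real client would issue reads against Cassandra instead.

```python
# Document CF: row key = doc_id, columns = page_number -> page_id
document_cf = {
    "doc-1": {1: "page-a", 2: "page-b", 3: "page-c"},
}

# Page CF: row key = page_id, columns = doc_id and page_data
page_cf = {
    "page-a": {"doc_id": "doc-1", "page_data": b"first page"},
    "page-b": {"doc_id": "doc-1", "page_data": b"second page"},
    "page-c": {"doc_id": "doc-1", "page_data": b"third page"},
}


def pages_of_document(page_id, document_cf, page_cf):
    """Starting from any known page, return all page data in page order."""
    # Step 1: read the doc_id from the Page CF row of the known page.
    doc_id = page_cf[page_id]["doc_id"]
    # Step 2: iterate over the Document CF row in page_number order.
    ordered = sorted(document_cf[doc_id].items())
    # Step 3: read each page's data back from the Page CF.
    return [page_cf[pid]["page_data"] for _, pid in ordered]


all_pages = pages_of_document("page-b", document_cf, page_cf)
```

Starting from the middle page still returns the whole document in order, because the ordering lives in the Document CF row rather than in links between pages.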
Re: Composite column/key creation via Hector
You may have better luck on the Hector Mailing list…
https://groups.google.com/forum/?fromgroups#!forum/hector-users

Here is something I found in the docs though
http://hector-client.github.com/hector/build/html/content/composite_with_templates.html

Cheers
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 12/07/2012, at 9:04 AM, Michael Cherkasov wrote:

Hi all, What is the right way to create a CF with a dynamic composite column and a composite key? Now I use code like this:

    private static final String DEFAULT_DYNAMIC_COMPOSITE_ALIAES =
        "(a=AsciiType,b=BytesType,i=IntegerType,x=LexicalUUIDType,l=LongType,t=TimeUUIDType,s=UTF8Type,u=UUIDType,A=AsciiType(reversed=true),B=BytesType(reversed=true),I=IntegerType(reversed=true),X=LexicalUUIDType(reversed=true),L=LongType(reversed=true),T=TimeUUIDType(reversed=true),S=UTF8Type(reversed=true),U=UUIDType(reversed=true))";

for composite columns:

    BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
    columnFamilyDefinition.setComparatorType( ComparatorType.DYNAMICCOMPOSITETYPE );
    columnFamilyDefinition.setComparatorTypeAlias( DEFAULT_DYNAMIC_COMPOSITE_ALIAES );
    columnFamilyDefinition.setKeyspaceName( keyspaceName );
    columnFamilyDefinition.setName( "TestCase" );
    columnFamilyDefinition.setColumnType( ColumnType.STANDARD );
    ColumnFamilyDefinition cfDefStandard = new ThriftCfDef( columnFamilyDefinition );
    cfDefStandard.setKeyValidationClass( ComparatorType.UTF8TYPE.getClassName() );
    cfDefStandard.setDefaultValidationClass( ComparatorType.UTF8TYPE.getClassName() );

for keys:

    columnFamilyDefinition = new BasicColumnFamilyDefinition();
    columnFamilyDefinition.setComparatorType( ComparatorType.UTF8TYPE );
    columnFamilyDefinition.setKeyspaceName( keyspaceName );
    columnFamilyDefinition.setName( "Parameter" );
    columnFamilyDefinition.setColumnType( ColumnType.STANDARD );
    cfDefStandard = new ThriftCfDef( columnFamilyDefinition );
    cfDefStandard.setKeyValidationClass( ComparatorType.DYNAMICCOMPOSITETYPE.getClassName() + DEFAULT_DYNAMIC_COMPOSITE_ALIAES );
    cfDefStandard.setDefaultValidationClass( ComparatorType.UTF8TYPE.getClassName() );

Is this code correct? Do I really need such a terrible DEFAULT_DYNAMIC_COMPOSITE_ALIAES ?
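The long alias string in Michael's code follows a regular pattern: each lower-case letter maps to a type, and the matching upper-case letter maps to the same type with `(reversed=true)`. The sketch below only shows that structure by rebuilding the string from a short table; it is written in Python for brevity, `ALIASES` and `build_alias_string` are hypothetical names, and it makes no claim about whether Hector requires the full alias list.

```python
# Letter -> type pairs taken from the alias string quoted in the message.
ALIASES = [
    ("a", "AsciiType"),
    ("b", "BytesType"),
    ("i", "IntegerType"),
    ("x", "LexicalUUIDType"),
    ("l", "LongType"),
    ("t", "TimeUUIDType"),
    ("s", "UTF8Type"),
    ("u", "UUIDType"),
]


def build_alias_string(aliases):
    """Rebuild the '(a=AsciiType,...,U=UUIDType(reversed=true))' string."""
    # Lower-case letters select the type in its natural order.
    forward = [f"{letter}={type_name}" for letter, type_name in aliases]
    # Upper-case letters select the same type in reversed order.
    reverse = [
        f"{letter.upper()}={type_name}(reversed=true)"
        for letter, type_name in aliases
    ]
    return "(" + ",".join(forward + reverse) + ")"


alias_string = build_alias_string(ALIASES)
```

Generating the string from a table like this makes it harder to mistype one of the sixteen entries, whichever language the real code is written in.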
Re: Concerns about Cassandra upgrade from 1.0.6 to 1.1.X
It's always a good idea to have a read of the NEWS.txt file
https://github.com/apache/cassandra/blob/cassandra-1.1/NEWS.txt

Cheers
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 12/07/2012, at 5:51 PM, Tyler Hobbs wrote:

On Wed, Jul 11, 2012 at 8:38 PM, Roshan codeva...@gmail.com wrote:

> Currently we are using Cassandra 1.0.6 in our production system but suffer from CASSANDRA-3616 (it is already fixed in version 1.0.7). We thought to upgrade Cassandra to the 1.1.X versions to get its new features, but have some concerns about the upgrade, and expert advice is most welcome.
>
> 1. Can Cassandra 1.1.X identify 1.0.X configurations like SSTables, commit logs, etc without any issue? And vice versa. Because if something happens to 1.1.X after it is deployed to production, we want to downgrade to version 1.0.6 (because that's the version we tested with our applications).

1.1 can handle 1.0 data/schemas/etc without a problem, but the reverse is not necessarily true. I don't know what in particular might break if you downgrade from 1.1 to 1.0, but in general, Cassandra does not handle downgrading gracefully; typically the SSTable formats have changed during major releases. If you snapshot prior to upgrading, you can always roll back to that, but you will have lost anything written since the upgrade.

> 2. How do we need to do the upgrade process? Currently we have a 3 node 1.0.6 cluster in production. Can we upgrade node by node? If we upgrade node by node, will the other 1.0.6 nodes identify 1.1.X nodes without any issue?

Yes, you can do a rolling upgrade to 1.1, one node at a time. It's usually fine to leave the cluster in a mixed state for a short while as long as you don't do things like repairs, decommissions, or bootstraps, but I wouldn't stay in a mixed state any longer than you have to. It's best to test major upgrades with a second, non-production cluster if that's an option.

-- Tyler Hobbs DataStax
Re: How to come up with a predefined topology
> Will it also use the snitch/strategy info to find next 'R' replicas 'closest' to coordinator-node ?

yes.

> 2. In a single DC ( with n racks and r replicas ) what algorithm

The logic is here
https://github.com/apache/cassandra/blob/cassandra-1.1/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L78

> a. n > r : I am assuming, have 1 replica in each rack.

You have 1 replica in the first n racks.

> b. n < r : ?? I am assuming, try to equally distribute replicas across in each racks.

int(n/r) racks will have the same number of replicas. n % r will have more. This is why multi rack replication can be tricky.

Hope that helps.

- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 12/07/2012, at 8:05 PM, prasenjit mukherjee wrote:

Thanks. Some follow up questions :

1. How do the reads use strategy/snitch information ? I am assuming the reads can go to any of the replicas. Will it also use the snitch/strategy info to find next 'R' replicas 'closest' to coordinator-node ?

2. In a single DC ( with n racks and r replicas ) what algorithm cassandra uses to write its replicas in following scenarios :
a. n > r : I am assuming, have 1 replica in each rack.
b. n < r : ?? I am assuming, try to equally distribute replicas across in each racks.

-Thanks, Prasenjit
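To make the rack question concrete, here is a simplified model of how replicas might be picked inside one datacenter. It is a sketch under a stated assumption, not a copy of the linked source: it assumes a two-pass walk of the ring, first taking nodes on racks not used yet, then filling any remaining replicas in plain ring order. `pick_replicas`, the node names and the rack layout are hypothetical.

```python
def pick_replicas(ring, rack_of, replicas):
    """Pick replica nodes from ring (node names in ring order).

    Assumed behaviour: prefer one replica per distinct rack first, then
    fill up from the ring regardless of rack.
    """
    chosen = []
    seen_racks = set()

    # First pass: only take nodes whose rack has not been used yet.
    for node in ring:
        if len(chosen) == replicas:
            break
        if rack_of[node] not in seen_racks:
            chosen.append(node)
            seen_racks.add(rack_of[node])

    # Second pass: not enough distinct racks, so take the next nodes
    # in ring order even if their rack already holds a replica.
    for node in ring:
        if len(chosen) == replicas:
            break
        if node not in chosen:
            chosen.append(node)

    return chosen


ring = ["n1", "n2", "n3", "n4", "n5", "n6"]
rack_of = {"n1": "r1", "n2": "r1", "n3": "r1",
           "n4": "r2", "n5": "r2", "n6": "r2"}

two = pick_replicas(ring, rack_of, 2)    # fewer replicas than nodes, 2 racks
four = pick_replicas(ring, rack_of, 4)   # more replicas than racks
```

Under this model, asking for four replicas on two racks puts three of them on rack r1 and one on r2, which is one way the imbalance Tyler warns about can arise when racks are not interleaved around the ring.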
Re: Composite column/key creation via Hector
BTW, an issue was just fixed with dynamic columns in hector, you might want to try trunk.
https://github.com/hector-client/hector/commit/2910b484629add683f61f392553e824c291fb6eb

On 07/12/2012 06:25 PM, aaron morton wrote:

You may have better luck on the Hector Mailing list…
https://groups.google.com/forum/?fromgroups#!forum/hector-users

Here is something I found in the docs though
http://hector-client.github.com/hector/build/html/content/composite_with_templates.html

Cheers
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
Re: Increased replication factor not evident in CLI
Do multiple nodes say the RF is 2 ? Can you show the output from the CLI ? Do show schema and show keyspace say the same thing ?

Cheers
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 13/07/2012, at 7:39 AM, Dustin Wenz wrote:

We recently increased the replication factor of a keyspace in our cassandra 1.1.1 cluster from 2 to 4. This was done by setting the replication factor to 4 in cassandra-cli, and then running a repair on each node. Everything seems to have worked; the commands completed successfully and disk usage increased significantly. However, if I perform a describe on the keyspace, it still shows replication_factor:2. So, it appears that the replication factor might be 4, but it reports as 2. I'm not entirely sure how to confirm one or the other.

Since then, I've stopped and restarted the cluster, and even ran an upgradesstables on each node. The replication factor still doesn't report as I would expect. Am I missing something here?

- .Dustin
Re: Concerns about Cassandra upgrade from 1.0.6 to 1.1.X
Thanks Aaron. My major concern is upgrading node by node. We are currently using 1.0.6 in production, and the plan is to upgrade a single node to 1.1.2 at a time. Any comments? Thanks.
Re: Increased replication factor not evident in CLI
Possibly the bug with nanotime causing cassandra to think the change happened in the past. Talked about onlist in past few days.

On Thursday, July 12, 2012, aaron morton aa...@thelastpickle.com wrote:

Do multiple nodes say the RF is 2 ? Can you show the output from the CLI ? Do show schema and show keyspace say the same thing ?

Cheers
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
Re: BulkLoading sstables from v1.0.3 to v1.1.1
Historically you have not been able to stream sstables between different file formats. Cassandra 1.0 creates files named hc, while 1.1 uses hd. Since bulk loading streams, I am not sure this will work.

On Thursday, July 12, 2012, aaron morton aa...@thelastpickle.com wrote:

Do you have the full error logs ? There should be a couple of "caused by:" errors that will help track down where the original Assertion is thrown. The second error is probably the result of the first. Something has upset the SSTable tracking. If you can get the full error stack, and some steps to reproduce, can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA ?

Thanks
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 10/07/2012, at 7:43 PM, rubbish me wrote:

Thanks Ivo. We are quite close to releasing so we'd hope to understand what is causing the error and may try to avoid it where possible. As said, it seems to work ok the first time round. The problem you are referring to in the last mail, was it restricted to bulk loading or otherwise?

Thanks -A

Ivo Meißner i...@overtronic.com wrote on 10 Jul 2012 07:20:

Hi, there are some problems in version 1.1.1 with secondary indexes and key caches that are fixed in 1.1.2. I would try to upgrade to 1.1.2 and see if the error still occurs.

Ivo

Hi

As part of a continuous development of a system migration, we have a test build to take a snapshot of a keyspace from cassandra v 1.0.3 and bulk load it to a cluster of 1.1.1 using the sstableloader.sh. Not sure if relevant, but one of the cf contains a secondary index. The build basically does:

Drop the destination keyspace if exist
Add the destination keyspace, wait for schema to agree
Re: Increased replication factor not evident in CLI
Sounds a lot like a bug that I hit that was filed and fixed recently:
https://issues.apache.org/jira/browse/CASSANDRA-4432

-Mike

On Jul 12, 2012, at 8:16 PM, Edward Capriolo wrote:

Possibly the bug with nanotime causing cassandra to think the change happened in the past. Talked about onlist in past few days.
Re: is this something to be concerned about - MUTATION message dropped
oh. darn. I was hoping for something like, "here's the data you requested, and by the way, latencies are 80% to the point of timeout; might want to back off a little".

mx4j it is.

On Wed, Jul 11, 2012 at 10:46 PM, Tyler Hobbs ty...@datastax.com wrote:

JMX is really the only way it exposes that kind of information. I recommend setting up mx4j if you want to check on the server stats programmatically.

On Wed, Jul 11, 2012 at 8:17 PM, Frank Hsueh frank.hs...@gmail.com wrote:

out of curiosity, is there a way that Cassandra can communicate that it's close to being overloaded ?

On Sun, Jun 17, 2012 at 6:29 PM, aaron morton aa...@thelastpickle.com wrote:

http://wiki.apache.org/cassandra/FAQ#dropped_messages
https://www.google.com/#q=cassandra+dropped+messages

Cheers
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 15/06/2012, at 12:54 AM, Poziombka, Wade L wrote:

INFO [ScheduledTasks:1] 2012-06-14 07:49:54,355 MessagingService.java (line 615) 15 MUTATION message dropped in last 5000ms

It is at INFO level so I'm inclined to think not, but it seems like whenever messages are dropped there may be some issue?

-- Frank Hsueh | frank.hs...@gmail.com
-- Tyler Hobbs DataStax http://datastax.com/
-- Frank Hsueh | frank.hs...@gmail.com
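Short of JMX, one crude way to notice overload from outside is to watch the server log for lines like the one Wade quotes. The sketch below parses that single line format with a regular expression; `DROPPED` and `parse_dropped` are hypothetical names, and the pattern is an assumption based only on the quoted example, so other Cassandra versions may word the message differently.

```python
import re

# Matches e.g. "15 MUTATION message dropped in last 5000ms".
DROPPED = re.compile(
    r"(?P<count>\d+) (?P<verb>[A-Z_]+) messages? dropped in last (?P<window_ms>\d+)ms"
)


def parse_dropped(line):
    """Return (verb, count, window_ms) for a dropped-message line, else None."""
    match = DROPPED.search(line)
    if match is None:
        return None
    return (
        match.group("verb"),
        int(match.group("count")),
        int(match.group("window_ms")),
    )


sample = (
    "INFO [ScheduledTasks:1] 2012-06-14 07:49:54,355 MessagingService.java "
    "(line 615) 15 MUTATION message dropped in last 5000ms"
)
parsed = parse_dropped(sample)
```

A monitoring script could sum the counts per verb over time and alert when they climb, though the JMX route Tyler recommends gives far more detail.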
Re: Cassandra take 100% CPU for 2~3 minutes every half an hour and mutation lost
Hi,

After changing the concurrent compactor setting (concurrent_compactors: 1), Cassandra is limited to 100% of one core during these periods. I also captured the stack of the busy thread; it stays on the same stack for the whole 2~3 minutes. Any clue about this issue?

Thread 18114: (state = IN_JAVA)
 - java.util.AbstractList$Itr.hasNext() @bci=8, line=339 (Compiled frame; information may be imprecise)
 - org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(org.apache.cassandra.db.ColumnFamily, int) @bci=6, line=841 (Compiled frame)
 - org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(org.apache.cassandra.db.ColumnFamily, int) @bci=17, line=835 (Compiled frame)
 - org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(org.apache.cassandra.db.ColumnFamily, int) @bci=8, line=826 (Compiled frame)
 - org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.compaction.CompactionController, org.apache.cassandra.db.ColumnFamily) @bci=38, line=77 (Compiled frame)
 - org.apache.cassandra.db.compaction.PrecompactedRow.init(org.apache.cassandra.db.compaction.CompactionController, java.util.List) @bci=33, line=102 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(java.util.List) @bci=223, line=133 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced() @bci=44, line=102 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced() @bci=1, line=87 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.consume() @bci=88, line=116 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext() @bci=5, line=99 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
 - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=614 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector) @bci=542, line=141 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=117, line=134 (Interpreted frame)
 - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=1, line=114 (Interpreted frame)
 - java.util.concurrent.FutureTask$Sync.innerRun() @bci=30, line=303 (Interpreted frame)
 - java.util.concurrent.FutureTask.run() @bci=4, line=138 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

BRs
//Jason

2012/7/11 Jason Tang ares.t...@gmail.com

Hi,

I am encountering a high CPU problem with Cassandra 1.0.3. It happens with both size-tiered and leveled compaction (6G heap, 64-bit Oracle Java).

Under normal traffic Cassandra uses about 15% CPU, but every half an hour it uses almost 100% of total CPU (SUSE, 12 cores).

Here is the top output at that moment:

#top -H -p 12451
top - 12:30:14 up 15 days, 12:49, 6 users, load average: 10.52, 8.92, 8.14
Tasks: 706 total, 21 running, 685 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.7%us, 14.0%sy, 48.9%ni, 6.5%id, 0.0%wa, 0.0%hi, 4.9%si, 0.0%st
Mem: 24150M total, 12218M used, 11932M free, 142M buffers
Swap: 0M total, 0M used, 0M free, 3714M cached

  PID USER    PR  NI  VIRT   RES   SHR  S  %CPU  %MEM    TIME+  COMMAND
20291 casadm  24   4  8003m  5.4g  167m R    92  22.7  0:42.46  java
20276 casadm  24   4  8003m  5.4g  167m R    88  22.7  0:43.88  java
20181 casadm  24   4  8003m  5.4g  167m R    86  22.7  0:52.97  java
20213 casadm  24   4  8003m  5.4g  167m R    85  22.7  0:49.21  java
20188 casadm  24   4  8003m  5.4g  167m R    82  22.7  0:54.34  java
20268 casadm  24   4  8003m  5.4g  167m R    81  22.7  0:46.25  java
20269 casadm  24   4  8003m  5.4g  167m R    41  22.7  0:15.11  java
20316 casadm  24   4  8003m  5.4g  167m S    20  22.7  0:02.35  java
20191 casadm  24   4  8003m  5.4g  167m R    15  22.7  0:16.85  java
12500 casadm  20   0  8003m  5.4g  167m R     6  22.7  1:07.86  java
15245 casadm  20   0  8003m  5.4g  167m D     5  22.7  0:36.45  java

jstack cannot print the stack:

Thread 20291: (state = IN_JAVA)
Error occurred during stack walking:
...
Thread 20276: (state = IN_JAVA)
Error occurred during stack walking:

After it comes back, the stack shows:

Thread 20291: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long)
High RecentWriteLatencyMicro
Hi,

As I understand it, writes in Cassandra go directly to memory, and counter increments at CL.ONE shouldn't have to take the counter read latency into account. So increments at CL.ONE should be really fast.

But in my 8-node cluster (16 cores / 32G RAM / Cassandra 1.0.5 / Java 7 each) with RF=2, at a traffic of 55k qps (= 14k increments per node / 7k write requests per node), the write latency (from JMX) increases to around 7-8 ms from the low-traffic value of 0.5 ms. The nodes aren't even being pushed: no I/O load, lots of free RAM, and 30% CPU idle time / OS load 20.

The write latency reported by cfstats (supposedly the latency for one node to increment its counter) is small (0.05 ms).

1) Is the whole of the 7-8 ms being spent in Thrift overhead and scheduling delays? (Ping time between machines is an insignificant 0.1 ms.)
2) Does keeping a large number of CFs (17 in our case) adversely affect write performance (apart from the extreme flushing scenario)?
3) I see a lot of threads (4,000-10,000) with names like pool-2-thread-* (pointed out as client connection threads on the mailing list before) periodically building up. But with idle CPU time and zero pending tasks in tpstats, why do requests keep piling up? (GC stops threads for 100 ms every 1-2 seconds, effectively pausing Cassandra 5-10% of the time, but this doesn't seem to be the reason.)

Thanks,
Rohit
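
The per-node figures in the message above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 55k qps is spread evenly over the 8 nodes and that every increment is applied on RF=2 replicas, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the message.
# Assumption: traffic is spread evenly across all nodes.
total_qps = 55_000   # cluster-wide increment requests per second
nodes = 8
rf = 2               # replication factor

# Requests each node receives as coordinator.
requests_per_node = total_qps / nodes
# Increments each node applies as a replica (each request lands on rf nodes).
increments_per_node = requests_per_node * rf

# GC pauses: 100 ms pause every 1 to 2 seconds.
pause_s = 0.100
gc_fraction_high = pause_s / 1.0   # one pause per second
gc_fraction_low = pause_s / 2.0    # one pause every two seconds

print(f"write requests per node: {requests_per_node:.0f}/s")
print(f"increments per node:     {increments_per_node:.0f}/s")
print(f"time paused in GC:       {gc_fraction_low:.0%} to {gc_fraction_high:.0%}")
```

Under these assumptions the results come out at roughly 7k requests and 14k increments per node per second, and 5-10% of wall-clock time in GC pauses, which agrees with the numbers in the message.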
Re: How to come up with a predefined topology
On Fri, Jul 13, 2012 at 4:04 AM, aaron morton aa...@thelastpickle.com wrote:

The logic is here https://github.com/apache/cassandra/blob/cassandra-1.1/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L78

Thanks Aaron for pointing to the code.

a. n>r : I am assuming, have 1 replica in each rack.

You have 1 replica in the first n racks.

b. n<r : ?? I am assuming, try to equally distribute replicas across in each racks.

int(n/r) racks will have the same number of replicas. n % r will have more.

Did you mean r%n (since r>n)? Shouldn't the logic be: all racks will have at least int(r/n) replicas, and r%n racks will have 1 additional replica?

Sample use case (r = 8, n = 3):
n1: 3 (2+1)
n2: 3 (2+1)
n3: 2

Is the above understanding correct?

-Thanks, Prasenjit

This is why multi rack replication can be tricky. Hope that helps.

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/07/2012, at 8:05 PM, prasenjit mukherjee wrote:

Thanks. Some follow up questions:

1. How do the reads use strategy/snitch information? I am assuming the reads can go to any of the replicas. Will it also use the snitch/strategy info to find next 'R' replicas 'closest' to coordinator-node?
2. In a single DC (with n racks and r replicas) what algorithm cassandra uses to write its replicas in following scenarios:
   a. n>r : I am assuming, have 1 replica in each rack.
   b. n<r : ?? I am assuming, try to equally distribute replicas across in each racks.

-Thanks, Prasenjit

On Thu, Jul 12, 2012 at 11:24 AM, Tyler Hobbs ty...@datastax.com wrote:

I highly recommend specifying the same rack for all nodes (using cassandra-topology.properties) unless you really have a good reason not to (and you probably don't). The way that replicas are chosen when multiple racks are in play can be fairly confusing and lead to a data imbalance if you don't catch it.
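
The even-spread rule proposed in the reply above (every rack gets int(r/n) replicas and r%n racks get one more) can be written out as a short sketch. This only illustrates that proposed arithmetic; it is not Cassandra's NetworkTopologyStrategy code, which places replicas by walking the ring, and `spread_replicas` is a hypothetical helper name.

```python
def spread_replicas(replicas: int, racks: int) -> list:
    """Spread `replicas` over `racks` as evenly as possible.

    Every rack gets replicas // racks, and the first replicas % racks
    racks get one additional replica. Illustrative only: this is the
    arithmetic proposed in the thread, not Cassandra's placement code.
    """
    if racks <= 0:
        raise ValueError("racks must be positive")
    base, extra = divmod(replicas, racks)
    return [base + 1 if i < extra else base for i in range(racks)]


# The sample use case from the thread: r = 8 replicas, n = 3 racks.
print(spread_replicas(8, 3))  # [3, 3, 2]
# Fewer replicas than racks: some racks hold no replica.
print(spread_replicas(2, 3))  # [1, 1, 0]
```

For r = 8 and n = 3 this gives 3, 3 and 2, matching the sample use case in the reply.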
On Wed, Jul 11, 2012 at 10:53 PM, prasenjit mukherjee prasen@gmail.com wrote:

As far as I know there isn't any way to use the rack name in the strategy_options for a keyspace. You might want to look at the code to dig into that, perhaps.

Aha, I was wondering if I could do that as well (specify rack options) :) Thanks for the pointer, I will dig into the code.

-Thanks, Prasenjit

On Thu, Jul 12, 2012 at 5:33 AM, Richard Lowe richard.l...@arkivum.com wrote:

If you then specify the parameters for the keyspace to use these, you can control exactly which set of nodes replicas end up on. For example, in cassandra-cli:

create keyspace ks1 with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 };

As far as I know there isn't any way to use the rack name in the strategy_options for a keyspace. You might want to look at the code to dig into that, perhaps.

Whichever snitch you use, the nodes are sorted in order of proximity to the client node. How this is determined depends on the snitch that's used but most (the ones that ship with Cassandra) will use the default ordering of same-node, same-rack, same-datacenter, different-datacenter. Each snitch has methods to tell Cassandra which rack and DC a node is in, so it always knows which node is closest. Used with the Bloom filters this can tell us where the nearest replica is.

-Original Message-
From: prasenjit mukherjee [mailto:prasen@gmail.com]
Sent: 11 July 2012 06:33
To: user
Subject: How to come up with a predefined topology

Quoting from http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy :

Asymmetrical replication groupings are also possible depending on your use case. For example, you may want to have three replicas per data center to serve real-time application requests, and then have a single replica in a separate data center designated to running analytics.

Have 2 questions:

1. Any example how to configure a topology with 3 replicas in one DC (with 2 in 1 rack + 1 in another rack) and one replica in another DC? The default networktopologystrategy with rackinferringsnitch will only give me equal distribution (2+2).
2. I am assuming the reads can go to any of the replicas. Is there a client which will send query to a node (in cassandra ring) which is closest to the client?

-Thanks, Prasenjit

--
Tyler Hobbs
DataStax
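
The default proximity ordering Richard describes in this thread (same node first, then same rack, then same datacenter, then a different datacenter) can be modelled as a sort key. The sketch below is a toy model under that description, not Cassandra's snitch API; the `Node` type, the `proximity` and `sort_by_proximity` functions, and the DC/rack names are all hypothetical.

```python
from typing import NamedTuple


class Node(NamedTuple):
    # Hypothetical node descriptor: a name plus the DC and rack
    # that a snitch would report for it.
    name: str
    dc: str
    rack: str


def proximity(origin: Node, other: Node) -> int:
    """Return a rank for `other` relative to `origin`; lower is closer."""
    if other == origin:
        return 0  # same node
    if other.dc == origin.dc and other.rack == origin.rack:
        return 1  # same rack
    if other.dc == origin.dc:
        return 2  # same datacenter, different rack
    return 3      # different datacenter


def sort_by_proximity(origin: Node, replicas: list) -> list:
    """Order replicas from closest to farthest as seen from `origin`."""
    return sorted(replicas, key=lambda node: proximity(origin, node))


coordinator = Node("a", "DC1", "RAC1")
replicas = [
    Node("d", "DC2", "RAC1"),
    Node("c", "DC1", "RAC2"),
    Node("b", "DC1", "RAC1"),
    coordinator,
]
ordered = sort_by_proximity(coordinator, replicas)
print([node.name for node in ordered])  # ['a', 'b', 'c', 'd']
```

Note that the rack comparison only counts when the datacenter also matches, since two racks in different datacenters may share a name.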