snapshot issue
Hi, I am trying to take a snapshot of my data but I get the following error. Please help me resolve this issue.

[root@cassandra1 bin]# ./nodetool -h localhost snapshot 20120711
Exception in thread "main" java.io.IOError: java.io.IOException: Cannot run program "ln": java.io.IOException: error=12, Cannot allocate memory
    at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1660)
    at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1686)
    at org.apache.cassandra.db.Table.snapshot(Table.java:198)
    at org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:1393)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
    at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
    at sun.rmi.transport.Transport$1.run(Transport.java:177)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.IOException: Cannot run program "ln": java.io.IOException: error=12, Cannot allocate memory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
    at org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:181)
    at org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:147)
    at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:730)
    at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1653)
    ... 33 more
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
    at java.lang.ProcessImpl.start(ProcessImpl.java:81)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
    ... 37 more

-- Thanks & Regards, Adeel Akbar
RE snapshot issue
Hello, the problem is described here: http://wiki.apache.org/cassandra/Operations The recommended way to avoid it is to use JNA. Cheers, Samuel

Adeel Akbar adeel.ak...@panasiangroup.com 11/07/2012 11:38: Hi, I am trying to take a snapshot of my data but I get the following error. Please help me resolve this issue. [root@cassandra1 bin]# ./nodetool -h localhost snapshot 20120711 Exception in thread "main" java.io.IOError: java.io.IOException: Cannot run program "ln": java.io.IOException: error=12, Cannot allocate memory
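For context: without JNA on the classpath, Cassandra 1.x takes a snapshot by exec'ing `ln` through `ProcessBuilder` (the `createHardLinkWithExec` frame in the trace), and fork()ing a JVM with a large heap can fail with errno 12 (ENOMEM) when the kernel refuses to overcommit. With jna.jar present, the hard link is made in-process via the link(2) system call and nothing is forked. The sketch below only illustrates in-process hard linking, using Java 7's NIO API rather than JNA (an assumption for the sake of a self-contained example; it is not Cassandra's own code path):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkDemo {
    // Create a hard link without forking a child process. Cassandra's
    // CLibrary.createHardLink does the equivalent through a JNA binding
    // of link(2); the exec-of-"ln" fallback is what fails with ENOMEM.
    static void snapshotLink(Path sstable, Path link) throws Exception {
        Files.createLink(link, sstable); // link path first, existing file second
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("snapshot-demo");
        Path sstable = dir.resolve("events-data.db");
        Files.write(sstable, "sstable bytes".getBytes());
        Path link = dir.resolve("snapshot-events-data.db");
        snapshotLink(sstable, link);
        System.out.println(Files.isSameFile(sstable, link)); // true: same inode
    }
}
```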
Reduced key-cache due to memory pressure and cache size estimate
Hi, I'm trying to tune memtable size, key cache size and heap size on Cassandra 1.1.0, but I keep hitting memory pressure and reduced cache sizes. With the following settings: heap size: 10 GB (I had the same issue with 8 GB, so I'm testing with an increased heap); memtable_total_space_in_mb: 2 GB; key_cache_size_in_mb: 2 GB (global key cache capacity). Still, heap usage hits flush_largest_memtables_at (= 0.75) many times in a short period before hitting reduce_cache_sizes_at (= 0.85), which reduces the cache size and resolves the memory pressure. In one instance, the cache size was reported as 1450MB before reduction and ~870MB after reduction, but the gain in heap space from the reduction was about 3GB. Could it be that the cache size estimate in megabytes isn't accurate? Thanks, Omid
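For reference, the knobs in question are all cassandra.yaml settings in 1.1. A sketch with the values from this thread (sizes expressed in MB; the reduce_cache_capacity_to value shown is the shipped default, included here as an assumption — note that 1450 MB × 0.6 ≈ 870 MB, which matches the reported post-reduction size):

```yaml
# Heap-pressure valves discussed above (cassandra.yaml, Cassandra 1.1)
memtable_total_space_in_mb: 2048   # 2 GB for all memtables
key_cache_size_in_mb: 2048         # 2 GB global key cache capacity
flush_largest_memtables_at: 0.75   # flush when heap usage crosses 75%
reduce_cache_sizes_at: 0.85        # shrink caches when usage crosses 85%
reduce_cache_capacity_to: 0.6      # assumed default: caches shrink to 60%
```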
rounded timestamp?
Greetings. Running (CQL 3) queries like:

update users set admin = 1 where corporation_id = '7a55bc4c-84e7-479c-9ac6-43f7836705b5';

…I see in the logs a row like:

StorageProxy.java (line 175) Mutations/ConsistencyLevel are [RowMutation(keyspace='goh_test', key='37613535626334632d383465372d343739632d396163362d343366373833363730356235', modifications=[ColumnFamily(users [admin:false:1@1342006844093000,])])]/ONE

If I understand it correctly, that 1342006844093000 is the timestamp in microseconds, rounded to milliseconds. If I modify the queries this way:

update users using timestamp 1342006844106123 set admin = 1 where corporation_id = '7a55bc4c-84e7-479c-9ac6-43f7836705b5';

…the log row becomes:

StorageProxy.java (line 175) Mutations/ConsistencyLevel are [RowMutation(keyspace='goh_test', key='37613535626334632d383465372d343739632d396163362d343366373833363730356235', modifications=[ColumnFamily(users [admin:false:1@1342006844106123,])])]/ONE

…and what I see is that the timestamp gets through NOT rounded, with microsecond precision. We see this behavior using cqlsh, the C++ Thrift bindings and phpcassa. I guess they all use Thrift, and so the rounding happens there. One of the problems is that sometimes it gets rounded up, so it's in the future. But that's just a side effect of the rounding, and I can't understand why there is any rounding in the first place. I guess that the second case is just Cassandra using the timestamp found in the CQL, and maybe Thrift is still sending a milliseconds-rounded timestamp, but I still can't see a reason for Thrift doing this. Could someone enlighten me a bit on this matter? -- Marco Matarazzo == Hex Keep == "You can learn more about a man in one hour of play than in one year of conversation." - Plato
RE: help using org.apache.cassandra.cql3
I see. The reason I looked at that package was that I need to use the batch feature, and I could not make it work using Thrift with a CF having a composite key. It worked fine with a simple key, but not composite; I was getting an error while trying to do the update. Sylvain suggested (in reply to my other posting) that I use a cql3 batch statement, but I am not sure how to do it efficiently from Java. Can a batch statement be prepared? Is it OK to put 1 of update statements in one batch, with 5 question marks in it? Then set that many variables? Maybe I can try a small example first, just to see if it works at all. From: Derek Williams [mailto:de...@fyrie.net] Sent: Tuesday, July 10, 2012 7:19 PM To: user@cassandra.apache.org Subject: Re: help using org.apache.cassandra.cql3 On Tue, Jul 10, 2012 at 3:04 PM, Leonid Ilyevsky lilyev...@mooncapital.com wrote: I am trying to use the org.apache.cassandra.cql3 package. Having a problem connecting to the server using ClientState. I was not sure what to put in the credentials map (I did not set any users/passwords on my server), so I tried setting empty strings for "username" and "password", setting them to bogus values, and passing null to the login method; there was no difference. It does not complain at login(), but then it complains about setKeyspace(my keyspace), saying that the specified keyspace does not exist (it obviously does exist). The configuration was loaded from the cassandra.yaml used by the server. I did not have any problem like this when I used org.apache.cassandra.thrift.Cassandra.Client. What am I doing wrong? I think that package just contains server classes. Everything you need should be in org.apache.cassandra.thrift. To use cql3 I just use the client methods 'execute_cql_query', 'prepare_cql_query' and 'execute_prepared_cql_query', after setting cql version to '3.0.0'.
-- Derek Williams This email, along with any attachments, is confidential and may be legally privileged or otherwise protected from disclosure. Any unauthorized dissemination, copying or use of the contents of this email is strictly prohibited and may be in violation of law. If you are not the intended recipient, any disclosure, copying, forwarding or distribution of this email is strictly prohibited and this email and any attachments should be deleted immediately. This email and any attachments do not constitute an offer to sell or a solicitation of an offer to purchase any interest in any investment vehicle sponsored by Moon Capital Management LP (“Moon Capital”). Moon Capital does not provide legal, accounting or tax advice. Any statement regarding legal, accounting or tax matters was not intended or written to be relied upon by any person as advice. Moon Capital does not waive confidentiality or privilege as a result of this email.
Re: rounded timestamp ?
There is no rounding or correction whatsoever. It just happens that if you don't give a timestamp in CQL, the timestamp is generated server side using Java's System.currentTimeMillis(), which only provides millisecond precision. If you provide your own timestamp, however, we use it without doing anything to it. -- Sylvain On Wed, Jul 11, 2012 at 1:56 PM, Marco Matarazzo marco.matara...@hexkeep.com wrote: Greetings. Running (CQL 3) queries like: update users set admin = 1 where corporation_id = '7a55bc4c-84e7-479c-9ac6-43f7836705b5'; …I see in the logs a row like: StorageProxy.java (line 175) Mutations/ConsistencyLevel are [RowMutation(keyspace='goh_test', key='37613535626334632d383465372d343739632d396163362d343366373833363730356235', modifications=[ColumnFamily(users [admin:false:1@1342006844093000,])])]/ONE If I understand it correctly, that 1342006844093000 is the timestamp in microseconds, rounded to milliseconds. If I modify the queries this way: update users using timestamp 1342006844106123 set admin = 1 where corporation_id = '7a55bc4c-84e7-479c-9ac6-43f7836705b5'; …the log row becomes: StorageProxy.java (line 175) Mutations/ConsistencyLevel are [RowMutation(keyspace='goh_test', key='37613535626334632d383465372d343739632d396163362d343366373833363730356235', modifications=[ColumnFamily(users [admin:false:1@1342006844106123,])])]/ONE …and what I see is that the timestamp gets through NOT rounded, with microsecond precision. We see this behavior using cqlsh, the C++ Thrift bindings and phpcassa. I guess they all use Thrift, and so the rounding happens there. One of the problems is that sometimes it gets rounded up, so it's in the future. But that's just a side effect of the rounding, and I can't understand why there is any rounding in the first place.
I guess that the second case is just Cassandra using the timestamp found in the CQL, and maybe Thrift is still sending a milliseconds-rounded timestamp, but I still can't see a reason for Thrift doing this. Could someone enlighten me a bit on this matter? -- Marco Matarazzo == Hex Keep == "You can learn more about a man in one hour of play than in one year of conversation." - Plato
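To make Sylvain's answer concrete, here is a small sketch (class and method names invented for illustration) of the difference between the server-side default timestamp and a client-supplied one. The server default is effectively `System.currentTimeMillis() * 1000`, so its last three digits are always zero; a client that wants real microsecond resolution has to generate and send its own timestamp, for example by anchoring `System.nanoTime()` to the wall clock once:

```java
public class Timestamps {
    // What the server effectively does when no timestamp is supplied:
    // millisecond wall-clock time scaled to microseconds.
    static long serverDefaultMicros() {
        return System.currentTimeMillis() * 1000; // last 3 digits always 0
    }

    // One client-side approach: anchor the monotonic nano clock to the
    // wall clock once, then derive microsecond timestamps from it.
    private static final long EPOCH_MICROS = System.currentTimeMillis() * 1000;
    private static final long NANOS_AT_EPOCH = System.nanoTime();

    static long clientMicros() {
        return EPOCH_MICROS + (System.nanoTime() - NANOS_AT_EPOCH) / 1000;
    }

    public static void main(String[] args) {
        System.out.println(serverDefaultMicros()); // always ends in 000
        System.out.println(clientMicros());        // full microsecond value
    }
}
```

The second value is the kind of thing you would pass in `UPDATE ... USING TIMESTAMP <micros> ...`, as in Marco's second query.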
Re: help using org.apache.cassandra.cql3
When I said to use the BATCH statement I meant using a query that is a BATCH statement, so something like:

BEGIN BATCH
INSERT ...;
INSERT ...;
...
APPLY BATCH;

If you want to do that from Java, you will want to look at the jdbc driver (http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/), though I don't know the status of its CQL3 support. On Wed, Jul 11, 2012 at 2:18 PM, Leonid Ilyevsky lilyev...@mooncapital.com wrote: Is it OK to put 1 of update statements in one batch, with 5 question marks in it? Then set that many variables? Yes, a batch statement can be prepared, and in theory there isn't much limit on the number of update statements (or question marks) you can put in one batch. However, C* works best if you do reasonably sized batches. It's even more true for CQL in the sense that by using a huge batch statement you'll pay the parsing cost. So you probably want to prepare one batch statement with a reasonable number of statements in it (you'll have to test to find the number that gives the best performance, but I would typically start with say 50-100 and see if the performance is good enough) and reuse that to insert the data. The other reason why breaking the insert into smallish batches is a good idea is that it allows you to parallelize the insert using multiple threads. And you need to parallelize if you want to get the best out of C*. -- Sylvain Maybe I can try a small example first, just to see if it works at all. From: Derek Williams [mailto:de...@fyrie.net] Sent: Tuesday, July 10, 2012 7:19 PM To: user@cassandra.apache.org Subject: Re: help using org.apache.cassandra.cql3 On Tue, Jul 10, 2012 at 3:04 PM, Leonid Ilyevsky lilyev...@mooncapital.com wrote: I am trying to use the org.apache.cassandra.cql3 package. Having a problem connecting to the server using ClientState.
I was not sure what to put in the credentials map (I did not set any users/passwords on my server), so I tried setting empty strings for "username" and "password", setting them to bogus values, and passing null to the login method; there was no difference. It does not complain at login(), but then it complains about setKeyspace(my keyspace), saying that the specified keyspace does not exist (it obviously does exist). The configuration was loaded from the cassandra.yaml used by the server. I did not have any problem like this when I used org.apache.cassandra.thrift.Cassandra.Client. What am I doing wrong? I think that package just contains server classes. Everything you need should be in org.apache.cassandra.thrift. To use cql3 I just use the client methods 'execute_cql_query', 'prepare_cql_query' and 'execute_prepared_cql_query', after setting cql version to '3.0.0'. -- Derek Williams
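A sketch of building such a prepared batch from Java (the table and column names are invented; only the CQL-string construction is shown, since the Thrift client wiring — `prepare_cql_query` / `execute_prepared_cql_query`, as mentioned in this thread — depends on your setup):

```java
public class BatchBuilder {
    // Build a BATCH statement containing n parameterized INSERTs of 5
    // placeholders each. Prepare it once, then reuse it, binding
    // 5 * n values per execution.
    static String batchCql(int n) {
        StringBuilder cql = new StringBuilder("BEGIN BATCH\n");
        for (int i = 0; i < n; i++) {
            cql.append("  INSERT INTO events (k, c1, c2, c3, c4) VALUES (?, ?, ?, ?, ?);\n");
        }
        return cql.append("APPLY BATCH;").toString();
    }

    public static void main(String[] args) {
        // Sylvain's suggested starting point: batches of 50-100 statements.
        System.out.println(batchCql(2));
    }
}
```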
RE: help using org.apache.cassandra.cql3
Thanks Sylvain, I actually tried the prepared batch and it works fine. I did 1000 rows in one batch, 20 columns each, and it was good. Then I tried 1, and it still works; I am going to measure which way is faster overall. -----Original Message----- From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Wednesday, July 11, 2012 9:32 AM To: user@cassandra.apache.org Subject: Re: help using org.apache.cassandra.cql3 When I said to use the BATCH statement I meant using a query that is a BATCH statement, so something like:

BEGIN BATCH
INSERT ...;
INSERT ...;
...
APPLY BATCH;

If you want to do that from Java, you will want to look at the jdbc driver (http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/), though I don't know the status of its CQL3 support. On Wed, Jul 11, 2012 at 2:18 PM, Leonid Ilyevsky lilyev...@mooncapital.com wrote: Is it OK to put 1 of update statements in one batch, with 5 question marks in it? Then set that many variables? Yes, a batch statement can be prepared, and in theory there isn't much limit on the number of update statements (or question marks) you can put in one batch. However, C* works best if you do reasonably sized batches. It's even more true for CQL in the sense that by using a huge batch statement you'll pay the parsing cost. So you probably want to prepare one batch statement with a reasonable number of statements in it (you'll have to test to find the number that gives the best performance, but I would typically start with say 50-100 and see if the performance is good enough) and reuse that to insert the data. The other reason why breaking the insert into smallish batches is a good idea is that it allows you to parallelize the insert using multiple threads. And you need to parallelize if you want to get the best out of C*. -- Sylvain Maybe I can try a small example first, just to see if it works at all.
From: Derek Williams [mailto:de...@fyrie.net] Sent: Tuesday, July 10, 2012 7:19 PM To: user@cassandra.apache.org Subject: Re: help using org.apache.cassandra.cql3 On Tue, Jul 10, 2012 at 3:04 PM, Leonid Ilyevsky lilyev...@mooncapital.com wrote: I am trying to use the org.apache.cassandra.cql3 package. Having a problem connecting to the server using ClientState. I was not sure what to put in the credentials map (I did not set any users/passwords on my server), so I tried setting empty strings for "username" and "password", setting them to bogus values, and passing null to the login method; there was no difference. It does not complain at login(), but then it complains about setKeyspace(my keyspace), saying that the specified keyspace does not exist (it obviously does exist). The configuration was loaded from the cassandra.yaml used by the server. I did not have any problem like this when I used org.apache.cassandra.thrift.Cassandra.Client. What am I doing wrong? I think that package just contains server classes. Everything you need should be in org.apache.cassandra.thrift. To use cql3 I just use the client methods 'execute_cql_query', 'prepare_cql_query' and 'execute_prepared_cql_query', after setting cql version to '3.0.0'. -- Derek Williams
Re: Zurich / Swiss / Alps meetup
Coming back to this thread: we are proud to announce that we have opened a Swiss BigData UserGroup. http://www.bigdata-usergroup.ch/ The next meetup is July 16, with the topic "NoSQL Storage: War Stories and Best Practices". Hope to meet you there! Benoit. 2012/5/17 Sasha Dolgy sdo...@gmail.com: All, a year ago I made a simple query to see if there were any users based in and around Zurich, Switzerland or the Alps region interested in participating in some form of Cassandra User Group / Meetup. At the time, 1-2 replies happened. I didn't do much with that. Let's try this again. Who all is interested? I am often jealous about all the fun I miss out on with the regular meetups that happen stateside... Regards, -sd -- Sasha Dolgy sasha.do...@gmail.com
Connected file list in Cassandra
Hi, at the moment I'm doing research on keeping a linked/connected file list in Cassandra, e.g. a PDF file cut into pages (multiple PDFs) where the first page is connected to the second, the second to the third, etc. The main goal is to be able to get all linked files (the whole PDF / all pages) while having only the key of the first file (page). Is there any Cassandra tool/feature which could help me do that, or is the only way to create some wrapper holding key relations? Tom H
Re: Connected file list in Cassandra
I would use something other than the page itself as the key; maybe a filename, something smaller. Then you could use a LongType comparator for the columns and use the page number as the column name, the value being the contents of the page. On Wed, Jul 11, 2012 at 1:34 PM, Tomek Hankus tom...@gmail.com wrote: Hi, at the moment I'm doing research on keeping a linked/connected file list in Cassandra, e.g. a PDF file cut into pages (multiple PDFs) where the first page is connected to the second, the second to the third, etc. The main goal is to be able to get all linked files (the whole PDF / all pages) while having only the key of the first file (page). Is there any Cassandra tool/feature which could help me do that, or is the only way to create some wrapper holding key relations? Tom H
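Derek's layout can be sketched in cassandra-cli (the column family name is invented): the row key is the document name, the comparator sorts the page-number columns numerically, and each column value holds one page's bytes, so a single row read returns all pages in order.

```
create column family pdf_pages
    with comparator = LongType
    and key_validation_class = UTF8Type
    and default_validation_class = BytesType;
```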
Re: Connected file list in Cassandra
Why not just hold the pages as different columns in the same row? Columns are automatically sorted, so if the column name is associated with the page number, the pages automatically flow in the order you want. ----- Original Message ----- From: "Tomek Hankus" <tom...@gmail.com>
Why is our range query failing in Cassandra 0.8.10 Client
Hi: We are currently using Cassandra 0.8.10 and have run into some strange issues when querying for a range of data. I ran a couple of get statements via the Cassandra client and found some interesting results. Consider the following column family definition:

ColumnFamily: events
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 0.2953125/1440/63 (millions of ops/minutes/MB)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: [events.events_Firm_idx, events.events_OrdType_idx, events.events_OrderID_idx, events.events_OrderQty_idx, events.events_Price_idx, events.events_Symbol_idx, events.events_ds_timestamp_idx]
  Column Metadata:
    Column Name: Firm
      Validation Class: org.apache.cassandra.db.marshal.BytesType
      Index Name: events_Firm_idx
      Index Type: KEYS
    Column Name: OrdType
      Validation Class: org.apache.cassandra.db.marshal.BytesType
      Index Name: events_OrdType_idx
      Index Type: KEYS
    Column Name: OrderID
      Validation Class: org.apache.cassandra.db.marshal.BytesType
      Index Name: events_OrderID_idx
      Index Type: KEYS
    Column Name: OrderQty
      Validation Class: org.apache.cassandra.db.marshal.LongType
      Index Name: events_OrderQty_idx
      Index Type: KEYS
    Column Name: Price
      Validation Class: org.apache.cassandra.db.marshal.LongType
      Index Name: events_Price_idx
      Index Type: KEYS
    Column Name: Symbol
      Validation Class: org.apache.cassandra.db.marshal.BytesType
      Index Name: events_Symbol_idx
      Index Type: KEYS
    Column Name: ds_timestamp
      Validation Class: org.apache.cassandra.db.marshal.LongType
      Index Name: events_ds_timestamp_idx
      Index Type: KEYS

If I run:

get events WHERE Firm=434550 AND ds_timestamp=1341955958200;

…the results are pretty much instantaneous. 1 row returned:

[default@FIX] get events WHERE Firm=434550 AND ds_timestamp=1341955958200;
---
RowKey: 64326430363362302d636164362d313165312d626637622d333836303737306639303133
=> (column=ClOrdID, value=32323833, timestamp=1341955980651010)
=> (column=Firm, value=434550, timestamp=1341955980651026)
=> (column=OrdType, value=31, timestamp=1341955980651008)
=> (column=OrderQty, value=8200, timestamp=1341955980651013)
=> (column=Price, value=433561, timestamp=1341955980651019)
=> (column=Symbol, value=544e54, timestamp=1341955980651018)
=> (column=ds_timestamp, value=1341955958200, timestamp=1341955980651020)

If I run the following query (which in theory should return the same 1 row):

get events WHERE Firm=434550 AND ds_timestamp>=1341955958200 AND ds_timestamp<=1341955958200;

…it runs for around 12 seconds and I get: TimedOutException(). If I run:

get events WHERE Firm=434550 AND ds_timestamp>=1341955958200;

or

get events WHERE Firm=434550 AND ds_timestamp<=1341955958200;

the results return quickly. Curious, I also ran a similar set of queries against the Price field:

get events WHERE Firm=434550 AND Price=433561;
get events WHERE Firm=434550 AND Price>=433561;
get events WHERE Firm=434550 AND Price<=433561;

These all work fine. While:

get events WHERE Firm=434550 AND Price>=433561 AND Price<=433561;

returns an IO exception. This feels like it's attempting to do a full table scan here… What is going on? Am I doing something incorrectly? We also see similar behavior when we submit the query through our app via the Thrift API. Thanks, JohnB
Re: Java heap space on Cassandra start up version 1.0.10
Thanks Jonathan, that did the trick. I deleted the Statistics.db files for the offending column family and was able to get Cassandra to start. Thank you, Jason
RE: How to come up with a predefined topology
Using PropertyFileSnitch you can fine-tune the topology of the cluster. What you tell Cassandra about your DCs and racks doesn't have to match how they are in real life: you can create virtual DCs for Cassandra and even treat each node as a separate rack. For example, in cassandra-topology.properties:

# Format is Node IP=DC Name:Rack Name
192.168.0.11=DC1_realtime:node_1
192.168.0.12=DC1_realtime:node_2
192.168.0.13=DC1_analytics:node_3
192.168.1.11=DC2_realtime:node_1

If you then specify the parameters for the keyspace to use these, you can control exactly which set of nodes replicas end up on. For example, in cassandra-cli:

create keyspace ks1
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 };

As far as I know there isn't any way to use the rack name in the strategy_options for a keyspace. You might want to look at the code to dig into that, perhaps. Whichever snitch you use, the nodes are sorted in order of proximity to the client node. How this is determined depends on the snitch that's used, but most (the ones that ship with Cassandra) will use the default ordering of same node, then same rack, then same datacenter, then different datacenter. Each snitch has methods to tell Cassandra which rack and DC a node is in, so it always knows which node is closest. Used with the Bloom filters this can tell us where the nearest replica is. -----Original Message----- From: prasenjit mukherjee [mailto:prasen@gmail.com] Sent: 11 July 2012 06:33 To: user Subject: How to come up with a predefined topology Quoting from http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy : "Asymmetrical replication groupings are also possible depending on your use case. For example, you may want to have three replicas per data center to serve real-time application requests, and then have a single replica in a separate data center designated to running analytics."
I have 2 questions: 1. Is there any example of how to configure a topology with 3 replicas in one DC (2 in one rack + 1 in another rack) and one replica in another DC? The default NetworkTopologyStrategy with RackInferringSnitch will only give me an equal distribution (2+2). 2. I am assuming reads can go to any of the replicas. Is there a client which will send the query to the node (in the Cassandra ring) closest to the client? -Thanks, Prasenjit
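A hedged sketch of an answer to question 1 (addresses and names invented): NetworkTopologyStrategy counts replicas per datacenter and places them on distinct racks where it can, so declaring exactly two racks in DC1 and asking for three replicas there yields the desired 2-in-one-rack, 1-in-the-other split without any rack-level strategy_options.

```
# cassandra-topology.properties (PropertyFileSnitch)
192.168.0.11=DC1:RAC1
192.168.0.12=DC1:RAC1
192.168.0.13=DC1:RAC2
192.168.1.11=DC2:RAC1

# cassandra-cli
create keyspace ks1
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = { DC1: 3, DC2: 1 };
```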
Re: is this something to be concerned about - MUTATION message dropped
Out of curiosity, is there a way that Cassandra can communicate that it's close to being overloaded? On Sun, Jun 17, 2012 at 6:29 PM, aaron morton aa...@thelastpickle.com wrote: http://wiki.apache.org/cassandra/FAQ#dropped_messages https://www.google.com/#q=cassandra+dropped+messages Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/06/2012, at 12:54 AM, Poziombka, Wade L wrote: INFO [ScheduledTasks:1] 2012-06-14 07:49:54,355 MessagingService.java (line 615) 15 MUTATION message dropped in last 5000ms It is logged at INFO level so I'm inclined to think not, but it seems like whenever messages are dropped there may be some issue? -- Frank Hsueh | frank.hs...@gmail.com
Concerns about Cassandra upgrade from 1.0.6 to 1.1.X
Hello. Currently we are using Cassandra 1.0.6 in our production system but suffer from CASSANDRA-3616 (already fixed in the 1.0.7 version). We are thinking of upgrading Cassandra to 1.1.X to get its new features, but have some concerns about the upgrade; expert advice is most welcome. 1. Can Cassandra 1.1.X read 1.0.X artifacts like SSTables, commit logs, etc. without any issue? And vice versa? Because if something happens to 1.1.X after it is deployed to production, we want to be able to downgrade to 1.0.6 (the version we tested our applications against). 2. How should we do the upgrade? Currently we have a 3-node 1.0.6 cluster in production. Can we upgrade node by node? If we do, will the other 1.0.6 nodes work with the 1.1.X nodes without any issue? Appreciate experts' comments on this. Many thanks. /Roshan -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Concerns-about-Cassandra-upgrade-from-1-0-6-to-1-1-X-tp7581197.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
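Not an authoritative answer, but the usual rolling-upgrade outline looks like the sketch below (always check NEWS.txt for the target release before relying on it). One caveat relevant to question 1: once a 1.1 node writes SSTables in its newer on-disk format, older releases generally cannot read them, so a clean downgrade is only straightforward before that happens; snapshots taken beforehand are the safety net.

```
for each node, one at a time:
    nodetool snapshot        # backup you can restore if you must roll back
    nodetool drain           # flush memtables and stop accepting writes
    stop the cassandra process
    install the 1.1.x binaries and merge your cassandra.yaml changes
    restart, watch the logs, then move on to the next node
```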
Re: How to come up with a predefined topology
As far as I know there isn't any way to use the rack name in the strategy_options for a keyspace. You might want to look at the code to dig into that, perhaps.

Aha, I was wondering if I could do that as well (specify rack options) :) Thanks for the pointer, I will dig into the code. -Thanks, Prasenjit

On Thu, Jul 12, 2012 at 5:33 AM, Richard Lowe richard.l...@arkivum.com wrote:

If you then specify the parameters for the keyspace to use these, you can control exactly which set of nodes replicas end up on. For example, in cassandra-cli:

create keyspace ks1
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 };

As far as I know there isn't any way to use the rack name in the strategy_options for a keyspace. You might want to look at the code to dig into that, perhaps.

Whichever snitch you use, the nodes are sorted in order of proximity to the client node. How this is determined depends on the snitch that's used, but most (the ones that ship with Cassandra) use the default ordering of same-node < same-rack < same-datacenter < different-datacenter. Each snitch has methods to tell Cassandra which rack and DC a node is in, so it always knows which node is closest. Used with the Bloom filters this can tell us where the nearest replica is.

-Original Message- From: prasenjit mukherjee [mailto:prasen@gmail.com] Sent: 11 July 2012 06:33 To: user Subject: How to come up with a predefined topology

Quoting from http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy : Asymmetrical replication groupings are also possible depending on your use case. For example, you may want to have three replicas per data center to serve real-time application requests, and then have a single replica in a separate data center designated to running analytics.

Have 2 questions:

1. Any example of how to configure a topology with 3 replicas in one DC (2 in one rack + 1 in another rack) and one replica in another DC? The default NetworkTopologyStrategy with RackInferringSnitch will only give me an equal distribution (2+2).

2. I am assuming the reads can go to any of the replicas. Is there a client which will send a query to the node (in the Cassandra ring) which is closest to the client?

-Thanks, Prasenjit
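To make Prasenjit's first question concrete: one way to get the 3-replica + 1-replica split is to name the racks explicitly with PropertyFileSnitch and set per-DC replication counts; NetworkTopologyStrategy then spreads DC1's three replicas across its racks on its own, which with two racks works out to roughly 2+1. This is a sketch only — the IPs, DC names, and rack names below are made up:

```
# cassandra-topology.properties (PropertyFileSnitch) -- illustrative entries
10.0.1.1=DC1:RAC1
10.0.1.2=DC1:RAC1
10.0.1.3=DC1:RAC2
10.0.2.1=DC2:RAC1
default=DC1:RAC1

# then, in cassandra-cli, replication is declared per DC, not per rack:
create keyspace ks1
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = { DC1: 3, DC2: 1 };
```

As Richard notes, there is no per-rack knob in strategy_options; the rack split falls out of NetworkTopologyStrategy's preference for placing replicas on distinct racks within a DC.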
Re: Using a node in separate cluster without decommissioning.
Since replication factor is 2 in first cluster, I won't lose any data.

Assuming you have been running repair or working at CL QUORUM (which is the same as CL ALL for RF 2).

Is it advisable and safe to go ahead?

Um, so the plan is to turn off 2 nodes in the first cluster, re-task them into the new cluster and then reverse the process? If you simply turn two nodes off in the first cluster you will have reduced the availability for a portion of the ring. 25% of the keys will have at best 1 node they can be stored on. If a node is having any sort of problems, and it is a replica for one of the down nodes, the cluster will appear down for 12.5% of the keyspace. If you work at QUORUM you will not have enough nodes available to write / read 25% of the keys.

If you decommission the nodes, you will still have 2 replicas available for each key range. This is the path I would recommend.

If you _really_ need to do it, what you suggest will probably work. Some tips:

* do safe shutdowns - nodetool disablegossip, disablethrift, drain
* don't forget to copy the yaml file
* in the first cluster the other nodes will collect hints for the first hour the nodes are down; you are not going to want these, so disable HH
* get the nodes back into the first cluster before gc_grace_seconds expires
* bring them back and repair them
* when you bring them back, reading at CL ONE will give inconsistent results; reading at QUORUM may result in a lot of repair activity

Hope that helps.

- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 11/07/2012, at 6:35 AM, rohit bhatia wrote:

Hi, I want to take out 2 nodes from an 8 node cluster and use them in another cluster, but can't afford the overhead of streaming the data and rebalancing the cluster. Since the replication factor is 2 in the first cluster, I won't lose any data. I'm planning to save my commit_log and data directories and bootstrap the node in the second cluster. Afterwards I'll just replace both directories and join the node back to the original cluster. This should work since Cassandra saves all the cluster and schema info in the system keyspace. Is it advisable and safe to go ahead? Thanks Rohit
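Aaron's "safe shutdown" tips above, condensed into the commands involved — an illustrative sketch (host flags and ordering as I understand them), not a verified runbook:

```
# safe shutdown before re-tasking a node (illustrative; adjust -h/-p as needed)
nodetool -h localhost disablegossip   # stop participating in the ring
nodetool -h localhost disablethrift   # stop accepting client connections
nodetool -h localhost drain           # flush memtables, stop accepting writes
# then stop the Cassandra process and copy cassandra.yaml plus the
# data and commitlog directories before joining the other cluster
```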
Re: failed to delete commitlog, cassandra can't accept writes
I don't think it's related to 4337. There is an explicit close call just before the deletion attempt. Can you create a ticket on https://issues.apache.org/jira/browse/CASSANDRA with all of the information you've got here (including the full JVM vendor, version, and build)? Can you also check if the file it tries to delete exists? (I assume it does, otherwise it would be a different error.) Thanks for digging into this.

- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 11/07/2012, at 9:36 AM, Frank Hsueh wrote:

Oops; I missed a log line:

ERROR [COMMIT-LOG-ALLOCATOR] 2012-07-10 14:19:39,776 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main] java.io.IOError: java.io.IOException: Failed to delete C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:176) at org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223) at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Failed to delete C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54) at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172) ... 4 more

On Tue, Jul 10, 2012 at 2:35 PM, Frank Hsueh frank.hs...@gmail.com wrote: After reading the JIRA, I decided to use Java 6.
With Cassandra 1.1.2 on Java 6 x64 on Win7 SP1 x64 (all latest versions), after several minutes of sustained writes, I see in system.log:

java.io.IOError: java.io.IOException: Failed to delete C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:176) at org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223) at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Failed to delete C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54) at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172) ... 4 more

Has anybody seen this before? Is this related to 4337?

On Sat, Jul 7, 2012 at 6:36 PM, Frank Hsueh frank.hs...@gmail.com wrote: Bug already reported: https://issues.apache.org/jira/browse/CASSANDRA-4337

On Sat, Jul 7, 2012 at 6:26 PM, Frank Hsueh frank.hs...@gmail.com wrote: Hi, I'm running Cassandra 1.1.2 on Java 7 x64 on Win7 SP1 x64 (all latest versions). If it matters, I'm using a late version of Astyanax as my client. I'm using 4 threads to write a lot of data into a single CF. After several minutes of load (~30m at the last incident), Cassandra stops accepting writes (the client reports an OperationTimeoutException).
I looked at the logs and I see on the Cassandra server:

ERROR 18:00:42,807 Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main] java.io.IOError: java.io.IOException: Rename from \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to 703272597990002 failed at org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:127) at org.apache.cassandra.db.commitlog.CommitLogSegment.recycle(CommitLogSegment.java:204) at org.apache.cassandra.db.commitlog.CommitLogAllocator$2.run(CommitLogAllocator.java:166) at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.lang.Thread.run(Thread.java:722) Caused by: java.io.IOException: Rename from \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to 703272597990002 failed at org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:105) ... 5 more

Has anybody else seen this before?

-- Frank Hsueh | frank.hs...@gmail.com
Re: snapshot issue
Make sure JNA is in the class path: http://wiki.apache.org/cassandra/FAQ#jna

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 11/07/2012, at 9:38 PM, Adeel Akbar wrote:

Hi, I am trying to take a snapshot of my data but faced the following error. Please help me to resolve this issue.

[root@cassandra1 bin]# ./nodetool -h localhost snapshot 20120711
Exception in thread main java.io.IOError: java.io.IOException: Cannot run program ln: java.io.IOException: error=12, Cannot allocate memory at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1660) at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1686) at org.apache.cassandra.db.Table.snapshot(Table.java:198) at org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:1393) ... Caused by: java.io.IOException: Cannot run program ln: java.io.IOException: error=12, Cannot allocate memory at java.lang.ProcessBuilder.start(ProcessBuilder.java:475) at org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:181) at org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:147) at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:730) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1653) ... 33 more Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory at java.lang.UNIXProcess.init(UNIXProcess.java:164) at java.lang.ProcessImpl.start(ProcessImpl.java:81) at java.lang.ProcessBuilder.start(ProcessBuilder.java:468) ... 37 more

-- Thanks Regards, Adeel Akbar
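Aaron's JNA pointer makes sense once you see what a snapshot actually does: it hard-links each SSTable into a snapshots directory. Without JNA on the classpath, Cassandra shells out to `ln`, and fork()ing a JVM with a large heap can fail with errno 12 (ENOMEM) under strict memory overcommit — exactly the error above. With JNA it calls link(2) directly, with no child process at all. A small sketch (the SSTable file name is made up); `os.link` below is the same system call JNA would use:

```python
import os
import tempfile

# Simulate what `nodetool snapshot` does per SSTable: create a hard link.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "ks-cf-hd-1-Data.db")   # made-up SSTable name
with open(src, "w") as f:
    f.write("sstable contents")

dst = os.path.join(tmp, "snapshot-Data.db")
os.link(src, dst)                    # link(2): no fork, no extra memory
print(os.stat(src).st_nlink)         # -> 2 (both names share the same inode)
```

The snapshot therefore costs almost no disk space up front; the linked files only diverge from live data as compaction replaces the originals.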
Re: is this something to be concerned about - MUTATION message dropped
JMX is really the only way it exposes that kind of information. I recommend setting up mx4j if you want to check on the server stats programmatically.

On Wed, Jul 11, 2012 at 8:17 PM, Frank Hsueh frank.hs...@gmail.com wrote: Out of curiosity, is there a way that Cassandra can communicate that it's close to being overloaded?

On Sun, Jun 17, 2012 at 6:29 PM, aaron morton aa...@thelastpickle.com wrote: http://wiki.apache.org/cassandra/FAQ#dropped_messages https://www.google.com/#q=cassandra+dropped+messages Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 15/06/2012, at 12:54 AM, Poziombka, Wade L wrote: INFO [ScheduledTasks:1] 2012-06-14 07:49:54,355 MessagingService.java (line 615) 15 MUTATION message dropped in last 5000ms

It is at INFO level so I'm inclined to think not, but it seems like whenever messages are dropped there may be some issue?

-- Frank Hsueh | frank.hs...@gmail.com -- Tyler Hobbs DataStax http://datastax.com/
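Short of wiring up JMX or mx4j, one crude overload signal is to scan system.log for the dropped-message line Wade quotes above. A rough sketch — the regex and any alerting threshold you'd hang off it are my assumptions, not a supported Cassandra interface, and the log format can vary between versions:

```python
import re

# Matches e.g. "15 MUTATION message dropped in last 5000ms"
DROPPED = re.compile(r"(\d+) (\w+) messages? dropped in last (\d+)ms")

line = ("INFO [ScheduledTasks:1] 2012-06-14 07:49:54,355 "
        "MessagingService.java (line 615) "
        "15 MUTATION message dropped in last 5000ms")

m = DROPPED.search(line)
if m:
    count, verb, window_ms = int(m.group(1)), m.group(2), int(m.group(3))
    print(count, verb, window_ms)   # -> 15 MUTATION 5000
```

In practice you would tail the log and alert when the count stays non-zero across several reporting windows, since an isolated drop under momentary load is usually benign.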
Re: Concerns about Cassandra upgrade from 1.0.6 to 1.1.X
On Wed, Jul 11, 2012 at 8:38 PM, Roshan codeva...@gmail.com wrote: Currently we are using Cassandra 1.0.6 in our production system but suffer from CASSANDRA-3616 (already fixed in 1.0.7). We are thinking of upgrading Cassandra to the 1.1.X versions to get its new features, but have some concerns about the upgrade, and expert advice is most welcome. 1. Can Cassandra 1.1.X read 1.0.X data such as SSTables, commit logs, etc. without any issue? And vice versa? If something happens with 1.1.X after it is deployed to production, we want to downgrade to 1.0.6 (that is the version we tested with our applications).

1.1 can handle 1.0 data/schemas/etc. without a problem, but the reverse is not necessarily true. I don't know what in particular might break if you downgrade from 1.1 to 1.0, but in general, Cassandra does not handle downgrading gracefully; typically the SSTable formats have changed during major releases. If you snapshot prior to upgrading, you can always roll back to that, but you will have lost anything written since the upgrade.

2. How should we do the upgrade? We currently have a 3-node 1.0.6 cluster in production. Can we upgrade node by node? If we do, will the other 1.0.6 nodes work with the 1.1.X nodes without any issue?

Yes, you can do a rolling upgrade to 1.1, one node at a time. It's usually fine to leave the cluster in a mixed state for a short while as long as you don't do things like repairs, decommissions, or bootstraps, but I wouldn't stay in a mixed state any longer than you have to. It's best to test major upgrades with a second, non-production cluster if that's an option.

-- Tyler Hobbs DataStax http://datastax.com/
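For the rolling upgrade itself, a hedged per-node sketch of the snapshot-then-upgrade sequence Tyler describes — the host, snapshot tag, and package step are illustrative assumptions, not a prescribed procedure:

```
# per node, one node at a time (illustrative commands only)
nodetool -h localhost snapshot pre-1.1-upgrade   # roll-back point (1.0.6 format)
nodetool -h localhost drain                      # flush and stop accepting writes
# stop Cassandra, install the 1.1.x release, diff/merge cassandra.yaml,
# start Cassandra, confirm the node rejoins with `nodetool ring`,
# then move on to the next node; avoid repair/decommission/bootstrap
# while the cluster is mixed
```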
Re: How to come up with a predefined topology
I highly recommend specifying the same rack for all nodes (using cassandra-topology.properties) unless you really have a good reason not to (and you probably don't). The way that replicas are chosen when multiple racks are in play can be fairly confusing and lead to a data imbalance if you don't catch it.

On Wed, Jul 11, 2012 at 10:53 PM, prasenjit mukherjee prasen@gmail.com wrote: As far as I know there isn't any way to use the rack name in the strategy_options for a keyspace. You might want to look at the code to dig into that, perhaps. Aha, I was wondering if I could do that as well (specify rack options) :) Thanks for the pointer, I will dig into the code. -Thanks, Prasenjit

On Thu, Jul 12, 2012 at 5:33 AM, Richard Lowe richard.l...@arkivum.com wrote: If you then specify the parameters for the keyspace to use these, you can control exactly which set of nodes replicas end up on. For example, in cassandra-cli:

create keyspace ks1
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 };

As far as I know there isn't any way to use the rack name in the strategy_options for a keyspace. You might want to look at the code to dig into that, perhaps. Whichever snitch you use, the nodes are sorted in order of proximity to the client node. How this is determined depends on the snitch that's used, but most (the ones that ship with Cassandra) use the default ordering of same-node < same-rack < same-datacenter < different-datacenter. Each snitch has methods to tell Cassandra which rack and DC a node is in, so it always knows which node is closest. Used with the Bloom filters this can tell us where the nearest replica is.

-Original Message- From: prasenjit mukherjee [mailto:prasen@gmail.com] Sent: 11 July 2012 06:33 To: user Subject: How to come up with a predefined topology

Quoting from http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy : Asymmetrical replication groupings are also possible depending on your use case. For example, you may want to have three replicas per data center to serve real-time application requests, and then have a single replica in a separate data center designated to running analytics.

Have 2 questions:

1. Any example of how to configure a topology with 3 replicas in one DC (2 in one rack + 1 in another rack) and one replica in another DC? The default NetworkTopologyStrategy with RackInferringSnitch will only give me an equal distribution (2+2).

2. I am assuming the reads can go to any of the replicas. Is there a client which will send a query to the node (in the Cassandra ring) which is closest to the client?

-Thanks, Prasenjit

-- Tyler Hobbs DataStax http://datastax.com/