Move
I am running a move on one node in a 5 node cluster. There are no writes to the cluster during the move. I am seeing an exception on one of the nodes (not the node which I am doing the move on). The exception stack is ERROR [CompactionExecutor:1] 2011-02-04 08:10:46,855 PrecompactedRow.java (line 82) Skipping row DecoratedKey(656517988577125179070965247963445, 555345524e414d452e6a6f746173696c766573747265) in /var/lib/cassandra/data/Wenzani/UUID_UUID_SUPER-e-408-Data.db java.io.EOFException at java.io.RandomAccessFile.readFully(RandomAccessFile.java:416) at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:137) at org.apache.cassandra.io.PrecompactedRow.init(PrecompactedRow.java:78) at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:138) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183) at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94) at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Output from nodetool ring. Address Status State Load Owns Token 105916716988735575505223832861775432335 1.1.1.2 Up Normal 34.29 GB 45.36% 12956529933298582072612274413196299151 1.1.1.3 Up Normal 34.46 GB 11.41% 32366675628954067180152712803029297247 1.1.1.4 Up Normal 48.96 GB 11.40% 51756081624280481651195537730585467204 1.1.1.5 Up Normal 22 GB 22.78% 90515859237527157456212262236145255573 1.1.1.6 Up Leaving 13.34 GB 9.05% 105916716988735575505223832861775432335 1.1.1.6 is the node which I executed the move on. It seems to be locked in the Leaving state. Is this normal until the move completes? There is almost no activity in the logs and very little cpu usage across the cluster. Is this expected for a move? Cheers Stu
Re: for counters: does read have to be ALL ?
On Thu, Feb 3, 2011 at 10:39 PM, Yang tedd...@gmail.com wrote: the pdf at the design doc https://issues.apache.org/jira/secure/attachment/12459754/Partitionedcountersdesigndoc.pdf does say so: page 2 - strongly consistent read: requires consistency level ALL. (QUORUM is insufficient.) but the wiki http://wiki.apache.org/cassandra/Counters gave a code example: rv = client.get_counter('key1', ColumnPath(column_family='Counter1', column='c1'), ConsistencyLevel.ONE) is one of them wrong? Three things: First, the design doc is talking about strongly consistent reads, while the wiki gives a simple example of a read (it's even followed by a warning), so there is no actual contradiction here. Second, and more to the point, the design docs are slightly outdated, on this point at least. There is now support for QUORUM (or ALL) writes (since https://issues.apache.org/jira/browse/CASSANDRA-1944), so you have the usual consistency guarantee (i.e., you get strong consistency with QUORUM (resp. ONE) read provided you wrote at QUORUM (resp. ALL)). Third, it is good to recall that counters are not considered stable yet (that includes the documentation). -- Sylvain Thanks Yang
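For illustration, a strongly consistent counter read would look like the wiki snippet above with the consistency level raised. This is only a sketch against the counters-branch Thrift API; the import path is an assumption, and the increments must also be written at QUORUM for the usual R + W > N overlap to hold:

    # Sketch only: same call as the wiki example, but at QUORUM instead of ONE.
    # Assumes `client` is a connected Cassandra.Client and Counter1 exists.
    from cassandra.ttypes import ColumnPath, ConsistencyLevel  # assumed import path

    rv = client.get_counter('key1',
                            ColumnPath(column_family='Counter1', column='c1'),
                            ConsistencyLevel.QUORUM)
    print rv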
Re: cassandra 0.6.11 binary package problem
That's because of an issue I found in the ANT scripts while doing the maven-ant-tasks switch on 0.7.0. Any jar in build will be bundled... (so ivy goes into the bin dist... when I did the m-a-t version eric was wondering why i was including m-a-t in the bin dist, and I said I was being symmetric with the ivy version... he said it was a failed experiment that had been left in...) For 0.7.x there should just be the one jar. For the 0.6.x dists if you have forgotten to run ant realclean, then there could be earlier versions present -Stephen On 3 February 2011 14:36, Jonathan Ellis jbel...@gmail.com wrote: Well, that's odd. :) Do any of the other tar.gz balls contain multiple jars? On Thu, Feb 3, 2011 at 6:06 AM, Jean-Yves LEBLEU jleb...@gmail.com wrote: Hi all, Just for info, in apache-cassandra-0.6.11-bin.tar.gz there are both apache-cassandra-0.6.10.jar and apache-cassandra-0.6.11.jar in the lib directory. Causing troubles to my upgrade scripts which use this file to get installed version and check if upgrade needed . :( Thanks for the good job. Jean-Yves -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
RE: Using Cassandra to store files
Hi Daniel When you say "We are doing this" do you mean via NFS or Cassandra? Thanks Brendan Brendan Poole Systems Developer NewLaw Solicitors Helmont House Churchill Way Cardiff brendan.po...@new-law.co.uk 029 2078 4283 www.new-law.co.uk From: Daniel Doubleday [mailto:daniel.double...@gmx.net] Sent: 03 February 2011 17:21 To: user@cassandra.apache.org Subject: Re: Using Cassandra to store files Hundreds of thousands doesn't sound too bad. Good old NFS would do with an ok directory structure. We are doing this. Our documents are pretty small though (a few kb). We have around 40M right now with around 300GB total. Generally the problem is that much data usually means that cassandra becomes io bound during repairs and compactions even if your hot dataset would fit in the page cache. There are efforts to overcome this and 0.7 will help with repair problems but for the time being you have to have quite some headroom in terms of io performance to handle these situations. Here is a related post: http://comments.gmane.org/gmane.comp.db.cassandra.user/11190 On Feb 3, 2011, at 1:33 PM, Brendan Poole wrote: Hi Would anyone recommend using Cassandra for storing hundreds of thousands of documents in Word/PDF format? The manual says it can store documents under 64MB with no issue but was wondering if anyone is using it for this specific purpose. Would it be efficient/reliable and is there anything I need to bear in mind? Thanks in advance
RE: Using Cassandra to store files
The first line on the couchDB website doesn't fill me with confidence... The 1.0.0 release has a critical bug which can lead to data loss in the default configuration Brendan Poole Systems Developer NewLaw Solicitors Helmont House Churchill Way Cardiff brendan.po...@new-law.co.uk 029 2078 4283 www.new-law.co.uk -Original Message- From: buddhasystem [mailto:potek...@bnl.gov] Sent: 03 February 2011 15:03 To: cassandra-u...@incubator.apache.org Subject: Re: Using Cassandra to store files CouchDB -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5989122.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Recall: Using Cassandra to store files
Brendan Poole would like to recall the message, Using Cassandra to store files. Brendan Poole Systems Developer NewLaw Solicitors Helmont House Churchill Way Cardiff brendan.po...@new-law.co.uk 029 2078 4283 www.new-law.co.uk
Re: Do supercolumns have a purpose?
On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone m...@simplegeo.com wrote: On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn da...@lookin2.com wrote: The advantage would be to enable secondary indexes on supercolumn families. Then I suggest opening a ticket for adding secondary indexes to supercolumn families and voting on it. This will be 1 or 2 orders of magnitude less work than getting rid of super columns internally, and probably a much better solution anyway. I realize that this is largely subjective, and on such matters code speaks louder than words, but I don't think I agree with you on the issue of which alternative is less work, or even which is a better solution. You are right, I probably put too much emphasis on that sentence. My main point was to say that I think it is better to create tickets for what you want, rather than for something else completely different that would, as a by-product, give you what you want. Then I suspect that *if* the only goal is to get secondary indexes on super columns, then there is a good chance this would be less work than getting rid of super columns. But to be fair, secondary indexes on super columns may not make too much sense without #598, which itself would require quite some work, so clearly I spoke a bit quickly. If the goal is to have a hierarchical model, limiting the depth to two seems arbitrary. Why not go all the way and allow an arbitrarily deep hierarchy? If a more sophisticated hierarchical model is deemed unnecessary, or impractical, allowing a depth of two seems inconsistent and unnecessary. It's pretty trivial to overlay a hierarchical model on top of the map-of-sorted-maps model that Cassandra implements. Ed Anuff has implemented a custom comparator that does the job [1]. Google's Megastore has a similar architecture and goes even further [2]. It seems to me that super columns are a historical artifact from Cassandra's early life as Facebook's inbox storage system. They needed posting lists of messages, sharded by user. So that's what they built. In my dealings with the Cassandra code, super columns end up making a mess all over the place when algorithms need to be special cased and branch based on the column/supercolumn distinction. I won't even mention what it does to the thrift interface. Actually, I agree with you, more than you know. If I were to start coding Cassandra now, I wouldn't include super columns (and I would probably not go for a depth-unlimited hierarchical model either). But it's there and I'm not sure getting rid of them fully (meaning, including in thrift) is an option (it would be a big compatibility breakage). And (even though I certainly thought about this more than once :)) I'm slightly less enthusiastic about keeping them in thrift but encoding them in regular column families internally: it would still be a lot of work but we would still probably end up with nasty tricks to stick to the thrift api. -- Sylvain Mike [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
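As a concrete illustration of overlaying a two-level hierarchy on plain columns (the idea behind Ed Anuff's comparator [1], simplified here to string concatenation), a rough pycassa sketch follows; the keyspace, column family and delimiter are made up for the example, and it assumes the pycassa ConnectionPool/ColumnFamily API:

    # Emulate a super column family with a standard CF whose UTF8 column names
    # encode 'supercolumn:subcolumn'. All names here are hypothetical.
    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'UserEvents')   # comparator_type=UTF8Type

    def put(row_key, super_name, sub_name, value):
        cf.insert(row_key, {'%s:%s' % (super_name, sub_name): value})

    def get_super(row_key, super_name):
        # One "super column" is just a contiguous name range within the row.
        cols = cf.get(row_key, column_start=super_name + ':',
                      column_finish=super_name + ';')   # ';' sorts right after ':'
        return dict((name.split(':', 1)[1], v) for name, v in cols.items())

    put('user42', 'logins', '2011-02-04', '1')
    print get_super('user42', 'logins')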
How to delete bulk data from cassandra 0.6.3
Hi All Is there any way I can delete column family data (not removing the column families themselves) from Cassandra without affecting ring integrity? What if I delete some column family data files in Linux with the rm command? -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com
get_range_slices and tombstones
Hi! I'm getting tombstones from get_range_slices(). I know that's normal. But is there a way to know that a key is a tombstone? I know a tombstone has no columns, but I can also create a row without any columns, which would look just like a tombstone in get_range_slices(). Regards, Patrik
RE: CQL
Thanks Eric. I was able to get it running. -Original Message- From: Eric Evans [mailto:eev...@rackspace.com] Sent: Wednesday, February 02, 2011 9:34 PM To: user@cassandra.apache.org Subject: Re: CQL On Wed, 2011-02-02 at 06:57 +, Vivek Mishra wrote: I am trying to run CQL from a java client and facing one issue. Keyspace is passed as null. When I execute Use Keyspace1 followed by my Select query it is still not working. Can you provide some minimal sample code that demonstrates the problem you're seeing? -- Eric Evans eev...@rackspace.com
Re: Using Cassandra to store files
We are doing this with cassandra. But we cache a lot. We get around 20 writes/s and 1k reads/s (~ 100Mbit/s) for that particular CF but only 1% of them hit our cassandra cluster (5 nodes, rf=3). /Daniel On Feb 4, 2011, at 9:37 AM, Brendan Poole wrote: Hi Daniel When you say "We are doing this" do you mean via NFS or Cassandra? Thanks Brendan Brendan Poole Systems Developer NewLaw Solicitors Helmont House Churchill Way Cardiff brendan.po...@new-law.co.uk 029 2078 4283 www.new-law.co.uk From: Daniel Doubleday [mailto:daniel.double...@gmx.net] Sent: 03 February 2011 17:21 To: user@cassandra.apache.org Subject: Re: Using Cassandra to store files Hundreds of thousands doesn't sound too bad. Good old NFS would do with an ok directory structure. We are doing this. Our documents are pretty small though (a few kb). We have around 40M right now with around 300GB total. Generally the problem is that much data usually means that cassandra becomes io bound during repairs and compactions even if your hot dataset would fit in the page cache. There are efforts to overcome this and 0.7 will help with repair problems but for the time being you have to have quite some headroom in terms of io performance to handle these situations. Here is a related post: http://comments.gmane.org/gmane.comp.db.cassandra.user/11190 On Feb 3, 2011, at 1:33 PM, Brendan Poole wrote: Hi Would anyone recommend using Cassandra for storing hundreds of thousands of documents in Word/PDF format? The manual says it can store documents under 64MB with no issue but was wondering if anyone is using it for this specific purpose. Would it be efficient/reliable and is there anything I need to bear in mind? Thanks in advance
Column Sorting of integer names
Is there any way to sort columns with integer names in descending order? Regards -Aditya
Moving data
I have several large SQL Server 2005 tables. I need to load the data in these tables into Cassandra. FYI, the Cassandra installation is on a linux server running CentOS. Can anyone suggest the best way to accomplish this? I am a newbie to Cassandra, so any advice would be greatly appreciated. Best, Gary
Re: Column Sorting of integer names
create a ReversedIntegerType. On Fri, Feb 4, 2011 at 5:15 AM, Aditya Narayan ady...@gmail.com wrote: Is there any way to sort the columns named as integers in the descending order ? Regards -Aditya -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
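A custom comparator like that is the server-side fix. As a purely client-side illustration (not equivalent, since the on-disk order stays ascending), with an IntegerType comparator you can simply request slices in reverse order. A pycassa sketch with made-up keyspace/CF names, assuming pycassa's ConnectionPool/ColumnFamily API:

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'Timeline')      # comparator_type=IntegerType

    cf.insert('row1', {1: 'a', 2: 'b', 3: 'c'})

    # column_reversed returns the highest integer names first.
    print cf.get('row1', column_count=10, column_reversed=True)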
Re: Unavailable Exception
ruslan usifov ruslan.usifov at gmail.com writes: Hello. Why do I get an Unavailable Exception on a live cluster (all nodes are up and have never been shut down)? PS: v0.7.0 Can the nodes see each other? Check the Cassandra logs for messages regarding other nodes. Oleg
Re: Unavailable Exception
2011/2/4 Oleg Proudnikov ol...@cloudorange.com ruslan usifov ruslan.usifov at gmail.com writes: Hello. Why do I get an Unavailable Exception on a live cluster (all nodes are up and have never been shut down)? PS: v0.7.0 Can the nodes see each other? Check the Cassandra logs for messages regarding other nodes. Yes they can; nodetool ring shows a well-configured ring, and there is nothing in the logs (no WARN or ERROR)
Re: Unavailable Exception
ruslan usifov ruslan.usifov at gmail.com writes: 2011/2/4 Oleg Proudnikov olegp at cloudorange.com ruslan usifov ruslan.usifov at gmail.com writes: Hello. Why do I get an Unavailable Exception on a live cluster (all nodes are up and have never been shut down)? PS: v0.7.0 Can the nodes see each other? Check the Cassandra logs for messages regarding other nodes. Yes they can; nodetool ring shows a well-configured ring, and there is nothing in the logs (no WARN or ERROR) Try searching for InetAddress as INFO
Re: Using a synchronized counter that keeps track of no of users on the application using it to allot UserIds/ keys to the new users after sign up
On Thu, Feb 3, 2011 at 9:12 PM, Aklin_81 asdk...@gmail.com wrote: Thanks Matthew Ryan, The main inspiration behind me trying to generate Ids in sequential manner is to reduce the size of the userId, since I am using it for heavy denormalization. UUIDs are 16 bytes long, but I can also have a unique Id in just 4 bytes, and since this is just a one time process when the user signs-up, it makes sense to try cutting down the space requirements, if it is feasible without any downsides(!?). I am also using userIds to attach to Id of the other data of the user on my application. If I could reduce the userId size that I can also reduce the size of other Ids, I could drastically cut down the space requirements. [Sorry for this question is not directly related to cassandra but I think Cassandra factors here because of its tuneable consistency] Don't generate these ids in cassandra. Use something like snowflake, flickr's ticket servers [2] or zookeeper sequential nodes. -ryan 1. http://github.com/twitter/snowflake 2. http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
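A rough sketch of the Flickr-style ticket server mentioned in [2], for anyone curious what it looks like in practice; the table name, credentials and schema are illustrative only:

    # Hypothetical sketch of a Flickr-style ticket server: a tiny MySQL table
    # whose AUTO_INCREMENT hands out compact, monotonically increasing ids that
    # can then be used as small Cassandra row keys.
    #   CREATE TABLE tickets64 (
    #       id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    #       stub CHAR(1) NOT NULL DEFAULT '',
    #       PRIMARY KEY (id), UNIQUE KEY stub (stub)
    #   ) ENGINE=MyISAM;
    import MySQLdb

    def next_user_id(conn):
        cur = conn.cursor()
        # REPLACE keeps the table at one row while advancing AUTO_INCREMENT.
        cur.execute("REPLACE INTO tickets64 (stub) VALUES ('a')")
        cur.execute("SELECT LAST_INSERT_ID()")
        return cur.fetchone()[0]

    conn = MySQLdb.connect(host='ticket-db', user='app', passwd='secret', db='ids')
    print next_user_id(conn)   # 1, 2, 3, ... a compact alternative to a 16-byte UUID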
Re: Move
Looks like https://issues.apache.org/jira/browse/CASSANDRA-1992, fixed for 0.7.1. On Fri, Feb 4, 2011 at 12:18 AM, Stu King s...@stuartrexking.com wrote: I am running a move on one node in a 5 node cluster. There are no writes to the cluster during the move. I am seeing an exception on one of the nodes (not the node which I am doing the move on). The exception stack is ERROR [CompactionExecutor:1] 2011-02-04 08:10:46,855 PrecompactedRow.java (line 82) Skipping row DecoratedKey(656517988577125179070965247963445, 555345524e414d452e6a6f746173696c766573747265) in /var/lib/cassandra/data/Wenzani/UUID_UUID_SUPER-e-408-Data.db java.io.EOFException at java.io.RandomAccessFile.readFully(RandomAccessFile.java:416) at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:137) at org.apache.cassandra.io.PrecompactedRow.init(PrecompactedRow.java:78) at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:138) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183) at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94) at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Output from nodetool ring. Address Status State Load Owns Token 105916716988735575505223832861775432335 1.1.1.2 Up Normal 34.29 GB 45.36% 12956529933298582072612274413196299151 1.1.1.3 Up Normal 34.46 GB 11.41% 32366675628954067180152712803029297247 1.1.1.4 Up Normal 48.96 GB 11.40% 51756081624280481651195537730585467204 1.1.1.5 Up Normal 22 GB 22.78% 90515859237527157456212262236145255573 1.1.1.6 Up Leaving 13.34 GB 9.05% 105916716988735575505223832861775432335 1.1.1.6 is the node which I executed the move on. It seems to be locked in the Leaving state. Is this normal until the move completes? There is almost no activity in the logs and very little cpu usage across the cluster. Is this expected for a move? Cheers Stu -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: How to delete bulk data from cassandra 0.6.3
You should use truncate instead. (Then remove the snapshot truncate creates.) On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote: Hi All Is there any way i can delete column families data (not removing column families ) from Cassandra without effecting ring integrity.What if i delete some column families data in linux with rm command ? -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com Confidentiality: This e-mail and any attachments may be confidential and/or privileged. If you are not a named recipient, please notify the sender immediately and do not disclose the contents to another person use it for any purpose or store or copy the information in any medium. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. We do not accept liability for any errors or omissions. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: get_range_slices and tombstones
You can't create a row with no columns without tombstones being involved somehow. :) There's no distinction between a row with no columns because the individual columns were removed, and a row with no columns because the row was removed. The latter is just a more efficient expression of the former. On Fri, Feb 4, 2011 at 2:26 AM, Patrik Modesto patrik.mode...@gmail.com wrote: Hi! I'm getting tombstones from get_range_slices(). I know that's normal. But is there a way to know that a key is a tombstone? I know a tombstone has no columns but I can create a row without any columns that would look like a tombstone in get_range_slices(). Regards, Patrik -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
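In practice clients just skip the empty rows ("range ghosts") when iterating. A minimal pycassa sketch, with placeholder keyspace/CF names and an application-defined handle() function:

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'Standard1')

    for key, columns in cf.get_range(start='', finish=''):
        if not columns:          # tombstoned row, or one whose columns were all deleted
            continue
        handle(key, columns)     # handle() is application code, not shown here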
Re: How to delete bulk data from cassandra 0.6.3
I thought truncate() was not available before 0.7 (in 0.6.3)... was it? --- Sent from BlackBerry -Original Message- From: Jonathan Ellis jbel...@gmail.com Date: Fri, 4 Feb 2011 08:58:35 To: user user@cassandra.apache.org Reply-To: user@cassandra.apache.org Subject: Re: How to delete bulk data from cassandra 0.6.3 You should use truncate instead. (Then remove the snapshot truncate creates.) On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote: Hi All Is there any way i can delete column families data (not removing column families ) from Cassandra without effecting ring integrity.What if i delete some column families data in linux with rm command ? -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com Confidentiality: This e-mail and any attachments may be confidential and/or privileged. If you are not a named recipient, please notify the sender immediately and do not disclose the contents to another person use it for any purpose or store or copy the information in any medium. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. We do not accept liability for any errors or omissions. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: How to delete bulk data from cassandra 0.6.3
In that case, you should shut down the server before removing data files. On Fri, Feb 4, 2011 at 9:01 AM, roshandawr...@gmail.com wrote: I thought truncate() was not available before 0.7 (in 0.6.3)was it? --- Sent from BlackBerry -Original Message- From: Jonathan Ellis jbel...@gmail.com Date: Fri, 4 Feb 2011 08:58:35 To: useruser@cassandra.apache.org Reply-To: user@cassandra.apache.org Subject: Re: How to delete bulk data from cassandra 0.6.3 You should use truncate instead. (Then remove the snapshot truncate creates.) On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote: Hi All Is there any way i can delete column families data (not removing column families ) from Cassandra without effecting ring integrity.What if i delete some column families data in linux with rm command ? -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com Confidentiality: This e-mail and any attachments may be confidential and/or privileged. If you are not a named recipient, please notify the sender immediately and do not disclose the contents to another person use it for any purpose or store or copy the information in any medium. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. We do not accept liability for any errors or omissions. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
CF Read and Write Latency Histograms
Hi All, I suspect that Write and Read Latency column headers need to be swapped. I am running a bulk load with no reads on this CF but I see Read column with values while the Write column has zeros only. The MBean shows the values correctly. Thank you, Oleg
Re: Using Cassandra to store files
I am also looking at possible solutions for storing PDF and Word documents. But why not store them in the filesystem instead of a database, unless your files are too small, in which case a database would be recommended. -Aditya On Fri, Feb 4, 2011 at 5:30 PM, Daniel Doubleday daniel.double...@gmx.net wrote: We are doing this with cassandra. But we cache a lot. We get around 20 writes/s and 1k reads/s (~ 100Mbit/s) for that particular CF but only 1% of them hit our cassandra cluster (5 nodes, rf=3). /Daniel On Feb 4, 2011, at 9:37 AM, Brendan Poole wrote: Hi Daniel When you say "We are doing this" do you mean via NFS or Cassandra? Thanks Brendan Brendan Poole Systems Developer NewLaw Solicitors Helmont House Churchill Way Cardiff brendan.po...@new-law.co.uk 029 2078 4283 www.new-law.co.uk From: Daniel Doubleday [mailto:daniel.double...@gmx.net] Sent: 03 February 2011 17:21 To: user@cassandra.apache.org Subject: Re: Using Cassandra to store files Hundreds of thousands doesn't sound too bad. Good old NFS would do with an ok directory structure. We are doing this. Our documents are pretty small though (a few kb). We have around 40M right now with around 300GB total. Generally the problem is that much data usually means that cassandra becomes io bound during repairs and compactions even if your hot dataset would fit in the page cache. There are efforts to overcome this and 0.7 will help with repair problems but for the time being you have to have quite some headroom in terms of io performance to handle these situations. Here is a related post: http://comments.gmane.org/gmane.comp.db.cassandra.user/11190 On Feb 3, 2011, at 1:33 PM, Brendan Poole wrote: Hi Would anyone recommend using Cassandra for storing hundreds of thousands of documents in Word/PDF format? The manual says it can store documents under 64MB with no issue but was wondering if anyone is using it for this specific purpose. Would it be efficient/reliable and is there anything I need to bear in mind? Thanks in advance
Re: CF Read and Write Latency Histograms
Can you create a ticket? On Fri, Feb 4, 2011 at 9:41 AM, Oleg Proudnikov ol...@cloudorange.com wrote: Hi All, I suspect that Write and Read Latency column headers need to be swapped. I am running a bulk load with no reads on this CF but I see Read column with values while the Write column has zeros only. The MBean shows the values correctly. Thank you, Oleg -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Using Cassandra to store files
Even when storage is in NFS, Cassandra can still be quite useful as a file catalog. Your physical storage can change, move etc. Therefore, it's a good idea to provide mapping of logical names to physical store points (which in fact can be many). This is a standard technique used in mass storage. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5993357.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
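A minimal sketch of such a catalog in Cassandra (every name here is invented for the example; assumes pycassa's ConnectionPool/ColumnFamily API): one row per logical document, one column per physical copy:

    import pycassa

    pool = pycassa.ConnectionPool('Docs', ['localhost:9160'])
    catalog = pycassa.ColumnFamily(pool, 'FileCatalog')

    def register(logical_name, replica_id, physical_url):
        catalog.insert(logical_name, {replica_id: physical_url})

    def locate(logical_name):
        return catalog.get(logical_name).values()    # every known physical copy

    register('claims/2011/abc123.pdf', 'nfs-1', '/mnt/docs/ab/c1/abc123.pdf')
    register('claims/2011/abc123.pdf', 's3-1', 's3://acme-docs/abc123.pdf')
    print locate('claims/2011/abc123.pdf')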
Re: Using Cassandra to store files
Yes, definitely a database for the mapping, of course! On Fri, Feb 4, 2011 at 11:17 PM, buddhasystem potek...@bnl.gov wrote: Even when storage is in NFS, Cassandra can still be quite useful as a file catalog. Your physical storage can change, move etc. Therefore, it's a good idea to provide mapping of logical names to physical store points (which in fact can be many). This is a standard technique used in mass storage. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5993357.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Moving data
I'm afraid there is no short answer. The long answer is, 1) Read about Cassandra data modeling at http://wiki.apache.org/cassandra/ArticlesAndPresentations. It is not as simple as one table equals one columnfamily. 2) Write a program to read your data out of SQL Server and write it into Cassandra, preferably with multiple threads On Fri, Feb 4, 2011 at 6:00 AM, Morey, Gary gary.mo...@xerox.com wrote: I have several large SQL Server 2005 tables. I need to load the data in these tables into Cassandra. FYI, the Cassandra installation is on a linux server running CentOS. Can anyone suggest the best way to accomplish this? I am a newbie to Cassandra, so any advice would be greatly appreciated. Best, Gary -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Using Cassandra to store files
For the number of files the OP has, why not just use a traditional filesystem and Solr to index the PDF data? You get to search inside the files for relevant information. Sri On Fri, Feb 4, 2011 at 12:47 PM, buddhasystem potek...@bnl.gov wrote: Even when storage is in NFS, Cassandra can still be quite useful as a file catalog. Your physical storage can change, move etc. Therefore, it's a good idea to provide mapping of logical names to physical store points (which in fact can be many). This is a standard technique used in mass storage. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5993357.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Moving data
FWIW, I'm working on migrating a large amount of data out of Oracle into my test cluster. The data has been warehoused as CSV files on Amazon S3. Having that in place allows me to not put extra load on the production service when doing many repeated tests. I then parse the data using the Python csv module and, as Jonathan says, use threads to batch-upload the data into Cassandra. Notable points: since the data is relatively sparse (i.e. many zeros for integers and empty strings for strings etc), I establish a default value dictionary, and don't write these to Cassandra at all -- they can be reconstructed as needed when reading back. Also, make sure you wrap Cassandra writes etc in exception handling. When load is high, you might get timeouts at the TSocket level etc. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Moving-data-tp5992669p5993443.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
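For what it's worth, that load pattern boils down to something like the following pycassa sketch (column names, defaults and the column family are invented; the exception import path is an assumption, and threading is left out for brevity):

    import csv
    import pycassa
    from pycassa.cassandra.ttypes import TimedOutException  # assumed import path

    DEFAULTS = {'count': '0', 'comment': ''}    # values not worth storing

    pool = pycassa.ConnectionPool('Warehouse', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'Records')

    def flush(rows, attempts=3):
        for i in range(attempts):
            try:
                cf.batch_insert(rows)
                return
            except TimedOutException:
                if i == attempts - 1:
                    raise                       # give up after a few retries

    def load(csv_path):
        rows = {}
        for rec in csv.DictReader(open(csv_path)):
            key = rec.pop('id')
            # Drop columns holding their default value; reconstruct them on read.
            rows[key] = dict((k, v) for k, v in rec.items() if DEFAULTS.get(k) != v)
            if len(rows) >= 100:
                flush(rows)
                rows = {}
        if rows:
            flush(rows)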
RE: CF Read and Write Latency Histograms
Is this 0.7? -Original Message- From: Oleg Proudnikov [mailto:ol...@cloudorange.com] Sent: Friday, February 04, 2011 11:42 AM To: user@cassandra.apache.org Subject: CF Read and Write Latency Histograms Hi All, I suspect that Write and Read Latency column headers need to be swapped. I am running a bulk load with no reads on this CF but I see Read column with values while the Write column has zeros only. The MBean shows the values correctly. Thank you, Oleg
Re: CF Read and Write Latency Histograms
David Dabbs dmdabbs at gmail.com writes: Is this 0.7? Yes
Re: Using a synchronized counter that keeps track of no of users on the application using it to allot UserIds/ keys to the new users after sign up
Thanks so much Ryan for the links; I'll definitely take them into consideration. Just another thought that came to my mind: perhaps it may be beneficial to store (or duplicate) some of the data, like the login credentials, particularly the userId to user's name mapping, etc. (which is very heavily read), in a fast MyISAM table. This could solve the problem of keys through auto-generated unique sequential primary keys. I could use the same keys for Cassandra rows for that user. And also, since Cassandra reads are relatively slow, it makes sense to store data like the userId to name mapping in MyISAM, as this data would be required after almost all queries to the database. Regards -Asil On Fri, Feb 4, 2011 at 10:14 PM, Ryan King r...@twitter.com wrote: On Thu, Feb 3, 2011 at 9:12 PM, Aklin_81 asdk...@gmail.com wrote: Thanks Matthew Ryan, The main inspiration behind me trying to generate Ids in sequential manner is to reduce the size of the userId, since I am using it for heavy denormalization. UUIDs are 16 bytes long, but I can also have a unique Id in just 4 bytes, and since this is just a one time process when the user signs-up, it makes sense to try cutting down the space requirements, if it is feasible without any downsides(!?). I am also using userIds to attach to Id of the other data of the user on my application. If I could reduce the userId size that I can also reduce the size of other Ids, I could drastically cut down the space requirements. [Sorry for this question is not directly related to cassandra but I think Cassandra factors here because of its tuneable consistency] Don't generate these ids in cassandra. Use something like snowflake, flickr's ticket servers [2] or zookeeper sequential nodes. -ryan 1. http://github.com/twitter/snowflake 2. http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
read latency in cassandra
Hi all, It often takes more than two seconds to load: - one row of ~450 events comprising ~600k - cluster size of 1 - client is pycassa 1.04 - timeout on recv - cold read (I believe) - load generally 0.5 on a 4-core machine, 2 EC2 instance store drives for cassandra - cpu wait generally 1% Often the following sequence occurs: 1. First attempt times out after 2 sec 2. Second attempt loads fine on immediate retry So, I assume it's an issue about cache miss and going to disk. Is 2 seconds the normal "I went to disk" latency for cassandra? What should we look to tune, if anything? I don't think keeping everything in-memory is an option for us given dataset size and access pattern (hot set is stuff being currently written, stuff being accessed is likely to be older). I didn't notice this problem with cassandra 0.6.8 and pycassa 0.3. Thanks, dan
Re: How to monitor Cassandra's throughput?
The issue has been resolved, the fix is on Hector's GitHub. Oleg Proudnikov olegp at cloudorange.com writes: I have posted on Hector ML: http://thread.gmane.org/gmane.comp.db.hector.user/1690 Oleg
RE: Tracking down read latency
Thank you both for your advice. See my updated iostats below. From: sridhar.ba...@gmail.com [mailto:sridhar.ba...@gmail.com] On Behalf Of sridhar basam Sent: Thursday, February 03, 2011 10:58 AM To: user@cassandra.apache.org Subject: Re: Tracking down read latency The data provided is also a average value since boot time. Run the -x as suggested below but run it via a interval of around 5 seconds. You very well could be having i/o issue, it is hard to tell from the overall average value you provided. Collect iostat -x 5 during the times when you see slow reads and see how busy the disks are. Sridhar On Thu, Feb 3, 2011 at 3:21 AM, Peter Schuller wrote: $ iostat As rcoli already mentioned you don't seen to have an I/O problem, but as a point of general recommendation: When determining whether you are blocking on disk I/O, pretty much *always* use iostat -x rather than the much less useful default mode of iostat. The %util and queue wait/average time columns are massively useful/important; without them one is much more blind as to whether or not storage devices are actually saturated. Peter Schuller Our data is on sdb, commit logs on sdc. So do I read this correctly that we're 'await'ing 6+millis on average for data drive (sdb) requests to be serviced? $iostat -x 5 avg-cpu: %user %nice %system %iowait %steal %idle 0.590.000.220.940.00 98.25 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb 11.20 0.00 42.00 0.00 4993.60 0.00 118.90 0.286.77 5.22 21.92 sdb1 11.20 0.00 42.00 0.00 4993.60 0.00 118.90 0.286.77 5.22 21.92 sdc 0.0031.00 0.00 1.40 0.00 259.20 185.14 0.000.14 0.14 0.02 sdc1 0.0031.00 0.00 1.40 0.00 259.20 185.14 0.000.14 0.14 0.02 avg-cpu: %user %nice %system %iowait %steal %idle 0.560.000.181.080.00 98.17 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb 8.80 0.00 49.40 0.00 5936.00 0.00 120.16 0.336.62 5.22 25.78 sdb1 8.80 0.00 49.40 0.00 5936.00 0.00 120.16 0.336.62 5.22 25.78 sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdc1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.990.000.221.080.00 97.71 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb 11.40 0.00 46.20 0.00 5147.20 0.00 111.41 0.306.55 5.58 25.80 sdb1 11.40 0.00 46.20 0.00 5147.20 0.00 111.41 0.306.55 5.58 25.80 sdc 0.00 7.40 0.00 0.80 0.0065.6082.00 0.000.25 0.25 0.02 sdc1 0.00 7.40 0.00 0.80 0.0065.6082.00 0.000.25 0.25 0.02 avg-cpu: %user %nice %system %iowait %steal %idle 0.680.000.230.950.00 98.13 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.80 0.00 0.80 0.0012.7716.00 0.000.25 0.25 0.02 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sda2 0.00 0.80 0.00 0.80 0.0012.7716.00 0.000.25 0.25 0.02 sda3 
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000.00 0.00 0.00 sdb 5.19 0.00 38.12 0.00 4356.09 0.00 114.26 0.266.70 5.91
Re: How to delete bulk data from cassandra 0.6.3
So do we need to write a script, or is it something I can do as a system admin without involving a developer? If yes, please guide me in this case. On 02/04/2011 10:36 PM, Jonathan Ellis wrote: In that case, you should shut down the server before removing data files. On Fri, Feb 4, 2011 at 9:01 AM, roshandawr...@gmail.com wrote: I thought truncate() was not available before 0.7 (in 0.6.3)... was it? --- Sent from BlackBerry -Original Message- From: Jonathan Ellis jbel...@gmail.com Date: Fri, 4 Feb 2011 08:58:35 To: user user@cassandra.apache.org Reply-To: user@cassandra.apache.org Subject: Re: How to delete bulk data from cassandra 0.6.3 You should use truncate instead. (Then remove the snapshot truncate creates.) On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote: Hi All Is there any way i can delete column families data (not removing column families ) from Cassandra without effecting ring integrity.What if i delete some column families data in linux with rm command ? -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com
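Concretely, on 0.6.x this is an admin task rather than a script, along the lines of the sketch below. Paths and names are examples only; snapshot or back up the files first, and double-check that the glob matches only the intended column family:

    nodetool -h localhost drain      # flush memtables and stop accepting writes
    # stop the cassandra process
    rm /var/lib/cassandra/data/MyKeyspace/MyColumnFamily-*   # Data/Index/Filter files for that CF
    # start cassandra again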
Re: read latency in cassandra
What operation are you calling? Are you trying to read the entire row back? How many SSTables do you have for the CF? Does your data have a lot of overwrites? Have you modified the default compaction settings? Do you have the row cache enabled? How long does the second request take? Can you use JConsole to check the read latency for the CF? Sorry for all the questions; the answer to your initial question is "mmm, that does not sound right". It will depend on... Aaron On 5 Feb 2011, at 08:13, Dan Kuebrich wrote: Hi all, It often takes more than two seconds to load: - one row of ~450 events comprising ~600k - cluster size of 1 - client is pycassa 1.04 - timeout on recv - cold read (I believe) - load generally 0.5 on a 4-core machine, 2 EC2 instance store drives for cassandra - cpu wait generally 1% Often the following sequence occurs: 1. First attempt times out after 2 sec 2. Second attempt loads fine on immediate retry So, I assume it's an issue of a cache miss and going to disk. Is 2 seconds the normal "went to disk" latency for cassandra? What should we look to tune, if anything? I don't think keeping everything in memory is an option for us given our dataset size and access pattern (the hot set is stuff being currently written; stuff being accessed is likely to be older). I didn't notice this problem with cassandra 0.6.8 and pycassa 0.3. Thanks, dan
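As a side note, if the 2-second recv timeout is the immediate pain point, the client-side timeout can be raised while the cause is investigated. A minimal sketch, assuming the pycassa 1.x ConnectionPool/ColumnFamily API; the keyspace, column family and key names are made up:

import pycassa

# Give a cold read more time to hit disk instead of failing at the default client timeout.
pool = pycassa.ConnectionPool('EventKeyspace',
                              server_list=['127.0.0.1:9160'],
                              timeout=10)
events = pycassa.ColumnFamily(pool, 'Events')
row = events.get('some_event_row', column_count=500)

Raising the timeout only hides the latency, of course; the row cache and JConsole read-latency questions above are still the right place to look for the cause.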
Re: Tracking down read latency
On Fri, Feb 4, 2011 at 2:44 PM, David Dabbs dmda...@gmail.com wrote: Our data is on sdb, commit logs on sdc. So do I read this correctly that we're 'await'ing 6+ millis on average for data drive (sdb) requests to be serviced? That is right. Those numbers look pretty good for rotational media. What sort of read latencies do you see? Have you also looked into GC? Sridhar
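To make the iostat -x advice above easier to act on, here is a small monitoring sketch (not from the thread). The device name, thresholds, and the column names it looks for ("await", "%util", as printed by the iostat build shown above) are assumptions to adjust for your own hardware and sysstat version.

import subprocess

DEVICE = "sdb"          # hypothetical: the data disk discussed in this thread
UTIL_THRESHOLD = 80.0   # %util above which the disk is effectively saturated
AWAIT_THRESHOLD = 20.0  # average wait (ms) that usually signals trouble

# Tail `iostat -x 5` and flag intervals where the data disk looks saturated.
proc = subprocess.Popen(["iostat", "-x", "5"], stdout=subprocess.PIPE,
                        universal_newlines=True)
header = []
for line in proc.stdout:
    fields = line.split()
    if not fields:
        continue
    if fields[0].startswith("Device"):
        header = fields                      # remember the column order for this iostat build
        continue
    if header and fields[0] == DEVICE:
        row = dict(zip(header, fields))
        util = float(row.get("%util", 0))
        await_ms = float(row.get("await", 0))
        if util > UTIL_THRESHOLD or await_ms > AWAIT_THRESHOLD:
            print("possible I/O saturation: await=%.1fms util=%.1f%%" % (await_ms, util))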
New Generation Size guidelines
Hi All, I have a 3 server cluster with RF=2. My heap is 2G out of 4G of RAM. The servers have 4 cores. I used the default heap settings. The Eden space ended up around 60M and the Survivor spaces are around 7M. This feels a little bit low for a process that creates so much short-lived garbage. I just wanted to get your thoughts on this. Space used in the Old Generation stays in a short range, 1.2G-1.6G, but when activity is low and I force GC it drops to 120M. It feels like there is a lot of garbage that does not get a chance to be collected. The server is running a batch load and its CPUs are 10-40% busy. The higher value is at 1.6G. Yet I am reluctant to push my data load harder because I do hit OOMs. The amount of data loaded so far is small - around 100G in total. Should I increase my New Generation size? Thank you, Oleg
Re: New Generation Size guidelines
On Fri, Feb 4, 2011 at 1:45 PM, Oleg Proudnikov ol...@cloudorange.com wrote: Hi All, I have a 3 server cluster with RF=2. My heap is 2G out of 4G of RAM. The servers have 4 cores. I used the default heap settings. The Eden space ended up around 60M and the Survivor spaces are around 7M. This feels a little bit low for a process that creates so much short-lived garbage. I just wanted to get your thoughts on this. Space used in the Old Generation stays in a short range, 1.2G-1.6G, but when activity is low and I force GC it drops to 120M. It feels like there is a lot of garbage that does not get a chance to be collected. The server is running a batch load and its CPUs are 10-40% busy. The higher value is at 1.6G. Yet I am reluctant to push my data load harder because I do hit OOMs. The amount of data loaded so far is small - around 100G in total. Almost certainly yes. -ryan
Re: Problems with Python Stress Test
Brandon, Thanks for the response. I have also noticed that stress.py's progress interval gets thrown off in low-memory situations. What did you mean by "contrib/stress on 0.7 instead"? I don't see that dir in the src version of 0.7. - Sameer On Thu, Feb 3, 2011 at 5:22 PM, Brandon Williams dri...@gmail.com wrote: On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui cassandral...@gmail.com wrote: Hi guys, I was playing around with the stress.py test this week and noticed a few things. 1) Progress-interval does not always work correctly. I set it to 5 in the example below, but am instead getting varying intervals: Generally indicates that the client machine is being overloaded in my experience. 2) The key_rate and op_rate don't seem to be calculated correctly. Also, what is the difference between the interval_key_rate and the interval_op_rate? For example, in the example above, the first row shows 6662 keys inserted in 5 seconds and 6662 / 5 = 1332, which matches the interval_op_rate. There should be no difference unless you're doing range slices, but IPC timing makes them vary somewhat. 3) If I write x KB to Cassandra with py_stress, the used disk space doesn't grow by x after the test. In the example below I tried to write 500,000 keys * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I checked the amount of disk space used after the test, it had actually grown by 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this because perhaps the commit log got duplicate copies of the data as the SSTables? Commitlogs could be part of it, you're not factoring in the column names, and then there's index and bloom filter overhead. Use contrib/stress on 0.7 instead. -Brandon
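Rough arithmetic for Brandon's point about column-name and structural overhead. This is a back-of-the-envelope sketch, not Cassandra's exact on-disk format; the per-column overhead figures are illustrative assumptions only.

keys = 500000
cols_per_key = 5
value_bytes = 32

raw_kb = keys * cols_per_key * value_bytes / 1024.0
print("raw values only: %.0f KB" % raw_kb)            # ~78,125 KB, the figure quoted above

# Each column also stores a name, a timestamp and length/flag fields; the numbers
# below are guesses for illustration, not the real serialization layout.
name_bytes = 10
per_column_overhead = name_bytes + 8 + 6              # name + timestamp + framing (assumed)
with_overhead_kb = keys * cols_per_key * (value_bytes + per_column_overhead) / 1024.0
print("with per-column overhead: %.0f KB" % with_overhead_kb)
# Row keys, per-row indexes, bloom filters and the commitlog add more on top of this,
# which is why the observed growth can be a multiple of the raw payload size.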
Re: Sorting in time order without using TimeUUID type column names
IMHO if you know the time of the event, store the time as a long rather than a UUID. It will make it easier to get back to a time and make it easier for you to compare columns. TimeUUIDs have a pseudo-random part as well as the time part; it could be set to a constant, but why bother if you know the absolute time. I'm not sure what the ReminderCountOfThisUser is for, and as Sylvain says there is no need for the user name if this is in a row just for the user. Hope that helps. Aaron On 4 Feb 2011, at 01:32, Aditya Narayan wrote: If I use TimestampOfDueTimeInFuture : UserId : ReminderCountOfThisUser as the key pattern for the rows of reminders, then I am storing the key, just as it is, as the column name and thus the column values need not contain a link to the row containing the reminder details. I think UserId would be required along with the timestamp in the key pattern to provide uniqueness to the key, as there may be several reminders generated by users on the application at the same time. But my question is whether it is really advisable to generate the keys with this pattern ... instead of going with TimeUUIDs? Are there any downsides which I am perhaps not aware of? On Thu, Feb 3, 2011 at 5:43 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Thu, Feb 3, 2011 at 11:27 AM, Aditya Narayan ady...@gmail.com wrote: Hey all, I want to store some columns that are reminders to the users on my application, in time-sorted order in a row (the timeline row of the user). Would it be recommended to store these reminder columns in the timeline row with column names that combine the timestamp (of the time when the reminder gets due) + UserId + reminder count of that user, i.e. Column Name = TimestampOfDueTimeInFuture : UserId : ReminderCountOfThisUser If you have one row per user (which is a good idea), why keep the UserId in the column name? Then what comparator could I use to sort them in order of their due time? This comparator should be able to sort numbers in descending order. (I guess ascii type would do the opposite order.) (Reminders need to be sorted in the timeline in the order of their due time.) *The* solution is to write a custom comparator. Have a look at http://www.datastax.com/docs/0.7/data_model/column_families and http://www.sodeso.nl/?p=421 for instance. As a side note, the fact that the comparator sorts in ascending order when you need descending order wouldn't be that much of a problem, since you can always do slice queries in reversed order. But even then, AsciiType is not a very satisfying solution as you would have to be careful about the padding of your timestamp for it to work correctly. So again, a custom comparator is the way to go. Basically I am trying to avoid the 16-byte-long TimeUUIDs, first because they are too long, and because the above key pattern guarantees me a unique key/id for the reminder row always. Thanks Aditya Narayan -- Sylvain
Re: Unavailable Exception
Please provide some information: the client you are using, the client-side error stack, the command you are running, and the output from nodetool ring. Aaron On 5 Feb 2011, at 05:10, Oleg Proudnikov wrote: ruslan usifov ruslan.usifov at gmail.com writes: 2011/2/4 Oleg Proudnikov olegp at cloudorange.com ruslan usifov ruslan.usifov at gmail.com writes: Hello, why do I get UnavailableException on a live cluster (all nodes are up and never shut down)? PS: v 0.7.0 Can the nodes see each other? Check the Cassandra logs for messages regarding other nodes. Yes they can; nodetool ring shows a well-configured ring, and there is nothing in the logs (no WARN or ERROR). Try searching for InetAddress at INFO.
Re: Sorting in time order without using TimeUUID type column names
Thanks Aaron, Yes, I can put the column names without using the userId in the timeline row, and when I want to retrieve the row corresponding to that column name, I will attach the userId to get the row key. Yes, I'll store it as a long. I guess I'll have to write a custom comparator type (ReversedIntegerType) to sort those longs in descending order. Regards Aditya On Sat, Feb 5, 2011 at 6:24 AM, aaron morton aa...@thelastpickle.com wrote: IMHO if you know the time of the event, store the time as a long rather than a UUID. It will make it easier to get back to a time and make it easier for you to compare columns. TimeUUIDs have a pseudo-random part as well as the time part; it could be set to a constant, but why bother if you know the absolute time. I'm not sure what the ReminderCountOfThisUser is for, and as Sylvain says there is no need for the user name if this is in a row just for the user. Hope that helps. Aaron On 4 Feb 2011, at 01:32, Aditya Narayan wrote: If I use TimestampOfDueTimeInFuture : UserId : ReminderCountOfThisUser as the key pattern for the rows of reminders, then I am storing the key, just as it is, as the column name and thus the column values need not contain a link to the row containing the reminder details. I think UserId would be required along with the timestamp in the key pattern to provide uniqueness to the key, as there may be several reminders generated by users on the application at the same time. But my question is whether it is really advisable to generate the keys with this pattern ... instead of going with TimeUUIDs? Are there any downsides which I am perhaps not aware of? On Thu, Feb 3, 2011 at 5:43 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Thu, Feb 3, 2011 at 11:27 AM, Aditya Narayan ady...@gmail.com wrote: Hey all, I want to store some columns that are reminders to the users on my application, in time-sorted order in a row (the timeline row of the user). Would it be recommended to store these reminder columns in the timeline row with column names that combine the timestamp (of the time when the reminder gets due) + UserId + reminder count of that user, i.e. Column Name = TimestampOfDueTimeInFuture : UserId : ReminderCountOfThisUser If you have one row per user (which is a good idea), why keep the UserId in the column name? Then what comparator could I use to sort them in order of their due time? This comparator should be able to sort numbers in descending order. (I guess ascii type would do the opposite order.) (Reminders need to be sorted in the timeline in the order of their due time.) *The* solution is to write a custom comparator. Have a look at http://www.datastax.com/docs/0.7/data_model/column_families and http://www.sodeso.nl/?p=421 for instance. As a side note, the fact that the comparator sorts in ascending order when you need descending order wouldn't be that much of a problem, since you can always do slice queries in reversed order. But even then, AsciiType is not a very satisfying solution as you would have to be careful about the padding of your timestamp for it to work correctly. So again, a custom comparator is the way to go. Basically I am trying to avoid the 16-byte-long TimeUUIDs, first because they are too long, and because the above key pattern guarantees me a unique key/id for the reminder row always. Thanks Aditya Narayan -- Sylvain
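To illustrate Aaron's and Sylvain's suggestions, a small sketch of the "store the due time as a long and slice in reverse" approach (pycassa 1.x API; the keyspace and CF names are hypothetical, and the CF is assumed to have been created with comparator LongType):

import time
import pycassa

pool = pycassa.ConnectionPool('MyApp', server_list=['127.0.0.1:9160'])
reminders = pycassa.ColumnFamily(pool, 'UserReminders')   # comparator: LongType

user_id = 'user42'
due_at = int(time.time() * 1000) + 3600 * 1000            # due one hour from now, in ms

# The column name is just the due timestamp; the value can point at the row
# holding the reminder details.
reminders.insert(user_id, {due_at: 'reminder:123'})

# Newest-first without a custom descending comparator: just reverse the slice.
latest_first = reminders.get(user_id, column_reversed=True, column_count=20)

This avoids both the 16-byte TimeUUID names and the need for a ReversedIntegerType comparator, at the cost of possible collisions when two reminders for the same user fall on the same millisecond.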
Re: Unavailable Exception
We're going to need *way* more information than this. On 03 Feb 2011, at 20:03, ruslan usifov wrote: Hello, why do I get UnavailableException on a live cluster (all nodes are up and never shut down)? PS: v 0.7.0
Merging the rows of two column families(with similar attributes) into one ??
I read somewhere that a large number of column families is not a good idea, as it consumes more memory and causes more compactions, so I am trying to reduce the number of column families by adding the rows of another column family (with similar attributes) as separate rows into one. I have two kinds of data for two separate features on my application. If I store them in two different column families, both of them will have similar attributes, like the same comparator type and sorting needs. Thus I could also merge both of them into one column family, just by adding the rows of one to the other (increasing the number of rows). However, some rows of the 1st kind of data are very frequently used and rows of the 2nd kind are used less frequently. But I don't think this will be a problem, as I am not merging two rows into one, just adding them as separate rows to the column family. The 1st kind of data has wider rows and the 2nd kind of data has much narrower rows. The caching requirements may also be different, as they cater to two different features (but I think that is even advantageous, since resources are free to be utilized by whichever data is more frequently used). Is it recommended to merge these two column families into one?? Thoughts? -- Ertio
Re: Unavailable Exception
Start with grep -i down system.log on each machine On Fri, Feb 4, 2011 at 7:37 PM, David King dk...@ketralnis.com wrote: We're going to need *way* more information than this On 03 Feb 2011, at 20:03, ruslan usifov wrote: Hello Why i can get Unavalible Exception on live cluster (all nodes is up and never shutdown) PS: v 0.7.0 -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Pig not reading all cassandra data
Found the culprit. There is a new feature in Pig 0.8 that will try to reduce the number of splits used to speed up the whole job. Since the ColumnFamilyInputFormat lists the input size as zero, this feature eliminates all of the splits except for one. The workaround is to disable this feature for jobs that use CassandraStorage by setting -Dpig.splitCombination=false in the pig_cassandra script. Hope somebody finds this useful, you wouldn't believe how many dead-ends I ran down trying to figure this out. -Matt On Feb 2, 2011, at 4:34 PM, Matthew E. Kennedy wrote: I noticed in the jobtracker log that when the pig job kicks off, I get the following info message: 2011-02-02 09:13:07,269 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201101241634_0193 = 0. Number of splits = 1 So I looked at the job.split file that is created for the Pig job and compared it to the job.split file created for the map-reduce job. The map reduce file contains an entry for each split, whereas the job.split file for the Pig job contains just the one split. I added some code to the ColumnFamilyInputFormat to output what it thinks it sees as it should be creating input splits for the pig jobs, and the call to getSplits() appears to be returning the correct list of splits. I can't figure out where it goes wrong though when the splits should be written to the job.split file. Does anybody know the specific class responsible for creating that file in a Pig job, and why it might be affected by using the pig CassandraStorage module? Is anyone else successfully running Pig jobs against a 0.7 cluster? Thanks, Matt
Re: Pig not reading all cassandra data
On Fri, Feb 4, 2011 at 9:47 PM, Matt Kennedy stinkym...@gmail.com wrote: Found the culprit. There is a new feature in Pig 0.8 that will try to reduce the number of splits used to speed up the whole job. Since the ColumnFamilyInputFormat lists the input size as zero, this feature eliminates all of the splits except for one. The workaround is to disable this feature for jobs that use CassandraStorage by setting -Dpig.splitCombination=false in the pig_cassandra script. Hope somebody finds this useful, you wouldn't believe how many dead-ends I ran down trying to figure this out. Ouch, thanks for tracking that down. What should CFIF be returning differently? Do you mean the InputSplit.getLength? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Merging the rows of two column families(with similar attributes) into one ??
Thanks Tyler! I could not fully understand why a larger number of column families would mean more memory. If you have parameters like memtable_throughput and memtable_operations under control, which are set on a per-column-family basis, then you can directly adjust memory use by splitting the memory budget between the two CFs in proportion to what you would give a single CF. Hence there should be no extra memory consumption for multiple CFs that have been split from a single one? Regarding compactions, I think that even if there are more of them, the SSTables being compacted are smaller, as the data has been split in two. So more compactions, but smaller ones too! Then, given the same amount of data, how can a greater number of column families be a bad option (if you split the memory-related parameter values proportionately)? -- Regards, Ertio On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs ty...@datastax.com wrote: I read somewhere that more no of column families is not a good idea as it consumes more memory and more compactions to occur This is primarily true, but not in every case. But the caching requirements may be different as they cater to two different features. This is a great reason to *not* merge them. Besides the key and row caches, don't forget about the OS buffer cache. Is it recommended to merge these two column families into one ?? Thoughts ? No, this sounds like an anti-pattern to me. The overhead from having two separate CFs is not that high. -- Tyler Hobbs Software Engineer, DataStax Maintainer of the pycassa Cassandra Python client library
Re: Merging the rows of two column families(with similar attributes) into one ??
Yes, a disadvantage of a larger number of CFs in terms of memory utilization that I see is: if some CF is written to less often than the other CFs, then its memtable consumes space in memory until it is flushed; that memory could have been put to much better use by a CF that is heavily written and read. And if you try to make the flush thresholds smaller, then more compactions are needed. On Sat, Feb 5, 2011 at 11:58 AM, Ertio Lew ertio...@gmail.com wrote: Thanks Tyler! I could not fully understand why a larger number of column families would mean more memory. If you have parameters like memtable_throughput and memtable_operations under control, which are set on a per-column-family basis, then you can directly adjust memory use by splitting the memory budget between the two CFs in proportion to what you would give a single CF. Hence there should be no extra memory consumption for multiple CFs that have been split from a single one? Regarding compactions, I think that even if there are more of them, the SSTables being compacted are smaller, as the data has been split in two. So more compactions, but smaller ones too! Then, given the same amount of data, how can a greater number of column families be a bad option (if you split the memory-related parameter values proportionately)? -- Regards, Ertio On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs ty...@datastax.com wrote: I read somewhere that more no of column families is not a good idea as it consumes more memory and more compactions to occur This is primarily true, but not in every case. But the caching requirements may be different as they cater to two different features. This is a great reason to *not* merge them. Besides the key and row caches, don't forget about the OS buffer cache. Is it recommended to merge these two column families into one ?? Thoughts ? No, this sounds like an anti-pattern to me. The overhead from having two separate CFs is not that high. -- Tyler Hobbs Software Engineer, DataStax Maintainer of the pycassa Cassandra Python client library
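For reference, the per-CF memtable thresholds being discussed here can be adjusted without merging the column families. A sketch assuming a pycassa version that ships pycassa.system_manager and a 0.7-era schema; the keyspace, CF names, attribute values, and the exact keyword arguments are illustrative assumptions, so check your client and server versions before relying on them.

from pycassa.system_manager import SystemManager

sys_mgr = SystemManager('127.0.0.1:9160')

# Split one CF's memtable budget across the two CFs instead of merging them
# (the numbers are made up; pick values that match your write distribution).
sys_mgr.alter_column_family('MyKeyspace', 'HotCF',
                            memtable_throughput_in_mb=96,
                            memtable_operations_in_millions=0.5)
sys_mgr.alter_column_family('MyKeyspace', 'ColdCF',
                            memtable_throughput_in_mb=32,
                            memtable_operations_in_millions=0.1)
sys_mgr.close()

The same attributes can also be changed from cassandra-cli; either way, the total memtable budget stays the same while the rarely written CF holds less memory between flushes.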