Re: CQL and schema-less column family
Hi, sorry for re-posting, but it would be very helpful to get some input on my previous post, so I'd know which direction to take. So if any of the more experienced users here can help, it would be greatly appreciated.

Thank you
Osi

-- Forwarded message --
From: osishkin osishkin
Date: Wed, Sep 7, 2011 at 2:02 PM
Subject: Re: CQL and schema-less column family
To: user@cassandra.apache.org, eev...@acunu.com

Thank you very much Eric for your response. Some follow-up questions come to mind:

1. What will be the performance hit for querying a column name not predefined in a schema? If it's not indexed, then I guess Cassandra will have to iterate over all rows, which will impose huge overhead.

2. Assuming my guess from the previous question is correct, then in order to get decent performance I need to index a column. Can you tell me if indexing a column name (not predefined in a schema) has any performance impact?

I'm not yet sure whether CQL/secondary indexes are the right direction for me, as opposed to manually-maintained indexes. My application also requires range predicates on columns with potentially high numbers of unique values. From what I gather, both (range predicates, high-cardinality values) are very inefficient with CQL/secondary indexes. But I'd like to get the whole picture before deciding.

In my system each row may contain a lot of columns, common to only part of the rows. If I understand correctly from the documentation, every index is actually implemented as a new "hidden" column family. This means that in my case, if I use a secondary index for every column name, I can quickly get a LOT of column families just to hold the secondary indexes for all my rows. My intuition says updating dozens of column families on every insert would probably be very bad performance-wise, in comparison with manually updating a single "global" column family index of my own (with multiple inserts). Is this true?

Thank you

p.s. 
Since I don't know whether a secondary index for a column already exists, this means I have to check if such an index already exists every time, and create it if not. Things seem to get even worse from my point of view...:) On Wed, Sep 7, 2011 at 12:34 PM, Eric Evans wrote: > On Tue, Sep 6, 2011 at 12:22 PM, osishkin osishkin wrote: >> Sorry for the newbie question but I failed to find a clear answer. >> Can CQL be used to query a schema-less column family? can they be indexed? >> That is, query for column names that do not necessarily exist in all >> rows, and were not defined in advance when the column family was >> created. > > Absolutely, yes. > > If you don't create schema for columns, then their type will simply be > the default for that column family. > > -- > Eric Evans > Acunu | http://www.acunu.com | @acunu >
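The trade-off being weighed here — one application-maintained "global" index column family versus many hidden per-column index CFs — can be illustrated with a toy in-memory model of the layout. This is plain Java standing in for the two column families, not Cassandra client code; the class and method names are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ManualIndex {
    // Data CF: row key -> (column name -> value)
    private final Map<String, Map<String, String>> data = new HashMap<>();
    // Single "global" index CF: column name -> sorted set of row keys that
    // contain it. The application updates it alongside every data insert,
    // instead of Cassandra maintaining one hidden CF per indexed column.
    private final Map<String, Set<String>> index = new HashMap<>();

    public void insert(String rowKey, String columnName, String value) {
        data.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
        index.computeIfAbsent(columnName, k -> new TreeSet<>()).add(rowKey);
    }

    // "Which rows have this column?" -- one lookup in the global index.
    public Set<String> rowsWithColumn(String columnName) {
        return index.getOrDefault(columnName, Set.of());
    }
}
```

The point of the sketch is that an insert touches exactly two structures (data plus one index), regardless of how many distinct column names exist, whereas one secondary index per column name multiplies the structures touched.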
Re: Mutations during selects
I am thinking about a scenario that goes like this: a node is reading a secondary index to reply to a select query. While it is in the middle of this, two rows are mutated: one that has already been read and considered for the select result, and one that is yet to be processed. Say both rows were changed in a way that causes them to be included in the result. The result, however, will contain only the second one, and will not represent a correct select result either before or after said mutation. This is extreme, I know, but given a cluster with enough activity, I don't believe it's impossible. So I guess the answer is: Cassandra doesn't care, the result is not guaranteed to represent a valid snapshot of the database, but it is very likely to?

Alexander

> Consider this scenario in a SQL database:
>
> UPDATE foo SET x = 1 WHERE key = 'asdf';
>
> Now, "simultaneously," two clients run
>
> UPDATE foo SET x = 2 WHERE key = 'asdf';
> and
> SELECT * FROM foo WHERE x = 1;
>
> Either you get back row asdf, or you don't. Either is valid. Same
> thing happens with Cassandra indexes.
>
> On Fri, Sep 9, 2011 at 10:41 AM, wrote:
>> I see that Cassandra updates secondary indices as soon as a value of the
>> indexed column is updated. This can happen, for example, during a select
>> query with a condition on a secondary index. Does Cassandra perform no
>> checking or locking? Will the result of this select, with old and new
>> values, be returned as is? Am I missing some reason why this isn't a
>> problem?
>>
>> Alexander
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
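The race described above can be reproduced with a small self-contained sketch. This is a toy in-memory model, not Cassandra code; the row names and the "x == 1" predicate are illustrative:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexScanRace {
    // Toy model: scan rows in order, collecting those where x == 1.
    // A "mutation" sets BOTH rows to x = 1 mid-scan: after row "a" has
    // already been examined, but before row "b" is reached.
    public static List<String> scanWithMidScanMutation() {
        Map<String, Integer> rows = new LinkedHashMap<>();
        rows.put("a", 0);
        rows.put("b", 0);

        List<String> keys = new ArrayList<>(rows.keySet());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            if (i == 1) {          // mutation lands mid-scan
                rows.put("a", 1);  // too late: "a" was already examined
                rows.put("b", 1);
            }
            if (rows.get(keys.get(i)) == 1)
                result.add(keys.get(i));
        }
        // [b]: matches neither the before-snapshot ([]) nor the
        // after-snapshot ([a, b]) -- exactly the anomaly described above.
        return result;
    }
}
```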
JNA fails to load ?
I have jna-3.2.7 in the classpath, but at the start of the server messages, I see:

 INFO 17:48:12,321 Unable to link C library. Native methods will be disabled.
java.lang.UnsatisfiedLinkError: Error looking up function '$$YJP$$mlockall': java: undefined symbol: $$YJP$$mlockall
	at com.sun.jna.Function.<init>(Function.java:179)
	at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:344)
	at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:324)
	at com.sun.jna.Native.register(Native.java:1341)
	at com.sun.jna.Native.register(Native.java:1018)
	at org.apache.cassandra.utils.CLibrary.<clinit>(CLibrary.java:57)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:118)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:330)

mlockall() seems to be just in libc.so, and I do have libc.so.6 in /lib/
Key Range
Hi,

Using 0.7.4, BOP, and RF=3, with writes at QUORUM and reads at ONE, and NO UPDATES: how does get_range_slices using a KeyRange work? Is it possible to get an out-of-date list of keys? If a key is inserted into 2 nodes, and after that a read with a key range is issued, is it possible for the 3rd node (which does not have this key yet) to return the range without that key?

Regards,
-- Alaa
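For reference, the usual overlap rule answers this: a read is guaranteed to see the latest acknowledged write only when R + W > RF. With RF=3, writes at QUORUM (2 acks) and reads at ONE (1 replica), 1 + 2 = 3 is not greater than 3, so a stale read is possible. A tiny sketch of that arithmetic (an illustrative helper, not a Cassandra API):

```java
public class ConsistencyCheck {
    // A read overlaps every acknowledged write only if the replicas
    // consulted on read plus the replicas that acked the write exceed
    // the replication factor.
    public static boolean readSeesLatestWrite(int rf, int writeAcks, int readReplicas) {
        return readReplicas + writeAcks > rf;
    }
}
```

With the settings above, `readSeesLatestWrite(3, 2, 1)` is false, which is why the third replica can briefly return the range without the new key until read repair or hinted handoff catches it up.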
Re: mysterious data disappearance - what happened?
> cluster name for both machines. So in other words, if we want to launch two
> separate instances of cassandra and keep them separate, we must make sure
> each uses a different cluster name or else they will gang up into the same
> cluster? But how do they even discover each other? Can someone enlighten
> me please? Thanks.

It is highly recommended to use distinct cluster names, in particular because it helps avoid accidentally "merging" two independent clusters.

As for how it happened: there is no magic discovery going on that would pick IPs at random, but one could certainly, e.g., accidentally configure the seed node on one to point to the other, or some such.

(1) Does nodetool -h localhost ring show an unexpected node?

(2) I'd suggest checking system.log on each node for the first appearance (if any) of the "unexpected" IP address and correlating it (by time) with what happened on the other node (was it restarted at the time, for example, potentially with a bad conf?).

(3) Are these two single-instance Cassandras that have never participated in another cluster?

--
/ Peter Schuller (@scode on twitter)
Re: Is sstable2json broken in 0.8.4+?
Okay, so let me get this straight: the command should accept -f (-Data.db)

The column family in question uses a UTF8Type row key, a composite column name (UTF8Type, TimeUUID), and a UTF8Type column value. Do I need to specify the row key, validator and comparator types as well? I'm not sure how non-UTF8 data got into a column meant for UTF8 data.

ColumnFamily: ServerIdentityProfiles
  Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
  Columns sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.TimeUUIDType)
  Row cache size / save period in seconds: 1000.0/0
  Key cache size / save period in seconds: 1.0/14400
  Memtable thresholds: 0.290624997/1440/62 (millions of ops/minutes/MB)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []

On Fri, Sep 9, 2011 at 1:08 PM, Jonathan Ellis wrote:
> Sounds like you told Cassandra a key? column? was UTF8 but it had
> non-UTF8 data in it.
>
> On Fri, Sep 9, 2011 at 2:06 PM, Anthony Ikeda wrote:
> > I can't seem to export an sstable. The parameter flags don't work either
> > (using -k and -f).
> >
> > sstable2json
> > /Users/X/Database/cassandra_files/data/RegistryFoundation/ServerIdentityProfiles-g-3-Data.db
> > WARN 12:01:55,721 Invalid file '.DS_Store' in data directory
> > /Users/X/Database/cassandra_files/data/RegistryFoundation. 
> > { > > "6c6f63616c686f7374": [Exception in thread "main" > > org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes > > 4e694de2 > > at org.apache.cassandra.db.marshal.UTF8Type.getString(UTF8Type.java:59) > > at > > > org.apache.cassandra.tools.SSTableExport.serializeColumn(SSTableExport.java:130) > > at > > > org.apache.cassandra.tools.SSTableExport.serializeColumns(SSTableExport.java:105) > > at > > > org.apache.cassandra.tools.SSTableExport.serializeRow(SSTableExport.java:191) > > at > org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:313) > > at > org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:335) > > at > org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:348) > > at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:406) > > WARN 12:01:56,321 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,328 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,337 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,343 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,349 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,357 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,367 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,373 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,382 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. 
> > WARN 12:01:56,386 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,404 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,411 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,419 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,424 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,431 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,435 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,442 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,447 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,453 Invalid file '.DS_Store' in data directory > > /Users/X/Database/cassandra_files/data/RegistryFoundation. > > WARN 12:01:56,458 Invali
Re: Disable hector stats
I do it with a log4j properties file:

log4j.appender.null=org.apache.log4j.varia.NullAppender
log4j.category.me.prettyprint.cassandra.hector.TimingLogger=INFO, null
log4j.additivity.me.prettyprint.cassandra.hector.TimingLogger=false

On Sep 9, 2011, at 2:07 PM, Daning wrote:

> Hi,
>
> How to disable hector stats? We keep getting this in log
>
> (PeriodicalLog.java:221) INFO Thread-53040 2011-09-09 13:24:03,290 Statistics from Fri Sep 09 13:23:03 PDT 2011 to Fri Sep 09 13:24:03 PDT 2011
> (PeriodicalLog.java:221) INFO Thread-53040 2011-09-09 13:24:03,291 Tag  Avg(ms)  Min  Max  Std Dev  95th  Count
> (PeriodicalLog.java:221) INFO Thread-53040 2011-09-09 13:24:03,291 (
>
> We have tried to set log4j like this but that does not work,
>
> log4j.logger.com.ecyrd.speed4j.log.PeriodicalLog=ERROR
>
> Thanks,
>
> Daning
performance diagnosis questions
I'm trying to understand where my queries are spending their time. I'm trying yourkit, vmstat, iostat -x, plus trial and error by enabling/disabling some features.

My application basically creates a lot of entries for "user history", where each history is a row and is maintained to be less than 20 items long. The application continuously creates new history and queries the history set with some key. As the application goes on, it becomes slower. Somehow if I shut down Cassandra and restart it, it seems to go faster. I can't understand why this could be the case, with my limited understanding of Cassandra. Any possible reasons?

Also, I'm seeing a bunch of vmstat swap-ins and swap-outs, about 10% of the time. Just want to make sure: if a mmapped RandomAccessFile loads a block into memory, that would also show up in the "swap in" count, right?

So far I've not enabled the JVM heap lock yet. Will try that.

Thanks a lot
Yang
Disable hector stats
Hi,

How do I disable Hector stats? We keep getting this in the log:

(PeriodicalLog.java:221) INFO Thread-53040 2011-09-09 13:24:03,290 Statistics from Fri Sep 09 13:23:03 PDT 2011 to Fri Sep 09 13:24:03 PDT 2011
(PeriodicalLog.java:221) INFO Thread-53040 2011-09-09 13:24:03,291 Tag  Avg(ms)  Min  Max  Std Dev  95th  Count
(PeriodicalLog.java:221) INFO Thread-53040 2011-09-09 13:24:03,291 (

We have tried to set log4j like this, but that does not work:

log4j.logger.com.ecyrd.speed4j.log.PeriodicalLog=ERROR

Thanks,
Daning
Re: would expiring columns expire a row?
Yes. On Fri, Sep 9, 2011 at 2:49 PM, Yang wrote: > if all the columns in a row expired, would the row key be deleted ? > that way the key lookup could possibly be faster due to a smaller key space > > Thanks > Yang > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Is sstable2json broken in 0.8.4+?
Sounds like you told Cassandra a key? column? was UTF8 but it had non-UTF8 data in it. On Fri, Sep 9, 2011 at 2:06 PM, Anthony Ikeda wrote: > I can't seem to export an sstable. The parameter flags don't work either > (using -k and -f). > > sstable2json > /Users/X/Database/cassandra_files/data/RegistryFoundation/ServerIdentityProfiles-g-3-Data.db > WARN 12:01:55,721 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > { > "6c6f63616c686f7374": [Exception in thread "main" > org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes > 4e694de2 > at org.apache.cassandra.db.marshal.UTF8Type.getString(UTF8Type.java:59) > at > org.apache.cassandra.tools.SSTableExport.serializeColumn(SSTableExport.java:130) > at > org.apache.cassandra.tools.SSTableExport.serializeColumns(SSTableExport.java:105) > at > org.apache.cassandra.tools.SSTableExport.serializeRow(SSTableExport.java:191) > at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:313) > at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:335) > at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:348) > at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:406) > WARN 12:01:56,321 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,328 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,337 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,343 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,349 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,357 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. 
> WARN 12:01:56,367 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,373 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,382 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,386 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,404 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,411 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,419 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,424 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,431 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,435 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,442 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,447 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,453 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. > WARN 12:01:56,458 Invalid file '.DS_Store' in data directory > /Users/X/Database/cassandra_files/data/RegistryFoundation. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
would expiring columns expire a row?
If all the columns in a row expired, would the row key be deleted? That way the key lookup could possibly be faster due to a smaller key space.

Thanks
Yang
Re: possible feature request RP vs. OPP
Wouldn't that be ignoring the fact that it is just a "prefix" and there is still the unique key after that prefix ;) So yes, it may be just as clumpy as using OPP, but only within a node, which I don't really see as a big deal at that point. Or am I missing something? Though maybe the default impl would be 3 bytes so everyone would be happy.

The main point being that I think Cassandra could use OPP underneath, like HBase, and then expose an RP or OPP selection at column family creation time. That would be nice, so I didn't have to write the code myself (and so no one else has to write it themselves).

Any info on #1 and #2???

thanks,
Dean

On Fri, Sep 9, 2011 at 10:08 AM, Edward Capriolo wrote:
>
> On Fri, Sep 9, 2011 at 10:34 AM, Dean Hiller wrote:
>
>> I saw this quote in the pdf.
>>
>> "For large indexes with common terms this too much data! Queries with
>> 100k hits"
>>
>> 1. What would be considered large? In most of my experience, we have the
>> typical size of a RDBMS index but just have many many many more indexes, as
>> the size of the index is just dependent on our largest partition based on
>> how we partition the data.
>>
>> 2. Does solandra have a lucene api underlying implementation? Our
>> preference is to use lucene's api, and the underlying implementation could be
>> lucene, lucandra or solandra.
>>
>> 3. Why not just use an 8 bit or 16 bit key as the prefix instead of a sha,
>> and the rest of the key is unique, as the user would have to choose a unique
>> key to begin with? After all, the hash only has to be bigger than the max
>> number of nodes, and 2^16 is quite large.
>>
>> thanks,
>> Dean
>>
>> On Thu, Sep 8, 2011 at 4:10 PM, Edward Capriolo wrote:
>>>
>>> On Thu, Sep 8, 2011 at 5:12 PM, Dean Hiller wrote:
>>> I was wondering something. 
Since I can take OPP and create a layer where, for certain column families, I hash the key, so that some column families are just like RP but on top of OPP, while some of my other column families sit on OPP directly (so I could use lucandra), why not make RP deprecated and instead allow users to choose OPP or RP per column family, where RP == doing the hash of the key on my behalf, prefixing my key with that hashcode, and stripping it back off when I read it in again? i.e. why have RP globally when you could do RP per column family, with the above reasoning, on top of OPP, and have the best of both worlds? I am thinking of having some column families random and some column families ordered, so I could range query or use lucandra on top of the ordered ones. Thoughts? I was just curious.

thanks,
Dean

>>> You can use ByteOrderPartitioner and hash data yourself. However, that
>>> makes every row key 128 bits larger, as the key has to be:
>>>
>>> md5 + originalkey
>>>
>>> http://www.datastax.com/wp-content/uploads/2011/07/Scaling_Solr_with_Cassandra-CassandraSF2011.pdf
>>>
>>> Solandra now uses a 'modified' RandomPartitioner.
>>>
>
> I am not quite sure that using 8 bits is good enough. It will shard your data
> across a small number of nodes effectively; however, I can imagine the
> SSTables will be "clumpy" because you reduce your sorting. It seems like a
> http://en.wikipedia.org/wiki/Birthday_problem to me. (I could be wrong)
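The layering being discussed — RandomPartitioner behavior built on top of an order-preserving partitioner — comes down to prefixing each key with its md5 digest, which is also why every row key grows by 16 bytes (128 bits), as Edward notes. A minimal sketch of the idea (the class and method names are illustrative; real code would also strip the prefix back off on reads):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PrefixedKey {
    // Build a key for an order-preserving partitioner that distributes like
    // RandomPartitioner: md5(originalKey) + originalKey. Sorting such keys
    // byte-wise orders them by digest, i.e. effectively at random.
    public static byte[] prefixWithMd5(byte[] originalKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(originalKey); // 16 bytes = 128 bits
            byte[] out = new byte[digest.length + originalKey.length];
            System.arraycopy(digest, 0, out, 0, digest.length);
            System.arraycopy(originalKey, 0, out, digest.length, originalKey.length);
            return out;
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is a mandatory JCE algorithm", e);
        }
    }
}
```

Shortening the prefix to 1 or 2 bytes, as question #3 proposes, shrinks the overhead but leaves long byte-wise-sorted runs sharing each prefix, which is the "clumpy" SSTable concern raised in the reply.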
Is sstable2json broken in 0.8.4+?
I can't seem to export an sstable. The parameter flags don't work either (using -k and -f). sstable2json /Users/X/Database/cassandra_files/data/RegistryFoundation/ServerIdentityProfiles-g-3-Data.db WARN 12:01:55,721 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. { "6c6f63616c686f7374": [Exception in thread "main" org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes 4e694de2 at org.apache.cassandra.db.marshal.UTF8Type.getString(UTF8Type.java:59) at org.apache.cassandra.tools.SSTableExport.serializeColumn(SSTableExport.java:130) at org.apache.cassandra.tools.SSTableExport.serializeColumns(SSTableExport.java:105) at org.apache.cassandra.tools.SSTableExport.serializeRow(SSTableExport.java:191) at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:313) at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:335) at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:348) at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:406) WARN 12:01:56,321 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,328 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,337 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,343 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,349 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,357 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,367 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. 
WARN 12:01:56,373 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,382 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,386 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,404 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,411 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,419 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,424 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,431 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,435 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,442 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,447 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,453 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation. WARN 12:01:56,458 Invalid file '.DS_Store' in data directory /Users/X/Database/cassandra_files/data/RegistryFoundation.
Re: [RELEASE] Apache Cassandra 0.8.5 released
Please don't forget that a lot of non-DataStax people contribute to Apache Cassandra! Thanks to everyone for their contributions. On Fri, Sep 9, 2011 at 1:29 PM, Evgeniy Ryabitskiy wrote: > > > 2011/9/9 Roshan Dawrani >> >> On Thu, Sep 8, 2011 at 8:21 PM, Stephen Connolly >> wrote: >>> >>> can take up to 12 hours for the sync to central >> >> Nearly 24 hours now, and 0.8.5 still not available at maven central >> - http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all :-( >> rgds, >> Roshan > > Hi, > > 0.8.5 is available at Maven central: > http://repo1.maven.org/maven2/org/apache/cassandra/cassandra-all/0.8.5/ > > Don't look at http://mvnrepository.com/ it's just some indexing site for > central repository, not a repo itself. > > Also RPM is available > http://rpm.datastax.com/EL/5/x86_64/ > > My thanks to datastax team! > > Evgeny. > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: [RELEASE] Apache Cassandra 0.8.5 released
2011/9/9 Roshan Dawrani > On Thu, Sep 8, 2011 at 8:21 PM, Stephen Connolly < > stephen.alan.conno...@gmail.com> wrote: > >> can take up to 12 hours for the sync to central >> > > Nearly 24 hours now, and 0.8.5 still not available at maven central - > http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all :-( > > rgds, > Roshan > Hi, 0.8.5 is available at Maven central: http://repo1.maven.org/maven2/org/apache/cassandra/cassandra-all/0.8.5/ Don't look at http://mvnrepository.com/ it's just some indexing site for central repository, not a repo itself. Also RPM is available http://rpm.datastax.com/EL/5/x86_64/ My thanks to datastax team! Evgeny.
Re: Replicate On Write behavior
They are evenly distributed: 5 nodes * 40 connections each using hector, and I can confirm that all 200 were active when this happened (from hector's perspective, from graphing the hector JMX data); all 5 nodes saw roughly 40 connections, and all were receiving traffic over those connections (netstat + ntop + trafshow, etc.).

I can also confirm that I changed my insert strategy to break up the rows using composite row keys, which reduced the row lengths and gave me an almost perfectly even data distribution among the nodes. That was when I started to really dig into why the ROWs (ReplicateOnWrites) were still backing up on one node specifically, and why 2 nodes weren't seeing any. It was the 20%, 20%, 60% ROW distribution that really got me thinking, and when I took the 60% node out of the cluster, that ROW load jumped back to the node with the next-lowest IP address, and the 2 nodes that weren't seeing any ROWs *still* weren't seeing any. At that point I tore down the cluster and recreated it as a 3 node cluster several times using various permutations of the 5 nodes available, and the ROW load was *always* on the node with the lowest IP address. The theory might not be right, but it certainly represents the behavior I saw.

On Sep 9, 2011, at 12:17 AM, Sylvain Lebresne wrote:

> We'll solve #2890 and we should have done it sooner.
>
> That being said, a quick question: how do you do your inserts from the
> clients ? Are you evenly
> distributing the inserts among the nodes ? Or are you always hitting
> the same coordinator ?
>
> Because provided the nodes are correctly distributed on the ring, if
> you distribute the inserts
> (increment) requests across the nodes (again I'm talking of client
> requests), you "should" not
> see the behavior you observe.
>
> --
> Sylvain
>
> On Thu, Sep 8, 2011 at 9:48 PM, David Hawthorne wrote:
>> It was exactly due to 2890, and the fact that the first replica is always
>> the one with the lowest value IP address. 
I patched cassandra to pick a
>> random node out of the replica set in StorageProxy.java findSuitableEndpoint:
>>
>> Random rng = new Random();
>> return endpoints.get(rng.nextInt(endpoints.size())); // instead of return endpoints.get(0);
>>
>> Now work load is evenly balanced among all 5 nodes and I'm getting 2.5x the
>> inserts/sec throughput.
>>
>> Here's the behavior I saw, and "disk work" refers to the ReplicateOnWrite
>> load of a counter insert:
>>
>> One node will get RF/n of the disk work. Two nodes will always get 0 disk work.
>>
>> In a 3 node cluster, 1 node gets disk hit really hard. You get the
>> performance of a one-node cluster.
>> In a 6 node cluster, 1 node gets hit with 50% of the disk work, giving you
>> the performance of a ~2 node cluster.
>> In a 10 node cluster, 1 node gets 30% of the disk work, giving you the
>> performance of a ~3 node cluster.
>>
>> I confirmed this behavior with a 3, 4, and 5 node cluster size.
>>
>>> On another note, on a 5-node cluster, I'm only seeing 3 nodes with
>>> ReplicateOnWrite Completed tasks in nodetool tpstats output. Is that
>>> normal? I'm using RandomPartitioner...
>>>
>>> Address    DC          Rack   Status  State   Load       Owns    Token
>>>                                                                  136112946768375385385349842972707284580
>>> 10.0.0.57  datacenter1 rack1  Up      Normal  2.26 GB    20.00%  0
>>> 10.0.0.56  datacenter1 rack1  Up      Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
>>> 10.0.0.55  datacenter1 rack1  Up      Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
>>> 10.0.0.54  datacenter1 rack1  Up      Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
>>> 10.0.0.72  datacenter1 rack1  Up      Normal  383.25 MB  20.00%  136112946768375385385349842972707284580
>>>
>>> The nodes with ReplicateOnWrites are the 3 in the middle. The first node
>>> and last node both have a count of 0. This is a clean cluster, and I've
>>> been doing 3k ... 2.5k (decaying performance) inserts/sec for the last
>>> 12 hours. The last time this test ran, it went all the way down to
>>> 500 inserts/sec before I killed it. 
>>>
>>> Could be due to https://issues.apache.org/jira//browse/CASSANDRA-2890.
>>>
>>> --
>>> Sylvain
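David's patch boils down to replacing the deterministic first-replica choice with a uniform random pick over the replica set. A minimal standalone sketch of that change (class and method names here are illustrative, not the actual StorageProxy code):

```java
import java.util.List;
import java.util.Random;

public class ReplicaChooser {
    // Instead of endpoints.get(0) -- which always lands on the same
    // (lowest-IP) replica and concentrates ReplicateOnWrite load there --
    // pick a replica uniformly at random so the read-before-write work of
    // counter inserts spreads across all replicas.
    public static String pickRandomEndpoint(List<String> endpoints, Random rng) {
        return endpoints.get(rng.nextInt(endpoints.size()));
    }
}
```

Over many requests every replica in the set gets chosen, which matches the reported jump from one-node to full-cluster ReplicateOnWrite throughput.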
Using Brisk with the latest Cassandra Build (0.8.5)
I just wanted to confirm that we will be able to install Brisk with Cassandra 0.8.5, as I am aware of some significant fixes and enhancements in the latest build of Cassandra.

Anthony
Re: Mutations during selects
Consider this scenario in a SQL database: UPDATE foo SET x = 1 WHERE key = 'asdf'; Now, "simultaneously," two clients run UPDATE foo SET x = 2 WHERE key = 'asdf'; and SELECT * FROM foo WHERE x = 1; Either you get back row asdf, or you don't. Either is valid. Same thing happens with Cassandra indexes. On Fri, Sep 9, 2011 at 10:41 AM, wrote: > I see that Cassandra updates secondary indices as soon as a value of the > indexed column is updated. This can happen, for example, during a select > query with a condition on a secondary index. Does Cassandra perform no > checking or locking? Will the result of this select, with old and new > values, be returned as is? Am I missing some reason why this isn't a > problem? > > Alexander > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Question on using consistency level with NetworkTopologyStrategy
Oh yes, that is cool. I see from the code now (was reading it incorrectly). So a Quorum with NTS would give me 3 copies across the cluster, not necessarily 2 local and 1 remote, but for most parts that would be true since WAN adds to latency. Thanks On Thu, Sep 8, 2011 at 3:40 PM, Jonathan Ellis wrote: > CL.QUORUM is supported with any replication strategy, not just simple. > > Also, Cassandra's optimizing of cross-DC writes only requires that it > know (via a correctly configured Snitch) where each node is located. > It is not affected by replication strategy choice. > > On Thu, Sep 8, 2011 at 3:14 PM, Anand Somani wrote: > > Hi, > > > > Have a requirement, where data is spread across multiple DC for disaster > > recovery. So I would use the NTS, that is clear, but I have some > questions > > with this scenario > > > > I have 2 Data Centers > > RF - 2 (active DC) , 2 (passive DC) > > with NTS - Consistency level options are - LOCAL_QUORUM and EACH _QUORUM > > I want LOCAL_QUORUM and 1 remote copy (not 2) for write to succeed, if I > > used EACH_QUORUM - it would mean that I need both the remote nodes up (as > I > > understand from http://www.datastax.com/docs/0.8/operations/datacenter). > > > > So if that is my requirement what consistency Level should I be using for > my > > writes? Is that even possible with NTS or another strategy? I could use > the > > SimpleStrategy with Quorum, but that would mean sending 2 copies (instead > of > > 1 per DC that NTS uses to optimize on WAN traffic) to remote DC (since it > > does not understand DC's)? > > > > Thanks > > Anand > > > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >
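To make the replica arithmetic in this thread concrete, a small back-of-the-envelope sketch (Python; the formula floor(RF/2)+1 is the standard quorum definition, and the DC names are illustrative):

```python
def quorum(rf):
    # standard quorum definition: a strict majority of rf replicas
    return rf // 2 + 1

rf_per_dc = {'active': 2, 'passive': 2}
total_rf = sum(rf_per_dc.values())   # 4 replicas cluster-wide

# Plain QUORUM: 3 of the 4 replicas, in any mix of DCs -- usually
# 2 local + 1 remote in practice, since the WAN adds latency.
assert quorum(total_rf) == 3

# EACH_QUORUM: a quorum in *every* DC; with RF=2 per DC that is 2 of 2,
# i.e. both remote replicas must be up -- the constraint Anand wants to avoid.
assert all(quorum(rf) == 2 for rf in rf_per_dc.values())
```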
Cassandra as in-memory cache
Hi, Can we configure some column families (or keyspaces) in Cassandra to perform as a pure in-memory cache? The feature should let the memtables always stay in memory (never flushed to disk as SSTables). The memtable flush thresholds of time/memory/operations can be set to a max value to achieve this. However, it seems uneven distribution of the keys across the nodes in the cluster could lead to a Java out-of-memory error. In order to prevent this error, can we overflow some entries to the disk? Thanks, Kapil
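A rough sketch of what pushing the flush thresholds to their maximums might look like in the 0.8-era cassandra-cli. The attribute names and units (memtable_flush_after in minutes, memtable_throughput in MB, memtable_operations in millions) are recalled from memory, not verified — check `help create column family;` on your version before relying on them:

```
create column family cache_cf with
  memtable_flush_after = 10080 and
  memtable_throughput = 1024 and
  memtable_operations = 10;
```

Note this only delays flushes; it does not prevent the out-of-memory risk described above if one node ends up holding far more keys than its heap can fit.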
Re: possible feature request RP vs. OPP
On Fri, Sep 9, 2011 at 10:34 AM, Dean Hiller wrote: > I saw this quote in the pdf. > > "For large indexes with common terms this too much data! Queries with > > 100k hits" > > 1. What would be considered large? In most of my experience, we have the > typical size of a RDBMS index but just have many many many more indexes as > the size of the index is just dependent on our largest partition based on > how we partition the data. > > 2. Does solandra have a lucene api underlying implementation? Our > preference is to use lucene's api and the underlying implementation could be > lucene, lucandra or solandra. > > 3. Why not just use a 8 bit or 16 bit key as the prefix instead of an sha > and the rest of the key is unique as the user would have to choose a unique > key to begin with? After all, the hash only had to be bigger than the max > number of nodes and 2^16 is quite large. > > thanks, > Dean > > > On Thu, Sep 8, 2011 at 4:10 PM, Edward Capriolo wrote: > >> >> >> On Thu, Sep 8, 2011 at 5:12 PM, Dean Hiller wrote: >> >>> I was wondering something. Since I can take OPP and I can create a layer >>> that for certain column families, I hash the key so that some column >>> families are just like RP but on top of OPP and some of my other column >>> families are then on OPP directly so I could use lucandra, why not make RP >>> deprecated and instead allow users to create OPP by column family or RP >>> where RP == doing the hash of the key on my behalf and prefixing my key with >>> that hashcode and stripping it back off when I read it in again. >>> >>> ie. why have RP when you could do RP per column family with the above >>> reasoning on top of OPP and have the best of both worlds? >>> >>> ie. I think of having some column families random and then some column >>> famiiles ordered so I could range query or use lucandra on top of those >>> ones. >>> >>> thoughts? I was just curious. >>> thanks, >>> Dean >>> >>> >> You can use ByteOrderPartitioner and hash data yourself. 
However, that >> makes every row key 128 bits larger, as the key has to be: >> >> md5+originalkey >> >> >> http://www.datastax.com/wp-content/uploads/2011/07/Scaling_Solr_with_Cassandra-CassandraSF2011.pdf >> >> Solandra now uses a 'modified' RandomPartitioner. >> > > I am not quite sure that using 8 bits is good enough. It will shard your data across a small number of nodes effectively; however, I can imagine the SSTables will be "clumpy" because you reduce your sorting. It seems like a http://en.wikipedia.org/wiki/Birthday_problem to me. (I could be wrong)
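Edward's md5+originalkey layout can be sketched in a few lines (Python; the function name is ours, not from any client library):

```python
import hashlib

def rp_style_key(original_key: bytes) -> bytes:
    # A 16-byte (128-bit) md5 prefix gives RandomPartitioner-like
    # distribution under ByteOrderPartitioner, at the cost of every
    # key growing by 16 bytes -- the overhead Edward mentions.
    return hashlib.md5(original_key).digest() + original_key

k = rp_style_key(b'user:42')
assert len(k) == 16 + len(b'user:42')
assert k.endswith(b'user:42')   # original key recoverable by stripping the prefix
```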
Re: [RELEASE] Apache Cassandra 0.8.5 released
On 9 September 2011 16:48, Stephen Connolly wrote: > On 9 September 2011 16:18, Sylvain Lebresne wrote: >> On Fri, Sep 9, 2011 at 4:52 PM, Stephen Connolly >> wrote: >>> is the staging repo released at repository.apache.org? or did somebody >>> forget to finish that step? >> >> Nobody forgot that step as can be seen in: >> https://repository.apache.org/content/repositories/releases/org/apache/cassandra/apache-cassandra/ >> > > Hard to tell from a phone over a shoe-string of a network connection. > Yep looks fine from the apache side... I'll give a peek at the other > sides I'm seeing it on central now, so all should be good > >> -- >> Sylvain >> >> >>> >>> - Stephen >>> >>> --- >>> Sent from my Android phone, so random spelling mistakes, random nonsense >>> words and other nonsense are a direct result of using swype to type on the >>> screen >>> >>> On 9 Sep 2011 04:48, "Roshan Dawrani" wrote: On Thu, Sep 8, 2011 at 8:21 PM, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > can take up to 12 hours for the sync to central > Nearly 24 hours now, and 0.8.5 still not available at maven central - http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all :-( rgds, Roshan >>> >> >
Re: [RELEASE] Apache Cassandra 0.8.5 released
On 9 September 2011 16:18, Sylvain Lebresne wrote: > On Fri, Sep 9, 2011 at 4:52 PM, Stephen Connolly > wrote: >> is the staging repo released at repository.apache.org? or did somebody >> forget to finish that step? > > Nobody forgot that step as can be seen in: > https://repository.apache.org/content/repositories/releases/org/apache/cassandra/apache-cassandra/ > Hard to tell from a phone over a shoe-string of a network connection. Yep looks fine from the apache side... I'll give a peek at the other sides > -- > Sylvain > > >> >> - Stephen >> >> --- >> Sent from my Android phone, so random spelling mistakes, random nonsense >> words and other nonsense are a direct result of using swype to type on the >> screen >> >> On 9 Sep 2011 04:48, "Roshan Dawrani" wrote: >>> On Thu, Sep 8, 2011 at 8:21 PM, Stephen Connolly < >>> stephen.alan.conno...@gmail.com> wrote: >>> can take up to 12 hours for the sync to central >>> >>> Nearly 24 hours now, and 0.8.5 still not available at maven central - >>> http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all :-( >>> >>> rgds, >>> Roshan >> >
Mutations during selects
I see that Cassandra updates secondary indices as soon as a value of the indexed column is updated. This can happen, for example, during a select query with a condition on a secondary index. Does Cassandra perform no checking or locking? Will the result of this select, with old and new values, be returned as is? Am I missing some reason why this isn't a problem? Alexander
Re: [RELEASE] Apache Cassandra 0.8.5 released
On Fri, Sep 9, 2011 at 4:52 PM, Stephen Connolly wrote: > is the staging repo released at repository.apache.org? or did somebody > forget to finish that step? Nobody forgot that step as can be seen in: https://repository.apache.org/content/repositories/releases/org/apache/cassandra/apache-cassandra/ -- Sylvain > > - Stephen > > --- > Sent from my Android phone, so random spelling mistakes, random nonsense > words and other nonsense are a direct result of using swype to type on the > screen > > On 9 Sep 2011 04:48, "Roshan Dawrani" wrote: >> On Thu, Sep 8, 2011 at 8:21 PM, Stephen Connolly < >> stephen.alan.conno...@gmail.com> wrote: >> >>> can take up to 12 hours for the sync to central >>> >> >> Nearly 24 hours now, and 0.8.5 still not available at maven central - >> http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all :-( >> >> rgds, >> Roshan >
Re: [RELEASE] Apache Cassandra 0.8.5 released
is the staging repo released at repository.apache.org? or did somebody forget to finish that step? - Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen On 9 Sep 2011 04:48, "Roshan Dawrani" wrote: > On Thu, Sep 8, 2011 at 8:21 PM, Stephen Connolly < > stephen.alan.conno...@gmail.com> wrote: > >> can take up to 12 hours for the sync to central >> > > Nearly 24 hours now, and 0.8.5 still not available at maven central - > http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all :-( > > rgds, > Roshan
Re: possible feature request RP vs. OPP
I saw this quote in the pdf. "For large indexes with common terms this too much data! Queries with > 100k hits" 1. What would be considered large? In most of my experience, we have the typical size of an RDBMS index but just have many many many more indexes, as the size of the index is just dependent on our largest partition based on how we partition the data. 2. Does solandra have a lucene api underlying implementation? Our preference is to use lucene's api, and the underlying implementation could be lucene, lucandra or solandra. 3. Why not just use an 8-bit or 16-bit key as the prefix instead of a SHA? The rest of the key is unique, as the user would have to choose a unique key to begin with. After all, the hash only has to be bigger than the max number of nodes, and 2^16 is quite large. thanks, Dean On Thu, Sep 8, 2011 at 4:10 PM, Edward Capriolo wrote: > > > On Thu, Sep 8, 2011 at 5:12 PM, Dean Hiller wrote: > >> I was wondering something. Since I can take OPP and I can create a layer >> that for certain column families, I hash the key so that some column >> families are just like RP but on top of OPP and some of my other column >> families are then on OPP directly so I could use lucandra, why not make RP >> deprecated and instead allow users to create OPP by column family or RP >> where RP == doing the hash of the key on my behalf and prefixing my key with >> that hashcode and stripping it back off when I read it in again. >> >> ie. why have RP when you could do RP per column family with the above >> reasoning on top of OPP and have the best of both worlds? >> >> ie. I think of having some column families random and then some column >> famiiles ordered so I could range query or use lucandra on top of those >> ones. >> >> thoughts? I was just curious. >> thanks, >> Dean >> >> > You can use ByteOrderPartitioner and hash data yourself. However, that makes > every row key 128 bits larger, as the key has to be: > > md5+originalkey > > > http://www.datastax.com/wp-content/uploads/2011/07/Scaling_Solr_with_Cassandra-CassandraSF2011.pdf > > Solandra now uses a 'modified' RandomPartitioner. >
Re: possible feature request RP vs. OPP
Actually, we only need an 8-bit key (one whole byte), because an 8-bit key would be useful for up to 2^8 nodes, which I am pretty sure we would never get to. Of course, I guess we could use one more whole byte just in case ;) We are planning on doing this ourselves; it would just be nice if it was hidden from us instead, and I would think it would simplify the Cassandra code as well by only dealing with one partitioner. Of course, let me catch up on the pdf you sent me as well. thanks, Dean On Thu, Sep 8, 2011 at 4:10 PM, Edward Capriolo wrote: > > > On Thu, Sep 8, 2011 at 5:12 PM, Dean Hiller wrote: > >> I was wondering something. Since I can take OPP and I can create a layer >> that for certain column families, I hash the key so that some column >> families are just like RP but on top of OPP and some of my other column >> families are then on OPP directly so I could use lucandra, why not make RP >> deprecated and instead allow users to create OPP by column family or RP >> where RP == doing the hash of the key on my behalf and prefixing my key with >> that hashcode and stripping it back off when I read it in again. >> >> ie. why have RP when you could do RP per column family with the above >> reasoning on top of OPP and have the best of both worlds? >> >> ie. I think of having some column families random and then some column >> famiiles ordered so I could range query or use lucandra on top of those >> ones. >> >> thoughts? I was just curious. >> thanks, >> Dean >> >> > You can use ByteOrderPartitioner and hash data yourself. However that makes > every row key will be 128bits larger as the key has to be: > > md5+originalkey > > > http://www.datastax.com/wp-content/uploads/2011/07/Scaling_Solr_with_Cassandra-CassandraSF2011.pdf > > Solandra now uses a 'modified' RandomPartitioner. >
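Dean's one-byte variant, sketched the same way (again illustrative Python, not an existing partitioner): a single hash byte caps the overhead at one byte per key and still spreads rows over up to 2^8 prefix ranges — at the cost, as Edward suggests, of far coarser sorting within each bucket.

```python
import hashlib

def one_byte_prefix_key(original_key: bytes) -> bytes:
    # Only the first md5 byte: at most 2^8 = 256 distinct prefixes,
    # so keys stay fully sorted within each of the 256 buckets.
    return hashlib.md5(original_key).digest()[:1] + original_key

assert len(one_byte_prefix_key(b'x')) == 1 + 1   # one byte of overhead

# With many keys, close to all 256 buckets get used:
buckets = {one_byte_prefix_key(b'row%d' % i)[0] for i in range(10000)}
assert len(buckets) >= 250
```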
[ANN] BigDataCamp Delhi, India, Sep 10, 2011
Registration here (few seats left) - http://www.cloudcamp.org/delhi Agenda: 9:30 am - Food, Drinks & Networking 10:00 am - Welcome, Thank yous & Introductions 10:15 am - Lightning Talks (5 minutes each) 10:45 am - Unpanel 11:45 am - Prepare for Unconference Breakout Sessions (solicit breakout topics, etc.). 12:00 - 12:15 Break 12:15 pm - Unconference - Round 1 1:00 pm Lunch 2:15pm - Unconference - Round 2 2:45pm - Unconference - Round 3 3:15pm - Unconference - Round 4 3:45pm - Wrap Up Proposed Topics: Introduction to Hadoop / Big Data Kundera (ORM for Cassandra, Hbase and MongoDB) Introduction to NOSQL BigData Analytics Crux Sponsors: IBM, Impetus, Nasscom Location: Impetus Infotech (India) Pvt. Ltd. D-39 & 40, Sector 59 Noida (Near New Delhi) Uttar Pradesh - 201307 Regards, Sanjay Sharma
Re: row key as long type
It is resolved. I used a Map/Reduce program, in which I converted the long row key to a byte array and stored it as a ByteBuffer. Now the row keys are in sorted order and I can apply range queries. E.g. $list ip[10001:10005]; Regards, Thamizhannal P --- On Fri, 9/9/11, Jonathan Ellis wrote: From: Jonathan Ellis Subject: Re: row key as long type To: user@cassandra.apache.org Date: Friday, 9 September, 2011, 1:46 AM Probably because you had some non-long data in it, then added the long type later. On Thu, Sep 8, 2011 at 2:51 PM, amulya rattan wrote: But I explicitly remember List throwing "long is exactly 8 bytes" when I invoked it on a column family with long as key. Why would that happen? On Thu, Sep 8, 2011 at 10:07 AM, Jonathan Ellis wrote: List should work fine on any schema, including long keys. On Thu, Sep 8, 2011 at 8:23 AM, amulya rattan wrote: Row key can certainly be of type long..you'd just have to set key_validation_class to be LongType. However, doing list on the column family would throw an error..please look at http://wiki.apache.org/cassandra/FAQ#a_long_is_exactly_8_bytes On Thu, Sep 8, 2011 at 8:14 AM, Thamizh wrote: Hi All, Is there a way to store a number (LongType) as the row key in Cassandra? I wanted to execute range queries based on the row key value. e.g $list info[12345:]; . It should list all the row keys which are >= 12345. Is there a way to accomplish this in Cassandra? Secondary indexes did not help me, so I am trying to store the column value 'ip' as the row key here.
data model:

create keyspace ipinfo
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options = [{replication_factor:1}];
use rng;
create column family info
  with comparator = AsciiType
  and key_validation_class = UTF8Type
  and column_metadata = [
    { column_name : domain, validation_class : UTF8Type, index_type : 0, index_name : domain_idx },
    { column_name : ip, validation_class : LongType, index_type : 0, index_name : ip_idx }];

Regards, Thamizhannal -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
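The fix Thamizhannal describes — encoding the long as a fixed-width big-endian byte array so that byte order matches numeric order — can be sketched like this in Python (struct's `>q` format is big-endian signed 64-bit; the ordering property shown holds for non-negative values):

```python
import struct

def long_key(n: int) -> bytes:
    # big-endian signed 64-bit: byte order == numeric order for n >= 0
    return struct.pack('>q', n)

keys = [long_key(n) for n in (10005, 10001, 10003)]
assert sorted(keys) == [long_key(n) for n in (10001, 10003, 10005)]

# A little-endian encoding would NOT sort numerically, so range
# queries like $list ip[10001:10005]; would return the wrong rows:
assert sorted(struct.pack('<q', n) for n in (256, 1)) \
       != [struct.pack('<q', n) for n in (1, 256)]
```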
Re: [RELEASE] Apache Cassandra 0.8.5 released
On Thu, Sep 8, 2011 at 8:21 PM, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > can take up to 12 hours for the sync to central > Nearly 24 hours now, and 0.8.5 still not available at maven central - http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all :-( rgds, Roshan
Re: what's the difference between repair CF separately and repair the entire node?
On Fri, Sep 9, 2011 at 4:18 AM, Yan Chunlu wrote: > I have 3 nodes and RF=3. I tried to repair every node in the cluster by > using "nodetool repair mykeyspace mycf" on every column family. It finished > within 3 hours; the data size is no more than 50GB. > After the repair, I tried using nodetool repair immediately to repair > the entire node, but 48 hours have passed and it is still going on. "compactionstats" > shows it is doing "SSTable rebuild". > So I am confused about why "nodetool repair" is so slow. How is it different > from repairing every CF? What version of Cassandra are you using? If you are using something < 0.8.2, then it may be because "nodetool repair" used to schedule its sub-tasks poorly, in ways that were counter-productive (fixed by CASSANDRA-2816). If you are using a more recent version, then it's an interesting report. > I didn't try to repair the system keyspace; does it also need repair? It doesn't. -- Sylvain
Re: Replicate On Write behavior
We'll solve #2890 and we should have done it sooner. That being said, a quick question: how do you do your inserts from the clients ? Are you evenly distributing the inserts among the nodes ? Or are you always hitting the same coordinator ? Because provided the nodes are correctly distributed on the ring, if you distribute the inserts (increment) requests across the nodes (again I'm talking of client requests), you "should" not see the behavior you observe. -- Sylvain On Thu, Sep 8, 2011 at 9:48 PM, David Hawthorne wrote: > It was exactly due to 2890, and the fact that the first replica is always the > one with the lowest value IP address. I patched cassandra to pick a random > node out of the replica set in StorageProxy.java findSuitableEndpoint: > > Random rng = new Random(); > > return endpoints.get(rng.nextInt(endpoints.size())); // instead of return > endpoints.get(0); > > Now work load is evenly balanced among all 5 nodes and I'm getting 2.5x the > inserts/sec throughput. > > Here's the behavior I saw, and "disk work" refers to the ReplicateOnWrite > load of a counter insert: > > One node will get RF/n of the disk work. Two nodes will always get 0 disk > work. > > in a 3 node cluster, 1 node gets disk hit really hard. You get the > performance of a one-node cluster. > in a 6 node cluster, 1 node gets hit with 50% of the disk work, giving you > the performance of ~2 node cluster. > in a 10 node cluster, 1 node gets 30% of the disk work, giving you the > performance of a ~3 node cluster. > > I confirmed this behavior with a 3, 4, and 5 node cluster size. > > >> >>> On another note, on a 5-node cluster, I'm only seeing 3 nodes with >>> ReplicateOnWrite Completed tasks in nodetool tpstats output. Is that >>> normal? I'm using RandomPartitioner... 
>>> >>> Address DC Rack Status State Load Owns >>> Token >>> >>> 136112946768375385385349842972707284580 >>> 10.0.0.57 datacenter1 rack1 Up Normal 2.26 GB 20.00% >>> 0 >>> 10.0.0.56 datacenter1 rack1 Up Normal 2.47 GB 20.00% >>> 34028236692093846346337460743176821145 >>> 10.0.0.55 datacenter1 rack1 Up Normal 2.52 GB 20.00% >>> 68056473384187692692674921486353642290 >>> 10.0.0.54 datacenter1 rack1 Up Normal 950.97 MB 20.00% >>> 102084710076281539039012382229530463435 >>> 10.0.0.72 datacenter1 rack1 Up Normal 383.25 MB 20.00% >>> 136112946768375385385349842972707284580 >>> >>> The nodes with ReplicateOnWrites are the 3 in the middle. The first node >>> and last node both have a count of 0. This is a clean cluster, and I've >>> been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 >>> hours. The last time this test ran, it went all the way down to 500 >>> inserts/sec before I killed it. >> >> Could be due to https://issues.apache.org/jira//browse/CASSANDRA-2890. >> >> -- >> Sylvain > >
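A toy simulation of the behavior in this thread (Python; the ring layout and leader-selection rule are simplified stand-ins for Cassandra internals, and the addresses are just the ones from the nodetool output above): when the counter "leader" is always a deterministic replica, two of five nodes never do ReplicateOnWrite work and one is hit hardest, while picking a random replica — the effect of the patch David describes — spreads the load evenly.

```python
import random
from collections import Counter

nodes = ['10.0.0.54', '10.0.0.55', '10.0.0.56', '10.0.0.57', '10.0.0.72']
rf = 3

def replicas_for(token):
    # stand-in for ring placement: rf consecutive nodes own each token
    i = token % len(nodes)
    return [nodes[(i + j) % len(nodes)] for j in range(rf)]

random.seed(0)
fixed, patched = Counter(), Counter()
for _ in range(10000):
    eps = replicas_for(random.randrange(10**6))
    fixed[min(eps)] += 1               # deterministic leader (lowest address)
    patched[random.choice(eps)] += 1   # patched: rng.nextInt(endpoints.size())

# Deterministic choice: two nodes never lead, one leads for 3/5 of tokens.
assert sum(1 for n in nodes if fixed[n] == 0) == 2
# Random choice: every node shares the ReplicateOnWrite load.
assert all(patched[n] > 0 for n in nodes)
```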