Re: Understanding Virtual Nodes on Cassandra 1.2
Are there tickets or documents that explain how data is replicated across virtual nodes? If there are multiple tokens on one physical host, is there a chance that two or more tokens chosen by the replication strategy end up on the same host? If a token is moved, removed, or added manually, does the Cassandra engine validate this case? Thanks.

On Jan 30, 2013, at 12:46 PM, Zhong Li wrote:

>> You add a physical node and that in turn adds num_tokens tokens to the ring.
>
> No, I am talking about virtual nodes with the order-preserving partitioner,
> for an existing host with a list of multiple tokens set in
> cassandra.initial_token. After the initial bootstrap, the host is not aware
> of changes to cassandra.initial_token. If I want to add a new token (virtual
> node), I have to rebuild the host with the new token list.
>
> My question is: is there a way to add a virtual node without rebuilding the host?
>
> Thanks,
>
> On Jan 30, 2013, at 10:21 AM, Manu Zhang wrote:
>
>> On Wed 30 Jan 2013 02:29:27 AM CST, Zhong Li wrote:
>>> One more question: can I add a virtual node manually without rebooting
>>> and rebuilding a host's data?
>>>
>>> I checked the nodetool command; there is no option to add a node.
>>>
>>> Thanks.
>>>
>>> Zhong
>>>
>>> On Jan 29, 2013, at 11:09 AM, Zhong Li wrote:
>>>
>>>> I misunderstood this:
>>>> http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 ,
>>>> especially
>>>> "If you want to get started with vnodes on a fresh cluster, however,
>>>> that is fairly straightforward. Just don't set the initial_token
>>>> parameter in your conf/cassandra.yaml and instead enable the
>>>> num_tokens parameter. A good default value for this is 256."
>>>>
>>>> Also, I couldn't find documentation about setting multiple tokens
>>>> for cassandra.initial_token.
>>>>
>>>> Anyway, I just tested it; setting a comma-separated list of tokens
>>>> does work.
>>>>
>>>> Thanks,
>>>>
>>>> Zhong
>>>>
>>>> On Jan 29, 2013, at 3:06 AM, aaron morton wrote:
>>>>
>>>>>> After I searched some documents on the DataStax website and some old
>>>>>> tickets, it seems that it works for the random partitioner only, and
>>>>>> leaves the order-preserving partitioner out of luck.
>>>>> Links?
>>>>>
>>>>>> or allow add Virtual Nodes manually?
>>>>> I have not looked into it, but there is a cassandra.initial_token
>>>>> startup param that takes a comma-separated list of tokens for the node.
>>>>>
>>>>> There also appears to be support for the ordered partitioners to
>>>>> generate random tokens.
>>>>>
>>>>> But you would still have the problem of having to balance your row
>>>>> keys around the token space.
>>>>>
>>>>> Cheers
>>>>> -
>>>>> Aaron Morton
>>>>> Freelance Cassandra Developer
>>>>> New Zealand
>>>>>
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On 29/01/2013, at 10:31 AM, Zhong Li <z...@voxeo.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Virtual Nodes is a great feature. After I searched some documents on
>>>>>> the DataStax website and some old tickets, it seems that it works for
>>>>>> the random partitioner only, and leaves the order-preserving
>>>>>> partitioner out of luck. I may misunderstand; please correct me. If
>>>>>> it doesn't support the order-preserving partitioner, would it be
>>>>>> possible to add support for multiple initial_token(s) for the
>>>>>> order-preserving partitioner, or to allow adding Virtual Nodes
>>>>>> manually?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Zhong
>>
>> You add a physical node and that in turn adds num_tokens tokens to the ring.
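On the replication question above: Cassandra's replication strategies pick replicas by walking the ring clockwise from the key's token and skipping tokens that belong to an endpoint already chosen, so one physical host is never selected twice for the same key, even when its vnodes happen to be adjacent on the ring. A minimal illustrative sketch of that ring walk (this is not Cassandra's actual code; the host names and token values are made up):

```python
from bisect import bisect_right

def replicas_for(key_token, ring, rf):
    """Walk the sorted (token, host) ring clockwise from key_token,
    collecting rf *distinct* hosts; extra tokens owned by an
    already-chosen host are skipped."""
    tokens = sorted(ring)                     # ring: list of (token, host)
    start = bisect_right([t for t, _ in tokens], key_token)
    chosen, hosts = [], set()
    for i in range(len(tokens)):
        token, host = tokens[(start + i) % len(tokens)]
        if host not in hosts:                 # skip vnodes on a host we already have
            hosts.add(host)
            chosen.append(host)
        if len(chosen) == rf:
            break
    return chosen

# Two vnodes of hostA sit next to each other on the ring;
# RF=2 still lands on two different physical hosts.
ring = [(10, "hostA"), (20, "hostA"), (30, "hostB"), (40, "hostC")]
print(replicas_for(15, ring, rf=2))   # -> ['hostA', 'hostB']
```

Manually moving or adding tokens changes `ring`, but because the walk deduplicates by host, it cannot place two replicas of one key on the same physical machine.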
Re: Understanding Virtual Nodes on Cassandra 1.2
> You add a physical node and that in turn adds num_tokens tokens to the ring.

No, I am talking about virtual nodes with the order-preserving partitioner, for an existing host with a list of multiple tokens set in cassandra.initial_token. After the initial bootstrap, the host is not aware of changes to cassandra.initial_token. If I want to add a new token (virtual node), I have to rebuild the host with the new token list.

My question is: is there a way to add a virtual node without rebuilding the host?

Thanks,

On Jan 30, 2013, at 10:21 AM, Manu Zhang wrote:

> On Wed 30 Jan 2013 02:29:27 AM CST, Zhong Li wrote:
>> One more question: can I add a virtual node manually without rebooting
>> and rebuilding a host's data?
>>
>> I checked the nodetool command; there is no option to add a node.
>>
>> Thanks.
>>
>> Zhong
>>
>> On Jan 29, 2013, at 11:09 AM, Zhong Li wrote:
>>
>>> I misunderstood this:
>>> http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 ,
>>> especially
>>> "If you want to get started with vnodes on a fresh cluster, however,
>>> that is fairly straightforward. Just don't set the initial_token
>>> parameter in your conf/cassandra.yaml and instead enable the num_tokens
>>> parameter. A good default value for this is 256."
>>>
>>> Also, I couldn't find documentation about setting multiple tokens
>>> for cassandra.initial_token.
>>>
>>> Anyway, I just tested it; setting a comma-separated list of tokens does work.
>>>
>>> Thanks,
>>>
>>> Zhong
>>>
>>> On Jan 29, 2013, at 3:06 AM, aaron morton wrote:
>>>
>>>>> After I searched some documents on the DataStax website and some old
>>>>> tickets, it seems that it works for the random partitioner only, and
>>>>> leaves the order-preserving partitioner out of luck.
>>>> Links?
>>>>
>>>>> or allow add Virtual Nodes manually?
>>>> I have not looked into it, but there is a cassandra.initial_token startup
>>>> param that takes a comma-separated list of tokens for the node.
>>>>
>>>> There also appears to be support for the ordered partitioners to
>>>> generate random tokens.
>>>>
>>>> But you would still have the problem of having to balance your row
>>>> keys around the token space.
>>>>
>>>> Cheers
>>>> -
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 29/01/2013, at 10:31 AM, Zhong Li <z...@voxeo.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Virtual Nodes is a great feature. After I searched some documents on
>>>>> the DataStax website and some old tickets, it seems that it works for
>>>>> the random partitioner only, and leaves the order-preserving
>>>>> partitioner out of luck. I may misunderstand; please correct me. If it
>>>>> doesn't support the order-preserving partitioner, would it be possible
>>>>> to add support for multiple initial_token(s) for the order-preserving
>>>>> partitioner, or to allow adding Virtual Nodes manually?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Zhong
>
> You add a physical node and that in turn adds num_tokens tokens to the ring.
Re: Understanding Virtual Nodes on Cassandra 1.2
One more question: can I add a virtual node manually without rebooting and rebuilding a host's data?

I checked the nodetool command; there is no option to add a node.

Thanks.

Zhong

On Jan 29, 2013, at 11:09 AM, Zhong Li wrote:

> I misunderstood this:
> http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 , especially
> "If you want to get started with vnodes on a fresh cluster, however, that is
> fairly straightforward. Just don't set the initial_token parameter in your
> conf/cassandra.yaml and instead enable the num_tokens parameter. A good
> default value for this is 256."
>
> Also, I couldn't find documentation about setting multiple tokens for
> cassandra.initial_token.
>
> Anyway, I just tested it; setting a comma-separated list of tokens does work.
>
> Thanks,
>
> Zhong
>
> On Jan 29, 2013, at 3:06 AM, aaron morton wrote:
>
>>> After I searched some documents on the DataStax website and some old
>>> tickets, it seems that it works for the random partitioner only, and
>>> leaves the order-preserving partitioner out of luck.
>> Links?
>>
>>> or allow add Virtual Nodes manually?
>> I have not looked into it, but there is a cassandra.initial_token startup
>> param that takes a comma-separated list of tokens for the node.
>>
>> There also appears to be support for the ordered partitioners to generate
>> random tokens.
>>
>> But you would still have the problem of having to balance your row keys
>> around the token space.
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 29/01/2013, at 10:31 AM, Zhong Li wrote:
>>
>>> Hi All,
>>>
>>> Virtual Nodes is a great feature. After I searched some documents on the
>>> DataStax website and some old tickets, it seems that it works for the
>>> random partitioner only, and leaves the order-preserving partitioner out
>>> of luck. I may misunderstand; please correct me. If it doesn't support
>>> the order-preserving partitioner, would it be possible to add support
>>> for multiple initial_token(s) for the order-preserving partitioner, or
>>> to allow adding Virtual Nodes manually?
>>>
>>> Thanks,
>>>
>>> Zhong
Re: Understanding Virtual Nodes on Cassandra 1.2
I misunderstood this: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 , especially
"If you want to get started with vnodes on a fresh cluster, however, that is fairly straightforward. Just don't set the initial_token parameter in your conf/cassandra.yaml and instead enable the num_tokens parameter. A good default value for this is 256."

Also, I couldn't find documentation about setting multiple tokens for cassandra.initial_token.

Anyway, I just tested it; setting a comma-separated list of tokens does work.

Thanks,

Zhong

On Jan 29, 2013, at 3:06 AM, aaron morton wrote:

>> After I searched some documents on the DataStax website and some old
>> tickets, it seems that it works for the random partitioner only, and
>> leaves the order-preserving partitioner out of luck.
> Links?
>
>> or allow add Virtual Nodes manually?
> I have not looked into it, but there is a cassandra.initial_token startup
> param that takes a comma-separated list of tokens for the node.
>
> There also appears to be support for the ordered partitioners to generate
> random tokens.
>
> But you would still have the problem of having to balance your row keys
> around the token space.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/01/2013, at 10:31 AM, Zhong Li wrote:
>
>> Hi All,
>>
>> Virtual Nodes is a great feature. After I searched some documents on the
>> DataStax website and some old tickets, it seems that it works for the
>> random partitioner only, and leaves the order-preserving partitioner out
>> of luck. I may misunderstand; please correct me. If it doesn't support
>> the order-preserving partitioner, would it be possible to add support for
>> multiple initial_token(s) for the order-preserving partitioner, or to
>> allow adding Virtual Nodes manually?
>>
>> Thanks,
>>
>> Zhong
Understanding Virtual Nodes on Cassandra 1.2
Hi All,

Virtual Nodes is a great feature. After I searched some documents on the DataStax website and some old tickets, it seems that it works for the random partitioner only, and leaves the order-preserving partitioner out of luck. I may misunderstand; please correct me. If it doesn't support the order-preserving partitioner, would it be possible to add support for multiple initial_token(s) for the order-preserving partitioner, or to allow adding Virtual Nodes manually?

Thanks,

Zhong
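The balancing concern raised later in this thread ("you would still have the problem of having to balance your row keys around the token space") can be made concrete: when you hand-pick a comma-separated token list per host, you typically want the tokens evenly spaced across the ring and each host's vnodes interleaved with the others'. A small illustrative sketch, assuming a numeric token space (the RandomPartitioner's 2**127 space; for an order-preserving partitioner the tokens are keys, so this numeric spacing is only an analogy):

```python
def spaced_tokens(num_hosts, vnodes_per_host, ring_size=2**127):
    """Evenly space num_hosts * vnodes_per_host tokens around a numeric
    token space, then deal them out round-robin so each host's vnodes
    are interleaved around the ring rather than clustered."""
    total = num_hosts * vnodes_per_host
    tokens = [ring_size * i // total for i in range(total)]
    per_host = {h: [] for h in range(num_hosts)}
    for i, tok in enumerate(tokens):
        per_host[i % num_hosts].append(tok)
    # each host's comma-separated initial_token list
    return {h: ",".join(str(t) for t in toks) for h, toks in per_host.items()}

# Tiny ring of size 800 for readability: 2 hosts, 4 vnodes each.
print(spaced_tokens(2, 4, ring_size=800))
```

Each value in the returned dict is in the comma-separated form that `cassandra.initial_token` accepts, per the thread above.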
Re: [RELEASE] Apache Cassandra 1.0.5 released
You may run "stress -d " to create the Standard1 CF and data.

On Dec 3, 2011, at 3:44 PM, wrote:

> Hi Zhong Li,
>
> When I used the stress tool to test, I got:
>
> Operation [15] retried 10 times - error inserting key 0015 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [37] retried 10 times - error inserting key 0037 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [17] retried 10 times - error inserting key 0017 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [40] retried 10 times - error inserting key 0040 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [28] retried 10 times - error inserting key 0028 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [2] retried 10 times - error inserting key 0002 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [13] retried 10 times - error inserting key 0013 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [29] retried 10 times - error inserting key 0029 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [30] retried 10 times - error inserting key 0030 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [23] retried 10 times - error inserting key 0023 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [21] retried 10 times - error inserting key 0021 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [42] retried 10 times - error inserting key 0042 ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [0] retried 10 times - error inserting key ((InvalidRequestException): unconfigured columnfamily Standard1)
> Operation [11] retried 10 times - error inserting key 0011 ((InvalidRequestException): unconfigured columnfamily Standard1)
> 0,0,0,NaN,0
> END
>
> Do I need to create the column family manually first, or is it created by
> the stress tool automatically? Why did I get the errors above?
>
> Thanks,
> Mike
>
> -Original Message-
> From: Zhong Li [mailto:z...@voxeo.com]
> Sent: Friday, December 02, 2011 3:24 PM
> To: user@cassandra.apache.org
> Subject: Re: [RELEASE] Apache Cassandra 1.0.5 released
>
> I just tested with the stress tool; it is reproducible, and the timeout always happens.
>
> ./stress -d -e QUORUM -l 3 -o RANGE_SLICE
> total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> 0,0,0,NaN,10
> 0,0,0,NaN,20
> 0,0,0,NaN,30
> 0,0,0,NaN,40
> 0,0,0,NaN,50
> 0,0,0,NaN,60
> 0,0,0,NaN,70
> 0,0,0,NaN,80
> 0,0,0,NaN,91
> 0,0,0,NaN,101
> Operation [2] retried 10 times - error on calling get_indexed_slices for range offset 002 ((TimedOutException))
> Operation [46] retried 10 times - error on calling get_indexed_slices for range offset 046 ((TimedOutException))
> Operation [47] retried 10 times - error on calling get_indexed_slices for range offset 047 ((TimedOutException))
> Operation [24] retried 10 times - error on calling get_indexed_slices for range offset 024 ((TimedOutException))
> Operation [45] retried 10 times - error on calling get_indexed_slices for range offset 045 ((TimedOutException))
> Operation [48] retried 10 times - error on calling get_indexed_slices for range offset 048 ((TimedOutException))
> Operation [5] retried 10 times - error on calling get_indexed_slices for range offset 005 ((TimedOutException))
> Operation [28] retried 10 times - error on calling get_indexed_slices for range offset 028 ((TimedOutException))
> Operation [0] retried 10 times - error on calling get_indexed_slices for range offset 000 ((TimedOutException))
> Operation [23] retried 10 times - error on calling get_indexed_slices for range offset 023 ((TimedOutException))
> Operation [32] retried 10 times - error on calling get_indexed_slices for range offset 032 ((TimedOutException))
> Operation [36] retried 10 times - error on calling get_indexed_slices for range offset 036 ((TimedOutException))
> Operation [16] retried 10 times - error on calling get_indexed_slices for range offset 016 ((TimedOutException))
> Operation [6] retried 10 times - error on calling get_indexed_slices for range offset 006 ((TimedOutException))
> Operation [9] retried 10 times - error on calling get_ind
Re: [RELEASE] Apache Cassandra 1.0.5 released
((TimedOutException))
Operation [22] retried 10 times - error on calling get_indexed_slices for range offset 022 ((TimedOutException))
Operation [49] retried 10 times - error on calling get_indexed_slices for range offset 049 ((TimedOutException))
Operation [41] retried 10 times - error on calling get_indexed_slices for range offset 041 ((TimedOutException))
Operation [37] retried 10 times - error on calling get_indexed_slices for range offset 037 ((TimedOutException))
Operation [13] retried 10 times - error on calling get_indexed_slices for range offset 013 ((TimedOutException))
Operation [39] retried 10 times - error on calling get_indexed_slices for range offset 039 ((TimedOutException))
Operation [21] retried 10 times - error on calling get_indexed_slices for range offset 021 ((TimedOutException))
Operation [10] retried 10 times - error on calling get_indexed_slices for range offset 010 ((TimedOutException))
Operation [17] retried 10 times - error on calling get_indexed_slices for range offset 017 ((TimedOutException))
Operation [1] retried 10 times - error on calling get_indexed_slices for range offset 001 ((TimedOutException))
Operation [42] retried 10 times - error on calling get_indexed_slices for range offset 042 ((TimedOutException))
0,0,0,NaN,106
END

On Dec 2, 2011, at 2:45 PM, Janne Jalkanen wrote:

> Would be glad to be of any help; it's kind of annoying.
>
> * Nothing unusual on any nodes that I can see.
> * Cannot reproduce on a single-node cluster; I see it only on our prod
>   cluster, which was running 0.6.13 until this point (the cluster conf is
>   attached to the JIRA issue mentioned below).
>
> Let me know of anything that I can try, short of taking my production
> cluster offline :-P
>
> /Janne
>
> On Dec 2, 2011, at 20:42, Jonathan Ellis wrote:
>
>> The first step towards determining how serious it is, is showing us how
>> to reproduce it or otherwise narrowing down what could be causing it,
>> because timeouts can be caused by a lot of non-bug scenarios. Does it
>> occur for every query or just some? Is there anything unusual on the
>> coordinator or replica nodes, like high CPU? Can you reproduce with the
>> stress tool? Can you reproduce on a single-node cluster? That kind of
>> thing.
>>
>> On Fri, Dec 2, 2011 at 12:18 PM, Pierre Belanger wrote:
>>
>>> Hello,
>>>
>>> Is this bug serious enough for 1.0.6 to come out shortly, or not?
>>>
>>> Thank you,
>>> PBR
>>>
>>> On Thu, Dec 1, 2011 at 6:05 PM, Zhong Li wrote:
>>>>
>>>> After upgrading to 1.0.5, RangeSlice gets timeouts. Ticket:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-3551
>>>>
>>>> On Dec 1, 2011, at 5:43 PM, Evgeniy Ryabitskiy wrote:
>>>>
>>>>> +1
>>>>> After upgrading to 1.0.5, I also get a Timeout exception on Secondary
>>>>> Index search (get_indexed_slices API).
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
Re: [RELEASE] Apache Cassandra 1.0.5 released
After upgrading to 1.0.5, RangeSlice gets timeouts. Ticket: https://issues.apache.org/jira/browse/CASSANDRA-3551

On Dec 1, 2011, at 5:43 PM, Evgeniy Ryabitskiy wrote:

> +1
> After upgrading to 1.0.5, I also get a Timeout exception on Secondary Index
> search (get_indexed_slices API).
Re: Fail to upgrade to 1.0.0 from 0.8.5
Done. https://issues.apache.org/jira/browse/CASSANDRA-3407

On Oct 27, 2011, at 3:40 AM, Sylvain Lebresne wrote:

> Do you mind opening a bug report on
> https://issues.apache.org/jira/browse/CASSANDRA ?
>
> --
> Sylvain
>
> On Thu, Oct 27, 2011 at 12:35 AM, Zhong Li wrote:
>>
>> Hi,
>>
>> I am trying to upgrade our test servers from 0.8.5 to 1.0.0 and got
>> exceptions on restart:
>>
>> INFO 22:25:37,727 Opening /srv/opt/cassandra8/data/system/IndexInfo-g-121 (5428 bytes)
>> ERROR 22:25:37,753 Exception encountered during startup
>> java.lang.StackOverflowError
>>     at java.math.BigInteger.compareMagnitude(BigInteger.java:2477)
>>     at java.math.BigInteger.compareTo(BigInteger.java:2463)
>>     at org.apache.cassandra.dht.BigIntegerToken.compareTo(BigIntegerToken.java:39)
>>     at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:83)
>>     at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:38)
>>     at java.util.Arrays.mergeSort(Arrays.java:1144)
>>     at java.util.Arrays.sort(Arrays.java:1079)
>>     at java.util.Collections.sort(Collections.java:117)
>>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:102)
>>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:43)
>>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:51)
>>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:51)
>>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:51)
>>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:51)
>>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:51)
>>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:51)
>>     ...
>>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:51)
>>     at org.apache.cassandra.utils.IntervalTree.IntervalTree.<init>(IntervalTree.java:38)
>>     at org.apache.cassandra.db.DataTracker$View.buildIntervalTree(DataTracker.java:522)
>>     at org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:547)
>>     at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:268)
>>     at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237)
>>     at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:216)
>>     at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:315)
>>     at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:285)
>>     at org.apache.cassandra.db.Table.initCf(Table.java:372)
>>     at org.apache.cassandra.db.Table.<init>(Table.java:320)
>>     at org.apache.cassandra.db.Table.open(Table.java:121)
>>     at org.apache.cassandra.db.Table.open(Table.java:104)
>>     at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:215)
>>     at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:150)
>>     at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:337)
>>     at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
>> Exception encountered during startup: null
>>
>> Full log file is attached.
>>
>> My server is Linux 2.6.18.
>>
>> Best,
>>
>> Zhong Li
Import JSON sstable data
Hi,

I am trying to load sstable data into a Cassandra 0.8.4 cluster with the json2sstable tool. Each time, I have to restart the node after importing the new file and run a repair for the column family, otherwise the new data will not show up. Any thoughts?

Thanks,

Zhong Li
Migrate from 0.6.5 to 0.7.2
Hi all,

We want to migrate from version 0.6.5 to version 0.7.2. Is there a step-by-step guide or document we can follow?

Also, there is a new branch cassandra-0.7.2 on svn. What is the purpose of creating the new branch instead of keeping the single cassandra-0.7 branch? Will you maintain both branches?

Thanks,

Zhong Li
Re: Cassandra performance
This is my personal experience. MySQL is faster than Cassandra for most normal use cases. You should understand why you would choose Cassandra instead of MySQL. If one central MySQL server can handle your workload, MySQL is better than Cassandra. But if you are overloading one MySQL server and want multiple boxes, Cassandra can be a cheap solution: it provides a fault-tolerant, decentralized, durable, rich data model. It will not give you high performance; read performance in particular is poor.

Digg struggled with Cassandra; see http://techcrunch.com/2010/09/07/digg-struggles-vp-engineering-door/ This doesn't mean Cassandra is bad. You need to design carefully to use Cassandra successfully for your application and business model.

On Sep 15, 2010, at 12:06 PM, Wayne wrote:

> If MySQL is faster then use it. I struggled to do side-by-side comparisons
> with MySQL for months until finally realizing they are too different to
> compare side by side. MySQL is always faster out of the gate when you come
> at the problem thinking in terms of relational databases. Add in the
> replication factor, using wider rows, dealing with databases that are 2-3
> terabytes, tables with 3+ billion rows, etc., etc. The NoSQL "noise" out
> there should be ignored, and a solution like Cassandra should be evaluated
> for what it brings to the table as a technology that can solve the problems
> of big data, not for how it does individual queries relative to MySQL. If a
> "normal" database works for you, use it!
>
> We have tested real loads using a 6-node cluster and consistently get 5ms
> reads under load. That is 200 reads/second (1 thread). MySQL is 10x faster,
> but then we also have wide rows, and in that 5ms we get 6 months of many
> different time series data points, which in the end means it is 10x faster
> than MySQL (1 thread). By embracing wide rows we turn slower into faster.
> Add in multiple threads/processes and the ability of a 20-node cluster to
> support concurrent reads, and MySQL falls back in the dust. Also, we don't
> have 300GB compressed backup files, we can easily add new nodes and grow,
> we can actually add columns dynamically without the dreaded DDL deadlock
> nightmare in MySQL, and for once we have replication that just works.
>
> On Wed, Sep 15, 2010 at 2:39 AM, Oleg Anastasyev wrote:
>
>> Kamil Gorlo <... gmail.com> writes:
>>>
>>> So I've got more reads from a single MySQL server with 400GB of data than
>>> from 8 machines storing about 266GB. This doesn't look good. What am I
>>> doing wrong? :)
>>
>> The worst case for Cassandra is random reads. You should ask yourself
>> whether you really have this kind of workload in production. If you really
>> do, that means Cassandra is not the right tool for the job. Some product
>> based on Berkeley DB should work better, e.g. Voldemort. A plain old
>> filesystem is also good for 100% random reads (if you don't need backups,
>> of course).
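Wayne's wide-row point can be made concrete: instead of one row per reading (which forces many random reads), a time series can be bucketed so that months of data for one entity live in a single row and come back in one slice. A small illustrative sketch with an in-memory dict standing in for a wide-row column family (the names and bucket size are made up for illustration, not anyone's actual schema):

```python
from collections import defaultdict

# Stand-in for a wide-row column family: row key -> {column name: value}.
cf = defaultdict(dict)

BUCKET_SECONDS = 15_552_000  # ~180 days: one row holds ~6 months of points

def insert(entity, ts, value):
    # One row per entity per half-year keeps rows wide but bounded.
    row_key = f"{entity}:{ts // BUCKET_SECONDS}"
    cf[row_key][ts] = value

def read_bucket(entity, ts):
    # A single row read returns months of data points at once,
    # instead of one random read per point.
    row_key = f"{entity}:{ts // BUCKET_SECONDS}"
    return sorted(cf[row_key].items())

for t in (0, 86_400, 7_000_000):
    insert("sensor-1", t, t * 2)
print(read_bucket("sensor-1", 0))
```

The trade-off Wayne describes is exactly this: one 5ms wide-row read replaces what would be many per-point reads in a row-per-reading relational layout.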
Re: data deleted came back after 9 days.
This was my mistake: one node had its ReplicationFactor set to 3, so Node3 has the data.

On Aug 23, 2010, at 10:21 AM, Zhong Li wrote:

> 1) I am using the RackUnawarePartitioner.
> 2) All nodes were rebuilt since we installed the system; we didn't run
> cleanup, though. But Node1's data on Node3 is new data.
>
> I checked the Cassandra source code and can't figure it out yet. Here may
> be the case: NodeW writes data to Node1; the FailureDetector may mark Node1
> as live, but the write may fail. What will Cassandra do next after a failed
> write? Because the consistency level is ONE, will NodeW write the data to
> Node2? If it will, will Node2 place the data on Node3?
>
> Thanks,
>
> Zhong Li
>
> On Aug 23, 2010, at 12:03 AM, Jonathan Ellis wrote:
>
>> Possibilities include:
>> 1) you're using something other than RackUnawarePartitioner, which is the
>> only one that behaves the way you describe
>> 2) you've moved nodes around w/o running cleanup afterwards
>>
>> On Sun, Aug 22, 2010 at 10:09 PM, Zhong Li wrote:
>>
>>> Today, I checked all nodes' data and logs; very few nodes reported
>>> connections going up/down. I found some data on each node which I don't
>>> understand. The ReplicationFactor is 2 and the write consistency level
>>> is ONE. For example, the ring looks like
>>> Node1(Token1)->Node2(Token2)->Node3(Token3)->...
>>> Node1 has Token1, so all data with a key at Token1 should be on Node1
>>> and Node2, but why can I find some Node1/Node2 data on Node3 as well?
>>> I dumped the data on Node3 to my local machine, read it, and found some
>>> of Node1/Node2's data on Node3, and that data should have been deleted.
>>> Why does Node3 have Node1/Node2's data? Thanks.
>>>
>>> On Aug 18, 2010, at 10:44 AM, Jonathan Ellis wrote:
>>>
>>>> HH would handle it if it were an FD false positive, but if a node
>>>> actually does go down then it can miss writes before HH kicks in.
>>>>
>>>> On Wed, Aug 18, 2010 at 9:30 AM, Raj N wrote:
>>>>
>>>>> Guys, correct me if I am wrong. The whole problem is that a node
>>>>> missed an update when it was down. Shouldn't HintedHandoff take care
>>>>> of this case?
>>>>>
>>>>> Thanks
>>>>> -Raj
>>>>>
>>>>> -Original Message-
>>>>> From: Jonathan Ellis [mailto:jbel...@gmail.com]
>>>>> Sent: Wednesday, August 18, 2010 9:22 AM
>>>>> To: user@cassandra.apache.org
>>>>> Subject: Re: data deleted came back after 9 days.
>>>>>
>>>>> Actually, tombstones are read repaired too -- as long as they are not
>>>>> expired. But nodetool repair is much less error-prone than relying on
>>>>> RR and your memory of what deletes you issued. Either way, you'd need
>>>>> to increase GCGraceSeconds first to make the tombstones un-expired.
>>>>>
>>>>> On Wed, Aug 18, 2010 at 12:43 AM, Benjamin Black wrote:
>>>>>
>>>>>> On Tue, Aug 17, 2010 at 7:49 PM, Zhong Li wrote:
>>>>>>
>>>>>>> The data was inserted on one node, then deleted on a remote node
>>>>>>> less than 2 seconds later. So it is very possible some node lost
>>>>>>> the tombstone when the connection was lost. My question: can a
>>>>>>> ConsistencyLevel.ALL read retrieve the lost tombstone back instead
>>>>>>> of a repair?
>>>>>>
>>>>>> No. Read repair does not replay operations. You must run nodetool
>>>>>> repair.
>>>>>>
>>>>>> b
>>>>>
>>>>> --
>>>>> Jonathan Ellis
>>>>> Project Chair, Apache Cassandra
>>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>> http://riptano.com
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of Riptano, the source for professional Cassandra support
>>>> http://riptano.com
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
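The failure mode debated in this thread (a replica misses a delete while down, then serves the old value once it is back) can be simulated in a few lines. This is an illustrative model, not Cassandra code; the node names, and the assumption that no hinted handoff or repair runs during recovery, are made up for the sketch:

```python
# RF=2: the same key is stored on two replicas.
replicas = {"node1": {}, "node2": {}}

def write(key, value, up):
    """CL.ONE semantics: the write succeeds if at least one live replica
    accepts it; down replicas simply miss the update (value=None models
    a delete/tombstone)."""
    live = [n for n in up if n in replicas]
    for n in live:
        replicas[n][key] = value
    return bool(live)

def read_one(key, node):
    # CL.ONE read: ask a single replica, take whatever it has.
    return replicas[node].get(key)

write("k", "v", up=["node1", "node2"])   # insert reaches both replicas
write("k", None, up=["node2"])           # delete while node1 is down
# node1 comes back; without hinted handoff or repair it still serves "v":
print(read_one("k", "node1"), read_one("k", "node2"))
```

This is exactly why the advice above is to run `nodetool repair` rather than rely on read repair: the recovered replica happily answers CL.ONE reads with the pre-delete value.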
Re: data deleted came back after 9 days.
1) I am using the RackUnawarePartitioner.
2) All nodes were rebuilt since we installed the system; we didn't run cleanup, though. But Node1's data on Node3 is new data.

I checked the Cassandra source code and can't figure it out yet. Here may be the case: NodeW writes data to Node1; the FailureDetector may mark Node1 as live, but the write may fail. What will Cassandra do next after a failed write? Because the consistency level is ONE, will NodeW write the data to Node2? If it will, will Node2 place the data on Node3?

Thanks,

Zhong Li

On Aug 23, 2010, at 12:03 AM, Jonathan Ellis wrote:

> Possibilities include:
> 1) you're using something other than RackUnawarePartitioner, which is the
> only one that behaves the way you describe
> 2) you've moved nodes around w/o running cleanup afterwards
>
> On Sun, Aug 22, 2010 at 10:09 PM, Zhong Li wrote:
>
>> Today, I checked all nodes' data and logs; very few nodes reported
>> connections going up/down. I found some data on each node which I don't
>> understand. The ReplicationFactor is 2 and the write consistency level is
>> ONE. For example, the ring looks like
>> Node1(Token1)->Node2(Token2)->Node3(Token3)->...
>> Node1 has Token1, so all data with a key at Token1 should be on Node1 and
>> Node2, but why can I find some Node1/Node2 data on Node3 as well? I dumped
>> the data on Node3 to my local machine, read it, and found some of
>> Node1/Node2's data on Node3, and that data should have been deleted. Why
>> does Node3 have Node1/Node2's data? Thanks.
>>
>> On Aug 18, 2010, at 10:44 AM, Jonathan Ellis wrote:
>>
>>> HH would handle it if it were an FD false positive, but if a node
>>> actually does go down then it can miss writes before HH kicks in.
>>>
>>> On Wed, Aug 18, 2010 at 9:30 AM, Raj N wrote:
>>>
>>>> Guys, correct me if I am wrong. The whole problem is that a node missed
>>>> an update when it was down. Shouldn't HintedHandoff take care of this
>>>> case?
>>>>
>>>> Thanks
>>>> -Raj
>>>>
>>>> -Original Message-
>>>> From: Jonathan Ellis [mailto:jbel...@gmail.com]
>>>> Sent: Wednesday, August 18, 2010 9:22 AM
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: data deleted came back after 9 days.
>>>>
>>>> Actually, tombstones are read repaired too -- as long as they are not
>>>> expired. But nodetool repair is much less error-prone than relying on RR
>>>> and your memory of what deletes you issued. Either way, you'd need to
>>>> increase GCGraceSeconds first to make the tombstones un-expired.
>>>>
>>>> On Wed, Aug 18, 2010 at 12:43 AM, Benjamin Black wrote:
>>>>
>>>>> On Tue, Aug 17, 2010 at 7:49 PM, Zhong Li wrote:
>>>>>
>>>>>> The data was inserted on one node, then deleted on a remote node less
>>>>>> than 2 seconds later. So it is very possible some node lost the
>>>>>> tombstone when the connection was lost. My question: can a
>>>>>> ConsistencyLevel.ALL read retrieve the lost tombstone back instead of
>>>>>> a repair?
>>>>>
>>>>> No. Read repair does not replay operations. You must run nodetool
>>>>> repair.
>>>>>
>>>>> b
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of Riptano, the source for professional Cassandra support
>>>> http://riptano.com
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
Re: data deleted came back after 9 days.
Today I checked all nodes' data and logs; very few nodes reported connections going up/down. I found some data on each node which I don't understand. The ReplicationFactor is 2 and the write Consistency Level is ONE. For example, the ring looks like Node1(Token1) -> Node2(Token2) -> Node3(Token3) -> ... Node1 has Token1, so all data with keys at Token1 should be on Node1 and Node2, but why can I find some Node1/Node2 data on Node3 as well? I dumped the data on Node3 to my local machine, read it, and found some of Node1/Node2's data on Node3, and that data should have been deleted. Why does Node3 have Node1/Node2's data?

Thanks.

On Aug 18, 2010, at 10:44 AM, Jonathan Ellis wrote:

HH would handle it if it were a FD false positive, but if a node actually does go down then it can miss writes before HH kicks in.

On Wed, Aug 18, 2010 at 9:30 AM, Raj N wrote:

Guys, correct me if I am wrong. The whole problem is because a node missed an update when it was down. Shouldn't HintedHandoff take care of this case?

Thanks,
-Raj

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Wednesday, August 18, 2010 9:22 AM
To: user@cassandra.apache.org
Subject: Re: data deleted came back after 9 days.

Actually, tombstones are read repaired too -- as long as they are not expired. But nodetool repair is much less error-prone than relying on RR and your memory of what deletes you issued. Either way, you'd need to increase GCGraceSeconds first to make the tombstones un-expired.

On Wed, Aug 18, 2010 at 12:43 AM, Benjamin Black wrote:

On Tue, Aug 17, 2010 at 7:49 PM, Zhong Li wrote:

That data was inserted on one node, then deleted on a remote node less than 2 seconds later. So it is very possible some node lost the tombstone when the connection was lost. My question: can a ConsistencyLevel.ALL read retrieve the lost tombstone instead of a repair?

No. Read repair does not replay operations. You must run nodetool repair.
b

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: data deleted came back after 9 days.
That data was inserted on one node, then deleted on a remote node less than 2 seconds later. So it is very possible some node lost the tombstone when the connection was lost. My question: can a ConsistencyLevel.ALL read retrieve the lost tombstone instead of a repair?

On Aug 17, 2010, at 4:11 PM, Ned Wolpert wrote:

(Gurus, please check my logic here... I'm trying to validate my understanding of this situation.)

Isn't the issue that while a server was disconnected, a delete could have occurred, and thus the disconnected server never got the 'tombstone'? (http://wiki.apache.org/cassandra/DistributedDeletes) When it comes back, only after it receives the delete request will the data be deleted from the reconnected server. I do not think this happens automatically when the server rejoins the cluster; it requires the manual repair command.

From my understanding, if the consistency level is greater than the number of servers missing that tombstone, you'll get the correct data. If it's less, then you 'could' get the right or wrong answer. So the issue is how often you need to run repair. If you have ReplicationFactor=3 and you use ConsistencyLevel.QUORUM (2 responses), then you need to run it after one server fails just to be sure. If you can handle some tolerance for this, you can wait a bit longer before running the repair.

On Tue, Aug 17, 2010 at 12:58 PM, Jeremy Dunck wrote:

On Tue, Aug 17, 2010 at 2:49 PM, Jonathan Ellis wrote:

> It doesn't have to be disconnected more than GC grace seconds to cause what you are seeing, it just has to be disconnected at all (thus missing delete commands).
>
> Thus you need to be running repair more often than gcgrace, or confident that read repair will handle it for you (which clearly is not the case for you :).
> see http://wiki.apache.org/cassandra/Operations

FWIW, the docs there say: "Remember though that if a node is down longer than your configured GCGraceSeconds (default: 10 days), it could have missed remove operations permanently." So that's probably a source of misunderstanding.

--
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..." --Marlowe
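The quorum arithmetic Ned is relying on can be checked quickly. This is just a sketch of the standard formula, not Cassandra code: QUORUM needs floor(RF/2) + 1 replicas to respond, so a quorum read tolerates RF minus that many replicas missing the tombstone before a wrong (resurrected) answer becomes possible.

```python
def quorum(rf):
    """Replicas required for a QUORUM read/write: floor(RF/2) + 1."""
    return rf // 2 + 1

for rf in (2, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}, tolerates {rf - q} stale replica(s)")
# With RF=3, QUORUM=2: one replica may miss the tombstone and a quorum
# read still sees it from another replica, but after a second failure
# deleted data can come back -- hence "run repair after one server fails
# just to be sure".
```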
Re: data deleted came back after 9 days.
864000. It is the default, 10 days. I checked all system.log files; all nodes were connected. Not all the time, but they reconnected after a few minutes. No node was disconnected for more than GC grace seconds.

Best,

On Aug 17, 2010, at 11:53 AM, Peter Schuller wrote:

We have 10 nodes across 5 datacenters. Today I found a strange thing. On one node, some deleted data came back after 8-9 days. The data was saved on one node and retrieved/deleted on another node in a remote datacenter. The CF is a super column. What could be causing this?

What is your GC grace seconds set to? Is it lower than 8-9 days, and is it possible one or more nodes were disconnected from the rest of the cluster for longer than the GC grace seconds? See: http://wiki.apache.org/cassandra/DistributedDeletes

--
/ Peter Schuller
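The expiry rule being discussed works out as follows. This is a sketch of the assumed semantics from the DistributedDeletes wiki page, not Cassandra source: a tombstone may be garbage-collected once it is older than GCGraceSeconds, and a replica that missed the delete the whole time then never learns of it, so the data can "come back".

```python
GC_GRACE_SECONDS = 864_000            # the default mentioned above
print(GC_GRACE_SECONDS / 86_400)      # 10.0 -- i.e. 10 days

def tombstone_purgeable(deleted_at, now, gc_grace=GC_GRACE_SECONDS):
    """A tombstone is eligible for GC once it is older than gc_grace."""
    return now - deleted_at > gc_grace

# Data deleted 9 days ago: the tombstone must still be retained, so a
# replica that reconnects within the window can still be repaired.
print(tombstone_purgeable(0, 9 * 86_400))   # False
print(tombstone_purgeable(0, 11 * 86_400))  # True
```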
Re: data deleted came back after 9 days.
Cassandra version is 0.6.3.

On Aug 17, 2010, at 11:39 AM, Zhong Li wrote:

Hi All,

We have a strange issue here. We have 10 nodes across 5 datacenters. Today I found a strange thing. On one node, some deleted data came back after 8-9 days. The data was saved on one node and retrieved/deleted on another node in a remote datacenter. The CF is a super column. What could be causing this?

Thanks,
Zhong Li
data deleted came back after 9 days.
Hi All,

We have a strange issue here. We have 10 nodes across 5 datacenters. Today I found a strange thing. On one node, some deleted data came back after 8-9 days. The data was saved on one node and retrieved/deleted on another node in a remote datacenter. The CF is a super column. What could be causing this?

Thanks,
Zhong Li
Re: How to migrate any relational database to Cassandra
Yes, I use OrderPreservingPartitioner; the token combines datacenter + ip + function + timestamp + recordId + ...

On Aug 7, 2010, at 10:36 PM, Jonathan Ellis wrote:

Are you using OrderPreservingPartitioner then?

On Sat, Aug 7, 2010 at 10:32 PM, Zhong Li wrote:

Here are just my personal experiences. I recently used Cassandra to implement a system across 5 datacenters. Because it would have been impossible to do in a SQL database at low cost, Cassandra helped. Cassandra is all about indexing; there are no relationships natively, so you have to use indexing to maintain all relationships. This is fine, because you can add a new index whenever you want. The big pain is the token: you can choose only one token per node, so the whole system has to adopt the same rule for creating indexes. It is a huge, huge pain. If Cassandra could implement tokens at the CF level, it would be much more natural and easier for us to implement a system.

Best,
Zhong

On Aug 6, 2010, at 9:23 PM, Peter Harrison wrote:

On Sat, Aug 7, 2010 at 6:00 AM, sonia gehlot wrote:

Can you please help me figure out how to move forward? How should I do all the setup for this?

My view is that Cassandra is fundamentally different from SQL databases. There may be artefacts which are superficially similar between the two systems, but I'm thinking of a move to Cassandra like my move from dBase to Delphi; in other words, there were concepts that changed how you write applications. You can do something similar to a SQL database, but I don't think you would be leveraging the features of Cassandra. That said, I think there will be a new generation of abstraction tools that will make modeling easier. A perhaps more practical answer: there is no one-to-one mapping between SQL and Cassandra.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
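A composite row key of the kind described above can be sketched as follows. The field names, separator, and widths here are illustrative assumptions, not taken from the original post: with OrderPreservingPartitioner rows sort lexicographically by key, so fixed-width, zero-padded fields keep a useful order, with all rows for one datacenter/function grouped together and ordered by time within the group.

```python
def make_key(datacenter, ip, function, timestamp, record_id):
    """Build a composite row key. timestamp is zero-padded to 13 digits
    (millisecond epoch) so lexicographic order matches numeric order."""
    return f"{datacenter}:{ip}:{function}:{timestamp:013d}:{record_id}"

k1 = make_key("dc1", "10.0.0.1", "billing", 1281234567000, "r42")
k2 = make_key("dc1", "10.0.0.1", "billing", 1281234568000, "r07")
print(k1)       # dc1:10.0.0.1:billing:1281234567000:r42
print(k1 < k2)  # True: later timestamp sorts after, within the same group
```

The downside Zhong describes follows directly: every CF sharing the ring must agree on this one key layout, because node tokens partition the single lexicographic key space.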
Re: How to migrate any relational database to Cassandra
Here are just my personal experiences. I recently used Cassandra to implement a system across 5 datacenters. Because it would have been impossible to do in a SQL database at low cost, Cassandra helped. Cassandra is all about indexing; there are no relationships natively, so you have to use indexing to maintain all relationships. This is fine, because you can add a new index whenever you want. The big pain is the token: you can choose only one token per node, so the whole system has to adopt the same rule for creating indexes. It is a huge, huge pain. If Cassandra could implement tokens at the CF level, it would be much more natural and easier for us to implement a system.

Best,
Zhong

On Aug 6, 2010, at 9:23 PM, Peter Harrison wrote:

On Sat, Aug 7, 2010 at 6:00 AM, sonia gehlot wrote:

Can you please help me figure out how to move forward? How should I do all the setup for this?

My view is that Cassandra is fundamentally different from SQL databases. There may be artefacts which are superficially similar between the two systems, but I'm thinking of a move to Cassandra like my move from dBase to Delphi; in other words, there were concepts that changed how you write applications. You can do something similar to a SQL database, but I don't think you would be leveraging the features of Cassandra. That said, I think there will be a new generation of abstraction tools that will make modeling easier. A perhaps more practical answer: there is no one-to-one mapping between SQL and Cassandra.
Re: set ReplicationFactor and Token at Column Family/SuperColumn level.
If I create 3-4 keyspaces, will this impact performance and resources (esp. memory and disk I/O) too much?

Thanks,
Zhong

On Aug 5, 2010, at 4:52 PM, Benjamin Black wrote:

On Thu, Aug 5, 2010 at 12:59 PM, Zhong Li wrote:

> The big thing bothering me is the initial ring token. We have several Column Families, and it is very hard to choose one token suitable for all CFs. Also, some Column Families need a higher Consistency Level and some don't.

Consistency Level is set by clients, per request. If you require different _Replication Factors_ for different CFs, then just put them in different keyspaces. Additional keyspaces have very little overhead (unlike CFs).

> If we set ReplicationFactor too high, it is too costly across datacenters, especially on the other side of the world. I know we can set up multiple rings, but that costs more hardware. If Cassandra could implement Ring, Token and RF at the CF level, or even the SuperColumn level, it would make design much easier and more efficient. Is it possible?

The approach I described above is what you can do. The rest of what you asked is not happening.

b
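In the 0.6-era Cassandra being discussed, keyspaces were defined statically in conf/storage-conf.xml, so the keyspace-per-RF approach suggested above would look roughly like the following. The keyspace and column family names here are made up for illustration; only the element structure follows the 0.6 configuration format.

```xml
<!-- Sketch for conf/storage-conf.xml (Cassandra 0.6.x); keyspace and CF
     names are hypothetical. Each keyspace carries its own ReplicationFactor,
     so CFs needing different RFs are split across keyspaces. -->
<Keyspace Name="LocalData">
  <ColumnFamily Name="SessionCF" CompareWith="BytesType"/>
  <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
  <ReplicationFactor>2</ReplicationFactor>
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
<Keyspace Name="GlobalData">
  <ColumnFamily Name="AccountCF" CompareWith="BytesType"/>
  <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
  <ReplicationFactor>5</ReplicationFactor>
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
```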
set ReplicationFactor and Token at Column Family/SuperColumn level.
All,

Thanks for the Apache Cassandra project; it is a great project. This is my first time using it. We installed it on 10 nodes and it runs great. The 10 nodes span 5 datacenters around the world.

The big thing bothering me is the initial ring token. We have several Column Families, and it is very hard to choose one token suitable for all CFs. Also, some Column Families need a higher Consistency Level and some don't. If we set the ReplicationFactor too high, it is too costly across datacenters, especially on the other side of the world. I know we can set up multiple rings, but that costs more hardware. If Cassandra could implement Ring, Token and RF at the CF level, or even the SuperColumn level, it would make design much easier and more efficient. Is it possible?

Thanks,
Zhong