Re: Astyanax error
I traced this to the misnomer of the Integer datatype in Cassandra. IntegerType in Cassandra is in fact a variable-length BigInteger. Changing it to Int32Type solved the issue. https://github.com/Netflix/astyanax/issues/59 On Mon, Sep 17, 2012 at 10:51 AM, A J wrote: > Hello, > > I am trying to retrieve a list of Column Names (that are defined as > Integer) from a CF with RowKey as Integer as well. (I don't care for > the column values, which are just nulls.) > > Following is a snippet of my Astyanax code. I am getting 0 columns but I > know the key that I am querying contains a few hundred columns. Any > idea what part of the code below is incorrect? > > Thanks. > > Astyanax code: > > ColumnFamily CF1 = > new ColumnFamily( > "CF1", // Column Family Name > IntegerSerializer.get(), // Key Serializer > IntegerSerializer.get()); // Column Serializer > > //Reading data > int NUM_EVENTS = 9; > > StopWatch clock = new StopWatch(); > clock.start(); > for (int i = 0; i < NUM_EVENTS; ++i) { > ColumnList result = keyspace.prepareQuery(CF1) > .getKey(1919) > .execute().getResult(); > System.out.println( "results are: " + result.size() ); > } > clock.stop(); > > > > CF definition: > === > [default@ks1] describe CF1; > ColumnFamily: CF1 > Key Validation Class: org.apache.cassandra.db.marshal.IntegerType > Default column value validator: > org.apache.cassandra.db.marshal.BytesType > Columns sorted by: org.apache.cassandra.db.marshal.IntegerType
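To see why the serializer mismatch returns nothing, it helps to look at the bytes. The sketch below is plain Python (not Astyanax/Cassandra code, and the helper names are mine): it contrasts a fixed 4-byte Int32 encoding with the minimal-length big-endian two's-complement encoding that IntegerType uses. The two produce different key bytes for the same number, so lookups miss.

```python
# Plain-Python sketch of the two key encodings (hypothetical helper names).
import struct

def encode_int32(n: int) -> bytes:
    # Fixed 4-byte big-endian, Int32Type-style
    return struct.pack(">i", n)

def encode_varint(n: int) -> bytes:
    # Minimal-length two's-complement big-endian (non-negative sketch)
    if n == 0:
        return b"\x00"
    return n.to_bytes(n.bit_length() // 8 + 1, "big", signed=True)

print(encode_int32(1919).hex())   # 0000077f  (4 bytes)
print(encode_varint(1919).hex())  # 077f      (2 bytes -- different key bytes)
```

With the CF validators changed to Int32Type, the fixed 4-byte encoding on both sides lines up and the lookup finds the row.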
Astyanax error
Hello, I am trying to retrieve a list of Column Names (that are defined as Integer) from a CF with RowKey as Integer as well. (I don't care for the column values, which are just nulls.) Following is a snippet of my Astyanax code. I am getting 0 columns but I know the key that I am querying contains a few hundred columns. Any idea what part of the code below is incorrect? Thanks. Astyanax code: ColumnFamily CF1 = new ColumnFamily( "CF1", // Column Family Name IntegerSerializer.get(), // Key Serializer IntegerSerializer.get()); // Column Serializer //Reading data int NUM_EVENTS = 9; StopWatch clock = new StopWatch(); clock.start(); for (int i = 0; i < NUM_EVENTS; ++i) { ColumnList result = keyspace.prepareQuery(CF1) .getKey(1919) .execute().getResult(); System.out.println( "results are: " + result.size() ); } clock.stop(); CF definition: === [default@ks1] describe CF1; ColumnFamily: CF1 Key Validation Class: org.apache.cassandra.db.marshal.IntegerType Default column value validator: org.apache.cassandra.db.marshal.BytesType Columns sorted by: org.apache.cassandra.db.marshal.IntegerType
Astyanax - build
Hi, I am new to Java and trying to get the Astyanax client running for Cassandra. I downloaded Astyanax from https://github.com/Netflix/astyanax. How do I compile the source code in a simple fashion from the Linux command line? Thanks.
Advantage of pre-defining column metadata
For a static column family, what is the advantage of pre-defining column metadata? I can see that it eases understanding of the types of values the CF contains, and that clients will reject incompatible insertions. But are there any major advantages in terms of performance, or anything else, that make it beneficial to define the metadata upfront? Thanks.
Re: nodetool , localhost connection refused
Yes, the telnet does not work. Don't know what it was but switching to 1.1.4 solved the issue. On Mon, Aug 20, 2012 at 6:17 PM, Hiller, Dean wrote: > My guess is "telnet localhost 7199" also fails? And if you are on linux > and run netstat -anp, you will see no one is listening on that port? > > So database node did not start and bind to that port and you would see > exception in the logs of that database node… just a guess. > > Dean > > On 8/20/12 4:10 PM, "A J" wrote: > >>I am running 1.1.3 >>Nodetool on the database node (just a single node db) is giving the error: >>Failed to connect to 'localhost:7199': Connection refused >> >>Any idea what could be causing this ? >> >>Thanks. >
nodetool , localhost connection refused
I am running 1.1.3 Nodetool on the database node (just a single node db) is giving the error: Failed to connect to 'localhost:7199': Connection refused Any idea what could be causing this ? Thanks.
'WHERE' with several indexed columns
Hi, If I have a WHERE clause in CQL with several 'AND' conditions and each column is indexed, which index(es) are used? Just the one on the first field in the WHERE clause, or all the indexes involved in the clause? Also, is an index used only with an equality operator, or with greater-than/less-than comparators as well? Thanks.
Custom Partitioner Type
Is it possible to use a custom partitioner type (other than RP or BOP)? Say my rowkeys are all Integers and I want all even keys to go to node1 and odd keys to node2: is that feasible? How would I go about it? Thanks.
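There is no built-in even/odd partitioner; a real one would be a Java class implementing Cassandra's IPartitioner interface (mapping keys to ring tokens), configured via the 'partitioner' setting in cassandra.yaml. As a rough sketch of the placement rule only (hypothetical node names, plain Python):

```python
# Hypothetical even/odd routing sketch -- not a real IPartitioner,
# which works in tokens, not node names.
def route(key: int, nodes=("node1", "node2")) -> str:
    """Even integer keys go to nodes[0], odd keys to nodes[1]."""
    return nodes[key % 2]

print(route(10))  # node1
print(route(7))   # node2
```

Note that such a scheme sacrifices the even load distribution a hash-based partitioner gives you: all the load for even keys lands on one node.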
Physical storage of rowkey
Are row keys hashed before being physically stored in Cassandra? If so, what hash function is used to ensure collisions are minimal? Thanks.
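For reference, with the default RandomPartitioner the key itself is stored as-is in the SSTable; an MD5-based token of the key only decides which node on the ring owns it. A sketch of the token derivation (an approximation shown in Python for illustration; the real code is Java):

```python
# RandomPartitioner-style token sketch: MD5 digest of the key,
# interpreted as a big integer (absolute value). MD5 makes collisions
# and hot spots statistically negligible.
import hashlib

def md5_token(key: bytes) -> int:
    digest = hashlib.md5(key).digest()  # 16 bytes
    return abs(int.from_bytes(digest, "big", signed=True))

assert md5_token(b"1919") == md5_token(b"1919")  # deterministic placement
assert 0 <= md5_token(b"1919") <= 2**127         # token space
```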
Re: solr query for string match in CQL
Never mind. Double quotes within the single quotes worked: select * from solr where solr_query='body:"sixty eight million nine hundred forty three thousand four hundred twenty four"'; On Thu, Apr 12, 2012 at 11:42 AM, A J wrote: > What is the syntax for a string match in CQL for solr_query ? > > cqlsh:wiki> select * from solr where solr_query='body:sixty eight > million nine hundred forty three thousand four hundred twenty four'; > Request did not complete within rpc_timeout. > > url encoding just returns without retrieving the row present: > cqlsh:wiki> select count(*) from solr where > solr_query='body:%22sixty%20eight%20million%20nine%20hundred%20forty%20three%20thousand%20four%20hundred%20twenty%20four%22' > ; > count > --- > 0 > > I have exactly one row matching this string that I can retrieve > through direct solr query. > > > Thanks.
Re: Order rows numerically
Yes, that is good enough for now. Thanks. On Fri, Mar 16, 2012 at 6:49 PM, Watanabe Maki wrote: > How about filling with zeros before the smaller digits? > Ex. 0001, 0002, etc. > > maki > > > On 2012/03/17, at 6:29, A J wrote: > >> If I define my rowkeys to be Integer >> (key_validation_class=IntegerType), how can I order the rows >> numerically ? >> ByteOrderedPartitioner orders lexically, and retrieval using get_range >> does not seem to come back in numeric order. >> >> If I were to change the rowkey to be UTF8 (key_validation_class=UTF8Type), >> BOP still does not give numerical ordering. >> For a range of rowkeys from 1 to 2, I get 1, 10, 11, ..., 2 (lexical ordering). >> >> Any workaround for this ? >> >> Thanks.
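The zero-padding suggestion can be checked in a couple of lines: with fixed-width keys, lexical (BOP) order coincides with numeric order.

```python
# Lexical vs numeric ordering, with and without zero-padding.
nums = [1, 2, 10, 11]
unpadded = sorted(str(n) for n in nums)          # lexical: '1','10','11','2'
padded = sorted(str(n).zfill(4) for n in nums)   # '0001','0002','0010','0011'
assert unpadded == ['1', '10', '11', '2']
assert [int(k) for k in padded] == [1, 2, 10, 11]
```

The padding width just has to cover the largest key you ever expect (here 4 digits, i.e. up to 9999).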
Re: Max # of CFs
I have increased index_interval. Will let you know if I see a difference. My theory is that memtables are not getting flushed. If I manually flush them, the heap consumption goes down drastically. I think when memtable_total_space_in_mb is exceeded, not enough memtables are getting flushed. There are 5000 memtables (one for each CF) but each memtable in itself is small, so flushing of one or two memtables by Cassandra is not helping. Question: How many memtables are flushed when memtable_total_space_in_mb is exceeded ? Any way to flush all memtables when the threshold is reached ? Thanks. On Wed, Mar 21, 2012 at 8:56 AM, Vitalii Tymchyshyn wrote: > Hello. > > There is also a primary row index. Its space can be controlled with > the index_interval setting. Don't know if you can look at its memory usage > somewhere. If I were you, I'd take the jmap tool and examine the heap histogram > first, heap dump second. > > Best regards, Vitalii Tymchyshyn > > 20.03.12 18:12, A J wrote: > >> I have both row cache and column cache disabled for all my CFs. >> >> cfstats says "Bloom Filter Space Used: 1760" per CF. Assuming it is in >> bytes, it is a total of about 9MB of bloom filter size for 5K CFs; which >> is not a lot. >> >> >> On Tue, Mar 20, 2012 at 11:09 AM, Vitalii Tymchyshyn >> wrote: >>> >>> Hello. >>> >>> From my experience it's unwise to make many column families for the same >>> keys >>> because you will have bloom filters and row indexes multiplied. If you >>> have >>> 5000, you should expect your heap requirements multiplied by the same factor. >>> Also check your cache sizes. Default AFAIR is 10 keys per column >>> family. >>> >>> 20.03.12 16:05, A J wrote: >>> >>>> ok, the last thread says that 1.0+ onwards, thousands of CFs should >>>> not be a problem. >>>> >>>> But I am finding that all the allocated heap memory is getting consumed. 
>>>> I started with 8GB heap and then on reading >>>> >>>> >>>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management >>>> realized that minimum of 1MB per memtable is used by the per-memtable >>>> arena allocator. >>>> So with 5K CFs, 5GB will be used just by arena allocators. >>>> >>>> But even on increasing the heap to 16GB, am finding that all the heap >>>> is getting consumed. Is there a different formula for heap calculation >>>> when you have thousands of CFs ? >>>> Any other configuration that I need to change ? >>>> >>>> Thanks. >>>> >>>> On Mon, Mar 19, 2012 at 10:35 AM, Alain RODRIGUEZ >>>> wrote: >>>>> >>>>> This subject was already discussed, this may help you : >>>>> >>>>> >>>>> http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results >>>>> >>>>> If you still got questions after reading this thread or some others >>>>> about >>>>> the same topic, do not hesitate asking again, >>>>> >>>>> Alain >>>>> >>>>> >>>>> 2012/3/19 A J >>>>>> >>>>>> How many Column Families are one too many for Cassandra ? >>>>>> I created a db with 5000 CFs (I can go into the reasons later) but the >>>>>> latency seems to be very erratic now. Not sure if it is because of the >>>>>> number of CFs. >>>>>> >>>>>> Thanks. >>>>> >>>>> >
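A quick back-of-envelope with the numbers from this thread (the ~1 MB arena floor per memtable from the blog post, and the cfstats bloom-filter figure quoted above) shows the arenas, not the bloom filters, dominate:

```python
# Back-of-envelope heap arithmetic for 5000 CFs.
cfs = 5000
arena_floor_mb = 1          # ~1 MB minimum per memtable arena
bloom_bytes_per_cf = 1760   # "Bloom Filter Space Used" from cfstats

arena_gb = cfs * arena_floor_mb / 1024.0
bloom_mb = cfs * bloom_bytes_per_cf / 1e6
print(arena_gb)  # ~4.9 GB of heap gone before any data is written
print(bloom_mb)  # ~8.8 MB -- bloom filters are not the culprit
```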
Re: Max # of CFs
I have both row cache and column cache disabled for all my CFs. cfstats says "Bloom Filter Space Used: 1760" per CF. Assuming it is in bytes, it is a total of about 9MB of bloom filter size for 5K CFs; which is not a lot. On Tue, Mar 20, 2012 at 11:09 AM, Vitalii Tymchyshyn wrote: > Hello. > > From my experience it's unwise to make many column families for the same keys > because you will have bloom filters and row indexes multiplied. If you have > 5000, you should expect your heap requirements multiplied by the same factor. > Also check your cache sizes. Default AFAIR is 10 keys per column family. > > 20.03.12 16:05, A J wrote: > >> ok, the last thread says that 1.0+ onwards, thousands of CFs should >> not be a problem. >> >> But I am finding that all the allocated heap memory is getting consumed. >> I started with 8GB heap and then on reading >> >> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management >> realized that a minimum of 1MB per memtable is used by the per-memtable >> arena allocator. >> So with 5K CFs, 5GB will be used just by arena allocators. >> >> But even on increasing the heap to 16GB, I am finding that all the heap >> is getting consumed. Is there a different formula for heap calculation >> when you have thousands of CFs ? >> Any other configuration that I need to change ? >> >> Thanks. >> >> On Mon, Mar 19, 2012 at 10:35 AM, Alain RODRIGUEZ >> wrote: >>> >>> This subject was already discussed, this may help you : >>> >>> http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results >>> >>> If you still have questions after reading this thread or some others about >>> the same topic, do not hesitate to ask again, >>> >>> Alain >>> >>> >>> 2012/3/19 A J >>>> >>>> How many Column Families are one too many for Cassandra ? >>>> I created a db with 5000 CFs (I can go into the reasons later) but the >>>> latency seems to be very erratic now. 
Not sure if it is because of the >>>> number of CFs. >>>> >>>> Thanks. >>> >>> >
Re: Max # of CFs
ok, the last thread says that 1.0+ onwards, thousands of CFs should not be a problem. But I am finding that all the allocated heap memory is getting consumed. I started with an 8GB heap and then on reading http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management realized that a minimum of 1MB per memtable is used by the per-memtable arena allocator. So with 5K CFs, 5GB will be used just by arena allocators. But even on increasing the heap to 16GB, I am finding that all the heap is getting consumed. Is there a different formula for heap calculation when you have thousands of CFs ? Any other configuration that I need to change ? Thanks. On Mon, Mar 19, 2012 at 10:35 AM, Alain RODRIGUEZ wrote: > This subject was already discussed, this may help you : > http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results > > If you still have questions after reading this thread or some others about > the same topic, do not hesitate to ask again, > > Alain > > > 2012/3/19 A J >> >> How many Column Families are one too many for Cassandra ? >> I created a db with 5000 CFs (I can go into the reasons later) but the >> latency seems to be very erratic now. Not sure if it is because of the >> number of CFs. >> >> Thanks. > >
Max # of CFs
How many Column Families are one too many for Cassandra ? I created a db with 5000 CFs (I can go into the reasons later) but the latency seems to be very erratic now. Not sure if it is because of the number of CFs. Thanks.
Order rows numerically
If I define my rowkeys to be Integer (key_validation_class=IntegerType), how can I order the rows numerically ? ByteOrderedPartitioner orders lexically, and retrieval using get_range does not seem to come back in numeric order. If I were to change the rowkey to be UTF8 (key_validation_class=UTF8Type), BOP still does not give numerical ordering. For a range of rowkeys from 1 to 2, I get 1, 10, 11, ..., 2 (lexical ordering). Any workaround for this ? Thanks.
Re: Does the 'batch' order matter ?
OK, disappointing. You could have got atomicity-like behavior most of the time, if it were otherwise. How does one execute a logical write that is spread across several CFs? (Say in the User CF, you have 'state' as a column and userid as the rowkey; but in the State CF, you have state as the rowkey and userid as a column.) Given atomicity is not possible, a brief period of inconsistency is ok, but I cannot afford permanent inconsistency for even a single successful or timed-out write. I cannot ever have a userid in the User CF that is not in the State CF or vice-versa, except for a very small fraction of writes, and that too for only a few minutes at most. Writing to the State CF has to be almost always synchronous with the write to the User CF. I would guess this is a general enough use case. How is this accomplished ? Do I write to a third CF, say a 'LOG CF', with a PREPARING status as the first batch; then a second batch, conditional on the 1st batch being successful, that writes to the main User and State CFs; then a third batch, conditional on the 2nd batch being successful, that updates the PREPARING flag to COMPLETED in the LOG CF ? I would also run a standalone job every few minutes that takes PREPARING records from the LOG CF older than some interval, applies them to the main CFs, and changes their status. This approach may not be performant but I could not think of anything else. Appreciate any ideas. Thanks On Thu, Mar 15, 2012 at 5:22 AM, aaron morton wrote: > The simple thing to say is: If you send a batch_mutate the order in which the > rows are written is undefined. So you should not make any assumptions such > as: if row C is stored, rows A and B also have been. > > They may have but AFAIK it is not part of the API contract. > > For the thrift API batch_mutate takes a Map of mutations keyed on the row > key. CQL builds a list of row mutations in the same order as the statement. > > Even if they are in a list there is no guarantee they will be processed in > that order. 
> > If you get a timed out error all you know is the mutation, as a whole, was > applied of < CL nodes. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 15/03/2012, at 1:22 PM, Tyler Hobbs wrote: > > Ah, my mistake, you are correct. Not sure why I had forgotten that. > > The pycassa docs are slightly wrong there, though. It's technically atomic > for the same key across multiple column families. I'll get that fixed. > > On Wed, Mar 14, 2012 at 5:22 PM, A J wrote: >> >> > No, batch_mutate() is an atomic operation. When a node locally applies >> > a batch mutation, either all of the changes are applied or none of them >> > are.< >> The steps in my batch are not confined to a single CF, nor to a single >> key. >> >> The documentation says: >> datastax: >> Column updates are only considered atomic within a given record (row). >> >> Pycassa.batch: >> This interface does not implement atomic operations across column >> families. All the limitations of the batch_mutate Thrift API call >> applies. Remember, a mutation in Cassandra is always atomic per key >> per column family only. >> >> >> On Wed, Mar 14, 2012 at 4:15 PM, Tyler Hobbs wrote: >> > On Wed, Mar 14, 2012 at 11:50 AM, A J wrote: >> >> >> >> >> >> Are you saying the way 'batch mutate' is coded, the order of writes in >> >> the batch does not mean anything ? You can ask the batch to do A,B,C >> >> and then D in sequence; but sometimes Cassandra can end up applying >> >> just C and A,B (and D) may still not be applied ? >> > >> > >> > No, batch_mutate() is an atomic operation. When a node locally applies >> > a >> > batch mutation, either all of the changes are applied or none of them >> > are. >> > >> > Aaron was referring to the possibility that one of the replicas received >> > the >> > batch_mutate, but the other replicas did not. >> > >> > -- >> > Tyler Hobbs >> > DataStax >> > > > > > > -- > Tyler Hobbs > DataStax > >
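The LOG-CF pattern described in this thread can be sketched in a few lines. This is an in-memory toy with plain dicts standing in for CFs and hypothetical helper names, not real Cassandra client code; against a cluster, each numbered step would be its own batch_mutate.

```python
# In-memory sketch of the LOG-CF (write-ahead intent) pattern.
user_cf, state_cf, log_cf = {}, {}, {}

def write_user(userid, state):
    log_cf[userid] = ("PREPARING", state)   # 1st batch: record the intent
    user_cf[userid] = {"state": state}      # 2nd batch: the real writes
    state_cf.setdefault(state, set()).add(userid)
    log_cf[userid] = ("COMPLETED", state)   # 3rd batch: mark done

def repair():
    """Standalone job: re-apply any stale PREPARING records."""
    for userid, (status, state) in log_cf.items():
        if status == "PREPARING":
            user_cf[userid] = {"state": state}
            state_cf.setdefault(state, set()).add(userid)
            log_cf[userid] = ("COMPLETED", state)

write_user("u1", "NY")
assert "u1" in state_cf["NY"] and log_cf["u1"][0] == "COMPLETED"
```

The key property: if a client dies between steps, the intent record survives, so the repair job can finish the write instead of leaving the User and State CFs permanently out of sync. Re-applying a PREPARING record is safe because all the writes are idempotent.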
Re: Does the 'batch' order matter ?
> No, batch_mutate() is an atomic operation. When a node locally applies a > batch mutation, either all of the changes are applied or none of them are.< The steps in my batch are not confined to a single CF, nor to a single key. The documentation says: datastax: Column updates are only considered atomic within a given record (row). Pycassa.batch: This interface does not implement atomic operations across column families. All the limitations of the batch_mutate Thrift API call applies. Remember, a mutation in Cassandra is always atomic per key per column family only. On Wed, Mar 14, 2012 at 4:15 PM, Tyler Hobbs wrote: > On Wed, Mar 14, 2012 at 11:50 AM, A J wrote: >> >> >> Are you saying the way 'batch mutate' is coded, the order of writes in >> the batch does not mean anything ? You can ask the batch to do A,B,C >> and then D in sequence; but sometimes Cassandra can end up applying >> just C and A,B (and D) may still not be applied ? > > > No, batch_mutate() is an atomic operation. When a node locally applies a > batch mutation, either all of the changes are applied or none of them are. > > Aaron was referring to the possibility that one of the replicas received the > batch_mutate, but the other replicas did not. > > -- > Tyler Hobbs > DataStax >
Re: Does the 'batch' order matter ?
hmm, not sure I understand. Are you saying the way 'batch mutate' is coded, the order of writes in the batch does not mean anything ? You can ask the batch to do A, B, C and then D in sequence; but sometimes Cassandra can end up applying just C, and A, B (and D) may still not be applied ? Thanks. On Wed, Mar 14, 2012 at 3:37 AM, aaron morton wrote: > It may, but it would not be guaranteed. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 14/03/2012, at 8:11 AM, A J wrote: > > I know batch operations are not atomic but does the success of a write > imply all writes preceding it in the batch were successful ? > > For example, using cql: > BEGIN BATCH USING CONSISTENCY QUORUM AND TTL 864 > INSERT INTO users (KEY, password, name) VALUES ('user2', > 'ch@ngem3b', 'second user') > UPDATE users SET password = 'ps22dhds' WHERE KEY = 'user2' > INSERT INTO users (KEY, password) VALUES ('user3', 'ch@ngem3c') > DELETE name FROM users WHERE key = 'user2' > INSERT INTO users (KEY, password, name) VALUES ('user4', > 'ch@ngem3c', 'Andrew') > APPLY BATCH; > > Say the batch failed but I see that the third write was present on a > node. Does it imply that the first insert and the second update > definitely made it to that node as well ? > > Thanks. > >
Does the 'batch' order matter ?
I know batch operations are not atomic, but does the success of a write imply all writes preceding it in the batch were successful ? For example, using cql: BEGIN BATCH USING CONSISTENCY QUORUM AND TTL 864 INSERT INTO users (KEY, password, name) VALUES ('user2', 'ch@ngem3b', 'second user') UPDATE users SET password = 'ps22dhds' WHERE KEY = 'user2' INSERT INTO users (KEY, password) VALUES ('user3', 'ch@ngem3c') DELETE name FROM users WHERE key = 'user2' INSERT INTO users (KEY, password, name) VALUES ('user4', 'ch@ngem3c', 'Andrew') APPLY BATCH; Say the batch failed but I see that the third write was present on a node. Does it imply that the first insert and the second update definitely made it to that node as well ? Thanks.
Why is row lookup much faster than column lookup
From my tests, I am seeing that a CF that has fewer than 100 columns but millions of rows has a much lower latency to read a column in a row than a CF that has only a few thousand rows, but wide rows with each having 20K columns. Example: cf1 has 6 million rows and each row has about 100 columns. t1 = time.time() cf1.get(1234,column_count=1) t2 = time.time() - t1 print int(t2*1000) takes 3 ms. cf2 has 5K rows and each row has about 18K columns. t1 = time.time() cf2.get(1234,column_count=1) t2 = time.time() - t1 print int(t2*1000) takes 82 ms. Is there anything in the Cassandra architecture in general that causes row lookup to be much faster than column lookup ? Thanks.
Single column read latency
Hello, In a CF I have with valueless columns and column-name type being integer, I am seeing latency in the order of 80-90ms to retrieve a single column from a row containing 50K columns. It is just a single node db on a single box. Another row with 20K columns in the same CF, still has the latency around 30ms to get to a single column. Example in pycassa, for a row with 50K columns: t1 = time.time() cf1.get(5011,columns=[90006111]) t2 = time.time() - t1 print int(t2*1000),'ms' gives 82 ms Any idea what could be causing the latency to be so high ? That too after ensuring that the row_cache is large enough to contain all the rows and all the rows are pre-fetched. Thanks.
Test Data creation in Cassandra
What is the best way to create millions of rows of test data in Cassandra ? I would like to have some script where I first insert, say, 100 rows in a CF, then reinsert the same data on the 'server side' with new unique keys. That will make it 200 rows. Then continue the exercise a few times till I get a lot of records. I don't care if the column names and values are identical between the different rows; just a lot of records generated from a few seed records. The rows are very fat, so I don't want to use any client-side scripting that would push individual or batched rows to Cassandra. Thanks for any tips.
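A sketch of the doubling idea (in-memory only: a dict stands in for the CF; against a real cluster the copy step would have to run server-side, e.g. via a bulk-load tool, to avoid shipping the fat rows through a client):

```python
# Seed a few rows, then repeatedly re-insert them under fresh keys.
import itertools

cf = {i: {"col": "seed"} for i in range(100)}   # 100 seed rows
keygen = itertools.count(100)                   # fresh unique keys

for _ in range(4):                              # 4 doublings: 100 -> 1600
    for row in list(cf.values()):               # snapshot before mutating
        cf[next(keygen)] = dict(row)

print(len(cf))  # 1600
```

Since each pass doubles the row count, millions of rows only take around 14-15 doublings from a 100-row seed.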
Logging 'write' operations
Hello, What is the best way to log write operations (insert,remove, counter add, batch operations) in Cassandra. I need to store the operations (with values being passed) in some fashion or the other for audit purposes (and possibly to undo some operation after inspection). Thanks.
batch mode and flushing
Hello, when you set 'commitlog_sync: batch' on all the nodes in a multi-DC cluster and call writes with CL=ALL, does the operation wait till the write is flushed to all the disks on all the nodes ? Thanks.
Re: Command to display config values
Yes, I can see the yaml files. But I need to confirm through some database query that the change in the yaml was picked up by the database on node restart. On Tue, Jan 24, 2012 at 7:07 PM, aaron morton wrote: > Nothing through those APIs, can you check the yaml file ? > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 25/01/2012, at 10:10 AM, A J wrote: > > Is there a command in cqlsh or cassandra CLI that can display the > various values of the configuration parameters in use? > I am particularly interested in finding the value of 'commitlog_sync' > that the current session is using ? > > Thanks. > AJ > >
Command to display config values
Is there a command in cqlsh or the cassandra CLI that can display the values of the various configuration parameters in use ? I am particularly interested in finding the value of 'commitlog_sync' that the current session is using. Thanks. AJ
Encryption related question
Hello, I am trying to use internode encryption in Cassandra (1.0.6) for the first time. 1. Followed steps 1 to 5 at http://download.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore Q. In cassandra.yaml, what value goes for keystore ? I exported the certificate per step #3 above into duke.cer. Do I put the location and name of that file for this parameter ? Similarly, what value goes for truststore ? Steps 1-5 don't indicate any other file to be exported that would possibly go here. Also, do I need to follow these steps on each of the nodes ? Thanks AJ
Restart for change of endpoint_snitch ?
If I change endpoint_snitch from SimpleSnitch to PropertyFileSnitch, does it require restart of cassandra on that node ? Thanks.
Re: setStrategy_options syntax in thrift
Thanks, that worked. On Tue, Dec 20, 2011 at 4:08 PM, Dave Brosius wrote: > > KsDef ksDef = new KsDef(); > Map options = new HashMap(); > options.put("replication_factor", "2"); > ksDef.setStrategy_options(options); > > > > *- Original Message -* > *From:* "A J" > *Sent:* Tue, December 20, 2011 16:03 > *Subject:* Re: setStrategy_options syntax in thrift > > I am new to Java. Can you specify the exact syntax for replication_factor=2 ? > > Thanks. > > On Tue, Dec 20, 2011 at 1:50 PM, aaron morton wrote: > It looks like you tried to pass the string "{replication_factor:2}" > > You need to pass a Map type, where the key is the option name > and the value is the option value. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 20/12/2011, at 12:02 PM, A J wrote: > What is the syntax of setStrategy_options in thrift ? > The following fails: > Util.java:22: setStrategy_options(java.util.Map) in org.apache.cassandra.thrift.KsDef cannot be applied to (java.lang.String) > newKs.setStrategy_options("{replication_factor:2}");
Re: java thrift error
The following worked: import java.nio.ByteBuffer; import java.nio.CharBuffer; import java.nio.charset.Charset; import java.nio.charset.CharacterCodingException; import java.nio.charset.CharsetDecoder; import java.nio.charset.CharsetEncoder; import java.nio.charset.CodingErrorAction; . public static Charset charset = Charset.forName("UTF-8"); public static CharsetEncoder encoder = charset.newEncoder(); public static CharsetDecoder decoder = charset.newDecoder(); public static ByteBuffer str_to_bb(String msg){ try{ return encoder.encode(CharBuffer.wrap(msg)); }catch(Exception e){e.printStackTrace();} return null; } and then instead of "count_key".getBytes("UTF-8") do str_to_bb("count_key") On Tue, Dec 20, 2011 at 4:03 PM, Dave Brosius wrote: > A ByteBuffer is not a byte[] to convert a String to a ByteBuffer do > something like > > public static ByteBuffer toByteBuffer(String value) throws > UnsupportedEncodingException{return > ByteBuffer.wrap(value.getBytes("UTF-8"));} > > > > see http://wiki.apache.org/cassandra/ThriftExamples > > > *- Original Message -* > *From:* "A J" > *Sent:* Tue, December 20, 2011 15:52 > *Subject:* java thrift error > > The following syntax : > import org.apache.cassandra.thrift.*; > . > . > ColumnOrSuperColumn col = client.get("count_key".getBytes("UTF-8"), > cp, ConsistencyLevel.QUORUM); > > > is giving the error: > get(java.nio.ByteBuffer,org.apache.cassandra.thrift.ColumnPath,org.apache.cassandra.thrift.ConsistencyLevel) > in org.apache.cassandra.thrift.Cassandra.Client cannot be applied to > (byte[],org.apache.cassandra.thrift.ColumnPath,org.apache.cassandra.thrift.ConsistencyLevel) > > > Any idea on how to cast? > > Thanks. > >
Re: setStrategy_options syntax in thrift
I am new to Java. Can you specify the exact syntax for replication_factor=2 ? Thanks. On Tue, Dec 20, 2011 at 1:50 PM, aaron morton wrote: > It looks like you tried to pass the string "{replication_factor:2}" > > You need to pass a Map type, where the key is the option name > and the value is the option value. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 20/12/2011, at 12:02 PM, A J wrote: > > What is the syntax of setStrategy_options in thrift ? > > The following fails: > > Util.java:22: > setStrategy_options(java.util.Map) > in org.apache.cassandra.thrift.KsDef cannot be applied to > (java.lang.String) > newKs.setStrategy_options("{replication_factor:2}"); > >
java thrift error
The following syntax : import org.apache.cassandra.thrift.*; . . ColumnOrSuperColumn col = client.get("count_key".getBytes("UTF-8"), cp, ConsistencyLevel.QUORUM); is giving the error: get(java.nio.ByteBuffer,org.apache.cassandra.thrift.ColumnPath,org.apache.cassandra.thrift.ConsistencyLevel) in org.apache.cassandra.thrift.Cassandra.Client cannot be applied to (byte[],org.apache.cassandra.thrift.ColumnPath,org.apache.cassandra.thrift.ConsistencyLevel) Any idea on how to cast? Thanks.
setStrategy_options syntax in thrift
What is the syntax of setStrategy_options in thrift. The following fails: Util.java:22: setStrategy_options(java.util.Map) in org.apache.cassandra.thrift.KsDef cannot be applied to (java.lang.String) newKs.setStrategy_options("{replication_factor:2}");
each_quorum in pycassa
What is the syntax for each_quorum in pycassa ? Thanks.
Increase replication factor
If I update a keyspace to increase the replication factor, what happens to the existing data in that keyspace ? Does the existing data automatically get its replication increased ? Or does the existing data only gain additional replicas on read repair or node repair ? Thanks.
garbage collecting tombstones
Hello, Is 'garbage collecting tombstones' a different operation from the JVM GC ? Garbage collecting of tombstones is controlled by gc_grace_seconds, which by default is set to 10 days, but the traditional GC seems to happen much more frequently (when observed through jconsole). How can I force tombstone garbage collection to happen ad hoc, when I want it to ? Thanks.
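For clarity: gc_grace_seconds is a per-CF schema setting about compaction discarding tombstones, entirely unrelated to JVM garbage collection. Its default value checks out as 10 days:

```python
# gc_grace_seconds default: 864000 seconds == 10 days.
gc_grace_seconds = 864000
print(gc_grace_seconds // (24 * 3600))  # 10
```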
Re: Continuous export of data out of database
The issue with that is that I wish to have EACH_QUORUM for our other 2 datacenters but not for the third DC. I could not figure out a way to accomplish that, so I am exploring having a near-realtime backup copy in the third DC via some streaming process. On Tue, Nov 15, 2011 at 12:12 PM, Robert Jackson wrote: > The thing that I thought of initially would be setting up your cluster in a > multi-datacenter config[1]. In that scenario you could add an additional > machine in a second datacenter with RF=1. We are using a variant of this > setup to separate long-running calculations from our interactive systems. > [1] - > http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers > > Robert Jackson >
Continuous export of data out of database
Hello VoltDB has an export feature to stream the data out of the database. http://voltdb.com/company/blog/voltdb-export-connecting-voltdb-other-systems This is different from Cassandra's export feature (http://wiki.apache.org/cassandra/Operations#Import_.2BAC8_export) which is more of a different way of snapshotting. My question is : is streaming data out on a continuous basis (as in VoltDB) possible in some fashion in Cassandra ? Thanks Bala
Re: (A or B) AND C AND !D
To clarify, I wish to keep N=4 and W=2 in the following scenario. Thanks. On Sun, Nov 13, 2011 at 11:20 PM, A J wrote: > Hello > Say I have 4 nodes: A, B, C and D and wish to have the consistency level > for writes defined in such a way that writes meet the following > consistency level: > (A or B) AND C AND !D, > i.e. either of A or B will suffice, and C is to be included in the > consistency level as well. But the write should not wait for D. > > Is such a configuration possible ? > I tried various combinations of EACH_QUORUM and LOCAL_QUORUM and > clubbing the nodes in different DCs but could not really come up with > a solution. Maybe I am missing something. > > Thanks. >
(A or B) AND C AND !D
Hello Say I have 4 nodes: A, B, C and D and wish to have the consistency level for writes defined in such a way that writes meet the following consistency level: (A or B) AND C AND !D, i.e. either of A or B will suffice, and C is to be included in the consistency level as well. But the write should not wait for D. Is such a configuration possible ? I tried various combinations of EACH_QUORUM and LOCAL_QUORUM and clubbing the nodes in different DCs but could not really come up with a solution. Maybe I am missing something. Thanks.
OOM
Hi, For a single node of cassandra(1.0 version) having 15G of data+index, 48GB RAM, 8GB heap and about 2.6G memtable threshold, I am getting OOM when I have 1000 concurrent inserts happening at the same time. I have kept concurrent_writes: 128 in cassandra.yaml as there are a total of 16 cores (suggestion is to keep 8 * number_of_cores). Can someone give pointers on what needs to be tuned. Thanks, AJ. ERROR 00:10:00,312 Fatal exception in thread Thread[Thread-3,5,main] java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:614) at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1336) at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:105) at org.apache.cassandra.thrift.CassandraDaemon$ThriftServer.run(CassandraDaemon.java:213)
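One pointer on the stack trace above: "unable to create new native thread" indicates native (off-heap) memory or OS thread limits, not heap exhaustion, and the Thrift server in this setup spawns one thread per client connection, each of which reserves a stack. A back-of-envelope sketch of the arithmetic (the per-thread stack size is an assumed value; check the JVM's -Xss flag and `ulimit -u` on the node):

```python
# Rough native-memory estimate for thread stacks (illustrative numbers).
# 1000 concurrent Thrift connections => 1000 threads, each reserving a
# stack of -Xss bytes outside the Java heap.
def stack_memory_mb(concurrent_connections, xss_kb=180):
    # xss_kb is an ASSUMED per-thread stack size; real deployments
    # commonly set -Xss in the 128k-1024k range.
    return concurrent_connections * xss_kb / 1024

print(stack_memory_mb(1000))
```

Even when the total is modest, the OS per-user thread/process limit (`ulimit -u`) can make thread creation fail first, which matches the exception seen here.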
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
Just confirming. Thanks for the clarification. On Tue, Jul 12, 2011 at 10:53 AM, Peter Schuller wrote: >> From "Cassandra the definitive guide" - Basic Maintenance - Repair >> "Running nodetool repair causes Cassandra to execute a major compaction. >> During a major compaction (see “Compaction” in the Glossary), the >> server initiates a >> TreeRequest/TreeReponse conversation to exchange Merkle trees with >> neighboring >> nodes." >> >> So is this text from the book misleading ? > > It's just being a bit less specific (I suppose maybe misleading can be > claimed). If you repair everything on a node, that will imply a > validating compaction (i.e., do the read part of the compaction stage > but don't merge to and write new sstables) which is expensive for the > usual reasons with disk I/O; it's "major" since it covers all data. > The data read is in fact used to calculate a merkle tree for > comparison with neighbors, as claimed. > > -- > / Peter Schuller >
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
From "Cassandra the definitive guide" - Basic Maintenance - Repair "Running nodetool repair causes Cassandra to execute a major compaction. During a major compaction (see “Compaction” in the Glossary), the server initiates a TreeRequest/TreeReponse conversation to exchange Merkle trees with neighboring nodes." So is this text from the book misleading ? On Fri, Jul 8, 2011 at 10:36 AM, Jonathan Ellis wrote: > that's an internal term meaning "background i/o," not sstable merging per se. > > On Fri, Jul 8, 2011 at 9:24 AM, A J wrote: >> I think node repair involves some compaction too. See the issue: >> https://issues.apache.org/jira/browse/CASSANDRA-2811 >> It talks of 'validation compaction' being triggered concurrently >> during node repair. >> >> On Thu, Jun 30, 2011 at 8:51 PM, Watanabe Maki >> wrote: >>> Repair doesn't compact. Those are different processes already. >>> >>> maki >>> >>> >>> On 2011/07/01, at 7:21, A J wrote: >>> >>>> Thanks all ! >>>> In other words, I think it is safe to say that a node as a whole can >>>> be made consistent only on 'nodetool repair'. >>>> >>>> Has there been enough interest in providing anti-entropy without >>>> compaction as a separate operation (nodetool repair does both) ? >>>> >>>> >>>> On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis wrote: >>>>> On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo >>>>> wrote: >>>>>> Read repair does NOT repair tombstones. >>>>> >>>>> It does, but you can't rely on RR to repair _all_ tombstones, because >>>>> RR only happens if the row in question is requested by a client. >>>>> >>>>> -- >>>>> Jonathan Ellis >>>>> Project Chair, Apache Cassandra >>>>> co-founder of DataStax, the source for professional Cassandra support >>>>> http://www.datastax.com >>>>> >>> >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >
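The Merkle-tree exchange discussed in the thread above can be illustrated with a toy version. This is only the idea, not Cassandra's implementation: the token-range bucketing and hashing are deliberately simplified.

```python
import hashlib

def bucket(key, buckets):
    # Deterministic token-range assignment for the toy example.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets

def leaf_hashes(rows, buckets=4):
    # Each replica hashes the rows of each token range into a leaf
    # digest; these leaves are what the Merkle tree summarizes.
    leaves = [hashlib.sha256() for _ in range(buckets)]
    for key in sorted(rows):
        leaves[bucket(key, buckets)].update(f"{key}={rows[key]}".encode())
    return [h.hexdigest() for h in leaves]

def differing_ranges(rows_a, rows_b, buckets=4):
    # Only the ranges whose digests differ need their data streamed,
    # which is why repair avoids shipping the whole dataset.
    a, b = leaf_hashes(rows_a, buckets), leaf_hashes(rows_b, buckets)
    return [i for i in range(buckets) if a[i] != b[i]]

replica1 = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica2 = {"k1": "v1", "k2": "STALE", "k3": "v3"}
print(differing_ranges(replica1, replica2))  # only the range holding k2 differs
```

The "validation compaction" mentioned in the thread is the read pass each replica makes over its data to compute these digests; the expensive part is reading, not merging SSTables.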
Node repair questions
Hello, Have the following questions related to nodetool repair: 1. I know that Nodetool Repair Interval has to be less than GCGraceSeconds. How do I come up with an exact value of GCGraceSeconds and 'Nodetool Repair Interval'. What factors would want me to change the default of 10 days of GCGraceSeconds. Similarly what factors would want me to keep Nodetool Repair Interval to be just slightly less than GCGraceSeconds (say a day less). 2. Does a Nodetool Repair block any reads and writes on the node, while the repair is going on ? During repair, if I try to do an insert, will the insert wait for repair to complete first ? 3. I read that repair can impact your workload as it causes additional disk and cpu activity. But any details of the impact mechanism and any ballpark on how much the read/write performance deteriorates ? Thanks.
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
Never mind. I see the issue with this. I will be able to catch the writes as failed only if I set CL=ALL. For other CLs, I may not know that it failed on some node. On Mon, Jul 11, 2011 at 2:33 PM, A J wrote: > Instead of doing nodetool repair, is it not a cheaper operation to > keep tab of failed writes (be it deletes or inserts or updates) and > read these failed writes at a set frequency in some batch job ? By > reading them, RR would get triggered and they would get to a > consistent state. > > Because these would targeted reads (only for those that failed during > writes), it should be a shorter list and quick to repair (than > nodetool repair). > > > On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis wrote: >> On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo >> wrote: >>> Read repair does NOT repair tombstones. >> >> It does, but you can't rely on RR to repair _all_ tombstones, because >> RR only happens if the row in question is requested by a client. >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >> >
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
Instead of doing nodetool repair, is it not a cheaper operation to keep tab of failed writes (be it deletes or inserts or updates) and read these failed writes at a set frequency in some batch job ? By reading them, RR would get triggered and they would get to a consistent state. Because these would targeted reads (only for those that failed during writes), it should be a shorter list and quick to repair (than nodetool repair). On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis wrote: > On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo > wrote: >> Read repair does NOT repair tombstones. > > It does, but you can't rely on RR to repair _all_ tombstones, because > RR only happens if the row in question is requested by a client. > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
I think node repair involves some compaction too. See the issue: https://issues.apache.org/jira/browse/CASSANDRA-2811 It talks of 'validation compaction' being triggered concurrently during node repair. On Thu, Jun 30, 2011 at 8:51 PM, Watanabe Maki wrote: > Repair doesn't compact. Those are different processes already. > > maki > > > On 2011/07/01, at 7:21, A J wrote: > >> Thanks all ! >> In other words, I think it is safe to say that a node as a whole can >> be made consistent only on 'nodetool repair'. >> >> Has there been enough interest in providing anti-entropy without >> compaction as a separate operation (nodetool repair does both) ? >> >> >> On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis wrote: >>> On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo >>> wrote: >>>> Read repair does NOT repair tombstones. >>> >>> It does, but you can't rely on RR to repair _all_ tombstones, because >>> RR only happens if the row in question is requested by a client. >>> >>> -- >>> Jonathan Ellis >>> Project Chair, Apache Cassandra >>> co-founder of DataStax, the source for professional Cassandra support >>> http://www.datastax.com >>> >
'select * from ' - FTS or Index
Does a 'select * from ' with no filter still use the primary index on the key or do a 'full table scan' ? Thanks.
What does a write lock ?
Does a write lock: 1. Just the columns in question for the specific row in question ? 2. The full row in question ? 3. The full CF ? I doubt read does any locks. Thanks.
List nodes where write was applied to
Is there a way to find what all nodes was a write applied to ? It could be a successful write (i.e. w was met) or unsuccessful write (i.e. less than w nodes were met). In either case, I am interested in finding: Number of nodes written to (before timeout or on success) Name of nodes written to (before timeout or on success) Thanks.
When is 'Cassandra High Performance Cookbook' expected to be available ?
https://www.packtpub.com/cassandra-apache-high-performance-cookbook/book
Re: deduce token values for BOP
Thanks. The above works. But when I try to use the binary values rather than the hex values, it does not work. i.e. instead of using 64ff, I use 01100100. Instead of 6Dff, I use 01101101. When using the binary values, everything (strings starting with a to z) seem to be going to n1 only. Any idea why ? On Wed, Jul 6, 2011 at 11:18 AM, Richard Low wrote: > On Wed, Jul 6, 2011 at 3:06 PM, A J wrote: >> I wish to use the order preserving byte-ordered partitioner. How do I >> figure the initial token values based on the text key value. >> Say I wish to put all keys starting from a to d on N1. e to m on N2 >> and n to z on N3. What would be the initial_token values on each of >> the 3 nodes to accomplish this ? > > If all keys use characters a-z then the following will work: > > N1: 64ff > N2: 6dff > N3: 7aff > > (64, 6d and 7a are hex for ascii codes of d, m, z). Here the key (in > hex) 64 will go to N2 even though it starts with d. But every > string that starts with a-d with only characters a-z afterwards will > go to N1. > > Richard. > > -- > Richard Low > Acunu | http://www.acunu.com | @acunu >
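The behaviour with the "01100100"-style tokens is explained by how BOP tokens are parsed: initial_token is a hex-encoded byte string, so "01100100" is not the binary for 'd' (0x64) but the four bytes 01 10 01 00, which sort below every lowercase letter (0x61-0x7a). All a-z keys therefore fall past the highest token and wrap around to the first node. A toy router (my own sketch, not Cassandra code) shows the effect:

```python
# BOP compares keys and tokens as raw byte strings. A key belongs to the
# first node whose token is >= the key's bytes; keys beyond the highest
# token wrap around to the node with the lowest token.
def node_for_key(key, tokens):
    # tokens: list of (hex_token, node) sorted by token.
    key_bytes = key.encode("ascii")
    for hex_token, node in tokens:
        if key_bytes <= bytes.fromhex(hex_token):
            return node
    return tokens[0][1]  # wrap-around

hex_ring = [("64ff", "n1"), ("6dff", "n2"), ("7aff", "n3")]   # d, m, z
bin_ring = [("01100100", "n1"), ("01101101", "n2"), ("01111010", "n3")]

print([node_for_key(k, hex_ring) for k in ("apple", "mango", "zebra")])
# -> ['n1', 'n2', 'n3']
print([node_for_key(k, bin_ring) for k in ("apple", "mango", "zebra")])
# -> ['n1', 'n1', 'n1'], matching the observed behaviour
```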
deduce token values for BOP
I wish to use the order preserving byte-ordered partitioner. How do I figure the initial token values based on the text key value. Say I wish to put all keys starting from a to d on N1. e to m on N2 and n to z on N3. What would be the initial_token values on each of the 3 nodes to accomplish this ? Thanks.
Details of 'nodetool move'
Hello, Where can I find details of nodetool move ? Most places just mention 'move the target node to a given Token. Moving is essentially a convenience over decommission + bootstrap.' Things like: when do I need to run it, and on what nodes ? What value of 'new token' should be provided ? What happens if there is a mismatch between the 'new token' in the nodetool move command and initial_token in cassandra.yaml ? What happens when nodetool move is not successful ? Does Cassandra know where to look for data (some data might still be on the old node and some on the new) ? What are the repercussions of not running nodetool move, or running it incorrectly ? Does a Read Repair take care of the move for the specific key in question ? Does anti-entropy somehow take care of the move ? Thanks.
Re: How to make a node an exact replica of another node ?
Perfect ! Thanks. On Tue, Jul 5, 2011 at 1:51 PM, Eric tamme wrote: > AJ, > > You can use offset mirror tokens to achieve this. Pick your initial > tokens for DC1N1 and DC1N2 as if they were the only nodes in your > cluster. Now increment each by 1 and use them as the tokens for DC2N1 > and DC2N2. This will give you a complete keyspace within each data > center with even distribution between nodes. > > > If you want a more detailed description, there is a recipe for this > titled "Calculating Ideal Initial Tokens for use with Network Topology > Strategy and Random Partitioner" in the last part of the sample > chapter of "Cassandra High Performance" book > http://www.packtpub.com/cassandra-apache-high-performance-cookbook/book > > -Eric >
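The offset-mirror recipe described here is straightforward to compute for RandomPartitioner's 0..2**127 token space (a sketch; the function and variable names are mine):

```python
# Offset-mirror tokens for two 2-node data centers (RandomPartitioner,
# token space 0 .. 2**127). DC2 gets DC1's tokens + 1, so each DC holds
# a complete, evenly distributed copy of the keyspace and each DC1 node
# has an exact counterpart in DC2.
RING = 2 ** 127

def balanced_tokens(num_nodes):
    return [i * RING // num_nodes for i in range(num_nodes)]

dc1 = balanced_tokens(2)              # DC1_n1, DC1_n2
dc2 = [(t + 1) % RING for t in dc1]   # DC2_n1, DC2_n2 mirror them

print(dc1)  # [0, 85070591730234615865843651857942052864]
print(dc2)  # [1, 85070591730234615865843651857942052865]
```

Because the +1 offset is negligible relative to the ring size, each DC still gets an even 50/50 split between its two nodes.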
How to make a node an exact replica of another node ?
Hello, Let me explain what I am trying to do: I am prototyping 2 Data centers (DC1 and DC2) with two nodes each. Say DC1_n1 and DC1_n2 nodes in DC1 and DC2_n1 and DC2_n2 in DC2. With PropertyFileSnitch and NetworkTopologyStrategy and 'strategy_options of DC1=1 and DC2=1', I am able to ensure that each DC has ONE AND ONLY ONE copy of ALL the keys. But how do I ensure that there is a one-to-one mapping between the nodes ? i.e. how do I ensure keys in DC1_n1 = keys in DC2_n1 and keys in DC1_n2 = keys in DC2_n2 ? Right now some keys of DC1_n1 are getting replicated to DC2_n1 and some to DC2_n2 (and the same for DC1_n2 keys). I am using RPP and calculating tokens to ensure that each node has 25% of the load (using the python script at http://www.datastax.com/docs/0.8/operations/clustering to come up with the tokens) Thanks for any inputs.
Re: cql error
Thanks. That worked. On Tue, Jul 5, 2011 at 11:35 AM, Jonathan Ellis wrote: > replace the s_o line with > > and strategy_options:DC1=1 and strategy_options:DC2=2 > > On Tue, Jul 5, 2011 at 10:09 AM, A J wrote: >> cqlsh> CREATE KEYSPACE twissandra with >> ... strategy_class = >> ... 'org.apache.cassandra.locator.NetworkTopologyStrategy' >> ... and strategy_options=[{DC1:1, DC2:1}]; >> Bad Request: line 4:37 no viable alternative at character ']' >> >> >> What is wrong with the above syntax ? >> >> Thanks. >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >
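Assembled in full, the statement from the reply becomes the following (this uses the colon-style option syntax of that era's CQL, since the map-literal form in the original attempt is not supported; note the reply's "DC2=2" appears to be a typo for the asker's intended DC2:1, which is what is shown here):

```sql
CREATE KEYSPACE twissandra
  WITH strategy_class = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  AND strategy_options:DC1=1
  AND strategy_options:DC2=1;
```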
cql error
cqlsh> CREATE KEYSPACE twissandra with ... strategy_class = ... 'org.apache.cassandra.locator.NetworkTopologyStrategy' ... and strategy_options=[{DC1:1, DC2:1}]; Bad Request: line 4:37 no viable alternative at character ']' What is wrong with the above syntax ? Thanks.
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
Thanks all ! In other words, I think it is safe to say that a node as a whole can be made consistent only on 'nodetool repair'. Has there been enough interest in providing anti-entropy without compaction as a separate operation (nodetool repair does both) ? On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis wrote: > On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo > wrote: >> Read repair does NOT repair tombstones. > > It does, but you can't rely on RR to repair _all_ tombstones, because > RR only happens if the row in question is requested by a client. > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >
Meaning of 'nodetool repair has to run within GCGraceSeconds'
I am a little confused about why nodetool repair has to run within GCGraceSeconds. The documentation at: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair is not very clear to me. How can a delete be 'unforgotten' if I don't run nodetool repair? (I understand that if a node is down for more than GCGraceSeconds, I should not bring it up without resyncing it completely. Otherwise deletes may reappear. http://wiki.apache.org/cassandra/DistributedDeletes ) But I am not sure how exactly nodetool repair ties into this mechanism of distributed deletes. Thanks for any clarifications.
api to extract gossiper results
Cassandra uses accrual failure detector to interpret the gossips. Is it somehow possible to extract these (gossip values and results of the failure detector) in an external system ? Thanks
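A simplified version of the accrual calculation helps show what there is to extract: phi expresses how suspicious a silence is given the observed heartbeat inter-arrival times (Cassandra models the gaps as exponentially distributed; this sketch mirrors that idea, and the names and sample values are illustrative). The detector's state is held in the JVM, so JMX (the FailureDetector MBean) is the usual hook for pulling it into an external system.

```python
import math

# Simplified phi-accrual failure detection: phi = -log10(P(no heartbeat
# for t)), with heartbeat gaps modeled as exponential around their mean.
def phi(time_since_last_heartbeat, intervals):
    mean = sum(intervals) / len(intervals)
    return time_since_last_heartbeat / (mean * math.log(10))

intervals = [1.0, 1.1, 0.9, 1.0]       # observed heartbeat gaps (seconds)
print(round(phi(1.0, intervals), 2))   # heartbeat on time -> low suspicion
print(round(phi(20.0, intervals), 2))  # 20s of silence -> well above the
                                       # default convict threshold of 8
```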
Chunking if size > 64MB
From what I read, Cassandra allows a single column value to be up to 2GB but would chunk the data if greater than 64MB. Is the chunking transparent to the application, or does the app need to know if/how/when the chunking happened for a specific column value that happened to be > 64MB ? Thank you.
Data storage security
Are there any options to encrypt the column families when they are stored in the database. Say in a given keyspace some CF has sensitive info and I don't want a 'select *' of that CF to layout the data in plain text. Thanks.
Re: Clock skew
Thanks. On Tue, Jun 28, 2011 at 1:31 PM, Dominic Williams wrote: > Hi, yes you are correct, and this is a potential problem. > IMPORTANT: If you need to serialize writes from your application servers, > for example using distributed locking, then before releasing locks you must > sleep for a period equal to the maximum variance between the clocks on your > application server nodes. > I had a problem with the clocks on my nodes which led to all kinds of > problems. There is a slightly out of date post, which may not mentioned the > above point, on my experiences > here http://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/ > Hope this helps > Dominic > On 27 June 2011 23:03, A J wrote: >> >> During writes, the timestamp field in the column is the system-time of >> that node (correct me if that is not the case and the system-time of >> the co-ordinator is what gets applied to all the replicas). >> During reads, the latest write wins. >> >> What if there is a clock skew ? It could lead to a stale write >> over-riding the actual latest write, just because the clock of that >> node is ahead of the other node. Right ? > >
Clock skew
During writes, the timestamp field in the column is the system-time of that node (correct me if that is not the case and the system-time of the co-ordinator is what gets applied to all the replicas). During reads, the latest write wins. What if there is a clock skew ? It could lead to a stale write over-riding the actual latest write, just because the clock of that node is ahead of the other node. Right ?
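Yes, that is the failure mode. A sketch of it (timestamps in microseconds; the resolution function is a rough simplification of Cassandra's last-write-wins reconcile, with ties broken by value):

```python
# Last-write-wins on column timestamps: a write stamped by a fast clock
# beats a genuinely later write stamped by a correct clock.
def resolve(versions):
    # versions: list of (timestamp, value); highest timestamp wins,
    # ties broken by comparing values (roughly as Cassandra does).
    return max(versions, key=lambda v: (v[0], v[1]))[1]

skew = 30_000_000                      # node B's clock runs 30s fast (us)
t = 1_000_000_000
write_old = (t + skew, "stale")        # earlier in real time, fast clock
write_new = (t + 5_000_000, "fresh")   # 5s later in real time, correct clock
print(resolve([write_old, write_new])) # the skewed, stale write wins
```

This is why keeping NTP tight across nodes (and, as the reply notes, across application servers that serialize writes) matters.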
Auto compaction to be staggered ?
Is there an enhancement on the roadmap to stagger the auto compactions on different nodes, to avoid more than one node compacting at any given time (or as few nodes as possible to compact at any given time). If not, any workarounds ? Thanks.
Re: Force a node to form part of quorum
It would be great if Cassandra puts this on their roadmap. There are a lot of durability benefits to incorporating DC awareness into the write consistency equation. MongoDB has this feature in their upcoming release: http://www.mongodb.org/display/DOCS/Data+Center+Awareness#DataCenterAwareness-Tagging%28version1.9.1%29 On Thu, Jun 16, 2011 at 6:57 AM, aaron morton wrote: > Short answer: No. > Medium answer: No, all nodes are equal. It could create a single point of > failure if a QUORUM could not be formed without a specific node. > Writes are sent to every replica. Reads with Read Repair enabled are also > sent to every replica. For reads the "closest" UP node as determined by the > snitch and possibly re-ordered by the Dynamic Snitch is asked to return the > actual data. This replica must respond for the request to complete. > If it's a question about maximising cache hits > see https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L308 > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > On 16 Jun 2011, at 05:58, A J wrote: > > Is there a way to favor a node to always participate (or never > participate) towards fulfillment of read consistency as well as write > consistency ? > > Thanks > AJ > >
Force a node to form part of quorum
Is there a way to favor a node to always participate (or never participate) towards fulfillment of read consistency as well as write consistency ? Thanks AJ
Backup full cluster
Snapshot runs on a local node. How do I ensure I have a 'point in time' snapshot of the full cluster ? Do I have to stop the writes on the full cluster and then snapshot all the nodes individually ? Thanks.
Specifying exact nodes for Consistency Levels
Is it possible in some way to specify what specific nodes I want to include (or exclude) from the Consistency Level fulfillment ? Example, I have a cluster of 4 nodes (n1,n2,n3 and n4) and set N=4. I want to set W=3 and want to ensure that it is n1,n2 and n3 only that are used to satisfy w=3 (i.e. n4 is excluded for w=3 calculation). Thanks.
Host score calculation for dynamic_snitch
dynamic_snitch seems to do host score calculation to figure the latency of each node. What are the details of this calculation : 1. What is the mechanism to determine latency ? 2. Does it store the calculated scores and use the historical figures to come up with the latest scores ? (You can't just base the latency on a singular ping) 3. Where are these scores stored ? In the cassandra database somewhere ? 4. cassandra.yaml also says that dynamic_snitch_update_interval_in_ms "controls how often to perform the more expensive part of host score calculation". What is the expensive part and how expensive is it? Thanks.
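On question 2, a common shape for such a score is an exponentially-weighted average of observed read latencies, so history is folded into a single number per host rather than kept as raw samples. The sketch below illustrates that idea only; it is not the dynamic snitch's exact formula, and on question 3 the scores live in memory on each node (visible over JMX), not in the database.

```python
# Illustrative exponentially-weighted latency score per host, in the
# spirit of the dynamic snitch. alpha is an assumed decay constant.
class HostScore:
    def __init__(self, alpha=0.75):
        self.alpha = alpha   # weight given to history vs the newest sample
        self.score = None

    def sample(self, latency_ms):
        if self.score is None:
            self.score = latency_ms
        else:
            # fold the new sample into the decaying average
            self.score = self.alpha * self.score + (1 - self.alpha) * latency_ms
        return self.score

h = HostScore()
for latency in (10, 10, 200, 10):  # one slow read should not dominate
    h.sample(latency)
print(round(h.score, 1))
```

The decaying average lets a temporarily slow node recover its ranking as fresh fast samples arrive, which is exactly the behaviour wanted when routing reads to the "closest" replica.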
Re: Introduce Random latency
Thanks! This is a much better way. On Wed, Apr 6, 2011 at 1:27 PM, Scott Brooks wrote: > I'm not sure of your exact requirements for why you would do that > there, but have you thought about putting the delay in the network > instead? > http://stackoverflow.com/questions/614795/simulate-delayed-and-dropped-packets-on-linux > > That SO page has a pretty good description of how you can do it on the > network level without having to modify any source(assuming you are on > linux). > > Scott > > On Wed, Apr 6, 2011 at 11:20 AM, A J wrote: >> I want to run some tests where I incorporate random latency in some of >> the nodes (that varies over time). I think the best place is to place >> a 'sleep (random(10 to 50 seconds)) function' somewhere in the source >> code of cassandra. >> Or maybe the sleep is not random but the seconds to sleep are read >> from an external file on that node. (that I will modify as the tests >> are going on) >> >> What would be a good spot in the source code to include this sleep function ? >> >> Thanks. >> >
Re: LB scenario
I have done some more research. My question now is: 1. From my tests I see that it matters a lot whether the co-ordinator node is local (geographically) to the client or not. I will have a cassandra cluster where nodes will be distributed across the globe. Say 3 in US east_coast, 3 in US west_coast and 3 in europe. Now, if I can help it, I would like to have most of the traffic from my california clients being handled by the west_coast nodes. But in case the west_coast nodes are down or slow, the co-ordinator node can be elsewhere. What is the best strategy to give different weights to different nodes, where some nodes are preferred over the others. Thanks. On Tue, Apr 5, 2011 at 2:23 PM, Peter Schuller wrote: >> Can someone comment on this ? Or is the question too vague ? > > Honestly yeah I couldn't figure out what you were asking ;) What > specifically about the diagram are you trying to convey? > > -- > / Peter Schuller >
Introduce Random latency
I want to run some tests where I incorporate random latency in some of the nodes (that varies over time). I think the best place is to place a 'sleep (random(10 to 50 seconds)) function' somewhere in the source code of cassandra. Or maybe the sleep is not random but the seconds to sleep are read from an external file on that node. (that I will modify as the tests are going on) What would be a good spot in the source code to include this sleep function ? Thanks.
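The read-the-delay-from-a-file variant described above can be prototyped on its own before wiring it into the Cassandra source (the file path and default range here are illustrative, and the demo uses a 10ms delay rather than the 10-50 seconds a real test would):

```python
import os
import random
import tempfile
import time

# Sketch of the delay hook: sleep for a duration read from an external
# file (editable while the test runs), falling back to a random delay
# in [lo, hi] seconds when the file is absent or unparseable.
def injected_delay(path, lo=10, hi=50):
    try:
        with open(path) as f:
            return float(f.read().strip())
    except (OSError, ValueError):
        return random.uniform(lo, hi)

path = os.path.join(tempfile.gettempdir(), "cassandra_delay.txt")
with open(path, "w") as f:
    f.write("0.01")                 # 10ms for the demo
time.sleep(injected_delay(path))    # the line you would drop into the hot path
print("slept", injected_delay(path), "seconds")
```

As the reply notes, tc/netem at the network layer achieves the same effect without modifying the source, and can likewise be adjusted while tests run.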
Re: LB scenario
Can someone comment on this ? Or is the question too vague ? Thanks. On Wed, Mar 30, 2011 at 3:58 PM, A J wrote: > Does the following load balancing scenario look reasonable with cassandra ? > I will not be having any app servers. > > http://dl.dropbox.com/u/7258508/2011-03-30_1542.png > > Thanks. >
pycassa refresh server_list
In the pycassa.pool.ConnectionPool class, I can specify all the nodes in the server_list parameter. But over time, when nodes get decommissioned and new nodes with new IPs get added, how can the server_list parameter be refreshed ? Do I have to modify it manually, or is there a way to update the list automatically ?
Re: Ditching Cassandra
I did an apples-to-apples comparison of the WRITE speed of Mongo vs Cassandra. Benchmarking (Cluster of 4 machines. Each with 4 cores and 15gb ram. 2 in region1 but different zones, 1 in region2 and 1 in region3. Set consistency level of 3, i.e. at least 3 machines have to confirm a write before the write is confirmed as done to the client. Also fsync kept at 10 seconds). Same specs and hardware for both mongo and cassandra. I modified stress.py to work both for mongo and cassandra. Conclusion: Cassandra write speed is twice that of Mongo for the same settings of CL and durability! Though I agree the documentation and ease of use of Mongo is much superior to Cassandra, the write speed and fewer hoops in sharding keep the options in balance. (MongoDB needs an additional Mongos server and config server. Data has to channel through the MongoS to the shards) Because Mongo is commercially backed, you would expect the edges to be smoother on that. Also CL for reads is not possible in MongoDB. So I am expecting reads too to be worse on Mongo (as I will always have to hit the primary to ensure consistency). Not tried testing yet. You also have to consider the fact 10gen can get acquired anytime. On Wed, Mar 30, 2011 at 12:55 AM, Colin wrote: > Thank you Eric. I appreciate it. > > -Original Message- > From: Eric Evans [mailto:eev...@rackspace.com] > Sent: Tuesday, March 29, 2011 11:47 PM > To: user@cassandra.apache.org > Subject: Re: Ditching Cassandra > > On Tue, 2011-03-29 at 20:56 -0500, Colin wrote: >> Are you saying that 8 will or will not be compatible with 7? > > You will be able to perform a rolling upgrade from 0.7.x to 0.8. That is to > say, you'll be able to upgrade each node one at a time, mixing 0.7 and 0.8 > nodes until the upgrade is complete. > >> If not, would you recommend waiting until 8? We have done an awful >> lot of work, have an awful lot of work left, and have become very >> frustrated.
> > If you're interested in exploring the CQL route, and if your time-line > permits it, that's what I would do. > >> Any idea on when 8 will be available? > > Provided nothing crops up, we'll freeze on April 11th (2 weeks from > yesterday), and release on the week of May 9th (4 weeks later). > > -- > Eric Evans > eev...@rackspace.com > > >
LB scenario
Does the following load balancing scenario look reasonable with cassandra ? I will not be having any app servers. http://dl.dropbox.com/u/7258508/2011-03-30_1542.png Thanks.
Re: International language implementations
Example, taobao.com is a Chinese online bidding site. All data is Chinese and they use MongoDB successfully. Are there similar installations of cassandra where the data is non-Latin ? I know in theory it should all work, as cassandra has full UTF-8 support. But unless there are real implementations, you cannot be sure of the issues related to storing, sorting etc.. Regards. On Tue, Mar 29, 2011 at 5:41 PM, Peter Schuller wrote: >> Can someone list some of the current international language >> implementations of cassandra ? > > What is an "international language implementation of Cassandra"? > > -- > / Peter Schuller >
International language implementations
Can someone list some of the current international language implementations of cassandra ? Thanks.
Re: EC2 - 2 regions
OK, I will test again and let you know. SD - to secure the data stream between EC2 regions, can we not just set up a VPN in EC2 with this patch ? On Wed, Mar 23, 2011 at 8:50 PM, Milind Parikh wrote: > My nodetool repair does not hang. That's why I'm curious. > > /*** > sent from my android...please pardon occasional typos as I respond @ the > speed of thought > / > > On Mar 23, 2011 2:54 PM, "A J" wrote: > > 7000 and 9160 are accessible. Don't think I need other ports for basic > setup, right ? > If anyone could get 'nodetool repair' working with this patch (across > regions), let me know. It may be I am doing something wrong. > > On Wed, Mar 23, 2011 at 1:08 AM, Milind Parikh > wrote: >> @aj >> are you sure...
Re: EC2 - 2 regions
7000 and 9160 are accessible. Don't think I need other ports for the basic setup, right ? If anyone could get 'nodetool repair' working with this patch (across regions), let me know. It may be I am doing something wrong. On Wed, Mar 23, 2011 at 1:08 AM, Milind Parikh wrote: > @aj > are you sure that all ports are accessible from all nodes? > > @sasha > I think that being able to have the semantics of a NAT address can > enable security from a different perspective. Describing an overlay nw will > take long here. But that may solve your security concerns over the internet. > > /*** > sent from my android...please pardon occasional typos as I respond @ the > speed of thought > / > > On Mar 22, 2011 11:00 AM, "Sasha Dolgy" wrote: > > there are some other knock on issues too. the SSL work that has been > done would also have to be changed ... > > -sd > > On Tue, Mar 22, 2011 at 6:58 PM, A J wrote: >> Milind, >> Among the limitation you... > > -- > Sasha Dolgy > sasha.do...@gmail.com >
Re: nodetool repair takes forever
Actually I had modified the source code (to put in a patch for cassandra to work across EC2 regions). That patch seems to be causing an issue with 'nodetool repair'. When I run without the patch (and within an EC2 region), the repair completes within a reasonable time. On Tue, Mar 22, 2011 at 12:40 PM, Robert Coli wrote: > On Tue, Mar 22, 2011 at 8:53 AM, A J wrote: >> 0.7.4 >> >> On Tue, Mar 22, 2011 at 11:49 AM, Robert Coli wrote: >>> On Mon, Mar 21, 2011 at 8:33 PM, A J wrote: >>>> I am trying to estimate the time it will take to rebuild a node. After >>>> loading reasonable data, > > http://issues.apache.org/jira/browse/CASSANDRA-2324 > > May be it? > > =Rob >
Re: EC2 - 2 regions
Milind, Among the limitations you might want to add that 'nodetool repair' does not work with this patch. I tried several times and the repair hangs. When I run it directly on the trunk of 0.7.4 (without the patch) it completes successfully within a reasonable time. Thanks. On Tue, Mar 22, 2011 at 1:07 PM, Jeremy Hanna wrote: > Never mind - I had thought it was more generalizable but since it's just > going against the public IP between regions, that's not going to be something > that makes it into trunk. I had just wanted to see if there was a way that > it could be done, but it sounds like since amazon doesn't provide decent > information between regions, something like this workaround patch is required. > > Anyway - thanks for the work on this. > > On Mar 22, 2011, at 8:33 AM, Jeremy Hanna wrote: > >> Milind, >> >> Thank you for attaching the patch here, but it would be really nice if you >> could create a jira account so you could participate in the discussion on >> the ticket and put the patch on there - that is the way people license their >> contributions with the apache 2 license. You just need to create an account >> with the public jira linked off of the ticket at the top. >> >> Understandable that it wouldn't necessarily be a general solution now - but >> it's a start to understanding what would need to be done so that if >> possible, something general could be derived. I'm just trying to help get >> the discussion started so it could be something that people could do out of >> the box. Not only that, but also so that it could be tested and evolve with >> the codebase so that people could know that it is hardened and used by >> others. >> >> Any limitations would be nice to note when you attach the patch to the >> ticket. >> >> Thanks so much for your work on this! >> >> Jeremy >> >> On Mar 21, 2011, at 11:29 PM, Milind Parikh wrote: >> >>> Patch is attached... I don't have access to Jira.
>>> >>> A cautionary note: This is NOT a general solution and is not intended as >>> such. It could be included as a part of a larger patch. I will explain in the >>> limitations section why it is not a general solution, as I find time. >>> >>> Regards >>> Milind >>> >>> On Mon, Mar 21, 2011 at 11:42 PM, Jeremy Hanna >>> wrote: >>> Sorry if I was presumptuous earlier. I created a ticket so that the patch >>> could be submitted and reviewed - that is if it can be generalized so that >>> it works across regions and doesn't adversely affect the common case. >>> https://issues.apache.org/jira/browse/CASSANDRA-2362 >>> >>> On Mar 21, 2011, at 10:41 PM, Jeremy Hanna wrote: >>> >>>> Sorry if I was presumptuous earlier. I created a ticket so that the patch >>>> could be submitted and reviewed - that is if it can be generalized so that >>>> it works across regions and doesn't adversely affect the common case. >>>> https://issues.apache.org/jira/browse/CASSANDRA-2362 >>>> >>>> On Mar 21, 2011, at 12:20 PM, Jeremy Hanna wrote: >>>> >>>>> I talked to Matt Dennis in the channel about it and I think everyone >>>>> would like to make sure that cassandra works great across multiple >>>>> regions. He sounded like he didn't know why it wouldn't work after >>>>> having looked at the patches. I would like to try it both ways - with >>>>> and without the patches later today if I can and I'd like to help out >>>>> with getting it working out of the box. >>>>> >>>>> Thanks for the investigative work and documentation Milind! >>>>> >>>>> Jeremy >>>>> >>>>> On Mar 21, 2011, at 12:12 PM, Dave Viner wrote: >>>>> >>>>>> Hi Milind, >>>>>> >>>>>> Great work here. Can you provide the patch against the 2 files? >>>>>> >>>>>> Perhaps there's some way to incorporate it into the trunk of cassandra >>>>>> so that this is feasible (in a future release) without patching the >>>>>> source code. 
>>>>>> >>>>>> Dave Viner >>>>>> >>>>>> >>>>>> On Mon, Mar 21, 2011 at 9:41 AM, A J wrote: >>>>>> Thanks for sharing the document, Milind ! >>>>>> Followed the instructions and it worked for me.
Re: nodetool repair takes forever
0.7.4 On Tue, Mar 22, 2011 at 11:49 AM, Robert Coli wrote: > On Mon, Mar 21, 2011 at 8:33 PM, A J wrote: >> I am trying to estimate the time it will take to rebuild a node. After >> loading reasonable data, >> ... >> For some reason, the repair command runs forever. I just have 3G of >> data per node but still the repair is running for more than an hour ! > > What version of cassandra are you running? > > =Rob >
nodetool repair takes forever
I am trying to estimate the time it will take to rebuild a node. After loading a reasonable amount of data, I brought down a node and manually removed all its datafiles for a given keyspace (Keyspace1). I then restarted the node and got it back in the ring. At this point, I wish to run nodetool repair (bin/nodetool -h 127.0.0.1 repair Keyspace1) and estimate the time to rebuild from the time it takes to repair. For some reason, the repair command runs forever. I just have 3G of data per node, but the repair has still been running for more than an hour! Can someone tell me whether this is normal or if I am doing something wrong? Thanks.
Re: stress.py bug?
Not completely related, just FYI. I like it better to see the start time, end time, and duration of each execution in each thread, and then do the aggregation (avg, max, min) myself. I modified the last few lines of the Inserter function as follows:

    endtime = time.time()
    self.latencies[self.idx] += endtime - start
    self.opcounts[self.idx] += 1
    self.keycounts[self.idx] += 1
    open('log'+str(self.idx)+'.txt','a').write(str(endtime-start) + ' ' + str(self.idx) + ' ' + str(i) + ' ' + str(time.asctime()) + ' ' + str(start) + ' ' + str(endtime) + '\n')

You need to understand a little bit of Python to plug this into stress.py properly. The above creates a lot of log*.txt files, one for each thread. Each line in these log files has the duration, thread#, key, timestamp, start time, and end time separated by spaces. I then load these log files into a database and do the aggregations I need. Remember to remove the old log files on a rerun; the above will append to existing log files. Just an FYI, most will not need this. On Mon, Mar 21, 2011 at 12:40 PM, Ryan King wrote: > On Mon, Mar 21, 2011 at 9:34 AM, pob wrote: >> You mean, >> more threads in stress.py? The purpose was to figure out what's the >> biggest bandwidth that C* can use. > > You should try more threads, but at some point you'll hit diminishing > returns there. You may need to drive load from more than one host. > Either way, you need to find out what the bottleneck is. > > -ryan >
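For anyone wanting the same idea without patching stress.py, here is a minimal standalone sketch (hypothetical helper names; this is not the actual patch above): one function appends the per-operation log line, another recomputes avg/max/min from one thread's file.

```python
import os
import time

def log_operation(idx, key, start, end, log_dir="."):
    """Append one 'duration thread# key timestamp start end' line to
    the per-thread log file, mirroring the scheme described above."""
    path = os.path.join(log_dir, "log%d.txt" % idx)
    with open(path, "a") as f:
        f.write("%f %d %d %s %f %f\n"
                % (end - start, idx, key, time.asctime(), start, end))

def aggregate(idx, log_dir="."):
    """Recompute (avg, max, min) latency from one thread's log file."""
    durations = []
    with open(os.path.join(log_dir, "log%d.txt" % idx)) as f:
        for line in f:
            durations.append(float(line.split()[0]))
    return (sum(durations) / len(durations), max(durations), min(durations))
```

Loading the same files into a database, as described above, gives the same numbers; this is just the in-Python equivalent.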
Re: EC2 - 2 regions
Thanks for sharing the document, Milind ! Followed the instructions and it worked for me. On Mon, Mar 21, 2011 at 5:01 AM, Milind Parikh wrote: > Here's the document on Cassandra (0.7.4) across EC2 regions. Clearly this is > work in progress but wanted to share what I have. PDF is the working > copy. > > > https://docs.google.com/document/d/175duUNIx7m5mCDa2sjXVI04ekyMa5bdiWdu-AFgisaY/edit?hl=en > > On Sun, Mar 20, 2011 at 7:49 PM, aaron morton > wrote: >> >> Recent discussion on the dev list >> http://www.mail-archive.com/dev@cassandra.apache.org/msg01832.html >> Aaron >> On 19 Mar 2011, at 06:46, A J wrote: >> >> Just to add, all the telnet (port 7000) and cassandra-cli (port 9160) >> connections are done using the public DNS (that goes like >> ec2-.compute.amazonaws.com) >> >> On Fri, Mar 18, 2011 at 1:37 PM, A J wrote: >> >> I am able to telnet from one region to another on 7000 port without >> >> issues. (I get the expected Connected to .Escape character is >> >> '^]'.) >> >> Also I am able to execute cassandra client on 9160 port from one >> >> region to another without issues (this is when I run cassandra >> >> separately on each region without forming a cluster). >> >> So I think the ports 7000 and 9160 are not the issue. >> >> >> >> On Fri, Mar 18, 2011 at 1:26 PM, Dave Viner wrote: >> >> From the us-west instance, are you able to connect to the us-east instance >> >> using telnet on port 7000 and 9160? >> >> If not, then you need to open those ports for communication (via your >> >> Security Group) >> >> Dave Viner >> >> On Fri, Mar 18, 2011 at 10:20 AM, A J wrote: >> >> Thats exactly what I am doing. >> >> I was able to do the first two scenarios without any issues (i.e. 2 >> >> nodes in same availability zone. Followed by an additional node in a >> >> different zone but same region) >> >> I am stuck at the third scenario of separate regions. 
>> >> (I did read the "Cassandra nodes on EC2 in two different regions not >> >> communicating" thread but it did not seem to end with resolution) >> >> >> On Fri, Mar 18, 2011 at 1:15 PM, Dave Viner wrote: >> >> Hi AJ, >> >> I'd suggest getting to a multi-region cluster step-by-step. First, get >> >> 2 >> >> nodes running in the same availability zone. Make sure that works >> >> properly. >> >> Second, add a node in a separate availability zone, but in the same >> >> region. >> >> Make sure that's working properly. Third, add a node that's in a >> >> separate >> >> region. >> >> Taking it step-by-step will ensure that any issues are specific to the >> >> region-to-region communication, rather than intra-zone connectivity or >> >> cassandra cluster configuration. >> >> Dave Viner >> >> On Fri, Mar 18, 2011 at 8:34 AM, A J wrote: >> >> Hello, >> >> I am trying to setup a cassandra cluster across regions. >> >> For testing I am keeping it simple and just having one node in US-EAST >> >> (say ec2-1-2-3-4.compute-1.amazonaws.com) and one node in US-WEST (say >> >> ec2-2-2-3-4.us-west-1.compute.amazonaws.com). >> >> Using Cassandra 0.7.4 >> >> >> The one in east region is the seed node and has the values as: >> >> auto_bootstrap: false >> >> seeds: ec2-1-2-3-4.compute-1.amazonaws.com >> >> listen_address: ec2-1-2-3-4.compute-1.amazonaws.com >> >> rpc_address: 0.0.0.0 >> >> The one in west region is non seed and has the values as: >> >> auto_bootstrap: true >> >> seeds: ec2-1-2-3-4.compute-1.amazonaws.com >> >> listen_address: ec2-2-2-3-4.us-west-1.compute.amazonaws.com >> >> rpc_address: 0.0.0.0 >> >> I first fire the seed node (east region instance) and it comes up >> >> without issues. >> >> When I fire the non-seed node (west region instance) it fails after >> >> sometime with the error: >> >> DEBUG 15:09:08,844 Created HHOM instance, registered MBean. 
>> >> INFO 15:09:08,844 Joining: getting load information >> >> INFO 15:09:08,845 Sleeping 9 ms to wait for load information... >> >> DEBUG 15:09:09,822 attempting to connect to >> >> ec2-1-2-3-4.compute-1.amazonaws.com/1.2.3.4 >> >> DEBUG 15:09:10,825 Disseminating load info ...
Re: EC2 - 2 regions
Just to add, all the telnet (port 7000) and cassandra-cli (port 9160) connections are done using the public DNS (that goes like ec2-.compute.amazonaws.com) On Fri, Mar 18, 2011 at 1:37 PM, A J wrote: > I am able to telnet from one region to another on 7000 port without > issues. (I get the expected Connected to .Escape character is > '^]'.) > > Also I am able to execute cassandra client on 9160 port from one > region to another without issues (this is when I run cassandra > separately on each region without forming a cluster). > > So I think the ports 7000 and 9160 are not the issue. > > > > On Fri, Mar 18, 2011 at 1:26 PM, Dave Viner wrote: >> From the us-west instance, are you able to connect to the us-east instance >> using telnet on port 7000 and 9160? >> If not, then you need to open those ports for communication (via your >> Security Group) >> Dave Viner >> >> On Fri, Mar 18, 2011 at 10:20 AM, A J wrote: >>> >>> Thats exactly what I am doing. >>> >>> I was able to do the first two scenarios without any issues (i.e. 2 >>> nodes in same availability zone. Followed by an additional node in a >>> different zone but same region) >>> >>> I am stuck at the third scenario of separate regions. >>> >>> (I did read the "Cassandra nodes on EC2 in two different regions not >>> communicating" thread but it did not seem to end with resolution) >>> >>> >>> On Fri, Mar 18, 2011 at 1:15 PM, Dave Viner wrote: >>> > Hi AJ, >>> > I'd suggest getting to a multi-region cluster step-by-step. First, get >>> > 2 >>> > nodes running in the same availability zone. Make sure that works >>> > properly. >>> > Second, add a node in a separate availability zone, but in the same >>> > region. >>> > Make sure that's working properly. Third, add a node that's in a >>> > separate >>> > region. >>> > Taking it step-by-step will ensure that any issues are specific to the >>> > region-to-region communication, rather than intra-zone connectivity or >>> > cassandra cluster configuration. 
>>> > Dave Viner >>> > >>> > On Fri, Mar 18, 2011 at 8:34 AM, A J wrote: >>> >> >>> >> Hello, >>> >> >>> >> I am trying to setup a cassandra cluster across regions. >>> >> For testing I am keeping it simple and just having one node in US-EAST >>> >> (say ec2-1-2-3-4.compute-1.amazonaws.com) and one node in US-WEST (say >>> >> ec2-2-2-3-4.us-west-1.compute.amazonaws.com). >>> >> Using Cassandra 0.7.4 >>> >> >>> >> >>> >> The one in east region is the seed node and has the values as: >>> >> auto_bootstrap: false >>> >> seeds: ec2-1-2-3-4.compute-1.amazonaws.com >>> >> listen_address: ec2-1-2-3-4.compute-1.amazonaws.com >>> >> rpc_address: 0.0.0.0 >>> >> >>> >> The one in west region is non seed and has the values as: >>> >> auto_bootstrap: true >>> >> seeds: ec2-1-2-3-4.compute-1.amazonaws.com >>> >> listen_address: ec2-2-2-3-4.us-west-1.compute.amazonaws.com >>> >> rpc_address: 0.0.0.0 >>> >> >>> >> I first fire the seed node (east region instance) and it comes up >>> >> without issues. >>> >> When I fire the non-seed node (west region instance) it fails after >>> >> sometime with the error: >>> >> >>> >> DEBUG 15:09:08,844 Created HHOM instance, registered MBean. >>> >> INFO 15:09:08,844 Joining: getting load information >>> >> INFO 15:09:08,845 Sleeping 9 ms to wait for load information... >>> >> DEBUG 15:09:09,822 attempting to connect to >>> >> ec2-1-2-3-4.compute-1.amazonaws.com/1.2.3.4 >>> >> DEBUG 15:09:10,825 Disseminating load info ... >>> >> DEBUG 15:10:10,826 Disseminating load info ... >>> >> DEBUG 15:10:38,845 ... got load info >>> >> INFO 15:10:38,845 Joining: getting bootstrap token >>> >> ERROR 15:10:38,847 Exception encountered during startup. >>> >> java.lang.RuntimeException: No other nodes seen! Unable to bootstrap >>> >> at >>> >> >>> >> org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:164) >>> >> at >>> >>
Re: EC2 - 2 regions
I am able to telnet from one region to another on 7000 port without issues. (I get the expected Connected to .Escape character is '^]'.) Also I am able to execute cassandra client on 9160 port from one region to another without issues (this is when I run cassandra separately on each region without forming a cluster). So I think the ports 7000 and 9160 are not the issue. On Fri, Mar 18, 2011 at 1:26 PM, Dave Viner wrote: > From the us-west instance, are you able to connect to the us-east instance > using telnet on port 7000 and 9160? > If not, then you need to open those ports for communication (via your > Security Group) > Dave Viner > > On Fri, Mar 18, 2011 at 10:20 AM, A J wrote: >> >> Thats exactly what I am doing. >> >> I was able to do the first two scenarios without any issues (i.e. 2 >> nodes in same availability zone. Followed by an additional node in a >> different zone but same region) >> >> I am stuck at the third scenario of separate regions. >> >> (I did read the "Cassandra nodes on EC2 in two different regions not >> communicating" thread but it did not seem to end with resolution) >> >> >> On Fri, Mar 18, 2011 at 1:15 PM, Dave Viner wrote: >> > Hi AJ, >> > I'd suggest getting to a multi-region cluster step-by-step. First, get >> > 2 >> > nodes running in the same availability zone. Make sure that works >> > properly. >> > Second, add a node in a separate availability zone, but in the same >> > region. >> > Make sure that's working properly. Third, add a node that's in a >> > separate >> > region. >> > Taking it step-by-step will ensure that any issues are specific to the >> > region-to-region communication, rather than intra-zone connectivity or >> > cassandra cluster configuration. >> > Dave Viner >> > >> > On Fri, Mar 18, 2011 at 8:34 AM, A J wrote: >> >> >> >> Hello, >> >> >> >> I am trying to setup a cassandra cluster across regions. 
>> >> For testing I am keeping it simple and just having one node in US-EAST >> >> (say ec2-1-2-3-4.compute-1.amazonaws.com) and one node in US-WEST (say >> >> ec2-2-2-3-4.us-west-1.compute.amazonaws.com). >> >> Using Cassandra 0.7.4 >> >> >> >> >> >> The one in east region is the seed node and has the values as: >> >> auto_bootstrap: false >> >> seeds: ec2-1-2-3-4.compute-1.amazonaws.com >> >> listen_address: ec2-1-2-3-4.compute-1.amazonaws.com >> >> rpc_address: 0.0.0.0 >> >> >> >> The one in west region is non seed and has the values as: >> >> auto_bootstrap: true >> >> seeds: ec2-1-2-3-4.compute-1.amazonaws.com >> >> listen_address: ec2-2-2-3-4.us-west-1.compute.amazonaws.com >> >> rpc_address: 0.0.0.0 >> >> >> >> I first fire the seed node (east region instance) and it comes up >> >> without issues. >> >> When I fire the non-seed node (west region instance) it fails after >> >> sometime with the error: >> >> >> >> DEBUG 15:09:08,844 Created HHOM instance, registered MBean. >> >> INFO 15:09:08,844 Joining: getting load information >> >> INFO 15:09:08,845 Sleeping 9 ms to wait for load information... >> >> DEBUG 15:09:09,822 attempting to connect to >> >> ec2-1-2-3-4.compute-1.amazonaws.com/1.2.3.4 >> >> DEBUG 15:09:10,825 Disseminating load info ... >> >> DEBUG 15:10:10,826 Disseminating load info ... >> >> DEBUG 15:10:38,845 ... got load info >> >> INFO 15:10:38,845 Joining: getting bootstrap token >> >> ERROR 15:10:38,847 Exception encountered during startup. >> >> java.lang.RuntimeException: No other nodes seen! 
Unable to bootstrap >> >> at >> >> >> >> org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:164) >> >> at >> >> >> >> org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:146) >> >> at >> >> >> >> org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:141) >> >> at >> >> >> >> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:450) >> >> at >> >> >> >> org.apache.cassandra.service.StorageService.initServer(StorageService.java:404) >> >> at >> >> >> >> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:192) >> >> at >> >> >> >> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314) >> >> at >> >> >> >> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79) >> >> >> >> >> >> The seed node seems to somewhat acknowledge the non-seed node: >> >> attempting to connect to /2.2.3.4 >> >> attempting to connect to /10.170.190.31 >> >> >> >> Can you suggest how can I fix it (I did see a few threads on similar >> >> issue but did not really follow the chain) >> >> >> >> Thanks, AJ >> > >> > > >
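The telnet checks described in this thread can also be scripted. A minimal sketch (a hypothetical helper, not part of Cassandra or its tooling) that reports whether the storage port (7000) and Thrift port (9160) on a node are reachable:

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True
    except (socket.timeout, socket.error):
        return False

# Example (hypothetical hostname, as in the thread above):
# for port in (7000, 9160):
#     print(port, can_connect("ec2-1-2-3-4.compute-1.amazonaws.com", port))
```

Note that a successful TCP connect only shows the security group allows the port; as the thread shows, gossip can still fail if nodes advertise addresses that are unreachable from the other region.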
Re: EC2 - 2 regions
That's exactly what I am doing. I was able to do the first two scenarios without any issues (i.e., 2 nodes in the same availability zone, followed by an additional node in a different zone but the same region). I am stuck at the third scenario of separate regions. (I did read the "Cassandra nodes on EC2 in two different regions not communicating" thread but it did not seem to end with a resolution.) On Fri, Mar 18, 2011 at 1:15 PM, Dave Viner wrote: > Hi AJ, > I'd suggest getting to a multi-region cluster step-by-step. First, get 2 > nodes running in the same availability zone. Make sure that works properly. > Second, add a node in a separate availability zone, but in the same region. > Make sure that's working properly. Third, add a node that's in a separate > region. > Taking it step-by-step will ensure that any issues are specific to the > region-to-region communication, rather than intra-zone connectivity or > cassandra cluster configuration. > Dave Viner > > On Fri, Mar 18, 2011 at 8:34 AM, A J wrote: >> >> Hello, >> >> I am trying to setup a cassandra cluster across regions. >> For testing I am keeping it simple and just having one node in US-EAST >> (say ec2-1-2-3-4.compute-1.amazonaws.com) and one node in US-WEST (say >> ec2-2-2-3-4.us-west-1.compute.amazonaws.com). >> Using Cassandra 0.7.4 >> >> >> The one in east region is the seed node and has the values as: >> auto_bootstrap: false >> seeds: ec2-1-2-3-4.compute-1.amazonaws.com >> listen_address: ec2-1-2-3-4.compute-1.amazonaws.com >> rpc_address: 0.0.0.0 >> >> The one in west region is non seed and has the values as: >> auto_bootstrap: true >> seeds: ec2-1-2-3-4.compute-1.amazonaws.com >> listen_address: ec2-2-2-3-4.us-west-1.compute.amazonaws.com >> rpc_address: 0.0.0.0 >> >> I first fire the seed node (east region instance) and it comes up >> without issues. 
>> When I fire the non-seed node (west region instance) it fails after >> sometime with the error: >> >> DEBUG 15:09:08,844 Created HHOM instance, registered MBean. >> INFO 15:09:08,844 Joining: getting load information >> INFO 15:09:08,845 Sleeping 9 ms to wait for load information... >> DEBUG 15:09:09,822 attempting to connect to >> ec2-1-2-3-4.compute-1.amazonaws.com/1.2.3.4 >> DEBUG 15:09:10,825 Disseminating load info ... >> DEBUG 15:10:10,826 Disseminating load info ... >> DEBUG 15:10:38,845 ... got load info >> INFO 15:10:38,845 Joining: getting bootstrap token >> ERROR 15:10:38,847 Exception encountered during startup. >> java.lang.RuntimeException: No other nodes seen! Unable to bootstrap >> at >> org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:164) >> at >> org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:146) >> at >> org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:141) >> at >> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:450) >> at >> org.apache.cassandra.service.StorageService.initServer(StorageService.java:404) >> at >> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:192) >> at >> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314) >> at >> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79) >> >> >> The seed node seems to somewhat acknowledge the non-seed node: >> attempting to connect to /2.2.3.4 >> attempting to connect to /10.170.190.31 >> >> Can you suggest how can I fix it (I did see a few threads on similar >> issue but did not really follow the chain) >> >> Thanks, AJ > >
EC2 - 2 regions
Hello, I am trying to set up a Cassandra cluster across regions. For testing I am keeping it simple and just having one node in US-EAST (say ec2-1-2-3-4.compute-1.amazonaws.com) and one node in US-WEST (say ec2-2-2-3-4.us-west-1.compute.amazonaws.com). Using Cassandra 0.7.4. The one in the east region is the seed node and has the values: auto_bootstrap: false seeds: ec2-1-2-3-4.compute-1.amazonaws.com listen_address: ec2-1-2-3-4.compute-1.amazonaws.com rpc_address: 0.0.0.0 The one in the west region is a non-seed and has the values: auto_bootstrap: true seeds: ec2-1-2-3-4.compute-1.amazonaws.com listen_address: ec2-2-2-3-4.us-west-1.compute.amazonaws.com rpc_address: 0.0.0.0 I first fire up the seed node (east region instance) and it comes up without issues. When I fire up the non-seed node (west region instance) it fails after some time with the error: DEBUG 15:09:08,844 Created HHOM instance, registered MBean. INFO 15:09:08,844 Joining: getting load information INFO 15:09:08,845 Sleeping 9 ms to wait for load information... DEBUG 15:09:09,822 attempting to connect to ec2-1-2-3-4.compute-1.amazonaws.com/1.2.3.4 DEBUG 15:09:10,825 Disseminating load info ... DEBUG 15:10:10,826 Disseminating load info ... DEBUG 15:10:38,845 ... got load info INFO 15:10:38,845 Joining: getting bootstrap token ERROR 15:10:38,847 Exception encountered during startup. java.lang.RuntimeException: No other nodes seen! 
Unable to bootstrap at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:164) at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:146) at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:141) at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:450) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:404) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:192) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79) The seed node seems to somewhat acknowledge the non-seed node: attempting to connect to /2.2.3.4 attempting to connect to /10.170.190.31 Can you suggest how I can fix it? (I did see a few threads on a similar issue but did not really follow the chain.) Thanks, AJ
Re: [RELEASE] 0.7.4
I don't see the binary_memtable_throughput_in_mb parameter in cassandra.yaml anymore. What has it been replaced by? Thanks. On Tue, Mar 15, 2011 at 11:32 PM, Eric Evans wrote: > On Tue, 2011-03-15 at 22:19 -0500, Eric Evans wrote: >> On Tue, 2011-03-15 at 14:26 -0700, Mark wrote: >> > Still not seeing 0.7.4 as a download option on the main site? >> >> Something about the site's pubsub isn't working; I'll contact INFRA. > > https://issues.apache.org/jira/browse/INFRA-3520 > > -- > Eric Evans > eev...@rackspace.com > >
Re: Several 'TimedOutException' in stress.py
self.run() File "stress.py", line 238, in run self.cclient.batch_mutate(cfmap, consistency) File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate self.recv_batch_mutate() File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate raise result.te TimedOutException: TimedOutException() Process Inserter-10: Traceback (most recent call last): File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap self.run() File "stress.py", line 238, in run self.cclient.batch_mutate(cfmap, consistency) File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 873, in batch_mutate self.recv_batch_mutate() File "/home/ec2-user/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", line 899, in recv_batch_mutate raise result.te TimedOutException: TimedOutException() The related server side errors look like: DEBUG 22:29:04,407 Deleting CommitLog-1299623301883.log.header DEBUG 22:29:04,412 Deleting CommitLog-1299623301883.log DEBUG 22:29:04,443 Deleting CommitLog-1299623318627.log.header DEBUG 22:29:04,443 Deleting CommitLog-1299623318627.log DEBUG 22:29:09,202 ... timed out DEBUG 22:29:09,426 ... timed out DEBUG 22:29:10,318 ... timed out DEBUG 22:29:11,354 logged out: # DEBUG 22:29:11,354 logged out: # DEBUG 22:29:11,354 logged out: # DEBUG 22:29:12,442 Processing response on a callback from 784@/10.253.203.224 DEBUG 22:29:12,443 Processing response on a callback from 786@/10.253.203.224 DEBUG 22:29:12,443 Processing response on a callback from 791@/10.253.203.224 On Tue, Mar 8, 2011 at 3:22 PM, aaron morton wrote: > Is this a client side time out or a server side one? What does the error > stack look like ? > Also check the server side logs for errors. The thrift API will raise a > timeout when less than the CL level of nodes return in rpc_timeout. 
> Good luck > Aaron > On 9/03/2011, at 7:37 AM, ruslan usifov wrote: > > > 2011/3/8 A J >> >> Trying out stress.py on AWS EC2 environment (4 Large instances. Each >> of 2-cores and 7.5GB RAM. All in the same region/zone.) >> >> python stress.py -o insert -d >> 10.253.203.224,10.220.203.48,10.220.17.84,10.124.89.81 -l 2 -e ALL -t >> 10 -n 500 -S 100 -k >> >> (I want to try with column size of about 1MB. I am assuming the above >> gives me 10 parallel threads each executing 50 inserts sequentially >> (500/10) ). >> >> Getting several timeout errors.TimedOutException(). With just 10 >> concurrent writes spread across 4 nodes, kind of surprised to get so >> many timeouts. Any suggestions ? >> > > > It may by EC2 disc speed degradation (io speed of EC2 instances doesnt > const, also can vary in greater limits) > >
Several 'TimedOutException' in stress.py
Trying out stress.py in an AWS EC2 environment (4 Large instances, each with 2 cores and 7.5GB RAM, all in the same region/zone.) python stress.py -o insert -d 10.253.203.224,10.220.203.48,10.220.17.84,10.124.89.81 -l 2 -e ALL -t 10 -n 500 -S 100 -k (I want to try with a column size of about 1MB. I am assuming the above gives me 10 parallel threads, each executing 50 inserts sequentially (500/10).) Getting several timeout errors: TimedOutException(). With just 10 concurrent writes spread across 4 nodes, I am kind of surprised to get so many timeouts. Any suggestions? Thanks.
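As a sanity check on the arithmetic above (-t 10 threads, -n 500 keys giving 50 sequential inserts per thread), the range split can be sketched like this; it mirrors the spirit of how stress.py divides keys among threads, not its exact code:

```python
def partition_keys(total_keys, num_threads):
    """Split [0, total_keys) into contiguous per-thread ranges."""
    per_thread = total_keys // num_threads
    ranges = []
    for i in range(num_threads):
        start = i * per_thread
        # the last thread absorbs any remainder when the split is uneven
        end = total_keys if i == num_threads - 1 else start + per_thread
        ranges.append((start, end))
    return ranges
```

With -n 500 and -t 10, each of the 10 ranges holds exactly 50 keys, matching the estimate in the message above.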
Re: Network Topology Strategy error
Thanks. It worked when I changed it as you suggested to: create keyspace ks1 with strategy_options = [{DC1:1, DC2:1}] and placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'; Something I am observing: the replicas are always put on the first node in the other DC. (So if there are 2 nodes in each DC, the replicas of both nodes go to the first node in the other DC, and vice-versa.) This would make the first node in each DC a hotspot. Am I doing something wrong? If not, is there any way to avoid this? On Thu, Mar 3, 2011 at 3:41 PM, Jonathan Ellis wrote: > you need to specify per-DC replicas w/ NTS in strategy_options, > instead of using replication_factor > > On Thu, Mar 3, 2011 at 1:52 PM, A J wrote: >> using latest cassandra (0.7.2). I want to try out Network Topology Strategy. >> >> Following is related setting in cassandra.yaml >> endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch >> >> I have four nodes. Set them accordingly in >> ./conf/cassandra-topology.properties: >> 10.252.219.224=DC2:RAC1 >> 10.252.10.64=DC2:RAC1 >> 10.252.11.32=DC1:RAC1 >> 10.220.103.98=DC1:RAC1 >> >> >> I create a ks as: >> create keyspace ks1 with replication_factor=1 and >> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'; >> >> When I try to insert, get the following error: >> set cf1['A']['c1']='xyz'; >> ERROR 19:21:58,081 Internal error processing insert >> java.lang.AssertionError: invalid response count 1 for replication factor 0 >> >> >> Please suggest what could be going on ? cassandra-topology.properties >> has two DCs. Why am I still getting the error ? >> >> Thanks for any suggestions. >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >
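A toy model (emphatically not Cassandra's actual placement code, and ignoring racks) of why the observed behavior occurs: NetworkTopologyStrategy walks the ring clockwise from the row's token and, for each DC, keeps the first nodes it encounters in that DC. If each DC's tokens form a contiguous arc of the ring, every walk that starts inside one DC's arc enters the other DC at the same node, so that node collects all of the other DC's replicas:

```python
from bisect import bisect_right

def place_replicas(key_token, ring, rf_per_dc):
    """Toy NTS placement: ring is a sorted list of (token, node, dc).
    Walk clockwise from key_token, keeping the first rf_per_dc nodes
    seen in each DC (racks ignored)."""
    tokens = [t for t, _, _ in ring]
    start = bisect_right(tokens, key_token) % len(ring)
    chosen = {}
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        picked = chosen.setdefault(dc, [])
        if len(picked) < rf_per_dc:
            picked.append(node)
    return chosen

# Two DCs whose nodes occupy contiguous arcs of the token ring.
ring = [(0, "dc1-a", "DC1"), (10, "dc1-b", "DC1"),
        (20, "dc2-a", "DC2"), (30, "dc2-b", "DC2")]
```

With this toy ring, any key whose token falls in DC1's arc always lands its DC2 replica on dc2-a. The remedy commonly suggested at the time was to alternate tokens between the DCs so the walk enters each DC at many different points.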
Re: Storing photos, images, docs etc.
Why would you keep metadata in Cassandra? Even for millions of documents, the metadata would be very small; MySQL/Postgres should suffice. Lustre, of course, is well known and widely used, along with GlusterFS. Lustre, I think, requires kernel modifications and will be much more complex. Also, it is easier said than done to store data in a filesystem and metadata in a DB. You will have to create a custom solution to integrate them and create transactions across them, avoid a metadata SPOF, ensure load balancing, etc. On Thu, Mar 3, 2011 at 2:49 PM, mcasandra wrote: > Has anyone heard about lustre distributed file system? I am wondering if it > will work well where keep the metadata in Cassandra and images in Lustre. > > I looked at MogileFS but not too sure about it's support. > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Storing-photos-images-docs-etc-tp6078278p6086135.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com. >
Network Topology Strategy error
Using the latest Cassandra (0.7.2). I want to try out NetworkTopologyStrategy. Following is the related setting in cassandra.yaml: endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch I have four nodes. Set them accordingly in ./conf/cassandra-topology.properties: 10.252.219.224=DC2:RAC1 10.252.10.64=DC2:RAC1 10.252.11.32=DC1:RAC1 10.220.103.98=DC1:RAC1 I create a ks as: create keyspace ks1 with replication_factor=1 and placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'; When I try to insert, I get the following error: set cf1['A']['c1']='xyz'; ERROR 19:21:58,081 Internal error processing insert java.lang.AssertionError: invalid response count 1 for replication factor 0 Please suggest what could be going on? cassandra-topology.properties has two DCs. Why am I still getting the error? Thanks for any suggestions.
Re: cassandra-rack.properties or cassandra-topology.properties
Yes, that has topology and not rack. conf/access.properties conf/log4j-server.properties conf/cassandra-env.sh conf/log4j-tools.properties conf/cassandra-topology.properties conf/passwd.properties conf/cassandra.yaml conf/README.txt On Thu, Mar 3, 2011 at 1:28 PM, Jonathan Ellis wrote: > Did you try "ls conf/" ? > > On Thu, Mar 3, 2011 at 11:27 AM, A J wrote: >> In PropertyFileSnitch is cassandra-rack.properties or >> cassandra-topology.properties file used ? >> >> Little confused by the stmt: >> PropertyFileSnitch determines the location of nodes by referring to a >> user-defined description of the network details located in the >> property file cassandra-rack.properties. Your installation contains an >> example properties file for PropertyFileSnitch in >> $CASSANDRA_HOME/conf/cassandra-topology.properties. >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >