Re: column with TTL of 10 seconds lives very long...
If you do that same get again, is the column still being returned? (days later) -Jeremiah On Thu, May 23, 2013 at 6:16 AM, Tamar Fraenkel ta...@tok-media.com wrote: Hi! TTL was set: [default@HLockingManager] get HLocks['/LockedTopic/31a30c12-652d-45b3-9ac2-0401cce85517']; => (column=69b057d4-3578-4326-a9d9-c975cb8316d2, value=36396230353764342d333537382d343332362d613964392d633937356362383331366432, timestamp=1369307815049000, ttl=10) Also, all other lock columns expire as expected. Thanks, Tamar *Tamar Fraenkel* Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Thu, May 23, 2013 at 1:58 PM, moshe.kr...@barclays.com wrote: Maybe you didn't set the TTL correctly. Check the TTL of the column using CQL, e.g.: SELECT TTL (colName) FROM colFamilyName WHERE condition; From: Felipe Sere [mailto:felipe.s...@1und1.de] Sent: Thursday, May 23, 2013 1:28 PM To: user@cassandra.apache.org Subject: RE: column with TTL of 10 seconds lives very long... This is interesting as it might affect me too :) I have been observing deadlocks with HLockManagerImpl which don't get resolved for a long time, even though the columns with the locks should only live for about 5-10 seconds. Any ideas how to investigate this further from the Cassandra side? -- From: Tamar Fraenkel [ta...@tok-media.com] Sent: Thursday, May 23, 2013 11:58 To: user@cassandra.apache.org Subject: Re: column with TTL of 10 seconds lives very long... Thanks for the response. Running date simultaneously on all nodes (using parallel ssh) shows that they are synced. Tamar On Thu, May 23, 2013 at 12:29 PM, Nikolay Mihaylov n...@nmmm.nu wrote: Did you synchronize the clocks between servers?
On Thu, May 23, 2013 at 9:32 AM, Tamar Fraenkel ta...@tok-media.com wrote: Hi! I have a Cassandra cluster with 3 nodes running version 1.0.11. I am using Hector HLockManagerImpl, which creates a keyspace named HLockManagerImpl and CF HLocks. For some reason I have a row with a single column that should have expired yesterday that is still there. I tried deleting it using cli, but it is stuck... Any ideas how to delete it? Thanks, Tamar
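As a sanity check on the stuck column above, the value and timestamp in the cli output can be decoded offline: the hex value is just the ASCII bytes of the lock-holder UUID, and column timestamps are microseconds since the epoch. A quick illustrative sketch (plain Python, not Cassandra code):

```python
import binascii
from datetime import datetime, timezone

# Hex value from the cli output of the stuck lock column.
hex_value = ("36396230353764342d333537382d343332362d"
             "613964392d633937356362383331366432")

# The value is just the ASCII-encoded holder UUID.
holder = binascii.unhexlify(hex_value).decode("ascii")
print(holder)  # 69b057d4-3578-4326-a9d9-c975cb8316d2

# Column timestamps are microseconds since the epoch, so the write
# (and the 10-second TTL countdown) started here:
ts_micros = 1369307815049000
written = datetime.fromtimestamp(ts_micros / 1e6, tz=timezone.utc)
print(written.isoformat())
```

The decoded timestamp lands on May 23, 2013, i.e. the day of the thread, so the write itself carried a sane timestamp and the column should indeed have expired 10 seconds later.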
Re: remove DC
If you have any data that you wrote to DC2 since the last time you ran repair, you should probably run repair to make sure that data made it over to DC1; if you never wrote data directly to DC2, then you are correct, you don't need to run repair. You should just need to update the schema, and then decommission the node. -Jeremiah On Nov 12, 2012, at 2:25 PM, William Oberman ober...@civicscience.com wrote: There is a great guide here on how to add resources: http://www.datastax.com/docs/1.1/operations/cluster_management#adding-capacity What about deleting resources? I'm thinking of removing a data center. Clearly I'd need to change strategy options, which are currently something like this: {DC1:3,DC2:1} to: {DC1:3}. But after that change, I'm wondering if anything else needs to happen? All of the data in DC1 is already in the correct spots, so I don't think I have to run repair or cleanup... will
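The sequence Jeremiah describes could look roughly like this in the cli and nodetool (keyspace name and host are placeholders; syntax is the 1.0/1.1-era cli form used elsewhere in this thread):

```
[default@unknown] update keyspace MyKeyspace
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = {DC1:3};

$ nodetool -h <dc2-node> repair        # only if DC2 ever took writes directly
$ nodetool -h <dc2-node> decommission  # then retire the DC2 node
```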
Re: CREATE COLUMNFAMILY
That is fine. You just have to be careful that you haven't already inserted data which would be rejected by the type you update to, as a client will have issues reading that data back. -Jeremiah On Nov 11, 2012, at 4:09 PM, Kevin Burton rkevinbur...@charter.net wrote: What happens when you are mainly concerned about the human readable formats? Say initially you don't supply metadata for a key like foo in the column family, but you get tired of seeing binary data displayed for the values, so you update the column family to get a more human readable format by adding metadata for foo. Will this work? From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Sunday, November 11, 2012 3:39 PM To: user@cassandra.apache.org Subject: Re: CREATE COLUMNFAMILY Also most idiomatic clients use the information so they can return the appropriate type to you. Can the metadata be applied after the fact? If so how? UPDATE COLUMN FAMILY in the CLI will let you change it. Note that we do not update the existing data. This can be a problem if you do something like change a variable length integer to a fixed length one. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/11/2012, at 8:06 AM, Kevin Burton rkevinbur...@charter.net wrote: Thank you, this helps with my understanding. So the goal here is to supply as many name/type pairs as can reasonably be foreseen when the column family is created? Can the metadata be applied after the fact? If so how? -----Original Message----- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Sunday, November 11, 2012 9:37 AM To: user@cassandra.apache.org Subject: Re: CREATE COLUMNFAMILY If you supply metadata cassandra can use it for several things.
1) It validates data on insertion 2) Helps display the information in human readable formats in tools like the CLI and sstable2json 3) If you add a built-in secondary index the type information is needed; strings sort differently than integers 4) Columns in rows are sorted by the column name; strings sort differently than integers On Sat, Nov 10, 2012 at 11:55 PM, Kevin Burton rkevinbur...@charter.net wrote: I am sure this has been asked before, but what is the purpose of entering key/value or, more correctly, key name/data type values on the CREATE COLUMNFAMILY command.
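Concretely, the after-the-fact change Aaron describes looks like this in the cli (column family and column names here are hypothetical):

```
[default@Keyspace1] update column family Users
    with column_metadata = [{column_name: foo, validation_class: UTF8Type}];
```

Existing values of foo are not rewritten; only new writes are validated, which is why data that doesn't match the new type can still break reads.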
Re: NetworkTopologyStrategy with 1 node
What is the output of nodetool ring? Does the cluster actually think your node is in DC1? -Jeremiah On May 26, 2012, at 6:36 AM, Cyril Auburtin wrote: I get the same issue on Cassandra 1.1: create keyspace ks with strategy_class = 'NetworkTopologyStrategy' AND strategy_options = {DC1:1}; then for example [default@ks] create column family rr WITH key_validation_class=UTF8Type and comparator = UTF8Type and column_metadata = [{column_name: boo, validation_class: UTF8Type}]; 5c6d0b86-86f2-3444-8335-fe4bdaa4745d Waiting for schema agreement... ... schemas agree across the cluster [default@ks] set rr['1']['boo'] = '1'; null UnavailableException() at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:15898) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:788) at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:772) at org.apache.cassandra.cli.CliClient.executeSet(CliClient.java:896) at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:213) at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:219) at org.apache.cassandra.cli.CliMain.main(CliMain.java:346) 2012/5/26 Cyril Auburtin cyril.aubur...@gmail.com thx, but it still does not work. I did: update keyspace ks with strategy_options = [{DC1:1}] and placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'; then in cassandra-cli: [default@ks] list Position; Using default limit of 100 Internal error processing get_range_slices and in the cassandra console: INFO 11:10:52,680 Keyspace updated. Please perform any manual operations.
ERROR 11:13:37,565 Internal error processing get_range_slices java.lang.IllegalStateException: datacenter (DC1) has no more endpoints, (1) replicas still needed at org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:118) at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:101) at org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(StorageService.java:1538) How do I have to set cassandra-topology.properties for a single node in this DC? I will try to do the same thing with C1.1, it could work 2012/5/26 Edward Capriolo edlinuxg...@gmail.com replication_factor = 1 and strategy_options = [{DC1:0}] You should not be setting both of these. All you should need is: strategy_options = [{DC1:1}] On Fri, May 25, 2012 at 1:47 PM, Cyril Auburtin cyril.aubur...@gmail.com wrote: I was using a single node, on cassandra 0.7.10 with placement strategy = SimpleStrategy and replication factor = 1; everything was fine. I was using a consistency level of ONE for reading/writing. I updated the keyspace with: update keyspace Mymed with replication_factor = 1 and strategy_options = [{DC1:0}] and placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'; with conf/cassandra-topology.properties having just this for the moment: default=DC1:r1 The keyspace could update, and I could use ks; also, but I can't read anything, even from Thrift using ConsistencyLevel.ONE; it will complain that this strategy requires Quorum. I tried with ConsistencyLevel.LOCAL_QUORUM, but get an exception like: org.apache.thrift.TApplicationException: Internal error processing get_slice and in the cassandra console: DEBUG 19:45:02,013 Command/ConsistencyLevel is SliceFromReadCommand(table='Mymed', key='637972696c2e617562757274696e40676d61696c2e636f6d', column_parent='QueryPath(columnFamilyName='Authentication',
superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)/LOCAL_QUORUM ERROR 19:45:02,014 Internal error processing get_slice java.lang.NullPointerException at org.apache.cassandra.locator.NetworkTopologyStrategy.getReplicationFactor(NetworkTopologyStrategy.java:139) at org.apache.cassandra.service.DatacenterReadCallback.determineBlockFor(DatacenterReadCallback.java:83) at org.apache.cassandra.service.ReadCallback.init(ReadCallback.java:77) at org.apache.cassandra.service.DatacenterReadCallback.init(DatacenterReadCallback.java:48) at org.apache.cassandra.service.StorageProxy.getReadCallback(StorageProxy.java:461) at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:326) at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:291) So I guess NetworkTopologyStrategy can't work with just one node? thx for any feedback
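On the cassandra-topology.properties question: with PropertyFileSnitch, the node's own IP has to map to the datacenter named in strategy_options, otherwise DC1 really does have "no more endpoints". A minimal sketch for one node (the IP is a placeholder):

```
# cassandra-topology.properties - assumed layout for a single node in DC1
192.168.1.10=DC1:r1
# fallback for any node not listed above
default=DC1:r1
```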
RE: understanding of native indexes: limitations, potential side effects,...
The limitation is because the number of columns could be equal to the number of rows. If the number of rows is large this can become an issue. -Jeremiah From: David Vanderfeesten [feest...@gmail.com] Sent: Wednesday, May 16, 2012 6:58 AM To: user@cassandra.apache.org Subject: understanding of native indexes: limitations, potential side effects,... Hi, I'd like to better understand the limitations of native indexes, potential side effects, and the scenarios where they are required. My understanding so far: - Indexes on each node store indexes for data held locally on the node itself. - Indexes do not return values in a sorted way (hashes of the indexed row keys define the order). - Given the design referred to in the first bullet, a coordinator node receiving a read of a native index needs to spawn a read to multiple nodes (a set of nodes together covering at least the complete key space + potentially more to assure the read consistency level). - Each write to an indexed column leads to an additional local read of the index to update the index (kind of obvious but easily forgotten when tuning your system for a write-only workload). - When using a WHERE clause in CQL you need to specify at least an equality condition on a natively indexed column. Additional conditions in the WHERE clause are filtered out by the coordinator node receiving the CQL query. - Native indexes do not support very well columns with a high number of discrete values throughout the entire CF. Is the above understanding correct and complete? Some doubts: - About the limitation of indexing columns with a high number of discrete values: I assume native indexes are implemented with an internally managed CF per index. With high-cardinality values, in the worst case, the number of rows in the index is identical to the number of rows of the indexed CF. Or are there other reasons for the limitation, and if that's the case, is there a guideline on the max cardinality that is still reasonable?
- Are column updates and the update of the indexes (read + write action) atomic and isolated from concurrent updates? Txs! David
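David's index-as-a-CF assumption can be pictured with a toy model (plain Python, purely illustrative): each node's native index is roughly a hidden local CF mapping indexed value to row keys, which is why cardinality close to the row count makes the index about as big as the data it indexes.

```python
# Toy model of a per-node native index: value -> set of local row keys.
data = {
    "row1": {"country": "US"},
    "row2": {"country": "UK"},
    "row3": {"country": "US"},
}

index = {}
for row_key, columns in data.items():
    index.setdefault(columns["country"], set()).add(row_key)

# An equality lookup on the indexed column is one index read per node.
print(sorted(index["US"]))  # ['row1', 'row3']

# High-cardinality worst case: every value unique, so the index ends up
# with as many entries as the indexed CF has rows.
unique_index = {cols["country"]: {rk} for rk, cols in data.items()}
```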
Re: Does or will Cassandra support OpenJDK ?
OpenJDK is Java 1.7. Once Cassandra supports Java 1.7 it would most likely work on OpenJDK, as the 1.7 OpenJDK really is the same thing as Oracle JDK 1.7 without some licensed stuff. -Jeremiah On May 11, 2012, at 10:02 PM, ramesh wrote: I've had problems downloading the Sun (Oracle) JDK and found this thread where the Oracle official is insisting, or rather forcing, Linux users to move to OpenJDK. Here is the thread: https://forums.oracle.com/forums/thread.jspa?threadID=2365607 I need this because I run Cassandra. Just curious to know if I would be able to avoid the pain of using the Sun JDK in future for production Cassandra? regards Ramesh
Re: DELETE from table with composite keys
Slice deletes are not supported currently. It is being worked on: https://issues.apache.org/jira/browse/CASSANDRA-3708 -Jeremiah On May 14, 2012, at 12:18 PM, Roland Mechler wrote: I have a table with a 3-part composite key and I want to delete rows based on the first 2 parts of the key. SELECT works using 2 parts of the key, but DELETE fails with the error: Bad Request: Missing mandatory PRIMARY KEY part part3 (see details below). Is there a reason why deleting based on the first 2 parts should not work? I.e., is it just currently not supported, or is it a permanent limitation? Note that deleting based on just the first part of the key will work… it deletes all matching rows. cqlsh:Keyspace1> CREATE TABLE MyTable (part1 text, part2 text, part3 text, data text, PRIMARY KEY(part1, part2, part3)); cqlsh:Keyspace1> INSERT INTO MyTable (part1, part2, part3, data) VALUES ('a', 'b', 'c', 'd'); cqlsh:Keyspace1> SELECT * FROM MyTable WHERE part1 = 'a' AND part2 = 'b'; part1 | part2 | part3 | data -------+-------+-------+------ a | b | c | d cqlsh:Keyspace1> DELETE FROM MyTable WHERE part1 = 'a' AND part2 = 'b'; Bad Request: Missing mandatory PRIMARY KEY part part3 cqlsh:Keyspace1> DELETE data FROM MyTable WHERE part1 = 'a' AND part2 = 'b'; Bad Request: Missing mandatory PRIMARY KEY part part3 cqlsh:Keyspace1> DELETE FROM MyTable WHERE part1 = 'a'; cqlsh:Keyspace1> SELECT * FROM MyTable WHERE part1 = 'a' AND part2 = 'b'; cqlsh:Keyspace1> -Roland
RE: Initial token - newbie question (version 1.0.8)
You have to use nodetool move to change the token after the node has started for the first time. The value in the config file is only used on first startup. Unless you were using RF=3 on your 3-node ring, you can't just start with a new token without using nodetool. You have to do the move so that the data gets put in the right place. How you would do it without nodetool (the dangerous, not smart, easy-to-shoot-yourself-in-the-foot-and-lose-your-data way), if you were RF=3: If you used RF=3, then all nodes should have all data, and you can stop all nodes, remove the system keyspace data, and start up the new cluster with the right stuff in the yaml file (blowing away system means this is like starting a brand new cluster). Then re-create all of your keyspaces/column families and they will pick up the already existing data. Though, if you are RF=3, nodetool move shouldn't be moving anything anyway, so you should just do it the right way and use nodetool. From: Jay Parashar [jparas...@itscape.com] Sent: Wednesday, April 11, 2012 1:44 PM To: user@cassandra.apache.org Subject: Initial token - newbie question (version 1.0.8) I created a 3-node ring with initial_token blank. Of course, as expected, Cassandra generated its own tokens on startup (e.g. tokens X, Y and Z). The nodes of course were not properly balanced, so I did the following steps: 1) stopped all the 3 nodes 2) assigned initial_tokens (A, B, C) respectively 3) restarted the nodes What I find is that the nodes were still using the original tokens (X, Y and Z). Log messages for node 1 show Using saved token X. I could rebalance using nodetool and now the nodes are using the correct tokens. But the question is, why were the new tokens not read from the cassandra.yaml file? Without using nodetool, how do I make it get the token from the yaml file? Where is it saved? Another question: I could not find auto_bootstrap in the yaml file as per the documentation. Where is this param located? Appreciate it.
Thanks in advance Jay
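For reference, the balanced initial_token values for a RandomPartitioner ring are just even slices of the 2**127 token space; a small sketch of the usual calculation:

```python
def balanced_tokens(node_count):
    """Evenly spaced initial tokens for RandomPartitioner (0 .. 2**127)."""
    step = 2 ** 127 // node_count
    return [i * step for i in range(node_count)]

# For the 3-node ring in the question:
for i, tok in enumerate(balanced_tokens(3)):
    print("node %d: initial_token: %d" % (i, tok))
```

Those values go into each node's cassandra.yaml before first start, or into nodetool move for an already-running ring.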
Re: Resident size growth
He says he disabled JNA. You can't mmap without JNA, can you? On Apr 9, 2012, at 4:52 AM, aaron morton wrote: see http://wiki.apache.org/cassandra/FAQ#mmap Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 9/04/2012, at 5:09 AM, ruslan usifov wrote: mmap sstables? It's normal 2012/4/5 Omid Aladini omidalad...@gmail.com Hi, I'm experiencing a steady growth in resident size of the JVM running Cassandra 1.0.7. I disabled JNA and the off-heap row cache, tested with and without mlockall disabling paging, and upgraded to JRE 1.6.0_31 to prevent this bug [1] from leaking memory. Still the JVM's resident set size grows steadily. A process with Xmx=2048M has grown to 6GB resident size and one with Xmx=8192M to 16GB in a few hours, and increasing. Has anyone experienced this? Any idea how to deal with this issue? Thanks, Omid [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129
RE: Write performance compared to Postgresql
So Cassandra may or may not be faster than your current system when you have a couple of connections. Where it is faster, and scales, is when you get hundreds of clients across many nodes. See: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html With 60 clients running 200 threads each they were able to get 10K writes per second per server, and as you added servers from 48-288 you still got 10K writes per second per server, so the aggregate writes per second went from 48*10K to 288*10K. -Jeremiah From: Jeff Williams [je...@wherethebitsroam.com] Sent: Tuesday, April 03, 2012 10:09 AM To: user@cassandra.apache.org Subject: Re: Write performance compared to Postgresql Vitalii, Yep, that sounds like a good idea. Do you have any more information about how you're doing that? Which client? Because even with 3 concurrent client nodes, my single postgresql server is still outperforming my 2-node cassandra cluster, although the gap is narrowing. Jeff On Apr 3, 2012, at 4:08 PM, Vitalii Tymchyshyn wrote: Note that having tons of TCP connections is not good. We are using an async client to issue multiple calls over a single connection at the same time. You can do the same. Best regards, Vitalii Tymchyshyn. 03.04.12 16:18, Jeff Williams wrote: Ok, so you think the write speed is limited by the client and protocol, rather than the cassandra backend? This sounds reasonable, and fits with our use case, as we will have several servers writing. However, a bit harder to test! Jeff On Apr 3, 2012, at 1:27 PM, Jake Luciani wrote: Hi Jeff, Writing serially over one connection will be slower. If you run many threads hitting the server at once you will see throughput improve. Jake On Apr 3, 2012, at 7:08 AM, Jeff Williams je...@wherethebitsroam.com wrote: Hi, I am looking at cassandra for a logging application. We currently log to a Postgresql database. I set up 2 cassandra servers for testing.
I did a benchmark where I had 100 hashes representing log entries, read from a json file. I then looped over these to do 10,000 log inserts. I repeated the same writing to a postgresql instance on one of the cassandra servers. The script is attached. The cassandra writes appear to perform a lot worse. Is this expected? jeff@transcoder01:~$ ruby cassandra-bm.rb cassandra 3.17 0.48 3.65 ( 12.032212) jeff@transcoder01:~$ ruby cassandra-bm.rb postgres 2.14 0.33 2.47 ( 7.002601) Regards, Jeff cassandra-bm.rb
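The serial-loop bottleneck Jake describes has a simple shape: one connection issuing writes one at a time versus many concurrent writers. The sketch below fakes the actual insert (insert_log is a stand-in for a real client call, not any particular Cassandra API), but the thread-pool pattern is the point:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

written = 0
lock = threading.Lock()

def insert_log(entry):
    # Stand-in for a real client insert; a real benchmark would issue a
    # Thrift/CQL write here, each worker on its own connection.
    global written
    with lock:
        written += 1

entries = [{"id": i, "msg": "log line %d" % i} for i in range(10000)]

# Many concurrent writers instead of one serial loop over 10,000 inserts.
with ThreadPoolExecutor(max_workers=50) as pool:
    list(pool.map(insert_log, entries))

print(written)  # 10000
```

With real network round-trips, the serial loop pays per-request latency 10,000 times in a row, while the pool overlaps those waits, which is where the throughput gap comes from.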
RE: Counter Column
Right, it affects every version of Cassandra from 0.8 beta 1 until the Fix Version, which right now is None, so it isn't fixed yet... From: Avi-h [avih...@gmail.com] Sent: Tuesday, April 03, 2012 5:23 AM To: cassandra-u...@incubator.apache.org Subject: Re: Counter Column this bug is for 0.8 beta 1, is it also relevant for 1.0.8? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Counter-Column-tp7432010p7432450.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
RE: Compression on client side vs server side
The server-side compression can compress across columns/rows so it will most likely be more efficient. Whether you are CPU bound or IO bound depends on your application and node setup. Unless your working set fits in memory you will be IO bound, and in that case server-side compression helps because there is less to read from disk. In many cases it is actually faster to read a compressed file from disk and decompress it than to read an uncompressed file from disk. See Ed's post: Cassandra compression is like more servers for free! http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/cassandra_compression_is_like_getting From: benjamin.j.mcc...@gmail.com [benjamin.j.mcc...@gmail.com] on behalf of Ben McCann [b...@benmccann.com] Sent: Monday, April 02, 2012 10:42 AM To: user@cassandra.apache.org Subject: Compression on client side vs server side Hi, I was curious, if I compress my data on the client side with Snappy, whether there's any difference between doing that and doing it on the server side? The wiki said that compression works best where each row has the same columns. Does this mean the compression will be more efficient on the server side, since it can look at multiple rows at once instead of only the row being inserted? The reason I was thinking about possibly doing it client side was that it would save CPU on the datastore machine. However, does this matter? Is CPU typically the bottleneck on a machine or is it some other resource? (Of course this will vary for each person, but wondering if there's a rule of thumb. I'm making a web app, which hopefully will store about 5TB of data and have 10s of millions of page views per month.) Thanks, Ben
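The cross-row advantage can be sketched with zlib (standing in for Snappy here): compressing many similar rows in one block beats compressing each row alone on the client, because the shared column names and repeated values only have to be encoded once per block:

```python
import zlib

# 100 rows that share the same column names, as in a typical CF.
rows = [
    ('{"user_id": %d, "country": "US", "status": "active"}' % i).encode()
    for i in range(100)
]

# Client-side style: each row compressed on its own before insert.
per_row_total = sum(len(zlib.compress(r)) for r in rows)

# Server-side style: one compression context across many rows,
# roughly what happens inside an sstable block.
whole_block = len(zlib.compress(b"".join(rows)))

print(per_row_total, whole_block)
```

Running this, the single cross-row block comes out far smaller than the sum of the per-row outputs, which is the wiki's "works best where each row has the same columns" point in miniature.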
Re: data size difference between supercolumn and regular column
Is that 80% with compression? If not, the first thing to do is turn on compression. Cassandra doesn't behave well when it runs out of disk space. You really want to try to stay around 50%; 60-70% works, but only if it is spread across multiple column families, and even then you can run into issues when doing repairs. -Jeremiah On Apr 1, 2012, at 9:44 PM, Yiming Sun wrote: Thanks Aaron. Well, I guess it is possible the data files from supercolumns could have been reduced in size after compaction. This brings yet another question. Say I am on a shoestring budget and can only put together a cluster with very limited storage space. The first iteration of pushing data into cassandra would drive the disk usage up into the 80% range. As time goes by, there will be updates to the data, and many columns will be overwritten. If I just push the updates in, the disks will run out of space on all of the cluster nodes. What would be the best way to handle such a situation if I cannot buy larger disks? Do I need to delete the rows/columns that are going to be updated, do a compaction, and then insert the updates? Or is there a better way? Thanks -- Y. On Sat, Mar 31, 2012 at 3:28 AM, aaron morton aa...@thelastpickle.com wrote: does cassandra 1.0 perform some default compression? No. The on-disk size depends to some degree on the workload. If there are a lot of overwrites or deletes you may have rows/columns that need to be compacted. You may have some big old SSTables that have not been compacted for a while. There is some overhead involved in the super columns: the super col name, length of the name and the number of columns. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/03/2012, at 9:47 AM, Yiming Sun wrote: Actually, after I read an article on cassandra 1.0 compression just now (http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression), I am more puzzled.
In our schema, we didn't specify any compression options -- does cassandra 1.0 perform some default compression? Or is the data reduction purely because of the schema change? Thanks. -- Y. On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun yiming@gmail.com wrote: Hi, We are trying to estimate the amount of storage we need for a production cassandra cluster. While I was doing the calculation, I noticed a very dramatic difference in terms of storage space used by cassandra data files. Our previous setup consists of a single-node cassandra 0.8.x with no replication, and the data is stored using supercolumns, and the data files total about 534GB on disk. A few weeks ago, I put together a cluster consisting of 3 nodes running cassandra 1.0 with a replication factor of 2, and the data is flattened out and stored using regular columns. And the aggregated data file size is only 488GB (would be 244GB if no replication). This is a very dramatic reduction in terms of storage needs, and is certainly good news in terms of how much storage we need to provision. However, because of the dramatic reduction, I also would like to make sure it is absolutely correct before submitting it - and also get a sense of why there was such a difference. -- I know cassandra 1.0 does data compression, but does the schema change from supercolumn to regular column also help reduce storage usage? Thanks. -- Y.
RE: Any improvements in Cassandra JDBC driver ?
There is no such thing as a pure insert which will give an error if the row already exists. Everything is really UPDATE OR INSERT. Whether you say UPDATE or INSERT, it will all act like UPDATE OR INSERT: if the thing is there it gets overwritten, if it isn't there it gets inserted. -Jeremiah From: Dinusha Dilrukshi [sdddilruk...@gmail.com] Sent: Wednesday, March 28, 2012 11:41 PM To: user@cassandra.apache.org Subject: Any improvements in Cassandra JDBC driver ? Hi, We are using the Cassandra JDBC driver (found in [1]) to call to the Cassandra server using CQL and JDBC calls. One of the main disadvantages is that this driver is not available in a maven repository where people can publicly access it. Currently we have to check out the source and build it ourselves. Is there any possibility to host this driver in a maven repository? And one of the other limitations of the driver is that it does not support the insert query. If we need to do an insert, then it can be done using the update statement. So basically the same query is used for both UPDATE and INSERT. As an example, if you execute the following query: update USER set 'username'=?, 'password'=? where key = ? and if the provided KEY already exists in the column family then it will do an update of the existing columns. If the provided KEY does not already exist, then it will do an insert. Is the INSERT query option now available in the latest driver? Are there any other improvements/supports added to this driver recently? Is this driver compatible with Cassandra-1.1.0, and will the changes done for the driver be backward compatible with older Cassandra versions (1.0.0)? [1]. http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/ Regards, ~Dinusha~
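The upsert semantics Jeremiah describes behave like plain dictionary assignment; a tiny analogy (plain Python, not driver code):

```python
# Cassandra writes act like dict assignment: there is no "already exists"
# error, and the same statement covers both cases.
table = {}

table["key1"] = {"username": "alice"}  # key absent  -> acts as INSERT
table["key1"] = {"username": "bob"}    # key present -> acts as UPDATE

print(table["key1"]["username"])  # bob
```

This is why the driver can route both INSERT and UPDATE through one code path: the server resolves which case applies, last write wins.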
Re: copy data for dev
If you have the disk space you can just copy all the data files from the snapshot onto the dev node, renaming any with conflicting names. Then bring up the dev node and it should see the data. You can then compact to merge and drop all the duplicate data. You can also use the sstable loader tool to send the snapshot files to the dev node. -Jeremiah On Mar 26, 2012, at 2:13 PM, Deno Vichas wrote: all, is there an easy way to take a 4-node snapshot and restore it on my single node dev cluster? thanks, deno
RE: Network, Compaction, Garbage collection and Cache monitoring in cassandra
You can also use any network/server monitoring tool which can talk to JMX. We are currently using vFabric Hyperic's JMX plugin for this. IIRC there are some cacti and nagios scripts on github for getting the data into those. -Jeremiah From: R. Verlangen [ro...@us2.nl] Sent: Wednesday, March 21, 2012 10:40 AM To: user@cassandra.apache.org Subject: Re: Network, Compaction, Garbage collection and Cache monitoring in cassandra Hi Rishabh, Please take a look at OpsCenter: http://www.datastax.com/products/opscenter It provides most of the details you request for. Good luck! 2012/3/21 Rishabh Agrawal rishabh.agra...@impetus.co.in Hello, Can someone help me with how to proactively monitor Network, Compaction, Garbage collection and Cache use in Cassandra. Regards Rishabh
RE: repair broke TTL based expiration
You need to create the tombstone in case the data was inserted without a timestamp at some point. -Jeremiah From: Radim Kolar [h...@filez.com] Sent: Monday, March 19, 2012 4:48 PM To: user@cassandra.apache.org Subject: Re: repair broke TTL based expiration On 19.3.2012 20:28, i...@4friends.od.ua wrote: Hello Datasize should decrease during minor compactions. Check logs for compaction results. They do, but not as much as I expect. Look at the sizes and file dates: -rw-r--r-- 1 root wheel 5.4G Feb 23 17:03 resultcache-hc-27045-Data.db -rw-r--r-- 1 root wheel 6.4G Feb 23 17:11 resultcache-hc-27047-Data.db -rw-r--r-- 1 root wheel 5.5G Feb 25 06:40 resultcache-hc-27167-Data.db -rw-r--r-- 1 root wheel 2.2G Mar 2 05:03 resultcache-hc-27323-Data.db -rw-r--r-- 1 root wheel 2.0G Mar 5 09:15 resultcache-hc-27542-Data.db -rw-r--r-- 1 root wheel 2.2G Mar 12 23:24 resultcache-hc-27791-Data.db -rw-r--r-- 1 root wheel 468M Mar 15 03:27 resultcache-hc-27822-Data.db -rw-r--r-- 1 root wheel 483M Mar 16 05:23 resultcache-hc-27853-Data.db -rw-r--r-- 1 root wheel 53M Mar 17 05:33 resultcache-hc-27901-Data.db -rw-r--r-- 1 root wheel 485M Mar 17 09:37 resultcache-hc-27930-Data.db -rw-r--r-- 1 root wheel 480M Mar 19 00:45 resultcache-hc-27961-Data.db -rw-r--r-- 1 root wheel 95M Mar 19 09:35 resultcache-hc-27967-Data.db -rw-r--r-- 1 root wheel 98M Mar 19 17:04 resultcache-hc-27973-Data.db -rw-r--r-- 1 root wheel 19M Mar 19 18:23 resultcache-hc-27974-Data.db -rw-r--r-- 1 root wheel 19M Mar 19 19:50 resultcache-hc-27975-Data.db -rw-r--r-- 1 root wheel 19M Mar 19 21:17 resultcache-hc-27976-Data.db -rw-r--r-- 1 root wheel 19M Mar 19 22:05 resultcache-hc-27977-Data.db I insert everything with a 7-day TTL + 10-day tombstone expiration. This means that in the ideal case there should be nothing older than Mar 2. These 3 x 5 GB files wait to be compacted.
Because they contain only tombstones, Cassandra should make some optimizations: mark the sstable as tombstone-only, remember the time of its latest tombstone, and delete the entire sstable without needing to merge it first. 1. Question: why create a tombstone after row expiration at all, when the row will expire on all cluster nodes at the same time without needing to be deleted? 2. It's a super column family. When I dump the oldest sstable, I wonder why it looks like this: { 772c61727469636c65736f61702e636f6d: {}, 7175616b652d34: {1: {deletedAt: -9223372036854775808, subColumns: [[crc32,4f34455c,1328220892597002,d], [id,4f34455c,1328220892597000,d], [name,4f34455c,1328220892597001,d], [size,4f34455c,1328220892597003,d]]}, 2: {deletedAt: -9223372036854775808, subColumns: [[crc32,4f34455c,1328220892597007,d], [id,4f34455c,1328220892597005,d], [name,4f34455c,1328220892597006,d], [size,4f34455c,1328220892597008,d]]}, 3: {deletedAt: -9223372036854775808, subColumns: * All subcolumns are deleted, so why keep their names in the table? Isn't marking the column as deleted, i.e. 1: {deletedAt: -9223372036854775808}, enough? * Another question: why was the entire row not tombstoned, since all its members had expired?
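The expiry timeline the poster expects can be sketched with simple date arithmetic, using his stated numbers (a 7-day TTL plus a 10-day tombstone grace period) as assumptions:

```python
from datetime import datetime, timedelta

# A minimal sketch of the expiry timeline described above, using the poster's
# numbers as assumptions: 7-day TTL plus 10-day tombstone grace period.
TTL = timedelta(days=7)
GC_GRACE = timedelta(days=10)

def earliest_purge(insert_time: datetime) -> datetime:
    # A column written at insert_time expires at insert_time + TTL; its
    # tombstone only becomes purgeable GC_GRACE later, and even then only at
    # the next compaction that includes the sstable holding it.
    return insert_time + TTL + GC_GRACE

written = datetime(2012, 2, 23)
print(earliest_purge(written))  # 2012-03-11 00:00:00
```

This matches the complaint in the thread: anything written before late February should be purgeable by mid-March, provided compaction actually visits those sstables.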
RE: Hector counter question
No, Cassandra doesn't support atomic counters. IIRC it is on the list of things for 1.2. -Jeremiah From: Tamar Fraenkel [ta...@tok-media.com] Sent: Monday, March 19, 2012 1:26 PM To: cassandra-u...@incubator.apache.org Subject: Hector counter question Hi! Is there a way to read and increment counter column atomically, something like incrementAndGet (Hector)? Thanks, Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
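Why a client-side read-then-increment is not an atomic incrementAndGet can be shown with a deterministic interleaving over a toy in-memory store (the store and its read/write helpers are hypothetical, not Hector API):

```python
# Deterministic illustration of why client-side read-then-increment is not
# atomic: two clients read the same value before either writes back, and one
# increment is lost. The "store" here is a hypothetical stand-in, not Hector.
store = {"counter": 0}

def read(col):
    return store[col]

def write(col, value):
    store[col] = value

# Interleaving: both clients read before either writes.
a = read("counter")      # client A sees 0
b = read("counter")      # client B also sees 0
write("counter", a + 1)  # client A writes 1
write("counter", b + 1)  # client B also writes 1
print(store["counter"])  # 1, not 2: one increment was lost
```

Cassandra's counter columns avoid the lost update, but as the reply says, they do not return the new value atomically the way incrementAndGet would.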
RE: 0.8.1 Vs 1.0.7
I would guess more aggressive compaction settings, did you update rows or insert some twice? If you run major compaction a couple times on the 0.8.1 cluster does the data size get smaller? You can use the describe command to check if compression got turned on. -Jeremiah From: Ravikumar Govindarajan [ravikumar.govindara...@gmail.com] Sent: Thursday, March 15, 2012 4:41 AM To: user@cassandra.apache.org Subject: 0.8.1 Vs 1.0.7 Hi, I ran some data import tests for cassandra 0.8.1 and 1.0.7. The results were a little bit surprising 0.8.1, SimpleStrategy, Rep_Factor=3,QUORUM Writes, RP, SimpleSnitch XXX.XXX.XXX.A datacenter1 rack1 Up Normal 140.61 GB 12.50% XXX.XXX.XXX.B datacenter1 rack1 Up Normal 139.92 GB 12.50% XXX.XXX.XXX.C datacenter1 rack1 Up Normal 138.81 GB 12.50% XXX.XXX.XXX.D datacenter1 rack1 Up Normal 139.78 GB 12.50% XXX.XXX.XXX.E datacenter1 rack1 Up Normal 137.44 GB 12.50% XXX.XXX.XXX.F datacenter1 rack1 Up Normal 138.48 GB 12.50% XXX.XXX.XXX.G datacenter1 rack1 Up Normal 140.52 GB 12.50% XXX.XXX.XXX.H datacenter1 rack1 Up Normal 145.24 GB 12.50% 1.0.7, NTS, Rep_Factor{DC1:3, DC2:2}, LOCAL_QUORUM writes, RP [DC2 m/c yet to join ring], PropertyFileSnitch XXX.XXX.XXX.A DC1 RAC1 Up Normal 48.72 GB 12.50% XXX.XXX.XXX.B DC1 RAC1 Up Normal 51.23 GB 12.50% XXX.XXX.XXX.C DC1 RAC1 Up Normal 52.4GB 12.50% XXX.XXX.XXX.D DC1 RAC1 Up Normal 49.64 GB 12.50% XXX.XXX.XXX.E DC1 RAC1 Up Normal 48.5GB 12.50% XXX.XXX.XXX.F DC1 RAC1 Up Normal53.38 GB 12.50% XXX.XXX.XXX.G DC1 RAC1 Up Normal 51.11 GB 12.50% XXX.XXX.XXX.H DC1 RAC1 Up Normal 53.36 GB 12.50% There seems to be 3X savings in size for the same dataset running 1.0.7. I have not enabled compression for any of the CFs. Will it be enabled by default when creating a new CF in 1.0.7? cassandra.yaml is also mostly identical. Thanks and Regards, Ravi
RE: Composite keys and range queries
Right, so until the new CQL stuff exists to actually query with something smart enough to know about composite keys , You have to define and query on your own. Row Key = UUID Column = CompositeColumn(string, string) You want to then use COLUMN slicing, not row ranges to query the data. Where you slice in priority as the first part of a Composite Column Name. See the Under the hood and historical notes section of the blog post. You want to layout your data per the Physical representation of the denormalized timeline rows diagram. Where your UUID is the user_id from the example, and your priority is the tweet_id -Jeremiah From: John Laban [j...@pagerduty.com] Sent: Wednesday, March 14, 2012 12:37 PM To: user@cassandra.apache.org Subject: Re: Composite keys and range queries Hmm, now I'm really confused. This may be of use to you http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 This article is what I actually used to come up with my schema here. In the Clustering, composite keys, and more section they're using a schema very similarly to how I'm trying to use it. They define a composite key with two parts, expecting the first part to be used as the partition key and the second part to be used for ordering. The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may be 1 . Why? Shouldn't only uuid-1 be used as the partition key? (So shouldn't those two hash to the same location?) I'm thinking of using supercolumns for this instead as I know they'll work (where the row key is the uuid and the supercolumn name is the priority), but aren't composite row keys supposed to essentially replace the need for supercolumns? Thanks, and sorry if I'm getting this all wrong, John On Wed, Mar 14, 2012 at 12:52 AM, aaron morton aa...@thelastpickle.commailto:aa...@thelastpickle.com wrote: You are seeing this http://wiki.apache.org/cassandra/FAQ#range_rp The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may be 1 . You cannot do what you want to. 
Even if you passed a start of (uuid1,empty) and no finish, you would not only get rows where the key starts with uuid1. This may be of use to you http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 Or you can store all the priorities that are valid for an ID in another row. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/03/2012, at 1:05 PM, John Laban wrote: Forwarding to the Cassandra mailing list as well, in case this is more of an issue on how I'm using Cassandra. Am I correct to assume that I can use range queries on composite row keys, even when using a RandomPartitioner, if I make sure that the first part of the composite key is fixed? Any help would be appreciated, John On Tue, Mar 13, 2012 at 12:15 PM, John Laban j...@pagerduty.commailto:j...@pagerduty.com wrote: Hi, I have a column family that uses a composite key: (ID, priority) - ... Where the ID is a UUID and the priority is an integer. I'm trying to perform a range query now: I want all the rows where the ID matches some fixed UUID, but within a range of priorities. This is supported even if I'm using a RandomPartitioner, right? (Because the first key in the composite key is the partition key, and the second part of the composite key is automatically ordered?) So I perform a range slices query: val rangeQuery = HFactory.createRangeSlicesQuery(keyspace, new CompositeSerializer, StringSerializer.get, BytesArraySerializer.get) rangeQuery.setColumnFamily(RouteColumnFamilyName). setKeys( new Composite(id, priorityStart), new Composite(id, priorityEnd) ). setRange( null, null, false, Int.MaxValue ) But I get this error: me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:start key's md5 sorts after end key's md5. this is not allowed; you probably should not specify end key at all, under RandomPartitioner) Shouldn't they have the same md5, since they have the same partition key? 
Am I using the wrong query here, or does Hector not support composite range queries, or am I making some mistake in how I think Cassandra's composite keys work? Thanks, John
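The layout Jeremiah describes (row key = UUID, column name = a composite whose first component is the priority) can be sketched with an in-memory sorted column list; the point is that the range query is a column slice inside one row, not a row range across the ring:

```python
import bisect

# Sketch of the layout described above: one row per UUID, columns named by a
# (priority, id) composite, kept in sorted order as Cassandra would store them.
# Row key, priorities, and ids here are illustrative assumptions.
row = {"uuid-1": [((1, "a"), "v1"), ((2, "b"), "v2"),
                  ((2, "c"), "v3"), ((5, "d"), "v4")]}

def slice_by_priority(key, lo, hi):
    # A column slice: all columns whose first composite component falls in
    # [lo, hi]. Tuple ordering makes (lo,) sort before any (lo, id) pair.
    cols = row[key]
    names = [c[0] for c in cols]
    start = bisect.bisect_left(names, (lo,))
    end = bisect.bisect_left(names, (hi + 1,))
    return cols[start:end]

print(slice_by_priority("uuid-1", 2, 5))
# [((2, 'b'), 'v2'), ((2, 'c'), 'v3'), ((5, 'd'), 'v4')]
```

This is why the Hector call fails: setKeys() describes a row range, which RandomPartitioner orders by md5, while the priority range has to be expressed as a column slice within the single row keyed by the UUID.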
Re: Schema change causes exception when adding data
That is the best one I have found. On 03/01/2012 03:12 PM, Tharindu Mathew wrote: There are 2. I'd like to wait till there are one, when I insert the value. Going through the code, calling client.describe_schema_versions() seems to give a good answer to this. And I discovered that if I wait till there is only 1 version, I will not get this error. Is this the best practice if I want to check this programatically? On Thu, Mar 1, 2012 at 11:15 PM, aaron morton aa...@thelastpickle.com mailto:aa...@thelastpickle.com wrote: use describe cluster in the CLI to see how many schema versions there are. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/03/2012, at 12:25 AM, Tharindu Mathew wrote: On Thu, Mar 1, 2012 at 11:47 AM, Tharindu Mathew mcclou...@gmail.com mailto:mcclou...@gmail.com wrote: Jeremiah, Thanks for the reply. This is what we have been doing, but it's not reliable as we don't know a definite time that the schema would get replicated. Is there any way I can know for sure that changes have propagated? [Edit: corrected to a question] Then I can block the insertion of data until then. On Thu, Mar 1, 2012 at 4:33 AM, Jeremiah Jordan jeremiah.jor...@morningstar.com mailto:jeremiah.jor...@morningstar.com wrote: The error is that the specified colum family doesn’t exist. If you connect with the CLI and describe the keyspace does it show up? Also, after adding a new column family programmatically you can’t use it immediately, you have to wait for it to propagate. You can use calls to describe schema to do so, keep calling it until every node is on the same schema. -Jeremiah *From:*Tharindu Mathew [mailto:mcclou...@gmail.com mailto:mcclou...@gmail.com] *Sent:* Wednesday, February 29, 2012 8:27 AM *To:* user *Subject:* Schema change causes exception when adding data Hi, I have a 3 node cluster and I'm dynamically updating a keyspace with a new column family. 
Then, when I try to write records to it I get the following exception shown at [1]. How do I avoid this. I'm using Hector and the default consistency level of QUORUM is used. Cassandra version 0.7.8. Replication Factor is 1. How can I solve my problem? [1] - me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:unconfigured columnfamily proxySummary) at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42) at me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:397) at me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:383) at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101) at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:156) at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129) at me.prettyprint.cassandra.service.KeyspaceServiceImpl.multigetSlice(KeyspaceServiceImpl.java:401) at me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:67) at me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:59) at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20) at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:72) at me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery.execute(ThriftMultigetSliceQuery.java:58) -- Regards, Tharindu blog: http://mackiemathew.com/ -- Regards, Tharindu blog: http://mackiemathew.com/ -- Regards, Tharindu blog: http://mackiemathew.com/ -- Regards, Tharindu blog: http://mackiemathew.com/
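The "keep calling describe_schema_versions until every node is on the same schema" advice from the thread can be sketched as a polling loop; the client object below is a hypothetical stand-in for a Thrift client whose describe_schema_versions() maps schema-version ids to endpoint lists:

```python
import time

# Sketch of the schema-agreement wait described above. The client is a
# hypothetical stand-in; with a real Thrift client, describe_schema_versions()
# returns a map of schema-version id -> list of node endpoints.
def wait_for_schema_agreement(client, timeout=30.0, poll=0.5):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        versions = client.describe_schema_versions()
        if len(versions) == 1:  # every node reports the same schema version
            return True
        time.sleep(poll)
    return False

class FakeClient:
    """Stand-in cluster that converges on the third poll."""
    def __init__(self):
        self.calls = 0
    def describe_schema_versions(self):
        self.calls += 1
        if self.calls < 3:
            return {"v1": ["10.0.0.1"], "v2": ["10.0.0.2", "10.0.0.3"]}
        return {"v2": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}

print(wait_for_schema_agreement(FakeClient(), timeout=5.0, poll=0.01))  # True
```

Blocking inserts until this returns True is exactly the fix Tharindu arrived at in the thread.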
Re: Adding a second datacenter
You need to make sure your clients are reading using LOCAL_* settings so that they don't try to get data from the other data center. But you shouldn't get errors while replication_factor is 0. Once you change the replication factor to 4, you should get missing data if you are using LOCAL_* for reading. What version are you using? See the IRC logs at the beginning of this JIRA discussion thread for some info: https://issues.apache.org/jira/browse/CASSANDRA-3483 But you should be able to: 1. Set dc2:0 in the replication_factor. 2. Set bootstrap to false on the new nodes. 3. Start all of the new nodes. 4. Change replication_factor to dc2:4. 5. Run repair on the nodes in dc2. Once the repairs finish you should be able to start using DC2. You are still going to need a bunch of extra space because the repair is going to get you a couple copies of the data. Once 1.1 comes out it will have new nodetool commands for making this a little nicer per CASSANDRA-3483 -Jeremiah On 03/05/2012 09:42 AM, David Koblas wrote: Everything that I've read about data centers focuses on setting things up at the beginning of time. I have the following situation: 10 machines in a datacenter (DC1), with a replication factor of 2. I want to set up a second data center (DC2) with the following configuration: 20 machines with a replication factor of 4. What I've found is that if I initially start adding things, the first machine to join the network attempts to replicate all of the data from DC1 and fills up its disk drive. I've played with setting the storage_options to have a replication factor of 0; then I can bring up all 20 machines in DC2, but then I start getting a huge number of read errors from reads on DC1. Is there a simple cookbook on how to add a second DC? I'm currently trying to set the replication factor to 1 and do a repair, but that doesn't feel like the right approach. Thanks,
Re: Rationale behind incrementing all tokens by one in a different datacenter (was: running two rings on the same subnet)
There is a requirement that all nodes have a unique token. There is still one global cluster/ring that each node needs to be unique on. The logically separate rings that NetworkTopologyStrategy puts them into are hidden from the rest of the code. -Jeremiah On 03/05/2012 05:13 AM, Hontvári József Levente wrote: I am thinking about the frequent example: dc1 - node1: 0 dc1 - node2: large...number dc2 - node1: 1 dc2 - node2: large...number + 1 In theory using the same tokens in dc2 as in dc1 does not significantly affect key distribution; specifically, the two keys on the border will move to the next node, but that is not much. However it seems that there is an unexplained requirement (at least I could not find an explanation) that all nodes must have a unique token, even if they are put into a different circle by NetworkTopologyStrategy. On 2012.03.05. 11:48, aaron morton wrote: Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. Otherwise data is not evenly distributed.
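The "increment by one per datacenter" scheme from the example can be sketched directly: evenly spaced RandomPartitioner tokens in DC1, the same tokens plus 1 in DC2, so every token in the single global ring stays unique while per-DC balance is essentially unchanged:

```python
# Sketch of the token-offset scheme from the example above: evenly spaced
# RandomPartitioner tokens in DC1, the same tokens offset by 1 in DC2, so the
# global ring (which is what the uniqueness requirement applies to) has no
# duplicates. Node counts here are illustrative assumptions.
RING = 2 ** 127  # RandomPartitioner token space

def tokens(num_nodes, dc_offset):
    return [i * RING // num_nodes + dc_offset for i in range(num_nodes)]

dc1 = tokens(2, 0)  # [0, 2**126]
dc2 = tokens(2, 1)  # [1, 2**126 + 1]
all_tokens = dc1 + dc2
assert len(set(all_tokens)) == len(all_tokens)  # unique across the whole cluster
print(dc1, dc2)
```

The offset of 1 moves only the keys that hash exactly onto the boundary tokens, which is the "not much" the original poster observed.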
Re: unidirectional communication/replication
You might check out some of the stuff Netflix does with their Cassandra backup, and Cassandra ETL tools.: http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html http://techblog.netflix.com/2012/02/announcing-priam.html -Jeremiah On 02/29/2012 11:04 AM, Alexandru Sicoe wrote: On Sun, Feb 26, 2012 at 8:24 PM, aaron morton aa...@thelastpickle.com mailto:aa...@thelastpickle.com wrote: All nodes in the cluster need two way communication. Nodes need to talk to Gossip to each other so they know they are alive. If you need to dump a lot of data consider the Hadoop integration. http://wiki.apache.org/cassandra/HadoopSupport It can run a bit faster than going through the thrift api. Thanks for the suggestion, I will look into it. Copying sstables may be another option depending on the data size. The problem with this is that the SSTable, from what I understand, is per CF, Since I will want to do a semi real time replication of just the latest data added this won't work because I will be copying over all the data in the CF. Cheers, A Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 25/02/2012, at 3:21 AM, Alexandru Sicoe wrote: Hello everyone, I'm battling with this contraint that I have: I need to regularly ship out timeseries data from a Cassandra cluster that sits within an enclosed network, outside of the network. I tried to select all the data within a certian time window, writing to a file, and then copying the file out but this hits the I/O performance because even for a small time window (say 5mins) I am hitting more than a million rows. It would really help if I used Cassandra to replicate the data automatically outside. The problem is they will only allow me to have outbound traffic out of the enclosed network (not inbound). 
Is there any way to configure the cluster or have 2 data centers in such a way that the data center (node or cluster) outside of the enclosed network only gets a replica of the data, without ever needing to communicate anything back? I appreciate the help, Alex
RE: Schema change causes exception when adding data
The error is that the specified column family doesn't exist. If you connect with the CLI and describe the keyspace, does it show up? Also, after adding a new column family programmatically you can't use it immediately; you have to wait for it to propagate. You can use calls to describe schema to do so: keep calling it until every node is on the same schema. -Jeremiah From: Tharindu Mathew [mailto:mcclou...@gmail.com] Sent: Wednesday, February 29, 2012 8:27 AM To: user Subject: Schema change causes exception when adding data Hi, I have a 3 node cluster and I'm dynamically updating a keyspace with a new column family. Then, when I try to write records to it I get the exception shown at [1]. How do I avoid this? I'm using Hector and the default consistency level of QUORUM. Cassandra version 0.7.8. Replication factor is 1. How can I solve my problem? [1] - me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:unconfigured columnfamily proxySummary) at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42) at me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:397) at me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:383) at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101) at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:156) at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129) at me.prettyprint.cassandra.service.KeyspaceServiceImpl.multigetSlice(KeyspaceServiceImpl.java:401) at me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:67) at me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:59) at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20) at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:72) at me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery.execute(ThriftMultigetSliceQuery.java:58) -- Regards, Tharindu blog: http://mackiemathew.com/
Chicago Cassandra Meetup on 3/1 (Preview of my Pycon talk)
I am going to be doing a trial run of my Pycon talk about setting up a development instance of Cassandra and accessing it from Python (Pycassa mostly, some thrift just to scare people off of using thrift) for a Chicago Cassandra Meetup. Anyone in Chicago feel free to come by. The talk is next Thursday, 3/1. See the Meetup listing for full time/place/etc. http://www.meetup.com/Cassandra-Chicago/events/53378712/ If you are going to be at Pycon, I will be presenting on Friday 3/9 @ 2:40. https://us.pycon.org/2012/schedule/presentation/122/ If anyone is interested we could probably get some kind of Cassandra Open Space going as well. I see DataStax is a Pycon sponsor, are you guys planning anything? -Jeremiah
Re: Deleting a column vs setting it's value to empty
Either one works fine. Setting it to an empty value may cause you fewer headaches, as you won't have to deal with tombstones. Deleting a non-existent column is fine. -Jeremiah On 02/10/2012 02:15 PM, Drew Kutcharian wrote: Hi Everyone, Let's say I have the following object which I would like to save in Cassandra: class User { UUID id; //row key String name; //columnKey: name, columnValue: the name of the user String description; //columnKey: description, columnValue: the description of the user } Description can be nullable. What's the best approach when a user updates her description and sets it to null? Should I delete the description column or set it to an empty string? In addition, if I go with the delete-column strategy, since I don't know what the previous value of description was (the column might not even exist), what would happen when I delete a non-existent column? Thanks, Drew
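The trade-off in the reply can be sketched with a toy column store: setting an empty value is an ordinary write, while a delete writes a tombstone that lingers until the grace period has passed and compaction runs (the GC_GRACE value below is an illustrative assumption):

```python
# Sketch contrasting the two updates discussed above with a toy column store.
# GC_GRACE here stands in for gc_grace_seconds and is an assumed toy value.
GC_GRACE = 10

def set_empty(row, col, now):
    row[col] = {"value": "", "tombstone": False, "ts": now}  # ordinary write

def delete(row, col, now):
    # A delete always writes a tombstone, even if the column never existed,
    # which is why deleting a non-existent column is harmless.
    row[col] = {"value": None, "tombstone": True, "ts": now}

def compact(row, now):
    # Tombstones are only dropped once they are older than GC_GRACE.
    return {c: m for c, m in row.items()
            if not (m["tombstone"] and now - m["ts"] > GC_GRACE)}

row = {}
set_empty(row, "description", now=0)
delete(row, "nickname", now=0)       # deleting a column that was never written
print(sorted(compact(row, now=5)))   # tombstone still held: both columns remain
print(sorted(compact(row, now=20)))  # past grace period: only 'description'
```

The empty-value column never carries a tombstone, which is the "fewer headaches" point: reads never have to skip over it, and there is nothing waiting on compaction.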
Re: Cassandra 1.0.6 multi data center question
No, not an issue. The nodes in DC2 know that they aren't supposed to have data, so they go ask the nodes in DC1 for the data to return to you. -Jeremiah On 02/09/2012 05:28 AM, Roshan Pradeep wrote: Thanks Peter for the replies. Previously it was a typing mistake and it should be getting. I checked the DC2 (with having replica 0) and noticed that there is no SSTables created. I use java hector sample program to insert data to the keyspace. After I insert a data item, I 1) Login to one of node in DC having replica count 0 using cassanda-cli. 2) Use the keyspace and list the column family. 3) I can see the data item inserted from DC having replica count 1. Is this a issue? Please clarify. Thanks again. On Thu, Feb 9, 2012 at 6:00 PM, Peter Schuller peter.schul...@infidyne.com mailto:peter.schul...@infidyne.com wrote: Again the *schema* gets propagated and the keyspace will exist everywhere. You should just have exactly zero amount of data for the keyspace in the DC w/o replicas. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re: Disable Nagle algoritm in thrift i.e. TCP_NODELAY
Should already be on for all of the server side stuff. All of the clients that I have used set it as well. -Jeremiah On 01/26/2012 07:17 AM, ruslan usifov wrote: Hello Is it possible set TCP_NODELAY on thrift socket in cassandra?
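At the socket level, "disable Nagle" just means setting the TCP_NODELAY option, which is what the server and client libraries do for you; a minimal sketch:

```python
import socket

# Minimal sketch of what "disable Nagle's algorithm" means at the socket
# level: setting the TCP_NODELAY option. Per the reply above, Cassandra's
# server-side Thrift sockets and common clients already do this.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay_on = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
print(nodelay_on)  # True
sock.close()
```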
Re: Unbalanced cluster with RandomPartitioner
Are you deleting data or using TTL's? Expired/deleted data won't go away until the sstable holding it is compacted. So if compaction has happened on some nodes, but not on others, you will see this. The disparity is pretty big 400Gb to 20GB, so this probably isn't the issue, but with our data using TTL's if I run major compactions a couple times on that column family it can shrink ~30%-40%. -Jeremiah On 01/17/2012 12:51 PM, Marcel Steinbach wrote: We are running regular repairs, so I don't think that's the problem. And the data dir sizes match approx. the load from the nodetool. Thanks for the advise, though. Our keys are digits only, and all contain a few zeros at the same offsets. I'm not that familiar with the md5 algorithm, but I doubt that it would generate 'hotspots' for those kind of keys, right? On 17.01.2012, at 17:34, Mohit Anchlia wrote: Have you tried running repair first on each node? Also, verify using df -h on the data dirs On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach marcel.steinb...@chors.de mailto:marcel.steinb...@chors.de wrote: Hi, we're using RP and have each node assigned the same amount of the token space. The cluster looks like that: Address Status State LoadOwnsToken 205648943402372032879374446248852460236 1 Up Normal 310.83 GB 12.50% 56775407874461455114148055497453867724 2 Up Normal 470.24 GB 12.50% 78043055807020109080608968461939380940 3 Up Normal 271.57 GB 12.50% 99310703739578763047069881426424894156 4 Up Normal 282.61 GB 12.50% 120578351672137417013530794390910407372 5 Up Normal 248.76 GB 12.50% 141845999604696070979991707355395920588 6 Up Normal 164.12 GB 12.50% 163113647537254724946452620319881433804 7 Up Normal 76.23 GB12.50% 184381295469813378912913533284366947020 8 Up Normal 19.79 GB12.50% 205648943402372032879374446248852460236 I was under the impression, the RP would distribute the load more evenly. Our row sizes are 0,5-1 KB, hence, we don't store huge rows on a single node. 
Should we just move the nodes so that the load is more evenly distributed, or is there something off that needs to be fixed first? Thanks Marcel
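Marcel's intuition about md5 is easy to check: even digit-only keys with zeros at fixed offsets hash to tokens spread evenly over the ring, so key shape alone should not create hotspots under RandomPartitioner:

```python
import hashlib

# Quick check of the intuition above: md5 of digit-only keys (with zeros at
# fixed offsets) still spreads evenly over the token space, so the key format
# alone should not create hotspots under RandomPartitioner. The key pattern
# below is an illustrative assumption.
def token(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

RING = 2 ** 128  # md5 output space
keys = ["%08d00" % i for i in range(10000)]  # digit-only keys, trailing zeros
buckets = [0] * 8                            # one bucket per equal-sized range
for k in keys:
    buckets[token(k) * 8 // RING] += 1
print(buckets)  # each bucket close to 10000 / 8 = 1250
```

With the tokens this even, the 20 GB vs 400 GB disparity in the thread has to come from something else, such as compaction lag or a few unusually hot rows, rather than from the hash.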
Re: nodetool ring question
There were some nodetool ring load reporting issues with early version of 1.0.X don't remember when they were fixed, but that could be your issue. Are you using compressed column families, a lot of the issues were with those. Might update to 1.0.7. -Jeremiah On 01/16/2012 04:04 AM, Michael Vaknine wrote: Hi, I have a 4 nodes cluster 1.0.3 version This is what I get when I run nodetool ring Address DC RackStatus State Load OwnsToken 127605887595351923798765477786913079296 10.8.193.87 datacenter1 rack1 Up Normal 46.47 GB 25.00% 0 10.5.7.76 datacenter1 rack1 Up Normal 48.01 GB 25.00% 42535295865117307932921825928971026432 10.8.189.197datacenter1 rack1 Up Normal 53.7 GB 25.00% 85070591730234615865843651857942052864 10.5.3.17 datacenter1 rack1 Up Normal 43.49 GB 25.00% 127605887595351923798765477786913079296 I have finished running repair on all 4 nodes. I have less then 10 GB on the /var/lib/cassandra/data/ folders My question is Why nodetool reports almost 50 GB on each node? Thanks Michael
Re: How to reliably achieve unique constraints with Cassandra?
Correct: any kind of locking in Cassandra requires clocks that are in sync, and requires you to wait out the maximum possible clock skew before reading to check whether you got the lock, to prevent the issue you describe below. There was a pretty detailed discussion of locking with only Cassandra a month or so back on this list. -Jeremiah On 01/06/2012 02:42 PM, Bryce Allen wrote: On Fri, 6 Jan 2012 10:38:17 -0800 Mohit Anchlia mohitanch...@gmail.com wrote: It could be as simple as reading before writing to make sure that email doesn't exist. But I think you are looking at how to handle 2 concurrent requests for the same email? The only way I can think of is: 1) Create a new CF, say tracker 2) Write email and time uuid to CF tracker 3) Read from CF tracker 4) If you find a row other than yours then wait and read again from tracker after a few ms 5) Read from USER CF 6) Write if no rows in USER CF 7) Delete from tracker Please note you might have to modify this logic a little bit, but this should give you some idea of how to approach this problem without locking. Distributed locking is pretty subtle; I haven't seen a correct solution that uses just Cassandra, even with QUORUM read/write. I suspect it's not possible. With the above proposal, in step 4 two processes could both have inserted an entry in the tracker before either gets a chance to check, so you need a way to order the requests. I don't think the timestamp works for ordering, because it's set by the client (even the internal timestamp is set by the client), and will likely be different from when the data is actually committed and available to read by other clients.
For example: * At time 0ms, client 1 starts insert of u...@example.org * At time 1ms, client 2 also starts insert for u...@example.org * At time 2ms, client 2 data is committed * At time 3ms, client 2 reads tracker and sees that it's the only one, so enters the critical section * At time 4ms, client 1 data is committed * At time 5ms, client 2 reads tracker, and sees that is not the only one, but since it has the lowest timestamp (0ms vs 1ms), it enters the critical section. I don't think Cassandra counters work for ordering either. This approach is similar to the Zookeeper lock recipe: http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks but zookeeper has sequence nodes, which provide a consistent way of ordering the requests. Zookeeper also avoids the busy waiting. I'd be happy to be proven wrong. But even if it is possible, if it involves a lot of complexity and busy waiting it's probably not worth it. There's a reason people are using Zookeeper with Cassandra. -Bryce
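Bryce's interleaving can be replayed deterministically: because client-supplied timestamps do not reflect commit order, both clients pass the "am I first?" check and mutual exclusion is violated:

```python
# Deterministic replay of the interleaving described above. Client timestamps
# do not reflect commit order, so both clients conclude they hold the "lock".
tracker = []  # (client_timestamp, client_id), appended in commit order

def committed_entries():
    return sorted(tracker)  # what a reader sees, ordered by client timestamp

# t=0ms: client 1 starts its insert (timestamp 0) but has not committed yet
# t=1ms: client 2 starts its insert (timestamp 1)
# t=2ms: client 2's write commits first
tracker.append((1, "client2"))
# t=3ms: client 2 reads: it is the only entry, so it enters the critical section
client2_holds_lock = committed_entries() == [(1, "client2")]
# t=4ms: client 1's write finally commits, with the *lower* timestamp
tracker.append((0, "client1"))
# t=5ms: client 1 reads: it has the lowest timestamp, so it also enters
client1_holds_lock = committed_entries()[0] == (0, "client1")
print(client1_holds_lock and client2_holds_lock)  # True: both hold the lock
```

This is exactly why the thread points at ZooKeeper's server-assigned sequence nodes: the ordering has to be assigned at commit time, not by the clients.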
Re: How to reliably achieve unique constraints with Cassandra?
Since a Zookeeper cluster is a quorum based system similar to Cassandra, it only goes down when n/2 nodes go down. And the same way you have to stop writing to Cassandra if N/2 nodes are down (if using QUoRUM), your App will have to wait for the Zookeeper cluster to come online again before it can proceed. On 01/06/2012 12:03 PM, Drew Kutcharian wrote: Hi Everyone, What's the best way to reliably have unique constraints like functionality with Cassandra? I have the following (which I think should be very common) use case. User CF Row Key: user email Columns: userId: UUID, etc... UserAttribute1 CF: Row Key: userId (which is the uuid that's mapped to user email) Columns: ... UserAttribute2 CF: Row Key: userId (which is the uuid that's mapped to user email) Columns: ... The issue is we need to guarantee that no two people register with the same email address. In addition, without locking, potentially a malicious user can hijack another user's account by registering using the user's email address. I know that this can be done using a lock manager such as ZooKeeper or HazelCast, but the issue with using either of them is that if ZooKeeper or HazelCast is down, then you can't be sure about the reliability of the lock. So this potentially, in the very rare instance where the lock manager is down and two users are registering with the same email, can cause major issues. In addition, I know this can be done with other tools such as Redis (use Redis for this use case, and Cassandra for everything else), but I'm interested in hearing if anyone has solved this issue using Cassandra only. Thanks, Drew
Re: How to reliably achieve unique constraints with Cassandra?
By using quorum: one side of the partition may be able to acquire locks, but the other one won't... On 01/06/2012 03:36 PM, Drew Kutcharian wrote: Bryce, I'm not sure about ZooKeeper, but I know if you have a partition between HazelCast nodes, then the nodes can acquire the same lock independently in each divided partition. How does ZooKeeper handle this situation? -- Drew On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote: On Fri, 6 Jan 2012 10:03:38 -0800 Drew Kutcharian d...@venarc.com wrote: I know that this can be done using a lock manager such as ZooKeeper or HazelCast, but the issue with using either of them is that if ZooKeeper or HazelCast is down, then you can't be sure about the reliability of the lock. So this potentially, in the very rare instance where the lock manager is down and two users are registering with the same email, can cause major issues. For most applications, if the lock manager is down, you don't acquire the lock, so you don't enter the critical section. Rather than allowing inconsistency, you become unavailable (at least to writes that require a lock). -Bryce
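The quorum argument here is pure arithmetic: a lock requires a strict majority of nodes, and no split of the cluster can hand a strict majority to both sides:

```python
# Sketch of the quorum argument above: acquiring a lock (or committing a
# write) requires a strict majority of the n nodes, and no network split can
# give a strict majority to both sides at once. n = 5 is an assumed example.
def has_quorum(votes, n):
    return votes > n // 2

n = 5
for side_a in range(n + 1):
    side_b = n - side_a
    # At most one side of any partition can hold a quorum.
    assert not (has_quorum(side_a, n) and has_quorum(side_b, n))
print("no split grants two quorums")
```

This is the property that makes a quorum-based lock manager safe under partition, at the cost of availability: the minority side simply cannot acquire locks until the partition heals.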
Re: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name
Unless you are running into an issue with super columns that makes composite columns a better fit for what you are trying to do, I would just stick with super columns. If it ain't broke, don't fix it. -Jeremiah On 01/03/2012 11:21 PM, Asil Klin wrote: @Stephen: in that case, you can easily tell the names of all columns you want to retrieve, so you can make a query to retrieve that list of composite columns. @Jeremiah, so where is my best bet? Should I leave the supercolumns as they are for now, since I can't find a good way to use them if I replace them with composite columns? On Wed, Jan 4, 2012 at 4:01 AM, Stephen Pope stephen.p...@quest.com wrote: The bonus you're talking about here, how do I apply that? For example, my columns are in the form of number.id, such as 4.steve, 4.greg, 5.steve, 5.george. Is there a way to query a slice of numbers with a list of ids? As in, I want all the columns with numbers between 4 and 10 which have ids steve or greg. Cheers, Steve -Original Message- From: Jeremiah Jordan [mailto:jeremiah.jor...@morningstar.com] Sent: Tuesday, January 03, 2012 3:12 PM To: user@cassandra.apache.org Cc: Asil Klin Subject: Re: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name The main issue with replacing super columns with composite columns right now is that if you don't know all your sub-column names, you can't select multiple super columns' worth of data in the same query without getting extra stuff. You have to use a slice to get all subcolumns of a given super column, and you can't have disjoint slices, so if you want two super columns in full, you have to get all the other stuff that is in between them, or make two queries.
If you know what all of the sub-column names are, you can ask for all of the super/sub column pairs for all of the super columns you want and not get extra data. If you don't need to pull multiple super columns at a time with slices like that, then there isn't really an issue. A bonus of using composite keys like this is that if there is a specific sub-column you want from multiple super columns, you can pull all of those out with a single multiget and you don't have to pull the rest of the columns... So there are pros and cons... -Jeremiah On 01/03/2012 01:58 PM, Asil Klin wrote: I have a super column family which I always use to retrieve a list of supercolumns (with all subcolumns) by name. I am looking to replace all SuperColumns in my schema with composite columns. How could I design the schema so that I could do the equivalent of retrieving a list of supercolumns by name when using composite columns? (As of now I thought of using the supercolumn name as the first component of the composite name and the subcolumn name as the 2nd component.)
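Jeremiah's slice-versus-multiget trade-off can be modeled in a few lines of Python (an illustrative in-memory model with hypothetical row and column names, not real client code): composite column names sort as (super, sub) tuples, a slice is one contiguous range that drags in everything between its endpoints, while an exact-name multiget only works if every (super, sub) pair is known up front.

```python
# Columns sorted by composite name, as Cassandra stores them.
columns = sorted({
    ("user1", "age"): 30, ("user1", "name"): "a",
    ("user2", "age"): 25, ("user2", "name"): "b",
    ("user3", "age"): 40, ("user3", "name"): "c",
}.items())

def slice_range(cols, start, end):
    """One contiguous slice: everything with start <= name <= end."""
    return [(k, v) for k, v in cols if start <= k <= end]

def multiget(cols, names):
    """Exact-name lookup: needs every (super, sub) pair known in advance."""
    wanted = set(names)
    return [(k, v) for k, v in cols if k in wanted]

# Slicing from user1 through user3 also returns all of user2 (6 columns):
full = slice_range(columns, ("user1", ""), ("user3", "\uffff"))
# Knowing the sub-column names lets you skip user2 entirely (4 columns):
exact = multiget(columns, [("user1", "age"), ("user1", "name"),
                           ("user3", "age"), ("user3", "name")])
```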
Re: Replacing supercolumns with composite columns; Getting the equivalent of retrieving a list of supercolumns by name
The main issue with replacing super columns with composite columns right now is that if you don't know all your sub-column names, you can't select multiple super columns' worth of data in the same query without getting extra stuff. You have to use a slice to get all subcolumns of a given super column, and you can't have disjoint slices, so if you want two super columns in full, you have to get all the other stuff that is in between them, or make two queries. If you know what all of the sub-column names are, you can ask for all of the super/sub column pairs for all of the super columns you want and not get extra data. If you don't need to pull multiple super columns at a time with slices like that, then there isn't really an issue. A bonus of using composite keys like this is that if there is a specific sub-column you want from multiple super columns, you can pull all of those out with a single multiget and you don't have to pull the rest of the columns... So there are pros and cons... -Jeremiah On 01/03/2012 01:58 PM, Asil Klin wrote: I have a super column family which I always use to retrieve a list of supercolumns (with all subcolumns) by name. I am looking to replace all SuperColumns in my schema with composite columns. How could I design the schema so that I could do the equivalent of retrieving a list of supercolumns by name when using composite columns? (As of now I thought of using the supercolumn name as the first component of the composite name and the subcolumn name as the 2nd component.)
Re: Newbie question about writer/reader consistency
So you can do this with Cassandra, but you need more logic in your code. Basically, you get the last safe number, M, then get N..M; if there are any gaps, you try again, reading those numbers. As long as you are not overwriting data, and you only update the last safe number after a successful write to Cassandra, you can do this. We currently do something very similar to this for some of our data. -Jeremiah On Dec 26, 2011, at 12:38 PM, Vladimir Mosgalin wrote: Hello everybody. I am a developer of a financial-related application, and I'm currently evaluating various NoSQL databases for our current goal: storing various views which show the state of the system in different aspects after each transaction. The write load seems to be bigger than a typical SQL database would handle without problems - under a test load of tens of transactions per second, each transaction generates changes in a dozen views, which generates hundreds of messages per second total. Each message (change) for each view must be stored, as well as the resulting view (generated as a kind of update of the old view); this means multiple inserts/updates per message, which must go as a single transaction. I started to look into NoSQL databases. I'm a bit puzzled by the guarantees of atomicity and isolation that Cassandra provides, so my question will be about how to attain the required level of consistency in Cassandra (if that is possible at all). I've read various documents and introductions to Cassandra's data model but still can't understand the basics of its data consistency. This discussion http://stackoverflow.com/questions/6033888/cassandra-atomicity-isolation-of-column-updates-on-a-single-row-on-on-single-n makes me feel disappointed about consistency in Cassandra, but I wonder if there is a way to work around it. The requirements are like this. There is one writer, which modifies two tables (I'm sorry for using SQL terms, I just don't want to create more confusion by mapping them into Cassandra terms at this stage).
For the first table, it's a simple insert; the index is a unique SCN which is guaranteed to be larger than the previous one. Let's say it inserts SCN DATA 1 AAA 2 BBB 3 CCC The goal for the client (reader) is to get all the data from SCN N to SCN M without gaps. It is fine if it can't see the very latest SCN yet, that is, gets 1:AAA and 2:BBB on request SCN: 1..END; what is NOT fine is to get something like 1:AAA and 3:CCC. In other words, does Cassandra provide consistency between writer and reader regarding the order of changes? Or under some conditions (say, very fast writes - but always from a single writer - and many concurrent reads or something) might it be possible to get that kind of gap? The second question is similar, but on a bigger scale. The second table must be modified in a more complicated way; both inserts and updates of old data are required. Sometimes it's a few inserts and a few updates, which must be done atomically - under no conditions should a reader be able to see the mid-state of these inserts/updates. Fortunately, all these new changes will have a new key (new SCNs), so if it were possible to use a column in a separate table which stores the last safe SCN, it would work - but I have no faith that Cassandra offers such a level of consistency. For example, let's say it works like this: current last safe SCN: 1000 update (must be viewed as an atomic transaction): SCN DATA 1001 AAA 1002 BBB 800 1001 1003 DDD new last safe SCN: 1003 Here, readers need a means to filter out lines with SCN > 1000 until the writer is done writing the 1003:DDD line. They also need to filter out the 800:1001 line because it references an SCN which is after the current last safe one. The last safe SCN is stored somewhere, and for this pattern to work I once again need execution-order consistency - no reader should ever see the last safe: 1003 line before all the previous lines were committed; and any reader who saw the last safe: 1003 line must be able to see all the lines from that update just as they are right now.
Is this possible to do in Cassandra?
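The read-and-retry scheme Jeremiah describes could be sketched like this (the `fetch` callable is a hypothetical stand-in for a Cassandra read of one SCN row at your chosen consistency level; `last_safe` comes from the separately stored "last safe SCN" column, updated only after all row writes succeed):

```python
import time

def read_range_without_gaps(fetch, start, last_safe, retries=10, delay=0.01):
    """Read rows start..last_safe; if any SCN is missing (a replica has
    not caught up yet), re-read just the missing ones until they appear."""
    rows = {scn: fetch(scn) for scn in range(start, last_safe + 1)}
    for _ in range(retries):
        missing = [scn for scn, data in rows.items() if data is None]
        if not missing:
            # No gaps: return the rows in SCN order.
            return [rows[scn] for scn in range(start, last_safe + 1)]
        for scn in missing:
            rows[scn] = fetch(scn)
        time.sleep(delay)
    raise TimeoutError("gaps remain after retries")
```

Because SCNs are never overwritten and the last safe SCN only moves forward after a successful write, a reader that loops like this eventually sees a gap-free prefix, which is exactly the guarantee the original question asks for.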
Re: memory estimate for each key in the key cache
It is not telling you to multiply your key size by 10-12, it is telling you to multiply the key cache size reported by nodetool cfstats by 10-12. -Jeremiah On Dec 18, 2011, at 6:37 PM, Guy Incognito wrote: to be blunt, this doesn't sound right to me, unless it's doing something rather more clever to manage the memory. i mocked up a simple class containing a byte[], ByteBuffer and long, and the shallow size alone is 32 bytes. deep size with a byte[16], 1-byte bytebuffer and long is 132. this is on a 64-bit jvm on win x64, but is consistent(ish) with what i've seen in the past on linux jvms. the actual code has rather more objects than this (it's a map, it has a pair, decoratedKey) so it would be quite a bit bigger per key. On 17/12/2011 03:42, Brandon Williams wrote: On Fri, Dec 16, 2011 at 9:31 PM, Dave Brosius dbros...@mebigfatguy.com wrote: Wow, Java is a lot better than I thought if it can perform that kind of magic. I'm guessing the wiki information is just old and out of date. It's probably more like 60 + sizeof(key) With jamm and MAT it's fairly easy to test. The number is accurate last I checked. -Brandon
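Taking Brandon's figure at face value, the back-of-envelope estimate looks like this (the 60-byte constant is the approximate per-entry overhead mentioned in this thread, not an exact number):

```python
def key_cache_memory_estimate(num_keys: int, avg_key_bytes: int) -> int:
    """Rough key cache footprint using the '60 + sizeof(key)' per-entry
    estimate from the thread; treat the constant as approximate."""
    return num_keys * (60 + avg_key_bytes)

# e.g. one million cached keys of 16 bytes each -> roughly 76 MB
print(key_cache_memory_estimate(1_000_000, 16))
```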
Re: gracefully recover from data file corruptions
You need to run repair on the node once it is back up (to get back the data you just deleted). If this is happening on more than one node you could have data loss... -Jeremiah On 12/16/2011 07:46 AM, Ramesh Natarajan wrote: We are running a 30-node 1.0.5 Cassandra cluster on RHEL 5.6 x86_64, virtualized on ESXi 5.0. We are seeing a DecoratedKey assertion error during compactions, and at this point we suspect anything from OS/ESXi/HBA/iSCSI RAID. Please correct me if I am wrong: once a node gets into this state I don't see any way to recover unless I remove the corrupted data file and restart Cassandra. I am running tests with replication factor 3 and all reads and writes are done with QUORUM, so I believe there will not be data loss if I do this. If this is a correct way to recover, I would like to know how to do it gracefully in a production environment: - Disable thrift - Disable gossip - Drain the node - Kill the cassandra java process (send a SIGTERM and/or SIGKILL) - Do a filesystem sync - Remove the corrupted file from the /var/lib/cassandra/data directory - Start cassandra - Enable gossip so all pending hinted handoff occurs - Enable thrift. Thanks Ramesh
Re: Cassandra C client implementation
If you are OK linking to a C++-based library you can look at: https://github.com/minaguib/libcassandra/tree/kickstart-libcassie-0.7/libcassie It is wrapper code around libcassandra which exports a C interface. If you look at the function names etc. in the other languages, just use the similar functions from the c_glib thrift... If you are going to mess with using the c_glib thrift, make sure to check out the JIRA for it; it is new and has some issues... https://issues.apache.org/jira/browse/THRIFT/component/12313854 On 12/14/2011 09:11 AM, Vlad Paiu wrote: Hello, I am trying to integrate some Cassandra-related ops (insert, get, etc.) into an application written entirely in C, so C++ is not an option. Is there any C client library for Cassandra? I have also tried to generate Thrift glibc code for Cassandra, but at wiki.apache.org/cassandra/ThriftExamples I cannot find an example for C. Can anybody suggest a C client library for Cassandra or provide some working examples for Thrift in C? Thanks and Regards, Vlad
Re: Slow Compactions - CASSANDRA-3592
Does your issue look similar to this one? https://issues.apache.org/jira/browse/CASSANDRA-3532 It is also dealing with compaction taking 10X longer in 1.0.X On 12/13/2011 09:00 AM, Dan Hendry wrote: I have been observing that major compaction can be incredibly slow in Cassandra 1.0 and was curious to what extent anybody else has noticed similar behaviour. Essentially I believe the problem involves the combination of wide rows and expiring columns. Relevant details included in: https://issues.apache.org/jira/browse/CASSANDRA-3592 Dan Hendry (403) 660-2297
Re: cassandra in production environment
What java are you using? OpenJDK or Sun/Oracle (http://www.oracle.com/technetwork/java/javase/downloads/index.html)? If you are using OpenJDK you might try Sun. Have you run diagnostics on the disk? It is more likely there is an issue with your disk, not with Cassandra. On 12/11/2011 07:04 PM, Ramesh Natarajan wrote: Hi, We are currently testing cassandra in RHEL 6.1 64 bit environment running on ESXi 5.0 and are experiencing issues with data file corruptions. If you are using linux for production environment can you please share which OS/version you are using? thanks Ramesh
Re: Need to reconcile data from 2 drives
If you don't want downtime, you can take the original data and use the bulk sstable loader to send it back into the cluster. If you don't mind downtime, you can take all the files from both data folders and put them together, make sure there aren't any with the same names (rename them if there are), and then start Cassandra; it will pick up all the files. -Jeremiah On 12/12/2011 12:53 PM, Stephane Legay wrote: Here's the situation. We're running a 2-node cluster on EC2 (v 0.8.6). Each node writes data to an EBS volume mounted on /mnt2. On Dec. 9th, for some reason both instances were rebooted (not sure yet what triggered the reboot). But the EBS volumes were not added to /etc/fstab, and didn't mount upon reboot. Cassandra auto-started without any problems, created a new data folder on the system drive and started writing there. We just found out about the issue today with users missing data. So, to recap: - each node contains data created since 12-09-2011, stored on the system drive - each node has access to data created on or before 12-09-2011 on an EBS volume - we need to move the data stored on the system drive to the EBS volume and restart Cassandra into a stable state with all data available What's the best way for me to do this? Thanks
Re: exporting data from Cassandra cluster
Once you get all of the data on one machine you can flush/drain/compact and shut down the single node, then take the data folder off that machine and back it up. Then when you get your new Cassandra cluster set up, you can use the sstable loader to shoot the data from the backup into the new cluster. On 12/09/2011 07:09 AM, Alexandru Dan Sicoe wrote: Hi Jeremiah, The thing is I will send the data to a massive storage facility (I don't know what's behind the scenes) so I won't be backing up on one machine where I can install Cassandra. Does the sstable loader work just for copying data from a Cassandra cluster to somewhere on a disk where there is no Cassandra instance? If not, what is the best way/tool to achieve that? Cheers, Alexandru On Wed, Dec 7, 2011 at 10:00 PM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: Stop your current cluster. Start a new cassandra instance on the machine you want to store your data on. Use the sstable loader to load the sstables from all of the current machines into the new machine. Run major compaction a couple times. You will have all of the data on one machine. On 12/07/2011 10:17 AM, Alexandru Dan Sicoe wrote: Hello everyone. 3-node Cassandra 0.8.5 cluster. I've left the system running in a production environment for long-term testing. I've accumulated about 350GB of data with RF=2. The machines I used for the tests are older and need to be replaced. Because of this I need to export the data to a permanent location. How should I export the data? In order to reduce the storage space I want to export only the non-replicated data - I mean, just one copy of the data (without the replicas). Is this possible? How? Cheers, Alexandru
Re: exporting data from Cassandra cluster
Stop your current cluster. Start a new cassandra instance on the machine you want to store your data on. Use the sstable loader to load the sstables from all of the current machines into the new machine. Run major compaction a couple times. You will have all of the data on one machine. On 12/07/2011 10:17 AM, Alexandru Dan Sicoe wrote: Hello everyone. 3-node Cassandra 0.8.5 cluster. I've left the system running in a production environment for long-term testing. I've accumulated about 350GB of data with RF=2. The machines I used for the tests are older and need to be replaced. Because of this I need to export the data to a permanent location. How should I export the data? In order to reduce the storage space I want to export only the non-replicated data - I mean, just one copy of the data (without the replicas). Is this possible? How? Cheers, Alexandru
Re: Insufficient disk space to flush
If you are writing data with QUORUM or ALL you should be safe to restart cassandra on that node. If the extra space is all from *tmp* files from compaction they will get deleted at startup. You will then need to run repair on that node to get back any data that was missed while it was full. If your commit log was on a different device you may not even have lost much. -Jeremiah On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote: Hello everyone, 4-node Cassandra 0.8.5 cluster with RF=2. One node started throwing exceptions in its log: ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main] java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714) at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301) at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246) at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49) at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) ... 3 more Checked the disk and obviously it's 100% full. How do I recover from this without losing the data? I've got plenty of space on the other nodes, so I thought of doing a decommission, which I understand reassigns ranges to the other nodes and replicates the data to them.
After that's done I plan on manually deleting the data on the node and then rejoining at the same cluster position with auto-bootstrap turned off, so that I won't get back the old data and I can continue getting new data with the node. Note, I would like to have 4 nodes in, because the other three barely handle the input load alone. These are just long-running tests until I get some better machines. One strange thing I found is that the data folder on the node that filled up the disk is 150 GB (as measured with du) while the data folder on all other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows a size of around 50GB for all 4 nodes. I thought that the node was doing a major compaction at the time it filled up the disk, but even that doesn't make sense, because shouldn't a major compaction only be capable of doubling the size, not tripling it? Does anyone know how to explain this behavior? Thanks, Alex
Re: JMX monitoring
jconsole is going to be the most up to date documentation for the JMX interface =(. -Jeremiah On 11/23/2011 10:49 AM, David McNelis wrote: Ok. in that case I think the Docs are wrong. http://wiki.apache.org/cassandra/JmxInterface has StorageService as part of org.apache.cassandra.service. Also, once I executed a CLI command, I started getting the expected output (output being that it was able to return the live nodes). -- *David McNelis* Lead Software Engineer Agentis Energy www.agentisenergy.com http://www.agentisenergy.com c: 219.384.5143 /A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource./
Re: DataCenters each with their own local data source
Cassandra's multiple data center support is meant for replicating all data across multiple datacenters efficiently. You could use the ByteOrderedPartitioner to prefix data with a key and assign those keys to nodes in specific data centers, though the edge nodes would get tricky, as those would want to have replicas in other data centers; you could probably do some stuff with sentinel values, and some nodes that only replicate data and aren't the primary node for any data, to make this not happen. It is doable, though it would probably be more trouble than it is worth. I would probably just make each DC its own cluster and have client logic which knows which DC to query. -Jeremiah On Nov 22, 2011, at 6:57 PM, Mathieu Lalonde wrote: Hi, I am wondering if Cassandra's features and datacenter awareness can help me with my scalability problems. Suppose that I have 10-20 data centers, each with their own local (massive) source of time series data. I would like: - to avoid replication across data centers (this seems doable based on: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Different-KeySpaces-for-different-nodes-in-the-same-ring-td5096393.html#a5096568 ) - writes for local data to be done on the local data center (not sure about that one) - reads from a master data center to any remote data centers (not sure about that one either) It sounds like I am trying to use Cassandra in a very different way than it was intended to be used. Should I simply have a middle tier that takes care of distributing reads to multiple data centers and treat each data center as its own autonomous cluster? Thanks! Matt
Re: DataCenters each with their own local data source
Oops, I was thinking all in the same keyspace. If you make a new keyspace for each DC you can specify where to put the data and have it live in only one place. -Jeremiah On Nov 22, 2011, at 8:49 PM, Jeremiah Jordan wrote: Cassandra's multiple data center support is meant for replicating all data across multiple datacenters efficiently. You could use the ByteOrderedPartitioner to prefix data with a key and assign those keys to nodes in specific data centers, though the edge nodes would get tricky, as those would want to have replicas in other data centers; you could probably do some stuff with sentinel values, and some nodes that only replicate data and aren't the primary node for any data, to make this not happen. It is doable, though it would probably be more trouble than it is worth. I would probably just make each DC its own cluster and have client logic which knows which DC to query. -Jeremiah On Nov 22, 2011, at 6:57 PM, Mathieu Lalonde wrote: Hi, I am wondering if Cassandra's features and datacenter awareness can help me with my scalability problems. Suppose that I have 10-20 data centers, each with their own local (massive) source of time series data. I would like: - to avoid replication across data centers (this seems doable based on: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Different-KeySpaces-for-different-nodes-in-the-same-ring-td5096393.html#a5096568 ) - writes for local data to be done on the local data center (not sure about that one) - reads from a master data center to any remote data centers (not sure about that one either) It sounds like I am trying to use Cassandra in a very different way than it was intended to be used. Should I simply have a middle tier that takes care of distributing reads to multiple data centers and treat each data center as its own autonomous cluster? Thanks! Matt
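For what it's worth, the per-DC-keyspace suggestion might look something like this in cassandra-cli, following the create keyspace syntax used elsewhere on this list (the keyspace names are placeholders; depending on version you may be able to simply omit a DC from strategy_options instead of giving it a count of 0):

```
create keyspace DC1Data
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{DC1:2, DC2:0}];

create keyspace DC2Data
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{DC2:2, DC1:0}];
```

Each keyspace then keeps all of its replicas inside one data center, while a shared keyspace (e.g. with {DC1:2, DC2:2}) can still replicate everywhere.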
Re: 7199
Yes, that is the port nodetool needs to access. On Nov 22, 2011, at 8:43 PM, Maxim Potekhin wrote: Hello, I have this in my cassandra-env.sh JMX_PORT=7199 Does this mean that if I use nodetool from another node, it will try to connect to that particular port? Thanks, Maxim
Re: Efficiency of Cross Data Center Replication...?
If hinting is off, read repair and manual repair are the only ways data will get there (just like when a single node is down). On Nov 20, 2011, at 6:01 AM, Boris Yen wrote: A quick question: what if DC2 is down, and after a while it comes back on? How does the data get synced to DC2 in this case? (assume hinting is disabled) Thanks in advance. On Thu, Nov 17, 2011 at 10:46 AM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: Pretty sure data is sent to the coordinating node in DC2 at the same time it is sent to replicas in DC1, so I would think 10's of milliseconds after the transport time to DC2. On Nov 16, 2011, at 3:48 PM, ehers...@gmail.com wrote: On a related note - assuming there are available resources across the board (cpu and memory on every node, low network latency, non-saturated nics/circuits/disks), what's a reasonable expectation for timing on replication? Sub-second? Less than five seconds? Ernie On Wed, Nov 16, 2011 at 4:00 PM, Brian Fleming bigbrianflem...@gmail.com wrote: Great - thanks Jake B. On Wed, Nov 16, 2011 at 8:40 PM, Jake Luciani jak...@gmail.com wrote: the former On Wed, Nov 16, 2011 at 3:33 PM, Brian Fleming bigbrianflem...@gmail.com wrote: Hi All, I have a question about inter-data centre replication: if you have 2 data centers, each with a local RF of 2 (i.e. total RF of 4) and write to a node in DC1, how efficient is the replication to DC2 - i.e. is that data: - replicated over to a single node in DC2 once and internally replicated, or - replicated explicitly to two separate nodes? Obviously from a LAN resource utilisation perspective, the former would be preferable. Many thanks, Brian -- http://twitter.com/tjake
Re: Efficiency of Cross Data Center Replication...?
Pretty sure data is sent to the coordinating node in DC2 at the same time it is sent to replicas in DC1, so I would think 10's of milliseconds after the transport time to DC2. On Nov 16, 2011, at 3:48 PM, ehers...@gmail.com wrote: On a related note - assuming there are available resources across the board (cpu and memory on every node, low network latency, non-saturated nics/circuits/disks), what's a reasonable expectation for timing on replication? Sub-second? Less than five seconds? Ernie On Wed, Nov 16, 2011 at 4:00 PM, Brian Fleming bigbrianflem...@gmail.com wrote: Great - thanks Jake B. On Wed, Nov 16, 2011 at 8:40 PM, Jake Luciani jak...@gmail.com wrote: the former On Wed, Nov 16, 2011 at 3:33 PM, Brian Fleming bigbrianflem...@gmail.com wrote: Hi All, I have a question about inter-data centre replication : if you have 2 Data Centers, each with a local RF of 2 (i.e. total RF of 4) and write to a node in DC1, how efficient is the replication to DC2 - i.e. is that data : - replicated over to a single node in DC2 once and internally replicated or - replicated explicitly to two separate nodes? Obviously from a LAN resource utilisation perspective, the former would be preferable. Many thanks, Brian -- http://twitter.com/tjake
Re: Is a direct upgrade from .6 to 1.0 possible?
You should be able to do it as long as you shut down the whole cluster for it: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Upgrading-to-1-0-tp6954908p6955316.html On 11/13/2011 02:14 PM, Timothy Smith wrote: Due to some application dependencies I've been holding off on a Cassandra upgrade for a while. Now that my last application using the old thrift client is updated I have the green light to prep my upgrade. Since I'm on .6 the upgrade is obviously a bit trickier. Do the standard instructions for upgrading from .6 to .7 still apply or do I have to step from .6 - .7 - 1.0? Thanks, Tim
Re: questions on frequency and timing of async replication between DCs
If you query with ALL, do you get the data? If you query with a range slice, do you get the data (list from the cli)? On 11/11/2011 04:10 PM, Subrahmanya Harve wrote: I have cross-DC replication set up using 0.8.7 with 3 nodes in each DC, following the +1 rule for tokens. I am seeing an issue where an insert into a DC happened successfully, but on querying from the cli or through Hector, I am not seeing the data being returned. I used the cli on every node of both DCs and every node returned blank. So the basic question is: where is my data? CL.WRITE=ONE, CL.READ=1. RF = DC:2, DC:2. Apart from checking the data directory size on each DC to verify that cross-DC replication has happened, what other steps can I take to verify that cross-DC replication is happening successfully? What tuning params can I control with regard to cross-DC replication? (frequency? batch size? etc.) Would greatly appreciate any help.
Re: Data retrieval inconsistent
I am pretty sure that the way you have K1 configured, it will be placed across both DCs as if you had one large ring. If you want it only in DC1 you need to say DC1:1, DC2:0. If you are writing and reading at ONE you are not guaranteed to get the data if RF > 1. If RF = 2 and you write with ONE, your data could be written to server 1, and then read from server 2 before it gets over there. The differing server times will only really matter for TTLs. Most everything else works off comparing user-supplied times. -Jeremiah On 11/10/2011 02:27 PM, Subrahmanya Harve wrote: I am facing an issue in a 0.8.7 cluster - I have two clusters in two DCs (rather, one cross-DC cluster) and two keyspaces. But i have only configured one keyspace to replicate data to the other DC and the other keyspace to not replicate over to the other DC. Basically this is the way i ran the keyspace creation - create keyspace K1 with placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}]; create keyspace K2 with placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:2, DC2:2}]; I had to do this because i expect that K1 will get a large volume of data and i do not want this wired over to the other DC. I am writing the data at CL=ONE and reading the data at CL=ONE. I am seeing an issue where sometimes i get the data and other times i do not see the data. Does anyone know what could be going on here? A second larger question is - i am migrating from 0.7.4 to 0.8.7. I can see that there are large changes in the yaml file, but a specific question i had was - how do i configure disk_access_mode like it used to be in 0.7.4? One observation i have made is that some nodes of the cross-DC cluster are at different system times. This is something to fix, but could this be why data is sometimes retrieved and other times not? Or is there some other thing to it? Would appreciate a quick response.
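The guarantee Jeremiah is describing is the usual replica-overlap condition, which can be written as a one-line check (a generic sketch of the rule, not Cassandra API code): a read is only guaranteed to see a preceding successful write when the read and write replica counts overlap, i.e. R + W > RF.

```python
def read_your_writes(rf: int, write_cl: int, read_cl: int) -> bool:
    """True when every read replica set must intersect every write
    replica set, so a read cannot miss a preceding successful write."""
    return read_cl + write_cl > rf

print(read_your_writes(2, 1, 1))  # False: ONE/ONE at RF=2, as in the thread
print(read_your_writes(3, 2, 2))  # True: QUORUM/QUORUM at RF=3
```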
Re: Data retrieval inconsistent
No, that is what I thought you wanted. I was thinking your machines in DC1 had extra disk space or something... (I stopped replying to the dev list) On 11/10/2011 04:09 PM, Subrahmanya Harve wrote: Thanks Ed and Jeremiah for that useful info. I am pretty sure the way you have K1 configured it will be placed across both DC's as if you had one large ring. If you want it only in DC1 you need to say DC1:1, DC2:0. In fact I do want K1 to be available across both DCs as if I had a large ring; I just do not want it to replicate across DCs. Also, I did try doing it like you said, DC1:1, DC2:0, but won't that mean that all my data goes into DC1 irrespective of whether the data is written to the nodes of DC1 or DC2, thereby creating a hot DC? Since the volume of data for this case is huge, might that create a load imbalance on DC1? (Am I missing something?) On Thu, Nov 10, 2011 at 1:30 PM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: I am pretty sure the way you have K1 configured it will be placed across both DC's as if you had one large ring. If you want it only in DC1 you need to say DC1:1, DC2:0. If you are writing and reading at ONE you are not guaranteed to get the data if RF > 1. If RF = 2 and you write with ONE, your data could be written to server 1, and then read from server 2 before it gets over there. The differing server times will only really matter for TTLs. Most everything else works off comparing user-supplied times. -Jeremiah On 11/10/2011 02:27 PM, Subrahmanya Harve wrote: I am facing an issue in a 0.8.7 cluster - I have two clusters in two DCs (rather, one cross-DC cluster) and two keyspaces. But i have only configured one keyspace to replicate data to the other DC and the other keyspace to not replicate over to the other DC.
Basically this is the way I ran the keyspace creation - create keyspace K1 with placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}]; create keyspace K2 with placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:2, DC2:2}]; I had to do this because I expect that K1 will get a large volume of data and I do not want this wired over to the other DC. I am writing the data at CL=ONE and reading the data at CL=ONE. I am seeing an issue where sometimes I get the data and other times I do not. Does anyone know what could be going on here? A second, larger question is - I am migrating from 0.7.4 to 0.8.7; I can see that there are large changes in the yaml file, but a specific question I had was: how do I configure disk_access_mode like it used to be in 0.7.4? One observation I have made is that some nodes of the cross-DC cluster are at different system times. This is something to fix, but could this be why data is sometimes retrieved and other times not? Or is there some other thing to it? Would appreciate a quick response.
Re: : Cassandra reads under write-only load, read degradation after massive writes
Indexed columns cause a read before write so that the index can be updated if the column already exists. On 11/09/2011 02:46 PM, Oleg Tsernetsov wrote: When monitoring JMX metrics of Cassandra 0.8.7 loaded by a write-only test, I observe significant read activity on the column family I write to. It seems strange to me, as I expected no read activity under a write-only load. The read activity is caused by writes: when I stop the write test, the read activity disappears. The test performs parallel column writes to a single row, writing the values of a fixed column set over and over again. Furthermore, the second problem is that parallel massive reads of such a row degrade over time (even without parallel write load) and Cassandra starts burning 100% of CPU, with read latency degrading 20x compared with exactly the same row created from scratch. The test setup is 3 Cassandra nodes, read/write consistency = QUORUM. The row has 10 and above columns (tested with 10, 100, 1000, 1 cols); the higher the number of columns, the worse the observed degradation. The column family has 2 indexed columns that are written with exactly the same values on each and every write. Row key, column name and column value are all Utf8Type. Column family compaction on all the nodes does not help, and the row remains degraded. Read here means one of: reading all the columns with a slice query without bounds/with bounds; executing a column count query for a row with bounds/without bounds. I use Hector as the Cassandra client. I would be thankful if anyone could explain the read activity on write load and give any hints on row read degradation after massive write load on that row. Regards, Oleg
Re: Second Cassandra users survey
Actually, the data will be visible at QUORUM as well if you can see it with ONE. QUORUM actually gives you a higher chance of seeing the new value than ONE does. In the case of RF=3 you have a 2/3 chance of seeing the new value with QUORUM; with ONE you have 1/3... And this JIRA fixed an issue where two QUORUM reads in a row could give you the NEW value and then the OLD value: https://issues.apache.org/jira/browse/CASSANDRA-2494 So a QUORUM read after a failed write always gives consistent results now, for a single row. For multiple rows you still have issues, but you can always mitigate that in the app with something like giving all of the changes the same timestamp, then on read checking that the timestamps match, and reading the data again if they don't. I'm not arguing against atomic batch operations, they would be nice =). Just clarifying how things work now. -Jeremiah On 11/06/2011 02:05 PM, Pierre Chalamet wrote: - support for atomic operations or batches (if QUORUM fails, data should not be visible with ONE); zookeeper is solving that. I might have screwed up a little bit since I didn't talk about isolation; let's reformulate: support for read committed (using DB terminology). Cassandra is more like read uncommitted. Even if row mutations in one CF for one key are atomic on one server, stuff is not rolled back when the CL can't be satisfied at the coordinator level. Data won't be visible at QUORUM level, but when using weaker CLs, invalid data can appear imho. Also it should be possible to tell which operations failed with batch_mutate, but unfortunately it is not.
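Jeremiah's 2/3 vs 1/3 figures can be checked by enumerating which replicas a read might contact. A small sketch (toy model, not Cassandra code): suppose a failed write left the new value on only 1 of 3 replicas, and a read contacts a uniformly random subset of replicas; the newest timestamp among the contacted replicas wins.

```python
from itertools import combinations
from fractions import Fraction

def chance_of_seeing_new_value(rf, replicas_with_new_value, read_count):
    """Probability that a read contacting `read_count` random replicas out of
    `rf` includes at least one replica holding the new value."""
    fresh = set(range(replicas_with_new_value))        # replicas the write reached
    picks = list(combinations(range(rf), read_count))  # possible read replica sets
    hits = sum(1 for p in picks if fresh & set(p))
    return Fraction(hits, len(picks))

# Failed write reached only 1 of 3 replicas:
print(chance_of_seeing_new_value(3, 1, 1))  # 1/3 at ONE
print(chance_of_seeing_new_value(3, 1, 2))  # 2/3 at QUORUM
```

So a QUORUM read is strictly more likely than a ONE read to surface a partially written value, which is exactly the point being made above.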
Re: Second Cassandra users survey
- Batch read/slice from multiple column families. On 11/01/2011 05:59 PM, Jonathan Ellis wrote: Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3] I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list? As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record. [1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
Re: Cassandra 1.0.0 - Node Load Bug
I thought this patch made it into the 1.0 release? I remember it being referenced in one of the re-rolls. On Oct 20, 2011, at 9:56 PM, Jonathan Ellis jbel...@gmail.com wrote: That looks to me like it's reporting uncompressed size as the load. Should be fixed in the 1.0 branch for 1.0.1. (https://issues.apache.org/jira/browse/CASSANDRA-3338) On Thu, Oct 20, 2011 at 11:53 AM, Dan Hendry dan.hendry.j...@gmail.com wrote: I have been playing around with Cassandra 1.0.0 in our test environment and it seems pretty sweet so far. I have however come across what appears to be a bug tracking node load. I have enabled compression and levelled compaction on all CFs (scrub + snapshot deletion) and the nodes have been operating normally for a day or two. I started getting concerned when the load as reported by nodetool ring kept increasing (it seems monotonically) despite seeing a compression ratio of ~2.5x (as a side note, I find it strange Cassandra does not provide the compression ratio via jmx or in the logs). I initially thought there might be a bug in cleaning up obsolete SSTables, but I then noticed the following discrepancy: Nodetool ring reports: 10.112.27.65 datacenter1 rack1 Up Normal 8.64 GB 50.00% 170141183460469231731687303715884105727 Yet du . -h reports only 2.4G in the data directory. After restarting the node, nodetool ring reports a more accurate: 10.112.27.65 datacenter1 rack1 Up Normal 2.35 GB 50.00% 170141183460469231731687303715884105727 Again, both compression and levelled compaction have been enabled on all CFs. Is this a known issue or has anybody else observed a similar pattern? Dan Hendry (403) 660-2297 -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Massive writes when only reading from Cassandra
I could be totally wrong here, but if you are doing a QUORUM read and a stale value is encountered among the quorum's replicas, won't a repair happen? I thought read_repair_chance = 0 just means it won't query extra nodes to check for bad values. -Jeremiah On Oct 17, 2011, at 4:22 PM, Jeremy Hanna wrote: Even after disabling hinted handoff and setting read_repair_chance to 0 on all our column families, we were still experiencing massive writes. Apparently read_repair_chance is completely ignored at any CL higher than CL.ONE. So we were doing CL.QUORUM on reads and writes and still seeing massive writes. It was because of the background read repairs being done. We did extensive logging and checking, and that's all it could be, as no mutations were coming in via thrift to those column families. In any case, just wanted to give some follow-up here as it's been an inexplicable rock in our backpack, and hopefully this clears up where that setting is actually used. I'll update the storage configuration wiki to include that caveat as well. On Sep 10, 2011, at 5:14 PM, Jeremy Hanna wrote: Thanks for the insights. I may first try disabling hinted handoff for one run of our data pipeline and see if it exhibits the same behavior. Will post back if I see anything enlightening there. On Sep 10, 2011, at 5:04 PM, Chris Goffinet wrote: You could tail the commit log with `strings` to see what keys are being inserted. On Sat, Sep 10, 2011 at 2:24 PM, Jonathan Ellis jbel...@gmail.com wrote: Two possibilities: 1) Hinted handoff (this will show up in the logs on the sending machine; on the receiving one it will just look like any other write) 2) You have something doing writes that you're not aware of; I guess you could track that down using wireshark to see where the write messages are coming from On Sat, Sep 10, 2011 at 3:56 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Oh and we're running 0.8.4 and the RF is 3.
On Sep 10, 2011, at 3:49 PM, Jeremy Hanna wrote: In addition, the mutation stage and the read stage are backed up like:

Pool Name               Active     Pending  Blocked
ReadStage               32         773      0
RequestResponseStage    0          0        0
ReadRepairStage         0          0        0
MutationStage           158525918           0
ReplicateOnWriteStage   0          0        0
GossipStage             0          0        0
AntiEntropyStage        0          0        0
MigrationStage          0          0        0
StreamStage             0          0        0
MemtablePostFlusher     1          5        0
FILEUTILS-DELETE-POOL   0          0        0
FlushWriter             2          5        0
MiscStage               0          0        0
FlushSorter             0          0        0
InternalResponseStage   0          0        0
HintedHandoff           0          0        0
CompactionManager       n/a        29
MessagingService        n/a        0,34

On Sep 10, 2011, at 3:38 PM, Jeremy Hanna wrote: We are experiencing massive writes to column families when only doing reads from Cassandra. A set of 5 hadoop jobs are reading from Cassandra and then writing out to hdfs. That is the only thing operating on the cluster. We are reading at CL.QUORUM with hadoop and have written with CL.QUORUM. Read repair chance is set to 0.0 on all column families. However, in the logs, I'm seeing flush after flush of memtables and compactions taking place. Is there something else that would be writing based on the above description? Jeremy -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: nodetool ring Load column
Are you using compressed sstables? Or the leveled sstables? Make sure you include how you are configured in any JIRA you make; someone else was seeing a similar issue with compression turned on. -Jeremiah On Oct 14, 2011, at 1:13 PM, Ramesh Natarajan wrote: What does the Load column in nodetool ring mean? From the output below it shows 101.62 GB. However if I do a disk usage it is about 6 GB. thanks Ramesh

[root@CAP2-CNode1 cassandra]# ~root/apache-cassandra-1.0.0-rc2/bin/nodetool -h localhost ring
Address       DC           Rack   Status  State   Load       Owns    Token
                                                                     148873535527910577765226390751398592512
10.19.102.11  datacenter1  rack1  Up      Normal  101.62 GB  12.50%  0
10.19.102.12  datacenter1  rack1  Up      Normal  84.42 GB   12.50%  21267647932558653966460912964485513216
10.19.102.13  datacenter1  rack1  Up      Normal  95.47 GB   12.50%  42535295865117307932921825928971026432
10.19.102.14  datacenter1  rack1  Up      Normal  91.25 GB   12.50%  63802943797675961899382738893456539648
10.19.103.11  datacenter1  rack1  Up      Normal  93.98 GB   12.50%  85070591730234615865843651857942052864
10.19.103.12  datacenter1  rack1  Up      Normal  100.33 GB  12.50%  106338239662793269832304564822427566080
10.19.103.13  datacenter1  rack1  Up      Normal  74.1 GB    12.50%  127605887595351923798765477786913079296
10.19.103.14  datacenter1  rack1  Up      Normal  93.96 GB   12.50%  148873535527910577765226390751398592512
[root@CAP2-CNode1 cassandra]# du -hs /var/lib/cassandra/data/
6.0G    /var/lib/cassandra/data/
Re: How to speed up Waiting for schema agreement for a single node Cassandra cluster?
But truncate is still slow, especially if it can't use JNA (Windows), as it snapshots. Depending on how much data you are inserting during your unit tests, just paging through all the keys and then deleting them is the fastest way; though if you use timestamps other than now this won't work, as the timestamps need to be increasing between test runs. On Oct 4, 2011, at 9:33 AM, Joseph Norton wrote: I didn't consider using truncate because a set of potentially random Column Families are created dynamically during the test. Are there any configuration knobs that could be adjusted for drop + recreate? thanks in advance, - Joe N Joseph Norton nor...@alum.mit.edu On Oct 4, 2011, at 11:19 PM, Jonathan Ellis wrote: Truncate is faster than drop + recreate. On Tue, Oct 4, 2011 at 9:15 AM, Joseph Norton nor...@lovely.email.ne.jp wrote: Hello. For unit test purposes, I have a single node Cassandra cluster. I need to drop and re-create several keyspaces between each test iteration. This process takes approximately 10 seconds for a single node installation. Can you recommend any tricks or recipes to reduce the time required for such operations and/or for Waiting for schema agreement to complete? regards, - Joe N.

$ time ./setupDB.sh
Deleteing cassandra keyspaces
Connected to: Foo on 127.0.0.1/9160
ed9c7fc0-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
ee8c36f0-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
eeb14b20-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
Insert data
Creating cassandra keyspaces
Connected to: Foo on 127.0.0.1/9160
ef1a6d30-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
Authenticated to keyspace: Bars
ef4af310-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
ef9bab20-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
efbceec0-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
f00e4310-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
f0589280-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
f0821380-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
f0c44ca0-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
Authenticated to keyspace: Baz
f121d5f0-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
f1619e10-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
f18b4620-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
Authenticated to keyspace: Buz
f1debd50-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
f20690a0-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
f25043d0-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
f29a1e10-ee91-11e0--534d24a6e7f7
Waiting for schema agreement... ... schemas agree across the cluster
Inserting data in cassandra
Connected to: Foo on 127.0.0.1/9160
Authenticated to keyspace: Boo
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
Value inserted.
real 0m9.554s
user 0m2.729s
sys 0m0.194s

Joseph Norton nor...@alum.mit.edu -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Very large rows VS small rows
If A works for our use case, it is a much better option. A given row has to be read in full to return data from it; there used to be a limitation that a row had to fit in memory, but there is now code to page through the data, so while that isn't a limitation any more, it means rows that don't fit in memory are very slow to use. Also wide rows spread across nodes. You should also consider more nodes in your cluster. From our experience nodes perform better when they are only managing a few hundred GB each. Pretty sure that 10TB+ of data (100's * 100GB) will not perform very well on a 3 node cluster, especially if you plan to have RF=3, making it 10TB+ per node. -Jeremiah On 09/29/2011 12:20 PM, M Vieira wrote: What would be the best approach: A) millions of ~2Kb rows, where each row could have ~6 columns, or B) hundreds of ~100Gb rows, where each row could have ~1 million columns? Considerations: Most entries will be searched for (read+write) at least once a day but no more than 3 times a day. Cheap hardware across the cluster of 3 nodes, each with 16Gb mem (heap = 8Gb). Any input would be appreciated M.
Re: Very large rows VS small rows
So I need to read what I write before hitting send. Should have been: If A works for YOUR use case. and Wide rows DON'T spread across nodes well. On 09/29/2011 02:34 PM, Jeremiah Jordan wrote: If A works for our use case, it is a much better option. A given row has to be read in full to return data from it; there used to be a limitation that a row had to fit in memory, but there is now code to page through the data, so while that isn't a limitation any more, it means rows that don't fit in memory are very slow to use. Also wide rows spread across nodes. You should also consider more nodes in your cluster. From our experience nodes perform better when they are only managing a few hundred GB each. Pretty sure that 10TB+ of data (100's * 100GB) will not perform very well on a 3 node cluster, especially if you plan to have RF=3, making it 10TB+ per node. -Jeremiah On 09/29/2011 12:20 PM, M Vieira wrote: What would be the best approach: A) millions of ~2Kb rows, where each row could have ~6 columns, or B) hundreds of ~100Gb rows, where each row could have ~1 million columns? Considerations: Most entries will be searched for (read+write) at least once a day but no more than 3 times a day. Cheap hardware across the cluster of 3 nodes, each with 16Gb mem (heap = 8Gb). Any input would be appreciated M.
Re: Thrift CPU Usage
Yes. All the stress tool does is flood data through the API, no real processing or anything happens. So thrift reading/writing data should be the majority of the CPU time... On 09/26/2011 08:32 AM, Baskar Duraikannu wrote: Hello - I have been running read tests on Cassandra using stress tool. I have been noticing that thrift seems to be taking lot of CPU over 70% when I look at the CPU samples report. Is this normal? CPU usage seems to go down by 5 to 10% when I change the RPC from sync to async. Is this normal? I am running Cassandra 0.8.4 on Cent OS 5.6 ( Kernel 2.6.18.238) and Oracle JVM. - Thanks Baskar Duraikannu
Re: [BETA RELEASE] Apache Cassandra 1.0.0-beta1 released
Is it possible to update an existing column family with {sstable_compression: SnappyCompressor, compaction_strategy: LeveledCompactionStrategy}? Or will I have to make a new column family and migrate my data to it? -Jeremiah On 09/15/2011 01:01 PM, Sylvain Lebresne wrote: The Cassandra team is pleased to announce the release of the first beta for the future Apache Cassandra 1.0. Let me first stress that this is beta software and as such is *not* ready for production use. The goal of this release is to give a preview of what will be Cassandra 1.0 and more importantly to get wider testing before the final release. So please help us make Cassandra 1.0 be the best it possibly could by testing this beta release and reporting any problem you may encounter[3,4]. You can have a look at the change log[1] and the release notes[2] to see where Cassandra 1.0 differs from the 0.8 series. Apache Cassandra 1.0.0-beta1[5] is available as usual from the cassandra website: http://cassandra.apache.org/download/ Thank you for your help in testing and have fun with it. [1]: http://goo.gl/evCW0 (CHANGES.txt) [2]: http://goo.gl/HbNsV (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA [4]: user@cassandra.apache.org [5]: https://svn.apache.org/repos/asf/cassandra/tags/cassandra-1.0.0-beta1
Re: Updates lost
Are you running on Windows? If the default timestamp is just using time.time()*1e6 you will get the same timestamp twice if the calls are close together; time.time() on Windows has only millisecond resolution. I don't use pycassa, but in the Thrift API wrapper I created for our Python code I implemented the following function for getting timestamps:

def GetTimeInMicroSec():
    """Returns the current time in microseconds; the returned value always increases with each call."""
    newTime = long(time.time() * 1e6)
    try:
        if GetTimeInMicroSec.lastTime >= newTime:
            newTime = GetTimeInMicroSec.lastTime + 1
    except AttributeError:
        pass
    GetTimeInMicroSec.lastTime = newTime
    return newTime

On 08/29/2011 04:56 PM, Peter Schuller wrote: If the client sleeps for a few ms at each loop, the success rate increases. At 15 ms, the script always succeeds so far. Interestingly, the problem seems to be sensitive to alphabetical order. Updating the value from 'aaa' to 'bbb' never has a problem. No pause needed. Is it possible the version of pycassa you're using does not guarantee that successive queries use non-identical and monotonically increasing timestamps? I'm just speculating, but if that is the case and two requests are sent with the same timestamp (due to resolution being lower than the time it takes between calls), the tie-breaker would be the column value, which jives with the fact that you're saying it seems to depend on the value. (I haven't checked current nor past versions of pycassa to determine if this is plausible. Just speculating.)
Solandra distributed search
When using Solandra, do I need to use the Solr sharding syntax in my queries? I don't think I do, because Cassandra is handling the sharding, not Solr, but just want to make sure. The Solandra wiki references the distributed search limitations, which talks about the shard syntax further down the page. From what I see with how it is implemented, I should just be able to pick a random Solandra node and do my query, since they are all backed by the same Cassandra data store. Correct? Thanks! -Jeremiah
Re: Cassandra in Multiple Datacenters Active - Standby configuration
Assign the tokens like they are two separate rings, just make sure you don't have any duplicate tokens. http://wiki.apache.org/cassandra/Operations#Token_selection The two datacenters are treated as separate rings, LOCAL_QUORUM will only delay the client as long as it takes to write the data to the local nodes. The nodes in the other datacenter will get asynchronous writes. On 08/15/2011 03:39 PM, Oleg Tsvinev wrote: Hi all, I have a question that documentation has not clear answer for. I have the following requirements: 1. Synchronously store data in datacenter DC1 on 2+ nodes 2. Asynchronously replicate the same data to DC2 and store it on 2+ nodes to act as a hot standby Now, I have configured keyspaces with o.a.c.l.NetworkTopologyStrategy with strategy_options=[{DC1:2, DC2:2}] and use LOCAL_QUORUM consistency level, following documentation here: http://www.datastax.com/docs/0.8/operations/datacenter Now, how do I assign initial tokens? If I have, say 6 nodes total, 3 in DC1 and 3 in DC2, and create a ring as if all 6 nodes share the total 2^128 space equally. Now say node N1:DC2 has key K and is in remote datacenter (for an app in DC1). Wouldn't Cassandra always forward K to the DC2 node N1 thus turning asynchronous writes into synchronous ones? Performance impact will be huge as the latency between DC1 and DC2 is significant. I hope there's an answer and I'm just missing something. My case falls under Disaster Recovery in http://www.datastax.com/docs/0.8/operations/datacenter but I don't see how Cassandra will support my use case. I appreciate any help on this. Thank you, Oleg
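The "two separate rings" token assignment from the wiki link above can be sketched numerically. A hedged example (the helper name is illustrative, not a Cassandra tool; 2**127 is the RandomPartitioner token space): compute evenly spaced tokens per datacenter independently, then offset the second DC by 1 so no token is duplicated across the cluster.

```python
# Token selection for a multi-DC cluster treated as two independent rings.
TOKEN_SPACE = 2 ** 127  # RandomPartitioner token range is [0, 2**127)

def tokens_for_dc(nodes_in_dc, offset=0):
    """Evenly spaced initial tokens for one DC, shifted by `offset`
    so tokens never collide with the other DC's tokens."""
    return [i * TOKEN_SPACE // nodes_in_dc + offset for i in range(nodes_in_dc)]

dc1 = tokens_for_dc(3)            # 3 nodes in DC1: 0, T/3, 2T/3
dc2 = tokens_for_dc(3, offset=1)  # 3 nodes in DC2: 1, T/3+1, 2T/3+1
assert len(set(dc1) | set(dc2)) == 6  # no duplicate tokens anywhere
print(dc1[0], dc2[0])
```

With this layout each DC owns its full range locally, so LOCAL_QUORUM writes only wait on local replicas while replicas in the other DC receive the write asynchronously, as described above.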
Re: thrift c++ insert Exception [Column value is required]
You can check out libcassandra for a C++ client built on top of Thrift. It is not feature complete, but it is pretty good. https://github.com/matkor/libcassandra On Aug 14, 2011, at 3:59 AM, Konstantinos Chasapis wrote: Hi, Thank you for your answer. Is there any documentation that describes all these values that I have to set? Konstantinos Chasapis On Aug 14, 2011, at 6:28 AM, Jonathan Ellis wrote: In C++ you need to set .__isset.fieldname on optional fields (e.g. .__isset.value). 2011/8/13 Hassapis Constantinos cha...@ics.forth.gr: Hi all, I'm using Cassandra 0.8.3 and Thrift for C++ and I can't insert a column in a column family. Starting from an empty keyspace, first I add a new keyspace and then a new column family and that works fine, but I can't insert a column. The code that I have written is:

transport->open();
KsDef ks_def;
ks_def.name = "test_keyspace";
ks_def.replication_factor = 0;
ks_def.strategy_class = "LocalStrategy";
std::string res;
cout << "add keyspace.." << endl;
client.system_add_keyspace(res, ks_def);
client.set_keyspace("test_keyspace");
cout << "add column family.." << endl;
CfDef cf_def;
cf_def.keyspace = "test_keyspace";
cf_def.name = "cf_name_test";
client.system_add_column_family(res, cf_def);
const string key = "test_key";
const string value = "valu_";
ColumnParent cparent;
cparent.column_family = "cf_name_test";
Column c;
c.name = "column_namess";
c.value = value;
c.timestamp = getTS();
cout << "insert key value: " << c.value << endl;
client.insert(key, cparent, c, ConsistencyLevel::ONE);
cout << "drop column family" << endl;
client.system_drop_column_family(res, "cf_name_test");
cout << "drop keyspace" << endl;
client.system_drop_keyspace(res, "test_keyspace");
transport->close();

and I receive the below exception: Default TException. [Column value is required] As you can see from the source code, I have filled in the value of the column. Thank you in advance for your help. Konstantinos Chasapis p.s. please cc me in the reply.
-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Restarting servers
You need to wait for the servers to be up again before restarting the next one. nodetool ring on one of the servers you aren't restarting will tell you when it is back up. You can also watch for Starting up server gossip in the log file to know when it is starting to join the cluster again. On 08/12/2011 01:59 PM, Jason Baker wrote: So restarting cassandra servers has a tendency to cause a lot of exceptions like MaximumRetryException: Retried 6 times. Last failure was UnavailableException() and TApplicationException: Internal error processing batch_mutate (using pycassa). If I restart the servers too quickly, I get all servers unavailable. So two questions: 1. Is there anything I can do to prevent MaximumRetryExceptions and TApplicationExceptions, or is this just a case of needing better exception handling? 2. Are there any rules of thumb regarding how much time I should allow between server restarts?
RE: Write everywhere, read anywhere
If you have RF=3, quorum won't fail with one node down, so R/W quorum will be consistent in the case of one node down. If two nodes go down at the same time, then you can get inconsistent data from a quorum write/read: the write fails with a timeout, the nodes come back up, and then one read asks the two nodes that were down what the value is while another read asks the node that was up and a node that was down. Those two reads will get different answers. From: Mike Malone [mailto:m...@simplegeo.com] Sent: Thursday, August 04, 2011 12:16 PM To: user@cassandra.apache.org Subject: Re: Write everywhere, read anywhere 2011/8/3 Patricio Echagüe patric...@gmail.com On Wed, Aug 3, 2011 at 4:00 PM, Philippe watche...@gmail.com wrote: Hello, I have a 3-node, RF=3 cluster configured to write at CL.ALL and read at CL.ONE. When I take one of the nodes down, writes fail, which is what I expect. When I run a repair, I see data being streamed from those column families... that I didn't expect. How can the nodes diverge? Does this mean that reading at CL.ONE may return inconsistent data? We abort the mutation beforehand when there are enough replicas alive. If a mutation went through and in the middle of it a replica goes down, in that case you can write to some nodes and the request will time out. In that case CL.ONE may return inconsistent data. Doesn't CL.QUORUM suffer from the same problem? There's no isolation or rollback with CL.QUORUM either. So if I do a quorum write with RF=3 and it fails after hitting a single node, a subsequent quorum read could return the old data (if it hits the two nodes that didn't receive the write) or the new data that failed mid-write (if it hits the node that did receive the write). Basically, the scenarios where CL.ALL + CL.ONE result in a read of inconsistent data could also cause a CL.QUORUM write followed by a CL.QUORUM read to return inconsistent data. Right?
The problem (if there is one) is that even in the quorum case, columns with the most recent timestamp win during repair resolution, not columns that have quorum consensus. Mike
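Mike's scenario can be made concrete with a toy model (this models replica state only, not real Cassandra behavior): RF=3, a quorum write reaches a single replica before failing, and each subsequent quorum read returns the newest timestamp among the two replicas it happens to contact.

```python
# Toy model of a failed quorum write followed by two quorum reads.
# (value, timestamp) per replica; newest timestamp wins at read time.
replicas = [("old", 1), ("old", 1), ("old", 1)]
replicas[0] = ("new", 2)  # partial write: only node 0 got it before the failure

def quorum_read(contacted):
    """Resolve a read: the column with the newest timestamp wins."""
    return max(contacted, key=lambda vt: vt[1])[0]

print(quorum_read([replicas[0], replicas[1]]))  # 'new'  (quorum includes node 0)
print(quorum_read([replicas[1], replicas[2]]))  # 'old'  (quorum misses node 0)
```

Two successive quorum reads can thus disagree after a failed write, since resolution is by timestamp rather than by consensus, which is exactly the point of the last paragraph above.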
Re: Cassandra 0.6.8 snapshot problem?
Does snapshot in 0.6 cause a flush to happen first? If not there could be data in the database that won't be in the snapshot. Though that seems like a long time for data to be sitting in the commit log and not make it to the sstables. On Thu, 2011-07-28 at 17:30 -0500, Jonathan Ellis wrote: Doesn't ring a bell. But I'd say if you upgrade and it's still a problem, then (a) you're not _worse_ off than you are now, and (b) it's a lot more likely to get fixed in modern version. On Thu, Jul 28, 2011 at 9:47 AM, Jian Fang jian.fang.subscr...@gmail.com wrote: Hi, We have an old production Cassandra 0.6.8 instance without replica, i.e., the replication factor is 1. Recently, we noticed that the snapshot data we took from this instance are inconsistent with the running instance data. For example, we took snapshot in early July 2011. From the running instance, we got a record that was created in March 2011, but on the snapshot copy, the record with the same key was different and was created in January 2011. Yesterday, we created another snapshot and reproduced the problem. I just like to know if this is a known issue for Cassandra 0.6. We are going to migrate to Cassandra 0.8, but we need to make sure this will not be a problem in 0.8. Thanks in advance, John
Re: RF=1
If you have RF=1, taking one node down is going to cause 25% of your data to be unavailable. If you want to tolerate a machines going down you need to have at least RF=2, if you want to use quorum and have a machine go down, you need at least RF=3. On Tue, 2011-08-02 at 16:22 +0200, Patrik Modesto wrote: Hi all! I've a test cluster of 4 nodes running cassandra 0.7.8, with one keyspace with RF=1, each node owns 25% of the data. As long as all nodes are alive, there is no problem, but when I shut down just one node I get UnavailableException in my application. cassandra-cli returns null and hadoop mapreduce task won't start at all. Loosing one node is not a problem for me, the data are not important, loosing even half the cluster is not a problem as long as everything runs just as with a full cluster. The error from hadoop is like this: Exception in thread main java.io.IOException: Could not get input splits at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:120) at cz.xxx.yyy.zzz.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:111) at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961) at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833) at org.apache.hadoop.mapreduce.Job.submit(Job.java:476) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506) at cz.xxx.yyy.zzz.ContextIndexer.run(ContextIndexer.java:663) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at 
cz.xxx.yyy.zzz.ContextIndexer.main(ContextIndexer.java:94) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) Caused by: java.util.concurrent.ExecutionException: java.io.IOException: failed connecting to all endpoints 10.0.18.87 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:116) ... 20 more Caused by: java.io.IOException: failed connecting to all endpoints 10.0.18.87 at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:197) at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:67) at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:153) at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:138) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662)
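The arithmetic behind these RF recommendations can be sketched in plain Python (a hedged illustration, not Cassandra code; `quorum` and `tolerable_failures` are made-up helper names):

```python
def quorum(rf):
    """Replicas that must answer for a QUORUM read or write."""
    return rf // 2 + 1

def tolerable_failures(rf, required_replicas):
    """Nodes in a replica set that can be down while the level still succeeds."""
    return rf - required_replicas

# RF=1: even a read at ONE fails when the single replica is down.
assert tolerable_failures(1, 1) == 0
# RF=2 tolerates one down node at ONE; RF=3 tolerates one at QUORUM.
assert tolerable_failures(2, 1) == 1
assert quorum(3) == 2 and tolerable_failures(3, quorum(3)) == 1
```

With RF=1 each node is the sole owner of its 25% of the ring, which is why one node going down makes exactly that slice unavailable.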
Re: 8 million Cassandra data files on disk
Connect with jconsole and run garbage collection. All of the files that have a -Compacted marker with the same name will get deleted the next time a full garbage collection runs, or when the node is restarted. They have already been combined into new files; the old ones just haven't been deleted yet. On Tue, 2011-08-02 at 16:09 -0400, Yiming Sun wrote: Hi, I am new to Cassandra, and am hoping someone could help me understand the (large amount of small) data files on disk that Cassandra generates. The reason we are using Cassandra is because we are dealing with thousands to millions of small text files on disk, so we are experimenting with Cassandra hoping that by dropping the files' contents into Cassandra, it will achieve more efficient disk usage because Cassandra is going to aggregate them into bigger files (one file per column family, according to the wiki). But after we pushed a subset of the files into a single-node Cassandra v0.7.0 instance, we noted that in the Cassandra data directory for the keyspace, there are 8.5 million very small files, most are named SuperColumnFamilyName-e-n.Filter.db SuperColumnFamilyName-e-n.Compacted.db SuperColumnFamilyName-e-n.Index.db SuperColumnFamilyName-e-n.Statistics.db and among these files, the Compacted.db are always empty, Filter and Index are under 100 bytes, and Statistics are around 4k. What are these files? Why are there so many of them? We originally hoped that Cassandra was going to solve our issue with the small files we have, but now it doesn't seem to help -- we still end up with tons of small files. Is there any way to reduce/combine these small files? Thanks. -- Y.
Re: Nodetool ring not showing all nodes in cluster
All of the nodes should have the same seedlist. Don't use localhost as one of the items in it if you have multiple nodes. On Tue, 2011-08-02 at 10:10 -0700, Aishwarya Venkataraman wrote: Nodetool does not show me all the nodes. Assuming I have three nodes A, B and C. The seedlist of A is localhost. Seedlist of B is localhost, A_ipaddr and seedlist of C is localhost,B_ipaddr,A_ipaddr. I have autobootstrap set to false for all 3 nodes since they all have the correct data and do not have to migrate data from any particular node. My problem here is why doesn't nodetool ring show me all nodes in the ring? I agree that the cluster thinks that only one node is present. How do I fix this? Thanks, Aishwarya On Tue, Aug 2, 2011 at 9:56 AM, samal sa...@wakya.in wrote: ERROR 08:53:47,678 Internal error processing batch_mutate java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (1) You already answered It always keeps showing only one node and mentions that it is handling 100% of the load. The cluster thinks only one node is present in the ring: RF=3 exceeds the single endpoint it knows about, so it is effectively operating as if RF=1. Original Q: I'm not exactly sure what the problem is. But does nodetool ring show all the hosts? What is your seed list? Does the bootstrapped node have its own IP in its seed list? AFAIK gossip works even without actively joining a ring. On Tue, Aug 2, 2011 at 7:21 AM, Aishwarya Venkataraman cyberai...@gmail.com wrote: Replies inline. Thanks, Aishwarya On Tue, Aug 2, 2011 at 7:12 AM, Sorin Julean sorin.jul...@gmail.com wrote: Hi, Until someone answers with more details, few questions: 1. did you move the system keyspace as well? Yes. But I deleted the LocationInfo* files under the system folder. Shall I go ahead and delete the entire system folder? 2. are the gossip IPs of the new nodes the same as the old ones? No. The IP is different. 3. which cassandra version are you running? I am using 0.8.1 If 1. is yes and 2. 
is no, for a quick fix: take down the cluster, remove system keyspace, bring the cluster up and bootstrap the nodes. Kind regards, Sorin On Tue, Aug 2, 2011 at 2:53 PM, Aishwarya Venkataraman cyberai...@gmail.com wrote: Hello, I recently migrated 400 GB of data that was on a different cassandra cluster (3 node with RF= 3) to a new cluster. I have a 3 node cluster with replication factor set to three. When I run nodetool ring, it does not show me all the nodes in the cluster. It always keeps showing only one node and mentions that it is handling 100% of the load. But when I look at the logs, the nodes are able to talk to each other via the gossip protocol. Why does this happen ? Can you tell me what I am doing wrong ? Thanks, Aishwarya
RE: custom StoragePort?
If you are on linux see: https://github.com/pcmanus/ccm -Original Message- From: Yang [mailto:tedd...@gmail.com] Sent: Monday, July 11, 2011 3:08 PM To: user@cassandra.apache.org Subject: Re: custom StoragePort? never mind, found this.. https://issues.apache.org/jira/browse/CASSANDRA-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel On Mon, Jul 11, 2011 at 12:39 PM, Yang tedd...@gmail.com wrote: I tried to run multiple cassandra daemons on the same host, using different ports, for a test env. I thought this would work, but it turns out that the StoragePort used by OutboundTcpConnection is always assumed to be the one specified in .yaml, i.e. the code assumes that the storage port is the same everywhere. in fact this assumption seems deeply held in many places in the code, so it's a bit difficult to refactor, for example by substituting InetAddress with InetSocketAddress. I am just wondering, do you see any other value to a custom storage port, besides testing? if there is real value, maybe someone more familiar with the code could do the refactoring Thanks yang
RE: Node repair questions
The more often you repair, the quicker it will be. The more often your nodes go down the longer it will be. Repair streams data that is missing between nodes. So the more data that is different the longer it will take. Your workload is impacted because the node has to scan the data it has to be able to compare with other nodes, and if there are differences, it has to send/receive data from other nodes. -Original Message- From: A J [mailto:s5a...@gmail.com] Sent: Monday, July 11, 2011 2:43 PM To: user@cassandra.apache.org Subject: Node repair questions Hello, Have the following questions related to nodetool repair: 1. I know that Nodetool Repair Interval has to be less than GCGraceSeconds. How do I come up with an exact value of GCGraceSeconds and 'Nodetool Repair Interval'. What factors would want me to change the default of 10 days of GCGraceSeconds. Similarly what factors would want me to keep Nodetool Repair Interval to be just slightly less than GCGraceSeconds (say a day less). 2. Does a Nodetool Repair block any reads and writes on the node, while the repair is going on ? During repair, if I try to do an insert, will the insert wait for repair to complete first ? 3. I read that repair can impact your workload as it causes additional disk and cpu activity. But any details of the impact mechanism and any ballpark on how much the read/write performance deteriorates ? Thanks.
RE: Cassandra memory problem
We are running into the same issue on some of our machines. Still haven't tracked down what is causing it. From: William Oberman [mailto:ober...@civicscience.com] Sent: Thursday, July 07, 2011 7:19 AM To: user@cassandra.apache.org Subject: Re: Cassandra memory problem I think I had (and have) a similar problem: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html My memory usage grew slowly until I ran out of mem and the OS killed my process (due to no swap). I'm still on 0.7.4, but I'm rolling out 0.8.1 next week, which I was hoping would fix the problem. I'm using Centos with Sun 1.6.0_24-b07 will On Thu, Jul 7, 2011 at 7:41 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Hm - had to dig deeper and it totally looks like a native mem leak to me: We are still growing with res += 100MB a day. Cassandra is 8G now I checked the cassandra process with pmap -x Here's the human readable (aggregated) output: Format is thingy: RSS in KB Summary: Total SST: 1961616 Anon RSS: 6499640 Total RSS: 8478376 Here's a little more detail: SSTables (data and index files) ** Attic: 0 PrivateChatNotification: 38108 Schema: 0 PrivateChat: 161048 UserData: 116788 HintsColumnFamily: 0 Rooms: 100548 Tracker: 476 Migrations: 0 ObjectRepository: 793680 BlobStore: 350924 Activities: 400044 LocationInfo: 0 Libraries ** javajar: 2292 nativelib: 13028 Other ** 28201: 32 jna979649866618987247.tmp: 92 locale-archive: 1492 [stack]: 132 java: 44 ffi8TsQPY(deleted): 8 And ** [anon]: 6499640 Maybe the output of pmap is totally misleading, but my interpretation is that only 2GB of RSS is attributed to paged-in sstables. I have one large anon block which looks like this: Address Kbytes RSS Dirty Mode Mapping 00073f60 0 3093248 3093248 rwx-- [ anon ] This is the native heap that's been allocated on startup and mlocked. So there's still 3.5GB of anon memory. 
We haven't deployed https://issues.apache.org/jira/browse/CASSANDRA-2654 yet and this might be part of it, but I don't think that's the main problem. As I said, mem goes up by 100MB each day pretty linearly. Would be great if anyone could verify this by running pmap, or talk me off the roof by explaining that nothing's the way it seems. All this might be heavily OS specific, so maybe that's only on Debian? Thanks a lot Daniel On Jul 4, 2011, at 2:42 PM, Jonathan Ellis wrote: mmap'd data will be attributed to res, but the OS can page it out instead of killing the process. On Mon, Jul 4, 2011 at 5:52 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Hi all, we have a mem problem with cassandra. res goes up without bounds (well, until the OS kills the process because we don't have swap) I found a thread that's about the same problem but on OpenJDK: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html We are on Debian with Sun JDK. Resident mem is 7.4G while heap is restricted to 3G. Anyone else seeing this with Sun JDK? Cheers, Daniel :/home/dd# java -version java version 1.6.0_24 Java(TM) SE Runtime Environment (build 1.6.0_24-b07) Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode) :/home/dd# ps aux |grep java cass 28201 9.5 46.8 372659544 7707172 ? SLl May24 5656:21 /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms3000M -Xmx3000M -Xmn400M ... PID USER
RE: custom reconciling columns?
The reason to break it up is that the information will then be on different servers, so you can have server 1 spending time retrieving row 1, while you have server 2 retrieving row 2, and server 3 retrieving row 3... So instead of getting 3000 things from one server, you get 1000 from 3 servers in parallel... From: Yang [mailto:tedd...@gmail.com] Sent: Wednesday, June 29, 2011 12:07 AM To: user@cassandra.apache.org Subject: Re: custom reconciling columns? ok, here is the profiling result. I think this is consistent (having been trying to recover how to effectively use yourkit ...) see attached picture since I actually do not use the thrift interface, but just directly use the thrift.CassandraServer and run my code in the same JVM as cassandra, and was running the whole thing on a single box, there is no message serialization/deserialization cost. but more columns did add on to more time. the time was spent in the ConcurrentSkipListMap operations that implement the memtable. regarding breaking up the row, I'm not sure it would reduce my run time, since our requirement is to read the entire rolling window history (we already have the TTL enabled, so the history is limited to a certain length, but it is quite long: over 1000, in some cases 5000 or more). I think accessing roughly 1000 items is not an uncommon requirement for many applications. in our case, each column has about 30 bytes of data, besides the metadata such as ttl, timestamp. at a history length of 3000, the read takes about 12ms (remember this is completely in-memory, no disk access) I just took a look at the expiring column logic; it looks like the expiration does not come into play until CassandraServer.internal_get() -> thriftifyColumns() gets called. so the above memtable access time is still spent. 
yes, then breaking up the row is going to be helpful, but only to the degree of preventing access to expired columns (btw, if this were actually built into the cassandra code it would be nicer: instead of spending multiple key lookups, I locate the row once, and then within the row there are different generation buckets, so those old generation buckets that are beyond expiration are not read); currently just accessing the 3000 live columns is already quite slow. I'm trying to see whether there are some easy magic bullets for a drop-in replacement for ConcurrentSkipListMap... Yang On Tue, Jun 28, 2011 at 4:18 PM, Nate McCall n...@datastax.com wrote: I agree with Aaron's suggestion on data model and query here. Since there is a time component, you can split the row on a fixed duration for a given user, so the row key would become userId_[timestamp rounded to day]. This provides you an easy way to roll up the information for the date ranges you need since the key suffix can be created without a read. This also benefits from spreading the read load over the cluster instead of just the replicas since you have 30 rows in this case instead of one. On Tue, Jun 28, 2011 at 5:55 PM, aaron morton aa...@thelastpickle.com wrote: Can you provide some more info: - how big are the rows, e.g. number of columns and column size ? - how much data are you asking for ? - what sort of read query are you using ? - what sort of numbers are you seeing ? - are you deleting columns or using TTL ? I would consider issues with the data churn, data model and query before looking at serialisation. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 29 Jun 2011, at 10:37, Yang wrote: I can see that as my user history grows, the read time grows proportionally (or faster than linearly). 
if my business requirements ask me to keep a month's history for each user, it could become too slow. I was suspecting that it's actually the serializing and deserializing that's taking time (I can definitely see it's cpu bound) On Tue, Jun 28, 2011 at 3:04 PM, aaron morton aa...@thelastpickle.com wrote: There is no facility to do custom reconciliation for a column. An append style operation would run into many of the same problems as the Counter type, e.g. not every node may get an append and there is a chance for lost appends unless you go to all the trouble Counters do. I would go with using a row for the user and columns for each item. Then you can have fast no-look writes. What problems are you seeing with the reads ? Cheers - Aaron Morton
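Nate's fixed-duration bucketing can be sketched like this (a hypothetical helper; the `userId_[timestamp rounded to day]` key shape follows his description, and the exact `YYYYMMDD` format is my assumption):

```python
from datetime import datetime, timezone

def bucketed_row_key(user_id, epoch_seconds):
    # The suffix is computable from the timestamp alone, so no read is
    # needed before a write, and a month of history maps to ~30 row
    # keys spread across the cluster instead of one hot row.
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y%m%d")
    return f"{user_id}_{day}"

# Two writes on the same UTC day land in the same row bucket.
assert bucketed_row_key("user42", 1309305600) == bucketed_row_key("user42", 1309305600 + 3600)
```

Reading a date range then becomes a multiget over the keys for those days, each of which can be served by a different replica in parallel.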
RE: Cassandra ACID
For your Consistency case, it is actually an ALL read that is needed, not an ALL write. ALL read, with whatever consistency level of write that you need (to support machines dying), is the only way to get consistent results in the face of a failed write which was at ONE that went to one node, but not the others. From: AJ [mailto:a...@dude.podzone.net] Sent: Friday, June 24, 2011 11:28 PM To: user@cassandra.apache.org Subject: Re: Cassandra ACID Ok, here it is reworked; consider it a summary of the thread. If I left out an important point that you think is 100% correct even if you already mentioned it, then make some noise about it and provide some evidence so it's captured sufficiently. And, if you're in a debate, please try and get to a resolution; all will appreciate it. It will be evident below that Consistency is not the only thing that is tunable, at least indirectly. Unfortunately, you still can't tunafish. Ar ar ar. Atomicity All individual writes are atomic at the row level. So, a batch mutate for one specific key will apply updates to all the columns for that one specific row atomically. If part of the single-key batch update fails, then all of the updates will be reverted since they all pertained to one key/row. Notice, I said 'reverted' not 'rolled back'. Note: atomicity and isolation are related to the topic of transactions but one does not imply the other. Even though row updates are atomic, they are not isolated from other users' updates or reads. Refs: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic Consistency Cassandra does not provide the same scope of Consistency as defined in the ACID standard. Consistency in C* does not include referential integrity since C* is not a relational database. Any referential integrity required would have to be handled by the client. 
Also, even though the official docs say that QUORUM writes/reads is the minimal consistency_level setting to guarantee full consistency, this assumes that the write preceding the read does not fail (see comments below). Therefore, an ALL write would be necessary prior to a QUORUM read of the same data. For a multi-dc scenario use an ALL write followed by an EACH_QUORUM read. Refs: http://wiki.apache.org/cassandra/ArchitectureOverview Isolation NOTHING is isolated, because there is no transaction support in the first place. This means that two or more clients can update the same row at the same time. Their updates of the same or different columns may be interleaved and leave the row in a state that may not make sense depending on your application. Note: this doesn't mean to say that two updates of the same column will be corrupted, obviously; columns are the smallest atomic unit ('atomic' in the more general thread-safe context). Refs: None that directly address this explicitly and clearly and in one place. Durability Updates are made highly durable at a level comparable to a DBMS by the use of the commit log. However, this requires commitlog_sync: batch in cassandra.yaml. For some performance improvement with some cost in durability you can specify commitlog_sync: periodic. See discussion below for more details. Refs: Plenty + this thread. On 6/24/2011 1:46 PM, Jim Newsham wrote: On 6/23/2011 8:55 PM, AJ wrote: Can any Cassandra contributors/guru's confirm my understanding of Cassandra's degree of support for the ACID properties? I provide official references when known. Please let me know if I missed some good official documentation. Atomicity All individual writes are atomic at the row level. So, a batch mutate for one specific key will apply updates to all the columns for that one specific row atomically. If part of the single-key batch update fails, then all of the updates will be reverted since they all pertained to one key/row. 
Notice, I said 'reverted' not 'rolled back'. Note: atomicity and isolation are related to the topic of transactions but one does not imply the other. Even though row updates are atomic, they are not isolated from other users' updates or reads. Refs: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic Consistency If you want 100% consistency, use consistency level QUORUM for both reads and writes and EACH_QUORUM in a multi-dc scenario. Refs: http://wiki.apache.org/cassandra/ArchitectureOverview This is a pretty narrow interpretation of consistency. In a traditional database, consistency prevents you from getting into a logically inconsistent state, where records in one table do not agree with records in another table. This includes referential integrity, cascading deletes, etc. It seems to me Cassandra has no support for this concept whatsoever.
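The quorum guidance debated in this thread rests on the standard replica-overlap rule: a read is guaranteed to see the latest successful write whenever the read and write replica sets must intersect, i.e. R + W > N. A minimal sketch of that check:

```python
def read_sees_latest_write(n, w, r):
    # Overlap rule: with N replicas, a write acked by W nodes and a
    # read touching R nodes must share at least one replica when
    # R + W > N, so the read set always contains the newest value.
    return r + w > n

# QUORUM writes + QUORUM reads overlap at RF=3 (2 + 2 > 3) ...
assert read_sees_latest_write(3, 2, 2)
# ... but ONE + ONE does not, which is why a partial low-CL write
# can leave readers seeing stale data until repair runs.
assert not read_sees_latest_write(3, 1, 1)
```

Note the rule only covers *successful* writes; the thread's point about a failed ONE write that still reached one node is exactly the case the rule does not cover.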
RE: RAID or no RAID
With multiple data dirs you are still limited by the space free on any one drive. So if you have two data dirs with 40GB free on each, and you have 50GB to be compacted, it won't work, but if you had a raid, you would have 80GB free and could compact... -Original Message- From: mcasandra [mailto:mohitanch...@gmail.com] Sent: Tuesday, June 28, 2011 7:55 PM To: cassandra-u...@incubator.apache.org Subject: Re: RAID or no RAID aaron morton wrote: Not sure what the intended purpose is, but we've mostly used it as an emergency disk-capacity-increase option Thats what I've used it for. Cheers How does compaction work in terms of utilizing multiple data dirs? Also, is there a reference on wiki somewhere that says not to use multiple data dirs? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RAID-or-no-RAID-tp6522904p6527219.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
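Jeremiah's free-space point reduces to a one-liner: compaction output must fit within a single data directory, so what matters is the largest single free extent, not the total. A sketch (illustrative only, sizes in GB):

```python
def can_compact(free_gb_per_dir, compaction_output_gb):
    # A compacted sstable is written to one directory, so free space
    # across JBOD data dirs does not pool the way a single RAID
    # volume's free space does.
    return max(free_gb_per_dir) >= compaction_output_gb

assert not can_compact([40, 40], 50)  # two data dirs, 80GB total: fails
assert can_compact([80], 50)          # one RAID volume, 80GB free: works
```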
RE: Docs: Token Selection
Run two Cassandra clusters... -Original Message- From: Eric tamme [mailto:eta...@gmail.com] Sent: Friday, June 17, 2011 11:31 AM To: user@cassandra.apache.org Subject: Re: Docs: Token Selection What I don't like about NTS is I would have to have more replicas than I need. {DC1=2, DC2=2}, RF=4 would be the minimum. If I felt that 2 local replicas was insufficient, I'd have to move up to RF=6 which seems like a waste... I'm predicting data in the TB range so I'm trying to keep replicas to a minimum. My goal is to have 2-3 replicas in a local data center and 1 replica in another dc. I think that would be enough barring a major catastrophe. But, I'm not sure this is possible. I define local as in the same data center as the client doing the insert/update. Yes, not being able to configure the replication factor differently for each data center is a bit annoying. I'm assuming you basically want DC1 to have a replication factor of {DC1:2, DC2:1} and DC2 to have {DC1:1,DC2:2}. I would very much like that feature as well, but I don't know the feasibility of it. -Eric
RE: Docs: Token Selection
Run two clusters, one which has {DC1:2, DC2:1} and one which is {DC1:1,DC2:2}. You can't have both in the same cluster, otherwise it isn't possible to tell where the data got written when you want to read it. For a given key XYZ you must be able to compute which nodes it is stored on just using XYZ, so a strategy where it is on nodes DC1_1,DC1_2, and DC2_1 when a node in DC1 is the coordinator, and to DC1_1, DC2_1 and DC2_2 when a node in DC2 is the coordinator won't work. Given just XYZ I don't know where to look for the data. But, from the way you describe what you want to happen, clients from DC1 aren't using data inserted by clients from DC2, so you should just make two different Cassandra clusters. Once for the DC1 guys which is {DC1:2, DC2:1} and one for the DC2 guys which is {DC1:1,DC2:2}. -Original Message- From: AJ [mailto:a...@dude.podzone.net] Sent: Friday, June 17, 2011 1:02 PM To: user@cassandra.apache.org Subject: Re: Docs: Token Selection Hi Jeremiah, can you give more details? Thanks On 6/17/2011 10:49 AM, Jeremiah Jordan wrote: Run two Cassandra clusters... -Original Message- From: Eric tamme [mailto:eta...@gmail.com] Sent: Friday, June 17, 2011 11:31 AM To: user@cassandra.apache.org Subject: Re: Docs: Token Selection What I don't like about NTS is I would have to have more replicas than I need. {DC1=2, DC2=2}, RF=4 would be the minimum. If I felt that 2 local replicas was insufficient, I'd have to move up to RF=6 which seems like a waste... I'm predicting data in the TB range so I'm trying to keep replicas to a minimum. My goal is to have 2-3 replicas in a local data center and 1 replica in another dc. I think that would be enough barring a major catastrophe. But, I'm not sure this is possible. I define local as in the same data center as the client doing the insert/update. Yes, not being able to configure the replication factor differently for each data center is a bit annoying. 
I'm assuming you basically want DC1 to have a replication factor of {DC1:2, DC2:1} and DC2 to have {DC1:1,DC2:2}. I would very much like that feature as well, but I don't know the feasibility of it. -Eric
RE: Docs: Why do deleted keys show up during range scans?
I am pretty sure how Cassandra works will make sense to you if you think of it that way: rows do not get deleted, columns get deleted. While you can delete a row, if I understand correctly, what happens is a tombstone is created which matches every column, so in effect it is deleting the columns, not the whole row. A row key will not be forgotten/deleted until there are no columns or tombstones which reference it. Until there are no references to that row key in any SSTables you can still get that key back from the API. -Jeremiah -Original Message- From: AJ [mailto:a...@dude.podzone.net] Sent: Monday, June 13, 2011 12:11 PM To: user@cassandra.apache.org Subject: Re: Docs: Why do deleted keys show up during range scans? On 6/13/2011 10:14 AM, Stephen Connolly wrote: store the query inverted. that way empty - deleted I don't know what that means... get the other columns? Can you elaborate? Are there docs for this, or is this a hack/workaround? the tombstones are stored for each column that had data IIRC... but at this point my grok of C* is lacking I suspected this, but wasn't sure. It sounds like when a row is deleted, a tombstone is not attached to the row, but to each column??? So, if all columns are deleted then the row is considered deleted? Hmmm, that doesn't sound right, but that doesn't mean it isn't ! ;o)
RE: Docs: Why do deleted keys show up during range scans?
Also, tombstones are not attached anywhere. A tombstone is just a column with a special value which says I was deleted. And I am pretty sure they go into SSTables etc. the exact same way regular columns do. -Original Message- From: Jeremiah Jordan [mailto:jeremiah.jor...@morningstar.com] Sent: Tuesday, June 14, 2011 11:22 AM To: user@cassandra.apache.org Subject: RE: Docs: Why do deleted keys show up during range scans? I am pretty sure how Cassandra works will make sense to you if you think of it that way, that rows do not get deleted, columns get deleted. While you can delete a row, if I understand correctly, what happens is a tombstone is created which matches every column, so in effect it is deleting the columns, not the whole row. A row key will not be forgotten/deleted until there are no columns or tombstones which reference it. Until there are no references to that row key in any SSTables you can still get that key back from the API. -Jeremiah -Original Message- From: AJ [mailto:a...@dude.podzone.net] Sent: Monday, June 13, 2011 12:11 PM To: user@cassandra.apache.org Subject: Re: Docs: Why do deleted keys show up during range scans? On 6/13/2011 10:14 AM, Stephen Connolly wrote: store the query inverted. that way empty - deleted I don't know what that means... get the other columns? Can you elaborate? Are there docs for this, or is this a hack/workaround? the tombstones are stored for each column that had data IIRC... but at this point my grok of C* is lacking I suspected this, but wasn't sure. It sounds like when a row is deleted, a tombstone is not attached to the row, but to each column??? So, if all columns are deleted then the row is considered deleted? Hmmm, that doesn't sound right, but that doesn't mean it isn't ! ;o)
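The point that a tombstone is "just a column with a special value" implies it reconciles like any other column: by timestamp, last write wins. A toy sketch of that behavior (my own illustration, not Cassandra's actual classes):

```python
from collections import namedtuple

# A cell is a column; a tombstone is the same shape with a delete flag.
Column = namedtuple("Column", "name value timestamp is_tombstone")

def reconcile(a, b):
    # Last-write-wins: the newer cell survives, whether or not it is a
    # tombstone. The row key stays visible while any cell references it.
    return a if a.timestamp >= b.timestamp else b

live = Column("c1", b"hello", 100, False)
dead = Column("c1", None, 200, True)
assert reconcile(live, dead).is_tombstone                        # newer delete wins
assert not reconcile(dead, Column("c1", b"hi", 300, False)).is_tombstone  # later write revives
```

This is also why deleted keys show up in range scans: until the tombstones are purged after gc_grace_seconds, the key still has cells referencing it.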
RE: how to know there are some columns in a row
I am pretty sure this would cut down on network traffic, but not on Disk IO or CPU use. I think Cassandra would still have to deserialize the whole column to get to the name. So if you really have a use case where you just want the name, it would be better to store a separate name-only column with no data. From: Patrick de Torcy [mailto:pdeto...@gmail.com] Sent: Wednesday, June 08, 2011 4:00 AM To: user@cassandra.apache.org Subject: Re: how to know there are some columns in a row There is no reason for ambiguities... We could add in the api another method call (similar to get_count): list<string> get_columnNames(key, column_parent, predicate, consistency_level) Get the column names present in column_parent within the predicate. The method is not O(1). It takes all the columns from disk to calculate the answer. The only benefit of the method is that you do not need to pull all their values over the Thrift interface to get their names (just to get the idea...) In fact column names can really be data in themselves, so there should be a way to retrieve them (without their values). When you have big values, it's a real show stopper to use get_slice, since a lot of unnecessary traffic would be generated... Forgive me if I am a little insistent, but it's important for us and I'm sure we are not the only ones interested in this feature... cheers
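The suggested workaround, writing a value-free companion column alongside each large column so names can be sliced cheaply, can be sketched with plain dicts (hypothetical helper names, not a real API):

```python
data, name_index = {}, {}

def put(row, col, value):
    data[(row, col)] = value
    # Zero-byte companion entry: slicing these yields the column
    # names without pulling the large values over the wire.
    name_index.setdefault(row, set()).add(col)

def column_names(row):
    return sorted(name_index.get(row, ()))

put("r1", "bigblob", "x" * 10_000)
put("r1", "meta", "small")
assert column_names("r1") == ["bigblob", "meta"]
```

In Cassandra terms the "index" would just be a second column family (or a name prefix) holding empty-valued columns, kept in step with the data columns by the client.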
RE: Backups, Snapshots, SSTable Data Files, Compaction
Don't manually delete things. Let Cassandra do it. Force a garbage collection or restart your instance and Cassandra will delete the unused files. -Original Message- From: AJ [mailto:a...@dude.podzone.net] Sent: Tuesday, June 07, 2011 10:15 AM To: user@cassandra.apache.org Subject: Re: Backups, Snapshots, SSTable Data Files, Compaction On 6/7/2011 2:29 AM, Maki Watanabe wrote: You can find useful information in: http://www.datastax.com/docs/0.8/operations/scheduled_tasks sstables are immutable. Once written to disk, they won't be updated. When you take a snapshot, the tool makes hard links to the sstable files. After a certain time, you will have some memtable flushes, so your sstable files will be merged, and obsolete sstable files will be removed. But the snapshot set will remain on your disk, for backup. Thanks for the doc source. I will be experimenting with 0.8.0 since it has many features I've been waiting for. But, still, if the snapshots don't link to all of the previous sets of .db files, then those unlinked previous file sets MUST be safe to manually delete. But, they aren't deleted until later after a GC. It's a bit confusing why they are kept after compaction up until GC when they seem to not be needed. We have Big Data plans... one node can have tens of TBs, so I'm trying to get an idea of how much disk space will be required and whether or not I can free up some disk space. Hopefully someone can still elaborate on this.
RE: Reading quorum
Only waiting for quorum responses, and then resolving the one with the latest timestamp to return to the client. From: Fredrik Stigbäck [mailto:fredrik.l.stigb...@sitevision.se] Sent: Friday, June 03, 2011 9:44 AM To: user@cassandra.apache.org Subject: Reading quorum Does reading quorum mean only waiting for quorum responses, or does it mean quorum responses with the same latest timestamp? Regards /Fredrik
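In other words, the coordinator does not require the quorum to agree; it returns the newest of the answers it collected. A sketch of that resolution step (illustrative, not Cassandra internals):

```python
def resolve(responses):
    # responses: (value, timestamp) pairs from a quorum of replicas.
    # The client gets the value with the highest timestamp; replicas
    # that returned older values are candidates for read repair.
    return max(responses, key=lambda pair: pair[1])

assert resolve([("stale", 100), ("fresh", 200), ("stale", 100)]) == ("fresh", 200)
```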
RE: Loading Keyspace from YAML in 0.8
Or at least someone should write a script which will take a YAML config and turn it into a CLI script. From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Friday, June 03, 2011 12:00 PM To: user@cassandra.apache.org Subject: Re: Loading Keyspace from YAML in 0.8 On Fri, Jun 3, 2011 at 12:35 PM, Paul Loy ketera...@gmail.com wrote: ugh! On Fri, Jun 3, 2011 at 5:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Fri, Jun 3, 2011 at 12:14 PM, Paul Loy ketera...@gmail.com wrote: We embed cassandra in our app. When we first load a cluster, we specify one node in the cluster as the seed node. This node installs the schema using StorageService.instance.loadKeyspacesFromYAML(). This call has disappeared in 0.8. How can we do the same thing in Cassandra 0.8? Thanks, -- - Paul Loy p...@keteracel.com http://uk.linkedin.com/in/paulloy That was only a feature for migration from 0.6.X to 0.7.X. You can use bin/cassandra-cli -f file_with_defs But I would use the methods in thrift such as system_add_keyspace(). Edward -- - Paul Loy p...@keteracel.com http://uk.linkedin.com/in/paulloy Yes, Cassandra is very aggressive with deprecating stuff. However, to be fair, it is clear that StorageService is subject to change at any time. With things like this I personally do not see the harm in letting them hang around for a while. In fact I really think it should be added back, because it makes me wonder what the MANY people going from 0.6.X to 0.8.X are going to do.
RE: Appending to fields
Cassandra handles this by using a different design, you don't append anything. You use the fact that in Cassandra you have dynamic columns and you make a new column every time you want to put more data in. Then when you do finally need to read the data out you read out a slice of columns, not just one column. -Jeremiah -Original Message- From: Marcus Bointon [mailto:mar...@synchromedia.co.uk] Sent: Tuesday, May 31, 2011 2:23 PM To: user@cassandra.apache.org Subject: Appending to fields I'm wondering how cassandra implements appending values to fields. Since (so the docs tell me) there's not really any such thing as an update in Cassandra, I wonder if it falls into the same trap as MySQL does. With a query like update x set y = concat(y, 'a') where id = 1, mysql reads the entire value of y, appends the data, then writes the whole thing back, which unfortunately is an O(n^2) operation. The situation I'm doing this in involves what amount to log files on hundreds of thousands of items, many of which might need updating at once, so they're all simple appends, but it becomes unusably slow very quickly. In MySQL it's just a plain bug as it could optimise this by appending data at a known offset and then bumping up the field length counter, which is back in at least O(n) territory. Does cassandra's design avoid this problem? Marcus
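The append-as-new-column pattern Jeremiah describes can be sketched like this — a plain-Python model of the idea (the dict stands in for a Cassandra row; in a real schema the column names would be TimeUUIDs):

```python
import time
from collections import defaultdict

# Each "append" becomes a brand-new column under the row key, named by a
# timestamp, so nothing is ever read-modify-written: writes stay O(1).
# Reads pull back an ordered slice of columns.
log_rows = defaultdict(dict)   # row_key -> {column_name: value}

def append(row_key, entry, ts=None):
    ts = ts if ts is not None else time.time()
    log_rows[row_key][ts] = entry            # pure insert, no re-read

def read_slice(row_key, start=0.0, end=float("inf")):
    cols = log_rows[row_key]
    return [cols[t] for t in sorted(cols) if start <= t <= end]

append("item-1", "created", ts=1.0)
append("item-1", "shipped", ts=2.0)
print(read_slice("item-1"))                  # prints ['created', 'shipped']
```

This is why Cassandra sidesteps the MySQL concat() trap: the "log" is never stored as one growing value, so there is no whole-value rewrite on each append.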
java.lang.RuntimeException: Cannot recover SSTable with version a (current version f).
Running repair and I am getting this error:
java.lang.RuntimeException: Cannot recover SSTable with version a (current version f).
    at org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:237)
    at org.apache.cassandra.db.CompactionManager.submitSSTableBuild(CompactionManager.java:938)
    at org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:107)
    at org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:112)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91)
The comment by that exception is: // TODO: streaming between different versions will fail: need support for // recovering other versions to provide a stable streaming api This cluster was upgraded 0.6.8 -> 0.7.4 -> 0.7.5. Do I need to run scrub or compact or something to get all the sstables updated to the new version? Jeremiah Jordan Application Developer Morningstar, Inc. Morningstar. Illuminating investing worldwide. +1 312 696-6128 voice jeremiah.jor...@morningstar.com www.morningstar.com This e-mail contains privileged and confidential information and is intended only for the use of the person(s) named above. Any dissemination, distribution, or duplication of this communication without prior written consent from Morningstar is strictly prohibited. If you have received this message in error, please contact the sender immediately and delete the materials from any computer.
RE: Replica data distributing between racks
So we are currently running a 10 node ring in one DC, and we are going to be adding 5 more nodes in another DC. To keep the rings in each DC balanced, should I really calculate the tokens independently and just make sure none of them are the same? Something like: DC1 (RF 5): 1: 0 2: 17014118346046923173168730371588410572 3: 34028236692093846346337460743176821144 4: 51042355038140769519506191114765231716 5: 68056473384187692692674921486353642288 6: 85070591730234615865843651857942052860 7: 102084710076281539039012382229530463432 8: 119098828422328462212181112601118874004 9: 136112946768375385385349842972707284576 10: 153127065114422308558518573344295695148 DC2 (RF 3): 1: 1 (one off from DC1 node 1) 2: 34028236692093846346337460743176821145 (one off from DC1 node 3) 3: 68056473384187692692674921486353642290 (two off from DC1 node 5) 4: 102084710076281539039012382229530463435 (three off from DC1 node 7) 5: 136112946768375385385349842972707284580 (four off from DC1 node 9) Originally I was thinking I should spread the DC2 nodes evenly in between every other DC1 node. Or does it not matter where they are in respect to the DC1 nodes, as long as they fall somewhere after every other DC1 node? So it is DC1-1, DC2-1, DC1-2, DC1-3, DC2-2, DC1-4, DC1-5... -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Tuesday, May 03, 2011 9:14 AM To: user@cassandra.apache.org Subject: Re: Replica data distributing between racks Right, when you are computing balanced RP tokens for NTS you need to compute the tokens for each DC independently. On Tue, May 3, 2011 at 6:23 AM, aaron morton aa...@thelastpickle.com wrote: I've been digging into this and was able to reproduce something; not sure if it's a fault, and I can't work on it any more tonight. To reproduce: - 2 node cluster on my mac book - set the tokens as if they were nodes 3 and 4 in a 4-node cluster, e.g. 
node 1 with 85070591730234615865843651857942052864 and node 2 127605887595351923798765477786913079296 - set cassandra-topology.properties to put the nodes in DC1 on RAC1 and RAC2 - create a keyspace using NTS and strategy_options = [{DC1:1}] Inserted 10 rows; they were distributed as - node 1 - 9 rows - node 2 - 1 row I *think* the problem has to do with TokenMetadata.firstTokenIndex(). It often says the closest token to a key is node 1 because in effect... - node 1 is responsible for 0 to 85070591730234615865843651857942052864 - node 2 is responsible for 85070591730234615865843651857942052864 to 127605887595351923798765477786913079296 - AND node 1 does the wrap around from 127605887595351923798765477786913079296 to 0 as keys that would insert past the last token in the ring array wrap to 0 because insertMin is false. Thoughts ? Aaron On 3 May 2011, at 10:29, Eric tamme wrote: On Mon, May 2, 2011 at 5:59 PM, aaron morton aa...@thelastpickle.com wrote: My bad, I missed the way TokenMetadata.ringIterator() and firstTokenIndex() work. Eric, can you show the output from nodetool ring ? Sorry if the previous paste was way too unformatted, here is a pastie.org link with nicer formatting of nodetool ring output than plain text email allows. http://pastie.org/private/50khpakpffjhsmgf66oetg -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
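Jeremiah's per-DC token arithmetic above can be checked with a few lines of Python — a sketch of the math only, where the +1 bump mirrors the offsets in his list (only the colliding token needs bumping):

```python
# Balanced RandomPartitioner tokens are computed independently per DC,
# then any DC2 token that collides with a DC1 token is bumped by 1.
RING = 2 ** 127

def balanced_tokens(n):
    step = RING // n                       # ideal spacing for an n-node ring
    return [i * step for i in range(n)]

dc1 = balanced_tokens(10)                  # existing 10-node DC
dc2 = balanced_tokens(5)                   # new 5-node DC
taken = set(dc1)
dc2 = [t + 1 if t in taken else t for t in dc2]   # only token 0 collides

print(dc1[1])   # 17014118346046923173168730371588410572  (DC1 node 2)
print(dc2[0])   # 1                                        (DC2 node 1)
print(dc2[1])   # 34028236692093846346337460743176821145  (DC2 node 2)
```

These reproduce the values in the message above, which supports Jonathan's point: with NTS each DC is its own balanced ring, so placement relative to the other DC's nodes doesn't matter beyond avoiding exact token collisions.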
RE: best way to backup
The files inside the keyspace folders are the SSTables. From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Friday, April 29, 2011 4:49 PM To: user@cassandra.apache.org Subject: Re: best way to backup William, Some info on the sstables from me http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/ If you want to know more check out the BigTable and original Facebook papers, linked from the wiki http://wiki.apache.org/cassandra/ArchitectureOverview Aaron On 29 Apr 2011, at 23:43, William Oberman wrote: Dumb question, but referenced twice now: which files are the SSTables and why is backing them up incrementally a win? Or should I not bother to understand internals, and instead just roll with the backup my keyspace(s) and system in a compressed tar strategy, as while it may be excessive, it's guaranteed to work and work easily (which I like, a great deal). will On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday daniel.double...@gmx.net wrote: What we are about to set up is a time machine like backup. This is more like an add on to the s3 backup. Our boxes have an additional larger drive for local backup. We create a new backup snapshot every x hours which hardlinks the files in the previous snapshot (a bit like Cassandra's incremental_backups feature) and then we sync that snapshot dir with the cassandra data dir. We can do archiving / backup to external system from there without impacting the main data raid. But the main reason to do this is to have an 'omg we screwed up big time and deleted / corrupted data' recovery. On Apr 28, 2011, at 9:53 PM, William Oberman wrote: Even with N-nodes for redundancy, I still want to have backups. I'm an amazon person, so naturally I'm thinking S3. Reading over the docs, and messing with nodeutil, it looks like each new snapshot contains the previous snapshot as a subset (and I've read how cassandra uses hard links to avoid excessive disk use). 
When does that pattern break down? I'm basically debating if I can do a rsync like backup, or if I should do a compressed tar backup. And I obviously want multiple points in time. S3 does allow file versioning, if a file or file name is changed/reused over time (only matters in the rsync case). My only concerns with compressed tars is I'll have to have free space to create the archive and I get no delta space savings on the backup (the former is solved by not allowing the disk space to get so low and/or adding more nodes to bring down the space, the latter is solved by S3 being really cheap anyways). -- Will Oberman Civic Science, Inc. 3030 Penn Avenue., First Floor Pittsburgh, PA 15201 (M) 412-480-7835 (E) ober...@civicscience.com
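Daniel's "time machine" scheme above can be sketched with stdlib Python — a simplified model (like `rsync --link-dest` or `cp -al`), assuming SSTable immutability so that a same-named file in the previous snapshot is safe to hard-link; the file names are hypothetical:

```python
import os
import shutil
import tempfile
from pathlib import Path

# Each new snapshot directory hard-links the previous snapshot's files,
# so unchanged SSTables cost no extra disk space; only newly flushed
# files are actually copied.
def take_snapshot(data_dir, prev_snap, new_snap):
    os.makedirs(new_snap)
    for name in os.listdir(data_dir):
        prev = os.path.join(prev_snap, name) if prev_snap else None
        if prev and os.path.exists(prev):
            os.link(prev, os.path.join(new_snap, name))    # reuse, zero cost
        else:
            shutil.copy2(os.path.join(data_dir, name),
                         os.path.join(new_snap, name))     # new file

base = tempfile.mkdtemp()
data = os.path.join(base, "data")
os.makedirs(data)
Path(data, "a-Data.db").write_text("sstable a")
take_snapshot(data, None, os.path.join(base, "snap1"))
Path(data, "b-Data.db").write_text("sstable b")
take_snapshot(data, os.path.join(base, "snap1"), os.path.join(base, "snap2"))
print(sorted(os.listdir(os.path.join(base, "snap2"))))   # ['a-Data.db', 'b-Data.db']
```

Each snapshot directory is a complete point-in-time view, yet `a-Data.db` exists on disk only once — which is the delta-space win William is weighing against compressed tars.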
Changing replica placement strategy
If I am currently only running with one data center, can I change the replica_placement_strategy from org.apache.cassandra.locator.RackUnawareStrategy to org.apache.cassandra.locator.NetworkTopologyStrategy without issue? We are planning to add another data center in the near future and want to be able to use NetworkTopologyStrategy. I am pretty sure RackUnawareStrategy and NetworkTopologyStrategy pick the same nodes to put data on if there is only one DC, so it should be ok right? Jeremiah Jordan Application Developer Morningstar, Inc.
Link to Hudson on the download page is broken
The Apache Hudson server address needs to be updated on the download page, it is now: https://builds.apache.org The link to the latest builds from the download page: http://cassandra.apache.org/download/ Needs to be updated from: http://hudson.zones.apache.org/hudson/job/Cassandra/lastSuccessfulBuild/artifact/cassandra/build/ To: https://builds.apache.org/hudson/job/Cassandra/lastSuccessfulBuild/artifact/cassandra/build/ The old link doesn't work anymore. -Jeremiah Jeremiah Jordan Application Developer Morningstar, Inc.
RE: Abnormal memory consumption
Connect with jconsole and watch the memory consumption graph. Click the force GC button and watch what the low point is; that is how much memory is being used for persistent stuff, the rest is garbage generated while satisfying queries. Run a query and watch how the graph spikes up; that is how much is needed for the query. Like others have said, Cassandra isn't using 600 MB of RAM, the Java Virtual Machine is using 600 MB of RAM, because your settings told it it could. The JVM will use as much memory as your settings allow it to. If you really are putting that little data into your test server, you should be able to tune everything down to only 256 MB easily (I do this for test instances of Cassandra that I spin up to run some tests on), maybe further. -Jeremiah From: openvictor Open [mailto:openvic...@gmail.com] Sent: Wednesday, April 06, 2011 7:59 PM To: user@cassandra.apache.org Subject: Re: Abnormal memory consumption Hello Paul, Thank you for the tip. The random port attribution policy of JMX was really making me mad ! Good to know there is a solution for that problem. Concerning the rest of the conversation, my only concern is that as an administrator and a student it is hard to constantly watch Cassandra instances so that they don't crash. As much as I love the principle of Cassandra, being constantly afraid of memory consumption is an issue in my opinion. That being said, I took a new 16 GB server today, but I don't want Cassandra to eat up everything if it is not needed, because Cassandra will have some neighbors such as Tomcat and Solr on this server. And for me it is very weird that on my small instance where I put a lot of constraints like memtable_throughput_in_mb to 6, Cassandra uses 600 MB of RAM for 6 MB of data. It seems to be a little bit of an overkill to me... And so far I failed to find any information on what this massive overhead can be... Thank you for your answers and for taking the time to answer my questions. 
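For reference, the kind of heap cap Jeremiah describes for a small test node is set in conf/cassandra-env.sh (0.7-era layout; treat the exact values as an illustration to tune for your own workload):

```shell
# conf/cassandra-env.sh — cap the JVM heap for a small test instance.
# By default Cassandra sizes the heap from system RAM; overriding both
# variables pins it down so the JVM cannot balloon to 600 MB+.
MAX_HEAP_SIZE="256M"
HEAP_NEWSIZE="64M"
```

With both variables set, the JVM's -Xms/-Xmx are fixed at 256 MB, so what jconsole shows after a forced GC reflects real live data rather than whatever the default sizing allowed.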
2011/4/6 Paul Choi paulc...@plaxo.com You can use JMX over ssh by doing this: http://blog.reactive.org/2011/02/connecting-to-cassandra-jmx-via-ssh.html Basically, you use SSH -D to do dynamic application port forwarding. In terms of scaling, you'll be able to afford 120 GB RAM/node in 3 years if you're successful. Or, a machine with much less RAM and flash-based storage. :) Seriously, though, the formula in the tuning guidelines is a guideline. You can probably get acceptable performance with much less. If not, you can shard your app such that you host a few CFs per cluster. I doubt you'll need to though. From: openvictor Open openvic...@gmail.com Reply-To: user@cassandra.apache.org Date: Mon, 4 Apr 2011 18:24:25 -0400 To: user@cassandra.apache.org Subject: Re: Abnormal memory consumption Okay, I see. But isn't there a big issue for scaling here? Imagine that I am the developer of a certain very successful website: At year 1 I need 20 CF. I might need to have 8 GB of RAM. Year 2 I need 50 CF because I added functionalities to my wonderful website; will I need 20 GB of RAM? And if at year three I had 300 column families, will I need 120 GB of RAM per node? Or did I miss something about memory consumption? Thank you very much, Victor 2011/4/4 Peter Schuller peter.schul...@infidyne.com And about the production 7 GB of RAM is sufficient ? Or 11 GB is the minimum ? Thank you for your inputs for the JVM I'll try to tune that Production mem reqs are mostly dependent on memtable thresholds: http://www.datastax.com/docs/0.7/operations/tuning If you enable key caching or row caching, you will have to adjust accordingly as well. -- / Peter Schuller
Secondary Index keeping track of column names
In 0.7.X is there a way to have an automatic secondary index which keeps track of what keys contain a certain column? Right now we are keeping track of this manually, so we can quickly get all of the rows which contain a given column, it would be nice if it was automatic. -Jeremiah Jeremiah Jordan Application Developer Morningstar, Inc.
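The manual bookkeeping described above amounts to maintaining an inverted index alongside the data — a sketch of the idea in plain Python (dicts stand in for column families; names are illustrative):

```python
from collections import defaultdict

# On every insert, also record the row key under an "index row" named for
# the column, so "which rows contain column X?" becomes a single lookup.
# This is exactly the bookkeeping a built-in index would automate.
data = defaultdict(dict)       # row_key -> {column: value}
col_index = defaultdict(set)   # column  -> {row_keys containing it}

def insert(row_key, column, value):
    data[row_key][column] = value
    col_index[column].add(row_key)     # the extra manual write

def rows_with_column(column):
    return sorted(col_index[column])

insert("row1", "price", 10)
insert("row2", "price", 12)
insert("row2", "volume", 7)
print(rows_with_column("price"))       # prints ['row1', 'row2']
```

The cost is the second write per insert (and cleanup on delete), which is why having Cassandra maintain it automatically would be attractive.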
Thrift version
Anyone know if 0.7.4 will work with thrift 0.6? Or do I have to keep thrift 0.5 around to use it? Thanks! Jeremiah Jordan Application Developer Morningstar, Inc.