Re: question about deleting from cassandra

2010-03-13 Thread Jonathan Ellis
You should submit your minor change to jira for others who might want to try it. On Sat, Mar 13, 2010 at 3:18 AM, Weijun Li weiju...@gmail.com wrote: Tried Sylvain's feature in 0.6 beta2 (need minor change) and it worked perfectly. Without this feature, as far as you have high volume new and

Re: About the replication strategy of Cassandra

2010-03-13 Thread Jonathan Ellis
On Sat, Mar 13, 2010 at 12:51 AM, Kauzki Aranami kazuki.aran...@gmail.com wrote: 1. Please give notes the replication strategy of Cassandra is selected. Can you be more specific? 2. About the Zab protocol adopted with Zookeeper. The weak point of the Paxos protocol of Chubby is a delay. Is

Re: question about deleting from cassandra

2010-03-13 Thread Jonathan Ellis
based on 0.6 beta2. -Weijun -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Saturday, March 13, 2010 5:36 AM To: cassandra-user@incubator.apache.org Subject: Re: question about deleting from cassandra You should submit your minor change to jira for others

Re: get_range_slice(s) question

2010-03-12 Thread Jonathan Ellis
That would be a bug, not intended behavior. Can you open a ticket? On Fri, Mar 12, 2010 at 11:48 AM, Omer van der Horst Jansen ome...@yahoo.com wrote: I've noticed that both 0.5.1 and 0.6b2 return (ReplicationFactor) identical copies of the data stored in my keyspace whenever I make a call to

Re: How to force GC in Cassandra?

2010-03-12 Thread Jonathan Ellis
I think you mean compaction? You can use nodeprobe / nodetool for that. http://wiki.apache.org/cassandra/NodeProbe On Fri, Mar 12, 2010 at 12:40 PM, Weijun Li weiju...@gmail.com wrote: Suppose I insert a lot of new items but also delete a lot of new items daily, it will be ideal if I can

Re: Grails Cassandra plugin

2010-03-12 Thread Jonathan Ellis
Great! You should also link it from http://wiki.apache.org/cassandra/ClientExamples (click Login at the top to create an account.) On Fri, Mar 12, 2010 at 3:57 PM, Ned Wolpert ned.wolp...@imemories.com wrote: Folks-   I put together a quick n' dirty grails plugin for Cassandra, wrapped with

Re: Cassandra 0.5.1 get_key_range problem

2010-03-12 Thread Jonathan Ellis
get_key_range is deprecated. You should use get_range_slice. On Fri, Mar 12, 2010 at 3:59 PM, Jon Graham sjclou...@gmail.com wrote: Hello, When using the get_key_range method with ConsistencyLevel.ONE an entire block of keys is not returned. I loop over the get_key_range method, advancing

Re: Cassandra Demo/Tutorial Applications

2010-03-12 Thread Jonathan Ellis
On Fri, Mar 12, 2010 at 1:55 PM, Krishna Sankar ksanka...@gmail.com wrote: I was looking at this from CASSANDRA-873 as well as hands-on homework (!) for my OSCON tutorial. Have couple of questions. Would appreciate insights: A)  Cassandra-873 suggests Luenandra as one demo application B)  Are

Re: Hackathon?!?

2010-03-11 Thread Jonathan Ellis
goffi...@digg.com wrote: We could do it on April 22 (1 week later), that's my birthday :-) What better way to celebrate haha. -Chris On Mar 10, 2010, at 9:58 AM, Jonathan Ellis wrote: I'm in either way, but if we push it a week later then the twitter guys could (a) make it and (b) pimp

Re: Effective allocation of multiple disks

2010-03-11 Thread Jonathan Ellis
Except that for a major compaction the whole thing gets put in one directory. That's the problem w/ the JBOD approach. On Thu, Mar 11, 2010 at 12:01 PM, Eric Evans eev...@rackspace.com wrote: On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote: On Wed, Mar 10, 2010 at 9:31 PM, Anthony

Re: SuperColumn.getSubColumns() ordering

2010-03-11 Thread Jonathan Ellis
it's ordered by the column name as determined by the subcolumn comparator you declared in the definition, yes On Thu, Mar 11, 2010 at 12:24 PM, Matteo Caprari matteo.capr...@gmail.com wrote: Hi. If I iterate over SuperColumn.getSubColumn(), do I get columns sorted by the column name?

Re: libcassandra - C++ Cassandra Client

2010-03-11 Thread Jonathan Ellis
Cool! On Thu, Mar 11, 2010 at 11:12 PM, Padraig O'Sullivan osullivan.padr...@gmail.com wrote: We have developed a C++ client library based on the hector Java client for Cassandra that we intend on using for Drizzle integration. This library is still very much alpha and more features will be

Re: Login Failure Error

2010-03-10 Thread Jonathan Ellis
Please don't use trunk unless you're actively fixing bugs. If you want the latest greatest, get the 0.6 branch from svn. On Wed, Mar 10, 2010 at 6:46 AM, shirish shirishredd...@gmail.com wrote: hello, I have just downloaded the source code from the trunk using svn, I have set up the following

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-10 Thread Jonathan Ellis
For the record, I note that no row cache is the default on user-defined CFs; we include it in the sample configuration file as an example only. On Wed, Mar 10, 2010 at 9:58 AM, Sylvain Lebresne sylv...@yakaz.com wrote: So did you disable the row cache entirely? Yes (getting back reasonable

Re: Effective allocation of multiple disks

2010-03-10 Thread Jonathan Ellis
Thanks for testing that, added a note to http://wiki.apache.org/cassandra/CassandraHardware on stripe size. On Wed, Mar 10, 2010 at 11:03 AM, B. Todd Burruss bburr...@real.com wrote: with the file sizes we're talking about with cassandra and other database products, the stripe size doesn't seem

Re: schema design question

2010-03-10 Thread Jonathan Ellis
', 'www.example.com')),                ],                'Item_likers': [                        Mutation(Column('user_1', 'xx')),                        Mutation(Column('user_2', 'xx'))                ]        } } On Tue, Mar 9, 2010 at 7:33 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue

Re: Hackathon?!?

2010-03-10 Thread Jonathan Ellis
I'm in either way, but if we push it a week later then the twitter guys could (a) make it and (b) pimp it at their own conference. On Wed, Mar 10, 2010 at 12:26 AM, Jeff Hodges jhod...@twitter.com wrote: Ah, hell. Thought this was the first day. Can't make it. -- Jeff On Mar 9, 2010 9:32 PM,

Re: NoSQL live tomorrow

2010-03-10 Thread Jonathan Ellis
http://nosqlboston.eventbrite.com/ don't know about recording / casting plans. On Wed, Mar 10, 2010 at 3:25 PM, Tim Haines tmhai...@gmail.com wrote: Hey Jonathan, What event is this and will it be livecasted/recorded? Cheers, Tim. On Thu, Mar 11, 2010 at 10:21 AM, Jonathan Ellis jbel

Re: Testing row cache feature in trunk: write should put record in cache

2010-03-10 Thread Jonathan Ellis
stands I'll take you up on it. I took a crack at it in https://issues.apache.org/jira/browse/CASSANDRA-860 - also in large part to get my feet wet with the code. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Tuesday, February 16, 2010 9:22 PM To: cassandra

Re: problem with running simple example using cassandra-cli with 0.6.0-beta2

2010-03-10 Thread Jonathan Ellis
I think he means how the column names are rendered as bytes but the values are strings. On Wed, Mar 10, 2010 at 5:22 PM, Brandon Williams dri...@gmail.com wrote: On Wed, Mar 10, 2010 at 5:09 PM, Bill Au bill.w...@gmail.com wrote: I am checking out 0.6.0-beta2 since I need the batch-mutate

Re: Effective allocation of multiple disks

2010-03-10 Thread Jonathan Ellis
On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: I would almost recommend just keeping things simple and removing multiple data directories from the config altogether and just documenting that you should plan on using OS level mechanisms for growing

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 7:15 AM, Sylvain Lebresne sylv...@yakaz.com wrote:  1) stress.py -t 10 -o read -n 5000 -c 1 -r  2) stress.py -t 10 -o read -n 50 -c 1 -r In the case 1) I get around 200 reads/seconds and that's pretty stable. The disk is spinning like crazy (~25% io_wait), very

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 8:31 AM, Sylvain Lebresne sylv...@yakaz.com wrote: Well, unless I'm mistaken, that's the same in my example as I give in both cases to stress.py the option '-c 1' which tells it to retrieve only one column each time even in the case where I have 100 columns by row. Oh.

Re: schema design question

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 3:53 AM, Matteo Caprari matteo.capr...@gmail.com wrote: Thanks Jonathan. Correct if I'm wrong: you are suggesting that each time we receive a new row (item, [users]) we do 2 operations: 1) insert (or merge) this row 'as it is' (item, [users]) 2) for each user in

Re: schema design question

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 7:30 AM, Matteo Caprari matteo.capr...@gmail.com wrote: On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis jbel...@gmail.com wrote: That's true.  So you'd want to use a custom comparator where first 64 bits is the Long and the rest is the userid, for instance. (Long
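A minimal sketch of the column-name layout suggested above (first 64 bits a Long, the rest the userid), shown from the client side in Python. The function names are hypothetical, and the real comparator would be a Java class on the server; this only illustrates how such a name could be packed so that byte-wise ordering follows the numeric part. It assumes non-negative Long values.

```python
import struct

def pack_column_name(long_value, user_id):
    """Build a composite column name: 8-byte big-endian long + UTF-8 user id.

    Big-endian unsigned packing is an assumption made here so that comparing
    the raw bytes orders names by the numeric prefix first.
    """
    return struct.pack(">Q", long_value) + user_id.encode("utf-8")

def unpack_column_name(name):
    """Split a composite column name back into (long_value, user_id)."""
    (long_value,) = struct.unpack(">Q", name[:8])
    return long_value, name[8:].decode("utf-8")
```

For example, pack_column_name(2, "b") sorts before pack_column_name(10, "a"), because the 8-byte prefix is compared before the userid bytes.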

Re: IllegalStateException: Queue full

2010-03-09 Thread Jonathan Ellis
v2 of patch attached to #864 (replaces old one) On Tue, Mar 9, 2010 at 6:08 PM, Todd Burruss bburr...@real.com wrote: using tip of 0.6 branch with 864.txt patch.  i have 4 nodes, one node is overcome with compaction right now.  i started with no load then added a tiny bit of load and almost

Re: Hackathon?!?

2010-03-09 Thread Jonathan Ellis
I can make it. \o/ On Tue, Mar 9, 2010 at 8:05 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote: Alright guys, we have settled on a date for the Cassandra meetup on... April 15th, better known as, Tax day! We can host it here at Cloudkick, unless a cooler startup wants to host it.

Re: atomicity across keys and secondary index support

2010-03-09 Thread Jonathan Ellis
Atomicity: no. 2ary indexes: CASSANDRA-749 is targeting the 0.8 release 2010/3/9 Patricio Echagüe patric...@gmail.com: Hey Jonathan, has there been any update on this feature? Thanks a lot Patricio On Thu, Dec 3, 2009 at 2:35 PM, Jonathan Ellis jbel...@gmail.com wrote: that is still very

Re: Reason for not allowing null values for in Column

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 11:22 AM, Erik Holstad erikhols...@gmail.com wrote: I was probably a little bit unclear here. I'm wondering about the two byte[] in Column. One for name and one for value. I was under the impression that the skiplistmap wraps the Columns, not that the name and the value

Re: DigestMismatchException

2010-03-08 Thread Jonathan Ellis
It means that you're doing a lot of reads that saw multiple versions of the answer, which depending on your workload may be normal On Mon, Mar 8, 2010 at 5:31 PM, B. Todd Burruss bburr...@real.com wrote: i am seeing a lot of these INFO level messages in cassandra server's logs: 2010-03-08

Re: Cassandra latency question

2010-03-08 Thread Jonathan Ellis
something is screwed up if writes are 10x slower than reads On Mon, Mar 8, 2010 at 5:52 PM, David Dabbs dmda...@gmail.com wrote: Hello. I've been running the vPork load generator against two Cassandra nodes running in VMs. I'm running a trunk build with W=2 and R=1 and out-of-the-box JVM_OPTS

Re: schema design question

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 6:18 AM, Matteo Caprari matteo.capr...@gmail.com wrote: The 'key' queries are: These map straightforwardly to one CF per query. - list all the items a user liked row key is user id, columns names are timeuuid of when the like-ing occurred, column value is either item

Re: DigestMismatchException

2010-03-08 Thread Jonathan Ellis
) and one of them must have been from the third replica that may not have been updated yet by async replication? On Mon, 2010-03-08 at 15:36 -0800, Jonathan Ellis wrote: It means that you're doing a lot of reads that saw multiple versions of the answer, which depending on your workload may

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread Jonathan Ellis
On Fri, Mar 5, 2010 at 2:13 AM, shiv shivaji shivaji...@yahoo.com wrote: 1. Is there a way to estimate the time it would take to compact this work load? I hope the load balancing will be much faster after the compaction. Curious how fast I can get the transfer once compaction is done. 0.6

Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread Jonathan Ellis
Generally, you want to have different types of data in different CFs so you can tune them separately (key / row caches). Mixing different row types in one CF also makes doing get_range_slice scans difficult. On Fri, Mar 5, 2010 at 12:04 PM, Erik Holstad erikhols...@gmail.com wrote: What are the

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-05 Thread Jonathan Ellis
On Fri, Mar 5, 2010 at 1:36 PM, shiv shivaji shivaji...@yahoo.com wrote: Sorry, how to get compaction progress with 0.6. Is it in nodetool or somewhere else? I tried a few options after nodetool and did not get this info. it's under CompactionManager in jmx. I'm not sure if nodetool exposes

Re: ConcurrentModificationException

2010-03-05 Thread Jonathan Ellis
Fixed, thanks. On Fri, Mar 5, 2010 at 11:12 AM, B. Todd Burruss bburr...@real.com wrote: https://issues.apache.org/jira/browse/CASSANDRA-853 On Thu, 2010-03-04 at 19:00 -0800, Jonathan Ellis wrote: This is the 0.6 beta yes? Looks like a regression, please open a ticket. On Thu, Mar 4

Re: Unreliable transport layer

2010-03-05 Thread Jonathan Ellis
In 0.6 gossip is over TCP. On Fri, Mar 5, 2010 at 6:54 PM, Ashwin Jayaprakash ashwin.jayaprak...@gmail.com wrote: Hey guys! I have a simple question. I'm a casual observer, not a real Cassandra user yet. So, excuse my ignorance. I see that the Gossip feature uses UDP. I was curious to know if

Re: Using Cassandra via the Erlang Thrift Client API (HOW ??)

2010-03-04 Thread Jonathan Ellis
You probably need to switch the server to framed thrift mode. On Thu, Mar 4, 2010 at 2:02 AM, J T jt4websi...@googlemail.com wrote: Hi, I've been trying to piece together some notion of how to use cassandra from an erlang client. So far I have managed to come up with the following, but it

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-04 Thread Jonathan Ellis
balancing seems a little slow, but I will open a new thread on that if needed. Thanks, Shiv From: Jonathan Ellis jbel...@gmail.com To: cassandra-user@incubator.apache.org Sent: Wed, March 3, 2010 9:21:28 AM Subject: Re: Anti-compaction Diskspace issue even when

Re: map/reduce question

2010-03-04 Thread Jonathan Ellis
0.6 has Hadoop map/reduce support (see contrib/wordcount for an example) but this is more for analytics (small numbers of large queries) than live web app style load. Cassandra also has range queries (get_range_slice) -- performing a slice across sequential rows, where you don't necessarily know

Re: Help with Replication Issue

2010-03-04 Thread Jonathan Ellis
On Thu, Mar 4, 2010 at 1:33 PM, joe smith water4...@yahoo.com wrote: Hi, I installed a cluster of 2 nodes using 0.5 version of binary distribution. Node 1 is on a Macbook 10.4 w/SoyLatte (java 1.6 port). Node 2 is on a Linux desktop. The configuration is straight out of the distribution -

Re: ConcurrentModificationException

2010-03-04 Thread Jonathan Ellis
This is the 0.6 beta yes? Looks like a regression, please open a ticket. On Thu, Mar 4, 2010 at 8:54 PM, Todd Burruss bburr...@real.com wrote: i'm seeing a lot of these ... any idea? 2010-03-04 18:53:21,455 ERROR [MEMTABLE-POST-FLUSHER:1] [DebuggableThreadPoolExecutor.java:94] Error in

Re: Memtable size and garbage collection in JVM

2010-03-04 Thread Jonathan Ellis
A lot of churn is hard on Cassandra because of http://wiki.apache.org/cassandra/DistributedDeletes, but Cassandra is so fast that it may make up for that depending on your needs. It's not designed to eliminate i/o entirely, no. But if you set RowsCached=100% in 0.6 you'll get something pretty

Re: Connect during bootstrapping?

2010-03-03 Thread Jonathan Ellis
INFO - Bootstrapping But when I run nodetool streams, no streams are transferring: Mode: Bootstrapping Not sending any streams. Not receiving any streams. And it doesn't look like the node is getting any data. Any ideas? Thanks for the help... Brian On 3/2/10 12:22 PM, Jonathan Ellis

Re: why have ColumnFamilies?

2010-03-03 Thread Jonathan Ellis
http://issues.apache.org/jira/browse/CASSANDRA-598 2010/3/3 Ted Zlatanov t...@lifelogs.com: I don't understand the advantages of ColumnFamilies over a SuperColumnFamily with just one supercolumn.  Why have the former if the latter is functionally equivalent? Thanks Ted

Re: why have ColumnFamilies?

2010-03-03 Thread Jonathan Ellis
I would rather move to a more flexible model (as many levels of nesting as you want) than a less-flexible one. 2010/3/3 Ted Zlatanov t...@lifelogs.com: On Wed, 3 Mar 2010 07:23:48 -0600 Jonathan Ellis jbel...@gmail.com wrote: JE 2010/3/3 Ted Zlatanov t...@lifelogs.com: I don't understand

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-03 Thread Jonathan Ellis
a while. I will track the jira issue of anticompaction and diskspace. Thanks for the pointer. Thanks, Shiv From: Jonathan Ellis jbel...@gmail.com To: cassandra-user@incubator.apache.org Sent: Wed, February 24, 2010 11:34:59 AM Subject: Re: Anti-compaction

Re: finding Cassandra servers

2010-03-03 Thread Jonathan Ellis
We appear to be reaching consensus that this is solving a non-problem, so I have closed that ticket. 2010/3/3 Ted Zlatanov t...@lifelogs.com: On Wed, 3 Mar 2010 12:08:06 -0500 Ian Holsman i...@holsman.net wrote: IH We could create a branch or git fork where you guys could develop it, IH and

Re: failed to identify others in a 3-node ring

2010-03-03 Thread Jonathan Ellis
You probably assigned all nodes the same token. Don't do that. :) On Wed, Mar 3, 2010 at 4:41 AM, Pahud pahud...@gmail.com wrote: Hello list, I just setup a 3-node ring in a virtualbox bridging environment. By running the 'cassandra -f' the log indicates it discovers other nodes but if I
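One way to avoid giving every node the same token is to compute distinct, evenly spaced tokens up front. The sketch below assumes RandomPartitioner with a token space of 0 to 2**127; the thread does not say which partitioner was in use, and the helper name is hypothetical.

```python
def initial_tokens(node_count):
    """Return one evenly spaced token per node.

    Assumes RandomPartitioner, whose tokens are taken here to range over
    0 .. 2**127. Each value would go into that node's own configuration.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * (2 ** 127) // node_count for i in range(node_count)]

if __name__ == "__main__":
    for index, token in enumerate(initial_tokens(3)):
        print("node %d: %d" % (index, token))
```

With an order-preserving partitioner the tokens are strings drawn from the key space instead, so this arithmetic would not apply.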

Re: Questions while evaluating Cassandra

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 6:43 AM, Eran Kutner e...@gigya.com wrote: Is the procedure described in the description of ticket CASSANDRA-44 really the way to do schema changes in the latest release? I'm not sure what's your thoughts about this but our experience is that every release of our software

Re: Adjusting Token Spaces and Rebalancing Data

2010-03-02 Thread Jonathan Ellis
, Mar 1, 2010 at 4:55 PM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, Mar 1, 2010 at 5:39 PM, Jon Graham sjclou...@gmail.com wrote: Reached an EOL or something bizzare occured. Reading from: /192.168.2.13 BufferSizeRemaining: 16 This one is harmless java.io.IOException: Value too large

Re: Connect during bootstrapping?

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 1:54 PM, Brian Frank Cooper coop...@yahoo-inc.com wrote: Hi folks, I’m running 0.5 and I had 2 nodes up and running, then added a 3rd node in bootstrap mode. I understand from other discussion list threads that the new node doesn’t serve reads while it is bootstrapping,

Re: Index values: data or pointers?

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 4:13 PM, jeremey.barr...@nokia.com wrote: I'm exploring data layouts and it seems like the common practice is to store an index in one CF (e.g. userid for row key and thingid for column name) and then to fetch all the things by their thingids separately... so get

Re: Looking for work

2010-03-02 Thread Jonathan Ellis
(This is not to say that I think job posts are off-topic here, because they are not.) On Tue, Mar 2, 2010 at 10:43 PM, Jonathan Ellis jbel...@gmail.com wrote: If there's one thing that's worse than a mailing list as a job board, it's a wiki. :) On Tue, Mar 2, 2010 at 10:39 PM, Ryan Daum r

Re: Connect during bootstrapping?

2010-03-02 Thread Jonathan Ellis
. And it doesn’t look like the node is getting any data. Any ideas? Thanks for the help... Brian On 3/2/10 12:22 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Mar 2, 2010 at 1:54 PM, Brian Frank Cooper coop...@yahoo-inc.com wrote: Hi folks, I’m running 0.5 and I had 2 nodes up

Re: What's the ideal size of a column?

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 11:57 PM, Cool BSD c...@coolbsd.com wrote: Be short - what's the ideal column size in real world? Long description - I'm working on a prototype, the application is a data store that holding blobs sizing from couple of KB to hundreds of MB, close to 1GB in the worst

Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jonathan Ellis
0.5.0. Is this a separate package/tool? Thanks, Jon On Wed, Feb 24, 2010 at 8:17 PM, Jonathan Ellis jbel...@gmail.com wrote: nodeprobe loadbalance and/or nodeprobe move http://wiki.apache.org/cassandra/Operations On Wed, Feb 24, 2010 at 6:17 PM, Jon Graham sjclou...@gmail.com wrote

Re: Storage format

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 4:06 PM, Erik Holstad erikhols...@gmail.com wrote: So that is kinda of what I want to do, but I want to go from a row with multiple columns to multiple rows with one column Right, and I'm trying to tell you that this is a bad idea unless you are worried about exhausting

Re: Process for removing an old CF in 0.5.0

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 4:41 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: Hi,  I was just wondering what the process might be for removing an old column family in 0.5.0. Can I just update the config and restart the server? Yes, but make sure your commitlog is flushed first (and

Re: Storage format

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 4:49 PM, Erik Holstad erikhols...@gmail.com wrote: Haha! Thanks. Well I'm a little bit worried about this but since the indexes are pretty small I don't think it is going to be too bad. But was mostly thinking about performance and having the index row as a

Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 3:18 PM, Jon Graham sjclou...@gmail.com wrote: Thanks Jonathan. It seems like the load balance operation isn't moving. I haven't seen any data file time changes in 2 hours and no location file time changes in over an hour. I can see a tcp port # 7000 opened on the

Re: Storage format

2010-03-01 Thread Jonathan Ellis
Then you definitely want one row, range queries are slower than we'd like right now. (Ticket to fix that: https://issues.apache.org/jira/browse/CASSANDRA-821) On Mon, Mar 1, 2010 at 5:00 PM, Erik Holstad erikhols...@gmail.com wrote: On Mon, Mar 1, 2010 at 2:51 PM, Jonathan Ellis jbel

Re: Adjusting Token Spaces and Rebalancing Data

2010-03-01 Thread Jonathan Ellis
On Mon, Mar 1, 2010 at 5:39 PM, Jon Graham sjclou...@gmail.com wrote: Reached an EOL or something bizzare occured. Reading from: /192.168.2.13 BufferSizeRemaining: 16 This one is harmless java.io.IOException: Value too large for defined data type     at

Re: binary data in key names?

2010-02-27 Thread Jonathan Ellis
Keys are strings. That means they have to be UTF8-encoded, although thrift bindings for many languages (including python) don't help you with this. On Sat, Feb 27, 2010 at 7:43 PM, Robert Edmonds edmo...@debian.org wrote: hi, i'm using cassandra 0.5.0 and pycassa 0.1. i'd like to store

Re: binary data in key names?

2010-02-27 Thread Jonathan Ellis
Yes. On Sat, Feb 27, 2010 at 8:34 PM, Robert Edmonds edmo...@debian.org wrote: On 2010-02-28, Jonathan Ellis jbel...@gmail.com wrote: Keys are strings.  That means they have to be UTF8-encoded, although thrift bindings for many languages (including python) don't help you with this. ah, i
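Since keys have to be valid UTF-8 strings, one common workaround for binary data is to encode it as text before using it as a key. The sketch below uses hex; the helper names are hypothetical and this is only one possible encoding, not something the thread prescribes.

```python
import binascii

def key_from_bytes(raw):
    """Encode arbitrary bytes as a lowercase hex string, which is valid UTF-8."""
    return binascii.hexlify(raw).decode("ascii")

def bytes_from_key(key):
    """Recover the original bytes from a hex-encoded key."""
    return binascii.unhexlify(key)
```

Hex doubles the key length, but fixed-width lowercase hex keeps the same relative ordering as the raw bytes, which may matter with an order-preserving partitioner.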

Re: cassandra freezes

2010-02-25 Thread Jonathan Ellis
The only kind of freeze that makes sense there is your reads are i/o bound and the extra disk activity is killing you. In that case the fix is to add more RAM, or give less to the JVM so the OS can use more for buffer cache. On Thu, Feb 25, 2010 at 8:01 AM, Boris Shulman shulm...@gmail.com

Re: cassandra freezes

2010-02-25 Thread Jonathan Ellis
. On Thu, Feb 25, 2010 at 4:07 PM, Jonathan Ellis jbel...@gmail.com wrote: The only kind of freeze that makes sense there is your reads are i/o bound and the extra disk activity is killing you.  In that case the fix is to add more RAM, or give less to the JVM so the OS can use more for buffer

Re: Multiple Data Directories

2010-02-25 Thread Jonathan Ellis
Compaction is why http://wiki.apache.org/cassandra/CassandraHardware recommends raid0-ing if you are concerned about free disk space limits. On Thu, Feb 25, 2010 at 1:36 PM, Gary Dusbabek gdusba...@gmail.com wrote: Cassandra always compacts to the directory with the most free space. There is

Re: 3 node installation

2010-02-25 Thread Jonathan Ellis
can tell. These are good suggestions. Thanks. (I don't know whether it is worth describing this in a JIRA as a bug. I would be willing to do it if you like me to do so.) On Thu, Feb 25, 2010 at 6:19 AM, Jonathan Ellis jbel...@gmail.com wrote: Then it sounds like a bug. Do A and B agree

Re: Attach a binary stream

2010-02-25 Thread Jonathan Ellis
Cassandra column values are byte arrays. Turning your java object into a byte[] is your responsibility. :) On Thu, Feb 25, 2010 at 10:11 AM, Charles Moulliard cmoulli...@gmail.com wrote: Hi, Is it possible to attach a binary stream to a Cassandra DB ? I would like to say is it possible to
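The same point holds for any client language: the application decides how an object becomes bytes. A minimal sketch in Python (not the Java the question asks about), using JSON as the serialization; the function names are hypothetical and the choice of JSON is an assumption made for illustration.

```python
import json

def to_column_value(obj):
    """Serialize a JSON-compatible object to bytes for use as a column value."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")

def from_column_value(raw):
    """Deserialize bytes read back from a column value."""
    return json.loads(raw.decode("utf-8"))
```

Any stable encoding would work in its place, as long as readers and writers agree on it.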

Re: Consistency Level of CLI

2010-02-25 Thread Jonathan Ellis
CLI uses CL.ONE for reads and writes. It has no user-level documentation other than its help output. On Thu, Feb 25, 2010 at 1:08 PM, Masood Mortazavi masoodmortaz...@gmail.com wrote: What is the write and read consistency level for the CLI tool cassandra-cli ? Do the set and get

Re: Multiple Data Directories

2010-02-25 Thread Jonathan Ellis
that maybe nodeprobe repair might do it, will it? Thanks, -Anthony On Thu, Feb 25, 2010 at 01:43:22PM -0600, Jonathan Ellis wrote: Compaction is why http://wiki.apache.org/cassandra/CassandraHardware recommends raid0-ing if you are concerned about free disk space limits. On Thu, Feb 25

Re: Would deleted columns slow down reads?

2010-02-25 Thread Jonathan Ellis
Yes, that's going to hurt forward scans with no start column. (Reverse scans, or scans that start with a known live column, will still be fast b/c of the per-row column indexes.) On Thu, Feb 25, 2010 at 8:56 PM, Edmond Lau edm...@ooyala.com wrote: Given that Cassandra needs to maintain

Re: Multiple Data Directories

2010-02-25 Thread Jonathan Ellis
On Thu, Feb 25, 2010 at 3:54 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: What about the case where cpu and ram are underutilized, and your bottleneck is disk io (which seems to often be the case in ec2), then adding more spindles improves overall throughput of the system.  I've

Re: reads are slow

2010-02-24 Thread Jonathan Ellis
only the total row size limit (must fit in memory during compaction) On Wed, Feb 24, 2010 at 7:47 AM, kevin kevincastigli...@gmail.com wrote: On Tue, Feb 23, 2010 at 10:06 AM, Jonathan Ellis jbel...@gmail.com wrote: the standard workaround is to change your data model to use non-super columns

Re: Getting the keys in your system?

2010-02-24 Thread Jonathan Ellis
0.6 adds hadoop support for exactly this scenario (among others). You can also use get_range_slice to iterate all keys against RP in 0.6, but it will be slow since it is difficult to parallelize manually. -Jonathan On Wed, Feb 24, 2010 at 9:23 AM, Erik Holstad erikhols...@gmail.com wrote: If

Re: Getting the keys in your system?

2010-02-24 Thread Jonathan Ellis
Other than you'll have to completely reload all your data when changing partitioners, no, not much to think about. :) On Wed, Feb 24, 2010 at 9:38 AM, Erik Holstad erikhols...@gmail.com wrote: Thanks Jonathan! We are thinking about moving over to the OPP to be able to do this and

Re: Cassandra paging, gathering stats

2010-02-24 Thread Jonathan Ellis
It does not. Someone would need it badly enough to code it first. :) On Wed, Feb 24, 2010 at 10:26 AM, Wojciech Kaczmarek kaczmare...@gmail.com wrote: Btw, does get_range_slice support reversed=true for keys (not column predicates) ? In 0.5 seems not On Tue, Feb 23, 2010 at 21:28, Jonathan

Re: import data into cassandra

2010-02-24 Thread Jonathan Ellis
I suggest getting it working via plain thrift calls before trying anything fancy. Otherwise it's probably premature optimization. On Wed, Feb 24, 2010 at 11:43 AM, Martin Probst ser...@preisroboter.de wrote: Hi, i'm playing around a little bit with cassandra and trying to load some data

Re: Wiki permission denied

2010-02-24 Thread Jonathan Ellis
pinged #asfinfra. looks like they fixed it. On Wed, Feb 24, 2010 at 11:09 AM, Mark Robson mar...@gmail.com wrote: Hiya, I'm looking at http://wiki.apache.org/cassandra/RecentChanges And there's an error. Can someone look into it please? Ta Mark

Re: cassandra freezes

2010-02-24 Thread Jonathan Ellis
On Wed, Feb 24, 2010 at 8:46 PM, Santal Li santal...@gmail.com wrote: BTW: Somebody in my team told me, that if the cassandra managed data was too huge( 15x than heap space) , will cause performance issues, is this true? It really has more to do with what your hot data set is, than absolute

Re: Adjusting Token Spaces and Rebalancing Data

2010-02-24 Thread Jonathan Ellis
nodeprobe loadbalance and/or nodeprobe move http://wiki.apache.org/cassandra/Operations On Wed, Feb 24, 2010 at 6:17 PM, Jon Graham sjclou...@gmail.com wrote: Hello, I have 6 node Cassandra 0.5.0 cluster using org.apache.cassandra.dht.OrderPreservingPartitioner with replication factor 3.

Re: Understanding Bootstrapping

2010-02-24 Thread Jonathan Ellis
Bootstrap files are streamed directly to data locations as .tmp files and renamed when complete. One of the problems w/ 0.5's bootstrap is indeed that it doesn't give you any visibility into what is going on. This is addressed in 0.6 w/ additional JMX reporting. On Wed, Feb 24, 2010 at 5:06 PM,

Re: 3 node installation

2010-02-24 Thread Jonathan Ellis
Is the configuration identical on all nodes? Specifically, is ReplicationFactor set to 2 on all nodes? On Wed, Feb 24, 2010 at 10:07 PM, Masood Mortazavi masoodmortaz...@gmail.com wrote: I wonder if anyone can provide an explanation for the following behavior observed in a three-node cluster:

Re: problem about bootstrapping when used in huge node

2010-02-23 Thread Jonathan Ellis
On Tue, Feb 23, 2010 at 12:33 AM, Michael Lee mail.list.steel.men...@gmail.com wrote: (1) A cluster cannot be enlarge(add more node into cluster) if it already used more than half capacity: If every node has data more than it’s half capacity , the admin may not bootstrapping new node into

Re: problem about bootstrapping when used in huge node

2010-02-23 Thread Jonathan Ellis
have to transfer a lot of data to repair every time a single disk dies -Jonathan On Tue, Feb 23, 2010 at 10:26 AM, Brandon Williams dri...@gmail.com wrote: On Tue, Feb 23, 2010 at 7:31 AM, Jonathan Ellis jbel...@gmail.com wrote: (2) How to use node has 12 1TB disk?? You should use

Re: reads are slow

2010-02-23 Thread Jonathan Ellis
On Tue, Feb 23, 2010 at 12:12 PM, kevin kevincastigli...@gmail.com wrote: On Tue, Feb 23, 2010 at 10:07 AM, Jonathan Ellis jbel...@gmail.com wrote: you enable row caching by upgrading to 0.6. :) where can i get 0.6 from? svn trunk? svn branches/cassandra-0.6 like I said, we're voting

Re: Cassandra paging, gathering stats

2010-02-23 Thread Jonathan Ellis
you'd actually use first column as start, empty finish, count=pagesize, and reversed=True, unless I'm misunderstanding something. On Tue, Feb 23, 2010 at 1:57 PM, Brandon Williams dri...@gmail.com wrote: On Tue, Feb 23, 2010 at 11:54 AM, Sonny Heer sonnyh...@gmail.com wrote: Columns can
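The slice arguments described above (first column as start, empty finish, count=pagesize, reversed=True) can be illustrated with a small in-memory model. This is a sketch of the semantics as assumed here (inclusive start, empty string meaning unbounded), not the Thrift API itself; get_slice_model and its parameters are hypothetical names.

```python
def get_slice_model(columns, start="", finish="", count=100, is_reversed=False):
    """Model a column slice over a list of column names.

    Assumptions: start and finish are inclusive, an empty string means
    "no bound", and is_reversed walks the columns in descending order.
    """
    ordered = sorted(columns, reverse=is_reversed)
    result = []
    for name in ordered:
        if start != "":
            # Skip columns that come before the start in iteration order.
            if (not is_reversed and name < start) or (is_reversed and name > start):
                continue
        if finish != "":
            # Stop once we pass the finish in iteration order.
            if (not is_reversed and name > finish) or (is_reversed and name < finish):
                break
        result.append(name)
        if len(result) == count:
            break
    return result

# Previous page: begin at the first column of the current page and walk backwards.
previous_page = get_slice_model(list("abcdef"), start="d", count=3, is_reversed=True)
```

Because the start column is included under these assumptions, a client would typically ask for pagesize + 1 columns and drop the first one.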

Re: Moving Cassandra data from one cluster to another cluster on a different network

2010-02-22 Thread Jonathan Ellis
Just scp the data files over, one per node. You just need to make sure that the token on the node you are copying to is the same as the source token. if you copy everything from data and commitlog this will Just Work, since token is stored in data/system. And of course you will need to tweak

Re: Cassandra paging, gathering stats

2010-02-22 Thread Jonathan Ellis
On Mon, Feb 22, 2010 at 1:40 PM, Sonny Heer sonnyh...@gmail.com wrote: Hey, We are in the process of implementing a cassandra application service. we have already ingested TB of data using the cassandra bulk loader (StorageService). One of the requirements is to get a data explosion factor

Re: Cassandra paging, gathering stats

2010-02-22 Thread Jonathan Ellis
Breaking sooner rather than later is a feature, of sorts. You really do need to give a sane max. Remember that thrift must pull results into memory before giving them back to you, so allowing you to give it a max that cannot possibly fit in memory is not doing you a favor. -Jonathan On Mon,

Re: StackOverflowError on high load

2010-02-21 Thread Jonathan Ellis
On Sun, Feb 21, 2010 at 10:01 AM, Ran Tavory ran...@gmail.com wrote: I'm also not clear whether CASSANDRA-804 is going to be a real fix. There's one way to find out. :P

Re: Cassandra versus HBase performance study

2010-02-21 Thread Jonathan Ellis
On Wed, Feb 3, 2010 at 7:45 PM, Brian Frank Cooper coop...@yahoo-inc.com wrote: One thing that is puzzling is the scan performance. The scan experiment is to scan between 1-100 records on each request. My 6 node Cassandra cluster is only getting up to about 230 operations/sec, compared to

Re: Cassandra range scans

2010-02-21 Thread Jonathan Ellis
[replying to list, with permission] On Mon, Feb 22, 2010 at 12:05 AM, jeremey.barr...@nokia.com wrote: I'm looking for a very scalable primary data store for a large web/API application. Our data consists largely of lists of things, per user. So a user has a bunch (dozens to hundreds) of

Re: cassandra freezes

2010-02-20 Thread Jonathan Ellis
haproxy should be fine. normal GCs aren't a problem, you don't need to worry about that. what is a problem is when you shove more requests into cassandra than it can handle, so it tries to GC to get enough memory to handle that, then you shove even more requests, so it GC's again, and it spirals

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-19 Thread Jonathan Ellis
get to that point you can instruct me to implement this feature along with the row-cache-write-through. Our goal is straightforward: to support short read latency in high volume web application with write/read ratio to be 1:1. -Weijun -Original Message- From: Jonathan Ellis

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-19 Thread Jonathan Ellis
. -Weijun -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Thursday, February 18, 2010 12:04 PM To: cassandra-user@incubator.apache.org Subject: Re: Testing row cache feature in trunk: write should put record in cache Did you force a GC from

Re: Unbalanced read latency among nodes in a cluster

2010-02-19 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/Operations On Fri, Feb 19, 2010 at 3:03 PM, Weijun Li weiju...@gmail.com wrote: I set up two Cassandra clusters with 2 nodes each. Both use the random partitioner. It's strange that for each cluster, one node has much shorter read latency than the other one

Re: cassandra freezes

2010-02-19 Thread Jonathan Ellis
are you using the old deb package? because that had broken gc settings. On Fri, Feb 19, 2010 at 10:40 PM, Santal Li santal...@gmail.com wrote: I ran into almost the same thing as you. When I run write benchmark tests, sometimes one Cassandra node will freeze and the other nodes will consider it was
