[RELEASE] 0.6.2

2010-05-28 Thread Eric Evans

Just in time for those Memorial Day weekend maintenance windows, I give
you, Apache Cassandra 0.6.2.

You can check out a summary of what's changed here[1], or you can trust
that it's awesome and go straight to the download page[2].

I can smell the upgrades from here.


[1]: http://bit.ly/b3E9GS (changes)
[2]: http://cassandra.apache.org/download

-- 
Eric Evans
eev...@rackspace.com



Help to understand a strange behavior with DCQUORUM

2010-05-28 Thread Patricio Echagüe
Hi all, I need help understanding how DCQUORUM works.

This is my setup:

- Cluster with 3 Cassandra Nodes
- RF = 3
- ReplicaPlacementStrategy = RackUnawareStrategy

My test:
- I write/read with DCQUORUM

Results:
- While the 3 nodes are UP, all my writes and reads succeed. (The nodes are
reached, and the third one, to complete the RF=3, is done by replication,
right?)
- When I killed one node, the test FAILED with UnavailableException.
- When I performed the same test but with QUORUM instead of DCQUORUM, it
succeeded.

Could someone please explain why reads and writes with DCQUORUM worked fine
while the 3 nodes were up and running but failed when 1 was down, even though
I have only one Data Center?

Thanks in advance

-- 
Patricio.-
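
For reference, the arithmetic behind the plain QUORUM result above can be
sketched as follows. This is a minimal sketch assuming the usual majority
rule, floor(RF / 2) + 1; the function names are hypothetical, and it does not
model how DCQUORUM counts replicas per data center, which is the open
question in this message.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def quorum_reachable(replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are alive to satisfy a plain QUORUM request."""
    return live_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    rf = 3
    for live in (3, 2, 1):
        print(f"RF={rf} live={live} quorum={quorum(rf)} "
              f"reachable={quorum_reachable(rf, live)}")
```

With RF = 3 the quorum is 2, so one dead node still leaves enough live
replicas for QUORUM, which matches the third result in the list.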


Re: Thoughts on adding complex queries to Cassandra

2010-05-28 Thread Jeremy Davis
I wonder if any of the main project committers would like to weigh in on
what a desired API would look like, or perhaps we should start an
unscheduled Jira ticket?

On Thu, May 27, 2010 at 5:39 PM, Jake Luciani  wrote:

> I had this:
>
>
> string slice_dice_reduce(1:required list key,
>                          2:required ColumnParent column_parent,
>                          3:required SlicePredicate predicate,
>                          4:required ConsistencyLevel consistency_level=ONE,
>                          5:required string dice_js,
>                          6:required string reduce_js)
>     throws (1:InvalidRequestException ire,
>             2:UnavailableException ue,
>             3:TimedOutException te),
>
> I guess it could use a union of sorts and return either.
>
>
>
> On Thu, May 27, 2010 at 8:36 PM, Jeremy Davis <
> jerdavis.cassan...@gmail.com> wrote:
>
>>
>> I agree, I had more than filter results in mind.
>> Though I had envisioned the results to continue to use the
>> List (and not JSON). You could still create new result
>> columns that do not in any way exist in Cassandra, and you could still stuff
>> JSON into any of the result columns.
>>
>> I had envisioned:
>> list get_slice(keyspace, key, column_parent, predicate, consistency_level,
>> javascript_blob)
>>
>> -JD
>>
>>
>>
>>
>>
>> On Thu, May 27, 2010 at 5:01 PM, Jake Luciani  wrote:
>>
>>> I've secretly started working on this but nothing to show yet :( I'm
>>> calling it SliceDiceReduce or SliceReduce.
>>>
>>>  The plan is to use the js thrift bindings I've added for 0.3 release of
>>> thrift (out very soon?)
>>>
>>> This will allow the supplied js to access the results like any other
>>> thrift client.
>>>
>>> It adds a new verb handler and SEDA stage that will execute on a local
>>> node and pass this node's slice data into the supplied js "dice" function via
>>> the thrift js bindings.
>>>
>>> The resulting js from each node would then be passed into another
>>> supplied js reduce function on the starting node.
>>>
>>> The result of this would then return a single JSON or string result.
>>>  The reason I'm keeping the results in json is you can do more than filter.
>>> You can do things like word count etc.
>>>
>>> Anyway this is little more than an idea now. But if people like this
>>> approach maybe I'll get motivated!
>>>
>>> Jake
>>>
>>>
>>>
>>>
>>>
>>> On May 27, 2010, at 7:36 PM, Steve Lihn  wrote:
>>>
>>> Mongo has it too. It could save a lot of development time if one can
>>> figure out porting Mongo's query API and stored javascript to Cassandra.
>>> It would be great if scala's list comprehension can be facilitated to
>>> write query-like code against Cassandra schema.
>>>
>>> On Thu, May 27, 2010 at 11:05 AM, Vick Khera < 
>>> vi...@khera.org> wrote:
>>>
>>>> On Thu, May 27, 2010 at 9:50 AM, Jonathan Ellis <
>>>> jbel...@gmail.com> wrote:
>>>> > There definitely seems to be demand for something like this.  Maybe
>>>> > for 0.8?
>>>> >
>>>>
>>>> The Riak data store has something like this: you can submit queries
>>>> (and map reduce jobs) written in javascript that run on the data nodes
>>>> using data local to that node.  It is a very compelling feature.
>>>>
>>>
>>>
>>
>
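
The dice-then-reduce flow Jake describes in this thread can be illustrated in
plain Python, using his word-count example. This is only a sketch of the idea:
the proposal itself would run client-supplied JavaScript inside Cassandra, and
every name below (dice, reduce_counts, the sample node data) is hypothetical.

```python
from collections import Counter
from functools import reduce


def dice(columns: dict) -> Counter:
    """Runs on each node against its local slice: count words in the values."""
    counts = Counter()
    for value in columns.values():
        counts.update(value.split())
    return counts


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Runs on the starting node: merge two partial results."""
    return left + right


def slice_dice_reduce(node_slices: list) -> dict:
    """Apply dice to every node's slice, then fold the partials together."""
    partials = [dice(columns) for columns in node_slices]
    return dict(reduce(reduce_counts, partials, Counter()))


if __name__ == "__main__":
    # Hypothetical slices held by two different nodes.
    node_slices = [
        {"col1": "a b a", "col2": "c"},
        {"col3": "b c c"},
    ]
    print(slice_dice_reduce(node_slices))
```

Because each node returns only a small partial result, the starting node
merges counts rather than raw columns, which is what lets this do more than
filtering.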


Re: how does communication between nodes works?

2010-05-28 Thread Tobias Jungen
From what I can tell, nodes talk to each other directly via TCP/IP sockets.
Look at IncomingTcpConnection and OutboundTcpConnection under
org.apache.cassandra.net - they both use java.net.Socket for communication
purposes.
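
To make the idea of direct socket communication concrete, here is a minimal
Python sketch of two processes on one machine exchanging a single message
over TCP. It is not Cassandra's wire format: the 4-byte length prefix and all
function names are assumptions made for illustration only.

```python
import socket
import struct
import threading


def recv_exact(conn: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, or raise if the peer closes early."""
    data = b""
    while len(data) < size:
        chunk = conn.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def send_message(conn: socket.socket, payload: bytes) -> None:
    """Send a big-endian 4-byte length followed by the payload."""
    conn.sendall(struct.pack(">I", len(payload)) + payload)


def recv_message(conn: socket.socket) -> bytes:
    """Read one length-prefixed message."""
    (length,) = struct.unpack(">I", recv_exact(conn, 4))
    return recv_exact(conn, length)


def exchange(payload: bytes) -> bytes:
    """Start a listening 'node', connect a second 'node', send one message."""
    received = []
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    port = listener.getsockname()[1]

    def accept_one() -> None:
        conn, _ = listener.accept()
        with conn:
            received.append(recv_message(conn))

    worker = threading.Thread(target=accept_one)
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as conn:
        send_message(conn, payload)
    worker.join()
    listener.close()
    return received[0]


if __name__ == "__main__":
    print(exchange(b"hello from another node"))
```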

On Fri, May 28, 2010 at 11:32 AM, Gabriel Sosa wrote:

> I've been trying to find some in-depth documentation on how
> the nodes communicate with each other.
>
> I've been reading
> http://wiki.apache.org/cassandra/ArchitectureInternals but I couldn't
> find anything there.
>
> I'm trying to learn how this works. Is it using Thrift, or is Thrift
> only used for client communication?
>
>
> thank you
>
> --
> Gabriel Sosa
> If you want different results, don't always do the same thing. - Einstein
>


how does communication between nodes works?

2010-05-28 Thread Gabriel Sosa
I've been trying to find some in-depth documentation on how
the nodes communicate with each other.

I've been reading
http://wiki.apache.org/cassandra/ArchitectureInternals but I couldn't
find anything there.

I'm trying to learn how this works. Is it using Thrift, or is Thrift
only used for client communication?


thank you

-- 
Gabriel Sosa
If you want different results, don't always do the same thing. - Einstein


Re: ec2 tests

2010-05-28 Thread gabriele renzi
On Fri, May 28, 2010 at 3:48 PM, Mark Greene  wrote:
> First thing I would do is stripe your EBS volumes. I've seen blogs that say
> this helps and blogs that say it's fairly marginal.


just to point out: another option is to stripe the ephemeral drives
(if using instances > small)


Re: Avro: C# support

2010-05-28 Thread Eric Hauser
There is a JIRA ticket for .NET support in Avro -
https://issues.apache.org/jira/browse/AVRO-533

On Fri, May 28, 2010 at 10:01 AM, Stephan Pfammatter <
stephan.pfammat...@logmein.com> wrote:

>  Q: Are there plans for Avro to support C#?
>


Avro: C# support

2010-05-28 Thread Stephan Pfammatter
Q: Are there plans for Avro to support C#?


Re: ec2 tests

2010-05-28 Thread Mark Greene
The first thing I would do is stripe your EBS volumes. I've seen blogs that say
this helps and blogs that say it's fairly marginal. (You may want to try
Rackspace Cloud, as their local storage is much faster.)

Second, I would start out with N=2 and set W=1 and R=1. That will mirror
your data across two of the three nodes and possibly give you stale data on
the reads. If you feel you need stronger durability, you can increase N and W.
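
Why N=2, W=1, R=1 can return stale data follows from the usual overlap rule:
a read is guaranteed to touch at least one replica holding the latest write
only when R + W > N. A minimal Python sketch of that check (the function name
is hypothetical):

```python
def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set overlaps every write set (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must each be between 1 and N")
    return r + w > n


if __name__ == "__main__":
    for n, w, r in [(2, 1, 1), (2, 2, 1), (3, 2, 2)]:
        print(f"N={n} W={w} R={r} overlap={read_sees_latest_write(n, w, r)}")
```

With N=2, W=1, R=1 the sum equals N, so a read may land on the replica that
has not yet received the write.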

As for heap memory, don't use 100% of the available physical RAM.
Remember, the object heap will be smaller than the overall memory footprint
of the JVM process.

That should get you started.


On Fri, May 28, 2010 at 3:10 AM, Chris Dean  wrote:

> Mark Greene  writes:
> > If you give us an objective of the test that will help. Trying to get max
> > write throughput? Read throughput? Weak consistency?
>
> I would like reading to be as fast as I can get.  My real-world problem
> is write heavy, but the latency requirements are minimal on that side.
> If there are any particular config settings that would help with the slow
> ec2 IO, that would be great to know.
>
> Cheers,
> Chris Dean
>


Re: Cassandra CF sharding

2010-05-28 Thread Maxim Kramarenko

Hello!

Thank you.

In 1) I hope that processing smaller files will be easier to
monitor. Also, if we have a disk failure, we can delete just one file and
repair, for example. Actually, a CF per customer would be best (easy to
delete/back up a specific customer's data only, customers are totally
independent), but Cassandra likely doesn't support 15000 CFs per Keyspace.


Regarding 3) - yes, I understand.

One related question there - if we can choose, should we prefer
5 nodes, 16 cores/16 GB/8 TB disk space each
or
10 nodes, 8 cores/8 GB/4 TB disk space each ?

When is it worth using multiple Cassandra instances per node? We now run 6
instances on 3 nodes, and it works much better than 3 instances on the
same 3 nodes. Is that the rule or the exception?





On 28.05.2010 07:11, Jonathan Ellis wrote:

2) is correct, but for 1) I'm not sure what manageability improvements
you anticipate from dealing with multiple entities instead of one.
I'm not sure what you're thinking of for 3) but routing is done by key
only.

2010/5/27 Maxim Kramarenko:

Hello!

We have a mail archive with one large CF for mail bodies. In our case, it's easy
to shard data into 5-10 CFs by customer id. We would like to do this because:

1) We get more manageable instances, because we have many small CF instead
of one multi-TB CF on each node.

2) Better disk space usage (need to reserve 50% of the largest shard for
compaction only)

3) Can manage node load not by token only, but also by defining shards
available per node.

Are my assumptions correct? Any negative side effects?
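
The disk space argument in point 2) can be worked through with a small Python
sketch. It takes the 50% figure from the message at face value and assumes
that only the largest CF needs compaction headroom at any one time; the
function name and the sample sizes are hypothetical.

```python
def compaction_reserve_gb(cf_sizes_gb, reserve_fraction=0.5):
    """Headroom to keep free: a fraction of the largest CF on the node."""
    return max(cf_sizes_gb) * reserve_fraction


if __name__ == "__main__":
    one_big_cf = [2000]          # a single 2 TB CF
    ten_small_cfs = [200] * 10   # the same data split into ten CFs
    print(compaction_reserve_gb(one_big_cf))
    print(compaction_reserve_gb(ten_small_cfs))
```

Under those assumptions, splitting the same 2 TB into ten CFs cuts the
reserve from 1000 GB to 100 GB per node.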


Re: remove a row

2010-05-28 Thread gabriele renzi
On Fri, May 28, 2010 at 11:05 AM, huajun qi  wrote:
> Is there any way to remove a row completely?
> I use the Thrift client's remove method, but it only deletes the columns
> under a row; the row with its key is still there.
> How can I remove it completely?


You can't really, with the Thrift API; see
 http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
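
The behaviour described in that post, where a delete is recorded as a
tombstone and only dropped by compaction after a grace period, can be
mimicked with a toy in-memory model. This Python sketch is not Cassandra
code: the class, its methods, and the simplified timestamp handling are all
assumptions made for illustration.

```python
class ToyStore:
    """Toy model: deletes leave a tombstone until compaction removes it."""

    def __init__(self, gc_grace_seconds: int):
        self.gc_grace_seconds = gc_grace_seconds
        self.rows = {}        # key -> {column name: value}
        self.tombstones = {}  # key -> time of deletion, in seconds

    def insert(self, key, column, value):
        """Write a column; in this toy model that also clears the tombstone."""
        self.tombstones.pop(key, None)
        self.rows.setdefault(key, {})[column] = value

    def remove(self, key, now):
        """Delete every column but keep the key, marked with a tombstone."""
        self.rows[key] = {}
        self.tombstones[key] = now

    def keys(self):
        """Key scan: tombstoned rows still appear, with no columns."""
        return sorted(self.rows)

    def compact(self, now):
        """Drop rows whose tombstone is at least gc_grace_seconds old."""
        expired = [key for key, deleted_at in self.tombstones.items()
                   if now - deleted_at >= self.gc_grace_seconds]
        for key in expired:
            del self.rows[key]
            del self.tombstones[key]


if __name__ == "__main__":
    store = ToyStore(gc_grace_seconds=10)
    store.insert("row1", "name", "value")
    store.remove("row1", now=0)
    print(store.keys())   # the key is still listed right after the delete
    store.compact(now=10)
    print(store.keys())   # gone once the grace period has passed
```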


-- 
blog en: http://www.riffraff.info
blog it: http://riffraff.blogsome.com


remove a row

2010-05-28 Thread huajun qi
Is there any way to remove a row completely?

I use the Thrift client's remove method, but it only deletes the columns
under a row; the row with its key is still there.

How can I remove it completely?



Re: Batch_Mutate throws Uncaught exception

2010-05-28 Thread Moses Dinakaran
Hi, sorry for the post. I misunderstood key and column family.

I was thinking cache_pages is the column family and Page is the key,
but it's the other way around, right?

I will update my code and check it again.

Thanks,
Moses

On Thu, May 27, 2010 at 3:46 PM, Mishail  wrote:

> Hi,
>
> Just to clarify. Are you trying to insert a couple of columns with key
> "cache_pages" in the ColumnFamily "Page"?
>
> Moses Dinakaran wrote:
> > Hi,
> >
> >
> >
> > I am trying to use batch_mutate() with PHP Thrift. I was getting the
> > following error.
> >
>
>


Re: ec2 tests

2010-05-28 Thread Chris Dean
Mark Greene  writes:
> If you give us an objective of the test that will help. Trying to get max
> write throughput? Read throughput? Weak consistency?

I would like reading to be as fast as I can get.  My real-world problem
is write heavy, but the latency requirements are minimal on that side.
If there are any particular config settings that would help with the slow
ec2 IO, that would be great to know.

Cheers,
Chris Dean