Re: Chunking if size > 64MB

2011-06-29 Thread aaron morton
AFAIK there is no server side chunking of column values.

This link http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage is 
just suggesting that your application should not store more than 64MB per column. 
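
Since the chunking has to happen in the application, a minimal sketch of what that could look like is below. The column naming scheme (`chunk_` plus a zero-padded index) and the helper names are assumptions made for illustration, not anything Cassandra or the FAQ prescribes.

```python
def split_into_chunks(blob, chunk_size):
    """Map hypothetical column names to slices of blob, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = {}
    for index, start in enumerate(range(0, len(blob), chunk_size)):
        # Zero-padded names keep the chunks in order under a string comparator.
        chunks["chunk_%08d" % index] = blob[start:start + chunk_size]
    return chunks


def join_chunks(chunks):
    """Reassemble the original value by concatenating chunks in name order."""
    return b"".join(chunks[name] for name in sorted(chunks))


# Tiny sizes for illustration; a real cap would be 64MB or lower.
columns = split_into_chunks(b"abcdefghij", chunk_size=4)
restored = join_chunks(columns)
```

The application then writes each entry of `columns` as its own column and reads them all back to rebuild the value.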

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 30 Jun 2011, at 07:25, A J wrote:

> From what I read, Cassandra allows a single column value to be up-to
> 2GB but would chunk the data if greater than 64MB.
> Is the chunking transparent to the application or does the app need to
> know if/how/when the chunking happened for a specific column value
> that happened to be > 64MB.
> 
> Thank you.



Re: hadoop results

2011-06-29 Thread aaron morton
How about get_slice() with reversed = true and count = 1 to get the highest 
time UUID? 

Or you can store a column with a magic name that has the value of the 
timeuuid that is the current metric to use. 
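
To illustrate the first suggestion without a running cluster, here is a small in-memory model of a row whose columns sort by TimeUUID timestamp. The `get_slice` function below is a stand-in written for this sketch, not the Thrift call itself, and `time_uuid` is a hypothetical helper that builds version 1 UUIDs from explicit timestamps so the ordering is predictable.

```python
import uuid


def time_uuid(ticks, node=0x0123456789AB, clock_seq=0):
    """Build a version 1 UUID from a timestamp in 100ns ticks (illustration only)."""
    return uuid.UUID(
        fields=(
            ticks & 0xFFFFFFFF,        # time_low
            (ticks >> 32) & 0xFFFF,    # time_mid
            (ticks >> 48) & 0x0FFF,    # time_hi; version bits are set by version=1
            (clock_seq >> 8) & 0x3F,   # clock_seq_hi; variant bits are set by UUID()
            clock_seq & 0xFF,          # clock_seq_low
            node,
        ),
        version=1,
    )


def get_slice(row, reverse_order=False, count=100):
    """Model of a column slice: order by UUID timestamp, optionally reversed, then limit."""
    ordered = sorted(row.items(), key=lambda item: item[0].time, reverse=reverse_order)
    return ordered[:count]


row = {time_uuid(1000): "10", time_uuid(3000): "30", time_uuid(2000): "20"}
latest = get_slice(row, reverse_order=True, count=1)
```

With the reversed order and a count of 1, only the column with the greatest timestamp comes back, so nothing has to iterate over the whole row.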

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 30 Jun 2011, at 06:35, William Oberman wrote:

> I'll start with my question: given a CF with comparator TimeUUIDType, what is 
> the most efficient way to get the greatest column's value?
> 
> Context: I've been running cassandra for a couple of months now, so obviously 
> it's time to start layering more on top :-)  In my test environment, I 
> managed to get pig/hadoop running, and developed a few scripts to collect 
> metrics I've been missing since I switched from MySQL to cassandra (including 
> the ever useful "select count(*) from table" equivalent).  
> 
> I was hoping to dump the results of this processing back into cassandra for 
> use in other tools/processes.  My initial thought was: new CF called "stats" 
> with comparator TimeUUIDType.  The basic idea being I'd store:
> stat_name -> time stat was computed (as UUID) -> value
> That way I can also see a historical perspective of any given stat for 
> auditing (and for cumulative stats to see trends).  The stat_name itself is a 
> URI that is composed of "what" and any constraints on the "what" (including 
> an optional time range, if the stat supports it).  E.g. 
> ClassOfSomething/ID/MetricName/OptionalTimeRange (or something, still 
> deciding on the format of the URI).  But, right now, the only way I know to 
> get the "current" stat value would be to iterate over all columns (the 
> TimeUUIDs) and then return the last one.
> 
> Thanks for any tips,
> 
> will



Re: Cannot set column value to zero

2011-06-29 Thread aaron morton
The extra () in the describe keyspace output is only there if the column 
comparator is BytesType; in that case the client tries to format the data as UTF8. 
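
As a quick check, the hex string that appears in the quoted output below (737461747573) is just the bytes of the column name. A short sketch using only the standard library:

```python
import binascii

hex_name = "737461747573"
# Decode the hex digits to raw bytes, then interpret those bytes as UTF-8 text.
decoded = binascii.unhexlify(hex_name).decode("utf-8")
# Going the other way gives back the same hex digits.
encoded = binascii.hexlify("status".encode("utf-8")).decode("ascii")
print(decoded)
```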

Don't forget that truncate takes snapshots, so check the snapshots dir and delete 
things if you are using it a lot for testing. 

The 0 == 1 thing does not ring any bells. Let us know if it happens again. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 30 Jun 2011, at 02:13, dnalls...@taz.qinetiq.com wrote:

> I had a strange problem recently where I was unable to set the value of a column
> to '0' (it always returned '1') but setting it to other values worked fine:
> 
> [default@Test] set Urls['rowkey']['status']='1';
> Value inserted.
> [default@Test] get Urls['rowkey'];
> => (column=status, value=1, timestamp=1309189541891000)
> Returned 1 results.
> 
> [default@Test] set Urls['rowkey']['status']='0';
> Value inserted.
> [default@Test] get Urls['rowkey'];
> => (column=status, value=1, timestamp=1309189551407616)
> Returned 1 results.
> 
> This was on a one-node test cluster (v0.7.6) with no other clients; setting
> other values (e.g. '9') worked fine. However, attempting to set the value back
> to '0' always resulted in a value of '1'.
> 
> I noticed this shortly after truncating the CF.
> 
> The column family was shown as follows below. One thing that looks odd is that
> on other test clusters the Column Name is followed by a reference to
> the index, e.g. "Column Name: status (737461747573)" - but here it isn't.
> 
> I was wondering if there was some interaction between truncating the CF and the
> use of a KEYS index? (Presumably it would be safer to delete all data
> directories in order to wipe the cluster during experimentation, rather than
> truncating?)
> 
> Unfortunately I'm not sure how to recreate the situation as this was a test
> machine on which I played around with various configurations - but maybe
> someone has seen a similar problem elsewhere? In the end I had to wipe the data
> and start again, and all seemed fine, although the index reference is still
> absent as mentioned above.
> 
> [default@Test] describe keyspace;
> Keyspace: Test:
> ...
> ColumnFamily: Foo
>   default_validation_class: org.apache.cassandra.db.marshal.BytesType
>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>   Row cache size / save period in seconds: 0.0/0
>   Key cache size / save period in seconds: 0.0/14400
>   Memtable thresholds: 0.5/128/60 (millions of ops/minutes/MB)
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 1.0
>   Built indexes: [Foo.737461747573]
>   Column Metadata:
>     Column Name: status
>       Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>       Index Type: KEYS
> ...
> 
> 
> This message was sent using IMP, the Internet Messaging Program.
> 
> This email and any attachments to it may be confidential and are
> intended solely for the use of the individual to whom it is addressed.
> If you are not the intended recipient of this email, you must neither
> take any action based upon its contents, nor copy or show it to anyone.
> Please contact the sender if you believe you have received this email in
> error. QinetiQ may monitor email traffic data and also the content of
> email for the purposes of security. QinetiQ Limited (Registered in
> England & Wales: Company Number: 3796233) Registered office: Cody Technology 
> Park, Ively Road, Farnborough, Hampshire, GU14 0LX http://www.qinetiq.com.



Re: custom reconciling columns?

2011-06-29 Thread Jonathan Ellis
On Tue, Jun 28, 2011 at 10:06 PM, Yang  wrote:
> I'm trying to see whether there are some easy magic bullets for a drop-in
> replacement for concurrentSkipListMap...

I'm highly interested if you find one. :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: No Transactions: An Example

2011-06-29 Thread AJ


On 6/22/2011 9:18 AM, Trevor Smith wrote:
> Right -- that's the part that I am more interested in fleshing out in 
> this post.

Here is one way: use MVCC.  A single global clean-up process would be 
acceptable, since it is not a single point of failure, only a single point 
where back-logged work accumulates.  It will not affect availability as long 
as you are notified if that process terminates and restart it in a 
reasonable amount of time, and it will not affect the validity of 
subsequent reads.


So, you would have a "balance" column.  And each update will create a 
"balance_" column with a positive or negative value indicating a 
credit or debit.  Subsequent clients will read the latest value by doing 
a slice from "balance" to "balance_~" (i.e. all "balance*" columns).  
(You would have to work out your column naming conventions so that your 
slices return only the pertinent columns.)  Then, the clients would have 
to apply all the credits and debits to the balance to get the current 
balance.
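
A minimal sketch of the read side described above, with the row modelled as a plain dict. The suffix after "balance_" is not spelled out in this post, so the four-digit suffixes here are placeholders chosen only for illustration.

```python
def current_balance(columns):
    """Add the base balance and every credit/debit column from a row slice."""
    total = 0
    for name, value in columns.items():
        # Only the base column and the per-update columns are pertinent.
        if name == "balance" or name.startswith("balance_"):
            total += value
    return total


row = {
    "balance": 100,        # last consolidated balance
    "balance_0001": -30,   # a debit (hypothetical suffix)
    "balance_0002": 45,    # a credit (hypothetical suffix)
    "owner": "alice",      # unrelated column that the slice should not include
}
balance_now = current_balance(row)
```

Because every writer adds a new column instead of overwriting "balance", two concurrent updates both survive and both get summed.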


This handles the lost update problem.

For the dirty read and incorrect summary problems by others reading data 
that is in the middle of a transaction that hasn't committed yet, I 
would add a final transaction column to a Transactions CF.  The key 
would be .., e.g., Accounts.1234.balance, 1234 being 
the account # and Accounts being the CF owning the balance column.  
Then, a new column would be added for each successful transaction (e.g., 
after debiting and crediting the two accounts) using the same timestamp 
used in balance_.  So, now, a client wanting the current 
balance would have to do a slice for all of the transactions for that 
column and only apply the balance updates up to the latest transaction.  
Note, you might have to do something else with the transaction naming 
schemes to make sure they are guaranteed to be unique, but you get the 
idea.  If the transaction fails, the client simply does not add a 
transaction column to Transactions and deletes any "balance_" 
columns it added in the Accounts CF (or lets the clean-up process do 
it... carefully).
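
One possible reading of the commit check, sketched with plain dicts and sets: an update column counts only if a transaction column carrying the same timestamp exists. This is a stricter rule than "everything up to the latest transaction" and is an assumption of this sketch, as are the placeholder suffixes.

```python
def committed_balance(account_columns, committed_timestamps):
    """Apply only the updates whose timestamp has a matching transaction column."""
    total = account_columns.get("balance", 0)
    prefix = "balance_"
    for name, value in account_columns.items():
        if not name.startswith(prefix):
            continue
        timestamp = name[len(prefix):]
        if timestamp in committed_timestamps:
            total += value
    return total


account = {"balance": 100, "balance_0001": -30, "balance_0002": 45}
transactions = {"0001"}  # 0002 has not committed yet, or it failed
visible = committed_balance(account, transactions)
```

A reader therefore never sees the credit from an in-flight transfer until its transaction column has been written.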


This should avoid the need for locks and as long as each account doesn't 
have a crazy amount of updates, the slices shouldn't be so large as to 
be a significant perf hit.


A note about the updates.  You have to make sure the clean-up process 
processes the updates in order and only 1 time.  If you can't guarantee 
these, then you'll have to make sure your updates are idempotent and 
commutative.
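
A rough sketch of a clean-up pass that stays correct if it sees the same update twice. The `applied` set stands in for some durable record of already-folded updates; where that record would live is left open here, and the column suffixes are placeholders.

```python
def consolidate(row, applied):
    """Fold each update column into 'balance' at most once, then drop it."""
    update_names = sorted(name for name in row if name.startswith("balance_"))
    for name in update_names:
        if name not in applied:
            row["balance"] += row[name]
            applied.add(name)
        # Deleting is safe to repeat: a column that reappears is not re-applied.
        del row[name]
    return row


row = {"balance": 100, "balance_0001": -30}
applied = set()
consolidate(row, applied)
first_pass = row["balance"]

# Simulate the delete being lost, so the same update shows up again.
row["balance_0001"] = -30
consolidate(row, applied)
second_pass = row["balance"]
```

Since addition is commutative, the order in which distinct updates are folded does not change the result either.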


Oh yeah, and you must use QUORUM read/writes, of course.

Any critiques?

aj


Re: api to extract gossiper results

2011-06-29 Thread Edward Capriolo
A simple solution is to set up log4j at DEBUG level for Gossip events.

You can also use the StorageProxy/Fat client and then participate in gossip.
Each system has its own converging view of the ring, thus what your local
gossip thinks is the topology may not be the same across the cluster.

Edward

On Wed, Jun 29, 2011 at 5:20 PM, A J  wrote:

> Cassandra uses accrual failure detector to interpret the gossips.
> Is it somehow possible to extract these (gossip values and results of
> the failure detector) in an external system ?
>
> Thanks
>


Cassandra client loses connectivity to cluster

2011-06-29 Thread Jim Ancona
In reviewing client logs as part of our Cassandra testing, I noticed
several Hector "All host pools marked down" exceptions in the logs.
Further investigation showed a consistent pattern of
"java.net.SocketException: Broken pipe" and "java.net.SocketException:
Connection reset" messages. These errors occur for all 36 hosts in the
cluster over a period of seconds, as Hector tries to find a working
host to connect to. Failing to find a host results in the "All host
pools marked down" messages. These messages recur for a period ranging
from several seconds up to almost 15 minutes, clustering around two to
three minutes. Then connectivity returns and when Hector tries to
reconnect it succeeds.

The clients are instances of a JBoss 5 web application. We use Hector
0.7.0-29 (plus a patch that was pulled in advance of -30). The
Cassandra cluster has 72 nodes split between two datacenters. It's
running 0.7.5 plus a couple of bug fixes pulled in advance of 0.7.6.
The keyspace uses NetworkTopologyStrategy and RF=6 (3 in each
datacenter). The clients are reading and writing at LOCAL_QUORUM to
the 36 nodes in their own data center. Right now the second datacenter
is for failover only, so there are no clients actually writing there.

There's nothing else obvious in the JBoss logs at around the same
time, e.g. other application errors, GC events. The Cassandra
system.log files at INFO level show nothing out of the ordinary. I
have a capture of one of the incidents at DEBUG level where again I
see nothing abnormal looking, but there's so much data that it would
be easy to miss something.

Other observations:
* It only happens on weekdays (Our weekends are much lower load)
* It has occurred every weekday for the last month except for Monday
May 30, the Memorial Day holiday in the US.
* Most days it occurs only once, but six times it has occurred twice,
never more often than that.
* It generally happens in the late afternoon, but there have been
occurrences earlier in the afternoon and twice in the late morning.
Earliest occurrence is 11:19, latest is 18:11. Our peak loads
are between 10:00 and 14:00, so most occurrences do *not* correspond
with peak load times.
* It only happens on a single client JBoss instance at a time.
* Generally, it affects a different host each day, but the same host
was affected on consecutive days once.
* Out of 40 clients, one has been affected three times, seven have
been affected twice, 11 have been affected once and 21 have not been
affected.
* The cluster is lightly loaded.

Given that the problem affects a single client machine at a time and
that machine loses the ability to connect to the entire cluster, it
seems unlikely that the problem is on the C* server side. Even a
network problem seems hard to explain: the clients are on
the same subnet, so I would expect all of them to fail if it were a
network issue.

I'm hoping that perhaps someone has seen a similar issue or can
suggest things to try.

Thanks in advance for any help!

Jim


api to extract gossiper results

2011-06-29 Thread A J
Cassandra uses accrual failure detector to interpret the gossips.
Is it somehow possible to extract these (gossip values and results of
the failure detector) in an external system ?

Thanks


Chunking if size > 64MB

2011-06-29 Thread A J
From what I read, Cassandra allows a single column value to be up-to
2GB but would chunk the data if greater than 64MB.
Is the chunking transparent to the application or does the app need to
know if/how/when the chunking happened for a specific column value
that happened to be > 64MB.

Thank you.


RE: RAID or no RAID

2011-06-29 Thread Jeremiah Jordan
With multiple data dirs you are still limited by the space free on any
one drive.  So if you have two data dirs with 40GB free on each, and you
have 50GB to be compacted, it won't work, but if you had a raid, you
would have 80GB free and could compact... 
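
The arithmetic above, as a tiny sketch. It assumes the compaction output has to fit on one volume, and that the RAID in question pools the free space of both drives (striping, not mirroring); the function name is made up for this example.

```python
def can_compact(free_per_volume_gb, needed_gb):
    """True if at least one volume has room for the whole compaction output."""
    return max(free_per_volume_gb) >= needed_gb


separate_dirs = [40, 40]            # two data dirs, 40GB free on each
raid_volume = [sum(separate_dirs)]  # one pooled volume with 80GB free

fits_on_separate_dirs = can_compact(separate_dirs, 50)
fits_on_raid = can_compact(raid_volume, 50)
```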

-Original Message-
From: mcasandra [mailto:mohitanch...@gmail.com] 
Sent: Tuesday, June 28, 2011 7:55 PM
To: cassandra-u...@incubator.apache.org
Subject: Re: RAID or no RAID


aaron morton wrote:
> 
>> Not sure what the intended purpose is, but we've mostly used it as an
>> emergency disk-capacity-increase option
> 
> That's what I've used it for.  
> 
> Cheers
> 

How does compaction work in terms of utilizing multiple data dirs? Also,
is there a reference on wiki somewhere that says not to use multiple
data dirs?


--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RAID-or-no-RAID-tp6522904p6527219.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive
at Nabble.com.


CQL injection attacks?

2011-06-29 Thread dnallsopp

Someone asked a while ago whether Cassandra was vulnerable to injection attacks:

http://stackoverflow.com/questions/5998838/nosql-injection-php-phpcassa-cassandra

With Thrift, the answer was 'no'.

With CQL, presumably the situation is different, at least until prepared
statements are possible (CASSANDRA-2475) ?

Has there been any discussion on this already that someone could point me to,
please? I couldn't see anything on JIRA (searching for CQL AND injection, CQL
AND security, etc).
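
To make the concern concrete, here is a sketch of how pasting user input into a query string changes the statement text, plus one conservative guard an application could use in the meantime. The query shape and the allow-list pattern are assumptions for illustration; whether a particular CQL version would accept the altered statement is not claimed here.

```python
import re

# Hypothetical allow-list: letters, digits and a few separators only.
SAFE_KEY = re.compile(r"[A-Za-z0-9_.-]+")


def naive_query(user_key):
    """Unsafe: the input is pasted straight into the statement text."""
    return "SELECT * FROM Urls WHERE KEY = '%s'" % user_key


def checked_query(user_key):
    """Reject any key that is not made up solely of allow-listed characters."""
    if not SAFE_KEY.fullmatch(user_key):
        raise ValueError("rejected key: %r" % user_key)
    return naive_query(user_key)


# A quote in the input closes the literal early and appends a new clause.
altered = naive_query("x' OR KEY = 'y")
safe = checked_query("rowkey")
```

An allow-list is blunt, since it also rejects legitimate keys that contain other characters, but it does not depend on getting an escaping rule right.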

Thanks.




hadoop results

2011-06-29 Thread William Oberman
I'll start with my question: given a CF with comparator TimeUUIDType, what
is the most efficient way to get the greatest column's value?

Context: I've been running cassandra for a couple of months now, so
obviously it's time to start layering more on top :-)  In my test
environment, I managed to get pig/hadoop running, and developed a few
scripts to collect metrics I've been missing since I switched from MySQL to
cassandra (including the ever useful "select count(*) from table"
equivalent).

I was hoping to dump the results of this processing back into cassandra for
use in other tools/processes.  My initial thought was: new CF called "stats"
with comparator TimeUUIDType.  The basic idea being I'd store:
stat_name -> time stat was computed (as UUID) -> value
That way I can also see a historical perspective of any given stat for
auditing (and for cumulative stats to see trends).  The stat_name itself is
a URI that is composed of "what" and any constraints on the "what"
(including an optional time range, if the stat supports it).  E.g.
ClassOfSomething/ID/MetricName/OptionalTimeRange (or something, still
deciding on the format of the URI).  But, right now, the only way I know to
get the "current" stat value would be to iterate over all columns (the
TimeUUIDs) and then return the last one.

Thanks for any tips,

will


Re: custom reconciling columns? (improve performance of long rows )

2011-06-29 Thread Yang
I hacked around in the code. At first I thought that the cost of map put and
get was due to synchronization, so I tried replacing ConcurrentSkipListMap
with TreeMap. I created a subclass of ColumnFamily and used the subclass only
in the pure read path. Interestingly, on the read path no more than one thread
accesses the returned CF at any time, so we can remove the concurrency control.
But it did not offer any significant change in speed.

Then I tried changing TreeMap to HashMap; this time it takes only half the
time. But the problem is how to keep the output sorted. Doing a sort on
every return is going to be even slower...




On Tue, Jun 28, 2011 at 10:07 PM, Yang  wrote:

> btw I use only one box now just because I'm running it on dev junit test,
> not that it's going to be that way in production
>
>
> On Tue, Jun 28, 2011 at 10:06 PM, Yang  wrote:
>
>> ok, here is the profiling result. I think this is consistent (having been
>> trying to recover how to effectively use yourkit ...)  see attached picture
>>
>> since I actually do not use the thrift interface, but just directly use
>> the thrift.CassandraServer and run my code in the same JVM as cassandra,
>> and was running the whole thing on a single box, there is no message
>> serialization/deserialization cost. but more columns did add on to more
>> time.
>>
>> the time was spent in the ConcurrentSkipListMap operations that implement
>> the memtable.
>>
>>
>> regarding breaking up the row, I'm not sure it would reduce my run time,
>> since our requirement is to read the entire rolling window history (we
>> already have
>> the TTL enabled , so the history is limited to a certain length, but it is
>> quite long: over 1000 , in some  cases, can be 5000 or more ) .  I think
>> accessing roughly 1000 items is not an uncommon requirement for many
>> applications. in our case, each column has about 30 bytes of data, besides
>> the meta data such as ttl, timestamp.
>> at history length of 3000, the read takes about 12ms (remember this is
>> completely in-memory, no disk access)
>>
>> I just took a look at the expiring column logic, it looks that the
>> expiration does not come into play until when the
>> CassandraServer.internal_get()===>thriftifyColumns() gets called. so the
>> above memtable access time is still spent. yes, then breaking up the row is
>> going to be helpful, but only to the degree of preventing accessing
>> expired columns (btw  if this is actually built into cassandra code it
>> would be nicer, so instead of spending multiple key lookups, I locate to the
>> row once, and then within the row, there are different "generation" buckets,
>> so those old generation buckets that are beyond expiration are not read );
>> currently just accessing the 3000 live columns is already quite slow.
>>
>> I'm trying to see whether there are some easy magic bullets for a drop-in
>> replacement for concurrentSkipListMap...
>>
>> Yang
>>
>>
>>
>>
>> On Tue, Jun 28, 2011 at 4:18 PM, Nate McCall  wrote:
>>
>>> I agree with Aaron's suggestion on data model and query here. Since
>>> there is a time component, you can split the row on a fixed duration
>>> for a given user, so the row key would become userId_[timestamp
>>> rounded to day].
>>>
>>> This provides you an easy way to roll up the information for the date
>>> ranges you need since the key suffix can be created without a read.
>>> This also benefits from spreading the read load over the cluster
>>> instead of just the replicas since you have 30 rows in this case
>>> instead of one.
>>>
>>> On Tue, Jun 28, 2011 at 5:55 PM, aaron morton 
>>> wrote:
>>> > Can you provide some more info:
>>> > - how big are the rows, e.g. number of columns and column size  ?
>>> > - how much data are you asking for ?
>>> > - what sort of read query are you using ?
>>> > - what sort of numbers are you seeing ?
>>> > - are you deleting columns or using TTL ?
>>> > I would consider issues with the data churn, data model and query
>>> before
>>> > looking at serialisation.
>>> > Cheers
>>> > -
>>> > Aaron Morton
>>> > Freelance Cassandra Developer
>>> > @aaronmorton
>>> > http://www.thelastpickle.com
>>> > On 29 Jun 2011, at 10:37, Yang wrote:
>>> >
>>> > I can see that as my user history grows, the reads time proportionally
>>> ( or
>>> > faster than linear) grows.
>>> > if my business requirements ask me to keep a month's history for each
>>> user,
>>> > it could become too slow.- I was suspecting that it's actually the
>>> > serializing and deserializing that's taking time (I can definitely it's
>>> cpu
>>> > bound)
>>> >
>>> >
>>> > On Tue, Jun 28, 2011 at 3:04 PM, aaron morton >> >
>>> > wrote:
>>> >>
>>> >> There is no facility to do custom reconciliation for a column. An
>>> append
>>> >> style operation would run into many of the same problems as the
>>> Counter
>>> >> type, e.g. not every node may get an append and there is a chance for
>>> lost
>>> >> appends unless you go to all the troubl

Re: Data storage security

2011-06-29 Thread Eric tamme
On Wed, Jun 29, 2011 at 12:37 PM, A J  wrote:
> Are there any options to encrypt the column families when they are
> stored in the database. Say in a given keyspace some CF has sensitive
> info and I don't want a 'select *' of that CF to layout the data in
> plain text.
>
> Thanks.
>

I think this is an application layer issue - just encrypt/decrypt
there.  The data stored within the column value can be any arbitrary
bytes, and since column data is not indexed it won't affect how you can
access the data with Cassandra in any way.

-Eric


Data storage security

2011-06-29 Thread A J
Are there any options to encrypt the column families when they are
stored in the database? Say in a given keyspace some CF has sensitive
info and I don't want a 'select *' of that CF to lay out the data in
plain text.

Thanks.


Re: question on capacity planning

2011-06-29 Thread Ryan King
On Wed, Jun 29, 2011 at 5:36 AM, Jacob, Arun  wrote:
> if I'm planning to store 20TB of new data per week, and expire all data
> every 2 weeks, with a replication factor of 3, do I only need approximately
> 120 TB of disk? I'm going to use ttl in my column values to automatically
> expire data. Or would I need more capacity to handle sstable merges? Given
> this amount of data, would you recommend node storage at 2TB per node or
> more? This application will have a heavy write /moderate read use profile.

You'll need extra space for both compaction and the overhead in the
storage format.

As to the amount of storage per node, that depends on your latency and
throughput requirements.
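
As a back-of-the-envelope sketch of the numbers in the question: the 120 TB figure is the raw replicated data, and any headroom for compaction and storage overhead comes on top. The 50% headroom used below is only a placeholder to show the arithmetic, not a recommendation from this thread.

```python
def raw_capacity_tb(weekly_tb, retention_weeks, replication_factor):
    """Replicated data held at steady state, before any overhead."""
    return weekly_tb * retention_weeks * replication_factor


def with_headroom(raw_tb, headroom_fraction):
    """Disk to provision if only (1 - headroom_fraction) of it may hold live data."""
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    return raw_tb / (1.0 - headroom_fraction)


raw = raw_capacity_tb(weekly_tb=20, retention_weeks=2, replication_factor=3)
provisioned = with_headroom(raw, headroom_fraction=0.5)  # placeholder headroom
```

Dividing `provisioned` by the per-node disk size then gives a first estimate of the node count, which still has to be checked against the latency and throughput requirements.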

-ryan


Cannot set column value to zero

2011-06-29 Thread dnallsopp
I had a strange problem recently where I was unable to set the value of a column
to '0' (it always returned '1') but setting it to other values worked fine:

[default@Test] set Urls['rowkey']['status']='1';
Value inserted.
[default@Test] get Urls['rowkey'];
=> (column=status, value=1, timestamp=1309189541891000)
Returned 1 results.

[default@Test] set Urls['rowkey']['status']='0';
Value inserted.
[default@Test] get Urls['rowkey'];
=> (column=status, value=1, timestamp=1309189551407616)
Returned 1 results.

This was on a one-node test cluster (v0.7.6) with no other clients; setting
other values (e.g. '9') worked fine. However, attempting to set the value back
to '0' always resulted in a value of '1'.

I noticed this shortly after truncating the CF.

The column family was shown as follows below. One thing that looks odd is that
on other test clusters the Column Name is followed by a reference to
the index, e.g. "Column Name: status (737461747573)" - but here it isn't.

I was wondering if there was some interaction between truncating the CF and the
use of a KEYS index? (Presumably it would be safer to delete all data
directories in order to wipe the cluster during experimentation, rather than
truncating?)

Unfortunately I'm not sure how to recreate the situation as this was a test
machine on which I played around with various configurations - but maybe
someone has seen a similar problem elsewhere? In the end I had to wipe the data
and start again, and all seemed fine, although the index reference is still
absent as mentioned above.

[default@Test] describe keyspace;
Keyspace: Test:
...
ColumnFamily: Foo
  default_validation_class: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 0.0/14400
  Memtable thresholds: 0.5/128/60 (millions of ops/minutes/MB)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: [Foo.737461747573]
  Column Metadata:
    Column Name: status
      Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Index Type: KEYS
...




Re: Ec2 snitch with network topology strategy

2011-06-29 Thread pankaj soni
Hmm... Just tested the config. It works, got confused with the options, my
bad.

On Wed, Jun 29, 2011 at 2:26 PM, pankajsoni0126 wrote:

> I was thinking of leveraging ec2 snitch. But my question is then how do I
> give replica placement options?
>
> Or can I give snitch as ec2snitch and write the nodes
> cassandra-topology.prop and in give locator strategy at time of creating
> keyspace as network topology strategy. But will it work?
>
> And those who are struggling to deploy cassandra with across ec2 regions.
>
> 1. approach is to use milind's patch, it works but has some limitation.
> https://issues.apache.org/jira/browse/CASSANDRA-2362
> 2. openvpn is a good option but neverthless is futile with encryption
> available in 0.8.0 cassandra
> 3. Vijay has come up with a patch and so far tested I have not seen any
> jerks.
> https://issues.apache.org/jira/browse/CASSANDRA-2452 - its marked to be
> there in 0.8.2 release.
>
>
> -pankaj
>
>
>
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Ec2-snitch-with-network-topology-strategy-tp6528188p6528188.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


question on capacity planning

2011-06-29 Thread Jacob, Arun
If I'm planning to store 20TB of new data per week, and expire all data every 2 
weeks, with a replication factor of 3, do I only need approximately 120 TB of 
disk? I'm going to use TTL in my column values to automatically expire data. Or 
would I need more capacity to handle sstable merges? Given this amount of data, 
would you recommend node storage at 2TB per node or more? This application will 
have a heavy write / moderate read use profile.

-- Arun


Ec2 snitch with network topology strategy

2011-06-29 Thread pankajsoni0126
I was thinking of leveraging the ec2 snitch. But my question is: how do I
then give replica placement options? 

Or can I set the snitch to ec2snitch, list the nodes in
cassandra-topology.prop, and give network topology strategy as the locator
strategy when creating the keyspace? Will that work?

For those who are struggling to deploy Cassandra across ec2 regions:

1. One approach is to use milind's patch; it works but has some limitations.
https://issues.apache.org/jira/browse/CASSANDRA-2362
2. openvpn is a good option, but it is nevertheless unnecessary given the
encryption available in Cassandra 0.8.0.
3. Vijay has come up with a patch, and in my testing so far I have not seen
any problems.
https://issues.apache.org/jira/browse/CASSANDRA-2452 - it's marked for the
0.8.2 release.


-pankaj



