Re: Concurrency Control

2012-05-30 Thread Filipe Gonçalves
It's the timestamps provided in the columns that do concurrency
control/conflict resolution. Basically, the newer timestamp wins.
For counters I think there is no such mechanism (i.e. counter updates are
not idempotent).

From https://wiki.apache.org/cassandra/DataModel :

All values are supplied by the client, including the 'timestamp'. This
 means that clocks on the clients should be synchronized (in the Cassandra
 server environment is useful also), as these timestamps are used for
 conflict resolution. In many cases the 'timestamp' is not used in client
 applications, and it becomes convenient to think of a column as a
 name/value pair. For the remainder of this document, 'timestamps' will be
 elided for readability. It is also worth noting the name and value are
 binary values, although in many applications they are UTF8 serialized
 strings.
 Timestamps can be anything you like, but microseconds since 1970 is a
 convention. Whatever you use, it must be consistent across the application,
 otherwise earlier changes may overwrite newer ones.
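
As an illustration of the last-write-wins rule described above, here is a minimal, hedged sketch of supplying an explicit microsecond timestamp through the Thrift API. It assumes the generated org.apache.cassandra.thrift.Column class from the 1.x API; it is not code from this thread.

    // Sketch: build a Thrift Column with an explicit timestamp, following the
    // "microseconds since 1970" convention quoted above. Whatever unit you pick,
    // use it consistently, or older writes can shadow newer ones.
    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Column;

    public class TimestampExample {
        public static Column makeColumn(String name, String value) throws Exception {
            long microsSinceEpoch = System.currentTimeMillis() * 1000L;
            return new Column()
                    .setName(ByteBuffer.wrap(name.getBytes("UTF-8")))
                    .setValue(ByteBuffer.wrap(value.getBytes("UTF-8")))
                    .setTimestamp(microsSinceEpoch);
        }
    }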


2012/5/28 Helen live42...@gmx.ch

 Hi,
 what kind of Concurrency Control Method is used in Cassandra? I found out
 so far
 that it's not done with the MVCC Method and that no vector clocks are
 being used.
 Thanks Helen




-- 
Filipe Gonçalves


Re: Retrieving old data version for a given row

2012-05-30 Thread Felipe Schmidt
I have further questions:
-Is there any other way to extract the content of an SSTable, for example by
writing a Java program instead of using sstable2json?
-I tried to get tombstones using the Thrift API, but it seems not to be
possible, is that right? When I try, the program throws an exception.
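
For the first question, one hedged option is to call the same class that bin/sstable2json wraps. The sketch below assumes that entry point is org.apache.cassandra.tools.SSTableExport (as in the 1.x source tree) and that cassandra.yaml and the schema are reachable on the classpath, just as the shell wrapper arranges; the file path is a hypothetical example. As far as I can tell the JSON output also carries deletion markers, which partly answers the tombstone question.

    // Hedged sketch: invoke the sstable2json entry point from Java instead of
    // the shell wrapper. Requires the Cassandra jars and a readable
    // cassandra.yaml on the classpath so the partitioner and schema can load.
    public class DumpSSTable {
        public static void main(String[] args) throws Exception {
            String dataFile = "/var/lib/cassandra/data/MyKeyspace/MyCF-hc-1-Data.db"; // hypothetical path
            // Prints the SSTable contents (rows, columns, deletion markers) as JSON to stdout.
            org.apache.cassandra.tools.SSTableExport.main(new String[] { dataFile });
        }
    }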

thanks in advance

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)




2012/5/24 aaron morton aa...@thelastpickle.com:
 Ok... it's really strange to me that Cassandra doesn't support data
 versioning because all of the other key-value databases I know of support it.

 You can design it into your data model if you need it.


 I have one remaining question:
 -in the case that I have more than 1 SSTable in the disk for the same
 column but with different data versions, is it possible to make a

 query to get the old version instead of the newest one?

 No.
 There is only ever 1 value for a column.
 The older copies of the column in the SSTables are artefacts of the immutable
 on-disk structures.
 If you want to see what's inside an SSTable, use bin/sstable2json.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 24/05/2012, at 9:42 PM, Felipe Schmidt wrote:

 Ok... it's really strange to me that Cassandra doesn't support data
 versioning because all of the other key-value databases I know of support it.

 I have one remaining question:
 -in the case that I have more than 1 SSTable in the disk for the same
 column but with different data versions, is it possible to make a
 query to get the old version instead of the newest one?

 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)




 2012/5/16 Dave Brosius dbros...@mebigfatguy.com:

 You're in for a world of hurt going down that rabbit hole. If you truly

 want version data then you should think about changing your keying to

 perhaps be a composite key where the key is of the form


 NaturalKey/VersionId


 Or if you want the versioning at the column level, use composite columns

 with ColumnName/VersionId format
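
A hedged sketch of the ColumnName/VersionId idea suggested above: keep every version as its own column, named so that all versions of one logical column sort together and the newest sorts first. The naming scheme and names below are illustrative only, not a recommendation from this thread.

    // Sketch: build versioned column names. Zero-padding keeps lexical order equal
    // to numeric order, and (Long.MAX_VALUE - version) makes the newest sort first.
    public class VersionedColumnNames {
        public static String versionedName(String logicalName, long versionMicros) {
            long reversed = Long.MAX_VALUE - versionMicros;
            return String.format("%s:%019d", logicalName, reversed);
        }

        public static void main(String[] args) {
            long now = System.currentTimeMillis() * 1000L;
            System.out.println(versionedName("content", now));
            System.out.println(versionedName("content", now + 5)); // newer, sorts before the older one
        }
    }

Sliced with a start of "content:", the versions then come back newest first under a UTF8Type or BytesType comparator.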





 On 05/16/2012 10:16 AM, Felipe Schmidt wrote:


 That was very helpful, thank you very much!


 I still have some questions:

 -Is it possible to make Cassandra keep old values after flushing?

 The same question for the memtable, before flushing. It seems to me that

 when I update some tuple, the old data will be overwritten in the memtable,

 even before flushing.

 -Is it possible to scan values from the memtable, maybe using the

 so-called Thrift API? Using the client API I can only see the newest

 data version; I can't see what's really happening with the memtable.


 I ask because what I'm trying to do is Change Data Capture for

 Cassandra, and the answers will determine what kind of approaches I'm able

 to use.


 Thanks in advance.


 Regards,

 Felipe Mathias Schmidt

 (Computer Science UFRGS, RS, Brazil)



 2012/5/14 aaron morton aa...@thelastpickle.com:


 Cassandra does not provide access to multiple versions of the same

 column.

 It is essentially an implementation detail.


 All mutations are written to the commit log in a binary format, see the

 o.a.c.db.RowMutation.getSerializedBuffer() (If you want to tail it for

 analysis you may want to change commitlog_sync in cassandra.yaml)


 Here is post about looking at multiple versions columns in an

 sstable http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/


 Remember that not all versions of a column are written to disk

  (see http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/).

 Also

 compaction will compress multiple versions of the same column from

 multiple

 files into a single version in a single file .


 Hope that helps.



 -

 Aaron Morton

 Freelance Developer

 @aaronmorton

 http://www.thelastpickle.com


 On 14/05/2012, at 9:50 PM, Felipe Schmidt wrote:


 Yes, I need this information just for academic purposes.


 So, to read old data values, I tried to open the commit log using tail

 -f and also the log file viewer of Ubuntu, but I cannot see much

 information inside the log!

 Is there any other way to open this log? I didn't find any Cassandra

 API for this purpose.


 Thanks everybody in advance.


 Regards,

 Felipe Mathias Schmidt

 (Computer Science UFRGS, RS, Brazil)





  2012/5/14 zhangcheng2 zhangche...@software.ict.ac.cn:


  After compaction, the old version of the data will be gone!



 


 zhangcheng2



 From: Felipe Schmidt


 Date: 2012-05-14 05:33


 To: user


 Subject: Retrieving old data version for a given row


  I'm trying to retrieve an old data version for some row but it seems not


  to be possible. I'm a beginner with Cassandra and the only approach I


  know is looking at the SSTables in the storage folder, but if I insert


  some column and right afterwards insert another value to the same row,


  after flushing, I only get the last value.


 Is there any way to get the old data 

Moving to 1.1

2012-05-30 Thread Vanger
I haven't tracked the mailing list since 1.1-rc came out and now I have
several questions.


1) We want to upgrade from 1.0.9. How stable is 1.1? I mean working under
high load, running compactions and clean-ups. Is it faster than 1.0.9?


2) If I want to use Hector as the Cassandra client, which version is better
for 1.1? Is it OK to use 0.8.0-3?
We're kind of stuck on this Hector release because newer versions support
serialization of Doubles (and some other types, but doubles are 50% of our
data). So we can't read old data: double values were serialized as
objects and can't be deserialized as doubles.
We can override the default serializer with its older version and keep
working with serialized objects... but it looks rather stupid. Did
anyone run into such a problem?
And I didn't find any change lists for Hector, so such backward
incompatibility was quite a surprise. Does anybody know of other breaking
changes from 0.8.0-3?
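
A hedged, client-agnostic sketch of the workaround described above: values written with the old object-based serializer are Java-serialized Doubles, while the newer serializer writes a raw 8-byte double, so a reader can branch on the payload before returning either format as a double. The 8-byte detection rule is an assumption here; verify it against your own stored data before relying on it.

    // Sketch: read a column value that may be either a Java-serialized Double
    // (old behaviour) or a raw 8-byte IEEE-754 double (newer serializer).
    import java.io.ByteArrayInputStream;
    import java.io.ObjectInputStream;
    import java.nio.ByteBuffer;

    public class LegacyDoubleReader {
        public static double read(byte[] raw) throws Exception {
            if (raw.length == 8) {
                return ByteBuffer.wrap(raw).getDouble(); // new-style fixed-width encoding
            }
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw));
            try {
                return (Double) in.readObject();         // old-style java.io serialized Double
            } finally {
                in.close();
            }
        }
    }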


3) Java 7 is now recommended for use by Oracle. We have several developers
who have been running local Cassandra instances on it for a while without problems.
Has anybody tried it in production? Some time ago Java 7 wasn't recommended
for use with Cassandra; what about now?



p.s. sorry for my 'english'

Thanks,
Sergey B.


Renaming a keyspace in 1.1

2012-05-30 Thread Oleg Dulin

Is it possible ? How ?




Re: commitlog_sync_batch_window_in_ms change in 0.7

2012-05-30 Thread osishkin osishkin
Thank you all.

We're planning to move soon to a more advanced version.
But for now I have a lot of data on my 0.7 cluster which I don't want
to lose just because of some schema error on restart etc.
I don't mind losing any writes during the shutdown; however, losing ALL
the data would require me to run the setup script for my experiments
for several days, something I obviously want to avoid.


On Wed, May 30, 2012 at 8:29 AM, Pierre Chalamet pie...@chalamet.net wrote:
 You'd better use version 1.0.9 (using this one in production) or 1.0.10.

 1.1 is still a bit young to be ready for prod unfortunately.


 --Original Message--
 From: Rob Coli
 To: user@cassandra.apache.org
 To: osish...@gmail.com
 ReplyTo: user@cassandra.apache.org
 Subject: Re: commitlog_sync_batch_window_in_ms change in 0.7
 Sent: May 30, 2012 03:12

 On Mon, May 28, 2012 at 6:53 AM, osishkin osishkin osish...@gmail.com wrote:
 I'm experimenting with Cassandra 0.7 for some time now.

 I feel obligated to recommend that you upgrade to Cassandra 1.1.
 Cassandra 0.7 is better than 0.6, but I definitely still wouldn't be
 experimenting with this old version in 2012.

 =Rob

 --
 =Robert Coli
 AIMGTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb


 - Pierre


tokens and RF for multiple phases of deployment

2012-05-30 Thread Chong Zhang
Hi all,

We are planning to deploy a small cluster with 4 nodes in one DC first, and
will expand that to 8 nodes, then add another DC with 8 nodes for failover
(not active-active), so all the traffic will go to the 1st cluster, and
switch to the 2nd cluster if the whole 1st cluster is down or on maintenance.

Could you provide some guidance on how to assign the tokens across these growing
deployment phases? I looked at some docs but it's not very clear how to
assign tokens for the failover case.
Also, if we use the same RF (3) in both DCs, and use EACH_QUORUM for writes
and LOCAL_QUORUM for reads, can the reads also reach the 2nd cluster? We'd
like to keep both writes and reads on the same cluster.

Thanks in advance,
Chong
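
One common convention for the RandomPartitioner, offered here only as a hedged sketch and not an official recipe, is to space tokens evenly within each data center and offset every token in the second DC by a small constant so no two nodes share a token. The node counts below follow the plan described above.

    // Sketch: evenly spaced tokens over 0..2^127 per DC, with DC2 offset by +1
    // so tokens stay globally unique.
    import java.math.BigInteger;

    public class TokenPlan {
        public static void main(String[] args) {
            BigInteger ringSize = BigInteger.valueOf(2).pow(127);
            int nodesPerDc = 8;
            for (int dc = 0; dc < 2; dc++) {
                for (int i = 0; i < nodesPerDc; i++) {
                    BigInteger token = ringSize.multiply(BigInteger.valueOf(i))
                                               .divide(BigInteger.valueOf(nodesPerDc))
                                               .add(BigInteger.valueOf(dc)); // +0 for DC1, +1 for DC2
                    System.out.println("DC" + (dc + 1) + " node " + i + ": " + token);
                }
            }
        }
    }

If you start with 4 nodes and plan to grow to 8, one option is to give the initial 4 nodes the tokens that the 8-node layout assigns to positions 0, 2, 4 and 6, so the nodes added later simply bisect the existing ranges.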


Re: Moving to 1.1

2012-05-30 Thread Edward Capriolo
1) Stable is a hard word to define. History shows it is better to let
anything .0 burn in a bit. If you are pre-production it probably does
not matter, otherwise I would say play it safe. Wait for a .1 or .2, or until
the .0 has been in the wild for a few weeks.
2) I worked on one of the patches to get Hector working with 1.1; there
is a specific release, especially for those creating meta-data.
3) We are slowly migrating our environment to Java 1.7. The only issue
we have run into is
https://issues.apache.org/jira/browse/CASSANDRA-4275 which is just a
settings tweak. Anecdotally I see what could be better
performance with 1.7 (but I also did a kernel update) so I would not
call it essential.

Edward

On Wed, May 30, 2012 at 7:08 AM, Vanger disc...@gmail.com wrote:
 I didn't track mailing list since 1.1-rc is out and know i have several
 questions.

 1) We want to upgrade from 1.09. How stable 1.1 is? I mean work under high
 load, running compactions and clean-ups? Is it faster then 1.09?

 2) If i what to use hector as cassandra client which version is better for
 1.1? Is it ok to use 0.8.0-3?
 We're kind of stuck on this hector release because new versions support
 serialization of Doubles (and some other types, but doubles are 50% of
 data). So we can't read old data: double values were serialized as objects
 and can't be deserialized as double.
 We can override default serializer by it's older version and keep working
 with serialized objects... but it looks rather stupid. Did anyone run into
 such problem?
 And i didn't find any change lists for hector - so such backward
 incompatibility was quite a surprise. Anybody knows some other breaking
 changes from 0.8.0-3?

 3) Java 7 now recommended for use by Oracle. We have several developers
 running local cassandra instances on it for a while without problems.
 Anybody tried it in production? Some time ago java 7 wasn't recommended for
 use with cassandra, what's for now?


 p.s. sorry for my 'english'

 Thanks,
 Sergey B.



Cassandra 1.1.1 release?

2012-05-30 Thread Roland Mechler
Anyone have a rough idea of when Cassandra 1.1.1 is likely to be released?

-Roland


Re: Replication factor

2012-05-30 Thread aaron morton
Ah. The lack of page cache hits after compaction makes sense. But I don't think 
the drastic effect it appears to have is expected. Do you have an idea of how 
much slower local reads get ?

If you are selecting coordinators based on token ranges the DS is not as much use.
It still has some utility, as the digest reads will be happening on other nodes
and it should help with selecting them.

Thanks for the extra info. 

Aaron

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/05/2012, at 1:24 AM, Viktor Jevdokimov wrote:

 All data is in the page cache. No repairs. Compactions are not hitting disk for
 reads. CPU 50%. ParNew GC 100 ms on average.

 After one compaction completes, the new sstable is not in the page cache; there may
 be a disk usage spike before the data is cached, so local reads get slower for a
 moment compared with other nodes. Redirecting almost all requests to other
 nodes finally ends up with a huge latency spike on almost all nodes,
 especially when ParNew GC may spike on one node (200ms). We call it a “cluster
 hiccup”, when incoming and outgoing network traffic drops for a moment.

 And such hiccups happen several times an hour, a few seconds long. Playing
 with the badness threshold did not give much better results, but turning the DS off
 completely fixed all problems with latencies, node spikes, cluster hiccups
 and network traffic drops.
  
 In our case, our client is selecting endpoints for a key by calculating a 
 token, so we always hit a replica.
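
A hedged sketch of the kind of client-side token calculation described here, assuming the MD5-based scheme used by o.a.c.dht.RandomPartitioner (the token is the absolute value of the MD5 digest read as a BigInteger); with the token in hand the client can pick a coordinator that is also a replica.

    // Sketch: compute a RandomPartitioner-style token on the client and compare it
    // against the token ranges learned from the cluster (e.g. via describe_ring).
    import java.math.BigInteger;
    import java.security.MessageDigest;

    public class ClientSideToken {
        public static BigInteger tokenFor(byte[] rowKey) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(rowKey)).abs();
        }
    }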
  
  
 
 
 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer
 
 
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Thursday, May 24, 2012 13:00
 To: user@cassandra.apache.org
 Subject: Re: Replication factor
  
 Your experience is when using CL ONE the Dynamic Snitch is moving local reads 
 off to other nodes and this is causing spikes in read latency ? 
  
 Did you notice what was happening on the node for the DS to think it was so 
 slow ? Was compaction or repair going on ? 
  
 Have you played with the badness threshold 
 https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L472 ? 
  
 Cheers
  
  
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
  
 On 24/05/2012, at 5:28 PM, Viktor Jevdokimov wrote:
 
 
 Depends on the use case. For ours we have different experience and statistics:
 turning the dynamic snitch off makes overall latency and spikes much, much lower.
  
  
  
 
 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer
  
  
 From: Brandon Williams [mailto:dri...@gmail.com] 
 Sent: Thursday, May 24, 2012 02:35
 To: user@cassandra.apache.org
 Subject: Re: Replication factor
  
 On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov 
 viktor.jevdoki...@adform.com wrote:
  When RF == number of nodes, and you read at CL ONE you will always be 
  reading locally.
 “always be reading locally” – only if the Dynamic Snitch is “off”. With the dynamic
 snitch “on” a request may be redirected to another node, which may introduce
 latency spikes.
  
 Actually it's preventing spikes, since if it won't read locally that means 
 the local replica is in worse shape than the rest (compacting, repairing, 
 etc.)
  
 -Brandon 



Re: what about an hybrid partitioner for CF with composite row key ?

2012-05-30 Thread aaron morton
 * with the RP: for one ui action, many nodes may be requested, but it's 
 simpler to balance the cluster
Many nodes good. You will have increased availability if the data is more 
widely distributed.

 one sweeter(?) partitioner would be a partitioner that would distribute a row 
 according only to the first part of its key (= according to folder id only).
It would still be unbalanced. 

 Is it doable to implement such a partitioner ?

Sort of, but it's not a good idea. 

The token created by the partitioner is just some bytes, so technically they 
can be anything. 
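
For illustration only, and keeping the caveat above in mind, a hedged sketch of what "hash only the first component" could look like. It assumes the CompositeType layout of a 2-byte big-endian length, the component bytes, then one end-of-component byte, and an MD5 token in the style of RandomPartitioner; it is a fragment, not a full IPartitioner implementation.

    // Sketch: derive a token from just the first component (the folder id) of a
    // composite row key, so all rows for one folder map to the same token.
    import java.math.BigInteger;
    import java.nio.ByteBuffer;
    import java.security.MessageDigest;

    public class FirstComponentToken {
        public static BigInteger tokenFor(ByteBuffer compositeKey) throws Exception {
            ByteBuffer key = compositeKey.duplicate();
            int length = key.getShort() & 0xFFFF; // 2-byte length of the first component
            byte[] first = new byte[length];
            key.get(first);                       // the first component only
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(first)).abs();
        }
    }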

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/05/2012, at 1:47 AM, DE VITO Dominique wrote:

 Hi,
  
 We have defined a CF with a composite row key that sounds like (folder id, 
 doc id).
  
 For our app, one very common pattern is accessing, through one ui action, 
 some bunch of data with the following row keys: (id, id_1), (id, id_2), (id, 
 id_3)...
 So, multiple rows are accessed, but all row keys have the same 1st part 
 folder id.
  
  * with the BOP: for one ui action, one single node is requested (on average),
  but it's much harder to balance the cluster nodes
 * with the RP: for one ui action, many nodes may be requested, but it's 
 simpler to balance the cluster
  
 one sweeter(?) partitioner would be a partitioner that would distribute a row 
 according only to the first part of its key (= according to folder id only).
  
 Is it doable to implement such a partitioner ?
 Thanks.
  
 Regards,
 Dominique
  



Re: Moving to 1.1

2012-05-30 Thread Rob Coli
On Wed, May 30, 2012 at 4:08 AM, Vanger disc...@gmail.com wrote:
 3) Java 7 now recommended for use by Oracle. We have several developers
 running local cassandra instances on it for a while without problems.
 Anybody tried it in production? Some time ago java 7 wasn't recommended for
 use with cassandra, what's for now?

I have a variation of this question, which goes :

Now that OpenJDK is the official Java reference implementation, are
there plans to make Cassandra support it?

https://blogs.oracle.com/henrik/entry/moving_to_openjdk_as_the

Cassandra has (had?) a slightly passive-aggressive log message where
it refers to any JDK other than Sun's as buggy and suggests that
you should upgrade to the Sun JDK. I'm fine with using whatever JDK
is technically best, but within the enterprise using something other
than the official reference implementation can be a tough sell.
Wondering if people have a view as to the importance and/or
feasibility of making OpenJDK supported.

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Confusion regarding the terms replica and replication factor

2012-05-30 Thread Jeff Williams
First, note that replication is done at the row level, not at the node level.

That line should look more like:

placement_strategy = 'NetworkTopologyStrategy'  and strategy_options = {DC1: 
1,DC2: 1,DC3: 1 }

This means that each row will have one copy in each DC, and within each DC its
placement will be according to the partitioner, so it could be on any of the nodes
in that DC.

So, don't think of it as nodes replicating, but rather as how nodes should 
store a copy of each row in each DC.

Also, replication does not relate to the seed nodes. Seed nodes allow the
nodes to find each other initially, but are not special otherwise - any node 
can be used as a seed node.

So if you had a strategy like:

placement_strategy = 'NetworkTopologyStrategy'  and strategy_options = {DC1: 
3,DC2: 2,DC3: 1 }

Each row would exist on 3 of 4 nodes in DC1, 2 of 4 nodes in DC2 and on one of 
the nodes in DC3. Again, with the placement in each DC due to the partitioner, 
based on the row key.

Jeff

On May 29, 2012, at 11:25 PM, David Fischer wrote:

 Ok, now I am confused :).

 Ok, if I have the following
 placement_strategy = 'NetworkTopologyStrategy'  and strategy_options =
 {DC1:R1,DC2:R1,DC3:R1 }

 this means in each of my datacenters I will have one full replica that
 can also be a seed node?
 If I have 3 nodes in addition to the DC replicas, with normal token
 calculations a key can be in any datacenter plus on each of the
 replicas, right?
 It will show 12 nodes total in its ring.
 
 On Thu, May 24, 2012 at 2:39 AM, aaron morton aa...@thelastpickle.com wrote:
 This is partly historical. NTS (as it is now) has not always existed and was 
 not always the default. In days gone by used to be a fella could run a 
 mighty fine key-value store using just a Simple Replication Strategy.
 
 A different way to visualise it is a single ring with a Z axis for the DC's. 
 When you look at the ring from the top you can see all the nodes. When you 
 look at it from the side you can see the nodes are on levels that correspond 
 to their DC. Simple Strategy looks at the ring from the top. NTS works 
 through the layers of the ring.
 
 If the hierarchy is Cluster ->
 DataCenter -> Node, why exactly do we need globally unique node tokens
 even though nodes are at the lowest level in the hierarchy?
 Nodes having a DC is a feature of *some* snitches and utilised by *some*
 of the replication strategies (and by the messaging system for network 
 efficiency). For background, mapping from row tokens to nodes is based on 
 http://en.wikipedia.org/wiki/Consistent_hashing
 
 Hope that helps.
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 24/05/2012, at 1:07 AM, java jalwa wrote:
 
 Thanks Aaron. That makes things clear.
 So I guess the 0 - 2^127 range for tokens corresponds to a cluster-level
 top-level ring, and then you add some logic on top of that with
 NTS to logically segment that range into sub-rings as per the notion
 of data clusters defined in NTS. What's the advantage of having a
 single top-level ring? Intuitively it seems like each replication
 group could have a separate ring so that the same tokens can be
 assigned to nodes in different DCs. If the hierarchy is Cluster ->
 DataCenter -> Node, why exactly do we need globally unique node tokens
 even though nodes are at the lowest level in the hierarchy?
 
 Thanks again.
 
 
 On Wed, May 23, 2012 at 3:14 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 Now if a row key hash is mapped to a range owned by a node in DC3,
 will the Node in DC3 still store the key as determined by the
 partitioner and then walk the ring and store 2 replicas each in DC1
 and DC2 ?
 No, only nodes in the DC's specified in the NTS configuration will be 
 replicas.
 
 Or will the co-ordinator node be aware of the
 replica placement strategy,
 and override the partitioner's decision and walk the ring until it
 first encounters a node in DC1 or DC2 ? and then place the remaining
 replicas ?
 The NTS considers each DC to have its own ring. This can make token
 selection in a multi-DC environment confusing at times. There is something
 in the DS docs about it.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 23/05/2012, at 3:16 PM, java jalwa wrote:
 
 Hi all,
  I am a bit confused regarding the terms replica and
 replication factor. Assume that I am using RandomPartitioner and
 NetworkTopologyStrategy for replica placement.
 From what I understand, with a RandomPartitioner, a row key will
 always be hashed and be stored on the node that owns the range to
 which the key is mapped.
 http://www.datastax.com/docs/1.0/cluster_architecture/replication#networktopologystrategy.
 The example here, talks about having 2 data centers and a replication
 factor of 4 with 2 replicas in each datacenter, so the strategy is
 configured as DC1:2 and DC2:2. Now suppose I add another 

Re: Confusion regarding the terms replica and replication factor

2012-05-30 Thread David Fischer
Thanks!

My misunderstanding was that the snitch names are broken up as DC1:RAC1
and the strategy_options take only the first part of the snitch
name?



On Wed, May 30, 2012 at 12:14 PM, Jeff Williams
je...@wherethebitsroam.com wrote:
 First, note that replication is done at the row level, not at the node level.

 That line should look more like:

 placement_strategy = 'NetworkTopologyStrategy'  and strategy_options = {DC1: 
 1,DC2: 1,DC3: 1 }

 This means that each row will have one copy in each DC and within each DC 
 it's placement will be according to the partitioner, so could be on any of 
 the nodes in the each DC.

 So, don't think of it as nodes replicating, but rather as how nodes should 
 store a copy of each row in each DC.

 Also, replication does not relate the the seed nodes. Seed nodes allow the 
 nodes to find each other initially, but are not special otherwise - any node 
 can be used as a seed node.

 So if you had a strategy like:

 placement_strategy = 'NetworkTopologyStrategy'  and strategy_options = {DC1: 
 3,DC2: 2,DC3: 1 }

 Each row would exist on 3 of 4 nodes in DC1, 2 of 4 nodes in DC2 and on one 
 of the nodes in DC3. Again, with the placement in each DC due to the 
 partitioner, based on the row key.

 Jeff

 On May 29, 2012, at 11:25 PM, David Fischer wrote:

 Ok now i am confused :),

 ok if i have the following
 placement_strategy = 'NetworkTopologyStrategy'  and strategy_options =
 {DC1:R1,DC2:R1,DC3:R1 }

 this means in each of my datacenters i will have one full replica that
 also can be seed node?
 if i have 3 node in addition to the DC replica's with normal token
 calculations a key can be in any datacenter plus on each of the
 replicas right?
 It will show 12 nodes total in its ring

 On Thu, May 24, 2012 at 2:39 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 This is partly historical. NTS (as it is now) has not always existed and 
 was not always the default. In days gone by used to be a fella could run a 
 mighty fine key-value store using just a Simple Replication Strategy.

 A different way to visualise it is a single ring with a Z axis for the 
 DC's. When you look at the ring from the top you can see all the nodes. 
 When you look at it from the side you can see the nodes are on levels that 
 correspond to their DC. Simple Strategy looks at the ring from the top. NTS 
 works through the layers of the ring.

 If the hierarchy is Cluster -
 DataCenter - Node, why exactly do we need globally unique node tokens
 even though nodes are at the lowest level in the hierarchy.
 Nodes having a DC is a feature of *some* snitches and utilised by the 
 *some* of the replication strategies (and by the messaging system for 
 network efficiency). For background, mapping from row tokens to nodes is 
 based on http://en.wikipedia.org/wiki/Consistent_hashing

 Hope that helps.
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 24/05/2012, at 1:07 AM, java jalwa wrote:

 Thanks Aaron. That makes things clear.
 So I guess the 0 - 2^127 range for tokens corresponds to a cluster
 -level top-level ring. and then you add some logic on top of that with
 NTS to logically segment that range into sub-rings as per the notion
 of data clusters defined in NTS. Whats the advantage of having a
 single top-level ring ? intuitively it seems like each replication
 group could have a separate ring so that the same tokens can be
 assigned to nodes in different DC. If the hierarchy is Cluster -
 DataCenter - Node, why exactly do we need globally unique node tokens
 even though nodes are at the lowest level in the hierarchy.

 Thanks again.


 On Wed, May 23, 2012 at 3:14 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 Now if a row key hash is mapped to a range owned by a node in DC3,
 will the Node in DC3 still store the key as determined by the
 partitioner and then walk the ring and store 2 replicas each in DC1
 and DC2 ?
 No, only nodes in the DC's specified in the NTS configuration will be 
 replicas.

 Or will the co-ordinator node be aware of the
 replica placement strategy,
 and override the partitioner's decision and walk the ring until it
 first encounters a node in DC1 or DC2 ? and then place the remaining
 replicas ?
 The NTS considers each DC to have it's own ring. This can make token 
 selection in a multi DC environment confusing at times. There is 
 something in the DS docs about it.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 23/05/2012, at 3:16 PM, java jalwa wrote:

 Hi all,
              I am a bit confused regarding the terms replica and
 replication factor. Assume that I am using RandomPartitioner and
 NetworkTopologyStrategy for replica placement.
 From what I understand, with a RandomPartitioner, a row key will
 always be hashed and be stored on the node that owns the range to
 which the key is mapped.
 

Re: commitlog_sync_batch_window_in_ms change in 0.7

2012-05-30 Thread Rob Coli
On Tue, May 29, 2012 at 10:29 PM, Pierre Chalamet pie...@chalamet.net wrote:
 You'd better use version 1.0.9 (using this one in production) or 1.0.10.

 1.1 is still a bit young to be ready for prod unfortunately.

OP described himself as experimenting which I inferred to mean
not-production. I agree with others, 1.0.x is what I'd currently
recommend for production. :)

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Confusion regarding the terms replica and replication factor

2012-05-30 Thread Edward Capriolo
You can avoid the confusion by using the term natural endpoints. For
example, with a replication factor of 3 natural endpoints for key x
are node1, node2, node11.

The snitch does use the datacenter and the rack but almost all
deployments use a single rack per DC, because when you have more than
one rack in a data center the NTS snitch has some logic to spread the
data between racks. (Most people do not want this behavior.)


On Wed, May 30, 2012 at 3:57 PM, David Fischer fischer@gmail.com wrote:
 Thanks!

 My missunderstanding was the snitch names are broken up by DC1:RAC1
 and the strategy_options takes only the first part of the snitch
 names?



 On Wed, May 30, 2012 at 12:14 PM, Jeff Williams
 je...@wherethebitsroam.com wrote:
 First, note that replication is done at the row level, not at the node level.

 That line should look more like:

 placement_strategy = 'NetworkTopologyStrategy'  and strategy_options = {DC1: 
 1,DC2: 1,DC3: 1 }

 This means that each row will have one copy in each DC and within each DC 
 it's placement will be according to the partitioner, so could be on any of 
 the nodes in the each DC.

 So, don't think of it as nodes replicating, but rather as how nodes should 
 store a copy of each row in each DC.

 Also, replication does not relate the the seed nodes. Seed nodes allow the 
 nodes to find each other initially, but are not special otherwise - any node 
 can be used as a seed node.

 So if you had a strategy like:

 placement_strategy = 'NetworkTopologyStrategy'  and strategy_options = {DC1: 
 3,DC2: 2,DC3: 1 }

 Each row would exist on 3 of 4 nodes in DC1, 2 of 4 nodes in DC2 and on one 
 of the nodes in DC3. Again, with the placement in each DC due to the 
 partitioner, based on the row key.

 Jeff

 On May 29, 2012, at 11:25 PM, David Fischer wrote:

 Ok now i am confused :),

 ok if i have the following
 placement_strategy = 'NetworkTopologyStrategy'  and strategy_options =
 {DC1:R1,DC2:R1,DC3:R1 }

 this means in each of my datacenters i will have one full replica that
 also can be seed node?
 if i have 3 node in addition to the DC replica's with normal token
 calculations a key can be in any datacenter plus on each of the
 replicas right?
 It will show 12 nodes total in its ring

 On Thu, May 24, 2012 at 2:39 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 This is partly historical. NTS (as it is now) has not always existed and 
 was not always the default. In days gone by used to be a fella could run a 
 mighty fine key-value store using just a Simple Replication Strategy.

 A different way to visualise it is a single ring with a Z axis for the 
 DC's. When you look at the ring from the top you can see all the nodes. 
 When you look at it from the side you can see the nodes are on levels that 
 correspond to their DC. Simple Strategy looks at the ring from the top. 
 NTS works through the layers of the ring.

 If the hierarchy is Cluster -
 DataCenter - Node, why exactly do we need globally unique node tokens
 even though nodes are at the lowest level in the hierarchy.
 Nodes having a DC is a feature of *some* snitches and utilised by the 
 *some* of the replication strategies (and by the messaging system for 
 network efficiency). For background, mapping from row tokens to nodes is 
 based on http://en.wikipedia.org/wiki/Consistent_hashing

 Hope that helps.
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 24/05/2012, at 1:07 AM, java jalwa wrote:

 Thanks Aaron. That makes things clear.
 So I guess the 0 - 2^127 range for tokens corresponds to a cluster
 -level top-level ring. and then you add some logic on top of that with
 NTS to logically segment that range into sub-rings as per the notion
 of data clusters defined in NTS. Whats the advantage of having a
 single top-level ring ? intuitively it seems like each replication
 group could have a separate ring so that the same tokens can be
 assigned to nodes in different DC. If the hierarchy is Cluster -
 DataCenter - Node, why exactly do we need globally unique node tokens
 even though nodes are at the lowest level in the hierarchy.

 Thanks again.


 On Wed, May 23, 2012 at 3:14 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 Now if a row key hash is mapped to a range owned by a node in DC3,
 will the Node in DC3 still store the key as determined by the
 partitioner and then walk the ring and store 2 replicas each in DC1
 and DC2 ?
 No, only nodes in the DC's specified in the NTS configuration will be 
 replicas.

 Or will the co-ordinator node be aware of the
 replica placement strategy,
 and override the partitioner's decision and walk the ring until it
 first encounters a node in DC1 or DC2 ? and then place the remaining
 replicas ?
 The NTS considers each DC to have it's own ring. This can make token 
 selection in a multi DC environment confusing at times. There is 
 something in the DS docs about it.

 Cheers

 -
 Aaron Morton
 

Re: Confusion regarding the terms replica and replication factor

2012-05-30 Thread Jeff Williams

On May 30, 2012, at 10:32 PM, Edward Capriolo wrote:

 
 The snitch does use the datacenter and the rack but almost all
 deployments use a single rack per DC, because when you have more then
 one rack in a data center the NTS snitch has some logic to spread the
 data between racks. (most people do not want this behavior)
 

Out of curiosity, why would most people not want this behaviour? It seems like 
a good idea from an availability perspective.

Jeff

Re: unknown exception with hector

2012-05-30 Thread aaron morton
 i'm not sure if using framed transport is an option with hector.
http://hector-client.github.com/hector//source/content/API/core/0.8.0-2/me/prettyprint/cassandra/service/CassandraHostConfigurator.html#setUseThriftFramedTransport(boolean)
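
A hedged sketch of making the setting explicit on the Hector side, using the setter linked above; the host, port and cluster name are placeholders.

    // Sketch: ensure the Hector client uses framed Thrift transport, matching the
    // server, which defaults to framed.
    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.factory.HFactory;

    public class FramedTransportSetup {
        public static Cluster connect() {
            CassandraHostConfigurator hosts = new CassandraHostConfigurator("127.0.0.1:9160");
            hosts.setUseThriftFramedTransport(true);
            return HFactory.getOrCreateCluster("MyCluster", hosts);
        }
    }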

 what should I be looking for in the logs to find the cause of these dropped
 reads?
These look like transport errors. If something is happening on the server side 
it will be logged at ERROR level. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/05/2012, at 1:00 PM, Deno Vichas wrote:

 i'm not sure if using framed transport is an option with hector.
 
  what should I be looking for in the logs to find the cause of these dropped
  reads?
 
 thanks,
 
 On 5/24/2012 3:04 AM, aaron morton wrote:
 
 Dropped read messages occur when the node could not process a read task 
 within rpc_timeout. It generally means the cluster has been overwhelmed at 
  some point: too many requests, too much GC, compaction hurting, etc.
 
 Check the server side logs for errors but I doubt it is related to the call 
 stack below. 
 
 Have you confirmed that you are using framed transport on the client?
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 24/05/2012, at 5:52 PM, Deno Vichas wrote:
 
  I've noticed that my nodes seem to have a large (?, not really sure what
  acceptable numbers are) read dropped count in tpstats.
  Could they be related?
 
 On 5/23/2012 2:55 AM, aaron morton wrote:
 
  Not sure, but
 
at 
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
 
 Looks like the client is not using framed transport. The server defaults 
 to framed.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 23/05/2012, at 5:35 AM, Deno Vichas wrote:
 
  Could somebody clue me in to the cause of this exception? I see these
  randomly.
 
 AnalyzerService-2 2012-05-22 13:28:00,385 :: WARN  
 cassandra.connection.HConnectionManager  - Exception:
 me.prettyprint.hector.api.exceptions.HectorTransportException: 
 org.apache.thrift.transport.TTransportException
at 
 me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:39)
at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl$23.execute(KeyspaceServiceImpl.java:851)
at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl$23.execute(KeyspaceServiceImpl.java:840)
at 
 me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:99)
at 
 me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:243)
at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.getColumn(KeyspaceServiceImpl.java:857)
at 
 me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:57)
at 
 me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:52)
at 
 me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at 
 me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
at 
 me.prettyprint.cassandra.model.thrift.ThriftColumnQuery.execute(ThriftColumnQuery.java:51)
at 
 com.stocktouch.dao.StockDaoImpl.getHistorical(StockDaoImpl.java:365)
at 
 com.stocktouch.dao.StockDaoImpl.getHistoricalQuote(StockDaoImpl.java:433)
at 
 com.stocktouch.service.StockHistoryServiceImpl.getHistoricalQuote(StockHistoryServiceImpl.java:480)
at 
 com.stocktouch.service.AnalyzerServiceImpl.getClose(AnalyzerServiceImpl.java:180)
at 
 com.stocktouch.service.AnalyzerServiceImpl.calcClosingPrices(AnalyzerServiceImpl.java:90)
at 
 com.stocktouch.service.AnalyzerServiceImpl.nightlyRollup(AnalyzerServiceImpl.java:66)
at 
 com.stocktouch.service.AnalyzerServiceImpl$2.run(AnalyzerServiceImpl.java:55)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.thrift.transport.TTransportException
at 
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at 
 org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
 org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at 
 org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at 
 org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
 

Re: about multitenant datamodel

2012-05-30 Thread aaron morton
 - Do a lot of keyspaces cause some problems? (If I have 1,000 users, 
 cassandra creates 1,000 keyspaces…)
It's not keyspaces, but the number of column families. 

Without storing any data each CF uses about 1MB of ram. When they start storing 
and reading data they use more. 

IMHO a model that allows external users to create CF's is a bad one. 

Hope that helps.  
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/05/2012, at 12:52 PM, Toru Inoko wrote:

 Hi, all.
 
  I'm designing a data API service (like cassandra.io, but not using a dedicated
  server for each user) on Cassandra 1.1, on which users can run DML/DDL methods
  like CQL.
  The following are the APIs which users can use (almost the same as the Cassandra API):
  - create/read/delete ColumnFamilies/Rows/Columns
  
  Now I'm thinking about a multitenant data model for that.
  My data model is like the following:
  I'm going to prepare a keyspace for each user as that user's tenant space.
 
 | keyspace1 | --- | column family |
 |(for user1)|  |
   ...
 
 | keyspace2 | --- | column family |
 |(for user2)|  |
   ...
 
  The following are my questions:
  - Is this data model good for multitenancy?
  - Do a lot of keyspaces cause any problems? (If I have 1,000 users,
  Cassandra creates 1,000 keyspaces...)
 
 please, help.
 thank you in advance.
 
 Toru Inoko.
 



Re: High CPU load on Cassandra Node

2012-05-30 Thread aaron morton
 Further, I need to understand: for internal reads/writes does Cassandra use
 Thrift over an RPC connection (port 9160), or port 7000 as for inter-node
 communication? Maybe that could also be a reason for so many connections
 on 9160.
Uses 7000

 What I can see from Ganglia is high CPU load on this server, and also the number
 of TCP connections on port 9160 is around 600+ all the time. The distribution
 of these connections shows that connections from this machine to the other
 DC machines number around 90 odd each. For port 7000 it's around 45.
Could these be hadoop tasks that are still running ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/05/2012, at 4:51 PM, Shubham Srivastava wrote:

 I have a multiDC ring with 6 nodes in each DC.
 
 I have a single node which runs some jobs (including Hadoop Map-Reduce with 
 PIG) every 15 minutes.
 
 Lately there have been high CPU load and memory issues on this node.
 
 What I can see from Ganglia is high CPU load on this server, and also the number
 of TCP connections on port 9160 is around 600+ all the time. The distribution
 of these connections shows that connections from this machine to the other
 DC machines number around 90 odd each. For port 7000 it's around 45.
 
 Further, I need to understand: for internal reads/writes does Cassandra use
 Thrift over an RPC connection (port 9160), or port 7000 as for inter-node
 communication? Maybe that could also be a reason for so many connections
 on 9160.
 
 I have an 8-core machine with 14GB RAM and an 8GB heap.
 rpc min and max threads are at the defaults, and so are the other rpc-based properties.
 RF:3 each DC  and Read/Write CL:1 and Read Repair Chance=0.1.
 cassandra version is 0.8.6
 
 Regards,
 Shubham



Re: Schema changes not getting picked up from different process

2012-05-30 Thread aaron morton
What clients are the scripts using ? This sounds like something that should be 
handled in the client. 

I would worry about holding a long running connection to a single node. There 
are several situations where the correct behaviour for a client is to kill a 
connection and connect to another node. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/05/2012, at 12:11 AM, Victor Blaga wrote:

 Hi Dave,
 
 Thank you for your answer.
 
 2012/5/25 Dave Brosius dbros...@mebigfatguy.com
 What version are you using?
 
 I am using version 1.1.0
  
 It might be related to https://issues.apache.org/jira/browse/CASSANDRA-4052
  
   Indeed, the issue you suggested goes in the direction of my problem.
  However, things are a little bit more complex. I used the cassandra-cli just
  for this example, although I'm getting this behavior from other clients (I'm
  using Python and Ruby scripts). Basically I'm modifying the schema through
  the Ruby script and I'm trying to query and insert data through the Python
  script. Both of the scripts are meant to run forever (sort of daemons) and
  thus they establish a connection to Cassandra once at start, which is kept
  alive.
 
 I can see from the comments on the issue that keeping a long-lived connection 
 to the Cluster might not be ideal and it would probably be better to 
 reconnect upon executing a set of queries.



Re: Frequent exception with Cassandra 1.0.9

2012-05-30 Thread aaron morton
Still getting this ? Was there some more to the message ? 

Here's an example from the internets http://pastebin.com/WdD7181x

it may be an issue with the JVM on windows. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/05/2012, at 6:07 AM, Dwight Smith wrote:

 I am running embedded Cassandra version 1.0.9 on Windows 2008 Server and
 frequently encounter the following exception:
  
 Stack: [0x7dc6,0x7dcb],  sp=0x7dcaf0b0,  free space=316k
 Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
 j  java.io.WinNTFileSystem.getSpace0(Ljava/io/File;I)J+0
 j  java.io.WinNTFileSystem.getSpace(Ljava/io/File;I)J+10
 j  java.io.File.getUsableSpace()J+34
 j  
 org.apache.cassandra.config.DatabaseDescriptor.getDataFileLocationForTable(Ljava/lang/String;JZ)Ljava/lang/String;+44
 j  org.apache.cassandra.db.Table.getDataFileLocation(JZ)Ljava/lang/String;+6
 j  org.apache.cassandra.db.Table.getDataFileLocation(J)Ljava/lang/String;+3
 j  
 org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(JLjava/lang/String;)Ljava/lang/String;+5
 j  
 org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(JJLorg/apache/cassandra/db/commitlog/ReplayPosition;)Lorg/apache/cassandra/io/sstable/SSTableWriter;+18
 J  
 org.apache.cassandra.db.Memtable.writeSortedContents(Lorg/apache/cassandra/db/commitlog/ReplayPosition;)Lorg/apache/cassandra/io/sstable/SSTableReader;
 j  
 org.apache.cassandra.db.Memtable.access$400(Lorg/apache/cassandra/db/Memtable;Lorg/apache/cassandra/db/commitlog/ReplayPosition;)Lorg/apache/cassandra/io/sstable/SSTableReader;+2
 j  org.apache.cassandra.db.Memtable$4.runMayThrow()V+36
 j  org.apache.cassandra.utils.WrappedRunnable.run()V+9
 J  
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V
 J  java.util.concurrent.ThreadPoolExecutor$Worker.run()V
 j  java.lang.Thread.run()V+11
 v  ~StubRoutines::call_stub
  
 Java info:
  
 java version 1.6.0_30
 Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
 Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)
  



Re: will compaction delete empty rows after all columns expired?

2012-05-30 Thread aaron morton
Minor compaction will remove the tombstones if the row only exists in the 
sstable being compacted.

Are these very wide rows that are constantly written to ? 

Cheers
 p.s. cassandra 1.0 really does rock. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/05/2012, at 6:21 AM, Curt Allred wrote:

 This is an old thread from December 27, 2011.  I interpret the yes answer 
 to mean you do not have to explicitly delete an empty row after all of its 
 columns have been deleted, the empty row (i.e. row key) will automatically be 
 deleted eventually (after gc_grace).  Is that true?   I am not seeing that 
 behavior on our v 0.7.9 ring.  We are accumulating a large number of old 
 empty rows.  They are taking a lot of space because the row keys are big, and
 exploding the data size by 10x.  I have read conflicting information on blogs 
 and cassandra docs.  Someone mentioned that there are both row tombstones and 
 column tombstones, implying that you have to explicitly delete empty rows.  
 Is that correct?
 
 My basic question is... how do I delete all these empty row keys?
 
 -
 From: Feng Qu
 Sent: Tuesday, December 27, 2011 11:09 AM
 Compaction should delete empty rows once gc_grace_seconds is passed, right? 
 -
 From: Peter Schuller
 Yes.  
 But just to be extra clear: data will not actually be removed until the row in
 question participates in compaction. Compactions will not be actively
 triggered by Cassandra for tombstone-processing reasons.



Re: cassandra read latency help

2012-05-30 Thread aaron morton
 80 ms per request
sounds high. 

I'm doing some guessing here; I am guessing memory usage is the problem...

* I assume you are no longer seeing excessive GC activity.
* The key cache will not get used when you hit the row cache. I would disable 
the row cache if you have a random workload, which it looks like you do. 
* 500 million is a lot of keys to have on a single node. At the default index 
sample of every 128 keys it will have about 4 million samples, which is 
probably taking up a lot of memory. 

Is this testing a real world scenario or an abstract benchmark ? IMHO you will 
get more insight from testing something that resembles your application. 

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/05/2012, at 8:48 PM, Gurpreet Singh wrote:

 Hi Aaron,
 Here is the latest on this..
 I switched to a node with 6 disks and am running some read tests, and I am
 seeing something weird.
 
 setup:
 1 node, cassandra 1.0.9, 8 cpu, 16 gig RAM, 6 7200 rpm SATA data disks 
 striped 512 kb, commitlog mirrored.
 1 keyspace with just 1 column family
 random partitioner
 total number of keys: 500 million (the keys are just longs from 1 to 500 
 million) 
 avg key size: 8 bytes
 bloom filter size: 1 gig
 total disk usage: 70 gigs compacted 1 sstable
 mean compacted row size: 149 bytes
 heap size: 8 gigs
 keycache size: 2 million (takes around 2 gigs in RAM)
 rowcache size: 1 million (off-heap)
 memtable_total_space_mb : 2 gigs
 
 test: 
 Trying to do 5 reads per second. Each read is a multigetslice query for just 
 1 key, 2 columns. 
 
 observations:
 row cache hit rate: 0.4
 key cache hit rate: 0.0 (this will increase later on as system moves to 
 steady state)
 cfstats - 80 ms
 
 iostat (every 5 seconds): 
 
 r/s : 400   
 %util: 20%  (all disks are at equal utilization)
 await: 65-70 ms (for each disk)
 svctm : 2.11 ms (for each disk)
 r-kB/s - 35000
 
  Why this is weird:
  5 reads per second is causing a latency of 80 ms per request (according to
  cfstats). Isn't this too high?
  35 MB/s is being read from the disk. That is again very weird. This number is
  way too high; the avg row size is just 149 bytes. Even index reads should not
  cause this much data to be read from the disk.
 
  What I understand is that each read request translates to 2 disk accesses
  (because there is only 1 sstable): 1 for the index, 1 for the data. At such a
  low reads/second rate, why is the latency so high?
 
 would appreciate help debugging this issue.
 Thanks
 Gurpreet
 
 
 On Tue, May 22, 2012 at 2:46 AM, aaron morton aa...@thelastpickle.com wrote:
 With
 
 heap size = 4 gigs
 
 I would check for GC activity in the logs and consider setting it to 8 given 
 you have 16 GB.  You can also check if the IO system is saturated 
 (http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html) Also 
 take a look at nodetool cfhistogram perhaps to see how many sstables are 
 involved. 
 
 
 I would start by looking at the latency reported on the server, then work 
 back to the client….
 
 I may have missed it in the email but what recent latency for the CF is 
 reported by nodetool cfstats ? That's latency for a single request on a 
 single read thread. The default settings give you 32 read threads. 
 
 If you know the latency for a single request, and you know you have 32 
 concurrent read threads, you can get an idea of the max throughput for a 
 single node. Once you get above that throughput the latency for a request 
 will start to include wait time. 
 
 It's a bit more complicated, because when you request 40 rows that turns into 
 40 read tasks. So if two clients send a request for 40 rows at the same time 
 there will be 80 read tasks to be processed by 32 threads. 
 
 Hope that helps. 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 20/05/2012, at 4:10 PM, Radim Kolar wrote:
 
  On 19.5.2012 at 0:09, Gurpreet Singh wrote:
 Thanks Radim.
 Radim, actually 100 reads per second is achievable even with 2 disks.
 it will become worse as rows will get fragmented.
 But achieving them with a really low avg latency per key is the issue.
 
 I am wondering if anyone has played with index_interval, and how much of a 
 difference would it make to reads on reducing the index_interval.
 close to zero. but try it yourself too and post your findings.
 
 



Re: TimedOutException caused by Stop the world activity

2012-05-30 Thread aaron morton
The cluster is running into GC problems and this is slowing it down under the
stress test. When it slows down, one or more of the nodes fails to perform
the write within rpc_timeout. This causes the coordinator of the write to
raise the TimedOutException.

Your options are:

* allocate more memory
* ease back on the stress test
* work at CL QUORUM so that one node failing does not result in the error (see the sketch below)

see also http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
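
If the client is Hector, a hedged sketch of the QUORUM option looks like the following; the ConfigurableConsistencyLevel policy and the keyspace name are assumptions about the client setup, not something taken from the original report.

    // Sketch: read and write at QUORUM so a single slow or failed replica no
    // longer fails the whole request.
    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class QuorumSetup {
        public static Keyspace quorumKeyspace(Cluster cluster) {
            ConfigurableConsistencyLevel cl = new ConfigurableConsistencyLevel();
            cl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
            cl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
            return HFactory.createKeyspace("MyKeyspace", cluster, cl);
        }
    }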

Cheers
 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/05/2012, at 12:59 PM, Jason Tang wrote:

 Hi
 
 My system is a 4-node 64-bit Cassandra cluster, 6G per node, default
 configuration (which means 1/3 of the heap for memtables), replication factor 3, write
 ALL, read ONE.
 When I run stress load testing, I get this TimedOutException, some
 operations fail, and all traffic hangs for a while.
 
 And when I ran a 1G memory 32-bit Cassandra in standalone mode, I didn't see
 such frequent stop-the-world behavior.
 
 So I wonder what kind of operation will hang the Cassandra system.
 
 How do I collect information for tuning?
 
 From the system log and documentation, I guess there are three types of operations:
 1) Flushing a memtable when it reaches its max size
 2) Compacting SSTables (why?)
 3) Java GC
 
 system.log:
  INFO [main] 2012-05-25 16:12:17,054 ColumnFamilyStore.java (line 688) 
 Enqueuing flush of Memtable-LocationInfo@1229893321(53/66 serialized/live 
 bytes, 2 ops)
  INFO [FlushWriter:1] 2012-05-25 16:12:17,054 Memtable.java (line 239) 
 Writing Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
  INFO [FlushWriter:1] 2012-05-25 16:12:17,166 Memtable.java (line 275) 
 Completed flushing 
 /var/proclog/raw/cassandra/data/system/LocationInfo-hb-2-Data.db (163 bytes)
 ...
 
  INFO [CompactionExecutor:441] 2012-05-28 08:02:55,345 CompactionTask.java 
 (line 112) Compacting 
 [SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-41-Data.db'),
  SSTableReader(path='/var/proclog/raw/cassandra/data/ myks 
 /queue-hb-32-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/ 
 myks /queue-hb-37-Data.db'), 
 SSTableReader(path='/var/proclog/raw/cassandra/data/ myks 
 /queue-hb-53-Data.db')]
 ...
 
  WARN [ScheduledTasks:1] 2012-05-28 08:02:26,619 GCInspector.java (line 146) 
 Heap is 0.7993011015621736 full.  You may need to reduce memtable and/or 
 cache sizes.  Cassandra will now flush up to the two largest memtables to 
 free up memory.  Adjust flush_largest_memtables_at threshold in 
 cassandra.yaml if you don't want Cassandra to do this automatically
  INFO [ScheduledTasks:1] 2012-05-28 08:02:54,980 GCInspector.java (line 123) 
 GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used; max is 
 6274678784
  INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line 123) 
 GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784
  INFO [ScheduledTasks:1] 2012-05-28 08:41:48,978 GCInspector.java (line 123) 
 GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is 6274678784
  INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line 123) 
 GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280 used; max 
 is 6274678784
 
 
 Timeout Exception:
 Caused by: org.apache.cassandra.thrift.TimedOutException: null
 at 
 org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19495)
  ~[na:na]
 at 
 org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
  ~[na:na]
 at 
 org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
  ~[na:na]
 at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
  ~[na:na]
 ... 64 common frames omitted
 
 BRs
 //Tang Weiqiang
 
 



Re: Snapshot failing on JSON files in 1.1.0

2012-05-30 Thread aaron morton
CASSANDRA-4230 is a bug in 1.1

I am not aware of issues using snapshot on 1.0.9. But errno 0 is a bit odd. 

On the server side there should be a log message at ERROR level that contains 
the string "Unable to create hard link" and the error message. What does that 
say? 

Can you also include the OS version? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/05/2012, at 9:27 PM, Alain RODRIGUEZ wrote:

 I have the same error with the latest DataStax AMI (1.0.9). Is that the same 
 bug?
 
 Requested snapshot for: cassa_teads
 Exception in thread main java.io.IOError: java.io.IOException:
 Unable to create hard link from
 /raid0/cassandra/data/cassa_teads/stats_product-hc-233-Index.db to
 /raid0/cassandra/data/cassa_teads/snapshots/20120528/stats_product-hc-233-Index.db
 (errno 0)
at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
at org.apache.cassandra.db.Table.snapshot(Table.java:210)
at 
 org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:1710)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
at 
 com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
at 
 javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Unable to create hard link from
 /raid0/cassandra/data/cassa_teads/stats_product-hc-233-Index.db to
 /raid0/cassandra/data/cassa_teads/snapshots/20120528/stats_product-hc-233-Index.db
 (errno 0)
at 
 org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:158)
at 
 org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:857)
at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1412)
... 32 more
 
 Can we do a snapshot manually (like flushing and then copying all the
 files into the snapshot folder)?
 
 Alain
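
A snapshot is essentially a flush followed by hard links to the immutable SSTable files (that is what the createHardLink call in the stack trace above is doing). A minimal sketch of doing it by hand, assuming Java 7, a single data directory, and that nodetool flush has already been run; the paths are placeholders and this is not a substitute for the built-in snapshot:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ManualSnapshotSketch {
    public static void main(String[] args) throws IOException {
        Path dataDir = Paths.get("/raid0/cassandra/data/cassa_teads");
        Path snapshotDir = dataDir.resolve("snapshots/manual-20120528");
        Files.createDirectories(snapshotDir);

        // Hard-link every SSTable component; the link points at the same
        // immutable on-disk data, which is what Cassandra's own snapshot
        // does internally.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir, "*.db")) {
            for (Path sstable : files) {
                Files.createLink(snapshotDir.resolve(sstable.getFileName()), sstable);
            }
        }
    }
}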
 
 2012/5/19 Jonathan Ellis jbel...@gmail.com:
 When these bugs are fixed:
 https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=truejqlQuery=project+%3D+CASSANDRA+AND+fixVersion+%3D+%221.1.1%22+AND+resolution+%3D+Unresolved+ORDER+BY+due+ASC%2C+priority+DESC%2C+created+ASCmode=hide
 
 On Wed, May 16, 2012 at 6:35 PM, Bryan Fernandez bfernande...@gmail.com 
 wrote:
 Does anyone know when 1.1.1 will be released?
 
 Thanks.
 
 On Tue, May 15, 2012 at 5:40 PM, Brandon Williams dri...@gmail.com wrote:
 
 Probably https://issues.apache.org/jira/browse/CASSANDRA-4230
 
 On Tue, May 15, 2012 at 4:08 PM, Bryan 

Re: Doubts regarding compaction

2012-05-30 Thread aaron morton
 Also, I want to make sure: can major compactions only be done manually? 
Major compactions are the ones you run manually using nodetool.

 Is the author referring to this time period as no minor compactions being 
 triggered automatically ?

The minor compactions will be triggered less frequently because you will need 
to run 4 compactions at the first size tier before one runs at the next, and so 
on, up to the point where you need another 3 files roughly the same size 
as the one you got from the major compaction. 
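
To illustrate the size-tier point, a toy sketch (a simplified model, not Cassandra's actual SizeTieredCompactionStrategy code; 4 is the default min_compaction_threshold):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SizeTierToy {
    static final int MIN_COMPACTION_THRESHOLD = 4; // default
    static final double BUCKET_SPREAD = 0.5;       // "roughly the same size"

    // Group SSTable sizes into buckets of similar size.
    static List<List<Long>> bucket(List<Long> sizes) {
        List<List<Long>> buckets = new ArrayList<List<Long>>();
        for (long size : sizes) {
            boolean placed = false;
            for (List<Long> b : buckets) {
                long sum = 0;
                for (long s : b) sum += s;
                double avg = (double) sum / b.size();
                if (size > avg * (1 - BUCKET_SPREAD) && size < avg * (1 + BUCKET_SPREAD)) {
                    b.add(size);
                    placed = true;
                    break;
                }
            }
            if (!placed) buckets.add(new ArrayList<Long>(Collections.singletonList(size)));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // After a major compaction: one huge SSTable plus freshly flushed small ones.
        // The big table sits alone in its tier until ~3 more of comparable size exist,
        // so it is rarely included in a minor compaction.
        List<Long> sizesMb = Arrays.asList(10000L, 64L, 64L, 64L, 64L);
        for (List<Long> b : bucket(sizesMb)) {
            System.out.println(b + (b.size() >= MIN_COMPACTION_THRESHOLD ? "  -> minor compaction" : ""));
        }
    }
}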

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/05/2012, at 3:41 AM, Rajat Mathur wrote:

 http://www.datastax.com/docs/1.0/operations/tuning
 
 On this page at last, there's a note about Major Compaction which says, 
 
 Also, once you run a major compaction, automatic minor compactions are no 
 longer triggered frequently...
 
 Could anybody give an explanation for that? As far as I understand, once a 
 major compaction takes place, there would be no 
 compactions until N (default value 4) SSTables of the same size (the size of a memtable, 
 to be precise) are formed, and then minor compactions would start automatically. 
 Is the author referring to this time period when saying no minor compactions are 
 triggered automatically?
 
 Also, I want to make sure: can major compactions only be done manually? 
 
 -- 
 Rajat Mathur



Re: About Composite range queries

2012-05-30 Thread aaron morton
Composite Columns compare each part in turn, so the values are ordered as 
you've shown them. 
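
For example, a rough Hector sketch of the prefix slice from the question below (keyspace and column family names are placeholders, the comparator is assumed to be CompositeType(UTF8Type, UTF8Type, UTF8Type), and the API details may vary by Hector version):

import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.AbstractComposite.ComponentEquality;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class CompositeSliceSketch {
    // Return a query for every column of row "key1" whose first component is "A".
    public static SliceQuery<String, Composite, String> prefixSlice(Keyspace keyspace) {
        Composite start = new Composite();
        start.addComponent(0, "A", ComponentEquality.EQUAL);

        Composite end = new Composite();
        // GREATER_THAN_EQUAL on the last supplied component turns the end key
        // into "everything up to and including the A:* prefix".
        end.addComponent(0, "A", ComponentEquality.GREATER_THAN_EQUAL);

        SliceQuery<String, Composite, String> query = HFactory.createSliceQuery(
                keyspace, StringSerializer.get(), new CompositeSerializer(), StringSerializer.get());
        query.setColumnFamily("MyCF");
        query.setKey("key1");
        query.setRange(start, end, false, 1000);
        return query;
    }
}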

However the rows are not ordered according to key value. They are ordered using 
the random token generated by the partitioner; see 
http://wiki.apache.org/cassandra/FAQ#range_rp

 What is the real advantage compared to super column families?
They are faster. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/05/2012, at 10:08 PM, Cyril Auburtin wrote:

 How does Cassandra make it possible to do range queries on a composite key?
 
 key1 = (A:A:C), (A:B:C), (A:C:C), (A:D:C), (B:A:C)
 
 like get_range (key1, start_column=(A,), end_column=(A, C)); will return [ 
 (A:B:C), (A:C:C) ] (in pycassa)
 
 I mean, does the composite implementation add much overhead to make it work?
 Does it need to add other column families to be able to range query over the 
 parts of a composite key (the first, second and third components)?
 
 What is the real advantage compared to super column families?
 
 key1 = A: (A,C), (B,C), (C,C), (D,C)  , B: (A,C)
 
 thx



Re: All host pools Marked Down

2012-05-30 Thread aaron morton
I would remove the load balancer from the equation.
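
For example, a minimal sketch of pointing Hector at the nodes directly instead of the load balancer (host and cluster names are placeholders, and the exact configurator options depend on the Hector version):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class DirectHostPoolSketch {
    public static void main(String[] args) {
        // Give Hector the nodes themselves; it pools connections, marks hosts
        // down/up, and retries downed hosts on its own schedule.
        CassandraHostConfigurator config =
                new CassandraHostConfigurator("cass1:9160,cass2:9160,cass3:9160");
        config.setRetryDownedHosts(true);
        config.setRetryDownedHostsDelayInSeconds(10);
        config.setAutoDiscoverHosts(true); // pick up the rest of the ring

        Cluster cluster = HFactory.getOrCreateCluster("MyCluster", config);
        System.out.println("Known hosts: " + cluster.getKnownPoolHosts(true));
    }
}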

Compactions do not stop the world; they may degrade performance for a while, but 
that's about it. 

Look in the logs on the servers: are the nodes logging that other nodes are 
going DOWN? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/05/2012, at 2:25 AM, cem wrote:

 It should retry but it doesn't. It is also clear that it delegates the retry 
 to the client ("Retry burden pushed out to client"); you can also check the Hector 
 code. I wrote a separate service that retries when this exception occurs. 
 
 I think you have a problem with your load balancer. Try to connect with 
 telnet.  
 
 Cem.
 
 On Tue, May 29, 2012 at 3:06 PM, Shubham Srivastava 
 shubham.srivast...@makemytrip.com wrote:
 My webapp connects to the LoadBalancer IP which has the actual nodes in its 
 pool.
 
 If there is by any chance a connection break, will Hector not retry to 
 re-establish the connection? I guess it should retry every XX seconds based on 
 retryDownedHostsDelayInSeconds.
 
 
 Regards,
 Shubham
 From: cem [cayiro...@gmail.com]
 Sent: Tuesday, May 29, 2012 6:13 PM
 To: user@cassandra.apache.org
 Subject: Re: All host pools Marked Down
 
 Since all hosts seem to be down, Hector will not retry. There should 
 be at least one node up in the cluster. Make sure that you have a proper 
 connection from your webapps to your cluster.
 
 Cem. 
 
 On Tue, May 29, 2012 at 1:46 PM, Shubham Srivastava 
 shubham.srivast...@makemytrip.com wrote:
 Any takers on this. Hitting us badly right now.
 
 Regards,
 Shubham
 From: Shubham Srivastava
 Sent: Tuesday, May 29, 2012 12:55 PM
 To: user@cassandra.apache.org
 Subject: All host pools Marked Down
 
 I am getting this exception a lot of times:
 
  
 me.prettyprint.hector.api.exceptions.HectorException: All host pools marked 
 down. Retry burden pushed out to client.
 
  
 
 What this causes is no data read/write from the ring from my WebApp.
 
  
 I have retries set to 3 and can see that all 3 retries get exhausted with the 
 same error as above.
 
  
 Checked cfstats and tpstats nothing seem to be a problem.
 
  
 However, through the logs I see a lot of time taken in compactions, like the one below:
 
  
 INFO [CompactionExecutor:73] 2012-05-29 11:03:01,605 CompactionManager.java 
 (line 608) Compacted to 
 /opt/cassandra-data/data/LH/UserPrefrences-tmp-g-8906-Data.db.  36,986,932 to 
 36,961,554 (~99% of original) bytes for 132,743 keys.  Time: 112,910ms.
 
  
 The time taken here seems pretty high. Will this cause a pause or read 
 timeout, etc.?
 
  
 I connect from my web app through a hardware load balancer. 
 Cassandra version is 0.8.6, a multi-DC ring with 6 nodes in each DC.
 
 CL:1 and RF:3.
 
  
 Memory: 8 GB heap, 14 GB server memory, with an 8-core CPU.
 
  
 How do I move ahead with this?
 
  
 Shubham Srivastava | Technical Lead - Technology Development
 
 +91 124 4910 548   |  MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase 1, 
 Gurgaon, Haryana - 122 016, India
 
 
  
 
 



Re: Nodetool talking to an old IP address (and timing out)

2012-05-30 Thread aaron morton
nodetool passes the host name unmodified to the JMX library to connect to the 
host. 

The JMX server will, by default, bind to the IP address of the machine. 

If the host name was wrong, I would guess the JMX service failed to bind. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/05/2012, at 6:39 AM, Douglas Muth wrote:

 8 hours, 1 cup of coffee, and 4 Advil later, I think I've gotten to the
 bottom of this.  Not having much of a Java or JMX background, I'll try
 to explain it the best that I can.
 
 To recap, my machine originally had the IP address of 10.244.207.16.
 Then I shutdown/restarted that EC2 instance, and it had the IP
 10.84.117.110.  During this, Cassandra was fine -- I could still
 connect to 127.0.0.1 with cqlsh and the Helenus node.js module.
 
 Things got weird only when I tried to use nodetool to manage the
 instance.  As best I can tell, here's the algorithm that nodetool uses
 when connecting to a Cassandra instance:
 
 Step 1) Connect to the hostname and port specified on the command
 line. localhost and 7199 are the defaults.
 
 Step 2) Cassandra, at boot time, notes the hostname of the machine,
 and tells nodetool go connect to this hostname instead!
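 
 A stripped-down sketch of what that looks like in code (standard JMX APIs; the
 port and attribute names are the Cassandra defaults, and the connection ends up
 redirected to whatever hostname the server's RMI layer advertises):
 
 import javax.management.MBeanServerConnection;
 import javax.management.ObjectName;
 import javax.management.remote.JMXConnector;
 import javax.management.remote.JMXConnectorFactory;
 import javax.management.remote.JMXServiceURL;
 
 public class MiniNodetool {
     public static void main(String[] args) throws Exception {
         // Step 1: connect to the host/port given on the command line.
         JMXServiceURL url = new JMXServiceURL(
                 "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
         JMXConnector connector = JMXConnectorFactory.connect(url);
         try {
             // Step 2: the RMI stub returned points at the hostname the server
             // advertises, so all further calls go there -- a stale
             // /etc/hostname sends them to the old IP.
             MBeanServerConnection mbs = connector.getMBeanServerConnection();
             ObjectName storageService =
                     new ObjectName("org.apache.cassandra.db:type=StorageService");
             Object liveNodes = mbs.getAttribute(storageService, "LiveNodes");
             System.out.println("Live nodes: " + liveNodes);
         } finally {
             connector.close();
         }
     }
 }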
 
 After further investigation, it seems that after my instance was
 rebooted, the file /etc/hostname was not updated.  It still had the
 value ip-10-244-207-16.ec2.internal in it.  This means that any
 attempt to connect to Cassandra involved Cassandra telling nodetool,
 Hey, go talk to 10.244.207.16 instead.  And that's where things went
 wrong.
 
 The permanent fix for this was to change the hostname to localhost
 and to restart Cassandra.  The fact that Cassandra notes the hostname
 at startup was one thing that made this so difficult to track down.  I
 did not see the old IP anywhere in Cassandra configuration (or in
 logfile output), so I did not think there was anything abnormal
 happening in the instance.
 
 While I'm sure there's a good reason for this sort of behavior, it is
 very confusing to a Cassandra newbie such as myself, and I'll bet
 others have been affected by this as well.  In the future, I think
 some sort of logging of this logic, or perhaps a --verbose
 mode for nodetool, would be a really good idea.  What do other folks
 think?
 
 -- Doug
 http://twitter.com/dmuth
 
 
 On Tue, May 29, 2012 at 12:08 PM, Douglas Muth doug.m...@gmail.com wrote:
 I'm afraid that did not work.  I'm running JMX on port 7199 (the
 default) and I verified that the port is open and accepting
 connections.
 [snip]



RE: will compaction delete empty rows after all columns expired?

2012-05-30 Thread Curt Allred
No, these were not wide rows.  They are rows that formerly had one or 2 
columns. The columns are deleted but the empty rows don't go away, even after 
gc_grace_secs.

So if I understand... the empty row will only be removed after gc_grace if 
enough compactions have occurred so that all the column tombstones for the 
empty row are in a single SSTable file?
From: aaron morton [mailto:aa...@thelastpickle.com]


Minor compaction will remove the tombstones if the row only exists in the 
sstable being compacted.

Are these very wide rows that are constantly written to ?

Cheers
 p.s. cassandra 1.0 really does rock.


Re: will compaction delete empty rows after all columns expired?

2012-05-30 Thread Zhu Han
On Thu, May 31, 2012 at 9:31 AM, Curt Allred c...@mediosystems.com wrote:

 No, these were not wide rows.  They are rows that formerly had one or 2
 columns. The columns are deleted but the empty rows dont go away, even
 after gc_grace_secs.


The empty row goes away only during a compaction after the gc_grace_secs.

You can set gc_grace_secs to a small value and force a major compaction
after the row has expired.  Then please check whether the row still
exists.
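
For example, a rough Hector sketch for lowering gc_grace_seconds before running nodetool compact (keyspace and column family names are placeholders, the exact ddl API may differ between Hector versions, and on a multi-node cluster very small values are risky: a node that misses a tombstone before it is purged can resurrect the deleted data):

import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;

public class LowerGcGraceSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "127.0.0.1:9160");
        KeyspaceDefinition ksDef = cluster.describeKeyspace("myks");
        for (ColumnFamilyDefinition cfDef : ksDef.getCfDefs()) {
            if ("mycf".equals(cfDef.getName())) {
                // One hour instead of the default ten days.
                cfDef.setGcGraceSeconds(3600);
                cluster.updateColumnFamily(cfDef);
            }
        }
        // Then run: nodetool compact myks mycf, and re-check the row.
    }
}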


 

 So if I understand... the empty row will only be removed after gc_grace if
 enough compactions have occurred so that all the column tombstones for the
 empty row are in a single SSTable file?

 From: aaron morton [mailto:aa...@thelastpickle.com]

 Minor compaction will remove the tombstones if the row only exists in the
 sstable being compacted.

 Are these very wide rows that are constantly written to ?

 Cheers

  p.s. cassandra 1.0 really does rock. 



Re: Confusion regarding the terms replica and replication factor

2012-05-30 Thread Edward Capriolo
http://answers.oreilly.com/topic/2408-replica-placement-strategies-when-using-cassandra/

As mentioned, it does this:

"The Network Topology Strategy places some replicas in another data
center and the remainder in other racks in the first data center, as
specified."

Which is not what most would expect.
Assume your largish cluster is, say, 40 nodes in a data center.  Three
copies get you quorum, and a 48-port switch is pretty common. Switches
can even be stacked, sometimes 3 or 4 units, so now you are talking 48x4
switch ports really making up one rack.


On Wed, May 30, 2012 at 4:37 PM, Jeff Williams
je...@wherethebitsroam.com wrote:

 On May 30, 2012, at 10:32 PM, Edward Capriolo wrote:


 The snitch does use the datacenter and the rack, but almost all
 deployments use a single rack per DC, because when you have more than
 one rack in a data center the NTS strategy has some logic to spread the
 data between racks. (most people do not want this behavior)


 Out of curiosity, why would most people not want this behaviour? It seems 
 like a good idea from an availability perspective.

 Jeff