Re: Problem with PropertyFileSnitch in Amazon EC2

2011-06-21 Thread Joaquin Casares
Could you verify any security settings that may come into play with Elastic
IPs? You should make sure the appropriate ports are open.

See: http://www.datastax.com/docs/0.8/brisk/install_brisk_ami
for a list of ports in the first chart.
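
As a rough way to confirm that the ports are reachable from another node, a sketch like the following could be run from each machine against the seed's address. The ports 7000 (storage/gossip) and 9160 (Thrift) are the ones that appear in the startup logs elsewhere in this thread; the helper names and the command-line handling are illustrative assumptions, not part of Brisk or Cassandra.

```python
import socket
import sys


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: is_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_ports.py <seed-address>
    target = sys.argv[1]
    for port, ok in check_ports(target, [7000, 9160]).items():
        print(f"{target}:{port} {'open' if ok else 'closed or filtered'}")
```

If a port shows as closed when the Elastic IP is used but open for the private IP, the security group rules are a likely place to look first.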

Joaquin Casares
DataStax
Software Engineer/Support




Re: Problem with PropertyFileSnitch in Amazon EC2

2011-06-20 Thread Sameer Farooqui
Quick update...

I'm trying to get a 3-node cluster defined the following way in the
topology.properties file to work first:
10.68.x.x=DC1:RAC1
10.198.x.x=DC1:RAC2
10.204.x.x=DC1:RAC3

I'll split up the 3rd node into a separate data center later.

Also, ignore that comment I made about the $BRISK_HOME/lib/ folder not
existing. When you run ANT, I believe it populates correctly, but I'll have
to confirm/test later.

Based on Joaquin @ DataStax's suggestion, I tried changing the Seed IP in
all 3 nodes' YAML files to the Amazon Private IP instead of the Elastic IP.
After this change, all three nodes joined the ring correctly:

ubuntu@ip-10-68-x-x:~/brisk-1.0~beta1.2/resources/cassandra/conf$
../bin/nodetool -h localhost ring
Address      Status  State   Load      Owns    Token
                                               113427455640312821154458202477256070485
10.68.x.x    Up      Normal  10.9 KB   33.33%  0
10.198.x.x   Up      Normal  15.21 KB  33.33%  56713727820156410577229101238628035242
10.204.x.x   Up      Normal  6.55 KB   33.33%  113427455640312821154458202477256070485

PasteBin is down and is showing me a diligent cat typing on a keyboard, so I
uploaded some relevant DEBUG level log files here:

http://blueplastic.com/accenture/N1-system-seed_is_ElasticIP.log (problem
exists)
http://blueplastic.com/accenture/N2-system-seed_is_ElasticIP.log (problem
exists)

http://blueplastic.com/accenture/N1-system-seed_is_privateIP.log (everything
works)
http://blueplastic.com/accenture/N2-system-seed_is_privateIP.log (everything
works)


But if I want to set up the Brisk cluster across Amazon regions, I have to
be able to use the Elastic IP for the seed. Also, using v 0.7.4 of Cassandra
in Amazon, we successfully set up a 30+ node cluster using 3 seed nodes
which were declared in the YAML file using Elastic IPs. All 30 nodes were in
the same region and availability zone. So, in an older version of Cassandra,
providing the Seeds as Elastic IP used to work.

In my current setup, even though nodes 1 & 2 are in the same region &
availability zone, I can't seem to get them to join the same ring correctly.


Here is what the system log file shows when I declare the Seed using Elastic
IP:
INFO [Thread-4] 2011-06-21 00:10:30,849 BriskDaemon.java (line 187) Listening for thrift clients...
DEBUG [GossipTasks:1] 2011-06-21 00:10:31,608 Gossiper.java (line 201) Assuming current protocol version for /50.17.x.x
DEBUG [WRITE-/50.17.212.84] 2011-06-21 00:10:31,610 OutboundTcpConnection.java (line 161) attempting to connect to /50.17.x.x
DEBUG [GossipTasks:1] 2011-06-21 00:10:32,610 Gossiper.java (line 201) Assuming current protocol version for /50.17.x.x
DEBUG [ScheduledTasks:1] 2011-06-21 00:10:32,613 StorageLoadBalancer.java (line 334) Disseminating load info ...
DEBUG [GossipTasks:1] 2011-06-21 00:10:33,611 Gossiper.java (line 201) Assuming current protocol version for /50.17.x.x
DEBUG [GossipTasks:1] 2011-06-21 00:10:34,612 Gossiper.java (line 201) Assuming current protocol version for /50.17.x.x


But when I use private IP, the log shows:

INFO [Thread-4] 2011-06-21 00:19:47,993 BriskDaemon.java (line 187) Listening for thrift clients...
DEBUG [ScheduledTasks:1] 2011-06-21 00:19:49,769 StorageLoadBalancer.java (line 334) Disseminating load info ...
DEBUG [WRITE-/10.198.126.193] 2011-06-21 00:20:09,658 OutboundTcpConnection.java (line 161) attempting to connect to /10.198.x.x
 INFO [GossipStage:1] 2011-06-21 00:20:09,690 Gossiper.java (line 637) Node /10.198.x.x is now part of the cluster
DEBUG [GossipStage:1] 2011-06-21 00:20:09,691 MessagingService.java (line 158) Resetting pool for /10.198.x.x
 INFO [GossipStage:1] 2011-06-21 00:20:09,691 Gossiper.java (line 605) InetAddress /10.198.x.x is now UP
DEBUG [HintedHandoff:1] 2011-06-21 00:20:09,692 HintedHandOffManager.java (line 282) Checking remote schema before delivering hints
DEBUG [HintedHandoff:1] 2011-06-21 00:20:09,692 HintedHandOffManager.java (line 274) schema for /10.198.x.x matches local schema
DEBUG [HintedHandoff:1] 2011-06-21 00:20:09,692 HintedHandOffManager.java (line 288) Sleeping 11662ms to stagger hint delivery
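
The difference between the two logs can be summarized mechanically. The sketch below scans a system log for the two message patterns quoted above ("attempting to connect to" and "is now UP") and reports which peers were contacted but never marked UP. The function name and the returned dictionary layout are illustrative assumptions; only the message texts come from the logs in this thread.

```python
import re


def summarize_gossip(log_text):
    """Report peers that were contacted and peers that were marked UP."""
    # Collapse line wrapping so messages split across lines still match.
    flat = " ".join(log_text.split())
    attempted = set(re.findall(r"attempting to connect to /(\S+)", flat))
    up = set(re.findall(r"InetAddress /(\S+) is now UP", flat))
    return {"attempted": attempted, "up": up, "never_up": attempted - up}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python gossip_summary.py system.log
    with open(sys.argv[1]) as handle:
        summary = summarize_gossip(handle.read())
    for key in ("attempted", "up", "never_up"):
        print(key, sorted(summary[key]))
```

For the Elastic IP log excerpt, this would list the seed under never_up; for the private IP excerpt, the peer appears under both attempted and up.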

- Sameer



Problem with PropertyFileSnitch in Amazon EC2

2011-06-20 Thread Sameer Farooqui
Hi,

I'm setting up a 3 node test cluster in multiple Amazon Availability Zones
to test cross-zone internode communication (and eventually cross-region
communications).

But I wanted to start with a cross-zone setup and am having trouble getting
the nodes to connect to each other and join one 3-node ring. All nodes just
seem to join their own ring and claim 100% of that space.

I'm using this Beta2 distribution of Brisk:
http://debian.datastax.com/maverick/pool/brisk_1.0~beta1.2.tar.gz

I had to manually recreate the $BRISK_HOME/lib/ folder because it didn't
exist in the binary for some reason, and I also added the jna and mx4j jar
files to the lib directory.

The cluster is geographically located like this:

Node 1 (seed): East-A
Node 2: East-A
Node 3: East-B

The cassandra-topology.properties file on all three nodes contains this:

# Cassandra Node IP=Data Center:Rack
10.68.x.x=DC1:RAC1
10.198.x.x=DC1:RAC2
10.204.x.x=DC2:RAC1
default=DC1:RAC1
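
To make the mapping concrete, here is a minimal sketch of how a file in this format can be read into a lookup table, with the "default" entry used for any address that is not listed. It mirrors the file format shown above only; it is not PropertyFileSnitch's actual implementation, and the function names are hypothetical.

```python
def parse_topology(text):
    """Parse 'ip=DC:RACK' lines into {ip: (datacenter, rack)}."""
    topology = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        datacenter, _, rack = value.partition(":")
        topology[key.strip()] = (datacenter.strip(), rack.strip())
    return topology


def lookup(topology, address):
    """Return (datacenter, rack) for address, falling back to 'default'."""
    return topology.get(address, topology.get("default"))


if __name__ == "__main__":
    # Addresses are the redacted placeholders used in this thread.
    sample = """
    # Cassandra Node IP=Data Center:Rack
    10.68.x.x=DC1:RAC1
    10.198.x.x=DC1:RAC2
    10.204.x.x=DC2:RAC1
    default=DC1:RAC1
    """
    table = parse_topology(sample)
    print(lookup(table, "10.204.x.x"))
    print(lookup(table, "50.17.x.x"))
```

Note that the file is keyed on the private addresses, so in this sketch a node seen under its Elastic IP would fall through to the default entry.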


And finally, here is what the relevant sections of the YAML file look like
for each node:

++ Node 1 ++
cluster_name: 'Test Cluster'
initial_token: 0
auto_bootstrap: false
partitioner: org.apache.cassandra.dht.RandomPartitioner
- seeds: 50.17.x.x    # This is the elastic IP for Node 1
listen_address: 10.68.x.x
rpc_address: 0.0.0.0
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
encryption_options:
    internode_encryption: none

++ Node 2 ++
cluster_name: 'Test Cluster'
initial_token: 56713727820156410577229101238628035242
auto_bootstrap: true
partitioner: org.apache.cassandra.dht.RandomPartitioner
- seeds: 50.17.x.x    # This is the elastic IP for Node 1
listen_address: 10.198.x.x
rpc_address: 0.0.0.0
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
encryption_options:
    internode_encryption: none

++ Node 3 ++
cluster_name: 'Test Cluster'
initial_token: 113427455640312821154458202477256070485
auto_bootstrap: true
partitioner: org.apache.cassandra.dht.RandomPartitioner
- seeds: 50.17.x.x    # This is the elastic IP for Node 1
listen_address: 10.204.x.x
rpc_address: 0.0.0.0
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
encryption_options:
    internode_encryption: none
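
For reference, the three initial_token values above are what you get by spacing tokens evenly over RandomPartitioner's 0 to 2**127 range. A minimal sketch of that calculation (the function name is just illustrative):

```python
def balanced_tokens(node_count):
    """Evenly spaced initial tokens over the RandomPartitioner range [0, 2**127)."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(balanced_tokens(3), start=1):
        print(f"Node {node}: initial_token: {token}")
```

With three nodes this yields 0, 56713727820156410577229101238628035242, and 113427455640312821154458202477256070485, matching the values configured for Nodes 1 to 3.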


When I start Cassandra on all three nodes using "sudo bin/brisk cassandra",
the startup log doesn't show any warnings or errors. The end of the startup
log on Node 1 says:
 INFO [main] 2011-06-20 21:06:57,702 MessagingService.java (line 201) Starting Messaging Service on port 7000
 INFO [main] 2011-06-20 21:06:57,723 StorageService.java (line 482) Using saved token 0
 INFO [main] 2011-06-20 21:06:57,724 ColumnFamilyStore.java (line 1011) Enqueuing flush of Memtable-LocationInfo@1260987126(38/47 serialized/live bytes, 2 ops)
 INFO [FlushWriter:1] 2011-06-20 21:06:57,724 Memtable.java (line 237) Writing Memtable-LocationInfo@1260987126(38/47 serialized/live bytes, 2 ops)
 INFO [FlushWriter:1] 2011-06-20 21:06:57,809 Memtable.java (line 254) Completed flushing /raiddrive/data/system/LocationInfo-g-12-Data.db (148 bytes)
 INFO [CompactionExecutor:2] 2011-06-20 21:06:57,812 CompactionManager.java (line 539) Compacting Major: [SSTableReader(path='/raiddrive/data/system/LocationInfo-g-9-Data.db'), SSTableReader(path='/raiddrive/data/system/LocationInfo-g-11-Data.db'), SSTableReader(path='/raiddrive/data/system/LocationInfo-g-10-Data.db'), SSTableReader(path='/raiddrive/data/system/LocationInfo-g-12-Data.db')]
 INFO [CompactionExecutor:2] 2011-06-20 21:06:57,828 CompactionIterator.java (line 186) Major@1110828771(system, LocationInfo, 429/808) now compacting at 16777 bytes/ms.
 INFO [main] 2011-06-20 21:06:57,881 Mx4jTool.java (line 67) mx4j successfuly loaded
 INFO [CompactionExecutor:2] 2011-06-20 21:06:57,909 CompactionManager.java (line 603) Compacted to /raiddrive/data/system/LocationInfo-tmp-g-13-Data.db.  808 to 432 (~53% of original) bytes for 3 keys.  Time: 97ms.
 INFO [main] 2011-06-20 21:06:57,953 BriskDaemon.java (line 146) Binding thrift service to /0.0.0.0:9160
 INFO [main] 2011-06-20 21:06:57,955 BriskDaemon.java (line 160) Using TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO [Thread-4] 2011-06-20 21:06:57,958 BriskDaemon.java (line 187) Listening for thrift clients...


And the end of the log on Node 2 says:
 INFO [main] 2011-06-20 21:06:57,899 StorageService.java (line 368) Cassandra version: 0.8.0-beta2-SNAPSHOT
 INFO [main] 2011-06-20 21:06:57,901 StorageService.java (line 369) Thrift API version: 19.10.0
 INFO [main] 2011-06-20 21:06:57,901 StorageService.java (line 382) Loading persisted ring state
 INFO [main] 2011-06-20 21:06:57,904 StorageService.java (line 418) Starting up server gossip
 INFO [main] 2011-06-20 21:06:57,915 ColumnFamilyStore.java (line 1011) Enqueuing flush of Memtable-LocationInfo@885597447(29/36 serialized/live bytes, 1 ops)
 INFO [FlushWriter:1] 2011-06-20 21:06:57,916 Memtable.java (line 237) Writing Memtable-LocationInfo@885597447(29/36 serialized/live bytes, 1 ops)
 INFO [FlushWriter:1] 2011-06-