Get cassandra SuperColumn only!

2010-09-16 Thread Saurabh Raje
Hi,

I have a cassandra datastore as follows:
key:{
 supercol (utf8) : {
   subcol (timuuid) : data
 }
}

Now, for a particular use case I want to do a slice on two levels: first
on supercols, then slice subcols from the selected supercols (mostly to
restrict the number of items fetched into memory). I have tried various
APIs and there doesn't seem to be a way to do this, because when I slice
supercols I get the subcols in the result too! Of course, I can add
another index as follows:

key : {
   supercol (utf8) : emptydata
}

I haven't looked at cassandra storage in too much detail, but I am hoping
there is a better solution!

Thanks in advance.
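The extra index CF sketched above can be modeled with in-memory maps standing in for the two column families. This is only an illustration of the two-level slice being asked for; the class, field, and method names are all hypothetical, and a real version would issue Cassandra slice calls instead of map lookups:

```java
import java.util.*;

public class SuperColIndexSketch {
    // Stand-in for the extra index CF: row key -> sorted super column names.
    static final Map<String, TreeSet<String>> indexCf = new HashMap<>();
    // Stand-in for the data CF: row key -> supercol -> (subcol -> data).
    static final Map<String, Map<String, Map<String, String>>> dataCf = new HashMap<>();

    // Two-level slice: take up to supercolCount names from the index CF,
    // then at most subcolLimit subcolumn values from each selected supercol.
    static List<String> twoLevelSlice(String key, int supercolCount, int subcolLimit) {
        List<String> out = new ArrayList<>();
        int supercolsTaken = 0;
        for (String sc : indexCf.getOrDefault(key, new TreeSet<>())) {
            if (supercolsTaken++ == supercolCount) break;
            Map<String, String> subcols = dataCf
                .getOrDefault(key, Collections.emptyMap())
                .getOrDefault(sc, Collections.emptyMap());
            int subcolsTaken = 0;
            for (String value : subcols.values()) {
                if (subcolsTaken++ == subcolLimit) break;
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        indexCf.put("k", new TreeSet<>(Arrays.asList("a", "b", "c")));
        Map<String, Map<String, String>> row = new HashMap<>();
        row.put("a", new TreeMap<>(Collections.singletonMap("t1", "v1")));
        row.put("b", new TreeMap<>(Collections.singletonMap("t2", "v2")));
        dataCf.put("k", row);
        System.out.println(twoLevelSlice("k", 2, 1)); // values for supercols a and b only
    }
}
```

The point of the index CF is that the first pass touches only super column names; memory use is then bounded by supercolCount * subcolLimit regardless of how wide each super column is.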


0.7 live schema updates

2010-09-16 Thread Marc Canaleta
Hi!

I like the new feature of live schema updates. You can add, drop and
rename column families and keyspaces via Thrift, but how do you modify
column family attributes like key_cache_size or rows_cached?

Thank you.


Re: 0.7 live schema updates

2010-09-16 Thread Oleg Anastasyev
You can change these attrs using JMX interface. Take a look at
org.apache.cassandra.tools.NodeProbe setCacheCapacities method.



busy thread on IncomingStreamReader

2010-09-16 Thread Joseph Mermelstein
Hi - has anyone made any progress with this issue? We are having the same
problem with our Cassandra nodes in production. At some point a node (and
sometimes all 3) will jump to 100% CPU usage and stay there for hours until
restarted. Stack traces reveal several threads in a seemingly endless loop
doing this:

"Thread-21770" - Thread t...@25278
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileChannelImpl.size0(Native Method)
at sun.nio.ch.FileChannelImpl.size(Unknown Source)
- locked java.lang.obj...@7a2c843d
at sun.nio.ch.FileChannelImpl.transferFrom(Unknown Source)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)


My understanding from reading the code is that this trace shows a thread
belonging to the StreamingService which is writing an incoming stream to
disk. There seems to be some kind of bizarre problem causing the
FileChannel.size() call to spin with high CPU.

Also, this problem is not easy to replicate - so I would appreciate any
information on how the StreamingService works and what triggers it to
transfer these file streams.

Thanks,

Joseph Mermelstein
LivePerson http://solutions.liveperson.com





 Hi all,

 We set up two nodes and simply set replication factor=2 for a test run.

 After both nodes, say node A and node B, had served for several hours, we
 found that node A always keeps about 300% CPU usage (the other node is
 under 100% CPU, which is normal).

 A thread dump on node A shows that there are 3 busy threads related to
 IncomingStreamReader:

 ==

 "Thread-66" prio=10 tid=0x2aade4018800 nid=0x69e7 runnable [0x4030a000]
    java.lang.Thread.State: RUNNABLE
 at sun.misc.Unsafe.setMemory(Native Method)
 at sun.nio.ch.Util.erase(Util.java:202)
 at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
 at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
 at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

 "Thread-65" prio=10 tid=0x2aade4017000 nid=0x69e6 runnable [0x4d44b000]
    java.lang.Thread.State: RUNNABLE
 at sun.misc.Unsafe.setMemory(Native Method)
 at sun.nio.ch.Util.erase(Util.java:202)
 at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
 at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
 at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

 "Thread-62" prio=10 tid=0x2aade4014800 nid=0x4150 runnable [0x4d34a000]
    java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.FileChannelImpl.size0(Native Method)
 at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:309)
 - locked 0x2aaac450dcd0 (a java.lang.Object)
 at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:597)
 at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

 ===


 Has anyone experienced a similar issue?

 environments:

 OS   --- CentOS 5.4, Linux 2.6.18-164.15.1.el5 SMP x86_64 GNU/Linux
 Java --- build 1.6.0_16-b01, Java HotSpot(TM) 64-Bit Server VM (build
 14.2-b01, mixed mode)


 Cassandra --- 0.6.0
 Node configuration --- node A and node B; both nodes use node A as the seed.
 client --- Java Thrift clients pick one node randomly for reads and writes.


 --
 Ingram Chen
 online share order: http://dinbendon.net


 blog: http://www.javaworld.com.tw/roller/page/ingramchen





Getting client only example to work

2010-09-16 Thread Asif Jan


Hi

I am using 0.7.0-beta1 , and trying to get the contrib/client_only  
example to work.


I am running cassandra on host1, and trying to access it from host2.

When using Thrift (via cassandra-cli) and in my application, I am able
to connect and do all operations as expected.

But I am not able to connect to Cassandra when using the code in
client_only (or, for that matter, using contrib/bmt_example). Since my
test requires bulk insertion of about 1.4 TB of data, I need to use a
non-Thrift interface.


The error that I am getting is as follows (the keyspace and the column
family exist and can be used via Thrift):


10/09/16 12:35:31 INFO config.DatabaseDescriptor: DiskAccessMode  
'auto' determined to be mmap, indexAccessMode is mmap

10/09/16 12:35:31 INFO service.StorageService: Starting up client gossip
Exception in thread "main" java.lang.IllegalArgumentException: Unknown
ColumnFamily Standard1 in keyspace Keyspace1
	at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:1009)
	at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:418)
	at gaia.cu7.cassandra.input.Ingestor.testWriting(Ingestor.java:103)
	at gaia.cu7.cassandra.input.Ingestor.main(Ingestor.java:187)

I am using the following code (from the client_only example), also
passing the JVM parameter -Dstorage-config=path_2_cassandra.yaml:




public static void main(String[] args) throws Exception {
    System.setProperty("storage-config", "cassandra.yaml");
    testWriting();
}

// from client_only example
private static void testWriting() throws Exception
{
    StorageService.instance.initClient();
    // sleep for a bit so that gossip can do its thing.
    try
    {
        Thread.sleep(1L);
    }
    catch (Exception ex)
    {
        throw new AssertionError(ex);
    }

    // do some writing.
    final AbstractType comp = ColumnFamily.getComparatorFor("Keyspace1", "Standard1", null);

    for (int i = 0; i < 100; i++)
    {
        RowMutation change = new RowMutation("Keyspace1", ("key" + i).getBytes());
        ColumnPath cp = new ColumnPath("Standard1").setColumn("colb".getBytes());
        change.add(new QueryPath(cp), ("value" + i).getBytes(), new TimestampClock(0));

        // don't call change.apply().  The reason is that it makes a static
        // call into Table, which will perform local storage initialization,
        // which creates local directories.
        // change.apply();

        StorageProxy.mutate(Arrays.asList(change));
        System.out.println("wrote key" + i);
    }
    System.out.println("Done writing.");
    StorageService.instance.stopClient();
}








RE: 0.7 live schema updates

2010-09-16 Thread Viktor Jevdokimov
But you'll lose these settings after a Cassandra restart.

-Original Message-
From: Oleg Anastasyev [mailto:olega...@gmail.com] 
Sent: Thursday, September 16, 2010 11:21 AM
To: user@cassandra.apache.org
Subject: Re: 0.7 live schema updates

You can change these attrs using JMX interface. Take a look at
org.apache.cassandra.tools.NodeProbe setCacheCapacities method.




Indexing & Locking in Cassandra

2010-09-16 Thread Sandor Molnar
Hello,

I have a few questions about indexing and locking in Cassandra:
- if I understood well, only row-level indexing exists prior to v0.7. I mean 
only the primary keys are indexed. Is that true?
- is it possible to use composite primary keys? For instance I have a user 
object: User(name, birthday, gender, address) and I want to have the 
(name, birthday) columns as the PK. Can I do that? If yes, how?
- does Cassandra support CF (table) level locking? Could someone explain / 
provide a link how?

Thanks in advance,
Sandor


Re: Indexing & Locking in Cassandra

2010-09-16 Thread Juho Mäkinen
Hello,

 I have a few questions about indexing and locking in Cassandra:
 - if I understood well only row level indexing exists prior to v0.7. I mean 
 only the primary keys are indexed. Is that true?

Yes and no. The row key is what you use to fetch the row
from cassandra. There are methods to iterate through rows, but that's not
efficient and should be used only in batch operations. Columns inside
rows are sorted by their names, so they are also indexed: you use the
column name to fetch the contents of the column. If you want to index
data in other ways you need to write your own application code to
maintain such indexes; the upcoming 0.7 version will bring some
handy features which make that job much easier.

 - is it possible to use composite primary keys? For instance I have a user 
 object: User(name,birthday,gender,address) and I want to have the 
 (name,birthday) columns as PK. Can I do? If yes, how?

You can always create your row key as a string like $name_$birthday.
Does this answer your question?
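That concatenation approach can be sketched as below. The separator is the subtle part: if "_" can occur inside a name, the pairs ("a_b", "c") and ("a", "b_c") both produce the key "a_b_c". A length prefix avoids the collision. The class and method names here are illustrative only:

```java
public class CompositeKeySketch {
    // Naive composite key, as suggested: fine only when '_' can never
    // appear inside the name itself.
    static String naiveKey(String name, String birthday) {
        return name + "_" + birthday;
    }

    // Safer variant: length-prefix the first part so the key always
    // parses back unambiguously.
    static String prefixedKey(String name, String birthday) {
        return name.length() + ":" + name + ":" + birthday;
    }

    public static void main(String[] args) {
        System.out.println(naiveKey("jsmith", "1980-01-02"));    // jsmith_1980-01-02
        System.out.println(prefixedKey("jsmith", "1980-01-02")); // 6:jsmith:1980-01-02
    }
}
```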

 - does Cassandra support CF (table) level locking? Couls someone explain 
 me/provide a link how?

No, cassandra doesn't have any locking capabilities. You can always
use an external locking mechanism like ZooKeeper
[http://hadoop.apache.org/zookeeper/] or implement your own solution
on top of cassandra (not recommended, as it's quite hard to get
right).

 - Juho Mäkinen / Garo


RE: Indexing & Locking in Cassandra

2010-09-16 Thread Sandor Molnar

Thanks for your fast answer.
Regarding the composite keys: that's what I thought by default; I just needed 
some confirmation. Unfortunately I cannot use this approach in our application, 
so I will figure out something else.
I will check out ZooKeeper to see if I can use it.

Thanks again!

Hello,

 I have a few questions about indexing and locking in Cassandra:
 - if I understood well only row level indexing exists prior to v0.7. I mean 
 only the primary keys are indexed. Is that true?

Yes and no. The row key is what you use to fetch the row
from cassandra. There are methods to iterate through rows, but that's not
efficient and should be used only in batch operations. Columns inside
rows are sorted by their names, so they are also indexed: you use the
column name to fetch the contents of the column. If you want to index
data in other ways you need to write your own application code to
maintain such indexes; the upcoming 0.7 version will bring some
handy features which make that job much easier.

 - is it possible to use composite primary keys? For instance I have a user 
 object: User(name,birthday,gender,address) and I want to have the 
 (name,birthday) columns as PK. Can I do? If yes, how?

You can always create your row key as a string like $name_$birthday.
Does this answer your question?

 - does Cassandra support CF (table) level locking? Couls someone explain 
 me/provide a link how?

No, cassandra doesn't have any locking capabilities. You can always
use an external locking mechanism like ZooKeeper
[http://hadoop.apache.org/zookeeper/] or implement your own solution
on top of cassandra (not recommended, as it's quite hard to get
right).

 - Juho Mäkinen / Garo




Re: Getting client only example to work

2010-09-16 Thread Gary Dusbabek
I discovered some problems with the fat client earlier this week when
I tried using it.  It needs some fixes to keep up with all the 0.7
changes.

Gary.

On Thu, Sep 16, 2010 at 05:48, Asif Jan asif@gmail.com wrote:

 Hi
 I am using 0.7.0-beta1 , and trying to get the contrib/client_only example
 to work.
 I am running cassandra on host1, and trying to access it from host2.
 When using Thrift (via cassandra-cli) and in my application, I am able to
 connect and do all operations as expected.
 But I am not able to connect to Cassandra when using the code in client_only
 (or, for that matter, using contrib/bmt_example). Since my test requires
 bulk insertion of about 1.4 TB of data, I need to use a non-Thrift
 interface.
 The error that I am getting is as follows (the keyspace and the column family
 exist and can be used via Thrift):
 10/09/16 12:35:31 INFO config.DatabaseDescriptor: DiskAccessMode 'auto'
 determined to be mmap, indexAccessMode is mmap
 10/09/16 12:35:31 INFO service.StorageService: Starting up client gossip
 Exception in thread main java.lang.IllegalArgumentException: Unknown
 ColumnFamily Standard1 in keyspace Keyspace1
 at
 org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:1009)
 at
 org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:418)
 at gaia.cu7.cassandra.input.Ingestor.testWriting(Ingestor.java:103)
 at gaia.cu7.cassandra.input.Ingestor.main(Ingestor.java:187)
 I am using the following code (from client_only example) (also passing JVM
 parameter -Dstorage-config=path_2_cassandra.yaml)


 public static void main(String[] args) throws Exception {
     System.setProperty("storage-config", "cassandra.yaml");
     testWriting();
 }

 // from client_only example
 private static void testWriting() throws Exception
 {
     StorageService.instance.initClient();
     // sleep for a bit so that gossip can do its thing.
     try
     {
         Thread.sleep(1L);
     }
     catch (Exception ex)
     {
         throw new AssertionError(ex);
     }
     // do some writing.
     final AbstractType comp = ColumnFamily.getComparatorFor("Keyspace1", "Standard1", null);
     for (int i = 0; i < 100; i++)
     {
         RowMutation change = new RowMutation("Keyspace1", ("key" + i).getBytes());
         ColumnPath cp = new ColumnPath("Standard1").setColumn("colb".getBytes());
         change.add(new QueryPath(cp), ("value" + i).getBytes(), new TimestampClock(0));
         // don't call change.apply().  The reason is that it makes a static
         // call into Table, which will perform local storage initialization,
         // which creates local directories.
         // change.apply();
         StorageProxy.mutate(Arrays.asList(change));
         System.out.println("wrote key" + i);
     }
     System.out.println("Done writing.");
     StorageService.instance.stopClient();
 }








Re: 0.7 live schema updates

2010-09-16 Thread Gary Dusbabek
beta-2 will include the ability to set these values and others.  Look
for the system_update_column_family() and system_update_keyspace()
methods.

Gary.

On Thu, Sep 16, 2010 at 02:38, Marc Canaleta mcanal...@gmail.com wrote:
 Hi!
 I like the new feature of making live schema updates. You can add, drop and
 rename columns and keyspaces via thrift, but how do you modify column
 attributes like key_cache_size or rows_cached?
 Thank you.


Re: Build an index for join query

2010-09-16 Thread Rock, Paul
Alvin - assuming I understand what you're after correctly, why not make a CF 
Name_Address(name, address). Modifying the Cassandra methods to do the join 
you describe seems like overkill to me...

-Paul

On Sep 15, 2010, at 7:34 PM, Alvin UW wrote:

Hello,

I am going to build an index to join two CFs.
First, we see this index as a CF/SCF. The difference is I don't materialise it.
Assume we have two tables:
ID_Address(Id, address) ,  Name_ID(name, id)
Then,the index is: Name_Address(name, address)

When the application tries to query Name_Address, the value of name is 
given by the application.
I want to direct the read operation to Name_ID to get the id value, then go to 
ID_Address to get the address value by that id. So far, I consider only the read 
operation.
This way, the join query is transparent to the user.
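The two-step lookup described above can be sketched client-side, with plain maps standing in for the Name_ID and ID_Address CFs. The class and helper names are made up for illustration; a real version would issue two get() calls against Cassandra instead of map lookups:

```java
import java.util.HashMap;
import java.util.Map;

public class JoinIndexSketch {
    // Stand-ins for the Name_ID and ID_Address column families.
    static final Map<String, String> nameToId = new HashMap<>();
    static final Map<String, String> idToAddress = new HashMap<>();

    // Resolve name -> id -> address; the caller sees a single logical
    // "Name_Address" query even though two reads happen underneath.
    static String addressByName(String name) {
        String id = nameToId.get(name);
        return id == null ? null : idToAddress.get(id);
    }

    public static void main(String[] args) {
        nameToId.put("jsmith", "42");
        idToAddress.put("42", "12 Main St");
        System.out.println(addressByName("jsmith")); // 12 Main St
    }
}
```

Doing this in a client library keeps the server untouched; pushing it into the server (as the post goes on to ask about) would mean intercepting the same two reads inside the request-handling code.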

So I think I should find out which methods or classes are in charge of the read 
operation described above.
For example, exactly which server-side methods does the CLI operation get 
Keyspace1.Standard2['jsmith'] call?

I noted CassandraServer is used to listen to clients, and there are some 
methods such as get(), get_slice().
Is it the right place I can modify to implement my idea?

Thanks.

Alvin



Pb with memtable_throughput_in_mb?

2010-09-16 Thread Thomas Boucher
Hi,

I am trying out the latest trunk version and I get an error when
starting Cassandra with -Xmx3G:
Fatal error: memtable_operations_in_millions must be a positive double

I guess it is caused by line 76 in org/apache/cassandra/config/Config.java [0]:

public Integer memtable_throughput_in_mb = (int)
Runtime.getRuntime().maxMemory() / 8;

The cast to (int) is applied to maxMemory(), but this method returns a
long, so for -Xmx3G the value is truncated to a negative integer.
Thus memtable_operations_in_millions becomes negative (Double
memtable_operations_in_millions = memtable_throughput_in_mb / 64 *
0.3) and the exception is thrown.

maxMemory() is measured in bytes, but I guess memtable_throughput_in_mb
should be in MB (as its name implies), which is not the case here.
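The truncation is easy to reproduce in isolation. This sketch simulates a 3 GB heap instead of reading the real maxMemory(); the "fixed" expression is just one possible correction, not necessarily what the committed patch does:

```java
public class CastOverflowDemo {
    public static void main(String[] args) {
        long maxMemory = 3L * 1024 * 1024 * 1024; // simulate -Xmx3G, in bytes

        // Buggy: the (int) cast binds tighter than the division, so the long
        // is truncated to int first and overflows for any heap >= 2 GB.
        int buggy = (int) maxMemory / 8;

        // One possible fix: convert bytes to megabytes in long arithmetic
        // first, then cast the already-small result.
        int fixed = (int) (maxMemory / 1024 / 1024 / 8);

        System.out.println(buggy); // -134217728 (negative: triggers the fatal error)
        System.out.println(fixed); // 384 (MB)
    }
}
```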


What do you think?

Thanks for any input you have to this,
Cheers

[0] 
http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/config/Config.java


Re: Pb with memtable_throughput_in_mb?

2010-09-16 Thread Brandon Williams
On Thu, Sep 16, 2010 at 11:00 AM, Thomas Boucher ethx...@gmail.com wrote:

 Hi,

 I am trying out the latest trunk version and I get an error when
 starting Cassandra with -Xmx3G:
 Fatal error: memtable_operations_in_millions must be a positive double

 I guess it is caused by line 76 in org/apache/cassandra/config/Config.java
 [0]:

public Integer memtable_throughput_in_mb = (int)
 Runtime.getRuntime().maxMemory() / 8;

 The cast to (int) is done on maxMemory() but this method returns a
 long, leading to a cast to a negative integer for mem=3G for instance.
 Thus memtable_operations_in_millions becomes negative (Double
 memtable_operations_in_millions = memtable_throughput_in_mb / 64 *
 0.3) and the exception is thrown:

 maxMemory() is measured in bytes but I guess memtable_throughput_in_mb
 should in MB (as it names imply), which is not the case here.


Oops, good catch.  Fixed in r997841.

 -Brandon


Building a Ubuntu / Debian package for Cassandra

2010-09-16 Thread Francois Richard
Guys,

I am trying to build a Debian package in order to deploy Cassandra 0.6.5 on 
Ubuntu.  I see that you have a ./debian directory in the source builds; do you 
have a bit more background on how it is used and built?

P.S. I am new to Ubuntu/Debian packaging so any type of pointer will help.

Thanks,

FR



Francois Richard


Re: Building a Ubuntu / Debian package for Cassandra

2010-09-16 Thread Clint Byrum
Hello Francois, 

There are already .debs available here:

http://wiki.apache.org/cassandra/DebianPackaging

I've also setup a PPA to build the packages on Ubuntu here:

https://launchpad.net/~cassandra-ubuntu/+archive/stable

It's currently still at v0.6.4, but I am in the process of uploading 0.6.5 as I 
write this email.

The .debs are nearly identical. The only difference is that I've packaged the 
jars necessary to build, so that you get exactly the same versions of all 
libraries if you need to patch and repeat the build. Also, these are built 
specifically for Ubuntu releases, so if we find any incompatibilities between 
Debian/Ubuntu we can fix them for Ubuntu users. 

I hope this helps!

On Sep 16, 2010, at 10:30 AM, Francois Richard wrote:

 Guys,
  
 I am trying to build a debian package in order to deploy Cassandra 0.6.5 on 
 Ubuntu.  I see that you have a ./debian directory in the source builds, do 
 you have a bit more background on how it is used and build?
  
 P.S. I am new to Ubuntu/Debian packaging so any type of pointer will help.
  
 Thanks,
  
 FR
  
  
  
 Francois Richard



Re: Get cassandra SuperColumn only!

2010-09-16 Thread Aaron Morton
AFAIK there is no way to get a list of the super columns without also
getting the sub columns. I do not know if there is a technical reason that
would prevent this from being added.

In general it's more efficient to make one request that pulls back more
data than two or more that pull back just enough data. But you also want
to design to answer the queries you need to make.

Keeping an index of super column names in another CF does not sound too
bad. It might pay to take another look at why you are using a super CF. It
may be better to use two standard CFs if, say, you want to have one sort
of request that gets a list of things, and another sort of request that
gets the details for a number of things.

Aaron

On 16 Sep, 2010, at 07:25 PM, Saurabh Raje saur...@webaroo.com wrote:

Hi,

I have a cassandra datastore as follows:

key:{
 supercol (utf8) : {
   subcol (timuuid) : data
 }
}

Now, for a particular use case I want to do a slice on 2 levels: first on
supercols, then slice subcols from the selected supercols (mostly to
restrict the number of items fetched into memory). I have tried various
APIs and there doesn't seem to be a way to do this, because when I slice
supercols I get the subcols in the result too! Of course, I can add
another index as follows:

key : {
 supercol (utf8) : emptydata
}

I haven't looked at cassandra storage in too much detail, but hoping
there is a better solution!

Thanks in advance


RE: Building a Ubuntu / Debian package for Cassandra

2010-09-16 Thread Francois Richard
Thanks Clint,

I am going to look up the links below. I am pretty new to DEB packaging in 
general, and from what I have seen so far a lot of the tutorials on the web are 
based on the classic [ ./configure | make | make install ] flow of an application 
built in C.  In this case I wanted to figure out DEB packaging in the 
context of a Java application.  I'll read more and will stay in touch.

My goal at the end of the day, is to install  the stock package for Cassandra 
and then to create a special Cassandra-config package that would move and 
deploy my customized configuration files on the system.


Thanks,

FR

-Original Message-
From: Clint Byrum [mailto:cl...@ubuntu.com] 
Sent: Thursday, September 16, 2010 10:54 AM
To: user@cassandra.apache.org
Subject: Re: Building a Ubuntu / Debian package for Cassandra

Hello Francois, 

There are already .debs available here:

http://wiki.apache.org/cassandra/DebianPackaging

I've also setup a PPA to build the packages on Ubuntu here:

https://launchpad.net/~cassandra-ubuntu/+archive/stable

Its currently still at v0.6.4, but I am in the process of uploading 0.6.5 as I 
write this email..

The .debs are nearly identical. The only difference is that I've packaged the 
jars necessary to build, so that you get the same exact versions of all 
libraries if you need to patch + repeat the build. Also, these are built 
specifically for Ubuntu releases, so if we find any incompatibilities between 
debian/ubuntu we can fix them for ubuntu users. 

I hope this helps!

On Sep 16, 2010, at 10:30 AM, Francois Richard wrote:

 Guys,
  
 I am trying to build a debian package in order to deploy Cassandra 0.6.5 on 
 Ubuntu.  I see that you have a ./debian directory in the source builds, do 
 you have a bit more background on how it is used and build?
  
 P.S. I am new to Ubuntu/Debian packaging so any type of pointer will help.
  
 Thanks,
  
 FR
  
  
  
 Francois Richard



Re: Getting client only example to work

2010-09-16 Thread Asif Jan
OK, did something about the message service change in the initClient
method? Essentially, it seems one can no longer call initClient when a
cassandra instance is running on the same machine.


thanks
On Sep 16, 2010, at 3:48 PM, Gary Dusbabek wrote:


I discovered some problems with the fat client earlier this week when
I tried using it.  It needs some fixes to keep up with all the 0.7
changes.

Gary.

On Thu, Sep 16, 2010 at 05:48, Asif Jan asif@gmail.com wrote:


Hi
I am using 0.7.0-beta1, and trying to get the contrib/client_only example
to work.
I am running cassandra on host1, and trying to access it from host2.
When using Thrift (via cassandra-cli) and in my application, I am able to
connect and do all operations as expected.
But I am not able to connect to Cassandra when using the code in client_only
(or, for that matter, using contrib/bmt_example). Since my test requires
bulk insertion of about 1.4 TB of data, I need to use a non-Thrift
interface.
The error that I am getting is as follows (the keyspace and the column family
exist and can be used via Thrift):
10/09/16 12:35:31 INFO config.DatabaseDescriptor: DiskAccessMode 'auto'
determined to be mmap, indexAccessMode is mmap
10/09/16 12:35:31 INFO service.StorageService: Starting up client gossip
Exception in thread "main" java.lang.IllegalArgumentException: Unknown
ColumnFamily Standard1 in keyspace Keyspace1
at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:1009)
at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:418)
at gaia.cu7.cassandra.input.Ingestor.testWriting(Ingestor.java:103)
at gaia.cu7.cassandra.input.Ingestor.main(Ingestor.java:187)
I am using the following code (from the client_only example), also passing
the JVM parameter -Dstorage-config=path_2_cassandra.yaml:

public static void main(String[] args) throws Exception {
    System.setProperty("storage-config", "cassandra.yaml");
    testWriting();
}

// from client_only example
private static void testWriting() throws Exception
{
    StorageService.instance.initClient();
    // sleep for a bit so that gossip can do its thing.
    try
    {
        Thread.sleep(1L);
    }
    catch (Exception ex)
    {
        throw new AssertionError(ex);
    }
    // do some writing.
    final AbstractType comp = ColumnFamily.getComparatorFor("Keyspace1", "Standard1", null);
    for (int i = 0; i < 100; i++)
    {
        RowMutation change = new RowMutation("Keyspace1", ("key" + i).getBytes());
        ColumnPath cp = new ColumnPath("Standard1").setColumn("colb".getBytes());
        change.add(new QueryPath(cp), ("value" + i).getBytes(), new TimestampClock(0));
        // don't call change.apply().  The reason is that it makes a static
        // call into Table, which will perform local storage initialization,
        // which creates local directories.
        // change.apply();
        StorageProxy.mutate(Arrays.asList(change));
        System.out.println("wrote key" + i);
    }
    System.out.println("Done writing.");
    StorageService.instance.stopClient();
}










Re: Bootstrapping stays stuck

2010-09-16 Thread Gurpreet Singh
Thanks to driftx from cassandra IRC channel for helping out.
This was resolved by increasing the rpc timeout for the bootstrap process.

On Wed, Sep 15, 2010 at 11:43 AM, Gurpreet Singh
gurpreet.si...@gmail.comwrote:

 This problem remains unresolved despite numerous restarts of the
 cluster. I can't seem to find a way out of this one, and I am not really
 looking for a workaround; I kind of need this to work if I am going to
 production.

 Turned on ALL logging in log4j, and now I see the following exception
 (EOFException) on the destination. After receiving each file, it seems to
 throw this exception. The transfer is successful except for this
 exception. The source successfully declares the transfer complete, but the
 destination does not move out of bootstrapping mode and just sits
 there.

 DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line
 65) Receiving stream: finished reading chunk, awaiting more
 DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line
 87) Removing stream context
 /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Index.db:522051369
 DEBUG [Thread-15] 2010-09-15 10:56:59,767 StreamCompletionHandler.java
 (line 73) Sending a streaming finished message with
 org.apache.cassandra.streaming.completedfilesta...@54828e7 to IP1
 TRACE [Thread-15] 2010-09-15 10:56:59,769 IncomingTcpConnection.java (line
 82) eof reading from socket; closing
 java.io.EOFException
 at java.io.DataInputStream.readInt(Unknown Source)
 at
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
 DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line
 51) Receiving stream
 DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line
 54) Creating file for
 /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db
 DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line
 65) Receiving stream: finished reading chunk, awaiting more
 DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line
 87) Removing stream context
 /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db:7489045
 DEBUG [Thread-16] 2010-09-15 10:56:59,876 StreamCompletionHandler.java
 (line 73) Sending a streaming finished message with
 org.apache.cassandra.streaming.completedfilesta...@7b41a32f to IP1
 TRACE [Thread-16] 2010-09-15 10:56:59,876 IncomingTcpConnection.java (line
 82) eof reading from socket; closing
 java.io.EOFException
 at java.io.DataInputStream.readInt(Unknown Source)
 at
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)

 /G

 On Tue, Sep 14, 2010 at 11:40 AM, Gurpreet Singh gurpreet.si...@gmail.com
  wrote:

 Hi Vineet,
 I have tracked the nodetool streams to completion each time. Below are the
 logs on the source and destination nodes. There are 3 sstables being
 transferred, and the transfer seems to be successful. However, after the
 streams finish, the source prints out messages about dropped messages,
 which may point to the problem. Ideas? I checked that port 7000 is open for
 communication. 9160 is not up on the node being bootstrapped, but that comes
 up after the node is bootstrapped; is that right?

 Thanks a ton,
 /G

 *Logs on the source node (IP2):*
 *
 *
 INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 79)
 Flushing memtables for userdata...
  INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 95)
 Performing anticompaction ...
  INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.java
 (line 339) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-Data.db')]
  INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line 129)
 GC for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max is
 4415946752
  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,508 CompactionManager.java
 (line 396) AntiCompacted to
 /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db.
  49074138589/36770836242 bytes for 5990912 keys.  Time: 1438607ms.
  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.java
 (line 339) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')]
  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.java
 (line 396) AntiCompacted to
 /data/mysql/cassandrastorage/userdata/stream/user-24-Data.db.
  28185244/21126422 bytes for 47722 keys.  Time: 2310ms.
  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.java
 (line 339) AntiCompacting
 [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db')]
  INFO [COMPACTION-POOL:1] 2010-09-14 10:21:08,606 CompactionManager.java
 (line 396) AntiCompacted to
 

questions on cassandra (repair and multi-datacenter)

2010-09-16 Thread Gurpreet Singh
Hi,

I have a few questions and was looking for an answer.
I have a cluster of 7 Cassandra 0.6.5 nodes in my test setup. RF=2. Original
data size is about 100 gigs, with RF=2, i see the total load on the cluster
is about 200 gigs, all good.

1.  I was looking to increase the RF to 3. This process entails changing the
config and calling repair on the keyspace one node at a time, right?
So, I started with one node at a time: I changed the config file for the
keyspace on the first node, restarted the node, and then called a nodetool
repair on it. I followed these same steps for every node after that, as I
read somewhere that repair should be invoked one node at a time.
(a) What is the best way to ascertain that the repair has completed on a node?
(b) After the repair finished, I was expecting the total data load to be
300 gigs. However, the ring command shows the total load to be 370 gigs. I
double-checked, and the config on all machines says RF=3. I am calling a
cleanup on each node right now. Is a cleanup required after calling a
repair? Am I missing something?


2. This question is regarding multi-datacenter support. I plan to have a
cluster of 6 machines across 2 datacenters, with the machines from the two
datacenters alternating on the ring. RF=3 is the plan. I already have a test
setup as described above, which has most of the data, but it is still
configured with the default RackUnAware strategy. I was hoping to find the
right steps to move it to the RackAware strategy with the
PropertyFileEndpointSnitch that I read about somewhere (not sure if that is
supported in 0.6.5, but CustomEndPointSnitch is the same, right?), all this
without having to repopulate any data.
Currently there is only one datacenter, but I was still planning to set the
cluster up as it would be for multi-datacenter support and run it like that
in the one datacenter; when the second datacenter comes up, I would just copy
all the files across to the new nodes in the second datacenter and bring
the whole cluster up. Will this work? I have tried copying files to a new
node, shutting down all nodes, and bringing everything back up, and it
recognized the new IPs.
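
For reference, a property-file-based snitch is typically driven by a file
mapping every node's IP to a datacenter and rack. The addresses and names
below are made up, and the exact file name and key format depend on the
snitch implementation you end up using, so treat this only as a sketch of
the idea:

```
# Hypothetical datacenter/rack assignments; all addresses are made up.
# Format: IP=datacenter:rack (the format used by later PropertyFileSnitch versions).
10.0.1.1=DC1:RAC1
10.0.1.2=DC1:RAC1
10.0.1.3=DC1:RAC2
10.0.2.1=DC2:RAC1
10.0.2.2=DC2:RAC1
10.0.2.3=DC2:RAC2
# Fallback for nodes not listed above:
default=DC1:RAC1
```

The same file must be kept identical on every node, since all nodes need to
agree on the topology when placing replicas.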


Thanks
Gurpreet


What thrift version does Cassandra 0.7 beta use?

2010-09-16 Thread Ying Tang
What thrift version does Cassandra 0.7 beta use?

-- 
Best regards,

Ivy Tang


Re: What thrift version does Cassandra 0.7 beta use?

2010-09-16 Thread Jeremy Hanna
It doesn't use a specific release version - it uses a specific Subversion
revision.  The revision number is appended to the thrift jar in the Cassandra
lib folder.
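
For example, if the lib folder contains a jar named something like
libthrift-r917130.jar (the revision number here is only illustrative; check
your own lib folder, e.g. with `ls lib/libthrift-*.jar`), the matching Thrift
revision can be read straight off the filename:

```shell
# Hypothetical jar name; substitute whatever your lib/ folder actually contains.
jar="libthrift-r917130.jar"
rev="${jar#libthrift-}"   # strip the "libthrift-" prefix
rev="${rev%.jar}"         # strip the ".jar" suffix
echo "$rev"               # prints: r917130
```

Checking out that exact revision of the Thrift source is the safest way to
generate client bindings that match the server.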

On Sep 16, 2010, at 9:10 PM, Ying Tang wrote:

 What thrift version does Cassandra 0.7 beta use?
 
 -- 
 Best regards,
 
 Ivy Tang
 
 
 



Re: What thrift version does Cassandra 0.7 beta use?

2010-09-16 Thread Ying Tang
So the thrift lib may change as Cassandra is updated?

On Thu, Sep 16, 2010 at 10:36 PM, Jeremy Hanna
jeremy.hanna1...@gmail.com wrote:

 It doesn't use a specific version - it uses a specific subversion revision.
  The revision number is appended to the thrift jar in the cassandra lib
 folder.

 On Sep 16, 2010, at 9:10 PM, Ying Tang wrote:

  What thrift version does Cassandra 0.7 beta use?
 
  --
  Best regards,
 
  Ivy Tang
 
 
 




-- 
Best regards,

Ivy Tang


Re: questions on cassandra (repair and multi-datacenter)

2010-09-16 Thread Benjamin Black
On Thu, Sep 16, 2010 at 3:19 PM, Gurpreet Singh
gurpreet.si...@gmail.com wrote:
 1.  I was looking to increase the RF to 3. This process entails changing the
 config and calling repair on the keyspace one at a time, right?
 So, I started with one node at a time, changed the config file on the first
 node for the keyspace, restarted the node. And then called a nodetool repair
 on the node.

You need to change the RF on _all_ nodes in the cluster _before_
running repair on _any_ of them.  If nodes disagree on which nodes
should have replicas for keys, repair will not work correctly.
Different RF for the same keyspace creates that disagreement.
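
In other words, the order of operations matters: config change everywhere
first, repair afterwards. A rough sketch of that sequence follows; the host
names are hypothetical, and the loops only echo the commands so you can
review them before running anything:

```shell
# Hypothetical node list for a 7-node cluster; substitute your own hosts.
NODES="cass1 cass2 cass3 cass4 cass5 cass6 cass7"

# 1. Edit the ReplicationFactor in the config on EVERY node, restarting each.
# 2. Only once all nodes agree on the new RF, repair one node at a time:
for h in $NODES; do
  echo nodetool --host "$h" repair    # drop 'echo' to actually run it
done

# 3. Afterwards, cleanup reclaims data each node no longer owns:
for h in $NODES; do
  echo nodetool --host "$h" cleanup
done
```

Running cleanup only after all repairs finish should also explain (and fix)
the 370-gig figure, since anticompaction and streaming leave behind data the
nodes no longer own.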


b