Re: Installing Thrift with Solandra

2011-06-08 Thread Krish Pan
you are trying to run solandra from resources directory,

follow these steps

1) don't use root - use a regular user
2) cd /tmp/
3) git clone git://github.com/tjake/Solandra.git
4) cd Solandra
5) ant

once you get BUILD SUCCESSFUL

6) cd solandra-app
7) ./start-solandra.sh



On Tue, Jun 7, 2011 at 10:29 PM, Jean-Nicolas Boulay Desjardins 
jnbdzjn...@gmail.com wrote:

 I found start-solandra.sh in the resources folder. But when I execute it, I
 still get an error.


 http://dl.dropbox.com/u/20599297/Screen%20shot%202011-06-08%20at%201.27.26%20AM.png


 Thanks again.

 On Tue, Jun 7, 2011 at 12:23 PM, Jean-Nicolas Boulay Desjardins 
 jnbdzjn...@gmail.com wrote:

 Ok

 So I have to install Thrift and Cassandra, then Solandra.

 I am asking because I followed the instructions in your Git page but I get
 this error:

 # cd solandra-app; ./start-solandra.sh

 -bash: ./start-solandra.sh: No such file or directory

 Thanks again :)

 On Tue, Jun 7, 2011 at 7:55 AM, Jake Luciani jak...@gmail.com wrote:

 This seems to be a common cause of confusion. Let me try again.

 Solandra doesn't integrate your Cassandra data into Solr. It simply
 provides a scalable backend for Solr by building on Cassandra. The
 inverted index lives in its own Cassandra keyspace.

 What you have in the end is two functionally different components
 (Cassandra and solr) in one logical service.

 Jake

 On Tuesday, June 7, 2011, Jean-Nicolas Boulay Desjardins
 jnbdzjn...@gmail.com wrote:
  I just saw a post you made on Stackoverflow, where you said:
  The Solandra project which is replacing Lucandra no longer uses
 thrift, only Solr.
 
  So I use Solr to access my data in Cassandra?
  Thanks again...
  On Tue, Jun 7, 2011 at 1:39 AM, Jean-Nicolas Boulay Desjardins 
 jnbdzjn...@gmail.com wrote:
  Thanks again :)
  Ok... But in the tutorial it says that I need to build a Thrift
 interface for Cassandra:
 
 
  ./compiler/cpp/thrift -gen php
 ../PATH-TO-CASSANDRA/interface/cassandra.thrift
  How do I do this?
  Where is the interface folder?
 
 
  Again, tjake, thanks a lot for your time and help.
  On Mon, Jun 6, 2011 at 11:13 PM, Jake Luciani jak...@gmail.com
 wrote:
  To access Cassandra in Solandra it's the same as regular Cassandra.  To
 access Solr you use one of the PHP Solr libraries
 http://wiki.apache.org/solr/SolPHP
 
 
 
 
 
  On Mon, Jun 6, 2011 at 11:04 PM, Jean-Nicolas Boulay Desjardins 
 jnbdzjn...@gmail.com wrote:
 
 
 
 
 
  I am trying to install Thrift with Solandra.
 
 
 
  Normally, when I just want to install Thrift with Cassandra, I follow
 this tutorial:
 https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP
 
 
 
 
 
 
 
  But how can I do the same for Solandra?
 
 
 
  Thrift with PHP...
  --
  Name / Nom: Boulay Desjardins, Jean-Nicolas
  Website / Site Web: www.jeannicolas.com
 
 

 --
 http://twitter.com/tjake




 --
 Name / Nom: Boulay Desjardins, Jean-Nicolas
 Website / Site Web: www.jeannicolas.com




 --
 Name / Nom: Boulay Desjardins, Jean-Nicolas
 Website / Site Web: www.jeannicolas.com



Misc Performance Questions

2011-06-08 Thread AJ


Is there a performance hit when dropping a CF?  What if it contains .5 
TB of data?  If not, is there a quick and painless way to drop a large 
amount of data w/minimal perf hit?


Is there a performance hit running multiple keyspaces on a cluster 
versus only one keyspace given a constant total data size?  Is there 
some quantity limit?


Using a Random Partitioner, but with a RF = 1, will the rows still be 
spread-out evenly on the cluster or will there be an affinity to a 
single node (like the one receiving the data from the client)?


I see a lot of mention of using RAID-0, but not RAID-5/6.  Why?  Even 
though Cass can tolerate a down node due to data loss, it would still be 
more efficient to just rebuild a bad hdd live, right?


Maybe perf related:  Will there be a problem having multiple keyspaces 
on a cluster all with different replication factors, from 1-3?


Thanks!


Re: how to retrieve data from supercolumns by phpcassa ?

2011-06-08 Thread amrita
Hi,
Can you please tell me how to create a supercolumn and retrieve data from it
using phpcassa?
student_details{id{sid,lesson_id,answers{time_expired,answer_opted}}}



Re: how to retrieve data from supercolumns by phpcassa ?

2011-06-08 Thread Sasha Dolgy
you'll find a response to this question on the phpcassa mailing list ...
where you asked the same question.
-sd

On Wed, Jun 8, 2011 at 10:22 AM, amrita amritajayakuma...@gmail.com wrote:

 Hi,
 Can you please tell me how to create a supercolumn and retrieve data from it
 using phpcassa?

 student_details{id{sid,lesson_id,answers{time_expired,answer_opted}}}



Re: Misc Performance Questions

2011-06-08 Thread Richard Low
Hi AJ,

On Wed, Jun 8, 2011 at 9:29 AM, AJ a...@dude.podzone.net wrote:

 Is there a performance hit when dropping a CF?  What if it contains .5 TB of
 data?  If not, is there a quick and painless way to drop a large amount of
 data w/minimal perf hit?

Dropping a CF is quick - it snapshots the files (which creates hard
links) and removes the CF definition.  To actually delete the data,
remove the snapshot files from your data directory.
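Richard's cleanup step can be tried safely against a scratch directory standing in for the real data directory (the default is typically /var/lib/cassandra/data; the keyspace, column family, and snapshot names below are invented for illustration):

```shell
# Scratch stand-in for the Cassandra data directory; adjust to your install.
DATA_DIR=$(mktemp -d)
KEYSPACE=MyKeyspace

# Simulate the snapshot that dropping a CF leaves behind (hard-linked files).
mkdir -p "$DATA_DIR/$KEYSPACE/snapshots/1307500000000-MyCF"
touch "$DATA_DIR/$KEYSPACE/snapshots/1307500000000-MyCF/MyCF-f-1-Data.db"

# To actually reclaim the disk space, delete the snapshot directory.
rm -rf "$DATA_DIR/$KEYSPACE/snapshots"

ls "$DATA_DIR/$KEYSPACE"   # empty: the snapshot (and its space) is gone
```

On a live node the same `rm -rf` against the real snapshots directory is all the "drop the data" step amounts to.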

 Is there a performance hit running multiple keyspaces on a cluster versus
 only one keyspace given a constant total data size?  Is there some quantity
 limit?

There is a tiny amount of memory used per keyspace, but unless you
have very many keyspaces you won't notice any impact of running
multiple keyspaces.

There is however a difference in running multiple column families
versus putting everything in the same column family and separating
them with e.g. a key prefix.  E.g. if you have a large data set and a
small one, it will be quicker to query the small one if it is in its
own column family.

 Using a Random Partitioner, but with a RF = 1, will the rows still be
 spread-out evenly on the cluster or will there be an affinity to a single
 node (like the one receiving the data from the client)?

The rows will be spread out the same way - RF=1 doesn't affect the
load balancing.
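For intuition: the RandomPartitioner derives a row's ring position from the MD5 of its key before replication is even considered, so placement is the same at any RF. A simplified stand-in for the real token computation (the actual partitioner interprets the digest as a large integer):

```shell
# The row key alone determines the token; the coordinator node the client
# happens to talk to, and the replication factor, play no part in it.
token() { printf '%s' "$1" | md5sum | cut -c1-32; }

token user1   # 32 hex digits, uniformly spread over the ring
token user2
```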

 I see a lot of mention of using RAID-0, but not RAID-5/6.  Why?  Even though
 Cass can tolerate a down node due to data loss, it would still be more
 efficient to just rebuild a bad hdd live, right?

There's a trade-off - RAID-0 will give better performance, but
rebuilds are over a network.  With RF > 1, RAID-0 is enough so that
you're unlikely to lose data, but as you say, replacing a failed
node will be slower.

 Maybe perf related:  Will there be a problem having multiple keyspaces on a
 cluster all with different replication factors, from 1-3?

No.

Richard.

-- 
Richard Low
Acunu | http://www.acunu.com | @acunu


Re: how to know there are some columns in a row

2011-06-08 Thread Patrick de Torcy
There is no reason for ambiguity...
We could add to the API another method call (similar to get_count):

get_columnNames

   list<string>
   get_columnNames(key, column_parent, predicate, consistency_level)

Get the column names present in column_parent within the predicate.

The method is not O(1). It takes all the columns from disk to calculate the
answer. The only benefit of the method is that you do not need to pull all
their values over the Thrift interface to get their names.

(just to get the idea...)

In fact column names can really be data in themselves, so there should be a
way to retrieve them (without their values). When you have big values, it's
a real show stopper to use get_slice, since a lot of unnecessary traffic
would be generated...

Forgive me if I am a little insistent, but it's important for us and I'm
sure we are not the only ones interested in this feature...

cheers


Data directories

2011-06-08 Thread Héctor Izquierdo Seliva

Hi,

Is there a way to control which sstables go to which data directory? I
have a fast but space-limited SSD and a much slower RAID, and I'd like
to put latency-sensitive data on the SSD and leave the other data on
the RAID. Is this possible? If not, how well does Cassandra play with
symlinks?



Re: Misc Performance Questions

2011-06-08 Thread AJ

Thank you Richard!

On 6/8/2011 2:57 AM, Richard Low wrote:
snip

There is however a difference in running multiple column families
versus putting everything in the same column family and separating
them with e.g. a key prefix.  E.g. if you have a large data set and a
small one, it will be quicker to query the small one if it is in its
own column family.



I assumed that a read would be O(1) for any size CF since Cass is 
implemented with hashmaps.  Do you know why size matters?  (forgive the pun)


Re: Misc Performance Questions

2011-06-08 Thread Richard Low
On Wed, Jun 8, 2011 at 12:30 PM, AJ a...@dude.podzone.net wrote:

 There is however a difference in running multiple column families
 versus putting everything in the same column family and separating
 them with e.g. a key prefix.  E.g. if you have a large data set and a
 small one, it will be quicker to query the small one if it is in its
 own column family.


 I assumed that a read would be O(1) for any size CF since Cass is
 implemented with hashmaps.  Do you know why size matters?  (forgive the pun)


You may not notice a difference, but it can happen.

For a query, each SSTable is queried.  If there is more data then
there are (most likely) more SSTables to query, slowing it down.  For
point queries, this isn't so bad because the Bloom filters will help,
but for range queries you will notice a big difference.  You will have
to do more seeks to seek over unwanted data.

It will also help buffer caching to separate them - the small SSTables
are more likely to remain in cache.

-- 
Richard Low
Acunu | http://www.acunu.com | @acunu


Re: Data directories

2011-06-08 Thread Jonathan Ellis
No. https://issues.apache.org/jira/browse/CASSANDRA-2749 is open to
track this but nobody is working on it to my knowledge.

Cassandra is fine with symlinks at the data directory level but I
don't think that helps you, since you really want to move the sstables
themselves. (Cassandra is NOT fine with symlinked sstable files, or
with any moving around of sstable files while it is running.)
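A minimal sketch of the supported arrangement, with scratch paths standing in for a real SSD mount and Cassandra home (stop the node before rearranging anything):

```shell
SSD=$(mktemp -d)    # stand-in for e.g. /mnt/ssd
CASS=$(mktemp -d)   # stand-in for /var/lib/cassandra

# Supported: point the whole data directory at the fast disk via a symlink.
mkdir -p "$SSD/cassandra-data"
ln -s "$SSD/cassandra-data" "$CASS/data"

readlink "$CASS/data"   # Cassandra follows directory-level symlinks fine
```

Symlinking individual sstable files inside the data directory, by contrast, is what Jonathan warns against.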

2011/6/8 Héctor Izquierdo Seliva izquie...@strands.com:

 Hi,

 Is there a way to control which sstables go to which data directory? I
 have a fast but space-limited SSD and a much slower RAID, and I'd like
 to put latency-sensitive data on the SSD and leave the other data on
 the RAID. Is this possible? If not, how well does Cassandra play with
 symlinks?





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Data directories

2011-06-08 Thread Héctor Izquierdo Seliva
On Wed, 2011-06-08 at 08:42 -0500, Jonathan Ellis wrote:
 No. https://issues.apache.org/jira/browse/CASSANDRA-2749 is open to
 track this but nobody is working on it to my knowledge.
 
 Cassandra is fine with symlinks at the data directory level but I
 don't think that helps you, since you really want to move the sstables
 themselves. (Cassandra is NOT fine with symlinked sstable files, or
 with any moving around of sstable files while it is running.)

I was planning on creating another keyspace and moving the slow sstables
there. Of course, everything would be done while the node is stopped.

Thanks for your help



Re: Installing Thrift with Solandra

2011-06-08 Thread Jean-Nicolas Boulay Desjardins
Krish Pan THANKS!

Also thank you for making build successful in uppercase :)

But it seems it is still not working.

This time, when I go into the solandra-app directory, I find start-solandra.sh,
and when I run ./start-solandra.sh I get this:

http://dl.dropbox.com/u/20599297/Screen%20shot%202011-06-08%20at%2011.00.15%20AM.png

And it just stays stuck there. Any ideas?

Thanks again.


-- 
Name / Nom: Boulay Desjardins, Jean-Nicolas
Website / Site Web: www.jeannicolas.com


Retrieving a column from a fat row vs retrieving a single row

2011-06-08 Thread Héctor Izquierdo Seliva
Hi,

I have an index I use to translate ids. I usually only read a column at
a time, and it's becoming a bottleneck. I could rewrite the application
to read a bunch at a time but it would make the application logic much
harder, as it would involve buffering incoming data.

As far as I know, to read a single column Cassandra will deserialize a
bunch of them and then pick the correct one (64KB of data, right?)

Would it be faster to have a row for each id I want to translate? This
would make keycache less effective, but the amount of data read should
be smaller.

Thanks!





Re: Multiple large disks in server - setup considerations

2011-06-08 Thread Edward Capriolo
On Wed, Jun 8, 2011 at 12:19 AM, AJ a...@dude.podzone.net wrote:

 On 6/7/2011 9:32 PM, Edward Capriolo wrote:
 snip


 I do not like large disk set-ups. I think they end up not being
 economical. Most low latency use cases want high RAM to DISK ratio.  Two
 machines with 32GB RAM is usually less expensive then one machine with 64GB
 ram.

 For a machine with 1TB drives (or multiple 1TB drives) it is going to be
 difficult to get enough RAM to help with random read patterns.

  Also, cluster operations like joining, decommissioning, or repair can take
  a *VERY* long time, maybe a day. More, smaller servers (blade style) are
  more agile.


 Is there some rule-of-thumb as to how much RAM is needed per GB of data?  I
 know it probably depends, but if you could try to explain the best you can
 that would be great!  I too am projecting big data requirements.



The way this is normally explained is the active set, i.e. you have 100,000,000
users but at any given time only 1,000,000 are active, thus you need enough
RAM to keep those users cached.

No, there is no rule of thumb; it depends on access patterns. In the most
extreme case you are using Cassandra for an ETL workload. In that case your
data will far exceed your RAM, and since most operations will be like a full
table scan, caching is almost hopeless and useless. On the other side there
are those who want every lookup to be predictable low latency over totally
random reads, and those might want to maintain a 1:1 ratio.

I would track these things over time:
reads/writes to c*
disk utilization
size of CF on disk
cache hit rate
latency

And eventually you find what your ratio is. For example:

last month:
i had 30 reads/sec
my disk was 40% utilized
my column family was 40 GB
my cache hit was 70%
my latency was 1ms

this month:
i had 45 reads/sec
my disk was 95% utilized
my column family was 40 GB
my cache hit was 30%
my latency was 5ms

Conclusion:
My disk is maxed and my cache hit rate is dropping. I probably need more
nodes or more RAM.
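A hedged sketch of capturing those numbers over time. The `nodetool cfstats` output is replaced here by a trimmed, illustrative sample (field names approximate those of 0.7/0.8); in practice you would pipe the live command instead of the heredoc:

```shell
# Illustrative stand-in for `nodetool -h localhost cfstats` output.
cfstats_sample() {
cat <<'EOF'
    Column Family: Users
    Read Count: 108000
    Read Latency: 1.042 ms.
    Space used (total): 42949672960
    Key cache hit rate: 0.71
EOF
}

# Pull out the numbers worth graphing month over month.
cfstats_sample | awk -F': ' '
  /Read Latency/         { print "latency_ms",    $2 }
  /Space used \(total\)/ { print "bytes_on_disk", $2 }
  /Key cache hit/        { print "cache_hit",     $2 }'
```

Run something like this from cron, append to a log, and the ratio Edward describes falls out of the trend lines.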


Re: Retrieving a column from a fat row vs retrieving a single row

2011-06-08 Thread Peter Schuller
 As far as I know, to read a single column cassandra will deserialize a
 bunch of them and then pick the correct one (64KB of data right?)

Assuming the default setting of 64kb, the average amount deserialized
given random column access should be 8 kb (not true with row cache,
but with large rows presumably you don't have row cache).

 Would it be faster to have a row for each id I want to translate? This
 would make keycache less effective, but the amount of data read should
 be smaller.

It depends on what bottlenecks you're optimizing for. A key is
expensive in the sense that it (1) increases the size of Bloom
filters for the column family, (2) increases the memory cost of
index sampling, and (3) increases the total data size (typically)
because the row key is duplicated in both the index and data files.

The cost of deserializing the same data repeatedly is CPU. So if
you're nowhere near bottlenecking on disk and the memory trade-off is
reasonable, it may be a suitable optimization. However, consider that
unless you're doing order-preserving partitioning, accessing those
rows will be effectively random w.r.t. the locations on disk you're
reading from, so you're adding a lot of overhead in terms of disk I/O
unless your data set fits comfortably in memory.

-- 
/ Peter Schuller


Re: Installing Thrift with Solandra

2011-06-08 Thread Krish Pan
Looks like it is running.

You can verify by running jps; it will show you a process named jar.

Try this:

cd ../reuters-demo
./1-download_data.sh
./2-import_data.sh
While data is loading, open the file ./website/index.html in your
favorite browser.






nosql yes but yescql, no?

2011-06-08 Thread SriSatish Ambati
Gotta love, Eric!
http://www.slideshare.net/jericevans/nosql-yes-but-yescql-no
https://twitter.com/#!/jericevans/status/78118651043127297
-- 
SriSatish Ambati
Director of Engineering, DataStax
@srisatish


Re: nosql yes but yescql, no?

2011-06-08 Thread Marcos Ortiz

On 06/08/2011 01:23 PM, SriSatish Ambati wrote:

Gotta love, Eric!
http://www.slideshare.net/jericevans/nosql-yes-but-yescql-no

--
SriSatish Ambati
Director of Engineering, DataStax
@srisatish





Good resource.
Thanks for sharing it with us, SriSatish

Regards

--
Marcos Luís Ortíz Valmaseda
 Software Engineer (UCI)
 http://marcosluis2186.posterous.com
 http://twitter.com/marcosluis2186




Re: nosql yes but yescql, no?

2011-06-08 Thread Jeffrey Kesselman
While I agree the Thrift API sucks, I'd love to see that solved at a
binary level, with CQL on top of that.

JK

On Wed, Jun 8, 2011 at 2:50 PM, Marcos Ortiz mlor...@uci.cu wrote:
 On 06/08/2011 01:23 PM, SriSatish Ambati wrote:

 Gotta love, Eric!
 http://www.slideshare.net/jericevans/nosql-yes-but-yescql-no

 --
 SriSatish Ambati
 Director of Engineering, DataStax
 @srisatish




 Good resource.
 Thanks for sharing it with us, SriSatish

 Regards

 --
 Marcos Luís Ortíz Valmaseda
  Software Engineer (UCI)
  http://marcosluis2186.posterous.com
  http://twitter.com/marcosluis2186





-- 
It's always darkest just before you are eaten by a grue.


Re: nosql yes but yescql, no?

2011-06-08 Thread Jeremy Hanna
I think that's partly the idea of it.  CQL could end up being the way forward, and
it currently builds on Thrift.  Then if it becomes the API/client of record to
build on, it could move to something else underneath that's more efficient,
and CQL itself wouldn't have to change at all.

On Jun 8, 2011, at 1:29 PM, Jeffrey Kesselman wrote:

 While I agree the Thrift API sucks, I'd love to see that solved at a
 binary level, with CQL on top of that.
 
 JK
 
 On Wed, Jun 8, 2011 at 2:50 PM, Marcos Ortiz mlor...@uci.cu wrote:
 On 06/08/2011 01:23 PM, SriSatish Ambati wrote:
 
 Gotta love, Eric!
 http://www.slideshare.net/jericevans/nosql-yes-but-yescql-no
 
 --
 SriSatish Ambati
 Director of Engineering, DataStax
 @srisatish
 
 
 
 
 Good resource.
 Thanks for sharing it with us, SriSatish
 
 Regards
 
 --
 Marcos Luís Ortíz Valmaseda
 Software Engineer (UCI)
 http://marcosluis2186.posterous.com
 http://twitter.com/marcosluis2186
 
 
 
 
 
 -- 
 It's always darkest just before you are eaten by a grue.



Re: Installing Thrift with Solandra

2011-06-08 Thread Jean-Nicolas Boulay Desjardins
Thanks again...

Here it gets a bit more complex.

I added Solandra to /tmp folder like you told me.

And the data also...

Everything seems to work.

The problem is I am running Solandra in a VM on my Mac OS X; the VM is Ubuntu
Server.

On that VM I have a DNS server... And one of my domain names is
jean-nicolas.name...

I put the website folder in it, to test out Solandra. The page loads, but no
content.

So I went into the code to find where it was getting the content.

Then I found reutors.js and in it was an address:
http://localhost:8983/solandra/reutors

So I used this command in the terminal (in my VM obviously):

curl http://localhost:8983/solandra/reutors

Then I got:

NOT FOUND... In some HTML code...

So it seems it cannot find the data...

Also it was followed by a series of <br/> tags with nothing between them...

So does that mean Solandra is not working? Or is it something else?

Thanks again for your time and help...

I know I am a bit slow :) Thanks again...


Re: Installing Thrift with Solandra

2011-06-08 Thread Jean-Nicolas Boulay Desjardins
Also, how can I back up the data that I loaded?

Because on the next reboot I am going to lose all the data that I loaded,
and as you know it takes time...

I tried to copy the Solandra folder to another folder outside /tmp...
But I am not sure that is enough.

Thanks!

On Wed, Jun 8, 2011 at 2:39 PM, Jean-Nicolas Boulay Desjardins 
jnbdzjn...@gmail.com wrote:

 Thanks again...

 Here it gets a bit more complex.

 I added Solandra to /tmp folder like you told me.

 And the data also...

 Everything seems to work.

 The problem is I am running Solandra in a VM on my Mac OS X the VM is
 Ubuntu Server.

 On that VM I have a DNS server... And one of my domain names is
 jean-nicolas.name...

 I put the website folder in it, to test out Solandra. The page loads, but
 no content.

 So I went into the code to find where it was getting the content.

 Then I found reutors.js and in it was an address:
 http://localhost:8983/solandra/reutors

 So I used this command in the terminal (in my VM obviously):

 curl http://localhost:8983/solandra/reutors

 Then I got:

 NOT FOUND... In some HTML code...

 So it seems it cannot find the data...

 Also it was followed by a series of <br/> tags with nothing between them...

 So does that mean Solandra is not working? Or is it something else?

 Thanks again for your time and help...

 I know I am a bit slow :) Thanks again...


 On Wed, Jun 8, 2011 at 12:55 PM, Krish Pan ceo.co...@gmail.com wrote:

 looks like it is running,

 you can verify by running jps; it will show you a process with the name jar

 try this,

 cd ../reuters-demo
 ./1-download_data.sh
 ./2-import_data.sh
 While data is loading, open the file ./website/index.html in your favorite 
 browser.



 On Wed, Jun 8, 2011 at 8:04 AM, Jean-Nicolas Boulay Desjardins 
 jnbdzjn...@gmail.com wrote:

 Krish Pan THANKS!

 Also thank you for making build successful in uppercase :)

 But it seems it is still not working.

 This time when I go into solandra-app directory I get the
 start-solandra.sh and when I use the command: ./start-solandra.sh I get
 this:


 http://dl.dropbox.com/u/20599297/Screen%20shot%202011-06-08%20at%2011.00.15%20AM.png

 And it just stays stuck there. Any ideas?

 Thanks again.

 On Wed, Jun 8, 2011 at 2:32 AM, Krish Pan ceo.co...@gmail.com wrote:

 you are trying to run solandra from resources directory,

 follow these steps

 1) don't use root - use a regular user
 2) cd /tmp/
 3) git clone git://github.com/tjake/Solandra.git
 4) cd Solandra
 5) ant

 once you get BUILD SUCCESSFUL

 6) cd solandra-app
 7) ./start-solandra.sh



 On Tue, Jun 7, 2011 at 10:29 PM, Jean-Nicolas Boulay Desjardins 
 jnbdzjn...@gmail.com wrote:

 I found start-solandra.sh in the resources folder. But when I execute it, I
 still get an error.


 http://dl.dropbox.com/u/20599297/Screen%20shot%202011-06-08%20at%201.27.26%20AM.png


 Thanks again.

 On Tue, Jun 7, 2011 at 12:23 PM, Jean-Nicolas Boulay Desjardins 
 jnbdzjn...@gmail.com wrote:

 Ok

 So I have to install Thrift and Cassandra, then Solandra.

 I am asking because I followed the instructions in your Git page but I
 get this error:

 # cd solandra-app; ./start-solandra.sh

 -bash: ./start-solandra.sh: No such file or directory

 Thanks again :)

 

RE: how to know there are some columns in a row

2011-06-08 Thread Jeremiah Jordan
I am pretty sure this would cut down on network traffic, but not on Disk
IO or CPU use.  I think Cassandra would still have to deserialize the
whole column to get to the name.  So if you really have a use case where
you just want the name, it would be better to store a separate name-only
column with no data.



From: Patrick de Torcy [mailto:pdeto...@gmail.com] 
Sent: Wednesday, June 08, 2011 4:00 AM
To: user@cassandra.apache.org
Subject: Re: how to know there are some columns in a row


There is no reason for ambiguities...
We could add in the api another method call (similar to get_count) :



get_columnNames

*   list<string> get_columnNames(key, column_parent, predicate,
    consistency_level)

Get the column names present in column_parent within the predicate.

The method is not O(1). It takes all the columns from disk to calculate
the answer. The only benefit of the method is that you do not need to
pull all their values over the Thrift interface to get their names.



(just to get the idea...)

In fact column names can really be data in themselves, so there should
be a way to retrieve them (without their values). When you have big
values, it's a real show stopper to use get_slice, since a lot of
unnecessary traffic would be generated...
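To make the traffic argument concrete, here is a toy illustration in pure Python. The sizes are invented, and `get_slice_bytes`/`get_column_names_bytes` are hypothetical stand-ins for wire-level cost, not the real Thrift API:

```python
# Toy model: a row with 10 columns whose values are 64 KB each (sizes invented).
row = {"col%d" % i: b"x" * 64_000 for i in range(10)}

def get_slice_bytes(row):
    # get_slice must ship column names AND values over the wire
    return sum(len(name.encode()) + len(value) for name, value in row.items())

def get_column_names_bytes(row):
    # a hypothetical get_columnNames would ship the names only
    return sum(len(name.encode()) for name in row)

full = get_slice_bytes(row)               # 640040 bytes
names_only = get_column_names_bytes(row)  # 40 bytes
```

With big values, the names are a vanishing fraction of the payload, which is the whole point of the proposal.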

Forgive me if I am a little insistent, but it's important for us and I'm
sure we are not the only ones interested in this feature...

cheers



Re: nosql yes but yescql, no?

2011-06-08 Thread Jeffrey Kesselman
That makes sense :)

On Wed, Jun 8, 2011 at 2:37 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:
 I think that's partly the idea of it.  CQL could end up being a way forward 
 and it currently builds on thrift.  Then if it becomes the API/client of 
 record to build on, then it could move to something else underneath that's 
 more efficient and CQL itself wouldn't have to change at all.

 On Jun 8, 2011, at 1:29 PM, Jeffrey Kesselman wrote:

 While I agree the Thrift API sucks, I'd love to see that solved on a
 binary level, and CQL on top of that.

 JK

 On Wed, Jun 8, 2011 at 2:50 PM, Marcos Ortiz mlor...@uci.cu wrote:
 On 06/08/2011 01:23 PM, SriSatish Ambati wrote:

 Gotta love, Eric!
 http://www.slideshare.net/jericevans/nosql-yes-but-yescql-no

 --
 SriSatish Ambati
 Director of Engineering, DataStax
 @srisatish




 Good resource.
 Thanks for sharing it with us, SriSatish.

 Regards

 --
 Marcos Luís Ortíz Valmaseda
 Software Engineer (UCI)
 http://marcosluis2186.posterous.com
 http://twitter.com/marcosluis2186





 --
 It's always darkest just before you are eaten by a grue.





-- 
It's always darkest just before you are eaten by a grue.


hadoop/pig notes

2011-06-08 Thread William Oberman
I decided to try out hadoop/pig + cassandra.  I had my ups and downs to get
the script I wanted to run to work.  I'm sure everyone who tries will have
their own experiences/problems, but mine were:

-Everything I need to know was in
http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html and
http://wiki.apache.org/cassandra/HadoopSupport

-Java is really picky about hostnames.  I'm in EC2, and rather than rely on
DNS, I basically have all of my machines share an /etc/hosts file.  But the
command-line hostname wasn't returning the same thing as in /etc/hosts,
which caused all kinds of weird hadoop issues at first.  (I had hostname set
to foo while /etc/hosts had foo.prod.)

-I forgot I had iptables on.  It's always easier to not have firewalls to
start (this is true when configuring anything of course)

-Use the same version of everything everywhere.  And for hadoop/pig, I was
having issues until I used the combination of hadoop-0.20.2 + pig-0.8.1.

-For hadoop's mapred-site.xml you HAVE to supply a port (hostname:port), and
there isn't a standard, and it seems arbitrary.  I used 8021, based on notes
in a case somewhere from hadoop (I think trying to standardize).

It took me a while to figure out the syntax of Pig Latin, but I finally
managed to get a script that does a count of all columns in a column family:
rows = LOAD 'cassandra://keyspace/columnfamily' USING CassandraStorage();
filter_rows = FILTER rows BY $1 is not null;
counts = FOREACH filter_rows GENERATE COUNT($1);
counts_in_bag = GROUP counts ALL;
sum_of_bag = FOREACH counts_in_bag  GENERATE SUM($1);
dump sum_of_bag;
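For readers newer to Pig, here is a rough pure-Python analogue of what the script computes. The sample rows are invented; CassandraStorage actually yields (key, bag-of-columns) tuples:

```python
# Rows as (key, columns) pairs, roughly as CassandraStorage presents them.
rows = [
    ("k1", {"a": 1, "b": 2}),   # 2 columns
    ("k2", None),               # row with no column bag -> filtered out
    ("k3", {"c": 3}),           # 1 column
]

# FILTER rows BY $1 is not null
filter_rows = [(key, cols) for key, cols in rows if cols is not None]
# FOREACH filter_rows GENERATE COUNT($1)
counts = [len(cols) for _, cols in filter_rows]
# GROUP counts ALL; FOREACH ... GENERATE SUM($1)
sum_of_bag = sum(counts)
print(sum_of_bag)  # prints: 3
```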

I'm trying to see the impact of running hadoop on the same servers as
cassandra now.  And yes, I've seen the note in the wiki about the clever
partitioning of cassandra nodes to allow for web latency nodes + hadoop
processing nodes :-)


Re: CLI set command returns null, ver 0.8.0

2011-06-08 Thread aaron morton
Can you provide the cli script to create the schema, and info on how many nodes
you have?

Thanks

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jun 2011, at 16:12, AJ wrote:

 Can anyone help?  The CLI seems to be having issues.  The count command isn't 
 working either:
 
 [default@Keyspace1] count User[long(1)];
 Expected 8 or 0 byte long (13)
 java.lang.RuntimeException: Expected 8 or 0 byte long (13)
at 
 org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:284)
at org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
 [default@Keyspace1]
 [default@Keyspace1] count User[1];;
 Expected 8 or 0 byte long (1)
 java.lang.RuntimeException: Expected 8 or 0 byte long (1)
at 
 org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:284)
at org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
 [default@Keyspace1] count User['1'];
 Expected 8 or 0 byte long (1)
 java.lang.RuntimeException: Expected 8 or 0 byte long (1)
at 
 org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:284)
at org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
 [default@Keyspace1] count User['12345678'];
 null
 java.lang.RuntimeException
at 
 org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:292)
at org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
 [default@Keyspace1]
 
 
 Granted, there are no rows in the CF yet (see probs below), but this 
 exception seems to be during the parsing stage.
 
  I've checked everything else, AFAIK, so I'm at a loss.
 
 Much obliged.
 
 On 6/7/2011 12:44 PM, AJ wrote:
 The log only shows INFO level messages about flushes, etc..
 
 The debug mode of the CLI shows an exception after the set:
 
 [al@mars ~]$ cassandra-cli -h 192.168.1.101 --debug
 Connected to: Test Cluster on 192.168.1.101/9160
 Welcome to the Cassandra CLI.
 
 Type 'help;' or '?' for help.
 Type 'quit;' or 'exit;' to quit.
 
 [default@unknown] use Keyspace1;
 Authenticated to keyspace: Keyspace1
 [default@Keyspace1] set User[1]['name']='aaa';
 null
 java.lang.RuntimeException
at 
 org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:292)
at org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
 [default@Keyspace1]
 
 
 



Re: hadoop/pig notes

2011-06-08 Thread Jeremy Hanna
I need to update the wiki with better Pig info.  I did put some information in
the Getting Started docs of pygmalion, but it would be good to transfer that to
Cassandra's wiki and add to it.
fwiw - https://github.com/jeromatron/pygmalion/wiki/Getting-Started

Thanks for the rundown William!





Re: Retrieving a column from a fat row vs retrieving a single row

2011-06-08 Thread aaron morton
Just to make things less clear, if you have one row that you are continually
writing, it may end up spread out over several SSTables. Compaction helps here
to reduce the number of files that must be accessed, so long as it can keep up.
But if you want to read column X and the row is fragmented over 5 SSTables, then
each one must be accessed.

 https://issues.apache.org/jira/browse/CASSANDRA-2319  is open to try and 
reduce the number of seeks. 

For now take a look at nodetool cfhistograms to see how many sstables are read 
for your queries. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 Jun 2011, at 04:50, Peter Schuller wrote:

 As far as I know, to read a single column cassandra will deserialize a
 bunch of them and then pick the correct one (64KB of data right?)
 
 Assuming the default setting of 64kb, the average amount deserialized
 given random column access should be 8 kb (not true with row cache,
 but with large rows presumably you don't have row cache).
 
 Would it be faster to have a row for each id I want to translate? This
 would make keycache less effective, but the amount of data read should
 be smaller.
 
 It depends on what bottlenecks you're optimizing for. A key is
 expensive in the sense that it (1) increases the size of bloom
 filters for the column family, (2) increases the memory cost of
 index sampling, and (3) increases the total data size (typically)
 because the row size is duplicated in both the index and data files.
 
 The cost of deserialization the same data repeatedly is CPU. So if
 you're nowhere near bottlenecking on disk and the memory trade-off is
 reasonable, it may be a suitable optimization. However, consider that
 unless you're doing order preserving partitioning, accessing those
 rows will be effectively random w.r.t. the locations on disk you're
 reading from so you're adding a lot of overhead in terms of disk I/O
 unless your data set fits comfortably in memory.
 
 -- 
 / Peter Schuller



Re: how to know there are some columns in a row

2011-06-08 Thread Patrick de Torcy
 I am pretty sure this would cut down on network traffic, but not on Disk
 IO or CPU use.

Well, that's the same for the get_count method!

I think that would be ok, since the network traffic is the real problem (big
values...). Storing the column names in a separate column could be a
solution of course, but it generates duplicate data, with a risk of
inconsistencies (and more work).


Re: how to know there are some columns in a row

2011-06-08 Thread aaron morton
 Forgive me if I am a little insistent, but it's important for us and I'm sure 
 we are not the only ones interested in this feature...

Not an issue, it's how things get done :)

Create a jira ticket https://issues.apache.org/jira/browse/CASSANDRA with your 
ideas to start the process and ask others to vote if they would also like to 
see it. If you have time to donate for the feature include that on the ticket. 

Thanks 
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com




Re: [RELEASE] 0.8.0

2011-06-08 Thread Yi Yang
Is there anyone willing to update libcassandra for C++ to support the new
features in 0.8.0?
Or has anyone started to work on it?

Thanks


On Jun 3, 2011, at 7:36 AM, Eric Evans wrote:

 
 I am very pleased to announce the official release of Cassandra 0.8.0.
 
 If you haven't been paying attention to this release, this is your last
 chance, because by this time tomorrow all your friends are going to be
 raving, and you don't want to look silly.
 
 So why am I resorting to hyperbole?  Well, for one because this is the
 release that debuts the Cassandra Query Language (CQL).  In one fell
 swoop Cassandra has become more than NoSQL, it's MoSQL.
 
 Cassandra also has distributed counters now.  With counters, you can
 count stuff, and counting stuff rocks.
 
 A kickass use-case for Cassandra is spanning data-centers for
 fault-tolerance and locality, but doing so has always meant sending data
 in the clear, or tunneling over a VPN.   New for 0.8.0, encryption of
 intranode traffic.
 
 If you're not motivated to go upgrade your clusters right now, you're
 either not easily impressed, or you're very lazy.  If it's the latter,
 would it help knowing that rolling upgrades between releases is now
 supported?  Yeah.  You can upgrade your 0.7 cluster to 0.8 without
 shutting it down.
 
 You see what I mean?  Then go read the release notes[1] to learn about
 the full range of awesomeness, then grab a copy[2] and become a
 (fashionably) early adopter.
 
 Drivers for CQL are available in Python[3], Java[3], and Node.js[4].
 
 As usual, a Debian package is available from the project's APT
 repository[5].
 
 Enjoy!
 
 
 [1]: http://goo.gl/CrJqJ (NEWS.txt)
 [2]: http://cassandra.debian.org/download
 [3]: http://www.apache.org/dist/cassandra/drivers
 [4]: https://github.com/racker/node-cassandra-client
 [5]: http://wiki.apache.org/cassandra/DebianPackaging
 
 -- 
 Eric Evans
 eev...@rackspace.com
 



Running a cluster with 256mb RAM nodes

2011-06-08 Thread Donny Nadolny
I'd like to start using cassandra for a certain part of my database that has
high write volume. I'm setting up a 3 node cluster, however my site doesn't
make enough money yet to justify 3 nodes meeting the hardware
recommendationhttp://wiki.apache.org/cassandra/CassandraHardwareof
4gb RAM. Instead I'm trying to get it working with nodes that have
256mb
RAM (running in a VM). I looked around and found a couple places where
people mention successfully running cassandra nodes with only 256mb, eg
http://news.ycombinator.com/item?id=2074114 and
http://groups.google.com/group/reddit-dev/browse_thread/thread/f7bc839dbc62d0ad/92af1e790f2fe05c,
but they don't give any details about the settings they've changed.

It took a while, but I've settled on some settings that don't give me an
OutOfMemoryException under load, and still seem to have acceptable
performance (quick writes with throughput that's good enough for now, higher
latency reads but that's okay for my use). They're a bit on the conservative
side, but I'd rather have them low and never get an OOM than risk it.

The JVM memory settings are the auto-calculated ones (running cassandra
0.7.6-2): -Xms122M -Xmx122M -Xmn30M
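If you would rather pin these values than rely on the auto-calculation, here is a sketch, assuming 0.7's conf/cassandra-env.sh, where setting both variables explicitly bypasses the auto-sizing logic:

```shell
# Sketch (assumes 0.7's conf/cassandra-env.sh): setting BOTH variables
# disables the auto-calculation that produced the -Xms122M values above.
MAX_HEAP_SIZE="122M"
HEAP_NEWSIZE="30M"
# These end up on the JVM command line as:
echo "-Xms${MAX_HEAP_SIZE} -Xmx${MAX_HEAP_SIZE} -Xmn${HEAP_NEWSIZE}"
```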

I have 4 CF's. The settings I've changed are, for each CF:
MemtableThroughputInMB to 1mb (yes, that's very low, that's part of my
question)
MemtableOperationsInMillions to 0.02 (20k operations)
cached keys to 20,000
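These per-CF thresholds can also be set from the CLI; a hedged sketch using what I believe are the 0.7 cassandra-cli attribute names (the CF name is illustrative):

```
update column family MyCF
    with memtable_throughput = 1
    and memtable_operations = 0.02
    and keys_cached = 20000;
```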


The docs at http://wiki.apache.org/cassandra/MemtableThresholds warn that
tons of tiny memtables is bad. Why?
Also, am I correct in believing that it's ok to change the memtable
throughput/operations later on once I have larger nodes, and that there will
be no lasting bad effects (eg I can just trigger a compaction, or even bring
new nodes online and remove the old ones)?

I've tested this setup doing reads and writes, but I haven't tried any
operations (eg moving a node to a different token, bootstrapping a new
node). Are there any operations I need to watch out for that could cause an
OOM, or other problematic settings that should be tuned that haven't caused
problems yet but could in certain cases?

Thanks,
Donny


Re: Running a cluster with 256mb RAM nodes

2011-06-08 Thread Watanabe Maki
I once built a 4 node ring on my laptop, with 64MB heap for each instance.
I could write and read on it, but nodetool repair caused OOM.
You should test essential operations with estimated data loaded, under expected 
traffic.

Btw I'm using a 96MB x 4 node ring on my laptop now, just for my private lab. It
survives repair :-)

maki




Re: CLI set command returns null, ver 0.8.0

2011-06-08 Thread AJ

Thanks Aaron,

I created a script and everything went OK.  I think that the problem is 
when you try to update a CF.  Below, I try to change the column 
comparator and it complains that the 'comparators do not match'.  Can 
you enlighten me on what that means?  There is no data in the CF at this 
point.


[default@Keyspace1] create column family User3;
503dba20-924b-11e0--f1169bb35ddf
Waiting for schema agreement...
... schemas agree across the cluster
[default@Keyspace1] set User3['1']['name'] = 'mike';
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'name' as 
hex bytes
java.lang.RuntimeException: 
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'name' as 
hex bytes
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:292)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)

[default@Keyspace1] describe keyspace;
Keyspace: Keyspace1:
  Replication Strategy: 
org.apache.cassandra.locator.NetworkTopologyStrategy

Options: [datacenter1:1]
  Column Families:
ColumnFamily: User3
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: 
org.apache.cassandra.db.marshal.BytesType

  Columns sorted by: org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 0.2859375/61/1440 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: false
  Built indexes: []
[default@Keyspace1]

/** Here, I figure the error above is because it cannot find the column 
called 'name' because it's using the BytesType column name 
sorter/comparator, so I try to change it below. */


[default@Keyspace1] update column family User3 with comparator = UTF8Type;
comparators do not match.
java.lang.RuntimeException: comparators do not match.
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:292)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
[default@Keyspace1]

What does comparators do not match mean?

Thanks,
Mike
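For anyone hitting the same "cannot parse 'name' as hex bytes" error: with BytesType everywhere, the CLI interprets bare literals as hex. A hedged sketch of two workarounds in the 0.8 cassandra-cli (the assume command and the utf8() wrapper both exist there, but treat the exact spelling as an assumption):

```
assume User3 comparator as utf8;
assume User3 validator as utf8;
assume User3 keys as utf8;
set User3['1']['name'] = 'mike';

-- or, without session-level assumptions:
set User3[utf8('1')][utf8('name')] = utf8('mike');
```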




Is there a way from a running Cassandra node to determine whether or not itself is up?

2011-06-08 Thread Suan Aik Yeo
Is there a way (preferably an exposed method accessible through Thrift)
for a running Cassandra node to determine whether or not it itself is up?
(Per Cassandra standards, I'm assuming based on the gossip protocol.)
Another way to think of what I'm looking for is basically running nodetool
ring just on myself, but I'm only interested in knowing whether I'm Up or
Down?

I'm currently using the describe_cluster method, but earlier today when
the commitlogs for a node filled up and it appeared down to the other nodes,
describe_cluster() still worked fine, thus failing the check.

Thanks,
Suan