Re: node.js library?

2011-12-05 Thread Norman Maurer
As far as I know, it's the library that was developed by Rackspace.

See
https://github.com/racker/node-cassandra-client

As Rackspace is using node.js + Cassandra, I would expect it to work ;)

Bye,
Norman




2011/12/5 Joe Stein crypt...@gmail.com

 Hey folks, so I have been noodling on using node.js as a new front end for
 the system I built for doing real time aggregate metrics within our
 distributed systems.

 Does anyone have experience or background story on this lib?
 http://code.google.com/a/apache-extras.org/p/cassandra-node/ It seems to
 be the most up-to-date one, supporting CQL only (which should not be an
 issue), but I was not sure whether it is maintained, or what the background
 story on it is.

 Any other experiences/horror stories/over-the-rainbow type stories with
 node.js & C* would be nice to hear.

 /*
 Joe Stein
 http://www.linkedin.com/in/charmalloc
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 */



Re: building a new email-like inbox service with cassandra

2011-11-17 Thread Norman Maurer
I would be very interested in this. I wrote a prototype for JAMES which
uses Cassandra to store emails and serve them via IMAP and POP3, so it
would be nice to see your implementation.

thanks
norman

On Thursday, 17 November 2011, Rustam Aliyev rus...@code.az wrote:
 Hi Dotan,

 We have already built something similar and were planning to open source
it. It will be available under http://www.elasticinbox.com/.

 We haven't followed IBM's paper exactly; we believe our Cassandra model
design is more robust. It's written in Java and provides LMTP and REST
interfaces. ElasticInbox also stores original messages outside of
Cassandra, in a blob store.

 Let me know if you are interested, I will need some time to do cleanup.

 Regards,
 Rustam.

 On 17/11/2011 14:17, Dotan N. wrote:

 Hi all,
 New to Cassandra, I'm about to embark on building a scalable user inbox
service on top of Cassandra.
 I've done the preliminary googling and got some more info on bluerunner
(IBM's project on the subject),
 and now I'm looking for more information on this specific topic.
 If anyone can point me to research/articles that would nudge me in the
right direction I'd be tremendously thankful!
 Thanks!
 --
 Dotan, @jondot
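The wide-row inbox pattern that both BlueRunner and ElasticInbox build on can be sketched in a few lines. This is a toy model with hypothetical names, not either project's actual schema: one row per mailbox, one column per message, columns kept sorted by timestamp so a "latest N" view is a single contiguous slice.

```python
import bisect

# Toy wide-row inbox model: row key = mailbox, one column per message,
# columns sorted by timestamp (Cassandra keeps columns sorted server-side).
inbox = {}  # user -> sorted list of (timestamp, message_id)

def deliver(user, ts, message_id):
    row = inbox.setdefault(user, [])
    bisect.insort(row, (ts, message_id))   # insert in timestamp order

def latest(user, n):
    # Equivalent to a reversed column slice of the user's row.
    return [mid for _, mid in reversed(inbox.get(user, [])[-n:])]

deliver("dotan", 100, "msg-a")
deliver("dotan", 300, "msg-c")
deliver("dotan", 200, "msg-b")
print(latest("dotan", 2))  # ['msg-c', 'msg-b']
```

The appeal of the layout is that delivery is a single column insert and inbox rendering is a single slice read, with no per-message random I/O.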



Re: release mmap memory through jconsole?

2011-09-30 Thread Norman Maurer
I would also not use such a big heap. I think most people will tell
you that 12G-16G is the maximum to use.

Bye,
Norman

2011/9/30 Yi Yang i...@iyyang.com:
 It is meaningless to release such memory. The count includes the SSTable
 data you have touched. That data lives on your hard drive, so it is not
 RAM you have actually used.

 -Y.
 --Original Message--
 From: Yang
 To: user@cassandra.apache.org
 ReplyTo: user@cassandra.apache.org
 Subject: release mmap memory through jconsole?
 Sent: Oct 1, 2011 12:40 AM

 I gave -Xmx50G to my Cassandra Java process; now top shows that its
 virtual memory address space is 82G. Is there
 a way to release that memory through JMX?

 Thanks
 Yang

 Sent from my BlackBerry(R) wireless device
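The point Yi Yang makes (mmapped SSTables inflate the virtual size without consuming RAM) can be demonstrated outside Cassandra. A minimal sketch: mapping a large sparse file grows the process's address space, yet no memory is consumed until pages are actually touched, which is why `top`'s VIRT column far exceeds the heap.

```python
import mmap
import tempfile

# Map a 256 MB sparse file. The mapping enlarges the process's *virtual*
# address space, but pages are only faulted in when accessed -- the same
# reason Cassandra's mmapped SSTables make VIRT look huge while RES stays low.
SIZE = 256 * 1024 * 1024

with tempfile.TemporaryFile() as f:
    f.truncate(SIZE)                        # sparse: no blocks written to disk
    with mmap.mmap(f.fileno(), SIZE, access=mmap.ACCESS_READ) as m:
        assert len(m) == SIZE               # address space reserved...
        first = m[0]                        # ...a page is faulted in only here
print(first)  # 0 (sparse files read back as zeros)
```

Closing the mapping (or the process exiting) releases the address space; there is nothing useful to "free" via JMX.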


Re: How to release a customised Cassandra from Eclipse?

2011-08-08 Thread Norman Maurer
It's

ant artifacts


Bye
Norman

2011/8/7, Alvin UW alvi...@gmail.com:
 Thanks guys.

 The problem is solved. I copied cassandra and cassandra.in to my bin folder.
 Then I used ant release to generate my customized cassandra.jar in the dist
 folder.
 It worked.

 To Aaron: I tried ant artefacts, but it failed. Is it because I am using
 Cassandra 0.7?
 What's the difference between ant artefacts and ant release?

 2011/8/6 aaron morton aa...@thelastpickle.com

 Have a look at this file in the source repo
 https://github.com/apache/cassandra/blob/trunk/bin/cassandra

 try using ant artefacts and look in the build/dist dir.

 cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7 Aug 2011, at 03:58, Alvin UW wrote:


 Thanks.

  I am a beginner.
  I checked the bin folder under myCassandra. There are only some classes,
  without an executable file.
  After ant release, I got the jar file from the build folder.




 2011/8/6 Jonathan Ellis jbel...@gmail.com

  Look at bin/cassandra; you can't just run it with java -jar.

 On Sat, Aug 6, 2011 at 10:43 AM, Alvin UW alvi...@gmail.com wrote:
  Hello,
 
  I set up a Cassandra project in Eclipse following
  http://wiki.apache.org/cassandra/RunningCassandraInEclipse
  Then, I made a few modifications on it to form a customised Cassandra.
  But I don't know how can I release this new Cassandra from Eclipse as a
 jar
  file to use in EC2.
 
   I tried the ant release command on the command line. It successfully
   built the .jar file.
  Then I typed java -jar apache-cassandra-0.7.0-beta1-SNAPSHOT.jar
 
  Error: Failed to load Main-Class manifest attribute from 
 
  I edited a MANIFEST.MF like:
  Manifest-Version: 1.0
  Ant-Version: Apache Ant 1.7.1
  Created-By: 16.3-b01 (Sun Microsystems Inc.)
  Implementation-Title: Cassandra
  Implementation-Version: 0.7.0-beta1-SNAPSHOT
  Implementation-Vendor: Apache
  Main-Class: org.apache.cassandra.thrift.CassandraDaemon
 
  and tried again. The error is as below:
 
  Exception in thread "main" java.lang.NoClassDefFoundError:
  org/apache/thrift/transport/TTransportException
  Caused by: java.lang.ClassNotFoundException:
  org.apache.thrift.transport.TTransportException
  at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
  at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:332)
  Could not find the main class:
 org.apache.cassandra.thrift.CassandraDaemon.
  Program will exit.
 
  So what's the problem?
 
 
  Thanks.
  Alvin
 
 
 
 
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
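The underlying reason `java -jar` fails here is that the daemon's dependencies (Thrift and friends) are separate jars which bin/cassandra puts on the classpath before launching the main class. A rough sketch of that assembly, with illustrative file names rather than the script's exact logic:

```python
import os
import pathlib
import tempfile

def build_classpath(*lib_dirs):
    """Collect every .jar under the given dirs into a separator-joined
    classpath, roughly what bin/cassandra does before exec'ing the daemon."""
    jars = []
    for d in lib_dirs:
        jars.extend(sorted(str(p) for p in pathlib.Path(d).glob("*.jar")))
    return os.pathsep.join(jars)

with tempfile.TemporaryDirectory() as d:
    lib = pathlib.Path(d, "lib")
    lib.mkdir()
    for name in ("apache-cassandra-0.7.0.jar", "libthrift.jar"):  # stand-ins
        (lib / name).touch()
    cp = build_classpath(lib)
    # A real launch would then be:
    #   java -cp <cp> org.apache.cassandra.thrift.CassandraDaemon
```

With every dependency jar on `-cp`, the NoClassDefFoundError for TTransportException goes away; a manifest Main-Class alone cannot fix a missing classpath.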







Re: nodetool repair: No neighbors

2011-07-31 Thread Norman Maurer
I created an issue and attached a patch:

https://issues.apache.org/jira/browse/CASSANDRA-2979

I was not sure if it would be better to handle it in NodeProbe or
StorageService.

Bye,
Norman


2011/7/31 Sylvain Lebresne sylv...@datastax.com:
 On Sun, Jul 31, 2011 at 2:25 AM, Jason Baker ja...@apture.com wrote:
 When I run nodetool repair on a node on my 3-node cluster, I see 3 messages
 like the following:
  INFO [manual-repair-6d9a617f-c496-4744-9002-a56909b83d5b] 2011-07-30
 18:50:28,464 AntiEntropyService.java (line 636) No neighbors to repair with
 for system on (0,56713727820156410577229101238628035242]:
 manual-repair-6d9a617f-c496-4744-9002-a56909b83d5b completed.
 Granted, there's no data on these machines yet.  Is this normal?

 You can discard those. This is saying it cannot repair the system keyspace
 (a table used internally). It turns out those aren't replicated and thus
 don't need to be repaired. We should probably update the code to skip
 trying to repair the system table, but in the meantime this is harmless.

 --
 Sylvain



Re: CassandraFS in 1.0?

2011-07-07 Thread Norman Maurer
May I ask if it's open source, by any chance?

bye
norman

On Thursday, 7 July 2011, David Strauss da...@davidstrauss.net wrote:
 I'm not sure HDFS has the right properties for a media-storage file
 system. We have, however, built a WebDAV server on top of Cassandra
 that avoids any pretension of being a general-purpose, POSIX-compliant
 file system. We mount it on our servers using davfs2, which is also
 nice for a few reasons:

 * We can use standard HTTP load-balancing and dead host avoidance
 strategies with WebDAV.
 * Encrypting access and authenticating clients with PKI/HTTPS works 
 seamlessly.
 * WebDAV + davfs2 is etag-header aware, allowing clients to
 efficiently validate cached items.
 * HTTP is browser and CDN/reverse proxy cache friendly for
 distributing content to people who don't need to mount the file
 system.
 * We could extend the server's support to allow connections from a
 broad variety of interactive desktop clients.

 On Wed, Jul 6, 2011 at 13:11, Joseph Stein crypt...@gmail.com wrote:
 Hey folks, I am going to start prototyping our media tier using Cassandra as
 a file system (meaning uploading video/audio/images to a web server, saving
 them in Cassandra, and then streaming them out).
 Has anyone done this before?
 I was thinking Brisk's CassandraFS might be a fantastic implementation for
 this, but then I feel that I would need to run another/different Cassandra
 cluster outside of what our ops folks do with Apache Cassandra 0.8.X.
 Am I best to just compress files uploaded to the web server and then start
 chunking and saving the chunks in rows and columns so the memory issue does
 not smack me in the face?  And use our existing cluster and build it out
 accordingly?
 I am sure our ops people would like the command line aspect of CassandraFS
 but looking for something that makes the most sense all around.
 It seems to me there is a REALLY great thing in CassandraFS and I would love
 to see it as part of 1.0 =8^)  or at a minimum some streamlined
 implementation to do the same thing.
 Comparing to HDFS: that is part of the Hadoop project even though Cloudera
 has a distribution of Hadoop :) maybe that can work here too
 _fingers_crossed_ (or mongodb-gridfs).
 happy to help as I am moving down this road in general
 Thanks!

 /*
 Joe Stein
 http://www.linkedin.com/in/charmalloc
 Twitter: @allthingshadoop
 */




 --
 David Strauss
    | da...@davidstrauss.net
    | +1 512 577 5827 [mobile]
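Joe's chunking idea, splitting each uploaded blob into fixed-size pieces stored as columns under the file's row key, can be sketched like this. The chunk size and names are purely illustrative (a real store would use chunks in the megabyte range):

```python
CHUNK = 4  # toy size; real deployments would use something like 1-8 MB

def store(rows, file_id, blob):
    # One column per chunk under the file's row key, named by chunk index,
    # so a column slice streams the file back in order.
    rows[file_id] = {i // CHUNK: blob[i:i + CHUNK]
                     for i in range(0, len(blob), CHUNK)}

def fetch(rows, file_id):
    cols = rows[file_id]
    return b"".join(cols[i] for i in sorted(cols))

rows = {}
store(rows, "video-1", b"0123456789abcdef!")
print(fetch(rows, "video-1") == b"0123456789abcdef!")  # True
```

Bounding each column to a fixed chunk size is what keeps any single read or write from blowing up memory on the server, which is exactly the "mem issue" Joe wants to avoid.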



Re: When should I use Solandra?

2011-06-04 Thread Norman Maurer
Are you sure you really need Cassandra for this? To me it sounds
like MySQL or other databases would be a better fit for you (if you
don't need to store a very huge amount of data...).

Bye,
Norman

2011/6/4 Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com:
 Hi,
 I am planning to use Cassandra to store my users passwords and at the same
 time data for my website that need to be accessible via search. My Question
 is should I use two DB: Cassandra (for users passwords) and Solandra (for
 the websites data) or can I put everything in Solandra?
 Is there a way to stop Solandra from indexing my users passwords?
 Thanks in advance for any help.


Re: secondary indexes on data imported by json2sstable

2011-03-14 Thread Norman Maurer
I would expect they get created on the fly while importing. If not, I think
it's a bug...

Bye,
Norman


2011/3/14 Terje Marthinussen tmarthinus...@gmail.com

 Hi,

 Should it be expected that secondary indexes are automatically regenerated
 when importing data using json2sstable?
 Or is there some manual procedure that needs to be done to generate them?

 Regards,
 Terje



Re: sstable2json not loading CLASSPATH properly?

2011-03-13 Thread Norman Maurer
What about creating a bug report and attaching the needed changes? I bet
the Cassandra devs love contributions.

Bye
Norman

2011/3/13, Jason Harvey alie...@gmail.com:
 Never mind, I found the problem. sstable2json and json2sstable require a
 log4j-tools properties file. I created one and all was well. I guess
 that should be added to the default install packages.

 Cheers,
 Jason

 On Sat, Mar 12, 2011 at 12:09 AM, Jason Harvey alie...@gmail.com wrote:
 Sstable2json always spits out the following when I execute it:

 log4j:WARN No appenders could be found for logger
 (org.apache.cassandra.config.DatabaseDescriptor).
 log4j:WARN Please initialize the log4j system properly.


 I verified that the run script sets the CLASSPATH properly, and I even
 tried manually setting CLASSPATH, but still no dice. Is
 sstable2json just not reading the CLASSPATH properly, or does it need
 a separate log4j config? Running 0.7.3 on Ubuntu.

 Thanks,
 Jason




Re: Splitting the data of a single blog into 2 CFs (to implement effective caching) according to views.

2011-03-08 Thread Norman Maurer
Yeah, this makes sense as far as I can tell.


Bye,
Norman


2011/3/8 Aditya Narayan ady...@gmail.com


 My application displays a list of several blogs' overview data (like
 blogTitle/ nameOfBlogger/ shortDescription for each blog) on the 1st page (in
 much the same manner as Digg's newsfeed), and when the user selects a
 particular blog to view, the application takes him to that specific blog's
 full-page view, which displays the entire data of the blog.

 Thus I am trying to split a blog's data into *two rows*, in two *different
 CFs* (one CF is row-cached, with less data in each row, and the other,
 with each row holding the entire remaining blog data, is not cached).

 Data for the 1st page view (like titles and other overview data of a blog)
 is put in a row in the 1st CF. This CF is cached so as to improve the
 performance of heavily read data. Only data from the cached CF is read for
 the 1st page. The remaining data (the bulk of the blog text and the entire
 comments data) is stored as another row in the 2nd CF. For the 2nd page,
 *rows from both CFs have to be read*. This will take two read operations.

 Does this seem like a good design?
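The split described above can be sketched as follows (hypothetical column names): the listing page touches only the small, row-cached overview CF, while the full page pays for one extra read against the uncached body CF.

```python
# Sketch of the proposed two-CF split. Overview rows stay tiny so row-caching
# them is safe; body rows hold the bulky text and are left uncached.
overview_cf = {  # small rows -> row-cached
    "blog-1": {"title": "Intro to C*", "author": "ady"},
}
body_cf = {      # bulky rows -> not cached
    "blog-1": {"body": "...long text...", "comments": ["nice post"]},
}

def listing_page(blog_ids):
    # 1st page: one CF, cache-friendly reads only.
    return [overview_cf[b] for b in blog_ids]

def full_page(blog_id):
    # 2nd page: two reads, one per CF, merged into one view.
    return {**overview_cf[blog_id], **body_cf[blog_id]}

page = full_page("blog-1")
print(page["title"])  # Intro to C*
```

The design trades one extra read on the (rarer) full-page view for a cache that is never polluted by multi-kilobyte blog bodies, which is usually a good trade for read-heavy listings.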



Re: Is it possible to get list of row keys?

2011-02-23 Thread Norman Maurer
Querying by key ranges in order is only possible with OPP or BPP.

Bye,
Norman


2011/2/23 Sasha Dolgy sdo...@gmail.com:
 What if I want 20 rows and then the next 20 rows in a subsequent query?  Can
 this only be achieved with OPP?

 --
 Sasha Dolgy
 sasha.do...@gmail.com

 On 23 Feb 2011 13:54, Ching-Cheng Chen cc...@evidentsoftware.com wrote:



Re: Is it possible to get list of row keys?

2011-02-23 Thread Norman Maurer
Yes, but be aware that the keys will not be in the right order.

Bye,
Norman

2011/2/23 Roshan Dawrani roshandawr...@gmail.com:
 On Wed, Feb 23, 2011 at 7:17 PM, Ching-Cheng Chen
 cc...@evidentsoftware.com wrote:

 Actually, if you want to get ALL keys, I believe you can still use
 RangeSliceQuery with RP.
 Just use setKeys("", "") as the first batch call.
 Then use the last key from the previous batch as the startKey for the next batch.
 Beware that since the startKey is inclusive, you'll need to ignore the first key
 from then on.
 Keep going until you finish all batches.  You will know you need to stop
 when setKeys(key_xyz, "") returns only one key.

 This is what I meant to suggest when I earlier said So, if you want all,
 you will need to keep paging forward and collecting the keys. :-)
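The paging procedure Ching-Cheng describes (restart each batch at the last key seen; the start key is inclusive, so every batch after the first repeats one key) can be sketched generically. Here `fetch_page` stands in for a RangeSliceQuery against a toy token ordering and is purely illustrative:

```python
def all_keys(fetch_page, batch=20):
    """Page through all row keys under the RandomPartitioner:
    start each batch at the last key seen; since the start key is
    inclusive, drop the duplicate first result on every later batch."""
    keys, start = [], ""
    while True:
        page = fetch_page(start, batch)
        if start:
            page = page[1:]            # skip the inclusive start key
        keys.extend(page)
        if not page:                   # only the start key came back: done
            return keys
        start = keys[-1]

# Toy backend: keys come back in token order, faked here with a fixed list.
ring = ["k7", "k2", "k9", "k1", "k5"]

def fetch_page(start, count):
    i = ring.index(start) if start else 0
    return ring[i:i + count]

print(all_keys(fetch_page, batch=2))  # ['k7', 'k2', 'k9', 'k1', 'k5']
```

Note the keys arrive in token order, not key order, which is exactly Norman's caveat about the RandomPartitioner.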


Re: Can I get a range of not deleted rows?

2011-02-22 Thread Norman Maurer
To make it short... no.

You can only check whether the row contains at least one column to
tell whether it's a tombstone or not.

Bye,
Norman


2011/2/22 Joshua Partogi joshua.j...@gmail.com:
 Hi there.

 It seems that when I fetch a range of rows, Cassandra also includes
 rows that have been deleted. Is it possible to only get rows that have
 not been deleted?

 Thanks for your help.

 Kind regards,
 Joshua.

 --
 http://twitter.com/jpartogi
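A minimal sketch of the client-side workaround Norman describes: treat any returned row with zero columns as a range ghost and drop it (toy data, not a real slice result):

```python
# A range slice can return "range ghosts": deleted rows whose key survives
# until compaction removes the tombstone. The client filters them out by
# keeping only rows that still have at least one live column.
def live_rows(range_slice):
    return {key: cols for key, cols in range_slice.items() if cols}

slice_result = {
    "row-1": {"name": "alice"},
    "row-2": {},              # deleted: key returned, but no columns
    "row-3": {"name": "bob"},
}
print(sorted(live_rows(slice_result)))  # ['row-1', 'row-3']
```

One practical consequence: when paging, request a few extra rows per batch, since ghosts count against the row limit but carry no data.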



Re: java.io.IOException in CompactionExecutor

2011-02-21 Thread Norman Maurer
The problem on Windows is that it is a bit more picky about renaming
a file if a handle to it is still open.

So maybe some stream on the file was not closed.

Bye,
Norman


2011/2/21 Aaron Morton aa...@thelastpickle.com:
 From the F:\ I assume you are on Windows? What version?
 Just did a quick test on Ubuntu 10.04 and it works, but the File.renameTo()
 function used has different behavior depending on the host OS. There may be
 some issues on
 Windows: http://stackoverflow.com/questions/1000183/reliable-file-renameto-alternative-on-windows
 Aaron


 On 21 Feb, 2011,at 11:43 PM, ruslan usifov ruslan.usi...@gmail.com wrote:

 I launched a clean Cassandra 0.7.2 installation, and after a few days I see
 the following error in system.log (more than 10 times):


 ERROR [CompactionExecutor:1] 2011-02-19 02:56:17,965
 AbstractCassandraDaemon.java (line 114) Fatal exception in thread
 Thread[CompactionExecutor:1,1,main]
 java.lang.RuntimeException: java.io.IOException: Unable to rename cache to
 F:\Cassandra\7.2\saved_caches\system-LocationInfo-KeyCache
     at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
     at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
     at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Unable to rename cache to
 F:\Cassandra\7.2\saved_caches\system-LocationInfo-KeyCache
     at
 org.apache.cassandra.io.sstable.CacheWriter.saveCache(CacheWriter.java:85)
     at
 org.apache.cassandra.db.CompactionManager$9.runMayThrow(CompactionManager.java:746)
     at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
     ... 6 more
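The close-before-rename discipline Norman suggests is easy to illustrate with a generic sketch (not the CacheWriter code): guarantee the stream is closed, then rename, and the operation succeeds on Windows as well as Unix.

```python
import os
import tempfile

# The saved-cache writer must close its stream before renaming the temp file
# over the old cache -- on Windows, renaming a file with an open handle fails.
d = tempfile.mkdtemp()
tmp = os.path.join(d, "KeyCache.tmp")
final = os.path.join(d, "system-LocationInfo-KeyCache")

with open(tmp, "wb") as f:       # `with` guarantees the handle is closed...
    f.write(b"cache-bytes")
os.replace(tmp, final)           # ...so the rename can succeed everywhere
assert os.path.exists(final) and not os.path.exists(tmp)
```

In Java terms, the equivalent is closing the OutputStream in a finally block before calling File.renameTo(), so no handle to the temp file remains open.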




Re: Queries on secondary indexes

2011-02-21 Thread Norman Maurer
Not sure what your problem is...

Using two EQ operations works without a problem here (even via the CLI).

Bye,
Norman

2011/2/18 Rauan Maemirov ra...@maemirov.com:
 With this schema:
 create column family Userstream with comparator=UTF8Type and rows_cached =
 1 and keys_cached = 10
 and column_metadata=[{column_name:account_id, validation_class:IntegerType,
 index_type: 0, index_name:UserstreamAccountidIdx},
 {column_name:from_id, validation_class:IntegerType, index_type: 0,
 index_name:UserstreamFromidIdx},
 {column_name:type, validation_class:IntegerType, index_type: 0,
 index_name:UserstreamTypeIdx}];
 I'm having this:
 [default@Keyspace1] get Userstream where from_id=5 and type > 4;
 ---
 RowKey: 23:feed:12980301937245
 => (column=account_id, value=23, timestamp=1298031252270173)
 => (column=activities,
 value=5b2232313864333936302d336235362d313165302d393838302d666235613434333135343865225d,
 timestamp=1298031252270173)
 => (column=from_id, value=5, timestamp=1298031252270173)
 => (column=type, value=5, timestamp=1298031252270173)
 ---
 RowKey: 5:feed:12980301937196
 => (column=account_id, value=5, timestamp=1298031252270173)
 => (column=activities,
 value=5b2232313863376339302d336235362d313165302d623536342d666235303739333835303234225d,
 timestamp=1298031252270173)
 => (column=from_id, value=5, timestamp=1298031252270173)
 => (column=type, value=5, timestamp=1298031252270173)
 ---
 RowKey: 9:feed:12980301937207
 => (column=account_id, value=9, timestamp=1298031252270173)
 => (column=activities,
 value=5b2232313863613637302d336235362d313165302d39622d373530393638613764326561225d,
 timestamp=1298031252270173)
 => (column=from_id, value=5, timestamp=1298031252270173)
 => (column=type, value=0, timestamp=1298031252270173)
 3 Rows Returned.

 and
 [default@Keyspace1] get Userstream where from_id=5 and type=5;
 0 Row Returned.


 What's wrong with it?
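For reference, what two EQ clauses are expected to return is simply the intersection of the per-column matches. A toy evaluation over the rows shown above (not the server's index code):

```python
# Toy data mirroring the CLI output above: two rows have from_id=5 AND type=5.
rows = {
    "23:feed:12980301937245": {"account_id": 23, "from_id": 5, "type": 5},
    "5:feed:12980301937196":  {"account_id": 5,  "from_id": 5, "type": 5},
    "9:feed:12980301937207":  {"account_id": 9,  "from_id": 5, "type": 0},
}

def get_where(rows, **eq):
    # An EQ-only index query is the intersection of per-column matches.
    return [k for k, cols in rows.items()
            if all(cols.get(c) == v for c, v in eq.items())]

matches = get_where(rows, from_id=5, type=5)
print(len(matches))  # 2 -- so the CLI's "0 Row Returned" does look wrong
```

Given that, Rauan's empty result for `from_id=5 and type=5` points at something environment-specific (for example, how the IntegerType values were encoded when written), rather than at the two-EQ query form itself.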


Re: cluster size, several cluster on one node for multi-tenancy

2011-02-17 Thread Norman Maurer
Maybe you could make use of Virtual Keyspaces.

See this wiki for the idea:
https://github.com/rantav/hector/wiki/Virtual-Keyspaces

Bye,
Norman

2011/2/17 Frank LoVecchio fr...@isidorey.com:
 Why not just create some sort of ACL on the client side and use one
 Keyspace?  It's a lot less management.

 On Thu, Feb 17, 2011 at 12:34 PM, Mimi Aluminium mimi.alumin...@gmail.com
 wrote:

 Hi,
 I really need your help in this matter.
 I will try to simplify my problem and ask specific questions

 I am thinking of solving the multi-tenancy problem by providing a separate
 cluster for each tenant. Does that sound reasonable?
 I could end up with one node belonging to several clusters.
 Does Cassandra support several clusters per node? Does that mean several
 Cassandra daemons on each node? Do you recommend doing that? What is the
 overhead? Is there any link that explains how to do that?

 Thanks a lot,
 Mimi


 On Wed, Feb 16, 2011 at 6:43 PM, Mimi Aluminium mimi.alumin...@gmail.com
 wrote:

 Hi,
 We are interested in a multi-tenancy environment that may consist of up
 to hundreds of data centers. The current design requires cross-rack and
 cross-DC replication. Specifically, the per-tenant CFs will be replicated 6
 times: in three racks, with 2 copies inside a rack; the racks will be
 located in at least two different DCs. In the future other replication
 policies will be considered. The application will decide where (which racks
 and DCs) to place each tenant's replicas, and it might be that one rack can
 hold more than one tenant.

 Separating each tenant into a different keyspace, as was suggested
 in a previous mail thread on this subject, seems to be a good approach
 (assuming the memtable problem will be solved somehow).
 But then we had concerns with regard to the cluster size,
 and here are my questions:
 1) Given the above, should I define one Cassandra cluster that holds all
 the DCs? That sounds unreasonable given hundreds of DCs, tens of servers in
 each DC, etc. Where is the bottleneck here? Keep-alive messages, the gossip,
 request routing? What is the largest number of servers a cluster can bear?
 2) Now assuming that I create the per-tenant keyspace only on the
 servers in the three racks where the replicas are held, does such a
 definition reduce the message transfer among the other servers? Does
 Cassandra optimize the message transfer in such a case?
 3) An additional possible solution was to create a separate cluster for
 each tenant. But that can cause a situation where one server has to run two
 or more Cassandra clusters. Can we run more than one cluster in parallel?
 Does that mean two Cassandra daemons/instances on one server? What would be
 the overhead? Do you have a link that explains how to deal with it?

 Please can you help me decide which of these solutions can work, or you
 are welcome to suggest something else.
 Thanks a lot,
 Mimi










 --
 Frank LoVecchio
 Senior Software Engineer | Isidorey, LLC
 Google Voice +1.720.295.9179
 isidorey.com | facebook.com/franklovecchio | franklovecchio.com
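The virtual-keyspace trick behind Norman's link amounts to client-side key prefixing: all tenants share one physical keyspace, and the client namespaces every row key. An illustrative sketch, not Hector's actual implementation:

```python
# Virtual keyspaces via key prefixing: one physical keyspace, with each row
# key namespaced by tenant on the client side. SEP is any byte that cannot
# appear in a tenant name.
SEP = "\x00"

def vkey(tenant, row_key):
    return f"{tenant}{SEP}{row_key}"

store = {}
store[vkey("acme", "user:1")] = {"name": "alice"}
store[vkey("globex", "user:1")] = {"name": "bob"}   # same logical key, no clash

def tenant_rows(store, tenant):
    # Strip the prefix to recover each tenant's logical view of its keys.
    prefix = tenant + SEP
    return {k[len(prefix):]: v for k, v in store.items() if k.startswith(prefix)}

print(tenant_rows(store, "acme"))  # {'user:1': {'name': 'alice'}}
```

This sidesteps the per-keyspace memtable overhead that makes one keyspace per tenant expensive, at the cost of enforcing tenant isolation in the client rather than in the database.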



Re: [RELEASE] 0.7.1

2011-02-14 Thread Norman Maurer
Huh,

isn't that what mirrors are supposed to be for?

Bye,
Norman

2011/2/14 Frank LoVecchio fr...@isidorey.com:
 Did the site get hacked?
 http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.7.1/apache-cassandra-0.7.1-bin.tar.gz
 Sources keep changing...

 On Mon, Feb 14, 2011 at 1:13 PM, Eric Evans eev...@rackspace.com wrote:

 Today is Valentine's Day[1] in many parts of the world, an annual
 commemoration of love and affection typically celebrated with candy,
 stuffed animals, and floral arrangements.

 She may seem a bit fickle at times, but Cassandra loves you, and since
 most people would rather receive a gift of the heart than some
 prefabricated sentiment, the project decided to give the gift of point
 release, ala 0.7.1.

 Happy Valentine's Day! :)

 Redeem yours from the usual place[5] (or [6] for users of Debian and
 derivatives).

 And, as usual, be sure to read through the changes[2] and release
 notes[3]. Report any problems you find[4], and if you have any
 questions, don't hesitate to ask.

 XOXOXO

 [1]: http://en.wikipedia.org/wiki/Valentine's_Day
 [2]: http://goo.gl/5VAPP (CHANGES.txt)
 [3]: http://goo.gl/C9M5W (NEWS.txt)
 [4]: https://issues.apache.org/jira/browse/CASSANDRA
 [5]: http://cassandra.apache.org/download
 [6]: http://wiki.apache.org/cassandra/DebianPackaging

 --
 Eric Evans
 eev...@rackspace.com




 --
 Frank LoVecchio
 Senior Software Engineer | Isidorey, LLC
 Google Voice +1.720.295.9179
 isidorey.com | facebook.com/franklovecchio | franklovecchio.com



Re: Do supercolumns have a purpose?

2011-02-09 Thread Norman Maurer
I still think super columns are useful; you just need to be aware of
the limitations...

Bye,
Norman


2011/2/9 Mike Malone m...@simplegeo.com:
 On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn da...@lookin2.com wrote:

 Shaun, I agree with you, but marking them as deprecated is not good enough
 for me. I can't easily stop using supercolumns. I need an upgrade path.

 David,
 Cassandra is open source and community developed. The right thing to do is
 what's best for the community, which sometimes conflicts with what's best
 for individual users. Such strife should be minimized, it will never be
 eliminated. Luckily, because this is an open source, liberal licensed
 project, if you feel strongly about something you should feel free to add
 whatever features you want yourself. I'm sure other people in your situation
 will thank you for it.
 At a minimum I think it would behoove you to re-read some of the comments
 here re: why super columns aren't really needed and take another look at
 your data model and code. I would actually be quite surprised to find a use
 of super columns that could not be trivially converted to normal columns. In
 fact, it should be possible to do at the framework/client library layer -
 you probably wouldn't even need to change any application code.
 Mike

 On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts sh...@cuttshome.net wrote:

 I'm a newbie here, but, with apologies for my presumptuousness, I think
 you should deprecate SuperColumns. They are already distracting you, and as
 the years go by the cost of supporting them as you add more and more
 functionality is only likely to get worse. It would be better to concentrate
 on making the core column families better (and I'm sure we can all think
 of lots of things we'd like).
 Just dropping SuperColumns would be bad for your reputation -- and for
 users like David who are currently using them. But if you mark them clearly
 as deprecated and explain why and what to do instead (perhaps putting a bit
 of effort into migration tools... or even a virtual layer supporting
 arbitrary hierarchical data), then you can drop them in a few years (when
 you get to 1.0, say), without people feeling betrayed.

 -- Shaun
 On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:

 My main point was to say that I think it is better to create tickets
 for what you want, rather than for something else completely different that
 would, as a by-product, give you what you want.

 Then let me say what I want: I want supercolumn families to have any
 feature that regular column families have.

 My data model is full of supercolumns. I used them, even though I knew I
 didn't *have to*, because they were there, which implied to me that I was
 supposed to use them for some good reason. Now I suspect that they will
 gradually become less and less functional, as features are added to regular
 column families and not supported for supercolumn families.


 On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone m...@simplegeo.com wrote:

 On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn da...@lookin2.com
 wrote:

 The advantage would be to enable secondary indexes on supercolumn
 families.

 Then I suggest opening a ticket for adding secondary indexes to
 supercolumn families and voting on it. This will be 1 or 2 orders of
 magnitude less work than getting rid of super columns internally, and
 probably a much better solution anyway.

 I realize that this is largely subjective, and on such matters code
 speaks louder than words, but I don't think I agree with you on the issue 
 of
 which alternative is less work, or even which is a better solution.

 You are right, I probably put too much emphasis on that sentence. My main
 point was to say that I think it is better to create tickets for what you
 want, rather than for something else completely different that would, as a
 by-product, give you what you want.
 Then I suspect that *if* the only goal is to get secondary indexes on
 super columns, then there is a good chance this would be less work than
 getting rid of super columns. But to be fair, secondary indexes on super
 columns may not make too much sense without #598, which itself would require
 quite some work, so clearly I spoke a bit quickly.


 If the goal is to have a hierarchical model, limiting the depth to two
 seems arbitrary. Why not go all the way and allow an arbitrarily deep
 hierarchy?
 If a more sophisticated hierarchical model is deemed unnecessary, or
 impractical, allowing a depth of two seems inconsistent and
 unnecessary. It's pretty trivial to overlay a hierarchical model on top of
 the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
 implemented a custom comparator that does the job [1]. Google's Megastore
 has a similar architecture and goes even further [2].
 It seems to me 

Re: unsubscribe

2011-02-02 Thread Norman Maurer
To make it short... no, it can't.

Bye,
Norman

(ASF Infrastructure Team)

2011/2/2 F. Hugo Zwaal h...@unitedgames.com:
 Can't the mailing list server be changed to treat messages with
 "unsubscribe" as the subject as an unsubscribe as well? Otherwise it will
 just keep happening, as people simply don't remember or take the time to
 find out.

 Just my 2 cents...

 Groets, Hugo.

 On 2 feb 2011, at 16:54, Jonathan Ellis jbel...@gmail.com wrote:

 http://wiki.apache.org/cassandra/FAQ#unsubscribe

 On Wed, Feb 2, 2011 at 7:55 AM, JJ jjcha...@gmail.com wrote:


 Sent from my iPad




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: issue for cassandra-cli on 0.7-rc1

2010-12-02 Thread Norman Maurer
You need to terminate the command with a semicolon (;).

Try:
help;

Bye,
Norman

2010/12/2 Yikuo Chan yikuo.c...@gmail.com:
 Hi there ,

 After installing Cassandra 0.7 rc1, I get a no-response problem after
 executing cassandra-cli. Please see the log below and help
 me fix this issue:

 [r...@xxx cassandra]# bin/cassandra-cli --host 10.31.23.22
 Connected to: xxx Test Cluster on 10.31.23.22/9160
 Welcome to cassandra CLI.

 Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
 [defa...@unknown] help (enter)
     help (enter)
    (blank , CLI no respond )
     [r...@sjdc-ers-zb1 cassandra]#


 thanks

 Kevin



Re: Cassandra won't start Java Issue Snow Leopard

2010-11-24 Thread Norman Maurer
Change into the bin directory and run ./cassandra -f.

Bye
Norman

2010/11/24, Alberto Velandia betovelan...@gmail.com:
 Hi, I'm getting this error when I run bin/cassandra -f:

 Exception in thread "main" java.lang.NoClassDefFoundError:
 org/apache/cassandra/thrift/CassandraDaemon
 Caused by: java.lang.ClassNotFoundException:
 org.apache.cassandra.thrift.CassandraDaemon
   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:248)

 which seems to be a Java version issue. I've updated my ~/.profile to the
 following:

 export
 PATH=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:/usr/local/bin:/usr/local/sbin:/usr/local/mysql/bin:$PATH
 [[ -s $HOME/.rvm/scripts/rvm ]] && . $HOME/.rvm/scripts/rvm  # This
 loads RVM into a shell session.

 but I'm still getting the same error. I've also set the environment variable
 JAVA_HOME, but I don't know if I did it right.

 Can anyone help me? Thanks.





Re: Cassandra crashed - possible JMX threads leak

2010-10-26 Thread Norman Maurer
Depending on finalize() is really not what you want to do, so I think
the API change would be preferable.

Bye,
Norman


2010/10/26 Bill Au bill.w...@gmail.com:
 I would be happy to submit a patch but is it a bit more trickier than simply
 calling JMXConenctor.close().  NodeProbe's use of the JMXConnector is not
 exposed in its API.  The JMX connection is created in NodeProbe's
 constructor.  Without changing the API, the only place to call close() would
 be in NodeProbe's finalize().  I am not sure if that's the best thing to
 do.  I think this warrant a discussion on the developer mailing list.  I
 will start an new mail thread there.

 Anyways, I am still trying to understand why the JMX server connection
 timeout threads pile up rather quickly when I restart a node in a live
 cluster.  I took a look at the Cassandra source and see that NodeProbe is
 the only place that creates and uses a JMX connection.  And NobeProbe is
 only used by the tools.  So it seems that there is another JMX thread leak
 in Cassandra.

 Bill

 On Fri, Oct 22, 2010 at 4:33 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Is the fix as simple as calling close() then?  Can you submit a patch for
 that?

 On Fri, Oct 22, 2010 at 2:49 PM, Bill Au bill.w...@gmail.com wrote:
  Not with the nodeprobe or nodetool command because the JVM these two
  commands spawn has a very short life span.
 
  I am using a webapp to monitor my cassandra cluster.  It pretty much
  uses
  the same code as NodeCmd class.  For each incoming request, it creates
  an
  NodeProbe object and use it to get get various status of the cluster.  I
  can
  reproduce the Cassandra JVM crash by issuing requests to this webapp in
  a
  bash while loop.  I took a deeper look and here is what I discovered:
 
  In the webapp when NodeProbe creates a JMXConnector to connect to the
  Cassandra JMX port, a thread
  (com.sun.jmx.remote.internal.ClientCommunicatorAdmin$Checker) is created
  and
  run in the webapp's JVM.  Meanwhile in the Cassamdra JVM there is a
  com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout thread to
  timeout remote JMX connection.  However, since NodeProbe does not call
  JMXConnector.close(), the JMX client checker threads remains in the
  webapp's
  JVM even after the NobeProbe object has been garbage collected.  So this
  JMX
  connection is still considered open and that keeps the JMX timeout
  thread
  running inside the Cassandra JVM.  The number of JMX client checker
  threads
  in my webapp's JVM matches up with the number of JMX server timeout
  threads
  in my Cassandra's JVM.  If I stop my webapp's JVM,
  all the JMX server timeout threads in my Cassandra's JVM all disappear
  after
  2 minutes, the default timeout for a JMX connection.  This is why the
  problem cannot be reproduced by nodeprobe or nodetool.  Even though
  JMXConnector.close() is not called, the JVM exits shortly so the JMX
  client
  checker thread do not stay around.  So their corresponding JMX server
  timeout thread goes away after two minutes.  This is not the case with
  my
  weabpp since its JVM keeps running, so all the JMX client checker
  threads
  keep running as well.  The threads keep piling up until it crashes
  Cassandra's JVM.
 
  In my case I think I can change my webapp to use a static NodeProbe
  instead
  of creating a new one for every request.  That should get around the
  leak.
 
  However, I have seen the leak occurs in another situation.  On more than
  one
  occasions when I restarted one node in a live multi-node clusters, I see
  that the JMX server timeout threads quickly piled up (number in the
  thousands) in Cassandra's JVM.  It only happened on a live cluster that
  is
  servicing read and write requests.  I am guessing the hinted hand off
  might
  have something to do with it.  I am still trying to understand what is
  happening there.
 
  Bill
 
 
  On Wed, Oct 20, 2010 at 5:16 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
 
  can you reproduce this by, say, running nodeprobe ring in a bash while
  loop?
 
  On Wed, Oct 20, 2010 at 3:09 PM, Bill Au bill.w...@gmail.com wrote:
   One of my Cassandra server crashed with the following:
  
   ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419
   CassandraDaemon.java (line 82) Uncaught exception in thread
   Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main]
   java.lang.OutOfMemoryError: unable to create new native thread
       at java.lang.Thread.start0(Native Method)
       at java.lang.Thread.start(Thread.java:597)
       at
  
  
   org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533)
  
  
   I took threads dump in the JVM on all the other Cassandra severs in
   my
   cluster.  They all have thousand of threads looking like this:
  
   JMX server connection timeout 183373 daemon prio=10
   tid=0x2aad230db800
   nid=0x5cf6 in Object.wait() [0x2aad7a316000]
      java.lang.Thread.State: TIMED_WAITING (on object 
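
(The fix being discussed — closing the JMXConnector explicitly rather than relying on garbage collection — can be sketched with the standard javax.management.remote API. The class name is illustrative, and a local in-process connector server stands in for Cassandra's JMX endpoint so the example is self-contained; NodeProbe's own usage would close the connector it opens in its constructor the same way.)

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class JmxCloseSketch {
    public static void main(String[] args) throws Exception {
        // In-process connector server so the sketch is self-contained;
        // in the thread above this role is played by Cassandra's JMX port.
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXConnectorServer server = JMXConnectorServerFactory
                .newJMXConnectorServer(new JMXServiceURL("service:jmx:rmi://"), null, mbs);
        server.start();

        // Client side, NodeProbe-style: open a JMXConnector, query, and
        // close it in a finally block. Without close(), the client-side
        // checker thread and the server-side timeout thread both linger.
        JMXConnector connector = JMXConnectorFactory.connect(server.getAddress());
        try {
            Integer count = connector.getMBeanServerConnection().getMBeanCount();
            System.out.println("mbeanCount>0=" + (count != null && count > 0));
        } finally {
            connector.close(); // releases both threads promptly
        }
        System.out.println("closed=true");
        server.stop();
    }
}
```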

Re: Cassandra + Zookeeper, what is the current state?

2010-10-19 Thread Norman Maurer
No, Zookeeper is not used in Cassandra. You can use Zookeeper as some
kind of add-on to do locking etc.

Bye,
Norman


2010/10/19 Yang tedd...@gmail.com:
 I read from the Facebook cassandra paper that zookeeper is used
 . for certain things ( membership and Rack-aware placement)

 but I pulled 0.7.0-beta2 source and couldn't grep out anything with
 Zk or Zoo,  nor any files with Zk/Zoo in the names


 is Zookeeper really used? docs/blog posts from online search kind of
 give conflicting answers


 Thanks
 Yang



Re: Newbie Question about restarting Cassandra

2010-10-06 Thread Norman Maurer
CTRL + Z does not stop a program, it just suspends it. You will need to
resume it with fg and then hit CTRL + C to stop it.

For some basic background:

http://linuxreviews.org/beginner/jobs/

Bye,
Norman


2010/10/6 Alberto Velandia b...@yogadigital.net:
 Hi I've stopped cassandra hitting Ctrl + Z and tried to restart it and got
 this message:
  INFO 11:46:16,039 JNA not found. Native methods will be disabled.
  INFO 11:46:16,159 DiskAccessMode 'auto' determined to be mmap,
 indexAccessMode is mmap
 ERROR 11:46:16,449 Fatal exception during initialization
 java.io.IOException: Failed to delete
 /var/lib/cassandra/data/system/LocationInfo-9-Data.db
         at
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:47)
         at
 org.apache.cassandra.io.SSTable.deleteIfCompacted(SSTable.java:108)
         at
 org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:172)
         at
 org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:248)
         at org.apache.cassandra.db.Table.init(Table.java:338)
         at org.apache.cassandra.db.Table.open(Table.java:199)
         at
 org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:124)
         at
 org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:97)
         at
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:214)

 1. How am I supposed to stop Cassandra?
 2. Is there a way for me to restart Cassandra without losing the data I've
 stored?
 Thanks

 Alberto Velandia
 Director de proyectos
 Calle 74 # 15-80 - Int 2. Oficina 202
 Osaka Trade Center
 Bogotá, Colombia, Latam.
 Tel: +571-345-1070
 Cel: +57301-466-3902
 Site en desarrollo: http://www.staging.yoga-digital.net/works
 Messenger: b...@yogadigital.net
 SkypeID: beto_velandia


 La pureza de lo simple

 This message is intended exclusively for its addressee and may contain
 information that is CONFIDENTIAL and protected by a professional privilege
 or whose disclosure is prohibited by law. If you are not the intended
 recipient you are hereby notified that any read, dissemination, copy or
 disclosure of this communication is strictly prohibited by law. If this
 message has been received in error, please immediately notify us via e-mail
 and delete it.  Internet e-mail neither guarantees the confidentiality nor
 the integrity or proper receipt of the messages sent. Yoga Digital S.A.S.
 does not assume any liability for those circumstances.



Re: get keys based on values??

2010-10-06 Thread Norman Maurer
Only in 0.7

Bye,
Norman

2010/10/6 Brayton Thompson thomp...@grnoc.iu.edu:
 Are secondary indexes available in 0.6.5? Or are they only in 0.7?
 On Oct 6, 2010, at 1:15 PM, Tyler Hobbs wrote:

 If you're interested in only checking part of a column's value, you can
 generally
 just store that part of the value in a different column.  So, have an
 email_addr column
 and a email_domain column, which stores aol.com, for example.

 Then you can just use a secondary index on the email_domain column.

 - Tyler

 On Wed, Oct 6, 2010 at 10:33 AM, Brayton Thompson thomp...@grnoc.iu.edu
 wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Ok, I am VERY new to Cassandra and trying to get my head around its core
 ideas.

 So lets say I have a CF of Users that contains all the info I would ever
 want to know about them. One day I decide(for some reason) that I want to
 send a mass email to only the users with AOL email addresses. Is there a
 mechanism for getting only keys whose email attribute contains the string
 @aol.com ? Or is this frowned upon? I could also envision separate CF's for
 each email type; that stored values to use as keys into my Users CF. Say the
 AOL CF contains the usernames of everyone that has an aol account. So I
 would pull all of the keys from that CF and then use them to index into the
 Users CF to pull their email addresses.  It seems to me that this is
 redundant. So I would like your thoughts on my example.

 Thank you,
 Brayton Thompson
 thomp...@grnoc.iu.edu
 Global Research Network Operation Center
 Indiana University
 -BEGIN PGP SIGNATURE-
 Version: GnuPG/MacGPG2 v2.0.14 (Darwin)

 iQIcBAEBAgAGBQJMrJa1AAoJENisXTckM+p9ffcP/1UmNDyWxDnOu41ZRcVwmJiE
 +47QxqNc57WmdXX86FUvcauhPFFNZfbrbGwA61sof1sktSOL83osOXQuOfGr5GvT
 tulU3+rQ1B+ea0x+aBESbKZwXHxckLGdst2Hro1eCVXEna+VvqkxNJ2rvYzE3hNM
 FTNBWDIv3JbOChTYBnycBqg1iG5yMDkc2xEHlaiw9S/VsOPU18pPYrf42eoSqgnk
 /rZDCxxiThznuaLI70QnU3O7ZTiyXpavN8BUW6KoeDZNAypgg1AayhEL2d67zZWu
 qtnGEpoIeieinjccWMpkUrv2f14CZQ5gbJSLwPdoNLItYLnFvGHg0Ca/hXhrkIDr
 BqnA0R5w2YHB+5p84gvj1NTRE0O2kXcUHkLDDBvnlLKUOUkoDyqr5tGAIwHhIwA7
 hpko76CyGN84bS8Kma+1D6e8wg9zqfiS9mvvErJCUOwyU5e+XeoiCdyhwgDHJKlW
 T5UjMXdAHwyZly48J5l6jEJastHsL1wKAHeV/NlQ1gEx2CmnnJ0lBPDPqlT5Lxdb
 uQFzS/YhFzxWL2gApHKF8EdCz4jFbPUggYYPsVgfYkNNBISgcIiQaEIIPkri96vb
 V/xhnxLrFCO20NnGQ5PCTzCnZptyc3V+9WI542fnRGcS8SbF+N5BdLzoJBjtidrI
 a/Nps/KUhJ5kVzJ0o8H3
 =oBhH
 -END PGP SIGNATURE-
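
(Tyler's suggestion — denormalize the domain into its own email_domain column so a secondary index can answer the query — can be modeled with plain Java collections. This is only a sketch of the data layout, not Cassandra client code; all names and addresses are hypothetical. The index map below is essentially what a secondary index maintains for you server-side.)

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DomainIndexSketch {
    public static void main(String[] args) {
        // Users CF: row key -> columns. email_domain is denormalized from
        // email_addr at write time so an index can be built on it.
        Map<String, Map<String, String>> users = new HashMap<>();
        users.put("brayton", columns("brayton@aol.com"));
        users.put("tyler", columns("tyler@example.org"));
        users.put("lucas", columns("lucas@aol.com"));

        // What a secondary index on email_domain effectively maintains:
        // domain -> set of row keys.
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> e : users.entrySet()) {
            index.computeIfAbsent(e.getValue().get("email_domain"),
                    d -> new HashSet<>()).add(e.getKey());
        }

        // "Send a mass email to AOL users" becomes a single index lookup,
        // with no scan over every row's email_addr value.
        System.out.println("aol users=" + new TreeSet<>(index.get("aol.com")));
    }

    static Map<String, String> columns(String addr) {
        Map<String, String> cols = new HashMap<>();
        cols.put("email_addr", addr);
        cols.put("email_domain", addr.substring(addr.indexOf('@') + 1));
        return cols;
    }
}
```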





Re: Curious as to how Cassandra handles the following

2010-09-26 Thread Norman Maurer
Comments inside..

2010/9/26 Lucas Nodine lucasnod...@gmail.com:
 I'm looking at a design where multiple clients will connect to Cassandra and
 get/mutate resources, possibly concurrently.  After planning a bit, I ran
 into the following scenario for which I have not been able to research to
 find an answer sufficient for my needs.  I have found where others have
 recommended Zookeeper for such tasks, but I want to determine if there is a
 simple solution before including another product in my design.

 Make the following assumption for all following situations:
 Assuming multiple clients where a client is someone accessing Cassandra
 using thrift.  All reads and writes are performed using the QUORUM
 consistency level.

 Situation 1:
 Client A (A) connects to Cassandra and requests a QUORUM consistency level
 get of an entire row.  At or very shortly thereafter (before A's request
 completes), Client B (B) connects to Cassandra and inserts (or mutates) a
 column (or multiple columns) within the row.

 Does A receive the new data saved by B or does A receive the data prior to
 B's save?

Should receive A's stuff.

 Situaton 2:
 B connects and mutates multiple columns within a row.  A requests some data
 therein while B is processing.

 Result?

Depends.. is it done in BatchMutate or not ?


 Situation 3:
 B mutates multiple columns within multiple rows.  A requests some data
 therein while B is processing.

 Result?

See above..


 Justification: At certain points I want to essentially lock a resource (row)
 in cassandra for exclusive write access (think checkout a resource) by
 setting a flag value of a column within that row.  I'm just considering race
 conditions.

You will need to use Cages (ZooKeeper-based locking) or something like that.


 Thanks,

 Lucas Nodine

Bye,
Norman


Re: Curious as to how Cassandra handles the following

2010-09-26 Thread Norman Maurer
To be more clear (maybe I was not before): BatchMutate is not atomic.
It only batches up mutates to reduce overhead. So it can be that
you will receive data from it even if the whole operation is not
complete or will never complete.

bye,
Norman
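
(The reason QUORUM reads and writes still compose safely despite these races is overlap: with replication factor N, a quorum is floor(N/2) + 1 replicas, so a read quorum R and a write quorum W always satisfy R + W > N and must intersect. A small sketch of that arithmetic; the values are illustrative.)

```java
public class QuorumOverlap {
    // QUORUM = floor(N / 2) + 1 replicas, for replication factor N.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 5}) {
            int q = quorum(n);
            // q + q > n guarantees the read quorum overlaps the last write quorum,
            // so at least one replica in every read has seen the latest write.
            System.out.println("N=" + n + " quorum=" + q + " overlap=" + (q + q > n));
        }
    }
}
```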


2010/9/26 Norman Maurer nor...@apache.org:
 Comments inside..

 2010/9/26 Lucas Nodine lucasnod...@gmail.com:
 I'm looking at a design where multiple clients will connect to Cassandra and
 get/mutate resources, possibly concurrently.  After planning a bit, I ran
 into the following scenario for which I have not been able to research to
 find an answer sufficient for my needs.  I have found where others have
 recommended Zookeeper for such tasks, but I want to determine if there is a
 simple solution before including another product in my design.

 Make the following assumption for all following situations:
 Assuming multiple clients where a client is someone accessing Cassandra
 using thrift.  All reads and writes are performed using the QUORUM
 consistency level.

 Situation 1:
 Client A (A) connects to Cassandra and requests a QUORUM consistency level
 get of an entire row.  At or very shortly thereafter (before A's request
 completes), Client B (B) connects to Cassandra and inserts (or mutates) a
 column (or multiple columns) within the row.

 Does A receive the new data saved by B or does A receive the data prior to
 B's save?

 Should receive A's stuff.

 Situaton 2:
 B connects and mutates multiple columns within a row.  A requests some data
 therein while B is processing.

 Result?

 Depends.. is it done in BatchMutate or not ?


 Situation 3:
 B mutates multiple columns within multiple rows.  A requests some data
 therein while B is processing.

 Result?

 See above..


 Justification: At certain points I want to essentially lock a resource (row)
 in cassandra for exclusive write access (think checkout a resource) by
 setting a flag value of a column within that row.  I'm just considering race
 conditions.

 You will need to use Cages (ZooKeeper-based locking) or something like that.


 Thanks,

 Lucas Nodine

 Bye,
 Norman