Re: Cassandra Demo/Tutorial Applications

2010-03-12 Thread Ian Holsman
There are several large data sets on the net you could use to build a demo with.

Search logs, Wikipedia, UK government data.
DBpedia may be interesting as they have some of the structured data already extracted.


---
Sent from my phone
Ian Holsman - 703 879-3128

On 13/03/2010, at 4:46 PM, Jonathan Ellis  wrote:

On Fri, Mar 12, 2010 at 1:55 PM, Krishna Sankar  
 wrote:
I was looking at this from CASSANDRA-873 as well as hands-on  
homework (!)
for my OSCON tutorial. Have a couple of questions. Would appreciate  
insights:


A)  CASSANDRA-873 suggests Lucandra as one demo application
B)  Are there other ideas that will bring out the various aspects of
Cassandra ?


multi-user blog (single-user is too easy :)
- extra credit: with full-text search using lucandra

discussion forum
- also w/ FTS

C)  What would be the goal of demo apps ? Tutorial to help folks  
learn the
ins and outs of Cassandra ? Showcase capabilities ? I think  
Cassandra-873
belongs to the latter; Twissandra most probably belongs to the  
former.


I think you nailed it.


D)  Hadoop on Cassandra might be a good demo/tutorial


Sure, I'll buy that.

I can't think of any standalone projects for that, but "compute a
twissandra tag cloud" would be pretty cool.  (Might need to write a
twissandra bot to load stuff in to make an interesting cloud. :)
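
A tag cloud of that sort is essentially word count over tweet text. Below is a minimal sketch of the counting step using the plain Hadoop MapReduce API; the class and method names are made up for illustration, and the part that would read rows out of Cassandra (rather than out of text files) is deliberately left out.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical tag-counting job: emit (tag, 1) for every hashtag in a tweet body,
// then sum the counts in the reducer. The totals drive the size of each tag in the cloud.
public class TagCloudCount {

    public static class TagMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text tag = new Text();

        @Override
        protected void map(LongWritable offset, Text tweetBody, Context context)
                throws IOException, InterruptedException {
            for (String token : tweetBody.toString().split("\\s+")) {
                if (token.startsWith("#") && token.length() > 1) {
                    tag.set(token.toLowerCase());
                    context.write(tag, ONE);            // one count per hashtag occurrence
                }
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text tag, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable count : counts) {
                total += count.get();                   // final total for this tag
            }
            context.write(tag, new IntWritable(total));
        }
    }
}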

E)  How would one structure the infrastructure for the demo/tutorials ? What
assumptions can we make in creating them ? As AMIs to be run in EC2 ?


I'd probably go with "virtualbox images" as being simpler for people
who don't have an AWS key already.  (VB can read vmware player images,
i think.  But there is no free vmware for OS X, so you'd want to check
that before going w/ vmware format.)

Or just have people d/l cassandra and a configuration xml.  Probably
easier than teaching people to use virtualbox who haven't before.


Also to be run on 2-3 local machines for folks who can spare some ? Or as
multiple processes - all in one machine ?


You're not going to have time to teach cluster management.  Keep it  
to 1.


Re: finding Cassandra servers

2010-03-03 Thread Ian Holsman
+1 on Eric's comments.
We could create a branch or git fork where you guys could develop it,
and if it reaches a usable state and others find it interesting it
could get integrated back in then.


On 3/3/10, Eric Evans  wrote:
> On Wed, 2010-03-03 at 10:05 -0600, Ted Zlatanov wrote:
>> I can do a patch+ticket for this in the core, making it optional and
>> off by default, or do the same for a contrib/ service as you
>> suggested.  So I'd appreciate a +1/-1 quick vote on whether this can
>> go in the core to save me from rewriting the patch later.
>
> I don't think voting is going to help. Voting doesn't do anything to
> develop consensus and it seems pretty clear that no consensus exists
> here.
>
> It's entirely possible that you've identified a problem that others
> can't see, or haven't yet encountered. I don't see it, but then maybe
> I'm just thick.
>
> Either way, if you think this is important, the onus is on you to
> demonstrate the merit of your idea and contrib/ or a github project is
> one way to do that (the latter has the advantage of not needing to rely
> on anyone else).
>
>
> --
> Eric Evans
> eev...@rackspace.com
>
>

-- 
Sent from my mobile device


Re: Cassandra News Page

2010-02-19 Thread Ian Holsman
Hi Sal.
we'll be moving off the incubator site shortly. we'll address that when we go 
to cassandra.apache.org

regards
Ian
On Feb 18, 2010, at 4:06 PM, Sal Fuentes wrote:

> This is just a thought, but I think some type of *latest news* page would be 
> nice to have on the main site (http://incubator.apache.org/cassandra/) even 
> if it's a bit outdated. Not sure if this has been previously considered. 
> 
> -- 
> Salvador Fuentes Jr.

--
Ian Holsman
i...@holsman.net





Re: Scalable data model for a Metadata database

2010-02-10 Thread Ian Holsman
Hi Jared.
you might want to look at graph databases (HyperGraphDB or Neo4j, for example) 
for use cases like this. 
What it seems like you are asking for is a semantic knowledge base à la 
freebase.com.

tools like protégé (protege.stanford.edu/ ) and gremlin (gremlin.tinkerpop.com) 
are helpful for this kind of thing as well.

the other issue you are going to encounter is when you want to link up 2 things.

for example marriage:
find all people whose sex == ‘male’ and age >= 20 and age <= 29 and who are married 
to someone called Michelle who is older than 27.

HTH
Ian

On Feb 10, 2010, at 3:51 AM, Jared winick wrote:

> Thanks for the specific suggestions Jonathan, I really appreciate it.
> 
> On Tue, Feb 9, 2010 at 9:37 AM, Jonathan Ellis  wrote:
>> On Tue, Feb 9, 2010 at 10:01 AM, Jared winick  wrote:
>>> Somehow I need to partition the data better.  Would a recommendation
>>> be to “split” the “sex” key into multiple keys? For example I could
>>> append the year and month to the key (“sex_022010”) to partition the
>>> data by the month it was inserted.
>> 
>> That's one possibility.  Another would be to kill two birds with one
>> stone and add the age to that key, so you'd have male_20 (probably
>> better: male_1990), etc.
>> 
>> Fundamentally TANSTAAFL and if you need to scale queries w/ lots of
>> criteria like this you will have to choose (sometimes from more than
>> one of) these options:
>> 
>>  - have a lot of machines so you can parallelize brute force queries,
>> e.g. w/ Hadoop
>>  - precompute specific "indexes" like sex_birthdate above
>>   - note, with supercolumns you can also materialize the whole
>> "person" in subcolumns, rather than doing an extra lookup for each
>> index hit
>>  - use less-specific indexes (e.g. separate sex & birthdate indexes to
>> continue the example) and do more work on the client
>> 
>> -Jonathan
>> 
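
To make the "precompute specific indexes" option above concrete: on every insert the person's id gets written under each index row you intend to query later. A rough sketch in plain Java with a hypothetical store interface (the column family and key names are illustrative only, not from this thread):

import java.util.Arrays;
import java.util.List;

// Hypothetical fan-out-on-write: the person id is stored as a column under every
// index row key we plan to query. The IndexStore interface is a stand-in, not a
// real Cassandra client API.
public class PersonIndexer {

    interface IndexStore {
        void insertColumn(String columnFamily, String rowKey, String columnName, byte[] value);
    }

    private final IndexStore store;

    public PersonIndexer(IndexStore store) {
        this.store = store;
    }

    public void index(String personId, String sex, int birthYear) {
        byte[] empty = new byte[0];
        // combined index, e.g. "male_1990": one row per (sex, birth year) bucket
        String combined = sex.toLowerCase() + "_" + birthYear;
        // less-specific indexes: separate sex and birth-year rows, intersected client-side
        List<String> indexRows = Arrays.asList(combined, sex.toLowerCase(), "year_" + birthYear);
        for (String rowKey : indexRows) {
            store.insertColumn("PersonIndex", rowKey, personId, empty);
        }
    }
}

A query for males born 1985-1990 then becomes a handful of single-row reads (male_1985 .. male_1990) instead of a scan, paid for with the extra writes at insert time.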

--
Ian Holsman
i...@holsman.net





Re: Cassandra versus HBase performance study

2010-02-04 Thread Ian Holsman
Hi Brian.
were there any performance changes on the other tests with v0.5 ?
the graphs on the other pages look remarkably identical.

On Feb 4, 2010, at 11:45 AM, Brian Frank Cooper wrote:

> 0.5 does seem to be significantly faster - the latency is better and it 
> provides significantly more throughput. I'm updating my charts with new 
> values now.
> 
> One thing that is puzzling is the scan performance. The scan experiment is to 
> scan between 1-100 records on each request. My 6 node Cassandra cluster is 
> only getting up to about 230 operations/sec, compared to >1400 ops/sec for 
> other systems. The latency is quite a bit higher. A chart with these results 
> is here:
> 
> http://www.brianfrankcooper.net/pubs/scans.png
> 
> Is this the expected performance? I'm using the OrderPreservingPartitioner 
> with InitialToken values that should evenly partition the data (and the 
> amount of data in /var/cassandra/data is about the same on all servers). I'm 
> using get_range_slice() from Java (code snippet below). 
> 
> At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage 
> varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% 
> (and the machine with the busiest disk is not the one with highest CPU 
> usage.) Network utilization (eth0 %util both in and out) varies from 15%-40% 
> on different boxes. So clearly there is some imbalance (and the workload 
> itself is skewed via a Zipfian distribution) but I'm surprised that the 
> latencies are so high even in this case.
> 
> Code snippet - fields is a Set listing the columns I want; 
> recordcount is the number of records to return.
> 
> SlicePredicate predicate;
> if (fields == null)
> {
>   predicate = new SlicePredicate(null, new SliceRange(new byte[0], new byte[0], false, 100));
> }
> else
> {
>   Vector<byte[]> fieldlist = new Vector<byte[]>();
>   for (String s : fields)
>   {
>     fieldlist.add(s.getBytes("UTF-8"));
>   }
>   predicate = new SlicePredicate(fieldlist, null);
> }
> ColumnParent parent = new ColumnParent("data", null);
>
> List<KeySlice> results =
>   client.get_range_slice(table, parent, predicate, startkey, "", recordcount, ConsistencyLevel.ONE);
>   
> Thanks!
> 
> Brian
> 
> 
> From: Brian Frank Cooper
> Sent: Saturday, January 30, 2010 7:56 AM
> To: cassandra-user@incubator.apache.org
> Subject: RE: Cassandra versus HBase performance study
> 
> Good idea, we'll benchmark 0.5 next.
> 
> brian
> 
> -Original Message-
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Friday, January 29, 2010 1:13 PM
> To: cassandra-user@incubator.apache.org
> Subject: Re: Cassandra versus HBase performance study
> 
> Thanks for posting your results; it is an interesting read and we are
> pleased to beat HBase in most workloads. :)
> 
> Since you originally benchmarked 0.4.2, you might be interested in the
> speed gains in 0.5.  A couple graphs here:
> http://spyced.blogspot.com/2010/01/cassandra-05.html
> 
> 0.6 (beta in a few weeks?) is looking even better. :)
> 
> -Jonathan

--
Ian Holsman
i...@holsman.net





Re: [VOTE] Graduation

2010-01-25 Thread Ian Holsman
+1.
On Jan 26, 2010, at 8:11 AM, Eric Evans wrote:

> 
> There was some additional discussion[1] concerning Cassandra's
> graduation on the incubator list, and as a result we've altered the
> initial resolution to expand the size of the PMC by three to include our
> active mentors (new draft attached).
> 
> I propose a vote for Cassandra's graduation to a top-level project.
> 
> We'll leave this open for 72 hours, and assuming it passes, we can then
> take it to a vote with the Incubator PMC.
> 
> +1 from me!
> 
> 
> [1] http://thread.gmane.org/gmane.comp.apache.incubator.general/24427
> 
> -- 
> Eric Evans
> eev...@rackspace.com
> 

--
Ian Holsman
i...@holsman.net





Re: Data Model Index Text

2010-01-08 Thread Ian Holsman
Hi ML.
this sounds more like a job for SOLR, but if you want to do this with 
cassandra, 
you should look at Jake's Lucandra http://github.com/tjake/Lucandra


you should also look at 
http://nicklothian.com/blog/2009/10/27/solr-cassandra-solandra/

I wouldn't recommend building your own IR engine; just use one of the ones 
out there.

regards
Ian
On Jan 9, 2010, at 9:12 AM, ML_Seda wrote:

> 
> Hey,
> 
> I've been reading up on the Cassandra data model a bit, and would like to
> get some input from this forum on different techniques for a particular
> problem.
> 
> Assume I need to index millions of text docs (e.g. research papers), and
> allow the ability to query them by a given word inside or around any of the
> indexed docs.  meaning if i search for terms i would like to get a list of
> docs in which these terms show up (e.g. Michael Jordan = Michael is the main
> term, and Jordan is next term n1.  The same can be applied by indicating
> previous terms to Michael)
> 
> How do I model this in Cassandra?
> 
> Would my Keys be a concat of the middle term + docid?  Will I be able to do
> queries by wildcarding the docid?
> 
> Thanks.
> -- 
> View this message in context: 
> http://n2.nabble.com/Data-Model-Index-Text-tp4275199p4275199.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at 
> Nabble.com.

--
Ian Holsman
i...@holsman.net





Re: Advise for choice

2010-01-07 Thread Ian Holsman
things positive for solr.
- mature and stable
- lots of documentation
- a swiss army knife and can be used for a LOT of things, especially if you are 
manipulating a lot of text.
- the query language is easier to use (imho.. but i've been using solr for 
years, so I am biased)
- lots of people know it
- fast caching
- faceting

cons for solr.
- hard to update a single field (you need to fetch & re-insert the entire row)
- commits/optimizes can slow things down to a crawl
- can't store structured data easily. (for example a blog post has tags which 
have both a key and a value).
- scalability isn't as easy as cassandra. sharding works, but it requires a lot 
of manual effort
- it's easy to get started and get something running, but if you need to do 
something out of the ordinary, it gets hard fast. I think cassandra is more 
flexible to do ordinary things that don't involve text-matching.
- replication isn't instant. (this is changing.. also look at zoie which may 
help).

of course, if you tell us what you're trying to do, I can be more specific.
FWIW.. we use SOLR for some of our news-content (see love.com and 
newsrunner.com) and it works fast enough for us. 
We have an incoming doc rate of about 8-10 news articles/second.

On Jan 8, 2010, at 5:43 AM, Nathan McCall wrote:

> Agreed that there is not much to go on here in the original question.
> I will say that we very recently found a good fit with Solr and
> Cassandra in how we deal with a very heavy write volume of news
> article data. Cassandra is excellent with write throughput and high
> availability, but our search use cases are with time-dependent news
> content, so we need lots of term proximity, faceting and ordering
> functionality.
> 
> We probably could store everything in Solr, but the above approach
> will allow us to make articles immediately available in a
> fault-tolerant manner while being able to efficiently send batches at
> regular intervals to Solr and therefore scale out our ingestion of
> news articles a little smoother. Full disclosure: I am still getting
> my head around the innards of Solr replication and clustering, but so
> far I feel like we made a good choice.
> 
> Hopefully the above will be helpful to folks during their evaluations.
> 
> Cheers,
> -Nate
> 
> 
> On Thu, Jan 7, 2010 at 10:02 AM, Joseph Bowman  
> wrote:
>> I have to agree with Tatu. If you're struggling to find reasons to validate
>> that Cassandra is the better choice for your task than Solr, then perhaps
>> Solr is the correct choice. I kind of went through the same thing recently,
>> struggled to make Cassandra fit what I was doing, then realized I was doing
>> it wrong and moved to MongoDB.
>> Cassandra is great at what it tries to accomplish, which is managing
>> gigantic datasets in a distributed way. The question is, is that really what
>> you need?
>> 
>> On Thu, Jan 7, 2010 at 12:58 PM, Tatu Saloranta 
>> wrote:
>>> 
>>> On Thu, Jan 7, 2010 at 3:16 AM, Richard Grossman 
>>> wrote:
>>>> Hi,
>>>> 
>>>> This message is a little different than a support question.
>>>> I'm confronted with a problem where people want to replace Cassandra with a
>>>> Solr server. I really think that our problem is a great case for Cassandra,
>>>> but I need more arguments.
>>>> 
>>>> So please, if you have some time, share some ideas on why to use Cassandra
>>>> instead of Solr.
>>> 
>>> Solution is generally applicable to a problem... so what is the (main) use
>>> case?
>>> 
>>> That would make it easier to find arguments for or against proposed
>>> solution.
>>> 
>>> -+ Tatu +-
>> 
>> 

--
Ian Holsman
i...@holsman.net





FWD: [protobuf] Captain Proto -- A Protobuf RPC system using capability-based security

2009-12-21 Thread Ian Holsman
There was a discussion about authorization for cassandra a while back.

I thought this may be of interest. Yes, it is based on protobuf, but it 
should work equally well with Thrift if someone were eager enough, I 
would think.

Regards
Ian

 Original Message 
Subject:[protobuf] Captain Proto -- A Protobuf RPC system using 
capability-based security
Date:   Sun, 13 Dec 2009 03:18:54 -0800
From:   Kenton Varda 


To: Protocol Buffers 





Hi all,

As I've mentioned a couple times in other threads, last weekend I wrote 
up a simple RPC system based on Protocol Buffers which treats services 
as capabilities, in the sense of capability-based security.


http://en.wikipedia.org/wiki/Capability-based_security


Essentially what this means is that you can construct a service 
implementation and then embed a reference to it into an RPC message sent 
to or from some other service.  So, for instance, if a client wants a 
server to be able to make calls back to the client, it can simply send 
the server a reference to a service implemented by the client.  Or, for 
another example, a service which acts as a resource broker could grant a 
client access to a particular resource by sending it a reference to a 
service object representing that resource, to which the client can then 
make calls.  Note that a particular service object cannot be accessed 
over a particular connection until that service object has actually been 
sent in an RPC over that connection.  This property is useful for 
security, as described in the above link.

In any case, the project is called Captain Proto and can be found here:


http://code.google.com/p/capnproto/


Currently it only has Java support, though I hope it will eventually 
support other languages as well.  The wire protocol is itself defined in 
terms of protocol buffers:


http://code.google.com/p/capnproto/source/browse/proto/capnproto.proto


There is basic documentation here:


http://capnproto.googlecode.com/hg/doc/java/index.html


You can also look at the test for an example:


http://code.google.com/p/capnproto/source/browse/java/test.proto
http://code.google.com/p/capnproto/source/browse/java/Test.java


I expect the API to change quite a bit, so be warned that if you write 
code based on it, that code will have to change at some point.

Future hopes/plans:
- Improve API by taking advantage of code generator plugins.
- Define a standard "ServiceDirectory" service which can serve as the 
default service on servers that export multiple services.  The directory 
would have a method like Open() which takes the name of some particular 
service and returns a reference to the corresponding service object.
- Provide a library of capability design pattern implementations, e.g. 
the revocable membrane.
- Define a capnproto-over-HTTP protocol which can be used by AJAX clients.
- Support C++ and Python.

For the time being, this is not an official Google project.  It's just 
something I wrote for fun -- or, more accurately, to support some other 
fun stuff that I want to work on.  That said, due to the obviously wide 
applicability, I might try to make it more official at some point.


--
Ian Holsman
i...@holsman.net





Re: Is Cassandra suitable for multi criteria search engine

2009-12-18 Thread Ian Holsman
Hi David.
3 million is a good size. I would say it is a 'medium'
but it really depends on a lot of factors, and what exactly you are indexing.
as a rule of thumb if you can fit the index in memory you'll be fine. 
It also depends on how much of a long tail you have, how often you update the 
index (each commit clears the caches)
and how complex your queries are. I've found the number of commits plays a 
bigger part than the physical size.

You should get a full-size index up and benchmark it under normal operation to be 
sure.

you can also install Solr in 'distributed' mode, which lets you scale it out 
further.

On Dec 19, 2009, at 12:30 AM, David MARTIN wrote:

> Is a 3 million record set not a big deal for Solr? If I consider
> about 30 properties per item, I have to give Solr 90 million
> properties to consider. Is that volume still correct for such a
> solution?
> 
> And regarding lucene on top of Cassandra, can people share their
> feedback, if any, about such a solution. Pros & cons vs Solr for instance.
> 
> Thank you.
> 
> 
> 2009/12/17, Jake Luciani :
>> True replication and scale.
>> 
>> On Dec 17, 2009, at 4:56 PM, Josh  wrote:
>> 
>>> I've used solr a bunch (And I'd cosign gabriel: Solr's fantastic) and
>>> I'm trying to work my head around Cassandra, but I'm really hazy on
>>> what the Cassandra+Lucene combo gives you.  What are you trying to
>>> accomplish?  (Meant earnestly:  I'm really curious)
>>> 
>>> josh
>>> @schulz
>>> http://schulzone.org
>>> 
>>> 
>>> On Thu, Dec 17, 2009 at 2:52 PM, Jake Luciani 
>>> wrote:
>>>> You can also put lucene on top of Cassandra by using.
>>>> 
>>>> http://github.com/tjake/Lucandra
>>>> 
>>>> On Dec 17, 2009, at 4:43 PM, gabriele renzi 
>>>> wrote:
>>>> 
>>>>> On Thu, Dec 17, 2009 at 7:48 PM, David MARTIN
>>>>> 
>>>>> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> That's what I was thinking. And I'm glad to read Apache solr in
>>>>>> your
>>>>>> answer as it is one of my main leads.
>>>>> 
>>>>> as a happy solr user, I second the suggestion, lucene (the
>>>>> technology
>>>>> behind solr) handles a number of documents like that without a
>>>>> sweat,
>>>>> and solr gives your replication and a few other good things.
>>>> 
>> 

--
Ian Holsman
i...@holsman.net





Re: read latency creaping up

2009-12-14 Thread Ian Holsman
can you make it so that the client restarts the connection every 30m or so ?
It could be an issue in thrift or something with long-lived connections.
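
For what it's worth, the periodic reconnect can live entirely on the client side. A rough sketch, assuming the org.apache.thrift classes from the libthrift jar in use at the time; host, port and interval are placeholders, error handling is omitted, and the import of the Thrift-generated Cassandra.Client is left out because the generated package name depends on the Cassandra version.

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

// Sketch of "restart the connection every 30 minutes": the caller asks for a protocol
// before each batch of calls, and once the connection is older than maxAgeMillis it is
// closed and a fresh socket is opened in its place.
public class RecyclingConnection {
    private final String host;
    private final int port;
    private final long maxAgeMillis;

    private TTransport transport;
    private TProtocol protocol;
    private long connectedAt;

    public RecyclingConnection(String host, int port, long maxAgeMillis) {
        this.host = host;
        this.port = port;
        this.maxAgeMillis = maxAgeMillis;
    }

    public synchronized TProtocol getProtocol() throws TTransportException {
        long now = System.currentTimeMillis();
        if (transport == null || !transport.isOpen() || now - connectedAt > maxAgeMillis) {
            if (transport != null && transport.isOpen()) {
                transport.close();                 // drop the long-lived connection
            }
            transport = new TSocket(host, port);   // brand new socket each cycle
            transport.open();
            protocol = new TBinaryProtocol(transport);
            connectedAt = now;
        }
        return protocol;
    }
}

The Thrift-generated client would be rebuilt around getProtocol() before each batch of requests, so no single connection outlives the interval.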

On Dec 15, 2009, at 10:16 AM, Brian Burruss wrote:

> i agree.  i don't know anything about thrift, and i don't know how it keeps 
> connections open or manages resources from a client or server perspective, 
> but this situation suggests that maybe killing the clients is forcing the 
> server to free something.
> 
> how's that sound :)
> 
> 
> From: Jonathan Ellis [jbel...@gmail.com]
> Sent: Monday, December 14, 2009 3:12 PM
> To: cassandra-user@incubator.apache.org
> Subject: Re: read latency creaping up
> 
> hmm, me neither
> 
> but, I can't think how restarting the client would, either :)
> 
> On Mon, Dec 14, 2009 at 4:59 PM, Brian Burruss  wrote:
>> Well not sure how that would affect the latency as reported by the Cassandra 
>> server using nodeprobe cfstats
>> 
>> Jonathan Ellis  wrote:
>> 
>> 
>> possibly the clients are running into memory pressure?
>> 
>> On Mon, Dec 14, 2009 at 4:27 PM, Brian Burruss  wrote:
>>> thx, i'm actually the "B. Todd Burruss" in that thread ..  we changed our 
>>> email system and well now, i'm just Brian .. long story.
>>> 
>>> anyway, in this case it isn't compaction pendings as i can kill the clients 
>>> and immediately restart and the latency is back to a reasonable number.  
>>> i'm still investigating.
>>> 
>>> thx!
>>> 
>>> From: Eric Evans [eev...@rackspace.com]
>>> Sent: Monday, December 14, 2009 8:23 AM
>>> To: cassandra-user@incubator.apache.org
>>> Subject: RE: read latency creaping up
>>> 
>>> On Sun, 2009-12-13 at 13:18 -0800, Brian Burruss wrote:
>>>> if this isn't a known issue, lemme do some more investigating.  my
>>>> test client becomes "more random" with reads as time progresses, so
>>>> possibly this is what causes the latency issue.  however, all that
>>>> being said, the performance really becomes bad after a while.
>>> 
>>> Have a look at the following thread:
>>> 
>>> http://thread.gmane.org/gmane.comp.db.cassandra.user/1402
>>> 
>>> 
>>> --
>>> Eric Evans
>>> eev...@rackspace.com
>>> 
>>> 
>> 

--
Ian Holsman
i...@holsman.net





Re: Cassandra vs HBase

2009-12-07 Thread Ian Holsman

This is slightly off-topic

There is a recent project called Hadoop Online (HOP) on Google Code  
that promises an online/continuous query ability on top of Hadoop which  
should allow for near-real-time activities instead of the batch stuff  
that mapred does.


---
Sent from my phone
Ian Holsman - 703 879-3128

On 06/12/2009, at 3:12 PM, Joseph Bowman   
wrote:


When I wrote my Why Cassandra article, I didn't get into why I  
didn't choose x platform because I didn't want to start a flame war  
by doing comparisons. For HBase, the primary reason I didn't choose  
it is that while there were benchmarks of what it could  
theoretically do, there weren't any real-world deployments  
proving it. My experience as a systems administrator is that it's  
best to go with a product that's been proven over time in real world  
scenarios.


I'll add to this, though, that nothing nosql, even Cassandra, has  
reached the point where I feel it's a no-brainer to choose it over  
anything, including sql-based solutions like mysql and oracle. It  
really comes down to your requirements.


On Sat, Dec 5, 2009 at 11:04 PM, Matt Revelle   
wrote:

On Dec 5, 2009, at 21:45, Joe Stump  wrote:


On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:

[Is] HBase used for real timish applications and if so any ideas  
what the largest deployment is.


I don't know of anyone off the top of my head who's using anything  
built on top of Hadoop for a real-time environment. Hadoop just  
wasn't built for that. It was built, like MapReduce, for crunching  
absurd amounts of data across hundreds of nodes in a "reasonable"  
amount of time.


Just my $0.02.

--Joe


While Hadoop MapReduce isn't meant for realtime use, HBase can  
handle it.


Over last summer there were some benchmarks included in HBase/Hadoop  
presentations that showed, IIRC, performance comparable to Cassandra.





Re: Persistently increasing read latency

2009-12-02 Thread Ian Holsman
hmm.
doesn't that leave the trunk in a bad position in terms of new development?
you may go through times when a major feature lands and trunk is broken/buggy.
or are you planning on building new features on a branch and then merging into 
trunk when it's stable?

On Dec 3, 2009, at 5:32 AM, Jonathan Ellis wrote:

> We are using trunk.  0.5 beta / trunk is better than 0.4 at the 0.4
> functionality and IMO is production ready (although you should always
> test first), but I would not yet rely on the new stuff (bootstrap,
> loadbalance, and moving nodes around in general).
> 
> -Jonathan
> 
> On Wed, Dec 2, 2009 at 12:26 PM, Adam Fisk  wrote:
>> Helpful thread guys. In general, Jonathan, would you recommend
>> building from trunk for new deployments at our current snapshot in
>> time? Are you using trunk at Rackspace?
>> 
>> Thanks.
>> 
>> -Adam
>> 
>> 
>> On Tue, Dec 1, 2009 at 6:18 PM, Jonathan Ellis  wrote:
>>> On Tue, Dec 1, 2009 at 7:31 PM, Freeman, Tim  wrote:
>>>> Looking at the Cassandra MBeans, the attributes of ROW-MUTATION-STAGE and 
>>>> ROW-READ-STAGE and RESPONSE-STAGE are all  less than 10.  
>>>> MINOR-COMPACTION-POOL reports 1218 pending tasks.
>>> 
>>> That's probably the culprit right there.  Something is wrong if you
>>> have 1200 pending compactions.
>>> 
>>> This is something that upgrading to trunk will help with right away
>>> since we parallelize compactions there.
>>> 
>>> Another thing you can do is increase the memtable limits so you are
>>> not flushing + compacting so often with your insert traffic.
>>> 
>>> -Jonathan
>>> 
>> 
>> 
>> 
>> --
>> Adam Fisk
>> http://www.littleshoot.org | http://adamfisk.wordpress.com |
>> http://twitter.com/adamfisk
>> 

--
Ian Holsman
i...@holsman.net





Re: Wish list [from "users survey" thread]

2009-11-24 Thread Ian Holsman
well.
I'd like to see how many times a specific user hits the site, without having to 
add them up every time.
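
The case for doing the increment on the server side is easiest to see when two writers race: with plain read-add-write from the client, increments get lost unless the store performs the addition itself. A small self-contained sketch (ordinary Java, nothing Cassandra-specific) that usually shows the lost updates:

// Demonstrates why client-side read-add-write loses increments under concurrency,
// which is what a server-side (eventually consistent) incr/decr would avoid.
public class LostUpdateDemo {
    // stands in for the stored hit-count column; volatile so both threads see each write
    private static volatile long storedHits = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable worker = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    long current = storedHits;   // read the current count
                    storedHits = current + 1;    // write back count + 1 (racy)
                }
            }
        };
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start(); b.start();
        a.join();  b.join();
        // 200000 increments were attempted, but storedHits usually ends up well short
        // of that because concurrent read-add-write pairs overwrite each other.
        System.out.println("stored count: " + storedHits + " (200000 attempted)");
    }
}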

On Nov 24, 2009, at 9:47 AM, Ted Zlatanov wrote:

> On Mon, 23 Nov 2009 13:45:09 -0600 Jonathan Ellis  wrote: 
> 
> JE> 1. Increment/decrement: "atomic" is a dirty word in a system
> JE> emphasizing availability, but incr/decr can be provided in an
> JE> "eventually consistent" manner with vector clocks.  There are other
> JE> possible approaches but this is probably the best fit for us.  We'd
> JE> want to allow ColumnFamilies with either traditional (for Cassandra)
> JE> long timestamps, or vector clocks, but not mixed.  The bad news is,
> JE> this is a very substantial change and will probably not be in 0.9
> JE> unless someone steps up to do the work.  (This would also cover
> JE> "flexible conflict resolution," which came up as well.)
> 
> Just for my benefit, can someone explain the reasons why atomic inc/dec
> are needed inside Cassandra if 64-bit time stamps and UUIDs are
> available?  I have not needed them in my usage but am curious about
> other schemas that do.
> 
> Thanks
> Ted
> 

--
Ian Holsman
i...@holsman.net





Re: Social network feed/wall question

2009-11-22 Thread Ian Holsman
One of the problems you may face is that the common operation is 'get  
last X'.
You might want to look at redis as an alternative as it supports this  
operation natively.
I'm sure the Cassandra experts can help with your schema to optimize  
it as well



---
Sent from my phone
Ian Holsman - 703 879-3128

On 23/11/2009, at 9:55 AM, Kristian Lunde  wrote:

I am currently building a social network application where one of  
the important features is a feed / wall (Something similar to the  
Facebook wall). We will have several feeds, one for each profile and  
one for each group and so on.  I have looked into using Cassandra  
for storing this, but I am not sure I am on the right track  
regarding my "schema".


My thoughts were that the schema would be similar to this

Feed [SuperColumn]
- Row [user id as identifier]
[Columns]
- type
- timestamp
- message
- url

Each user would have his own feed super column and store all feed  
items related to him in this super column. I am not sure this is the  
best idea, since it creates an insane amount of writes whenever  
someone writes to their wall (this will have to write the feed of  
all his friends). Also I read in this thread http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg00360.html 
 that super columns are not suited for > 60k rows in a super column.


What would be the optimal way of storing a set of feeds in cassandra?

Thanks
Kristian
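
To make the write amplification in this layout concrete: with one feed row per user, a single wall post turns into one insert per follower, each stored under a time-ordered column name so that "get last X" is a short slice of a single row. A plain-Java sketch of the fan-out step with a hypothetical store interface (all names here are illustrative, not a real client API):

import java.util.List;

// Hypothetical fan-out-on-write for a per-user feed: one column written into every
// follower's feed row, with the column name a zero-padded timestamp so columns sort
// chronologically and "last X items" is a slice of the newest N columns of one row.
public class FeedWriter {

    interface ColumnStore {
        void insertColumn(String columnFamily, String rowKey, String columnName, byte[] value);
    }

    private final ColumnStore store;

    public FeedWriter(ColumnStore store) {
        this.store = store;
    }

    public void post(String authorId, List<String> followerIds, String message) {
        // zero-padding keeps lexical order equal to chronological order in this sketch
        String columnName = String.format("%020d", System.currentTimeMillis());
        byte[] value = message.getBytes();
        store.insertColumn("Feed", authorId, columnName, value);        // author's own wall
        for (String followerId : followerIds) {
            store.insertColumn("Feed", followerId, columnName, value);  // fan out to each friend
        }
    }
}

Reads stay cheap (slice the newest N columns of one row); the cost moves entirely to write time, which is exactly the trade-off Kristian is worried about.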


Re: Cassandra users survey

2009-11-20 Thread Ian Holsman




---
Sent from my phone
Ian Holsman - 703 879-3128

On 21/11/2009, at 12:38 PM, Dan Di Spaltro   
wrote:



At Cloudkick we are using Cassandra to store monitoring statistics and
running analytics over the data.  I would love to share some ideas
about how we set up our data-model, if anyone is interested.  This
isn't the right thread to do it in, but I think it would be useful to
show how we store billions of points of data in Cassandra (and maybe
get some feedback).

Wishlist
-remove_slice_range
-auto loadbalancing
-inc/dec

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis   
wrote:

Hi all,

I'd love to get a better feel for who is using Cassandra and what kind
of applications it is seeing.  If you are using Cassandra, could you
share what you're using it for and what stage you are at with it
(evaluation / testing / production)? Also, what alternatives you
evaluated/are evaluating would be useful.  Finally, feel free to throw
in "I'd love to use Cassandra if only it did X" wishes. :)

I can start: Rackspace is using Cassandra for stats collection
(testing, almost production) and as a backend for the Mail & Apps
division (early testing).  We evaluated HBase, Hypertable, dynomite,
and Voldemort as well.

Thanks,

-Jonathan

(If you're in stealth mode or don't want to say anything in public,
feel free to reply to me privately and I will keep it off the  
record.)






--
Dan Di Spaltro


Re: Cassandra users survey

2009-11-20 Thread Ian Holsman
We're looking at it to be part of a near real time Web analytics engine, which 
sounds similar to Ooyala.
at the moment I'm pushing to get the thing open sourced if possible.

we're looking at combining Cassandra + Esper, but we are still in the very 
early stages.
On Nov 21, 2009, at 8:17 AM, Jonathan Ellis wrote:

> Hi all,
> 
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
> 
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
> 
> Thanks,
> 
> -Jonathan
> 
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)

--
Ian Holsman
i...@holsman.net





Re: Meetup?

2009-11-12 Thread Ian Holsman

I'm in Melbourne, and frequently in DC and NY as well.
On Nov 13, 2009, at 11:18 AM, Nick Lothian wrote:


Where in Australia are you from? (Adelaide here)

I might be interested in one down here.

From: Chris Were [mailto:chris.w...@gmail.com]
Sent: Friday, 13 November 2009 9:09 AM
To: cassandra-user@incubator.apache.org
Subject: OT: Meetup?

Hi,

I'm from Australia, but currently in SF for the next 2 weeks working  
on a startup.


If any cassandra users want to meet up to discuss cassandra or any  
other tech, shoot me an email.


Cheers,
Chris



--
Ian Holsman
i...@holsman.net





Re: bandwidth limiting Cassandra's replication and access control

2009-11-11 Thread Ian Holsman
service layer, the java  
security manager isn’t going to suffice.  What this snippet could  
do, though, and may be the rationale for the request, is to ensure  
that unauthorized users cannot instantiate a new Cassandra server.   
However, if a user has physical access to the machine on which  
Cassandra is installed, they could easily bypass that layer of  
security.


What if Cassandra IS the application you're exposing?  Imagine a  
large company that creates one large internal Cassandra deployment,  
and has multiple departments it wants  to create separate keyspaces  
for.  You can do that now, but there's nothing except a gentlemen's  
agreement to prevent one department from trashing another  
department's keyspace, and accidents do happen. You can front the  
service with some kind of application layer, but then you have  
another API to maintain, and you'll lose some performance this way.


-Brandon




--
Ian Holsman
i...@holsman.net





using cassandra as a real time DW

2009-11-05 Thread Ian Holsman

hey guys.
I was wondering if anyone is thinking of/is using cassandra to power a  
real time data warehouse.
if so would you consider collaborating/open sourcing the effort so  
others could join in.


TIA
Ian.
--
Ian Holsman
i...@holsman.net





Re: Got Logo?

2009-09-19 Thread Ian Holsman
Let's go with what we have.
We can get it fixed later.
Usually groups run competitions and get professional-looking logos that way.
No need to pay money.

On 9/19/09, Matt Kydd  wrote:
> I think the one up on the site by Makram Saleh is really quite good -
> just needs a polish.
>
> http://issues.apache.org/jira/browse/CASSANDRA-231
>
> MK
>
> 2009/9/19 Bill de hOra :
>> David Pollak wrote:
>>
>>> I'll be happy to kick in $50 towards a 99Designs bounty for a Cassandra
>>> logo
>>
>> Likewise.
>>
>> Bill
>>
>

-- 
Sent from my mobile device


Re: New Features - Future releases

2009-09-18 Thread Ian Holsman

There was mention of lucene integration in the initial FB release.

On Sep 18, 2009, at 9:59 PM, Jeffrey Damick wrote:


Speaking of lucene, has anyone done any integration with lucene for
cassandra or are there plans to provide full-text searches within  
cassandra?


Thanks
-jeff


On 9/18/09 9:49 PM, "Joe Stump"  wrote:



On Sep 18, 2009, at 9:46 PM,  wrote:


Your idea is not bad: having a service layer in front of Cassandra. How
about a separate opensource project or a standard/spec for ACL in the
service layer?


Sure. SOLR is kind of like this for Lucene.

--Joe




--
Ian Holsman
i...@holsman.net





Re: Newbe´s question

2009-08-26 Thread Ian Holsman
isn't there a way to use svn:external or svn:link to pull them in from  
their own repos?

(not sure how legal it would be).
On Aug 27, 2009, at 10:03 AM, Jonathan Ellis wrote:


I thought about that, but I really don't want Cassandra committers to
have to be in the business of updating them all when we make changes,
and having them in the repo creates that expectation even in contrib.

On Wed, Aug 26, 2009 at 6:57 PM, Ian Holsman wrote:

would it be worthwhile to start including these clients in the core
codebase? in some kind of 'client' or 'contrib' directory?

maybe even mentioning the 'popular' clients that people use in the readme
(with links to them) would be good.

On Aug 27, 2009, at 9:18 AM, Sal Fuentes wrote:


Just would like to say great job so far.

On Wed, Aug 26, 2009 at 4:01 PM, Ian Eure  wrote:
On Aug 25, 2009, at 2:46 PM, Drew Schleck wrote:

For anyone using my branch of Lazyboy, Ian Eure pulled my work,
improved it, and more. You ought to switch back to his version.

I'm doing some heavy refactoring all this week, to bring it up to
Cassandra trunk and simplify/genericize it wherever possible. I  
should have

something to show in a day or two.

Feel free to contact me if you have questions or requests.

 - Ian



--
Salvador Fuentes Jr.
323-540-4SAL


--
Ian Holsman
i...@holsman.net






--
Ian Holsman
i...@holsman.net





Re: Newbe´s question

2009-08-26 Thread Ian Holsman
would it be worthwhile to start including these clients in the core  
codebase? in some kind of 'client' or 'contrib' directory?


maybe even mentioning the 'popular' clients that people use in the  
readme (with links to them) would be good.


On Aug 27, 2009, at 9:18 AM, Sal Fuentes wrote:


Just would like to say great job so far.

On Wed, Aug 26, 2009 at 4:01 PM, Ian Eure  wrote:
On Aug 25, 2009, at 2:46 PM, Drew Schleck wrote:

For anyone using my branch of Lazyboy, Ian Eure pulled my work,
improved it, and more. You ought to switch back to his version.

I'm doing some heavy refactoring all this week, to bring it up to  
Cassandra trunk and simplify/genericize it wherever possible. I  
should have something to show in a day or two.


Feel free to contact me if you have questions or requests.

 - Ian



--
Salvador Fuentes Jr.
323-540-4SAL


--
Ian Holsman
i...@holsman.net





Re: Announcing 0.3.0

2009-07-20 Thread Ian Holsman
you need to give a tiny bit of time (say 24 hours) for the mirrors to  
catch up.


On 21/07/2009, at 10:09 AM, Daniel Hengeveld wrote:


I clicked on the link in my browser (Safari) - even copied the url and
pasted it into the location bar of a new window, and was absolutely
*not* greeted with a page of mirrors. Upon your reply, I tried an
alternate browser (Firefox) and did get the links. Still doesn't work
in Safari. Thanks for the help!

~d

On Mon, Jul 20, 2009 at 17:05, Jeff  
Hodges wrote:

Click through. It takes you to a page of mirror links.

In the future, much debugging can be done by pointing your browser  
to the page.

--
Jeff

On Mon, Jul 20, 2009 at 4:56 PM, Daniel  
Hengeveld wrote:

When I download this file, I get a 5KB file rather than the actual
release. Is anyone else having this problem?

On Mon, Jul 20, 2009 at 12:57, Eric Evans  
wrote:

It is with great pleasure that I announce the very first release of
Apache Cassandra, 0.3.0[1]

A project's first release is a significant milestone and one that our
burgeoning community should be proud of. Many thanks to everyone that
submitted patches and bug reports, helped with testing, documented,
organized, or just asked the important questions.

Without further ado:

The official download:
http://www.apache.org/dyn/closer.cgi/incubator/cassandra/0.3.0/apache-cassandra-incubating-0.3.0-bin.tar.gz
SVN Tag:
https://svn.apache.org/repos/asf/incubator/cassandra/tags/cassandra-0.3.0-final/


[1] DISCLAIMER: Apache Cassandra is an effort undergoing incubation at
The ASF, sponsored by the Apache Incubator Project Management Committee
(PMC).

Incubation is required of all newly accepted projects until a further
review indicates that the infrastructure, communications, and decision
making process have stabilized in a manner consistent with other
successful ASF projects.

While incubation status is not necessarily a reflection of the
completeness or stability of the code, it does indicate that the project
has yet to be fully endorsed by the ASF.

--
Eric Evans
eev...@rackspace.com






--
..[daniel hengeveld]..
neoglam.com







--
..[daniel hengeveld]..
neoglam.com


--
Ian Holsman
i...@holsman.net





Re: AttributeError: 'str' object has no attribute 'write'

2009-07-19 Thread Ian Holsman

hi Gasol.
shouldn't regeneration of the interface be part of the build process?

On 20/07/2009, at 3:29 AM, Gasol Wu wrote:


hi,
the cassandra.thrift has changed.
you need to generate a new python client and compile the classes again.


On Mon, Jul 20, 2009 at 1:18 AM,  wrote:
Hi guys
the new trunk cassandra doesn't work for a simple insert, how do we  
get this working


client.insert('Table1', 'tofu', 'Super1:Related:tofu  
stew',pickle.dumps(dict(count=1)), time.time(), 0)

---
AttributeErrorTraceback (most recent  
call last)


/home/mark/work/cexperiments/ in ()

/home/mark/work/common/cassandra/Cassandra.py in insert(self, table,  
key, column_path, value, timestamp, block_for)

358  - block_for
359 """
--> 360 self.send_insert(table, key, column_path, value,  
timestamp, block_for)

361 self.recv_insert()
362

/home/mark/work/common/cassandra/Cassandra.py in send_insert(self,  
table, key, column_path, value, timestamp, block_for)

370 args.timestamp = timestamp
371 args.block_for = block_for
--> 372 args.write(self._oprot)
373 self._oprot.writeMessageEnd()
374 self._oprot.trans.flush()

/home/mark/work/common/cassandra/Cassandra.py in write(self, oprot)
   1923 if self.column_path != None:
   1924   oprot.writeFieldBegin('column_path', TType.STRUCT, 3)
-> 1925   self.column_path.write(oprot)
   1926   oprot.writeFieldEnd()
   1927 if self.value != None:

AttributeError: 'str' object has no attribute 'write'
In [4]: client.insert('Table1', 'tofu', 'Super1:Related:tofu  
stew',pickle.dumps(dict(count=1)), time.time(), 0)



--
Bidegg worlds best auction site
http://bidegg.com



--
Ian Holsman
i...@holsman.net





Re: New cassandra in trunk - breaks python thrift interface (was AttributeError: 'str' object has no attribute 'write')

2009-07-19 Thread Ian Holsman

hi mobile.
is it possible to file these as JIRA bugs, instead of just mailing  
them to the list ?


that way people can give them a bit more attention, and other people  
who have the same issue will easily be able to see what is going on.


the URL is here :- https://issues.apache.org/jira/browse/CASSANDRA
regards
Ian

On 20/07/2009, at 6:36 AM, mobiledream...@gmail.com wrote:


ok
so which is the version where cassandra python thrift works out of  
the box

thanks

On 7/19/09, Jonathan Ellis  wrote:
Don't run trunk if you're not going to read "svn log."


The api changed with the commit of the 139 patches (and it will change
again with the 185 ones).

look at interface/cassandra.thrift to see what arguments are expected.


On Sun, Jul 19, 2009 at 3:31 PM,  wrote:
> Hey Gasol wu
> i regenerated the new thrift interface using
> thrift -gen py cassandra.thrift
>
>
>
> client.insert('Table1', 'tofu', 'Super1:Related:tofu stew',
> pickle.dumps(dict(count=1)), time.time(), 0)
>  
---
> AttributeErrorTraceback (most recent  
call last)

>
> /home/mark/work/cexperiments/ in ()
>
> /home/mark/work/common/cassandra/Cassandra.py in insert(self,  
table, key,

> column_path, value, timestamp, block_for)
> 358  - block_for
> 359 """
> --> 360 self.send_insert(table, key, column_path, value,  
timestamp,

> block_for)
> 361 self.recv_insert()
> 362
>
> /home/mark/work/common/cassandra/Cassandra.py in send_insert(self,  
table,

> key, column_path, value, timestamp, block_for)
> 370 args.timestamp = timestamp
> 371 args.block_for = block_for
> --> 372 args.write(self._oprot)
> 373 self._oprot.writeMessageEnd()
> 374 self._oprot.trans.flush()
>
> /home/mark/work/common/cassandra/Cassandra.py in write(self, oprot)
>1923 if self.column_path != None:
>1924   oprot.writeFieldBegin('column_path', TType.STRUCT, 3)
> -> 1925   self.column_path.write(oprot)
>1926   oprot.writeFieldEnd()
>1927 if self.value != None:
>
> AttributeError: 'str' object has no attribute 'write'
>
>
> On Sun, Jul 19, 2009 at 10:29 AM, Gasol Wu   
wrote:

>>
>> hi,
>> the cassandra.thrift has changed.
>> u need to generate new python client and compile class again.
>>
>>
>> On Mon, Jul 20, 2009 at 1:18 AM,  wrote:
>>>
>>> Hi guys
>>> the new trunk cassandra doesnt work for a simple insert, how do  
we get

>>> this working
>>> client.insert('Table1', 'tofu', 'Super1:Related:tofu
>>> stew',pickle.dumps(dict(count=1)), time.time(), 0)
>>>
>>>  
---
>>> AttributeErrorTraceback (most recent  
call

>>> last)
>>> /home/mark/work/cexperiments/ in ()
>>> /home/mark/work/common/cassandra/Cassandra.py in insert(self,  
table, key,

>>> column_path, value, timestamp, block_for)
>>> 358  - block_for
>>> 359 """
>>> --> 360 self.send_insert(table, key, column_path, value,  
timestamp,

>>> block_for)
>>> 361 self.recv_insert()
>>> 362
>>> /home/mark/work/common/cassandra/Cassandra.py in  
send_insert(self, table,

>>> key, column_path, value, timestamp, block_for)
>>> 370 args.timestamp = timestamp
>>> 371 args.block_for = block_for
>>> --> 372 args.write(self._oprot)
>>> 373 self._oprot.writeMessageEnd()
>>> 374 self._oprot.trans.flush()
>>> /home/mark/work/common/cassandra/Cassandra.py in write(self,  
oprot)

>>>1923 if self.column_path != None:
>>>1924   oprot.writeFieldBegin('column_path', TType.STRUCT,  
3)

>>> -> 1925   self.column_path.write(oprot)
>>>1926   oprot.writeFieldEnd()
>>>1927 if self.value != None:
>>> AttributeError: 'str' object has no attribute 'write'
>>> In [4]: client.insert('Table1', 'tofu', 'Super1:Related:tofu
>>> stew',pickle.dumps(dict(count=1)), time.time(), 0)
>>>
>>> --
>>> Bidegg worlds best auction site
>>> http://bidegg.com
>>
>
>
>
> --
> Bidegg worlds best auction site
> http://bidegg.com
>



--
Bidegg worlds best auction site
http://bidegg.com


--
Ian Holsman
i...@holsman.net





Re: Best way to use a Cassandra Client in a multi-threaded environment?

2009-07-15 Thread Ian Holsman

ugh.
if this is a byproduct of thrift, we should have another way of  
getting to the backend.

serialization is *not* a desired feature for most people ;-0
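
One common workaround, given non-threadsafe generated clients, is simply a client per thread. A minimal sketch using a ThreadLocal and a hypothetical connection type; the factory is assumed to open the Thrift transport and wrap it in the generated client, and both interfaces here are stand-ins, not a real API.

// One-client-per-thread sketch: each webserver thread lazily opens its own connection,
// so the non-threadsafe generated client is never shared between threads.
public class PerThreadClients {

    public interface CassandraConnection {
        void close();
    }

    public interface ConnectionFactory {
        CassandraConnection connect();
    }

    private final ThreadLocal<CassandraConnection> connections;

    public PerThreadClients(final ConnectionFactory factory) {
        this.connections = new ThreadLocal<CassandraConnection>() {
            @Override
            protected CassandraConnection initialValue() {
                return factory.connect();   // first use on this thread opens its own connection
            }
        };
    }

    public CassandraConnection get() {
        return connections.get();           // always the calling thread's own client
    }
}

A bounded object pool is the other option, and it caps the number of open sockets once you really do have 100+ request threads.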

On 16/07/2009, at 11:06 AM, Jonathan Ellis wrote:


What I mean is, if you have

client.rpc1()


it doesn't really matter if you can do

client.rpc2()


from another thread or not, since it's dumb. :)

On Wed, Jul 15, 2009 at 7:41 PM, Ian Holsman wrote:


On 16/07/2009, at 10:35 AM, Jonathan Ellis wrote:


IIRC thrift makes no effort to generate threadsafe code.

which makes sense in an rpc-oriented protocol really.


hmm.. not really. you can have a webserver calling a thrift backend  
quite
easily, and then you would have 100+ threads all calling the same  
code.


On Wed, Jul 15, 2009 at 7:25 PM, Joel Meyer  
wrote:


Hello,
Are there any recommendations on how to use Cassandra Clients in a
multi-threaded front-end application (java)? Is the Client thread-safe or
is it best to do a client per thread (or object pool of some sort)?
Thanks,
Joel


--
Ian Holsman
i...@holsman.net






--
Ian Holsman
i...@holsman.net





Re: Best way to use a Cassandra Client in a multi-threaded environment?

2009-07-15 Thread Ian Holsman


On 16/07/2009, at 10:35 AM, Jonathan Ellis wrote:


IIRC thrift makes no effort to generate threadsafe code.

which makes sense in an rpc-oriented protocol really.


hmm.. not really. you can have a webserver calling a thrift backend  
quite easily, and then you would have 100+ threads all calling the  
same code.


On Wed, Jul 15, 2009 at 7:25 PM, Joel Meyer  
wrote:

Hello,
Are there any recommendations on how to use Cassandra Clients in a
multi-threaded front-end application (java)? Is the Client thread-safe or is
it best to do a client per thread (or object pool of some sort)?
Thanks,
Joel


--
Ian Holsman
i...@holsman.net





Re: Non relational db meetup - San Francisco, June 11th

2009-05-12 Thread Ian Holsman

It looks like it is sold-out.

On 13/05/2009, at 4:37 PM, Jonas Bonér wrote:


2009/5/12 Jonathan Ellis :
That's true, but 100 people is about the largest space you're going to
find for free, so past that you'd have to start charging people and
worrying about taxes and such.  Messy.


No worries. That makes sense. Good initiative. Have fun.



Maybe next year... :)



Hehe. Sounds good.


-Jonathan

On Tue, May 12, 2009 at 2:02 PM, Jonas Bonér   
wrote:

Great initiative.
Just sad that it is not the week before (during JavaOne). Then I think
a lot of people (including me) could go.

2009/5/12 Johan Oskarsson :
Cassandra will be represented by Avinash Lakshman on a free full day
meetup covering "open source, distributed, non relational databases" on
June 11th in San Francisco.

The idea is that the event will give people interested in this  
area a
great introduction and an easy way to compare the different  
projects out
there as well as the opportunity to discuss them with the  
developers.


Registration
The event is free but space is limited; please register if you wish to
attend: http://nosql.eventbrite.com/


Preliminary schedule, 2009-06-11
09.45: Doors open
10.00: Intro session (Todd Lipcon, Cloudera)
10.40: Voldemort (Jay Kreps, Linkedin)
11.20: Short break
11.30: Cassandra (Avinash Lakshman, Facebook)
12.10: Free lunch (sponsored by CBSi)
13.10: Dynomite (Cliff Moon, Powerset)
13.50: HBase (Ryan Rawson, Stumbleupon)
14.30: Short break
14.40: Hypertable (Doug Judd, Zvents)
15.20: Panel discussion
16.00: End of meetup, relocate to a pub called Kate O’Brien’s  
nearby


Location
Magma room, CBS interactive
235 Second Street
San Francisco, CA 94105

Sponsor
A big thanks to CBSi for providing the venue and free lunch.


/Johan Oskarsson, developer @ last.fm





--
Jonas Bonér

twitter: @jboner
blog:http://jonasboner.com
work:   http://crisp.se
work:   http://scalablesolutions.se
code:   http://github.com/jboner







--
Jonas Bonér

twitter: @jboner
blog:http://jonasboner.com
work:   http://crisp.se
work:   http://scalablesolutions.se
code:   http://github.com/jboner


--
Ian Holsman
i...@holsman.net