[ 
https://issues.apache.org/jira/browse/CASSANDRA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849860#action_12849860
 ] 

Ben Standefer commented on CASSANDRA-873:
-----------------------------------------

I think these are all good ideas. I would focus on building something simple 
that showcases Cassandra's strengths. I have found that Cassandra is powerful 
when storing and querying large, dense, interconnected datasets. Think about what 
data is out there that could be analyzed at a deeper, more domain-specific 
level (domain being per user, per host, per term, or per property of an object). 

Facebook's inbox search is a search index per user. Twitter timelines are 
extremely customized and personalized based on the relationships between users. 
Digg first used Cassandra for the personalized "green badges" feature by 
creating a per-user index that stored each story you dugg and which of your 
friends also dugg each of those stories. Cassandra is good at storing large 
amounts of data that are a few layers deep. 
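To make the per-user index idea concrete, here is a minimal Python sketch that models it with plain dictionaries rather than a real Cassandra client. It assumes one row per user whose column names are story ids and whose values are the set of that user's friends who also dugg the story, filled in at write time (denormalized on every digg). The class PerUserIndex, its methods, and the friends input are hypothetical illustrations, not Digg's actual schema or the pycassa API.

```python
from collections import defaultdict


class PerUserIndex:
    """Hypothetical in-memory model of a per-user index:
    row key = user id, column name = story id,
    column value = set of that user's friends who also dugg the story."""

    def __init__(self, friends):
        # friends: user id -> set of user ids that user follows (assumed input)
        self.friends = friends
        self.rows = defaultdict(dict)    # user -> {story id -> set of friends}
        self.diggers = defaultdict(set)  # story id -> users who dugg it

    def digg(self, user, story):
        # Write the digging user's own column: friends who already dugg this story.
        self.rows[user][story] = self.friends.get(user, set()) & self.diggers[story]
        # Fan out at write time: every earlier digger who counts `user` as a
        # friend gets `user` added to this story's column in their own row.
        for other in self.diggers[story]:
            if user in self.friends.get(other, set()):
                self.rows[other][story].add(user)
        self.diggers[story].add(user)

    def badges(self, user):
        # A single-row read: every story the user dugg, with friends who also dugg it.
        return dict(self.rows.get(user, {}))
```

For example, if carol and then alice digg story "s1", alice's row shows carol next to "s1" and carol's row is updated to show alice. The design choice is to pay for extra writes up front so that rendering the badges is one narrow read of one user's row.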

I think the demo app should not necessarily be feature-rich, but it should show 
us a slice of data we've never seen before. Maybe it could perform a 
locale-sensitive search or indexing task. One of the main strengths of 
Cassandra is that it allows you to go from an extremely broad, large set of 
data to a very narrow, important set of data very quickly. Try to think of 
large sources of data. Chances are you might have to build some sort of 
crawler/indexer to get this data into Cassandra. 

A dumb example could be something like this: let's crawl the pages of 10,000 
websites and build an index of page properties (number of images on the page, 
title of the page, outbound links on the page). Then we could build an interface 
that lets users quickly drill down through all that data to find the pages 
that have no images, "dogs" in the title, and at least 5 outbound links. Now, 
this is not really a useful application in itself, but if you understand that 
Cassandra lets you drill down from a LOT of data to a very local-specific set 
of data very quickly, you should be able to think of something truly simple yet 
useful. 
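A rough sketch of how that drill-down could map onto Cassandra's row/column model, using in-memory Python structures as stand-ins: each property value gets its own inverted-index row of page URLs, and the outbound-link count lives in a single row whose column names sort as (count, url) pairs, so "at least 5 links" becomes a slice starting at (5, ""). The class PageIndex and this row layout are hypothetical; a real deployment would keep these rows in a column family rather than in Python dicts and lists.

```python
import bisect
import re
from collections import defaultdict


class PageIndex:
    """Hypothetical in-memory model of hand-built inverted-index rows."""

    def __init__(self):
        self.by_image_count = defaultdict(set)  # row "images:<n>" -> page urls
        self.by_title_word = defaultdict(set)   # row "title:<word>" -> page urls
        self.links_sorted = []                  # one row, columns sorted by (link_count, url)

    def add_page(self, url, title, image_count, outbound_links):
        self.by_image_count[image_count].add(url)
        for word in re.findall(r"\w+", title.lower()):
            self.by_title_word[word].add(url)
        # Keep the sorted order, like columns under a sorting comparator.
        bisect.insort(self.links_sorted, (outbound_links, url))

    def pages_with_min_links(self, n):
        # Like a column slice starting at (n, ""): everything after it has >= n links.
        start = bisect.bisect_left(self.links_sorted, (n, ""))
        return {url for _, url in self.links_sorted[start:]}

    def query(self, image_count, title_word, min_links):
        # Each lookup touches one narrow row or slice; the intersection is done client-side.
        return (self.by_image_count.get(image_count, set())
                & self.by_title_word.get(title_word.lower(), set())
                & self.pages_with_min_links(min_links))
```

With a few pages loaded, a call like query(0, "dogs", 5) returns only the pages that have no images, "dogs" in the title, and at least 5 outbound links, without scanning every crawled page.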

Another example could be to build a service that lets websites pass a user id 
and page id every time a user views a page on their site. You could then serve 
back to that site an Amazon-style "users who viewed this page also viewed this 
page" module. You would maintain a per-page index in Cassandra that, given a 
page, shows which other pages are most popular among its viewers. For a decent-sized website, 
this is a lot of write traffic that people are having a hard time doing 
anything useful with in MySQL due to the sheer size of the data (i.e. pageviews). 
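The "also viewed" service could be sketched as follows, again modelling Cassandra rows as Python dictionaries: one row per user holding recently viewed pages, and one row per page mapping every co-viewed page to a count. The class AlsoViewed, the history_limit cap, and the in-memory Counter are assumptions for illustration only, standing in for a real schema.

```python
from collections import Counter, defaultdict


class AlsoViewed:
    """Hypothetical in-memory model of an 'also viewed' per-page index."""

    def __init__(self, history_limit=50):
        self.history_limit = history_limit     # assumed cap on per-user history
        self.user_views = defaultdict(list)    # row per user: pages viewed, oldest first
        self.co_views = defaultdict(Counter)   # row per page: other page -> co-view count

    def record_view(self, user_id, page_id):
        history = self.user_views[user_id]
        if page_id in history:
            return  # skip repeat views of a page still in the retained history
        # One write per page in the user's history, in both directions.
        for other in history:
            self.co_views[page_id][other] += 1
            self.co_views[other][page_id] += 1
        history.append(page_id)
        del history[:-self.history_limit]  # keep only the most recent pages

    def also_viewed(self, page_id, n=5):
        # A single-row read: the n pages most often co-viewed with page_id.
        return [page for page, _ in self.co_views.get(page_id, Counter()).most_common(n)]
```

Each page view costs a handful of writes (one per page in the user's retained history), while serving the module is one read of a single page's row: the write-heavy, read-narrow pattern described above.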

Replacing the backend of a blog engine might be something that Cassandra *can* 
do, but it doesn't really showcase why people would ever use Cassandra and how 
Cassandra is good at querying specific data out of large, broad datasets. Think 
of relationships between objects and properties; that's where the real value 
comes from.

Like I said, the demo app doesn't need to have a ton of features, but it needs 
to showcase the capacity for handling large volumes of data.

> Create a Cassandra demo application
> -----------------------------------
>
>                 Key: CASSANDRA-873
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-873
>             Project: Cassandra
>          Issue Type: Task
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> http://twissandra.com/ is a demo Cassandra application built on django + 
> pycassa.  It's a great Cassandra showcase and very useful for people learning 
> Cassandra.  We could use more of those.
> Jake Luciani suggested one that presents full-text search of Wikipedia using 
> Lucandra (see 
> http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/
>  and http://github.com/tjake/Lucandra).  Feel free to propose other 
> application ideas here.
> Rackspace is willing to provide a VM to deploy on for a live demo, but 
> remember: to be really useful, this needs full DIY instructions. The final 
> product is not the demo but the code + instructions.

-- 
This message is automatically generated by JIRA.