Java GC pauses, reality check

2016-11-25 Thread S Ahmed
Hello!

From what I understand, Java GC pauses are pretty much a fact of life, but
you can tune the JVM to reduce the frequency and length of GC pauses.

When using Cassandra, how frequent or long have these pauses been known to be?
Even with tuning, is it safe to assume they cannot be eliminated?

Would a 20-30 second pause be something out of the ordinary?

Thanks.


what operations don't update materialized views?

2016-11-18 Thread S Ahmed
Hi,

Are there any operations that skip updating the materialized views?


store individual inventory items in a table, how to assign them correctly

2016-11-07 Thread S Ahmed
Say I have 100 products in inventory, instead of having a counter I want to
create 100 rows per inventory item.

When someone purchases a product, how can I correctly assign that customer
a product from inventory without having any race conditions etc?
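
One way to think about the race is as a conditional "claim" write: an item row gets an owner only if it has none yet, and the first writer wins. In Cassandra 2.0 and later the closest built-in equivalent is, as far as I know, a lightweight transaction (a conditional UPDATE ... IF). The sketch below only simulates that idea in memory; the class and method names are hypothetical and `putIfAbsent` stands in for the conditional write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for an inventory table: one entry per physical item.
// putIfAbsent plays the role of a "set owner only if unowned" write.
class InventoryClaims {
    private final List<String> itemIds;
    private final ConcurrentMap<String, String> ownerByItem = new ConcurrentHashMap<>();

    InventoryClaims(List<String> itemIds) {
        this.itemIds = new ArrayList<>(itemIds);
    }

    // Tries each item in turn; the first successful conditional write wins.
    // Returns the claimed item id, or empty if every item already has an owner.
    Optional<String> claimFor(String customerId) {
        for (String itemId : itemIds) {
            if (ownerByItem.putIfAbsent(itemId, customerId) == null) {
                return Optional.of(itemId);
            }
        }
        return Optional.empty();
    }

    // Returns the current owner of an item, or null if it is unclaimed.
    String ownerOf(String itemId) {
        return ownerByItem.get(itemId);
    }
}
```

Two customers racing for the same item cannot both succeed, because only one conditional write can see the item as unowned; the loser simply moves on to the next item.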

Thanks.


RE: wide rows

2016-10-18 Thread S Ahmed
Hi,

Can someone clarify how you would model a "wide" row Cassandra table?  From
what I understand, a wide row table is where you keep appending columns to
a given row.

The other way to model a table would be the "regular" style, where each row
contains its own data, so a SELECT would return multiple rows, as opposed to
a wide row, where you would get a single row but only a subset of its
columns.

Can someone show a simple data model that compares both styles?
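
To make the two styles concrete, here is a rough in-memory sketch (not a Cassandra API; all names are made up for illustration). The "regular" table holds one row per user with a fixed set of columns, while the "wide" table holds one row per user whose columns keep growing, keyed by event timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the two table styles.
class RowStyles {
    // "Regular" style: row key -> a small, fixed set of named columns.
    final Map<String, Map<String, String>> users = new TreeMap<>();

    // "Wide" style: row key -> an ever-growing, sorted set of columns.
    // The column name is an event timestamp and the value is the event.
    final Map<String, SortedMap<Long, String>> eventsByUser = new TreeMap<>();

    void addUser(String userId, String name, String email) {
        Map<String, String> columns = new TreeMap<>();
        columns.put("name", name);
        columns.put("email", email);
        users.put(userId, columns);
    }

    // Appending to a wide row adds one more column to the same row.
    void appendEvent(String userId, long timestamp, String event) {
        eventsByUser.computeIfAbsent(userId, k -> new TreeMap<>()).put(timestamp, event);
    }

    // A wide-row read returns one row but only a slice of its columns.
    List<String> eventsBetween(String userId, long fromInclusive, long toExclusive) {
        SortedMap<Long, String> row = eventsByUser.getOrDefault(userId, new TreeMap<>());
        return new ArrayList<>(row.subMap(fromInclusive, toExclusive).values());
    }
}
```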

Thanks.


understanding partitions and # of nodes

2016-09-21 Thread S Ahmed
Hello,

If you have a 10 node cluster, how does having 10 partitions or 100
partitions change how cassandra will perform?

With 10 partitions you will have 1 partition per node.
With 100 partitions you will have 10 partitions per node.

With 100 partitions I guess it helps because when you add more nodes to
your cluster, the data can be redistributed since you have more nodes.
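
A small sketch of why the partition count matters: placement is decided by hashing the partition key, so the per-node counts depend on the keys and the hash rather than on a fixed assignment. The code below is a simplification (hash modulo node count, with made-up key names "p0", "p1", ...); a real cluster maps keys onto a token ring instead.

```java
// Simplified placement: hash the partition key and take it modulo the node count.
// Real clusters map keys to a token ring; this only illustrates the spread.
class PartitionSpread {
    // Returns the index of the node that a partition key lands on.
    static int nodeFor(String partitionKey, int nodeCount) {
        return Math.floorMod(partitionKey.hashCode(), nodeCount);
    }

    // Returns how many of the partitions "p0".."p(n-1)" land on each node.
    static int[] partitionsPerNode(int partitionCount, int nodeCount) {
        int[] counts = new int[nodeCount];
        for (int i = 0; i < partitionCount; i++) {
            counts[nodeFor("p" + i, nodeCount)]++;
        }
        return counts;
    }
}
```

With only as many partitions as nodes, nothing guarantees one partition per node, and a single large partition cannot be split across nodes; many smaller partitions give the hash more chances to even things out.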

What else are things to consider?

Thanks.


RE: no more zookeeper?

2014-01-28 Thread S Ahmed
Does C* no longer use zookeeper?

I don't see a reference to it in the
https://github.com/apache/cassandra/blob/trunk/build.xml

If not, what replaced it?


Re: no more zookeeper?

2014-01-28 Thread S Ahmed
Sorry guys, I am confusing things with HBase.  But Nate's JIRA link sure
looks interesting, thanks.


On Tue, Jan 28, 2014 at 12:25 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Some people had done some custom cassandra zookeeper integration back in
 the day. Triggers, there is some reference in the original facebook thrown
 over the wall to zk. No official release has ever used zk directly. Though
 people have suggested it.


 On Tue, Jan 28, 2014 at 12:08 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 Why would cassandra use zookeeper?


 On Tue, Jan 28, 2014 at 7:18 AM, S Ahmed sahmed1...@gmail.com wrote:

 Does C* no longer use zookeeper?

 I don't see a reference to it in the
 https://github.com/apache/cassandra/blob/trunk/build.xml

 If not, what replaced it?






Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread S Ahmed
From what I understand, tons of people are running things on EC2, but it
could be that the instance sizes are large enough to compare to a dedicated
server (especially if you go with SSD, it is like 1K/month!)


On Tue, Aug 6, 2013 at 3:54 AM, Aaron Morton aa...@thelastpickle.com wrote:

 how many nodes to start with(2 ok?) ?

 I'd recommend 3, that will give you some redundancy see
 http://thelastpickle.com/2011/06/13/Down-For-Me/

 Cheers

 -
 Aaron Morton
 Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 5/08/2013, at 1:41 AM, Rajkumar Gupta rajkumar@gmail.com wrote:

 okay, so what would be a workable VPS configuration to start with, and at minimum
 how many nodes to start with(2 ok?) ?  Seriously I cannot afford the
 tensions of colocation setup.  My hosting provider provides SSD drives with
 KVM virtualization.





Re: funnel analytics, how to query for reports etc.

2013-07-23 Thread S Ahmed
Thanks Aaron.

Too bad Rainbird isn't open sourced yet!


On Tue, Jul 23, 2013 at 4:48 AM, aaron morton aa...@thelastpickle.com wrote:

 For background on rollup analytics:

 Twitter Rainbird
 http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
 Acunu http://www.acunu.com/

 Cheers

 -
 Aaron Morton
 Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 22/07/2013, at 1:03 AM, Vladimir Prudnikov v.prudni...@gmail.com
 wrote:

  This can be done easily,
 
  Use a normal column family to store the sequence of events, where the key is
 a session #ID identifying one user interaction with a website, column names
 are TimeUUID values and the column value is the id of the event (do not write
 something like user added product to shopping cart, but something shorter
 identifying this event).
 
  Then you can use counter column family to store counters, you can count
 anything, number of sessions, total number of events, number of particular
 events etc. One row per day for example. Then you can retrieve this row and
 calculate all required %.
 
 
  On Sun, Jul 21, 2013 at 1:05 AM, S Ahmed sahmed1...@gmail.com wrote:
  Would cassandra be a good choice for creating a funnel analytics type
 product similar to mixpanel?
 
  e.g.  You create a set of events and store them in cassandra for things
 like:
 
  event#1 user visited product page
  event#2 user added product to shopping cart
  event#3 user clicked on checkout page
  event#4 user filled out cc information
  event#5 user purchased product
 
  Now in my web application I track each user and store the events somehow
 in cassandra (in some column family etc)
 
  Now how will I pull a report that produces results like:
 
  70% of people added to shopping cart
  20% checkout page
  10% filled out cc information
  4% purchased the product
 
 
  And this is for a Saas, so this report would be for thousands of
 customers in theory.
 
 
 
  --
  Vladimir Prudnikov




high write load, with lots of updates, considerations? tombstoned data coming back to life

2013-07-23 Thread S Ahmed
I was watching some videos from the C* summit 2013 and I recall many people
saying that if you can come up with a design where you don't perform
updates on rows, that would make things easier (I believe it was because
there would be less compaction).

When building an analytics (time series) app on top of C*, based on
Twitter's Rainbird design (
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011),
this means there will be lots and lots of counters.

With lots of counters (updates), admin-wise, what are some things to
consider?

Could old tombstoned data somehow come back to life?  I forget what
scenario brings back old data (kinda scary!).


funnel analytics, how to query for reports etc.

2013-07-20 Thread S Ahmed
Would cassandra be a good choice for creating a funnel analytics type
product similar to mixpanel?

e.g.  You create a set of events and store them in cassandra for things
like:

event#1 user visited product page
event#2 user added product to shopping cart
event#3 user clicked on checkout page
event#4 user filled out cc information
event#5 user purchased product

Now in my web application I track each user and store the events somehow in
cassandra (in some column family etc)

Now how will I pull a report that produces results like:

70% of people added to shopping cart
20% checkout page
10% filled out cc information
4% purchased the product


And this is for a Saas, so this report would be for thousands of customers
in theory.
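
For the report itself, one possible approach (assuming you keep one counter per funnel step, e.g. per customer per day) is to read the counters back and divide each by the first step. This is only a sketch of the arithmetic; the class and method names are made up, and the counters are passed in as a plain ordered map rather than read from Cassandra.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Turns per-step counters into funnel percentages relative to the first step.
class FunnelReport {
    // The input map must iterate in funnel order (first step first).
    static Map<String, Double> percentages(Map<String, Long> countsInFunnelOrder) {
        Map<String, Double> result = new LinkedHashMap<>();
        Long base = null;
        for (Map.Entry<String, Long> step : countsInFunnelOrder.entrySet()) {
            if (base == null) {
                base = step.getValue(); // first step is the 100% baseline
            }
            double pct = base == 0 ? 0.0 : 100.0 * step.getValue() / base;
            result.put(step.getKey(), pct);
        }
        return result;
    }
}
```

With counters of 1000 page visits, 700 cart adds and 40 purchases, this yields 100%, 70% and 4%, which is the shape of the report described above.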


is there a key to sstable index file?

2013-07-17 Thread S Ahmed
Since SSTables are immutable and ordered, does this mean that there
is an index of the key ranges that each SSTable holds, and that a value could
be in one or more SSTables that have to be scanned before the latest one is
chosen?

e.g. Say I write a value abc to CF1.  This gets stored in an sstable.

Then I write def to CF1, this gets stored in another sstable eventually.

Now when I go to fetch the value, it has to scan 2 sstables and then figure
out which is the latest entry, correct?

So is there an index of keys to sstables, and there can be 1 or more
sstables per key?

(This is assuming compaction hasn't occurred yet).
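
To illustrate the read path I have in mind: as far as I know there is no single key-to-SSTable index; each SSTable carries its own index and bloom filter, and a read consults the SSTables that may contain the key and keeps the entry with the newest timestamp. A minimal in-memory sketch of that "latest timestamp wins" merge, with hypothetical names:

```java
import java.util.List;
import java.util.Map;

// Hypothetical model: each SSTable is an immutable key -> (value, timestamp) map.
class SSTableRead {
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Checks every SSTable that contains the key and keeps the newest cell.
    // Returns null when no SSTable holds the key.
    static String read(List<Map<String, Cell>> sstables, String key) {
        Cell newest = null;
        for (Map<String, Cell> sstable : sstables) {
            Cell candidate = sstable.get(key);
            if (candidate != null && (newest == null || candidate.timestamp > newest.timestamp)) {
                newest = candidate;
            }
        }
        return newest == null ? null : newest.value;
    }
}
```

In the abc/def example, both SSTables hold the key, and the read returns def because its write timestamp is later, regardless of the order in which the SSTables are checked.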


Re: does anyone store large values in cassandra e.g. 100kb?

2013-07-09 Thread S Ahmed
So was the point of breaking into 36 parts to bring each row to the 64 or
128mb threshold?


On Tue, Jul 9, 2013 at 3:18 AM, Theo Hultberg t...@iconara.net wrote:

 We store objects that are a couple of tens of K, sometimes 100K, and we
 store quite a few of these per row, sometimes hundreds of thousands.

 One problem we encountered early was that these rows would become so big
 that C* couldn't compact the rows in-memory and had to revert to slow
 two-pass compactions where it spills partially compacted rows to disk. we
 solved that in two ways, first by
 increasing in_memory_compaction_limit_in_mb from 64 to 128, and although it
 helped a little bit we quickly realized it didn't have much effect because
 most of the time was taken up by really huge rows many times larger than
 that.

 We ended up implementing a simple sharding scheme where each row is
 actually 36 rows that each contain 1/36 of the range (we take the first
 letter in the column key and stick that on the row key on writes, and on
 reads we read all 36 rows -- 36 because there are 36 letters and numbers in
 the ascii alphabet and our column keys happen to distribute over that quite
 nicely).
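
The sharding scheme described above could look roughly like the sketch below. The ':' separator, the lowercasing of the first character and all names are assumptions for illustration; the original message only says that the first letter of the column key is stuck onto the row key.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the sharding: the first character of the column key is appended
// to the logical row key, giving up to 36 physical rows per logical row.
class RowSharding {
    static final String SHARD_CHARS = "0123456789abcdefghijklmnopqrstuvwxyz";

    // Physical row key used on writes. The ':' separator is an assumption.
    static String shardedRowKey(String rowKey, String columnKey) {
        char shard = Character.toLowerCase(columnKey.charAt(0));
        if (SHARD_CHARS.indexOf(shard) < 0) {
            throw new IllegalArgumentException("unexpected first character: " + shard);
        }
        return rowKey + ":" + shard;
    }

    // Physical row keys to fetch on reads: one per possible shard.
    static List<String> allShardKeys(String rowKey) {
        List<String> keys = new ArrayList<>();
        for (char shard : SHARD_CHARS.toCharArray()) {
            keys.add(rowKey + ":" + shard);
        }
        return keys;
    }
}
```

Writes touch exactly one physical row, while a full read of the logical row fans out to all 36 keys and merges the results on the client.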

 Cassandra works well with semi-large objects, and it works well with wide
 rows, but you have to be careful about the combination where rows get
 larger than 64 Mb.

 T#


 On Mon, Jul 8, 2013 at 8:13 PM, S Ahmed sahmed1...@gmail.com wrote:

 Hi Peter,

 Can you describe your environment, # of documents and what kind of usage
 pattern you have?




 On Mon, Jul 8, 2013 at 2:06 PM, Peter Lin wool...@gmail.com wrote:

 I regularly store word and pdf docs in cassandra without any issues.




 On Mon, Jul 8, 2013 at 1:46 PM, S Ahmed sahmed1...@gmail.com wrote:

 I'm guessing that most people use cassandra to store relatively smaller
 payloads like 1-5kb in size.

 Is there anyone using it to store say 100kb (1/10 of a megabyte) and if
 so, was there any tweaking or gotchas that you ran into?







does anyone store large values in cassandra e.g. 100kb?

2013-07-08 Thread S Ahmed
I'm guessing that most people use cassandra to store relatively smaller
payloads like 1-5kb in size.

Is there anyone using it to store say 100kb (1/10 of a megabyte) and if so,
was there any tweaking or gotchas that you ran into?


Re: does anyone store large values in cassandra e.g. 100kb?

2013-07-08 Thread S Ahmed
Hi Peter,

Can you describe your environment, # of documents and what kind of usage
pattern you have?




On Mon, Jul 8, 2013 at 2:06 PM, Peter Lin wool...@gmail.com wrote:

 I regularly store word and pdf docs in cassandra without any issues.




 On Mon, Jul 8, 2013 at 1:46 PM, S Ahmed sahmed1...@gmail.com wrote:

 I'm guessing that most people use cassandra to store relatively smaller
 payloads like 1-5kb in size.

 Is there anyone using it to store say 100kb (1/10 of a megabyte) and if
 so, was there any tweaking or gotchas that you ran into?





videos of 2013 summit

2013-07-04 Thread S Ahmed
Hi,

Are the videos online anywhere for the 2013 summit?


how to debug/trace

2011-12-16 Thread S Ahmed
How can you possibly trace a read/write in Cassandra's codebase when it
uses so many thread pools/executors?

I'm just getting into threads so I'm not too familiar with how one can trace
things while in debug mode in IntelliJ when various thread pools are
processing things etc.


java lib used in cli to provide auto-completion

2011-11-17 Thread S Ahmed
Hi folks,

I'm curious what java lib is used to provide auto-completion in the cli?
 Or is it all custom code?


linux flavor?

2010-08-24 Thread S Ahmed
Is there a particular linux flavor that plays best with Cassandra?

I believe the file system plays a big role also; any comments in this regard?

thanks.


Re: indexing rows ordered by int

2010-08-17 Thread S Ahmed
So when using Redis, how do you go about updating the index?

Do you serialize changes to the index, i.e. when someone votes, you then
update the index?

I'm a little confused as to how to go about updating a huge index.

Say you have 1 million stories, and you want to order by the top votes; how
would you maintain such an index, since they are being constantly voted on?
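
To make the question concrete, this is the kind of incremental update I imagine a sorted-set style index doing: on each vote, remove the story's old (votes, id) entry and insert the new one, so nothing gets fully re-sorted. This is an in-memory sketch with hypothetical names, not how Redis or Cassandra actually implement it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class VoteIndex {
    // Immutable (votes, storyId) pair; a vote replaces the pair instead of mutating it.
    private static final class Entry {
        final int votes;
        final String storyId;

        Entry(int votes, String storyId) {
            this.votes = votes;
            this.storyId = storyId;
        }
    }

    // Most votes first; ties broken by story id so every entry is distinct.
    private static final Comparator<Entry> BY_VOTES_DESC =
            Comparator.comparingInt((Entry e) -> e.votes)
                    .reversed()
                    .thenComparing(e -> e.storyId);

    private final Map<String, Integer> votesByStory = new HashMap<>();
    private final TreeSet<Entry> ranking = new TreeSet<>(BY_VOTES_DESC);

    // Each vote costs one removal and one insertion in the sorted set.
    void vote(String storyId) {
        Integer old = votesByStory.get(storyId);
        if (old != null) {
            ranking.remove(new Entry(old, storyId));
        }
        int updated = (old == null ? 0 : old) + 1;
        votesByStory.put(storyId, updated);
        ranking.add(new Entry(updated, storyId));
    }

    // Returns up to n story ids, highest vote count first.
    List<String> top(int n) {
        List<String> result = new ArrayList<>();
        for (Entry e : ranking) {
            if (result.size() == n) {
                break;
            }
            result.add(e.storyId);
        }
        return result;
    }
}
```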

On Sun, Aug 15, 2010 at 10:48 PM, Chris Goffinet c...@chrisgoffinet.com wrote:

 Digg is using redis for such a feature as well.  We use it on the MyNews -
 Top in 24 hours. Since we need timestamp ordering + sorting by how many
 friends touch a story.

 -Chris

 On Aug 15, 2010, at 7:34 PM, Benjamin Black wrote:

  http://code.google.com/p/redis/
 
  On Sat, Aug 14, 2010 at 11:51 PM, S Ahmed sahmed1...@gmail.com wrote:
  For CF that I need to perform range scans on, I create separate CF that
 have
  custom ordering.
  Say a CF holds comments on a story (like comments on a reddit or digg
 story
  post)
  So if I need to order comments by votes, it seems I have to re-index
 every
  time someone votes on a comment (or batch it every x minutes).
 
 
  Right now I think I have to pull all the comments into memory, then sort
 by
  votes, then re-write the index.
  Are there any best-practises for this type of index?




indexing rows ordered by int

2010-08-15 Thread S Ahmed
For CFs that I need to perform range scans on, I create separate CFs that have
custom ordering.

Say a CF holds comments on a story (like comments on a reddit or digg story
post).

So if I need to order comments by votes, it seems I have to re-index every
time someone votes on a comment (or batch it every x minutes).

Right now I think I have to pull all the comments into memory, then sort by
votes, then re-write the index.

Are there any best practices for this type of index?


why does it take 60-90 seconds for a new node to get up?

2010-08-10 Thread S Ahmed
Why is it that, if you set AutoBootStrap = false, it takes 60-90 seconds
for the node to announce itself?

I just want to understand what is going on during that time, and why that
specific timeframe (if there is a reason?)


Re: Question on nodetool ring

2010-08-09 Thread S Ahmed
that's the token range

so node#1 is from 1600.. to 429..
node#2 is from 429... to 1600...

hopefully others can chime in to confirm.
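
Here is how I picture the lookup, assuming the usual convention that a node owns the range from the previous node's token (exclusive) up to its own token (inclusive), with the lowest token's range wrapping around the ring. The class is a hypothetical sketch and assumes at least one node has been added.

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

// Token ring lookup: a node owns the range (previous token, its own token],
// and the range of the lowest token wraps around past the highest one.
class TokenRing {
    private final TreeMap<BigInteger, String> nodeByToken = new TreeMap<>();

    void addNode(String address, String token) {
        nodeByToken.put(new BigInteger(token), address);
    }

    // Returns the address of the node whose range contains the given token.
    String ownerOf(String keyToken) {
        Map.Entry<BigInteger, String> owner = nodeByToken.ceilingEntry(new BigInteger(keyToken));
        if (owner == null) {
            owner = nodeByToken.firstEntry(); // wrapped past the highest token
        }
        return owner.getValue();
    }
}
```

With the two tokens from the nodetool output, anything above 1600... wraps around to 127.0.0.1, which is why the highest token also appears at the top of the listing as the start of the first node's range.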

On Mon, Aug 9, 2010 at 12:30 PM, Mark static.void@gmail.com wrote:

 I'm running a 2 node cluster and when I run nodetool ring I get the
 following output

 Address         Status State   Load            Token
                                                160032583171087979418578389981025646900
 127.0.0.1       Up     Normal  42.28 MB        42909338385373526599163667549814010691
 127.0.0.2       Up     Normal  42.26 MB        160032583171087979418578389981025646900

 The columns/values are pretty much self explanatory except for the first
 line. What is this value?

 Thanks



Re: Question on nodetool ring

2010-08-09 Thread S Ahmed
b/c node#1 has a start and end range, so you can see the boundaries for each
node by looking at the last column.

On Mon, Aug 9, 2010 at 4:12 PM, Mark static.void@gmail.com wrote:

  On 8/9/10 12:51 PM, S Ahmed wrote:

 that's the token range

  so node#1 is from 1600.. to 429..
 node#2 is from 429... to 1600...

  hopefully others can chime in to confirm.

 On Mon, Aug 9, 2010 at 12:30 PM, Mark static.void@gmail.com wrote:

 I'm running a 2 node cluster and when I run nodetool ring I get the
 following output

 Address         Status State   Load            Token
                                                160032583171087979418578389981025646900
 127.0.0.1       Up     Normal  42.28 MB        42909338385373526599163667549814010691
 127.0.0.2       Up     Normal  42.26 MB        160032583171087979418578389981025646900

 The columns/values are pretty much self explanatory except for the first
 line. What is this value?

 Thanks


  I was just wondering why the 160032583171087979418578389981025646900
 token is on a line by itself and listed under 127.0.0.2.



Re: Growing commit log directory.

2010-08-09 Thread S Ahmed
if your commit logs are not getting cleared, doesn't that indicate your load
is more than your servers can handle?


On Mon, Aug 9, 2010 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I have a 16 node 6.3 cluster and two nodes from my cluster are giving
 me major headaches.

 10.71.71.56   Up     58.19 GB   10827166220211678382926910108067277       |   ^
 10.71.71.61   Down   67.77 GB   123739042516704895804863493611552076888   v   |
 10.71.71.66   Up     43.51 GB   127605887595351923798765477786913079296   |   ^
 10.71.71.59   Down   90.22 GB   139206422831293007780471430312996086499   v   |
 10.71.71.65   Up     22.97 GB   148873535527910577765226390751398592512   |   ^

 The symptoms I am seeing are nodes 61 and nodes 59 have huge 6 GB +
 commit log directories. They keep growing, along with memory usage,
 eventually the logs start showing GCInspection errors and then the
 nodes will go OOM

 INFO 14:20:01,296 Creating new commitlog segment
 /var/lib/cassandra/commitlog/CommitLog-1281378001296.log
  INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving
 7955651792 used; max is 9773776896
  INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving
 8137412920 used; max is 9773776896
  INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving
 8310139720 used; max is 9773776896
  INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving
 8480136592 used; max is 9773776896
  INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving
 8648872520 used; max is 9773776896
  INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving
 8816581312 used; max is 9773776896
  INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving
 8986063136 used; max is 9773776896
  INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving
 9153134392 used; max is 9773776896
  INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving
 9318140296 used; max is 9773776896
 java.lang.OutOfMemoryError: Java heap space
 Dumping heap to java_pid10913.hprof ...
  INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
  INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
  INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200
 reclaimed leaving 9334753480 used; max is 9773776896
  INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.

 Heap dump file created [12730501093 bytes in 253.445 secs]
 ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
 java.lang.OutOfMemoryError: Java heap space
at
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
 ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
 java.lang.OutOfMemoryError: Java heap space
at
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
  INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880
 reclaimed leaving 9335215296 used; max is 9773776896

 Does anyone have any ideas what is going on?



explanation of generated files and ops

2010-08-09 Thread S Ahmed
In /var/lib/cassandra there is:

/data/system
LocationInfo-4-Data.db
LocationInfo-4-Filter.db
LocationInfo-4-Index.db
..
..

/data/Keyspace1/
Standard2-2-Data.db
Standard2-2-Filter.db
Standard2-2-Index.db

/commitlog
CommitLog-timestamp.log

/var/log/cassandra
system.log



Are these pretty much all the files that Cassandra generates? (Have I missed
any?)

Are there any common administrative tasks that admins might need to perform
on these files at all?

What exactly is stored in the -Filter.db files?


cassandra summit, making videos?

2010-07-27 Thread S Ahmed
Will there be videos of the sessions at the Cassandra Summit in SF?

I am really interested in the Cassandra codebase/internals seminar.


Re: Estimated release for Cassandra 0.6.4

2010-07-21 Thread S Ahmed
So is it a good estimate to give about 1 month per +.1 release?

i.e. 0.7.0 should be around October/November?


(btw great work, keep it up!)

On Wed, Jul 21, 2010 at 12:15 AM, CassUser CassUser cassu...@gmail.com wrote:

 Thanks Eric.


 On Tue, Jul 20, 2010 at 8:14 PM, Eric Evans eev...@rackspace.com wrote:

 On Tue, 2010-07-20 at 13:53 -0700, CassUser CassUser wrote:
  Is there a release date (or approximate date) for cassandra 0.6.4.  We
  are mainly concerned about the Cassandra-1042 patch.  The reason we
  don't simply apply the patch is because since we are shipping a
  product which interacts with the cassandra server (and the patch is
  server side), the customer would feel better if it was in a stable
  release.  Just trying to get an idea from the Cassandra guys on their
  plans :)

 We've been working on a monthly cadence for stable releases, so sometime
 in the next couple of weeks.

 --
 Eric Evans
 eev...@rackspace.com





Re: Cassandra benchmarking on Rackspace Cloud

2010-07-19 Thread S Ahmed
I'm reading this thread and I am a little lost; what should the
expected behavior be?

Should it maintain 53K regardless of nodes?

nodes   reads/sec
1   53,000
2   37,000
4   37,000

I ran this test previously on the cloud, with similar results:

nodes   reads/sec
1   24,000
2   21,000
3   21,000
4   21,000
5   21,000
6   21,000




On Mon, Jul 19, 2010 at 2:02 PM, David Schoonover 
david.schoono...@gmail.com wrote:

  Multiple client processes, or multiple client machines?


 I ran it with both one and two client machines making requests, and ensured
 the sum of the request threads across the clients was 50. That was on the
 cloud. I am re-running the multi-host test against the 4-node cluster on
 dedicated hardware now to ensure that result was not an artifact of the
 cloud.


 David Schoonover

 On Jul 19, 2010, at 1:38 PM, Jonathan Ellis wrote:

  On Mon, Jul 19, 2010 at 12:30 PM, David Schoonover
  david.schoono...@gmail.com wrote:
  How many physical client machines are running stress.py?
 
  One with 50 threads; it is remote from the cluster but within the same
  DC in both cases. I also run the test with multiple clients and saw
  similar results when summing the reqs/sec.
 
  Multiple client processes, or multiple client machines?
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of Riptano, the source for professional Cassandra support
  http://riptano.com




Re: Newbie to cassandra

2010-07-18 Thread S Ahmed
read the wiki, read about nosql in general.
download and install it, play with it.
browse the source code.

read the Bigtable paper by google, and the dynamo paper by amazon.

On Sun, Jul 18, 2010 at 2:46 PM, sonia gehlot sonia.geh...@gmail.com wrote:

 Hi everyone,

 I am new to Cassandra and wanted to try and start learning Cassandra. I
 have database background. I am fully exposed and have full command on
 Netezza, Oracle, MySQL, Sybase, SQL etc basically all the relational
 databases.

 As Cassandra is gaining popularity day by day by its amazing features, I
 also got tempt towards it and wanted to take deep dive into it.

 Please help me by guiding me in a right direction. How can I start working
 with Cassandra?

 Any help is appreciated.

 Thanks in advance.

 Sonia



Re: key types and grouping related rows together

2010-07-15 Thread S Ahmed
Well I'm not talking about a specific column family here, as ALL my column
families will have content that is specific to a certain website, so I need
a strategy that I will use on almost all my column families.

On Wed, Jul 14, 2010 at 9:20 PM, Schubert Zhang zson...@gmail.com wrote:

 for your apps, how about this schema:

 key: website1123
 columnName: UserID
 ...


 On Thu, Jul 15, 2010 at 6:13 AM, Aaron Morton aa...@thelastpickle.com wrote:

 The key structure you have should group the keys based on the website
 There are some differences between range queries with RP and OPP this
 article may help

 http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

 Aaron


 On 15 Jul, 2010,at 08:44 AM, S Ahmed sahmed1...@gmail.com wrote:

 Where is the link that describes the various key types and their impact on
 sorting? (I believe I read it before, can't seem to find it now).

 So my application supports multi-tenants, so I need the keys to represent
 things like:

 website1123 + contentID

 or

 website3454 + userID

 And for range queries, these keys have to be grouped together obviously.

 What key type would be best suited for this?


 I might have to create a CF that maps the website and its key prefix?





Re: key types and grouping related rows together

2010-07-15 Thread S Ahmed
Do you think a composite key using a key type of Bytes would work?

How many bytes can it be?


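// Assumes the Bytes helper is HBase's org.apache.hadoop.hbase.util.Bytes.
import org.apache.hadoop.hbase.util.Bytes;

// Composite row key: 4-byte website id followed by an 8-byte timestamp.
public static byte[] createRowKey(int websiteId, long stamp) {
  byte[] websiteIdBytes = Bytes.toBytes(websiteId);
  byte[] stampBytes = Bytes.toBytes(stamp);
  return Bytes.add(websiteIdBytes, stampBytes);
}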

So say this key is used in a ColumnFamily that stores Articles for all
websites, using a key like this would allow me to get a range of
articles written, ordered by date, for a specific website correct?
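
To check my own reasoning about the ordering, here is a dependency-free sketch that builds the same kind of key with `java.nio.ByteBuffer` and compares keys byte by byte. It assumes non-negative ids and timestamps (negative values would sort last under unsigned comparison) and needs Java 9+ for `Arrays.compareUnsigned`. Note that a range scan only sees row keys in this order with an order-preserving partitioner; with the random partitioner, rows are placed by hash.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Composite key: 4-byte website id + 8-byte timestamp, both big-endian.
// For non-negative values, unsigned byte order matches (website, time) order.
class CompositeKeys {
    static byte[] rowKey(int websiteId, long timestamp) {
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                .putInt(websiteId)
                .putLong(timestamp)
                .array();
    }

    // Lexicographic comparison treating each byte as unsigned (Java 9+).
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```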



On Thu, Jul 15, 2010 at 9:38 AM, S Ahmed sahmed1...@gmail.com wrote:

 Well I'm not talking about a specific column family here, as ALL my column
 families will have content that is specific to a certain website, so I need
 a strategy that I will use on almost all my column families.


 On Wed, Jul 14, 2010 at 9:20 PM, Schubert Zhang zson...@gmail.com wrote:

 for your apps, how about this schema:

 key: website1123
 columnName: UserID
 ...


 On Thu, Jul 15, 2010 at 6:13 AM, Aaron Morton aa...@thelastpickle.com wrote:

 The key structure you have should group the keys based on the website
 There are some differences between range queries with RP and OPP this
 article may help

 http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

 Aaron


 On 15 Jul, 2010,at 08:44 AM, S Ahmed sahmed1...@gmail.com wrote:

 Where is the link that describes the various key types and their impact
 on sorting? (I believe I read it before, can't seem to find it now).

 So my application supports multi-tenants, so I need the keys to represent
 things like:

 website1123 + contentID

 or

 website3454 + userID

 And for range queries, these keys have to be grouped together obviously.

 What key type would be best suited for this?


 I might have to create a CF that maps the website and its key prefix?






Re: key types and grouping related rows together

2010-07-15 Thread S Ahmed
Benjamin,

Ah, thanks for clarifying that.

key sorting is changing in .7 I believe to support a binary array?

On Thu, Jul 15, 2010 at 3:26 PM, Benjamin Black b...@b3k.us wrote:

 Keys are always sorted (in 0.6) as UTF8 strings.  The CompareWith
 applies to _columns_ within rows, _not_ to row keys.

 On Wed, Jul 14, 2010 at 1:44 PM, S Ahmed sahmed1...@gmail.com wrote:
  Where is the link that describes the various key types and their impact
 on
  sorting? (I believe I read it before, can't seem to find it now).
  So my application supports multi-tenants, so I need the keys to represent
  things like:
  website1123 + contentID
  or
  website3454 + userID
  And for range queries, these keys have to be grouped together obviously.
  What key type would be best suited for this?
 
 
  I might have to create a CF that maps the website and its key prefix?



Re: key types and grouping related rows together

2010-07-15 Thread S Ahmed
Given a CF like:

Articles : {

   key1 : { title: "some title", body: "this is my article body...",  },
   key2 : { title: "some title", body: "this is my article body...",  }
}

Now these articles could be for different websites e.g. www.website1.com,
www.website2.com

If I want to get the latest 10 articles for a given website, how would I
formulate my key to achieve this?

I basically need to understand how to handle multi-tenancy, b/c I will need
to do this for almost all my CF's.

I'm a little stuck here so guidance would be great!
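
One layout I am considering, sketched in memory: use the website as the row key and the publish timestamp as the column name, so "latest 10 for a website" becomes a reversed slice of a single row with a limit. All names here are hypothetical and this is a model of the idea, not Cassandra client code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical layout: one row per website, one column per article,
// with the publish timestamp as the column name so columns sort by time.
class ArticlesByWebsite {
    private final Map<String, NavigableMap<Long, String>> rows = new TreeMap<>();

    void publish(String website, long publishedAt, String title) {
        rows.computeIfAbsent(website, k -> new TreeMap<>()).put(publishedAt, title);
    }

    // Reads the row for one website in reverse column order, newest first.
    List<String> latest(String website, int limit) {
        List<String> titles = new ArrayList<>();
        NavigableMap<Long, String> row = rows.get(website);
        if (row == null) {
            return titles;
        }
        for (String title : row.descendingMap().values()) {
            if (titles.size() == limit) {
                break;
            }
            titles.add(title);
        }
        return titles;
    }
}
```

Because each tenant lives in its own row, the same pattern would carry over to the other column families without depending on how row keys are sorted.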


On Thu, Jul 15, 2010 at 4:01 PM, S Ahmed sahmed1...@gmail.com wrote:

 Benjamin,

 Ah, thanks for clarifying that.

 key sorting is changing in .7 I believe to support a binary array?

 On Thu, Jul 15, 2010 at 3:26 PM, Benjamin Black b...@b3k.us wrote:

 Keys are always sorted (in 0.6) as UTF8 strings.  The CompareWith
 applies to _columns_ within rows, _not_ to row keys.

 On Wed, Jul 14, 2010 at 1:44 PM, S Ahmed sahmed1...@gmail.com wrote:
  Where is the link that describes the various key types and their impact
 on
  sorting? (I believe I read it before, can't seem to find it now).
  So my application supports multi-tenants, so I need the keys to
 represent
  things like:
  website1123 + contentID
  or
  website3454 + userID
  And for range queries, these keys have to be grouped together obviously.
  What key type would be best suited for this?
 
 
  I might have to create a CF that maps the website and its key prefix?





Re: NYC Cassandra training

2010-07-14 Thread S Ahmed
How will we load the VM on our machines?  Do we download it ?

Is it running Ubuntu?


On Wed, Jul 14, 2010 at 11:11 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Turns out we can get a list from Eventbrite:
 http://www.eventbrite.com/org/474011012?s=1926097

 On Tue, Jul 13, 2010 at 3:09 PM, Jonathan Ellis jbel...@gmail.com wrote:
  On Fri, Jul 9, 2010 at 9:36 AM, Jeremy Dunck jdu...@gmail.com wrote:
  On Fri, Jul 2, 2010 at 1:08 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
  Riptano's one day Cassandra training is coming to NYC in August, our
  first public session on the East coast:
  http://www.eventbrite.com/event/749518831
 
  Is there a calendar where you're listing this stuff, or is it just
  tweets and mail messages about individual events at this point?
 
  We are working on getting a calendar up on our web site, but for now
  it is just the mailing list here.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



key types and grouping related rows together

2010-07-14 Thread S Ahmed
Where is the link that describes the various key types and their impact on
sorting? (I believe I read it before, can't seem to find it now).

So my application supports multi-tenants, so I need the keys to represent
things like:

website1123 + contentID

or

website3454 + userID

And for range queries, these keys have to be grouped together obviously.

What key type would be best suited for this?


I might have to create a CF that maps the website and its key prefix?


Re: advice, is cassandra suitable for a multi-tenancy vBulletin type application?

2010-07-13 Thread S Ahmed
The only issue I see (please correct me if I am wrong) is that you now
have single points of failure in the system, i.e. redis etc.

On Tue, Jul 13, 2010 at 3:33 AM, Sandeep Kalidindi at PaGaLGuY.com 
sandeep.kalidi...@pagalguy.com wrote:

 @michael - benjamin answered your question.

 The thing is, if you use MySQL just for indices, you are not using the
 benefits of the relational database engine at all (which is fine), but you
 are inheriting all of its disadvantages.

 You can use MySQL for storing indices, then write your own sharding
 layer on top, then make sure network partitions are taken care of, and
 then... oh wait, you are already starting to create a poor man's Cassandra
 on top of MySQL. Why not just use Cassandra?

 One valid argument is that MySQL is solid in stability, whereas Cassandra
 has yet to prove it is rock solid. But the 0.7 release looks awesome.
 There are some really wonderful people developing Cassandra who are here to
 answer most of your questions, and if you still need help there is
 Riptano (and Jonathan Ellis is one hell of a person to discuss your infra
 issues with).

 Cheers,
 Deepu.


 On Tue, Jul 13, 2010 at 12:17 PM, Benjamin Black b...@b3k.us wrote:

 On Mon, Jul 12, 2010 at 11:35 PM, Michael Dürgner mich...@duergner.de
 wrote:
  The point about slow joins is true (we experience that ourselves), but
 I still wonder why you use Cassandra for the indices. Can't you just
 store them in MySQL instead?
 

 ...and then shard and shard and shard to deal with hundreds of
 millions or billions of rows?  That's usually the trade-off.  Both can
 be made to work, but neither is free.


 b





Re: advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-12 Thread S Ahmed
Very interesting!

What kind of integration do you have between vB and Cassandra? It's not a
port then?

On Mon, Jul 12, 2010 at 3:34 AM, Sandeep Kalidindi at PaGaLGuY.com 
sandeep.kalidi...@pagalguy.com wrote:

 We were one of the vBulletin customers, and our forums have been facing some
 bad scaling issues.

 We coded our forum software to work with Cassandra. We are still testing
 for bugs and might go live in a couple of weeks. You can ask any specific
 questions about vBulletin and Cassandra and I will answer to the best of my
 knowledge.

 In our case a combination of Cassandra and Redis took care of most of the
 functionality that vBulletin offers, and much more.

 Cheers,
 Deepu.


 On Mon, Jul 12, 2010 at 9:58 AM, Paul Prescod pres...@gmail.com wrote:

 On Sun, Jul 11, 2010 at 8:39 AM, S Ahmed sahmed1...@gmail.com wrote:
  I want to build a vBulletin type application (forums, threads, posts,
 user
  management, etc).
  Support multi-tenancy for a Saas type environment.
  Would Cassandra be suitable for this type of application?
 
 
  Thanks in advance.

 Most likely, it is technically a fine fit. But Cassandra is very early
 stage software, so you should expect that the documentation will not
 always be clear and things will change from version to version. If you
 are not extremely self-reliant, you may find it a frustrating
 experience. Unless you are confident you will have trouble scaling
 traditional technologies, it might not make business sense.

  Paul Prescod





Re: server needs thrift to run also?

2010-07-12 Thread S Ahmed
Confused: why does the installation guide say to build and make it then?
http://github.com/ericflo/twissandra

Twissandra is for 0.6.1; is that why?
I.e. was it embedded in a later version?


On Mon, Jul 12, 2010 at 4:46 PM, Stu Hood stu.h...@rackspace.com wrote:

 The Thrift server is embedded in Cassandra, and starts by default. Look for
 references to Thrift on: http://wiki.apache.org/cassandra/GettingStarted

 Thanks,
 Stu

 -Original Message-
 From: S Ahmed sahmed1...@gmail.com
 Sent: Monday, July 12, 2010 3:43pm
 To: user@cassandra.apache.org
 Subject: server needs thrift to run also?

 I'm trying to follow along the twissandra installation instructions.

 So to get it running I have to install Thrift.

 So thrift runs as another service?  So communication is done via thrift,
 which then communicates to Cassandra on another port?





Re: server needs thrift to run also?

2010-07-12 Thread S Ahmed
Ok I guess I have to read up on exactly what is going on here.

I figured I could download twissandra, fire up cassandra and run the app!

I thought all you needed was the python driver which comes with twissandra.

Let me read more about Thrift and generating client code etc.

thanks!

On Mon, Jul 12, 2010 at 5:04 PM, Michael Pearson mjpear...@gmail.com wrote:

 Twissandra is packaged with pycassa + correct generated thrift
 transports under /deps already, so really just need the thrift binary
 to build from a cassandra.thrift API newer than what's currently
 supported by the bundled pycassa.

 -michael

 On Mon, Jul 12, 2010 at 1:55 PM, Stu Hood stu.h...@rackspace.com wrote:
  You'll need Thrift installed to generate the _client_ code: the server
 code is embedded within Cassandra.
 
  -Original Message-
  From: S Ahmed sahmed1...@gmail.com
  Sent: Monday, July 12, 2010 3:49pm
  To: user@cassandra.apache.org
  Subject: Re: server needs thrift to run also?
 
  Confused: why does the installation guide say to build and make it then?
  http://github.com/ericflo/twissandra
 
  Twissandra is for 0.6.1; is that why?
  I.e. was it embedded in a later version?
 
 
  On Mon, Jul 12, 2010 at 4:46 PM, Stu Hood stu.h...@rackspace.com
 wrote:
 
  The Thrift server is embedded in Cassandra, and starts by default. Look
 for
  references to Thrift on:
 http://wiki.apache.org/cassandra/GettingStarted
 
  Thanks,
  Stu
 
  -Original Message-
  From: S Ahmed sahmed1...@gmail.com
  Sent: Monday, July 12, 2010 3:43pm
  To: user@cassandra.apache.org
  Subject: server needs thrift to run also?
 
  I'm trying to follow along the twissandra installation instructions.
 
  So to get it running I have to install Thrift.
 
  So thrift runs as another service?  So communication is done via thrift,
  which then communicates to Cassandra on another port?
 
 
 
 
 
 



Re: advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-12 Thread S Ahmed
What sort of traffic levels made you port the application to Cassandra?

Very interested in seeing this go live.

What sort of server setup are you looking at using?

On Mon, Jul 12, 2010 at 4:39 PM, Sandeep Kalidindi at PaGaLGuY.com 
sandeep.kalidi...@pagalguy.com wrote:

 No we re-coded from scratch with most of the needed functionality.

 Cheers,
 Deepu.


 On Mon, Jul 12, 2010 at 7:49 PM, S Ahmed sahmed1...@gmail.com wrote:

 Very interesting!

 What kind of integration do you have between vB and Cassandra? It's not a
 port then?


 On Mon, Jul 12, 2010 at 3:34 AM, Sandeep Kalidindi at PaGaLGuY.com 
 sandeep.kalidi...@pagalguy.com wrote:

 We were one of the vBulletin customers, and our forums have been facing
 some bad scaling issues.

 We coded our forum software to work with Cassandra. We are still testing
 for bugs and might go live in a couple of weeks. You can ask any specific
 questions about vBulletin and Cassandra and I will answer to the best of my
 knowledge.

 In our case a combination of Cassandra and Redis took care of most of the
 functionality that vBulletin offers, and much more.

 Cheers,
 Deepu.


 On Mon, Jul 12, 2010 at 9:58 AM, Paul Prescod pres...@gmail.com wrote:

 On Sun, Jul 11, 2010 at 8:39 AM, S Ahmed sahmed1...@gmail.com wrote:
  I want to build a vBulletin type application (forums, threads, posts,
 user
  management, etc).
  Support multi-tenancy for a Saas type environment.
  Would Cassandra be suitable for this type of application?
 
 
  Thanks in advance.

 Most likely, it is technically a fine fit. But Cassandra is very early
 stage software, so you should expect that the documentation will not
 always be clear and things will change from version to version. If you
 are not extremely self-reliant, you may find it a frustrating
 experience. Unless you are confident you will have trouble scaling
 traditional technologies, it might not make business sense.

  Paul Prescod







advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-11 Thread S Ahmed
I want to build a vBulletin type application (forums, threads, posts, user
management, etc).
Support multi-tenancy for a Saas type environment.

Would Cassandra be suitable for this type of application?



Thanks in advance.


Re: NYC Cassandra training

2010-07-09 Thread S Ahmed
My previous reply seemed to have bounced.

Will there be a training day before/after the Cassandra Summit (in SF on the
10th)?

On Fri, Jul 2, 2010 at 2:08 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Riptano's one day Cassandra training is coming to NYC in August, our
 first public session on the East coast:
 http://www.eventbrite.com/event/749518831

 We have also nailed down our next locations, although registration is
 not yet open: Denver in September and Seattle in October.

 See you there!

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Re: Digg 4 Preview on TWiT

2010-07-04 Thread S Ahmed
Agreed. What exactly did they replace it with?

On Sun, Jul 4, 2010 at 8:14 AM, Bill de hÓra b...@dehora.net wrote:

 On Mon, 2010-06-28 at 11:51 -0500, Eric Evans wrote:
  On Mon, 2010-06-28 at 07:53 -0700, Kochheiser,Todd W - TOK-DITT-1 wrote:
   On a related but separate note: While I am fairly new to Cassandra and
   have only been following the mailing lists for a few months, the
   conversation with Kevin Rose on TWiT made me curious if the versions
   of Cassandra that Digg, Twitter, and Facebook are using may end up
   being forks of the Apache project or old versions.
 
  Facebook and Apache have diverged (technically we're the fork). To the
  best of my knowledge, this has always been the case.

 This person's understanding is that Facebook 'no longer contributes to
 nor uses Cassandra.':

 http://redmonk.com/sogrady/2010/05/17/beyond-cassandra/

 I assume it's accurate - policy reasons wouldn't interest me as much as
 technical ones.

 Bill





Re: facebook search index super column, do I have this correct?

2010-07-02 Thread S Ahmed
Actually, I think in the video they said they store each messageID as a
separate column; that way they can do range queries, correct?

so it would be:

aloha: { message1: 2343, message2: 9590002, }
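
To illustrate the layout described here, the following is a minimal sketch that uses plain Python dictionaries in place of a real super column family. It assumes the column name carries the message ID and the value stays empty; the names `message_index`, `index_message` and `search`, and the sample IDs, are hypothetical.

```python
# row key (user ID) -> super column (search term) -> sub-columns (message ID -> value)
message_index = {}

def index_message(user_id, message_id, text):
    """Add one sub-column per (term, message ID) pair to this user's row."""
    row = message_index.setdefault(user_id, {})
    for term in set(text.lower().split()):
        # The column name is the message ID; the value can stay empty.
        row.setdefault(term, {})[message_id] = b""

def search(user_id, term, start=None, limit=10):
    """Return up to `limit` message IDs for a term, in column-name order."""
    columns = message_index.get(user_id, {}).get(term.lower(), {})
    ids = sorted(columns)
    if start is not None:
        # Slice from a starting column name, as a range query would.
        ids = [i for i in ids if i >= start]
    return ids[:limit]

index_message("userid1", 2343, "aloha from the clown")
index_message("userid1", 9590002, "Aloha again")
index_message("userid2", 632, "studying and eating")
```

Because each message ID is its own column, a search can page through a long result list with `start` and `limit` instead of reading one large packed value.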

On Thu, Jul 1, 2010 at 6:25 PM, S Ahmed sahmed1...@gmail.com wrote:

 So trying to map how facebook implemented a CF of type Super to index
 message terms.

 Is this json representation correct?

 MessageIndex = {

userid1 : {

 aloha : { messageIdList:
 234,2343234,23423434,234255,345345,2342,532432},
 clown : { messageIdList: 632, 2342, 23452, 234234, 234234},
 ..
 ..
 ..
},

userid2 : {

eating : { messageIdList:
 234,2343234,23423434,234255,345345,2342,532432},
 studying : { messageIdList: 632, 2342, 23452, 234234, 234234},
 ..
 ..
 ..

}

 }


 So if a user searches for the term clown, you perform a lookup in
 the CF named MessageIndex, look up the row of the currently logged-in
 user by UserID (which is the key), and then look for a
 CF with the term clown and return the value.

 Is this a proper representation and am I using the correct terminology?





Pelops 'up and running' post question + WTF is a SuperColumn = really confused.

2010-07-02 Thread S Ahmed
https://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java

So using the code snippet below, I want to create a json representation of
the CF (super).


/**
 * Write multiple sub-column values to a super column...
 * @param rowKey     The key of the row to modify
 * @param colFamily  The name of the super column family to operate on
 * @param colName    The name of the super column
 * @param subColumns A list of the sub-columns to write
 */
mutator.writeSubColumns(
    userId,
    L1Tickets,
    UuidHelper.newTimeUuidBytes(), // using a UUID value that sorts by time
    mutator.newColumnList(
        mutator.newColumn(category, videoPhone),
        mutator.newColumn(reportType, POOR_PICTURE),
        mutator.newColumn(createdDate, NumberHelper.toBytes(System.currentTimeMillis())),
        mutator.newColumn(capture, jpegBytes),
        mutator.newColumn(comment)));


Can someone show me what it would look like?

This is what I have so far

SupportTickets = {

userId : {

L1Tickets : { }

}


}


But from what I understood, a CF of type super looks like (
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model) :

AddressBook = { // this is a ColumnFamily of type Super
    phatduckk: { // this is the key to this row inside the Super CF
        // the key here is the name of the owner of the address book

        // now we have an infinite # of super columns in this row
        // the keys inside the row are the names for the SuperColumns
        // each of these SuperColumns is an address book entry
        friend1: {street: 8th street, zip: 90210, city: Beverley Hills, state: CA},

        // this is the address book entry for John in phatduckk's address book
        John: {street: Howard street, zip: 94404, city: FC, state: CA},
        Kim: {street: X street, zip: 87876, city: Balls, state: VA},
        Tod: {street: Jerry street, zip: 54556, city: Cartoon, state: CO},
        Bob: {street: Q Blvd, zip: 24252, city: Nowhere, state: MN},
        ...
        // we can have an infinite # of SuperColumns (aka address book entries)
    }, // end row
    ieure: { // this is the key to another row in the Super CF
        // all the address book entries for ieure
        joey: {street: A ave, zip: 55485, city: Hell, state: NV},
        William: {street: Armpit Dr, zip: 93301, city: Bakersfield, state: CA},
    },
}

The Pelops code snippet seems to me to be adding an additional inner layer
to this. Confused!
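
A rough way to see where the apparent extra layer comes from: going by the Javadoc parameters of the `writeSubColumns` call above, `userId` is the row key, `L1Tickets` is the column family name, and the time-based UUID is the super column name, so each call adds one super column to the user's row. The sketch below mimics that with nested dictionaries. `write_sub_columns` is a hypothetical stand-in, not the Pelops API, and `uuid.uuid1()` is assumed here as the time-based UUID.

```python
import uuid

# column family name -> row key -> super column name -> sub-columns
keyspace = {}

def write_sub_columns(row_key, col_family, col_name, sub_columns):
    """Hypothetical stand-in for the mutator call: store sub-columns under one super column."""
    row = keyspace.setdefault(col_family, {}).setdefault(row_key, {})
    row.setdefault(col_name, {}).update(sub_columns)

ticket_id = uuid.uuid1()  # version 1 UUIDs embed a timestamp
write_sub_columns(
    "userId",
    "L1Tickets",
    ticket_id,
    {"category": "videoPhone", "reportType": "POOR_PICTURE"},
)
```

Read this way, the column family name sits at the outermost level rather than inside the row, which gives the same three levels as the AddressBook example: row key, super column, sub-columns.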


Re: Pelops 'up and running' post question + WTF is a SuperColumn = really confused.

2010-07-02 Thread S Ahmed
ok now that makes sense, thanks a bundle.

On Fri, Jul 2, 2010 at 5:49 PM, Dan Washusen d...@reactive.org wrote:

 L1Tickets = { // column family
     userId: { // row key
         42C120DF-D44A-44E4-9BDC-2B5439A5C7B4: { category: videoPhone, reportType: POOR_PICTURE, ...},
         99B60047-382A-4237-82CE-AE53A74FB747: { category: somethingElse, reportType: FOO, ...}
     }
 }

 On 3 July 2010 02:29, S Ahmed sahmed1...@gmail.com wrote:


 https://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java

 So using the code snipped below, I want to create a json representation of
 the CF (super).


 /**
  * Write multiple sub-column values to a super column...
  * @param rowKeyThe key of the row to modify
  * @param colFamily The name of the super column family to
 operate on
  * @param colName   The name of the super column
  * @param subColumnsA list of the sub-columns to write
  */
 mutator. writeSubColumns(
 userId,
 L1Tickets,
 UuidHelper.newTimeUuidBytes(), // using a UUID value that sorts by
 time
 mutator.newColumnList(
 mutator.newColumn(category, videoPhone),
 mutator.newColumn(reportType, POOR_PICTURE),
 mutator.newColumn(createdDate,
 NumberHelper.toBytes(System.currentTimeMillis())),
 mutator.newColumn(capture, jpegBytes),
 mutator.newColumn(comment) ));


 Can someone show me what it would look like?

 This is what I have so far

 SupportTickets = {

 userId : {

 L1Tickets : { }

 }


 }


 But from what I understood, a CF of type super looks like (
 http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model) :

 AddressBook = { // this is a ColumnFamily of type Super
 phatduckk: {// this is the key to this row inside the Super CF
 // the key here is the name of the owner of the address book

 // now we have an infinite # of super columns in this row
 // the keys inside the row are the names for the SuperColumns
 // each of these SuperColumns is an address book entry
 friend1: {street: 8th street, zip: 90210, city: Beverley
 Hills, state: CA},

 // this is the address book entry for John in phatduckk's address
 book
 John: {street: Howard street, zip: 94404, city: FC, state:
 CA},
 Kim: {street: X street, zip: 87876, city: Balls, state:
 VA},
 Tod: {street: Jerry street, zip: 54556, city: Cartoon,
 state: CO},
 Bob: {street: Q Blvd, zip: 24252, city: Nowhere, state:
 MN},
 ...
 // we can have an infinite # of ScuperColumns (aka address book
 entries)
 }, // end row
 ieure: { // this is the key to another row in the Super CF
 // all the address book entries for ieure
 joey: {street: A ave, zip: 55485, city: Hell, state: NV},
 William: {street: Armpit Dr, zip: 93301, city: Bakersfield,
 state: CA},
 },
 }

 The Pelop's code snippet seems to be adding an additional inner layer to
 this to me, confused!





vector maps and counts

2010-07-01 Thread S Ahmed
(I realize the ability to get/set a count constantly is coming in an upcoming
release.)

Can someone give me a high-level overview of the design of the vector map solution?

Is the actual count value stored in the CF row or is it stored separately?


where is the video just before this one by Avinash?

2010-07-01 Thread S Ahmed
In this video: http://vimeo.com/5185526

Avinash mentions that the previous presenter covered a lot of what he was to
cover.  Does anyone have a link to that presentation?


facebook search index super column, do I have this correct?

2010-07-01 Thread S Ahmed
So trying to map how facebook implemented a CF of type Super to index
message terms.

Is this json representation correct?

MessageIndex = {

   userid1 : {

aloha : { messageIdList:
234,2343234,23423434,234255,345345,2342,532432},
clown : { messageIdList: 632, 2342, 23452, 234234, 234234},
..
..
..
   },

   userid2 : {

   eating : { messageIdList:
234,2343234,23423434,234255,345345,2342,532432},
studying : { messageIdList: 632, 2342, 23452, 234234, 234234},
..
..
..

   }

}


So if a user searches for the term clown, you perform a lookup in the
CF named MessageIndex, look up the row of the currently
logged-in user by UserID (which is the key), and then look for a CF with
the term clown and return the value.

Is this a proper representation and am I using the correct terminology?


Re: forum application data model conversion

2010-06-23 Thread S Ahmed
Any thoughts?

On Tue, Jun 22, 2010 at 2:13 PM, S Ahmed sahmed1...@gmail.com wrote:

 Converting a Forum application to cassandra's data model.

 Tables:

 Posts [postID, threadID, userID, subject, body, created, lastmodified]

 So this table contains the actual question subject and body.

 When a user logs in, they want to see a list of their questions, ordered
 by the last-modified date (to see if people responded to their
 question).

 How would you best do this in Cassandra, seeing as the question/answer text
 is stored in another table?

 I know you could make a CF like:

 userID { postID1, postID2, ...}

 And somehow order by last-modified, but then on the actual web page you
 would have to first query for the postIDs owned by the user, ordered by
 last-modified.

 THEN you would have to fetch the post data from the posts collection.

 Is this the only way?  I mean other than repeating the post subject+body in
 the user-to-postID index CF.





forum application data model conversion

2010-06-22 Thread S Ahmed
Converting a Forum application to cassandra's data model.

Tables:

Posts [postID, threadID, userID, subject, body, created, lastmodified]

So this table contains the actual question subject and body.

When a user logs in, they want to see a list of their questions, ordered
by the last-modified date (to see if people responded to their
question).

How would you best do this in Cassandra, seeing as the question/answer text
is stored in another table?

I know you could make a CF like:

userID { postID1, postID2, ...}

And somehow order by last-modified, but then on the actual web page you
would have to first query for the postIDs owned by the user, ordered by
last-modified.

THEN you would have to fetch the post data from the posts collection.

Is this the only way?  I mean other than repeating the post subject+body in
the user-to-postID index CF.
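
The two options weighed here can be sketched side by side. This is a minimal illustration, assuming plain dictionaries in place of column families and index columns named by `(lastmodified, postID)` so that sorting by column name gives last-modified order. The names `posts`, `user_posts`, `save_post`, `list_posts_two_step` and `list_posts_denormalized` are hypothetical.

```python
posts = {}        # postID -> full post row
user_posts = {}   # userID -> {(lastmodified, postID): subject}

def save_post(post_id, user_id, subject, body, lastmodified):
    """Write the post row and refresh the per-user index entry."""
    old = posts.get(post_id)
    index = user_posts.setdefault(user_id, {})
    if old is not None:
        # Drop the stale index column before adding the new one.
        index.pop((old["lastmodified"], post_id), None)
    posts[post_id] = {"userID": user_id, "subject": subject,
                      "body": body, "lastmodified": lastmodified}
    index[(lastmodified, post_id)] = subject  # denormalized copy of the subject

def list_posts_two_step(user_id, limit=10):
    """Read post IDs from the index, then fetch each post row."""
    names = sorted(user_posts.get(user_id, {}), reverse=True)[:limit]
    return [posts[post_id]["subject"] for _, post_id in names]

def list_posts_denormalized(user_id, limit=10):
    """Read subjects straight from the index, with no second lookup."""
    index = user_posts.get(user_id, {})
    return [index[name] for name in sorted(index, reverse=True)[:limit]]

save_post("p1", "u1", "First question", "original body", 100)
save_post("p2", "u1", "Second question", "another body", 200)
save_post("p1", "u1", "First question", "edited body", 300)
```

The denormalized read saves the second round of lookups at the cost of rewriting the index entry whenever a post changes, which is the trade-off the question is circling.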


django or pylons

2010-06-20 Thread S Ahmed
Seeing as I will be using a different ORM, would it make more sense to use
pylons over django?

From what I understand, pylons assumes less as compared to django.


CF that is like a non-clustered index, are key lookups that fast?

2010-06-15 Thread S Ahmed
If you store only the key mappings in a column family, for custom ordering
of rows etc. for things like:

friends = {

   user_id : { friendid1, friendid2, }

}

or

topForumPosts = {

forum_id1 : { post2343, post32343, post32223, ...}

}


Now on the friends page or on the top_forum_posts page you will get back a
list of post_ids; you will then have to perform lookups on the main 'posts'
CF to get the actual data.  So if a page is displaying 10, 25, or 50 posts
you will have 10, 25 or 50 key-based lookups for each page view.

Is this the suggested way?  I.e. a lookup based on a slice to get a list of
post_ids, then a separate call to actually fetch the data for the given
entity.

Or is cassandra so fast that 50 key based calls is no reason to worry?


Re: CF that is like a non-clustered index, are key lookups that fast?

2010-06-15 Thread S Ahmed
Well, it won't be a range; it will be random key lookups.

On Tue, Jun 15, 2010 at 8:44 AM, Gary Dusbabek gdusba...@gmail.com wrote:

 On Tue, Jun 15, 2010 at 04:29, S Ahmed sahmed1...@gmail.com wrote:
  If you store only the key mappings in a column family, for custom
 ordering
  of rows etc. for things like:
  friends = {
 
 user_id : { friendid1, friendid2, }
  }
  or
  topForumPosts = {
 
  forum_id1 : { post2343, post32343, post32223, ...}
  }
 
  Now on friends page or on the top_forum_posts page you will get back a
 list
  of post_ids, you will then have to perform lookups on the main 'posts' CF
 to
  get the actual data.  So if a page is displaying 10, 25, or 50 posts you
  will have 10, 25 or 50 key based lookups for each page view.
  Is this the suggested way?  i.e. a look based on a slice to get a list of
  post_id's, then a seperate call to actually fetch the data for the given
  entity.
  Or is cassandra so fast that 50 key based calls is no reason to worry?

 You should look at using either multiget_slice or get_range_slices.
 You'll save on network trips and the amount of work required of the
 cluster.

 Gary.
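
The saving from batching that Gary describes can be illustrated with a small simulation. This is only a sketch: `FakeColumnFamily` and its `get` and `multiget` methods are hypothetical in-memory stand-ins that count round trips, not the Thrift API or any client library.

```python
class FakeColumnFamily:
    """In-memory stand-in for a column family that counts client round trips."""

    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0

    def get(self, key):
        """Fetch one row: one round trip per call."""
        self.round_trips += 1
        return self.rows[key]

    def multiget(self, keys):
        """Fetch many rows in a single round trip, skipping missing keys."""
        self.round_trips += 1
        return {k: self.rows[k] for k in keys if k in self.rows}

# Assumed sample data: 50 posts, of which a page shows the first 25.
posts = FakeColumnFamily({"post%d" % i: {"subject": "Post %d" % i} for i in range(50)})
top_post_ids = ["post%d" % i for i in range(25)]

one_by_one = [posts.get(k) for k in top_post_ids]
single_trips = posts.round_trips

posts.round_trips = 0
batched = posts.multiget(top_post_ids)
batched_trips = posts.round_trips
```

The keys do not need to be contiguous for a batched fetch, so it also fits the random key lookups mentioned in the reply above.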



using cassandra w/django

2010-06-11 Thread S Ahmed
When using Cassandra with Django, can you still use the rapid development
features of Django, or are you basically just using the framework, with
the models and ORM features up to you to implement since you are using
Cassandra?


Re: using cassandra w/django

2010-06-11 Thread S Ahmed
I see, well I am new to python + django so I wasn't sure what I really meant
:)

So basically I am using django for its framework related features, but
excluding the ORM/autogen admin pages.

That's reasonable and understandable, thanks.

On Fri, Jun 11, 2010 at 10:38 PM, Jeremy Dunck jdu...@gmail.com wrote:

 There's no direct support for cassandra in django, but there are a
 couple starts.

 http://www.allbuttonspressed.com/projects/django-nonrel
 http://github.com/enki/tragedy
 http://code.djangoproject.com/wiki/SummerOfCode2010

 All of the features which Django has and which build on the ORM are
 out, of course.  The GSoC project is trying to provide some nonrel
 features through the ORM. I think the general understanding of what
 people mean when they say "does Django work with nosql-X" is "does the
 Django admin work with nosql-X".  The GSoC might get there, but it's
 pretty ambitious.

 On Fri, Jun 11, 2010 at 9:18 PM, S Ahmed sahmed1...@gmail.com wrote:
  When using cassandra with django, can you still use the rapid development
  freatures of django w/cassandra or are you basically just using the
  framework but the models and ORM features are up to you to implement
 since
  you are using cassandra.
 
 



Re: Cassandra training Jun 18 in SF

2010-06-04 Thread S Ahmed
Nice!

Would it be possible to give more than 2 weeks' notice for the following
events? Preferably a month; it's not that easy to get off work etc.

On Fri, Jun 4, 2010 at 4:22 AM, Oleg Anastasjev olega...@gmail.com wrote:

 Jonathan Ellis jbellis at gmail.com writes:

 
  This will be Riptano's 6th training session (including the four we've
  done that were on-site with a specific customer), and in my humble
  opinion the material's really solid at this point.
 
  We are actively working on lining up other locations.
 
 Do you have plans for training sessions in Europe ?






Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-21 Thread S Ahmed
Curious, how did things turn out?

On Tue, May 18, 2010 at 1:38 PM, Curt Bererton c...@zipzapplay.com wrote:

 We only have a few CFs (6 or 7).  I've increased the MemtableThroughputInMB
 and MemtableOperationsInMillions as per your suggestions. Do we really
 need a swap file though? I suppose it can't hurt, but with my problem in
 particular we weren't maxing out main memory.

 We'll be running another test today and see if the settings changes
 proposed so far fix our problem ( I hope so ).

 Best,
 Curt


 On Tue, May 18, 2010 at 5:59 AM, Lee Parker l...@socialagency.com wrote:

 How many different CFs do you have?  If you only have a few, I would
 highly recommend increasing the MemtableThroughputInMB and
 MemtableOperationsInMillions.
 We only have two CFs and I have them set at 256MB and 2.5m. Since most of
 our columns are relatively small, these values are practically equivalent
 to each other.  I would also recommend dropping your heap space to 6G and
 adding a swap file.  In our case, the large EC2 instances didn't have any
 swap set up by default.

 Lee Parker






is it possible to trace/debug cassandra?

2010-05-18 Thread S Ahmed
Would it be possible to put cassandra in debug mode, so I could actually
step through, line by line, the execution flow of operations I execute
against it?

If yes, any help would be great.


Re: Cassandra training on May 21 in Palo Alto

2010-05-17 Thread S Ahmed
Jonathan,

Curious how many people have signed up?

I hope you will do another one soon!

On Tue, May 11, 2010 at 12:42 PM, Vick Khera vi...@khera.org wrote:

 On Fri, May 7, 2010 at 6:56 AM, Matt Revelle mreve...@gmail.com wrote:
  Reston, VA is a good spot in the DC metro area for tech events.

 +1



Re: zookeeper, how do you feed the pets?

2010-05-16 Thread S Ahmed
Yes, counts will be a big part of the project (user points).  OK, I'll wait
for that vector implementation then (I think that is what it was called).

thanks!

On Sun, May 16, 2010 at 10:10 PM, Chris Goffinet c...@chrisgoffinet.com wrote:

 If you are running multiple datacenters and intend to have a lot of writes
 for counters, I highly advise against it. We got rid of ZK because of that.

 -Chris

 On May 16, 2010, at 7:04 PM, S Ahmed wrote:

  Can someone quickly go over how you go about using ZooKeeper if you want
 to store counts and have those counts be accurate?
 
  E.g. in Digg's case, I believe they are using ZooKeeper so they can keep
 track of diggs for a particular Digg story.
 
  Is it a backend change only, so that the storing API calls are unaffected?
  Is it a config issue?
 
  What are the ramifications of using this add-on? Are writes slower because
 you have to wait for the write to propagate to all the servers?




is cassandra really a 'handsoff' solution once setup?

2010-05-14 Thread S Ahmed
I realize Cassandra might be a little tricky to set up at first due to the
lack of docs etc.

Once it is up and running/humming, is it a hands-off solution or does it
require hand-holding/monitoring?

I recall Joe Stump's blog post stating that it doesn't require an admin (or
something to that effect when comparing to a SQL Server box).

For those with live apps, how has it been? (fb/digg/twitter people, would
love your experiences)


what/how do you guys monitor slow nodes?

2010-05-11 Thread S Ahmed
If you have 3-4 nodes, how do you monitor the performance of each node?


Re: Cassandra training on May 21 in Palo Alto

2010-05-09 Thread S Ahmed
I guess the hard part would be recording something so long (9-5pm)

A video that is split between the screen (say powerpoint) and linux console
would be perfect :)

On Fri, May 7, 2010 at 11:24 AM, Todd Burruss bburr...@real.com wrote:

  +1



 -Original Message-
 *From:* S Ahmed [sahmed1...@gmail.com]
 *Received:* 5/7/10 7:09 AM
 *To:* user@cassandra.apache.org [u...@cassandra.apache.org]
 *Subject:* Re: Cassandra training on May 21 in Palo Alto

 It would be great if you could make a video of this event.  Yes, it won't
 be like being there 1-1, but it sure would help get up to speed.

 On Fri, May 7, 2010 at 6:56 AM, Matt Revelle mreve...@gmail.com wrote:

   Reston, VA is a good spot in the DC metro area for tech events.  The
 recent Pragmatic Programmer Clojure class sold out and already has two more
 return visits planned.

 On May 7, 2010, at 6:42 AM, S Ahmed sahmed1...@gmail.com wrote:

   toronto :)

  If not toronto, Virginia.

 On Thu, May 6, 2010 at 5:28 PM, Jonathan Ellis jbel...@gmail.com wrote:

 We're planning that now.  Where would you like to see one?

 On Thu, May 6, 2010 at 2:40 PM, S Ahmed sahmed1...@gmail.com wrote:
  Do you have rough ideas when you would be doing the next one?  Maybe in
 1 or
  2 months or much later?
 
 
  On Tue, May 4, 2010 at 8:50 PM, Jonathan Ellis jbel...@gmail.com wrote:
 
  Yes, although when and where are TBD.
 
  On Tue, May 4, 2010 at 7:38 PM, Mark Greene green...@gmail.com wrote:
   Jonathan,
   Awesome! Any plans to offer this training again in the future for
 those
   of
   us who can't make it this time around?
   -Mark
  
   On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis jbel...@gmail.com
   wrote:
  
   I'll be running a day-long Cassandra training class on Friday, May
 21.
I'll cover
  
   - Installation and configuration
   - Application design
   - Basics of Cassandra internals
   - Operations
   - Tuning and troubleshooting
  
   Details at http://riptanobayarea20100521.eventbrite.com/
  
   --
   Jonathan Ellis
   Project Chair, Apache Cassandra
   co-founder of Riptano, the source for professional Cassandra
 support
   http://riptano.com
  
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of Riptano, the source for professional Cassandra support
  http://riptano.com
 
 



  --
  Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
  http://riptano.com






Re: Cassandra training on May 21 in Palo Alto

2010-05-07 Thread S Ahmed
toronto :)

If not toronto, Virginia.

On Thu, May 6, 2010 at 5:28 PM, Jonathan Ellis jbel...@gmail.com wrote:

 We're planning that now.  Where would you like to see one?

 On Thu, May 6, 2010 at 2:40 PM, S Ahmed sahmed1...@gmail.com wrote:
  Do you have rough ideas when you would be doing the next one?  Maybe in 1
 or
  2 months or much later?
 
 
  On Tue, May 4, 2010 at 8:50 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  Yes, although when and where are TBD.
 
  On Tue, May 4, 2010 at 7:38 PM, Mark Greene green...@gmail.com wrote:
   Jonathan,
   Awesome! Any plans to offer this training again in the future for those of
   us who can't make it this time around?
   -Mark
  
   On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
  
   I'll be running a day-long Cassandra training class on Friday, May 21.
   I'll cover
  
   - Installation and configuration
   - Application design
   - Basics of Cassandra internals
   - Operations
   - Tuning and troubleshooting
  
   Details at http://riptanobayarea20100521.eventbrite.com/
  
   --
   Jonathan Ellis
   Project Chair, Apache Cassandra
   co-founder of Riptano, the source for professional Cassandra support
   http://riptano.com
  
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of Riptano, the source for professional Cassandra support
  http://riptano.com
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Re: Cassandra training on May 21 in Palo Alto

2010-05-07 Thread S Ahmed
It would be great if you could make a video of this event.  Yes, it won't be
like being there 1-1, but it sure would help people get up to speed.

On Fri, May 7, 2010 at 6:56 AM, Matt Revelle mreve...@gmail.com wrote:

 Reston, VA is a good spot in the DC metro area for tech events.  The recent
 Pragmatic Programmer Clojure class sold out and already has two more return
 visits planned.

 On May 7, 2010, at 6:42 AM, S Ahmed sahmed1...@gmail.com wrote:

 toronto :)

 If not toronto, Virginia.

 On Thu, May 6, 2010 at 5:28 PM, Jonathan Ellis jbel...@gmail.com wrote:

 We're planning that now.  Where would you like to see one?

 On Thu, May 6, 2010 at 2:40 PM, S Ahmed sahmed1...@gmail.com wrote:
  Do you have rough ideas when you would be doing the next one?  Maybe in 1 or
  2 months or much later?
 
 
  On Tue, May 4, 2010 at 8:50 PM, Jonathan Ellis jbel...@gmail.com wrote:
 
  Yes, although when and where are TBD.
 
  On Tue, May 4, 2010 at 7:38 PM, Mark Greene green...@gmail.com wrote:
   Jonathan,
   Awesome! Any plans to offer this training again in the future for those of
   us who can't make it this time around?
   -Mark
  
   On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
  
   I'll be running a day-long Cassandra training class on Friday, May 21.
   I'll cover
  
   - Installation and configuration
   - Application design
   - Basics of Cassandra internals
   - Operations
   - Tuning and troubleshooting
  
   Details at http://riptanobayarea20100521.eventbrite.com/
  
   --
   Jonathan Ellis
   Project Chair, Apache Cassandra
   co-founder of Riptano, the source for professional Cassandra support
   http://riptano.com
  
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of Riptano, the source for professional Cassandra support
  http://riptano.com
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com





Re: Cassandra training on May 21 in Palo Alto

2010-05-06 Thread S Ahmed
Do you have rough ideas when you would be doing the next one?  Maybe in 1 or
2 months or much later?



On Tue, May 4, 2010 at 8:50 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Yes, although when and where are TBD.

 On Tue, May 4, 2010 at 7:38 PM, Mark Greene green...@gmail.com wrote:
  Jonathan,
  Awesome! Any plans to offer this training again in the future for those of
  us who can't make it this time around?
  -Mark
 
  On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
 
  I'll be running a day-long Cassandra training class on Friday, May 21.
  I'll cover
 
  - Installation and configuration
  - Application design
  - Basics of Cassandra internals
  - Operations
  - Tuning and troubleshooting
 
  Details at http://riptanobayarea20100521.eventbrite.com/
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of Riptano, the source for professional Cassandra support
  http://riptano.com
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Is Hector a wrapper around thrift?

2010-04-27 Thread S Ahmed
Just trying to get my head wrapped around everything here, so bear with me
:)

So Thrift can spit out generated code for any language, be it C#, Java,
Python, etc.

Hector is a higher-level wrapper around the Java code generated by Thrift.

Do I have this right?

And Hector is probably the most actively developed higher-level wrapper?  (Is
there anything similar in Python, or other languages?)


Re: getting cassandra setup on windows 7

2010-04-25 Thread S Ahmed
Great, that worked, thanks!

On Fri, Apr 23, 2010 at 2:28 PM, Mark Greene green...@gmail.com wrote:

 Try the cassandra-with-fixes.bat file
 (https://issues.apache.org/jira/secure/attachment/12442349/cassandra-with-fixes.bat)
 attached to the issue. I had the same issue and that bat file got Cassandra
 to start. It still throws another error complaining about the
 log4j.properties.


 On Fri, Apr 23, 2010 at 1:59 PM, S Ahmed sahmed1...@gmail.com wrote:

 Any insights?

 Much appreciated!


 On Thu, Apr 22, 2010 at 11:13 PM, S Ahmed sahmed1...@gmail.com wrote:

 I was just reading that thanks.

 What does he mean when he says:

 This appears to be related to data storage paths I set, because if I
 switch the paths back to the default UNIX paths. Everything runs fine


 On Thu, Apr 22, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com wrote:

 https://issues.apache.org/jira/browse/CASSANDRA-948

 On Thu, Apr 22, 2010 at 10:03 PM, S Ahmed sahmed1...@gmail.com wrote:
  Ok so I found the config section:
 
  <CommitLogDirectory>E:\java\cassandra\apache-cassandra-0.6.1-bin\apache-cassandra-0.6.1\commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>E:\java\cassandra\apache-cassandra-0.6.1-bin\apache-cassandra-0.6.1\data</DataFileDirectory>
  </DataFileDirectories>
 
  Now when I run:
  bin/cassandra
  I get:
  Starting cassandra server
  listening for transport dt_socket at address:
  Exception in thread "main" java.lang.NoClassDefFoundError:
  org/apache/cassandra/thrift/CassandraDaemon
  Could not find the main class:
  org.apache.cassandra.thrift.CassandraDaemon...
 
 
 
 
 
  On Thu, Apr 22, 2010 at 10:53 PM, S Ahmed sahmed1...@gmail.com
 wrote:
 
  So I uncompressed the .tar, in the readme it says:
  * tar -zxvf cassandra-$VERSION.tgz
* cd cassandra-$VERSION
* sudo mkdir -p /var/log/cassandra
* sudo chown -R `whoami` /var/log/cassandra
* sudo mkdir -p /var/lib/cassandra
* sudo chown -R `whoami` /var/lib/cassandra
 
  My cassandra is at:
  c:\java\cassandra\apache-cassandra-0.6.1/
  So I have to create 2 folders log and lib?
  Is there a setting in a config file that I edit?
 







value size, is there a suggested limit?

2010-04-25 Thread S Ahmed
Is there a suggested maximum size for the value stored under a given key?

e.g. could I convert a document to bytes and store it as the value for a key?
If yes, which I presume is the case, what if the file is 10 MB? Or 100 MB?
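
One way to picture the "document as bytes" idea is the sketch below. It assumes
large documents would be split into fixed-size pieces before being stored; the
function names, the chunk-name format, and the chunk size are illustrative
assumptions, not Cassandra limits or Cassandra API.

```python
from typing import Dict


def split_into_chunks(data: bytes, chunk_size: int) -> Dict[str, bytes]:
    """Split a document's bytes into named, fixed-size chunks.

    The zero-padded names sort lexically in the original order. The naming
    scheme is hypothetical, chosen only for this illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        "chunk-%08d" % (offset // chunk_size): data[offset:offset + chunk_size]
        for offset in range(0, len(data), chunk_size)
    }


def reassemble(chunks: Dict[str, bytes]) -> bytes:
    """Join chunks back into the original bytes, ordered by chunk name."""
    return b"".join(chunks[name] for name in sorted(chunks))


if __name__ == "__main__":
    document = bytes(range(256)) * 10           # stand-in for a file's contents
    pieces = split_into_chunks(document, 100)   # 100 bytes is an arbitrary size
    print(len(document), len(pieces), reassemble(pieces) == document)
```

Each chunk could then be written as its own value, so that no single value
grows with the size of the file.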


Re: getting cassandra setup on windows 7

2010-04-23 Thread S Ahmed
Any insights?

Much appreciated!

On Thu, Apr 22, 2010 at 11:13 PM, S Ahmed sahmed1...@gmail.com wrote:

 I was just reading that thanks.

 What does he mean when he says:

 This appears to be related to data storage paths I set, because if I
 switch the paths back to the default UNIX paths. Everything runs fine


 On Thu, Apr 22, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com wrote:

 https://issues.apache.org/jira/browse/CASSANDRA-948

 On Thu, Apr 22, 2010 at 10:03 PM, S Ahmed sahmed1...@gmail.com wrote:
  Ok so I found the config section:
 
  <CommitLogDirectory>E:\java\cassandra\apache-cassandra-0.6.1-bin\apache-cassandra-0.6.1\commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>E:\java\cassandra\apache-cassandra-0.6.1-bin\apache-cassandra-0.6.1\data</DataFileDirectory>
  </DataFileDirectories>
 
  Now when I run:
  bin/cassandra
  I get:
  Starting cassandra server
  listening for transport dt_socket at address:
  Exception in thread "main" java.lang.NoClassDefFoundError:
  org/apache/cassandra/thrift/CassandraDaemon
  Could not find the main class:
  org.apache.cassandra.thrift.CassandraDaemon...
 
 
 
 
 
  On Thu, Apr 22, 2010 at 10:53 PM, S Ahmed sahmed1...@gmail.com wrote:
 
  So I uncompressed the .tar, in the readme it says:
  * tar -zxvf cassandra-$VERSION.tgz
* cd cassandra-$VERSION
* sudo mkdir -p /var/log/cassandra
* sudo chown -R `whoami` /var/log/cassandra
* sudo mkdir -p /var/lib/cassandra
* sudo chown -R `whoami` /var/lib/cassandra
 
  My cassandra is at:
  c:\java\cassandra\apache-cassandra-0.6.1/
  So I have to create 2 folders log and lib?
  Is there a setting in a config file that I edit?
 





Re: cassandra instability

2010-04-22 Thread S Ahmed
If Digg uses PHP with Cassandra, can the library really be that old?

Or are they using their own custom PHP Cassandra client? (Probably, but just
making sure.)

On Fri, Apr 16, 2010 at 2:13 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Fri, Apr 16, 2010 at 12:50 PM, Lee Parker l...@socialagency.com wrote:
  Each time I start it up, it will
  work fine for about 1 hour and then it will crash the servers.  The error
  message on the servers is usually an out of memory error.

 Sounds like
 http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
 to me.

  I will get
  several time out errors on the clients

 Symptomatic of running out of memory.

  and occasionally get an error telling
  me that i was missing the timestamp.

 This is an entirely different problem.  Your client is sending
 garbage, plain and simple.  Why that is, I don't know.  The PHP Thrift
 binding is virtually unmaintained, so it could be a bug there, but
 Digg uses PHP against Cassandra extensively and hasn't hit this to my
 knowledge.  As I said in another thread, I wouldn't rule out bad
 hardware.

  The timestamp error is accompanied by
  a server crashing if I use framed transport instead of buffered.

 Thrift is fragile when the client sends it garbage.
 (https://issues.apache.org/jira/browse/THRIFT-601)

  One of the reasons we
  were trying cassandra was to scale out with smaller nodes rather than having
  to run larger instances for mysql.

 2 x 1GB isn't a whole lot to do a bulk load with.  You may have to
 throttle your clients to fix the OOM completely.

 -Jonathan
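
The suggestion above to throttle clients during a bulk load can be sketched as
follows. This is a minimal illustration, assuming a simple fixed-rate limit on
the client side; the Throttle class, the bulk_load helper, and the insert
callable are hypothetical names, not part of Cassandra or any client library.

```python
import time


class Throttle:
    """Allow at most `rate` operations per second by sleeping between calls."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self):
        """Block until the next operation is allowed; return seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following operation one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return delay


def bulk_load(rows, insert, throttle):
    """Call insert(row) for every row, pacing the calls through the throttle."""
    count = 0
    for row in rows:
        throttle.wait()
        insert(row)
        count += 1
    return count


if __name__ == "__main__":
    loaded = []
    # 50 inserts per second is an arbitrary example rate.
    print(bulk_load(range(10), loaded.append, Throttle(rate=50)))
```

In a real load the insert callable would wrap whatever client call performs the
write, and the rate would be tuned until the servers stop running out of memory.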



security, firewall level only?

2010-04-21 Thread S Ahmed
Is security, in terms of remote clients connecting to a Cassandra node, handled
purely at the hardware/firewall level?

i.e. there is no username/password as in MySQL or SQL Server, correct?

Or permissions at the column family level per user?


Just to be clear, cassandra is web framework agnostic b/c of Thrift?

2010-04-18 Thread S Ahmed
Just want to be clear: is it true that it really makes no difference whether my
web application is ASP.NET, Java, or Python, since we communicate with
Cassandra via the Thrift-generated interface?

Obviously if you run ASP.NET on Windows, it is probably a VERY good idea to
be running Cassandra on a Linux box.


Re: Just to be clear, cassandra is web framework agnostic b/c of Thrift?

2010-04-18 Thread S Ahmed
Interesting. I'm just finding Windows to be a pain, particularly starting up
Java apps. (I guess I just need to learn!)

How exactly would you start up Cassandra on a Windows machine? i.e. when the
server reboots, how will it run the java -jar cassandra ?



On Sun, Apr 18, 2010 at 7:35 PM, Joe Stump j...@joestump.net wrote:


 On Apr 18, 2010, at 5:33 PM, S Ahmed wrote:

 Obviously if you run asp.net on windows, it is probably a VERY good idea
 to be running cassandra on a linux box.


 Actually, I'm not sure this is true. A few people have found Windows
 performs fairly well with Cassandra, if I recall correctly. Obviously, all
 of the testing and most of the bigger users are running on Linux though.

 --Joe



if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-06 Thread S Ahmed
From what I read in another thread, Cassandra isn't 'ideal' for keeping track
of counts.

For example, I would understand this to mean keeping track of which stories
were dugg.

If this is true, how would a site like Digg keep track of the 'dugg'
counter?

Also, I am assuming with eventual consistency the number *may* not be 100%
accurate.  If you wanted it to be accurate, would you just use the Quorum
flag? (I believe quorum is to ensure all writes are written to disk)
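
For reference, the arithmetic behind quorum can be sketched as below. This
assumes the usual definition, where a quorum is a majority of the replicas for
a key (floor(RF / 2) + 1), so it concerns how many replicas must acknowledge an
operation rather than whether a write has reached disk. The function names are
illustrative, not Cassandra API.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor, write_replicas, read_replicas):
    """True when any read set must share a replica with any write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    for rf in (1, 3, 5):
        q = quorum(rf)
        print(rf, q, reads_overlap_writes(rf, q, q))
```

Even when reads and writes overlap this way, a counter kept by reading a value
and writing back value + 1 can still lose an update if two clients do so at the
same time, which is a separate problem from replica agreement.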


Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-06 Thread S Ahmed
Chris,

When you say patch, does that mean for Cassandra or your own internal
codebase?

Sounds interesting thanks!

On Tue, Apr 6, 2010 at 12:54 PM, Chris Goffinet goffi...@digg.com wrote:

 That's not true. We have been using the ZooKeeper work we posted on jira.
 That's what we are using internally and have been for months. We are now
 just wrapping up our vector clocks + distributed counter patch so we can
 begin transitioning away from the Zookeeper approach because there are
 problems with it long-term.

 -Chris

 On Apr 6, 2010, at 9:50 AM, Ryan King wrote:

  They don't use cassandra for it yet.
 
  -ryan
 
  On Tue, Apr 6, 2010 at 9:00 AM, S Ahmed sahmed1...@gmail.com wrote:
  From what I read in another thread, Cassandra isn't 'ideal'
  for keeping track of counts.
  For example, I would understand this to mean keeping track of which stories
  were dugg.
  If this is true, how would a site like Digg keep track of the 'dugg'
  counter?
  Also, I am assuming with eventual consistency the number *may* not be 100%
  accurate.  If you wanted it to be accurate, would you just use the Quorum
  flag? (I believe quorum is to ensure all writes are written to disk)