Re: memory estimate for each key in the key cache

2011-12-16 Thread Brandon Williams
On Fri, Dec 16, 2011 at 9:31 PM, Dave Brosius  wrote:
> Wow, Java is a lot better than I thought if it can perform that kind of
> magic.  I'm guessing the wiki information is just old and out of date. It's
> probably more like 60 + sizeof(key)

With jamm and MAT it's fairly easy to test.  The number is accurate
last I checked.
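For example, with jamm on the classpath it's only a couple of lines (a minimal
sketch -- run the JVM with -javaagent:jamm.jar or MemoryMeter will refuse to
instantiate):

import java.nio.ByteBuffer;
import org.github.jamm.MemoryMeter;

public class KeyCacheEntrySize {
    public static void main(String[] args) {
        MemoryMeter meter = new MemoryMeter();
        ByteBuffer key = ByteBuffer.wrap(new byte[16]); // a pretend 16-byte row key
        // shallow = the ByteBuffer object itself; deep = buffer plus backing byte[]
        System.out.println("shallow: " + meter.measure(key) + " bytes");
        System.out.println("deep:    " + meter.measureDeep(key) + " bytes");
    }
}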

-Brandon


Re: memory estimate for each key in the key cache

2011-12-16 Thread Dave Brosius

On 12/16/2011 10:13 PM, Brandon Williams wrote:

On Fri, Dec 16, 2011 at 8:52 PM, Kent Tong  wrote:

Hi,

From the source code I can see that for each key, the hash (token), the key
itself (ByteBuffer) and the position (a long: the offset in the sstable) are stored
in the key cache. The hash is an MD5 hash, so it is 16 bytes. So, the total size
required is at least 16 + sizeof(key) + 4, which is > 20 bytes. If we consider the
overhead of the object references, then it will be even larger. Then why does the
wiki recommend multiplying the number of keys cached by 10-12 to get the memory
requirement?

In a word: java.

-Brandon



Wow, Java is a lot better than I thought if it can perform that kind of 
magic.  I'm guessing the wiki information is just old and out of date. 
It's probably more like 60 + sizeof(key)
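To see why, here's a rough 64-bit JVM tally (assumptions, not measured numbers)
for a single cached key:

  ~16 bytes  object header of the HeapByteBuffer wrapping the key
  ~40 bytes  Buffer/ByteBuffer fields (mark/position/limit/capacity ints,
             address long, array reference, offset) plus alignment padding
  ~16 bytes  header and length field of the backing byte[]
   ~8 bytes  the long sstable position
  plus the map entry and object references the cache structure itself needs

That lands right around 60 + sizeof(key) before even counting the token.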


Re: memory estimate for each key in the key cache

2011-12-16 Thread Brandon Williams
On Fri, Dec 16, 2011 at 8:52 PM, Kent Tong  wrote:
> Hi,
>
> From the source code I can see that for each key, the hash (token), the key
> itself (ByteBuffer) and the position (a long: the offset in the sstable) are stored
> into the key cache. The hash is an MD5 hash, so it is 16 bytes. So, the total
> size required is at least 16 + sizeof(key) + 4, which is > 20 bytes. If we
> consider the overhead of the object references, then it will be even larger.
> Then why does the wiki recommend multiplying the number of keys cached by
> 10-12 to get the memory requirement?

In a word: java.

-Brandon


memory estimate for each key in the key cache

2011-12-16 Thread Kent Tong
Hi,

From the source code I can see that for each key, the hash (token), the key
itself (ByteBuffer) and the position (a long: the offset in the sstable) are stored
into the key cache. The hash is an MD5 hash, so it is 16 bytes. So, the total
size required is at least 16 + sizeof(key) + 4, which is > 20 bytes. If we consider
the overhead of the object references, then it will be even larger. Then why does
the wiki recommend multiplying the number of keys cached by 10-12 to get
the memory requirement?

Thanks for any idea!


Re: gracefully recover from data file corruptions

2011-12-16 Thread Ramesh Natarajan
Thanks Ben and Jeremiah. We are actively working with our 3rd party
vendors to determine the root cause for this issue. Hopefully we will
figure something out. This repair procedure is more of a last resort,
which I really don't want to use, but it's something to keep in mind if the
necessity arises.

thanks
Ramesh

On Fri, Dec 16, 2011 at 12:48 PM, Ben Coverston
 wrote:
> Hi Ramesh,
>
> Every time I have seen this in the last year it has been caused by bad
> hardware or bad memory. Usually we find errors in the syslog.
>
> Jeremiah is right about running repair when you get your nodes back up.
>
> Fortunately with the addition of checksums in 1.0 I don't think that the
> corrupt data can get propagated across nodes.
>
> Your recovery steps do seem solid, if a bit verbose. I usually tell
> people to shut down the node, remove the offending SSTables, bring the node
> back up then run repair.
>
> I can't stress enough, however, that if you're going to bring it back up on
> the same hardware, you probably want to find the root cause; otherwise you're
> going to find yourself in the same situation days/weeks/months in the
> future.
>
> Ben
>
> On Fri, Dec 16, 2011 at 5:16 PM, Jeremiah Jordan
>  wrote:
>>
>> You need to run repair on the node once it is back up (to get back the
>> data you just deleted).  If this is happening on more than one node you
>> could have data loss...
>>
>> -Jeremiah
>>
>>
>> On 12/16/2011 07:46 AM, Ramesh Natarajan wrote:
>>>
>>> We are running a 30-node 1.0.5 cassandra cluster running RHEL 5.6
>>> x86_64 virtualized on ESXi 5.0. We are seeing DecoratedKey assertion
>>> errors during compactions, and at this point we suspect anything
>>> from OS/ESXi/HBA/iSCSI RAID.  Please correct me if I am wrong: once a
>>> node gets into this state I don't see any way to recover unless I
>>> remove the corrupted data file and restart cassandra. I am running
>>> tests with replication factor 3 and all reads and writes are done with
>>> QUORUM. So I believe there will not be data loss if I do this.
>>>
>>> If this is a correct way to recover, I would like to know how to
>>> do this gracefully in a production environment:
>>>
>>> - Disable thrift
>>> - Disable gossip
>>> - Drain the node
>>> - Kill the cassandra java process (send a SIGTERM and/or SIGKILL)
>>> - Do a filesystem sync
>>> - Remove the corrupted file from the /var/lib/cassandra/data directory
>>> - Start cassandra
>>> - Enable gossip so all pending hinted handoff occurs
>>> - Enable thrift
>>>
>>> Thanks
>>> Ramesh
>
>
>
>
> --
> Ben Coverston
> DataStax -- The Apache Cassandra Company
>


Re: how to debug/trace

2011-12-16 Thread Yang
Normally I'd just fire up the debugger in Eclipse and set a breakpoint on the
Cassandra.server methods.
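If you'd rather attach to a running node than launch one from the IDE,
cassandra-env.sh ships with a commented-out remote-debug line you can
uncomment and then attach Eclipse/IntelliJ to (1414 is the stock port):

JVM_OPTS="$JVM_OPTS -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1414"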

On Fri, Dec 16, 2011 at 2:19 PM, S Ahmed  wrote:
> How can you possibly trace a read/write in cassandra's codebase when it uses
> so many thread pools/executors?
>
> I'm just getting into threads, so I'm not too familiar with how one can trace
> things while in debug mode in IntelliJ when various thread pools are
> processing things etc.


how to debug/trace

2011-12-16 Thread S Ahmed
How can you possibly trace a read/write in cassandra's codebase when it
uses so many thread pools/executors?

I'm just getting into threads, so I'm not too familiar with how one can trace
things while in debug mode in IntelliJ when various thread pools are
processing things etc.


Re: gracefully recover from data file corruptions

2011-12-16 Thread Ben Coverston
Hi Ramesh,

Every time I have seen this in the last year it has been caused by bad
hardware or bad memory. Usually we find errors in the syslog.

Jeremiah is right about running repair when you get your nodes back up.

Fortunately with the addition of checksums in 1.0 I don't think that the
corrupt data can get propagated across nodes.

Your recovery steps do seem solid, if a bit verbose. I usually tell
people to shut down the node, remove the offending SSTables, bring the node
back up then run repair.

I can't stress enough, however, that if you're going to bring it back up on
the same hardware, you probably want to find the root cause; otherwise
you're going to find yourself in the same situation days/weeks/months in
the future.

Ben

On Fri, Dec 16, 2011 at 5:16 PM, Jeremiah Jordan <
jeremiah.jor...@morningstar.com> wrote:

> You need to run repair on the node once it is back up (to get back the
> data you just deleted).  If this is happening on more than one node you
> could have data loss...
>
> -Jeremiah
>
>
> On 12/16/2011 07:46 AM, Ramesh Natarajan wrote:
>
>> We are running a 30-node 1.0.5 cassandra cluster running RHEL 5.6
>> x86_64 virtualized on ESXi 5.0. We are seeing DecoratedKey assertion
>> errors during compactions, and at this point we suspect anything
>> from OS/ESXi/HBA/iSCSI RAID.  Please correct me if I am wrong: once a
>> node gets into this state I don't see any way to recover unless I
>> remove the corrupted data file and restart cassandra. I am running
>> tests with replication factor 3 and all reads and writes are done with
>> QUORUM. So I believe there will not be data loss if I do this.
>>
>> If this is a correct way to recover, I would like to know how to
>> do this gracefully in a production environment:
>>
>> - Disable thrift
>> - Disable gossip
>> - Drain the node
>> - Kill the cassandra java process (send a SIGTERM and/or SIGKILL)
>> - Do a filesystem sync
>> - Remove the corrupted file from the /var/lib/cassandra/data directory
>> - Start cassandra
>> - Enable gossip so all pending hinted handoff occurs
>> - Enable thrift
>>
>> Thanks
>> Ramesh
>>
>


-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


Re: Using Cassandra in Rails App

2011-12-16 Thread Aaron Turner
On Thu, Dec 15, 2011 at 3:13 AM, Wolfgang Vogl  wrote:
> Hi,
>
> I have a couple of questions about working with Ruby on Rails and Cassandra.
>
>
>
> What is the recommended way of Cassandra integration into a Rails app ?
>
> active_column
>
> cassandra-cql
>
> some other gems?
>
>
>
> Is there any reference implementation?
>
> some projects on github that are using the gems?

Depends on what you're trying to use Cassandra for with Rails.  In my
project we're using a SQL DB for metadata and Cassandra as our heavy-
lifting datastore for time series data.  After looking at the
available Ruby drivers a few months ago (things have changed since
then, btw), I decided to go with JRuby + Hector. This fit in well with
some other requirements I had for which JRuby was a great fit (like
real threads), along with TorqueBox as my application server.

Due to how my data is stored, it didn't really matter whether I used
Hector, CQL or ActiveRecord; I was going to need to write an
abstraction layer on top to make it easier to store & retrieve data.
And using Hector meant I was using one of the most mature, stable and
tested libraries for accessing Cassandra.
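For anyone curious what the Java side looks like, here's a minimal Hector
sketch (Hector 0.8.x-era API from memory, so treat class and method names as
approximate) for one write and one read; from JRuby this maps almost 1:1:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;

public class HectorExample {
    public static void main(String[] args) {
        // connect to the cluster and pick a keyspace
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace ksp = HFactory.createKeyspace("MyKeyspace", cluster);
        StringSerializer ss = StringSerializer.get();

        // write one column
        Mutator<String> mutator = HFactory.createMutator(ksp, ss);
        mutator.insert("rowkey", "MyCF", HFactory.createStringColumn("name", "value"));

        // read it back
        ColumnQuery<String, String, String> q = HFactory.createStringColumnQuery(ksp);
        q.setColumnFamily("MyCF").setKey("rowkey").setName("name");
        HColumn<String, String> col = q.execute().get();
        System.out.println(col == null ? "not found" : col.getValue());
    }
}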

My project is big enough that not everything is done in RoR... there
are also some cron jobs doing Map/Reduce-like work. This is where
JRuby+Hector really shines, since for high performance you really need
your client to be multi-threaded; single-threaded performance against
Cassandra isn't anything to write home about.

Anyways, I'm not sure I would recommend JRuby+Hector if this is the
only reason you'd use JRuby over MRI, but if you might find the
plethora of Java libraries useful it's definitely worth looking into.

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: Using Cassandra in Rails App

2011-12-16 Thread Jeremy Hanna
Traditionally there are two places to go.  Twitter's ruby client at
https://github.com/twitter/cassandra or the newer cql driver at
http://code.google.com/a/apache-extras.org/p/cassandra-ruby/.  The latter might
be nice for greenfield applications, but CQL is still gaining features.  Some
people also use Hector via JRuby, afaik.  Brian O'Neill also has the REST-based
server that he mentioned in this thread.

A recent presentation that talks about ruby with cassandra is here: 
http://www.slideshare.net/tylerhobbs/cassandra-for-rubyrails-devs

On Dec 15, 2011, at 5:13 AM, Wolfgang Vogl wrote:

> Hi,
> I have a couple of questions about working with Ruby on Rails and Cassandra.
>  
> What is the recommended way of Cassandra integration into a Rails app ?
> active_column
> cassandra-cql
> some other gems?
>  
> Is there any reference implementation?
> some projects on github that are using the gems?
>  
> Regards,
> Wolfgang



Re: gracefully recover from data file corruptions

2011-12-16 Thread Jeremiah Jordan
You need to run repair on the node once it is back up (to get back the 
data you just deleted).  If this is happening on more than one node you 
could have data loss...


-Jeremiah

On 12/16/2011 07:46 AM, Ramesh Natarajan wrote:

We are running a 30-node 1.0.5 cassandra cluster running RHEL 5.6
x86_64 virtualized on ESXi 5.0. We are seeing DecoratedKey assertion
errors during compactions, and at this point we suspect anything
from OS/ESXi/HBA/iSCSI RAID.  Please correct me if I am wrong: once a
node gets into this state I don't see any way to recover unless I
remove the corrupted data file and restart cassandra. I am running
tests with replication factor 3 and all reads and writes are done with
QUORUM. So I believe there will not be data loss if I do this.

If this is a correct way to recover, I would like to know how to
do this gracefully in a production environment:

- Disable thrift
- Disable gossip
- Drain the node
- Kill the cassandra java process (send a SIGTERM and/or SIGKILL)
- Do a filesystem sync
- Remove the corrupted file from the /var/lib/cassandra/data directory
- Start cassandra
- Enable gossip so all pending hinted handoff occurs
- Enable thrift

Thanks
Ramesh


gracefully recover from data file corruptions

2011-12-16 Thread Ramesh Natarajan
We are running a 30-node 1.0.5 cassandra cluster running RHEL 5.6
x86_64 virtualized on ESXi 5.0. We are seeing DecoratedKey assertion
errors during compactions, and at this point we suspect anything
from OS/ESXi/HBA/iSCSI RAID.  Please correct me if I am wrong: once a
node gets into this state I don't see any way to recover unless I
remove the corrupted data file and restart cassandra. I am running
tests with replication factor 3 and all reads and writes are done with
QUORUM. So I believe there will not be data loss if I do this.

If this is a correct way to recover, I would like to know how to
do this gracefully in a production environment:

- Disable thrift
- Disable gossip
- Drain the node
- Kill the cassandra java process (send a SIGTERM and/or SIGKILL)
- Do a filesystem sync
- Remove the corrupted file from the /var/lib/cassandra/data directory
- Start cassandra
- Enable gossip so all pending hinted handoff occurs
- Enable thrift
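For reference, the enable/disable/drain steps above map onto nodetool
one-liners; a sketch (command names as I remember them in 1.0 -- verify
against your nodetool), with a repair at the end to re-replicate the
deleted data:

  nodetool -h <host> disablethrift
  nodetool -h <host> disablegossip
  nodetool -h <host> drain
  # stop the java process, sync, remove every component of the corrupt
  # sstable (-Data.db, -Index.db, -Filter.db, ...), then start cassandra
  nodetool -h <host> enablegossip
  nodetool -h <host> enablethrift
  nodetool -h <host> repair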

Thanks
Ramesh


Re: Cassandra C client implementation

2011-12-16 Thread Vlad Paiu

Hi,

I've also decided to give the C++ Thrift a try, but I can't seem to
compile the simple examples from

http://wiki.apache.org/cassandra/ThriftExamples .

I get lots of errors like:

/usr/local/include/thrift/transport/TTransport.h:34:1: error: ‘uint32_t’ does not name a type
/usr/local/include/thrift/transport/TTransport.h:56:1: error: expected unqualified-id before ‘class’
/usr/local/include/thrift/transport/TTransport.h:262:29: error: ‘TTransport’ was not declared in this scope
/usr/local/include/thrift/transport/TTransport.h:262:39: error: template argument 1 is invalid
/usr/local/include/thrift/transport/TTransport.h:262:72: error: ‘TTransport’ was not declared in this scope
/usr/local/include/thrift/transport/TTransport.h:262:82: error: template argument 1 is invalid


The Thrift version is 0.8, installed from source; the Cassandra version is 1.0.6.

Any ideas?

Regards,

Vlad Paiu
OpenSIPS Developer


On 12/16/2011 11:02 AM, Vlad Paiu wrote:

Hello,

Sorry, wrong link in the previous email. Proper link is
http://svn.apache.org/viewvc/thrift/trunk/lib/c_glib/test/

Regards,

Vlad Paiu
OpenSIPS Developer


On 12/15/2011 08:35 PM, Vlad Paiu wrote:

Hello,

While digging more on this, I've found these:

http://svn.apache.org/viewvc/thrift/lib/c_glib/test/

These show how to create the TSocket and TTransport structures, very
similar to the way it's done in C++.


Now I'm stuck on how to create the actual connection to the Cassandra 
server. It should be a function generated by the Cassandra thrift 
interface, but I can't seem to find the proper one.

Any help would be very much appreciated.

Regards,
Vlad

Mina Naguib  wrote:


Hi Vlad

I'm the author of libcassie.

For what it's worth, it's in production where I work, consuming a 
heavily-used cassandra 0.7.9 cluster.


We do have plans to upgrade the cluster to 1.x, to benefit from all 
the improvements, CQL, etc... but that includes revising all our 
clients (across several programming languages).


So, it's definitely on my todo list to address our C clients by 
either upgrading libcassie, or possibly completely rewriting it.


Currently it's a wrapper around the C++ parent project 
libcassandra.  I haven't been fond of having that many layered 
abstractions, and the thrift Glib2 interface has definitely piqued 
my interest, so I'm leaning towards a complete rewrite.


While we're at it, it would also be nice to have features like 
asynchronous modes for popular event loops, connection pooling, etc.


Unfortunately, I have no milestones set for any of this, nor the 
time (currently) to experiment and proof-of-concept it.


I'd be curious to hear from other C hackers whether they've 
experimented with the thrift Glib2 interface and gotten a "hello 
world" to work against cassandra 1.x.  Perhaps there's room for some 
code sharing/collaboration on a new library to supersede the 
existing libcassie+libcassandra.



On 2011-12-14, at 5:16 PM, Vlad Paiu wrote:


Hello Eric,

We have that, thanks a lot for the contribution.
The idea is to not play around with including C++ code in a C app,
if there's an alternative (the thrift c_glib).


Unfortunately, since thrift does not generate a skeleton for the
c_glib code, I don't know how to find out what the API functions are
called, and guessing them is not going that well :)


I'll wait a little longer & see if anybody can help with the C
thrift, or at least tell me it's not working. :)


Regards,
Vlad

Eric Tamme  wrote:


On 12/14/2011 04:18 PM, Vlad Paiu wrote:

Hi,

Just tried libcassie and it seems it's not compatible with the latest
cassandra, as even simple inserts and fetches fail with
InvalidRequestException...


So can anybody please provide a very simple example in C for
connecting & fetching columns with thrift?


Regards,
Vlad

Vlad Paiu   wrote:


Vlad,

We have written a specific cassandra db module for usrloc with 
opensips

and have open-sourced it on github.  We use the Thrift-generated C++
bindings and extern the stuff to C.  I spoke to bogdan about this a while
ago, and gave him the github link, but here it is for your reference
https://github.com/junction/db_jnctn_usrloc

Hopefully that helps.  I idle in #opensips too,  just ask about
cassandra in there and I'll probably see it.

- Eric Tamme



Re: [RELEASE] Apache Cassandra 1.0.6 released

2011-12-16 Thread Terje Marthinussen
Does it work if you turn off mmap?

We run without mmap and see hardly any difference in performance, but with huge
benefits in the form of memory consumption that can actually be monitored
easily, and in general things just seem more stable this way.

Just turn it off and see how that works!
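For reference, that's one line in cassandra.yaml (the option may not appear
in the 1.0 default file but, as far as I know, is still honored):

disk_access_mode: standard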

Regards,
Terje

On 16 Dec 2011, at 18:39, Viktor Jevdokimov  
wrote:

> Created https://issues.apache.org/jira/browse/CASSANDRA-3642
> 
> -Original Message-
> From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com] 
> Sent: Thursday, December 15, 2011 18:26
> To: user@cassandra.apache.org
> Subject: RE: [RELEASE] Apache Cassandra 1.0.6 released
> 
> Cassandra 1.0.6 under Windows Server 2008 R2 64-bit with disk access mode
> mmap_index_only fails to delete any *-Index.db files after compaction or
> scrub:
> 
> ERROR 13:43:17,490 Fatal exception in thread Thread[NonPeriodicTasks:1,5,main]
> java.lang.RuntimeException: java.io.IOException: Failed to delete D:\cassandra\data\data\system\LocationInfo-g-29-Index.db
>     at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.IOException: Failed to delete D:\cassandra\data\data\system\LocationInfo-g-29-Index.db
>     at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
>     at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
>     at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141)
>     at org.apache.cassandra.io.sstable.SSTableDeletingTask.runMayThrow(SSTableDeletingTask.java:81)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     ... 8 more
> 
> ERROR 17:20:09,701 Fatal exception in thread Thread[NonPeriodicTasks:1,5,main]
> java.lang.RuntimeException: java.io.IOException: Failed to delete D:\cassandra\data\data\Keyspace1\ColumnFamily1-hc-840-Index.db
>     at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.IOException: Failed to delete D:\cassandra\data\data\Keyspace1\ColumnFamily1-hc-840-Index.db
>     at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
>     at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
>     at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141)
>     at org.apache.cassandra.io.sstable.SSTableDeletingTask.runMayThrow(SSTableDeletingTask.java:81)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     ... 8 more
> 
> 
> 
> 
> Best regards/ Pagarbiai
> 
> Viktor Jevdokimov
> Senior Developer
> 
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063
> Fax: +370 5 261 0453
> 
> J. Jasinskio 16C,
> LT-01112 Vilnius,
> Lithuania
> 
> 
> 
> -Original Message-
> 
> From: Sylvain Lebresne [mailto:sylv...@datastax.com]
> Sent: Wednesday, December 14, 2011 20:23
> To: user@cassandra.apache.org
> Subject: [RELEASE] Apache

Re: cassandra as an email store ...

2011-12-16 Thread Rustam Aliyev

Hi Sasha,

There's been a lot of FUD in regards to SuperColumns. But actually in
our case we found them quite useful.


The main argument for using SCs in this case is that message metadata is
immutable and in most cases read and written all together (i.e. you fetch
all message headers together). There are a few exceptions where we need to
update only one column, e.g. when updating labels/markers. But the number
of such requests shouldn't affect performance dramatically.
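To make that concrete, the kind of layout this implies looks roughly like the
sketch below (illustrative only -- see the ElasticInbox wiki for the actual
schema):

  MessageMetadata (Super Column Family)
    row key: mailbox id
      super column: message id (e.g. a TimeUUID)
        columns: from, subject, date, size, labels, markers, ...

A whole message's headers live under one super column, so fetching all headers
is a single slice, while labels/markers stay individually updatable columns.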


Currently we have this model running in prod with 150K subs and 20M
messages, and we're quite happy with the performance.


Would be interesting to see other alternatives though.

Regards,
Rustam.

On Fri Dec 16 11:51:24 2011, Sasha Dolgy wrote:

Hi Rustam,

Thanks for posting that.

Interesting to see that you opted to use Super Columns:
https://github.com/elasticinbox/elasticinbox/wiki/Data-Model ..
wondering, for the sake of argument/discussion .. if anyone can come
up with an alternative data model that doesn't use SC's.

-sd

On Fri, Dec 16, 2011 at 11:10 AM, Rustam Aliyev  wrote:

Hi Sasha,

Replying to the old thread just for reference. We've released the code we
use to store emails in Cassandra as an open-source project:
http://elasticinbox.com/

Hope you find it helpful.

Regards,
Rustam.


On Fri Apr 29 15:20:07 2011, Sasha Dolgy wrote:


Great read.  thanks.


On Apr 29, 2011 4:07 PM, "sridhar basam" <s...@basam.org> wrote:

Have you already looked at some research out of IBM about this use case?
Paper is at

http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf

Sridhar






Re: cassandra as an email store ...

2011-12-16 Thread Sasha Dolgy
Hi Rustam,

Thanks for posting that.

Interesting to see that you opted to use Super Columns:
https://github.com/elasticinbox/elasticinbox/wiki/Data-Model ..
wondering, for the sake of argument/discussion .. if anyone can come
up with an alternative data model that doesn't use SC's.

-sd

On Fri, Dec 16, 2011 at 11:10 AM, Rustam Aliyev  wrote:
> Hi Sasha,
>
> Replying to the old thread just for reference. We've released the code we
> use to store emails in Cassandra as an open-source project:
> http://elasticinbox.com/
>
> Hope you find it helpful.
>
> Regards,
> Rustam.
>
>
> On Fri Apr 29 15:20:07 2011, Sasha Dolgy wrote:
>>
>> Great read.  thanks.
>>
>>
>> On Apr 29, 2011 4:07 PM, "sridhar basam" <s...@basam.org> wrote:
>> > Have you already looked at some research out of IBM about this use case?
>> > Paper is at
>> >
>> > http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf
>> >
>> > Sridhar



-- 
Sasha Dolgy
sasha.do...@gmail.com


Some problems with stress testing

2011-12-16 Thread Chi Shin Hsu
Hi, all

I am confused by my stress-testing results.

The test environment:
One Cassandra node, one client
The size of each row is 1MB, and the client writes 10 rows continually.
The total data size is 100GB.

First, my client connected to the server over 100Mbps Ethernet.
The result was 7.3MB/s.
I think this must be limited by the Ethernet speed, so I put the client and
server on the same node.

However, the speed was, if anything, slower than over Ethernet!

My computer:
CPU: i7 920
RAM: 3G

I noticed a couple of things about Cassandra:

1. Cassandra keeps flushing the CF, and the client gets timeout
exceptions very frequently.
2. The heap is always full, so I increased it to 2G, and the memtable size
is 768MB.

Is this caused by my hard disk's speed? Memory size? Some setting?

Any idea?
Thanks!


Re: cassandra as an email store ...

2011-12-16 Thread Rustam Aliyev

Hi Sasha,

Replying to the old thread just for reference. We've released the code we
use to store emails in Cassandra as an open-source project:
http://elasticinbox.com/


Hope you find it helpful.

Regards,
Rustam.

On Fri Apr 29 15:20:07 2011, Sasha Dolgy wrote:

Great read.  thanks.

On Apr 29, 2011 4:07 PM, "sridhar basam" <s...@basam.org> wrote:

> Have you already looked at some research out of IBM about this use case?
> Paper is at
>
> http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf
>
> Sridhar


RE: [RELEASE] Apache Cassandra 1.0.6 released

2011-12-16 Thread Viktor Jevdokimov
Created https://issues.apache.org/jira/browse/CASSANDRA-3642

-Original Message-
From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com] 
Sent: Thursday, December 15, 2011 18:26
To: user@cassandra.apache.org
Subject: RE: [RELEASE] Apache Cassandra 1.0.6 released

Cassandra 1.0.6 under Windows Server 2008 R2 64-bit with disk access mode
mmap_index_only fails to delete any *-Index.db files after compaction or
scrub:

ERROR 13:43:17,490 Fatal exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: java.io.IOException: Failed to delete D:\cassandra\data\data\system\LocationInfo-g-29-Index.db
    at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Failed to delete D:\cassandra\data\data\system\LocationInfo-g-29-Index.db
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
    at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141)
    at org.apache.cassandra.io.sstable.SSTableDeletingTask.runMayThrow(SSTableDeletingTask.java:81)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ... 8 more

ERROR 17:20:09,701 Fatal exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: java.io.IOException: Failed to delete D:\cassandra\data\data\Keyspace1\ColumnFamily1-hc-840-Index.db
    at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Failed to delete D:\cassandra\data\data\Keyspace1\ColumnFamily1-hc-840-Index.db
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
    at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141)
    at org.apache.cassandra.io.sstable.SSTableDeletingTask.runMayThrow(SSTableDeletingTask.java:81)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ... 8 more




Best regards/ Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania



-Original Message-

From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Wednesday, December 14, 2011 20:23
To: user@cassandra.apache.org
Subject: [RELEASE] Apache Cassandra 1.0.6 released

The Cassandra team is pleased to announce the release of Apache Cassandra 
version 1.0.6.

Cassandra is a highly scalable second-generation distributed database, bringing 
together Dynamo's fully distributed design and Bigtable's ColumnFamily-based 
data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1]. As always, please pay attention
to the 

Re: commit log size

2011-12-16 Thread Alexandru Dan Sicoe
Hi Maxim,
 Sorry for the late reply, but I was away for a course. Lower
memtable_flush_after_mins for your low-traffic CFs. If in the meantime you
upgraded to 1.0 (which, by the way, ended up not working for me on 1.0.3
after I had converted a lot of data to it), I think there was a discussion
you sent me on the group. I never experimented with the new commitlog
setting in 1.0, but the datastax website
http://www.datastax.com/docs/1.0/configuration/node_configuration#commitlog-total-space-in-mb
says this:
---
commitlog_total_space_in_mb

When the commitlog size on a node exceeds this threshold, Cassandra will
flush memtables to disk for the oldest commitlog segments, thus allowing
those log segments to be removed. This reduces the amount of data to replay
on startup, and prevents infrequently-updated column families from keeping
commit log segments around indefinitely. This replaces the per-column
family storage setting memtable_flush_after_mins.

---
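In cassandra.yaml that's a single knob, e.g. (4096 is, I believe, the 1.0
default for 64-bit JVMs -- check your own config):

commitlog_total_space_in_mb: 4096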
Tell me if it worked,
Alexandru

On Wed, Dec 14, 2011 at 5:33 PM, Maxim Potekhin  wrote:

> Alexandru, Jeremiah --
>
> what setting needs to be tweaked, and what's the recommended value?
>
> I observed similar behavior this morning.
>
> Maxim
>
>
>
> On 11/28/2011 2:53 PM, Jeremiah Jordan wrote:
>
>> Yes, the low volume memtables are causing the problem.  Lower the
>> thresholds for those tables if you don't want the commit logs to go crazy.
>>
>> -Jeremiah
>>
>> On 11/28/2011 11:11 AM, Alexandru Dan Sicoe wrote:
>>
>>> Hello everyone,
>>>
>>> 4 node Cassandra 0.8.5 cluster with RF=2, replica placement strategy =
>>> SimpleStrategy, write consistency level = ANY, memtable_flush_after_mins
>>> =1440; memtable_operations_in_millions=0.1; memtable_throughput_in_mb
>>> = 40; max_compaction_threshold =32; min_compaction_threshold =4;
>>>
>>> I have one keyspace with 1 CF for all the data and 3 other small CFs for
>>> metadata. I am using Datastax OpsCenter to monitor my cluster so there is
>>> another keyspace for monitoring.
>>>
>>> Everything works ok, the only thing I've noticed is this morning the
>>> commitlog of one node was 52GB, one was 25 GB and the others were around 3
>>> GB. I left everything untouched and looked a couple of hours later and the
>>> 52GB one is now about 3GB and the 25 GB one is now 29 GB and the other two
>>> about the same as before.
>>>
>>> Are my commit logs growing because of small memtables which don't get
>>> flushed because they don't reach the operations and throughput limits? Then
>>> why do only some nodes exhibit this behaviour?
>>>
>>> It would be interesting to understand how to control the size of the
>>> commitlog also to know how to size my commitlog disks!
>>>
>>> Thanks,
>>> Alex
>>>
>>
>


-- 
Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow


Re: Counters != Counts

2011-12-16 Thread Alain RODRIGUEZ
Can we hope that counters will someday be replayed as safely as classical
data? Is anyone still working on JIRAs like
issues.apache.org/jira/browse/CASSANDRA-2495? I thought that replaying a
write from the client didn't lead to over-counts, contrary to the internal
cassandra replay from the commitlog.

I just made a new connection pool with retries / 2 and timeouts * 4. I hope
it will improve the accuracy of my counters.

Anyways, thank you for answering that fast.

Alain

2011/12/16 Tyler Hobbs 

> Probably quite a few of them are coming from automatic retries by
> phpcassa.  When working with counters, I recommend minimizing retries
> and/or increasing timeouts.  Usually this means you want to use a separate
> connection pool with different settings just for counters.
>
> By the way, this advice applies to other clients as well.
>
>
> On Wed, Dec 14, 2011 at 10:29 AM, Alain RODRIGUEZ wrote:
>
>> Hi everybody.
>>
>> I'm using a lot of counters to make statistics on a 4-node cluster (ec2
>> m1.small) with phpcassa (cassandra v1.0.2).
>>
>> I store some events and increment counters at the same time.
>>
>> Counters give me over-counts compared with the count of every
>> corresponding events.
>>
>> I'm sure that my non-counter counts are good.
>>
>> I'm not sure why these over-counts happen, but I heard that recovering
>> from commitlogs can produce this.
>> I have some timeouts on phpcassa which are written in my apache logs
>> while a compaction is running. However I am always able to write at Quorum,
>> so I guess I shouldn't have to recover from cassandra commitlogs.
>>
>> Where can these over-counts come from ?
>>
>> Alain
>>
>>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>
>


Re: Cassandra C client implementation

2011-12-16 Thread Vlad Paiu

Hello,

Sorry, wrong link in the previous email. Proper link is
http://svn.apache.org/viewvc/thrift/trunk/lib/c_glib/test/

Regards,

Vlad Paiu
OpenSIPS Developer


On 12/15/2011 08:35 PM, Vlad Paiu wrote:

Hello,

While digging more on this, I've found these:

http://svn.apache.org/viewvc/thrift/lib/c_glib/test/

These show how to create the TSocket and TTransport structures, very similar
to the way it's done in C++.

Now I'm stuck on how to create the actual connection to the Cassandra server. 
It should be a function generated by the Cassandra thrift interface, but I 
can't seem to find the proper one.
Any help would be very much appreciated.

Regards,
Vlad

Mina Naguib  wrote:


Hi Vlad

I'm the author of libcassie.

For what it's worth, it's in production where I work, consuming a heavily-used 
cassandra 0.7.9 cluster.

We do have plans to upgrade the cluster to 1.x, to benefit from all the 
improvements, CQL, etc... but that includes revising all our clients (across 
several programming languages).

So, it's definitely on my todo list to address our C clients by either 
upgrading libcassie, or possibly completely rewriting it.

Currently it's a wrapper around the C++ parent project libcassandra.  I haven't 
been fond of having that many layered abstractions, and the thrift Glib2 
interface has definitely piqued my interest, so I'm leaning towards a complete 
rewrite.

While we're at it, it would also be nice to have features like asynchronous 
modes for popular event loops, connection pooling, etc.

Unfortunately, I have no milestones set for any of this, nor the time 
(currently) to experiment and proof-of-concept it.

I'd be curious to hear from other C hackers whether they've experimented with the thrift 
Glib2 interface and gotten a "hello world" to work against cassandra 1.x.  
Perhaps there's room for some code sharing/collaboration on a new library to supersede 
the existing libcassie+libcassandra.


On 2011-12-14, at 5:16 PM, Vlad Paiu wrote:


Hello Eric,

We have that, thanks a lot for the contribution.
The idea is to not play around with including C++ code in a C app, if there's
an alternative (the thrift c_glib).

Unfortunately, since thrift does not generate a skeleton for the c_glib code, I
don't know how to find out what the API functions are called, and guessing them
is not going that well :)

I'll wait a little longer & see if anybody can help with the C thrift, or at
least tell me it's not working. :)

Regards,
Vlad

Eric Tamme  wrote:


On 12/14/2011 04:18 PM, Vlad Paiu wrote:

Hi,

Just tried libcassie and it seems it's not compatible with the latest cassandra,
as even simple inserts and fetches fail with InvalidRequestException...

So can anybody please provide a very simple example in C for connecting &
fetching columns with thrift?

Regards,
Vlad

Vlad Paiu   wrote:


Vlad,

We have written a specific cassandra db module for usrloc with opensips
and have open-sourced it on github.  We use the Thrift-generated C++
bindings and extern the stuff to C.  I spoke to bogdan about this a while
ago, and gave him the github link, but here it is for your reference
https://github.com/junction/db_jnctn_usrloc

Hopefully that helps.  I idle in #opensips too,  just ask about
cassandra in there and I'll probably see it.

- Eric Tamme