Re: RE: Hector samples -- where?

2010-05-25 Thread Ran Tavory
it's here
http://github.com/rantav/hector/blob/master/src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java

On Wed, May 26, 2010 at 8:18 AM, Nicholas Sun  wrote:

>  Could you please provide some indication as to their location?  Thanks.
>
>
>
> Nick
>
>
>
> *From:* Ran Tavory [mailto:ran...@gmail.com]
> *Sent:* Tuesday, May 25, 2010 9:15 PM
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: RE: Hector samples -- where?
>
>
>
> The best examples are in KeyspaceTest but don't include all scenarios
>
> On May 26, 2010 2:27 AM, "Nicholas Sun"  wrote:
>
> I am also interested in this.  It seems like adding multiple Cols into a CF
> or SuperCols would be very useful.  Like a dataload type capability?
>
> Nick
>
>
> -Original Message-
> From: Bill de hOra [mailto:b...@dehora.net]
> Sent: Tuesday, May 25, 2010...
>
>


RE: RE: Hector samples -- where?

2010-05-25 Thread Nicholas Sun
Thanks All,

 

 I found it here:

 

http://code.google.com/p/cassandra-java-client/source/browse/trunk/src/test/java/org/yosemite/jcsadra/impl/KeySpaceTest.java?r=50

 

 I’m actually fairly new to OSS, but I wanted to really dig into the 
software here.  So far, so good.

 

Nick

 

From: Ran Tavory [mailto:ran...@gmail.com] 
Sent: Tuesday, May 25, 2010 9:15 PM
To: user@cassandra.apache.org
Subject: Re: RE: Hector samples -- where?

 

The best examples are in KeyspaceTest but don't include all scenarios

On May 26, 2010 2:27 AM, "Nicholas Sun"  wrote:

I am also interested in this.  It seems like adding multiple Cols into a CF or 
SuperCols would be very useful.  Like a dataload type capability?

Nick


-Original Message-
From: Bill de hOra [mailto:b...@dehora.net] 
Sent: Tuesday, May 25, 2010...



RE: RE: Hector samples -- where?

2010-05-25 Thread Nicholas Sun
Could you please provide some indication as to their location?  Thanks.

 

Nick

 

From: Ran Tavory [mailto:ran...@gmail.com] 
Sent: Tuesday, May 25, 2010 9:15 PM
To: user@cassandra.apache.org
Subject: Re: RE: Hector samples -- where?

 

The best examples are in KeyspaceTest but don't include all scenarios

On May 26, 2010 2:27 AM, "Nicholas Sun"  wrote:

I am also interested in this.  It seems like adding multiple Cols into a CF or 
SuperCols would be very useful.  Like a dataload type capability?

Nick


-Original Message-
From: Bill de hOra [mailto:b...@dehora.net] 
Sent: Tuesday, May 25, 2010...



Re: RE: Hector samples -- where?

2010-05-25 Thread Ran Tavory
The best examples are in KeyspaceTest but don't include all scenarios

On May 26, 2010 2:27 AM, "Nicholas Sun"  wrote:

I am also interested in this.  It seems like adding multiple Cols into a CF
or SuperCols would be very useful.  Like a dataload type capability?

Nick


-Original Message-
From: Bill de hOra [mailto:b...@dehora.net]
Sent: Tuesday, May 25, 2010...


Cassandra-0.6.1 Crash Error: out of memory

2010-05-25 Thread Peng Guo
Hi

There are 3 Cassandra servers running, and 18 processes inserting lots of data
into the Cassandra servers.
After running for an hour, the Cassandra server crashed.
The error message is below:

INFO [GC inspection] 2010-05-26 00:56:50,153 GCInspector.java (line 110) GC
for ConcurrentMarkSweep: 7764 ms, 120920 reclaimed leaving 2168941512 used;
max is 2284584960
 INFO [GC inspection] 2010-05-26 01:01:51,803 GCInspector.java (line 110) GC
for ConcurrentMarkSweep: 5368 ms, 214864 reclaimed leaving 2168850488 used;
max is 2284584960
ERROR [pool-1-thread-63] 2010-05-26 01:10:34,721 Cassandra.java (line 1618)
Internal error processing batch_mutate
java.lang.OutOfMemoryError: Java heap space
ERROR [pool-1-thread-55] 2010-05-26 01:10:29,157 CassandraDaemon.java (line
78) Fatal exception in thread Thread[pool-1-thread-55,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [main] 2010-05-26 01:10:29,157 CassandraDaemon.java (line 195)
Exception encountered during startup.
java.lang.OutOfMemoryError: Java heap space
ERROR [CACHETABLE-TIMER-2] 2010-05-26 01:04:58,068 CassandraDaemon.java
(line 78) Fatal exception in thread Thread[CACHETABLE-TIMER-2,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [pool-1-thread-56] 2010-05-26 01:14:35,482 CassandraDaemon.java (line
78) Fatal exception in thread Thread[pool-1-thread-56,5,main]
java.lang.OutOfMemoryError: Java heap space
 INFO [GC inspection] 2010-05-26 01:14:35,482 GCInspector.java (line 110) GC
for ConcurrentMarkSweep: 7702 ms, 15440656 reclaimed leaving 2153622464
used; max is 2284584960
ERROR [HINTED-HANDOFF-POOL:1] 2010-05-26 01:14:35,274
DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java
heap space
at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.OutOfMemoryError: Java heap space
ERROR [pool-1-thread-50] 2010-05-26 01:12:58,865 CassandraDaemon.java (line
78) Fatal exception in thread Thread[pool-1-thread-50,5,main]
java.lang.OutOfMemoryError: Java heap space

-- 
Regards
   Peng Guo
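
For anyone reading the GC lines in the log above: a rough way to see how close the heap was to its limit is to divide "used" by "max". The sketch below only re-parses two of the GCInspector lines already shown (wrapped lines rejoined); the function name summarize and the regular expression are my own and are not part of Cassandra.

```python
import re

# Two GCInspector entries copied from the log above, with the wrapped
# lines rejoined so that each entry sits on a single line.
LOG_LINES = [
    "INFO [GC inspection] 2010-05-26 00:56:50,153 GCInspector.java (line 110) GC "
    "for ConcurrentMarkSweep: 7764 ms, 120920 reclaimed leaving 2168941512 used; "
    "max is 2284584960",
    "INFO [GC inspection] 2010-05-26 01:14:35,482 GCInspector.java (line 110) GC "
    "for ConcurrentMarkSweep: 7702 ms, 15440656 reclaimed leaving 2153622464 "
    "used; max is 2284584960",
]

# Hypothetical pattern written for this log format only.
GC_PATTERN = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, (?P<reclaimed>\d+) reclaimed "
    r"leaving (?P<used>\d+) used; max is (?P<max>\d+)"
)


def summarize(line):
    """Return pause time, reclaimed bytes and heap occupancy for one GC line."""
    match = GC_PATTERN.search(line)
    if match is None:
        return None
    used = int(match.group("used"))
    maximum = int(match.group("max"))
    return {
        "pause_ms": int(match.group("ms")),
        "reclaimed": int(match.group("reclaimed")),
        "occupancy": used / maximum,  # fraction of the max heap still in use
    }


for entry in LOG_LINES:
    info = summarize(entry)
    print("%d ms pause, %d bytes reclaimed, heap %.1f%% full after GC"
          % (info["pause_ms"], info["reclaimed"], info["occupancy"] * 100))
```

Both collections leave the heap roughly 94 to 95 percent full after multi-second pauses, which fits the OutOfMemoryError entries that follow them.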


Re: Problem accessing Cassandra wiki top page with browser locale other than english

2010-05-25 Thread Yuki Morishita
Jonathan,

Thanks for reporting an issue.
I will wait and see.

2010-05-25 23:29 Jonathan Ellis:
> Turns out this is a bug in the version of MoinMoin the ASF has
> installed.  There's nothing we can do until the infrastructure team
> upgrades: https://issues.apache.org/jira/browse/INFRA-2741
>
> On Sun, May 23, 2010 at 10:09 PM, Yuki Morishita  wrote:
>> Hi all,
>>
>> I'm currently working on translating cassandra wiki to Japanese.
>> Cassandra is gaining attention in Japan, too. :)
>>
>> I noticed that for those who have browser locale with 'ja', accessing
>> top page of cassandra wiki (http://wiki.apache.org/cassandra) displays
>> Japanese default front page
>> (http://wiki.apache.org/cassandra/フロントページ), not the one wanted
>> (http://wiki.apache.org/cassandra/FrontPage).
>>
>> Since the front page for Japanese locale is not editable, I cannot
>> make any change to it.
>> (FrontPage is translated into Japanese, but with the name FrontPage_JP.)
>>
>> Can I get privilege to edit Japanese front page above?
>> Or, can someone from dev team edit above front page so that everyone
>> with browser locale 'ja' get redirected to 'FrontPage_JP'?
>> (Just put '#redirect FrontPage_JP' in first line of
>> http://wiki.apache.org/cassandra/フロントページ)
>>
>> Thanks in advance,
>>
>> 
>> Yuki Morishita
>> t:yukim (http://twitter.com/yukim)
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>



-- 

Yuki Morishita


Order Preserving Partitioner

2010-05-25 Thread Steve Lihn
I have a question on using Order Preserving Partitioner.

Many rowKeys in my system will be related to dates, so it seems natural to
use Order Preserving Partitioner instead of the default Random Partitioner.
However, I have been warned that special attention is needed for
Order Preserving Partitioner to work properly (basically to ensure a good
key distribution and avoid "hot spots") and that reverting to Random may
not be easy. Also, not every rowKey is related to dates; for those, using
Random Partitioner is okay, but there is only one place to set the Partitioner.

(Note: The intention of this warning is actually to discredit Cassandra and
persuade me not to use it.)

It seems the choice of Partitioner is defined in storage-conf.xml and is
a global property. My question is: why does it have to be a global property? Is
there a future plan to make it customizable per KeySpace (just like you
would choose hash or range partitioning for different tables/data in an RDBMS)?

Thanks,
Steve
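
To make the "hot spot" concern above concrete, here is a small simulation, offered as a sketch under stated assumptions rather than as how Cassandra places data. The four-node layout, the range boundaries, and the use of "MD5 modulo node count" are all simplifications I made up for illustration (the real Random Partitioner assigns MD5 tokens to ranges on a ring).

```python
import bisect
import hashlib
from collections import Counter

NODES = 4

# Hypothetical range boundaries for an order-preserving layout: node 0 owns
# keys below "2010-04", node 1 owns keys from "2010-04" up to "2010-07",
# node 2 up to "2010-10", and node 3 everything after that.
RANGE_BOUNDARIES = ["2010-04", "2010-07", "2010-10"]


def order_preserving_node(key):
    """Pick the node whose key range contains the key (keys stay sorted)."""
    return bisect.bisect_right(RANGE_BOUNDARIES, key)


def hashed_node(key):
    """Simplified stand-in for hash partitioning: MD5 of the key, modulo nodes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NODES


# One day's worth of date-prefixed row keys.
keys = ["2010-05-25:%04d" % i for i in range(1000)]

range_load = Counter(order_preserving_node(k) for k in keys)
hash_load = Counter(hashed_node(k) for k in keys)

print("order-preserving:", dict(range_load))
print("hashed:          ", dict(hash_load))
```

With date-prefixed keys, every write for a given day lands on the single node that owns that date range, while hashing spreads the same keys across all nodes at the cost of losing range scans over dates.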


GMFD messages

2010-05-25 Thread Anthony Molinaro
Hi,

  I just noticed I have lots of these messages

INFO [GMFD:1] 2010-05-25 23:21:04,070 GossipDigestSynMessage.java (line 152)
  Remaining bytes zero. Stopping deserialization in EndPointState.
INFO [GMFD:1] 2010-05-25 23:21:05,224 GossipDigestSynMessage.java (line 129)
   Breaking out to respect the MTU size in EPS. Estimate is 56 

The first message only occurs on some machines in my cluster.  The second
on all of them.

The ones with the first message seem to be building up quite a backlog
in their MessageDeserializer PendingTasks.

I assume there is a correlation. What could be causing this sort of thing?

This cluster is now at 27 m1.xlarge boxes on ec2 running 0.6.2 of some flavor.

Thanks,

-Anthony

-- 

Anthony Molinaro   


RE: Hector samples -- where?

2010-05-25 Thread Nicholas Sun
I am also interested in this.  It seems like adding multiple Cols into a CF or 
SuperCols would be very useful.  Like a dataload type capability?

Nick

-Original Message-
From: Bill de hOra [mailto:b...@dehora.net] 
Sent: Tuesday, May 25, 2010 3:18 PM
To: user@cassandra.apache.org
Subject: Re: Hector samples -- where?

Are there examples of inserting multiple cols into a CF anywhere?

Bill

Ran Tavory wrote:
> http://wiki.github.com/rantav/hector/examples
> 
>> On May 25, 2010 10:43 PM, "Asaf Lahav" wrote:
>>
>> Hi, Where can I find Hector code samples?
>>
>>





Re: Why are writes faster than reads?

2010-05-25 Thread Mark Robson
On 25 May 2010 09:04, David Boxenhorn  wrote:

> I have seen several off-hand mentions that writes are inherently faster
> than reads. Why is this so?
>

In addition to the points that other posters made, writes only need to go as
far as your battery-backed raid controller, whereas reads go all the way to
the disc, possibly quite a lot of times (to search a tree structure for the
right block). The client has no option but to wait for these.

For durability, you just need the data to be in battery-backed ram in the
controller, which is a pretty fast path, provided it has sufficient
throughput to get the blocks on to disc faster than they're being written to
(which should not be a problem for Cassandra's sequentially-written commit
log and sstable files)

Mark


Re: Why are writes faster than reads?

2010-05-25 Thread Jonathan Shook
Writes only have to write to the journal before returning. Reads have
to read potentially from several sources, including binary searches of
things that may or may not be cached anywhere.  The journal writes do
not involve much random disk IO, while the read activity does.

On Tue, May 25, 2010 at 11:53 AM, Tatu Saloranta  wrote:
> On Tue, May 25, 2010 at 4:04 AM, Mark Greene  wrote:
>> I'm fairly certain the write path hits the commit log first, then the
>> memtable.
>
> True, but that does not make them any less sequential -- journal logs
> are strictly sequential fast writes. Actual ordering occurs in memory,
> and results are eventually flushed from memtable to disk.
> There is no similar ordering for reads.
>
> -+ Tatu +-
>
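
The write path and read path described in this thread can be sketched with a toy store. This is only an illustration of the idea (append to a journal, update an in-memory table, flush to immutable sorted files, and have reads consult several places); the class ToyStore, its method names, and the flush threshold are hypothetical and are not Cassandra's implementation.

```python
class ToyStore:
    """Toy log-structured store; illustrative only."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only journal: sequential writes only
        self.memtable = {}     # in-memory table, sorted when flushed
        self.sstables = []     # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # A write is one journal append plus one in-memory update.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Sort once in memory, then persist as a new immutable table.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        """Return (value, number_of_sources_checked)."""
        checked = 1
        if key in self.memtable:
            return self.memtable[key], checked
        # Newest table first; on disk each lookup could mean random I/O.
        for table in reversed(self.sstables):
            checked += 1
            if key in table:
                return table[key], checked
        return None, checked


store = ToyStore(memtable_limit=2)
for i, key in enumerate(["a", "b", "c", "d", "e"], start=1):
    store.write(key, i)

print(store.read("e"))  # found in the memtable
print(store.read("a"))  # found only after checking both flushed tables
```

Every write costs the same (one append, one in-memory update), whereas the cost of a read grows with the number of flushed tables it has to consult, which is the asymmetry the posters describe.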


Re: Hector samples -- where?

2010-05-25 Thread Bill de hOra

Are there examples of inserting multiple cols into a CF anywhere?

Bill

Ran Tavory wrote:

http://wiki.github.com/rantav/hector/examples

On May 25, 2010 10:43 PM, "Asaf Lahav" wrote:


Hi, Where can I find Hector code samples?






Re: Hector samples -- where?

2010-05-25 Thread Asaf Lahav
10x

On Tue, May 25, 2010 at 10:45 PM, Ran Tavory  wrote:

> http://wiki.github.com/rantav/hector/examples
>
> On May 25, 2010 10:43 PM, "Asaf Lahav"  wrote:
>
> Hi, Where can I find Hector code samples?
>
>
>


Re: Hector vs cassandra-java-client

2010-05-25 Thread Maxim Kramarenko

Hello

I've used jassandra; it works fine and is easy to use.

On 25.05.2010 06:21, Peter Hsu wrote:

Hi All,

This may have been answered already, but I did a [quick] Google search and 
didn't find much.  Which is the better Java client to use?  Hector or 
cassandra-java-client or neither?

It seems Hector is more fully featured and more active as a project in general.

What are user experiences with either library?  Any advice?

Thanks,
Peter




Re: Hector samples -- where?

2010-05-25 Thread Ran Tavory
http://wiki.github.com/rantav/hector/examples

On May 25, 2010 10:43 PM, "Asaf Lahav"  wrote:

Hi, Where can I find Hector code samples?


Hector samples -- where?

2010-05-25 Thread Asaf Lahav
Hi, Where can I find Hector code samples?


Re: Why Cassandra is "space inefficient" compared to MySQL?

2010-05-25 Thread Jonathan Ellis
Yes.  But I haven't yet seen a workload with enough data that that
would matter, that wasn't more cpu bound than disk space bound, so that
would usually be premature optimization.
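
As a back-of-the-envelope check on the column-name question, the arithmetic can be written out as below. It counts only the bytes of the column name itself, stored once per column per row as described in this thread; the function name and parameters are made up for the example, and real on-disk size also depends on values, timestamps, and how many sstables currently hold a copy.

```python
def bytes_saved(long_name, short_name, rows, columns_per_row=1,
                replication_factor=1):
    """Bytes saved by renaming a column, counting only the name bytes."""
    per_column = len(long_name.encode("utf-8")) - len(short_name.encode("utf-8"))
    return per_column * columns_per_row * rows * replication_factor


# "first_seen" (10 bytes) versus "f" (1 byte) across one billion rows.
saved = bytes_saved("first_seen", "f", rows=1_000_000_000)
print("%.1f GB saved per replica" % (saved / 1e9))
```

Nine bytes per row adds up to about 9 GB per replica at a billion rows, which is real but, as noted above, often small next to CPU limits for workloads of that size.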

On Tue, May 25, 2010 at 2:23 PM, Robert Edmonds  wrote:
> On 2010-05-25, Jonathan Ellis  wrote:
>> That's true.  But fundamentally Cassandra is expected to use more
>> space than mysql for a few reasons; usually the biggest factor is that
>> Cassandra has to write out each column name in each row, since column
>> names are dynamic unlike in mysql where you declare the columns once
>> for the whole table.
>
> does this mean that using short column names (e.g., "f" instead of
> "first_seen") will save space when storing billions of rows?
>
> --
> Robert Edmonds
> edmo...@debian.org
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Why Cassandra is "space inefficient" compared to MySQL?

2010-05-25 Thread Robert Edmonds
On 2010-05-25, Jonathan Ellis  wrote:
> That's true.  But fundamentally Cassandra is expected to use more
> space than mysql for a few reasons; usually the biggest factor is that
> Cassandra has to write out each column name in each row, since column
> names are dynamic unlike in mysql where you declare the columns once
> for the whole table.

does this mean that using short column names (e.g., "f" instead of
"first_seen") will save space when storing billions of rows?

-- 
Robert Edmonds
edmo...@debian.org



Re: Panasas and Cassandra

2010-05-25 Thread Fernanda Foertter
Two reasons:

Do a single node test of large file read/write without having to purchase
additional hard drives at the moment.

Benefit from I/O Panasas delivers that I can't get from local drives...

Keeping the data local, for easier loading.



On 5/25/10 11:58 AM, "Ryan King"  wrote:

> On Tue, May 25, 2010 at 9:06 AM, Fernanda Foertter
>  wrote:
>> Hi everyone,
>> 
>> So we have Panasas (http://www.panasas.com), and want to avoid local drives.
>>  Because panasas has its own redundancy and cache, Can I set RF=1?  If so,
>> can you think of any reason why we shouldn't use panasas?
> 
> I don't see why you'd use panasas and cassandra together. They take
> very different approaches to the problem.
> 
> -ryan



Re: Why Cassandra is "space inefficient" compared to MySQL?

2010-05-25 Thread Jonathan Ellis
the only place we use a java serializer is for the BitSet in bloom filters.

On Tue, May 25, 2010 at 12:37 PM, Chris Goffinet  wrote:
> My money is on the fact that the serializer is just horribly verbose. It's
> using a basic set of the java serializer.
> -Chris
>
>
> On Tue, May 25, 2010 at 10:02 AM, Ryan King  wrote:
>>
>> Also, timestamps for each column.
>>
>> -ryan
>>
>> On Tue, May 25, 2010 at 5:41 AM, Jonathan Ellis  wrote:
>> > That's true.  But fundamentally Cassandra is expected to use more
>> > space than mysql for a few reasons; usually the biggest factor is that
>> > Cassandra has to write out each column name in each row, since column
>> > names are dynamic unlike in mysql where you declare the columns once
>> > for the whole table.
>> >
>> > 2010/5/25 Peter Schüller :
>> >>> Could you please tell me why?
>> >>
>> >> There might be pending sstable removals on disk, which won't happen
>> >> until GC or restart. If you just did a bulk insert and checked
>> >> diskspace immediately afterwards, I think this is a possible
>> >> explanation.
>> >>
>> >> (See "Write path" on
>> >> http://wiki.apache.org/cassandra/ArchitectureInternals)
>> >>
>> >> --
>> >> / Peter Schuller aka scode
>> >>
>> >
>> >
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of Riptano, the source for professional Cassandra support
>> > http://riptano.com
>> >
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Anyone using hadoop/MapReduce integration currently?

2010-05-25 Thread Utku Can Topçu
Hi Jeremy,

> Why are you using Cassandra versus using data stored in HDFS or HBase?
- I'm thinking of using it for realtime streaming of user data. While
streaming the requests, I'm also using Lucandra for indexing the data in
realtime. It's a better option when you compare it with HBase or the native
HDFS flat files, because of low latency in writes.

> Is there anything holding you back from using it (if you would like to use
it but currently cannot)?

My answer to this would be:
- The current integration only supports using the whole range of the CF as
input for the map phase; it would be much better if the InputFormat
supported a KeyRange.

Best Regards,
Utku

On Tue, May 25, 2010 at 6:35 PM, Jeremy Hanna wrote:

> I'll be doing a presentation on Cassandra's (0.6+) hadoop integration next
> week. Is anyone currently using MapReduce or the initial Pig integration?
>
> (If you're unaware of such integration, see
> http://wiki.apache.org/cassandra/HadoopSupport)
>
> If so, could you post to this thread on how you're using it or planning on
> using it (if not covered by the shroud of secrecy)?
>
> e.g.
> What is the use case?
>
> Why are you using Cassandra versus using data stored in HDFS or HBase?
>
> Are you using a separate Hadoop cluster to run the MR jobs on, or perhaps
> are you running the Job Tracker and Task Trackers on Cassandra nodes?
>
> Is there anything holding you back from using it (if you would like to use
> it but currently cannot)?
>
> Thanks!


Re: Why Cassandra is "space inefficient" compared to MySQL?

2010-05-25 Thread Chris Goffinet
My money is on the fact that the serializer is just horribly verbose. It's
using a basic set of the java serializer.

-Chris


On Tue, May 25, 2010 at 10:02 AM, Ryan King  wrote:

> Also, timestamps for each column.
>
> -ryan
>
> On Tue, May 25, 2010 at 5:41 AM, Jonathan Ellis  wrote:
> > That's true.  But fundamentally Cassandra is expected to use more
> > space than mysql for a few reasons; usually the biggest factor is that
> > Cassandra has to write out each column name in each row, since column
> > names are dynamic unlike in mysql where you declare the columns once
> > for the whole table.
> >
> > 2010/5/25 Peter Schüller :
> >>> Could you please tell me why?
> >>
> >> There might be pending sstable removals on disk, which won't happen
> >> until GC or restart. If you just did a bulk insert and checked
> >> diskspace immediately afterwards, I think this is a possible
> >> explanation.
> >>
> >> (See "Write path" on
> http://wiki.apache.org/cassandra/ArchitectureInternals)
> >>
> >> --
> >> / Peter Schuller aka scode
> >>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> > http://riptano.com
> >
>


Re: Key cache capacity: 1 when using KeysCached="50%"

2010-05-25 Thread Ran Tavory
 https://issues.apache.org/jira/browse/CASSANDRA-1129

On Tue, May 25, 2010 at 3:42 PM, Jonathan Ellis  wrote:

> That does look like a bug.  Can you create a ticket and upload a
> (preferably small-ish) sstable that illustrates the problem?
>
> On Mon, May 24, 2010 at 12:07 PM, Ran Tavory  wrote:
> > I'd like to have 100% keys cached. Sorry if my example of Super2 wasn't
> > correct, but I do think there's a problem. Here's with my own data:
> > When using actual numbers (in this case for RowsCached) it works as
> > expected, however when specifying KeysCached="100%" I get only 1.
> >> KeysCached="100%"
> > RowsCached="1"
> > />
> >
> > Column Family: KvAds
> > SSTable count: 7
> > Space used (live): 797535964
> > Space used (total): 797535964
> > Memtable Columns Count: 42292
> > Memtable Data Size: 10514176
> > Memtable Switch Count: 24
> > Read Count: 2563704
> > Read Latency: 4.590 ms.
> > Write Count: 1963804
> > Write Latency: 0.025 ms.
> > Pending Tasks: 0
> > Key cache capacity: 1
> > Key cache size: 1
> > Key cache hit rate: 0.0
> > Row cache capacity: 1
> > Row cache size: 1
> > Row cache hit rate: 0.2206178354382234
> > Compacted row minimum size: 386
> > Compacted row maximum size: 9808
> > Compacted row mean size: 616
> >
> > On Mon, May 24, 2010 at 6:30 PM, Jonathan Ellis 
> wrote:
> >>
> >> If you really want a cache capacity of 0 then you need to use 0
> >> explicitly, otherwise the % versions will give you at least 1.
> >>
> >> On Mon, May 24, 2010 at 12:34 AM, Ran Tavory  wrote:
> >> > I've noticed that when defining KeysCached="50%" (or KeysCached="100%"
> >> > and I
> >> > didn't test other values with %) then cfstats reports Key cache
> >> > capacity: 1
> >> > This looks weird... is this expected? (version 0.6.1)
> >> > For example, in the default configuration:
> >> >>> > ColumnType="Super"
> >> > CompareWith="UTF8Type"
> >> > CompareSubcolumnsWith="UTF8Type"
> >> > RowsCached="1"
> >> > KeysCached="50%"/>
> >> >
> >> > 
> >> > Keyspace: Keyspace1
> >> > Read Count: 0
> >> > Read Latency: NaN ms.
> >> > Write Count: 0
> >> > Write Latency: NaN ms.
> >> > Pending Tasks: 0
> >> > Column Family: Super1
> >> > SSTable count: 0
> >> > Space used (live): 0
> >> > Space used (total): 0
> >> > Memtable Columns Count: 0
> >> > Memtable Data Size: 0
> >> > Memtable Switch Count: 0
> >> > Read Count: 0
> >> > Read Latency: NaN ms.
> >> > Write Count: 0
> >> > Write Latency: NaN ms.
> >> > Pending Tasks: 0
> >> > Key cache capacity: 20
> >> > Key cache size: 0
> >> > Key cache hit rate: NaN
> >> > Row cache: disabled
> >> > Compacted row minimum size: 0
> >> > Compacted row maximum size: 0
> >> > Compacted row mean size: 0
> >> > Column Family: Super2
> >> > SSTable count: 0
> >> > Space used (live): 0
> >> > Space used (total): 0
> >> > Memtable Columns Count: 0
> >> > Memtable Data Size: 0
> >> > Memtable Switch Count: 0
> >> > Read Count: 0
> >> > Read Latency: NaN ms.
> >> > Write Count: 0
> >> > Write Latency: NaN ms.
> >> > Pending Tasks: 0
> >> > Key cache capacity: 1
> >> > Key cache size: 0
> >> > Key cache hit rate: NaN
> >> > Row cache capacity: 1
> >> > Row cache size: 0
> >> > Row cache hit rate: NaN
> >> > Compacted row minimum size: 0
> >> > Compacted row maximum size: 0
> >> > Compacted row mean size: 0
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of Riptano, the source for professional Cassandra support
> >> http://riptano.com
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
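
The rule Jonathan states earlier in this thread (a percentage setting yields a capacity of at least 1, and only an explicit 0 disables the cache) can be sketched as follows. This is not Cassandra's source code: the function cache_capacity and the estimated_keys argument are hypothetical names chosen for the illustration.

```python
def cache_capacity(setting, estimated_keys):
    """Turn a KeysCached/RowsCached-style setting into an absolute capacity.

    A value ending in "%" is taken as a fraction of the estimated key count,
    with a floor of 1; anything else is read as an absolute number.
    """
    setting = setting.strip()
    if setting.endswith("%"):
        fraction = float(setting[:-1]) / 100.0
        return max(1, int(fraction * estimated_keys))
    return int(setting)


print(cache_capacity("50%", 0))      # empty column family: floor of 1
print(cache_capacity("50%", 2000))   # half of the estimated keys
print(cache_capacity("0", 2000))     # explicit 0 disables the cache
```

Under this rule a capacity of 1 is expected for an empty column family; a capacity that stays at 1 on a column family holding real data, as in the KvAds output above, is what the ticket was opened for.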


Re: Why Cassandra is "space inefficient" compared to MySQL?

2010-05-25 Thread Ryan King
Also, timestamps for each column.

-ryan

On Tue, May 25, 2010 at 5:41 AM, Jonathan Ellis  wrote:
> That's true.  But fundamentally Cassandra is expected to use more
> space than mysql for a few reasons; usually the biggest factor is that
> Cassandra has to write out each column name in each row, since column
> names are dynamic unlike in mysql where you declare the columns once
> for the whole table.
>
> 2010/5/25 Peter Schüller :
>>> Could you please tell me why?
>>
>> There might be pending sstable removals on disk, which won't happen
>> until GC or restart. If you just did a bulk insert and checked
>> diskspace immediately afterwards, I think this is a possible
>> explanation.
>>
>> (See "Write path" on http://wiki.apache.org/cassandra/ArchitectureInternals)
>>
>> --
>> / Peter Schuller aka scode
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Panasas and Cassandra

2010-05-25 Thread Ryan King
On Tue, May 25, 2010 at 9:06 AM, Fernanda Foertter
 wrote:
> Hi everyone,
>
> So we have Panasas (http://www.panasas.com), and want to avoid local drives.
>  Because panasas has its own redundancy and cache, Can I set RF=1?  If so,
> can you think of any reason why we shouldn’t use panasas?

I don't see why you'd use panasas and cassandra together. They take
very different approaches to the problem.

-ryan


Re: Why are writes faster than reads?

2010-05-25 Thread Tatu Saloranta
On Tue, May 25, 2010 at 4:04 AM, Mark Greene  wrote:
> I'm fairly certain the write path hits the commit log first, then the
> memtable.

True, but that does not make them any less sequential -- journal logs
are strictly sequential fast writes. Actual ordering occurs in memory,
and results are eventually flushed from memtable to disk.
There is no similar ordering for reads.

-+ Tatu +-


Anyone using hadoop/MapReduce integration currently?

2010-05-25 Thread Jeremy Hanna
I'll be doing a presentation on Cassandra's (0.6+) hadoop integration next 
week. Is anyone currently using MapReduce or the initial Pig integration?

(If you're unaware of such integration, see 
http://wiki.apache.org/cassandra/HadoopSupport)

If so, could you post to this thread on how you're using it or planning on 
using it (if not covered by the shroud of secrecy)?

e.g.
What is the use case?

Why are you using Cassandra versus using data stored in HDFS or HBase?

Are you using a separate Hadoop cluster to run the MR jobs on, or perhaps are 
you running the Job Tracker and Task Trackers on Cassandra nodes?

Is there anything holding you back from using it (if you would like to use it 
but currently cannot)?

Thanks!

RE: high-scale-lib & clhm-production jars

2010-05-25 Thread Carlos Sanchez
Thanks a lot

From: Tobias Jungen [mailto:tobias.jun...@gmail.com]
Sent: Tuesday, May 25, 2010 10:56 AM
To: user@cassandra.apache.org
Subject: Re: high-scale-lib & clhm-production jars

High-scale-lib: http://sourceforge.net/projects/high-scale-lib/
CLHM: http://code.google.com/p/concurrentlinkedhashmap/
On Tue, May 25, 2010 at 10:17 AM, Carlos Sanchez 
<carlos.sanc...@riskmetrics.com> wrote:
Does anyone know if there are repositories for high-scale-lib & clhm-production 
jars? Is the source available somewhere?

Thanks

Carlos

This email message and any attachments are for the sole use of the intended 
recipients and may contain proprietary and/or confidential information which 
may be privileged or otherwise protected from disclosure. Any unauthorized 
review, use, disclosure or distribution is prohibited. If you are not an 
intended recipient, please contact the sender by reply email and destroy the 
original message and any copies of the message as well as any attachments to 
the original message.




Panasas and Cassandra

2010-05-25 Thread Fernanda Foertter
Hi everyone,

So we have Panasas (http://www.panasas.com), and want to avoid local drives.
Because panasas has its own redundancy and cache, Can I set RF=1?  If so,
can you think of any reason why we shouldn't use panasas?

Thanks in advance

'Fernie'


Re: high-scale-lib & clhm-production jars

2010-05-25 Thread Tobias Jungen
High-scale-lib: http://sourceforge.net/projects/high-scale-lib/
CLHM: http://code.google.com/p/concurrentlinkedhashmap/

On Tue, May 25, 2010 at 10:17 AM, Carlos Sanchez <
carlos.sanc...@riskmetrics.com> wrote:

> Does anyone know if there are repositories for high-scale-lib &
> clhm-production jars? Is the source available somewhere?
>
> Thanks
>
> Carlos
>
>


RE: Nunit Testing & Cassandra

2010-05-25 Thread Sandeep
A great suggestion.  I tried what you have mentioned. They are not equal.  I 
get the same error.

Thanks for your help. I appreciate it.

From: Miguel Verde [mailto:miguelitov...@gmail.com]
Sent: Tuesday, May 25, 2010 11:07 AM
To: user@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: Re: Nunit Testing & Cassandra

My guess is that it is using object equality to compare.  One thing to test 
would be to create two KeySlices whose contents had the same values, add them 
to separate lists, and then compare the lists. I think you'll find that they 
are not 'equivalent'.


On May 25, 2010, at 10:00 AM, Sandeep <sand...@indatus.com> wrote:
SDSWebService.Service1Test.GetListOfRowKeysFromCF:
  Expected: equivalent to < , 
, 
, 
 >

  But was:  < , 
, 
, 
 >

Is.EquivalentTo( ICollection ) - tests that two collections are equivalent.

Two collections are equivalent if they contain the same items, in any order.

Assert.That(listOfKeys,  
Is.EquivalentTo(TestService.GetListOfRowKeysFromCF("ColumnFamilyName","Keyspace1")));

From: Miguel Verde [mailto:miguelitov...@gmail.com]
Sent: Tuesday, May 25, 2010 9:51 AM
To: user@cassandra.apache.org
Subject: Re: Nunit Testing & Cassandra

It would be helpful to know in what way the test fails, or more information 
about listOfKeys or the return value of GetListOfRowKeysFromCF at assert time, 
or for that matter what GetListOfRowKeysFromCF is, or the insertion code.

Also, does Is.EquivalentTo compare object equality on the items inside the 
collection? If so, that would be a problem.
On Tue, May 25, 2010 at 8:40 AM, Sandeep <sand...@indatus.com> wrote:
Assert.AreEqual(listOfKeys,  
Is.EquivalentTo(TestService.GetListOfRowKeysFromCF("ColumnFamilyName","Keyspace1")));

TestService.GetListOfRowKeysFromCF() returns a List<KeySlice>. I 
am constructing the same list in the same order in which I have 
inserted in some other method.

But the test always fails. Can anyone please tell me where I am 
going wrong? The timestamp value is a global variable and is used throughout the 
class.

  Thanks in advance.



high-scale-lib & clhm-production jars

2010-05-25 Thread Carlos Sanchez
Does anyone know if there are repositories for high-scale-lib & clhm-production 
jars? Is the source available somewhere?

Thanks

Carlos



Re: Nunit Testing & Cassandra

2010-05-25 Thread Miguel Verde
My guess is that it is using object equality to compare.  One thing to  
test would be to create two KeySlices whose contents had the same  
values, add them to separate lists, and then compare the lists. I  
think you'll find that they are not 'equivalent'.
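
The experiment suggested above can be sketched in a few lines. This is an illustration in Python rather than C#, and both classes are hypothetical stand-ins, not the Thrift-generated KeySlice. The first class defines no equality method, so two instances with identical contents still compare as different objects; the second compares field by field. The helper same_items_any_order is likewise made up for this sketch and only mimics the "same items, in any order" idea.

```python
from dataclasses import dataclass, field


class IdentityKeySlice:
    """Hypothetical stand-in: no __eq__, so equality falls back to identity."""

    def __init__(self, key, columns):
        self.key = key
        self.columns = columns


@dataclass
class ValueKeySlice:
    """Hypothetical stand-in: the dataclass generates a field-by-field __eq__."""

    key: str
    columns: list = field(default_factory=list)


def same_items_any_order(expected, actual):
    """Return True if both lists hold equal items, ignoring order."""
    remaining = list(actual)
    for item in expected:
        if item not in remaining:  # `in` relies on the items' __eq__
            return False
        remaining.remove(item)
    return not remaining


# Same contents, separately constructed objects, different order.
by_identity = same_items_any_order(
    [IdentityKeySlice("key1", ["100"]), IdentityKeySlice("key2", ["200"])],
    [IdentityKeySlice("key2", ["200"]), IdentityKeySlice("key1", ["100"])],
)
by_value = same_items_any_order(
    [ValueKeySlice("key1", ["100"]), ValueKeySlice("key2", ["200"])],
    [ValueKeySlice("key2", ["200"]), ValueKeySlice("key1", ["100"])],
)
print(by_identity, by_value)
```

If the C# test behaves like the first case, the likely remedies are to compare on the keys and column values directly, or to give the compared type value-based equality.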



On May 25, 2010, at 10:00 AM, Sandeep  wrote:

SDSWebService.Service1Test.GetListOfRowKeysFromCF:

  Expected: equivalent to < System.Collections.Generic.List`1 
[Apache.Cassandra.ColumnOrSuperColumn])>, key3,columns: System.Collections.Generic.List`1 
[Apache.Cassandra.ColumnOrSuperColumn])>, key4,columns: System.Collections.Generic.List`1 
[Apache.Cassandra.ColumnOrSuperColumn])>, key2,columns: System.Collections.Generic.List`1 
[Apache.Cassandra.ColumnOrSuperColumn])> >

  But was:  < System.Collections.Generic.List`1 
[Apache.Cassandra.ColumnOrSuperColumn])>, key3,columns: System.Collections.Generic.List`1 
[Apache.Cassandra.ColumnOrSuperColumn])>, key4,columns: System.Collections.Generic.List`1 
[Apache.Cassandra.ColumnOrSuperColumn])>, key2,columns: System.Collections.Generic.List`1 
[Apache.Cassandra.ColumnOrSuperColumn])> >

Is.EquivalentTo( ICollection ) - tests that two collections are
equivalent.

Two collections are equivalent if they contain the same items, in
any order.

Assert.That(listOfKeys,
Is.EquivalentTo(TestService.GetListOfRowKeysFromCF("ColumnFamilyName","Keyspace1")));

From: Miguel Verde [mailto:miguelitov...@gmail.com]
Sent: Tuesday, May 25, 2010 9:51 AM
To: user@cassandra.apache.org
Subject: Re: Nunit Testing & Cassandra

It would be helpful to know in what way the test fails, or more
information about listOfKeys or the return value of
GetListOfRowKeysFromCF at assert time, or for that matter what
GetListOfRowKeysFromCF is, or the insertion code.

Also, does Is.EquivalentTo compare object equality on the items
inside the collection? If so, that would be a problem.

On Tue, May 25, 2010 at 8:40 AM, Sandeep  wrote:

Assert.AreEqual(listOfKeys,
Is.EquivalentTo(TestService.GetListOfRowKeysFromCF("ColumnFamilyName","Keyspace1")));

TestService.GetListOfRowKeysFromCF() returns a
List<KeySlice>. I am constructing the same list in the
same order in which I have inserted in some other method.

But the test always fails. Can any one please tell me
where am I going wrong. Timestamp value is a global variable and is
used through out the class.

  Thanks in advance.


RE: Nunit Testing & Cassandra

2010-05-25 Thread Sandeep
SDSWebService.Service1Test.GetListOfRowKeysFromCF:
  Expected: equivalent to < , 
, 
, 
 >

  But was:  < , 
, 
, 
 >

Is.EquivalentTo( ICollection ) - tests that two collections are equivalent.

Two collections are equivalent if they contain the same items, in any order.

Assert.That(listOfKeys,  
Is.EquivalentTo(TestService.GetListOfRowKeysFromCF("ColumnFamilyName","Keyspace1")));

From: Miguel Verde [mailto:miguelitov...@gmail.com]
Sent: Tuesday, May 25, 2010 9:51 AM
To: user@cassandra.apache.org
Subject: Re: Nunit Testing & Cassandra

It would be helpful to know in what way the test fails, or more information 
about listOfKeys or the return value of GetListOfRowKeysFromCF at assert time, 
or for that matter what GetListOfRowKeysFromCF is, or the insertion code.

Also, does Is.EquivalentTo compare object equality on the items inside the 
collection? If so, that would be a problem.
On Tue, May 25, 2010 at 8:40 AM, Sandeep <sand...@indatus.com> wrote:
Assert.AreEqual(listOfKeys,  
Is.EquivalentTo(TestService.GetListOfRowKeysFromCF("ColumnFamilyName","Keyspace1")));

TestService.GetListOfRowKeysFromCF() returns a List<KeySlice>. I 
am constructing the same list in the same order in which I have 
inserted in some other method.

But the test always fails. Can any one please tell me where am I 
going wrong. Timestamp value is a global variable and is used through out the 
class.

  Thanks in advance.



Re: get() with TTL update?

2010-05-25 Thread Vick Khera
On Mon, May 24, 2010 at 4:53 PM, Jonathan Ellis  wrote:
> (a) cassandra does not use update-in-place storage so doing the update
> as part of the get call isn't much of an efficiency gain

If you could issue an "update" type of command, any other data needed
for the new copy of the object could be copied internally to the
server, saving the round-trip of that data over the wire to the
client.  I know from my own experience using Postgres that this is a
big win for some operations.  In this particular case there are
several megabytes of data involved, so there would be a significant
efficiency gain, IMO.


Re: Problem accessing Cassandra wiki top page with browser locale other than english

2010-05-25 Thread Jonathan Ellis
Turns out this is a bug in the version of MoinMoin the ASF has
installed.  There's nothing we can do until the infrastructure team
upgrades: https://issues.apache.org/jira/browse/INFRA-2741

On Sun, May 23, 2010 at 10:09 PM, Yuki Morishita  wrote:
> Hi all,
>
> I'm currently working on translating cassandra wiki to Japanese.
> Cassandra is gaining attention in Japan, too. :)
>
> I noticed that for those who have browser locale with 'ja', accessing
> top page of cassandra wiki (http://wiki.apache.org/cassandra) displays
> Japanese default front page
> (http://wiki.apache.org/cassandra/フロントページ), not the one wanted
> (http://wiki.apache.org/cassandra/FrontPage).
>
> Since the front page for Japanese locale is not editable, I cannot
> make any change to it.
> (FrontPage is translated into Japanese, but with the name FrontPage_JP.)
>
> Can I get privilege to edit Japanese front page above?
> Or, can someone from dev team edit above front page so that everyone
> with browser locale 'ja' get redirected to 'FrontPage_JP'?
> (Just put '#redirect FrontPage_JP' in first line of
> http://wiki.apache.org/cassandra/フロントページ)
>
> Thanks in advance,
>
> 
> Yuki Morishita
> t:yukim (http://twitter.com/yukim)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


RE: Hector vs cassandra-java-client

2010-05-25 Thread Dop Sun
Updated.

Cheers~
Dop

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Tuesday, May 25, 2010 8:39 PM
To: user@cassandra.apache.org
Subject: Re: Hector vs cassandra-java-client

You should link it on http://wiki.apache.org/cassandra/ClientOptions
(click Login to edit)

On Tue, May 25, 2010 at 2:12 AM, Dop Sun  wrote:
> A good chance to introduce my http://code.google.com/p/jassandra/
>
> :)
>
> Another Java client, and well, it cannot be found with Cassandra java client
> keywords. :|
>
>
>
> From: Ran Tavory [mailto:ran...@gmail.com]
> Sent: Tuesday, May 25, 2010 2:52 PM
> To: user@cassandra.apache.org
> Subject: Re: Hector vs cassandra-java-client
>
>
>
> cassandra-java-client is up to cassandra's 0.4.2 version, so you probably
> can't use it out of the box.
>
> Hector is active and up to the latest 0.6.1 release with a bunch of
> committers, contributors and users.
> See http://wiki.github.com/rantav/hector/
> and http://groups.google.com/group/hector-users
>
> On Tue, May 25, 2010 at 5:36 AM, Jeff Zhang  wrote:
>
> I think hector is better, and seems the author of
> cassandra-java-client does not continue work on it.
>
>
> On Tue, May 25, 2010 at 10:21 AM, Peter Hsu  wrote:
>> Hi All,
>>
>> This may have been answered already, but I did a [quick] Google search and
>> didn't find much.  Which is the better Java client to use?  Hector or
>> cassandra-java-client or neither?
>>
>> it seems Hector is more fully featured and more active as a project in
>> general.
>>
>> What are user experiences with either library?  Any advice?
>>
>> Thanks,
>> Peter
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com




Re: Nunit Testing & Cassandra

2010-05-25 Thread Miguel Verde
It would be helpful to know in what way the test fails, or more information
about listOfKeys or the return value of GetListOfRowKeysFromCF at assert
time, or for that matter what GetListOfRowKeysFromCF is, or the insertion
code.

Also, does Is.EquivalentTo compare object equality on the items inside the
collection? If so, that would be a problem.

On Tue, May 25, 2010 at 8:40 AM, Sandeep  wrote:

>  Assert.AreEqual(listOfKeys,
> Is.EquivalentTo(TestService.GetListOfRowKeysFromCF("ColumnFamilyName","Keyspace1")));
>
> TestService.GetListOfRowKeysFromCF() returns a List<KeySlice>.
> I am constructing the same list in the same order in which I have
> inserted in some other method.
>
>
>
> But the test always fails. Can any one please tell me where am
> I going wrong. Timestamp value is a global variable and is used through out
> the class.
>
>
>
>   Thanks in advance.
>


Nunit Testing & Cassandra

2010-05-25 Thread Sandeep
Hi all,

I am a recent grad working on Cassandra and NUnit testing.

I wrote a unit test in C# which goes like this:

List<KeySlice> listOfKeys = new List<KeySlice>();

KeySlice item1 = new KeySlice();
KeySlice item2 = new KeySlice();
KeySlice item3 = new KeySlice();
KeySlice item4 = new KeySlice();

item1.Key = "key1";
item2.Key = "key2";
item3.Key = "key3";
item4.Key = "key4";

List<ColumnOrSuperColumn> listOfColumnOrSuperColumn1 = new List<ColumnOrSuperColumn>();
List<ColumnOrSuperColumn> listOfColumnOrSuperColumn2 = new List<ColumnOrSuperColumn>();
List<ColumnOrSuperColumn> listOfColumnOrSuperColumn3 = new List<ColumnOrSuperColumn>();
List<ColumnOrSuperColumn> listOfColumnOrSuperColumn4 = new List<ColumnOrSuperColumn>();

listOfColumnOrSuperColumn1.Add(new ColumnOrSuperColumn() { Column = 
new Column() { Name = utf8Encoding.GetBytes("key1"), Value = 
utf8Encoding.GetBytes("100"), Timestamp = timeStamp } });

listOfColumnOrSuperColumn2.Add(new ColumnOrSuperColumn() { Column = 
new Column() { Name = utf8Encoding.GetBytes("key2"), Value = 
utf8Encoding.GetBytes("200"), Timestamp = timeStamp } });

listOfColumnOrSuperColumn3.Add(new ColumnOrSuperColumn() { Column = 
new Column() { Name = utf8Encoding.GetBytes("key3"), Value = 
utf8Encoding.GetBytes("300"), Timestamp = timeStamp } });

listOfColumnOrSuperColumn4.Add(new ColumnOrSuperColumn() { Column = 
new Column() { Name = utf8Encoding.GetBytes("key4"), Value = 
utf8Encoding.GetBytes("400"), Timestamp = timeStamp } });

item1.Columns = listOfColumnOrSuperColumn1;
item2.Columns = listOfColumnOrSuperColumn2;
item3.Columns = listOfColumnOrSuperColumn3;
item4.Columns = listOfColumnOrSuperColumn4;

listOfKeys.Add(item1);
listOfKeys.Add(item3);
listOfKeys.Add(item4);
listOfKeys.Add(item2);

Assert.AreEqual(listOfKeys,  
Is.EquivalentTo(TestService.GetListOfRowKeysFromCF("ColumnFamilyName","Keyspace1")));

TestService.GetListOfRowKeysFromCF() returns a List<KeySlice>. I am 
constructing the same list, in the same order in which I inserted the 
rows in another method.

But the test always fails. Can anyone please tell me where I am going 
wrong? The timestamp value is a global variable and is used throughout 
the class.

  Thanks in advance.


Re: Cassandra configuration settings

2010-05-25 Thread Jonathan Ellis
If you don't know what your workload bottlenecks are, the defaults are fine.

On Tue, May 25, 2010 at 5:05 AM, sharanabasava raddi
 wrote:
> Hi All,
> Could u please give configuration settings for "single node"(Windows
> machine), so that it must be "time and space efficient".
>
>
>
>
>
> Thanks,
> Sharan
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Error reporting Key cache hit rate with cfstats or with JMX

2010-05-25 Thread Jonathan Ellis
What happens if you disable row cache?

On Tue, May 25, 2010 at 4:53 AM, Ran Tavory  wrote:
> It seems there's an error reporting the Key cache hit rate. The value is
> always 0.0 and I have a feeling it's incorrect. This is seen both by using
> nodetool cfstats as well as accessing JMX directly
> (org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache
> RecentHitRate)
>                            RowsCached="1000"
>                     KeysCached="1000"/>
>                 Column Family: KvAds
>                 SSTable count: 7
>                 Space used (live): 1288942061
>                 Space used (total): 1559831566
>                 Memtable Columns Count: 73698
>                 Memtable Data Size: 17121092
>                 Memtable Switch Count: 33
>                 Read Count: 3614433
>                 Read Latency: 0.068 ms.
>                 Write Count: 3503269
>                 Write Latency: 0.024 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 1000
>                 Key cache size: 619624
>                 Key cache hit rate: 0.0
>                 Row cache capacity: 1000
>                 Row cache size: 447154
>                 Row cache hit rate: 0.8460295730014572
>                 Compacted row minimum size: 387
>                 Compacted row maximum size: 31430
>                 Compacted row mean size: 631
> The Row cache hit rate looks good, 0.8 but Key cache hit rate always seems
> to be 0.0 while the number of unique keys stays about 619624 for quite a
> while.
> Is it a real caching problem or just a reporting glitch?



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Key cache capacity: 1 when using KeysCached="50%"

2010-05-25 Thread Jonathan Ellis
That does look like a bug.  Can you create a ticket and upload a
(preferably small-ish) sstable that illustrates the problem?

On Mon, May 24, 2010 at 12:07 PM, Ran Tavory  wrote:
> I'd like to have 100% keys cached. Sorry if my example of Super2 wasn't
> correct, but I do think there's a problem. Here's with my own data:
> When using actual numbers (in this case for RowsCached) it works as
> expected, however when specifying KeysCached="100%" I get only 1.
>                KeysCached="100%"
>         RowsCached="1"
>         />
>
>                 Column Family: KvAds
>                 SSTable count: 7
>                 Space used (live): 797535964
>                 Space used (total): 797535964
>                 Memtable Columns Count: 42292
>                 Memtable Data Size: 10514176
>                 Memtable Switch Count: 24
>                 Read Count: 2563704
>                 Read Latency: 4.590 ms.
>                 Write Count: 1963804
>                 Write Latency: 0.025 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 1
>                 Key cache size: 1
>                 Key cache hit rate: 0.0
>                 Row cache capacity: 1
>                 Row cache size: 1
>                 Row cache hit rate: 0.2206178354382234
>                 Compacted row minimum size: 386
>                 Compacted row maximum size: 9808
>                 Compacted row mean size: 616
>
> On Mon, May 24, 2010 at 6:30 PM, Jonathan Ellis  wrote:
>>
>> If you really want a cache capacity of 0 then you need to use 0
>> explicitly, otherwise the % versions will give you at least 1.
>>
>> On Mon, May 24, 2010 at 12:34 AM, Ran Tavory  wrote:
>> > I've noticed that when defining KeysCached="50%" (or KeysCached="100%"
>> > and I
>> > didn't test other values with %) then cfstats reports Key cache
>> > capacity: 1
>> > This looks weird... is this expected? (version 0.6.1)
>> > For example, in the default configuration:
>> >       <ColumnFamily Name="Super2"
>> >                     ColumnType="Super"
>> >                     CompareWith="UTF8Type"
>> >                     CompareSubcolumnsWith="UTF8Type"
>> >                     RowsCached="1"
>> >                     KeysCached="50%"/>
>> >
>> > 
>> > Keyspace: Keyspace1
>> >         Read Count: 0
>> >         Read Latency: NaN ms.
>> >         Write Count: 0
>> >         Write Latency: NaN ms.
>> >         Pending Tasks: 0
>> >                 Column Family: Super1
>> >                 SSTable count: 0
>> >                 Space used (live): 0
>> >                 Space used (total): 0
>> >                 Memtable Columns Count: 0
>> >                 Memtable Data Size: 0
>> >                 Memtable Switch Count: 0
>> >                 Read Count: 0
>> >                 Read Latency: NaN ms.
>> >                 Write Count: 0
>> >                 Write Latency: NaN ms.
>> >                 Pending Tasks: 0
>> >                 Key cache capacity: 20
>> >                 Key cache size: 0
>> >                 Key cache hit rate: NaN
>> >                 Row cache: disabled
>> >                 Compacted row minimum size: 0
>> >                 Compacted row maximum size: 0
>> >                 Compacted row mean size: 0
>> >                 Column Family: Super2
>> >                 SSTable count: 0
>> >                 Space used (live): 0
>> >                 Space used (total): 0
>> >                 Memtable Columns Count: 0
>> >                 Memtable Data Size: 0
>> >                 Memtable Switch Count: 0
>> >                 Read Count: 0
>> >                 Read Latency: NaN ms.
>> >                 Write Count: 0
>> >                 Write Latency: NaN ms.
>> >                 Pending Tasks: 0
>> >                 Key cache capacity: 1
>> >                 Key cache size: 0
>> >                 Key cache hit rate: NaN
>> >                 Row cache capacity: 1
>> >                 Row cache size: 0
>> >                 Row cache hit rate: NaN
>> >                 Compacted row minimum size: 0
>> >                 Compacted row maximum size: 0
>> >                 Compacted row mean size: 0
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Why Cassandra is "space inefficient" compared to MySQL?

2010-05-25 Thread Jonathan Ellis
That's true.  But fundamentally Cassandra is expected to use more
space than mysql for a few reasons; usually the biggest factor is that
Cassandra has to write out each column name in each row, since column
names are dynamic unlike in mysql where you declare the columns once
for the whole table.
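
A back-of-the-envelope sketch of the column-name effect described above. All numbers here are made up for illustration: the column names, the value sizes and the per-column overhead of 8 bytes (standing in for metadata such as the timestamp) are assumptions, not measurements of Cassandra's or MySQL's on-disk formats. Only the row count of 10 million comes from the thread.

```python
def schema_declared_bytes(rows, value_sizes):
    """Rough size when column names are declared once for the whole table."""
    return rows * sum(value_sizes.values())


def names_per_row_bytes(rows, value_sizes, per_column_overhead=8):
    """Rough size when every row repeats every column name.

    per_column_overhead is an assumed figure standing in for per-column
    metadata such as a timestamp; it is not a measured on-disk number.
    """
    per_row = sum(
        len(name.encode("utf-8")) + per_column_overhead + size
        for name, size in value_sizes.items()
    )
    return rows * per_row


# Hypothetical row layout: column name -> value size in bytes.
columns = {"user_id": 8, "created_at": 8, "status": 1}
rows = 10_000_000

fixed = schema_declared_bytes(rows, columns)
dynamic = names_per_row_bytes(rows, columns)
print(fixed, dynamic, round(dynamic / fixed, 2))
```

The shorter the values are relative to the column names, the larger the gap becomes, which is why short column names are a common space-saving habit.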

2010/5/25 Peter Schüller :
>> Could you please tell me why?
>
> There might be pending sstable removals on disk, which won't happen
> until GC or restart. If you just did a bulk insert and checked
> diskspace immediately afterwards, I think this is a possible
> explanation.
>
> (See "Write path" on http://wiki.apache.org/cassandra/ArchitectureInternals)
>
> --
> / Peter Schuller aka scode
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Hector vs cassandra-java-client

2010-05-25 Thread Jonathan Ellis
You should link it on http://wiki.apache.org/cassandra/ClientOptions
(click Login to edit)

On Tue, May 25, 2010 at 2:12 AM, Dop Sun  wrote:
> A good chance to introduce my http://code.google.com/p/jassandra/
>
> :)
>
> Another Java client, and well, it cannot be found with Cassandra java client
> keywords. :|
>
>
>
> From: Ran Tavory [mailto:ran...@gmail.com]
> Sent: Tuesday, May 25, 2010 2:52 PM
> To: user@cassandra.apache.org
> Subject: Re: Hector vs cassandra-java-client
>
>
>
> cassandra-java-client is up to cassandra's 0.4.2 version, so you probably
> can't use it out of the box.
>
> Hector is active and up to the latest 0.6.1 release with a bunch of
> committers, contributors and users.
> See http://wiki.github.com/rantav/hector/
> and http://groups.google.com/group/hector-users
>
> On Tue, May 25, 2010 at 5:36 AM, Jeff Zhang  wrote:
>
> I think hector is better, and seems the author of
> cassandra-java-client does not continue work on it.
>
>
> On Tue, May 25, 2010 at 10:21 AM, Peter Hsu  wrote:
>> Hi All,
>>
>> This may have been answered already, but I did a [quick] Google search and
>> didn't find much.  Which is the better Java client to use?  Hector or
>> cassandra-java-client or neither?
>>
>> it seems Hector is more fully featured and more active as a project in
>> general.
>>
>> What are user experiences with either library?  Any advice?
>>
>> Thanks,
>> Peter
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Why are writes faster than reads?

2010-05-25 Thread Peter Schüller
> I'm fairly certain the write path hits the commit log first, then the
> memtable.

I didn't mean to imply an ordering between the two (I probably should
not have said "memtable plus commit log"...), and yes I believe so.

-- 
/ Peter Schuller aka scode


Re: Why are writes faster than reads?

2010-05-25 Thread Mark Greene
I'm fairly certain the write path hits the commit log first, then the
memtable.

2010/5/25 Peter Schüller 

> > I have seen several off-hand mentions that writes are inherently faster
> than
> > reads. Why is this so?
>
> I believe the primary factor people are referring to is that writes
> are faster than reads in terms of disk I/O because writes are
> inherently sequential. Writes initially only happen in-memory plus in
> a (sequentially written) commit log; when flushed out to an sstable
> that is likewise sequential writing.
>
> Reads on the other hand, to the extent that they go down to disk, will
> suffer the usual overhead associated with disk seeks.
>
> See http://wiki.apache.org/cassandra/ArchitectureInternals for details.
>
> --
> / Peter Schuller aka scode
>


Cassandra configuration settings

2010-05-25 Thread sharanabasava raddi
Hi All,
Could you please suggest configuration settings for a single-node
(Windows) machine that make it both time and space efficient?





Thanks,
Sharan


Error reporting Key cache hit rate with cfstats or with JMX

2010-05-25 Thread Ran Tavory
It seems there's an error reporting the Key cache hit rate. The value is
always 0.0 and I have a feeling it's incorrect. This is seen both by using
nodetool cfstats as well as accessing JMX directly
(org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache
RecentHitRate)

  

Column Family: KvAds
SSTable count: 7
Space used (live): 1288942061
Space used (total): 1559831566
Memtable Columns Count: 73698
Memtable Data Size: 17121092
Memtable Switch Count: 33
Read Count: 3614433
Read Latency: 0.068 ms.
Write Count: 3503269
Write Latency: 0.024 ms.
Pending Tasks: 0
Key cache capacity: 1000
Key cache size: 619624
Key cache hit rate: 0.0
Row cache capacity: 1000
Row cache size: 447154
Row cache hit rate: 0.8460295730014572
Compacted row minimum size: 387
Compacted row maximum size: 31430
Compacted row mean size: 631

The Row cache hit rate looks good, 0.8 but Key cache hit rate always seems
to be 0.0 while the number of unique keys stays about 619624 for quite a
while.
Is it a real caching problem or just a reporting glitch?


Re: Why Cassandra is "space inefficient" compared to MySQL?

2010-05-25 Thread sharanabasava raddi
Hi Peter,
Thanks a lot.



Regards,
Sharan

2010/5/25 Peter Schüller 

> > Could you please tell me why?
>
> There might be pending sstable removals on disk, which won't happen
> until GC or restart. If you just did a bulk insert and checked
> diskspace immediately afterwards, I think this is a possible
> explanation.
>
> (See "Write path" on
> http://wiki.apache.org/cassandra/ArchitectureInternals)
>
> --
> / Peter Schuller aka scode
>


Re: Why are writes faster than reads?

2010-05-25 Thread Peter Schüller
> I have seen several off-hand mentions that writes are inherently faster than
> reads. Why is this so?

I believe the primary factor people are referring to is that writes
are faster than reads in terms of disk I/O because writes are
inherently sequential. Writes initially only happen in-memory plus in
a (sequentially written) commit log; when flushed out to an sstable
that is likewise sequential writing.

Reads on the other hand, to the extent that they go down to disk, will
suffer the usual overhead associated with disk seeks.

See http://wiki.apache.org/cassandra/ArchitectureInternals for details.
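
A toy model of the behaviour described above, as a minimal sketch rather than Cassandra's actual implementation. The ToyStore class, its flush threshold and the idea of counting sstable probes as stand-ins for disk seeks are all assumptions made for illustration. Writes append to a log and update an in-memory table; a flush writes one sorted run; a read that misses memory may have to look in several runs.

```python
import bisect


class ToyStore:
    """Toy write path: append to a log, update memory, flush sorted runs."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only, stands in for sequential disk writes
        self.memtable = {}     # in-memory, absorbs writes
        self.sstables = []     # each flush yields one immutable sorted list
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no seek
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Written out in key order in one pass, which is again sequential.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        """Return (value, sstables_probed); each probe stands in for a disk seek."""
        if key in self.memtable:
            return self.memtable[key], 0
        probes = 0
        for table in reversed(self.sstables):  # newest first
            probes += 1
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1], probes
        return None, probes


store = ToyStore()
for n in range(7):
    store.write(f"key{n}", n * 100)

print(store.read("key6"))  # still in the memtable
print(store.read("key0"))  # found only after probing older sstables
```

Every write costs the same regardless of how much data already exists, while the cost of a read grows with the number of flushed runs it has to check.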

-- 
/ Peter Schuller aka scode


Why are writes faster than reads?

2010-05-25 Thread David Boxenhorn
I have seen several off-hand mentions that writes are inherently faster than
reads. Why is this so?


Re: Why Cassandra is "space inefficient" compared to MySQL?

2010-05-25 Thread Peter Schüller
> Could you please tell me why?

There might be pending sstable removals on disk, which won't happen
until GC or restart. If you just did a bulk insert and checked
diskspace immediately afterwards, I think this is a possible
explanation.

(See "Write path" on http://wiki.apache.org/cassandra/ArchitectureInternals)

-- 
/ Peter Schuller aka scode


Re: Why Cassandra is "space inefficient" compared to MySQL?

2010-05-25 Thread sharanabasava raddi
Hi Cao,
Thanks for your response.

Actually, I am using ReplicationFactor = 1.




Thanks,
Sharan

2010/5/25 casablinca126.com 

>  Hi Sharan,
> What replication factor are you using?
>
> regards,
> Cao Jiguang
>
>
> 2010-05-25
> --
>  casablinca126.com
> --
>  *From:* sharanabasava raddi
> *Sent:* 2010-05-25  13:46:38
> *To:* user@cassandra.apache.org
> *Cc:*
> *Subject:* Why Cassandra is "space inefficient" compared to MySQL?
>  Hi all,
> I am running Cassandra on a Windows XP (single-node) machine.
> I inserted about 10 million records into Cassandra; it took around
> 90 minutes and 8 GB of space.
> For the same number of records, MySQL takes 3 GB of space.
>
> Could you please tell me why?
> Also, please point me to complete documentation for using the Thrift API
> from Java to talk to Cassandra on Red Hat Linux.
>
>
>
>
>
> Thanks,
> Sharan
>
>
>


RE: Hector vs cassandra-java-client

2010-05-25 Thread Dop Sun
A good chance to introduce my http://code.google.com/p/jassandra/

:)

Another Java client, and well, it cannot be found with Cassandra java client 
keywords. :|

 

From: Ran Tavory [mailto:ran...@gmail.com] 
Sent: Tuesday, May 25, 2010 2:52 PM
To: user@cassandra.apache.org
Subject: Re: Hector vs cassandra-java-client

 

cassandra-java-client is up to cassandra's 0.4.2 version, so you probably can't 
use it out of the box.

Hector is active and up to the latest 0.6.1 release with a bunch of committers, 
contributors and users. See http://wiki.github.com/rantav/hector/ and 
http://groups.google.com/group/hector-users

On Tue, May 25, 2010 at 5:36 AM, Jeff Zhang  wrote:

I think hector is better, and seems the author of
cassandra-java-client does not continue work on it.




On Tue, May 25, 2010 at 10:21 AM, Peter Hsu  wrote:
> Hi All,
>
> This may have been answered already, but I did a [quick] Google search and 
> didn't find much.  Which is the better Java client to use?  Hector or 
> cassandra-java-client or neither?
>
> it seems Hector is more fully featured and more active as a project in 
> general.
>
> What are user experiences with either library?  Any advice?
>
> Thanks,
> Peter




--
Best Regards

Jeff Zhang