Re: unsubscribe

2011-01-06 Thread Robert Coli
On Thu, Jan 6, 2011 at 10:52 PM, Nichole Kulobone  wrote:
> [ unsubscribe ]

http://wiki.apache.org/cassandra/FAQ#unsubscribe
"
Q : How do I unsubscribe from the email list?

A : Send an email to user-unsubscr...@cassandra.apache.org
"

=Rob


Re: quick question about super columns

2011-01-06 Thread Arijit Mukherjee
Thanx to both of you. I can now go ahead a bit more.

Arijit

On 7 January 2011 12:53, Narendra Sharma  wrote:
> With raw thrift APIs:
>
> 1. Fetch column from supercolumn:
>
> ColumnPath cp = new ColumnPath("ColumnFamily");
> cp.setSuper_column("SuperColumnName");
> cp.setColumn("ColumnName");
> ColumnOrSuperColumn resp = client.get(getByteBuffer("RowKey"), cp,
> ConsistencyLevel.ONE);
> Column c = resp.getColumn();
>
> 2. Add a new supercolumn:
>
>     SuperColumn superColumn = new SuperColumn();
>     superColumn.setName(getBytes("SuperColumnName"));
>     List<Column> cols = new ArrayList<Column>();
>     Column c = new Column();
>     c.setName(name);
>     c.setValue(value);
>     c.setTimestamp(timeStamp);
>     cols.add(c);
>     // repeat the above 5 lines for as many cols as you want in the supercolumn
>     superColumn.setColumns(cols);
>
>
>     List<Mutation> mutations = new ArrayList<Mutation>();
>     ColumnOrSuperColumn csc = new ColumnOrSuperColumn();
>     csc.setSuper_column(superColumn);
>     csc.setSuper_columnIsSet(true);
>     Mutation m = new Mutation();
>     m.setColumn_or_supercolumn(csc);
>     m.setColumn_or_supercolumnIsSet(true);
>     mutations.add(m);
>
>
>     Map<String, List<Mutation>> allMutations = new HashMap<String, List<Mutation>>();
>     allMutations.put("ColumnFamilyName", mutations);
>     Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
>         new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
>     mutationMap.put(getByteBuffer("RowKey"), allMutations);
>     client.batch_mutate(mutationMap, ConsistencyLevel.ONE);
>
> HTH!
>
> Thanks,
> Naren
>
>
>
> On Thu, Jan 6, 2011 at 10:42 PM, Arijit Mukherjee 
> wrote:
>>
>> Thank you. And is it similar if I want to search a subcolumn within a
>> given supercolumn? I mean I have the supercolumn key and the subcolumn
>> key - can I fetch the particular subcolumn?
>>
>> Can you share a small piece of example code for both?
>>
>> I'm still new into this and trying to figure out the Thrift APIs. I
>> attempted to use Hector, but got myself into more confusion.
>>
>> Arijit
>>
>> On 7 January 2011 11:44, Roshan Dawrani  wrote:
>> >
>> > On Fri, Jan 7, 2011 at 11:39 AM, Arijit Mukherjee 
>> > wrote:
>> >>
>> >> Hi
>> >>
>> >> I've a quick question about supercolumns.
>> >> EventRecord = {
>> >>    eventKey2: {
>> >>        e2-ts1: {set of columns},
>> >>        e2-ts2: {set of columns},
>> >>        ...
>> >>        e2-tsn: {set of columns}
>> >>    }
>> >>    
>> >> }
>> >>
>> >> If I want to append another "e2-tsp: {set of columns}" to the event
>> >> record keyed by eventKey2, do I need to retrieve the entire eventKey2
>> >> map, and then append this new row and re-insert eventKey2?
>> >
>> > No, you can simply insert a new super column with its sub-columns under
>> > the row key that you want, and it will join the other super columns of
>> > that row.
>> >
>> > A row can have billions of super columns. Imagine fetching them all just
>> > to add one more super column to it.
>> >
>> >
>> >
>> >
>> >
>>
>>
>> --
>> "And when the night is cloudy,
>> There is still a light that shines on me,
>> Shine on until tomorrow, let it be."
>
>



-- 
"And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be."


Re: quick question about super columns

2011-01-06 Thread Narendra Sharma
With raw thrift APIs:

1. Fetch column from supercolumn:

ColumnPath cp = new ColumnPath("ColumnFamily");
cp.setSuper_column("SuperColumnName");
cp.setColumn("ColumnName");
ColumnOrSuperColumn resp = client.get(getByteBuffer("RowKey"), cp,
ConsistencyLevel.ONE);
Column c = resp.getColumn();

2. Add a new supercolumn:

SuperColumn superColumn = new SuperColumn();
superColumn.setName(getBytes("SuperColumnName"));
List<Column> cols = new ArrayList<Column>();
Column c = new Column();
c.setName(name);
c.setValue(value);
c.setTimestamp(timeStamp);
cols.add(c);
// repeat the above 5 lines for as many cols as you want in the supercolumn
superColumn.setColumns(cols);


List<Mutation> mutations = new ArrayList<Mutation>();
ColumnOrSuperColumn csc = new ColumnOrSuperColumn();
csc.setSuper_column(superColumn);
csc.setSuper_columnIsSet(true);
Mutation m = new Mutation();
m.setColumn_or_supercolumn(csc);
m.setColumn_or_supercolumnIsSet(true);
mutations.add(m);


Map<String, List<Mutation>> allMutations = new HashMap<String, List<Mutation>>();
allMutations.put("ColumnFamilyName", mutations);
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
    new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
mutationMap.put(getByteBuffer("RowKey"), allMutations);
client.batch_mutate(mutationMap, ConsistencyLevel.ONE);
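The three levels of nesting that batch_mutate expects are easy to mix up. As a sanity check, here is the same shape built with plain strings standing in for the Thrift Mutation objects (a hypothetical, self-contained sketch, not part of the original reply):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// batch_mutate takes: row key -> (column family name -> list of mutations).
// Plain strings stand in for Thrift Mutation objects so this compiles on its own.
public class MutationMapShape {
    static Map<ByteBuffer, Map<String, List<String>>> build() {
        List<String> mutations = new ArrayList<String>();
        mutations.add("insert SuperColumnName");            // one mutation

        Map<String, List<String>> allMutations = new HashMap<String, List<String>>();
        allMutations.put("ColumnFamilyName", mutations);    // per-column-family list

        Map<ByteBuffer, Map<String, List<String>>> mutationMap =
                new HashMap<ByteBuffer, Map<String, List<String>>>();
        // the outer map takes the per-CF map (allMutations), not the raw list
        mutationMap.put(ByteBuffer.wrap("RowKey".getBytes()), allMutations);
        return mutationMap;
    }

    public static void main(String[] args) {
        System.out.println(build().size());
    }
}
```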

HTH!

Thanks,
Naren



On Thu, Jan 6, 2011 at 10:42 PM, Arijit Mukherjee wrote:

> Thank you. And is it similar if I want to search a subcolumn within a
> given supercolumn? I mean I have the supercolumn key and the subcolumn
> key - can I fetch the particular subcolumn?
>
> Can you share a small piece of example code for both?
>
> I'm still new into this and trying to figure out the Thrift APIs. I
> attempted to use Hector, but got myself into more confusion.
>
> Arijit
>
> On 7 January 2011 11:44, Roshan Dawrani  wrote:
> >
> > On Fri, Jan 7, 2011 at 11:39 AM, Arijit Mukherjee 
> wrote:
> >>
> >> Hi
> >>
> >> I've a quick question about supercolumns.
> >> EventRecord = {
> >>     eventKey2: {
> >>         e2-ts1: {set of columns},
> >>         e2-ts2: {set of columns},
> >>         ...
> >>         e2-tsn: {set of columns}
> >>     }
> >>
> >> }
> >>
> >> If I want to append another "e2-tsp: {set of columns}" to the event
> >> record keyed by eventKey2, do I need to retrieve the entire eventKey2
> >> map, and then append this new row and re-insert eventKey2?
> >
> > No, you can simply insert a new super column with its sub-columns under
> > the row key that you want, and it will join the other super columns of
> > that row.
> >
> > A row can have billions of super columns. Imagine fetching them all just
> > to add one more super column to it.
> >
> >
> >
> >
> >
>
>
> --
> "And when the night is cloudy,
> There is still a light that shines on me,
> Shine on until tomorrow, let it be."
>


Re: The CLI sometimes gets 100 results even though there are more, and sometimes gets more than 100

2011-01-06 Thread Stu Hood
What version of Cassandra were you testing with?

On Wed, Jan 5, 2011 at 6:11 AM, David Boxenhorn  wrote:

> The CLI sometimes gets only 100 results (even though there are more) - and
> sometimes gets all the results, even when there are more than 100!
>
> What is going on here? Is there some logic that says if there are too many
> results return 100, even though "too many" can be more than 100?
>


Re: quick question about super columns

2011-01-06 Thread Roshan Dawrani
On Fri, Jan 7, 2011 at 12:12 PM, Arijit Mukherjee wrote:

> Thank you. And is it similar if I want to search a subcolumn within a
> given supercolumn? I mean I have the supercolumn key and the subcolumn
> key - can I fetch the particular subcolumn?
>
> Can you share a small piece of example code for both?
>

Have you gone through the examples  @
https://github.com/rantav/hector/blob/master/core/src/test/java/me/prettyprint/hector/api/ApiV2SystemTest.java?

For now, here are some snippets to help you:

1) So, say, for a given row and a given super-column, if you want to fetch
one particular sub-column, this is how you can do it (the ".." are
placeholders for your own serializers, keys, and column names):

SubSliceQuery subSliceQuery = HFactory.createSubSliceQuery(keyspace, .., ..,
    .., ..)
subSliceQuery.setColumnFamily(..)
    .setKey(..).setSuperColumn(..).setRange(subCol, subCol, false, 1)

You can manipulate the range, if you want more than 1 sub-column from there.

2) Inserting a new super-column (with its sub-columns) in a row

HSuperColumn superCol = HFactory.createSuperColumn(..,
    [HFactory.createColumn(.., new byte[0], US, BAS)],
    SS, US, BAS) /* SS, US, etc. are serializers; ".." are placeholders. */

mutator.addInsertion(.., .., superCol)

This example inserts in a row a new super-column that has 1 sub-column (you
can add as many as you like to the list of sub-columns being passed).




unsubscribe

2011-01-06 Thread Nichole Kulobone

  

Re: quick question about super columns

2011-01-06 Thread Arijit Mukherjee
Thank you. And is it similar if I want to search a subcolumn within a
given supercolumn? I mean I have the supercolumn key and the subcolumn
key - can I fetch the particular subcolumn?

Can you share a small piece of example code for both?

I'm still new into this and trying to figure out the Thrift APIs. I
attempted to use Hector, but got myself into more confusion.

Arijit

On 7 January 2011 11:44, Roshan Dawrani  wrote:
>
> On Fri, Jan 7, 2011 at 11:39 AM, Arijit Mukherjee  wrote:
>>
>> Hi
>>
>> I've a quick question about supercolumns.
>> EventRecord = {
>>    eventKey2: {
>>        e2-ts1: {set of columns},
>>        e2-ts2: {set of columns},
>>        ...
>>        e2-tsn: {set of columns}
>>    }
>>    
>> }
>>
>> If I want to append another "e2-tsp: {set of columns}" to the event
>> record keyed by eventKey2, do I need to retrieve the entire eventKey2
>> map, and then append this new row and re-insert eventKey2?
>
> No, you can simply insert a new super column with its sub-columns with the 
> rowKey that you want, and it will join the other super columns of that row.
>
> A row have billions of super columns. Imagine fetching them all, just to add 
> one more super column into it.
>
>
>
>
>


--
"And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be."


Re: quick question about super columns

2011-01-06 Thread Roshan Dawrani
On Fri, Jan 7, 2011 at 11:39 AM, Arijit Mukherjee wrote:

> Hi
>
> I've a quick question about supercolumns.
> EventRecord = {
>     eventKey2: {
>         e2-ts1: {set of columns},
>         e2-ts2: {set of columns},
>         ...
>         e2-tsn: {set of columns}
>     }
>
> }
>
> If I want to append another "e2-tsp: {set of columns}" to the event
> record keyed by eventKey2, do I need to retrieve the entire eventKey2
> map, and then append this new row and re-insert eventKey2?
>

No, you can simply insert a new super column with its sub-columns under the
row key that you want, and it will join the other super columns of that row.

A row can have billions of super columns. Imagine fetching them all just to
add one more super column to it.


quick question about super columns

2011-01-06 Thread Arijit Mukherjee
Hi

I've a quick question about supercolumns. Say I've a structure like
this (based on the super column family structure mentioned in "WTF is a
SuperColumn?"):

EventRecord = {
    eventKey1: {
        e1-ts1: {set of columns},
        e1-ts2: {set of columns},
        ...
        e1-tsn: {set of columns}
    }, // end row
    eventKey2: {
        e2-ts1: {set of columns},
        e2-ts2: {set of columns},
        ...
        e2-tsn: {set of columns}
    }

}

If I want to append another "e2-tsp: {set of columns}" to the event
record keyed by eventKey2, do I need to retrieve the entire eventKey2
map, and then append this new row and re-insert eventKey2?

Regards
Arijit


-- 
"And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be."


Re: Newbie question - logsandra Flume Sink plugin.

2011-01-06 Thread Tyler Hobbs
Hi Derek,

The Cassandra sink currently only works with Cassandra 0.7, although I don't
suppose that is what is causing the 'InvalidSink' problem.

What version of Flume are you using?  The plugin works with Flume 0.9.1, but
for 0.9.2 they upgraded the Thrift library version that the project uses and
I haven't had the time yet to update the plugin.  This would normally be a
compile-time error, so I imagine 'InvalidSink' is Flume's response to that.

Adding support for Cassandra 0.6 and Flume 0.9.2 is on my TODO list.
If either of those is a hard version requirement for you, I could make the
effort sooner rather than later.

I'm glad you're checking it out!  Let me know if you think of any
improvements once you get it working :)

- Tyler

On Thu, Jan 6, 2011 at 8:22 PM, Deeter, Derek <
derek.dee...@digitalinsight.com> wrote:

> Hi -
>
> I'm not sure if this should be asked on the Cassandra or Flume list, so I'm
> trying both -
>
> I am trying to do a proof of concept for logging into Cassandra.  We need
> to capture large volumes of audit events in a non-lossy manner at the same
> time being able to quickly access those events for various reports, so this
> Flume-Cassandra combination seems to be just the ticket.  For this POC, I've
> set up a ring/cluster of 4 Cassandra nodes and they are all running
> successfully it seems (version 0.6.8), and I can do the simple entry and
> access of data by hand using the cli.
>
> Next I am trying to use Flume to push log data into Cassandra - I've set up
> the logsandra keyspace for logsandraSyslogSink and the extra column families
> for the simpleCassandraSink, set up the plugins for Flume, updated the
> flume-site.xml to add the plugin references, and set the $FLUME_CLASSPATH
> all as mentioned on the https://github.com/thobbs/flume-cassandra-plugin page.
>  However, I always get an "Invalid sink" error when I try to configure
> a Flume node to have Cassandra as a sink.  Neither of these commands work:
>
> 172.16.199.166 : collectorSource(35853) |
> simpleCassandraSink("KeySpace1","data","FlumeIndexes","172.16.200.130:7005
> ");
> 172.16.199.166 : collectorSource(35853) | logsandraSyslogSink("
> 172.16.200.130:7005");
>
> Can anyone please tell me if this is the correct command, or what the
> format should be?  It is not really mentioned in the instructions.
>
> I do see the following when starting the flume nodes, but not on the flume
> master:
> 2011-01-06 17:12:06,586 [main] INFO conf.SinkFactoryImpl: Found sink
> builder simpleCassandraSink in
> org.apache.cassandra.plugins.SimpleCassandraSink
> 2011-01-06 17:12:06,586 [main] WARN conf.SinkFactoryImpl: No sink
> decorators found in org.apache.cassandra.plugins.SimpleCassandraSink
> 2011-01-06 17:12:06,586 [main] INFO conf.SinkFactoryImpl: Found sink
> builder logsandraSyslogSink in
> org.apache.cassandra.plugins.LogsandraSyslogSink
> 2011-01-06 17:12:06,586 [main] WARN conf.SinkFactoryImpl: No sink
> decorators found in org.apache.cassandra.plugins.LogsandraSyslogSink
>
> So it looks like the plugins are being loaded - without going into the
> source I am at a loss at the 'Invalid Sink' error.  Any information
> appreciated -
>
>Thanks in advance,
>-Derek
>
>
> --
> Derek Deeter, Sr. Software Engineer     Intuit Financial Services
> derek.dee...@digitalinsight.com         5601 Lindero Canyon Road, Westlake, CA 91362
>
>
>


Newbie question - logsandra Flume Sink plugin.

2011-01-06 Thread Deeter, Derek
Hi -

I'm not sure if this should be asked on the Cassandra or Flume list, so I'm 
trying both - 

I am trying to do a proof of concept for logging into Cassandra.  We need to 
capture large volumes of audit events in a non-lossy manner at the same time 
being able to quickly access those events for various reports, so this 
Flume-Cassandra combination seems to be just the ticket.  For this POC, I've 
set up a ring/cluster of 4 Cassandra nodes and they are all running 
successfully it seems (version 0.6.8), and I can do the simple entry and access 
of data by hand using the cli.

Next I am trying to use Flume to push log data into Cassandra - I've set up the 
logsandra keyspace for logsandraSyslogSink and the extra column families for 
the simpleCassandraSink, set up the plugins for Flume, updated the 
flume-site.xml to add the plugin references, and set the $FLUME_CLASSPATH all 
as mentioned on the https://github.com/thobbs/flume-cassandra-plugin page.  
However, I always get an "Invalid sink" error when I try to configure a Flume 
node to have Cassandra as a sink.  Neither of these commands work:

172.16.199.166 : collectorSource(35853) | 
simpleCassandraSink("KeySpace1","data","FlumeIndexes","172.16.200.130:7005");
172.16.199.166 : collectorSource(35853) | 
logsandraSyslogSink("172.16.200.130:7005");

Can anyone please tell me if this is the correct command, or what the format 
should be?  It is not really mentioned in the instructions.

I do see the following when starting the flume nodes, but not on the flume 
master:
2011-01-06 17:12:06,586 [main] INFO conf.SinkFactoryImpl: Found sink builder 
simpleCassandraSink in org.apache.cassandra.plugins.SimpleCassandraSink
2011-01-06 17:12:06,586 [main] WARN conf.SinkFactoryImpl: No sink decorators 
found in org.apache.cassandra.plugins.SimpleCassandraSink
2011-01-06 17:12:06,586 [main] INFO conf.SinkFactoryImpl: Found sink builder 
logsandraSyslogSink in org.apache.cassandra.plugins.LogsandraSyslogSink
2011-01-06 17:12:06,586 [main] WARN conf.SinkFactoryImpl: No sink decorators 
found in org.apache.cassandra.plugins.LogsandraSyslogSink

So it looks like the plugins are being loaded - without going into the source I 
am at a loss at the 'Invalid Sink' error.  Any information appreciated -

Thanks in advance,
-Derek


--
Derek Deeter, Sr. Software Engineer   Intuit Financial Services
  5601 Lindero Canyon Road.
derek.dee...@digitalinsight.com    Westlake, CA 91362




Re: Reclaim deleted rows space

2011-01-06 Thread Jonathan Shook
I believe the following condition within submitMinorIfNeeded(...)
determines whether to continue, so it's not a hard loop.

// if (sstables.size() >= minThreshold) ...
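That guard can be sketched as follows (a hypothetical simplification, not the actual submitMinorIfNeeded code): compaction only proceeds once enough sstables have accumulated, and with minimumCompactionThreshold = 1 a single sstable always qualifies, which is the suspected source of the loop in this thread.

```java
public class CompactionGuard {
    // Simplified stand-in for the quoted condition: minor compaction runs
    // only when the number of candidate sstables reaches the configured
    // minimum. At minThreshold = 1, any single sstable qualifies.
    static boolean shouldCompact(int sstableCount, int minThreshold) {
        return sstableCount >= minThreshold;
    }

    public static void main(String[] args) {
        System.out.println(shouldCompact(1, 4));  // below threshold: skip
        System.out.println(shouldCompact(1, 1));  // threshold 1: always compact
    }
}
```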



On Thu, Jan 6, 2011 at 2:51 AM, shimi  wrote:
> According to the code it makes sense.
> submitMinorIfNeeded() calls doCompaction() which
> calls submitMinorIfNeeded().
> With minimumCompactionThreshold = 1 submitMinorIfNeeded() will always run
> compaction.
>
> Shimi
> On Thu, Jan 6, 2011 at 10:26 AM, shimi  wrote:
>>
>>
>> On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis  wrote:
>>>
>>> Pretty sure there's logic in there that says "don't bother compacting
>>> a single sstable."
>>
>> No. You can do it.
>> Based on the log I have a feeling that it triggers an infinite compaction
>> loop.
>>
>>>
>>> On Wed, Jan 5, 2011 at 2:26 PM, shimi  wrote:
>>> > How is minor compaction triggered? Is it triggered only when a new
>>> > SSTable is added?
>>> >
>>> > I was wondering if triggering a compaction
>>> > with minimumCompactionThreshold
>>> > set to 1 would be useful. If this can happen I assume it will do
>>> > compaction
>>> > on files with similar size and remove deleted rows on the rest.
>>> > Shimi
>>> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller
>>> > 
>>> > wrote:
>>> >>
>>> >> > I don't have a problem with disk space. I have a problem with the
>>> >> > data
>>> >> > size.
>>> >>
>>> >> [snip]
>>> >>
>>> >> > Bottom line is that I want to reduce the number of requests that
>>> >> > goes to
>>> >> > disk. Since there is enough data that is no longer valid I can do it
>>> >> > by
>>> >> > reclaiming the space. The only way to do it is by running Major
>>> >> > compaction.
>>> >> > I can wait and let Cassandra do it for me but then the data size
>>> >> > will
>>> >> > get
>>> >> > even bigger and the response time will be worse. I can do it
>>> >> > manually
>>> >> > but I
>>> >> > prefer it to happen in the background with less impact on the system
>>> >>
>>> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
>>> >>
>>> >> So essentially, for workloads that are teetering on the edge of cache
>>> >> warmness and is subject to significant overwrites or removals, it may
>>> >> be beneficial to perform much more aggressive background compaction
>>> >> even though it might waste lots of CPU, to keep the in-memory working
>>> >> set down.
>>> >>
>>> >> There was talk (I think in the compaction redesign ticket) about
>>> >> potentially improving the use of bloom filters such that obsolete data
>>> >> in sstables could be eliminated from the read set without
>>> >> necessitating actual compaction; that might help address cases like
>>> >> these too.
>>> >>
>>> >> I don't think there's a pre-existing silver bullet in a current
>>> >> release; you probably have to live with the need for
>>> >> greater-than-theoretically-optimal memory requirements to keep the
>>> >> working set in memory.
>>> >>
>>> >> --
>>> >> / Peter Schuller
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>>
>
>


Re: unsubscribe

2011-01-06 Thread Brandon Williams
http://wiki.apache.org/cassandra/FAQ#unsubscribe


unsubscribe

2011-01-06 Thread rambabu pakala



  

Re: cassandra 0.7.0 noob question

2011-01-06 Thread Narendra Sharma
The schema is not loaded from cassandra.yaml by default. You need to either
load it through jconsole or define it through the CLI. Please read the
following page for details:
http://wiki.apache.org/cassandra/LiveSchemaUpdates

Also look for "Where are my keyspaces" on following page:
http://wiki.apache.org/cassandra/StorageConfiguration
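For example, defining the equivalent schema through the 0.7 CLI might look like this (a sketch only; the exact cassandra-cli option syntax varies between 0.7 releases):

```
[default@unknown] create keyspace Keyspace1 with replication_factor = 1;
[default@unknown] use Keyspace1;
[default@Keyspace1] create column family Standard1;
[default@Keyspace1] show keyspaces;
```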

Thanks,
Naren

On Thu, Jan 6, 2011 at 2:00 PM, felix gao  wrote:

> Hi all,
>
> I started cassandra with everything untouched in the conf folder. When I
> examine the cassandra.yaml file, there seems to be a default keyspace
> defined like below.
> keyspaces:
>     - name: Keyspace1
>       replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
>       replication_factor: 1
>       column_families:
>         - name: Standard1
>
> My question is: when I run cassandra-cli and "show keyspaces;", only the
> system keyspace is there.  What is going on?
>
> Thanks,
>
> Felix
>
>


Re: cassandra 0.7.0 noob question

2011-01-06 Thread CassUser CassUser
Cassandra doesn't load any keyspaces by default.  You have to do it manually
once, using the loadSchemaFromYAML method exposed via JMX.

On Thu, Jan 6, 2011 at 2:00 PM, felix gao  wrote:

> Hi all,
>
> I started cassandra with everything untouched in the conf folder. When I
> examine the cassandra.yaml file, there seems to be a default keyspace
> defined like below.
> keyspaces:
>     - name: Keyspace1
>       replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
>       replication_factor: 1
>       column_families:
>         - name: Standard1
>
> My question is: when I run cassandra-cli and "show keyspaces;", only the
> system keyspace is there.  What is going on?
>
> Thanks,
>
> Felix
>
>


cassandra 0.7.0 noob question

2011-01-06 Thread felix gao
Hi all,

I started cassandra with everything untouched in the conf folder. When I
examine the cassandra.yaml file, there seems to be a default keyspace
defined like below.
keyspaces:
    - name: Keyspace1
      replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
      replication_factor: 1
      column_families:
        - name: Standard1

My question is: when I run cassandra-cli and "show keyspaces;", only the
system keyspace is there.  What is going on?

Thanks,

Felix


Re: maven cassandra plugin

2011-01-06 Thread Stephen Connolly
already doing that

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 6 Jan 2011 21:09, "Ran Tavory"  wrote:


Re: maven cassandra plugin

2011-01-06 Thread Stephen Connolly
fine by me

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 6 Jan 2011 21:40, "Jonathan Ellis"  wrote:
> We're planning to clean out contrib:
> https://issues.apache.org/jira/browse/CASSANDRA-1805
>
> Maybe tools?
>
> On Thu, Jan 6, 2011 at 2:43 PM, Stephen Connolly
>  wrote:
>> I nearly have one ready...
>>
>> my plan is to have it added to contrib... if the cassandra devs agree
>>
>> -stephen
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on
the
>> screen
>>
>> On 6 Jan 2011 19:38, "B. Todd Burruss"  wrote:
>>> has anyone created a maven plugin, like cargo for tomcat, for automating
>>> starting/stopping a cassandra instance?
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com


Re: maven cassandra plugin

2011-01-06 Thread Stephen Connolly
testers welcome

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 6 Jan 2011 20:45, "B. Todd Burruss"  wrote:
> would u like some testers? we were about to write one.
>
> On 01/06/2011 12:43 PM, Stephen Connolly wrote:
>>
>> I nearly have one ready...
>>
>> my plan is to have it added to contrib... if the cassandra devs agree
>>
>> -stephen
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random
>> nonsense words and other nonsense are a direct result of using swype
>> to type on the screen
>>
>> On 6 Jan 2011 19:38, "B. Todd Burruss" wrote:
>> > has anyone created a maven plugin, like cargo for tomcat, for
>> automating
>> > starting/stopping a cassandra instance?


Re: maven cassandra plugin

2011-01-06 Thread Stephen Connolly
capistrano is a different use case. the maven plugin is for integration
testing, not live deployment

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 6 Jan 2011 21:32, "shimi"  wrote:


Re: maven cassandra plugin

2011-01-06 Thread B. Todd Burruss
Starting cassandra within maven is very useful (needed) for automated
integration testing.  However, I'm interested in deployment tools as well.


On 01/06/2011 01:32 PM, shimi wrote:


I use Capistrano for install, upgrades, start, stop and restart.
I use it for other projects as well.
It is very useful for automated tasks that need to run on multiple
machines.


Shimi

On 2011 1 6 21:38, "B. Todd Burruss" wrote:


has anyone created a maven plugin, like cargo for tomcat, for 
automating starting/stopping a cassandra instance?


Re: maven cassandra plugin

2011-01-06 Thread Jonathan Ellis
We're planning to clean out contrib:
https://issues.apache.org/jira/browse/CASSANDRA-1805

Maybe tools?

On Thu, Jan 6, 2011 at 2:43 PM, Stephen Connolly
 wrote:
> I nearly have one ready...
>
> my plan is to have it added to contrib... if the cassandra devs agree
>
> -stephen
>
> - Stephen
>
> ---
> Sent from my Android phone, so random spelling mistakes, random nonsense
> words and other nonsense are a direct result of using swype to type on the
> screen
>
> On 6 Jan 2011 19:38, "B. Todd Burruss"  wrote:
>> has anyone created a maven plugin, like cargo for tomcat, for automating
>> starting/stopping a cassandra instance?
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: maven cassandra plugin

2011-01-06 Thread shimi
I use Capistrano for install, upgrades, start, stop and restart.
I use it for other projects as well.
It is very useful for automated tasks that need to run on multiple machines.

Shimi

On 2011 1 6 21:38, "B. Todd Burruss"  wrote:

has anyone created a maven plugin, like cargo for tomcat, for automating
starting/stopping a cassandra instance?


Re: maven cassandra plugin

2011-01-06 Thread Ran Tavory
Stephen, just FYI: cassandra cannot be stopped cleanly. Its JVM must
be taken down. So the plugin would probably need to fork a JVM and
kill it when it's done.
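A minimal sketch of that fork-then-kill approach (assuming a POSIX `sleep` binary as a stand-in for the forked Cassandra JVM; not the plugin's actual code):

```java
import java.io.IOException;

// Launch the daemon in a child process and destroy that process at stop
// time, since it cannot be shut down cleanly in-process. "sleep" stands in
// for the forked Cassandra JVM to keep the example self-contained.
public class ForkAndKill {
    static boolean runOnce() throws IOException, InterruptedException {
        Process child = new ProcessBuilder("sleep", "60").start();
        boolean wasAlive = child.isAlive();
        child.destroy();   // what the plugin's stop goal would do
        child.waitFor();   // reap the child so isAlive() reflects reality
        return wasAlive && !child.isAlive();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("fork-and-kill worked: " + runOnce());
    }
}
```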

On Thursday, January 6, 2011, B. Todd Burruss  wrote:
>
>
>
>
>
>
> would u like some testers?  we were about to write one.
>
> On 01/06/2011 12:43 PM, Stephen Connolly wrote:
>
>   I nearly have one ready...
>   my plan is to have it added to contrib... if the cassandra devs
> agree
>   -stephen
>   - Stephen
>   ---
> Sent from my Android phone, so random spelling mistakes, random
> nonsense words and other nonsense are a direct result of using
> swype to type on the screen
>   On 6 Jan 2011 19:38, "B. Todd Burruss"
> 
> wrote:
> > has anyone created a maven plugin, like cargo for tomcat,
> for automating
> > starting/stopping a cassandra instance?
>
>
>
>
>

-- 
/Ran


Re: maven cassandra plugin

2011-01-06 Thread B. Todd Burruss

would u like some testers?  we were about to write one.

On 01/06/2011 12:43 PM, Stephen Connolly wrote:


I nearly have one ready...

my plan is to have it added to contrib... if the cassandra devs agree

-stephen

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random 
nonsense words and other nonsense are a direct result of using swype 
to type on the screen


On 6 Jan 2011 19:38, "B. Todd Burruss" wrote:
> has anyone created a maven plugin, like cargo for tomcat, for 
automating

> starting/stopping a cassandra instance?


Re: maven cassandra plugin

2011-01-06 Thread Stephen Connolly
I nearly have one ready...

my plan is to have it added to contrib... if the cassandra devs agree

-stephen

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 6 Jan 2011 19:38, "B. Todd Burruss"  wrote:
> has anyone created a maven plugin, like cargo for tomcat, for automating
> starting/stopping a cassandra instance?


Re: How can I correct this Cassandra load imbalance?

2011-01-06 Thread ian douglas

Thanks Richard!

strings UserGameshareData-*-Index.db | grep ':' | wc -l

Node 1:
strings/grep/wc: 979,123
space used: 2,061,497,786

Node 2:
strings/grep/wc: 443,558
space used: 854,213,778

Node 3:
strings/grep/wc: 2,103,294
space used: 4,505,048,405



On 01/06/2011 11:43 AM, Robert Coli wrote:

On Thu, Jan 6, 2011 at 10:50 AM, ian douglas  wrote:

Is there any way to determine via a "nodetool cfstats" (or similar) how many
rows we have per column family to help answer your second question a little
better?

In 0.6, you can get an (inexact, but probably sufficient for this
purpose) estimate of this by counting the number of lines in :

"strings ColumnFamily-*-Index.db |grep :"

=Rob


Re: How can I correct this Cassandra load imbalance?

2011-01-06 Thread Robert Coli
On Thu, Jan 6, 2011 at 10:50 AM, ian douglas  wrote:
> Is there any way to determine via a "nodetool cfstats" (or similar) how many
> rows we have per column family to help answer your second question a little
> better?

In 0.6, you can get an (inexact, but probably sufficient for this
purpose) estimate of this by counting the number of lines in :

"strings ColumnFamily-*-Index.db |grep :"

=Rob


maven cassandra plugin

2011-01-06 Thread B. Todd Burruss
has anyone created a maven plugin, like cargo for tomcat, for automating 
starting/stopping a cassandra instance?


Re: How can I correct this Cassandra load imbalance?

2011-01-06 Thread ian douglas
We're currently on 0.6.0 waiting for the full release of 0.7 before we 
upgrade. We have other Thrift/PHP code to update whenever we upgrade 
Cassandra, so we don't want to upgrade to a release candidate on our 
production system.


We *did* have a problem with a column family setup where we had few rows 
(probably hundreds?), with those few rows exceeding 100MB in size, so we
migrated that column family's data to a new column family and stored 
that old data into what would now be hundreds of thousands of rows.
Our largest row size right now is in the ballpark of a few hundred 
kilobytes.


Is there any way to determine via a "nodetool cfstats" (or similar) how 
many rows we have per column family to help answer your second question 
a little better? I do know from our migration that we created something 
like 30,000 smaller rows for each of the bigger rows, and then we 
deleted that old column family from our nodes (first by removing it from 
the XML configuration, then by deleting files at the OS level). When our 
migration finished, we *still* saw this large imbalance, which is what 
prompted my questions, and led to "nodetool move" to reset our token 
values, etc., but even running cleanups, flushes and repairs on each 
node individually, we're still left with this imbalanced load.
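The evenly spaced tokens typically fed to "nodetool move" for rebalancing can be computed as below. A sketch under one assumption: RandomPartitioner's token space of [0, 2^127), as in 0.6:

```python
def balanced_tokens(node_count: int) -> list[int]:
    # RandomPartitioner hashes keys into [0, 2**127); evenly spaced
    # tokens give each node an equal slice of that range.
    return [i * (2**127 // node_count) for i in range(node_count)]

# Print one `nodetool move` target per node in a 4-node ring.
for rank, token in enumerate(balanced_tokens(4)):
    print(f"node {rank}: nodetool move {token}")
```

Note that evenly spaced tokens balance the key *ranges*; an imbalance in row sizes or row counts per range can still leave load skewed, which is the possibility Peter raises below.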


Thanks for your help. Let me know if there's any additional information 
I can give.



On 01/06/2011 10:39 AM, Peter Schuller wrote:

I posted row sizes (min/max/mean) of our largest data set in my original
message, but had zero responses on the mailing list. The folks in IRC told
me to wait it out and see if it rebalanced on its own (it didn't), or to run a
repair on each node one at a time (didn't help), and that it wasn't a big
concern until we had "dozens of GBs" worth of data.

Ok. It may not be a concern practically right now, but an unexplained
imbalance is not good. First off, is this the very latest 0.6 release
or one of the 0.7 RCs, or is this an old 0.6? Not that I
remember off hand whether there were any bugs fixed in the 0.6 series
that would explain this particular behavior, but it's probably a good
start to ask if you have the latest version.

Also, you mentioned originally that "Our row min/max/mean values are
mostly the same". I'm not entirely positive what you are referring to;
the important points I wanted to ask about are:

(1) Do you have "many" keys (say, thousands or more) so that there
should be no statistically significant imbalance between the nodes in
terms of the *number* of rows?

(2) How sure are you about the distribution of row sizes; is it
possible you have a small number of very large rows that are screwing
up the statistics?



Re: How can I correct this Cassandra load imbalance?

2011-01-06 Thread Peter Schuller
> I posted row sizes (min/max/mean) of our largest data set in my original
> message, but had zero responses on the mailing list. The folks in IRC told
> me to wait it out and see if it rebalanced on its own (it didn't), or to run a
> repair on each node one at a time (didn't help), and that it wasn't a big
> concern until we had "dozens of GBs" worth of data.

Ok. It may not be a concern practically right now, but an unexplained
imbalance is not good. First off, is this the very latest 0.6 release
or one of the 0.7 RCs, or is this an old 0.6? Not that I
remember off hand whether there were any bugs fixed in the 0.6 series
that would explain this particular behavior, but it's probably a good
start to ask if you have the latest version.

Also, you mentioned originally that "Our row min/max/mean values are
mostly the same". I'm not entirely positive what you are referring to;
the important points I wanted to ask about are:

(1) Do you have "many" keys (say, thousands or more) so that there
should be no statistically significant imbalance between the nodes in
terms of the *number* of rows?

(2) How sure are you about the distribution of row sizes; is it
possible you have a small number of very large rows that are screwing
up the statistics?

-- 
/ Peter Schuller


Re: SSTable files not getting deleted

2011-01-06 Thread Ching-Cheng Chen
Yes, those SSTable files have the "compacted" tag.

Those with the compacted tag have size 0, so disk space is not an issue.

However, the matching Filter, Index, and Statistics files were not removed
either, so I ended up with tons of files under the data directory, although
they are not using much space.

I'm running cassandra 0.7-rc2.

Red Hat Linux 2.6.18-194.26.1.el5

SUN JDK
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)

Regards,

Chen

On Thu, Jan 6, 2011 at 12:18 PM, Robert Coli  wrote:

> On Thu, Jan 6, 2011 at 7:59 AM, Ching-Cheng Chen
>  wrote:
>
> > I performed a nodetool compact; all went well and it finished. All column
> > families now have only one big live SSTable file.
> > Then I used jconsole to force a GC, but those old SSTable files are still
> > not getting deleted. I thought this should trigger a deletion for those
> > SSTable files marked for deletion.
>
> Which old SSTable files, specifically?
>
> Does their name include the tag "compacted"?
>
> And what version of cassandra are you running in what environment?
>
> =Rob
>


Re: How can I correct this Cassandra load imbalance?

2011-01-06 Thread ian douglas

Hi Peter,

I posted row sizes (min/max/mean) of our largest data set in my original 
message, but had zero responses on the mailing list. The folks in IRC 
told me to wait it out and see if it rebalanced on its own (it didn't), or 
to run a repair on each node one at a time (didn't help), and that it 
wasn't a big concern until we had "dozens of GBs" worth of data.




On 01/06/2011 10:08 AM, Peter Schuller wrote:

I've been lurking in the #cassandra IRC channel lately looking for help on
this, but wanted to try the mailing list as well.

Was this resolved off-list, and if so what was the problem?

I don't see a problem in your description to explain the imbalance,
assuming you don't have extreme variation in the size of rows (or very
few rows). I was hoping someone else would spot something but the
thread seems dead still :)



Re: SSTable files not getting deleted

2011-01-06 Thread Robert Coli
On Thu, Jan 6, 2011 at 7:59 AM, Ching-Cheng Chen
 wrote:

> I performed a nodetool compact; all went well and it finished. All column
> families now have only one big live SSTable file.
> Then I used jconsole to force a GC, but those old SSTable files are still
> not getting deleted. I thought this should trigger a deletion for those
> SSTable files marked for deletion.

Which old SSTable files, specifically?

Does their name include the tag "compacted"?

And what version of cassandra are you running in what environment?

=Rob


Re: How can I correct this Cassandra load imbalance?

2011-01-06 Thread Peter Schuller
> I've been lurking in the #cassandra IRC channel lately looking for help on
> this, but wanted to try the mailing list as well.

Was this resolved off-list, and if so what was the problem?

I don't see a problem in your description to explain the imbalance,
assuming you don't have extreme variation in the size of rows (or very
few rows). I was hoping someone else would spot something but the
thread seems dead still :)

-- 
/ Peter Schuller


Re: Join equivalent in cassandra

2011-01-06 Thread ruslan usifov
Great thanks!!

2011/1/6 Jonathan Ellis 

>
> http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
>
> On Thu, Jan 6, 2011 at 9:34 AM, ruslan usifov 
> wrote:
> > Hello,
> >
> > Dear community, please share your recipes for how you implement SQL joins
> > in Cassandra. In my case we have a list of news items, each of which can
> > be in multiple rubrics. How do you select the news in a particular rubric?
> > What is the best solution?
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


SSTable files not getting deleted

2011-01-06 Thread Ching-Cheng Chen
My impression was that a forced GC should have deleted the SSTable files that
are no longer valid.

I performed a nodetool compact; all went well and it finished. All column
families now have only one big live SSTable file.

Then I used jconsole to force a GC, but those old SSTable files are still not
getting deleted. I thought this should trigger a deletion for those
SSTable files marked for deletion.

I'm using 0.7-rc2.

When I performed the forced GC, there were no exceptions in the log file.

 INFO [ScheduledTasks:1] 2011-01-06 10:43:16,786 GCInspector.java (line 133)
GC for ConcurrentMarkSweep: 456 ms, 1529612464 reclaimed leaving 114919560
used; max is 4424663040
 INFO [ScheduledTasks:1] 2011-01-06 10:48:25,580 GCInspector.java (line 133)
GC for ConcurrentMarkSweep: 505 ms, 56558064 reclaimed leaving 121900224
used; max is 4424663040
 INFO [ScheduledTasks:1] 2011-01-06 10:50:47,760 GCInspector.java (line 133)
GC for ConcurrentMarkSweep: 455 ms, 28188544 reclaimed leaving 124425408
used; max is 4424663040
 INFO [ScheduledTasks:1] 2011-01-06 10:52:57,107 GCInspector.java (line 133)
GC for ConcurrentMarkSweep: 513 ms, 27953872 reclaimed leaving 126936960
used; max is 4424663040

By the way, those old SSTable files did get removed if I restart the node.

Regards,

Chen
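For context on what that restart-time cleanup removes: a zero-length "-Compacted" marker sits next to each obsolete SSTable generation, and the sibling component files of that generation are the reclaimable ones. A sketch of matching them up; the file names and the `<CF>-<generation>-<component>` layout follow the 0.7 convention, and the listing is purely illustrative:

```python
import re

COMPONENTS = ("Data.db", "Index.db", "Filter.db", "Statistics.db")

def obsolete_components(filenames):
    """Given a data-directory listing, return the SSTable component files
    whose generation carries a `-Compacted` marker, i.e. the files the
    post-compaction cleanup (or a node restart) is expected to remove."""
    # Generations flagged as compacted, e.g. "Users-1" from "Users-1-Compacted".
    marked = {m.group(1)
              for f in filenames
              if (m := re.fullmatch(r"(.+-\d+)-Compacted", f))}
    # Sibling Data/Index/Filter/Statistics files of a marked generation.
    return sorted(f for f in filenames
                  for stem in [f.rsplit("-", 1)[0]]
                  if stem in marked and f.split("-")[-1] in COMPONENTS)

listing = ["Users-1-Data.db", "Users-1-Index.db", "Users-1-Compacted",
           "Users-2-Data.db", "Users-2-Index.db"]
print(obsolete_components(listing))  # -> ['Users-1-Data.db', 'Users-1-Index.db']
```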


Re: Join equivalent in cassandra

2011-01-06 Thread Jonathan Ellis
http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/

On Thu, Jan 6, 2011 at 9:34 AM, ruslan usifov  wrote:
> Hello,
>
> Dear community, please share your recipes for how you implement SQL joins
> in Cassandra. In my case we have a list of news items, each of which can
> be in multiple rubrics. How do you select the news in a particular rubric?
> What is the best solution?
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Join equivalent in cassandra

2011-01-06 Thread ruslan usifov
Hello,

Dear community, please share your recipes for how you implement SQL joins in
Cassandra. In my case we have a list of news items, each of which can be in
multiple rubrics. How do you select the news in a particular rubric? What is
the best solution?
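The usual Cassandra answer (and the one the article linked in the replies describes) is denormalization: at write time, record each news item under every rubric's row, so reading a rubric is a single row slice rather than a join. A minimal in-memory sketch of the idea, with plain dicts standing in for column families and all names illustrative:

```python
# Two "column families" modeled as dicts: news bodies by id, and a
# rubric -> {timestamp column: news id} index maintained on every write.
news_cf = {}
news_by_rubric_cf = {}

def insert_news(news_id, body, rubrics, ts):
    news_cf[news_id] = body
    # Denormalize: one extra write per rubric replaces the SQL join.
    for rubric in rubrics:
        news_by_rubric_cf.setdefault(rubric, {})[ts] = news_id

def news_in_rubric(rubric):
    # A single row read; columns come back ordered by timestamp.
    cols = news_by_rubric_cf.get(rubric, {})
    return [news_cf[nid] for _, nid in sorted(cols.items())]

insert_news("n1", "ring release", ["tech", "world"], ts=1)
insert_news("n2", "election", ["world"], ts=2)
print(news_in_rubric("world"))  # -> ['ring release', 'election']
```

The trade-off is classic Cassandra: cheap, join-free reads in exchange for extra writes and the burden of keeping the index rows in sync yourself.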


Re: Reclaim deleted rows space

2011-01-06 Thread shimi
According to the code, it makes sense:
submitMinorIfNeeded() calls doCompaction(), which calls
submitMinorIfNeeded() again.
With minimumCompactionThreshold = 1, submitMinorIfNeeded() will always run
compaction.

Shimi
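The suspected loop can be shown with a toy model (illustrative only, not Cassandra's actual code): compacting a bucket always yields one SSTable, and with a threshold of 1 that single result immediately qualifies again.

```python
def submit_minor_if_needed(sstables, min_threshold, max_rounds=5):
    """Toy model of the submitMinorIfNeeded -> doCompaction cycle: any
    bucket reaching min_threshold is compacted into one sstable, and
    compaction re-invokes the check. Returns rounds performed (capped
    at max_rounds so the demo terminates)."""
    rounds = 0
    while len(sstables) >= min_threshold and rounds < max_rounds:
        sstables = [sum(sstables)]  # compact the bucket into one sstable
        rounds += 1
    return rounds

# With the normal threshold, the single result is left alone afterwards...
print(submit_minor_if_needed([100, 40, 40, 30], min_threshold=4))  # -> 1
# ...but with min_threshold=1 the lone result re-qualifies every round.
print(submit_minor_if_needed([100], min_threshold=1))  # -> 5
```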

On Thu, Jan 6, 2011 at 10:26 AM, shimi  wrote:

>
>
> On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis  wrote:
>
>> Pretty sure there's logic in there that says "don't bother compacting
>> a single sstable."
>
> No. You can do it.
> Based on the log I have a feeling that it triggers an infinite compaction
> loop.
>
>
>
>>  On Wed, Jan 5, 2011 at 2:26 PM, shimi  wrote:
>> > How is minor compaction triggered? Is it triggered only when a new
>> > SSTable is added?
>> >
>> > I was wondering if triggering a compaction
>> with minimumCompactionThreshold
>> > set to 1 would be useful. If this can happen I assume it will do
>> compaction
>> > on files with similar size and remove deleted rows on the rest.
>> > Shimi
>> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <
>> peter.schul...@infidyne.com>
>> > wrote:
>> >>
>> >> > I don't have a problem with disk space. I have a problem with the
>> data
>> >> > size.
>> >>
>> >> [snip]
>> >>
>> >> > Bottom line is that I want to reduce the number of requests that goes
>> to
>> >> > disk. Since there is enough data that is no longer valid I can do it
>> by
>> >> > reclaiming the space. The only way to do it is by running Major
>> >> > compaction.
>> >> > I can wait and let Cassandra do it for me but then the data size will
>> >> > get
>> >> > even bigger and the response time will be worse. I can do it manually
>> >> > but I
>> >> > prefer it to happen in the background with less impact on the system
>> >>
>> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
>> >>
>> >> So essentially, for workloads that are teetering on the edge of cache
>> >> warmness and are subject to significant overwrites or removals, it may
>> >> be beneficial to perform much more aggressive background compaction
>> >> even though it might waste lots of CPU, to keep the in-memory working
>> >> set down.
>> >>
>> >> There was talk (I think in the compaction redesign ticket) about
>> >> potentially improving the use of bloom filters such that obsolete data
>> >> in sstables could be eliminated from the read set without
>> >> necessitating actual compaction; that might help address cases like
>> >> these too.
>> >>
>> >> I don't think there's a pre-existing silver bullet in a current
>> >> release; you probably have to live with the need for
>> >> greater-than-theoretically-optimal memory requirements to keep the
>> >> working set in memory.
>> >>
>> >> --
>> >> / Peter Schuller
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
>


Re: Reclaim deleted rows space

2011-01-06 Thread shimi
On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis  wrote:

> Pretty sure there's logic in there that says "don't bother compacting
> a single sstable."

No. You can do it.
Based on the log I have a feeling that it triggers an infinite compaction
loop.



> On Wed, Jan 5, 2011 at 2:26 PM, shimi  wrote:
> > How is minor compaction triggered? Is it triggered only when a new
> > SSTable is added?
> >
> > I was wondering if triggering a compaction
> with minimumCompactionThreshold
> > set to 1 would be useful. If this can happen I assume it will do
> compaction
> > on files with similar size and remove deleted rows on the rest.
> > Shimi
> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <
> peter.schul...@infidyne.com>
> > wrote:
> >>
> >> > I don't have a problem with disk space. I have a problem with the data
> >> > size.
> >>
> >> [snip]
> >>
> >> > Bottom line is that I want to reduce the number of requests that goes
> to
> >> > disk. Since there is enough data that is no longer valid I can do it
> by
> >> > reclaiming the space. The only way to do it is by running Major
> >> > compaction.
> >> > I can wait and let Cassandra do it for me but then the data size will
> >> > get
> >> > even bigger and the response time will be worse. I can do it manually
> >> > but I
> >> > prefer it to happen in the background with less impact on the system
> >>
> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
> >>
> >> So essentially, for workloads that are teetering on the edge of cache
> >> warmness and are subject to significant overwrites or removals, it may
> >> be beneficial to perform much more aggressive background compaction
> >> even though it might waste lots of CPU, to keep the in-memory working
> >> set down.
> >>
> >> There was talk (I think in the compaction redesign ticket) about
> >> potentially improving the use of bloom filters such that obsolete data
> >> in sstables could be eliminated from the read set without
> >> necessitating actual compaction; that might help address cases like
> >> these too.
> >>
> >> I don't think there's a pre-existing silver bullet in a current
> >> release; you probably have to live with the need for
> >> greater-than-theoretically-optimal memory requirements to keep the
> >> working set in memory.
> >>
> >> --
> >> / Peter Schuller
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Reclaim deleted rows space

2011-01-06 Thread shimi
Am I missing something here? It is already possible to trigger major
compaction on a specific CF.
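Assuming the 0.7-era `nodetool compact [keyspace [cf]]` syntax shimi refers to, the invocation can be sketched as below. The host and the default JMX port 8080 are assumptions for illustration, and the command is only constructed here, not executed:

```python
def nodetool_compact_cmd(host, keyspace, column_family=None, port=8080):
    # `nodetool compact` with no args majors the whole node; adding a
    # keyspace (and optionally a CF) narrows it to that column family.
    cmd = ["nodetool", "-h", host, "-p", str(port), "compact"]
    if keyspace:
        cmd.append(keyspace)
        if column_family:
            cmd.append(column_family)
    return cmd

print(" ".join(nodetool_compact_cmd("10.0.0.5", "Keyspace1", "Users")))
# -> nodetool -h 10.0.0.5 -p 8080 compact Keyspace1 Users
```

CASSANDRA-1812 (linked above) is about exposing this more flexibly, not about whether per-CF major compaction exists at all.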

On Thu, Jan 6, 2011 at 4:50 AM, Tyler Hobbs  wrote:

> Although it's not exactly the ability to list specific SSTables, the
> ability to only compact specific CFs will be in upcoming releases:
>
> https://issues.apache.org/jira/browse/CASSANDRA-1812
>
> - Tyler
>
>
> On Wed, Jan 5, 2011 at 7:46 PM, Edward Capriolo wrote:
>
>> On Wed, Jan 5, 2011 at 4:31 PM, Jonathan Ellis  wrote:
>> > Pretty sure there's logic in there that says "don't bother compacting
>> > a single sstable."
>> >
>> > On Wed, Jan 5, 2011 at 2:26 PM, shimi  wrote:
>> >> How is minor compaction triggered? Is it triggered only when a new
>> >> SSTable is added?
>> >>
>> >> I was wondering if triggering a compaction
>> with minimumCompactionThreshold
>> >> set to 1 would be useful. If this can happen I assume it will do
>> compaction
>> >> on files with similar size and remove deleted rows on the rest.
>> >> Shimi
>> >> On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <
>> peter.schul...@infidyne.com>
>> >> wrote:
>> >>>
>> >>> > I don't have a problem with disk space. I have a problem with the
>> data
>> >>> > size.
>> >>>
>> >>> [snip]
>> >>>
>> >>> > Bottom line is that I want to reduce the number of requests that
>> goes to
>> >>> > disk. Since there is enough data that is no longer valid I can do it
>> by
>> >>> > reclaiming the space. The only way to do it is by running Major
>> >>> > compaction.
>> >>> > I can wait and let Cassandra do it for me but then the data size
>> will
>> >>> > get
>> >>> > even bigger and the response time will be worse. I can do it
>> manually
>> >>> > but I
>> >>> > prefer it to happen in the background with less impact on the system
>> >>>
>> >>> Ok - that makes perfect sense then. Sorry for misunderstanding :)
>> >>>
>> >>> So essentially, for workloads that are teetering on the edge of cache
>> >>> warmness and are subject to significant overwrites or removals, it may
>> >>> be beneficial to perform much more aggressive background compaction
>> >>> even though it might waste lots of CPU, to keep the in-memory working
>> >>> set down.
>> >>>
>> >>> There was talk (I think in the compaction redesign ticket) about
>> >>> potentially improving the use of bloom filters such that obsolete data
>> >>> in sstables could be eliminated from the read set without
>> >>> necessitating actual compaction; that might help address cases like
>> >>> these too.
>> >>>
>> >>> I don't think there's a pre-existing silver bullet in a current
>> >>> release; you probably have to live with the need for
>> >>> greater-than-theoretically-optimal memory requirements to keep the
>> >>> working set in memory.
>> >>>
>> >>> --
>> >>> / Peter Schuller
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of Riptano, the source for professional Cassandra support
>> > http://riptano.com
>> >
>>
>> I was wondering if it made sense to have a JMX operation that can
>> compact a list of SSTables by file name. This opens it up for power
>> users to have more options than compacting the entire keyspace.
>>
>
>


[ANN] Cassandra 0.7.0-rc4 available from Maven Central Repository

2011-01-06 Thread Stephen Connolly
Hi All,

Cassandra 0.7.0-rc4 is now available from the Maven Central Repository.

Apache Maven
-


<dependency>
  <groupId>org.apache.cassandra</groupId>
  <artifactId>cassandra-all</artifactId>
  <version>0.7.0-rc4</version>
</dependency>


Apache Ivy
-

<dependency org="org.apache.cassandra" name="cassandra-all" rev="0.7.0-rc4"/>



Groovy Grape
-

@Grapes(
@Grab(group='org.apache.cassandra', module='cassandra-all',
version='0.7.0-rc4')
)

Apache Buildr
-

'org.apache.cassandra:cassandra-all:jar:0.7.0-rc4'


Enjoy!

-Stephen

P.S.
Hopefully, as projects dependent on Cassandra move to Central, the Maven
experience for all will considerably improve. Having looked at the
crazy hacks that people have used to work around the artifacts not being
in Central, I can only say that they would not have resulted in a good
Maven experience.

P.P.S.
AFAIK the plan is to split cassandra into multiple artifacts for the
different use cases, this will result in clients being able to depend
on a much smaller jar with a reduced dependency tree. The
cassandra-all jar is just a stop-gap until the cassandra build is
refactored to produce the required artifacts.