Re: Counter question

2012-03-29 Thread Shimi Kiviti
You set the consistency with every request.
Usually a client library will let you set a default one for all write/read
requests.
I don't know if Hector lets you set a default consistency level per CF.
Take a look at the Hector docs or ask on the Hector mailing list.
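
If it does, I would expect it to go through a ConsistencyLevelPolicy on the
keyspace. Something along these lines might work (an untested sketch; the
per-CF setters on ConfigurableConsistencyLevel, the exact package names and
the keyspace name are from memory / placeholders, so verify against the
Hector version you use):

import java.util.HashMap;
import java.util.Map;

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class CounterConsistencyExample {
    public static Keyspace createKeyspace(Cluster cluster) {
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        // default for all column families
        policy.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);

        // per-CF override for the counter CF (assumed API, check your Hector version)
        Map<String, HConsistencyLevel> readCls = new HashMap<String, HConsistencyLevel>();
        readCls.put("tk_counters", HConsistencyLevel.QUORUM);
        policy.setReadCfConsistencyLevels(readCls);

        Map<String, HConsistencyLevel> writeCls = new HashMap<String, HConsistencyLevel>();
        writeCls.put("tk_counters", HConsistencyLevel.QUORUM);
        policy.setWriteCfConsistencyLevels(writeCls);

        // "MyKeyspace" is a placeholder keyspace name
        return HFactory.createKeyspace("MyKeyspace", cluster, policy);
    }
}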

Shimi

On Thu, Mar 29, 2012 at 11:47 AM, Tamar Fraenkel wrote:

> Can this be set on a CF basis.
> Only this CF needs higher consistency level.
> Thanks,
> Tamar
>
> *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
>
>
>
>
> On Thu, Mar 29, 2012 at 10:44 AM, Shimi Kiviti  wrote:
>
>> Like everything else in Cassandra, If you need full consistency you need
>> to make sure that you have the right combination of (write consistency
>> level) + (read consistency level)
>>
>> if
>> W = write consistency level
>> R = read consistency level
>> N = replication factor
>> then
>> W + R > N
>>
>> Shimi
>>
>>
>> On Thu, Mar 29, 2012 at 10:09 AM, Tamar Fraenkel wrote:
>>
>>> Hi!
>>> Asking again, as I didn't get responses :)
>>>
>>> I have a ring with 3 nodes and replication factor of 2.
>>> I have counter cf with the following definition:
>>>
>>> CREATE COLUMN FAMILY tk_counters
>>> with comparator = 'UTF8Type'
>>> and default_validation_class = 'CounterColumnType'
>>> and key_validation_class = 'CompositeType(UTF8Type,UUIDType)'
>>> and replicate_on_write = true;
>>>
>>> In my code (Java, Hector), I increment a counter and then read it.
>>> Is it possible that the value read will be the value before increment?
>>> If yes, how can I ensure it does not happen. All my reads and writes are
>>> done with consistency level one.
>>> If this is consistency issue, can I do only the actions on tk_counters
>>> column family with a higher consistency level?
>>> What does replicate_on_write mean? I thought this should help, but maybe
>>> even if replicating after write, my read happen before replication
>>> finished and it returns value from a still not updated node.
>>>
>>> My increment code is:
>>> Mutator<Composite> mutator =
>>> HFactory.createMutator(keyspace,
>>> CompositeSerializer.get());
>>> mutator.incrementCounter(key,"tk_counters", columnName, inc);
>>> mutator.execute();
>>>
>>> My read counter code is:
>>> CounterQuery<Composite, String> query =
>>> createCounterColumnQuery(keyspace,
>>> CompositeSerializer.get(), StringSerializer.get());
>>> query.setColumnFamily("tk_counters");
>>> query.setKey(key);
>>> query.setName(columnName);
>>> QueryResult<HCounterColumn<String>> r = query.execute();
>>> return r.get().getValue();
>>>
>>> Thanks,
>>> *Tamar Fraenkel *
>>> Senior Software Engineer, TOK Media
>>>
>>>
>>> ta...@tok-media.com
>>> Tel:   +972 2 6409736
>>> Mob:  +972 54 8356490
>>> Fax:   +972 2 5612956
>>>
>>>
>>>
>>>
>>
>

Re: Counter question

2012-03-29 Thread Shimi Kiviti
Like everything else in Cassandra, if you need full consistency you need to
make sure that you have the right combination of (write consistency level)
+ (read consistency level).

if
W = write consistency level
R = read consistency level
N = replication factor
then
W + R > N
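
For example, with a replication factor of 2 (as in your ring): writing and
reading at ONE gives W + R = 1 + 1 = 2, which is not greater than 2, so a
read can be served by the replica that has not applied the increment yet.
Writing at QUORUM (2 of 2 replicas) and reading at ONE gives 2 + 1 = 3 > 2,
so every read overlaps with at least one replica that has the write.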

Shimi

On Thu, Mar 29, 2012 at 10:09 AM, Tamar Fraenkel wrote:

> Hi!
> Asking again, as I didn't get responses :)
>
> I have a ring with 3 nodes and replication factor of 2.
> I have counter cf with the following definition:
>
> CREATE COLUMN FAMILY tk_counters
> with comparator = 'UTF8Type'
> and default_validation_class = 'CounterColumnType'
> and key_validation_class = 'CompositeType(UTF8Type,UUIDType)'
> and replicate_on_write = true;
>
> In my code (Java, Hector), I increment a counter and then read it.
> Is it possible that the value read will be the value before increment?
> If yes, how can I ensure it does not happen. All my reads and writes are
> done with consistency level one.
> If this is consistency issue, can I do only the actions on tk_counters
> column family with a higher consistency level?
> What does replicate_on_write mean? I thought this should help, but maybe
> even if replicating after write, my read happen before replication
> finished and it returns value from a still not updated node.
>
> My increment code is:
> Mutator<Composite> mutator =
> HFactory.createMutator(keyspace,
> CompositeSerializer.get());
> mutator.incrementCounter(key,"tk_counters", columnName, inc);
> mutator.execute();
>
> My read counter code is:
> CounterQuery<Composite, String> query =
> createCounterColumnQuery(keyspace,
> CompositeSerializer.get(), StringSerializer.get());
> query.setColumnFamily("tk_counters");
> query.setKey(key);
> query.setName(columnName);
> QueryResult<HCounterColumn<String>> r = query.execute();
> return r.get().getValue();
>
> Thanks,
> *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
>
>
>

Re: Row iteration over indexed clause

2012-03-13 Thread Shimi Kiviti
Yes. Use get_indexed_slices (http://wiki.apache.org/cassandra/API).
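
A rough sketch of chunked iteration over the raw Thrift API is below (0.7-era
bindings; the CF name "Users", the indexed column "state" and the page size
are placeholders, and field types may differ slightly between versions). The
idea is to set IndexClause.start_key to the last key of the previous chunk,
remembering that the start key is inclusive so the first row of the next
chunk is a repeat:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.IndexClause;
import org.apache.cassandra.thrift.IndexExpression;
import org.apache.cassandra.thrift.IndexOperator;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class IndexedSlicePager {
    public static void pageIndexedRows(Cassandra.Client client) throws Exception {
        ColumnParent parent = new ColumnParent("Users"); // placeholder CF

        // return up to 100 columns per matching row
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(
                ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 100));

        // equivalent of WHERE state = 'CA' (placeholder indexed column/value)
        List<IndexExpression> expressions = new ArrayList<IndexExpression>();
        expressions.add(new IndexExpression(
                ByteBuffer.wrap("state".getBytes("UTF-8")),
                IndexOperator.EQ,
                ByteBuffer.wrap("CA".getBytes("UTF-8"))));

        IndexClause clause = new IndexClause();
        clause.setExpressions(expressions);
        clause.setCount(500);                        // rows per chunk
        clause.setStart_key(ByteBuffer.allocate(0)); // start from the beginning

        boolean firstChunk = true;
        while (true) {
            List<KeySlice> chunk = client.get_indexed_slices(
                    parent, clause, predicate, ConsistencyLevel.ONE);
            for (int i = 0; i < chunk.size(); i++) {
                if (!firstChunk && i == 0) {
                    continue; // start_key is inclusive, skip the repeated row
                }
                KeySlice row = chunk.get(i);
                // process row.getKey() / row.getColumns() here
            }
            if (chunk.size() < clause.getCount()) {
                break; // last chunk
            }
            clause.setStart_key(chunk.get(chunk.size() - 1).key);
            firstChunk = false;
        }
    }
}
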
On Tue, Mar 13, 2012 at 2:12 PM, Vivek Mishra  wrote:

> Hi,
> Is it possible to iterate and fetch in chunks using thrift API by querying
> using "secondary indexes"?
>
> -Vivek
>


Re: Composite column docs

2012-01-06 Thread Shimi Kiviti
On Thu, Jan 5, 2012 at 9:13 PM, aaron morton wrote:

> What client are you using ?
>
I am writing a client.


> For example pycassa has some sweet documentation
> http://pycassa.github.com/pycassa/assorted/composite_types.html
>
It is sweet documentation but it doesn't help me. I need lower-level
documentation.


> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/01/2012, at 12:48 AM, Shimi Kiviti wrote:
>
> Is there a doc for using composite columns with thrift?
> Is
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java
>  the
> only doc?
> does the client needs to add the length to the get \ get_slice... queries
> or is it taken care of on the server side?
>
> Shimi
>
>
>


Composite column docs

2012-01-05 Thread Shimi Kiviti
Is there a doc for using composite columns with thrift?
Is
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java
the
only doc?
does the client need to add the length to the get \ get_slice... queries
or is it taken care of on the server side?
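
My current understanding from reading CompositeType.java is that the client
does have to produce it: each component is written as a 2-byte big-endian
length, then the raw component bytes, then a single end-of-component byte,
and the server doesn't add any of that for you. Is the sketch below (how I
plan to encode composite names in my client) right, or am I missing
something?

import java.nio.ByteBuffer;

public class CompositeEncoder {
    // eoc = 0 for a normal (exact) component; as far as I can tell -1 / 1 are
    // only used on the last component of slice start / finish bounds.
    public static ByteBuffer encode(byte[][] components, byte eoc) {
        int size = 0;
        for (byte[] c : components) {
            size += 2 + c.length + 1; // length prefix + value + end-of-component
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (int i = 0; i < components.length; i++) {
            byte[] c = components[i];
            out.putShort((short) c.length); // 2-byte big-endian length
            out.put(c);                     // component value
            out.put(i == components.length - 1 ? eoc : (byte) 0);
        }
        out.flip();
        return out;
    }
}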

Shimi


Re: CassandraDaemon deactivate doesn't shutdown Cassandra

2011-10-15 Thread Shimi Kiviti
The problem doesn't exist after the column family is truncated or
if durable_writes=true

Shimi

On Tue, Oct 11, 2011 at 9:30 PM, Shimi Kiviti  wrote:

> I am running an Embedded Cassandra (0.8.7) and
> calling CassandraDaemon.deactivate() after I write rows (at least 1),
> doesn't shutdown Cassandra.
> If I run only "reads" it does shutdown even without
> calling CassandraDaemon.deactivate()
>
> Anyone have any idea what can cause this problem?
>
> Shimi
>


CassandraDaemon deactivate doesn't shutdown Cassandra

2011-10-11 Thread Shimi Kiviti
I am running an embedded Cassandra (0.8.7), and
calling CassandraDaemon.deactivate() after I write rows (at least one)
doesn't shut down Cassandra.
If I run only "reads" it does shut down even without
calling CassandraDaemon.deactivate().

Anyone have any idea what can cause this problem?

Shimi


Re: Cassandra Capistrano recipes

2011-07-06 Thread shimi
Modify your Capistrano script to install an init script. If you use Debian
or Red Hat you can copy or modify these:
https://github.com/Shimi/cassandra/blob/trunk/debian/init
https://github.com/Shimi/cassandra/blob/trunk/redhat/cassandra

and set up Capistrano to call /etc/init.d/cassandra stop/start/restart

Shimi

On Thu, Jul 7, 2011 at 4:27 AM, R Headley  wrote:

> Hi
>
> I'm using Capistrano with Cassandra and was wondering if anyone has a
> recipe(s), for in particular, starting Cassandra as a daemon.  Running the
> 'bin/cassandra' shell script (without the '-f' switch) doesn't quite work as
> this only runs Cassandra in the background, logging out will kill it.
>
> Thanks, Richard
>


Re: Read time get worse during dynamic snitch reset

2011-05-11 Thread shimi
I finally found some time to get back to this issue.
I turned on DEBUG logging on the StorageProxy and it shows that all of these
requests are reads from the other datacenter.

Shimi

On Tue, Apr 12, 2011 at 2:31 PM, aaron morton wrote:

> Something feels odd.
>
> From Peters nice write up of the dynamic snitch
> http://www.mail-archive.com/user@cassandra.apache.org/msg12092.html The
> RackInferringSnitch (and the PropertyFileSnitch) derive from the
> AbstractNetworkTopologySnitch and should...
> "
> In the case of the NetworkTopologyStrategy, it inherits the
> implementation in AbstractNetworkTopologySnitch which sorts by
> AbstractNetworkTopologySnitch.compareEndPoints(), which:
>
> (1) Always prefers itself to any other node. So "myself" is always
> "closest", no matter what.
> (2) Else, always prefers a node in the same rack, to a node in a different
> rack.
> (3) Else, always prefers a node in the same dc, to a node in a different
> dc.
> <http://www.mail-archive.com/user@cassandra.apache.org/msg12092.html>"
>
> AFAIK the (data) request should be going to the local DC even after the
> DynamicSnitch has reset the scores. Because the underlying
> RackInferringSnitch should prefer local nodes.
>
> Just for fun check rack and dc assignments are what you thought using the
> operations on o.a.c.db.EndpointSnitchInfo bean in JConsole. Pass in the ip
> address for the nodes in each dc. If possible can you provide some info on
> the ip's in each dc?
>
> Aaron
>
> On 12 Apr 2011, at 18:24, shimi wrote:
>
> On Tue, Apr 12, 2011 at 12:26 AM, aaron morton wrote:
>
>> The reset interval clears the latency tracked for each node so a bad node
>> will be read from again. The scores for each node are then updated every
>> 100ms (default) using the last 100 responses from a node.
>>
>> How long does the bad performance last for?
>>
> Only a few seconds and but there are a lot of read requests during this
> time
>
>>
>> What CL are you reading at ? At Quorum with RF 4 the read request will be
>> sent to 3 nodes, ordered by proximity and wellness according to the dynamic
>> snitch. (for background recent discussion on dynamic snitch
>> http://www.mail-archive.com/user@cassandra.apache.org/msg12089.html)
>>
> I am reading with CL of ONE,  read_repair_chance=0.33, RackInferringSnitch
> and keys_cached = rows_cached = 0
>
>>
>> You can take a look at the weights and timings used by the DynamicSnitch
>> in JConsole under o.a.c.db.DynamicSnitchEndpoint . Also at DEBUG log level
>> you will be able to see which nodes the request is sent to.
>>
> Everything looks OK. The weights are around 3 for the nodes in the same
> data center and around 5 for the others. I will turn on the DEBUG level to
> see if I can find more info.
>
>>
>> My guess is the DynamicSnitch is doing the right thing and the slow down
>> is a node with a problem getting back into the list of nodes used for your
>> read. It's then moved down the list as it's bad performance is noticed.
>>
> Looking the DynamicSnitch MBean I don't see any problems with any of the
> nodes. My guess is that during the reset time there are reads that are sent
> to the other data center.
>
>>
>> Hope that helps
>> Aaron
>>
>
> Shimi
>
>
>>
>> On 12 Apr 2011, at 01:28, shimi wrote:
>>
>> I finally upgraded 0.6.x to 0.7.4.  The nodes are running with the new
>> version for several days across 2 data centers.
>> I noticed that the read time in some of the nodes increase by x50-60 every
>> ten minutes.
>> There was no indication in the logs for something that happen at the same
>> time. The only thing that I know that is running every 10 minutes is
>> the dynamic snitch reset.
>> So I changed dynamic_snitch_reset_interval_in_ms to 20 minutes and now I
>> have the problem once in every 20 minutes.
>>
>> I am running all nodes with:
>> replica_placement_strategy:
>> org.apache.cassandra.locator.NetworkTopologyStrategy
>>   strategy_options:
>> DC1 : 2
>> DC2 : 2
>>   replication_factor: 4
>>
>> (DC1 and DC2 are taken from the ips)
>> Does anyone familiar with this kind of behavior?
>>
>> Shimi
>>
>>
>>
>
>


Re: Combining all CFs into one big one

2011-05-01 Thread shimi
On Sun, May 1, 2011 at 9:48 PM, Jake Luciani  wrote:

> If you have N column families you need N * memtable size of RAM to support
> this.  If that's not an option you can merge them into one as you suggest
> but then you will have much larger SSTables, slower compactions, etc.



> I don't necessarily agree with Tyler that the OS cache will be less
> effective... But I do agree that if the sizes of sstables are too large for
> you then more hardware is the solution...


If you merge CFs which are hardly accessed with ones which are accessed
frequently, then when you read the SSTable you also load the rarely accessed
data into the OS cache.

Another thing you should be aware of is that if you need to run any of
the nodetool CF tasks for a specific CF, running it on just that CF is
better and faster.

Shimi


>
>
> On Sun, May 1, 2011 at 1:24 PM, Tyler Hobbs  wrote:
>
>> When you have a high number of CFs, it's a good idea to consider merging
>> CFs with highly correlated access patterns and similar structure into one.
>> It is *not* a good idea to merge all of your CFs into one (unless they all
>> happen to meet this criteria). Here's why:
>>
>> Besides big compactions and long repairs that you can't break down into
>> smaller pieces, the main problem here is that your caching will become much
>> less efficient. The OS buffer cache will be less effective because rows from
>> all of the CFs will be interspersed in the SSTables. You will no longer be
>> able to tune the key or row cache to only cache frequently accessed data.
>> Both of these will tend to cause a serious increase in latency for your hot
>> data.
>>
>>> Shouldn't these kinds of problems be solved by Cassandra?
>>>
>> They are mainly solved by Cassandra's general solution to any performance
>> problem: the addition of more nodes. There are tickets open to improve
>> compaction strategies, put bounds on SSTable sizes, etc; for example,
>> https://issues.apache.org/jira/browse/CASSANDRA-1608 , but the addition
>> of more nodes is a reliable solution to problems of this nature.
>>
>> On Sun, May 1, 2011 at 7:28 AM, David Boxenhorn wrote:
>>
>>> Shouldn't these kinds of problems be solved by Cassandra? Isn't there a
>>> maximum SSTable size?
>>>
>>> On Sun, May 1, 2011 at 3:24 PM, shimi  wrote:
>>>
>>>> Big sstables, long compactions, in major compaction you will need to
>>>> have free disk space in the size of all the sstables (which you should have
>>>> anyway).
>>>>
>>>> Shimi
>>>>
>>>>
>>>> On Sun, May 1, 2011 at 2:03 PM, David Boxenhorn wrote:
>>>>
>>>>> I'm having problems administering my cluster because I have too many
>>>>> CFs (~40).
>>>>>
>>>>> I'm thinking of combining them all into one big CF. I would prefix the
>>>>> current CF name to the keys, repeat the CF name in a column, and index the
>>>>> column (so I can loop over all rows, which I have to do sometimes, for 
>>>>> some
>>>>> CFs).
>>>>>
>>>>> Can anyone think of any disadvantages to this approach?
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Tyler Hobbs
>> Software Engineer, DataStax <http://datastax.com/>
>> Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
>> Python client library
>>
>>
>
>
> --
> http://twitter.com/tjake
>


Re: Combining all CFs into one big one

2011-05-01 Thread shimi
Big SSTables, long compactions, and for a major compaction you will need
free disk space equal to the size of all the SSTables (which you should have
anyway).

Shimi

On Sun, May 1, 2011 at 2:03 PM, David Boxenhorn  wrote:

> I'm having problems administering my cluster because I have too many CFs
> (~40).
>
> I'm thinking of combining them all into one big CF. I would prefix the
> current CF name to the keys, repeat the CF name in a column, and index the
> column (so I can loop over all rows, which I have to do sometimes, for some
> CFs).
>
> Can anyone think of any disadvantages to this approach?
>
>


Re: Tombstones and memtable_operations

2011-04-19 Thread shimi
You can use memtable_flush_after_mins instead of the cron

Shimi

2011/4/19 Héctor Izquierdo Seliva 

>
> El mié, 20-04-2011 a las 08:16 +1200, aaron morton escribió:
> > I think there may be an issue here, we are counting the number of columns
> in the operation. When deleting an entire row we do not have a column count.
> >
> > Can you let us know what version you are using and how you are doing the
> delete ?
> >
> > Thanks
> > Aaron
> >
>
> I'm using 0.7.4. I have a file with all the row keys I have to delete
> (around 100 million) and I just go through the file and issue deletes
> through pelops.
>
> Should I manually issue flushes with a cron every x time?
>
> > On 20 Apr 2011, at 04:21, Héctor Izquierdo Seliva wrote:
> >
> > > Ok, I've read about gc grace seconds, but I'm not sure I understand it
> > > fully. Until gc grace seconds have passed, and there is a compaction,
> > > the tombstones live in memory? I have to delete 100 million rows and my
> > > insert rate is very low, so I don't have a lot of compactions. What
> > > should I do in this case? Lower the major compaction threshold and
> > > memtable_operations to some very low number?
> > >
> > > Thanks
> > >
> > > El mar, 19-04-2011 a las 17:36 +0200, Héctor Izquierdo Seliva escribió:
> > >> Hi everyone. I've configured in one of my column families
> > >> memtable_operations = 0.02 and started deleting keys. I have already
> > >> deleted 54k, but there hasn't been any flush of the memtable. Memory
> > >> keeps pilling up and eventually nodes start to do stop-the-world GCs.
> Is
> > >> this the way this is supposed to work or have I done something wrong?
> > >>
> > >> Thanks!
> > >>
> > >
> > >
> >
>
>
>


Re: Cassandra 0.7.4 Bug?

2011-04-17 Thread shimi
I had the same thing.
Node restart should solve it.

Shimi


On Sun, Apr 17, 2011 at 4:25 PM, Dikang Gu  wrote:

> +1.
>
> I also met this problem several days before, and I haven't got a solution
> yet...
>
>
> On Sun, Apr 17, 2011 at 9:17 PM, csharpplusproject <
> csharpplusproj...@gmail.com> wrote:
>
>>  Often, I see the following behavior:
>>
>> (1) Cassandra works, all nodes are up etc
>>
>> (2) a 'move' operation is being run on one of the nodes
>>
>> (3) following this 'move' operation, even after a couple of hours / days
>> where it is obvious the operation has ended, the node which had 'moved'
>> remains with a status of *?*
>>
>> perhaps it's a bug?
>>
>>
>> ___
>>
>> shalom@host:/opt/cassandra/apache-cassandra-0.7.4$ bin/nodetool -host
>> 192.168.0.5 ring
>> Address Status State   LoadOwns
>> Token
>>
>> 127605887595351923798765477786913079296
>> 192.168.0.253   Up Normal  88.66 MB25.00%
>> 0
>>   192.168.0.4 Up Normal  558.2 MB50.00%
>> 85070591730234615865843651857942052863
>>   192.168.0.5 Up Normal  71.03 MB16.67%
>> 113427455640312821154458202477256070485
>>   192.168.0.6 Up Normal  44.71 MB8.33%
>> 127605887595351923798765477786913079296
>>
>> shalom@host:/opt/cassandra/apache-cassandra-0.7.4$ bin/nodetool -host
>> 192.168.0.4 move 92535295865117307932921825928971026432
>>
>> shalom@host:/opt/cassandra/apache-cassandra-0.7.4$ bin/nodetool -host
>> 192.168.0.5 ring
>> Address Status State   LoadOwns
>> Token
>>
>> 127605887595351923798765477786913079296
>> 192.168.0.253   Up Normal  171.17 MB   25.00%
>> 0
>> 192.168.0.4 *?*  Normal  212.11 MB   54.39%
>> 92535295865117307932921825928971026432
>> 192.168.0.5 Up Normal  263.91 MB   12.28%
>> 113427455640312821154458202477256070485
>> 192.168.0.6 Up Normal  26.21 MB8.33%
>> 127605887595351923798765477786913079296
>>
>
>
>
> --
> Dikang Gu
>
> 0086 - 18611140205
>
>


Re: Read time get worse during dynamic snitch reset

2011-04-11 Thread shimi
On Tue, Apr 12, 2011 at 12:26 AM, aaron morton wrote:

> The reset interval clears the latency tracked for each node so a bad node
> will be read from again. The scores for each node are then updated every
> 100ms (default) using the last 100 responses from a node.
>
> How long does the bad performance last for?
>
Only a few seconds, but there are a lot of read requests during this time

>
> What CL are you reading at ? At Quorum with RF 4 the read request will be
> sent to 3 nodes, ordered by proximity and wellness according to the dynamic
> snitch. (for background recent discussion on dynamic snitch
> http://www.mail-archive.com/user@cassandra.apache.org/msg12089.html)
>
I am reading with CL of ONE,  read_repair_chance=0.33, RackInferringSnitch
and keys_cached = rows_cached = 0

>
> You can take a look at the weights and timings used by the DynamicSnitch in
> JConsole under o.a.c.db.DynamicSnitchEndpoint . Also at DEBUG log level you
> will be able to see which nodes the request is sent to.
>
Everything looks OK. The weights are around 3 for the nodes in the same data
center and around 5 for the others. I will turn on the DEBUG level to see if
I can find more info.

>
> My guess is the DynamicSnitch is doing the right thing and the slow down is
> a node with a problem getting back into the list of nodes used for your
> read. It's then moved down the list as it's bad performance is noticed.
>
Looking at the DynamicSnitch MBean I don't see any problems with any of the
nodes. My guess is that during the reset time there are reads that are sent
to the other data center.

>
> Hope that helps
> Aaron
>

Shimi


>
> On 12 Apr 2011, at 01:28, shimi wrote:
>
> I finally upgraded 0.6.x to 0.7.4.  The nodes are running with the new
> version for several days across 2 data centers.
> I noticed that the read time in some of the nodes increase by x50-60 every
> ten minutes.
> There was no indication in the logs for something that happen at the same
> time. The only thing that I know that is running every 10 minutes is
> the dynamic snitch reset.
> So I changed dynamic_snitch_reset_interval_in_ms to 20 minutes and now I
> have the problem once in every 20 minutes.
>
> I am running all nodes with:
> replica_placement_strategy:
> org.apache.cassandra.locator.NetworkTopologyStrategy
>   strategy_options:
> DC1 : 2
> DC2 : 2
>   replication_factor: 4
>
> (DC1 and DC2 are taken from the ips)
> Does anyone familiar with this kind of behavior?
>
> Shimi
>
>
>


Read time get worse during dynamic snitch reset

2011-04-11 Thread shimi
I finally upgraded 0.6.x to 0.7.4. The nodes have been running with the new
version for several days across 2 data centers.
I noticed that the read time on some of the nodes increases by x50-60 every
ten minutes.
There was no indication in the logs of anything happening at the same
time. The only thing I know of that runs every 10 minutes is
the dynamic snitch reset.
So I changed dynamic_snitch_reset_interval_in_ms to 20 minutes and now I
have the problem once every 20 minutes.

I am running all nodes with:
replica_placement_strategy:
org.apache.cassandra.locator.NetworkTopologyStrategy
  strategy_options:
DC1 : 2
DC2 : 2
  replication_factor: 4

(DC1 and DC2 are taken from the ips)
Is anyone familiar with this kind of behavior?

Shimi


index file contains a different key or row size

2011-04-04 Thread shimi
It makes sense to me that compaction should solve this as well, since
compaction creates new index files.
Am I missing something here?

WARN [CompactionExecutor:1] 2011-04-04 14:50:54,105 CompactionManager.java
(line 602) Row scrubbed successfully but index file contains a different key
or row size; consider rebuilding the index as described in
http://www.mail-archive.com/user@cassandra.apache.org/msg03325.html

Shimi


Re: nodetool cleanup - results in more disk use?

2011-04-04 Thread shimi
The bigger the file, the longer it will take for it to be part of a
compaction again.
Compacting a bucket of large files takes longer than compacting a bucket of
small files.

Shimi

On Mon, Apr 4, 2011 at 3:58 PM, aaron morton wrote:

> mmm, interesting. My theory was
>
> t0 - major compaction runs, there is now one sstable
> t1 - x new sstables have been created
> t2 - minor compaction runs and determines there are two buckets, one with
> the x new sstables and one with the single big file. The bucket of many
> files is compacted into one, the bucket of one file is ignored.
>
> I can see that it takes longer for the big file to be involved in
> compaction again, and when it finally was it would take more time. But that
> minor compactions of new SSTables would still happen at the same rate,
> especially if they are created at the same rate as previously.
>
> Am I missing something or am I just reading the docs wrong ?
>
> Cheers
> Aaron
>
>
> On 4 Apr 2011, at 22:20, Jonathan Colby wrote:
>
> hi Aaron -
>
> The Datastax documentation brought to light the fact that over time, major
> compactions  will be performed on bigger and bigger SSTables.   They
> actually recommend against performing too many major compactions.  Which is
> why I am wary to trigger too many major compactions ...
>
> http://www.datastax.com/docs/0.7/operations/scheduled_tasks
> Performing Major Compaction
> <http://www.datastax.com/docs/0.7/operations/scheduled_tasks#performing-major-compaction>
>
> A major compaction process merges all SSTables for all column families in a
> keyspace – not just similar sized ones, as in minor compaction. Note that
> this may create extremely large SStables that result in long intervals
> before the next minor compaction (and a resulting increase in CPU usage for
> each minor compaction).
>
> Though a major compaction ultimately frees disk space used by accumulated
> SSTables, during runtime it can temporarily double disk space usage. It is
> best to run major compactions, if at all, at times of low demand on the
> cluster.
>
>
>
>
>
>
> On Apr 4, 2011, at 1:57 PM, aaron morton wrote:
>
> cleanup reads each SSTable on disk and writes a new file that contains the
> same data with the exception of rows that are no longer in a token range the
> node is a replica for. It's not compacting the files into fewer files or
> purging tombstones. But it is re-writing all the data for the CF.
>
> Part of the process will trigger GC if needed to free up disk space from
> SSTables no longer needed.
>
> AFAIK having fewer bigger files will not cause longer minor compactions.
> Compaction thresholds are applied per bucket of files that share a similar
> size, there is normally more smaller files and fewer larger files.
>
> Aaron
>
> On 2 Apr 2011, at 01:45, Jonathan Colby wrote:
>
> I discovered that a Garbage collection cleans up the unused old SSTables.
>   But I still wonder whether cleanup really does a full compaction.  This
> would be undesirable if so.
>
>
>
> On Apr 1, 2011, at 4:08 PM, Jonathan Colby wrote:
>
>
> I ran node cleanup on a node in my cluster and discovered the disk usage
> went from 3.3 GB to 5.4 GB.  Why is this?
>
>
> I thought cleanup just removed hinted handoff information.   I read that
> *during* cleanup extra disk space will be used similar to a compaction.  But
> I was expecting the disk usage to go back down when it finished.
>
>
> I hope cleanup doesn't trigger a major compaction.  I'd rather not run
> major compactions because it means future minor compactions will take longer
> and use more CPU and disk.
>
>
>
>
>
>
>
>


Re: urgent

2011-04-03 Thread shimi
How did you solve it?

On Sun, Apr 3, 2011 at 7:32 PM, Anurag Gujral wrote:

> Now it is using all the three disks . I want to understand why recommended
> approach is to use
> one single large volume /directory and not multiple ones,can you please
> explain in detail.
> I am using SSDs using  three small ones is cheaper than using one large
> one.
> Please Suggest
> Thanks
> Anurag
>
>
> On Sun, Apr 3, 2011 at 7:31 AM, aaron morton wrote:
>
>> Is this still a problem ? Are you getting errors on the server ?
>>
>> It should be choosing the directory with the most space.
>>
>> btw, the recommended approach is to use a single large volume/directory
>> for the data.
>>
>> Aaron
>>
>> On 2 Apr 2011, at 01:56, Anurag Gujral wrote:
>>
>> > Hi All,
>> >   I have setup a cassandra cluster with three data directories
>> but cassandra is using only one of them and that disk is out of space
>> > and .Why is cassandra not using all the three data directories.
>> >
>> > Plz Suggest.
>> >
>> > Thanks
>> > Anurag
>>
>>
>


Re: Exceptions on 0.7.0

2011-02-22 Thread shimi
I didn't solve it.
Since it is a test cluster I deleted all the data. I copied some sstables
from my production cluster and tried again; this time I didn't have the
problem.
I am planning on removing everything from this test cluster. I will start all
over again with 0.6.x, then load it with tens of GB of data (not an
sstable copy) and test the upgrade again.

I made the mistake of not backing up the data files before I upgraded.

Shimi

On Tue, Feb 22, 2011 at 2:24 PM, David Boxenhorn  wrote:

> Shimi,
>
> I am getting the same error that you report here. What did you do to solve
> it?
>
> David
>
>
> On Thu, Feb 10, 2011 at 2:54 PM, shimi  wrote:
>
>> I upgraded the version on all the nodes but I still gets the Exceptions.
>> I run cleanup on one of the nodes but I don't think there is any cleanup
>> going on.
>>
>> Another weird thing that I see is:
>> INFO [CompactionExecutor:1] 2011-02-10 12:08:21,353
>> CompactionIterator.java (line 135) Compacting large row
>> 333531353730363835363237353338383836383035363036393135323132383
>> 73630323034313a446f20322e384c20656e67696e657320686176652061646a75737461626c65206c696674657273
>> (725849473109 bytes) incrementally
>>
>> In my production version the largest row is 10259. It shouldn't be
>> different in this case.
>>
>> The first Exception is been thrown on 3 nodes during compaction.
>> The second Exception (Internal error processing get_range_slices) is been
>> thrown all the time by a forth node. I disabled gossip and any client
>> traffic to it and I still get the Exceptions.
>> Is it possible to boot a node with gossip disable?
>>
>> Shimi
>>
>> On Thu, Feb 10, 2011 at 11:11 AM, aaron morton 
>> wrote:
>>
>>> I should be able to repair, install the new version and kick off nodetool
>>> repair .
>>>
>>> If you are uncertain search for cassandra-1992 on the list, there has
>>> been some discussion. You can also wait till some peeps in the states wake
>>> up if you want to be extra sure.
>>>
>>>  The number if the number of columns the iterator is going to return from
>>> the row. I'm guessing that because this happening during compaction it's
>>> using asked for the maximum possible number of columns.
>>>
>>> Aaron
>>>
>>>
>>>
>>> On 10 Feb 2011, at 21:37, shimi wrote:
>>>
>>> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>>>
>>>  Out of curiosity, do you really have on the order of 1,986,622,313
>>> elements (I believe elements=keys) in the cf?
>>>
>>> Dan
>>>
>>> No. I was too puzzled by the numbers
>>>
>>>
>>> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton 
>>>  wrote:
>>>
>>>> Shimi,
>>>> You may be seeing the result of CASSANDRA-1992, are you able to test
>>>> with the most recent 0.7 build ?
>>>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>>>
>>>>
>>>> Aaron
>>>>
>>> I will. I hope the data was not corrupted.
>>>
>>>
>>>
>>> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton 
>>> wrote:
>>>
>>>> Shimi,
>>>> You may be seeing the result of CASSANDRA-1992, are you able to test
>>>> with the most recent 0.7 build ?
>>>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>>>
>>>>
>>>> Aaron
>>>>
>>>> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>>>>
>>>> Out of curiosity, do you really have on the order of 1,986,622,313
>>>> elements (I believe elements=keys) in the cf?
>>>>
>>>> Dan
>>>>
>>>>  *From:* shimi [mailto:shim...@gmail.com]
>>>> *Sent:* February-09-11 15:06
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Exceptions on 0.7.0
>>>>
>>>> I have a 4 node test cluster were I test the port to 0.7.0 from 0.6.X
>>>> On 3 out of the 4 nodes I get exceptions in the log.
>>>> I am using RP.
>>>> Changes that I did:
>>>> 1. changed the replication factor from 3 to 4
>>>> 2. configured the nodes to use Dynamic Snitch
>>>> 3. RR of 0.33
>>>>
>>>> I run repair on 2 nodes  before I noticed the errors. One of them is
>>>> having the first error and the other the second.
>>>> I restart the nodes but I still get the exceptions.
>>>>
>

EOFException: attempted to skip x bytes

2011-02-21 Thread shimi
rator.(SSTableIdentityIterator.java:69)
... 19 more

Shimi


Re: Exceptions on 0.7.0

2011-02-10 Thread shimi
I upgraded the version on all the nodes but I still get the Exceptions.
I ran cleanup on one of the nodes but I don't think there is any cleanup
going on.

Another weird thing that I see is:
INFO [CompactionExecutor:1] 2011-02-10 12:08:21,353 CompactionIterator.java
(line 135) Compacting large row
333531353730363835363237353338383836383035363036393135323132383
73630323034313a446f20322e384c20656e67696e657320686176652061646a75737461626c65206c696674657273
(725849473109 bytes) incrementally

In my production version the largest row is 10259. It shouldn't be different
in this case.

The first Exception is being thrown on 3 nodes during compaction.
The second Exception (Internal error processing get_range_slices) is being
thrown all the time by a fourth node. I disabled gossip and any client
traffic to it and I still get the Exceptions.
Is it possible to boot a node with gossip disabled?

Shimi

On Thu, Feb 10, 2011 at 11:11 AM, aaron morton wrote:

> I should be able to repair, install the new version and kick off nodetool
> repair .
>
> If you are uncertain search for cassandra-1992 on the list, there has been
> some discussion. You can also wait till some peeps in the states wake up if
> you want to be extra sure.
>
>  The number is the number of columns the iterator is going to return from
> the row. I'm guessing that because this is happening during compaction it's
> asking for the maximum possible number of columns.
>
> Aaron
>
>
>
> On 10 Feb 2011, at 21:37, shimi wrote:
>
> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>
>  Out of curiosity, do you really have on the order of 1,986,622,313
> elements (I believe elements=keys) in the cf?
>
> Dan
>
> No. I was too puzzled by the numbers
>
>
> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton 
>  wrote:
>
>> Shimi,
>> You may be seeing the result of CASSANDRA-1992, are you able to test with
>> the most recent 0.7 build ?
>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>
>>
>> Aaron
>>
> I will. I hope the data was not corrupted.
>
>
>
> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton wrote:
>
>> Shimi,
>> You may be seeing the result of CASSANDRA-1992, are you able to test with
>> the most recent 0.7 build ?
>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>
>>
>> Aaron
>>
>> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>>
>> Out of curiosity, do you really have on the order of 1,986,622,313
>> elements (I believe elements=keys) in the cf?
>>
>> Dan
>>
>>  *From:* shimi [mailto:shim...@gmail.com]
>> *Sent:* February-09-11 15:06
>> *To:* user@cassandra.apache.org
>> *Subject:* Exceptions on 0.7.0
>>
>> I have a 4 node test cluster were I test the port to 0.7.0 from 0.6.X
>> On 3 out of the 4 nodes I get exceptions in the log.
>> I am using RP.
>> Changes that I did:
>> 1. changed the replication factor from 3 to 4
>> 2. configured the nodes to use Dynamic Snitch
>> 3. RR of 0.33
>>
>> I run repair on 2 nodes  before I noticed the errors. One of them is
>> having the first error and the other the second.
>> I restart the nodes but I still get the exceptions.
>>
>> The following Exception I get from 2 nodes:
>>  WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java
>> (line 84) Cannot provide an optimal Bloom
>> Filter for 1986622313 elements (1/4 buckets per element).
>> ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190
>> AbstractCassandraDaemon.java (line 91) Fatal exception in
>> thread Thread[CompactionExecutor:1,1,main]
>> java.io.IOError: java.io.EOFException
>> at
>> org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
>> at
>> org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:34)
>> at
>> org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
>> at
>> org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
>> at
>> org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
>> at
>> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
>> at
>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>> at
>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>> at
>> com.google.common.collect.Iterators$7.computeNext(Iterators.java:604)
>> at
>

Re: Exceptions on 0.7.0

2011-02-10 Thread shimi
On 10 Feb 2011, at 13:42, Dan Hendry wrote:

Out of curiosity, do you really have on the order of 1,986,622,313 elements
(I believe elements=keys) in the cf?

Dan

No. I was too puzzled by the numbers


On Thu, Feb 10, 2011 at 10:30 AM, aaron morton 
 wrote:

> Shimi,
> You may be seeing the result of CASSANDRA-1992, are you able to test with
> the most recent 0.7 build ?
> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>
>
> Aaron
>
I will. I hope the data was not corrupted.



On Thu, Feb 10, 2011 at 10:30 AM, aaron morton wrote:

> Shimi,
> You may be seeing the result of CASSANDRA-1992, are you able to test with
> the most recent 0.7 build ?
> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>
>
> Aaron
>
> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>
> Out of curiosity, do you really have on the order of 1,986,622,313 elements
> (I believe elements=keys) in the cf?
>
> Dan
>
> *From:* shimi [mailto:shim...@gmail.com]
> *Sent:* February-09-11 15:06
> *To:* user@cassandra.apache.org
> *Subject:* Exceptions on 0.7.0
>
> I have a 4 node test cluster were I test the port to 0.7.0 from 0.6.X
> On 3 out of the 4 nodes I get exceptions in the log.
> I am using RP.
> Changes that I did:
> 1. changed the replication factor from 3 to 4
> 2. configured the nodes to use Dynamic Snitch
> 3. RR of 0.33
>
> I run repair on 2 nodes  before I noticed the errors. One of them is having
> the first error and the other the second.
> I restart the nodes but I still get the exceptions.
>
> The following Exception I get from 2 nodes:
>  WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java (line
> 84) Cannot provide an optimal Bloom
> Filter for 1986622313 elements (1/4 buckets per element).
> ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190
> AbstractCassandraDaemon.java (line 91) Fatal exception in
> thread Thread[CompactionExecutor:1,1,main]
> java.io.IOError: java.io.EOFException
> at
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
> at
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:34)
> at
> org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
> at
> org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
> at
> org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
> at
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
> at
> com.google.common.collect.Iterators$7.computeNext(Iterators.java:604)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
> at
> org.apache.cassandra.db.ColumnIndexer.serializeInternal(ColumnIndexer.java:76)
> at
> org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:50)
> at
> org.apache.cassandra.io.LazilyCompactedRow.<init>(LazilyCompactedRow.java:88)
> at
> org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:136)
> at
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
> at
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
> at
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
> at
> org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
> at
> org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
> at
> org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323)
> at
> org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
> at
> org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.Thread

Exceptions on 0.7.0

2011-02-09 Thread shimi
(CollatingIterator.java:217)
at
org.apache.cassandra.db.RowIteratorFactory$3.getReduced(RowIteratorFactory.java:136)
at
org.apache.cassandra.db.RowIteratorFactory$3.getReduced(RowIteratorFactory.java:106)
at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at org.apache.cassandra.db.RowIterator.hasNext(RowIterator.java:49)
at
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1294)
at
org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:438)
at
org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:473)
at
org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:2868)
at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
at
org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:78)
... 21 more

any idea what went wrong?
Shimi


Re: Do you have a site in production environment with Cassandra? What client do you use?

2011-01-15 Thread shimi
Same here, Hector with Java.

Shimi

On Fri, Jan 14, 2011 at 9:13 PM, Dan Kuebrich wrote:

> We've done hundreds of gigs in and out of cassandra 0.6.8 with pycassa 0.3.
>  Working on upgrading to 0.7 and pycassa 1.03.
>
> I don't know if we're using it wrong, but the "connection object is tied to
> a particular keyspace" constraint isn't that awesome--we have a number of
> keyspaces used simultaneously.  Haven't looked into it yet.
>
>
> On Fri, Jan 14, 2011 at 1:52 PM, Mike Wynholds wrote:
>
>> We have one in production with Ruby / fauna Cassandra gem and Cassandra
>> 0.6.x.  The project is live but is stuck in a sort of private beta, so it
>> hasn't really been run through any load scenarios.
>>
>> ..mike..
>>
>> --
>> Michael Wynholds | Carbon Five | 310.821.7125 x13 | m...@carbonfive.com
>>
>>
>>
>> On Fri, Jan 14, 2011 at 9:24 AM, Ertio Lew  wrote:
>>
>>> Hey,
>>>
>>> If you have a site in production environment or considering so, what
>>> is the client that you use to interact with Cassandra. I know that
>>> there are several clients available out there according to the
>>> language you use but I would love to know what clients are being used
>>> widely in production environments and are best to work with(support
>>> most required features for performance).
>>>
>>> Also preferably tell about the technology stack for your applications.
>>>
>>> Any suggestions, comments appreciated ?
>>>
>>> Thanks
>>> Ertio
>>>
>>
>>
>


Re: Reclaim deleted rows space

2011-01-10 Thread shimi
I modified the code to limit the size of the SSTables.
I will be glad if someone can take a look at it

https://github.com/Shimi/cassandra/tree/cassandra-0.6

Shimi

On Fri, Jan 7, 2011 at 2:04 AM, Jonathan Shook  wrote:

> I believe the following condition within submitMinorIfNeeded(...)
> determines whether to continue, so it's not a hard loop.
>
> // if (sstables.size() >= minThreshold) ...
>
>
>
> On Thu, Jan 6, 2011 at 2:51 AM, shimi  wrote:
> > According to the code it make sense.
> > submitMinorIfNeeded() calls doCompaction() which
> > calls submitMinorIfNeeded().
> > With minimumCompactionThreshold = 1 submitMinorIfNeeded() will always run
> > compaction.
> >
> > Shimi
> > On Thu, Jan 6, 2011 at 10:26 AM, shimi  wrote:
> >>
> >>
> >> On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis 
> wrote:
> >>>
> >>> Pretty sure there's logic in there that says "don't bother compacting
> >>> a single sstable."
> >>
> >> No. You can do it.
> >> Based on the log I have a feeling that it triggers an infinite
> compaction
> >> loop.
> >>
> >>>
> >>> On Wed, Jan 5, 2011 at 2:26 PM, shimi  wrote:
> >>> > How does minor compaction is triggered? Is it triggered Only when a
> new
> >>> > SStable is added?
> >>> >
> >>> > I was wondering if triggering a compaction
> >>> > with minimumCompactionThreshold
> >>> > set to 1 would be useful. If this can happen I assume it will do
> >>> > compaction
> >>> > on files with similar size and remove deleted rows on the rest.
> >>> > Shimi
> >>> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller
> >>> > 
> >>> > wrote:
> >>> >>
> >>> >> > I don't have a problem with disk space. I have a problem with the
> >>> >> > data
> >>> >> > size.
> >>> >>
> >>> >> [snip]
> >>> >>
> >>> >> > Bottom line is that I want to reduce the number of requests that
> >>> >> > goes to
> >>> >> > disk. Since there is enough data that is no longer valid I can do
> it
> >>> >> > by
> >>> >> > reclaiming the space. The only way to do it is by running Major
> >>> >> > compaction.
> >>> >> > I can wait and let Cassandra do it for me but then the data size
> >>> >> > will
> >>> >> > get
> >>> >> > even bigger and the response time will be worst. I can do it
> >>> >> > manually
> >>> >> > but I
> >>> >> > prefer it to happen in the background with less impact on the
> system
> >>> >>
> >>> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
> >>> >>
> >>> >> So essentially, for workloads that are teetering on the edge of
> cache
> >>> >> warmness and is subject to significant overwrites or removals, it
> may
> >>> >> be beneficial to perform much more aggressive background compaction
> >>> >> even though it might waste lots of CPU, to keep the in-memory
> working
> >>> >> set down.
> >>> >>
> >>> >> There was talk (I think in the compaction redesign ticket) about
> >>> >> potentially improving the use of bloom filters such that obsolete
> data
> >>> >> in sstables could be eliminated from the read set without
> >>> >> necessitating actual compaction; that might help address cases like
> >>> >> these too.
> >>> >>
> >>> >> I don't think there's a pre-existing silver bullet in a current
> >>> >> release; you probably have to live with the need for
> >>> >> greater-than-theoretically-optimal memory requirements to keep the
> >>> >> working set in memory.
> >>> >>
> >>> >> --
> >>> >> / Peter Schuller
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Jonathan Ellis
> >>> Project Chair, Apache Cassandra
> >>> co-founder of Riptano, the source for professional Cassandra support
> >>> http://riptano.com
> >>
> >
> >
>


Re: maven cassandra plugin

2011-01-06 Thread shimi
I use Capistrano for installs, upgrades, start, stop and restart.
I use it for other projects as well.
It is very useful for automated tasks that need to run on multiple machines.

Shimi

On 2011 1 6 21:38, "B. Todd Burruss"  wrote:

has anyone created a maven plugin, like cargo for tomcat, for automating
starting/stopping a cassandra instance?


Re: Reclaim deleted rows space

2011-01-06 Thread shimi
According to the code it makes sense.
submitMinorIfNeeded() calls doCompaction() which calls
submitMinorIfNeeded().
With minimumCompactionThreshold = 1 submitMinorIfNeeded() will always run
compaction.
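
A stripped-down sketch of how I read that flow (this is not the actual
Cassandra source, just the shape of it):

import java.util.Collections;
import java.util.List;

class CompactionLoopSketch {
    int minimumCompactionThreshold = 1;

    void submitMinorIfNeeded(List<String> bucket) {
        // the real check is roughly: if (sstables.size() >= minThreshold) ...
        if (bucket.size() >= minimumCompactionThreshold) {
            doCompaction(bucket);
        }
    }

    void doCompaction(List<String> bucket) {
        // merging a bucket always produces one new sstable ("merged.db" is
        // just a placeholder name) ...
        String merged = "merged.db";
        // ... which with a threshold of 1 immediately qualifies again,
        // so the cycle never ends
        submitMinorIfNeeded(Collections.singletonList(merged));
    }
}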

Shimi

On Thu, Jan 6, 2011 at 10:26 AM, shimi  wrote:

>
>
> On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis  wrote:
>
>> Pretty sure there's logic in there that says "don't bother compacting
>> a single sstable."
>
> No. You can do it.
> Based on the log I have a feeling that it triggers an infinite compaction
> loop.
>
>
>
>>  On Wed, Jan 5, 2011 at 2:26 PM, shimi  wrote:
>> > How does minor compaction is triggered? Is it triggered Only when a new
>> > SStable is added?
>> >
>> > I was wondering if triggering a compaction
>> with minimumCompactionThreshold
>> > set to 1 would be useful. If this can happen I assume it will do
>> compaction
>> > on files with similar size and remove deleted rows on the rest.
>> > Shimi
>> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <
>> peter.schul...@infidyne.com>
>> > wrote:
>> >>
>> >> > I don't have a problem with disk space. I have a problem with the
>> data
>> >> > size.
>> >>
>> >> [snip]
>> >>
>> >> > Bottom line is that I want to reduce the number of requests that goes
>> to
>> >> > disk. Since there is enough data that is no longer valid I can do it
>> by
>> >> > reclaiming the space. The only way to do it is by running Major
>> >> > compaction.
>> >> > I can wait and let Cassandra do it for me but then the data size will
>> >> > get
>> >> > even bigger and the response time will be worst. I can do it manually
>> >> > but I
>> >> > prefer it to happen in the background with less impact on the system
>> >>
>> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
>> >>
>> >> So essentially, for workloads that are teetering on the edge of cache
>> >> warmness and is subject to significant overwrites or removals, it may
>> >> be beneficial to perform much more aggressive background compaction
>> >> even though it might waste lots of CPU, to keep the in-memory working
>> >> set down.
>> >>
>> >> There was talk (I think in the compaction redesign ticket) about
>> >> potentially improving the use of bloom filters such that obsolete data
>> >> in sstables could be eliminated from the read set without
>> >> necessitating actual compaction; that might help address cases like
>> >> these too.
>> >>
>> >> I don't think there's a pre-existing silver bullet in a current
>> >> release; you probably have to live with the need for
>> >> greater-than-theoretically-optimal memory requirements to keep the
>> >> working set in memory.
>> >>
>> >> --
>> >> / Peter Schuller
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
>


Re: Reclaim deleted rows space

2011-01-06 Thread shimi
On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis  wrote:

> Pretty sure there's logic in there that says "don't bother compacting
> a single sstable."

No. You can do it.
Based on the log I have a feeling that it triggers an infinite compaction
loop.



> On Wed, Jan 5, 2011 at 2:26 PM, shimi  wrote:
> > How does minor compaction is triggered? Is it triggered Only when a new
> > SStable is added?
> >
> > I was wondering if triggering a compaction
> with minimumCompactionThreshold
> > set to 1 would be useful. If this can happen I assume it will do
> compaction
> > on files with similar size and remove deleted rows on the rest.
> > Shimi
> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <
> peter.schul...@infidyne.com>
> > wrote:
> >>
> >> > I don't have a problem with disk space. I have a problem with the data
> >> > size.
> >>
> >> [snip]
> >>
> >> > Bottom line is that I want to reduce the number of requests that goes
> to
> >> > disk. Since there is enough data that is no longer valid I can do it
> by
> >> > reclaiming the space. The only way to do it is by running Major
> >> > compaction.
> >> > I can wait and let Cassandra do it for me but then the data size will
> >> > get
> >> > even bigger and the response time will be worst. I can do it manually
> >> > but I
> >> > prefer it to happen in the background with less impact on the system
> >>
> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
> >>
> >> So essentially, for workloads that are teetering on the edge of cache
> >> warmness and is subject to significant overwrites or removals, it may
> >> be beneficial to perform much more aggressive background compaction
> >> even though it might waste lots of CPU, to keep the in-memory working
> >> set down.
> >>
> >> There was talk (I think in the compaction redesign ticket) about
> >> potentially improving the use of bloom filters such that obsolete data
> >> in sstables could be eliminated from the read set without
> >> necessitating actual compaction; that might help address cases like
> >> these too.
> >>
> >> I don't think there's a pre-existing silver bullet in a current
> >> release; you probably have to live with the need for
> >> greater-than-theoretically-optimal memory requirements to keep the
> >> working set in memory.
> >>
> >> --
> >> / Peter Schuller
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Reclaim deleted rows space

2011-01-06 Thread shimi
Am I missing something here? It is already possible to trigger major
compaction on a specific CF.

On Thu, Jan 6, 2011 at 4:50 AM, Tyler Hobbs  wrote:

> Although it's not exactly the ability to list specific SSTables, the
> ability to only compact specific CFs will be in upcoming releases:
>
> https://issues.apache.org/jira/browse/CASSANDRA-1812
>
> - Tyler
>
>
> On Wed, Jan 5, 2011 at 7:46 PM, Edward Capriolo wrote:
>
>> On Wed, Jan 5, 2011 at 4:31 PM, Jonathan Ellis  wrote:
>> > Pretty sure there's logic in there that says "don't bother compacting
>> > a single sstable."
>> >
>> > On Wed, Jan 5, 2011 at 2:26 PM, shimi  wrote:
>> >> How does minor compaction is triggered? Is it triggered Only when a new
>> >> SStable is added?
>> >>
>> >> I was wondering if triggering a compaction
>> with minimumCompactionThreshold
>> >> set to 1 would be useful. If this can happen I assume it will do
>> compaction
>> >> on files with similar size and remove deleted rows on the rest.
>> >> Shimi
>> >> On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <
>> peter.schul...@infidyne.com>
>> >> wrote:
>> >>>
>> >>> > I don't have a problem with disk space. I have a problem with the
>> data
>> >>> > size.
>> >>>
>> >>> [snip]
>> >>>
>> >>> > Bottom line is that I want to reduce the number of requests that
>> goes to
>> >>> > disk. Since there is enough data that is no longer valid I can do it
>> by
>> >>> > reclaiming the space. The only way to do it is by running Major
>> >>> > compaction.
>> >>> > I can wait and let Cassandra do it for me but then the data size
>> will
>> >>> > get
>> >>> > even bigger and the response time will be worst. I can do it
>> manually
>> >>> > but I
>> >>> > prefer it to happen in the background with less impact on the system
>> >>>
>> >>> Ok - that makes perfect sense then. Sorry for misunderstanding :)
>> >>>
>> >>> So essentially, for workloads that are teetering on the edge of cache
>> >>> warmness and is subject to significant overwrites or removals, it may
>> >>> be beneficial to perform much more aggressive background compaction
>> >>> even though it might waste lots of CPU, to keep the in-memory working
>> >>> set down.
>> >>>
>> >>> There was talk (I think in the compaction redesign ticket) about
>> >>> potentially improving the use of bloom filters such that obsolete data
>> >>> in sstables could be eliminated from the read set without
>> >>> necessitating actual compaction; that might help address cases like
>> >>> these too.
>> >>>
>> >>> I don't think there's a pre-existing silver bullet in a current
>> >>> release; you probably have to live with the need for
>> >>> greater-than-theoretically-optimal memory requirements to keep the
>> >>> working set in memory.
>> >>>
>> >>> --
>> >>> / Peter Schuller
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of Riptano, the source for professional Cassandra support
>> > http://riptano.com
>> >
>>
>> I was wondering if it made sense to have a JMX operation that can
>> compact a list of tables by file name. This opens it up for power
>> users to have more options than compacting the entire keyspace.
>>
>
>


Re: Reclaim deleted rows space

2011-01-05 Thread shimi
How is minor compaction triggered? Is it triggered only when a new
SSTable is added?

I was wondering if triggering a compaction with minimumCompactionThreshold
set to 1 would be useful. If this can happen I assume it will do compaction
on files of similar size and remove deleted rows from the rest.

Shimi

On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller
wrote:

> > I don't have a problem with disk space. I have a problem with the data
> > size.
>
> [snip]
>
> > Bottom line is that I want to reduce the number of requests that goes to
> > disk. Since there is enough data that is no longer valid I can do it by
> > reclaiming the space. The only way to do it is by running Major
> compaction.
> > I can wait and let Cassandra do it for me but then the data size will get
> > even bigger and the response time will be worst. I can do it manually but
> I
> > prefer it to happen in the background with less impact on the system
>
> Ok - that makes perfect sense then. Sorry for misunderstanding :)
>
> So essentially, for workloads that are teetering on the edge of cache
> warmness and are subject to significant overwrites or removals, it may
> be beneficial to perform much more aggressive background compaction
> even though it might waste lots of CPU, to keep the in-memory working
> set down.
>
> There was talk (I think in the compaction redesign ticket) about
> potentially improving the use of bloom filters such that obsolete data
> in sstables could be eliminated from the read set without
> necessitating actual compaction; that might help address cases like
> these too.
>
> I don't think there's a pre-existing silver bullet in a current
> release; you probably have to live with the need for
> greater-than-theoretically-optimal memory requirements to keep the
> working set in memory.
>
> --
> / Peter Schuller
>


Re: Reclaim deleted rows space

2011-01-04 Thread shimi
Yes, I am aware of that.
This is the reason I upgraded to 0.6.8.
Still, all the deleted rows in the biggest SSTable will only be removed in a
major compaction.

Shimi

On Tue, Jan 4, 2011 at 6:40 PM, Robert Coli  wrote:

> On Tue, Jan 4, 2011 at 4:33 AM, Peter Schuller
>  wrote:
> > For some cases this will be beneficial, but not always. It's been
> > further improved for 0.7 too w.r.t. tomb stone handling in non-major
> > compactions (I don't have the JIRA ticket number handy).
>
> https://issues.apache.org/jira/browse/CASSANDRA-1074
>
> (For those playing along at home..)
>
> =Rob
>


Re: Bootstrapping taking long

2011-01-04 Thread shimi
You will have something new to talk about in your talk tomorrow :)

You said that the anticompaction was only on a single node? I think that
your new node should get data from at least two other nodes (depending on
the replication factor). Maybe the problem is not in the new node.
In old versions (I think prior to 0.6.3) there was a case of a stuck bootstrap
that required a restart of the new node and of the nodes which were supposed
to stream data to it. As far as I remember that case was resolved. I haven't
seen this problem since then.

Shimi

On Tue, Jan 4, 2011 at 3:01 PM, Ran Tavory  wrote:

> Running nodetool decommission didn't help. Actually the node refused to
> decommission itself (b/c it wasn't part of the ring). So I simply stopped
> the process, deleted all the data directories and started it again. It
> worked in the sense of the node bootstrapped again but as before, after it
> had finished moving the data nothing happened for a long time (I'm still
> waiting, but nothing seems to be happening).
>
> Any hints how to analyze a "stuck" bootstrapping node??
> thanks
>
> On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory  wrote:
>
>> Thanks Shimi, so indeed anticompaction was run on one of the other nodes
>> from the same DC but to my understanding it has already ended. A few hours
>> ago...
>> I have plenty of log messages such as [1] which ended a couple of hours ago,
>> and I've seen the new node streaming and accepting the data from the node
>> which performed the anticompaction and so far it was normal so it seemed
>> that data is at its right place. But now the new node seems sort of stuck.
>> None of the other nodes is anticompacting right now or had been
>> anticompacting since then.
>> The new node's CPU is close to zero, its iostats are almost zero so I
>> can't find another bottleneck that would keep it hanging.
>>
>> On the IRC someone suggested I'd maybe retry to join this node,
>> e.g. decommission and rejoin it again. I'll try it now...
>>
>>
>> [1]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java
>> (line 338) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
>>
>> On Tue, Jan 4, 2011 at 12:45 PM, shimi  wrote:
>>
>>> In my experience most of the time it takes for a node to join the cluster
>>> is the anticompaction on the other nodes. The streaming part is very fast.
>>> Check the other nodes logs to see if there is any node doing
>>> anticompaction.
>>> I don't remember how much data I had in the cluster when I needed to
>>> add/remove nodes. I do remember that it took a few hours.
>>>
>>> The node will join the ring only when it will finish the bootstrap.
>>>
>>> Shimi
>>>
>>>
>>> On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory  wrote:
>>>
>>>> I asked the same question on the IRC but no luck there, everyone's
>>>> asleep ;)...
>>>>
>>>> Using 0.6.6 I'm adding a new node to the cluster.
>>>> It starts out fine but then gets stuck on the bootstrapping state for
>>>> too long. More than an hour and still counting.
>>>>
>>>> $ bin/nodetool -p 9004 -h localhost streams
>>>>> Mode: Bootstrapping
>>>>> Not sending any streams.
>>>>> Not receiving any streams.
>>>>
>>>>
>>

Re: Reclaim deleted rows space

2011-01-04 Thread shimi
I think I didn't make myself clear.
I don't have a problem with disk space. I have a problem with the data
size.
I have a simple CRUD application. Most of the requests are reads, but there
are updates/deletes, and as time passes the number of deleted rows becomes
big enough to free some disk space (a matter of days, not hours).
Since not all of the data can fit in RAM (and I have a lot of RAM), the rest
is served from disk. Since disk is slow, I want to reduce as much as possible
the number of requests that go to the disk. The more requests go to the disk,
the longer the disk wait time gets and the more time it takes to return a
response.

Bottom line is that I want to reduce the number of requests that go to
disk. Since there is enough data that is no longer valid, I can do it by
reclaiming the space. The only way to do it is by running a major compaction.
I can wait and let Cassandra do it for me, but then the data size will get
even bigger and the response time will be worse. I can do it manually, but I
prefer it to happen in the background with less impact on the system.
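
For illustration, the kind of background process described above could be a
small scheduled job that fires the compaction operation over JMX. This is a
minimal sketch, under assumptions: the MBean object name and the
forceTableCompaction operation should be verified against the
StorageServiceMBean of the Cassandra version actually in use (they changed
between 0.6 and 0.7), and the host, JMX port, and schedule are placeholders.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ScheduledMajorCompaction {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Run once a day at an off-peak hour; adjust the period to taste.
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                triggerMajorCompaction("localhost", 8080);
            }
        }, 1, 24, TimeUnit.HOURS);
    }

    static void triggerMajorCompaction(String host, int jmxPort) {
        try {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxPort + "/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Object name and operation name are assumptions -- check them
                // against your version's StorageServiceMBean before relying on this.
                ObjectName ss = new ObjectName(
                        "org.apache.cassandra.service:type=StorageService");
                mbs.invoke(ss, "forceTableCompaction", new Object[0], new String[0]);
            } finally {
                connector.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}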

Shimi


On Tue, Jan 4, 2011 at 2:33 PM, Peter Schuller
wrote:

> > This is what I thought. I was wishing there might be another way to
> reclaim
> > the space.
>
> Be sure you really need this first :) Normally you just let it happen in
> the bg.
>
> > The problem is that the more data you have the more time it will take to
> > Cassandra to response.
>
> Relative to what though? There are definitely important side-effects
> of having very large data sets, and part of that involves compactions,
> but in a normal steady state type of system you should never be in the
> position to "wait" for a major compaction to run. Compactions are
> something that is intended to run every now and then in the
> background. It will result in variations in disk space within certain
> bounds, which is expected.
>
> Certainly the situation can be improved and the current disk space
> utilization situation is not perfect, but the above suggests to me
> that you're trying to do something that is not really intended to be
> done.
>
> > Reclaim space of deleted rows in the biggest SSTable requires Major
> > compaction. This compaction can be triggered by adding x2 data (or x4
> data
> > in the default configuration) to the system or by executing it manually
> > using JMX.
>
> You can indeed choose to trigger major compactions by e.g. cron jobs.
> But just be aware that if you're operating under conditions where you
> are close to disk space running out, you have other concerns too -
> such as periodic repair operations also needing disk space.
>
> Also; suppose you're overwriting lots of data (or replacing by
> deleting and adding other data). It is not necessarily true that you
> need 4x the space relative to what you otherwise do just because of
> the compaction threshold.
>
> Keep in mind that compactions already need extra space anyway. If
> you're *not* overwriting or adding data, a compaction of a single CF
> is expected to need up to twice the amount of space that it occupies.
> If you're doing more overwrites and deletions though, as you point out
> you will have more "dead" data at any given point in time. But on the
> other hand, the peak disk space usage during compactions is lower. So
> the actual peak disk space usage (which is what matters since you must
> have this much disk space) is actually helped by the
> deletions/overwrites too.
>
> Further, suppose you trigger major compactions more often. That means
> each compaction will have a higher relative spike of disk usage
> because less data has had time to be overwritten or removed.
>
> So in a sense, it's like the disk space demands is being moved between
> the category of "dead data retained for longer than necessary" and
> "peak disk usage during compaction".
>
> Also keep in mind that the *low* peak of disk space usage is not
> subject to any fragmentation concerns. Depending on the size of your
> data compared to e.g. column names, that disk space usage might be
> significantly lower than what you would get with an in-place updating
> database. There are lots of trade-offs :)
>
> You say you have to "wait" for deletions though which sounds like
> you're doing something unusual. Are you doing stuff like deleting lots
> of data in bulk from one CF, only to then write data to *another* CF?
> Such that you're actually having to wait for disk space to be freed to
> make room for data somewhere else?
>
> > In case of a system that deletes data regularly, which needs to serve
> > customers all day and the time it takes should be in ms, this is a
> problem.
>

Re: Reclaim deleted rows space

2011-01-04 Thread shimi
This is what I thought. I was hoping there might be another way to reclaim
the space.
The problem is that the more data you have, the more time it takes Cassandra
to respond.
Reclaiming the space of deleted rows in the biggest SSTable requires a major
compaction. This compaction can be triggered by adding 2x the data (or 4x the
data in the default configuration) to the system or by executing it manually
using JMX.
For a system that deletes data regularly, needs to serve customers all day,
and must respond within milliseconds, this is a problem.

It appears to me that in order to use Cassandra you must have a process that
will trigger major compaction on the nodes once every X amount of time.
One case where you don't need that is when you don't (or hardly ever) delete
data. Another is when your upper limit on response time is very high, so a
major compaction will not hurt you.

It might be that the only way to solve this problem is by having at least
two copies of each row in each data center and using a dynamic snitch.

Shimi

On Mon, Jan 3, 2011 at 7:55 PM, Peter Schuller
wrote:

> > Major compaction does it, but only if GCGraceSeconds has elapsed. See:
> >
> >
> http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
>
> But to be clear, under the assumption that your data is a lot smaller
> than the tombstones, a major compaction will definitely reclaim space
> even if GCGraceSeconds has not elapsed. So actually my original
> response is a bit misleading.
>
> --
> / Peter Schuller
>


Re: Bootstrapping taking long

2011-01-04 Thread shimi
In my experience, most of the time it takes for a node to join the cluster is
spent on the anticompaction on the other nodes. The streaming part is very
fast. Check the other nodes' logs to see if any node is doing anticompaction.
I don't remember how much data I had in the cluster when I needed to
add/remove nodes. I do remember that it took a few hours.

The node will join the ring only when it finishes the bootstrap.

Shimi


On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory  wrote:

> I asked the same question on the IRC but no luck there, everyone's asleep
> ;)...
>
> Using 0.6.6 I'm adding a new node to the cluster.
> It starts out fine but then gets stuck on the bootstrapping state for too
> long. More than an hour and still counting.
>
> $ bin/nodetool -p 9004 -h localhost streams
>> Mode: Bootstrapping
>> Not sending any streams.
>> Not receiving any streams.
>
>
> It seemed to have streamed data from other nodes and indeed the load is
> non-zero but I'm not clear what's keeping it right now from finishing.
>
>> $ bin/nodetool -p 9004 -h localhost info
>> 51042355038140769519506191114765231716
>> Load : 22.49 GB
>> Generation No: 1294133781
>> Uptime (seconds) : 1795
>> Heap Memory (MB) : 315.31 / 6117.00
>
>
> nodetool ring does not list this new node in the ring, although nodetool
> can happily talk to the new node, it's just not listing itself as a member
> of the ring. This is expected when the node is still bootstrapping, so the
> question is still how long might the bootstrap take and whether it is stuck.
>
> The data isn't huge so I find it hard to believe that streaming or anti
> compaction are the bottlenecks. I have ~20G on each node and the new node
> already has just about that so it seems that all data had already been
> streamed to it successfully, or at least most of the data... So what is it
> waiting for now? (same question, rephrased... ;)
>
> I tried:
> 1. Restarting the new node. No good. All logs seem normal but at the end
> the node is still in bootstrap mode.
> 2. As someone suggested I increased the rpc timeout from 10k to 30k
> (RpcTimeoutInMillis) but that didn't seem to help. I did this only on the
> new node. Should I have done that on all (old) nodes as well? Or maybe only
> on the ones that were supposed to stream data to that node.
> 3. Logging level at DEBUG now but nothing interesting going on except
> for occasional messages such as [1] or [2]
>
> So the question is: what's keeping the new node from finishing the
> bootstrap and how can I check its status?
> Thanks
>
> [1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line 36)
> Disseminating load info ...
> [2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033
> StorageService.java (line 1189) computing ranges for
> 28356863910078205288614550619314017621,
> 56713727820156410577229101238628035242,
>  85070591730234615865843651857942052863,
> 113427455640312821154458202477256070484,
> 141784319550391026443072753096570088105,
> 170141183460469231731687303715884105727
>
> --
> /Ran
>
>


Reclaim deleted rows space

2011-01-02 Thread shimi
Let's assume I have:
* a single 100GB SSTable file
* min compaction threshold set to 2

If I delete rows which are located in this file, is the only way to "clean"
the deleted rows by inserting another 100GB of data or by triggering a
painful major compaction?

Shimi


disks and data files

2010-12-13 Thread shimi
I am reading the kafka design documentation (
http://sna-projects.com/kafka/design.php) and I came across this (under
constant time suffices) :

Intuitively a persistent queue could be built on simple reads and appends to
files as is commonly the case with logging solutions. Though this structure
would not support the rich semantics of a BTree implementation, but it has
the advantage that all operations are O(1) and reads do not block writes or
each other. This has obvious performance advantages since the performance is
completely decoupled from the data size--one server can now take full
advantage of a number of cheap, low-rotational speed 1+TB SATA drives.
Though they have poor seek performance, these drives often have comparable
performance for large reads and writes at 1/3 the price and 3x the capacity.

Is it right to say that Cassandra takes advantage of this? The commit log
is written with appends, and sstables are only read after they were written.

Shimi


Re: unable to start cassandra-0.7r2

2010-12-13 Thread shimi
I have seen this error in 0.6.x when I was missing the cache directory
configuration.
Maybe you are missing something in your configuration.

Shimi

On Mon, Dec 13, 2010 at 12:45 PM, aaron morton wrote:

> I've seen that before when cassandra.yaml file cannot be found or is
> corrupted. It may be that eclipse is not starting cassandra with the current
> working directory set as you think it is. Sorry, cannot help much with
> eclipse.
>
> There are a couple of places where that message can be logged. One is from
> the AbstractCassandraDaemon and the other is from the DatabaseDescriptor.
> Where is yours coming from?
>
> Aaron
>
> On 13 Dec 2010, at 19:22, Liangzhao Zeng wrote:
>
> Bad configuration; unable to start server. Any idea
>
>
>


Re: iterate over all the rows with RP

2010-12-12 Thread shimi
So if I use a different connection (Thrift via Hector), will I get the
same results? It makes sense when you use OPP, and I assume it is the same
with RP. I just wanted to make sure this is the case and that there is no
state which is kept.

Shimi

On Sun, Dec 12, 2010 at 8:14 PM, Peter Schuller  wrote:

> > Is the same connection required when iterating over all the rows with
> > the Random Partitioner or is it possible to use a different connection
> > for each iteration?
>
> In general, the choice of RPC connection (I assume you mean the
> underlying thrift connection) does not affect the semantics of the RPC
> calls.
>
> --
> / Peter Schuller
>


iterate over all the rows with RP

2010-12-12 Thread shimi
Is the same connection required when iterating over all the rows with the
Random Partitioner, or is it possible to use a different connection for each
iteration?

Shimi


Re: FatClient Gossip error and some other problems

2010-09-20 Thread shimi
I was patient (although it is hard when you have millions of requests which
are not served in time). I waited for a long time. There was nothing in
the logs or in JMX.

Shimi

On Mon, Sep 20, 2010 at 6:12 PM, Gary Dusbabek  wrote:

> On Mon, Sep 20, 2010 at 09:51, shimi  wrote:
> > I have a cluster with 6 nodes on 2 datacenters (3 on each datacenter).
> > I replaced all of the servers in the cluster (0.6.4) with new ones
> (0.6.5).
> > My old cluster was unbalanced since I was using Random Partitioner and I
> > bootstrapped all the nodes without specifying their tokens.
> >
> > Since I wanted the cluster to be balanced I first added all the new
> > nodes one after the other (with the right tokens this time) and then I
> run
> > decommission on all the old ones, one after the other.
> > One of the decommissioned nodes began throwing too many open files errors
> > while it was decommissioning, taking other nodes with it. After the
> second
> > try I decided to stop it and run removetoken on its token from one of the
> > other nodes. After that everything went well except that in the end one
> of
> > the nodes looked unbalanced.
> >
> > I decided to run repair on the cluster. What I got is totally unbalanced
> > nodes with way more data than they were supposed to have. Each node had
> x2-x4
> > more data.
> > I run cleanup and all of them except the one which was unbalanced to
> begin
> > with got back to the size they were supposed to be.
> > Now whenever I try to run cleanup on this node I get:
> >
> >  INFO [COMPACTION-POOL:1] 2010-09-20 12:04:23,069 CompactionManager.java
> > (line 339) AntiCompacting ...
> >  INFO [GC inspection] 2010-09-20 12:05:37,600 GCInspector.java (line 129)
> GC
> > for ConcurrentMarkSweep: 1525 ms, 13641032 reclaimed leaving 767863520
> used;
> > max is 6552551424
> >  INFO [GC inspection] 2010-09-20 12:05:37,601 GCInspector.java (line 150)
> > Pool NameActive   Pending
> >  INFO [GC inspection] 2010-09-20 12:05:37,605 GCInspector.java (line 156)
> > STREAM-STAGE  0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,605 GCInspector.java (line 156)
> > RESPONSE-STAGE0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,606 GCInspector.java (line 156)
> > ROW-READ-STAGE8   717
> >  INFO [GC inspection] 2010-09-20 12:05:37,607 GCInspector.java (line 156)
> > LB-OPERATIONS 0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,607 GCInspector.java (line 156)
> > MISCELLANEOUS-POOL0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,607 GCInspector.java (line 156)
> > GMFD  0 2
> >  INFO [GC inspection] 2010-09-20 12:05:37,608 GCInspector.java (line 156)
> > CONSISTENCY-MANAGER   0 1
> >  INFO [GC inspection] 2010-09-20 12:05:37,608 GCInspector.java (line 156)
> > LB-TARGET 0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,609 GCInspector.java (line 156)
> > ROW-MUTATION-STAGE0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,610 GCInspector.java (line 156)
> > MESSAGE-STREAMING-POOL0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,610 GCInspector.java (line 156)
> > LOAD-BALANCER-STAGE   0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,611 GCInspector.java (line 156)
> > FLUSH-SORTER-POOL 0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,612 GCInspector.java (line 156)
> > MEMTABLE-POST-FLUSHER 0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,612 GCInspector.java (line 156)
> > AE-SERVICE-STAGE  0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,613 GCInspector.java (line 156)
> > FLUSH-WRITER-POOL 0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,613 GCInspector.java (line 156)
> > HINTED-HANDOFF-POOL   0 0
> >  INFO [GC inspection] 2010-09-20 12:05:37,616 GCInspector.java (line 161)
> > CompactionManager   n/a 0
> >  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-20 12:05:40,402
> > SSTableDeletingReference.java (line 104) Deleted ...
> >  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-20 12:05:40,727
> > SSTableDeletingReference.java (line 104) Deleted ...
> >  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-20 12:05:40,730
> > SSTableDeletingReference.java (line 104) Deleted ...
> >  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-20 12:05:40,735
> > SSTableDel

FatClient Gossip error and some other problems

2010-09-20 Thread shimi
12)
at java.util.TimerThread.run(Timer.java:462)
 INFO [GMFD:1] 2010-09-20 13:56:43,251 Gossiper.java (line 586) Node
/X.X.X.X is now part of the cluster

Does anyone have any idea how I can clean up the problematic node?
Does anyone have any idea how I can get rid of the Gossip error?

Shimi


Re: Bootstrap question

2010-07-18 Thread shimi
If I have problems with a never-ending bootstrap I do the following. I try
each one; if it doesn't help I try the next. It might not be the right thing
to do, but it worked for me.

1. Restart the bootstrapping node
2. If I see streaming 0/ I restart the node and all the streaming nodes
3. Restart all the nodes
4. If there is data in the bootstrapping node I delete it before I restart.

Good luck
Shimi

On Sun, Jul 18, 2010 at 12:21 AM, Anthony Molinaro <
antho...@alumni.caltech.edu> wrote:

> So still waiting for any sort of answer on this one.  The cluster still
> refuses to do anything when I bring up new nodes.  I shut down all the
> new nodes and am waiting.  I'm guessing that maybe the old nodes have
> some state which needs to get cleared out?  Is there anything I can do
> at this point?  Are there alternate strategies for bootstrapping I can
> try?  (For instance can I just scp all the sstables to all the new
> nodes and do a repair, would that actually work?).
>
> Anyone seen this sort of issue?  All this is with 0.6.3 so I assume
> eventually others will see this issue.
>
> -Anthony
>
> On Thu, Jul 15, 2010 at 10:45:08PM -0700, Anthony Molinaro wrote:
> > Okay, so things were pretty messed up.  I shut down all the new nodes,
> > then the old nodes started doing the half the ring is down garbage which
> > pretty much requires a full restart of everything.  So I had to shut
> > everything down, then bring the seed back, then the rest of the nodes,
> > so they finally all agreed on the ring again.
> >
> > Then I started one of the new nodes, and have been watching the logs, so
> > far 2 hours since the "Bootstrapping" message appeared in the new
> > log and nothing has happened.  No anticompaction messages anywhere,
> there's
> > one node compacting, but its on the other end of the ring, so no where
> near
> > that new node.  I'm wondering if it will ever get data at this point.
> >
> > Is there something else I should try?  The only thing I can think of
> > is deleting the system directory on the new node, and restarting, so
> > I'll try that and see if it does anything.
> >
> > -Anthony
> >
> > On Thu, Jul 15, 2010 at 03:43:49PM -0500, Jonathan Ellis wrote:
> > > On Thu, Jul 15, 2010 at 3:28 PM, Anthony Molinaro
> > >  wrote:
> > > > Is the fact that 2 new nodes are in the range messing it up?
> > >
> > > Probably.
> > >
> > > >  And if so
> > > > how do I recover (I'm thinking, shutdown new nodes 2,3,4,5, the
> bringing
> > > > up nodes 2,4, waiting for them to finish, then bringing up 3,5?).
> > >
> > > Yes.
> > >
> > > You might have to restart the old nodes too to clear out the confusion.
> > >
> > > --
> > > Jonathan Ellis
> > > Project Chair, Apache Cassandra
> > > co-founder of Riptano, the source for professional Cassandra support
> > > http://riptano.com
> >
> > --
> > 
> > Anthony Molinaro   
>
> --
> 
> Anthony Molinaro   
>


mmap

2010-07-15 Thread shimi
Can someone please explain the mmap issue?
mmap is the default for all storage files on 64-bit machines.
According to this case
https://issues.apache.org/jira/browse/CASSANDRA-1214
it might not be a good thing.
Is it right to say that you should use mmap only if your MAX expected data
is smaller than the MIN free RAM that could be in your system?

Shimi


Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

2010-07-14 Thread shimi
Do you mean that you don't release the connection back to the pool?
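
For readers hitting the same "too many open files" symptom: with later Hector
versions the Cluster object owns the socket pool and is meant to be created
once and shared across threads, rather than built per request. A minimal
sketch under that assumption (class names are from the post-0.7 Hector API,
not necessarily what was in use in this thread; host list, pool size, and
keyspace name are placeholders):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class HectorConnections {
    // One Cluster (and therefore one connection pool) for the whole application.
    private static final Cluster CLUSTER;
    private static final Keyspace KEYSPACE;

    static {
        CassandraHostConfigurator hosts =
                new CassandraHostConfigurator("cass1:9160,cass2:9160");
        hosts.setMaxActive(50); // cap on pooled sockets per host
        CLUSTER = HFactory.getOrCreateCluster("MyCluster", hosts);
        KEYSPACE = HFactory.createKeyspace("MyKeyspace", CLUSTER);
    }

    public static Keyspace keyspace() {
        // Queries and mutators built on this keyspace borrow and return pooled
        // connections internally, so calling code never holds sockets itself.
        return KEYSPACE;
    }
}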

On 2010 7 14 20:51, "Jorge Barrios"  wrote:

Thomas, I had a similar problem a few weeks back. I changed my code to make
sure that each thread only creates and uses one Hector connection. It seems
that client sockets are not being released properly, but I didn't have the
time to dig into it.

Jorge



On Wed, Jul 14, 2010 at 8:28 AM, Peter Schuller 
wrote:
>
> > [snip]
...


get_range_slices return the same rows

2010-07-14 Thread shimi
I wrote code that iterates over all the rows using get_range_slices.
For the first call I use a KeyRange from "" to "".
For all the others I use a range from the last key I got to "".
I always get the same rows that I got in the previous iteration. I tried
changing the batch size but I still get the same results.
I tried it both on a single node and on a cluster.
I use RP with version 0.6.3 and Hector.

Does anyone know how this can be done?
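
The usual pattern with the Random Partitioner is to feed the last key of each
batch back in as the start key of the next call and skip that first row, since
the start key is inclusive. Below is a minimal sketch, assuming the later
Hector API with RangeSlicesQuery (this thread used an older Hector, so class
and method names may need adapting); "MyCF" and the String key/column types
are placeholders.

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class AllRowsIterator {
    private static final int PAGE_SIZE = 100;

    public static void dumpAllRowKeys(Keyspace keyspace) {
        RangeSlicesQuery<String, String, String> query = HFactory.createRangeSlicesQuery(
                keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily("MyCF");
        query.setRange("", "", false, 10); // first 10 columns of each row
        query.setRowCount(PAGE_SIZE);

        String start = "";
        boolean firstPage = true;
        while (true) {
            query.setKeys(start, "");
            OrderedRows<String, String, String> rows = query.execute().get();

            boolean skipFirst = !firstPage;
            for (Row<String, String, String> row : rows) {
                if (skipFirst) {
                    // The start key is inclusive, so every page after the first
                    // begins with the last row of the previous page -- skip it.
                    skipFirst = false;
                    continue;
                }
                System.out.println(row.getKey());
            }

            if (rows.getCount() < PAGE_SIZE) {
                break; // a short page means we have reached the end
            }
            start = rows.peekLast().getKey(); // resume from the last key returned
            firstPage = false;
        }
    }
}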

Shimi