Eventual consistency with replication factor 1

2014-04-11 Thread Ravil Bayramgalin
I've got classic eventual-consistency symptoms (a read after a write returns
an empty result), but there is a surprising twist. The keyspace has
replication factor 1 (it's used as a cache), so how can I get a stale result?

Cassandra version 1.2.15.

Consistency settings (although I think they should not matter in the
single-replica case):
Read — CL.ONE
Write — CL.ALL

If you need any additional info I would be happy to provide!


Blog post with Cassandra upgrade tips

2014-04-11 Thread Paulo Ricardo Motta Gomes
Hey,

Some months ago (last year!!) during our previous major upgrade from 1.1 ->
1.2 I started writing a blog post with some tips for a smooth rolling
upgrade, but for some reason I forgot to finish the post. I found it
recently and decided to publish it anyway, as some of the info may be
helpful for future major upgrades:

http://monkeys.chaordic.com.br/operation/zero-downtime-cassandra-upgrade/

Cheers,

-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br *
+55 48 3232.3200


Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
I've never noticed that setting tombstone_threshold has any effect...
at least in 2.0.6.

What gets written to the log?


On Fri, Apr 11, 2014 at 3:31 PM, DuyHai Doan  wrote:

> I was wondering, to remove the tombstones from Sstables created by LCS,
> why don't we just set the tombstone_threshold table property to a very
> small value (say 0.01)..?
>
> As the doc said (
> www.datastax.com/documentation/cql/3.0/cql/cql_reference/compactSubprop.html)
> this will force compaction on the sstable itself for the purpose of
> cleaning tombstones, no merging with other sstables is done.
>
> In addition this property applies to both compaction strategies :-)
>
> Isn't it a little bit lighter than changing strategy and hoping for the best?
>
> Regards
>
> Duy Hai DOAN
>  Le 11 avr. 2014 20:16, "Robert Coli"  a écrit :
>
> On Fri, Apr 11, 2014 at 10:33 AM, Paulo Ricardo Motta Gomes <
>> paulo.mo...@chaordicsystems.com> wrote:
>>
>>> My question is : Is there a way to force tombstones to be cleared with
>>> LCS? Does scrub help in any case?
>>>
>>
>> 1) Switch to size tiered compaction, compact, and switch back. Not only
>> "with LCS", but...
>>
>> 2)  scrub does a 1:1 rewrite of sstables, watching for corruption. I
>> believe it does throw away tombstones if it is able to, but that is not the
>> purpose of it.
>>
>> =Rob
>>
>>


Re: binary protocol server side sockets

2014-04-11 Thread Eric Plowe
The situation I am seeing is this:

To access my company's development environment I need to VPN.

I do some development on the application, and for some reason my VPN drops,
but I had established connections to my development cassandra server.

When I reconnect and check netstat I see the connections I had established
previously still there, and they never go away. I have had connections that
have stayed open for almost 7 days.

I ran 'netstat -tulpn' per the request of Nate McCall, and the receive and
send queues are 0.

I just did a test where I changed the code of my application to use Thrift
(using the FluentCassandra driver): start the application, kill my VPN
connection, reconnect. When I check the Cassandra server, I still see the
Thrift (port 9160) connection established, but it is eventually removed
because of the keep-alive.

If I change rpc_keepalive to false in cassandra.yaml, restart Cassandra,
and then run the same test I outlined above using Thrift, the connection
will stay, like the native transport connections, until Cassandra, or the
box, is restarted.

It seems the lack of keep-alive support for the native transport is the culprit.
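Until the protocol grows one, the usual workaround is the application-level keep-alive Chris suggested earlier in the thread: a background thread that issues a cheap query whenever the connection has gone idle. A minimal, driver-agnostic sketch (the `execute` callable and the DummyCF-style query are placeholders for whatever your client provides):

```python
import threading
import time

class KeepAlive:
    """Application-level keep-alive: invoke a cheap query whenever the
    connection has been idle for `interval_s` seconds, so NAT/firewall
    state stays fresh and half-dead sockets get noticed."""

    def __init__(self, execute, interval_s=60.0):
        # `execute` is any cheap zero-argument callable, e.g.
        #   lambda: session.execute("SELECT id FROM dummy_cf WHERE id = 1")
        # (hypothetical session/table names -- substitute your own).
        self._execute = execute
        self._interval = interval_s
        self._last_activity = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def touch(self):
        # Call this after every real request, so only connections that
        # have actually gone idle get pinged.
        self._last_activity = time.monotonic()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        poll = min(self._interval, 1.0)
        while not self._stop.wait(poll):
            if time.monotonic() - self._last_activity >= self._interval:
                self._execute()
                self.touch()
```

Wire `touch()` into your normal request path and hand `execute` a one-row dummy query; the pinger then only fires on connections that have been idle for the full interval.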

Regards,

Eric Plowe


On Fri, Apr 11, 2014 at 1:12 PM, Nate McCall  wrote:

> Out of curiosity, any folks seeing backups in the send or receive queues
> via netstat while this is happening? (netstat -tulpn for example)
>
> I feel like I had this happen once and it ended up being a sysconfig
> tuning issue (net.core.* and net.ipv4.* stuff specifically).
>
> Can't seem to find anything in my notes though, unfortunately.
>
>
> On Fri, Apr 11, 2014 at 10:16 AM, Phil Luckhurst <
> phil.luckhu...@powerassure.com> wrote:
>
>> We have considered this but wondered how well it would work, as the
>> Cassandra Java Driver opens multiple connections internally to each
>> Cassandra node. I suppose it depends how those connections are used
>> internally; if it's round robin then it should work. Perhaps we just
>> need to try it.
>>
>> --
>> Thanks
>> Phil
>>
>>
>> Chris Lohfink wrote
>> > TCP keep-alives (via setTimeout) are notoriously useless...  The
>> > default of 2 hours is generally far longer than any timeout in NAT
>> > translation tables (generally ~5 min), and even if you decrease the
>> > keep-alive to a sane value, a lot of networks actually throw away TCP
>> > keep-alive packets.  You see that a lot more in cell networks though.
>> > It's almost always a good idea to have a software keep-alive, although
>> > it seems not to be implemented in this protocol.  You can make a super
>> > simple CF with 1 value and query it every minute a connection is idle
>> > or something, i.e. "select * from DummyCF where id = 1"
>> >
>> > --
>> > *Chris Lohfink*
>> > Engineer
>> > 415.663.6738  |  Skype: clohfink.blackbirdit
>> > *Blackbird*
>> >
>> > 775.345.3485  |  www.blackbirdIT.com
>> >
>> > *"Formerly PalominoDB/DriveDev"*
>> >
>> >
>> > On Fri, Apr 11, 2014 at 3:04 AM, Phil Luckhurst <
>>
>> > phil.luckhurst@
>>
>> >> wrote:
>> >
>> >> We are also seeing this in our development environment. We have a 3
>> node
>> >> Cassandra 2.0.5 cluster running on Ubuntu 12.04 and are connecting
>> from a
>> >> Tomcat based application running on Windows using the 2.0.0 Cassandra
>> >> Java
>> >> Driver. We have setKeepAlive(true) when building the cluster in the
>> >> application and this does keep one connection open on the client side
>> to
>> >> each of the 3 Cassandra nodes, but we still see the build up of 'old'
>> >> ESTABLISHED connections on each of the Cassandra servers.
>> >>
>> >> We are also getting that same "Unexpected exception during request"
>> >> exception appearing in the logs
>> >>
>> >> ERROR [Native-Transport-Requests:358378] 2014-04-09 12:31:46,824
>> >> ErrorMessage.java (line 222) Unexpected exception during request
>> >> java.io.IOException: Connection reset by peer
>> >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> >> at sun.nio.ch.SocketDispatcher.read(Unknown Source)
>> >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>> >> at sun.nio.ch.IOUtil.read(Unknown Source)
>> >> at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>> >> at
>> >> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
>> >> at
>> >>
>> >>
>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
>> >> at
>> >>
>> >>
>> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>> >> at
>> >>
>> >>
>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
>> >> at
>> >> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
>> >> Source)
>> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>> >> Source)

Re: clearing tombstones?

2014-04-11 Thread DuyHai Doan
I was wondering, to remove the tombstones from Sstables created by LCS, why
don't we just set the tombstone_threshold table property to a very small
value (say 0.01)..?

As the doc said (
www.datastax.com/documentation/cql/3.0/cql/cql_reference/compactSubprop.html)
this will force compaction on the sstable itself for the purpose of
cleaning tombstones, no merging with other sstables is done.

In addition this property applies to both compaction strategies :-)

Isn't it a little bit lighter than changing strategy and hoping for the best?
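For reference, setting that subproperty is a one-statement schema change in CQL 3 (the keyspace/table names here are placeholders, and the compaction class must be restated when altering subproperties):

```sql
ALTER TABLE my_keyspace.my_table
  WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                      'tombstone_threshold' : 0.01 };
```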

Regards

Duy Hai DOAN
 Le 11 avr. 2014 20:16, "Robert Coli"  a écrit :

> On Fri, Apr 11, 2014 at 10:33 AM, Paulo Ricardo Motta Gomes <
> paulo.mo...@chaordicsystems.com> wrote:
>
>> My question is : Is there a way to force tombstones to be cleared with
>> LCS? Does scrub help in any case?
>>
>
> 1) Switch to size tiered compaction, compact, and switch back. Not only
> "with LCS", but...
>
> 2)  scrub does a 1:1 rewrite of sstables, watching for corruption. I
> believe it does throw away tombstones if it is able to, but that is not the
> purpose of it.
>
> =Rob
>
>


Re: Point in Time Recovery

2014-04-11 Thread Robert Coli
On Fri, Apr 11, 2014 at 1:21 AM, Dennis Schwan wrote:

>  The archived commitlogs are copied to the restore directory and
> afterwards cassandra is replaying those commitlogs but still we only see
> the data from the snapshot, not the commitlogs.
>

If you turn up debug log4j settings, you should be able to see whether the
replay is correctly applying mutations to memtables.

Do you see a flush of memtables to sstables at the end of commitlog replay?
If not, memtables are not being created by commitlog replay.
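Turning up that logging is a one-line change in conf/log4j-server.properties (the logger name below matches the 1.2-era package layout; treat it as an assumption and adjust to your version):

```
# enable DEBUG for commitlog replay only, not the whole server
log4j.logger.org.apache.cassandra.db.commitlog=DEBUG
```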

=Rob


Re: List and Cancel running queries

2014-04-11 Thread Robert Coli
On Fri, Apr 11, 2014 at 1:18 AM, Richard Jennings  wrote:

> Is it possible to list all running queries on a Cassandra cluster ?
>

No, but you can get a count of them on a per node basis :

https://issues.apache.org/jira/browse/CASSANDRA-5084

=Rob


Re: Multiget performance

2014-04-11 Thread Allan C
For sanity, I ran the same python script with the same row ids again today and 
it was 10x faster. Must be something going wrong intermittently in my cluster. 

-Allan

On April 11, 2014 at 11:02:11 AM, Allan C (alla...@gmail.com) wrote:

 It’s a fairly standard relational-like CF. Description is the only field 
that’s potentially big (can be up to 1k).

CREATE COLUMN FAMILY 'Event' WITH
  key_validation_class = 'UTF8Type' AND
  comparator = 'UTF8Type' AND
  default_validation_class = 'UTF8Type' AND
  bloom_filter_fp_chance = 0.1 AND
  compaction_strategy = 'LeveledCompactionStrategy' AND
  compaction_strategy_options = {sstable_size_in_mb:160} AND
  compression_options = 
{sstable_compression:SnappyCompressor,chunk_length_kb:64} AND
--  key_alias = 'eventId' AND
  column_metadata = [
      {column_name: 'createdAt', validation_class: 'DateType'},
      {column_name: 'creatorId', validation_class: 'UTF8Type'},
      {column_name: 'creatorName', validation_class: 'UTF8Type'},
      {column_name: 'description', validation_class: 'UTF8Type'},
      {column_name: 'privacy', validation_class: 'UTF8Type'},
      {column_name: 'location', validation_class: 'UTF8Type'},
      {column_name: 'locationId', validation_class: 'UTF8Type'},
      {column_name: 'endTime', validation_class: 'DateType'},
      {column_name: 'name', validation_class: 'UTF8Type'},
      {column_name: 'picture', validation_class: 'UTF8Type'},
      {column_name: 'startTime', validation_class: 'DateType'},
      {column_name: 'updatedAt', validation_class: 'DateType'},

      {column_name: 'lat', validation_class: 'UTF8Type'},
      {column_name: 'lng', validation_class: 'UTF8Type'},
      {column_name: 'street', validation_class: 'UTF8Type'},
      {column_name: 'city', validation_class: 'UTF8Type'},
      {column_name: 'state', validation_class: 'UTF8Type'},
      {column_name: 'zip', validation_class: 'UTF8Type'},
      {column_name: 'country', validation_class: 'UTF8Type'},

      {column_name: '~lastSync', validation_class: 'DateType'},
      {column_name: '~nextSync', validation_class: 'DateType'},

      {column_name: '~syncBlock', validation_class: 'IntegerType'},

      {column_name: 'noCount', validation_class: 'IntegerType'},
      {column_name: 'invitedCount', validation_class: 'IntegerType'},
      {column_name: 'maybeCount', validation_class: 'IntegerType'},
      {column_name: 'yesCount', validation_class: 'IntegerType'},

      {column_name: '~version', validation_class: 'IntegerType'}
];


-Allan

On April 10, 2014 at 4:49:34 PM, Tyler Hobbs (ty...@datastax.com) wrote:


On Thu, Apr 10, 2014 at 6:26 PM, Allan C  wrote:

Looks like the amount of data returned has a big effect. When I only return one 
column, python reports only 20ms compared to 150ms when returning the whole 
row. Rows are each less than 1k in size, but there must be client overhead.

That's a surprising amount of overhead in pycassa.  What's your schema like for 
this CF?


--
Tyler Hobbs
DataStax


Re: clearing tombstones?

2014-04-11 Thread Robert Coli
On Fri, Apr 11, 2014 at 10:33 AM, Paulo Ricardo Motta Gomes <
paulo.mo...@chaordicsystems.com> wrote:

> My question is : Is there a way to force tombstones to be cleared with LCS?
> Does scrub help in any case?
>

1) Switch to size tiered compaction, compact, and switch back. Not only
"with LCS", but...

2)  scrub does a 1:1 rewrite of sstables, watching for corruption. I
believe it does throw away tombstones if it is able to, but that is not the
purpose of it.

=Rob


Re: clearing tombstones?

2014-04-11 Thread Robert Coli
(I probably should have read downthread before writing my reply. Briefly:
+1 to most of the thread's commentary regarding major compaction, but
don't listen to the FUD about it; unless you have a really large amount
of data you'll probably be fine.)

On Fri, Apr 11, 2014 at 7:05 AM, William Oberman
wrote:

> I'm wondering what will clear tombstoned rows?  nodetool cleanup, nodetool
> repair, or time (as in just wait)?
>

The only operation guaranteed to collect 100% of tombstones is major
compaction. gc_grace_seconds duration is also involved, so be sure to
understand its value.
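As an ops sketch, the sequence looks roughly like this (keyspace/table names are placeholders; this applies to SizeTiered CFs only, and gc_grace_seconds must have elapsed since the deletes before the tombstones are eligible):

```shell
# optionally shorten the grace period first (CQL 3):
#   ALTER TABLE my_ks.my_cf WITH gc_grace_seconds = 3600;

# then trigger a major compaction on each node in turn
nodetool -h node1.example.com compact my_ks my_cf
```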


> I had a CF that was more or less storing session information.  After some
> time, we decided that one piece of this information was pointless to track
> (and was 90%+ of the columns, and in 99% of those cases was ALL columns for
> a row).   I wrote a process to remove all of those columns (which again in
> a vast majority of cases had the effect of removing the whole row).
>

https://issues.apache.org/jira/browse/CASSANDRA-1581

Describes a tool which "filtered" sstables to remove rows. In a future case
like this one, you might want to consider this approach.


> It wasn't 100% clear to me what to poke to cause compactions to clear the
> tombstones.
>

In order to delete a tombstone, all fragments of the row must be in an
sstable involved in the current compaction.

Some discussion here : https://issues.apache.org/jira/browse/CASSANDRA-1074


>  First I tried nodetool cleanup on a candidate node.  But, afterwards the
> disk usage was the same.
>

Cleanup rewrites sstables 1:1, removing data that belongs to ranges the
node no longer owns. It is meant for use when ranges move, in order to
"clean up" the data from the ranges being given up.


>  Then I tried nodetool repair on that same node.  But again, disk usage is
> still the same.  The CF has no snapshots.
>

Repair is unrelated to the purging of tombstones.


> So, am I misunderstanding something?  Is there another operation to try?
>  Do I have to "just wait"?  I've only done cleanup/repair on one node.  Do
> I have to run one or the other over all nodes to clear tombstones?
>

If you are using size tiered compaction, run a major compaction. ("nodetool
compact"). If you aren't, I believe that there is nothing you can do.

=Rob


Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
At the cost of really quite a lot of compaction, you can temporarily switch
to SizeTiered, and when that is completely done (check each node), switch
back to Leveled.
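The flip Michael describes is two ALTERs in CQL 3 (names are placeholders; when switching back, restate whatever LCS options, such as sstable_size_in_mb, the table originally had):

```sql
-- step 1: rewrite everything under SizeTiered, purging eligible tombstones
ALTER TABLE my_ks.my_cf
  WITH compaction = { 'class' : 'SizeTieredCompactionStrategy' };

-- step 2: once compactions have settled on every node, switch back
ALTER TABLE my_ks.my_cf
  WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                      'sstable_size_in_mb' : 160 };
```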

It's like doing the laundry twice :)

I've done this on CFs that were about 5GB but I don't see why it wouldn't
work on larger ones.

ml


On Fri, Apr 11, 2014 at 1:33 PM, Paulo Ricardo Motta Gomes <
paulo.mo...@chaordicsystems.com> wrote:

> This thread is really informative, thanks for the good feedback.
>
> My question is : Is there a way to force tombstones to be cleared with LCS?
> Does scrub help in any case? Or the only solution would be to create a new
> CF and migrate all the data if you intend to do a large CF cleanup?
>
> Cheers,
>
>
> On Fri, Apr 11, 2014 at 2:02 PM, Mark Reddy wrote:
>
>> Thats great Will, if you could update the thread with the actions you
>> decide to take and the results that would be great.
>>
>>
>> Mark
>>
>>
>> On Fri, Apr 11, 2014 at 5:53 PM, William Oberman <
>> ober...@civicscience.com> wrote:
>>
>>> I've learned a *lot* from this thread.  My thanks to all of the
>>> contributors!
>>>
>>> Paulo: Good luck with LCS.  I wish I could help there, but all of my
>>> CF's are SizeTiered (mostly as I'm on the same schema/same settings since
>>> 0.7...)
>>>
>>> will
>>>
>>>
>>>
>>> On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib wrote:
>>>

 Levelled Compaction is a wholly different beast when it comes to
 tombstones.

 The tombstones are inserted, like any other write really, at the lower
 levels in the leveldb hierarchy.

 They are only removed after they have had the chance to "naturally"
 migrate upwards in the leveldb hierarchy to the highest level in your data
 store.  How long that takes depends on:
  1. The amount of data in your store and the number of levels your LCS
 strategy has
 2. The amount of new writes entering the bottom funnel of your leveldb,
 forcing upwards compaction and combining

 To give you an idea, I had a similar scenario and ran a (slow,
 throttled) delete job on my cluster around December-January.  Here's a
 graph of the disk space usage on one node.  Notice the still-declining
 usage long after the cleanup job has finished (sometime in January).  I
 tend to think of tombstones in LCS as little bombs that get to explode much
 later in time:

 http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg



 On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <
 paulo.mo...@chaordicsystems.com> wrote:

 I have a similar problem here, I deleted about 30% of a very large CF
 using LCS (about 80GB per node), but still my data hasn't shrunk, even if
 I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool
 scrub force a minor compaction?

 Cheers,

 Paulo


 On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy wrote:

> Yes, running nodetool compact (major compaction) creates one large
> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
> this the compaction strategy you are using?) leading to multiple 'small'
> SSTables alongside the single large SSTable, which results in increased
> read latency. You will incur the operational overhead of having to manage
> compactions if you wish to compact these smaller SSTables. For all these
> reasons it is generally advised to stay away from running compactions
> manually.
>
> Assuming that this is a production environment and you want to keep
> everything running as smoothly as possible I would reduce the gc_grace on
> the CF, allow automatic minor compactions to kick in and then increase the
> gc_grace once again after the tombstones have been removed.
>
>
> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
> ober...@civicscience.com> wrote:
>
>> So, if I was impatient and just "wanted to make this happen now", I
>> could:
>>
>> 1.) Change GCGraceSeconds of the CF to 0
>> 2.) run nodetool compact (*)
>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>
>> Since I have ~900M tombstones, even if I miss a few due to
>> impatience, I don't care *that* much as I could re-run my clean up tool
>> against the now much smaller CF.
>>
>> (*) A long long time ago I seem to recall reading advice about "don't
>> ever run nodetool compact", but I can't remember why.  Is there any bad
>> long term consequence?  Short term there are several:
>> -a heavy operation
>> -temporary 2x disk space
>> -one big SSTable afterwards
>> But moving forward, everything is ok right?
>>  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
>> etc...  The only flaw I can think of is it will take forever until the
>> SSTable minor compactions build up enough to consider including the big
>> SSTable in a com

Re: SSTable files question

2014-04-11 Thread Robert Coli
On Fri, Apr 11, 2014 at 10:44 AM, Yulian Oifa  wrote:

> Currently all clients are down, therefore no new data is written
>

Hinted handoff delivery.

=Rob
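One way to confirm that hints are the source (assuming a 1.2-era node, where coordinators store pending hints in the system keyspace):

```sql
-- pending hints stored on this node, waiting for delivery
SELECT count(*) FROM system.hints;
```

`nodetool tpstats` also shows a HintedHandoff stage whose active/pending counts spike while delivery is in progress.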


Re: Multiget performance

2014-04-11 Thread Allan C
 It’s a fairly standard relational-like CF. Description is the only field 
that’s potentially big (can be up to 1k).

CREATE COLUMN FAMILY 'Event' WITH
  key_validation_class = 'UTF8Type' AND
  comparator = 'UTF8Type' AND
  default_validation_class = 'UTF8Type' AND
  bloom_filter_fp_chance = 0.1 AND
  compaction_strategy = 'LeveledCompactionStrategy' AND
  compaction_strategy_options = {sstable_size_in_mb:160} AND
  compression_options = 
{sstable_compression:SnappyCompressor,chunk_length_kb:64} AND
--  key_alias = 'eventId' AND
  column_metadata = [
      {column_name: 'createdAt', validation_class: 'DateType'},
      {column_name: 'creatorId', validation_class: 'UTF8Type'},
      {column_name: 'creatorName', validation_class: 'UTF8Type'},
      {column_name: 'description', validation_class: 'UTF8Type'},
      {column_name: 'privacy', validation_class: 'UTF8Type'},
      {column_name: 'location', validation_class: 'UTF8Type'},
      {column_name: 'locationId', validation_class: 'UTF8Type'},
      {column_name: 'endTime', validation_class: 'DateType'},
      {column_name: 'name', validation_class: 'UTF8Type'},
      {column_name: 'picture', validation_class: 'UTF8Type'},
      {column_name: 'startTime', validation_class: 'DateType'},
      {column_name: 'updatedAt', validation_class: 'DateType'},

      {column_name: 'lat', validation_class: 'UTF8Type'},
      {column_name: 'lng', validation_class: 'UTF8Type'},
      {column_name: 'street', validation_class: 'UTF8Type'},
      {column_name: 'city', validation_class: 'UTF8Type'},
      {column_name: 'state', validation_class: 'UTF8Type'},
      {column_name: 'zip', validation_class: 'UTF8Type'},
      {column_name: 'country', validation_class: 'UTF8Type'},

      {column_name: '~lastSync', validation_class: 'DateType'},
      {column_name: '~nextSync', validation_class: 'DateType'},

      {column_name: '~syncBlock', validation_class: 'IntegerType'},

      {column_name: 'noCount', validation_class: 'IntegerType'},
      {column_name: 'invitedCount', validation_class: 'IntegerType'},
      {column_name: 'maybeCount', validation_class: 'IntegerType'},
      {column_name: 'yesCount', validation_class: 'IntegerType'},

      {column_name: '~version', validation_class: 'IntegerType'}
];


-Allan

On April 10, 2014 at 4:49:34 PM, Tyler Hobbs (ty...@datastax.com) wrote:


On Thu, Apr 10, 2014 at 6:26 PM, Allan C  wrote:

Looks like the amount of data returned has a big effect. When I only return one 
column, python reports only 20ms compared to 150ms when returning the whole 
row. Rows are each less than 1k in size, but there must be client overhead.

That's a surprising amount of overhead in pycassa.  What's your schema like for 
this CF?


--
Tyler Hobbs
DataStax


SSTable files question

2014-04-11 Thread Yulian Oifa
Hello to all
I ran nodetool compact today on a specific node.
It created a single file (g-1155) at 18:08.
Currently all clients are down, therefore no new data is written.
However, while running compact on other nodes I found that new SSTables
appeared on this node:

-rw-r--r-- 1 root root 4590359957 Apr 11 18:08 globalIndexes-g-1155-Data.db
-rw-r--r-- 1 root root 2416533871 Apr 11 19:57 globalIndexes-g-1241-Data.db
-rw-r--r-- 1 root root  812435119 Apr 11 20:13 globalIndexes-g-1282-Data.db
-rw-r--r-- 1 root root  809054655 Apr 11 20:27 globalIndexes-g-1303-Data.db
-rw-r--r-- 1 root root  767685693 Apr 11 20:00 globalIndexes-g-1261-Data.db
-rw-r--r-- 1 root root  203513615 Apr 11 20:32 globalIndexes-g-1313-Data.db
-rw-r--r-- 1 root root  202942155 Apr 11 20:35 globalIndexes-g-1318-Data.db
-rw-r--r-- 1 root root  202656433 Apr 11 20:29 globalIndexes-g-1308-Data.db
-rw-r--r-- 1 root root   51223791 Apr 11 20:39 globalIndexes-g-1323-Data.db
-rw-r--r-- 1 root root   50890483 Apr 11 20:37 globalIndexes-g-1320-Data.db
-rw-r--r-- 1 root root   50366855 Apr 11 20:36 globalIndexes-g-1319-Data.db
-rw-r--r-- 1 root root   50366685 Apr 11 20:38 globalIndexes-g-1322-Data.db
-rw-r--r-- 1 root root   50271529 Apr 11 20:39 globalIndexes-g-1324-Data.db

Where do those files come from?
Also they take plenty of space for some reason.

Thanks and best regards
Yulian Oifa


Re: clearing tombstones?

2014-04-11 Thread Paulo Ricardo Motta Gomes
This thread is really informative, thanks for the good feedback.

My question is : Is there a way to force tombstones to be cleared with LCS?
Does scrub help in any case? Or the only solution would be to create a new
CF and migrate all the data if you intend to do a large CF cleanup?

Cheers,


On Fri, Apr 11, 2014 at 2:02 PM, Mark Reddy  wrote:

> Thats great Will, if you could update the thread with the actions you
> decide to take and the results that would be great.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 5:53 PM, William Oberman wrote:
>
>> I've learned a *lot* from this thread.  My thanks to all of the
>> contributors!
>>
>> Paulo: Good luck with LCS.  I wish I could help there, but all of my CF's
>> are SizeTiered (mostly as I'm on the same schema/same settings since 0.7...)
>>
>> will
>>
>>
>>
>> On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib wrote:
>>
>>>
>>> Levelled Compaction is a wholly different beast when it comes to
>>> tombstones.
>>>
>>> The tombstones are inserted, like any other write really, at the lower
>>> levels in the leveldb hierarchy.
>>>
>>> They are only removed after they have had the chance to "naturally"
>>> migrate upwards in the leveldb hierarchy to the highest level in your data
>>> store.  How long that takes depends on:
>>>  1. The amount of data in your store and the number of levels your LCS
>>> strategy has
>>> 2. The amount of new writes entering the bottom funnel of your leveldb,
>>> forcing upwards compaction and combining
>>>
>>> To give you an idea, I had a similar scenario and ran a (slow,
>>> throttled) delete job on my cluster around December-January.  Here's a
>>> graph of the disk space usage on one node.  Notice the still-declining
>>> usage long after the cleanup job has finished (sometime in January).  I
>>> tend to think of tombstones in LCS as little bombs that get to explode much
>>> later in time:
>>>
>>> http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
>>>
>>>
>>>
>>> On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <
>>> paulo.mo...@chaordicsystems.com> wrote:
>>>
>>> I have a similar problem here, I deleted about 30% of a very large CF
>>> using LCS (about 80GB per node), but still my data hasn't shrunk, even if
>>> I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool
>>> scrub force a minor compaction?
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>>
>>> On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy wrote:
>>>
 Yes, running nodetool compact (major compaction) creates one large
 SSTable. This will mess up the heuristics of the SizeTiered strategy (is
 this the compaction strategy you are using?) leading to multiple 'small'
 SSTables alongside the single large SSTable, which results in increased
 read latency. You will incur the operational overhead of having to manage
 compactions if you wish to compact these smaller SSTables. For all these
 reasons it is generally advised to stay away from running compactions
 manually.

 Assuming that this is a production environment and you want to keep
 everything running as smoothly as possible I would reduce the gc_grace on
 the CF, allow automatic minor compactions to kick in and then increase the
 gc_grace once again after the tombstones have been removed.


 On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
 ober...@civicscience.com> wrote:

> So, if I was impatient and just "wanted to make this happen now", I
> could:
>
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) run nodetool compact (*)
> 3.) Change GCGraceSeconds of the CF back to 10 days
>
> Since I have ~900M tombstones, even if I miss a few due to impatience,
> I don't care *that* much as I could re-run my clean up tool against the 
> now
> much smaller CF.
>
> (*) A long long time ago I seem to recall reading advice about "don't
> ever run nodetool compact", but I can't remember why.  Is there any bad
> long term consequence?  Short term there are several:
> -a heavy operation
> -temporary 2x disk space
> -one big SSTable afterwards
> But moving forward, everything is ok right?
>  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
> etc...  The only flaw I can think of is it will take forever until the
> SSTable minor compactions build up enough to consider including the big
> SSTable in a compaction, making it likely I'll have to self manage
> compactions.
>
>
>
> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy 
> wrote:
>
>> Correct, a tombstone will only be removed after gc_grace period has
>> elapsed. The default value is set to 10 days which allows a great deal of
>> time for consistency to be achieved prior to deletion. If you are
>> operationally confident that you can achieve consistency via anti-entropy
>> repairs within a shorter period you can always reduce that 10 day 
>> in

Re: binary protocol server side sockets

2014-04-11 Thread Nate McCall
Out of curiosity, any folks seeing backups in the send or receive queues
via netstat while this is happening? (netstat -tulpn for example)

I feel like I had this happen once and it ended up being a sysconfig tuning
issue (net.core.* and net.ipv4.* stuff specifically).

Can't seem to find anything in my notes though, unfortunately.


On Fri, Apr 11, 2014 at 10:16 AM, Phil Luckhurst <
phil.luckhu...@powerassure.com> wrote:

> We have considered this but wondered how well it would work, as the
> Cassandra Java Driver opens multiple connections internally to each
> Cassandra node. I suppose it depends how those connections are used
> internally; if it's round robin then it should work. Perhaps we just
> need to try it.
>
> --
> Thanks
> Phil
>
>
> Chris Lohfink wrote
> > TCP keep-alives (via setTimeout) are notoriously useless...  The
> > default of 2 hours is generally far longer than any timeout in NAT
> > translation tables (generally ~5 min), and even if you decrease the
> > keep-alive to a sane value, a lot of networks actually throw away TCP
> > keep-alive packets.  You see that a lot more in cell networks though.
> > It's almost always a good idea to have a software keep-alive, although
> > it seems not to be implemented in this protocol.  You can make a super
> > simple CF with 1 value and query it every minute a connection is idle
> > or something, i.e. "select * from DummyCF where id = 1"
> >
> > --
> > *Chris Lohfink*
> > Engineer
> > 415.663.6738  |  Skype: clohfink.blackbirdit
> > *Blackbird*
> >
> > 775.345.3485  |  www.blackbirdIT.com
> >
> > *"Formerly PalominoDB/DriveDev"*
> >
> >
> > On Fri, Apr 11, 2014 at 3:04 AM, Phil Luckhurst <
>
> > phil.luckhurst@
>
> >> wrote:
> >
> >> We are also seeing this in our development environment. We have a 3 node
> >> Cassandra 2.0.5 cluster running on Ubuntu 12.04 and are connecting from
> a
> >> Tomcat based application running on Windows using the 2.0.0 Cassandra
> >> Java
> >> Driver. We have setKeepAlive(true) when building the cluster in the
> >> application and this does keep one connection open on the client side to
> >> each of the 3 Cassandra nodes, but we still see the build up of 'old'
> >> ESTABLISHED connections on each of the Cassandra servers.
> >>
> >> We are also getting that same "Unexpected exception during request"
> >> exception appearing in the logs
> >>
> >> ERROR [Native-Transport-Requests:358378] 2014-04-09 12:31:46,824
> >> ErrorMessage.java (line 222) Unexpected exception during request
> >> java.io.IOException: Connection reset by peer
> >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >> at sun.nio.ch.SocketDispatcher.read(Unknown Source)
> >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
> >> at sun.nio.ch.IOUtil.read(Unknown Source)
> >> at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
> >> at
> >> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
> >> at
> >>
> >>
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
> >> at
> >>
> >>
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> >> at
> >>
> >>
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
> >> at
> >> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> >> Source)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> >> Source)
> >> at java.lang.Thread.run(Unknown Source)
> >>
> >> Initially we thought this was down to a firewall that is between our
> >> development machines and the Cassandra nodes but that has now been
> >> configured not to 'kill' any connections on port 9042. We also have the
> >> Windows firewall on the client side turned off.
> >>
> >> We still think this is down to our environment as the same application
> >> running in Tomcat hosted on a Ubuntu 12.04 server does not appear to be
> >> doing this but up to now we can't track down the cause.
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/binary-protocol-server-side-sockets-tp7593879p7593937.html
> >> Sent from the cassandra-user@.apache mailing list archive at Nabble.com.
> >>
> >
> >
> >
> > --
> > *Chris Lohfink*
> > Engineer
> > 415.663.6738  |  Skype: clohfink.blackbirdit
> >
> > *Blackbird **[image: favicon]*
> >
> > 775.345.3485  |  www.blackbirdIT.com
> >
> > *"Formerly PalominoDB/DriveDev"*
> >
> >
> > image001.png (5K)
> > <http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/attachment/7593947/0/image001.png>
>
>
>
>
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.
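The two-layer keep-alive discussed in this thread (TCP keep-alive tuning plus an application-level "dummy query" heartbeat) can be sketched in Python. This is a hedged illustration, not driver code: `enable_tcp_keepalive` and `software_keepalive` are names of my own, the per-socket TCP options are Linux-specific, and the trivial query is the hypothetical `DummyCF` one suggested above.

```python
import socket
import time

def enable_tcp_keepalive(sock, idle=60, interval=10, probes=3):
    """Turn on TCP keep-alive and, where the platform exposes the knobs,
    shrink the default ~2 hour idle period to something shorter than a
    typical NAT translation-table timeout (~5 minutes)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific per-socket options; guarded because they are
    # absent on some platforms.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

def software_keepalive(execute_query, is_idle, rounds, pause=60):
    """Application-level heartbeat: while the connection sits idle, run a
    trivial query (e.g. 'SELECT * FROM DummyCF WHERE id = 1') so that
    firewalls and NAT devices see real traffic. execute_query and is_idle
    are supplied by the caller; nothing here is Cassandra-specific.
    Returns the number of heartbeats sent."""
    sent = 0
    for _ in range(rounds):
        if is_idle():
            execute_query()
            sent += 1
        time.sleep(pause)
    return sent

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_tcp_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
    s.close()
    # Three idle rounds with no real delay -> three heartbeats.
    print(software_keepalive(lambda: None, lambda: True, rounds=3, pause=0))  # 3
```

With the Java driver the equivalent is `setKeepAlive(true)` in the socket options (as Phil describes) plus a scheduled trivial statement; the Python shape above is only meant to show the two layers side by side.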

Re: clearing tombstones?

2014-04-11 Thread Mark Reddy
That's great, Will. If you could update the thread with the actions you
decide to take and the results, that would be great.


Mark


On Fri, Apr 11, 2014 at 5:53 PM, William Oberman
wrote:

> I've learned a *lot* from this thread.  My thanks to all of the
> contributors!
>
> Paulo: Good luck with LCS.  I wish I could help there, but all of my CF's
> are SizeTiered (mostly as I'm on the same schema/same settings since 0.7...)
>
> will
>
>
>
> On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib wrote:
>
>>
>> Levelled Compaction is a wholly different beast when it comes to
>> tombstones.
>>
>> The tombstones are inserted, like any other write really, at the lower
>> levels in the leveldb hierarchy.
>>
>> They are only removed after they have had the chance to "naturally"
>> migrate upwards in the leveldb hierarchy to the highest level in your data
>> store.  How long that takes depends on:
>>  1. The amount of data in your store and the number of levels your LCS
>> strategy has
>> 2. The amount of new writes entering the bottom funnel of your leveldb,
>> forcing upwards compaction and combining
>>
>> To give you an idea, I had a similar scenario and ran a (slow, throttled)
>> delete job on my cluster around December-January.  Here's a graph of the
>> disk space usage on one node.  Notice the still-declining usage long after
>> the cleanup job has finished (sometime in January).  I tend to think of
>> tombstones in LCS as little bombs that get to explode much later in time:
>>
>> http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
>>
>>
>>
>> On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <
>> paulo.mo...@chaordicsystems.com> wrote:
>>
>> I have a similar problem here: I deleted about 30% of a very large CF
>> using LCS (about 80GB per node), but my data still hasn't shrunk, even
>> though I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does
>> nodetool scrub force a minor compaction?
>>
>> Cheers,
>>
>> Paulo
>>
>>
>> On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy wrote:
>>
>>> Yes, running nodetool compact (major compaction) creates one large
>>> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
>>> this the compaction strategy you are using?) leading to multiple 'small'
>>> SSTables alongside the single large SSTable, which results in increased
>>> read latency. You will incur the operational overhead of having to manage
>>> compactions if you wish to compact these smaller SSTables. For all these
>>> reasons it is generally advised to stay away from running compactions
>>> manually.
>>>
>>> Assuming that this is a production environment and you want to keep
>>> everything running as smoothly as possible I would reduce the gc_grace on
>>> the CF, allow automatic minor compactions to kick in and then increase the
>>> gc_grace once again after the tombstones have been removed.
>>>
>>>
>>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
>>> ober...@civicscience.com> wrote:
>>>
 So, if I was impatient and just "wanted to make this happen now", I
 could:

 1.) Change GCGraceSeconds of the CF to 0
 2.) run nodetool compact (*)
 3.) Change GCGraceSeconds of the CF back to 10 days

 Since I have ~900M tombstones, even if I miss a few due to impatience,
 I don't care *that* much as I could re-run my clean up tool against the now
 much smaller CF.

 (*) A long long time ago I seem to recall reading advice about "don't
 ever run nodetool compact", but I can't remember why.  Is there any bad
 long term consequence?  Short term there are several:
 -a heavy operation
 -temporary 2x disk space
 -one big SSTable afterwards
 But moving forward, everything is ok right?
  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
 etc...  The only flaw I can think of is it will take forever until the
 SSTable minor compactions build up enough to consider including the big
 SSTable in a compaction, making it likely I'll have to self manage
 compactions.



 On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy wrote:

> Correct, a tombstone will only be removed after gc_grace period has
> elapsed. The default value is set to 10 days which allows a great deal of
> time for consistency to be achieved prior to deletion. If you are
> operationally confident that you can achieve consistency via anti-entropy
> repairs within a shorter period you can always reduce that 10 day 
> interval.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
> ober...@civicscience.com> wrote:
>
>> I'm seeing a lot of articles about a dependency between removing
>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>> and this CF has GCGraceSeconds of 10 days).
>>
>>
>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <
>> tbarbu...@gmail.com> wrote:
>>
>>> compaction sh

Re: clearing tombstones?

2014-04-11 Thread William Oberman
I've learned a *lot* from this thread.  My thanks to all of the
contributors!

Paulo: Good luck with LCS.  I wish I could help there, but all of my CF's
are SizeTiered (mostly as I'm on the same schema/same settings since 0.7...)

will


On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib wrote:

>
> Levelled Compaction is a wholly different beast when it comes to
> tombstones.
>
> The tombstones are inserted, like any other write really, at the lower
> levels in the leveldb hierarchy.
>
> They are only removed after they have had the chance to "naturally"
> migrate upwards in the leveldb hierarchy to the highest level in your data
> store.  How long that takes depends on:
>  1. The amount of data in your store and the number of levels your LCS
> strategy has
> 2. The amount of new writes entering the bottom funnel of your leveldb,
> forcing upwards compaction and combining
>
> To give you an idea, I had a similar scenario and ran a (slow, throttled)
> delete job on my cluster around December-January.  Here's a graph of the
> disk space usage on one node.  Notice the still-declining usage long after
> the cleanup job has finished (sometime in January).  I tend to think of
> tombstones in LCS as little bombs that get to explode much later in time:
>
> http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
>
>
>
> On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <
> paulo.mo...@chaordicsystems.com> wrote:
>
> I have a similar problem here: I deleted about 30% of a very large CF
> using LCS (about 80GB per node), but my data still hasn't shrunk, even
> though I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does
> nodetool scrub force a minor compaction?
>
> Cheers,
>
> Paulo
>
>
> On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy wrote:
>
>> Yes, running nodetool compact (major compaction) creates one large
>> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
>> this the compaction strategy you are using?) leading to multiple 'small'
>> SSTables alongside the single large SSTable, which results in increased
>> read latency. You will incur the operational overhead of having to manage
>> compactions if you wish to compact these smaller SSTables. For all these
>> reasons it is generally advised to stay away from running compactions
>> manually.
>>
>> Assuming that this is a production environment and you want to keep
>> everything running as smoothly as possible I would reduce the gc_grace on
>> the CF, allow automatic minor compactions to kick in and then increase the
>> gc_grace once again after the tombstones have been removed.
>>
>>
>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
>> ober...@civicscience.com> wrote:
>>
>>> So, if I was impatient and just "wanted to make this happen now", I
>>> could:
>>>
>>> 1.) Change GCGraceSeconds of the CF to 0
>>> 2.) run nodetool compact (*)
>>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>>
>>> Since I have ~900M tombstones, even if I miss a few due to impatience, I
>>> don't care *that* much as I could re-run my clean up tool against the now
>>> much smaller CF.
>>>
>>> (*) A long long time ago I seem to recall reading advice about "don't
>>> ever run nodetool compact", but I can't remember why.  Is there any bad
>>> long term consequence?  Short term there are several:
>>> -a heavy operation
>>> -temporary 2x disk space
>>> -one big SSTable afterwards
>>> But moving forward, everything is ok right?
>>>  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
>>> etc...  The only flaw I can think of is it will take forever until the
>>> SSTable minor compactions build up enough to consider including the big
>>> SSTable in a compaction, making it likely I'll have to self manage
>>> compactions.
>>>
>>>
>>>
>>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy wrote:
>>>
 Correct, a tombstone will only be removed after gc_grace period has
 elapsed. The default value is set to 10 days which allows a great deal of
 time for consistency to be achieved prior to deletion. If you are
 operationally confident that you can achieve consistency via anti-entropy
 repairs within a shorter period you can always reduce that 10 day interval.


 Mark


 On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
 ober...@civicscience.com> wrote:

> I'm seeing a lot of articles about a dependency between removing
> tombstones and GCGraceSeconds, which might be my problem (I just checked,
> and this CF has GCGraceSeconds of 10 days).
>
>
> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <
> tbarbu...@gmail.com> wrote:
>
>> compaction should take care of it; for me it never worked so I run
>> nodetool compaction on every node; that does it.
>>
>>
>> 2014-04-11 16:05 GMT+02:00 William Oberman:
>>
>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>> nodetool repair, or time (as in just wait)?
>>

Re: clearing tombstones?

2014-04-11 Thread Mina Naguib

Levelled Compaction is a wholly different beast when it comes to tombstones.

The tombstones are inserted, like any other write really, at the lower levels 
in the leveldb hierarchy.

They are only removed after they have had the chance to "naturally" migrate 
upwards in the leveldb hierarchy to the highest level in your data store.  How 
long that takes depends on:
1. The amount of data in your store and the number of levels your LCS 
strategy has
2. The amount of new writes entering the bottom funnel of your leveldb, 
forcing upwards compaction and combining

To give you an idea, I had a similar scenario and ran a (slow, throttled) 
delete job on my cluster around December-January.  Here's a graph of the disk 
space usage on one node.  Notice the still-declining usage long after the
cleanup job has finished (sometime in January).  I tend to think of tombstones 
in LCS as little bombs that get to explode much later in time:

http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
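The "little bombs that explode much later" behaviour falls out of simple arithmetic: a tombstone written at the bottom level must be carried upward by successive compactions before it can be purged. The toy model below is my own simplification, not Cassandra's actual LCS scheduler; only the 10x fanout matches LCS defaults.

```python
def writes_until_top(num_levels, level0_capacity=4, fanout=10):
    """Toy model of levelled compaction: a tombstone sits in level i
    until roughly a full level's worth of newer data forces it up one
    level. Returns the approximate number of sstable-sized writes that
    must arrive before the tombstone reaches the top level, where it
    can finally be dropped (once gc_grace has also elapsed)."""
    total, cap = 0, level0_capacity
    for _ in range(num_levels - 1):
        total += cap  # fill this level to force a promotion
        cap *= fanout
    return total

# Each extra level multiplies the wait by roughly the fanout, which is
# why space keeps declining for weeks after a bulk delete:
print(writes_until_top(2))  # 4
print(writes_until_top(4))  # 444
```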



On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes wrote:

> I have a similar problem here: I deleted about 30% of a very large CF using
> LCS (about 80GB per node), but my data still hasn't shrunk, even though I
> used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool
> scrub force a minor compaction?
> 
> Cheers,
> 
> Paulo
> 
> 
> On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy  wrote:
> Yes, running nodetool compact (major compaction) creates one large SSTable. 
> This will mess up the heuristics of the SizeTiered strategy (is this the 
> compaction strategy you are using?) leading to multiple 'small' SSTables 
> alongside the single large SSTable, which results in increased read latency. 
> You will incur the operational overhead of having to manage compactions if 
> you wish to compact these smaller SSTables. For all these reasons it is 
> generally advised to stay away from running compactions manually.
> 
> Assuming that this is a production environment and you want to keep 
> everything running as smoothly as possible I would reduce the gc_grace on the 
> CF, allow automatic minor compactions to kick in and then increase the 
> gc_grace once again after the tombstones have been removed.
> 
> 
> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman  
> wrote:
> So, if I was impatient and just "wanted to make this happen now", I could:
> 
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) run nodetool compact (*)
> 3.) Change GCGraceSeconds of the CF back to 10 days
> 
> Since I have ~900M tombstones, even if I miss a few due to impatience, I 
> don't care *that* much as I could re-run my clean up tool against the now 
> much smaller CF.
> 
> (*) A long long time ago I seem to recall reading advice about "don't ever 
> run nodetool compact", but I can't remember why.  Is there any bad long term 
> consequence?  Short term there are several:
> -a heavy operation
> -temporary 2x disk space
> -one big SSTable afterwards
> But moving forward, everything is ok right?  CommitLog/MemTable->SStables, 
> minor compactions that merge SSTables, etc...  The only flaw I can think of 
> is it will take forever until the SSTable minor compactions build up enough 
> to consider including the big SSTable in a compaction, making it likely I'll 
> have to self manage compactions.
> 
> 
> 
> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy  wrote:
> Correct, a tombstone will only be removed after gc_grace period has elapsed. 
> The default value is set to 10 days which allows a great deal of time for 
> consistency to be achieved prior to deletion. If you are operationally 
> confident that you can achieve consistency via anti-entropy repairs within a 
> shorter period you can always reduce that 10 day interval.
> 
> 
> Mark
> 
> 
> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman  
> wrote:
> I'm seeing a lot of articles about a dependency between removing tombstones 
> and GCGraceSeconds, which might be my problem (I just checked, and this CF 
> has GCGraceSeconds of 10 days).
> 
> 
> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli  
> wrote:
> compaction should take care of it; for me it never worked so I run nodetool 
> compaction on every node; that does it.
> 
> 
> 2014-04-11 16:05 GMT+02:00 William Oberman :
> 
> I'm wondering what will clear tombstoned rows?  nodetool cleanup, nodetool 
> repair, or time (as in just wait)?
> 
> I had a CF that was more or less storing session information.  After some 
> time, we decided that one piece of this information was pointless to track 
> (and was 90%+ of the columns, and in 99% of those cases was ALL columns for a 
> row).   I wrote a process to remove all of those columns (which again in a 
> vast majority of cases had the effect of removing the whole row).
> 
> This CF had ~1 billion rows, so I expect to be left with ~100m rows.  After I 
> did this mass delete, everything was the same size on disk (which I expected, 
> knowing how tombstoning works).  

Re: clearing tombstones?

2014-04-11 Thread Mark Reddy
To clarify, you would want to manage compactions only if you were concerned
about read latency. If you update rows, those rows may become spread across
an increasing number of SSTables leading to increased read latency.

Thanks for providing some insight into your use case as it does differ from
the norm. If you consider 50GB a small CF and your data ingestion is
sufficient to result in more SSTables of similar size soon, then yes, you
could run a major compaction with little operational overhead and the
compaction strategy's heuristics would level out after some time.


On Fri, Apr 11, 2014 at 4:52 PM, Laing, Michael
wrote:

> I have played with this quite a bit and recommend you set gc_grace_seconds
> to 0 and use 'nodetool compact [keyspace] [cfname]' on your table.
>
> A caveat I have is that we use C* 2.0.6 - but the space we expect to
> recover is in fact recovered.
>
> Actually, since we never delete explicitly (just ttl) we always have
> gc_grace_seconds set to 0.
>
> Another important caveat is to be careful with repair: having set gc to 0
> and compacted on a node, if you then repair it, data may come streaming in
> from the other nodes. We don't run into this, as our gc is always 0, but
> others may be able to comment.
>
> ml
>
>
> On Fri, Apr 11, 2014 at 11:26 AM, William Oberman <
> ober...@civicscience.com> wrote:
>
>> Yes, I'm using SizeTiered.
>>
>> I totally understand the "mess up the heuristics" issue.  But, I don't
>> understand "You will incur the operational overhead of having to manage
>> compactions if you wish to compact these smaller SSTables".  My
>> understanding is the small tables will still compact.  The problem is that
>> until I have 3 other (by default) tables of the same size as the "big
>> table", it won't be compacted.
>>
>> In my case, this might not be terrible though, right?  To get into the
>> trees, I have 9 nodes with RF=3 and this CF is ~500GB/node.  I deleted like
>> 90-95% of the data, so I expect the data to be 25-50GB after the tombstones
>> are cleared, but call it 50GB.  That means I won't compact this 50GB file
>> until I gather another 150GB (50,50,50,50->200).   But, that's not
>> *horrible*.  Now, if I only deleted 10% of the data, waiting to compact
>> 450GB until I had another 1.3TB would be rough...
>>
>> I think your advice is great for people looking for "normal" answers in
>> the forum, but I don't think my use case is very normal :-)
>>
>> will
>>
>> On Fri, Apr 11, 2014 at 11:12 AM, Mark Reddy wrote:
>>
>>> Yes, running nodetool compact (major compaction) creates one large
>>> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
>>> this the compaction strategy you are using?) leading to multiple 'small'
>>> SSTables alongside the single large SSTable, which results in increased
>>> read latency. You will incur the operational overhead of having to manage
>>> compactions if you wish to compact these smaller SSTables. For all these
>>> reasons it is generally advised to stay away from running compactions
>>> manually.
>>>
>>> Assuming that this is a production environment and you want to keep
>>> everything running as smoothly as possible I would reduce the gc_grace on
>>> the CF, allow automatic minor compactions to kick in and then increase the
>>> gc_grace once again after the tombstones have been removed.
>>>
>>>
>>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
>>> ober...@civicscience.com> wrote:
>>>
 So, if I was impatient and just "wanted to make this happen now", I
 could:

 1.) Change GCGraceSeconds of the CF to 0
 2.) run nodetool compact (*)
 3.) Change GCGraceSeconds of the CF back to 10 days

 Since I have ~900M tombstones, even if I miss a few due to impatience,
 I don't care *that* much as I could re-run my clean up tool against the now
 much smaller CF.

 (*) A long long time ago I seem to recall reading advice about "don't
 ever run nodetool compact", but I can't remember why.  Is there any bad
 long term consequence?  Short term there are several:
 -a heavy operation
 -temporary 2x disk space
 -one big SSTable afterwards
 But moving forward, everything is ok right?
  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
 etc...  The only flaw I can think of is it will take forever until the
 SSTable minor compactions build up enough to consider including the big
 SSTable in a compaction, making it likely I'll have to self manage
 compactions.



 On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy wrote:

> Correct, a tombstone will only be removed after gc_grace period has
> elapsed. The default value is set to 10 days which allows a great deal of
> time for consistency to be achieved prior to deletion. If you are
> operationally confident that you can achieve consistency via anti-entropy
> repairs within a shorter period you can always reduce that 10 day 

Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
I have played with this quite a bit and recommend you set gc_grace_seconds
to 0 and use 'nodetool compact [keyspace] [cfname]' on your table.

A caveat I have is that we use C* 2.0.6 - but the space we expect to
recover is in fact recovered.

Actually, since we never delete explicitly (just ttl) we always have
gc_grace_seconds set to 0.

Another important caveat is to be careful with repair: having set gc to 0
and compacted on a node, if you then repair it, data may come streaming in
from the other nodes. We don't run into this, as our gc is always 0, but
others may be able to comment.

ml
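The timing rule behind setting gc_grace_seconds to 0 is simple arithmetic. The sketch below is a hedged simplification: real Cassandra additionally checks whether overlapping sstables could still shadow the deleted data before actually purging a tombstone during compaction.

```python
import time

def tombstone_droppable(deletion_time, gc_grace_seconds, now=None):
    """True once gc_grace_seconds have elapsed since the delete, i.e.
    the earliest point at which a compaction may purge the tombstone
    rather than rewrite it into the new sstable."""
    if now is None:
        now = time.time()
    return deletion_time + gc_grace_seconds <= now

DAY = 86400
# Default 10-day grace: a tombstone from 11 days ago is purgeable, one
# from 1 day ago is not. With gc_grace_seconds = 0, any tombstone is
# purgeable by the very next compaction -- which is why repair can then
# stream "deleted" data back in from replicas that never compacted it.
print(tombstone_droppable(0 * DAY, 10 * DAY, now=11 * DAY))   # True
print(tombstone_droppable(10 * DAY, 10 * DAY, now=11 * DAY))  # False
print(tombstone_droppable(11 * DAY, 0, now=11 * DAY))         # True
```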


On Fri, Apr 11, 2014 at 11:26 AM, William Oberman
wrote:

> Yes, I'm using SizeTiered.
>
> I totally understand the "mess up the heuristics" issue.  But, I don't
> understand "You will incur the operational overhead of having to manage
> compactions if you wish to compact these smaller SSTables".  My
> understanding is the small tables will still compact.  The problem is that
> until I have 3 other (by default) tables of the same size as the "big
> table", it won't be compacted.
>
> In my case, this might not be terrible though, right?  To get into the
> trees, I have 9 nodes with RF=3 and this CF is ~500GB/node.  I deleted like
> 90-95% of the data, so I expect the data to be 25-50GB after the tombstones
> are cleared, but call it 50GB.  That means I won't compact this 50GB file
> until I gather another 150GB (50,50,50,50->200).   But, that's not
> *horrible*.  Now, if I only deleted 10% of the data, waiting to compact
> 450GB until I had another 1.3TB would be rough...
>
> I think your advice is great for people looking for "normal" answers in
> the forum, but I don't think my use case is very normal :-)
>
> will
>
> On Fri, Apr 11, 2014 at 11:12 AM, Mark Reddy wrote:
>
>> Yes, running nodetool compact (major compaction) creates one large
>> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
>> this the compaction strategy you are using?) leading to multiple 'small'
>> SSTables alongside the single large SSTable, which results in increased
>> read latency. You will incur the operational overhead of having to manage
>> compactions if you wish to compact these smaller SSTables. For all these
>> reasons it is generally advised to stay away from running compactions
>> manually.
>>
>> Assuming that this is a production environment and you want to keep
>> everything running as smoothly as possible I would reduce the gc_grace on
>> the CF, allow automatic minor compactions to kick in and then increase the
>> gc_grace once again after the tombstones have been removed.
>>
>>
>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
>> ober...@civicscience.com> wrote:
>>
>>> So, if I was impatient and just "wanted to make this happen now", I
>>> could:
>>>
>>> 1.) Change GCGraceSeconds of the CF to 0
>>> 2.) run nodetool compact (*)
>>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>>
>>> Since I have ~900M tombstones, even if I miss a few due to impatience, I
>>> don't care *that* much as I could re-run my clean up tool against the now
>>> much smaller CF.
>>>
>>> (*) A long long time ago I seem to recall reading advice about "don't
>>> ever run nodetool compact", but I can't remember why.  Is there any bad
>>> long term consequence?  Short term there are several:
>>> -a heavy operation
>>> -temporary 2x disk space
>>> -one big SSTable afterwards
>>> But moving forward, everything is ok right?
>>>  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
>>> etc...  The only flaw I can think of is it will take forever until the
>>> SSTable minor compactions build up enough to consider including the big
>>> SSTable in a compaction, making it likely I'll have to self manage
>>> compactions.
>>>
>>>
>>>
>>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy wrote:
>>>
 Correct, a tombstone will only be removed after gc_grace period has
 elapsed. The default value is set to 10 days which allows a great deal of
 time for consistency to be achieved prior to deletion. If you are
 operationally confident that you can achieve consistency via anti-entropy
 repairs within a shorter period you can always reduce that 10 day interval.


 Mark


 On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
 ober...@civicscience.com> wrote:

> I'm seeing a lot of articles about a dependency between removing
> tombstones and GCGraceSeconds, which might be my problem (I just checked,
> and this CF has GCGraceSeconds of 10 days).
>
>
> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <
> tbarbu...@gmail.com> wrote:
>
>> compaction should take care of it; for me it never worked so I run
>> nodetool compaction on every node; that does it.
>>
>>
>> 2014-04-11 16:05 GMT+02:00 William Oberman:
>>
>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>> nodetool repair, or time (as in just

Re: clearing tombstones?

2014-04-11 Thread William Oberman
Yes, I'm using SizeTiered.

I totally understand the "mess up the heuristics" issue.  But, I don't
understand "You will incur the operational overhead of having to manage
compactions if you wish to compact these smaller SSTables".  My
understanding is the small tables will still compact.  The problem is that
until I have 3 other (by default) tables of the same size as the "big
table", it won't be compacted.

In my case, this might not be terrible though, right?  To get into the
trees, I have 9 nodes with RF=3 and this CF is ~500GB/node.  I deleted like
90-95% of the data, so I expect the data to be 25-50GB after the tombstones
are cleared, but call it 50GB.  That means I won't compact this 50GB file
until I gather another 150GB (50,50,50,50->200).   But, that's not
*horrible*.  Now, if I only deleted 10% of the data, waiting to compact
450GB until I had another 1.3TB would be rough...

I think your advice is great for people looking for "normal" answers in the
forum, but I don't think my use case is very normal :-)

will
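The "50,50,50,50 -> 200" arithmetic above is the SizeTiered bucketing rule in action. The sketch below is my own approximation of that heuristic; the bucket_low/bucket_high/min_threshold defaults mirror the documented SizeTiered options, but the grouping logic is simplified.

```python
def size_tiered_buckets(sstable_sizes, bucket_low=0.5, bucket_high=1.5,
                        min_threshold=4):
    """Group sstables whose size falls within [avg*bucket_low,
    avg*bucket_high] of an existing bucket's average, then return only
    the buckets large enough (>= min_threshold members) to be chosen
    for a minor compaction."""
    buckets = []
    for size in sorted(sstable_sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar-sized bucket: start one
    return [b for b in buckets if len(b) >= min_threshold]

# One 50GB sstable next to freshly written small ones: only the small
# bucket is eligible; the big table waits until ~3 more peers its size
# exist before it is compacted again.
print(size_tiered_buckets([50, 1, 1, 1, 1]))  # [[1, 1, 1, 1]]
print(size_tiered_buckets([50, 50, 50, 50]))  # [[50, 50, 50, 50]]
```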

On Fri, Apr 11, 2014 at 11:12 AM, Mark Reddy  wrote:

> Yes, running nodetool compact (major compaction) creates one large
> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
> this the compaction strategy you are using?) leading to multiple 'small'
> SSTables alongside the single large SSTable, which results in increased
> read latency. You will incur the operational overhead of having to manage
> compactions if you wish to compact these smaller SSTables. For all these
> reasons it is generally advised to stay away from running compactions
> manually.
>
> Assuming that this is a production environment and you want to keep
> everything running as smoothly as possible I would reduce the gc_grace on
> the CF, allow automatic minor compactions to kick in and then increase the
> gc_grace once again after the tombstones have been removed.
>
>
> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman wrote:
>
>> So, if I was impatient and just "wanted to make this happen now", I could:
>>
>> 1.) Change GCGraceSeconds of the CF to 0
>> 2.) run nodetool compact (*)
>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>
>> Since I have ~900M tombstones, even if I miss a few due to impatience, I
>> don't care *that* much as I could re-run my clean up tool against the now
>> much smaller CF.
>>
>> (*) A long long time ago I seem to recall reading advice about "don't
>> ever run nodetool compact", but I can't remember why.  Is there any bad
>> long term consequence?  Short term there are several:
>> -a heavy operation
>> -temporary 2x disk space
>> -one big SSTable afterwards
>> But moving forward, everything is ok right?
>>  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
>> etc...  The only flaw I can think of is it will take forever until the
>> SSTable minor compactions build up enough to consider including the big
>> SSTable in a compaction, making it likely I'll have to self manage
>> compactions.
>>
>>
>>
>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy wrote:
>>
>>> Correct, a tombstone will only be removed after gc_grace period has
>>> elapsed. The default value is set to 10 days which allows a great deal of
>>> time for consistency to be achieved prior to deletion. If you are
>>> operationally confident that you can achieve consistency via anti-entropy
>>> repairs within a shorter period you can always reduce that 10 day interval.
>>>
>>>
>>> Mark
>>>
>>>
>>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
>>> ober...@civicscience.com> wrote:
>>>
 I'm seeing a lot of articles about a dependency between removing
 tombstones and GCGraceSeconds, which might be my problem (I just checked,
 and this CF has GCGraceSeconds of 10 days).


 On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli >>> > wrote:

> compaction should take care of it; for me it never worked so I run
> nodetool compaction on every node; that does it.
>
>
> 2014-04-11 16:05 GMT+02:00 William Oberman :
>
> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>> nodetool repair, or time (as in just wait)?
>>
>> I had a CF that was more or less storing session information.  After
>> some time, we decided that one piece of this information was pointless to
>> track (and was 90%+ of the columns, and in 99% of those cases was ALL
>> columns for a row).   I wrote a process to remove all of those columns
>> (which again in a vast majority of cases had the effect of removing the
>> whole row).
>>
>> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>>  After I did this mass delete, everything was the same size on disk 
>> (which
>> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
>> what to poke to cause compactions to clear the tombstones.  First I tried
>> nodetool cleanup on a candidate node.  But, afterwards the disk usage was the same.

Re: clearing tombstones?

2014-04-11 Thread Paulo Ricardo Motta Gomes
I have a similar problem here, I deleted about 30% of a very large CF using
LCS (about 80GB per node), but still my data hasn't shrunk, even though I
used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool
scrub force a minor compaction?

Cheers,

Paulo


On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy  wrote:

> Yes, running nodetool compact (major compaction) creates one large
> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
> this the compaction strategy you are using?) leading to multiple 'small'
> SSTables alongside the single large SSTable, which results in increased
> read latency. You will incur the operational overhead of having to manage
> compactions if you wish to compact these smaller SSTables. For all these
> reasons it is generally advised to stay away from running compactions
> manually.
>
> Assuming that this is a production environment and you want to keep
> everything running as smoothly as possible I would reduce the gc_grace on
> the CF, allow automatic minor compactions to kick in and then increase the
> gc_grace once again after the tombstones have been removed.
>
>
> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman  > wrote:
>
>> So, if I was impatient and just "wanted to make this happen now", I could:
>>
>> 1.) Change GCGraceSeconds of the CF to 0
>> 2.) run nodetool compact (*)
>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>
>> Since I have ~900M tombstones, even if I miss a few due to impatience, I
>> don't care *that* much as I could re-run my clean up tool against the now
>> much smaller CF.
>>
>> (*) A long long time ago I seem to recall reading advice about "don't
>> ever run nodetool compact", but I can't remember why.  Is there any bad
>> long term consequence?  Short term there are several:
>> -a heavy operation
>> -temporary 2x disk space
>> -one big SSTable afterwards
>> But moving forward, everything is ok right?
>>  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
>> etc...  The only flaw I can think of is it will take forever until the
>> SSTable minor compactions build up enough to consider including the big
>> SSTable in a compaction, making it likely I'll have to self manage
>> compactions.
>>
>>
>>
>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy wrote:
>>
>>> Correct, a tombstone will only be removed after gc_grace period has
>>> elapsed. The default value is set to 10 days which allows a great deal of
>>> time for consistency to be achieved prior to deletion. If you are
>>> operationally confident that you can achieve consistency via anti-entropy
>>> repairs within a shorter period you can always reduce that 10 day interval.
>>>
>>>
>>> Mark
>>>
>>>
>>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
>>> ober...@civicscience.com> wrote:
>>>
 I'm seeing a lot of articles about a dependency between removing
 tombstones and GCGraceSeconds, which might be my problem (I just checked,
 and this CF has GCGraceSeconds of 10 days).


 On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli >>> > wrote:

> compaction should take care of it; for me it never worked so I run
> nodetool compaction on every node; that does it.
>
>
> 2014-04-11 16:05 GMT+02:00 William Oberman :
>
> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>> nodetool repair, or time (as in just wait)?
>>
>> I had a CF that was more or less storing session information.  After
>> some time, we decided that one piece of this information was pointless to
>> track (and was 90%+ of the columns, and in 99% of those cases was ALL
>> columns for a row).   I wrote a process to remove all of those columns
>> (which again in a vast majority of cases had the effect of removing the
>> whole row).
>>
>> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>>  After I did this mass delete, everything was the same size on disk 
>> (which
>> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
>> what to poke to cause compactions to clear the tombstones.  First I tried
>> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
>> the same.  Then I tried nodetool repair on that same node.  But again, 
>> disk
>> usage is still the same.  The CF has no snapshots.
>>
>> So, am I misunderstanding something?  Is there another operation to
>> try?  Do I have to "just wait"?  I've only done cleanup/repair on one 
>> node.
>>  Do I have to run one or the other over all nodes to clear tombstones?
>>
>> Cassandra 1.2.15 if it matters,
>>
>> Thanks!
>>
>> will
>>
>
>



>>>
>>
>


-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br *
+55 48 3232.3200


Re: binary protocol server side sockets

2014-04-11 Thread Phil Luckhurst
We have considered this but wondered how well it would work as the Cassandra
Java Driver opens multiple connections internally to each Cassandra node. I
suppose it depends how those connections are used internally; if it's round
robin then it should work. Perhaps we just need to try it.

--
Thanks
Phil


Chris Lohfink wrote
> TCP keep alives (by the setTimeout) are notoriously useless...  The
> default
> 2 hours is generally far longer than any timeout in NAT translation tables
> (generally ~5 min) and even if you decrease the keep alive to a sane value
> a lot of networks actually throw away TCP keep alive packets.  You see
> that
> a lot more in cell networks though.  It's almost always a good idea to have
> a software keep alive although it seems to be not implemented in this
> protocol.  You can make a super simple CF with 1 value and query it every
> minute a connection is idle or something.  i.e. "select * from DummyCF
> where id = 1"
> 
> -- 
> *Chris Lohfink*
> Engineer
> 415.663.6738  |  Skype: clohfink.blackbirdit
> *Blackbird **[image: favicon]*
> 
> 775.345.3485  |  www.blackbirdIT.com ;
> 
> *"Formerly PalominoDB/DriveDev"*
> 
> 
> On Fri, Apr 11, 2014 at 3:04 AM, Phil Luckhurst <

> phil.luckhurst@

>> wrote:
> 
>> We are also seeing this in our development environment. We have a 3 node
>> Cassandra 2.0.5 cluster running on Ubuntu 12.04 and are connecting from a
>> Tomcat based application running on Windows using the 2.0.0 Cassandra
>> Java
>> Driver. We have setKeepAlive(true) when building the cluster in the
>> application and this does keep one connection open on the client side to
>> each of the 3 Cassandra nodes, but we still see the build up of 'old'
>> ESTABLISHED connections on each of the Cassandra servers.
>>
>> We are also getting that same "Unexpected exception during request"
>> exception appearing in the logs
>>
>> ERROR [Native-Transport-Requests:358378] 2014-04-09 12:31:46,824
>> ErrorMessage.java (line 222) Unexpected exception during request
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(Unknown Source)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>> at sun.nio.ch.IOUtil.read(Unknown Source)
>> at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>> at
>> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
>> at
>>
>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
>> at
>>
>> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>> at
>>
>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
>> at
>> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
>> Source)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>> Source)
>> at java.lang.Thread.run(Unknown Source)
>>
>> Initially we thought this was down to a firewall that is between our
>> development machines and the Cassandra nodes but that has now been
>> configured not to 'kill' any connections on port 9042. We also have the
>> Windows firewall on the client side turned off.
>>
>> We still think this is down to our environment as the same application
>> running in Tomcat hosted on a Ubuntu 12.04 server does not appear to be
>> doing this but up to now we can't track down the cause.
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/binary-protocol-server-side-sockets-tp7593879p7593937.html
>> Sent from the 

> cassandra-user@.apache

>  mailing list archive at
>> Nabble.com.
>>
> 
> 
> 
> -- 
> *Chris Lohfink*
> Engineer
> 415.663.6738  |  Skype: clohfink.blackbirdit
> 
> *Blackbird **[image: favicon]*
> 
> 775.345.3485  |  www.blackbirdIT.com ;
> 
> *"Formerly PalominoDB/DriveDev"*
> 
> 





--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/binary-protocol-server-side-sockets-tp7593879p7593955.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: clearing tombstones?

2014-04-11 Thread William Oberman
Answered my own question.  Good writeup here of the pros/cons of compact:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_about_config_compact_c.html

And I was thinking of bad information that used to float in this forum
about major compactions (with respect to the impact to minor compactions).
 I'm hesitant to write the offending sentence again :-)


On Fri, Apr 11, 2014 at 10:44 AM, William Oberman
wrote:

> So, if I was impatient and just "wanted to make this happen now", I could:
>
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) run nodetool compact (*)
> 3.) Change GCGraceSeconds of the CF back to 10 days
>
> Since I have ~900M tombstones, even if I miss a few due to impatience, I
> don't care *that* much as I could re-run my clean up tool against the now
> much smaller CF.
>
> (*) A long long time ago I seem to recall reading advice about "don't ever
> run nodetool compact", but I can't remember why.  Is there any bad long
> term consequence?  Short term there are several:
> -a heavy operation
> -temporary 2x disk space
> -one big SSTable afterwards
> But moving forward, everything is ok right?  CommitLog/MemTable->SStables,
> minor compactions that merge SSTables, etc...  The only flaw I can think of
> is it will take forever until the SSTable minor compactions build up enough
> to consider including the big SSTable in a compaction, making it likely
> I'll have to self manage compactions.
>
>
>
> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy wrote:
>
>> Correct, a tombstone will only be removed after gc_grace period has
>> elapsed. The default value is set to 10 days which allows a great deal of
>> time for consistency to be achieved prior to deletion. If you are
>> operationally confident that you can achieve consistency via anti-entropy
>> repairs within a shorter period you can always reduce that 10 day interval.
>>
>>
>> Mark
>>
>>
>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
>> ober...@civicscience.com> wrote:
>>
>>> I'm seeing a lot of articles about a dependency between removing
>>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>>> and this CF has GCGraceSeconds of 10 days).
>>>
>>>
>>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli 
>>> wrote:
>>>
 compaction should take care of it; for me it never worked so I run
 nodetool compaction on every node; that does it.


 2014-04-11 16:05 GMT+02:00 William Oberman :

 I'm wondering what will clear tombstoned rows?  nodetool cleanup,
> nodetool repair, or time (as in just wait)?
>
> I had a CF that was more or less storing session information.  After
> some time, we decided that one piece of this information was pointless to
> track (and was 90%+ of the columns, and in 99% of those cases was ALL
> columns for a row).   I wrote a process to remove all of those columns
> (which again in a vast majority of cases had the effect of removing the
> whole row).
>
> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>  After I did this mass delete, everything was the same size on disk (which
> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
> what to poke to cause compactions to clear the tombstones.  First I tried
> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
> the same.  Then I tried nodetool repair on that same node.  But again, 
> disk
> usage is still the same.  The CF has no snapshots.
>
> So, am I misunderstanding something?  Is there another operation to
> try?  Do I have to "just wait"?  I've only done cleanup/repair on one 
> node.
>  Do I have to run one or the other over all nodes to clear tombstones?
>
> Cassandra 1.2.15 if it matters,
>
> Thanks!
>
> will
>


>>>
>>>
>>>
>>
>


Re: clearing tombstones?

2014-04-11 Thread Mark Reddy
Yes, running nodetool compact (major compaction) creates one large SSTable.
This will mess up the heuristics of the SizeTiered strategy (is this the
compaction strategy you are using?) leading to multiple 'small' SSTables
alongside the single large SSTable, which results in increased read
latency. You will incur the operational overhead of having to manage
compactions if you wish to compact these smaller SSTables. For all these
reasons it is generally advised to stay away from running compactions
manually.

Assuming that this is a production environment and you want to keep
everything running as smoothly as possible I would reduce the gc_grace on
the CF, allow automatic minor compactions to kick in and then increase the
gc_grace once again after the tombstones have been removed.


On Fri, Apr 11, 2014 at 3:44 PM, William Oberman
wrote:

> So, if I was impatient and just "wanted to make this happen now", I could:
>
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) run nodetool compact (*)
> 3.) Change GCGraceSeconds of the CF back to 10 days
>
> Since I have ~900M tombstones, even if I miss a few due to impatience, I
> don't care *that* much as I could re-run my clean up tool against the now
> much smaller CF.
>
> (*) A long long time ago I seem to recall reading advice about "don't ever
> run nodetool compact", but I can't remember why.  Is there any bad long
> term consequence?  Short term there are several:
> -a heavy operation
> -temporary 2x disk space
> -one big SSTable afterwards
> But moving forward, everything is ok right?  CommitLog/MemTable->SStables,
> minor compactions that merge SSTables, etc...  The only flaw I can think of
> is it will take forever until the SSTable minor compactions build up enough
> to consider including the big SSTable in a compaction, making it likely
> I'll have to self manage compactions.
>
>
>
> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy wrote:
>
>> Correct, a tombstone will only be removed after gc_grace period has
>> elapsed. The default value is set to 10 days which allows a great deal of
>> time for consistency to be achieved prior to deletion. If you are
>> operationally confident that you can achieve consistency via anti-entropy
>> repairs within a shorter period you can always reduce that 10 day interval.
>>
>>
>> Mark
>>
>>
>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
>> ober...@civicscience.com> wrote:
>>
>>> I'm seeing a lot of articles about a dependency between removing
>>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>>> and this CF has GCGraceSeconds of 10 days).
>>>
>>>
>>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli 
>>> wrote:
>>>
 compaction should take care of it; for me it never worked so I run
 nodetool compaction on every node; that does it.


 2014-04-11 16:05 GMT+02:00 William Oberman :

 I'm wondering what will clear tombstoned rows?  nodetool cleanup,
> nodetool repair, or time (as in just wait)?
>
> I had a CF that was more or less storing session information.  After
> some time, we decided that one piece of this information was pointless to
> track (and was 90%+ of the columns, and in 99% of those cases was ALL
> columns for a row).   I wrote a process to remove all of those columns
> (which again in a vast majority of cases had the effect of removing the
> whole row).
>
> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>  After I did this mass delete, everything was the same size on disk (which
> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
> what to poke to cause compactions to clear the tombstones.  First I tried
> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
> the same.  Then I tried nodetool repair on that same node.  But again, 
> disk
> usage is still the same.  The CF has no snapshots.
>
> So, am I misunderstanding something?  Is there another operation to
> try?  Do I have to "just wait"?  I've only done cleanup/repair on one 
> node.
>  Do I have to run one or the other over all nodes to clear tombstones?
>
> Cassandra 1.2.15 if it matters,
>
> Thanks!
>
> will
>


>>>
>>>
>>>
>>
>


Re: List and Cancel running queries

2014-04-11 Thread Jonathan Lacefield
No. This is not possible today

> On Apr 11, 2014, at 1:19 AM, Richard Jennings  
> wrote:
>
> Is it possible to list all running queries on a Cassandra cluster ?
> Is it possible to cancel a running query on a Cassandra cluster?
>
> Regards


Re: clearing tombstones?

2014-04-11 Thread William Oberman
So, if I was impatient and just "wanted to make this happen now", I could:

1.) Change GCGraceSeconds of the CF to 0
2.) run nodetool compact (*)
3.) Change GCGraceSeconds of the CF back to 10 days

Since I have ~900M tombstones, even if I miss a few due to impatience, I
don't care *that* much as I could re-run my clean up tool against the now
much smaller CF.

(*) A long long time ago I seem to recall reading advice about "don't ever
run nodetool compact", but I can't remember why.  Is there any bad long
term consequence?  Short term there are several:
-a heavy operation
-temporary 2x disk space
-one big SSTable afterwards
But moving forward, everything is ok right?  CommitLog/MemTable->SStables,
minor compactions that merge SSTables, etc...  The only flaw I can think of
is it will take forever until the SSTable minor compactions build up enough
to consider including the big SSTable in a compaction, making it likely
I'll have to self manage compactions.



On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy  wrote:

> Correct, a tombstone will only be removed after gc_grace period has
> elapsed. The default value is set to 10 days which allows a great deal of
> time for consistency to be achieved prior to deletion. If you are
> operationally confident that you can achieve consistency via anti-entropy
> repairs within a shorter period you can always reduce that 10 day interval.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman  > wrote:
>
>> I'm seeing a lot of articles about a dependency between removing
>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>> and this CF has GCGraceSeconds of 10 days).
>>
>>
>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli 
>> wrote:
>>
>>> compaction should take care of it; for me it never worked so I run
>>> nodetool compaction on every node; that does it.
>>>
>>>
>>> 2014-04-11 16:05 GMT+02:00 William Oberman :
>>>
>>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
 nodetool repair, or time (as in just wait)?

 I had a CF that was more or less storing session information.  After
 some time, we decided that one piece of this information was pointless to
 track (and was 90%+ of the columns, and in 99% of those cases was ALL
 columns for a row).   I wrote a process to remove all of those columns
 (which again in a vast majority of cases had the effect of removing the
 whole row).

 This CF had ~1 billion rows, so I expect to be left with ~100m rows.
  After I did this mass delete, everything was the same size on disk (which
 I expected, knowing how tombstoning works).  It wasn't 100% clear to me
 what to poke to cause compactions to clear the tombstones.  First I tried
 nodetool cleanup on a candidate node.  But, afterwards the disk usage was
 the same.  Then I tried nodetool repair on that same node.  But again, disk
 usage is still the same.  The CF has no snapshots.

 So, am I misunderstanding something?  Is there another operation to
 try?  Do I have to "just wait"?  I've only done cleanup/repair on one node.
  Do I have to run one or the other over all nodes to clear tombstones?

 Cassandra 1.2.15 if it matters,

 Thanks!

 will

>>>
>>>
>>
>>
>>
>

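For readers following along, the three steps above map onto concrete commands roughly as follows. This is an illustrative sketch only: keyspace and column family names are placeholders, and the exact syntax varies between Cassandra versions (the ALTER TABLE form below is CQL 3).

```
# 1.) drop gc_grace so existing tombstones become purgeable immediately
cqlsh -e "ALTER TABLE mykeyspace.sessions WITH gc_grace_seconds = 0;"

# 2.) force a major compaction on that column family
nodetool compact mykeyspace sessions

# 3.) restore the default 10-day grace period (864000 seconds)
cqlsh -e "ALTER TABLE mykeyspace.sessions WITH gc_grace_seconds = 864000;"
```

Note the caveat discussed elsewhere in this thread: while gc_grace is 0, any delete that has not reached every replica can resurrect, so this is only safe when you are confident the deletes are fully propagated.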

Re: clearing tombstones?

2014-04-11 Thread tommaso barbugli
In my experience, even after the gc_grace period tombstones remain stored
on disk (at least using Cassandra 2.0.5); only a full compaction clears
them. Perhaps that is because my application never reads tombstones?


2014-04-11 16:31 GMT+02:00 Mark Reddy :

> Correct, a tombstone will only be removed after gc_grace period has
> elapsed. The default value is set to 10 days which allows a great deal of
> time for consistency to be achieved prior to deletion. If you are
> operationally confident that you can achieve consistency via anti-entropy
> repairs within a shorter period you can always reduce that 10 day interval.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman  > wrote:
>
>> I'm seeing a lot of articles about a dependency between removing
>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>> and this CF has GCGraceSeconds of 10 days).
>>
>>
>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli 
>> wrote:
>>
>>> compaction should take care of it; for me it never worked so I run
>>> nodetool compaction on every node; that does it.
>>>
>>>
>>> 2014-04-11 16:05 GMT+02:00 William Oberman :
>>>
>>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
 nodetool repair, or time (as in just wait)?

 I had a CF that was more or less storing session information.  After
 some time, we decided that one piece of this information was pointless to
 track (and was 90%+ of the columns, and in 99% of those cases was ALL
 columns for a row).   I wrote a process to remove all of those columns
 (which again in a vast majority of cases had the effect of removing the
 whole row).

 This CF had ~1 billion rows, so I expect to be left with ~100m rows.
  After I did this mass delete, everything was the same size on disk (which
 I expected, knowing how tombstoning works).  It wasn't 100% clear to me
 what to poke to cause compactions to clear the tombstones.  First I tried
 nodetool cleanup on a candidate node.  But, afterwards the disk usage was
 the same.  Then I tried nodetool repair on that same node.  But again, disk
 usage is still the same.  The CF has no snapshots.

 So, am I misunderstanding something?  Is there another operation to
 try?  Do I have to "just wait"?  I've only done cleanup/repair on one node.
  Do I have to run one or the other over all nodes to clear tombstones?

 Cassandra 1.2.15 if it matters,

 Thanks!

 will

>>>
>>>
>>
>>
>>
>


Re: clearing tombstones?

2014-04-11 Thread Mark Reddy
Correct, a tombstone will only be removed after gc_grace period has
elapsed. The default value is set to 10 days which allows a great deal of
time for consistency to be achieved prior to deletion. If you are
operationally confident that you can achieve consistency via anti-entropy
repairs within a shorter period you can always reduce that 10 day interval.


Mark


On Fri, Apr 11, 2014 at 3:16 PM, William Oberman
wrote:

> I'm seeing a lot of articles about a dependency between removing
> tombstones and GCGraceSeconds, which might be my problem (I just checked,
> and this CF has GCGraceSeconds of 10 days).
>
>
> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli wrote:
>
>> compaction should take care of it; for me it never worked so I run
>> nodetool compaction on every node; that does it.
>>
>>
>> 2014-04-11 16:05 GMT+02:00 William Oberman :
>>
>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>> nodetool repair, or time (as in just wait)?
>>>
>>> I had a CF that was more or less storing session information.  After
>>> some time, we decided that one piece of this information was pointless to
>>> track (and was 90%+ of the columns, and in 99% of those cases was ALL
>>> columns for a row).   I wrote a process to remove all of those columns
>>> (which again in a vast majority of cases had the effect of removing the
>>> whole row).
>>>
>>> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>>>  After I did this mass delete, everything was the same size on disk (which
>>> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
>>> what to poke to cause compactions to clear the tombstones.  First I tried
>>> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
>>> the same.  Then I tried nodetool repair on that same node.  But again, disk
>>> usage is still the same.  The CF has no snapshots.
>>>
>>> So, am I misunderstanding something?  Is there another operation to try?
>>>  Do I have to "just wait"?  I've only done cleanup/repair on one node.  Do
>>> I have to run one or the other over all nodes to clear tombstones?
>>>
>>> Cassandra 1.2.15 if it matters,
>>>
>>> Thanks!
>>>
>>> will
>>>
>>
>>
>
>
>

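To make the timing rule in Mark's explanation concrete, here is a tiny Python model of when a tombstone becomes purgeable. This is a sketch of the rule only, not Cassandra's implementation; the function and constant names are invented for illustration.

```python
GC_GRACE_DEFAULT = 10 * 24 * 3600  # the default gc_grace: 10 days, in seconds


def tombstone_purgeable(deleted_at, now, gc_grace=GC_GRACE_DEFAULT):
    """A tombstone written at `deleted_at` may be dropped by a compaction
    that runs at `now` only once gc_grace seconds have elapsed."""
    return now - deleted_at >= gc_grace


# With the default grace period a 9-day-old tombstone survives compaction,
# while lowering gc_grace makes the same tombstone purgeable at once.
nine_days = 9 * 24 * 3600
print(tombstone_purgeable(0, nine_days))              # False
print(tombstone_purgeable(0, nine_days, gc_grace=0))  # True
```

This is why simply deleting data does not shrink the on-disk size right away: until gc_grace passes, compactions must carry the tombstones forward so that anti-entropy repair can still propagate the delete.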

Re: regarding schema and suitability of cassandra

2014-04-11 Thread Sergey Murylev
Actually, if you want to use Cassandra you should store all user-related
data in a single row with the user ID as the primary key.

On 11/04/14 18:14, Prem Yadav wrote:
> Thanks. 
> For the use case, what should I be thinking about schema-wise. ?
>
> Thanks,
> Prem
>
>
> On Fri, Apr 11, 2014 at 2:16 PM, Sergey Murylev
> mailto:sergeymury...@gmail.com>> wrote:
>
> Hi Prem,
>
>
>> Also, I have heard that Cassandra doesn't perform well with high
>> read ops. How true is that?
> I think that it isn't true. Cassandra has very good read
> performance. For more details you can look at this benchmark.
>
>> How many read connections can each machine handle, and how do I
>> measure that in Cassandra?
> Cassandra uses one thread-per-client for remote procedure calls.
> For a large number of client connections, this can cause excessive
> memory usage for the thread stack. Connection pooling on the
> client side is highly recommended.
>
> --
> Thanks,
> Sergey
>
>
> On 11/04/14 13:03, Prem Yadav wrote:
>> Hi,
>> I am new to Cassandra, and even though I am not familiar with the
>> implementation and architecture of Cassandra, I struggle with
>> how to best design the schema.
>>
>> We have an application where we need to store huge amounts of
>> data. Its a per user storage where we store a lot of data for
>> each user and do a lot of random reads using userid.
>> Initially, there will be a lot of writes and once it has
>> stabilized, the reads will increase.
>>
>> We are expecting to randomly read about 15 GB of data everyday.
>> The reads will be per user id.
>>
>> Could you please suggest an implementation and things I need to
>> consider if I have to go with Cassandra. 
>> Also, I have heard that Cassandra doesn't perform well with high
>> read ops. How true is that? How many read connections can each machine
>> handle, and how do I measure that in Cassandra?
>>
>>
>> Thanks
>
>




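Sergey's suggestion can be sketched as a CQL 3 table in which all of a user's items live under one partition key, so Cassandra keeps them together in a single wide row and every random read by user ID touches exactly one partition. All names and types below are invented for illustration:

```sql
CREATE TABLE user_data (
    user_id  uuid,      -- partition key: all of a user's data stays together
    item_id  timeuuid,  -- clustering column: orders items within the user
    payload  blob,
    PRIMARY KEY (user_id, item_id)
);

-- Random reads by user ID then hit a single partition:
SELECT item_id, payload FROM user_data WHERE user_id = ?;
```

Writes for a given user all land on the same replicas, which also makes the heavy initial write phase described above sequential-friendly on disk.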

Re: binary protocol server side sockets

2014-04-11 Thread Chris Lohfink
TCP keep alives (by the setTimeout) are notoriously useless...  The default
2 hours is generally far longer than any timeout in NAT translation tables
(generally ~5 min) and even if you decrease the keep alive to a sane value
a lot of networks actually throw away TCP keep alive packets.  You see that
a lot more in cell networks though.  It's almost always a good idea to have
a software keep alive although it seems to be not implemented in this
protocol.  You can make a super simple CF with 1 value and query it every
minute a connection is idle or something.  i.e. "select * from DummyCF
where id = 1"

-- 
*Chris Lohfink*
Engineer
415.663.6738  |  Skype: clohfink.blackbirdit
*Blackbird **[image: favicon]*

775.345.3485  |  www.blackbirdIT.com 

*"Formerly PalominoDB/DriveDev"*


On Fri, Apr 11, 2014 at 3:04 AM, Phil Luckhurst <
phil.luckhu...@powerassure.com> wrote:

> We are also seeing this in our development environment. We have a 3 node
> Cassandra 2.0.5 cluster running on Ubuntu 12.04 and are connecting from a
> Tomcat based application running on Windows using the 2.0.0 Cassandra Java
> Driver. We have setKeepAlive(true) when building the cluster in the
> application and this does keep one connection open on the client side to
> each of the 3 Cassandra nodes, but we still see the build up of 'old'
> ESTABLISHED connections on each of the Cassandra servers.
>
> We are also getting that same "Unexpected exception during request"
> exception appearing in the logs
>
> ERROR [Native-Transport-Requests:358378] 2014-04-09 12:31:46,824
> ErrorMessage.java (line 222) Unexpected exception during request
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(Unknown Source)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
> at sun.nio.ch.IOUtil.read(Unknown Source)
> at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
> at
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
> at
>
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
> at
>
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at
>
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
> at
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source)
> at java.lang.Thread.run(Unknown Source)
>
> Initially we thought this was down to a firewall that is between our
> development machines and the Cassandra nodes but that has now been
> configured not to 'kill' any connections on port 9042. We also have the
> Windows firewall on the client side turned off.
>
> We still think this is down to our environment as the same application
> running in Tomcat hosted on a Ubuntu 12.04 server does not appear to be
> doing this but up to now we can't track down the cause.
>
>
>
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/binary-protocol-server-side-sockets-tp7593879p7593937.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>



-- 
*Chris Lohfink*
Engineer
415.663.6738  |  Skype: clohfink.blackbirdit

*Blackbird **[image: favicon]*

775.345.3485  |  www.blackbirdIT.com 

*"Formerly PalominoDB/DriveDev"*

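The application-level keep-alive Chris describes can be sketched in a driver-agnostic way: track when the connection was last used and issue a cheap dummy query whenever it has sat idle too long. The wrapper below is a minimal sketch, not a real driver API; `execute` is whatever callable your client library provides, and the dummy query is the one suggested in the message above.

```python
import time

IDLE_LIMIT = 60.0  # seconds of idleness before we send a heartbeat


class HeartbeatConnection:
    """Wraps any execute(query) callable with a software keep-alive."""

    def __init__(self, execute, clock=time.monotonic):
        self._execute = execute
        self._clock = clock
        self._last_used = clock()

    def execute(self, query):
        # Every real query also counts as activity on the connection.
        self._last_used = self._clock()
        return self._execute(query)

    def tick(self):
        """Call periodically (e.g. from a timer thread); issues the dummy
        query only if the connection has been idle past the limit."""
        if self._clock() - self._last_used >= IDLE_LIMIT:
            self.execute("SELECT * FROM DummyCF WHERE id = 1")
            return True
        return False
```

A timer that calls `tick()` once a minute keeps idle connections generating traffic, which is what actually prevents NAT tables and stateful firewalls from silently dropping the session.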
Re: clearing tombstones?

2014-04-11 Thread William Oberman
I'm seeing a lot of articles about a dependency between removing tombstones
and GCGraceSeconds, which might be my problem (I just checked, and this CF
has GCGraceSeconds of 10 days).


On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli wrote:

> compaction should take care of it; for me it never worked so I run
> nodetool compaction on every node; that does it.
>
>
> 2014-04-11 16:05 GMT+02:00 William Oberman :
>
> I'm wondering what will clear tombstoned rows?  nodetool cleanup, nodetool
>> repair, or time (as in just wait)?
>>
>> I had a CF that was more or less storing session information.  After some
>> time, we decided that one piece of this information was pointless to track
>> (and was 90%+ of the columns, and in 99% of those cases was ALL columns for
>> a row).   I wrote a process to remove all of those columns (which again in
>> a vast majority of cases had the effect of removing the whole row).
>>
>> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>>  After I did this mass delete, everything was the same size on disk (which
>> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
>> what to poke to cause compactions to clear the tombstones.  First I tried
>> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
>> the same.  Then I tried nodetool repair on that same node.  But again, disk
>> usage is still the same.  The CF has no snapshots.
>>
>> So, am I misunderstanding something?  Is there another operation to try?
>>  Do I have to "just wait"?  I've only done cleanup/repair on one node.  Do
>> I have to run one or the other over all nodes to clear tombstones?
>>
>> Cassandra 1.2.15 if it matters,
>>
>> Thanks!
>>
>> will
>>
>
>


Re: regarding schema and suitability of cassandra

2014-04-11 Thread Prem Yadav
Thanks.
For this use case, what should I be thinking about schema-wise?

Thanks,
Prem


On Fri, Apr 11, 2014 at 2:16 PM, Sergey Murylev wrote:

>  Hi Prem,
>
>
> Also, I have heard that Cassandra doesn't perform will with high read ops.
> How true is that?
>
> I think that it isn't true. Cassandra has very good read performance. For
> more details you can look at this benchmark.
>
> How many read connections can each machine handle, and how do I measure that
> in Cassandra?
>
>  Cassandra uses one thread-per-client for remote procedure calls. For a
> large number of client connections, this can cause excessive memory usage
> for the thread stack. Connection pooling on the client side is highly
> recommended.
>
> --
> Thanks,
> Sergey
>
>
> On 11/04/14 13:03, Prem Yadav wrote:
>
> Hi,
> I am new to Cassandra, and since I am not familiar with its implementation
> and architecture, I struggle with how best to design the schema.
>
>  We have an application where we need to store huge amounts of data. It is
> a per-user store where we keep a lot of data for each user and do a lot
> of random reads by user id.
> Initially there will be a lot of writes; once things have stabilized, the
> reads will increase.
>
>  We are expecting to randomly read about 15 GB of data every day. The
> reads will be per user id.
>
>  Could you please suggest an implementation and things I need to consider
> if I have to go with Cassandra.
> Also, I have heard that Cassandra doesn't perform well with high read ops.
> How true is that? How many read connections can each machine handle, and how
> do I measure that in Cassandra?
>
>
>  Thanks
>
>
>


Re: clearing tombstones?

2014-04-11 Thread tommaso barbugli
compaction should take care of it; for me it never worked, so I run nodetool
compact on every node; that does it.


2014-04-11 16:05 GMT+02:00 William Oberman :

> I'm wondering what will clear tombstoned rows?  nodetool cleanup, nodetool
> repair, or time (as in just wait)?
>
> I had a CF that was more or less storing session information.  After some
> time, we decided that one piece of this information was pointless to track
> (and was 90%+ of the columns, and in 99% of those cases was ALL columns for
> a row).   I wrote a process to remove all of those columns (which again in
> a vast majority of cases had the effect of removing the whole row).
>
> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>  After I did this mass delete, everything was the same size on disk (which
> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
> what to poke to cause compactions to clear the tombstones.  First I tried
> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
> the same.  Then I tried nodetool repair on that same node.  But again, disk
> usage is still the same.  The CF has no snapshots.
>
> So, am I misunderstanding something?  Is there another operation to try?
>  Do I have to "just wait"?  I've only done cleanup/repair on one node.  Do
> I have to run one or the other over all nodes to clear tombstones?
>
> Cassandra 1.2.15 if it matters,
>
> Thanks!
>
> will
>


clearing tombstones?

2014-04-11 Thread William Oberman
I'm wondering what will clear tombstoned rows?  nodetool cleanup, nodetool
repair, or time (as in just wait)?

I had a CF that was more or less storing session information.  After some
time, we decided that one piece of this information was pointless to track
(and was 90%+ of the columns, and in 99% of those cases was ALL columns for
a row).   I wrote a process to remove all of those columns (which again in
a vast majority of cases had the effect of removing the whole row).

This CF had ~1 billion rows, so I expect to be left with ~100m rows.  After
I did this mass delete, everything was the same size on disk (which I
expected, knowing how tombstoning works).  It wasn't 100% clear to me what
to poke to cause compactions to clear the tombstones.  First I tried
nodetool cleanup on a candidate node.  But, afterwards the disk usage was
the same.  Then I tried nodetool repair on that same node.  But again, disk
usage is still the same.  The CF has no snapshots.

So, am I misunderstanding something?  Is there another operation to try?
 Do I have to "just wait"?  I've only done cleanup/repair on one node.  Do
I have to run one or the other over all nodes to clear tombstones?

Cassandra 1.2.15 if it matters,

Thanks!

will


Re: regarding schema and suitability of cassandra

2014-04-11 Thread Sergey Murylev
Hi Prem,

> Also, I have heard that Cassandra doesn't perform well with high read
> ops. How true is that?
I think that it isn't true. Cassandra has very good read performance.
For more details you can look at this benchmark.
> How many read connections can each machine handle, and how do I measure
> that in Cassandra?
Cassandra uses one thread-per-client for remote procedure calls. For a
large number of client connections, this can cause excessive memory
usage for the thread stack. Connection pooling on the client side is
highly recommended.
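
A generic sketch of that recommendation (this is not the Cassandra driver API; `make_conn` stands in for whatever your client uses to open a connection):

```python
import queue

class ConnectionPool:
    """Minimal client-side connection pool: a fixed set of connections is
    created up front and handed out/returned on demand, instead of
    opening one connection (and one server-side thread) per request."""
    def __init__(self, make_conn, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(make_conn())

    def acquire(self, timeout=None):
        # Blocks until a connection is free, bounding server-side load.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# Usage with a stand-in connection factory:
pool = ConnectionPool(lambda: object(), size=2)
conn = pool.acquire()
pool.release(conn)
```

With thread-per-client RPC on the server, bounding the number of open connections this way directly bounds the server's thread-stack memory usage.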

--
Thanks,
Sergey

On 11/04/14 13:03, Prem Yadav wrote:
> Hi,
> I am new to Cassandra, and since I am not familiar with its
> implementation and architecture, I struggle with how best to design
> the schema.
>
> We have an application where we need to store huge amounts of data.
> It is a per-user store where we keep a lot of data for each user and
> do a lot of random reads by user id.
> Initially there will be a lot of writes; once things have stabilized,
> the reads will increase.
>
> We are expecting to randomly read about 15 GB of data every day. The
> reads will be per user id.
>
> Could you please suggest an implementation and things I need to
> consider if I have to go with Cassandra. 
> Also, I have heard that Cassandra doesn't perform well with high read
> ops. How true is that? How many read connections can each machine
> handle, and how do I measure that in Cassandra?
>
>
> Thanks





regarding schema and suitability of cassandra

2014-04-11 Thread Prem Yadav
Hi,
I am new to Cassandra, and since I am not familiar with its implementation
and architecture, I struggle with how best to design the schema.

We have an application where we need to store huge amounts of data. It is a
per-user store where we keep a lot of data for each user and do a lot of
random reads by user id.
Initially there will be a lot of writes; once things have stabilized, the
reads will increase.

We are expecting to randomly read about 15 GB of data every day. The reads
will be per user id.
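
A sketch of the kind of table this access pattern usually maps to (all names are hypothetical; the point is one wide partition per user, so every read is a single-partition lookup by user id):

```
CREATE TABLE user_data (
    user_id   text,
    item_id   timeuuid,
    payload   blob,
    PRIMARY KEY (user_id, item_id)
);
-- All of a user's items live in one partition, so
--   SELECT * FROM user_data WHERE user_id = ?
-- is served by a single replica set with no scatter-gather.
```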

Could you please suggest an implementation and the things I need to consider
if I go with Cassandra?
Also, I have heard that Cassandra doesn't perform well with high read ops.
How true is that? How many read connections can each machine handle, and how
do I measure that in Cassandra?


Thanks


Re: Minimum database size and ops/second to start considering Cassandra

2014-04-11 Thread motta.lrd
Thanks Tim,

> Significant number of writes / second -> possibly a good use case for 
cassandra.

What is a significant number for you?



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Minimum-database-size-and-ops-second-to-start-considering-Cassandra-tp7593918p7593940.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Point in Time Recovery

2014-04-11 Thread Dennis Schwan
Hi Rob,

we need this for the worst case scenario, so our intention is to restore the 
entire cluster, not a single node.
I am really not sure what the correct procedure would be. I think we have 
configured everything properly so the nodes are archiving the commitlogs (even 
though I am not sure when exactly a single commitlog gets archived) but we 
could not manage to recover a database from those archived commitlogs. What we 
did was:

1. Insert "good" data into the cluster
2. Take a snapshot
3. Insert "good" data into the cluster
4. Note Timestamp
5. Insert "bad" data into the cluster
6. Shut down whole cluster
7. Delete SSTables and restore from snapshot
8. Set Timestamp in commitlog_archiving.properties and restart the nodes

The archived commitlogs are copied to the restore directory and afterwards 
cassandra is replaying those commitlogs but still we only see the data from the 
snapshot, not the commitlogs.
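
For reference, the knobs involved live in conf/commitlog_archiving.properties; a sketch (archive paths are hypothetical; restore_point_in_time uses the yyyy:MM:dd HH:mm:ss format):

```
# conf/commitlog_archiving.properties (sketch)
archive_command=cp %path /backup/commitlog_archive/%name
restore_command=cp -f %from %to
restore_directories=/backup/commitlog_archive
# Replay stops here: only mutations stamped at or before this
# point in time are applied on startup.
restore_point_in_time=2014:04:11 12:00:00
```

One caveat that may explain the uncertainty about timing: as far as I know, a commitlog segment is only archived when Cassandra finishes with it (when it is recycled, or at node startup), so recent mutations can sit in an unarchived active segment; flushing/draining before the restore test helps rule that out.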

Regards,
Dennis

P.S.: Cassandra 2.0.6

On 10.04.2014 at 23:17, Robert Coli wrote:
On Thu, Apr 10, 2014 at 1:19 AM, Dennis Schwan wrote:
do you know any description how to perform a point-in-time recovery
using the archived commitlogs?
We have already tried several things but it just did not work.

Are you restoring the entire *cluster* to a point in time, or a given node? And 
why?

The only people who are likely to have any experience/expertise with that 
archived commitlog stuff are the people from Netflix who contributed it.

=Rob






--
Dennis Schwan

Oracle DBA
Mail Core

1&1 Internet AG | Brauerstraße 48 | 76135 Karlsruhe | Germany
Phone: +49 721 91374-8738
E-Mail: dennis.sch...@1und1.de | Web: 
www.1und1.de

Hauptsitz Montabaur, Amtsgericht Montabaur, HRB 6484

Vorstand: Ralph Dommermuth, Frank Einhellinger, Robert Hoffmann, Andreas 
Hofmann, Markus Huhn, Hans-Henning Kettler, Uwe Lamnek, Jan Oetjen, Christian 
Würst
Aufsichtsratsvorsitzender: Michael Scheeren

Member of United Internet

This E-Mail may contain confidential and/or privileged information. If you are 
not the intended recipient of this E-Mail, you are hereby notified that saving, 
distribution or use of the content of this E-Mail in any way is prohibited. If 
you have received this E-Mail in error, please notify the sender and delete the 
E-Mail.


List and Cancel running queries

2014-04-11 Thread Richard Jennings
Is it possible to list all running queries on a Cassandra cluster ?
Is it possible to cancel a running query on a Cassandra cluster?

Regards


Re: binary protocol server side sockets

2014-04-11 Thread Phil Luckhurst
We are also seeing this in our development environment. We have a 3 node
Cassandra 2.0.5 cluster running on Ubuntu 12.04 and are connecting from a
Tomcat based application running on Windows using the 2.0.0 Cassandra Java
Driver. We have setKeepAlive(true) when building the cluster in the
application and this does keep one connection open on the client side to
each of the 3 Cassandra nodes, but we still see the build up of 'old'
ESTABLISHED connections on each of the Cassandra servers. 
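
One thing worth ruling out is the keepalive timing itself: setKeepAlive(true) only enables SO_KEEPALIVE, and on Linux the first probe is not sent until net.ipv4.tcp_keepalive_time has elapsed (7200 seconds by default), so an intermediate firewall can still drop the connection state before any probe goes out. A sketch of what the option does at the socket level (the TCP_KEEPIDLE tuning is Linux-specific, hence the guard):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# What setKeepAlive(true) does under the hood: enable SO_KEEPALIVE.
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# On Linux the probe timing can be tightened per socket; without this,
# the kernel default (net.ipv4.tcp_keepalive_time, usually 7200s) applies.
if hasattr(socket, "TCP_KEEPIDLE"):
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)

print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
```

If the Java driver offers no per-socket idle tuning, lowering the system-wide keepalive interval on the client hosts (or on the Cassandra nodes) is the usual workaround.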

We are also getting that same "Unexpected exception during request"
exception appearing in the logs

ERROR [Native-Transport-Requests:358378] 2014-04-09 12:31:46,824
ErrorMessage.java (line 222) Unexpected exception during request
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)

Initially we thought this was down to a firewall that is between our
development machines and the Cassandra nodes but that has now been
configured not to 'kill' any connections on port 9042. We also have the
Windows firewall on the client side turned off.

We still think this is down to our environment as the same application
running in Tomcat hosted on a Ubuntu 12.04 server does not appear to be
doing this but up to now we can't track down the cause.




--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/binary-protocol-server-side-sockets-tp7593879p7593937.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.