Evict Tombstones with STCS

2016-05-28 Thread Anuj Wadehra
Hi,
We are using C* 2.0.x. What options are available when there is not enough disk space to compact the huge SSTables formed by STCS (created long ago but not getting compacted because min_compaction_threshold is 4)?
We suspect that a lot of space will be released once the 2 largest SSTables get compacted together, such that tombstone eviction becomes possible. But there is not enough space to compact them together, assuming that the compaction would need at least free disk = size of sstable1 + size of sstable2?
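For context, this is roughly how we are sizing the problem (the data directory path below is the default one, ours may differ, and keyspace/table are placeholders):

    # two largest SSTables of the table in question
    ls -lS /var/lib/cassandra/data/<keyspace>/<table>/*-Data.db | head -2
    # free space remaining on the data volume
    df -h /var/lib/cassandra/data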
I read the STCS code, and if no SSTables are eligible for compaction, it should pick an individual SSTable for compaction. But somehow the huge SSTables are not participating in individual compactions. Is that due to the default 20% tombstone threshold? And if so, forceUserDefinedCompaction or setting unchecked_tombstone_compaction to true won't help either, as tombstones are less than 20% and not much disk would be recovered.
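For completeness, this is the kind of change we considered and ruled out (keyspace/table names are placeholders, and unchecked_tombstone_compaction also needs a recent enough 2.0 release):

    cqlsh -e "ALTER TABLE myks.mytable WITH compaction = {
      'class': 'SizeTieredCompactionStrategy',
      'unchecked_tombstone_compaction': 'true',
      'tombstone_threshold': '0.05' };"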
It is not possible to add additional disks either.
We see a huge difference in disk utilization across nodes. Maybe some nodes were able to get away with tombstones while others didn't manage to evict them.

It would be good to know more alternatives from the community.

Thanks,
Anuj


Re: Error while rebuilding a node: Stream failed

2016-05-28 Thread George Sigletos
No luck unfortunately. It seems that the connection to the destination node
was lost.

However, there was progress compared to previous attempts: a lot more data
was streamed.

(From source node)
INFO  [GossipTasks:1] 2016-05-28 17:53:57,155 Gossiper.java:1008 -
InetAddress /54.172.235.227 is now DOWN
INFO  [HANDSHAKE-/54.172.235.227] 2016-05-28 17:53:58,238
OutboundTcpConnection.java:487 - Handshaking version with /54.172.235.227
ERROR [STREAM-IN-/54.172.235.227] 2016-05-28 17:54:08,938
StreamSession.java:505 - [Stream #d25a05c0-241f-11e6-bb50-1b05ac77baf9]
Streaming error occurred
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_79]
at sun.nio.ch.SocketDispatcher.read(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
~[na:1.7.0_79]
at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.SocketChannelImpl.read(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(Unknown Source)
~[na:1.7.0_79]
at sun.nio.ch.ChannelInputStream.read(Unknown Source) ~[na:1.7.0_79]
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Unknown
Source) ~[na:1.7.0_79]
at
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:257)
~[apache-cassandra-2.1.14.jar:2.1.14]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
INFO  [SharedPool-Worker-1] 2016-05-28 17:54:59,612 Gossiper.java:993 -
InetAddress /54.172.235.227 is now UP

On Fri, May 27, 2016 at 5:37 PM, George Sigletos 
wrote:

> I am trying once more with more aggressive TCP settings, as recommended
> here
> 
>
> sudo sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_probes=3 
> net.ipv4.tcp_keepalive_intvl=10
>
> (added to /etc/sysctl.conf and run sysctl -p /etc/sysctl.conf on all nodes)
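> For reference, the persistent entries added to /etc/sysctl.conf (same values as the command above):
>
>     net.ipv4.tcp_keepalive_time = 60
>     net.ipv4.tcp_keepalive_probes = 3
>     net.ipv4.tcp_keepalive_intvl = 10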
>
> Let's see what happens. I don't know what else to try. I have even further
> increased streaming_socket_timeout_in_ms
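> Roughly this kind of change in cassandra.yaml on every node, followed by a
> restart (the value below is only an example, not necessarily what I set):
>
>     streaming_socket_timeout_in_ms: 86400000   # e.g. 24 hours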
>
>
>
> On Fri, May 27, 2016 at 4:56 PM, Paulo Motta 
> wrote:
>
>> I'm afraid raising streaming_socket_timeout_in_ms won't help much in this
>> case because the incoming connection on the source node is timing out on
>> the network layer, and streaming_socket_timeout_in_ms controls the socket
>> timeout in the app layer and throws SocketTimeoutException (not 
>> java.io.IOException:
>> Connection timed out). So you should probably use more aggressive TCP
>> keep-alive settings (net.ipv4.tcp_keepalive_*) on both hosts. Did you try
>> tuning that? Even that might not be sufficient as some routers tend to
>> ignore tcp keep-alives and just kill idle connections.
>>
>> As said before, this will ultimately be fixed by adding keep-alive to the
>> app layer on CASSANDRA-11841. If tuning tcp keep-alives does not help, one
>> extreme approach would be to backport this to 2.1 (unless some experienced
>> operator out there has a more creative approach).
>>
>> @eevans, I'm not sure he is using a mixed-version cluster; it seems he
>> finished the upgrade from 2.1.13 to 2.1.14 before performing the rebuild.
>>
>> 2016-05-27 11:39 GMT-03:00 Eric Evans :
>>
>>> From the various stacktraces in this thread, it's obvious you are
>>> mixing versions 2.1.13 and 2.1.14.  Topology changes like this aren't
>>> supported with mixed Cassandra versions.  Sometimes it will work,
>>> sometimes it won't (and it will definitely not work in this instance).
>>>
>>> You should either upgrade your 2.1.13 nodes to 2.1.14 first, or add
>>> the new nodes using 2.1.13, and upgrade after.
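>>> A quick way to double-check what each node is actually running, either
>>> locally on every host or via the gossip state from any single node:
>>>
>>>     nodetool version
>>>     nodetool gossipinfo | grep -E '^/|RELEASE_VERSION'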
>>>
>>> On Fri, May 27, 2016 at 8:41 AM, George Sigletos 
>>> wrote:
>>>
>>>  ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
>>>  StreamSession.java:505 - [Stream
>>> #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>>>  Streaming error occurred
>>>  java.lang.RuntimeException: Outgoing stream handler has been closed
>>>  at
>>> 
>>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
>>>  ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>  at
>>> 
>>> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568)
>>>  ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>  at
>>> 
>>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457)
>>>  ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>  at
>>> 
>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263)
>>>  ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>  at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>> 
>>>  And this is from the source no