Re: Understanding of proliferation of sstables during a repair

2017-02-26 Thread Benjamin Roth
Too many open files. The limit is 100k by default and we had >40k SSTables.
Normally there are around 500-1000.
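
For context, a minimal sketch (plain Python, not part of Cassandra) to compare the open-file limit against the number of SSTable data files on a node. The data path below is an assumption based on the default layout; adjust it to your cassandra.yaml data_file_directories.

```python
import glob
import resource

# Assumed default data directory layout (adjust as needed). Each SSTable has
# one *-Data.db plus several companion files, so real fd usage is higher
# than this count.
DATA_GLOB = "/var/lib/cassandra/data/*/*/*-Data.db"

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)  # per-process open-file limit
sstables = len(glob.glob(DATA_GLOB))

print(f"open-file limit: soft={soft}, hard={hard}")
print(f"SSTable data files on disk: {sstables}")
```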

On 27.02.2017 02:40, "Seth Edwards" wrote:

> This makes a lot more sense. What does TMOF stand for?
>
> On Sun, Feb 26, 2017 at 1:01 PM, Benjamin Roth 
> wrote:
>
>> Hi Seth,
>>
>> Repairs can create a lot of tiny SSTables. I also encountered the
>> creation of so many sstables that the node died because of TMOF. At that
>> time the affected nodes were REALLY inconsistent.
>>
>> One reason can be immense inconsistencies spread over many
>> partition(-ranges) with a lot of subrange repairs that trigger a lot of
>> independent streams. Each stream results in a single SSTable that can be
>> very small. No matter how small it is, it has to be compacted and can cause
>> a compaction impact that is a lot bigger than expected from a tiny little
>> table.
>>
>> Also consider that there is a theoretical race condition that can cause
>> repairs even though the data is not inconsistent, due to mutations still in
>> flight during merkle tree calculation.
>>
>> 2017-02-26 20:41 GMT+01:00 Seth Edwards :
>>
>>> Hello,
>>>
>>> We just ran a repair on a keyspace using TWCS and a mixture of TTLs. This
>>> caused a large proliferation of sstables and compactions. There is
>>> likely a lot of entropy in this keyspace. I am trying to better understand
>>> why this is.
>>>
>>> I've also read that you may not want to run repairs on short TTL data
>>> and rely upon other anti-entropy mechanisms to achieve consistency instead.
>>> Is this generally true?
>>>
>>>
>>> Thanks!
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


Re: How does cassandra achieve Linearizability?

2017-02-26 Thread Kant Kodali
Is there a way to apply the commits from this
https://github.com/bdeggleston/cassandra/tree/CASSANDRA-6246-trunk branch
to the Apache Cassandra 3.10 branch? I thought I could just merge the two
branches, but it looks like there are several trunks, so I am confused about
which trunk I should be merging into.
I want to merge it just to try it out on my local machine.

Thanks!

On Wed, Feb 22, 2017 at 8:04 PM, Michael Shuler 
wrote:

> I updated the fix version on CASSANDRA-6246 to 4.x. The 3.11.x edit was
> a bulk move when removing the cassandra-3.X branch and the 3.x Jira
> version. There are likely other new feature tickets that should really
> say 4.x.
>
> --
> Kind regards,
> Michael
>
> On 02/22/2017 07:28 PM, Kant Kodali wrote:
> > I hope that patch is reviewed as quickly as possible. We use LWTs
> > heavily; we are getting a throughput of 600 writes/sec, and each write
> > is 1KB in our case.
> >
> >
> >
> >
> >
> > On Wed, Feb 22, 2017 at 7:48 AM, Edward Capriolo wrote:
> >
> >
> >
> > On Wed, Feb 22, 2017 at 9:47 AM, Ariel Weisberg wrote:
> >
> > Hi,
> >
> > No it's not going to be in 3.11.x. The earliest release it could
> > make it into is 4.0.
> >
> > Ariel
> >
> > On Wed, Feb 22, 2017, at 03:34 AM, Kant Kodali wrote:
> >> Hi Ariel,
> >>
> >> Can we really expect the fix in 3.11.x as the
> >> ticket https://issues.apache.org/jira/browse/CASSANDRA-6246 says?
> >>
> >> Thanks,
> >> kant
> >>
> >> On Thu, Feb 16, 2017 at 2:12 PM, Ariel Weisberg
> >> wrote:
> >>
> >> Hi,
> >>
> >> That would work and would help a lot with the dueling
> >> proposer issue.
> >>
> >> A lot of the leader election stuff is designed to reduce
> >> the number of roundtrips and not just address the dueling
> >> proposer issue. Those variants can have downtime because
> >> the leader is there for correctness. Just adding an affinity
> >> for a specific proposer is probably a free lunch.
> >>
> >> I don't think you can group keys because the Paxos
> >> proposals are per partition which is why we get linear
> >> scale out for Paxos. I don't believe it's linearizable
> >> across multiple partitions. You can use the clustering key
> >> and deterministically pick one of the live replicas for
> >> that clustering key. Sort the list of replicas by IP, hash
> >> the clustering key, use the hash as an index into the list
> >> of replicas.
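
A minimal sketch of that selection scheme (plain Python, not Cassandra code; the function name and the choice of MD5 are illustrative assumptions):

```python
import hashlib

def preferred_replica(clustering_key: bytes, live_replicas: list[str]) -> str:
    """Deterministically map a key to one live replica.

    Sort the replicas into a stable order, hash the clustering key, and use
    the hash as an index into the sorted list, as described above.
    """
    replicas = sorted(live_replicas)               # same order on every coordinator
    digest = hashlib.md5(clustering_key).digest()  # any stable hash would do
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]

# Coordinators that see the same key and the same live set converge on the
# same replica, which reduces dueling proposers without a real leader election.
print(preferred_replica(b"user:42", ["10.0.0.3", "10.0.0.1", "10.0.0.2"]))
```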
> >>
> >> Batching is of limited usefulness because we only use
> >> Paxos for CAS, I think. So in a batch, by definition, all but
> >> one will fail the CAS. This is something where a
> >> distinguished coordinator could help, by failing the rest
> >> of the contending requests more cheaply than it
> >> currently does.
> >>
> >>
> >> Ariel
> >>
> >> On Thu, Feb 16, 2017, at 04:55 PM, Edward Capriolo wrote:
> >>>
> >>>
> >>> On Thu, Feb 16, 2017 at 4:33 PM, Ariel Weisberg
> >>> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Classic Paxos doesn't have a leader. There are
> >>> variants on the original Lamport approach that will
> >>> elect a leader (or some other variation like Mencius)
> >>> to improve throughput, latency, and performance under
> >>> contention. Cassandra implements the approach from
> >>> the beginning of "Paxos Made Simple"
> >>> (https://goo.gl/SrP0Wb) with no additional
> >>> optimizations that I am aware of. There is no
> >>> distinguished proposer (leader).
> >>>
> >>> That paper does go on to discuss electing a
> >>> distinguished proposer, but that was never done for
> >>> C*. I believe it's not considered a good fit for C*
> >>> philosophically.
> >>>
> >>> Ariel
> >>>
> >>> On Thu, Feb 16, 2017, at 04:20 PM, Kant Kodali wrote:
>  @Ariel Weisberg EPaxos looks very interesting, as it
>  looks like it doesn't need any designated leader for
>  C*, but I am assuming the Paxos that is implemented
>  today for LWTs requires leader election, and if so,
>  don't we need to have an odd number of nodes 

Re: Understanding of proliferation of sstables during a repair

2017-02-26 Thread Seth Edwards
This makes a lot more sense. What does TMOF stand for?

On Sun, Feb 26, 2017 at 1:01 PM, Benjamin Roth 
wrote:

> Hi Seth,
>
> Repairs can create a lot of tiny SSTables. I also encountered the creation
> of so many sstables that the node died because of TMOF. At that time the
> affected nodes were REALLY inconsistent.
>
> One reason can be immense inconsistencies spread over many
> partition(-ranges) with a lot of subrange repairs that trigger a lot of
> independent streams. Each stream results in a single SSTable that can be
> very small. No matter how small it is, it has to be compacted and can cause
> a compaction impact that is a lot bigger than expected from a tiny little
> table.
>
> Also consider that there is a theoretical race condition that can cause
> repairs even though the data is not inconsistent, due to mutations still in
> flight during merkle tree calculation.
>
> 2017-02-26 20:41 GMT+01:00 Seth Edwards :
>
>> Hello,
>>
>> We just ran a repair on a keyspace using TWCS and a mixture of TTLs. This
>> caused a large proliferation of sstables and compactions. There is likely a
>> lot of entropy in this keyspace. I am trying to better understand why this
>> is.
>>
>> I've also read that you may not want to run repairs on short TTL data and
>> rely upon other anti-entropy mechanisms to achieve consistency instead. Is
>> this generally true?
>>
>>
>> Thanks!
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: Understanding of proliferation of sstables during a repair

2017-02-26 Thread Benjamin Roth
Hi Seth,

Repairs can create a lot of tiny SSTables. I also encountered the creation
of so many sstables that the node died because of TMOF. At that time the
affected nodes were REALLY inconsistent.

One reason can be immense inconsistencies spread over many
partition(-ranges) with a lot of subrange repairs that trigger a lot of
independent streams. Each stream results in a single SSTable that can be
very small. No matter how small it is, it has to be compacted and can cause
a compaction impact that is a lot bigger than expected from a tiny little
table.
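
As a rough, hedged illustration of how the numbers multiply (all figures below are made-up assumptions, not measurements from this thread):

```python
# Rough model: each inconsistent subrange pulls an independent stream from each
# out-of-sync peer, and each stream lands at least one new SSTable per table.
subranges_repaired = 512   # many small token subranges (assumption)
out_of_sync_peers  = 2     # RF=3 with both other replicas inconsistent (assumption)
tables_streamed    = 40    # tables receiving mismatching data (assumption)

new_sstables = subranges_repaired * out_of_sync_peers * tables_streamed
print(f"worst-case new SSTables from one repair: {new_sstables}")  # -> 40960
```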

Also consider that there is a theoretical race condition that can cause
repairs even though the data is not inconsistent, due to mutations still in
flight during merkle tree calculation.

2017-02-26 20:41 GMT+01:00 Seth Edwards :

> Hello,
>
> We just ran a repair on a keyspace using TWCS and a mixture of TTLs. This
> caused a large proliferation of sstables and compactions. There is likely a
> lot of entropy in this keyspace. I am trying to better understand why this
> is.
>
> I've also read that you may not want to run repairs on short TTL data and
> rely upon other anti-entropy mechanisms to achieve consistency instead. Is
> this generally true?
>
>
> Thanks!
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Understanding of proliferation of sstables during a repair

2017-02-26 Thread Seth Edwards
Hello,

We just ran a repair on a keyspace using TWCS and a mixture of TTLs. This
caused a large proliferation of sstables and compactions. There is likely a
lot of entropy in this keyspace. I am trying to better understand why this
is.

I've also read that you may not want to run repairs on short TTL data and
rely upon other anti-entropy mechanisms to achieve consistency instead. Is
this generally true?


Thanks!