Save the transaction list inside the customer object keyed by customerid. Index 
this object with 2i on storeids for each contained tx.

If some customer objects grow too big, you can move old txs into archive 
objects keyed by customerid_seqno. For your low-latency customer reads, you 
probably only need the newest txs anyway.
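
To make that concrete, here is a rough sketch against the HTTP interface 
(bucket, key and index names are made up and the transaction JSON is just an 
example shape; the PB/Java client can do the same):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class StoreCustomerTxs {
        public static void main(String[] args) throws Exception {
            // One JSON value per customer, holding its recent transactions.
            String customerId = "customer-42";   // hypothetical key
            String txJson =
                  "[{\"tx\":\"t1\",\"store\":\"s7\",\"state\":\"COMPLETED\"},"
                + "{\"tx\":\"t2\",\"store\":\"s9\",\"state\":\"ACTIVE\"}]";

            URL url = new URL("http://127.0.0.1:8098/buckets/customers/keys/"
                    + customerId);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("PUT");
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "application/json");
            // 2i: one binary index entry per store referenced by the contained
            // txs, so "txs in store Y" becomes a 2i lookup plus a STATE filter.
            con.setRequestProperty("x-riak-index-storeid_bin", "s7,s9");

            try (OutputStream out = con.getOutputStream()) {
                out.write(txJson.getBytes("UTF-8"));
            }
            System.out.println("PUT status: " + con.getResponseCode());
        }
    }

The archive objects would be written the same way under keys like 
customer-42_1, customer-42_2, ... and only fetched when the full history is 
needed. Note that 2i requires the LevelDB backend.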

That's just one idea. Trifork will be happy to help you find a suitable model 
for your use cases.

We usually do this by stress-testing a simulation with realistic data 
sizes/shapes and access patterns. It's fastest if we come onsite for a couple 
of days and work with you to set it up, but we can also help you offsite.

Write me if you're interested, then we can do a call.

Rune Skou Larsen
Trifork, Denmark


----- Reply message -----
From: "Viable Nisei" <vsni...@gmail.com>
To: "riak-users@lists.basho.com" <riak-users@lists.basho.com>
Subject: May allow_mult cause DoS?
Date: Wed., Dec 18, 2013 20:13





---------- Forwarded message ----------
From: Viable Nisei <vsni...@gmail.com>
Date: Thu, Dec 19, 2013 at 2:11 AM
Subject: Re: May allow_mult cause DoS?
To: Russell Brown <russell.br...@me.com>


Hi.

Thank you very much for your detailed and informative answer.

On Wed, Dec 18, 2013 at 3:29 PM, Russell Brown <russell.br...@me.com> wrote:
Hi,

Can you describe your use case a little? Maybe it would be easier for us to 
help.
Yeah, let me describe an abstract case equivalent to ours. Say we have a 
CUSTOMER object, a STORE object and a TRANSACTION object, and each TRANSACTION 
has one tribool attribute STATE={ACTIVE, COMPLETED, ROLLED_BACK}.

We need to be able to list all the TRANSACTIONs of a given CUSTOMER (so we 
need a 1-many relation; this list should not be long, 10^2-10^3 records, but 
we need to be able to obtain it fast enough). We also need to be able to list 
all the TRANSACTIONs in a given STATE made in a given STORE (these lists may 
be very long, up to 10^8 records), but they may be computed with some latency. 
Predictable latency is certainly preferred but is not a show-stopper. So, 
that's all.

Another pain point is races and/or operation atomicity, but that's not so 
important at the moment.


On 18 Dec 2013, at 04:32, Viable Nisei <vsni...@gmail.com> wrote:

> On Wed, Dec 18, 2013 at 8:32 AM, Erik Søe Sørensen <e...@trifork.com> wrote:
> It really is not a good idea to use siblings to represent 1-to-many 
> relations. That's not what it's intended for, nor what it's optimized for...
> Ok, understood.
>
> Can you tell us exactly why you need Bitcask rather than LevelDB? 2i would 
> probably do it.
> 1) According to 
> http://docs.basho.com/riak/latest/ops/running/backups/#LevelDB-Backups , it's 
> a real pain to implement backups with LevelDB.
> 2) According to 
> http://docs.basho.com/riak/latest/ops/advanced/backends/leveldb/ , reads may 
> be slower compared to Bitcask, which is critical for us.
>
> Otherwise, storing a list of items under each key could be a solution, 
> depending of course on the number of items per key. (But do perform conflict 
> resolution.)
> Why is any conflict resolution required? As far as I understood, with 
> allow_mult=true Riak should just collect all the values written to a key 
> without any additional work? What design decision leads to exponential 
> slowdown and crashes when multiple values are allowed for a single key?.. So, 
> what's the REAL purpose of allow_mult=true if it's a bad idea to use it for 
> an unlimited number of values per key?

The real purpose of allow_mult=true is so that writes are never dropped. In the 
case where your application concurrently writes to the same key on two 
different nodes, or on two partitioned nodes, Riak keeps both values. Other 
data stores will lose one of the writes based on timestamp, serialise your 
writes (slow) or simply refuse to accept one or more of them.
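
As a rough illustration (assuming a local node on the default HTTP port and a 
bucket that already has allow_mult=true; bucket and key names are made up):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class MakeSiblings {
        // Two writes to the same key, neither supplying a vector clock from a
        // prior read. With allow_mult=true Riak keeps both values as siblings
        // instead of silently dropping one of the writes.
        public static void main(String[] args) throws Exception {
            put("value-from-client-A");
            put("value-from-client-B");
        }

        static void put(String value) throws Exception {
            URL url = new URL("http://127.0.0.1:8098/buckets/demo/keys/k1");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("PUT");
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "text/plain");
            try (OutputStream out = con.getOutputStream()) {
                out.write(value.getBytes("UTF-8"));
            }
            System.out.println("PUT " + value + " -> " + con.getResponseCode());
        }
    }

After this, a read of that key returns both values (a 300 Multiple Choices 
over HTTP) until something resolves them.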
OK, but the documentation doesn't make these points really clear.


It is the job of the client to aggregate those multiple writes into a single 
value when it detects the conflict on read. Conflict resolution is required 
because your data is opaque to Riak. Riak doesn’t know that you’re storing 
lists of values, or JPEGs or JSON. It can’t possibly know how to resolve two 
conflicting values unless it knows the semantics of the values. Riak _does_ 
collect all the values written to a key, but it does so as a temporary 
measure; it expects your application to resolve them to a single value. How 
many are you writing per key?
As I said before, we need really many values in our 1-many sets - up to 10^8.
Also, why not implement a separate bucket mode that just collects all the 
values written? Anyway, the current allow_mult implementation looks very 
dangerous. The documentation should also be clearer - the "sibling explosion" 
paragraph should state that this applies to allow_mult=true as well.


Riak’s sweet spot is highly write-available applications. If you have the 
time, read the Amazon Dynamo paper[1], as it explains the _problems_ Riak 
solves as well as the way in which it solves them. If you don’t have these 
problems, maybe Riak is not the right datastore for you. Solving these 
problems comes with some developer complexity costs. You’ve run into one of 
them. We have many customers who think the trade-off is worth it: that the 
high availability and low latency make up for having eventual consistency.

Yeah, OK, but what does Riak < 2.0 really offer? FTS looks unscalable (am I 
right? is there any way to speed it up?), listing all bucket keys is not for 
production, 2i is not implemented for Bitcask (anyway, we'll try it on 
LevelDB), and links are "implemented as hacks in the Java driver". So Riak < 
2.0 with Bitcask is only a good distributed 1-1 hashmap with MapReduce support.

>
> Ok, documentation contains the following paragraph:
>
> > Sibling explosion occurs when an object rapidly collects siblings without 
> > being reconciled. This can lead to a myriad of issues. Having an enormous 
> > object in your node can cause reads of that object to crash the entire 
> > node. Other issues are increased cluster latency as the object is 
> > replicated and out of memory errors.
>
> But the paragraph gives no indication of whether this relates to 
> allow_mult=false or to both cases.

Sorry, but I don’t understand what you mean by this statement. The point of 
allow_mult=true is so that writes are not arbitrarily dropped. It allows Riak 
nodes to continue to be available to take writes even if they can’t communicate 
with each other. Have a look at Kyle Kingsbury’s Jepsen[2] post on Riak.

I'm just saying that this paragraph should contain something like "don't write 
many unresolved values to a single key in a bucket with allow_mult=true; this 
will cause dramatic slowdowns/crashes". It's not really obvious that sibling 
explosion relates to buckets with allow_mult=true.

>
> So, the only solution is leveldb+2i?

Maybe. Or maybe just use the client as it is intended to resolve sibling values 
and send that value and a vector clock back to Riak.
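Very roughly, that read-resolve-write cycle looks like this over the HTTP 
interface (local node assumed, and the merge step is application-specific, so 
it is just faked here):

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class ResolveSiblings {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://127.0.0.1:8098/buckets/demo/keys/k1");

            // 1. Read the key, asking for all siblings in one multipart response.
            HttpURLConnection get = (HttpURLConnection) url.openConnection();
            get.setInstanceFollowRedirects(false);
            get.setRequestProperty("Accept", "multipart/mixed, */*");
            String vclock = get.getHeaderField("X-Riak-Vclock");
            String body;
            try (InputStream in = get.getInputStream()) {
                body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }

            // 2. Application-specific merge. A real app would parse the multipart
            //    parts in `body` and apply its own semantics (e.g. set union);
            //    here we just pretend.
            String merged = "merged-value";

            // 3. Write the merged value back WITH the vector clock from the read,
            //    so Riak knows this write supersedes all of the siblings.
            HttpURLConnection put = (HttpURLConnection) url.openConnection();
            put.setRequestMethod("PUT");
            put.setDoOutput(true);
            put.setRequestProperty("Content-Type", "text/plain");
            put.setRequestProperty("X-Riak-Vclock", vclock);
            try (OutputStream out = put.getOutputStream()) {
                out.write(merged.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("resolved, status " + put.getResponseCode());
        }
    }
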
That's not a solution for big sets of 10^8 elements.

Or maybe roll your own indexes like in this blog post[3].
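One way to roll your own index, sketched very roughly (hypothetical bucket and 
key names, and ignoring concurrency for a moment): write the data object, and 
also append its key to an index object whose key encodes the indexed value.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ManualIndexSketch {
        public static void main(String[] args) throws Exception {
            // The data object itself.
            put("transactions", "tx-1001",
                "{\"customer\":\"c42\",\"store\":\"s7\",\"state\":\"ACTIVE\"}");
            // An index object whose key encodes store + state and whose value
            // lists matching transaction keys. Real code would read the current
            // list, add the new key and write the merged list back.
            put("tx-index", "s7_ACTIVE", "[\"tx-1001\"]");
        }

        static void put(String bucket, String key, String json) throws Exception {
            URL url = new URL("http://127.0.0.1:8098/buckets/" + bucket
                    + "/keys/" + key);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("PUT");
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = con.getOutputStream()) {
                out.write(json.getBytes("UTF-8"));
            }
            con.getResponseCode();
        }
    }

Concurrent updates to the same index object will of course create siblings 
again, so readers (or a resolver) have to merge them, e.g. by set union.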
Using custom client-side "indexes" is not an option for long lists, so the 
only option is to write some piece of Erlang code?..

With Riak 2.0 there are a few data types added to Riak that are not opaque. 
Maybe Riak’s Sets would suit your purpose (depending on the size of your Set.)
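For illustration, adding to a Set over the 2.0 HTTP interface looks roughly 
like this (this assumes a bucket type backed by the set datatype has already 
been created and activated; all names here are made up):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class SetSketch {
        public static void main(String[] args) throws Exception {
            // Assumes a bucket type "sets" with datatype=set exists and is active.
            URL url = new URL("http://127.0.0.1:8098/types/sets/buckets/"
                    + "customer-txs/datatypes/customer-42");

            // Add elements; Riak merges concurrent adds itself, so no sibling
            // handling is needed on the client.
            HttpURLConnection post = (HttpURLConnection) url.openConnection();
            post.setRequestMethod("POST");
            post.setDoOutput(true);
            post.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = post.getOutputStream()) {
                out.write("{\"add_all\": [\"tx-1001\", \"tx-1002\"]}"
                        .getBytes("UTF-8"));
            }
            System.out.println("add status: " + post.getResponseCode());
            // A plain GET on the same URL returns the whole set as JSON, which
            // is exactly the part that may hurt with 10^8 elements.
        }
    }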

What do you mean by "depending on the size of your Set"?
Will I be able to store 10^8 values and enumerate/add new values fast enough?

You’re fighting the database at the moment, rather than working with it. The 
properties of Riak buy you some wonderful things (high availability, partition 
tolerance, low latency) but you have to want / need those properties, and then 
you have to accept that there is a data modelling / developer complexity price 
to pay. We don’t think that price is too high. We have many customers who 
agree. We’re always working to lower that price (see Strong Consistency, 
Yokozuna, Data Types etc in Riak 2.0[4].)
We've built the 2.0 technical preview, but it tends to crash frequently and 
the 2.0 driver is still not ready; according to the docs it does look 
significantly better, though. But the questions about maximum Set size and FTS 
scalability remain open.


You seem to have had a very negative first experience of Riak (and Basho.) I 
think that is because you misunderstand what it is for and how it should be 
used. I'm very keen to fix that. If it turns out that Riak is just not for you, 
that is fine too.
It's not a negative experience, it's just a WTFZOMG state. Everything looked 
good until the load/scalability tests...


In response to your earlier mail, I think Basho’s consulting costs sound 
incredibly low. I think you got that answer because you reached out to Basho 
through that channel, rather than ask the list. We’re still trying to track 
down who you spoke to and when, if you could provide me details of that 
conversation directly (rather than to the list) I’d be very grateful.
I think that's not really important for now; I think we put the wrong emphasis 
on our questions/thoughts.
Anyway, for now it looks like there is no silver bullet priced at $5k - all 
the possible approaches to solving our problem have already been listed in 
this thread. The only one I missed in the original message was custom indexing 
on the server side (implemented as a precommit hook, am I right? like FTS?).


I’m not sure if it is just a cultural / language thing, but you’re very 
negative right now, and you sound like you're attacking Basho and Riak. I don’t 
think that is warranted at this point as we’re just trying to help you figure 
out if Riak is the datastore you want / need.

As I said before, I'm not negative. This picture http://tinyurl.com/p5zntks 
describes the thoughts of our dev team after a set of load tests perfectly. We 
got 100 writes/sec on a single Core i3 host. OK, we got up to 500 writes/sec 
(but we need 10k+) on a single cc2.8xlarge host, but with 5 cc2.8xlarge nodes 
we got less, with latency significantly increased. We changed our approach to 
using allow_mult - and got only 100 for the first few seconds, then an 
exponential drop to zero, then a total crash of the whole cluster... Also, you 
are right - English is not my native language. As for the subject of our 
thread - take it as a yellow-press headline (but I still think it's not such a 
good idea to allow client code to do SUCH BAD THINGS TO THE WHOLE CLUSTER).

Cheers

Russell

[1] http://dl.acm.org/citation.cfm?id=1294281
[2] http://aphyr.com/posts/285-call-me-maybe-riak
[3] http://basho.com/index-for-fun-and-for-profit/
[4] http://basho.com/technical-preview-of-riak-2-0/

>
>
>
> On Wed, Dec 18, 2013 at 8:32 AM, Erik Søe Sørensen <e...@trifork.com> wrote:
> It really is not a good idea to use siblings to represent 1-to-many 
> relations. That's not what it's intended for, nor what it's optimized for...
> Can you tell us exactly why you need Bitcask rather than LevelDB? 2i would 
> probably do it.
> Otherwise, storing a list of items under each key could be a solution, 
> depending of course on the number of items per key. (But do perform conflict 
> resolution.)
> /Erik
>
>
>
> -------- Original message --------
> From: Viable Nisei <vsni...@gmail.com>
> Date:
> To: riak-users@lists.basho.com
> Subject: May allow_mult cause DoS?
>
>
> Hi.
>
> Recently we've discovered that something is going unexpectedly wrong. We are 
> using Riak 1.4.2 with some buckets with allow_mult=true.
> We tried our app under load and found that... concurrent writes into a 
> bucket with allow_mult turn Riak into an unresponsive slowpoke and even 
> crash it.
>
> A Core i3 with 4GB RAM performs only 20 writes/sec with 5 client threads 
> writing 20 short strings into 20 keys in a bucket with allow_mult=true, 
> search=false. With 40 values per 40 keys it performs only 6 writes/sec, and 
> 60x60 crashes Riak.
> Throughput drops drastically. OK, we didn't change the concurrency factor 
> (5) and increased our data set 4x, but why does throughput drop?
> OK, we increase our dataset linearly: 20 strings * 20 keys, 40 strings * 20 
> keys, 60 strings * 20 keys... The results are the same - an exponential 
> throughput drop with a crash at the end.
>
> A cluster of five Amazon EC2 cc2.8xlarge nodes becomes unresponsive, with a 
> throughput of 1-5 writes/sec, with only 80-100 values per 1-10 keys.
>
> So, we think it is very strange.
>
> Here you can check our code sample (in java) reproducing this behavior: 
> https://bitbucket.org/vsnisei/riak-allow_mult_wtf
>
> So we asked Basho about this, but they said that "we think SQLish" and asked 
> us for $5k for a 2-day consultation to resolve our problem.
> So I've decided to ask here whether we are really so stupid that we can't 
> understand some simple things, or whether Basho didn't understand us 
> correctly?..
>
> Anyway, it looks like a DoS/DDoS attack utilizing this behavior could be 
> devised. One would only need to know that some service/application/website 
> is using Riak with allow_mult buckets and then provoke concurrent writes 
> into them...
>
> Actually, our question to Basho was broader. Our application needs to 
> implement 1-many bindings. According to the documentation, Riak allows the 
> following approaches to simulate such bindings:
>
>  1.  Riak Search - but we've found that it's VERY slow (a 20x performance 
> drop when search is enabled, even for simple objects like {source_id: xxx, 
> target_id: yyy}); also we've found that search is not really scalable - 
> adding new nodes to the cluster does not increase throughput but even slows 
> the cluster down...
>  2.  Secondary indexes. But, according to the docs, they only work on 
> LevelDB, and we need Bitcask.
>  3.  Link walking. But, according to the docs, it's a "REST-only operation", 
> and in the Java driver it's implemented as a hack.
>  4.  allow_mult. But we've found that it's just a nightmare. We told Basho 
> about this and gave them a link to our example, but they didn't give us any 
> feedback.
>  5.  Bucket key enumeration. But, according to the docs, this operation 
> causes a full key scan on each node and must not be used in production.
>  6.  MapReduce queries. OK, we haven't tried them yet; maybe they really are 
> the silver bullet. But according to the docs (and common sense) MapReduce 
> causes a full scan (at least of the bucket. Or of all keys?) and is an 
> operation with unpredictable latency.
>
> So, where are we wrong? Is the behavior I've described expected? Have we 
> misunderstood Riak completely and should we pay $5k for some mind-expansion, 
> or is there no hidden mystical knowledge and they won't tell us anything 
> beyond the approaches listed above?
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
