Let's talk about Riak data types for a moment. Riak data types are 
implementations of what academia refers to as CRDTs (convergent, or 
conflict-free, replicated data types). The key benefit a CRDT offers over a 
traditional KV object is automatic conflict resolution. Each of the CRDTs 
provided in Riak has its own specific conflict resolution strategy. This does 
not come for free. There is a computational cost associated with CRDTs. If 
your use case requires automated conflict resolution, then CRDTs are a good 
fit. Internally, CRDTs rely on vector clocks (see DVVs in the documentation) 
to resolve conflicts. 
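
To make "automatic conflict resolution" a bit more concrete, here is a minimal 
sketch of a grow-only counter, one of the simplest CRDTs. It is purely 
illustrative and is not how Riak implements its data types; the class name 
GCounter and the actor names are made up for the example. Each replica only 
bumps its own slot, and merging takes the per-actor maximum, so replicas 
converge no matter the order in which they merge.

```python
class GCounter:
    """Illustrative grow-only counter: one count per actor (replica)."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, actor, amount=1):
        # Each replica only ever increases its own slot.
        self.counts[actor] = self.counts.get(actor, 0) + amount

    def merge(self, other):
        # Per-actor maximum: commutative, associative and idempotent,
        # so concurrent updates converge without manual resolution.
        actors = set(self.counts) | set(other.counts)
        return GCounter(
            {a: max(self.counts.get(a, 0), other.counts.get(a, 0)) for a in actors}
        )

    def value(self):
        return sum(self.counts.values())


# Two replicas accept writes concurrently, then exchange state.
replica_a = GCounter()
replica_b = GCounter()
replica_a.increment("node_a", 2)
replica_b.increment("node_b", 3)

merged_ab = replica_a.merge(replica_b)
merged_ba = replica_b.merge(replica_a)
```

Both merge orders yield the same state. The bookkeeping needed to make that 
possible (per-actor metadata carried with every value) is the kind of overhead 
a plain KV write does not pay.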

Considering your ETL use case, I'm going to presume that your data is 
immutable (I could very well be wrong here). If it is, I would consider simply 
using plain KV, possibly even a write-once bucket, and not paying the CRDT 
computational penalty. How large that penalty is obviously depends on your 
use case, configuration, hardware deployment, etc. 
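
As a rough sketch of the plain KV approach, the snippet below turns each row 
of a delimited file into a key and a serialized JSON string. The assumptions 
are mine: the first line is a header, the delimiter is "|", and the key scheme 
(prefix plus row number) and the function name rows_to_kv are hypothetical. 
The actual write to Riak is left out on purpose, since it depends on your 
client.

```python
import csv
import io
import json


def rows_to_kv(text, key_prefix, delimiter="|"):
    """Yield (key, json_string) pairs, one per data row.

    Assumes the first line of `text` holds the column names.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row_number, row in enumerate(reader, start=1):
        key = f"{key_prefix}:{row_number}"  # hypothetical key scheme
        yield key, json.dumps(row, sort_keys=True)


sample = "id|name\n1|alice\n2|bob\n"
pairs = list(rows_to_kv(sample, "file1"))
# Each pair would then be stored as an ordinary KV object,
# with the JSON string as the value.
```

Each value is an opaque string as far as Riak is concerned, so there is no 
per-column CRDT metadata to maintain on write.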

Hope that helps!
-Alexander 

@siculars
http://siculars.posthaven.com

Sent from my iRotaryPhone

> On Oct 20, 2015, at 12:39, Dennis Nicolay <dnico...@orcawave.net> wrote:
> 
> Hi Alexander,
>  
> I’m parsing the file and storing each row with its own key in a map datatype 
> bucket, and each column is a register. 
>  
> Thanks,
> Dennis
>  
> From: Alexander Sicular [mailto:sicul...@gmail.com] 
> Sent: Tuesday, October 20, 2015 10:34 AM
> To: Dennis Nicolay
> Cc: Christopher Mancini; riak-users@lists.basho.com
> Subject: Re: Using Bucket Data Types slowed insert performance
>  
> Hi Dennis,
>  
> It's a bit unclear what you are trying to do here. Are you 1. uploading the 
> entire file and saving it to one key with the value being the file? Or are 
> you 2. parsing the file and storing each row as a register in a map? 
>  
> Neither of those approaches is appropriate in Riak KV. For the first case 
> I would point you to Riak S2 which is designed to manage large binary object 
> storage. You can keep the large file as a single addressable entity and 
> access it via Amazon S3 or Swift protocol. For the second case I would 
> consider maintaining one key (map) per row in the file and have a register 
> per column in the row. Or not use Riak data types (maps, sets, registers, 
> flags and counters) and simply keep each row in the file as a KV in Riak 
> either as a raw string or as a serialized json string. ETL'ing out of 
> relational databases and into Riak is a very common use case and often 
> implemented in the fashion I described. 
>  
> As Chris mentioned, the soft upper bound on value size should be 1MB. I say 
> soft because we won't enforce it, although there are settings in the config 
> that can be changed to enforce it (default 5MB warning, 50MB reject, I 
> believe). 
> 
> Best,
> Alexander
> 
> 
> @siculars
> http://siculars.posthaven.com
>  
> Sent from my iRotaryPhone
> 
> On Oct 20, 2015, at 10:22, Christopher Mancini <cmanc...@basho.com> wrote:
> 
> Hi Dennis,
> 
> I am not the most experienced, but what I do know is that a file that size 
> causes a great deal of network chatter because it has to hand off that data to 
> the other nodes in the network and will cause delays in Riak's ability to 
> send and confirm consistency across the ring. Typically we recommend that you 
> try to structure your objects to around 1MB or less to ensure consistent 
> performance. That max object size can vary of course based on your network / 
> server specs and configuration.
> 
> I hope this helps.
> 
> Chris
>  
> On Tue, Oct 20, 2015 at 8:18 AM Dennis Nicolay <dnico...@orcawave.net> wrote:
> Hi,
>  
> I’m using .NET RiakClient 2.0 to insert a 44MB delimited file with 139k rows 
> of data into Riak.  I switched to a map bucket data type with registers.   It 
> is taking about 3 times longer to insert into this bucket vs. a non-data-typed 
> bucket.  Any suggestions?
>  
> Thanks in advance,
> Dennis
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com