Honestly, right now we need to work on optimising the Map. We do have a smaller, faster Map in a branch that we’re working on shipping soon, as well as other optimisations planned.
Does your use case have you adding and removing registers, or is this basically a fixed schema of registers per key? If you’re not removing and re-adding registers, I would use a CRDT not in Riak, but in your application. CRDTs in Riak make sense for causal data types, where actor management would be onerous for the client. What you’re modelling using the map looks like a Last-Write-Wins Element Set. This is a pretty simple CRDT to write yourself in your own programming language/application: simply store its binary representation in Riak, using Riak’s siblings (allow_mult=true). When your application gets sibling values, simply run your application code’s merge function over them.

There are details of the last-write-wins element set here: https://github.com/soundcloud/roshi, but all you need to do is store a pair (element, TS) for each member of the set. If you’re going to be removing registers, it gets more complex… are you?

Cheers
Russell

On 20 Oct 2015, at 20:25, Dennis Nicolay <dnico...@orcawave.net> wrote:
>
> ResultObject cdr;
> while (queued.TryDequeue(out cdr))
> {
>     long beforeProcessing = DateTime.Now.Ticks;
>     UpdateMap.Builder builder = BuildMapObject(bucket, cdr);
>     UpdateMap cmd = builder.Build();
>     RiakResult rslt = client.Execute(cmd);
>     // ...
> }
>
> private static UpdateMap.Builder BuildMapObject(string bucketname, ResultObject cdr)
> {
>     var builder = new UpdateMap.Builder()
>         .WithBucketType("maps")
>         .WithBucket(bucketname)
>         .WithKey(cdr.CdrKey);
>     var mapOperation = new UpdateMap.MapOperation();
>     mapOperation.SetRegister("FileTimeStamp", cdr.CdrValue.FileTimeStamp.ToString());
>     mapOperation.SetRegister("AuditId", cdr.CdrValue.AuditId.ToString());
>     mapOperation.SetRegister("CdrId", cdr.CdrValue.CdrId.ToString());
>     mapOperation.SetRegister("IsBillable", cdr.CdrValue.IsBillable.ToString());
>     mapOperation.SetRegister("SwitchId", cdr.CdrValue.SwitchId.ToString());
>     mapOperation.SetRegister("SwitchDescription",
>         cdr.CdrValue.SwitchDescription.ToString());
>     mapOperation.SetRegister("SequenceNumber", cdr.CdrValue.SequenceNumber.ToString());
>     mapOperation.SetRegister("CallDirection", cdr.CdrValue.CallDirection.ToString());
>     mapOperation.SetRegister("CallTypeId", cdr.CdrValue.CallTypeId.ToString());
>     mapOperation.SetRegister("Partition", cdr.CdrValue.Partition.ToString());
>     mapOperation.SetRegister("CustomerTrunkId", cdr.CdrValue.CustomerTrunkId.ToString());
>     mapOperation.SetRegister("OrigIpAddress", cdr.CdrValue.OrigIpAddress.ToString());
>     mapOperation.SetRegister("OrigPort", cdr.CdrValue.OrigPort.ToString());
>     mapOperation.SetRegister("SupplierTrunkId", cdr.CdrValue.SupplierTrunkId.ToString());
>     mapOperation.SetRegister("TermIpAddress", cdr.CdrValue.TermIpAddress.ToString());
>     mapOperation.SetRegister("TermPort", cdr.CdrValue.TermPort.ToString());
>     mapOperation.SetRegister("Ani", cdr.CdrValue.Ani.ToString());
>     mapOperation.SetRegister("OutpulseNumber", cdr.CdrValue.OutpulseNumber.ToString());
>     mapOperation.SetRegister("SubscriberNumber", cdr.CdrValue.SubscriberNumber.ToString());
>     mapOperation.SetRegister("CallingNoa", cdr.CdrValue.CallingNoa.ToString());
>     mapOperation.SetRegister("DialedNoa", cdr.CdrValue.DialedNoa.ToString());
>     mapOperation.SetRegister("OutpulseNoa", cdr.CdrValue.OutpulseNoa.ToString());
>     mapOperation.SetRegister("TreatmentCode", cdr.CdrValue.TreatmentCode.ToString());
>     mapOperation.SetRegister("CompletionCode", cdr.CdrValue.CompletionCode.ToString());
>     mapOperation.SetRegister("CustomerName", cdr.CdrValue.CustomerName.ToString());
>     mapOperation.SetRegister("CustId", cdr.CdrValue.CustId.ToString());
>     mapOperation.SetRegister("CustContractId", cdr.CdrValue.CustContractId.ToString());
>     mapOperation.SetRegister("CustCountryCode", cdr.CdrValue.CustCountryCode.ToString());
>     mapOperation.SetRegister("CustDuration", cdr.CdrValue.CustDuration.ToString());
>     mapOperation.SetRegister("Price",
>         cdr.CdrValue.Price.ToString());
>     mapOperation.SetRegister("BasePrice", cdr.CdrValue.BasePrice.ToString());
>     mapOperation.SetRegister("BillingDestinationName", cdr.CdrValue.BillingDestinationName.ToString());
>     mapOperation.SetRegister("BillingGroupId", cdr.CdrValue.BillingGroupId.ToString());
>     mapOperation.SetRegister("SupplierName", cdr.CdrValue.SupplierName.ToString());
>     mapOperation.SetRegister("SuppId", cdr.CdrValue.SuppId.ToString());
>     mapOperation.SetRegister("SuppContractId", cdr.CdrValue.SuppContractId.ToString());
>     mapOperation.SetRegister("SuppCountryCode", cdr.CdrValue.SuppCountryCode.ToString());
>     mapOperation.SetRegister("SuppDuration", cdr.CdrValue.SuppDuration.ToString());
>     mapOperation.SetRegister("Cost", cdr.CdrValue.Cost.ToString());
>     mapOperation.SetRegister("BaseCost", cdr.CdrValue.BaseCost.ToString());
>     mapOperation.SetRegister("RoutingDestinationName", cdr.CdrValue.RoutingDestinationName.ToString());
>     mapOperation.SetRegister("RoutingGroupId", cdr.CdrValue.RoutingGroupId.ToString());
>     mapOperation.SetRegister("RouteToCountryCode", cdr.CdrValue.RouteToCountryCode.ToString());
>     mapOperation.SetRegister("Pdd", cdr.CdrValue.Pdd.ToString());
>     mapOperation.SetRegister("RealDuration", cdr.CdrValue.RealDuration.ToString());
>     mapOperation.SetRegister("StartTime", cdr.CdrValue.StartTime.ToString());
>     mapOperation.SetRegister("EndTime", cdr.CdrValue.EndTime.ToString());
>     mapOperation.SetRegister("NumberCalled", cdr.CdrValue.NumberCalled.ToString());
>     mapOperation.SetRegister("CallingLataOcn", cdr.CdrValue.CallingLataOcn.ToString());
>     mapOperation.SetRegister("DialedLataOcn", cdr.CdrValue.DialedLataOcn.ToString());
>     mapOperation.SetRegister("LrnLataOcn", cdr.CdrValue.LrnLataOcn.ToString());
>     mapOperation.SetRegister("CustomerPrefix", cdr.CdrValue.CustomerPrefix.ToString());
>     mapOperation.SetRegister("SupplierPrefix", cdr.CdrValue.SupplierPrefix.ToString());
>     mapOperation.SetRegister("OriginationCountryCode", cdr.CdrValue.OriginationCountryCode.ToString());
>     mapOperation.SetRegister("OriginationCost", cdr.CdrValue.OriginationCost.ToString());
>     mapOperation.SetRegister("FixedPricePerCall", cdr.CdrValue.FixedPricePerCall.ToString());
>     mapOperation.SetRegister("FixedCostPerCall", cdr.CdrValue.FixedCostPerCall.ToString());
>     mapOperation.SetRegister("InvoiceId", cdr.CdrValue.InvoiceId.ToString());
>     mapOperation.SetRegister("BusinessId", cdr.CdrValue.BusinessId.ToString());
>
>     builder.WithMapOperation(mapOperation);
>     return builder;
> }
>
>
> From: Christopher Mancini [mailto:cmanc...@basho.com]
> Sent: Tuesday, October 20, 2015 11:52 AM
> To: Mark Schmidt; Alexander Sicular; Dennis Nicolay
> Cc: riak-users@lists.basho.com
> Subject: Re: Using Bucket Data Types slowed insert performance
>
> Hi Mark / Dennis,
>
> Can you provide the snippet of code that puts a 5 KB record into Riak as a
> map?
>
> Chris
>
> On Tue, Oct 20, 2015 at 11:30 AM Mark Schmidt <mschm...@orcawave.net> wrote:
> Hi folks, sorry for the confusion.
>
> Our scenario is as follows:
>
> We have a 6-node development cluster running on its own network segment, using
> HAProxy to load-balance across the nodes. A single Riak .NET client service
> performs the insert operations from dedicated hardware located within the same
> network segment. Our network has a nominal capacity of 100 Mbit/s, with an
> achievable average of about 75 Mbit/s.
>
> The data we are attempting to insert consists of phone call record receipts
> from telephone carriers. These records are batched and written to a flat file
> for incorporation into our reporting engine.
>
> 1) Our Riak client process takes a flat file (in this case, a 40 MB collection
>    of records, each record approximately 5 KB in size) and parses the entire
>    file so each record can be added to a local .NET queue.
> 2) Once the entire file has been parsed and each record loaded into the local
>    queue, 20 threads are spawned and connections are opened to our Riak nodes
>    via HAProxy.
> 3) Each thread pulls a 5 KB record from the queue on a first-come,
>    first-served basis and performs a put to the Riak environment.
>
> When first testing our client insert process, we were pushing the 5 KB records
> as whole strings into the Riak environment. Network throughput topped out at
> around 80 Mbit/s, with a total load time of 90 seconds for 149k records. When
> the client process was modified (same queuing and dequeuing methods) so that
> records were written to a map data type bucket with each field stored as a
> register, we saw network throughput drop to around 10 Mbit/s and total upload
> time increase to around 270 seconds for the same 149k records.
>
> It appears that we’ve either hit a bottleneck unrelated to network throughput,
> or we’re seeing an expected processing penalty for our use of Riak data types.
> Please note, we’re configuring Zabbix so we can monitor disk I/O on each node,
> as processor and memory resources don’t appear to be the culprit either.
>
> If the reduction in processing speed is a natural consequence of using Riak
> data types, is the inter-node network the best place to add resources? Our
> eventual datacenter implementation will support speeds of over 40 Gbit/s for
> inter-node communication. We’re just trying to identify which operational
> levers we can pull to boost performance, or whether our client implementation
> is suspect.
>
> You bring up some excellent points regarding our use of CRDTs. In our case,
> the call data records are mutable, as they are subject to changes by phone
> carriers for billing error corrections, incorrect data and a host of other
> reasons.
> We may be better served by treating the records as immutable and performing
> wide-scale record removal and “reprocessing” in the event that changes to
> existing records are received or requested.
>
> Thank you,
>
> Mark Schmidt
>
> From: Alexander Sicular [mailto:sicul...@gmail.com]
> Sent: Tuesday, October 20, 2015 10:55 AM
> To: Dennis Nicolay <dnico...@orcawave.net>
> Cc: Christopher Mancini <cmanc...@basho.com>; riak-users@lists.basho.com;
> Mark Schmidt <mschm...@orcawave.net>
> Subject: Re: Using Bucket Data Types slowed insert performance
>
> Let's talk about Riak data types for a moment. Riak data types are
> collectively implementations of what academia refers to as CRDTs (convergent
> or conflict-free replicated data types). The key benefit a CRDT offers over
> a traditional KV store is automatic conflict resolution. The various CRDTs
> provided in Riak have specific conflict resolution strategies. This does not
> come for free: there is a computational cost associated with CRDTs. If your
> use case requires automated conflict resolution strategies, then CRDTs are a
> good fit. Internally, CRDTs rely on vector clocks (see DVVs, dotted version
> vectors, in the documentation) to resolve conflicts.
>
> Considering your ETL use case, I'm going to presume that your data is
> immutable (I could very well be wrong here). If your data is immutable, I
> would consider simply using plain KV and not paying the CRDT computational
> penalty (and possibly even using a write-once bucket). How large a CRDT
> penalty you pay obviously depends on your use case, configuration, hardware
> deployment, etc.
>
> Hope that helps!
> -Alexander
>
>
> @siculars
> http://siculars.posthaven.com
>
> Sent from my iRotaryPhone
>
> On Oct 20, 2015, at 12:39, Dennis Nicolay <dnico...@orcawave.net> wrote:
>
> Hi Alexander,
>
> I’m parsing the file and storing each row under its own key in a map data type
> bucket, and each column is a register.
>
> Thanks,
> Dennis
>
> From: Alexander Sicular [mailto:sicul...@gmail.com]
> Sent: Tuesday, October 20, 2015 10:34 AM
> To: Dennis Nicolay
> Cc: Christopher Mancini; riak-users@lists.basho.com
> Subject: Re: Using Bucket Data Types slowed insert performance
>
> Hi Dennis,
>
> It's a bit unclear what you are trying to do here. Are you (1) uploading the
> entire file and saving it to one key, with the value being the file? Or are
> you (2) parsing the file and storing each row as a register in a single map?
>
> Neither of those approaches is appropriate in Riak KV. For the first case, I
> would point you to Riak S2, which is designed to manage large binary object
> storage. You can keep the large file as a single addressable entity and
> access it via the Amazon S3 or Swift protocol. For the second case, I would
> consider maintaining one key (map) per row in the file, with a register per
> column in the row. Or don't use Riak data types (maps, sets, registers, flags
> and counters) at all, and simply keep each row in the file as a KV object in
> Riak, either as a raw string or as a serialized JSON string. ETL'ing out of
> relational databases and into Riak is a very common use case and is often
> implemented in the fashion I described.
>
> As Chris mentioned, the soft upper bound on value size should be 1 MB. I say
> soft because we won't enforce it, although there are settings in the config
> that can be changed to enforce it (by default, a warning at 5 MB and rejection
> at 50 MB, I believe).
>
> Best,
> Alexander
>
> On Oct 20, 2015, at 10:22, Christopher Mancini <cmanc...@basho.com> wrote:
>
> Hi Dennis,
>
> I am not the most experienced, but what I do know is that a file that size
> causes a great deal of network chatter, because the data has to be handed off
> to the other nodes in the network, and it will cause delays in Riak's ability
> to send and confirm consistency across the ring.
> Typically we recommend that you structure your objects to be around 1 MB or
> less to ensure consistent performance. That maximum object size can vary, of
> course, based on your network/server specs and configuration.
>
> I hope this helps.
>
> Chris
>
> On Tue, Oct 20, 2015 at 8:18 AM Dennis Nicolay <dnico...@orcawave.net> wrote:
> Hi,
>
> I’m using .NET RiakClient 2.0 to insert a 44 MB delimited file with 139k rows
> of data into Riak. I switched to a map bucket data type with registers. It
> is taking about 3 times longer to insert into this bucket than into a bucket
> without a data type. Any suggestions?
>
> Thanks in advance,
> Dennis
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
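Russell's suggestion at the top of this thread (keep a last-write-wins structure in the application, store it as an ordinary value with allow_mult=true, and merge siblings on read) could be sketched in C# roughly as follows. This is a minimal illustration under assumptions, not a Riak client API: each record is assumed to be held as a map from field name to a (value, timestamp) pair, i.e. one last-write-wins register per field, and, as Russell notes, removals are not handled. The names Merge, Wins and ResolveSiblings are hypothetical, and serializing the map to bytes for storage is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: a record is a map of field name -> (value, timestamp),
// i.e. one last-write-wins register per field. Removals are not supported.

// Join two replicas: per field, keep the entry with the larger timestamp.
// Ties are broken by ordinal value comparison so the result is independent
// of merge order (commutative, associative and idempotent).
Dictionary<string, (string Value, long Ts)> Merge(
    Dictionary<string, (string Value, long Ts)> left,
    Dictionary<string, (string Value, long Ts)> right)
{
    var result = new Dictionary<string, (string Value, long Ts)>(left);
    foreach (var (field, entry) in right)
    {
        if (!result.TryGetValue(field, out var current) || Wins(entry, current))
        {
            result[field] = entry;
        }
    }
    return result;
}

// True if `candidate` should replace `current` under last-write-wins.
bool Wins((string Value, long Ts) candidate, (string Value, long Ts) current) =>
    candidate.Ts > current.Ts ||
    (candidate.Ts == current.Ts &&
     string.CompareOrdinal(candidate.Value, current.Value) > 0);

// Fold every sibling returned by a fetch into a single value to write back.
Dictionary<string, (string Value, long Ts)> ResolveSiblings(
    IEnumerable<Dictionary<string, (string Value, long Ts)>> siblings) =>
    siblings.Aggregate(
        new Dictionary<string, (string Value, long Ts)>(),
        (acc, sibling) => Merge(acc, sibling));
```

Because the merge keeps the newest timestamp per field and breaks ties deterministically, siblings can be merged in any order and the result written back as the resolved value. The timestamps have to come from clocks the writers roughly agree on, since clock skew decides which write wins.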
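The load process Mark describes in the thread (parse the whole file into a local queue, then let 20 workers drain it, each putting one record at a time) maps naturally onto a ConcurrentQueue plus a fixed number of tasks. The sketch below is an illustration under assumptions: `put` is a hypothetical delegate standing in for the per-record Riak write, records are plain strings, and LoadAsync is not part of any Riak client.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the load pipeline: everything is parsed into a local
// queue first, then `workerCount` workers drain it concurrently.
// `put` stands in for the per-record write (plain KV put or map update).
async Task<int> LoadAsync(
    IEnumerable<string> records, int workerCount, Func<string, Task> put)
{
    // Steps 1-2: the whole file is parsed and queued before any writes start.
    var queue = new ConcurrentQueue<string>(records);
    int written = 0;

    // Step 3: each worker takes the next record, first come first served,
    // and stops once the queue is empty.
    var workers = Enumerable.Range(0, workerCount).Select(_ => Task.Run(async () =>
    {
        while (queue.TryDequeue(out var record))
        {
            await put(record);
            Interlocked.Increment(ref written);
        }
    }));

    await Task.WhenAll(workers);
    return written;
}
```

Because the per-record write is injected, the same harness could time plain string puts and map updates side by side at the same concurrency, which is the comparison behind the 90 s versus 270 s figures reported in the thread.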
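Alexander's alternative in the thread, keeping each row as a plain KV value serialized as JSON and staying under the roughly 1 MB soft size bound he and Chris mention, could look something like this. It is a sketch under stated assumptions: the header array, the '|' delimiter and the helper name RowToJson are hypothetical, fields are assumed never to contain the delimiter (no quoting or escaping is handled), and it uses System.Text.Json (.NET Core 3.0 or later) rather than anything in the Riak client.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sketch: store one delimited row as a plain KV value (JSON),
// without Riak data types. Assumes fields never contain the delimiter.
const int SoftMaxValueBytes = 1024 * 1024; // ~1 MB soft bound mentioned in the thread

byte[] RowToJson(string[] header, string line, char delimiter)
{
    string[] fields = line.Split(delimiter);
    if (fields.Length != header.Length)
    {
        throw new FormatException(
            $"Expected {header.Length} fields but found {fields.Length}.");
    }

    // Column name -> raw text value, one entry per header column.
    var row = new Dictionary<string, string>(header.Length);
    for (int i = 0; i < header.Length; i++)
    {
        row[header[i]] = fields[i];
    }

    // The UTF-8 JSON bytes are what would be stored as the object's value.
    byte[] json = JsonSerializer.SerializeToUtf8Bytes(row);
    if (json.Length > SoftMaxValueBytes)
    {
        throw new InvalidOperationException(
            $"Serialized row is {json.Length} bytes, above the ~1 MB soft limit.");
    }
    return json;
}
```

The resulting bytes would then be written under the row's key, with a JSON content type, using the client's ordinary put into a bucket without a data type, so no CRDT merge work is done on the server.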