Re: Insert vs Upsert

2020-02-03 Thread Jean-Daniel Cryans
Hi Dmitry, It depends on whether the upsert turns into an insert or an update; it will share the same characteristics as whatever it turns into. So if all your upserts turn into inserts, because none of the rows already exist, then it's just as if you had run a pure insert workload. Hope this helps,
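To make the point above concrete, here is a toy Python model. It is only a sketch: a plain dict stands in for a table, this is not the Kudu client API, and the names `upsert`, `table`, and `stats` are made up for illustration. It shows how each upsert resolves to either an insert or an update depending on whether the key already exists.

```python
from collections import Counter

def upsert(table, key, row, stats):
    """Apply one upsert to a dict-backed toy table and record what it resolved to."""
    if key in table:
        stats["update"] += 1   # the row already exists: behaves like an update
    else:
        stats["insert"] += 1   # the row is new: behaves like an insert
    table[key] = row

table = {}
stats = Counter()

# None of these keys exist yet, so every upsert resolves to an insert.
for key in (1, 2, 3):
    upsert(table, key, {"v": key}, stats)

# Key 2 exists now, so this upsert resolves to an update.
upsert(table, 2, {"v": 20}, stats)
```

In this model a workload where every key is new never takes the update branch, which mirrors the statement that such a workload looks like a pure insert workload.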

Re: Changing number of Kudu worker threads

2019-02-14 Thread Jean-Daniel Cryans
d a way to reuse client instance in NiFi while > still keeping native concurrency benefits of NiFi and our performance > improved by 10 times at least! > > Thanks for your help and ideas! It is quite a relief for us! > Boris > > On Thu, Feb 14, 2019 at 11:51 AM Jean-Daniel Cr

Re: Changing number of Kudu worker threads

2019-02-13 Thread Jean-Daniel Cryans
Some comments on the original problem: "we need to process 1000s of operations per second and noticed that our Kudu 1.5 cluster was only using 10 threads while our application spins up 50 clients/threads" I wouldn't directly infer that 20 threads won't be enough to match your needs. The time it

Re: Spark KuduContext timeout settings

2018-05-24 Thread Jean-Daniel Cryans
Hi Vladimir, As you saw there's no way to do this, although it shouldn't be too hard to add. I definitely see value in it. Are you facing timeouts writing or reading? Maybe there's something that can be optimized in your use case so that you don't even run a chance of getting timeouts. Thanks,

Re: first and second run 2x query time difference

2018-01-03 Thread Jean-Daniel Cryans
<bo...@boristyukin.com> wrote: > it is possible but I thought Kudu keeps its stuff in its own folders > > On Wed, Jan 3, 2018 at 1:45 PM, Jean-Daniel Cryans <jdcry...@apache.org> > wrote: > >> Hey Boris, >> >> Thanks for reporting back with results! >>

Re: first and second run 2x query time difference

2018-01-03 Thread Jean-Daniel Cryans
r help, J-D > > On Sat, Dec 16, 2017 at 4:05 PM, Jean-Daniel Cryans <jdcry...@apache.org> > wrote: > >> I'm more thinking in terms of the startup IO having some impact on the >> co-located services, but we really need to know what "went down" means. >>

Re: [ANNOUNCE] New committers over past several months

2017-12-19 Thread Jean-Daniel Cryans
One (or three) of us! One of us! Congrats to all, J-D On Mon, Dec 18, 2017 at 9:00 PM, Todd Lipcon wrote: > Hi Kudu community, > > I'm pleased to announce that the Kudu PMC has voted to add Andrew Wong, > Grant Henke, and Hao Hao as Kudu committers and PMC members. This >

Re: first and second run 2x query time difference

2017-12-16 Thread Jean-Daniel Cryans
ither one. I'll get with > him on Monday to gather more details > > On Sat, Dec 16, 2017 at 3:28 PM, Jean-Daniel Cryans <jdcry...@apache.org> > wrote: > >> Hi Boris, >> >> How exactly did HDFS and ZK go down? A Kudu restart is fairly >> IO-intensive but

Re: first and second run 2x query time difference

2017-12-16 Thread Jean-Daniel Cryans
often than >> that! :)) >> >> I will report back with our results. So far I am really impressed with >> Kudu - we have been benchmarking ingest and egress throughput and our >> typical queries runtime. The biggest pain so far is lack of support for >> decimals >

Re: first and second run 2x query time difference

2017-12-13 Thread Jean-Daniel Cryans
evious query and this time I do not see > any difference in query time before the first and second time - I guess > this confirms your statement about " first time ever scanning the table > since a Kudu restart" and collecting metadata. > Maybe, I've been known to be right once or tw

Re: first and second run 2x query time difference

2017-12-13 Thread Jean-Daniel Cryans
Hi Boris, Given that we don't have much data we can use here, I'll have to extrapolate. As an aside though, this is yet another example where we need more Kudu-side metrics in the query profile. So, Kudu lazily loads a bunch of metadata and that can really affect scan times. If this was your

Re: kudu tablet change_config add_replica exec slowly

2017-09-21 Thread Jean-Daniel Cryans
Hi Lee, There were a lot of improvements recently to copy time, if you have to run against master then things should speed up on your next refresh. But, TBH, it still won't be perfect as this isn't done in a streaming fashion. The source tablet server sends one block, the destination writes it,

Re: what is your typical size of tablet.

2017-08-30 Thread Jean-Daniel Cryans
Hi Denis, I don't directly manage Kudu clusters but what I've seen ranges anywhere from 0 bytes to 100GB. I wouldn't recommend going much higher than this because re-replicating 100GB takes a _long_ time, although it should be a little better in upcoming 1.5.0 thanks to Hao's work. Sweet spot is

Re: Re: KUDU INSERT SLOWLY

2017-08-23 Thread Jean-Daniel Cryans
(putting dev@ in bcc again, please be mindful of which mailing list you write to) The best way would be to follow the "Batch insert" method described here: http://kudu.apache.org/docs/kudu_impala_integration.html#kudu_impala_insert_bulk J-D On Wed, Aug 23, 2017 at 6:30 PM, sky

Re: [kudu] import from hdfs

2017-08-16 Thread Jean-Daniel Cryans
or the person responsible for delivering the message to the intended > recipient, you are hereby notified that any dissemination, distribution or > copying of this communication is strictly prohibited. All unintended > recipients are obliged to delete this message and destroy any pr

Re: [ANNOUNCE] Apache Kudu 1.4.0 released

2017-06-19 Thread Jean-Daniel Cryans
es-in-multiple-formats > > > > 2017-06-16 17:43 GMT+03:00 Jean-Daniel Cryans <jdcry...@apache.org>: > >> Hi Pavel, >> >> Cloudera stopped releasing binaries in lockstep with Apache Kudu >> releases. Right now the Kudu project doesn't publish binaries, on

Re: Question about kerberos integration

2017-05-03 Thread Jean-Daniel Cryans
Hi, You can find the documentation here: http://kudu.apache.org/docs/security.html It could use more information for the client side of things, but basically there's nothing you need to do (apart from being kinit'd) to get authentication and wire encryption working. I'd also recommend reading

Re: Help with Kudu Kerberos Integration

2017-04-04 Thread Jean-Daniel Cryans
Hi Juan, The documentation is a bit late coming, here's a draft: https://gerrit.cloudera.org/#/c/6479/ Cheers, J-D On Tue, Apr 4, 2017 at 5:14 AM, Juan Pablo Briganti < juan.briga...@globant.com> wrote: > Hi Kudu Team, > > First of all thank you for providing kerberos security feature in Kudu

Re: How to reuse tablet server UUID, or removing old one

2017-03-09 Thread Jean-Daniel Cryans
Hi Alexandre, Tablet replicas are not tied to a UUID, so removing or reusing one wouldn't achieve what you want. The main thing missing here is that Kudu doesn't do tablet re-balancing at runtime, so tabletserver5 will get tablets the next time a node dies or if you create new tables. Obviously

Re: Impala-KUDU debian 8 support

2016-12-04 Thread Jean-Daniel Cryans
It might be supported at some point but I don't know of any timeline. J-D On Wed, Nov 30, 2016 at 4:57 AM, Ladislav Gabčo wrote: > Hi, > > > > I have managed to deploy KUDU to cluster (on debian jessie), but was > unable to install Impala-KUDU (using CM). > > I have

Re: Kudu 1.0.0 Tablet Server not Starting After Replacing Failed Drive

2016-11-02 Thread Jean-Daniel Cryans
Hi Trey, Kudu currently requires removing all the Kudu data folders on a machine when one disk fails. This is because Kudu effectively does striping over all the data disks. Assuming you're not running with replication=1, your data should already be re-replicated on your other nodes. Hope this
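As a rough illustration of why striping makes a single disk failure affect the whole machine, here is a toy Python sketch. It assumes a simple round-robin placement of blocks across disks, which is an assumption made for the example and not a description of Kudu's actual block placement; the function names are hypothetical.

```python
def place_blocks(num_tablets, blocks_per_tablet, num_disks):
    """Assign each tablet's blocks to disks round-robin (assumed placement policy)."""
    placement = {}
    next_disk = 0
    for tablet in range(num_tablets):
        disks = set()
        for _ in range(blocks_per_tablet):
            disks.add(next_disk)
            next_disk = (next_disk + 1) % num_disks
        placement[tablet] = disks
    return placement

def tablets_hit_by_failure(placement, failed_disk):
    """Return the tablets that have at least one block on the failed disk."""
    return sorted(t for t, disks in placement.items() if failed_disk in disks)

placement = place_blocks(num_tablets=4, blocks_per_tablet=8, num_disks=3)
affected = tablets_hit_by_failure(placement, failed_disk=1)
```

With more blocks per tablet than disks, every tablet in this model ends up with data on every disk, so losing any one disk touches all of them.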

Re: Error while import data from impala to kudu using sql "insert into ... select ..."

2016-10-12 Thread Jean-Daniel Cryans
Hi, There have been some reports (https://issues.cloudera.org/browse/IMPALA-3991) that the latest Impala Kudu is slower, yes. Could you try the one that was released before that? Regarding your log, I'd investigate what's going on in 192.168.110.9's tserver WARNING log around the time of that query. J-D On

Re: [ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-21 Thread Jean-Daniel Cryans
(with my vendor hat on) From a Cloudera perspective, support for Kudu is still in beta. We offer the bits with no guarantees. If you have more questions regarding parcels, CM, etc, please direct them to http://community.cloudera.com/t5/Beta-Releases-Apache-Kudu/bd-p/Beta Thanks! J-D On Wed,

Re: Create encoded columns in kudu

2016-09-21 Thread Jean-Daniel Cryans
Hi Amit, There's this jira on the Impala side: https://issues.cloudera.org/browse/IMPALA-3726 I don't know exactly when it'll be available, but I think it's being looked at. Dan Burkert also has a Rust shell for Kudu somewhere, I'll let him comment about it. J-D On Wed, Sep 21, 2016 at 5:36