Re: [ANNOUNCE] Welcoming Márton Greber as Kudu committer and PMC member

2023-11-14 Thread Alexey Serbin
Congratulations, Márton! On Tue, Nov 14, 2023 at 9:37 AM Andrew Wong wrote: > Hi Kudu community, > > I'm happy to announce that the Kudu PMC has voted to add Márton Greber as a > new committer and PMC member. > > Some of Márton's contributions include: > - Getting Kudu to build and run on Apple

Re: Does Kudu connecor has Kerberos auth support?

2023-09-15 Thread Alexey Serbin
Hi Melih, Yes, Kudu clients do support Kerberos authentication. I'm not sure what exactly you referred to as "Kudu connector", but both C++ and Java clients can authenticate to a secure Kudu cluster using Kerberos.

Re: [ANNOUNCE] Welcoming Abhishek Chennaka as Kudu committer and PMC member

2023-02-27 Thread Alexey Serbin
Congrats, Abhishek! I'm happy to know you've accepted the invitation and look forward to contributing to the project. Kind regards, Alexey On Mon, Feb 27, 2023 at 9:18 AM Abhishek Chennaka wrote: > Thank you all a ton for your appreciation. I'll try to keep contributing > more and more. > >

Re: Kudu cluster sizing questions

2021-09-23 Thread Alexey Serbin
Hi Chetan, Thank you for taking a look at Kudu! Apache Kudu is designed to perform well in OLAP workloads. You can scale a Kudu cluster horizontally pretty well, at least up to a few hundred nodes, and here you can find more information on recommended data-per-node sizes, scaling limitations,

Re: Failure to find org.apache.kudu:kudu-binary:jar:linux-aarch_64:1.9.0 in Maven Central

2021-05-11 Thread Alexey Serbin
Hi, Indeed, with the work performed in the context of https://issues.apache.org/jira/browse/KUDU-3007, it has become possible to build and run Kudu on ARM/aarch64 in the 1.13 release. It seems in 1.14 and in the main trunk the ARM/aarch64 build is now broken [1]. I don't think that upgrading the

Re: Cache error running KuduTestHarness

2020-10-12 Thread Alexey Serbin
_mb=475") > > .addTabletServerFlag("--block_cache_capacity_mb=475"); > > } > > > > @Rule > > public KuduTestHarness harness=new KuduTestHarness(builder); > > > > Is there a reason why the javadocs for the test cluster classes are not > availabl

Re: Cache error running KuduTestHarness

2020-10-07 Thread Alexey Serbin
Hi, I haven't looked at the issue with the builder ignoring the settings you added, but as a working example of adding custom flags to Kudu master and tablet servers you can take a look at:

Re: Kudu - Azure Integration Script

2020-09-03 Thread Alexey Serbin
Hi, I'm not aware of such a thing as "Azure integration for Kudu" at this point (what it would entail, BTW?). Maybe, somebody else can chime in if they have some sort of Azure-specific content that they find useful. But as for the scripts to automatically start Kudu servers after rebooting a

Re: setFaultTolerant ordering guarantees

2020-07-25 Thread Alexey Serbin
to have returned tablet rows ordered. Thanks, Alexey On Sat, Jul 25, 2020 at 2:53 PM Alexey Serbin wrote: > Hi Petar, > > Yes, you are right: fault-tolerant scans sort their results in primary key > order (note: within a tablet only; this sort is not global). I'm not sure >

Re: setFaultTolerant ordering guarantees

2020-07-25 Thread Alexey Serbin
Hi Petar, Yes, you are right: fault-tolerant scans sort their results in primary key order (note: within a tablet only; this sort is not global). I'm not sure there are other explicit guarantees exposed in the API in that regard. Kind regards, Alexey On Thu, Jul 23, 2020 at 4:07 PM Petar
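
To illustrate the "within a tablet only" note: if a caller needs a globally ordered result, the per-tablet ordered streams can be combined with a k-way merge on the client side. The sketch below is only an illustration under stated assumptions: plain Python lists stand in for the rows returned by each tablet's fault-tolerant scan, the first tuple element plays the role of the primary key, and `merge_tablet_results` is a hypothetical helper, not part of any Kudu client API.

```python
import heapq

def merge_tablet_results(per_tablet_rows, key=lambda row: row[0]):
    """Merge several lists, each already sorted by primary key, into one sorted list.

    per_tablet_rows: iterable of lists; each list stands in for the rows of one
    tablet (hypothetical stand-in for real scanner output).
    """
    # heapq.merge assumes every input is already sorted by `key`,
    # which is what a fault-tolerant scan provides per tablet.
    return list(heapq.merge(*per_tablet_rows, key=key))

# Two tablets whose key ranges interleave (as can happen with hash partitioning).
tablet_a = [(1, "a"), (4, "d"), (7, "g")]
tablet_b = [(2, "b"), (3, "c"), (9, "i")]

concatenated = tablet_a + tablet_b          # ordered per tablet, not globally
merged = merge_tablet_results([tablet_a, tablet_b])
```

Simply concatenating the per-tablet results keeps each tablet's order but not a global one; the merge restores global primary key order without re-sorting everything.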

Re: Why is it slow to write Kudu with 100+ threads?

2020-06-09 Thread Alexey Serbin
Hi, Thank you for the stats. I guess one crucial point is using the proper flush mode for Kudu sessions. Make sure it's AUTO_FLUSH_BACKGROUND, not AUTO_FLUSH_SYNC. Another important point is the number of RPC workers: by default it's 20, but given that your server has 28 cores (I guess it's 2

Re: Why does partition keys have to be in the primary keys?

2020-05-06 Thread Alexey Serbin
Hi, The restriction on the partitioning key to be composed of primary key columns significantly simplifies the design and implementation. However, I'm not sure I understand why the rules of partitioning come into play here. To me it looks like the main question is about the schema for the table,

Re: Implications/downside of increasing rpc_service_queue_length

2020-05-01 Thread Alexey Serbin
I guess the point about the low-latency requests was that long RPC queues might add extra latency to request handling, and the latency might be unpredictably long. E.g., if the queue is almost full and a new RPC request is added, the request will be dispatched to one of the available service
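
The latency concern can be made concrete with a back-of-the-envelope estimate. This is a rough sketch, not how Kudu computes anything: it assumes every queued request takes about the same service time and that all workers drain the queue evenly. The function name and the numbers are hypothetical.

```python
def estimated_queue_wait_ms(queued_requests, workers, mean_service_ms):
    """Rough wait time for a request that enters behind `queued_requests` others.

    Assumes `workers` service threads drain the queue evenly and each request
    takes about `mean_service_ms` milliseconds (both are simplifications).
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    return queued_requests * mean_service_ms / workers

# With a short queue the added latency stays small ...
short_queue_wait = estimated_queue_wait_ms(queued_requests=50, workers=20, mean_service_ms=10)
# ... while a much longer queue lets the wait grow proportionally.
long_queue_wait = estimated_queue_wait_ms(queued_requests=1000, workers=20, mean_service_ms=10)
```

Under these assumptions the wait grows linearly with the number of requests already queued, which is why a longer queue trades fewer rejected requests for less predictable latency.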

Re: [ANNOUNCE] Welcoming Bankim Bhavsar as Kudu committer and PMC member

2020-04-21 Thread Alexey Serbin
Congratulations Bankim! Great to see these valuable contributions, keep it up! Best regards, Alexey On Sat, Apr 18, 2020 at 11:04 PM Hao Hao wrote: > Congrats Bankim! Well deserved! > > Best, > Hao > > On Sat, Apr 18, 2020 at 5:45 PM Andrew Wong wrote: > >> Congratulations Bankim! Keep up

Re: some troubles about kudu cluster

2020-04-15 Thread Alexey Serbin
Hi, Those messages from the Kudu Java client say that something is wrong with the specified server (UUID 0178e667f8a8474caace936b7539e746). I would take a look into the kudu-tserver logs at the node where the server was running. /Alexey On Tue, Apr 14, 2020 at 2:19 AM evan <564740...@qq.com>

Re: Will multiple transactions write to Kudu concurrently cause deadlock in Kudu?

2020-04-13 Thread Alexey Serbin
At this point, Kudu doesn't support multi-row transactions, so I'm not sure how deadlock is possible. On Mon, Apr 13, 2020 at 3:29 AM Ray Liu (rayliu) wrote: > Just found this ticket which answers my question. > > https://issues.apache.org/jira/browse/KUDU-47 > > > > I’ll try it out anyways. >

Re: hash and range partition uneven distribution for one tablet server

2020-03-26 Thread Alexey Serbin
Hi, Do you mean you still see uneven distribution of leader replicas? Thanks, Alexey On Thu, Mar 26, 2020 at 7:56 PM Fisk Xia wrote: > Hi, > > Thanks for your time and attention. > > To further elaborate the situation, we are having replication factor = 1. > We have tried running Kudu

Re: Kudu/Spark LIMIT support

2020-01-23 Thread Alexey Serbin
Yes, your observations match what's in the code: Kudu Spark bindings don't support scanner row limits, but Kudu Java, C++ and Python clients do support that. And indeed, https://issues.apache.org/jira/browse/KUDU-16 contains relevant information on the status of this feature, missing As of my

Re: [ANNOUNCE] Welcoming Yifan Zhang as Kudu committer and PMC member

2020-01-07 Thread Alexey Serbin
Congratulations Yifan and keep the great work going! :) /Alexey On Tue, Jan 7, 2020 at 10:59 AM Hao Hao wrote: > Congratulations! > > On Tue, Jan 7, 2020 at 10:02 AM Grant Henke wrote: > >> Congratulations! >> >> On Tue, Jan 7, 2020 at 12:20 AM 赖迎春 wrote: >> >>> Congratulations Yifan! >>>

[ANNOUNCE] Apache Kudu 1.11.1 Released

2019-11-20 Thread Alexey Serbin
The Apache Kudu team is happy to announce the release of Kudu 1.11.1! Kudu is an open source storage engine for structured data which supports low-latency random access together with efficient analytical access patterns. It supports many integrations with other data analytics projects both inside

[ANNOUNCE] Apache Kudu 1.10.1 Released

2019-11-20 Thread Alexey Serbin
The Apache Kudu team is happy to announce the release of Kudu 1.10.1! Kudu is an open source storage engine for structured data which supports low-latency random access together with efficient analytical access patterns. It supports many integrations with other data analytics projects both inside

[ANNOUNCE] Apache Kudu 1.11.0 Released

2019-11-01 Thread Alexey Serbin
The Apache Kudu team is happy to announce the release of Kudu 1.11.0! Kudu is an open source storage engine for structured data which supports low-latency random access together with efficient analytical access patterns. It is designed within the context of the Apache Hadoop ecosystem and

Re: "Too many open files" error

2019-10-10 Thread Alexey Serbin
let server and > migrate existing tablets? > > On Sat, Oct 5, 2019 at 10:05 PM Alexey Serbin > wrote: > >> Hi, >> >> Most likely the issue happened because of high number of tablet replicas >> at the tablet server. In case of high spike of in the input d

Re: [ANNOUNCE] Welcoming Lifu He, Yao Xu, and Yao Zhang as Kudu committers and PMC members

2019-08-29 Thread Alexey Serbin
Congratulations and thank you guys! It's great to see those awesome contributions coming from the new members of the Kudu community. Excellent work, keep it up! /Alexey On Mon, Aug 26, 2019 at 5:06 PM 赖迎春 wrote: > Congratulations! > > Grant Henke 于2019年8月27日 周二05:10写道: > >> Congratulations!

Re: impala with kudu write become very slow

2019-07-19 Thread Alexey Serbin
Hi, It's hard to say what might be the problem without additional information. Could you clarify the following: 1. What was the rate of write operations for the 270M rows you mentioned? Was that regular 50K rows/sec or something else? 2. Do you still observe the slowness or it's

Re: is this mean the disk read rate was too slow

2019-07-15 Thread Alexey Serbin
Hi, What was the expectation for the scan operation's timing w.r.t. the size of the result set? Did you see it was much faster in the past? I would start with making sure the primary key of the table indeed has the columns used in the predicate. Also, if there have been 'trickle inserts' running

Re: Single value range partitions using the Java API

2019-02-20 Thread Alexey Serbin
Hi Nabeelah, If you are looking for hints on how to deduce range partition bounds the Impala-like way just from a single tuple, one starting point I could see is

Re: strange behavior of getPendingErrors

2018-11-17 Thread Alexey Serbin
https://issues.apache.org/jira/browse/KUDU-2625 is the JIRA to track this issue. Feel free to add details, comments, etc. Thanks, Alexey On Sat, Nov 17, 2018 at 7:13 AM Alexey Serbin wrote: > Hey Todd, > > Yes, that behavior is a bit strange especially given the fact that the &

Re: strange behavior of getPendingErrors

2018-11-17 Thread Alexey Serbin
, such an error is a per-row data issue and should only affect > the row with the problem, not some arbitrary subset of rows in the batch > which happened to share a partition. > > Does anyone disagree? > > Todd > > On Fri, Nov 16, 2018, 9:28 PM Alexey Serbin >> Hi Boris, >

Re: Index question

2018-11-01 Thread Alexey Serbin
One more bit which might be relevant in this context: there is a work-in-progress patch https://gerrit.cloudera.org/#/c/10983/ addressing KUDU-1291. That's not secondary indices per se, but that might help in some cases where the prefix component of the primary key is of low cardinality. On Thu,

Re: Install kudu-1.6 in Ubuntu-14.04 via apt-get

2018-09-03 Thread Alexey Serbin
Hi, I think there are Kudu 1.6.0 trusty deb packages at: http://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh/pool/contrib/k/kudu As for your question about installing Kudu without Cloudera Manager, you can always build Kudu from source:

Re: Data inconsistency after restart

2017-12-07 Thread Alexey Serbin
Hi Petter, Before going too deep in attempts to find the place where the data was lost, I just wanted to make sure we definitely know that the data was delivered from the client to the server side. Did you verify that the client didn't report any errors during data ingestion? Most likely you

Re: Unable to access Kudu table created using Spark via Impala

2017-10-13 Thread Alexey Serbin
Hi Nitin, Impala needs to know about Kudu tables which were created 'externally' (i.e. not via Impala). Have you run that 'CREATE EXTERNAL TABLE ...' via Impala shell already? If not, you need to do so. More information on the topic can be found at:

Re: The Error message

2017-09-29 Thread Alexey Serbin
Hi Khursheed, I don't think there is a mistake on your side here, just some packages missing on your machine and an intermittent failure from the GitHub HTTP server. It seems the error from line 5 is about the absence of the 'git' command on your VM. The last error about 'line 1: 404:' looks

Re: impala + kudu

2017-09-26 Thread Alexey Serbin
? Regards khursheed On Sep 26, 2017 7:43 PM, "Alexey Serbin" <aser...@cloudera.com> wrote: Hi Khursheed, It seems the issue is with hostname resolution, at least. You need to have Internet access with DNS resolver properly configure

Re: impala + kudu

2017-09-26 Thread Alexey Serbin
,as issue with the github. Regards Khursheed On Tue, Sep 26, 2017 at 7:31 AM, Alexey Serbin <aser...@cloudera.com> wrote: Hi, What instructions did you use to get quickstart Kudu VM? It's recommended to use instruction at https://kud

Re: impala + kudu

2017-09-25 Thread Alexey Serbin
Hi, What instructions did you use to get the quickstart Kudu VM? It's recommended to use the instructions at https://kudu.apache.org/docs/quickstart.html They are supposed to get you a working VM up and running. At what step did that fail and what was the error message? It might be helpful

Re: Configure Impala for Kudu on Separate Cluster

2017-08-15 Thread Alexey Serbin
Ben, As Todd mentioned, it might be some network connectivity problem. I would suspect some issues with connectivity between the node where the Impala shell is running and the Kudu master node. To start troubleshooting, I would verify that the node where you run the Impala shell (that's

Re: tserver died by clock unsync.

2017-06-16 Thread Alexey Serbin
Hi Jason, I think the workaround you mentioned (i.e. replacing LOG(FATAL) with LOG(WARNING) in the cited code snippet) is not safe at all. If ntp_gettime() returns TIME_ERROR code, that means the 'now_usec' variable might be left uninitialized, and the code relying on the

Re: I got an "authentication token expired" error.

2017-06-14 Thread Alexey Serbin
Hi Jason, It seems your Java Kudu client hit the authn token expiration issue. As you mentioned, that's a well-known issue and it is described in the docs. Just FYI, the Kudu C++ client starting with 1.4.0 automatically re-acquires the authn token when needed, and I hope the Java client will do so

Re: Help start kudu error: Bad status: Invalid argument: Tried to update clock beyond the max. error.

2017-05-02 Thread Alexey Serbin
Hi, It seems the clock among the machines in the cluster is not synchronized as expected. It might be because of NTP configuration issues. There is some information to start troubleshooting with: http://kudu.apache.org/docs/troubleshooting.html#ntp That error might appear during tablet

Re: Security Roadmap

2017-03-18 Thread Alexey Serbin
You can get some information on security-related features in the upcoming Kudu 1.3.0 release at https://github.com/apache/kudu/blob/master/docs/release_notes.adoc#rn_1.3.0_new_features In the long run, there are plans to add fine-grained authorization (ACLs for table/column-level access, ACLs for

Re: [Benchmarking]

2017-03-14 Thread Alexey Serbin
On Tue, Mar 14, 2017 at 11:25 AM, Alexey Serbin <aser...@cloudera.com> wrote: > Hi, > > It seems that sort of benchmark is not a trivial undertaking. I'm sure > there is a lot to consider while doing that sort of benchmark. Probably, > more senior members of the Ku

Re: [Benchmarking]

2017-03-14 Thread Alexey Serbin
Hi, It seems that sort of benchmark is not a trivial undertaking. I'm sure there is a lot to consider while doing that sort of benchmark. Probably, more senior members of the Kudu team could suggest something else, but right away I can suggest the following: 1. Consider using real hardware

Re: What does RowSet Compaction Duration means?

2017-03-14 Thread Alexey Serbin
Hi Jason, As I understand it, that cryptic 'milliseconds / second' unit means 'number of units per sampling (or averaging) interval'. I.e., they capture that metric reading (expressed in milliseconds) every second, subtract the previous value from the current value, and declare the result as the
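
The sampling scheme described above can be sketched as follows. This is a minimal illustration, assuming the metric is a cumulative counter of milliseconds that is read at known timestamps; the function name and the sample values are made up for the example and do not come from Kudu or any monitoring tool.

```python
def per_second_rates(samples):
    """Turn cumulative readings into per-interval rates.

    samples: list of (timestamp_seconds, cumulative_milliseconds) pairs,
    ordered by time. Returns one rate (milliseconds per second) for each
    pair of consecutive samples.
    """
    rates = []
    for (t_prev, v_prev), (t_cur, v_cur) in zip(samples, samples[1:]):
        interval = t_cur - t_prev
        if interval <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Subtract the previous reading from the current one and
        # normalize by the length of the sampling interval.
        rates.append((v_cur - v_prev) / interval)
    return rates

# Hypothetical readings taken once per second.
samples = [(0, 0), (1, 250), (2, 250), (3, 1250)]
rates = per_second_rates(samples)
```

In this made-up series, a rate of 1000 milliseconds / second would mean the measured activity took up the whole sampling interval, and 0 would mean no such activity happened during that second.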