[ANNOUNCE] Welcoming Márton Greber as Kudu committer and PMC member

2023-11-14 Thread Andrew Wong
Hi Kudu community, I'm happy to announce that the Kudu PMC has voted to add Márton Greber as a new committer and PMC member. Some of Márton's contributions include: - Getting Kudu to build and run on Apple silicon - Improving feature parity of the Python client with a number of features -

Welcoming Yuqi Du as Kudu committer and PMC member

2023-06-06 Thread Andrew Wong
Hi Kudu community, I'm happy to announce that the Kudu PMC has voted to add Yuqi Du as a new committer and PMC member. Just some of Yuqi's contributions include: - Designing and implementing automatic partition leader rebalancing - Adding several bug fixes, performance improvements, and tooling

[ANNOUNCE] Welcoming Abhishek Chennaka as Kudu committer and PMC member

2023-02-22 Thread Andrew Wong
Hi Kudu community, I'm happy to announce that the Kudu PMC has voted to add Abhishek Chennaka as a new committer and PMC member. Some of Abhishek's contributions include: - Improving the usability of ksck and backup/restore tooling - Introducing bootstrapping metrics and webserver pages - Adding

Re: Performance problems of using spark SQL to read kudu data!

2020-11-29 Thread Andrew Wong
> spark-submit --master yarn --deploy-mode cluster --name test > --queue bigdata_pro --conf spark.dynamicAllocation.maxExecutors=20 > --executor-cores 1 --executor-memory 8g --driver-memory 8g > --class uc.com.Test hdfs://ns1/user/hue/Test.jar > > Spark and Kudu version: the Spark version is 2.4.0 and the Kudu version is > 1.10.0. > > In addition to increasing the number of hash partitions under each range > partition, is there any way to increase the number of tasks for Spark to read > Kudu data through parameters? > > Thanks! > -- Andrew Wong
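
A rough sketch of the arithmetic behind the question quoted above. It assumes Spark creates one scan task per Kudu tablet, which is why adding hash partitions raises the task count. The partition counts are hypothetical; only the executor settings (maxExecutors=20, executor-cores 1) come from the quoted spark-submit command.

```python
def estimated_scan_tasks(range_partitions, hash_buckets_per_range):
    """Estimate Spark scan tasks, ASSUMING one task per tablet."""
    return range_partitions * hash_buckets_per_range


def max_concurrent_tasks(tasks, max_executors, executor_cores):
    """Tasks that can run at once, capped by the available executor slots."""
    return min(tasks, max_executors * executor_cores)


# Hypothetical table layouts; executor settings taken from the quoted command.
few_tablets = estimated_scan_tasks(range_partitions=4, hash_buckets_per_range=3)
many_tablets = estimated_scan_tasks(range_partitions=4, hash_buckets_per_range=8)

print(few_tablets, max_concurrent_tasks(few_tablets, 20, 1))
print(many_tablets, max_concurrent_tasks(many_tablets, 20, 1))
```

Under that assumption, a table with fewer tablets than executor slots leaves some of the 20 executors idle, while more tablets than slots simply queue.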

Re: [proposal] Kudu operating system requirements changes

2020-11-25 Thread Andrew Wong
ies the required C++14 and higher >> - Introduce new dependencies that require or benefit from C++14 and >> higher >> - Potential performance improvements >> >> If you have any concerns about these changes your feedback would be >> appreciated. If you are in support of these changes a response indicating >> your support is encouraged as well. >> >> Thank you, >> Grant >> > -- Andrew Wong

Re: Better handling disk usage / tablet placement in hybrid cluster

2020-05-21 Thread Andrew Wong
> other suggestions? > > We're on 1.10.0+cdh6.3.3 > > Thanks in advance! > > -mauricio > > ** I know this is obv not ideal and we're addressing it long term, but > it's HW we got from acquisitions and we're on-prem so not as easy as > changing an instance type :) > > > -- Andrew Wong

Re: real-time pipeline with Kudu

2020-05-12 Thread Andrew Wong
Great post, Boris! We're happy to help. Thanks for sharing, and for being an active member of the community :) On Tue, May 12, 2020 at 12:23 PM Boris Tyukin wrote: > Hi guys, > > there are not a lot of real-life experiences with Kudu and I wanted to > share with you my blog post where I

Re: [ANNOUNCE] Welcoming Bankim Bhavsar as Kudu committer and PMC member

2020-04-18 Thread Andrew Wong
Congratulations Bankim! Keep up the great work! On Sat, Apr 18, 2020 at 3:28 PM Adar Dembo wrote: > Hi Kudu community, > > I'm happy to announce that the Kudu PMC has voted to add Bankim > Bhavsar as a new committer and PMC member. > > Bankim has been actively writing Kudu code for the last

Re: Tablet Server with almost 1TB of WALs (and large number of open files)

2020-03-31 Thread Andrew Wong
cations in other components, > like Parquet, which means we have to change (and in some cases rewrite) our > code that uses Parquet or Kafka, since these products are rapidly evolving, > and many times in ways that break compatibility with old versions - in > other words, it's a big

Re: Tablet Server with almost 1TB of WALs (and large number of open files)

2020-03-30 Thread Andrew Wong
indicate the nature of the > background operations performed for those tablets? > > Some of these questions can also be answered via Kudu metrics. There's > the ops_behind_leader tablet-level metric, which can tell you how far > behind a replica may be. Unfortunately I can't find a metric for > average number of WAL segments retained (or a histogram); I thought we > had that, but maybe not. > -- Andrew Wong
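
One way to look at the ops_behind_leader metric mentioned above is to pull it out of a metrics dump. This is a minimal sketch, assuming the dump is a JSON list of entities that each carry a "type", an "id", and a "metrics" list of name/value pairs. The sample document, tablet IDs, and the threshold are hypothetical.

```python
import json

# Hypothetical sample; the entity layout is an assumption about the dump format.
SAMPLE_METRICS = """[
  {"type": "tablet", "id": "tablet-a",
   "metrics": [{"name": "ops_behind_leader", "value": 0}]},
  {"type": "tablet", "id": "tablet-b",
   "metrics": [{"name": "ops_behind_leader", "value": 42},
               {"name": "other_metric", "value": 7}]},
  {"type": "server", "id": "kudu.tabletserver", "metrics": []}
]"""


def ops_behind_leader(metrics_json):
    """Map tablet id -> ops_behind_leader value for every tablet entity."""
    result = {}
    for entity in json.loads(metrics_json):
        if entity.get("type") != "tablet":
            continue
        for metric in entity.get("metrics", []):
            if metric.get("name") == "ops_behind_leader":
                result[entity["id"]] = metric["value"]
    return result


def lagging_tablets(metrics_json, threshold):
    """Return sorted ids of tablets whose replica is more than `threshold` ops behind."""
    lag = ops_behind_leader(metrics_json)
    return sorted(tablet for tablet, value in lag.items() if value > threshold)


print(lagging_tablets(SAMPLE_METRICS, threshold=10))
```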

Re: Partitioning Rules of Thumb

2020-03-09 Thread Andrew Wong
table ends up taking 18Gb after replication (so with 3x >>>> replication it is ~9Gb per tablet if I do not partition), should I aim for >>>> 1Gb tablets (6 tablets before replication) or should I aim for 500Mb >>>> tablets if my cluster capacity allows so (12 tablets before replication)? >>>> confused why they say "at least" not "at most" - does it mean I should >>>> design it so a tablet takes 2Gb or 3Gb in this example? >>>> >>>> Assume that I have tons of CPU cores on a cluster... >>>> Based on my quick test, it seems that queries are faster if I have more >>>> tablets/partitions...In this example, 18 tablets gave me the best timing >>>> but tablet size was around 300-400Mb. But the doc says "at least 1Gb". >>>> >>>> Really confused what the doc is saying, please help >>>> >>>> Boris >>>> >>>> -- Andrew Wong
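
The sizing arithmetic in the quoted question can be written out as a small sketch. The figures (18 GB after replication, 3x replication, 1 GB and 500 MB targets, 18 tablets) come from the question; the function names are illustrative only, and this is not a recommendation for a particular tablet size.

```python
import math


def tablets_for_target_size(total_replicated_gb, replication_factor, target_tablet_gb):
    """Return (unreplicated size in GB, tablets needed to hit the target size)."""
    unreplicated_gb = total_replicated_gb / replication_factor
    tablet_count = math.ceil(unreplicated_gb / target_tablet_gb)
    return unreplicated_gb, tablet_count


def tablet_size_mb(total_replicated_gb, replication_factor, tablet_count):
    """Return the per-tablet size in MB before replication."""
    unreplicated_gb = total_replicated_gb / replication_factor
    return unreplicated_gb * 1024 / tablet_count


# 18 GB after 3x replication is 6 GB of unreplicated data.
print(tablets_for_target_size(18, 3, 1.0))   # 1 GB tablets
print(tablets_for_target_size(18, 3, 0.5))   # 500 MB tablets
print(round(tablet_size_mb(18, 3, 18)))      # size per tablet with 18 tablets
```

With these inputs, 1 GB tablets give 6 tablets before replication and 500 MB tablets give 12, matching the counts in the question; 18 tablets work out to roughly 340 MB each, in line with the 300-400 MB observed.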

Re: [ANNOUNCE] Welcoming Yifan Zhang as Kudu committer and PMC member

2020-01-06 Thread Andrew Wong
from such tablet > servers, which is a valuable building block for tserver > decommissioning. > - Most recently, deduplicating RPCs sent by Kudu masters to tablet servers. > > Please join me in congratulating Yifan! > -- Andrew Wong

Re: [ANNOUNCE] Welcoming Lifu He, Yao Xu, and Yao Zhang as Kudu committers and PMC members

2019-08-26 Thread Andrew Wong
lp operate their very large Kudu deployments. All three > have been instrumental in growing Kudu's presence within China as well > as helping new Chinese users come up to speed with Kudu. > > Please join me in congratulating Lifu, Yao, and Yao! > -- Andrew Wong

Re: is this mean the disk read rate was too slow

2019-07-15 Thread Andrew Wong
1.10k 3.85M 18
>> tbill_code 372.82k 5.99M 27
>> goods_id 426.24k 281.3K 8
>> dates 426.24k 8.5K 8
>> business_id 426.24k 2.7K 8
>> goods_name 426.24k 291.1K 8
>> paid_in_amt 376.93k 1019.6K 24
>> profit 376.93k 1.14M 24
>> total 3.21M 12.55M 125
>>
>> is this mean the disk read rate was too slow ?
>>
>> 2019-07-15
>> --
>> lk_hadoop
>
-- Andrew Wong

Re: [ANNOUNCE] Welcoming Yingchun Lai as a Kudu committer and PMC member

2019-06-06 Thread Andrew Wong
Well done Yingchun and congratulations! Keep up the good work! :) Andrew On Wed, Jun 5, 2019 at 11:25 AM Todd Lipcon wrote: > Hi Kudu community, > > I'm happy to announce that the Kudu PMC has voted to add Yingchun Lai as a > new committer and PMC member. > > Yingchun has been contributing to

[ANNOUNCE] Apache Kudu 1.9.0 Released

2019-03-12 Thread Andrew Wong
The Apache Kudu team is happy to announce the release of Kudu 1.9.0! Kudu is an open source storage engine for structured data which supports low-latency random access together with efficient analytical access patterns. It is designed within the context of the Apache Hadoop ecosystem and supports

Re: Unable to load consensus metadata (the metadata file is missed and the part of tablets is unavailable)

2018-06-05 Thread Andrew Wong
The root cause of the issue is a bit nuanced and it boils down to the fact that the consensus metadata doesn't always get fsynced, and a hard shut down can thus lead to the posted behavior. This comment

Re: subscriber

2017-12-15 Thread Andrew Wong
To subscribe to the user list, send a message on over to user-subscr...@kudu.apache.org! On Thu, Dec 14, 2017 at 10:54 PM, zha...@broadtech.com.cn < zha...@broadtech.com.cn> wrote: > subscriber > > -- > zha...@broadtech.com.cn > -- Andrew Wong

Re: Data inconsistency after restart

2017-12-05 Thread Andrew Wong
omesuch). > Following up on what I posted, take a look at https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans. It definitely seems possible that not all of the rows had finished inserting when they were counted, or that the scans were sent to a stale replica. On Tue, Dec 5, 2017

Re: RPC negotiation error

2017-12-04 Thread Andrew Wong
egotiator.sendHello( > Negotiator.java:178) > at org.apache.kudu.client.TabletClient.channelConnected( > TabletClient.java:586) > at org.apache.kudu.client.shaded.org.jboss.netty.channel. > SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler. > java:100) > at org.apache.kudu.client.TabletClient.handleUpstream( > TabletClient.java:595) > > Thanks, > Zhen > -- Andrew Wong

Re: Help me understand Kudu scalability limitations

2017-11-29 Thread Andrew Wong
wasted. We can use it > for HDFS of course and Kafka or something else but my concern is why Kudu > cannot use more than 8Tb per node. Is it something that is going to change > in future maybe? > > On Wed, Nov 29, 2017 at 1:06 PM, Andrew Wong <aw...@cloudera.com> wrote: >

Re: co-locating kudu table servers with HDFS data nodes

2017-11-29 Thread Andrew Wong
( couple of G before > replication ) . Does KUDU tries to distribute data across tablet servers > for each table i.e. slow performance with too much sparse data. i.e. for > small table what is better fewer disk partitions ( host-partition ) vs > evenly distributed across worker nodes. > > Thanks, > Sunil Parmar > -- Andrew Wong

Re: Help me understand Kudu scalability limitations

2017-11-29 Thread Andrew Wong
ntioned above I am going to have a bunch of tables with lots of rows. > > I do not have an option to pick a different hardware configuration for our > cluster. > > thanks > -- Andrew Wong

Re: Time-travel reads via SQL query

2017-11-28 Thread Andrew Wong
this for time points more than 5 mins in the past >> you need to increase the "--tablet_history_max_age_sec" flag so that the >> history won't get garbage collected. >> >> HTH >> -david >> >> On Mon, Nov 27, 2017 at 9:42 PM, An
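
The point in the quoted reply about "--tablet_history_max_age_sec" can be illustrated with a small check. This is a simplified sketch, assuming a snapshot is readable as long as its age does not exceed the configured history window; the function name and timestamps are hypothetical, and actual garbage-collection timing on a tablet server may differ.

```python
from datetime import datetime, timedelta, timezone


def snapshot_is_retained(snapshot_time, now, tablet_history_max_age_sec):
    """Return True if the snapshot is no older than the history retention window."""
    age_sec = (now - snapshot_time).total_seconds()
    return 0 <= age_sec <= tablet_history_max_age_sec


now = datetime(2017, 11, 28, 12, 0, 0, tzinfo=timezone.utc)
ten_minutes_ago = now - timedelta(minutes=10)

# With a 5 minute window the 10-minute-old snapshot is out of range;
# raising the window to an hour brings it back in range.
print(snapshot_is_retained(ten_minutes_ago, now, tablet_history_max_age_sec=300))
print(snapshot_is_retained(ten_minutes_ago, now, tablet_history_max_age_sec=3600))
```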

Re: Time-travel reads via SQL query

2017-11-27 Thread Andrew Wong
m/impactradius> | Facebook > <https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn > <https://www.linkedin.com/company/impact-radius-inc-> > -- Andrew Wong

Re: How to control network traffic when ts crash.

2017-09-04 Thread Andrew Wong
crash, data will migration > between ts. Network traffic will full and can not write or write > normally. it’s unacceptable. Is there any good way to control network > traffic when ts crash, and write or read service is not affect. Thanks! > -- Andrew Wong

Re: [KUDU] Adding tablet server data directories

2017-08-14 Thread Andrew Wong
that is > legally privileged and confidential. If you are not the intended recipient, > or the person responsible for delivering the message to the intended > recipient, you are hereby notified that any dissemination, distribution or > copying of this communication is strictly prohibited. All unintended > recipients are obliged to delete this message and destroy any printed > copies. > > > -- Andrew Wong