Hi Kudu community,
I'm happy to announce that the Kudu PMC has voted to add Márton Greber as a
new committer and PMC member.
Some of Márton's contributions include:
- Getting Kudu to build and run on Apple silicon
- Improving feature parity of the Python client with a number of features
-
Hi Kudu community,
I'm happy to announce that the Kudu PMC has voted to add Yuqi Du as a
new committer and PMC member.
Just some of Yuqi's contributions include:
- Designing and implementing automatic partition leader rebalancing
- Adding several bug fixes, performance improvements, and tooling
Hi Kudu community,
I'm happy to announce that the Kudu PMC has voted to add Abhishek Chennaka
as a new committer and PMC member.
Some of Abhishek's contributions include:
- Improving the usability of ksck and backup/restore tooling
- Introducing bootstrapping metrics and webserver pages
- Adding
> spark-submit --master yarn --deploy-mode cluster --name test
> --queue bigdata_pro --conf spark.dynamicAllocation.maxExecutors=20
> --executor-cores 1 --executor-memory 8g --driver-memory 8g
> --class uc.com.Test hdfs://ns1/user/hue/Test.jar
>
>Spark and Kudu versions: the Spark version is 2.4.0 and the Kudu version is
> 1.10.0.
>
> Besides increasing the number of hash partitions under each range
> partition, is there any parameter that increases the number of tasks Spark
> uses to read Kudu data?
>
>Thanks!
>
--
Andrew Wong
ies the required C++14 and higher
>> - Introduce new dependencies that require or benefit from C++14 and
>> higher
>> - Potential performance improvements
>>
>> If you have any concerns about these changes your feedback would be
>> appreciated. If you are in support of these changes a response indicating
>> your support is encouraged as well.
>>
>> Thank you,
>> Grant
>>
>
--
Andrew Wong
> other suggestions?
>
> We're on 1.10.0+cdh6.3.3
>
> Thanks in advance!
>
> -mauricio
>
> ** I know this is obv not ideal and we're addressing it long term, but
> it's HW we got from acquisitions and we're on-prem so not as easy as
> changing an instance type :)
>
>
>
--
Andrew Wong
Great post, Boris! We're happy to help. Thanks for sharing, and for being
an active member of the community :)
On Tue, May 12, 2020 at 12:23 PM Boris Tyukin wrote:
> Hi guys,
>
> there are not a lot of real-life experiences with Kudu and I wanted to
> share with you my blog post where I
Congratulations Bankim! Keep up the great work
On Sat, Apr 18, 2020 at 3:28 PM Adar Dembo wrote:
> Hi Kudu community,
>
> I'm happy to announce that the Kudu PMC has voted to add Bankim
> Bhavsar as a new committer and PMC member.
>
> Bankim has been actively writing Kudu code for the last
cations in other components,
> like Parquet, which means we have to change (and in some cases rewrite) our
> code that uses Parquet or Kafka, since these products are rapidly evolving,
> and many times in ways that break compatibility with old versions - in
> other words, it's a big
indicate the nature of the
> background operations performed for those tablets?
>
> Some of these questions can also be answered via Kudu metrics. There's
> the ops_behind_leader tablet-level metric, which can tell you how far
> behind a replica may be. Unfortunately I can't find a metric for
> average number of WAL segments retained (or a histogram); I thought we
> had that, but maybe not.
>
--
Andrew Wong
table ends up taking 18Gb after replication (so with 3x
>>>> replication it is ~9Gb per tablet if I do not partition), should I aim for
>>>> 1Gb tablets (6 tablets before replication) or should I aim for 500Mb
>>>> tablets if my cluster capacity allows so (12 tablets before replication)?
>>>> confused why they say "at least" not "at most" - does it mean I should
>>>> design it so a tablet takes 2Gb or 3Gb in this example?
>>>>
>>>> Assume that I have tons of CPU cores on a cluster...
>>>> Based on my quick test, it seems that queries are faster if I have more
>>>> tablets/partitions...In this example, 18 tablets gave me the best timing
>>>> but tablet size was around 300-400Mb. But the doc says "at least 1Gb".
>>>>
>>>> Really confused what the doc is saying, please help
>>>>
>>>> Boris
>>>>
>>>>
--
Andrew Wong
from such tablet
> servers, which is a valuable building block for tserver
> decommissioning.
> - Most recently, deduplicating RPCs sent by Kudu masters to tablet servers.
>
> Please join me in congratulating Yifan!
>
--
Andrew Wong
lp operate their very large Kudu deployments. All three
> have been instrumental in growing Kudu's presence within China as well
> as helping new Chinese users come up to speed with Kudu.
>
> Please join me in congratulating Lifu, Yao, and Yao!
>
--
Andrew Wong
1.10k 3.85M 18
>> tbill_code 372.82k 5.99M 27
>> goods_id 426.24k 281.3K 8
>> dates 426.24k 8.5K 8
>> business_id 426.24k 2.7K 8
>> goods_name 426.24k 291.1K 8
>> paid_in_amt 376.93k 1019.6K 24
>> profit 376.93k 1.14M 24
>> total 3.21M 12.55M 125
>>
>> Does this mean the disk read rate was too slow?
>>
>>
>> 2019-07-15
>> --
>> lk_hadoop
>>
>
--
Andrew Wong
Well done Yingchun and congratulations! Keep up the good work! :)
Andrew
On Wed, Jun 5, 2019 at 11:25 AM Todd Lipcon wrote:
> Hi Kudu community,
>
> I'm happy to announce that the Kudu PMC has voted to add Yingchun Lai as a
> new committer and PMC member.
>
> Yingchun has been contributing to
The Apache Kudu team is happy to announce the release of Kudu 1.9.0!
Kudu is an open source storage engine for structured data which supports
low-latency random access together with efficient analytical access
patterns. It is designed within the context of the Apache Hadoop ecosystem
and supports
The root cause of the issue is a bit nuanced and it boils down to the fact
that the consensus metadata doesn't always get fsynced, and a hard shut
down can thus lead to the posted behavior. This comment
To subscribe to the user list, send a message on over to
user-subscr...@kudu.apache.org!
On Thu, Dec 14, 2017 at 10:54 PM, zha...@broadtech.com.cn <
zha...@broadtech.com.cn> wrote:
> subscriber
>
> --
> zha...@broadtech.com.cn
>
--
Andrew Wong
omesuch).
>
Following up on what I posted, take a look at
https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans.
It seems definitely possible that not all of the rows had finished
inserting when counting, or that the scans were sent to a stale replica.
On Tue, Dec 5, 2017
egotiator.sendHello(
> Negotiator.java:178)
> at org.apache.kudu.client.TabletClient.channelConnected(
> TabletClient.java:586)
> at org.apache.kudu.client.shaded.org.jboss.netty.channel.
> SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.
> java:100)
> at org.apache.kudu.client.TabletClient.handleUpstream(
> TabletClient.java:595)
>
> Thanks,
> Zhen
>
--
Andrew Wong
wasted. We can use it
> for HDFS of course and Kafka or something else but my concern is why Kudu
> cannot use more than 8Tb per node. Is it something that is going to change
> in future maybe?
>
> On Wed, Nov 29, 2017 at 1:06 PM, Andrew Wong <aw...@cloudera.com> wrote:
>
>
(a couple of GB before
> replication). Does Kudu try to distribute data across tablet servers
> for each table, i.e. is performance slow when the data is too sparse? For a
> small table, which is better: fewer disk partitions (host-partition) or
> data evenly distributed across worker nodes?
>
> Thanks,
> Sunil Parmar
>
--
Andrew Wong
ntioned above I am going to have a bunch of tables with lots of rows.
>
> I do not have an option to pick a different hardware configuration for our
> cluster.
>
> thanks
>
--
Andrew Wong
this for time points more than 5 mins in the past
>> you need to increase the "--tablet_history_max_age_sec" flag so that the
>> history won't get garbage collected.
>>
>> HTH
>> -david
>>
>> On Mon, Nov 27, 2017 at 9:42 PM, An
m/impactradius> | Facebook
> <https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn
> <https://www.linkedin.com/company/impact-radius-inc->
>
--
Andrew Wong
crash, data will migrate
> between tablet servers. Network traffic will be saturated and we cannot
> write normally. That is unacceptable. Is there a good way to limit network
> traffic when a tablet server crashes, so that read and write service is not affected? Thanks!
>
--
Andrew Wong
that is
> legally privileged and confidential. If you are not the intended recipient,
> or the person responsible for delivering the message to the intended
> recipient, you are hereby notified that any dissemination, distribution or
> copying of this communication is strictly prohibited. All unintended
> recipients are obliged to delete this message and destroy any printed
> copies.
>
>
>
--
Andrew Wong