believe uses epoll(2) under the hood. There's one other place where we
> use ppoll() (in RPC negotiation), but no select().
>
> A bit of historical curiosity: we actually had this bug a few years back
and fixed it, see 82cf3724077a8fb639a44dd86f04d10ecbedabf4
--
Todd Lipcon
Software Engineer, Cloudera
gn.html are
>>> missing SQL examples.
>>>
>>> I can not find the exact SQL syntax for partition management.
>>>
>>> can this be added?
>>>
>>> Thanks in advance.
process is along the lines of:
>
> 1) copy software to target machine
>
> 2) shut down services on machine
>
> 3) expand software to final location
>
> 4) reboot (if new kernel)
>
> 5) restart services.
>
OK, hopefully that happens quickly us
all nodes an appropriate way
>> to bring spark in a position to work with kudu?
>> What about the beeline-shell from hive and the possibility to read from
>> kudu?
>>
>> My Environment: Cloudera 5.7 with kudu and impala-kudu from installed
>> parcels. Build a working python-kudu library successfully from scratch (git)
>>
>> Thanks a lot!
>> Frank
if their inserts start failing with "data out of range for
int32" errors or whatever. Forcing people to evaluate the column sizes up
front avoids nasty surprises later.
But, maybe you can see my biases towards static-typed languages leaking
through here ;-)
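The "fail loudly at write time" behavior described above can be illustrated with a small sketch. This is plain Python, not the Kudu API; the function name is made up for illustration:

```python
# Hypothetical sketch: a column declared as int32 rejects out-of-range
# values at write time instead of silently truncating or widening them.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def check_int32(value: int) -> int:
    """Raise ValueError if value does not fit in a signed 32-bit column."""
    if not (INT32_MIN <= value <= INT32_MAX):
        raise ValueError(f"data out of range for int32: {value}")
    return value

check_int32(2**31 - 1)   # largest value that fits
try:
    check_int32(2**31)   # one too large: rejected immediately
except ValueError as e:
    print(e)
```

The early failure is the point: the surprise happens at insert time, not months later when downstream consumers hit widened data.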
-Todd
> 2017-02-14 19:44 GMT+01:0
ne
>
> Can you please help us on this, if you have any idea about this issue or
> any impact of this error on the functionality.
>
> Thank you so much for your help.
>
> Thanks,
> Amit
>
> On Tue, Jan 3, 2017 at 8:14 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
.table_name("test_table")
> .schema()
> .add_hash_partitions({"key"}, 2)
> .set_range_partition_columns({"time"})
> .num_replicas(1)
> .Create()
>
> I later try to add a partition:
>
> auto timesplit(KuduSchema & schema, std::int64_t t) {
> auto split = schema.NewRow();
> check_ok(split->SetInt64("time", t));
> return split;
> }
>
> alterer->AddRangePartition(
> timesplit(schema, date_start),
> timesplit(schema, next_date_start));
>
> check_ok(alterer->Alter());
>
> But I get an error "Invalid argument: New range partition conflicts with
> existing range partition".
>
> How are hash and range partitioning intended to be mixed?
ou parallelism on the client side.
-Todd
> Thanks,
> Amit
>
> On Aug 31, 2016 10:36 PM, "Todd Lipcon" <t...@cloudera.com> wrote:
>
>> Hi Amit,
>>
>> That's correct, there is no "order by" support in the Java API, because
>> this i
others?
>
> With your innovative design, these tests should show some good numbers.
>
>
>
>Regards,
>
>Roberta Marton
let's aim to
finish up the happy hour by around that time.
If you can't find us, feel free to ping me via Slack (
https://getkudu-slack.herokuapp.com/ if you don't already have an account)
Thanks
-Todd
On Tue, Sep 20, 2016 at 10:28 AM, Todd Lipcon <t...@cloudera.com> wrote:
> Sounds
and whoever's around
can drop by and put some faces to names.
Let me know if you're interested - if not enough people are around, I'll
can the idea, but if it seems there are at least a few people in town it
might be fun.
-Todd
The Apache Kudu team is happy to announce the release of Kudu 1.0.0!
Kudu is an open source storage engine for structured data which supports
low-latency random access together with efficient analytical access
patterns. It is designed within the context of the Apache Hadoop ecosystem
and supports
crowded
during the conference.
-Todd
On Sat, Sep 17, 2016 at 7:12 PM, Clifford Resnick <cresn...@mediamath.com>
wrote:
> +1. We're just starting with Kudu, but it would be nice to meet other
> users, and a casual Q & A would be great if you're up for it!
>
> On Sep 17, 2016 9
predicates on your data
frames. (though I haven't personally verified it)
-Todd
> On Sep 20, 2016, at 12:11 AM, Todd Lipcon <t...@apache.org> wrote:
>
> The Apache Kudu team is happy to announce the release of Kudu 1.0.0!
>
> Kudu is an open source storage engine for struc
Hrm, looks like there may not be sufficient interest after all (or too many
drinks available at the conference itself?)
Unless someone texts /slacks me I'll plan to stick around here at the
conference.
Todd
On Sep 28, 2016 6:00 PM, "Todd Lipcon" <t...@cloudera.com> wrote:
> I have a table with 16 buckets over 3 physical machines. The tablet
> only
> >>> has
> >>> one replica.
> >>>
> >>>
> >>> Tablets Web UI shows that each tablet has around ~4.5G on-disk size.
> >>>
> >>> In one machine, there are total 8 tablets, so the on-disk size is
> about
> >>> 4.5*8 = 36G.
> >>>
> >>> however, in the same machine, the disk actually used is about 211G.
> >>>
> >>>
> >>> # du -sh /data/kudu/tserver/data/
> >>>
> >>> 210G /data/kudu/tserver/data/
> >>>
> >>>
> >>> # find /data/kudu/tserver/data/ -name "*.data" | wc -l
> >>>
> >>> 8133
> >>>
> >>>
> >>>
> >>> What’s the difference between data file and on-disk size.
> >>>
> >>> Can files in /data/kudu/tserver/data/ be compacted, purged, or some of
> >>> them
> >>> be deleted?
> >>>
> >>>
> >>> Thanks very much.
> >>>
> >>>
> >>> BR
> >>>
> >>> Brooks
> >>>
> >>>
> >>>
o <darren@gmail.com> wrote:
>
>> kudu master seldom crashes, but starting with yesterday, one of our
>> two kudu masters crashes very often
>>
>> Can anyone help to see what's going on?
>>
>> you can obtain the core file here : http://167.88.124.211:8000/c
>> ore.22459.xz
u use Impala - this
should help a lot with joins where one side of the join has selective
predicates on a large table.
-Todd
>
> On Oct 10, 2016, at 4:15 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
> Hey Ben,
>
> Yea, we currently don't do great with very wide tables. For e
r communication?
>
Should not have any negative effect. There's no "conversion" or anything to
worry about.
>
> Do you have any reference or details that what could be the precaution
> that needs to be taken or how can we do it?
>
It ought to "just work"
>
> On Nov 30, 2016, at 4:29 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
> On Wed, Nov 30, 2016 at 6:26 AM, Weber, Richard <riwe...@akamai.com> w
> rote:
>
>> Hi All,
>>
>> I'm trying to figure out the right/best/easiest way to find out how much
ce from the container.
> I will try it later this month.
>
> By the way, when will kudu's next release come out? Will 1.2 release in
> mid-January include this fix?
>
> Thanks.
> BR
> -GU
>
>
> ------ Original Message --
> *From:* "Todd Lipcon&
org/jira/browse/KUDU-1603 a while back. Hopefully he
will chime in with a better answer than I can give :)
-Todd
2016-12-13 16:05 GMT+01:00 Frank Heimerzheim <fh.or...@gmail.com>:
>
>> Hello Todd,
>>
>> thanks a lot for the clarification.
>>
>> Greetings
egate as you prefer. Unfortunately this would give you only the
physical size and not the logical, since you'd have to scan the actual data
to know its uncompressed sizes.
If you have any interest in helping to build such a tool I'd be happy to
point you in the right direction. Otherwise let's file
re being too conservative?
Thanks
-Todd
es
> depending on the workload. Anything else is untested AFAIK.
>
I would amend this and say that SSD for the WAL is nice to have, but not a
requirement. We do lots of testing on non-SSD test clusters and I'm aware
of many production clusters which also do not have SSD.
-Todd
ssibly better performance. The tradeoff may be
non-linear, though (i.e. doubling MM threads won't double performance!)
As Kudu is still a young project, we're still gathering operational
experience from users around topics like this. It would be great if you can
share back any results you find with the community.
Thanks
-Todd
The Apache Kudu team is happy to announce the release of Kudu 1.3.0.
Kudu is an open source storage engine for structured data which supports
low-latency random access together with efficient analytical access
patterns. It is designed within the context of the Apache Hadoop ecosystem
and supports
...@gmail.com>
>>> wrote:
>>>
>>>> Hi.
>>>>
>>>> I'm using Apache Kudu 1.2 on CDH 5.10.
>>>>
>>>> Currently, I'm doing a performance test of Kudu.
>>>>
>>>> Flushing OS Page Cache is easy, but I
; Then I'll try in my spare time.
>
> 2017-04-11 7:46 GMT+09:00 Todd Lipcon <t...@cloudera.com>:
>
>> On Sun, Apr 9, 2017 at 6:38 PM, Jason Heo <jason.heo@gmail.com>
>> wrote:
>>
>>> Hi Todd.
>>>
>>> I hope you had a good week
sure there aren't additional problems in the cluster (admin
>> guide
>> on the ksck tool
>> <https://github.com/apache/kudu/blob/master/docs/administration.adoc#ksck>
>> ).
>>
>>
>>> Q3. `--follower_unavailable_considered_failed_sec` can be changed
>>> without restarting cluster?
>>>
>>
>> The flag can be changed, but it comes with the same caveats as above:
>>
>> 'kudu tserver set-flag
>> follower_unavailable_considered_failed_sec
>> 900 --force'
>>
>>
>> - Dan
The Apache Kudu team is happy to announce the release of Kudu 1.3.1.
Kudu is an open source storage engine for structured data which supports
low-latency random access together with efficient analytical access
patterns. It is designed within the context of the Apache Hadoop ecosystem
and supports
Red Hat 4.8.5-11)
> Copyright (C) 2015 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
>
> Thanks,
>
> Jason.
>
t familiar with the
> contributing process <https://kudu.apache.org/docs/contributing.html>.
>
> Thanks,
>
> Jason
>
> 2017-04-11 12:55 GMT+09:00 Todd Lipcon <t...@cloudera.com>:
>
>> Sure. Here's a high-level overview of the approach:
>>
>>> How can I tell Kudu to completely remove the dead tabletserver5 UUID and
>>> populate the new tabletserver5 UUID instead ?
>>>
>>> the `kudu` command line tool does not seem to allow to delete a tablet
>>> server UUID, or decommission
>>> so how ?
>>>
>>> Or other way, how can i recreate an empty Kudu tablet server reusing my
>>> old UUID ?
rily table to evict
>> cached block of testing table.
>>
>> It is cumbersome, so I'd like to know is there a command for flushing
>> block caches (or another kudu's caches which I don't know yet)
>>
>> Thanks.
>>
>> Regards,
>> Jason
y
> libk5crypto.so.3 => /usr/path/to/lib/libk5crypto.so.3 (0x7f4f17b23000)
>
> Thanks,
>
> Jason.
>
> 2017-04-18 4:00 GMT+09:00 Todd Lipcon <t...@cloudera.com>:
>
>> Hi Jason,
>>
>> This is interesting. It seems like for some reason your libkrb5.so isn't
o get it in more detail?
>>>>
>>>> I tried what I did again and again to reproduce same error, but it
>>>> didn't happen again.
>>>>
>>>> Please feel free to ask me for anything what you need to resolve.
>>>>
>>>> Regards,
>>>>
>>>> Jason
>>>>
>>>> 2017-04-23 1:56 GMT+09:00 <davidral...@gmail.com>:
>>>>
>>>>> Hi Jason
>>>>>
>>>>> Anything else of interest in those logs? Can you share them (with
>>>>> just me, if you prefer)? Would it be possible to also get the WAL with
>>>>> the corrupted entry?
>>>>> Did this happen on a single server?
>>>>>
>>>>> Best
>>>>> David
erableException: [Peer
> master-prod-dc1-datanode151.pdc1i.gradientx.com:7051] Connection closed,
> [33361ms] trace too long, truncated)
> CAUSED BY: NoLeaderFoundException: Master config (
> prod-dc1-datanode151.pdc1i.gradientx.com:7051) has no leader. Exceptions
> received: or
with setting up a shared memory
region and also found another small speedup over the domain socket.
However, there was a lot of complexity involved in this code (particularly
the shared memory approach) relative to the gain that we saw, so we didn't
end up merging it before his internship ended :)
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
y compacted cleanup is
>> more unlikely)
>> In Kudu 1.3 we added a background task to clean up old data even in
>> the absence of compactions. Could you upgrade?
>>
>> Best
>> David
(RDD.scala:920)
>> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkC
>> ontext.scala:1869)
>> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkC
>> ontext.scala:1869)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
e-getbyinetaddress-takes-way-too-long confirmation so far). See
>> profiler screenshot http://pasteboard.co/8uHil3I5H.png (kudu-client
>> v1.3.1), every call take 53 ms (!) on average.
>> Also, could you recheck the logic: why is this function called 88 times in 12
>> seconds
rease input throughput then should i increase
>> '--rpc_num_service_threads' right?
>>
>> 3. Why '--rpc_num_acceptors_per_address' has so small value compared
>> to --rpc_num_service_threads? Because I'm going to increase that value
>> too, do you think this is a bad idea? if so can you plz describe
>> reason?
>>
>> Thanks for replying me!
>>
>> Have a nice day~ :)
atrato.io/
> blog/2017/05/28/apex-kudu-output/ . Please use the comments section to
> provide any feedback.
>
> Regards,
> Ananth
suggest plan 1, plus also put it on several people's calendars
to verify :) Alternatively, something like in 2017 add the partitions for
2018 and 2019, so you always maintain one extra year ahead and you are less
likely to "not notice" if the new one is not created in time.
-Todd
The Apache Kudu team is happy to announce the release of Kudu 1.4.0.
Kudu is an open source storage engine for structured data which supports
low-latency random access together with efficient analytical access
patterns. It is designed within the context of the Apache Hadoop ecosystem
and supports
No problem. We are here to help! We are glad to see your team using Kudu.
Todd
On Jun 17, 2017 7:24 PM, "Jason Heo" wrote:
> Hi Jean-Daniel, Todd, and Alexey
>
> Thank you for the replies.
>
> Recently, I've experienced many issues but successfully resolved them with
skew, you'll
have to use the more advanced APIs to retrieve propagated timestamps from
the server side after each write.
-Todd
On Sun, Jun 18, 2017 at 1:36 PM, Todd Lipcon <t...@cloudera.com> wrote:
> Hi Ananth,
>
> Answers inline below
>
> On Sat, Jun 17, 2017 at 1:40 PM
t 15 minutes). You can bump this to a
longer amount of time.
>
> If it is otherwise , does the model hold good after a compaction is
> performed ?
>
>
Yes, as of version 1.2 (I think) the full history is properly retained
regardless of any compactions, etc, subject to the above mentioned history
limit.
-Todd
ed here
> https://kudu.apache.org/docs/schema_design.html#encoding Kudu may
> "transparently fall back to plain encoding" from dictionary encoding.
> I think it would be useful to the user to see actual used encoding &
> compression.
>
> --
> with best regards, Pav
Oops, adding the original poster in case he or she is not subscribed to the
list.
On Sep 19, 2017 10:46 PM, "Todd Lipcon" <t...@cloudera.com> wrote:
> Hi Yuya,
>
> There should be no problem to use the Apache Kudu logo in your conference
> slides, assuming yo
Hi Yuya,
There should be no problem to use the Apache Kudu logo in your conference
slides, assuming you are just using as intended to describe or refer to the
project itself. This is considered "nominative use" under trademark laws.
You can read more about nominative use at:
outside of the actual table data?
>
> --
> Br.
> Janne Keskitalo,
> Database Architect, PAF.COM
> For support: dbdsupp...@paf.com
d kudu version:
> 32 cpu Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz 128G memory 6*16T hdd
> for data and 3T for wal. kudu 1.4.0 5 master + 5 tserver.
> if more interesting things happen, I will reply here.
> thanks again.
e
>>>> primary has up to say an hour before (or something like that).
>>>>
>>>>
>>>> So far we considered a couple of options:
>>>> - refreshing the secondary instance with a full copy of the primary one
>>>> every so often, but that would mean having to transfer say 50TB of data
>>>> between the two locations every time, and our network bandwidth constraints
>>>> would prevent to do that even on a daily basis
>>>> - having a column that contains the most recent time a row was updated,
>>>> however this column couldn't be part of the primary key (because the
>>>> primary key in Kudu is immutable), and therefore finding which rows have
>>>> been changed every time would require a full scan of the table to be
>>>> sync'd. It would also rely on the "last update timestamp" column to be
>>>> always updated by the application (an assumption that we would like to
>>>> avoid), and would need some other process to take into accounts the rows
>>>> that are deleted.
>>>>
>>>>
>>>> Since many of today's RDBMS (Oracle, MySQL, etc) allow for some sort of
>>>> 'Change Data Capture' mechanism where only the 'deltas' are captured and
>>>> applied to the secondary instance, we were wondering if there's any way in
>>>> Kudu to achieve something like that (possibly mining the WALs, since my
>>>> understanding is that each change gets applied to the WALs first).
>>>>
>>>>
>>>> Thanks,
>>>> Franco Venturi
on the ASF slack in case we decide to go
> forward with this. If we don't decide to go forward with it, it's a good
> idea to hold onto the channel and pin a message in there about how to get
> to the "official" Kudu slack.
>
> On Mon, Oct 23, 2017 at 3:00 PM, Todd Lipcon <t...@cl
channels on the official ASF slack (http://the-asf.slack.com/
> )
> and migrate our discussions there. What does everyone think?
>
need for anywhere near that range.
-Todd
>
> On Thu, Nov 16, 2017 at 5:30 PM, Dan Burkert <danburk...@apache.org>
> wrote:
>
> > Aren't we going to need efficient encodings in order to make decimal work
> > well, anyway?
> >
> > - Dan
sses, MD5 hashes and other similar types
>> of data.
>>
>> Is there any interest or uses for a INT128 column type? Is anyone
>> currently using a STRING or BINARY column for 128 bit data?
>>
>> Thank you,
>> Grant
>> --
>> Grant Henke
>> Software Engineer | Cloudera
>> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
In 1.3 it was called "kudu test loadgen" and may have fewer options
available.
-Todd
On Wed, Nov 1, 2017 at 12:23 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> On Wed, Nov 1, 2017 at 12:20 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>>>
overloaded (it's a torture-test cluster of
sorts that is always way out of balance, re-replicating stuff, etc)
-Todd
>
>
>
> On Wed, Nov 1, 2017 at 1:40 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> On Wed, Nov 1, 2017 at 1:23 PM, Chao Sun <sunc...@uber.com> wrote:
find information about these background
> operations? I want to understand what happens in situations when some node
> is offline and then comes back up after a while. What is tablet
> initialization and bootstrapping, etc.
>
> --
> Br.
> Janne Keskitalo,
> Database Architect,
What's the full log line where you're seeing this crash? Is it coming from
tablet_bootstrap.cc, raft_consensus.cc, or elsewhere?
-Todd
2017-11-01 15:45 GMT-07:00 Franco Venturi <fvent...@comcast.net>:
> Our version is kudu 1.5.0-cdh5.13.0.
>
> Franco
>
>
>
>
--table-num-buckets=32
There are also a bunch of options to tune buffer sizes, flush options, etc.
But with the default settings above on an 8-node cluster I have, I was able
to insert 8M rows in 44 seconds (180k/sec).
Adding --buffer-size-bytes=1000 almost doubled the above throughput
(330k r
and even time values up to 1000 seconds in the future (we read 1
billion nanoseconds as 1 billion microseconds (=1000 seconds)). I'll work
on reproducing this and a patch, to backport to previous versions.
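The magnitude of that nanosecond-vs-microsecond misreading is easy to check with plain arithmetic (this sketch only illustrates the scale of the skew described above; it is not Kudu code):

```python
# One billion nanoseconds is exactly one second; the same raw value
# misread as microseconds becomes 1000 seconds -- a factor-of-1000 skew.
NANOS_PER_SEC = 10**9
MICROS_PER_SEC = 10**6

raw = 10**9                            # intended unit: nanoseconds
correct_secs = raw / NANOS_PER_SEC     # 1.0 second
misread_secs = raw / MICROS_PER_SEC    # 1000.0 seconds if read as microseconds

print(correct_secs, misread_secs)  # 1.0 1000.0
```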
-Todd
On Wed, Nov 1, 2017 at 5:00 PM, Todd Lipcon <t...@cloudera.com> wrote:
One thing you might try is to update the consensus rpc timeout to 30
seconds instead of 1. We changed the default in later versions.
I'd also recommend updating to 1.4 or 1.5 for other related fixes to
consensus stability. I think I recall you were on 1.3 still?
Todd
On Nov 3, 2017 7:47 PM,
> > On Mon, Oct 16, 2017 at 2:29 PM, Matteo Durighetto <
> m.durighe...@miriade.it> wrote:
> > the "abcdefgh1234" is an example of the string created by Cloudera
> > Manager while enabling Kerberos.
>
> ...
>
> On Mon, Oct 16, 2017 at 11:57
On Tue, Oct 24, 2017 at 12:41 PM, Todd Lipcon <t...@cloudera.com> wrote:
> I've filed https://issues.apache.org/jira/browse/KUDU-2198 to provide a
> workaround for systems like this. I should have a patch up shortly since
> it's relatively simple.
>
>
... and here's the patch
Hey Chao,
Nice to hear you are checking out Kudu.
What are you using to consume from Kafka and write to Kudu? Is it possible
that it is Java code and you are using the SYNC flush mode? That would
result in a separate round trip for each record and thus very low
throughput.
Todd
On Oct 30, 2017
Insert insert = kuduTable.newInsert();
> PartialRow row = insert.getRow();
> // fill the columns
> kuduSession.apply(insert)
> }
>
> I didn't specify the flushing mode, so it will pick up the AUTO_FLUSH_SYNC
> as default?
> should I use MANUAL_FLUSH?
>
> Thanks,
> Ch
is case (only upsert)?
>>
>> Thanks again,
>> Chao
>>
>> On Mon, Oct 30, 2017 at 11:42 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>>> If you want to manage batching yourself you can use the manual flush
>>> mode. Easiest would be the auto
> PARTITION "78" <= VALUES < "785000",
> PARTITION "785000" <= VALUES < "79",
> PARTITION "79" <= VALUES < "795000",
> PARTITION "795000" <= VALUES < "80",
> PARTITION "80" <= VALUES < "805000",
> PARTITION "805000" <= VALUES < "81",
> PARTITION "81" <= VALUES < "815000",
> PARTITION "815000" <= VALUES < "82",
> PARTITION "82" <= VALUES < "825000",
> PARTITION "825000" <= VALUES < "83",
> PARTITION "83" <= VALUES < "835000",
> PARTITION "835000" <= VALUES < "84",
> PARTITION "84" <= VALUES < "845000",
> PARTITION "845000" <= VALUES < "85",
> PARTITION "85" <= VALUES < "855000",
> PARTITION "855000" <= VALUES < "86",
> PARTITION "86" <= VALUES < "865000",
> PARTITION "865000" <= VALUES < "87",
> PARTITION "87" <= VALUES < "875000",
> PARTITION "875000" <= VALUES < "88",
> PARTITION "88" <= VALUES < "885000",
> PARTITION "885000" <= VALUES < "89",
> PARTITION "89" <= VALUES < "895000",
> PARTITION "895000" <= VALUES < "90",
> PARTITION "90" <= VALUES < "905000",
> PARTITION "905000" <= VALUES < "91",
> PARTITION "91" <= VALUES < "915000",
> PARTITION "915000" <= VALUES < "92",
> PARTITION "92" <= VALUES < "925000",
> PARTITION "925000" <= VALUES < "93",
> PARTITION "93" <= VALUES < "935000",
> PARTITION "935000" <= VALUES < "94",
> PARTITION "94" <= VALUES < "945000",
> PARTITION "945000" <= VALUES < "95",
> PARTITION "95" <= VALUES < "955000",
> PARTITION "955000" <= VALUES < "96",
> PARTITION "96" <= VALUES < "965000",
> PARTITION "965000" <= VALUES < "97",
> PARTITION "97" <= VALUES < "975000",
> PARTITION "975000" <= VALUES < "98",
> PARTITION "98" <= VALUES < "985000",
> PARTITION "985000" <= VALUES < "99",
> PARTITION "99" <= VALUES < "995000",
> PARTITION VALUES >= "995000"
> )
>
>
>
So it looks like you have a numeric value being stored here in the string
column. Are you sure that you are properly zero-padding when creating your
key? For example if you accidentally scan from "50_..." to "80_..." you
will end up scanning a huge portion of your table.
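The zero-padding pitfall is easy to demonstrate: string keys compare lexicographically, so unpadded numbers sort in a surprising order. A minimal sketch in plain Python (independent of the Kudu client):

```python
# String keys compare byte-by-byte, so "9..." sorts AFTER "80..." even
# though 9 < 80 numerically. Zero-padding restores numeric ordering.
unpadded = ["9", "50", "80", "100"]
padded = [s.zfill(3) for s in unpadded]

print(sorted(unpadded))  # lexicographic: ['100', '50', '80', '9']
print(sorted(padded))    # numeric order: ['009', '050', '080', '100']
```

With unpadded keys a scan bound built from a numeric prefix can land far from where you expect, which is exactly how a range scan ends up covering much more of the table than intended.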
> i did not delete rows in this table ever.
>
> my scanner code is below:
> buildKey method will build the lower bound and the upper bound, the unique
> id is same, the startRow offset(third part) is 0, and the endRow offset is
> , startRow and endRow only differs from time.
> though the max offset is big(999), generally it is less than 100.
>
> private KuduScanner buildScanner(Metric startRow, Metric endRow,
> List<?> dimensionIds, List<DimensionFilter> dimensionFilterList) {
> KuduTable kuduTable =
> kuduService.getKuduTable(BizConfig.parseFrom(startRow.getBizId()));
>
> PartialRow lower = kuduTable.getSchema().newPartialRow();
> lower.addString("key", buildKey(startRow));
> PartialRow upper = kuduTable.getSchema().newPartialRow();
> upper.addString("key", buildKey(endRow));
>
> LOG.info("build scanner. lower = {}, upper = {}", buildKey(startRow),
> buildKey(endRow));
>
> KuduScanner.KuduScannerBuilder builder =
> kuduService.getKuduClient().newScannerBuilder(kuduTable);
> builder.setProjectedColumnNames(COLUMNS);
> builder.lowerBound(lower);
> builder.exclusiveUpperBound(upper);
> builder.prefetching(true);
> builder.batchSizeBytes(MAX_BATCH_SIZE);
>
> if (CollectionUtils.isNotEmpty(dimensionFilterList)) {
> for (int i = 0; i < dimensionIds.size() && i < MAX_DIMENSION_NUM;
> i++) {
> for (DimensionFilter dimensionFilter : dimensionFilterList) {
> if (!Objects.equals(dimensionFilter.getDimensionId(),
> dimensionIds.get(i))) {
> continue;
> }
> ColumnSchema columnSchema =
> kuduTable.getSchema().getColumn(String.format("dimension_%02d", i));
> KuduPredicate predicate = buildKuduPredicate(columnSchema,
> dimensionFilter);
> if (predicate != null) {
> builder.addPredicate(predicate);
> LOG.info("add predicate. predicate = {}",
> predicate.toString());
> }
> }
> }
> }
> return builder.build();
> }
>
>
What client version are you using? 1.7.0?
> I checked the metrics and only got the content below; it seems to have no
> relationship with my table.
>
Looks like you got the metrics from the kudu master, not a tablet server.
You need to figure out which tablet server you are scanning and grab the
metrics from that one.
-Todd
should call nextRows() hundreds of
> times to fetch all the data, and it finally costs several minutes.
>
> I don't know why this happened or how to resolve it. Maybe the final
> solution is that I should give up Kudu and use HBase instead...
st-values). So, if you had for example:
pre-chunk in-list: 1,2,3,4,5,6
chunk 1: col2 IN (1,6)
chunk 2: col2 IN (2,5)
chunk 3: col2 IN (3,4)
then you will actually scan over the middle portion of that table 3 times.
If you sort the in-list before chunking you'll avoid the multiple-scan
effect here.
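The chunking effect above can be sketched in a few lines of plain Python (not the Kudu client; just an illustration of why sorting the in-list first matters):

```python
def chunks(values, n):
    """Split values into consecutive chunks of size n."""
    return [values[i:i + n] for i in range(0, len(values), n)]

in_list = [1, 6, 2, 5, 3, 4]  # arrival order, as in the example above

# Each chunk's scan must cover [min, max] of its values.
unsorted_spans = [(min(c), max(c)) for c in chunks(in_list, 2)]
sorted_spans = [(min(c), max(c)) for c in chunks(sorted(in_list), 2)]

print(unsorted_spans)  # [(1, 6), (2, 5), (3, 4)] -- every chunk spans the middle
print(sorted_spans)    # [(1, 2), (3, 4), (5, 6)] -- disjoint, no re-scanning
```

With the unsorted list, all three chunk scans overlap the middle of the key range; after sorting, the chunk ranges are disjoint and each part of the table is read once.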
-Todd
> will it have a bad effect even though these data were loaded first.
> I do not know the compaction mechanism of Kudu; will it lead to many
> compactions, and thus to bad scan performance?
>
> Best regards.
>
contains Insert, Update, Delete operations, if
> the database does not exist in the data there will be
> some new data loss, how to avoid such problems.
>
ay node1 load record1 from WAL at t1, node2 t2, node3 t3 (t1 <
> t2 < t3) then reading client attached node1 can see record but other
> reading clients attached not node1(node2, node3) have possibilities missing
> record1.
> >
> > I think that does not happens in kudu, and i wonder how kudu synchronize
> real time data.
> >
> > Thanks!
> >
Oh, one other piece of feedback: maybe worth editing the title to say "vs
Apache Parquet" instead of "vs Apache Impala" since in all cases you are
using Impala as the query engine?
-Todd
On Fri, Jan 5, 2018 at 11:06 AM, Todd Lipcon <t...@cloudera.com> wrote
evelopers for such an amazing and much-needed product.
>
> Boris
ache.org/docs/command_line_tools_referenc
>>>>>> e.html#cluster-ksck for more details. For restarting a cluster, I
>>>>>> would recommend taking down all tablet servers at once, otherwise
>>>>>> tablet
>>>>>> replicas may try to replicate data from the server that was taken
>>>>>> down.
>>>>>>
>>>>>> Hope this helped,
>>>>>> Andrew
>>>>>>
>>>>>> On Tue, Dec 5, 2017 at 10:42 AM, Petter von Dolwitz (Hem) <
>>>>>> petter.von.dolw...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Kudu users,
>>>>>>>
>>>>>>> We just started to use Kudu (1.4.0+cdh5.12.1). To make a baseline for
>>>>>>> evaluation we ingested 3 months' worth of data. During ingestion we
>>>>>>> were
>>>>>>> facing messages from the maintenance threads that a soft memory
>>>>>>> limit was
>>>>>>> reached. It seems like the background maintenance threads stopped
>>>>>>> performing their tasks at this point in time. It also seems like
>>>>>>> the
>>>>>>> memory was never recovered even after stopping ingestion so I guess
>>>>>>> there
>>>>>>> was a large backlog being built up. I guess the root cause here is
>>>>>>> that we
>>>>>>> were a bit too conservative when giving Kudu memory. After a
>>>>>>> restart a
>>>>>>> lot of maintenance tasks were started (i.e. compaction).
>>>>>>>
>>>>>>> When we verified that all data was inserted we found that some data
>>>>>>> was missing. We added the missing data, and on some chunks we got the
>>>>>>> information that all rows were already present, i.e. Impala says
>>>>>>> something like "Modified: 0 rows, nnn errors". Doing the verification
>>>>>>> again now shows that the Kudu table is complete. So, even though we
>>>>>>> did not insert any data on some chunks, a count(*) operation over
>>>>>>> these chunks now returns a different value.
>>>>>>>
>>>>>>> Now to my question. Will data be inconsistent if we recycle Kudu
>>>>>>> after
>>>>>>> seeing soft memory limit warnings?
>>>>>>>
>>>>>>> Is there a way to tell when it is safe to restart Kudu to avoid these
>>>>>>> issues? Should we use any special procedure when restarting (e.g.
>>>>>>> only
>>>>>>> restart the tablet servers, only restart one tablet server at a time
>>>>>>> or
>>>>>>> something like that)?
>>>>>>>
>>>>>>> The table design uses 50 tablets per day (times 90 days). It is 8 TB
>>>>>>> of data after 3x replication over 5 tablet servers.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Petter
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
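As an aside, the replica-count arithmetic implied by the table design in the question above is easy to check (a quick sketch using only the numbers from the message: 50 tablets/day, 90 days, 3x replication, 5 tablet servers):

```python
# Back-of-the-envelope replica counts for the partitioning scheme described:
# 50 tablets per day, 90 days of data, 3x replication, 5 tablet servers.
tablets = 50 * 90                  # logical tablets
replicas = tablets * 3             # physical replicas across the cluster
replicas_per_server = replicas // 5

print(tablets, replicas, replicas_per_server)  # 4500 13500 2700
```

At 2,700 replicas per server, per-replica overhead (maintenance scheduling, WALs, memory) is substantial, which is consistent with the memory pressure described above.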
>>>>>>
>>>>>> --
>>>>>> Andrew Wong
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Andrew Wong
>>>>>
>>>>>
>>>>
>>>>
>>>
>
> --
> David Alves
>
--
Todd Lipcon
Software Engineer, Cloudera
ittle clearer.
Thanks
-Todd
>
> On Fri, Jan 5, 2018 at 11:13 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> Oh, one other piece of feedback: maybe worth editing the title to say "vs
>> Apache Parquet" instead of "vs Apache Impala" since in all cases
estamp representation with microsecond
precision, so that's what Kudu implemented internally. With 64 bits there
is still enough range to store dates for 584,554 years at microsecond
precision.
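That range figure can be verified with quick arithmetic (a sketch; the 584,554-year total span falls out of dividing the 64-bit range by microseconds per Gregorian mean year of 365.2425 days):

```python
# Total span representable by a 64-bit integer counting microseconds.
# 365.2425-day Gregorian mean year, computed with exact integer math.
MICROS_PER_YEAR = 3652425 * 86400 // 10000 * 1_000_000

years = 2**64 // MICROS_PER_YEAR
print(years)  # 584554
```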
I think
https://impala.apache.org/docs/build/html/topics/impala_timestamp.html has
some info about Kudu compatibility and limitations.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
our feedback! look forward to new releases coming up!
>
> Boris
>
> On Fri, Jan 5, 2018 at 9:08 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> On Fri, Jan 5, 2018 at 5:50 PM, Boris Tyukin <bo...@boristyukin.com>
>> wrote:
>>
>>> Hi Todd,
>
some_kudu_table
>> SELECT * FROM some_csv_table does the trick.
>>
>> You can also use Kudu’s MapReduce OutputFormat to load data from HDFS,
>> HBase, or any other data store that has an InputFormat.
>>
>> No tool is provided to load data directly into Kudu’s on-disk format.
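The Impala load pattern described above can be sketched as SQL rendered from a small Python helper (the table and column names here are hypothetical, not from the thread; the statement shapes follow Impala's Kudu DDL and INSERT ... SELECT syntax):

```python
# Hypothetical helper rendering the two-step load pattern described above:
# create a Kudu-backed table in Impala, then INSERT ... SELECT from a
# CSV-backed table.
def kudu_load_sql(kudu_table: str, csv_table: str) -> list[str]:
    ddl = (
        f"CREATE TABLE {kudu_table} (id BIGINT, val STRING, PRIMARY KEY (id)) "
        f"PARTITION BY HASH (id) PARTITIONS 4 STORED AS KUDU"
    )
    dml = f"INSERT INTO {kudu_table} SELECT * FROM {csv_table}"
    return [ddl, dml]

for stmt in kudu_load_sql("some_kudu_table", "some_csv_table"):
    print(stmt)
```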
-Todd
>
> On Mon, Jan 29, 2018 at 2:22 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> On Mon, Jan 29, 2018 at 11:18 AM, Patrick Angeles <patr...@cloudera.com>
>> wrote:
>>
>>> Hi Boris.
>>>
>>> 1) I would like to bypass Impa
e run some basic smoke
tests of Kudu on ~800 nodes before.
>
> Looking forward to your inputs on any organisation using Kudu where data
> volumes of more than 10 TB is ingested everyday.
>
Hope some other users can chime in.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
> RAM, 48 CPU cores. Does it mean the other 52 (= 15 * 4 - 8) TB of space is
> recommended to leave for other systems? We prefer to make the machine
> dedicated to Kudu. Can tablet server leverage the whole space efficiently?
> >
> > Thanks,
> > Quanlong
>
--
Todd Lipcon
Software Engineer, Cloudera
systems.
One recommendation, though, is to consider using a dedicated disk for the
Kudu WAL and metadata, which can help performance, since the WAL can be
sensitive to other heavy workloads monopolizing bandwidth on the same
spindle.
-Todd
>
> At 2018-08-03 02:26:37, "Todd Lipcon" wrote:
-off.
-Todd
> At 2018-06-15 23:41:17, "Todd Lipcon" wrote:
>
> Also, keep in mind that when the MRS flushes, it flushes into a bunch of
> separate RowSets, not 1:1. It "rolls" to a new RowSet every N MB (N=32 by
> default). This is set by --budgeted_compacti
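The rolling behavior described above can be sketched with simple ceiling arithmetic (an approximation, assuming a flush rolls to a fresh RowSet exactly every N MB; 32 MB is the default mentioned in the thread):

```python
import math

def rowsets_after_flush(mrs_size_mb: float, roll_size_mb: int = 32) -> int:
    """Approximate number of RowSets produced when a MemRowSet flushes,
    rolling to a new RowSet every roll_size_mb megabytes."""
    return max(1, math.ceil(mrs_size_mb / roll_size_mb))

print(rowsets_after_flush(1024))  # 32  (a 1 GB MemRowSet flush)
print(rowsets_after_flush(100))   # 4
```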
lts or giving some more prescriptive
advice?
I'm a little nervous that saying "here are all the internals, and here are
100 config flags to study" will scare users more than help them :)
-Todd
>
> At 2018-08-02 01:06:40,"Todd Lipcon" wrote:
>
> On Wed, Aug 1, 2018
> Does anybody know the maximum number of distinct values in a String column
> that Kudu considers in order to set its encoding to Dictionary? Many thanks
> :)
>
> br,
>
>
--
Todd Lipcon
Software Engineer, Cloudera
>>>>> perhaps partition kudu table, even if small, into multiple tablets), it
>>>>> was
>>>>> to speed up joins/exchanges, not to parallelize the scan.
>>>>>
>>>>> For example recently we ran into this slow query where the
Impala 2.12. The external RPC protocol is still Thrift.
Todd
On Mon, Jul 23, 2018, 7:02 AM Clifford Resnick
wrote:
> Is this impala 3.0? I’m concerned about breaking changes and our RPC to
> Impala is thrift-based.
>
> From: Todd Lipcon
> Reply-To: "user@kudu.apache.org"
gh replication count.*
>
> *I could see bumping the replication count to 5 for these tables since the
> extra storage cost is low and it will ensure higher availability of the
> important central tables, but I'd be surprised if there is any measurable
> perf impact.*
> "
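The availability reasoning behind bumping replication from 3 to 5 is plain majority-quorum arithmetic (a sketch of the Raft rule that Kudu's tablet replication follows):

```python
def failures_tolerated(replication: int) -> int:
    """A Raft quorum needs a strict majority of replicas alive, so N
    replicas tolerate floor((N - 1) / 2) simultaneous failures."""
    return (replication - 1) // 2

print(failures_tolerated(3))  # 1 replica can fail
print(failures_tolerated(5))  # 2 replicas can fail
```

This is why 5 replicas buy higher availability for central tables at modest storage cost: the cluster survives two simultaneous server losses instead of one.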
@boot2docker:~/kudu# python
> Python 2.7.12 (default, Dec 4 2017, 14:50:18)
> [GCC 5.4.0 20160609] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import kudu
> >>> import kudu.client
>
appen automatically so long as the filter predicate has
been pushed down. Using 'explain()' and showing us the results, along with
the code you used to create your table, will help understand what might be
the problem with performance.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera