Often for these cases having multiple partitions doesn't provide any
advantage. There are fixed-cost overheads to having many tablets, so if
the tablets are small these costs can outweigh the benefit. Additionally,
if you aren't actively writing to the table then the benefit of
parallelizing
of data you would actually want it to be parallelized
across many tablets, and therefore be able to take advantage of many
tservers to perform the scan.
- Dan
On Thu, Oct 11, 2018 at 3:25 PM Dan Burkert wrote:
> > Just to clarify, are you saying that partition by hash(shop_id),
> hash(cu
a bunch of independent files instead and each file will have data
> for the specific hash of shop_id/customer_id?
>
> Boris
>
> On Thu, Oct 11, 2018 at 4:05 PM Dan Burkert wrote:
>
>> Hi Boris,
>>
>> The two examples you gave are exactly equivalent; the relative
Hi Boris,
The two examples you gave are exactly equivalent; the relative ordering of
hash levels has no effect on query performance, hotspotting, or anything
else. Given that 60% of your queries don't specify a specific customer_id,
it does make sense to use hash(shop_id), hash(customer_id)
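
A minimal sketch of why the ordering of hash levels makes no difference, under stated assumptions: the hash is a stand-in built on Python's hashlib (not Kudu's actual hash function), and the bucket counts (4 and 8) and the helper names are hypothetical. Each row lands in a pair of buckets, one per hash level; swapping the levels only swaps the order of that pair, so the same rows end up grouped together.

```python
import hashlib
from collections import defaultdict


def bucket(value, num_buckets):
    """Stand-in hash bucket function; Kudu's real hash differs."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def partition(rows, levels):
    """Group rows by the tuple of bucket numbers, one per hash level.

    levels is a list of (column_name, num_buckets) pairs.
    """
    groups = defaultdict(set)
    for row in rows:
        key = tuple(bucket(row[col], n) for col, n in levels)
        groups[key].add((row["shop_id"], row["customer_id"]))
    return groups


rows = [{"shop_id": s, "customer_id": c} for s in range(20) for c in range(50)]

# hash(shop_id), hash(customer_id) versus hash(customer_id), hash(shop_id)
a = partition(rows, [("shop_id", 4), ("customer_id", 8)])
b = partition(rows, [("customer_id", 8), ("shop_id", 4)])

# The row groupings are identical; only the order of the bucket pair differs.
same_groups = all(a[(s, c)] == b[(c, s)] for (s, c) in a)
print(len(a), len(b), same_groups)
```
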
I'm only aware of one reason you'd want to pre-partition the data before
inserting it into Kudu, and that's if you are sorting the input data prior
to inserting. Having a way to map a row to a partition means the sort step
can be done per-partition instead of globally, which can help reduce
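
As a rough illustration of the per-partition sort idea, the sketch below groups rows by partition and sorts each group on its own. The `partition_of` mapping is hypothetical (a simple modulo on shop_id) and is not Kudu's partitioning function; the column names are borrowed from the thread.

```python
from collections import defaultdict


def partition_of(row, num_partitions):
    # Hypothetical row-to-partition mapping; not Kudu's partitioning function.
    return row["shop_id"] % num_partitions


def sort_per_partition(rows, num_partitions, key):
    """Bucket rows by partition, then sort each bucket independently."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_of(row, num_partitions)].append(row)
    return {p: sorted(g, key=key) for p, g in groups.items()}


rows = [{"shop_id": s, "customer_id": c} for s in (3, 1, 2, 0) for c in (9, 4, 7)]
sort_key = lambda r: (r["shop_id"], r["customer_id"])
by_partition = sort_per_partition(rows, 2, key=sort_key)
```

Each per-partition sort handles only that partition's rows, so no single sort has to cover the whole input.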
Hi all,
I'm pleased to announce that the Kudu PMC has voted to add Attila Bukor and
Sailesh Mukil as committers and PMC members.
Attila has contributed many supportability, build, docs, and quality of
life improvements. In addition, Attila has been very active helping users
on our Slack and
Hi Mauricio,
The docs you linked to are for Impala, not Kudu. Kudu's timestamp type
internally keeps microsecond precision. Your example of multiplying by
1000 is correct; you should adjust whatever your timestamp is to
microseconds since the unix epoch. There are a bunch of different time
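
A small sketch of the conversion, assuming the source value is either a timezone-aware Python datetime or an integer count of milliseconds since the Unix epoch. The helper names are illustrative and are not part of any Kudu client API.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_micros(dt):
    """Convert a timezone-aware datetime to integer microseconds since the Unix epoch."""
    delta = dt - EPOCH  # raises TypeError if dt is naive
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def millis_to_micros(ms):
    """Convert milliseconds since the Unix epoch to microseconds."""
    return ms * 1000


dt = datetime(2018, 1, 1, 0, 0, 0, 500000, tzinfo=timezone.utc)
micros = to_unix_micros(dt)
```

Integer arithmetic is used throughout so that no precision is lost to floating point.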
ut to each one sequentially. Does that sound crazy?
>
> On Mon, Apr 23, 2018 at 3:23 PM Dan Burkert <danburk...@apache.org> wrote:
>
>> Hey Scott,
>>
>> Patrick's answer is spot on. I'm curious, though, is your use case to
>> find the latest value? Effectively
Hey Scott,
Patrick's answer is spot on. I'm curious, though, is your use case to find
the latest value? Effectively a 'SORT BY DESC date LIMIT 1', or are you
looking for the last n values, or all values? I ask because we frequently
get the 'last value' question, and the solution for that might
correct. No need for the async client in that scenario.
- Dan
>
> -José
> ----------
> *From:* Dan Burkert <danburk...@apache.org>
> *Sent:* April 9, 2018 18:32:43
>
> *To:* user@kudu.apache.org
> *Subject:* Re: AsyncKudu
>
> Hi José,
>
> }
> });
>
> //executing the callback
> res.callback(kuduClient.tableExists(tableName));
> }
>
>
> A little of the background of my project. The clients read and write on
> other Database, and when they write something, the
you don't
need to worry about odd/even WRT number of tablet servers.
- Dan
>
> From: Dan Burkert <danburk...@apache.org>
> Reply-To: "user@kudu.apache.org" <user@kudu.apache.org>
> Date: Friday, March 16, 2018 at 2:09 PM
> To: "user@kudu.apache.org" <us
The replication count is the number of tablet servers which Kudu will host
copies on. So if you set the replication level to 5, Kudu will put the
data on 5 separate tablet servers. There's no built-in broadcast table
feature; upping the replication factor is the closest thing. A couple of
Hi Zakaria,
There's a lot going on in that error message. I've got a suggestion, but
first a question:
Where does the line which contains 'Bad indirect slice' come from? Are you
perhaps catching an exception returned by createTable and printing the
error? If so, this could explain the
Hi Ajay,
Have you looked at the documentation section on kudu.apache.org? In
particular these sections may be helpful:
https://kudu.apache.org/docs/schema_design.html
https://kudu.apache.org/docs/administration.html#migrate_to_multi_master
Hi Franco,
Great question, and I think this gets towards a deeper use-case that Kudu
could really excel at, but currently doesn't have the full set of required
features to support. To your original question: you've pretty much covered
all of the bases. Kudu doesn't have an efficient way to
Aren't we going to need efficient encodings in order to make decimal work
well, anyway?
- Dan
On Thu, Nov 16, 2017 at 2:54 PM, Todd Lipcon <t...@cloudera.com> wrote:
> On Thu, Nov 16, 2017 at 2:28 PM, Dan Burkert <danburk...@apache.org>
> wrote:
>
> > I think it would
I think it would be useful. As far as I've seen the main costs in carrying
data types are in writing performant encoders, and updating integrations to
work with them. I'm guessing with 128 bit integers there would be some
integrations that can't or won't support it, which might be a cause for
Hi Amit,
Access to Kudu via the Impala JDBC interface does go through Impala, and
should be accounted for in Impala resource and capacity planning. Access
to Kudu via the Kudu Java client API does not go through Impala, and
therefore does not need to be accounted for in Impala capacity planning.
Usage
7 and wanted to use something instead of mysql to store users, posts,
> likes, comments and messages would you recommend using Kudu over Hbase in
> this case?
>
> Regards,
>
> Cheyenne O. Forbes
>
> On Wed, May 17, 2017 at 3:41 PM, Dan Burkert <danburk...@apache.org>
>
The closest thing that exists right now is the Impala or SparkSQL
integrations. As far as I know the targeted use cases are a little
different, with Phoenix more focussed on OLTP workloads and Kudu targeting
analytic workloads, at least on the read side.
- Dan
On Wed, May 17, 2017 at 1:26 PM,
Hi Cheyenne,
There is currently no support for coprocessors, nor is it something anyone
is working on, as far as I know. Is there specific functionality you are
looking for?
- Dan
On Wed, May 17, 2017 at 1:06 PM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:
> Will there be or are
> considered, so before coding it would be great to work through a design
> document to explore the alternatives. For example, we could try to apply
> encryption at the 'fs/' layer, which would cover all non-WAL data, but then
> we would lose the ability to specify encryption on a per-colum
Adar has told me it's fine to run the new 'kudu fs check' tool against a
Kudu 1.2 server. It will require building locally, though.
- Dan
On Wed, Apr 12, 2017 at 10:59 AM, Dan Burkert <danburk...@apache.org> wrote:
> Hi Jason,
>
> First question: what filesystem and OS
Hi Jason, answers inline:
On Wed, Apr 12, 2017 at 5:53 AM, Jason Heo wrote:
>
> Q1. Can I disable redistributing tablets on failure of a tserver? The
> reason why I need this is described in Background.
>
We don't have any kind of built-in maintenance mode that would
Hi Jason,
First question: what filesystem and OS are you running?
This has been an ongoing area of work; we fixed a few major issues in 1.2,
and a few more major issues in 1.3, and have a new tool ('kudu fs check')
that will be released in 1.4 to diagnose and fix further issues. In some
cases
Oops, the tablet ID I used in the example is
'4398cf80d68141cdbdae882e97b6da45',
not 'c5299ec14315401a89316b62afad5877'.
- Dan
On Mon, Apr 10, 2017 at 4:34 PM, Dan Burkert <danburk...@apache.org> wrote:
> Kudu does not yet have a way to request tablet rebalancing, but we do have
>
Kudu does not yet have a way to request tablet rebalancing, but we do have
a few tools for balancing tablets manually.
For example, if you had a tablet 'c5299ec14315401a89316b62afad5877' which
you wanted to remove from an old tserver 'c5299ec14315401a89316b62afad5877'
and add to a new tserver
Hi Ben,
Was this meant for the Spark user list, or is there something specific to
the Spark/Kudu integration you are asking about?
- Dan
On Sun, Apr 9, 2017 at 11:13 AM, Benjamin Kim wrote:
> I’m curious about if and when Spark SQL will ever remove its dependency on
> Hive
Hi Jason,
There is no command to have Kudu evict its block cache, but restarting the
tablet server process will have that effect. Ideally all written data will
be flushed before the restart, otherwise startup/bootstrap will take a bit
longer. Flushing typically happens within 60s of the last
Hi Jinsu,
There is no limit or quota functionality in Kudu, per se, but we do have a
flag that configures Kudu to stop using a data directory after the disk has
less than a set number of bytes free:
-fs_data_dirs_reserved_bytes (Number of bytes to reserve on each data
directory filesystem for
itions. Thanks again for the report!
- Dan
On Tue, Feb 28, 2017 at 1:03 PM, Dan Burkert <danburk...@apache.org> wrote:
> Yep: https://issues.apache.org/jira/browse/KUDU-1903
>
> - Dan
>
> On Tue, Feb 28, 2017 at 12:51 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>
both cases.
I've attached a simple program which demonstrates.
On Fri, Feb 24, 2017 at 7:09 PM, Dan Burkert <danburk...@apache.org> wrote:
Hi Paul,
I can't reproduce the behavior you are describing; I always get a single
unbounded range partition when creating the table without specifyin
I only have range partitioning (by commenting out the call to
> add_hash_partitions), adding a bounded partition succeeds, regardless of
> whether I first drop the unbounded partition. This seems surprising; why
> the difference?
>
> On Fri, Feb 24, 2017 at 4:20 PM, Dan Burker
Hi Paul,
I think the issue you are running into is that if you don't add a range
partition explicitly during table creation (by calling add_range_partition
or inserting a split with add_range_partition_split), Kudu will default to
creating 1 unbounded range partition. So your two options are to
Hi Tenny,
First off, how many tablet servers are in your cluster? 16 partitions is
appropriate for one or maybe two tablet servers, so if your cluster is
bigger you could try bumping the number of partitions.
Second, the schemas don't look identical; you have an additional 'id'
column in the
Hi Mike,
I think your best bet is the 'ksck' tool; you can see the various options
and health checks it exposes by running 'kudu cluster ksck --help'.
- Dan
On Thu, Feb 16, 2017 at 1:06 PM, Mike Zupan wrote:
> Hi all,
>
> We need to upgrade nodes in the kudu cluster and we
Hi Darren,
Assuming you are asking about Impala syntax, you can find some examples
here:
https://kudu.apache.org/docs/kudu_impala_integration.html#advanced_partitioning
- Dan
On Sun, Feb 12, 2017 at 6:37 PM, Darren Hoo wrote:
> specifically what is the SQL syntax for
Hi Amit,
Kerberos support is not yet ready to turn on; it's still being actively
worked on. When it's ready for production use we'll remove the
'experimental' designator, and you will see those flags move out of the
unsupported section (we also reserve the right to change or remove them
while
The Apache Kudu team is happy to announce the release of Kudu 1.0.1!
Kudu is an open source storage engine for structured data which supports
low-latency random access together with efficient analytical access
patterns. It is designed within the context of the Apache Hadoop ecosystem
and supports