Re: is it worth to have partitions on very small tables?

2018-10-15 Thread Dan Burkert
Often for these cases having multiple partitions doesn't provide any advantage. There are fixed-cost overheads to having many tablets, so if the tablets are small these costs can outweigh the benefit. Additionally, if you aren't actively writing to the table then the benefit of parallelizing

Re: Multi-level partitions question

2018-10-11 Thread Dan Burkert
of data you would actually want it to be parallelized across many tablets, and therefore be able to take advantage of many tservers to perform the scan. - Dan On Thu, Oct 11, 2018 at 3:25 PM Dan Burkert wrote: > > Just to clarify, are you saying that partition by hash(shop_id), > hash(cu

Re: Multi-level partitions question

2018-10-11 Thread Dan Burkert
a bunch of independent files instead and each file will have data > for the specific hash of shop_id/customer_id? > > Boris > > On Thu, Oct 11, 2018 at 4:05 PM Dan Burkert wrote: > >> Hi Boris, >> >> The two examples you gave are exactly equivalent; the relative

Re: Multi-level partitions question

2018-10-11 Thread Dan Burkert
Hi Boris, The two examples you gave are exactly equivalent; the relative ordering of hash levels has no effect on query performance, hotspotting, or anything else. Given that 60% of your queries don't specify a specific customer_id, it does make sense to use hash(shop_id), hash(customer_id)
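As a rough illustration of why the relative order of hash levels does not matter, the sketch below counts tablets for a two-level hash layout. The bucket counts (4 and 8) and the class and method names are hypothetical, and the pruning rule assumes that an equality predicate on every column of a hash level narrows a scan to a single bucket of that level. Nothing here is Kudu API.

```java
public final class HashLevelsSketch {

    /** Total tablets for a multi-level hash layout: the product of the per-level bucket counts. */
    static int totalTablets(int... bucketsPerLevel) {
        int total = 1;
        for (int buckets : bucketsPerLevel) {
            total = Math.multiplyExact(total, buckets);
        }
        return total;
    }

    /**
     * Tablets a scan has to touch. A level whose columns are all pinned by an
     * equality predicate contributes one bucket; an unpinned level contributes
     * all of its buckets. (Assumed pruning model, not taken from the thread.)
     */
    static int tabletsScanned(int[] bucketsPerLevel, boolean[] levelPinned) {
        int scanned = 1;
        for (int i = 0; i < bucketsPerLevel.length; i++) {
            if (!levelPinned[i]) {
                scanned = Math.multiplyExact(scanned, bucketsPerLevel[i]);
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        // Hypothetical layout: 4 buckets on shop_id, 8 buckets on customer_id.
        System.out.println(totalTablets(4, 8)); // 32
        System.out.println(totalTablets(8, 4)); // 32, same tablets with the levels swapped

        // A query that pins shop_id but not customer_id.
        System.out.println(tabletsScanned(new int[] {4, 8}, new boolean[] {true, false})); // 8
    }
}
```

Swapping the levels changes neither the tablet count nor the number of tablets a given predicate touches, which is consistent with the two layouts being equivalent.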

Re: Kudu hashes and Java hashes

2018-08-28 Thread Dan Burkert
I'm only aware of one reason you'd want to pre-partition the data before inserting it into Kudu, and that's if you are sorting the input data prior to inserting. Having a way to map a row to a partition means the sort step can be done per-partition instead of globally, which can help reduce
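A minimal sketch of the per-partition sort idea: rows are first grouped by partition, then each group is sorted on its own instead of sorting the whole input at once. The partitionOf function below is a hypothetical stand-in built on String.hashCode; it is not Kudu's hash function and will not match Kudu's actual row-to-tablet mapping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class PerPartitionSortSketch {

    /** HYPOTHETICAL row-to-partition mapping; a placeholder, not Kudu's hash. */
    static int partitionOf(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Groups keys by partition, then sorts each group independently. */
    static Map<Integer, List<String>> groupAndSort(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition
                .computeIfAbsent(partitionOf(key, numPartitions), p -> new ArrayList<>())
                .add(key);
        }
        // Each sort only sees one partition's rows, so the groups could be
        // sorted in parallel or one at a time with a smaller working set.
        for (List<String> group : byPartition.values()) {
            Collections.sort(group);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("delta", "alpha", "echo", "charlie", "bravo");
        groupAndSort(keys, 3).forEach((partition, group) ->
            System.out.println("partition " + partition + ": " + group));
    }
}
```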

[ANNOUNCE] Recognizing the newest Apache Kudu committers

2018-07-25 Thread Dan Burkert
Hi all, I'm pleased to announce that the Kudu PMC has voted to add Attila Bukor and Sailesh Mukil as committers and PMC members. Attila has contributed many supportability, build, docs, and quality of life improvements. In addition, Attila has been very active helping users on our Slack and

Re: Right way to insert to timestamp column via Java api

2018-05-02 Thread Dan Burkert
Hi Mauricio, The docs you linked to are for Impala, not Kudu. Kudu's timestamp type internally keeps microsecond precision. Your example of multiplying by 1000 is correct; you should adjust whatever your timestamp is to microseconds since the unix epoch. There are a bunch of different time
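The conversion described above can be sketched in plain Java, assuming the source value is in milliseconds since the Unix epoch (which is what multiplying by 1000 implies). The class and method names are illustrative only, and the sketch stops at producing the microsecond value; how it is written to the column is left out.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

public final class KuduTimestampMicros {

    /** Epoch milliseconds to epoch microseconds (equivalent to multiplying by 1000). */
    static long millisToMicros(long epochMillis) {
        return TimeUnit.MILLISECONDS.toMicros(epochMillis);
    }

    /** java.time.Instant to epoch microseconds; sub-microsecond digits are truncated. */
    static long instantToMicros(Instant instant) {
        long wholeSeconds = Math.multiplyExact(instant.getEpochSecond(), 1_000_000L);
        return Math.addExact(wholeSeconds, instant.getNano() / 1_000L);
    }

    public static void main(String[] args) {
        long nowMillis = System.currentTimeMillis();
        System.out.println(millisToMicros(nowMillis));
        System.out.println(instantToMicros(Instant.now()));
    }
}
```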

Re: Reverse sort on Primary Key

2018-04-24 Thread Dan Burkert
ut to each one sequentially. Does that sound crazy? > > On Mon, Apr 23, 2018 at 3:23 PM Dan Burkert <danburk...@apache.org> wrote: > >> Hey Scott, >> >> Patrick's answer is spot on. I'm curious, though, is your usecase to >> find the latest value? Effectively

Re: Reverse sort on Primary Key

2018-04-23 Thread Dan Burkert
Hey Scott, Patrick's answer is spot on. I'm curious, though, is your usecase to find the latest value? Effectively a 'SORT BY DESC date LIMIT 1', or are you looking for the last n values, or all values? I ask because we frequently get the 'last value' question, and the solution for that might

Re: AsyncKudu

2018-04-10 Thread Dan Burkert
correct. No need for the async client in that scenario. - Dan > > -José > ---------- > *From:* Dan Burkert <danburk...@apache.org> > *Sent:* April 9, 2018 18:32:43 > > *To:* user@kudu.apache.org > *Subject:* Re: AsyncKudu > > Hi José,

Re: AsyncKudu

2018-04-09 Thread Dan Burkert
> } > }); > > //executing the callback > res.callback(kuduClient.tableExists(tableName)); > } > > > A little of the background of my project. The clients read and write on > other Database, and when they write something, the

Re: "broadcast" tablet replication for kudu?

2018-03-16 Thread Dan Burkert
you don't need to worry about odd/even WRT number of tablet servers. - Dan > > From: Dan Burkert <danburk...@apache.org> > Reply-To: "user@kudu.apache.org" <user@kudu.apache.org> > Date: Friday, March 16, 2018 at 2:09 PM > To: "user@kudu.apache.org" <us

Re: "broadcast" tablet replication for kudu?

2018-03-16 Thread Dan Burkert
The replication count is the number of tablet servers which Kudu will host copies on. So if you set the replication level to 5, Kudu will put the data on 5 separate tablet servers. There's no built-in broadcast table feature; upping the replication factor is the closest thing. A couple of

Re: Cannot create Kudu table with Range Partitioning

2018-02-12 Thread Dan Burkert
Hi Zakaria, There's a lot going on in that error message. I've got a suggestion, but first a question: Where does the line which contains 'Bad indirect slice' come from? Are you perhaps catching an exception returned by createTable and printing the error? If so, this could explain the

Re: Kudu Queries

2017-12-20 Thread Dan Burkert
Hi Ajay, Have you looked at the documentation section on kudu.apache.org? In particular these sections may be helpful: https://kudu.apache.org/docs/schema_design.html https://kudu.apache.org/docs/administration.html#migrate_to_multi_master

Re: Efficient way of computing max(PK) in Kudu

2017-12-14 Thread Dan Burkert
Hi Franco, Great question, and I think this gets towards a deeper use-case that Kudu could really excel at, but currently doesn't have the full set of required features to support. To your original question: you've pretty much covered all of the bases. Kudu doesn't have an efficient way to

Re: INT128 Column Support Interest

2017-11-16 Thread Dan Burkert
Aren't we going to need efficient encodings in order to make decimal work well, anyway? - Dan On Thu, Nov 16, 2017 at 2:54 PM, Todd Lipcon <t...@cloudera.com> wrote: > On Thu, Nov 16, 2017 at 2:28 PM, Dan Burkert <danburk...@apache.org> > wrote: > > > I think it would

Re: INT128 Column Support Interest

2017-11-16 Thread Dan Burkert
I think it would be useful. As far as I've seen the main costs in carrying data types are in writing performant encoders, and updating integrations to work with them. I'm guessing with 128 bit integers there would be some integrations that can't or won't support it, which might be a cause for

Re: kudu resource/hardware question

2017-09-14 Thread Dan Burkert
Hi Amit, Access to Kudu via the Impala JDBC interface does go through Impala, and should be accounted for in Impala resource and capacity planning. Access to Kudu via the Kudu Java client API does not go through Impala, and therefore does not need to be accounted for in Impala capacity planning. Usage

Re: Hbase's Phoenix SQL "clone" for Kudu

2017-05-17 Thread Dan Burkert
7 and wanted to use something instead of mysql to store users, posts, > likes, comments and messages would you recommend using Kudu over Hbase in > this case? > > Regards, > > Cheyenne O. Forbes > > On Wed, May 17, 2017 at 3:41 PM, Dan Burkert <danburk...@apache.org>

Re: Hbase's Phoenix SQL "clone" for Kudu

2017-05-17 Thread Dan Burkert
The closest thing that exists right now is the Impala or SparkSQL integrations. As far as I know the targeted use cases are a little different, with Phoenix more focussed on OLTP workloads and Kudu targeting analytic workloads, at least on the read side. - Dan On Wed, May 17, 2017 at 1:26 PM,

Re: Coprocessors

2017-05-17 Thread Dan Burkert
Hi Cheyenne, There is currently no support for coprocessors, nor is it something anyone is working on, as far as I know. Is there specific functionality you are looking for? - Dan On Wed, May 17, 2017 at 1:06 PM, Cheyenne Forbes < cheyenne.osanu.for...@gmail.com> wrote: > Will there be or are

Re: Data encryption in Kudu

2017-05-02 Thread Dan Burkert
> considered, so before coding it would be great to work through a design > document to explore the alternatives. For example, we could try to apply > encryption at the 'fs/' layer, which would cover all non-WAL data, but then > we would lose the ability to specify encryption on a per-colum

Re: Physical Tablet Data size is larger than size in Chart Library.

2017-04-12 Thread Dan Burkert
Adar has told me it's fine to run the new 'kudu fs check' tool against a Kudu 1.2 server. It will require building locally, though. - Dan On Wed, Apr 12, 2017 at 10:59 AM, Dan Burkert <danburk...@apache.org> wrote: > Hi Jason, > > First question: what filesystem and OS

Re: Question about redistributing tablets on failure of a tserver.

2017-04-12 Thread Dan Burkert
Hi Jason, answers inline: On Wed, Apr 12, 2017 at 5:53 AM, Jason Heo wrote: > > Q1. Can I disable redistributing tablets on failure of a tserver? The > reason why I need this is described in Background. > We don't have any kind of built-in maintenance mode that would

Re: Physical Tablet Data size is larger than size in Chart Library.

2017-04-12 Thread Dan Burkert
Hi Jason, First question: what filesystem and OS are you running? This has been an ongoing area of work; we fixed a few major issues in 1.2, and a few more major issues in 1.3, and have a new tool ('kudu fs check') that will be released in 1.4 to diagnose and fix further issues. In some cases

Re: Is there any recommended scale out strategy?

2017-04-10 Thread Dan Burkert
Oops, the tablet ID I used in the example is '4398cf80d68141cdbdae882e97b6da45', not 'c5299ec14315401a89316b62afad5877'. - Dan On Mon, Apr 10, 2017 at 4:34 PM, Dan Burkert <danburk...@apache.org> wrote: > Kudu does not yet have a way to request tablet rebalancing, but we do have >

Re: Is there any recommended scale out strategy?

2017-04-10 Thread Dan Burkert
Kudu does not yet have a way to request tablet rebalancing, but we do have a few tools for balancing tablets manually. For example, if you had a tablet 'c5299ec14315401a89316b62afad5877' which you wanted to remove from an old tserver 'c5299ec14315401a89316b62afad5877' and add to a new tserver

Re: Spark 2.1 and Hive Metastore

2017-04-09 Thread Dan Burkert
Hi Ben, Was this meant for the Spark user list, or is there something specific to the Spark/Kudu integration you are asking about? - Dan On Sun, Apr 9, 2017 at 11:13 AM, Benjamin Kim wrote: > I’m curious about if and when Spark SQL will ever remove its dependency on > Hive

Re: How to flush `block_cache_capacity_mb` easily?

2017-04-07 Thread Dan Burkert
Hi Jason, There is no command to have Kudu evict its block cache, but restarting the tablet server process will have that effect. Ideally all written data will be flushed before the restart, otherwise startup/bootstrap will take a bit longer. Flushing typically happens within 60s of the last

Re: I have a question about KUDU Disk.

2017-03-23 Thread Dan Burkert
Hi Jinsu, There is no limit quota functionality in Kudu, per se, but we do have a flag that configures Kudu to stop using a data directory after the disk has less than a set number of bytes free: -fs_data_dirs_reserved_bytes (Number of bytes to reserve on each data directory filesystem for

Re: mixing range and hash partitioning

2017-03-06 Thread Dan Burkert
itions. Thanks again for the report! - Dan On Tue, Feb 28, 2017 at 1:03 PM, Dan Burkert <danburk...@apache.org> wrote: > Yep: https://issues.apache.org/jira/browse/KUDU-1903 > > - Dan > > On Tue, Feb 28, 2017 at 12:51 PM, Todd Lipcon <t...@cloudera.com> wrote: > >

Re: mixing range and hash partitioning

2017-02-28 Thread Dan Burkert
both cases. I've attached a simple program which demonstrates. On Fri, Feb 24, 2017 at 7:09 PM, Dan Burkert <danburk...@apache.org> wrote: Hi Paul, I can't reproduce the behavior you are describing, I always get a single unbounded range partition when creating the table without specifyin

Re: mixing range and hash partitioning

2017-02-24 Thread Dan Burkert
I only have range partitioning (by commenting out the call to > add_hash_partitions), adding a bounded partition succeeds, regardless of > whether I first drop the unbounded partition. This seems surprising; why > the difference? > > On Fri, Feb 24, 2017 at 4:20 PM, Dan Burker

Re: mixing range and hash partitioning

2017-02-24 Thread Dan Burkert
Hi Paul, I think the issue you are running into is that if you don't add a range partition explicitly during table creation (by calling add_range_partition or inserting a split with add_range_partition_split), Kudu will default to creating 1 unbounded range partition. So your two options are to

Re: kudu table design question

2017-02-23 Thread Dan Burkert
Hi Tenny, First off, how many tablet servers are in your cluster? 16 partitions is appropriate for one or maybe two tablet servers, so if your cluster is bigger you could try bumping the number of partitions. Second, the schemas don't look identical, you have an additional 'id' column in the

Re: How to get the health of Kudu

2017-02-16 Thread Dan Burkert
Hi Mike, I think your best bet is the 'ksck' tool; you can see the various options and health checks it exposes by running 'kudu cluster ksck --help'. - Dan On Thu, Feb 16, 2017 at 1:06 PM, Mike Zupan wrote: > Hi all, > > We need to upgrade nodes in the kudu cluster and we

Re: Adding examples to docs?

2017-02-12 Thread Dan Burkert
Hi Darren, Assuming you are asking about Impala syntax, you can find some examples here: https://kudu.apache.org/docs/kudu_impala_integration.html#advanced_partitioning - Dan On Sun, Feb 12, 2017 at 6:37 PM, Darren Hoo wrote: > specifically what is the SQL syntax for

Re: Kudu kerberos flags

2017-02-06 Thread Dan Burkert
Hi Amit, Kerberos support is not yet ready to turn on; it's still being actively worked on. When it's ready for production use we'll remove the 'experimental' designator, and you will see those flags move out of the unsupported section (we also reserve the right to change or remove them while

[ANNOUNCE] Apache Kudu 1.0.1 release

2016-10-11 Thread Dan Burkert
The Apache Kudu team is happy to announce the release of Kudu 1.0.1! Kudu is an open source storage engine for structured data which supports low-latency random access together with efficient analytical access patterns. It is designed within the context of the Apache Hadoop ecosystem and supports