Thanks, I’ll see if I can find some available cycles.
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Thursday, May 12, 2016 6:25 PM
To: user@kudu.incubator.apache.org
Subject: Re: Encryption
On Thu, May 12, 2016 at 9:45 AM, Jordan Birdsell
I've used Kudu with an EAV model for sparse data and that worked extremely well
for us with billions of rows and the correct partitioning.
-Chris
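
For readers unfamiliar with the EAV (entity-attribute-value) layout mentioned above, here is a minimal sketch of how a sparse record might be flattened into narrow rows before loading. The message does not show the actual schema, so the function name, the three-column layout, and storing every value as a string are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterator, Tuple


def to_eav_rows(entity_id: str, attributes: Dict[str, Any]) -> Iterator[Tuple[str, str, str]]:
    """Yield one (entity, attribute, value) row per attribute that is present.

    Hypothetical helper: the three-column layout and string-typed values
    are assumptions, not the schema used in the thread.
    """
    for name, value in sorted(attributes.items()):
        if value is None:
            # Absent attributes produce no row at all, which is what keeps
            # sparse data compact in this layout.
            continue
        yield (entity_id, name, str(value))


rows = list(to_eav_rows("device-1", {"temp": 21.5, "humidity": None, "fw": "1.2"}))
```

With a layout like this, previously unknown attribute names become new row values rather than new columns, so the table schema stays fixed.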
On 5/12/16, 3:21 PM, "Dan Burkert" wrote:
Hi Ben,
Kudu doesn't support sparse datasets with many columns very well. Kudu's
data model looks much more like the relational, structured data model of a
traditional SQL database than HBase's data model. Kudu doesn't yet have a
map column type (or any nested column types), but we do have
Can Kudu handle the use case where sparse data is involved? In many of our
processes, we deal with data that can have any number of columns and many
previously unknown column names depending on what attributes are brought in at
the time. Currently, we use HBase to handle this. Since Kudu is
Thanks for the advice, Dan.
>Instead, take advantage of the index capability of Primary Keys.
I currently have the "5-min" field as part of the primary key as well. I
am most likely overdoing it. I will play around with the schema and the use
cases around it.
>since each tablet server should only
On Thu, May 12, 2016 at 11:39 AM, Sand Stone wrote:
I don't know how Kudu load balances the data across the tablet servers.
>
Individual tablets are replicated and balanced across all available tablet
servers, for more on that see
Thanks, Dan.
In your scheme, I assume you suggest the range partition on the timestamp.
I don't know how Kudu load balances the data across the tablet servers. For
example, do I need to pre-calculate, every day, a list of timestamps 5 minutes
apart at table creation? [assume I have to create a new
Forgot to add the PK specification to the CREATE TABLE, it should have read
as follows:
CREATE TABLE metrics (metric STRING, time TIMESTAMP, value DOUBLE)
PRIMARY KEY (metric, time);
- Dan
On Thu, May 12, 2016 at 11:12 AM, Dan Burkert wrote:
>
> On Thu, May 12, 2016 at
> Is the requirement to pre-aggregate by time window?
No, I am thinking of creating a column called, say, "minute". It's basically the
minute field of the timestamp column (possibly rounded to a 5-min bucket,
depending on the needs). So it's a computed column filled in on data ingestion.
My goal is that this
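
As an illustration of the computed "minute" column described above, here is a minimal sketch of deriving it at ingestion time. The function name and the choice to round down to the start of the bucket are assumptions; the message does not say how the value is actually computed.

```python
from datetime import datetime

BUCKET_MINUTES = 5  # bucket width mentioned in the message


def minute_bucket(ts: datetime) -> int:
    """Return the minute field of ts, rounded down to a 5-minute bucket.

    Hypothetical helper: rounding down (rather than to the nearest bucket)
    is an assumption.
    """
    return (ts.minute // BUCKET_MINUTES) * BUCKET_MINUTES


bucket = minute_bucket(datetime(2016, 5, 12, 11, 39, 27))
```

The result always falls in 0, 5, ..., 55, so the column takes at most 12 distinct values.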
On Thu, May 12, 2016 at 8:32 AM, Chris George wrote:
> How hard would a predicate based delete be?
> Ie ScanDelete or something.
> -Chris George
>
That might be pretty difficult, since it implicitly assumes cross row
transactional consistency. If consistency isn't
Thanks Todd. From a roadmap perspective, do you think this will be the recommended
way of enabling encryption for Kudu or should a design be put together for
something more integrated with Kudu itself?
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Thursday, May 12, 2016 12:31 PM
To:
Hi,
A while back we had a thread going about using dm-crypt as a means to encrypt
Kudu data. Out of curiosity, has anyone actually done this?
Thanks,
Jordan Birdsell
Hi. Presumably I need to write a program to delete the unwanted rows, say,
remove all data older than 3 days, while the table is still ingesting new
data.
How well will this perform for large tables, in terms of both deletion and
ingestion?
Or for this specific case that I retire data by day, I should