Re: what is your typical size of tablet.

2017-08-30 Thread Jean-Daniel Cryans
Hi Denis,

I don't directly manage Kudu clusters but what I've seen ranges anywhere
from 0 bytes to 100GB. I wouldn't recommend going much higher than this
because re-replicating 100GB takes a _long_ time, although it should be a
little better in the upcoming 1.5.0 release thanks to Hao's work.

Sweet spot is probably more in the low tens of GBs.
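
A rough sketch of the sizing arithmetic behind that range, for picking a
partition count. The function name and the 30GB target are illustrative
assumptions, and it assumes rows spread evenly across tablets (no skew):

```python
GB = 1024 ** 3


def tablets_needed(table_bytes: int, target_tablet_bytes: int) -> int:
    """Smallest tablet count that keeps each tablet at or below the target size.

    Assumes data is spread evenly across tablets, which real workloads
    only approximate.
    """
    if table_bytes < 0 or target_tablet_bytes <= 0:
        raise ValueError("sizes must be non-negative, target must be positive")
    # Integer ceiling division; a table always has at least one tablet.
    return max(1, -(-table_bytes // target_tablet_bytes))


# Hypothetical example: a 2TB table with a 30GB-per-tablet target.
print(tablets_needed(2 * 1024 * GB, 30 * GB))
```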

Hope this helps,

J-D

On Wed, Aug 30, 2017 at 3:06 AM, Denis Bolshakov wrote:

> Hello Kudu community,
>
> Could you please share the typical size of a single tablet in your tables?
>
>
> --
> //with Best Regards
> --Denis Bolshakov
> e-mail: bolshakov.de...@gmail.com
>


DMP/CDP Profile Store

2017-08-30 Thread Benjamin Kim
I was wondering whether anyone has worked on a DMP/CDP that stores user and
customer profiles in Kudu. Each user would have their base IDs (the identity
graph), statistics based on their attributes, and tables for those
attributes grouped by category.

Please let me know what you think of my thoughts.

I was thinking of creating a base profile table to store the IDs and
statistics along with unchanging or rarely changing attributes, such as
name, that do not need to be tracked. Next, I would create tables to
categorize groups of attributes, such as user information, behaviors,
geolocation, devices, etc. These attribute tables would have a column for
each attribute and would track changes by only inserting data, with a
timestamp column recording when each row was entered. Essentially, I would
follow the type 2 slowly changing dimension approach used in data
warehouses. For attributes that expire, we would partition by time range so
that we can drop expired data. For attributes where we only need the latest
value, we would add an active column to easily flag and query them after
inactivating older versions.
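
To make the idea concrete, here is a minimal in-memory sketch of the
insert-only, type 2 style tracking described above. It is plain Python, not
Kudu client code, and every name in it (AttributeRow, record_change,
current_version, drop_expired) is hypothetical. It assumes changes arrive in
timestamp order and that a table's key would be something like
(user_id, entered_at):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AttributeRow:
    user_id: str
    entered_at: int              # when this version was entered (e.g. epoch seconds)
    attributes: Dict[str, str]   # one entry per attribute column
    active: bool = True          # flags the latest version for easy querying


def record_change(rows: List[AttributeRow], user_id: str, entered_at: int,
                  attributes: Dict[str, str]) -> AttributeRow:
    """Inactivate the user's current version, then insert the new one."""
    for row in rows:
        if row.user_id == user_id and row.active:
            row.active = False
    new_row = AttributeRow(user_id, entered_at, dict(attributes))
    rows.append(new_row)
    return new_row


def current_version(rows: List[AttributeRow], user_id: str) -> Optional[AttributeRow]:
    """Return the active row for a user, or None if there is none."""
    for row in rows:
        if row.user_id == user_id and row.active:
            return row
    return None


def drop_expired(rows: List[AttributeRow], cutoff: int) -> List[AttributeRow]:
    """Stand-in for dropping a time-range partition older than the cutoff."""
    return [row for row in rows if row.entered_at >= cutoff]


rows: List[AttributeRow] = []
record_change(rows, "u1", 100, {"city": "Austin"})
record_change(rows, "u1", 200, {"city": "Denver"})
print(current_version(rows, "u1"))
```

In a real table the inactivation step would be an update of the older row,
so the design is insert-only for attribute values but not for the active
flag.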

Any comments or advice would be truly appreciated.

Cheers,
Ben


Re: Why per tablet server 's upper limit is 4TB.

2017-08-30 Thread Denis Bolshakov
Mike Percy's answer to @kinglee (from the Kudu Slack channel):
There are multiple issues that interact, but one issue is that if you have
many tablets you will use many threads. Adar has been focusing on improving
density lately and trying to quantify the scaling limits.


On 30 August 2017 at 13:22, yuyunliuhen wrote:

> "Recommended maximum amount of stored data, post-replication and
> post-compression, per tablet server is 4TB."
> What will happen if the data is more than 4TB? Disks are larger than before;
> a 6TB disk is common. Is there any test data or documentation?
>
>
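
For capacity planning, the quoted recommendation can be turned into a rough
estimate of the tablet server count. This is a sketch only: the function
name, the default replication factor of 3, and the compression ratio are
assumptions to replace with measured values, and the 4TB default is simply
the recommendation quoted above:

```python
import math

TB = 1024 ** 4


def min_tablet_servers(raw_bytes: float,
                       replication_factor: int = 3,
                       compression_ratio: float = 1.0,
                       per_server_limit_bytes: float = 4 * TB) -> int:
    """Estimate how many tablet servers keep each one under the stored-data limit.

    The limit applies to post-replication, post-compression data, so the raw
    size is multiplied by the replication factor and divided by the
    compression ratio (raw size / compressed size).
    """
    if (raw_bytes < 0 or replication_factor < 1
            or compression_ratio <= 0 or per_server_limit_bytes <= 0):
        raise ValueError("invalid sizing parameters")
    stored = raw_bytes * replication_factor / compression_ratio
    # At least one server per replica is needed regardless of data volume.
    return max(replication_factor, math.ceil(stored / per_server_limit_bytes))


# Hypothetical example: 20TB of raw data, 3 replicas, 2x compression.
print(min_tablet_servers(20 * TB, replication_factor=3, compression_ratio=2.0))
```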


-- 
//with Best Regards
--Denis Bolshakov
e-mail: bolshakov.de...@gmail.com