Some misc thoughts on this line.

- We could abstract a DimensionEncoder to encode a dimension on the rowkey.
Currently there are two ways of encoding: dictionary and fixed length.
- Long text descriptions could be stored in the HBase value instead of the
rowkey. This will make them much slower to filter, but still much better
than keeping them on the rowkey.
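
To make the two encodings concrete, here is a rough Python sketch. It is not Kylin's implementation: Kylin builds a trie dictionary, while this uses a plain dict, and the function names, the zero-byte padding and the ID width rule are all assumptions made for illustration.

```python
def id_width(cardinality):
    """Bytes needed to hold IDs 0..cardinality-1 (assumed rule, at least 1 byte)."""
    return max(1, ((cardinality - 1).bit_length() + 7) // 8)


def build_dictionary(values):
    """Map each distinct value to a compact integer ID, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def encode_dictionary(value, dictionary):
    """Dictionary encoding: the rowkey part is the fixed-width ID of the value."""
    width = id_width(len(dictionary))
    return dictionary[value].to_bytes(width, "big")


def encode_fixed_length(value, length, pad=b"\x00"):
    """Fixed-length encoding: cut the value to `length` bytes, pad if shorter.

    Cutting at a byte boundary can split a multi-byte UTF-8 character;
    this sketch does not try to handle that.
    """
    raw = value.encode("utf-8")[:length]
    return raw.ljust(length, pad)


cities = ["beijing", "shanghai", "tokyo"]
d = build_dictionary(cities)
print(encode_dictionary("tokyo", d))       # 1-byte ID
print(encode_fixed_length("shanghai", 4))  # value cut to 4 bytes
print(id_width(1_000_000_000))             # ID width for ~1 billion values
```

With a cardinality near 1 billion, the ID fits in 4 bytes per row against 100 bytes for a fixed length of 100, but the dictionary itself has to hold every distinct value.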

On Sat, Jan 9, 2016 at 4:00 PM, yu feng <[email protected]> wrote:

> Let me have a try to explain it.
>
> Cube size determines how the HBase table is split into regions after all
> cuboid files are generated. For example, if your cuboid files total 100GB,
> your cube size is set to "SMALL", and the property for SMALL is 10GB, Kylin
> will create the HBase table with 10 regions. It calculates the start rowkey
> and end rowkey of every region before creating the HTable, then creates the
> table with that split information.
>
> Rowkey column length is another thing. For every dimension you can either
> use a dictionary or set a rowkey column length. If you use a dictionary,
> Kylin will build a dictionary (a trie) for this column, which means every
> value of the dimension is encoded as a unique number. Because the
> dimension value is part of the HBase rowkey, a dictionary reduces the HBase
> table size. However, Kylin stores the dictionary in memory, so if the
> dimension cardinality is large, memory use becomes a problem. If you set the
> rowkey column length to N for a dimension, Kylin will not build a dictionary
> for it, and every value will be cut to an N-length string. So there is no
> dictionary in memory, but the rowkeys in the HBase table will be longer.
>
> Hope this helps.
>
> 2016-01-09 13:00 GMT+08:00 Kiriti Sai <[email protected]>:
>
> > Hi,
> > When using a UHC dimension, I've disabled the dictionary for that
> > dimension in the advanced settings and set the rowkey column length to
> > 100, since it's something like a text description. The data has around
> > 6.6 billion rows and I guess the cardinality is nearly 1 billion for
> > this column.
> > I know Kylin is not suited to such a scenario, but can someone
> > please explain to me the relationship between the cube size and the rowkey
> > column length? I'm asking this question just out of curiosity, since I
> > haven't found any explanation relating these two.
> >
> > Thank You.
> >
>

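The region arithmetic in yu feng's explanation (100GB of cuboid files with a 10GB cut gives 10 regions) can be sketched as below. This only illustrates the idea. The function names and the accumulate-until-the-cut rule for picking split keys are assumptions, not Kylin's actual code.

```python
import math


def region_count(total_cuboid_gb, region_cut_gb):
    """Number of regions: total cuboid size divided by the per-region cut."""
    return max(1, math.ceil(total_cuboid_gb / region_cut_gb))


def split_keys(sized_rowkeys, region_cut_gb):
    """Pick the start rowkey of each region after the first.

    `sized_rowkeys` is a list of (rowkey, size_gb) pairs in rowkey order.
    A new region starts once the current one has reached the cut size.
    """
    splits, accumulated = [], 0
    for rowkey, size_gb in sized_rowkeys:
        if accumulated >= region_cut_gb:
            splits.append(rowkey)  # this rowkey begins the next region
            accumulated = 0
        accumulated += size_gb
    return splits


print(region_count(100, 10))
print(split_keys([(b"a", 5), (b"b", 5), (b"c", 5), (b"d", 5)], 10))
```

A longer rowkey column makes every row larger, so the total cuboid size grows and, for the same cube size setting, so does the number of regions.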