Hello,

Thank you for the quick reply! Good to know.

As for the suggestions:

1) use derived columns -> unfortunately the logic of derived dimensions doesn't match our usecase 2) use multiple cubes -> as well, unfortunately we need to have all dimensions interconnected in one cube

We are currently evaluating the following approaches to deal with our situation:

*A.* Use *LIKE* in a naive way:

E.g. we have 60 columns named 'a', 'b', 'c', etc. Each one can have one of these three values: 'L', 'M', 'H' (+ NULL). So what we did is "rotate" the whole thing in such a way that the values become column names and the column names turn into values, all concatenated into one string(and sorted before concat). Thus, the column names are 'L', 'M' and 'H' and values would be like: 'acde', 'a', 'h', 'ace', 'dg', etc.

Now, when we want to get a count of *WHERE a = 'M' OR b = 'L'**OR c = 'L'*, we do *WHERE M LIKE '%a%' OR L LIKE '%b%'**OR L LIKE '%c%'*.

It looks very expensive but surprisingly for our amount of data (2 billion rows 6 million out of which have any values in those 60 columns) currently it works not so bad (response time p99 < 1s).

*B.* *Group several *dimensions (e.g. 3)*into one* and use something like a *bitmap*.

Again, e.g. we have 60 columns named 'a', 'b', 'c', etc. Each one can have one of these three values: 'L', 'M', 'H' (+ NULL). So we wanna group them into, say, columns named 'abc', 'def', 'ghi', etc. Then the values would be like 'L__', 'LM_', '_HL', 'HLM', etc. (The underscore means NULL.) Thus, we can imitate *WHERE a = 'M' OR b = 'L'**OR c = 'L'* with *WHERE abc LIKE 'M__' OR abc LIKE '_L_'**OR c LIKE '__L'*. Presumably such LIKE expressions with single-character wildcards will be easier for Kylin to handle than those with '%'. We haven't tried that yet though.

*C.* Similar to the previous one but *replacing values with integers* and applying *bitwise operations with bitmasks*. E.g. L would be "encoded" with 01, M - 10, H - 11. Then a string 'HL_' would look like 110100 -> 52. Then when we want to get an 'L' in the middle ('_L_'), we could do something like *WHERE abc & 001100 = 01*. BUT it looks like bitwise operations are not supported neither by ANSI SQL nor by Kylin. And if they were, we are not sure it would be faster than a dictionary.

Would you please provide us with some feedback on what you think about the approaches described above and which one you think would actually work/be the best?

Thank you!


Best regards
Victoria

On 18.06.2016 04:07, ShaoFeng Shi wrote:
Almost true; You can think Kylin is 64 bit, in theory it supports up to 63
dimension in one cube;

There is no plan to extend to 128 or more in near term I believe; Since in
most of the cases the dimension number wouldn't exceed 20, 64 is already
"redundant" and causing extra space;

With so many dimensions, there must be room for optimization; You can try
some ways like:
1) extract some columns to lookup tables, and create them as "derived"
dimension in the cube;
2) or create multiple cubes, each serving a part of these columns;

If you have other way, please also share with the community; Thanks;


2016-06-18 0:01 GMT+08:00 Victoria Tskhay <[email protected]>:

Hello,

It looks like the max number of dimensions in one cube is 62, is that
correct?

We would like to add more than that. That may sound crazy, I know, but we
have a special case where all the dimensions have low cardinality (3) and
the data is very sparse. We already tried with 62 dimensions and it works
great.

Is there any way to work around that limit? What would you suggest? Thank
you!



Best regards
--
Victoria Tskhay

*Java Backend Developer*I glispa GmbH

Sonnenburger Straße 73, 10437 Berlin, Germany
E [email protected] <//
e.mail.ru/compose/?mailto=mailto%[email protected]>
Skype: vikatskhay I www.glispa.com <http://www.glispa.com>

Sitz Berlin, AG Charlottenburg HRB 114678B




--
Victoria Tskhay

*Java Backend Developer*I glispa GmbH

Sonnenburger Straße 73, 10437 Berlin, Germany
E [email protected] <//e.mail.ru/compose/?mailto=mailto%[email protected]>
Skype: vikatskhay I www.glispa.com <http://www.glispa.com>

Sitz Berlin, AG Charlottenburg HRB 114678B

Reply via email to