Hello,
Thank you for the quick reply! Good to know.
As for the suggestions:
1) use derived columns -> unfortunately the logic of derived dimensions
doesn't match our usecase
2) use multiple cubes -> as well, unfortunately we need to have all
dimensions interconnected in one cube
We are currently evaluating the following approaches to deal with our
situation:
*A.* Use *LIKE* in a naive way:
E.g. we have 60 columns named 'a', 'b', 'c', etc. Each one can have one
of these three values: 'L', 'M', 'H' (+ NULL). So what we did is
"rotate" the whole thing in such a way that the values become column
names and the column names turn into values, all concatenated into one
string(and sorted before concat). Thus, the column names are 'L', 'M'
and 'H' and values would be like: 'acde', 'a', 'h', 'ace', 'dg', etc.
Now, when we want to get a count of *WHERE a = 'M' OR b = 'L'**OR c =
'L'*, we do *WHERE M LIKE '%a%' OR L LIKE '%b%'**OR L LIKE '%c%'*.
It looks very expensive but surprisingly for our amount of data (2
billion rows 6 million out of which have any values in those 60 columns)
currently it works not so bad (response time p99 < 1s).
*B.* *Group several *dimensions (e.g. 3)*into one* and use something
like a *bitmap*.
Again, e.g. we have 60 columns named 'a', 'b', 'c', etc. Each one can
have one of these three values: 'L', 'M', 'H' (+ NULL). So we wanna
group them into, say, columns named 'abc', 'def', 'ghi', etc. Then the
values would be like 'L__', 'LM_', '_HL', 'HLM', etc. (The underscore
means NULL.) Thus, we can imitate *WHERE a = 'M' OR b = 'L'**OR c = 'L'*
with *WHERE abc LIKE 'M__' OR abc LIKE '_L_'**OR c LIKE '__L'*.
Presumably such LIKE expressions with single-character wildcards will be
easier for Kylin to handle than those with '%'. We haven't tried that
yet though.
*C.* Similar to the previous one but *replacing values with integers*
and applying *bitwise operations with bitmasks*. E.g. L would be
"encoded" with 01, M - 10, H - 11. Then a string 'HL_' would look like
110100 -> 52. Then when we want to get an 'L' in the middle ('_L_'), we
could do something like *WHERE abc & 001100 = 01*. BUT it looks like
bitwise operations are not supported neither by ANSI SQL nor by Kylin.
And if they were, we are not sure it would be faster than a dictionary.
Would you please provide us with some feedback on what you think about
the approaches described above and which one you think would actually
work/be the best?
Thank you!
Best regards
Victoria
On 18.06.2016 04:07, ShaoFeng Shi wrote:
Almost true; You can think Kylin is 64 bit, in theory it supports up to 63
dimension in one cube;
There is no plan to extend to 128 or more in near term I believe; Since in
most of the cases the dimension number wouldn't exceed 20, 64 is already
"redundant" and causing extra space;
With so many dimensions, there must be room for optimization; You can try
some ways like:
1) extract some columns to lookup tables, and create them as "derived"
dimension in the cube;
2) or create multiple cubes, each serving a part of these columns;
If you have other way, please also share with the community; Thanks;
2016-06-18 0:01 GMT+08:00 Victoria Tskhay <[email protected]>:
Hello,
It looks like the max number of dimensions in one cube is 62, is that
correct?
We would like to add more than that. That may sound crazy, I know, but we
have a special case where all the dimensions have low cardinality (3) and
the data is very sparse. We already tried with 62 dimensions and it works
great.
Is there any way to work around that limit? What would you suggest? Thank
you!
Best regards
--
Victoria Tskhay
*Java Backend Developer*I glispa GmbH
Sonnenburger Straße 73, 10437 Berlin, Germany
E [email protected] <//
e.mail.ru/compose/?mailto=mailto%[email protected]>
Skype: vikatskhay I www.glispa.com <http://www.glispa.com>
Sitz Berlin, AG Charlottenburg HRB 114678B
--
Victoria Tskhay
*Java Backend Developer*I glispa GmbH
Sonnenburger Straße 73, 10437 Berlin, Germany
E [email protected]
<//e.mail.ru/compose/?mailto=mailto%[email protected]>
Skype: vikatskhay I www.glispa.com <http://www.glispa.com>
Sitz Berlin, AG Charlottenburg HRB 114678B