Agree with yu feng; you need to think about whether you really need to build
such a high-cardinality dimension into the cube.

For example, if the column is something like a free-text description or a
timestamp, it doesn't make sense to have it in the cube, as Kylin is an OLAP
engine, not a general-purpose database; you'd better redesign the cube.

If it is something like a "seller_id" (assuming you have a large number of
sellers, like eBay) and you need to aggregate the data by each seller_id,
that is a valid UHC case.

Think it over and then decide how to move on.
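
To make it concrete why that step runs out of memory on a column with 50
million distinct values, here is a rough sketch of what the single reducer
behind 'Extract Fact Table Distinct Columns' has to do, following yu feng's
description below. It is a simplified illustration only, not Kylin's actual
source code, and the class name is made up for this example:

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: a single reducer receives the (combiner-deduplicated) values
// from every mapper and must hold each distinct value in memory, so heap
// usage grows with cardinality * average value size:
//   50,000,000 values * ~32 bytes ~= 1.5 GB, before any JVM object overhead.
public class FactDistinctSketchReducer
        extends Reducer<IntWritable, Text, IntWritable, Text> {

    @Override
    protected void reduce(IntWritable columnIndex, Iterable<Text> values,
                          Context context)
            throws IOException, InterruptedException {
        // Each mapper already removed its local duplicates in the combiner,
        // but the same value can still arrive once per mapper, so the reducer
        // deduplicates again -- keeping every distinct value on its heap.
        Set<String> distinctValues = new HashSet<String>();
        for (Text value : values) {
            distinctValues.add(value.toString());
        }
        // With ~50M distinct strings this Set alone is on the order of 1.5 GB,
        // which is what surfaces as "GC overhead limit exceeded" unless
        // mapreduce.reduce.memory.mb / mapreduce.reduce.java.opts are raised.
        for (String value : distinctValues) {
            context.write(columnIndex, new Text(value));
        }
    }
}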

2016-01-09 9:52 GMT+08:00 yu feng <[email protected]>:

> Assume the average size of this column is 32 bytes; a cardinality of 50
> million then means about 1.5 GB. In the 'Fact Table Distinct Columns' step,
> the mappers read from the intermediate table and remove duplicate values
> (in the Combiner). However, this job starts more than one mapper and only
> one reducer, so the input to the reducer is more than 1.5 GB, and in the
> reduce function Kylin creates a new Set to hold all the unique values,
> which is another 1.5 GB.
>
> I have encountered this problem, and I had to change the MR config
> properties for every job. I modified these properties:
>     <property>
>         <name>mapreduce.reduce.java.opts</name>
>         <value>-Xmx6000M</value>
>         <description>Larger heap-size for child jvms of
> reduces.</description>
>     </property>
>
>     <property>
>         <name>mapreduce.reduce.memory.mb</name>
>         <value>8000</value>
>         <description>Larger resource limit for reduces.</description>
>     </property>
> You can check the values of those properties currently in use and increase
> them.
>
> At last, ask yourself: do you really need all the detailed values of those
> two columns? If not, you can create a view to change the source data, or
> just not use a dictionary when creating the cube and instead set the length
> value for them in the 'Advanced Setting' step.
>
> Hope this is helpful to you.
>
> 2016-01-09 6:17 GMT+08:00 zhong zhang <[email protected]>:
>
> > Hi All,
> >
> > There are two ultra-high-cardinality columns in our cube. Both of them
> > are over 50 million in cardinality. When building the cube, it keeps
> > giving us the error "GC overhead limit exceeded" for the reduce jobs at
> > the step Extract Fact Table Distinct Columns.
> >
> > We've just updated to version 1.2.
> >
> > Can anyone give us some ideas on how to solve this issue?
> >
> > Best regards,
> > Zhong
> >
>
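
One more note on yu feng's last suggestion: if you keep the columns but skip
the dictionary, that corresponds to the per-column dictionary / length
settings in the 'Advanced Setting' step he mentions. In the cube descriptor
JSON this ends up looking roughly like the snippet below; the column name is
just your "seller_id" example and the field names are from my memory of the
1.x descriptor format, so double-check against your own cube desc and make
the change through the GUI:

"rowkey_columns": [
  {
    "column": "SELLER_ID",
    "length": 18,
    "dictionary": null,
    "mandatory": false
  }
]

With "dictionary" set to null and a non-zero "length", no dictionary is built
for that column and its values go into the rowkey as fixed-length bytes
(values longer than the configured length get truncated), which sidesteps the
dictionary-building memory issue for that column.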



-- 
Best regards,

Shaofeng Shi
