Building in batches of 15 days as suggested, the cube build succeeded, but
when the cube segments were auto-merged, an error appeared:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2147)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:2102)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2123)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:2078)
    at org.apache.kylin.storage.hbase.HBaseResourceStore.putResourceImpl(HBaseResourceStore.java:239)
    at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:208)
    at org.apache.kylin.dict.DictionaryManager.save(DictionaryManager.java:413)
    at org.apache.kylin.dict.DictionaryManager.saveNewDict(DictionaryManager.java:209)
    at org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:176)
    at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:269)
    at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.mergeDictionaries(MergeDictionaryStep.java:145)
    at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.makeDictForNewSegment(MergeDictionaryStep.java:135)
    at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.doWork(MergeDictionaryStep.java:67)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
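The trace shows this is not an ordinary heap shortage. MergeDictionaryStep
serializes the merged dictionary through IOUtils.copy into a
ByteArrayOutputStream, whose grow() enlarges its backing byte[] by doubling
via Arrays.copyOf; since Java arrays are indexed by int, once the serialized
dictionary passes roughly 1 GB, the next doubling requests an array near
Integer.MAX_VALUE and the JVM rejects it no matter how large -Xmx is. A
minimal demonstration of the JVM behavior (not Kylin code; observed on
HotSpot JVMs):

    public class ArrayLimitDemo {
        public static void main(String[] args) {
            // HotSpot caps array length slightly below Integer.MAX_VALUE, so this
            // throws "java.lang.OutOfMemoryError: Requested array size exceeds VM
            // limit" even when the heap is far larger than 2 GB.
            byte[] huge = new byte[Integer.MAX_VALUE];
            System.out.println(huge.length); // never reached
        }
    }

This is why keeping each segment's merged dictionary small (fewer days per
segment, lower-cardinality dictionary columns) helps, while simply raising
the heap does not.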
-----Original Message-----
From: Luke Han [mailto:[email protected]]
Sent: November 12, 2016, 00:01
To: [email protected]
Cc: dev
Subject: Re: Cube build optimization inquiry
Don't try to run such a huge job in one go; please run the builds one by
one, for example, run 1 month of data and then the next one...
Best Regards!
---------------------
Luke Han
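The "one by one" advice can be scripted against Kylin's REST API. Below is a
minimal sketch, not a definitive implementation: it assumes the documented
build endpoint (PUT /kylin/api/cubes/{cubeName}/rebuild; verify the path for
your Kylin version), the default ADMIN:KYLIN credentials, and a hypothetical
host and cube named my_cube. It submits ten one-month build jobs:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.time.ZoneOffset;
    import java.time.ZonedDateTime;
    import java.util.Base64;

    public class MonthlyBuild {
        public static void main(String[] args) throws Exception {
            // Hypothetical host and cube name; Kylin expects segment boundaries
            // as epoch-millisecond timestamps in UTC.
            String endpoint = "http://kylin-host:7070/kylin/api/cubes/my_cube/rebuild";
            String auth = Base64.getEncoder()
                    .encodeToString("ADMIN:KYLIN".getBytes(StandardCharsets.UTF_8));
            ZonedDateTime start = ZonedDateTime.of(2016, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);
            for (int i = 0; i < 10; i++) { // ten one-month segments instead of one 10-month job
                long from = start.plusMonths(i).toInstant().toEpochMilli();
                long to = start.plusMonths(i + 1).toInstant().toEpochMilli();
                String body = String.format(
                        "{\"startTime\": %d, \"endTime\": %d, \"buildType\": \"BUILD\"}",
                        from, to);
                HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
                conn.setRequestMethod("PUT");
                conn.setRequestProperty("Authorization", "Basic " + auth);
                conn.setRequestProperty("Content-Type", "application/json");
                conn.setDoOutput(true);
                try (OutputStream out = conn.getOutputStream()) {
                    out.write(body.getBytes(StandardCharsets.UTF_8));
                }
                System.out.println("Segment " + (i + 1) + " submitted, HTTP " + conn.getResponseCode());
                // In practice, poll the returned job's status (GET /kylin/api/jobs/{jobId})
                // and wait for it to finish before submitting the next segment.
            }
        }
    }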
On 2016-11-10 at 14:54 GMT+08:00, 仇同心 <[email protected]> wrote:
> Hi all,
>
> We are currently running into a problem while building a cube: the
> cardinality of the cube's dimensions is not very high, but the columns
> used in the measures have very high cardinality, so the Build Dimension
> Dictionary step consumes a huge amount of memory on the local machine.
> The selected measure columns have cardinalities in the tens of millions,
> hundreds of millions, even around a billion, and most of the measures are
> SUM and exact Count_distinct. The data covers 10 months; our plan was to
> process all 10 months of historical data in one run and then switch to
> incremental daily jobs.
>
> The server has 125 GB of memory. Step #4 (Step Name: Build Dimension
> Dictionary) keeps running for a very long time and eventually causes an
> out-of-memory error.
>
> Is there a good optimization approach for measures with such high
> cardinality?
>
> Thanks~
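A back-of-envelope estimate shows why exact count-distinct at this scale is
hard (the per-entry cost is an assumption for illustration, not a measured
Kylin figure): a dictionary over roughly 10^9 distinct values at even ~10
bytes per entry is on the order of 10 GB, and its serialized form alone
exceeds the 2 GB Java array limit that surfaces in the stack trace above. By
contrast, an approximate HyperLogLog counter such as Kylin's hllc measure
type with precision 12 keeps 2^12 = 4096 registers, about 4 KB per counter,
with a standard error around 1.04/sqrt(4096) ≈ 1.6%; where exact results are
not mandatory, switching those measures to the approximate variant avoids
building the dictionary entirely.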