Hi,

You can check the "Extract Fact Table Distinct Columns" section in https://kylin.apache.org/docs/howto/howto_optimize_build.html
Usually this is caused by one of the following:
1) the cube has too many dimensions;
2) there is an ultra-high-cardinality column in the dimension list (e.g., a UUID column, a timestamp column, etc.);
3) the Hadoop MapReduce memory configuration is too small.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]

On Thu, Jan 14, 2021 at 11:22 AM, Ahmad Hammad <[email protected]> wrote:
> Dear,
>
> Hope all is well.
>
> We are looking to use Apache Kylin instead of SSAS for our business
> analysis dashboard product. We are facing a problem building the cube.
> It contains two Hive tables: one fact table and one dimension table.
>
> The fact table has 47271784 rows and a total size of 5326550430 bytes,
> as shown by SHOW TBLPROPERTIES in the Hive CLI.
>
> The dimension table has 5261766 rows and a total size of 1174440814 bytes,
> as shown by SHOW TBLPROPERTIES in the Hive CLI.
>
> The build process failed in step 3:
>
> #3 Step Name: Extract Fact Table Distinct Columns
> Data Size: 16.19 KB
> Duration: 11.78 mins Waiting: 13 seconds
>
> The logs show a Java heap space error, as follows:
>
> org.apache.kylin.engine.mr.exception.MapReduceException: Counters: 55
>   File System Counters
>     FILE: Number of bytes read=323698
>     FILE: Number of bytes written=29783830
>     FILE: Number of read operations=0
>     FILE: Number of large read operations=0
>     FILE: Number of write operations=0
>     HDFS: Number of bytes read=252673677
>     HDFS: Number of bytes written=16576
>     HDFS: Number of read operations=195
>     HDFS: Number of large read operations=0
>     HDFS: Number of write operations=3
>   Job Counters
>     Failed reduce tasks=4
>     Launched map tasks=47
>     Launched reduce tasks=5
>     Data-local map tasks=47
>     Total time spent by all maps in occupied slots (ms)=4363352
>     Total time spent by all reduces in occupied slots (ms)=2032100
>     Total time spent by all map tasks (ms)=1090838
>     Total time spent by all reduce tasks (ms)=508025
>     Total vcore-milliseconds taken by all map tasks=1090838
>     Total vcore-milliseconds taken by all reduce tasks=508025
>     Total megabyte-milliseconds taken by all map tasks=1117018112
>     Total megabyte-milliseconds taken by all reduce tasks=520217600
>   Map-Reduce Framework
>     Map input records=47271784
>     Map output records=5261813
>     Map output bytes=57539075
>     Map output materialized bytes=15536194
>     Input split bytes=138932
>     Combine input records=5261813
>     Combine output records=5261813
>     Reduce input groups=1
>     Reduce shuffle bytes=340412
>     Reduce input records=47
>     Reduce output records=0
>     Spilled Records=5261860
>     Shuffled Maps =47
>     Failed Shuffles=0
>     Merged Map outputs=47
>     GC time elapsed (ms)=68095
>     CPU time spent (ms)=1246430
>     Physical memory (bytes) snapshot=44485660672
>     Virtual memory (bytes) snapshot=137661587456
>     Total committed heap usage (bytes)=41749577728
>     Peak Map Physical memory (bytes)=960831488
>     Peak Map Virtual memory (bytes)=2891886592
>     Peak Reduce Physical memory (bytes)=305377280
>     Peak Reduce Virtual memory (bytes)=2667810816
>   Shuffle Errors
>     BAD_ID=0
>     CONNECTION=0
>     IO_ERROR=0
>     WRONG_LENGTH=0
>     WRONG_MAP=0
>     WRONG_REDUCE=0
>   File Input Format Counters
>     Bytes Read=0
>   File Output Format Counters
>     Bytes Written=0
>   org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter
>     BYTES=1563833108
> Job Diagnostics:Task failed task_1610370996803_0012_r_000000
> Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 killedReduces: 0
>
> Failure task Diagnostics:
> Error: Java heap space
>
>   at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:234)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>   at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>   at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>
> I tried increasing the memory allocated to Kylin to 17 GB in the setenv.sh
> file, as recommended, as follows:
>
> export KYLIN_JVM_SETTINGS="-Xms17g -Xmx17g -Xss1024K -XX:MaxPermSize=1g
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -Xloggc:$KYLIN_HOME/logs/kylin.gc.%p -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
>
> but it still gives this error.
>
> I am using Kylin v3.1.1 on HDP 3.0; the server has 32 GB of RAM and a
> 4-core i7 CPU.
>
> Please let me know if you need any more information from my side so you
> can guide us to where the problem is, the needed solution, and the
> recommended settings.
>
> Your quick response is highly appreciated; we need to know how reliable
> Kylin is and what level of support it provides.
>
> Best regards,
>
> Ahmad Hammad
> Chief Technology Officer
> Website: http://beyegroup.com/
> Mobile: 962 79640 1490
> Email: [email protected]
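Regarding cause 3 in the reply above: KYLIN_JVM_SETTINGS in setenv.sh only sizes the Kylin server JVM, while the "Java heap space" error in the log comes from a reduce task (task_1610370996803_0012_r_000000) that runs in its own YARN container with its own heap, so raising the server heap does not help that task. A minimal sketch for $KYLIN_HOME/conf/kylin.properties follows, assuming your Kylin 3.x build honours the "kylin.engine.mr.config-override." prefix for passing Hadoop settings to the MapReduce jobs it submits (please confirm against the configuration page for your version). The 4096 MB container and roughly 80% heap are illustrative values, not tuned recommendations.

```properties
# Assumption: Kylin copies any "kylin.engine.mr.config-override.<hadoop-key>"
# entry into the MapReduce jobs it submits, e.g. "Extract Fact Table Distinct Columns".
# Illustrative values only: a 4 GB reducer container with a heap of about 80% of it.
kylin.engine.mr.config-override.mapreduce.reduce.memory.mb=4096
kylin.engine.mr.config-override.mapreduce.reduce.java.opts=-Xmx3276m
```

The container size must not exceed yarn.scheduler.maximum-allocation-mb in YARN. If the NodeManager runs on the same 32 GB host, the 17 GB given to the Kylin server also shrinks the memory left for containers, so a smaller Kylin heap may be needed. Restart Kylin so it reloads kylin.properties, then rebuild the segment.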
