How to increase map and reduce task number during #3 Step Name: Extract Fact Table Distinct Columns

2017-02-13 Thread peter zhang
Hi all,
I have a cube that builds very slowly, especially at step 3, *Extract Fact Table
Distinct Columns*.
I checked the job log in YARN and found that the map and reduce task numbers are
smaller than in the previous 2 steps. Could anyone tell me how the map and reduce
task numbers are determined, so that I can increase the job's parallelism?

Thanks!
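
For reference, a rough sketch of the knobs that usually govern the task counts here,
assuming Kylin 1.6 property names (please verify them against your kylin.properties
template). As far as I know, the reducer count of this particular step is largely fixed
by the number of dictionary-encoded columns, so the mapper-side settings are normally
the ones that help; the reducer settings below mainly affect the later cuboid-building
steps.

    # kylin.properties, or per-cube "Configuration Overrides"
    # rows of the intermediate flat table given to one mapper; lower it for more, smaller mappers
    kylin.job.mapreduce.mapper.input.rows=500000
    # MB of input per reducer, and the cap on reducer count (cuboid-building steps)
    kylin.job.mapreduce.default.reduce.input.mb=300
    kylin.job.mapreduce.max.reducer.number=500
    # generic Hadoop settings can also be pushed down to the MR jobs, e.g. a smaller split size
    kylin.job.mr.config.override.mapreduce.input.fileinputformat.split.maxsize=134217728

After changing these, resubmit the build and compare the task counts of step 3 in the YARN UI.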


Question about a slow Kylin query

2017-02-13 Thread 仇同心
Hi all,
We are currently running into a scenario where a Kylin query is slow.
The cube design is as follows:
{
  "uuid": "dfb77a08-f51d-4088-b559-1da67c28a068",
  "last_modified": 1486997875953,
  "version": "1.6.0",
  "name": "insu_t",
  "model_name": "insu_jdmall_model_test",
  "description": "",
  "null_string": null,
  "dimensions": [
{
  "name": "BRAND",
  "table": "DMT.DMT_KYLIN_JDMALL_ORDR_DTL_I_D",
  "column": "BRAND_CD",
  "derived": null
},
{
  "name": "DT",
  "table": "DMT.DMT_KYLIN_JDMALL_ORDR_DTL_I_D",
  "column": "DT",
  "derived": null
},
{
  "name": "DIM.DIM_DAY_DERIVED",
  "table": "DIM.DIM_DAY",
  "column": null,
  "derived": [
"DIM_DAY_NAME",
"DIM_WEEK_NAME",
"DIM_MONTH_NAME"
  ]
},
{
  "name": "FIRST",
 "table": "DIM.DIM_ITEM_GEN_THIRD_CATE_D",
  "column": "ITEM_FIRST_CATE_NAME",
  "derived": null
},
{
  "name": "SECOND",
  "table": "DIM.DIM_ITEM_GEN_THIRD_CATE_D",
  "column": "ITEM_SECOND_CATE_NAME",
  "derived": null
},
{
  "name": "THIRD",
  "table": "DIM.DIM_ITEM_GEN_THIRD_CATE_D",
  "column": "ITEM_THIRD_CATE_NAME",
  "derived": null
}
  ],
  "measures": [
{
  "name": "_COUNT_",
  "function": {
"expression": "COUNT",
"parameter": {
  "type": "constant",
  "value": "1",
  "next_parameter": null
},
"returntype": "bigint"
  },
  "dependent_measure_ref": null
},
{
  "name": "QTTY",
  "function": {
"expression": "SUM",
"parameter": {
  "type": "column",
  "value": "SALE_QTTY",
  "next_parameter": null
},
"returntype": "bigint"
  },
  "dependent_measure_ref": null
},
{
  "name": "BEFORE",
  "function": {
"expression": "SUM",
"parameter": {
  "type": "column",
  "value": "BEFORE_PREFR_AMOUNT",
  "next_parameter": null
},
"returntype": "decimal(25,4)"
  },
  "dependent_measure_ref": null
},
{
  "name": "USER",
  "function": {
"expression": "SUM",
"parameter": {
  "type": "column",
  "value": "USER_ACTUAL_PAY_AMOUNT",
  "next_parameter": null
},
   "returntype": "decimal(25,4)"
  },
  "dependent_measure_ref": null
},
{
  "name": "SALE",
  "function": {
"expression": "COUNT_DISTINCT",
"parameter": {
  "type": "column",
  "value": "SALE_ORD_ID",
  "next_parameter": null
},
"returntype": "bitmap"
  },
  "dependent_measure_ref": null
}
  ],
  "dictionaries": [],
  "rowkey": {
"rowkey_columns": [
  {
"column": "BRAND_CD",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "DT",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "ITEM_FIRST_CATE_NAME",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "ITEM_SECOND_CATE_NAME",
"encoding": "dict",
"isShardBy": false
  },
  {
"column": "ITEM_THIRD_CATE_NAME",
"encoding": "dict",
"isShardBy": false
  }
]
  },
  "hbase_mapping": {
"column_family": [
  {
"name": "F1",
"columns": [
  {
"qualifier": "M",
"measure_refs": [
  "_COUNT_",
  "QTTY",
  "BEFORE",
  "USER"
]
  }
]
  },
  {
"name": "F2",
"columns": [
  {
"qualifier": "M",
"measure_refs": [
  "SALE"
]
  }
]
  }
]
  },
  "aggregation_groups": [
{
  "includes": [
"BRAND_CD",
"DT",
"ITEM_FIRST_CATE_NAME",
"ITEM_SECOND_CATE_NAME",
"ITEM_THIRD_CATE_NAME"
  ],
  "select_rule": {
"hierarchy_dims": [],
"mandatory_dims": [],
"joint_dims": []
  }
}
  ],
  "signature": "Kl5sPTVN78bEYTGKoUOsWg==",
  "notify_list": [],
  "status_need_notify": [
"ERROR",
"DISCARDED",
"SUCCEED"
  ],
  "partition_date_start": 148374720,
  "partition_date_end": 31536,
  "auto_merge_time_ranges": [
60480,
241920
  ],
  "retention_range": 0,
  "engine_type": 2,
  "storage_type": 2,
  "override_kylin_properties": {
"kylin.hbase.region.cut": "1"
  }
}

The data volume is 14 days of data, and the cardinality of sale_ord_id is 150 million.
Select dt,item_second_cate_name,count(distinct sale_ord_id),sum(sale_qtty)
from DMT.DMT_KYLIN_JDMALL_ORDR_DTL_I_D a
left join dim.dim_day b on a.dt = b.dim_day_txdate
left join DIM.DIM_ITEM_GEN_THIRD_CATE_D c on a.item_third_cate_cd = 
c.item_third_cate_id
group by dt,item_second_cate_name;

This statement takes 37 seconds to execute; after removing count(distinct sale_ord_id),
the same query (grouped by dt, item_second_cate_name) takes 0.07 seconds.
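
With a cardinality of 150 million on sale_ord_id, the precise count-distinct measure
(returntype "bitmap") has to merge very large bitmaps at query time, which is the usual
reason for a 37-second response. If an approximate result is acceptable for this metric,
a HyperLogLog-based measure is the common alternative; a sketch of what the measure
definition could look like (the hllc precision, e.g. hllc12, trades error rate against
size and should be confirmed against the Kylin 1.6 measure documentation):

    {
      "name": "SALE",
      "function": {
        "expression": "COUNT_DISTINCT",
        "parameter": {
          "type": "column",
          "value": "SALE_ORD_ID",
          "next_parameter": null
        },
        "returntype": "hllc12"
      },
      "dependent_measure_ref": null
    }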

Re: kylin job stop accidentally and can resume success!

2017-02-13 Thread Alberto Ramón
Do you have the Resource Manager on a dedicated node (without containers or a
Node Manager)?
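
On the question further down this thread about a retry count and interval: the retries
visible in the Kylin log, RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000
MILLISECONDS), come from the Hadoop MR client that Kylin uses to poll job status, not from
Kylin itself. A sketch of the client-side settings that, to my knowledge, control this
behaviour (names and defaults should be verified against your Hadoop version; the values
are examples only):

    # mapred-site.xml / yarn-site.xml on the Kylin job server
    yarn.app.mapreduce.client.max-retries = 10
    yarn.resourcemanager.connect.max-wait.ms = 120000
    yarn.resourcemanager.connect.retry-interval.ms = 10000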

2017-02-13 17:38 GMT+01:00 不清 <452652...@qq.com>:

> I checked the configuration in CM.
>
> Java Heap Size of ResourceManager in Bytes =1536 MiB
> Container Memory Minimum =1GiB
>
> Container Memory Increment =512MiB
>
> Container Memory Maximum =8GiB
>
> -- Original Message --
> *From:* "Alberto Ramón";;
> *Sent:* Tuesday, February 14, 2017, 00:34
> *To:* "user";
> *Subject:* Re: kylin job stop accidentally and can resume success!
>
> check this
> :
> "Basically, it means RM can only allocate memory to containers in
> increments of .  . . "
>
> TIP: is your RM on a worker node? If so, that can be the problem
> (it's a good idea to put the YARN master, the RM, on a dedicated node)
>
>
> 2017-02-13 17:19 GMT+01:00 不清 <452652...@qq.com>:
>
>> How can I get this heap size?
>>
>>
>> -- Original Message --
>> *From:* "Alberto Ramón";;
>> *Sent:* Tuesday, February 14, 2017, 00:17
>> *To:* "user";
>> *Subject:* Re: kylin job stop accidentally and can resume success!
>>
>> Sounds like a problem with YARN's Resource Manager (RM); check the heap
>> size for the RM.
>> Kylin lost connectivity with the RM.
>>
>> 2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:
>>
>>> Hello, Kylin community!
>>>
>>> Sometimes my jobs stop unexpectedly. They can stop at any step.
>>>
>>> kylin log is like :
>>> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
>>> hbase.HBaseResourceStore:262 : Update row 
>>> /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
>>> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
>>> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 :
>>> Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504.
>>> Already tried 0 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>>> sleepTime=1000 MILLISECONDS)
>>> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 :
>>> Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504.
>>> Already tried 1 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>>> sleepTime=1000 MILLISECONDS)
>>> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 :
>>> Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504.
>>> Already tried 2 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>>> sleepTime=1000 MILLISECONDS)
>>> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
>>> mapred.ClientServiceDelegate:273 : Application state is completed.
>>> FinalApplicationStatus=KILLED. Redirecting to job history server
>>> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
>>> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>>>
>>> CM log is like:
>>> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
>>> User Name: tmn
>>> Queue: root.tmn
>>> State: KILLED
>>> Uberized: false
>>> Submitted: Sun Feb 12 19:19:24 CST 2017
>>> Started: Sun Feb 12 19:19:38 CST 2017
>>> Finished: Sun Feb 12 20:30:13 CST 2017
>>> Elapsed: 1hrs, 10mins, 35sec
>>> Diagnostics:
>>> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
>>> 10.180.212.38
>>> Job received Kill while in RUNNING state.
>>> Average Map Time 24mins, 48sec
>>>
>>> mapreduce job log
>>> Task KILL is received. Killing attempt!
>>>
>>> And when this happens, resuming the job lets it complete successfully! I mean
>>> it did not stop because of an error!
>>>
>>> What's the problem?
>>>
>>> My Hadoop cluster is very busy, and this situation happens very often.
>>>
>>> Can I set the retry count and the retry interval?
>>>
>>
>>
>


Re: kylin job stop accidentally and can resume success!

2017-02-13 Thread Alberto Ramón
check this
:
"Basically, it means RM can only allocate memory to containers in
increments of .  . . "

TIP: is your RM on a worker node? If so, that can be the problem
(it's a good idea to put the YARN master, the RM, on a dedicated node)
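
For example, with the container settings reported from CM earlier in this thread
(minimum 1 GiB, increment 512 MiB), a container request of 8500 MB would, as far as I
understand the scheduler's normalization, be rounded up to the next multiple of the
increment, 8704 MB (17 × 512 MiB), and any smaller request would still receive at
least a full 1 GiB container.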


2017-02-13 17:19 GMT+01:00 不清 <452652...@qq.com>:

> How can I get this heap size?
>
>
> -- Original Message --
> *From:* "Alberto Ramón";;
> *Sent:* Tuesday, February 14, 2017, 00:17
> *To:* "user";
> *Subject:* Re: kylin job stop accidentally and can resume success!
>
> Sounds like a problem with YARN's Resource Manager (RM); check the heap
> size for the RM.
> Kylin lost connectivity with the RM.
>
> 2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:
>
>> Hello, Kylin community!
>>
>> Sometimes my jobs stop unexpectedly. They can stop at any step.
>>
>> kylin log is like :
>> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
>> hbase.HBaseResourceStore:262 : Update row 
>> /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
>> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
>> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 0
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 1
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 2
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
>> mapred.ClientServiceDelegate:273 : Application state is completed.
>> FinalApplicationStatus=KILLED. Redirecting to job history server
>> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
>> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>>
>> CM log is like:
>> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
>> User Name: tmn
>> Queue: root.tmn
>> State: KILLED
>> Uberized: false
>> Submitted: Sun Feb 12 19:19:24 CST 2017
>> Started: Sun Feb 12 19:19:38 CST 2017
>> Finished: Sun Feb 12 20:30:13 CST 2017
>> Elapsed: 1hrs, 10mins, 35sec
>> Diagnostics:
>> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
>> 10.180.212.38
>> Job received Kill while in RUNNING state.
>> Average Map Time 24mins, 48sec
>>
>> mapreduce job log
>> Task KILL is received. Killing attempt!
>>
>> And when this happens, resuming the job lets it complete successfully! I mean
>> it did not stop because of an error!
>>
>> What's the problem?
>>
>> My Hadoop cluster is very busy, and this situation happens very often.
>>
>> Can I set the retry count and the retry interval?
>>
>
>


Re: kylin job stop accidentally and can resume success!

2017-02-13 Thread 不清
How can I get this heap size?




-- Original Message --
From: "Alberto Ramón";;
Sent: Tuesday, February 14, 2017, 00:17
To: "user";
Subject: Re: kylin job stop accidentally and can resume success!



Sounds like a problem with YARN's Resource Manager (RM); check the heap size for
the RM.

Kylin lost connectivity with the RM.


2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:
Hello, Kylin community!


Sometimes my jobs stop unexpectedly. They can stop at any step.


kylin log is like :
2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8] hbase.HBaseResourceStore:262 : 
Update row /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02 from oldTs: 
1486999611524, to newTs: 1486999621545, operation result: true
2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying 
connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 0 
time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, 
sleepTime=1000 MILLISECONDS)
2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying 
connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 1 
time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, 
sleepTime=1000 MILLISECONDS)
2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying 
connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 2 
time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, 
sleepTime=1000 MILLISECONDS)
2017-02-13 23:27:15,495 INFO  [pool-8-thread-8] 
mapred.ClientServiceDelegate:273 : Application state is completed. 
FinalApplicationStatus=KILLED. Redirecting to job history server
2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 : 
updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02



CM log is like:
Job Name:   Kylin_Cube_Builder_user_all_cube_2_only_msisdn
User Name:  tmn
Queue:  root.tmn
State:  KILLED
Uberized:   false
Submitted:  Sun Feb 12 19:19:24 CST 2017
Started:    Sun Feb 12 19:19:38 CST 2017
Finished:   Sun Feb 12 20:30:13 CST 2017
Elapsed:    1hrs, 10mins, 35sec
Diagnostics:
Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at 10.180.212.38
Job received Kill while in RUNNING state.
Average Map Time:   24mins, 48sec



mapreduce job log
Task KILL is received. Killing attempt!


And when this happens, resuming the job lets it complete successfully! I mean it
did not stop because of an error!


What's the problem?


My Hadoop cluster is very busy, and this situation happens very often.


Can I set the retry count and the retry interval?

Re: kylin job stop accidentally and can resume success!

2017-02-13 Thread Alberto Ramón
Sounds like a problem with YARN's Resource Manager (RM); check the heap size
for the RM.
Kylin lost connectivity with the RM.
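
On the follow-up question in this thread about how to get this heap size: on a
Cloudera-managed cluster it is the CM setting "Java Heap Size of ResourceManager in
Bytes"; on a plain Apache Hadoop layout it usually lives in yarn-env.sh. A minimal
sketch, with example values only:

    # see what the running RM was started with (look for -Xmx)
    ps -ef | grep ResourceManager

    # etc/hadoop/yarn-env.sh on the ResourceManager host
    export YARN_RESOURCEMANAGER_HEAPSIZE=4096        # MB
    export YARN_RESOURCEMANAGER_OPTS="-XX:+UseG1GC $YARN_RESOURCEMANAGER_OPTS"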

2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:

> Hello, Kylin community!
>
> Sometimes my jobs stop unexpectedly. They can stop at any step.
>
> kylin log is like :
> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
> hbase.HBaseResourceStore:262 : Update row 
> /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 0
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 1
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 2
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
> mapred.ClientServiceDelegate:273 : Application state is completed.
> FinalApplicationStatus=KILLED. Redirecting to job history server
> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>
> CM log is like:
> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
> User Name: tmn
> Queue: root.tmn
> State: KILLED
> Uberized: false
> Submitted: Sun Feb 12 19:19:24 CST 2017
> Started: Sun Feb 12 19:19:38 CST 2017
> Finished: Sun Feb 12 20:30:13 CST 2017
> Elapsed: 1hrs, 10mins, 35sec
> Diagnostics:
> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
> 10.180.212.38
> Job received Kill while in RUNNING state.
> Average Map Time 24mins, 48sec
>
> mapreduce job log
> Task KILL is received. Killing attempt!
>
> And when this happens, resuming the job lets it complete successfully! I mean
> it did not stop because of an error!
>
> What's the problem?
>
> My Hadoop cluster is very busy, and this situation happens very often.
>
> Can I set the retry count and the retry interval?
>


Re: Seeking help with an ultra-high-cardinality dimension

2017-02-13 Thread ShaoFeng Shi
root cause:  java.lang.OutOfMemoryError: GC overhead

How many dimensions are in the cube? If you can share the cube JSON, that would
be helpful for troubleshooting.

2017-02-13 10:53 GMT+08:00 不清 <452652...@qq.com>:

> Hello, Kylin community!
>
> The dimension is a mobile phone number; its distinct-value count is between 5 million and 15 million.
> I used integer encoding with the length set to 8. The test data volume is roughly 100 million rows. Is there a problem with my settings?
>
> Does Kylin need any special settings for an ultra-high-cardinality dimension?
>
> The Kylin version I am using is 1.6.
>
> Thanks
>
> The step that fails is "build cube":
> the map tasks take an extremely long time and finally fail with the error below
> Error: java.io.IOException: Failed to build cube in mapper 36 at
> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.
> cleanup(InMemCuboidMapper.java:145) at 
> org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at
> java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:415) at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1642) at org.apache.hadoop.mapred.
> YarnChild.main(YarnChild.java:163) Caused by: 
> java.util.concurrent.ExecutionException:
> java.lang.RuntimeException: java.io.IOException: java.io.IOException:
> java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.
> FutureTask.report(FutureTask.java:122) at java.util.concurrent.
> FutureTask.get(FutureTask.java:188) at org.apache.kylin.engine.mr.
> steps.InMemCuboidMapper.cleanup(InMemCuboidMapper.java:143) ... 8 more
> Caused by: java.lang.RuntimeException: java.io.IOException:
> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at org.apache.kylin.cube.
> inmemcubing.AbstractInMemCubeBuilder$1.run(AbstractInMemCubeBuilder.java:84)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException:
> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at org.apache.kylin.cube.
> inmemcubing.DoggedCubeBuilder$BuildOnce.build(DoggedCubeBuilder.java:128)
> at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder.
> build(DoggedCubeBuilder.java:75) at org.apache.kylin.cube.inmemcubing.
> AbstractInMemCubeBuilder$1.run(AbstractInMemCubeBuilder.java:82) ... 5
> more Caused by: java.io.IOException: java.lang.RuntimeException:
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.abort(DoggedCubeBuilder.java:196)
> at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$
> BuildOnce.checkException(DoggedCubeBuilder.java:169) at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.build(DoggedCubeBuilder.java:116)
> ... 7 more Caused by: java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at org.apache.kylin.cube.
> inmemcubing.DoggedCubeBuilder$SplitThread.run(DoggedCubeBuilder.java:289)
> Caused by: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.throwExceptionIfAny(InMemCubeBuilder.java:226)
> at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.
> build(InMemCubeBuilder.java:186) at org.apache.kylin.cube.
> inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:137) at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$SplitThread.run(DoggedCubeBuilder.java:284)
> Caused by: java.lang.OutOfMemoryError: Java heap space at
> java.math.BigInteger.(BigInteger.java:973) at
> java.math.BigInteger.valueOf(BigInteger.java:957) at
> java.math.BigDecimal.inflate(BigDecimal.java:3519) at
> java.math.BigDecimal.unscaledValue(BigDecimal.java:2205) at
> org.apache.kylin.metadata.datatype.BigDecimalSerializer.serialize(BigDecimalSerializer.java:56)
> at 
> org.apache.kylin.metadata.datatype.BigDecimalSerializer.serialize(BigDecimalSerializer.java:33)
> at org.apache.kylin.measure.MeasureCodec.encode(MeasureCodec.java:76) at
> org.apache.kylin.measure.BufferedMeasureCodec.encode(BufferedMeasureCodec.java:93)
> at org.apache.kylin.gridtable.GTAggregateScanner$AggregationCache$
> ReturningRecord.load(GTAggregateScanner.java:412) at
> org.apache.kylin.gridtable.GTAggregateScanner$AggregationCache$2.next(GTAggregateScanner.java:355)
> at 
> org.apache.kylin.gridtable.GTAggregateScanner$AggregationCache$2.next(GTAggregateScanner.java:342)
> at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.
> scanAndAggregateGridTable(InMemCubeBuilder.java:436) at
> org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.aggregateCuboid(InMemCubeBuilder.java:399)
> at 

Re: Seeking help with an ultra-high-cardinality dimension

2017-02-13 Thread Alberto Ramón
For B: it's a Java option (. . . java.opts).
  Check whether your JVM is very old; there are a lot of GC optimizations
in the latest versions of Java 8.

TIP 1: Check whether you can reduce the dimensionality of the cube or use AGG to
make the build process lighter.
You can take some ideas from this


TIP 2: solve problem A first, because if you enlarge the heap, B will get
worse
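
Putting error A and the G1GC suggestion together, a sketch of per-cube overrides using
the same property prefix quoted below (the sizes are placeholders to adapt to your YARN
container limits, not recommendations). If the in-memory cubing mapper keeps running out
of memory, forcing the layer algorithm, which I believe is kylin.cube.algorithm=layer in
1.6, is another option worth verifying:

    kylin.job.mr.config.override.mapreduce.map.memory.mb=8500
    kylin.job.mr.config.override.mapred.map.child.java.opts=-Xmx8g -XX:+UseG1GC
    kylin.cube.algorithm=layer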


2017-02-13 10:16 GMT+01:00 不清 <452652...@qq.com>:

> Thanks for the reply!
>
> For error A, I can set these parameters in Kylin.
>
> But for error B, should I fix this problem for the whole Hadoop cluster? Can
> you describe the parameter fix in detail?
>
> This really helps us a lot!
>
>
> -- Original Message --
> *From:* "Alberto Ramón";;
> *Sent:* Monday, February 13, 2017, 15:58
> *To:* "user";
> *Subject:* Re: Seeking help with an ultra-high-cardinality dimension
>
> Hello 不清
>
>
> From your errors: "Failed to build cube in mapper " &
> A- "java.lang.OutOfMemoryError: Java heap space at java" &
> B- "java.lang.OutOfMemoryError: GC overhead limit"
>
> For error A: check/override these parameters in Kylin:
>
>
> *   kylin.job.mr.config.override.mapred.map.child.java.opts=-Xmx8g  *
>
> *   kylin.job.mr.config.override.mapreduce.map.memory.mb=8500*
>
>
>
> *For error B:  (this is more complicated)*
>
> *   Check that you are using Java 8 or higher*
>
> *   Try with this *-XX:+UseG1GC
>
>Explanation: https://wiki.apache.org/solr/ShawnHeisey
>
>
> Yes, using an integer dictionary is the best option
>
>
>
> 2017-02-13 3:53 GMT+01:00 不清 <452652...@qq.com>:
>
>> Hello, Kylin community!
>>
>> The dimension is a mobile phone number; its distinct-value count is between 5 million and 15 million.
>> I used integer encoding with the length set to 8. The test data volume is roughly 100 million rows. Is there a problem with my settings?
>>
>> Does Kylin need any special settings for an ultra-high-cardinality dimension?
>>
>> The Kylin version I am using is 1.6.
>>
>> Thanks
>>
>> The step that fails is "build cube":
>> the map tasks take an extremely long time and finally fail with the error below
>> Error: java.io.IOException: Failed to build cube in mapper 36 at
>> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.cleanup(InMemCuboidMapper.java:145)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148) at
>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at
>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at
>> java.security.AccessController.doPrivileged(Native Method) at
>> javax.security.auth.Subject.doAs(Subject.java:415) at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused
>> by: java.util.concurrent.ExecutionException: java.lang.RuntimeException:
>> java.io.IOException: java.io.IOException: java.lang.RuntimeException:
>> java.io.IOException: java.lang.OutOfMemoryError: Java heap space at
>> java.util.concurrent.FutureTask.report(FutureTask.java:122) at
>> java.util.concurrent.FutureTask.get(FutureTask.java:188) at
>> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.cleanup(InMemCuboidMapper.java:143)
>> ... 8 more Caused by: java.lang.RuntimeException: java.io.IOException:
>> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.run(
>> AbstractInMemCubeBuilder.java:84) at java.util.concurrent.Executors
>> $RunnableAdapter.call(Executors.java:471) at
>> java.util.concurrent.FutureTask.run(FutureTask.java:262) at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException:
>> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnc
>> e.build(DoggedCubeBuilder.java:128) at org.apache.kylin.cube.inmemcub
>> ing.DoggedCubeBuilder.build(DoggedCubeBuilder.java:75) at
>> org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.run(
>> AbstractInMemCubeBuilder.java:82) ... 5 more Caused by:
>> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnc
>> e.abort(DoggedCubeBuilder.java:196) at org.apache.kylin.cube.inmemcub
>> ing.DoggedCubeBuilder$BuildOnce.checkException(DoggedCubeBuilder.java:169)
>> at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnc
>> e.build(DoggedCubeBuilder.java:116) ... 7 more Caused by:
>> java.lang.RuntimeException: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$SplitThr
>> ead.run(DoggedCubeBuilder.java:289) Caused by: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.throwExce
>> ptionIfAny(InMemCubeBuilder.java:226) at org.apache.kylin.cube.inmemcub

Re: New document: "How to optimize cube build"

2017-02-13 Thread ShaoFeng Shi
correct.

Get Outlook for iOS




On Mon, Feb 13, 2017 at 3:52 PM +0800, "Ajay Chitre"  wrote:

In this case, if a user runs a query with a WHERE clause that has 2 dimensions 
from the "aggregation group" & 2 dimensions from the "other 5 dimensions", 
Kylin will compute the results from the base cuboid, correct? Or would it error 
out?

I can test it myself but I am being lazy -:) Looking for a quick answer from 
the experts. Thanks for your help.
On Sun, Feb 12, 2017 at 3:04 AM, ShaoFeng Shi  wrote:
Ajay,
There is no such setting, but the "aggregation group" offers something similar: 
say the cube has 15 dimensions in total, but in the agg group you only pick 
10 of them; then Kylin will build 1 (base cuboid) + 2^10 - 1 (combinations of 
the 10 dimensions) cuboids in total. This way the other 5 dimensions appear 
only in the base cuboid.
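
A concrete sketch of that arithmetic, reusing the aggregation_groups JSON shape shown
earlier in this digest (the dimension names D1..D10 are purely illustrative): with 15
dimensions in the cube but only 10 of them listed in "includes", Kylin builds
1 + (2^10 - 1) = 1,024 cuboids instead of the 2^15 - 1 = 32,767 of a full cube, and the
other 5 dimensions remain queryable through the base cuboid.

    "aggregation_groups": [
      {
        "includes": ["D1", "D2", "D3", "D4", "D5",
                     "D6", "D7", "D8", "D9", "D10"],
        "select_rule": {
          "hierarchy_dims": [],
          "mandatory_dims": [],
          "joint_dims": []
        }
      }
    ]
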
2017-02-09 9:20 GMT+08:00 Ajay Chitre :
My question was a general question. Not any specific issue that I am 
encountering -:)

I understand that we can prune by using Hierarchical dimensions, aggregation 
groups etc. But what if these types of aggregations are not possible.

Let's say I've 15 dimensions (& I can't prune any), would Kylin build 32,767 
Cuboids or is there a property to say... "If no. of dimensions are over X, stop 
building more Cuboids. Get from the base"? (Knowing this will slow down the 
queries).

Please let me know. Thanks.


On Mon, Feb 6, 2017 at 5:43 AM, ShaoFeng Shi  wrote:
Ajay, thanks for your feedback;
For question 1, the code has been merged into the master branch; the next release will 
be 2.0; a beta release will be published soon.
For question 2, yes, your understanding is correct: an N-dimension FULL cube will have 
2^N - 1 cuboids; but if you adopt techniques like hierarchy, joint, or 
separating dimensions into multiple groups, it will be a "partial" cube, which means 
some cuboids will be pruned. 
If a query uses dimensions across aggregation groups, then only the base cuboid 
can fulfill it; Kylin has to do the post-aggregation from the base cuboid, and 
performance will be degraded. Please check whether this is the case on your 
side.
Get Outlook for iOS




On Mon, Feb 6, 2017 at 2:05 PM +0900, "Ajay Chitre"  wrote:

Thanks for writing this document. It's very helpful. I have the following questions:

1) Doc says... "Kylin will build dictionaries in memory (in next version this 
will be moved to MR)".

Which version can we expect this in? For large cubes this process takes a long 
time on the local machine. We really need to move this to the Hadoop cluster. In 
fact, it would be great if we could have an option to run this under Spark -:) 

2) About the "Build N-Dimension Cuboid" step.

Does Kylin build ALL Cuboids? My understanding is:

Total no. of Cuboids = (2 to the power of # of dimensions) - 1

Correct?

So if there are 7 dimensions, there will be 127 Cuboids, right? Does Kylin 
create ALL of them?

I was under the impression that, after some point, Kylin will just get measures 
from the Base Cuboid; instead of building all of them. Please explain.

Thanks.



On Sat, Feb 4, 2017 at 2:19 AM, Li Yang  wrote:
Feel free to update the document with different opinions. :-)

On Thu, Jan 26, 2017 at 11:34 AM, ShaoFeng Shi  wrote:
Hi Alberto,
Thanks for your comments! In many cases the data is imported into Hadoop in T+1 
mode. Especially when each day's data is tens of GB, it is reasonable to 
partition the Hive table by date. The question is whether it is worth keeping a 
long data history in Hive; usually users only keep a couple of months' data in 
Hive. If the partition number exceeds the threshold in Hive, they can remove 
the oldest partitions or move them to another table easily; that is a common 
Hive practice I think, and it is very good to know that Hive 2.0 will solve 
this. 
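A minimal HiveQL sketch of that housekeeping, with illustrative table and column names:

    -- fact table partitioned by a date string and loaded day by day (T+1)
    CREATE TABLE fact_orders (order_id BIGINT, amount DECIMAL(18,2))
    PARTITIONED BY (dt STRING)
    STORED AS ORC;

    -- keep only a couple of months of history; recent Hive versions accept a comparison
    -- in DROP PARTITION, otherwise drop the old partitions one by one
    ALTER TABLE fact_orders DROP IF EXISTS PARTITION (dt < '2016-12-01');
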
2017-01-25 17:10 GMT+08:00 Alberto Ramón :
Be careful about partition by "FLIGHTDATE"

From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance

"Option 1: Use id_date as partition column on Hive table. This have a big
 problem: the Hive metastore is meant for few hundred of partitions not 
thousand (Hive 9452 there is an idea to solve this isn’t in progress)"

In Hive 2.0 there will be a preview (for testing only) that solves this

2017-01-25 9:46 GMT+01:00 ShaoFeng Shi :
Hello,
A new document has been added on the practices of cube build. Any suggestion or 
comment is welcome. We can update the doc later with your feedback;
Here is the link: https://kylin.apache.org/docs16/howto/howto_optimize_build.html

-- 
Best regards,
Shaofeng Shi 史少锋