Reply: Reply: a question about Kylin data backup

2019-02-13 Thread Chao Long
Hi Chen,
   Kylin provides a way to back up
metadata[http://kylin.apache.org/cn/docs/howto/howto_backup_metadata.html];
you can recover the metadata from such a backup.
   If you want to migrate a cube from one Kylin environment to another, you can use the
cube migration tool[http://kylin.apache.org/cn/docs/howto/howto_use_cli.html
#CubeMigrationCLI.java]. (Note that the two Kylin environments must
share the same Hadoop cluster, including HDFS, HBase and Hive.)
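For reference, the backup and restore commands on the linked page look roughly like
this (the backup directory name is illustrative):

    $KYLIN_HOME/bin/metastore.sh backup
    # writes a dump under $KYLIN_HOME/meta_backups/, e.g. meta_2019_02_13_18_00_00
    $KYLIN_HOME/bin/metastore.sh restore meta_backups/meta_2019_02_13_18_00_00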
--
Best Regards,
Chao Long


------------------ Original Message ------------------
From: "chen snowlake";
Sent: Thursday, February 14, 2019 10:43 AM
To: "dev@kylin.apache.org";

Subject: Reply: a question about Kylin data backup



Hi Chao Long,

   Thanks. Then how should Kylin's data be backed up, and can it be migrated to
another environment?

SnowLake

Email: che...@outlook.com




From: Chao Long
Sent: Wednesday, February 13, 2019 6:57:41 PM
To: dev
Subject: Reply: [Kylin] Is
/kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid intermediate data?

Hi Chen,
   The cuboid data (sequence files) is kept for cube segment merge. When segments
are merged, Kylin merges their cuboid data rather than the HFiles, so the cuboid
data should not be deleted until the segments have been merged.
--
Best Regards,
Chao Long


------------------ Original Message ------------------
From: "chen snowlake";
Sent: Wednesday, February 13, 2019 6:11 PM
To: "dev@kylin.apache.org";

Subject: [Kylin] Is
/kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid intermediate data?



Dear All:
Because of a data-backup concern, I have been looking into Kylin's backend storage, and I would like to ask a question.
In my tests I found that the amount of data under Hdfs://${HAname}/kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid matches the corresponding segment in Kylin's HBase, and queries are not affected after I delete it.
My question is: is this data kept intentionally?

The last steps of the cube build process:
   ...
>> write out cuboid data
>> convert cuboid data To Hfile
>> which outputs hfile files under Hdfs://${HAname}/kylin/kylin_metadata/kylin-${jobid}/${cubename}/hfile
>> after "Hfile load To Hbase" completes, the data under the hfile directory above is moved away
My understanding is that the cuboid data is intermediate data that queries do not use after a successful build, so should it be deleted once the hfile conversion is complete?

SnowLake

Re: Unexpected behavior when joining streaming table and Hive table

2019-02-13 Thread ShaoFeng Shi
Using hour as the partition column should be fine. From the data, it seems
the declared column sequence does not match the persisted data.

Lifan, I see you posted the cube JSON; could you please also provide the
model's JSON? That would help to analyze the problem. Thank you!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Xiaoxiang Yu wrote on Thursday, February 14, 2019, at 10:34 AM:

> Hi, Lifan
>
> After checking your model.json, I found you use "HOUR_START" as your
> partition_date_column, which is not correct.
> I think you should change it to "timestamp" and try again.
>
> Source code at
> https://github.com/apache/kylin/blob/master/source-kafka/src/main/java/org/apache/kylin/source/kafka/TimedJsonStreamParser.java#L111
>
> If you find any mistake, please let me know.
>
> 
> Best wishes,
> Xiaoxiang Yu
>
>
> On [DATE], "[NAME]" <[ADDRESS]> wrote:
>
> Hello, I am evaluating Kylin and tried to join a streaming table and a Hive
> table, but got unexpected behavior.
>
> All the scripts can be found in
> https://gist.github.com/OstCollector/a4ac396e3169aa42a416d96db3021195
> (you may need to modify some scripts to match your environment)
>
> Environment:
> CentOS 7
> Hadoop on CDH-5.8
> dedicated Kafka-2.1 (not included in CDH)
>
> How to reproduce this problem:
>
> 1. run gen_station.pl to generate dim table data
> 2. run import-data.sh to build dim table in Hive
> 3. run factdata.pl and pipe its output into kafka
> 4. create tables TEST_WEATHER.STATION_INFO(hive)
> TEST_WEATHER.WEATHER(streaming) in Kylin
> 5. create model and cube in Kylin, join WEATHER.SATION_ID = STATION.ID
> 6. build the cube
>
> Expected behavior:
> The cube is built correctly and I can query data from it.
>
> Actual behavior:
> On apache-kylin-2.6.0-bin-cdh57: build failed at step #2 (Create
> Intermediate Flat Hive Table)
> On apache-kylin-2.5.2-bin-cdh57: got empty cube
>
> I also tried this case without streaming, with the format of the timestamp
> column changed to "%Y-%m-%d %H:%M:%S" and an additional table to store the
> mapping from timestamp to {hour,day,month,year}_start.
> In that case, the cube is built as expected.
>
>
> In both failed cases, the intermediate fact table on Hive built in step #2
> seems to have the wrong column order.
> E.g., on version 2.5.2-cdh57, the schema and content of the temp table are
> shown below:
>
> CREATE EXTERNAL TABLE IF NOT EXISTS
> kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact
> (
> DAY_START date
> ,YEAR_START date
> ,STATION_ID string
> ,QUARTER_START date
> ,MONTH_START date
> ,TEMPERATURE bigint
> ,HOUR_START timestamp
> )
> STORED AS SEQUENCEFILE
> LOCATION
>
> 'hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact';
> ALTER TABLE
> kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact
> SET
> TBLPROPERTIES('auto.purge'='true');
>
> hive> select * from
> kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact
> limit
> 10;
> OK
> NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
> NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
> NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
> NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
> NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
> NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
> NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
> NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
> NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
> NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
> Time taken: 0.421 seconds, Fetched: 10 row(s)
>
> While the content of the temp file is:
> # hdfs dfs -text
>
> hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact/part-m-1
> | head -n 10
> 19/02/13 11:44:12 INFO zlib.ZlibFactory: Successfully loaded &
> initialized
> native-zlib library
> 19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
>  

Reply: a question about Kylin data backup

2019-02-13 Thread chen snowlake
Hi Chao Long,

   Thanks. Then how should Kylin's data be backed up, and can it be migrated to
another environment?

SnowLake

Email: che...@outlook.com




From: Chao Long
Sent: Wednesday, February 13, 2019 6:57:41 PM
To: dev
Subject: Reply: [Kylin] Is
/kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid intermediate data?

Hi Chen,
   The cuboid data (sequence files) is kept for cube segment merge. When segments
are merged, Kylin merges their cuboid data rather than the HFiles, so the cuboid
data should not be deleted until the segments have been merged.
--
Best Regards,
Chao Long


------------------ Original Message ------------------
From: "chen snowlake";
Sent: Wednesday, February 13, 2019 6:11 PM
To: "dev@kylin.apache.org";

Subject: [Kylin] Is
/kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid intermediate data?



Dear All:
Because of a data-backup concern, I have been looking into Kylin's backend storage, and I would like to ask a question.
In my tests I found that the amount of data under Hdfs://${HAname}/kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid matches the corresponding segment in Kylin's HBase, and queries are not affected after I delete it.
My question is: is this data kept intentionally?

The last steps of the cube build process:
   ...
>> write out cuboid data
>> convert cuboid data To Hfile
>> which outputs hfile files under Hdfs://${HAname}/kylin/kylin_metadata/kylin-${jobid}/${cubename}/hfile
>> after "Hfile load To Hbase" completes, the data under the hfile directory above is moved away
My understanding is that the cuboid data is intermediate data that queries do not use after a successful build, so should it be deleted once the hfile conversion is complete?

SnowLake


Re: Unexpected behavior when joining streaming table and Hive table

2019-02-13 Thread Xiaoxiang Yu
Hi, Lifan

After checking your model.json, I found you use "HOUR_START" as your
partition_date_column, which is not correct.
I think you should change it to "timestamp" and try again.
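For illustration, the relevant fragment of the model JSON would then look roughly
like this (the exact column qualifier is an assumption based on your table
definition):

    "partition_desc": {
        "partition_date_column": "WEATHER.TIMESTAMP",
        "partition_type": "APPEND"
    }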

Source code at 
https://github.com/apache/kylin/blob/master/source-kafka/src/main/java/org/apache/kylin/source/kafka/TimedJsonStreamParser.java#L111

If you find any mistake, please let me know.


Best wishes,
Xiaoxiang Yu 
 

On [DATE], "[NAME]" <[ADDRESS]> wrote:

Hello, I am evaluating Kylin and tried to join a streaming table and a Hive
table, but got unexpected behavior.

All the scripts can be found in
https://gist.github.com/OstCollector/a4ac396e3169aa42a416d96db3021195
(you may need to modify some scripts to match your environment)

Environment: 
CentOS 7
Hadoop on CDH-5.8
dedicated Kafka-2.1 (not included in CDH)

How to reproduce this problem:

1. run gen_station.pl to generate dim table data
2. run import-data.sh to build dim table in Hive
3. run factdata.pl and pipe its output into kafka
4. create tables TEST_WEATHER.STATION_INFO(hive)
TEST_WEATHER.WEATHER(streaming) in Kylin
5. create model and cube in Kylin, join WEATHER.SATION_ID = STATION.ID
6. build the cube

Expected behavior:
The cube is built correctly and I can query data from it.

Actual behavior:
On apache-kylin-2.6.0-bin-cdh57: build failed at step #2 (Create
Intermediate Flat Hive Table)
On apache-kylin-2.5.2-bin-cdh57: got empty cube

I also tried this case without streaming, with the format of the timestamp
column changed to "%Y-%m-%d %H:%M:%S" and an additional table to store the
mapping from timestamp to {hour,day,month,year}_start.
In that case, the cube is built as expected.


In both failed cases, the intermediate fact table on Hive built in step #2
seems to have the wrong column order.
E.g., on version 2.5.2-cdh57, the schema and content of the temp table are shown
below:

CREATE EXTERNAL TABLE IF NOT EXISTS
kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact
(
DAY_START date
,YEAR_START date
,STATION_ID string
,QUARTER_START date
,MONTH_START date
,TEMPERATURE bigint
,HOUR_START timestamp
)
STORED AS SEQUENCEFILE
LOCATION

'hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact';
ALTER TABLE
kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact SET
TBLPROPERTIES('auto.purge'='true');

hive> select * from
kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact limit
10;
OK
NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
Time taken: 0.421 seconds, Fetched: 10 row(s)

While the content of the temp file is:
# hdfs dfs -text

hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact/part-m-1
| head -n 10
19/02/13 11:44:12 INFO zlib.ZlibFactory: Successfully loaded & initialized
native-zlib library
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
0030322010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001706
0075762010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:002605
0113882010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:002963
0214922010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001769
0303062010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00432

Unexpected behavior when joining streaming table and Hive table

2019-02-13 Thread lifan.su
Hello, I am evaluating Kylin and tried to join a streaming table and a Hive
table, but got unexpected behavior.

All the scripts can be found in
https://gist.github.com/OstCollector/a4ac396e3169aa42a416d96db3021195
(you may need to modify some scripts to match your environment)

Environment: 
Centos 7
Hadoop on CDH-5.8
dedicated Kafka-2.1 (not included in CDH)

How to reproduce this problem:

1. run gen_station.pl to generate dim table data
2. run import-data.sh to build dim table in Hive
3. run factdata.pl and pipe its output into Kafka (see the sketch after this list)
4. create tables TEST_WEATHER.STATION_INFO(hive)
TEST_WEATHER.WEATHER(streaming) in Kylin
5. create model and cube in Kylin, join WEATHER.SATION_ID = STATION.ID
6. build the cube
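For step 3, the pipe into Kafka would be something like this (broker address and
topic name are illustrative, not taken from the gist):

    perl factdata.pl | kafka-console-producer.sh \
        --broker-list localhost:9092 --topic TEST_WEATHER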

Expected behavior:
The cube is built correctly and I can query data from it.

Actual behavior:
On apache-kylin-2.6.0-bin-cdh57: build failed at step #2 (Create
Intermediate Flat Hive Table)
On apache-kylin-2.5.2-bin-cdh57: got empty cube

I also tried this case without streaming, with the format of the timestamp
column changed to "%Y-%m-%d %H:%M:%S" and an additional table to store the
mapping from timestamp to {hour,day,month,year}_start.
In that case, the cube is built as expected.


In both failed cases, the intermediate fact table on Hive built in step #2
seems to have the wrong column order.
E.g., on version 2.5.2-cdh57, the schema and content of the temp table are shown
below:

CREATE EXTERNAL TABLE IF NOT EXISTS
kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact
(
DAY_START date
,YEAR_START date
,STATION_ID string
,QUARTER_START date
,MONTH_START date
,TEMPERATURE bigint
,HOUR_START timestamp
)
STORED AS SEQUENCEFILE
LOCATION
'hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact';
ALTER TABLE
kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact SET
TBLPROPERTIES('auto.purge'='true');

hive> select * from
kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact limit
10;
OK
NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2009-01-01  2009-10-01  2009-12-01  2009-12-31  NULL    NULL
NULL    2010-01-01  2010-01-01  2010-01-01  2010-01-01  NULL    NULL
Time taken: 0.421 seconds, Fetched: 10 row(s)

While the content of the temp file is:
# hdfs dfs -text
hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact/part-m-1
| head -n 10
19/02/13 11:44:12 INFO zlib.ZlibFactory: Successfully loaded & initialized
native-zlib library
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
0030322010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001706
0075762010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:002605
0113882010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:002963
0214922010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001769
0303062010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00432
0377712010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00808
0443462010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001400
0500512010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00342
0537982010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001587
0597122010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00-1309
(the '\x01' character is not correctly copied)
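To make the '\x01' separators visible, the first record can be dumped byte by
byte, e.g. (substitute the full part-file path shown above):

    hdfs dfs -text <part-m-1 path above> | head -n 1 | od -c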

So what am I doing wrong?

--
Sent from: http://apache-kylin.74782.x6.nabble.com/


Reply: 'spark.yarn.executor.memoryOverhead' has been deprecated problem

2019-02-13 Thread Na Zhai
Hi Yang,

I cannot see your picture; please try adding it as an attachment. If you want to
find out why the Spark task failed, check the YARN ResourceManager or the Spark
history server. And if you have resolved the problem, you are welcome to share the
solution with the community.

Best wishes!

Sent from Mail for Windows 10


From: yang xiao 
Sent: Tuesday, February 12, 2019 10:17:48 AM
To: dev@kylin.apache.org
Subject: 'spark.yarn.executor.memoryOverhead' has been deprecated problem

Hi all,


Building a cube ran into an error about the spark.yarn.executor.memoryOverhead
parameter.

[image.png]

Kylin version is 2.6.0
Spark version is 2.3.x

I have already changed the corresponding parameter in
/usr/local/apache-kylin-2.6.0-bin/conf/kylin.properties and restarted Kylin, but
it seems the metadata shown on the Kylin system page is not affected by this
change.
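For what it's worth, on Spark 2.3+ the deprecated key is superseded by
spark.executor.memoryOverhead, and Kylin passes through Spark properties prefixed
with kylin.engine.spark-conf., so the override in kylin.properties would look
roughly like this (the value is illustrative):

    kylin.engine.spark-conf.spark.executor.memoryOverhead=1024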

[image.png]
[image.png]

Does anyone know how to solve this problem or have any hints?



Reply: Query result is 0

2019-02-13 Thread Na Zhai
Hi 廉立伟,



What’s your Kylin version? There is an issue about Chinese characters: 
https://issues.apache.org/jira/browse/KYLIN-3705. If your Kylin version is 
lower than 2.5.2, I advise you to upgrade to the latest Kylin version.



For more useful info, see the “Here are some tips for you when encountering
problems with Kylin” section on this page:
http://kylin.apache.org/docs/gettingstarted/faq.html



Best wishes!



Sent from Mail for Windows 10




From: 廉立伟 
Sent: Monday, February 11, 2019 5:18:45 PM
To: dev@kylin.apache.org
Subject: Query result is 0

Hello,
select

 count(*)

 from HUOBI_GLOBAL.HUOBI_YUNYING_DW_KYLIN_USER_INDEX_DAILY

where HUOBI_YUNYING_DW_KYLIN_USER_INDEX_DAILY.EXCHANGE_NAME = 'b11'
This query returns results.
But when the WHERE condition is a Chinese string the result is 0. How can I solve this?
select

 count(*)

 from HUOBI_GLOBAL.HUOBI_YUNYING_DW_KYLIN_USER_INDEX_DAILY

where HUOBI_YUNYING_DW_KYLIN_USER_INDEX_DAILY.EXCHANGE_NAME = '中国'

The result is 0.


[jira] [Created] (KYLIN-3814) Add pause interval for job retry

2019-02-13 Thread PENG Zhengshuai (JIRA)
PENG Zhengshuai created KYLIN-3814:
--

 Summary: Add pause interval for job retry
 Key: KYLIN-3814
 URL: https://issues.apache.org/jira/browse/KYLIN-3814
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Reporter: PENG Zhengshuai
Assignee: PENG Zhengshuai






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Reply: [Kylin] Is /kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid intermediate data?

2019-02-13 Thread Chao Long
Hi Chen,
   The cuboid data (sequence files) is kept for cube segment merge. When segments
are merged, Kylin merges their cuboid data rather than the HFiles, so the cuboid
data should not be deleted until the segments have been merged.
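For reference, intermediate storage that is no longer referenced (for example
after all merges are done) can be cleaned with Kylin's cleanup tool; a typical
invocation per the Kylin docs is:

    ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true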
--
Best Regards,
Chao Long


------------------ Original Message ------------------
From: "chen snowlake";
Sent: Wednesday, February 13, 2019 6:11 PM
To: "dev@kylin.apache.org";

Subject: [Kylin] Is
/kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid intermediate data?



Dear All:
Because of a data-backup concern, I have been looking into Kylin's backend storage, and I would like to ask a question.
In my tests I found that the amount of data under Hdfs://${HAname}/kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid matches the corresponding segment in Kylin's HBase, and queries are not affected after I delete it.
My question is: is this data kept intentionally?

The last steps of the cube build process:
   ...
>> write out cuboid data
>> convert cuboid data To Hfile
>> which outputs hfile files under Hdfs://${HAname}/kylin/kylin_metadata/kylin-${jobid}/${cubename}/hfile
>> after "Hfile load To Hbase" completes, the data under the hfile directory above is moved away
My understanding is that the cuboid data is intermediate data that queries do not use after a successful build, so should it be deleted once the hfile conversion is complete?

SnowLake

[Kylin] Is /kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid intermediate data?

2019-02-13 Thread chen snowlake
Dear All:
Because of a data-backup concern, I have been looking into Kylin's backend storage, and I would like to ask a question.
In my tests I found that the amount of data under Hdfs://${HAname}/kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid matches the corresponding segment in Kylin's HBase, and queries are not affected after I delete it.
My question is: is this data kept intentionally?

The last steps of the cube build process:
   ...
>> write out cuboid data
>> convert cuboid data To Hfile
>> which outputs hfile files under Hdfs://${HAname}/kylin/kylin_metadata/kylin-${jobid}/${cubename}/hfile
>> after "Hfile load To Hbase" completes, the data under the hfile directory above is moved away
My understanding is that the cuboid data is intermediate data that queries do not use after a successful build, so should it be deleted once the hfile conversion is complete?
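For reference, the size comparison mentioned above can be done with something like:

    hdfs dfs -du -h /kylin/kylin_metadata/kylin-${jobid}/${cubename}/cuboid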

SnowLake