Hi lifan, I reproduced this problem with sample streaming cube and I created a jira issue for it[https://issues.apache.org/jira/browse/KYLIN-3815]. If you have already fixed it, a pr is welcome. ------------------ Best Regards, Chao Long
------------------ Original ------------------ From: "lifan.su"<suli...@lvwan.com>; Date: Wed, Feb 13, 2019 04:30 PM To: "dev"<dev@kylin.apache.org>; Subject: Unexpected behavior when joinning streaming table and hive table Hello, I am evaluating Kylin and tried to join streaming table and hive table, but now got unexpected behavior. All the scripts can be found in https://gist.github.com/OstCollector/a4ac396e3169aa42a416d96db3021195 (may need to modify some script to match the environments) Environment: Centos 7 Hadoop on CDH-5.8 dedicated Kafka-2.1 (not included in CDH) How to reproduce this problem: 1. run gen_station.pl to generate dim table data 2. run import-data.sh to build dim table in Hive 3. run factdata.pl and pipe its output into kafka 4. create tables TEST_WEATHER.STATION_INFO(hive) TEST_WEATHER.WEATHER(streaming) in Kylin 5. create model and cube in Kylin, join WEATHER.SATION_ID = STATION.ID 6. build the cube Expected behavior: The cube is built correctly and I can get data when search. Actual behavior: On apache-kylin-2.6.0-bin-cdh57: build failed at step #2 (Create Intermediate Flat Hive Table) On apache-kylin-2.5.2-bin-cdh57: got empty cube I also tried with this case without streaming, with the format of timestamp column changed to "%Y-%m-%d %H:%M:%S", and an additional table to store the mapping of timestamp and {hour,day,month,year}_start. In this case, the cube is built as expected. In both failed cases, the intermediate fact table on Hive built in step #2 seems to have wrong column order. e.g. on version 2.5.2-cdh57, the schema and content of temp table are shown below: CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact ( DAY_START date ,YEAR_START date ,STATION_ID string ,QUARTER_START date ,MONTH_START date ,TEMPERATURE bigint ,HOUR_START timestamp ) STORED AS SEQUENCEFILE LOCATION 'hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact'; ALTER TABLE kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact SET TBLPROPERTIES('auto.purge'='true'); hive> select * from kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact limit 10; OK NULL 2010-01-01 2010-01-01 2010-01-01 2010-01-01 NULL NULL NULL 2009-01-01 2009-10-01 2009-12-01 2009-12-31 NULL NULL NULL 2009-01-01 2009-10-01 2009-12-01 2009-12-31 NULL NULL NULL 2009-01-01 2009-10-01 2009-12-01 2009-12-31 NULL NULL NULL 2009-01-01 2009-10-01 2009-12-01 2009-12-31 NULL NULL NULL 2010-01-01 2010-01-01 2010-01-01 2010-01-01 NULL NULL NULL 2010-01-01 2010-01-01 2010-01-01 2010-01-01 NULL NULL NULL 2009-01-01 2009-10-01 2009-12-01 2009-12-31 NULL NULL NULL 2009-01-01 2009-10-01 2009-12-01 2009-12-31 NULL NULL NULL 2010-01-01 2010-01-01 2010-01-01 2010-01-01 NULL NULL Time taken: 0.421 seconds, Fetched: 10 row(s) While the the content of temp file is: # hdfs dfs -text hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact/part-m-00001 | head -n 10 19/02/13 11:44:12 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library 19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor [.deflate] 19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor [.deflate] 19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor [.deflate] 19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor [.deflate] 0030322010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001706 0075762010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:002605 0113882010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:002963 0214922010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001769 0303062010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00432 0377712010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00808 0443462010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001400 0500512010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00342 0537982010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001587 0597122010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00-1309 (the '\x01' character is not correctly copied) So what am I doing wrong? -- Sent from: http://apache-kylin.74782.x6.nabble.com/