[ https://issues.apache.org/jira/browse/KYLIN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571341#comment-15571341 ]
Dayue Gao commented on KYLIN-2012:
----------------------------------

I found that even after KYLIN-1985, we can only allow users to append columns to a lookup table. The reasons are:
* LookupTable uses ColumnDesc's zero-based index to find key columns in SnapshotTable. If users insert or drop a column in the middle of the hive table, the indexes of ColumnDesc are no longer aligned with hive.
* If users drop a trailing unused column of a lookup table, queries can fail with ArrayIndexOutOfBoundsException at LookupStringTable#convertRow. That's because the number of columns in SnapshotTable is larger than length(LookupStringTable.colIsDateTime).

> more robust approach to hive schema changes
> -------------------------------------------
>
>                 Key: KYLIN-2012
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2012
>             Project: Kylin
>          Issue Type: Bug
>          Components: Metadata, REST Service, Web
>    Affects Versions: v1.5.3
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
>             Fix For: v1.6.0
>
>
> Our users occasionally want to change their existing cube, such as
> adding/renaming/removing a dimension. Some of these changes require
> modifications to its source hive table. So a user changes the table schema
> and reloads its metadata in Kylin, and then several issues can happen
> depending on what was changed.
> I did some schema changing tests based on 1.5.3. The results after reloading
> the table are listed below:
> || type of changes || fact table || lookup table ||
> | *minor* | both query and build still work | query can fail or return wrong answer |
> | *major* | fail to load related cube | fail to load related cube |
> {{minor}} changes are those that don't change columns used in cubes, such
> as inserting/appending a new column or removing/changing an unused column.
> {{major}} changes are the opposite, like removing, renaming, or changing the
> type of a used column.
> Clearly from the table, reloading a changed table is problematic in certain
> cases. KYLIN-1536 reports a similar problem.
> So what can we do to support this kind of iterative development process
> (load -> define cube -> build -> reload -> change cube -> rebuild)?
> My first thought is to simply detect and prohibit reloading a used table.
> The user should be able to know which cube is preventing the reload, and
> could then drop and recreate the cube after reloading. However, defining a
> cube is not an easy task (consider editing 100 measures). Forcing users to
> recreate their cube over and over again will certainly not make them happy.
> A better idea is to allow a cube to be editable even if it's broken because
> some columns changed after reloading. A broken cube can't be built or
> queried; it can only be edited or dropped. In fact, there is a cube status
> called {{RealizationStatusEnum.DESCBROKEN}} in the code, but it was never
> used. We should take advantage of it.
> An enabled cube shouldn't allow schema changes, otherwise an unintentional
> reload could make it unavailable. Similarly, a disabled but unpurged cube
> shouldn't allow schema changes since it still has data in it.
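
As a rough illustration of the two lookup-table failure modes described in the comment (index misalignment after a column is inserted mid-table, and ArrayIndexOutOfBoundsException after a trailing column is dropped), here is a minimal Java sketch. The class name, method names, and sample data are hypothetical and are not Kylin's actual code. Only the idea of locating columns by zero-based index and sizing a per-column flag array from reloaded metadata is taken from the comment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; none of this is Kylin's actual code.
class SchemaDriftSketch {

    // Reads a column from a snapshot row using a zero-based index taken from table metadata.
    static String readByIndex(String[] snapshotRow, int zeroBasedIndex) {
        return snapshotRow[zeroBasedIndex];
    }

    // Converts every value of a snapshot row, consulting one flag per column.
    // The flag array is sized from the (possibly reloaded) metadata.
    static String[] convertRow(String[] snapshotRow, boolean[] colIsDateTime) {
        String[] out = new String[snapshotRow.length];
        for (int i = 0; i < snapshotRow.length; i++) {
            // Out of bounds when the snapshot row has more columns than the metadata.
            out[i] = colIsDateTime[i] ? "datetime:" + snapshotRow[i] : snapshotRow[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Snapshot row built while the hive table was (id, country, city).
        String[] snapshotRow = {"1", "US", "NYC"};

        // Case 1: a column is inserted in the middle, then metadata is reloaded.
        List<String> reloaded = Arrays.asList("id", "region", "country", "city");
        int countryIndex = reloaded.indexOf("country"); // shifted from 1 to 2
        System.out.println("country read from old snapshot: "
                + readByIndex(snapshotRow, countryIndex));

        // Case 2: the trailing column is dropped, then metadata is reloaded.
        boolean[] colIsDateTime = new boolean[2]; // metadata now has (id, country)
        try {
            convertRow(snapshotRow, colIsDateTime);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("convertRow failed: " + e.getMessage());
        }
    }
}
```

In this sketch, case 1 silently returns the city value where the country key was expected, which is one way a query could return a wrong answer, and case 2 fails in the same way as the exception reported at LookupStringTable#convertRow.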