[ 
https://issues.apache.org/jira/browse/KYLIN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967441#comment-15967441
 ] 

kangkaisen edited comment on KYLIN-2506 at 4/14/17 6:16 AM:
------------------------------------------------------------

KYLIN-2506 Refactor Global Dictionary:
This commit has been running stably in our production environment for a long
time. It covers the first nine points in the description.

KYLIN-2506 Refactor ZookeeperDistributedJobLock:
Refactor ZookeeperDistributedJobLock to make it more general. The main points of 
this refactor:
1 Move the JobLock interface to the core-common module.
2 Add a watch interface so that the client receives a notification when a 
ZooKeeper node changes.
3 Don't maintain the lock path itself.
4 Make the zkClient a singleton.
5 Update the function signatures and comments.

My concern with this commit is that it introduces the curator-recipes 
dependency into the core-common module. The reason is that 
DistributedJobLock.watch needs to return a PathChildrenCache so that the client 
can close the PathChildrenCache in time.

Any advice on this commit is very much appreciated.

KYLIN-2506 Add distributed lock for GlobalDictionaryBuilder:
The key points of the distributed lock for GlobalDictionaryBuilder:

1 Use ZooKeeper to implement the distributed lock. The lock path is 
TableName+ColumnName. The lock is acquired in GlobalDictionaryBuilder.init and 
released in GlobalDictionaryBuilder.build, or when 
GlobalDictionaryBuilder.addValue throws an exception.

2 When the Kylin thread that creates the dict fails to get the lock, it watches 
the path and blocks the current thread with a BlockingQueue. When it receives 
the watch event and gets the lock successfully, the thread is woken up.

3 References:
https://cwiki.apache.org/confluence/display/CURATOR/TN10
https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
http://antirez.com/news/101
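
Points 1 and 2 above can be sketched roughly as follows. This is only an 
illustration, not the code in the patch: InMemoryPathLock is a hypothetical 
in-memory stand-in for the ZooKeeper-backed lock, and WatchHandle, 
BlockingLockAcquirer, LockDemo and the example table and column names are 
assumptions made for the sketch. It shows a thread that fails to get the lock, 
registers a watch, blocks on a BlockingQueue, and retries after a release event.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Handle returned by watch(); closing it stops the notifications. */
interface WatchHandle extends AutoCloseable {
    @Override
    void close();
}

/** Hypothetical in-memory stand-in for the ZooKeeper-backed lock. */
final class InMemoryPathLock {
    private final Map<String, String> owners = new ConcurrentHashMap<>();
    private final Map<String, List<Runnable>> watchers = new ConcurrentHashMap<>();

    /** Returns true if the client holds the lock on the path after this call. */
    boolean tryLock(String path, String client) {
        String previous = owners.putIfAbsent(path, client);
        return previous == null || previous.equals(client);
    }

    /** Releases the lock if the client holds it, then notifies the watchers. */
    void unlock(String path, String client) {
        if (owners.remove(path, client)) {
            List<Runnable> callbacks = watchers.get(path);
            if (callbacks != null) {
                for (Runnable callback : callbacks) {
                    callback.run();
                }
            }
        }
    }

    /** Returns the current owner of the path, or null if it is unlocked. */
    String ownerOf(String path) {
        return owners.get(path);
    }

    /** Registers a callback that fires whenever the path is released. */
    WatchHandle watch(String path, Runnable onRelease) {
        List<Runnable> callbacks =
                watchers.computeIfAbsent(path, p -> new CopyOnWriteArrayList<>());
        callbacks.add(onRelease);
        return () -> callbacks.remove(onRelease);
    }
}

final class BlockingLockAcquirer {
    /** Blocks the calling thread until the client holds the lock on the path. */
    static void acquire(InMemoryPathLock lock, String path, String client)
            throws InterruptedException {
        if (lock.tryLock(path, client)) {
            return;
        }
        BlockingQueue<String> releaseEvents = new LinkedBlockingQueue<>();
        // Register the watch before retrying, so a release that happens
        // between the failed attempt and the watch is not missed.
        try (WatchHandle handle = lock.watch(path, () -> releaseEvents.offer(path))) {
            while (!lock.tryLock(path, client)) {
                releaseEvents.take(); // block until some holder releases the path
            }
        } // the watch is closed here, whether or not the lock was obtained
    }
}

final class LockDemo {
    /** Runs two clients against one path and returns the final owner. */
    static String run() {
        InMemoryPathLock lock = new InMemoryPathLock();
        String path = "TEST_TABLE" + "USER_ID"; // TableName+ColumnName, example values
        lock.tryLock(path, "client-1");
        Thread waiter = new Thread(() -> {
            try {
                BlockingLockAcquirer.acquire(lock, path, "client-2");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        lock.unlock(path, "client-1"); // wakes the waiter if it is already blocked
        try {
            waiter.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return lock.ownerOf(path);
    }
}
```

Closing the watch handle in the try-with-resources block mirrors the reason 
given earlier for returning a closable object from watch: the caller decides 
when to stop listening.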

I use 4 measures to make the ZooKeeper lock as reliable as I can:
a Increase the session timeout to 120s
-b Add a listener for ConnectionState.SUSPENDED and 
ConnectionState.LOST- (it's unnecessary)
c Check that the client still holds the lock on commit and after every 
1_000_000 processed values
d Add a sanityCheck when committing the dict to HDFS
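
Measure c could look roughly like the sketch below. It is an assumption about 
the shape of the check, not the code in the patch: CheckedDictBuilder, its 
holdsLock callback and the checkInterval parameter are hypothetical names, and 
the real check would ask ZooKeeper whether this client still owns the lock path.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical builder that re-checks lock ownership while adding values. */
final class CheckedDictBuilder {
    private final BooleanSupplier holdsLock; // e.g. a query against the lock path
    private final long checkInterval;        // 1_000_000 in the comment above
    private long processed;

    CheckedDictBuilder(BooleanSupplier holdsLock, long checkInterval) {
        this.holdsLock = holdsLock;
        this.checkInterval = checkInterval;
    }

    /** Counts the value and verifies the lock every checkInterval values. */
    void addValue(String value) {
        processed++;
        if (processed % checkInterval == 0) {
            ensureLockHeld();
        }
        // ... the value would be appended to the dictionary here ...
    }

    /** Verifies the lock one last time before anything is written out. */
    void commit() {
        ensureLockHeld();
        // ... the dictionary would be written to storage here ...
    }

    long processed() {
        return processed;
    }

    private void ensureLockHeld() {
        if (!holdsLock.getAsBoolean()) {
            throw new IllegalStateException(
                    "Lock lost after " + processed + " values; aborting the build");
        }
    }
}
```

Checking only every N values keeps the number of round trips to the lock 
service small, at the cost of doing up to N values of wasted work after the 
lock is lost.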


> Refactor Global Dictionary
> --------------------------
>
>                 Key: KYLIN-2506
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2506
>             Project: Kylin
>          Issue Type: Improvement
>          Components: General
>    Affects Versions: v2.0.0
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>             Fix For: v2.0.0
>
>
> The main points of this refactor:
> 1 Fix the bug that the RemovalListener of LoadingCache swallowed any 
> exceptions when building the GlobalDict.
> 2 Fix the bug that the HDFS filename of DictSliceKey had illegal characters.
> 3 Fix the bug that the HDFS filename of DictSliceKey may be longer than 255.
> 4 Fix the bug that the DictNode split failed if the value length was greater 
> than 255 bytes.
> 5 Decouple the build and query of GlobalDict: 
> Extract the builder of AppendTrieDictionary into AppendTrieDictionaryBuilder; 
> add LoadingCache to AppendTrieDictionary and make AppendTrieDictionary 
> read-only.
> 6 Remove the dependence on LoadingCache when building the GlobalDict.
> 7 Extract the HDFS operations into GlobalDictStore.
> 8 Extract the metadata of GlobalDict into GlobalDictMetadata.
> 9 Delete CachedTreeMap.
> 10 Add a distributed lock for GlobalDict.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
