[GitHub] drill issue #755: DRILL-5270: Improve loading of profiles listing in the Web...
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/755 Holding off on the rebase until @vrozov's PR #1163 (DRILL-6053) is merged into the Apache repo. ---
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/755 Thanks, @vrozov. For `#1`, I'll use a separate lock for the read-only purpose. For `#2`, I need to construct a size-limited ordered set from a list of unordered elements. In this case, the elements (i.e. profiles) need to be ordered by file name, which maps 1:1 to the query's start-time epoch. So, I need a data structure that I can add to in `O(log(n))` time, remove from in `O(1)`, and iterate through in sequence. Puts are therefore the most expensive operation. ---
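A minimal sketch of the size-limited ordered set described above, assuming profile file names sort in start-time order (so the smallest name is the oldest profile). `BoundedProfileSet`, `maxProfiles` and the `profile-NNNN` names are hypothetical illustrations, not Drill code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical sketch: keeps at most maxProfiles file names, sorted by name.
 * Assumes names sort in query start-time order, so first() is the oldest profile.
 */
final class BoundedProfileSet {
  private final int maxProfiles;
  private final TreeSet<String> names = new TreeSet<>();

  BoundedProfileSet(int maxProfiles) {
    this.maxProfiles = maxProfiles;
  }

  /** O(log n) insert; evicts the oldest name once the bound is exceeded. */
  void add(String profileName) {
    names.add(profileName);
    if (names.size() > maxProfiles) {
      names.pollFirst(); // drop the smallest (oldest) file name
    }
  }

  /** Rebuilds from an unordered listing, e.g. a full FileSystem scan. */
  void rebuild(Iterable<String> unorderedListing) {
    names.clear();
    for (String name : unorderedListing) {
      add(name);
    }
  }

  /** Iterates in sequence, newest first. */
  List<String> newestFirst() {
    return new ArrayList<>(names.descendingSet());
  }

  public static void main(String[] args) {
    BoundedProfileSet set = new BoundedProfileSet(3);
    set.rebuild(Arrays.asList("profile-0005", "profile-0001", "profile-0004",
        "profile-0002", "profile-0003"));
    System.out.println(set.newestFirst()); // [profile-0005, profile-0004, profile-0003]
  }
}
```

`TreeSet.add` is O(log n); note that `pollFirst()` is also O(log n) rather than O(1), which is still cheap compared with the `FileSystem` listing that feeds the rebuild.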
Github user vrozov commented on the issue: https://github.com/apache/drill/pull/755 @kkhatua

1. Read locks are not exclusive (single writer/multiple readers). To achieve the required functionality, you need to introduce a different lock and use a write (exclusive) lock.
2. The choice of `TreeSet` is not obvious. What are the most common operations performed on the collection? Are you optimizing for get, put, or collection construction?

@arina-ielchiieva my GitHub id is `vrozov`. ---
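To illustrate point 1, a small sketch using the JDK's `java.util.concurrent.locks.ReentrantReadWriteLock` (whether Drill's store uses this exact class is an assumption here): while one thread holds the read lock, a second reader is admitted immediately, but a writer is refused. A read lock alone therefore does not stop two requests from both running the expensive listing. `SharedReadLockDemo` and `probe` are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo: read locks are shared, the write lock is exclusive. */
final class SharedReadLockDemo {
  /** Returns {secondReaderAcquired, writerAcquired} while another thread holds the read lock. */
  static boolean[] probe() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    boolean[] acquired = new boolean[2];
    lock.readLock().lock(); // plays the role of ThreadA holding the read lock
    try {
      Thread threadB = new Thread(() -> {
        // A second reader gets in right away: read locks do not exclude each other.
        acquired[0] = lock.readLock().tryLock();
        if (acquired[0]) {
          lock.readLock().unlock();
        }
        // A writer cannot get in while any other thread holds the read lock.
        acquired[1] = lock.writeLock().tryLock();
        if (acquired[1]) {
          lock.writeLock().unlock();
        }
      });
      threadB.start();
      threadB.join(); // join() makes threadB's writes to 'acquired' visible here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      lock.readLock().unlock();
    }
    return acquired;
  }

  public static void main(String[] args) {
    boolean[] r = probe();
    System.out.println("second reader acquired: " + r[0] + ", writer acquired: " + r[1]);
  }
}
```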
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/755 I chose a `TreeSet` because it is a balanced binary tree that keeps the (maximum permitted number of) profiles sorted and in memory. When Drill detects changes (see https://github.com/kkhatua/drill/blob/f7ad29b9a322bb215d16b3c3b9a2bfc40abfc1ed/exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/store/LocalPersistentStore.java#L146), it fetches all the available profiles in the PStore and reconstructs the tree, since the `FileSystem` does not guarantee the order in which profiles are returned. I tried using a `PathFilter` to fetch only the new profiles, but having the `FileSystem` fetch only new profiles costs the same as fetching the entire list! Also, some profiles might have been deleted while new ones were added, so a full reconstruction takes care of that scenario as well. To evict, I simply pop the oldest (by filename) entry as I construct the `TreeSet`. The Guava cache options don't seem to provide a way to define the basis on which entries are evicted.

I believe @vrozov's work on DRILL-6053 specifically addresses locking during writes. The lock I used (and need) is for reads, to ensure that multiple requests don't trigger an expensive `FileSystem` call for the same state of the PStore. For example, with T# denoting timestamps:

* `currBasePathModified` = T0
* _ThreadA_ requests at t=T1 and acquires the read-lock
* _ThreadB_ requests at t=T2 but waits for the read-lock

If the `TreeSet` exists and no change is detected, _ThreadA_ uses the `TreeSet` contents and releases the lock. If the `TreeSet` exists and a change is detected, _ThreadA_ reconstructs the `TreeSet` before using its contents, and updates `lastBasePathModified` before releasing the lock. When _ThreadB_ gets the read-lock, it discovers that the `TreeSet` was already updated during the wait.
So, as of t=T2, this is the most recent snapshot, and _ThreadB_ uses the `TreeSet` contents rather than reconstructing it; any reconstruction is deferred to the next request. We use `lastBasePathModified` to provide pseudo-versioned access to the list. This means that if more profiles are added *after* _ThreadB_ started waiting for the read-lock, they will not trigger the `FileSystem` call right away. ---
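A minimal sketch of the pseudo-versioned check, swapping the read lock for an exclusive `ReentrantLock` as suggested in review. The `LongSupplier` and `Supplier` constructor parameters are hypothetical stand-ins for the `FileSystem` calls (base-path modification time and the full profile listing); `ProfileListingCache` is an illustration, not the PR's class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch: rebuilds the profile list only when the base path looks newer. */
final class ProfileListingCache {
  private final ReentrantLock refreshLock = new ReentrantLock(); // exclusive: one rebuild at a time
  private final LongSupplier basePathModified;        // stand-in for the base path's modification time
  private final Supplier<List<String>> fullListing;   // stand-in for the expensive FileSystem listing
  private long lastBasePathModified = Long.MIN_VALUE; // guarded by refreshLock
  private List<String> cached;                        // guarded by refreshLock
  private int rebuilds;                               // guarded by refreshLock; illustration only

  ProfileListingCache(LongSupplier basePathModified, Supplier<List<String>> fullListing) {
    this.basePathModified = basePathModified;
    this.fullListing = fullListing;
  }

  List<String> get() {
    // Snapshot taken when the request arrives, before waiting on the lock.
    long currBasePathModified = basePathModified.getAsLong();
    refreshLock.lock();
    try {
      if (cached == null || currBasePathModified > lastBasePathModified) {
        // Sample the version before listing: a change landing mid-listing leaves a newer
        // timestamp behind, so a later request will rebuild again.
        long version = basePathModified.getAsLong();
        cached = Collections.unmodifiableList(new ArrayList<>(fullListing.get()));
        lastBasePathModified = version;
        rebuilds++;
      }
      // Otherwise another thread already refreshed the list for this (or a newer) state.
      return cached;
    } finally {
      refreshLock.unlock();
    }
  }

  int rebuilds() {
    refreshLock.lock();
    try {
      return rebuilds;
    } finally {
      refreshLock.unlock();
    }
  }

  public static void main(String[] args) {
    long[] modified = {100L};
    ProfileListingCache cache = new ProfileListingCache(
        () -> modified[0], () -> Arrays.asList("profile-0001", "profile-0002"));
    cache.get();
    cache.get();        // unchanged base path: served from the cached list
    modified[0] = 200L; // simulate a new profile being written
    cache.get();        // newer snapshot: rebuilds once
    System.out.println("rebuilds = " + cache.rebuilds()); // prints "rebuilds = 2"
  }
}
```

In the _ThreadA_/_ThreadB_ scenario, _ThreadB_ samples the base path before blocking; if nothing changed after _ThreadA_ sampled it, _ThreadB_'s snapshot is not newer than `lastBasePathModified`, so it reuses the freshly built list.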
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/755 @arina-ielchiieva I need to rebase this on top of the latest master, since it was originally based on nearly year-old code. When it's ready, I'll either create a new PR or push to this one. Let me know which you prefer. ---
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/755 @sudheeshkatkam Can you please review the PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/755 For 8266 profiles, measured with the Chrome browser's Network tool:

```
Load First Time: 2.43s
Load Second Time (no new profiles): 829ms
```

---
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/755 A summary of the performance is available in this [comment](https://issues.apache.org/jira/browse/DRILL-5270?focusedCommentId=15877119=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15877119) on the JIRA (DRILL-5270) ---