[GitHub] drill issue #755: DRILL-5270: Improve loading of profiles listing in the Web...

2018-03-14 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/755
  
Holding off on the rebase until @vrozov's PR #1163 (DRILL-6053) is merged 
into the Apache repository.


---


2018-03-02 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/755
  
Thanks, @vrozov. For `#1`, I'll use a separate lock for the read-only path.
For `#2`, I need to construct a size-limited ordered set from a list of 
unordered elements.
Here, the elements (i.e. profiles) need to be ordered by file name, which 
maps 1:1 to the start-time epoch of the query.
So I need a data structure that I can add to in `O(log(n))` time, remove 
from in `O(1)`, and iterate through in sequence. That makes puts my most 
expensive operation.
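A minimal sketch of the size-limited ordered set described above, assuming profile file names sort in start-time order (so the smallest name is the oldest). `BoundedProfileSet` and its methods are hypothetical names, not the PR's code. Note that `TreeSet.pollFirst()` is `O(log(n))` rather than `O(1)`, so eviction costs the same order as insertion.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical helper: keeps at most maxSize profile names, ordered by name.
 * Assumes names sort in start-time order, so the first element is the oldest.
 */
class BoundedProfileSet {
  private final int maxSize;
  private final TreeSet<String> names = new TreeSet<>();

  BoundedProfileSet(int maxSize) {
    this.maxSize = maxSize;
  }

  /** O(log n) insert; evicts the oldest entry once the limit is exceeded. */
  void add(String profileName) {
    names.add(profileName);
    if (names.size() > maxSize) {
      names.pollFirst(); // drop the smallest (oldest) name, also O(log n)
    }
  }

  /** Builds the set from an unordered listing, e.g. the result of a directory scan. */
  static BoundedProfileSet fromUnordered(Collection<String> listing, int maxSize) {
    BoundedProfileSet set = new BoundedProfileSet(maxSize);
    for (String name : listing) {
      set.add(name);
    }
    return set;
  }

  /** Iterates in descending order (newest first, assuming that is the display order). */
  List<String> newestFirst() {
    return new ArrayList<>(names.descendingSet());
  }
}
```

Because the set never holds more than `maxSize` entries, each insert and eviction stays logarithmic in the retained size rather than in the total number of profiles on disk.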



---


2018-03-02 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/755
  
@kkhatua
1. Read locks are not exclusive (single writer/multiple readers). To 
achieve the required functionality, you need to introduce a different lock and 
use a write (or exclusive) lock.
2. The choice of `TreeSet` is not obvious. What are the most common 
operations performed on the collection? Are you optimizing for get, put, or 
collection construction?
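To illustrate point 1, a minimal sketch using `java.util.concurrent.locks.ReentrantReadWriteLock` (an assumption; the lock type used in the PR is not shown in this thread). While one thread holds the read lock, a second thread can still acquire the read lock, but not the write lock. `ReadLockDemo`, `tryFromOtherThread`, and `probe` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadLockDemo {
  /** Tries to take the lock from a separate thread, releasing it immediately if acquired. */
  static boolean tryFromOtherThread(Lock lock) throws InterruptedException {
    AtomicBoolean acquired = new AtomicBoolean(false);
    Thread other = new Thread(() -> {
      if (lock.tryLock()) {
        acquired.set(true);
        lock.unlock();
      }
    });
    other.start();
    other.join();
    return acquired.get();
  }

  /** Returns {second reader admitted?, writer admitted?} while this thread holds a read lock. */
  static boolean[] probe() {
    ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    rw.readLock().lock();
    try {
      boolean readerAdmitted = tryFromOtherThread(rw.readLock());  // shared: true
      boolean writerAdmitted = tryFromOtherThread(rw.writeLock()); // exclusive: false
      return new boolean[] {readerAdmitted, writerAdmitted};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      rw.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    boolean[] result = probe();
    System.out.println("second reader admitted: " + result[0]);
    System.out.println("writer admitted: " + result[1]);
  }
}
```

In other words, a read lock alone cannot make a second request wait; only the write (exclusive) side of the lock serializes callers.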

@arina-ielchiieva my github id is `vrozov`.


---


2018-03-02 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/755
  
I chose a `TreeSet` because it is a binary tree structure that keeps the 
(maximum permitted number of) profiles sorted and in memory. 

When Drill detects changes (see 
https://github.com/kkhatua/drill/blob/f7ad29b9a322bb215d16b3c3b9a2bfc40abfc1ed/exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/store/LocalPersistentStore.java#L146),
it fetches all the available profiles in the PStore and reconstructs the 
tree, since the order of the profiles returned by the `FileSystem` is not 
guaranteed. 

I tried using a `PathFilter` to fetch only new profiles, but fetching only 
the new profiles from the `FileSystem` costs the same as fetching the entire 
list! Also, some profiles might have been deleted as new ones were added, and 
a full reconstruction takes care of that scenario as well. 
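A minimal sketch of the rebuild-on-change approach, under stated assumptions: `ProfileIndex` is a hypothetical class, and the two suppliers stand in for the real `FileSystem` calls (the base path's modification time and a full, unordered listing). Rebuilding from the full listing, rather than only new entries, means deleted profiles drop out of the view as well.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not Drill's code): keeps a bounded, newest-first view of
 * profile names and rebuilds it from a full listing whenever the store's
 * modification time changes. Not thread-safe by itself; callers must serialize access.
 */
class ProfileIndex {
  private final LongSupplier modificationTime;      // stands in for the base path's mtime
  private final Supplier<List<String>> fullListing; // stands in for a full, unordered listing
  private final int maxSize;
  private long lastModified = Long.MIN_VALUE; // sentinel: nothing built yet
  private List<String> snapshot = Collections.emptyList();
  private int rebuilds = 0;

  ProfileIndex(LongSupplier modificationTime, Supplier<List<String>> fullListing, int maxSize) {
    this.modificationTime = modificationTime;
    this.fullListing = fullListing;
    this.maxSize = maxSize;
  }

  List<String> newestFirst() {
    long current = modificationTime.getAsLong();
    if (current != lastModified) {
      // Full reconstruction: also removes profiles deleted since the last build.
      TreeSet<String> sorted = new TreeSet<>();
      for (String name : fullListing.get()) {
        sorted.add(name);
        if (sorted.size() > maxSize) {
          sorted.pollFirst(); // evict the oldest (smallest) name
        }
      }
      snapshot = Collections.unmodifiableList(new ArrayList<>(sorted.descendingSet()));
      lastModified = current;
      rebuilds++;
    }
    return snapshot;
  }

  int rebuildCount() {
    return rebuilds;
  }
}
```

When the modification time is unchanged, the cached snapshot is returned without touching the listing at all, which is where the savings on repeated page loads would come from.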

To evict, I simply pop the oldest (by file name) entry as I construct the 
`TreeSet`. The Guava cache options don't seem to provide a way to define the 
basis on which entries are evicted.

I believe @vrozov's work on DRILL-6053 specifically addresses locking 
during writes. The lock I used (and need) is for reads, to ensure that multiple 
requests don't trigger an expensive `FileSystem` call for the same state of the 
PStore. 
For example, with T# denoting timestamps:
* `currBasePathModified` = T0
* _ThreadA_ requests at t=T1 and acquires the read lock
* _ThreadB_ requests at t=T2 and waits for the read lock

If the `TreeSet` exists and no change is detected, _ThreadA_ uses the 
`TreeSet` contents and then releases the lock. 

If the `TreeSet` exists and a change is detected, _ThreadA_ reconstructs 
the `TreeSet` before using its contents, and updates `lastBasePathModified` 
before releasing the lock.

When _ThreadB_ acquires the read lock, it discovers that the `TreeSet` was 
already updated while it waited. As of t=T2, this is the most recent snapshot, 
so it uses the `TreeSet` contents rather than reconstructing it. Any further 
reconstruction is deferred to the next request.

We use `lastBasePathModified` to provide pseudo-versioned access to the 
list. This means that if more profiles are added *after* _ThreadB_ started 
waiting for the read lock, they will not trigger a `FileSystem` call right 
away. 
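The ThreadA/ThreadB scenario can be sketched as follows. This is a hypothetical illustration, not the PR's code: `VersionedSnapshot`, its `generation` counter, and the two supplier parameters are assumptions standing in for the PStore's modification time and the full `FileSystem` listing. Because read locks are shared (a second reader would not actually wait), the check-and-rebuild step here runs under an exclusive `ReentrantLock`, in line with point 1 of the review.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of pseudo-versioned access: a caller that waited while
 * another thread rebuilt the snapshot reuses it instead of rebuilding again.
 */
class VersionedSnapshot<T> {
  private final ReentrantLock lock = new ReentrantLock(); // exclusive, unlike a read lock
  private final LongSupplier modificationTime; // stands in for the base path's mtime
  private final Supplier<T> rebuild;           // stands in for the expensive full listing
  private volatile long generation = 0;        // incremented after every rebuild
  private long lastModified = Long.MIN_VALUE;  // guarded by lock
  private T snapshot;                          // guarded by lock

  VersionedSnapshot(LongSupplier modificationTime, Supplier<T> rebuild) {
    this.modificationTime = modificationTime;
    this.rebuild = rebuild;
  }

  T get() {
    long seenGeneration = generation; // version observed before waiting for the lock
    lock.lock();
    try {
      if (generation != seenGeneration) {
        // Another thread rebuilt while we waited: reuse its snapshot and leave
        // any newer change to the next request.
        return snapshot;
      }
      long current = modificationTime.getAsLong();
      if (current != lastModified) {
        snapshot = rebuild.get();
        lastModified = current;
        generation++; // only written while holding the lock
      }
      return snapshot;
    } finally {
      lock.unlock();
    }
  }
}
```

A generation counter is used here instead of comparing timestamps so that "someone rebuilt while I was waiting" is detected exactly, independent of the file system's timestamp granularity.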



---


2018-03-02 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/755
  
@arina-ielchiieva I need to rebase this on top of the latest master, since 
it was originally based on nearly year-old code. When it's ready, I'll either 
create a new PR or push to this one. Let me know which you prefer.


---


2017-04-21 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/755
  
@sudheeshkatkam Can you please review the PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


2017-02-21 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/755
  
For 8266 profiles, measured with the Chrome browser's Network tool:
```
Load First Time: 2.43s 
Load Second Time (no new profiles): 829ms
```


---


2017-02-21 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/755
  
A summary of the performance results is available in this 
[comment](https://issues.apache.org/jira/browse/DRILL-5270?focusedCommentId=15877119=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15877119)
on the JIRA issue (DRILL-5270).



---