Young-Seok Kim has posted comments on this change. Change subject: Make LSM bulkload append-only and write-once. ......................................................................
Patch Set 17: (26 comments) I added some comments. Will continue reviewing. https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-bloomfilter/src/main/java/edu/uci/ics/hyracks/storage/am/bloomfilter/impls/BloomFilter.java File hyracks/hyracks-storage-am-bloomfilter/src/main/java/edu/uci/ics/hyracks/storage/am/bloomfilter/impls/BloomFilter.java: Line 230: page.acquireWriteLatch(); Latch doesn't need to be acquired since bloomfilter is created by a single either flush or merge thread and the filter is not shown to any other threads until the operation is over. If we consider a situation where concurrent reader or writer during the bloomfilter building, we may introduce a flag which says whether it's built in exclusive manner or not. More importantly, in general, there should be very careful confiscated pages cleanup operations whenever there are exception during bloomfilter building operation. Otherwise, any exception during the building will make those confiscated pages unavailable pages in the buffer cache. For example, when the following initPage throws an exception in whatever reason, all confiscated pages so far should be returned back to the buffercache. Line 234: page.releaseWriteLatch(false); Latch doesn't need to be acquired since bloomfilter is created by a single either flush or merge thread and the filter is not shown to any other threads until the operation is over. Line 279: page.acquireWriteLatch(); All these latch/unlatch things are unnecessary since confiscated pages seem to be accessed exclusively anyway, right? https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IFreePageManagerFactory.java File hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IFreePageManagerFactory.java: Line 19: public interface IFreePageManagerFactory { Should it be named as MetadataManagerFactory? and the method name should be changed accordingly as well? https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IMetaDataManager.java File hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IMetaDataManager.java: Line 20: public interface IMetaDataManager { It will be better to add general description of this interface such as the general role/purpose of this interface. https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/impls/AbstractTreeIndex.java File hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/impls/AbstractTreeIndex.java: Line 104: int numPages = bufferCache.getNumPagesOfFile(fileId); It's good to have description of what the file looks like when this create() is called such as the first page is meta page whose page number is -1 and so on. According to such description, what's the possible numbers for this numPages? I'm not sure how numPages can be greater than 2 below. Line 115: //initEmptyTree(); Why is this commented out instead of removing the line? Line 185: wasActivated = true; Why there are two flags? Good to have the description of why two flags are needed. In which situation, wasActivated is true and isActive is false? Line 215: bufferCache.deleteFile(fileId, true); why should the pages in deleted file flushed? Line 282: return freePageManager; Maybe better to change the variable name to meta(data)PageManager? Line 366: bufferCache.returnPage(frontierPage); what if returnpage throws exception? Line 367: continue; continue is unnecessary. Line 413: nodeFrontiers.get(i).page.releaseWriteLatch(false); again, why latch should be acquired? Line 424: frontier.page = bufferCache.confiscatePage(-1); why -1? https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/AsyncFIFOPageQueueManager.java File hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/AsyncFIFOPageQueueManager.java: Line 100: protected LinkedBlockingQueue<QueueEntry> queue = new LinkedBlockingQueue<QueueEntry>(); LinkedBlockingQueue is not thread-safe. I don't see queue is accessed in a thread-safe manner. Line 101: volatile Thread writerThread; why is writerThread declared as volatile? Line 116: return new PageQueue(bufferCache, writer); Why should PageQueue be created more than once? Since there is only one writer, a global queue can be used for this purpose. If this is for reducing competing among threads, then this PageQueue seems to be used in exclusive manner by one thread. If this is the case, then ConcurrentLinkedQueue and ConcurrentHashMap member variables in the PageQueue don't have to be "Concurrent", right? Line 124: queue.wait(100l); Will it be better to use wait() and notify() pair instead of checking queue repeatedly? What's the benefit of waiting with the time? Who gives notification to the caller of this wait on queue object? Line 154: lowWater.wait(100l); Will it be better to use wait() and notify() pair instead of checking queue repeatedly? What's the benefit of waiting with the time? Line 167: while (!haltWriter) { What's gonna happen in the following case? when there are pages in the queue, haltWriter can be set to true by destroyQueue(). Then, how the rest of the pages in the queue are written to disk? It seems that destroyQueue should set the haltWriter to true when the queue was empty. Am I missing something here? Line 178: page.acquireReadLatch(); why is readLatch acquired, not writeLatch? Also, no other threads will try to read this page anyway. So, the latch is even needed? https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/BufferCache.java File hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/BufferCache.java: Line 612: if (curPage >= pageReplacementStrategy.getNumPages()) { pageReplacementStrategy.getNumPages() in master branch was modified by Yingyi. So, this should be modified considering that modification. Probably, this getNumPages() call can be outside of the sync block. Line 940: if (victim != null) { Is it possible to factor out this if clause if this logic is similar to the what findPage() does? if not, what's the difference that couldn't allow the refactoring? Line 1016: ((CachedPage) returnPage).virtual.set(true); what's the purpose of this set(true) call? Line 1028: } The following code also looks identical to what findPage does. Line 1056: public void returnPage(ICachedPage page, boolean reinsert) { We need to carefully think about the default value of reinsert flag for merge operation. Flush might be ok, but considering the size of components resulted from merge will be very big and the pages in the merge component less likely to be accessed right after merge. Nonetheless, if those pages go and compete to get the bucket lock, that might introduce unnecessary contention against readers and flusher threads. It's good to see the effect of changing the reinsert flag to false for merge operations with a concurrent feed job and concurrent queries. -- To view, visit https://asterix-gerrit.ics.uci.edu/255 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I80fb891b5310252143854a336b591bf3f8cd4ba7 Gerrit-PatchSet: 17 Gerrit-Project: hyracks Gerrit-Branch: master Gerrit-Owner: Ian Maxon <[email protected]> Gerrit-Reviewer: Ian Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Murtadha Hubail <[email protected]> Gerrit-Reviewer: Young-Seok Kim <[email protected]> Gerrit-HasComments: Yes
