Murtadha Hubail has posted comments on this change.

Change subject: ASTERIXDB-1337: Dataset Memory Management on Multi-Partition NC
......................................................................

Patch Set 1:

(6 comments)

https://asterix-gerrit.ics.uci.edu/#/c/705/1/asterix-common/src/main/java/org/apache/asterix/common/context/AsterixVirtualBufferCacheProvider.java
File asterix-common/src/main/java/org/apache/asterix/common/context/AsterixVirtualBufferCacheProvider.java:

Line 40: .getVirtualBufferCaches(datasetID, ctx.getTaskAttemptId().getTaskId().getPartition());
This partition value is a task-based partition (its index among the partitions this task is split into, which always starts from 0), not the cluster storage partition id. For example, if you execute a query on a metadata dataset, this partition id will always be 0, whereas the storage partition for the metadata datasets could be something completely different. Therefore, this will allocate an extra virtual buffer cache (VBC) to this dataset and make it exceed its limit. Similarly, when fault tolerance is enabled, the NC will be responsible for extra storage partitions of the same dataset, which will cause extra VBCs to be allocated.

I believe the best thing to do here is to pass the IO device number. You can get it from the file split in the callers of this method as opDesc.getFileSplitProvider().getFileSplits()[partition].getIODeviceId()


https://asterix-gerrit.ics.uci.edu/#/c/705/1/asterix-common/src/main/java/org/apache/asterix/common/context/DatasetLifecycleManager.java
File asterix-common/src/main/java/org/apache/asterix/common/context/DatasetLifecycleManager.java:

Line 746: vbcs = initializeVirtualBufferCaches(partition);
You might want to add a check here to make sure the number of VBCs of a dataset is <= numPartitions.

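For illustration only, a rough sketch of the kind of check meant here. VbcRegistry, the String placeholders standing in for IVirtualBufferCache, and the way numPartitions is supplied are all hypothetical and do not reflect the actual DatasetLifecycleManager code. The sketch also shows how a lookup keyed by a task partition (0) would trip the check when the dataset's VBCs were already allocated under a different key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the per-dataset VBC bookkeeping; not AsterixDB code.
class VbcRegistry {
    // Assumed upper bound on the number of VBC sets one dataset may hold on this node.
    private final int numPartitions;
    // Key is whatever the caller passes: a storage partition or an IO device number.
    private final Map<Integer, List<String>> vbcsByKey = new HashMap<>();

    VbcRegistry(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    List<String> getVirtualBufferCaches(int key) {
        List<String> vbcs = vbcsByKey.get(key);
        if (vbcs == null) {
            // The suggested guard: refuse to allocate beyond the dataset's limit.
            if (vbcsByKey.size() >= numPartitions) {
                throw new IllegalStateException("Dataset already has " + vbcsByKey.size()
                        + " VBC set(s); limit is " + numPartitions + " (requested key " + key + ")");
            }
            vbcs = initializeVirtualBufferCaches(key);
            vbcsByKey.put(key, vbcs);
        }
        return vbcs;
    }

    // Placeholder allocation; the real method would create IVirtualBufferCache instances.
    private List<String> initializeVirtualBufferCaches(int key) {
        List<String> vbcs = new ArrayList<>();
        vbcs.add("vbc-for-key-" + key);
        return vbcs;
    }

    int allocatedCount() {
        return vbcsByKey.size();
    }

    public static void main(String[] args) {
        VbcRegistry registry = new VbcRegistry(1);
        registry.getVirtualBufferCaches(3); // allocated under a storage-side key
        try {
            registry.getVirtualBufferCaches(0); // a task partition id used as the key
        } catch (IllegalStateException e) {
            System.out.println("Check fired: " + e.getMessage());
        }
    }
}
```

With a guard like this, a mismatched key fails fast instead of silently pushing the dataset over its memory budget.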

https://asterix-gerrit.ics.uci.edu/#/c/705/1/asterix-metadata/src/main/java/org/apache/asterix/metadata/bootstrap/MetadataBootstrap.java
File asterix-metadata/src/main/java/org/apache/asterix/metadata/bootstrap/MetadataBootstrap.java:

Line 353: .getVirtualBufferCaches(index.getDatasetId().getId(), metadataPartition.getPartitionId());
If you agree that the best thing is to pass the IO device number, this needs to be changed to the IO device number of the metadataPartition.


https://asterix-gerrit.ics.uci.edu/#/c/705/1/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/resource/LSMBTreeLocalResourceMetadata.java
File asterix-transactions/src/main/java/org/apache/asterix/transaction/management/resource/LSMBTreeLocalResourceMetadata.java:

Line 67: List<IVirtualBufferCache> virtualBufferCaches = runtimeContextProvider.getVirtualBufferCaches(datasetID, partition);
You need to pass the IO device number of the localResource partition from RecoveryManager#startRecoveryRedoPhase. You need to add a method in PersistentLocalResourceRepository that takes a partition and returns the partition's IO device number on this node (similar to PersistentLocalResourceRepository#getPartitionPath).


https://asterix-gerrit.ics.uci.edu/#/c/705/1/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/resource/LSMInvertedIndexLocalResourceMetadata.java
File asterix-transactions/src/main/java/org/apache/asterix/transaction/management/resource/LSMInvertedIndexLocalResourceMetadata.java:

Line 77: List<IVirtualBufferCache> virtualBufferCaches = runtimeContextProvider.getVirtualBufferCaches(datasetID, partition);
You need to pass the IO device number of the localResource partition from RecoveryManager#startRecoveryRedoPhase. You need to add a method in PersistentLocalResourceRepository that takes a partition and returns the partition's IO device number on this node (similar to PersistentLocalResourceRepository#getPartitionPath).


https://asterix-gerrit.ics.uci.edu/#/c/705/1/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/resource/LSMRTreeLocalResourceMetadata.java
File asterix-transactions/src/main/java/org/apache/asterix/transaction/management/resource/LSMRTreeLocalResourceMetadata.java:

Line 79: List<IVirtualBufferCache> virtualBufferCaches = runtimeContextProvider.getVirtualBufferCaches(datasetID, partition);
You need to pass the IO device number of the localResource partition from RecoveryManager#startRecoveryRedoPhase. You need to add a method in PersistentLocalResourceRepository that takes a partition and returns the partition's IO device number on this node (similar to PersistentLocalResourceRepository#getPartitionPath).


--
To view, visit https://asterix-gerrit.ics.uci.edu/705

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbf08f532c1210c30be6a51c73570a789174213b
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Michael Blow
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Murtadha Hubail
Gerrit-Reviewer: Till Westmann
Gerrit-Reviewer: abdullah alamoudi
Gerrit-HasComments: Yes
