>From Ian Maxon <[email protected]>: Attention is currently required from: Ritik Raj.
Ian Maxon has posted comments on this change by Ritik Raj. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20959?usp=email ) Change subject: [ASTERIXDB-3702][RT][STO] LSM Sampling ...................................................................... Patch Set 7: (7 comments) File hyracks-fullstack/hyracks/hyracks-dataflow-common/src/main/java/org/apache/hyracks/dataflow/common/data/normalizers/Integer8NormalizedKeyComputerFactory.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20959/comment/91f8c6ee_d7b0b1cc?usp=email : PS7, Line 48: normalizedKeys[keyStart] = value ^ 0x80; should this be left-shifted? and won't the xor act weird because it's in an int and not a byte? File hyracks-fullstack/hyracks/hyracks-storage-am-btree/src/main/java/org/apache/hyracks/storage/am/btree/impls/BatchPredicateWithKeys.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20959/comment/3ce7bc05_36ff59b6?usp=email : PS7, Line 28: /** : * : */ nit: remove File hyracks-fullstack/hyracks/hyracks-storage-am-btree/src/main/java/org/apache/hyracks/storage/am/btree/impls/DiskBTree.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20959/comment/81539091_3796846e?usp=email : PS7, Line 204: enumerateLeafPageIds( this sort of makes sense but what if the component is giant? won't this array grow to be like ~125MB if the component is 1TB with 32KB pages? File hyracks-fullstack/hyracks/hyracks-storage-am-btree/src/main/java/org/apache/hyracks/storage/am/btree/impls/DiskBTreeSampleCursor.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20959/comment/bd245708_b62a555b?usp=email : PS7, Line 99: private long totalTimeTakenToFindRandomLeaf = 0; : private long totalTimeTakenToFindRandomTuples = 0; : private long totalLeafDrawBatches = 0; : private long totalLeafDraws = 0; : private long totalLeafDrawUniquePages = 0; : private long totalLeafPins = 0; : private long totalReusedPinnedPageHits = 0; : private boolean endedPreemptively = false; : might be cool to propagate these up somehow to the operator and make it a self-profiling operator so you can push them up into the profile https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20959/comment/5958b9fa_ef6424b6?usp=email : PS7, Line 149: leafPageIds = bTree.enumerateLeafPageIds(rootPageId, bTreeOpCtx, bufferCacheOpCtx); same here about the size of that array File hyracks-fullstack/hyracks/hyracks-storage-am-lsm-btree/src/main/java/org/apache/hyracks/storage/am/lsm/btree/dataflow/BTreeSampleCollectorOperatorDescriptorNodePushable.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20959/comment/296de2a7_50cee9c1?usp=email : PS7, Line 111: stats.getInputTupleCounter().update(matchingTupleCount); shouldn't this be outside the loop? File hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/theta/ThetaSampler.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20959/comment/3a674d21_2c9e9cca?usp=email : PS7, Line 194: temp[i] = heap.dequeueLong(); can there be some switch that stops you from accidentally reusing an instance of this object after this method gets called? since it kind of deinitializes it -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20959?usp=email To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings?usp=email Gerrit-MessageType: comment Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Change-Id: If5c5b7eac5199b85fc40b722c87ae0bbb73eecbd Gerrit-Change-Number: 20959 Gerrit-PatchSet: 7 Gerrit-Owner: Ritik Raj <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-CC: Ian Maxon <[email protected]> Gerrit-Attention: Ritik Raj <[email protected]> Gerrit-Comment-Date: Fri, 01 May 2026 00:22:48 +0000 Gerrit-HasComments: Yes Gerrit-Has-Labels: No
