siddharthteotia commented on a change in pull request #4535: Implement DISTINCT clause
URL: https://github.com/apache/incubator-pinot/pull/4535#discussion_r315336815
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/DataBlockCache.java
##########
@@ -366,4 +366,74 @@ public boolean equals(Object obj) {
       return _column.equals(that._column) && _dataType == that._dataType;
     }
   }
+
+  /**
+   * Row Index based APIs for Single Value columns
+   */
+
+  public int getSVIntAtIndex(final String column, final int index) {

Review comment:
Ultimately, to judge uniqueness we need to look at a whole row. So looking only at the return value of getIntValuesForSVColumn (an array containing all the projected values for that column) will not help. We need to take each cell of each such array together with the values at the same row index in the other projection columns' arrays, check whether that row has already been stored in the hash set, and store it if not. That's why I introduced the index-based APIs.

I see your point, though, about making a function call per row. What we can do is use the existing APIs to fetch the values (the arrays) and then iterate over them to build each row, check for existence, and store it if new. That way we avoid the per-row function call overhead.
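
A minimal sketch of the array-based approach described above, not the code in this PR. It assumes two projection columns whose values have already been fetched as arrays (standing in for whatever getIntValuesForSVColumn and its siblings return; their signatures are not shown in this thread). The class name DistinctRowsSketch, the method distinctRows, and the use of a List as the row key are all hypothetical choices made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Pinot.
final class DistinctRowsSketch {

  /**
   * Builds one row per index from the already-fetched column arrays and
   * keeps only the rows that have not been seen before.
   */
  static Set<List<Object>> distinctRows(int[] intColumn, String[] stringColumn, int numDocs) {
    // LinkedHashSet keeps first-seen order, which makes the output predictable.
    Set<List<Object>> seen = new LinkedHashSet<>();
    for (int row = 0; row < numDocs; row++) {
      // The values at the same index across all column arrays form the row.
      List<Object> key = Arrays.asList(intColumn[row], stringColumn[row]);
      // add() is a no-op when an equal row is already present.
      seen.add(key);
    }
    return seen;
  }

  public static void main(String[] args) {
    int[] ids = {1, 2, 1, 2};
    String[] names = {"a", "b", "a", "c"};
    // Rows (1, a) and (1, a) collapse into one; (2, b) and (2, c) stay distinct.
    System.out.println(distinctRows(ids, names, ids.length));
    // prints [[1, a], [2, b], [2, c]]
  }
}
```

The arrays are fetched once per column and the loop only indexes into them, so there is no per-row call into the cache. Boxing each int into the row key still has a cost; a real implementation would likely want a more compact key.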