wgtmac commented on code in PR #2005:
URL: https://github.com/apache/orc/pull/2005#discussion_r1743961403
##########
c++/include/orc/Reader.hh:
##########
@@ -657,6 +657,12 @@ namespace orc {
* @param rowNumber the next row the reader should return
*/
virtual void seekToRow(uint64_t rowNumber) = 0;
+
+ /**
+ * Get the current stripe position entries for the specified column.
+ * @return the position entries for the specified column.
Review Comment:
```suggestion
* Get the row group positions of the specified column in the current stripe.
* @return the position entries for the specified column.
```
##########
c++/src/ColumnWriter.hh:
##########
@@ -179,6 +179,15 @@ namespace orc {
*/
virtual void writeDictionary();
+ /**
+ * Finalize the encoding and compressing process. This function should be
+ * called after all data required for encoding has been added. It ensures
+ * that any remaining data is processed and the final state of the streams
+ * is set. Note: the boolean type may break this spec due to some trailing bits will be written
+ * to the next compression block.
Review Comment:
```suggestion
* that any remaining data is processed and the final state of the streams
* is set.
* Note: boolean type cannot cut off the current byte if it is not filled
* with 8 bits, otherwise Boolean RLE may incorrectly read the unfilled
* trailing bits. In this case, the last byte will be the head of the next
* compression block.
```
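To illustrate the note in the suggestion above, here is a minimal sketch of why a partially filled byte cannot simply be flushed. `BitPacker` is a hypothetical helper, not ORC's actual writer class, and it assumes booleans are packed most-significant-bit first, eight per byte:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified bit packer (NOT the ORC writer implementation).
// Assumption: booleans are packed MSB-first, eight values per byte.
class BitPacker {
 public:
  // Append one boolean; a byte is emitted only once all 8 bits are filled.
  void add(bool value) {
    current_ = static_cast<uint8_t>((current_ << 1) | (value ? 1 : 0));
    if (++bitsUsed_ == 8) {
      bytes_.push_back(current_);
      current_ = 0;
      bitsUsed_ = 0;
    }
  }

  // Number of bits buffered in the unfilled trailing byte (0..7).
  int pendingBits() const { return bitsUsed_; }

  // Bytes that are completely filled and safe to hand to compression.
  const std::vector<uint8_t>& completedBytes() const { return bytes_; }

 private:
  std::vector<uint8_t> bytes_;
  uint8_t current_ = 0;
  int bitsUsed_ = 0;
};
```

After adding 10 values, only one byte is complete and 2 bits are still pending. Padding and emitting that trailing byte early would make a reader see 6 extra bits it cannot tell apart from real values, so in this sketch the pending byte has to stay buffered until later values fill it.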
##########
c++/include/orc/Reader.hh:
##########
@@ -657,6 +657,12 @@ namespace orc {
* @param rowNumber the next row the reader should return
*/
virtual void seekToRow(uint64_t rowNumber) = 0;
+
+ /**
+ * Get the current stripe position entries for the specified column.
+ * @return the position entries for the specified column.
+ */
+ virtual std::vector<std::vector<int>> getCurrentStripePositionEntries(uint64_t columnId) = 0;
Review Comment:
nit: I wonder whether we should design a better data structure, something like:
```
struct RowGroupPositions {
uint64_t columnId;
std::vector<int32_t> positions;
};
```
Then it can be reused if we want to add `std::vector<RowGroupPositions> getPositionEntries(int stripe, std::vector<int> columnIds)` in the future.
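As a rough sketch of how the proposed struct could be reused for a multi-column lookup: `selectPositions` and the `positionsByColumn` map below are hypothetical and only stand in for whatever the reader would actually hold internally; they are not part of the ORC API.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Struct as proposed in the comment above.
struct RowGroupPositions {
  uint64_t columnId;
  std::vector<int32_t> positions;
};

// Hypothetical helper: collect the positions of the requested columns from an
// assumed in-memory map (column id -> positions). Unknown column ids are skipped.
std::vector<RowGroupPositions> selectPositions(
    const std::map<uint64_t, std::vector<int32_t>>& positionsByColumn,
    const std::vector<uint64_t>& columnIds) {
  std::vector<RowGroupPositions> result;
  result.reserve(columnIds.size());
  for (uint64_t id : columnIds) {
    auto it = positionsByColumn.find(id);
    if (it != positionsByColumn.end()) {
      result.push_back(RowGroupPositions{id, it->second});
    }
  }
  return result;
}
```

Carrying `columnId` inside each entry keeps the result self-describing, so callers do not have to rely on the order of the returned vector matching the order of the requested ids.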