gongxun0928 commented on code in PR #1364:
URL: https://github.com/apache/cloudberry/pull/1364#discussion_r2419869696


##########
contrib/pax_storage/src/cpp/storage/pax.cc:
##########
@@ -280,14 +282,19 @@ void TableWriter::Open() {
   // insert tuple into the aux table before inserting any tuples.
   cbdb::InsertMicroPartitionPlaceHolder(RelationGetRelid(relation_),
                                         current_blockno_);
+  cur_physical_size_ = 0;
 }
 
 void TableWriter::WriteTuple(TupleTableSlot *slot) {
   Assert(writer_);
   Assert(strategy_);
-  // should check split strategy before write tuple
-  // otherwise, may got a empty file in the disk
-  if (strategy_->ShouldSplit(writer_->PhysicalSize(), num_tuples_)) {
+  // Sampled split check to reduce PhysicalSize() overhead
+  // We first perform a sampled pre-write check to avoid empty files.
+  if ((num_tuples_ % PAX_SPLIT_CHECK_INTERVAL) == 0) {

Review Comment:
   Yeah, because of the CTID constraint, we have to strictly enforce the 
tuple count so that it never exceeds PAX_MAX_NUM_TUPLES_PER_FILE. That's why 
we kept this precise check here.
   
   On the other hand, the biggest performance cost here is the PhysicalSize() 
call. To avoid paying for it on every tuple, we only check the file size 
every PAX_SPLIT_CHECK_INTERVAL tuples.
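
   To illustrate the two checks described above, here is a minimal standalone 
sketch, not the actual PAX code. `SketchWriter`, the constant values, and the 
`size_queries` counter are all hypothetical; the constants only stand in for 
PAX_MAX_NUM_TUPLES_PER_FILE and PAX_SPLIT_CHECK_INTERVAL. The tuple count is 
checked exactly on every write, while the size query runs only on every N-th 
tuple.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical values; the real constants live in the PAX sources.
constexpr std::size_t kMaxTuplesPerFile = 1000;  // stands in for PAX_MAX_NUM_TUPLES_PER_FILE
constexpr std::size_t kSplitCheckInterval = 16;  // stands in for PAX_SPLIT_CHECK_INTERVAL
constexpr std::size_t kMaxFileSize = 4096;       // assumed size threshold in bytes

// Hypothetical stand-in for the writer; it only tracks counters.
struct SketchWriter {
  std::size_t num_tuples = 0;     // tuples in the current file
  std::size_t physical_size = 0;  // bytes in the current file
  std::size_t size_queries = 0;   // how often the "expensive" size query ran
  std::size_t files_closed = 0;   // number of splits so far

  // Models the costly size query; here it just counts its invocations.
  std::size_t PhysicalSize() {
    ++size_queries;
    return physical_size;
  }

  bool ShouldSplit() {
    // Exact check on every tuple: a plain integer comparison is cheap,
    // and the tuple limit must never be exceeded.
    if (num_tuples >= kMaxTuplesPerFile) return true;
    // Sampled check: query the size only every kSplitCheckInterval tuples.
    // Skipping num_tuples == 0 (an assumption of this sketch) avoids
    // splitting a file that has no tuples yet.
    if (num_tuples > 0 && num_tuples % kSplitCheckInterval == 0) {
      return PhysicalSize() >= kMaxFileSize;
    }
    return false;
  }

  void WriteTuple(std::size_t tuple_bytes) {
    if (ShouldSplit()) {
      ++files_closed;
      num_tuples = 0;
      physical_size = 0;
    }
    // Invariant: there is always room for one more tuple at this point.
    assert(num_tuples < kMaxTuplesPerFile);
    physical_size += tuple_bytes;
    ++num_tuples;
  }
};
```

   The trade-off in this sketch is that the size threshold becomes approximate: 
with large tuples, a file can grow past `kMaxFileSize` until the next sampled 
check fires, whereas the tuple limit stays exact.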



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

