Copilot commented on code in PR #61521:
URL: https://github.com/apache/doris/pull/61521#discussion_r2958761101


##########
be/src/olap/rowset/segment_v2/column_writer.cpp:
##########
@@ -816,6 +816,30 @@ Status ScalarColumnWriter::_write_data_page(Page* page) {
     return Status::OK();
 }
 
+Status ScalarColumnWriter::_flush_pages_if_needed() {
+    if (_data_size < config::column_writer_page_flush_threshold) {
+        return Status::OK();
+    }
+    auto offset = _file_writer->bytes_appended();
+    auto collect_uncompressed_bytes = [](const PageFooterPB& footer) {
+        return footer.uncompressed_size() + footer.ByteSizeLong() +
+               sizeof(uint32_t) /* footer size */ + sizeof(uint32_t) /* 
checksum */;
+    };
+    for (auto& page : _pages) {
+        _total_uncompressed_data_pages_size += 
collect_uncompressed_bytes(page->footer);

Review Comment:
   `collect_uncompressed_bytes` is duplicated here and in `write_data()`. 
Keeping two copies increases the risk of future divergence (especially since 
this PR is explicitly about byte accounting). Consider extracting this into a 
small private helper (e.g., static method) and reusing it in both places.
   



##########
be/src/olap/rowset/segment_v2/column_writer.cpp:
##########
@@ -816,6 +816,30 @@ Status ScalarColumnWriter::_write_data_page(Page* page) {
     return Status::OK();
 }
 
+Status ScalarColumnWriter::_flush_pages_if_needed() {
+    if (_data_size < config::column_writer_page_flush_threshold) {
+        return Status::OK();
+    }
+    auto offset = _file_writer->bytes_appended();
+    auto collect_uncompressed_bytes = [](const PageFooterPB& footer) {
+        return footer.uncompressed_size() + footer.ByteSizeLong() +
+               sizeof(uint32_t) /* footer size */ + sizeof(uint32_t) /* 
checksum */;
+    };
+    for (auto& page : _pages) {
+        _total_uncompressed_data_pages_size += 
collect_uncompressed_bytes(page->footer);
+        RETURN_IF_ERROR(_write_data_page(page.get()));
+    }
+    _total_compressed_data_pages_size += _file_writer->bytes_appended() - 
offset;
+    LOG(INFO) << fmt::format(
+            "Flushed {} pages for column_id={}, total uncompressed data pages 
size={}, "
+            "total compressed data pages size={}, flush_data_size={}",
+            _pages.size(), _opts.meta->column_id(), 
_total_uncompressed_data_pages_size,
+            _total_compressed_data_pages_size, _data_size);

Review Comment:
   This flush path logs at INFO every time the cached pages exceed the 
threshold. During heavy ingest this could be extremely frequent and flood BE 
logs. Please consider lowering the verbosity (e.g., VLOG/DEBUG or LOG_EVERY_N) 
or removing the log unless it’s needed for ongoing operational visibility.



##########
be/src/olap/rowset/segment_v2/column_writer.cpp:
##########
@@ -876,6 +900,7 @@ Status ScalarColumnWriter::finish_current_page() {
     }
 
     _push_back_page(std::move(page));
+    RETURN_IF_ERROR(_flush_pages_if_needed());
     _first_rowid = _next_rowid;
     return Status::OK();

Review Comment:
   New behavior (eager flushing when cached page bytes reach a threshold) isn’t 
covered by existing segment_v2 column reader/writer tests. Since there are 
already unit tests exercising ColumnWriter in 
`be/test/olap/rowset/segment_v2/column_reader_writer_test.cpp`, please add a 
test case that sets `config::column_writer_page_flush_threshold` low enough to 
force flushing during `append`/`finish_current_page()` and verifies the 
resulting file can still be read correctly (and optionally that the 
compressed/uncompressed byte counters remain consistent).



##########
fe/fe-core/src/main/java/org/apache/doris/httpv2/config/WebServerFactoryCustomizerConfig.java:
##########
@@ -40,6 +40,21 @@ public class WebServerFactoryCustomizerConfig implements 
WebServerFactoryCustomi
     @Override
     public void customize(ConfigurableJettyWebServerFactory factory) {
 
+        // Set HTTP header size for all connectors
+        factory.addServerCustomizers(server -> {
+            for (org.eclipse.jetty.server.Connector connector : 
server.getConnectors()) {
+                if (connector instanceof ServerConnector) {
+                    ServerConnector serverConnector = (ServerConnector) 
connector;
+                    HttpConnectionFactory httpFactory =
+                            
serverConnector.getConnectionFactory(HttpConnectionFactory.class);
+                    if (httpFactory != null) {
+                        HttpConfiguration httpConfig = 
httpFactory.getHttpConfiguration();
+                        
httpConfig.setRequestHeaderSize(Config.jetty_server_max_http_header_size);
+                    }
+                }
+            }
+        });

Review Comment:
   This PR’s title/description focus on BE segment column-writer flushing, but 
this change modifies FE Jetty header handling. If this is intentional, please 
update the PR description/title to mention the FE behavior change; otherwise 
consider dropping this hunk to keep the PR scoped to segment writer 
optimization.



##########
be/src/olap/rowset/segment_v2/column_writer.cpp:
##########
@@ -816,6 +816,30 @@ Status ScalarColumnWriter::_write_data_page(Page* page) {
     return Status::OK();
 }
 
+Status ScalarColumnWriter::_flush_pages_if_needed() {
+    if (_data_size < config::column_writer_page_flush_threshold) {
+        return Status::OK();
+    }

Review Comment:
   `_data_size` is `uint64_t` but `config::column_writer_page_flush_threshold` 
is an `int64`. This signed/unsigned comparison can behave unexpectedly for 
negative config values (negative becomes a huge uint64) and may trigger 
compiler warnings. Consider validating the config to be >= 0 (or > 0) and 
casting to `uint64_t` (or changing the config type) before comparing.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to