Copilot commented on code in PR #61521:
URL: https://github.com/apache/doris/pull/61521#discussion_r2958761101
##########
be/src/olap/rowset/segment_v2/column_writer.cpp:
##########
@@ -816,6 +816,30 @@ Status ScalarColumnWriter::_write_data_page(Page* page) {
return Status::OK();
}
+Status ScalarColumnWriter::_flush_pages_if_needed() {
+ if (_data_size < config::column_writer_page_flush_threshold) {
+ return Status::OK();
+ }
+ auto offset = _file_writer->bytes_appended();
+ auto collect_uncompressed_bytes = [](const PageFooterPB& footer) {
+ return footer.uncompressed_size() + footer.ByteSizeLong() +
+ sizeof(uint32_t) /* footer size */ + sizeof(uint32_t) /*
checksum */;
+ };
+ for (auto& page : _pages) {
+ _total_uncompressed_data_pages_size +=
collect_uncompressed_bytes(page->footer);
Review Comment:
`collect_uncompressed_bytes` is duplicated here and in `write_data()`.
Keeping two copies increases the risk of future divergence (especially since
this PR is explicitly about byte accounting). Consider extracting this into a
small private helper (e.g., static method) and reusing it in both places.
##########
be/src/olap/rowset/segment_v2/column_writer.cpp:
##########
@@ -816,6 +816,30 @@ Status ScalarColumnWriter::_write_data_page(Page* page) {
return Status::OK();
}
+Status ScalarColumnWriter::_flush_pages_if_needed() {
+ if (_data_size < config::column_writer_page_flush_threshold) {
+ return Status::OK();
+ }
+ auto offset = _file_writer->bytes_appended();
+ auto collect_uncompressed_bytes = [](const PageFooterPB& footer) {
+ return footer.uncompressed_size() + footer.ByteSizeLong() +
+ sizeof(uint32_t) /* footer size */ + sizeof(uint32_t) /*
checksum */;
+ };
+ for (auto& page : _pages) {
+ _total_uncompressed_data_pages_size +=
collect_uncompressed_bytes(page->footer);
+ RETURN_IF_ERROR(_write_data_page(page.get()));
+ }
+ _total_compressed_data_pages_size += _file_writer->bytes_appended() -
offset;
+ LOG(INFO) << fmt::format(
+ "Flushed {} pages for column_id={}, total uncompressed data pages
size={}, "
+ "total compressed data pages size={}, flush_data_size={}",
+ _pages.size(), _opts.meta->column_id(),
_total_uncompressed_data_pages_size,
+ _total_compressed_data_pages_size, _data_size);
Review Comment:
This flush path logs at INFO every time the cached pages exceed the
threshold. During heavy ingest this could be extremely frequent and flood BE
logs. Please consider lowering the verbosity (e.g., VLOG/DEBUG or LOG_EVERY_N)
or removing the log unless it’s needed for ongoing operational visibility.
##########
be/src/olap/rowset/segment_v2/column_writer.cpp:
##########
@@ -876,6 +900,7 @@ Status ScalarColumnWriter::finish_current_page() {
}
_push_back_page(std::move(page));
+ RETURN_IF_ERROR(_flush_pages_if_needed());
_first_rowid = _next_rowid;
return Status::OK();
Review Comment:
New behavior (eager flushing when cached page bytes reach a threshold) isn’t
covered by existing segment_v2 column reader/writer tests. Since there are
already unit tests exercising ColumnWriter in
`be/test/olap/rowset/segment_v2/column_reader_writer_test.cpp`, please add a
test case that sets `config::column_writer_page_flush_threshold` low enough to
force flushing during `append`/`finish_current_page()` and verifies the
resulting file can still be read correctly (and optionally that the
compressed/uncompressed byte counters remain consistent).
##########
fe/fe-core/src/main/java/org/apache/doris/httpv2/config/WebServerFactoryCustomizerConfig.java:
##########
@@ -40,6 +40,21 @@ public class WebServerFactoryCustomizerConfig implements
WebServerFactoryCustomi
@Override
public void customize(ConfigurableJettyWebServerFactory factory) {
+ // Set HTTP header size for all connectors
+ factory.addServerCustomizers(server -> {
+ for (org.eclipse.jetty.server.Connector connector :
server.getConnectors()) {
+ if (connector instanceof ServerConnector) {
+ ServerConnector serverConnector = (ServerConnector)
connector;
+ HttpConnectionFactory httpFactory =
+
serverConnector.getConnectionFactory(HttpConnectionFactory.class);
+ if (httpFactory != null) {
+ HttpConfiguration httpConfig =
httpFactory.getHttpConfiguration();
+
httpConfig.setRequestHeaderSize(Config.jetty_server_max_http_header_size);
+ }
+ }
+ }
+ });
Review Comment:
This PR’s title/description focus on BE segment column-writer flushing, but
this change modifies FE Jetty header handling. If this is intentional, please
update the PR description/title to mention the FE behavior change; otherwise
consider dropping this hunk to keep the PR scoped to segment writer
optimization.
##########
be/src/olap/rowset/segment_v2/column_writer.cpp:
##########
@@ -816,6 +816,30 @@ Status ScalarColumnWriter::_write_data_page(Page* page) {
return Status::OK();
}
+Status ScalarColumnWriter::_flush_pages_if_needed() {
+ if (_data_size < config::column_writer_page_flush_threshold) {
+ return Status::OK();
+ }
Review Comment:
`_data_size` is `uint64_t` but `config::column_writer_page_flush_threshold`
is an `int64`. This signed/unsigned comparison can behave unexpectedly for
negative config values (negative becomes a huge uint64) and may trigger
compiler warnings. Consider validating the config to be >= 0 (or > 0) and
casting to `uint64_t` (or changing the config type) before comparing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]