BentsiLeviav commented on code in PR #37611:
URL: https://github.com/apache/beam/pull/37611#discussion_r2818291180


##########
sdks/java/io/clickhouse/src/main/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIO.java:
##########
@@ -431,25 +552,46 @@ private void flush() throws Exception {
       }
 
       batchSize.update(buffer.size());
+
+      // Serialize rows to RowBinary format
+      ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
+
+      // Wrap ByteArrayOutputStream with ClickHouseOutputStream
+      try (com.clickhouse.data.ClickHouseOutputStream outputStream =
+          com.clickhouse.data.ClickHouseOutputStream.of(byteStream)) {
+        for (Row row : buffer) {
+          ClickHouseWriter.writeRow(outputStream, schema(), row);

Review Comment:
   To clarify: this PR doesn't change the deduplication behavior; it only 
migrates from the JDBC driver to the Java Client v2 while preserving the 
existing logic. The deduplication strategy has been part of ClickHouseIO since 
its initial implementation, and ClickHouse handles deduplication server-side 
for replicated table engines.
   
   I agree that this deduplication pattern differs from typical Beam sinks 
(the temp write → finalize pattern).
   
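   For context, a minimal sketch of the flush path quoted above: the buffered 
Beam Rows are serialized to RowBinary in memory via ClickHouseWriter, and the 
resulting bytes are handed to the server as a single insert. Only the 
serialization loop comes from the diff; the 
`Client#insert(String, InputStream, ClickHouseFormat)` overload and the 
`TableSchema` parameter type are my assumptions about the Java Client v2 API 
and the surrounding ClickHouseIO code, not something this PR shows.
   
   ```java
   package org.apache.beam.sdk.io.clickhouse;
   
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.util.List;
   
   import org.apache.beam.sdk.values.Row;
   
   import com.clickhouse.client.api.Client; // Java Client v2 entry point (assumed)
   import com.clickhouse.data.ClickHouseFormat;
   import com.clickhouse.data.ClickHouseOutputStream;
   
   /** Illustrative only; placed in this package so it can reach the ClickHouseWriter helper. */
   final class RowBinaryFlushSketch {
   
     private RowBinaryFlushSketch() {}
   
     /**
      * Serializes the buffered rows to RowBinary and sends them as one insert.
      * The Client#insert(String, InputStream, ClickHouseFormat) overload and the
      * TableSchema parameter type are assumptions, not taken from the diff.
      */
     static void flush(Client client, String table, TableSchema schema, List<Row> buffer)
         throws Exception {
       ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
   
       // Wrap the in-memory stream so ClickHouseWriter can emit RowBinary into it;
       // closing the wrapper flushes any buffered bytes into byteStream.
       try (ClickHouseOutputStream out = ClickHouseOutputStream.of(byteStream)) {
         for (Row row : buffer) {
           ClickHouseWriter.writeRow(out, schema, row);
         }
       }
   
       // Hand the serialized batch to the server as a single RowBinary insert and
       // block until it is acknowledged (error handling and retries omitted).
       client
           .insert(table, new ByteArrayInputStream(byteStream.toByteArray()), ClickHouseFormat.RowBinary)
           .get();
     }
   }
   ```
   
   Buffering the whole batch in a ByteArrayOutputStream keeps each flush a 
single insert, which is what lets ClickHouse's server-side block deduplication 
apply to retried batches.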
   


