wary commented on code in PR #285:
URL: https://github.com/apache/doris-spark-connector/pull/285#discussion_r2011196053


##########
spark-doris-connector/spark-doris-connector-base/src/main/java/org/apache/doris/spark/client/write/AbstractStreamLoadProcessor.java:
##########
@@ -85,19 +83,19 @@ public abstract class AbstractStreamLoadProcessor<R> extends DorisWriter<R> impl
 
     private final Map<String, String> properties;
 
-    private final String format;
+    private final DataFormat format;
 
     protected String columnSeparator;
 
-    private String lineDelimiter;
+    private byte[] lineDelimiter;
 
     private final boolean isGzipCompressionEnabled;
 
     private String groupCommit;
 
     private final boolean isPassThrough;
 
-    private PipedOutputStream output;
+    private StreamLoadEntity output;

Review Comment:
   I saw the interval mistake. This PR was originally intended to fix that 
issue as well, but it had already been fixed by the time I submitted the PR. 
However, even with the interval mistake fixed, performance is still much 
worse than in version 1.3.2. The test results from our actual scenario are as 
follows:
   1. version with the interval mistake: 100.8 GiB, Time: > 24 h
   2. version with the interval mistake fixed: 100.8 GiB, Time: 23 min
   3. version with this PR, Time: 12 min
   4. version 1.3.2, Time: 10 min



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

