wary commented on code in PR #285:
URL: https://github.com/apache/doris-spark-connector/pull/285#discussion_r2011196053

##########
spark-doris-connector/spark-doris-connector-base/src/main/java/org/apache/doris/spark/client/write/AbstractStreamLoadProcessor.java:
##########
@@ -85,19 +83,19 @@ public abstract class AbstractStreamLoadProcessor<R> extends DorisWriter<R> impl
     private final Map<String, String> properties;
-    private final String format;
+    private final DataFormat format;
     protected String columnSeparator;
-    private String lineDelimiter;
+    private byte[] lineDelimiter;
     private final boolean isGzipCompressionEnabled;
     private String groupCommit;
     private final boolean isPassThrough;
-    private PipedOutputStream output;
+    private StreamLoadEntity output;

Review Comment:
   I saw the interval mistake. This PR was originally intended to fix this issue as well, but it had already been fixed by the time I submitted the PR. However, even after fixing this interval mistake, the performance is still much worse compared to version 1.3.2.

   The test results in our actual scenario are as follows:

   1. version with interval mistake: 100.8 GiB, Time: > 24 h
   2. version with interval mistake fixed: 100.8 GiB, Time: 23 min
   3. version with this PR, Time: 12 min
   4. version 1.3.2, Time: 10 min
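
   For a rough comparison, the timings above can be converted into throughput figures. The following is only a minimal sketch: it assumes the 100.8 GiB volume applies to every run (the comment states it only for the first two), and the class and method names (`ThroughputSketch`, `mibPerSecond`) are hypothetical, not part of the connector.

```java
public class ThroughputSketch {

    // Converts a data volume in GiB and a wall-clock time in minutes to MiB/s.
    static double mibPerSecond(double gib, double minutes) {
        return gib * 1024.0 / (minutes * 60.0);
    }

    public static void main(String[] args) {
        // Assumption: the same 100.8 GiB dataset was loaded in every run.
        double gib = 100.8;

        // Timings quoted in the review comment; the > 24 h run is omitted
        // because no exact duration was given for it.
        String[] labels = {"interval mistake fixed", "this PR", "version 1.3.2"};
        double[] minutes = {23, 12, 10};

        // Version 1.3.2 is the baseline for the relative-time column.
        double baselineMinutes = minutes[2];

        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-24s %7.1f MiB/s  %.1fx the 1.3.2 time%n",
                    labels[i],
                    mibPerSecond(gib, minutes[i]),
                    minutes[i] / baselineMinutes);
        }
    }
}
```

   Under that assumption, the run with this PR takes 1.2 times as long as version 1.3.2, compared with 2.3 times for the build that has only the interval fix.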