Jungtaek Lim created SPARK-30804:
------------------------------------

             Summary: Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
                 Key: SPARK-30804
                 URL: https://issues.apache.org/jira/browse/SPARK-30804
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.0.0
            Reporter: Jungtaek Lim
The "compact" operation in FileStreamSourceLog and FileStreamSinkLog was introduced to solve the "small files" problem, but it adds non-trivial latency, which becomes another headache for long-running queries. There are a number of community reports of the same issue (see SPARK-24295, SPARK-29995, SPARK-30462). Before trying to solve the problem, it would be better to measure the latency (elapsed time) of the operation and log it, so the logs help pinpoint the issue when the additional latency becomes a concern.
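A minimal sketch of the proposed measurement, for illustration only: wrap the compaction step in a timer and log the elapsed time, escalating to a warning above a threshold. `CompactTiming`, `timeTakenMs`, `compact`, and `warnThresholdMs` are hypothetical names, the compaction itself is simulated by merging and de-duplicating in-memory entries, and `println` stands in for Spark's logging; none of this is the actual CompactibleFileStreamLog code.

```scala
import java.util.concurrent.TimeUnit

object CompactTiming {
  // Hypothetical helper: evaluates `body` once and returns its result
  // together with the elapsed wall-clock time in milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
    (result, elapsedMs)
  }

  // Hypothetical stand-in for the compact step: merges the entries of the
  // previous batch files into one de-duplicated sequence, then logs how long
  // that took (WARN to stderr above the threshold, INFO to stdout otherwise).
  def compact(batchId: Long, files: Seq[Seq[String]], warnThresholdMs: Long = 2000L): Seq[String] = {
    val (merged, elapsedMs) = timeTakenMs { files.flatten.distinct }
    val msg = s"Compacting batch $batchId (${files.size} files, ${merged.size} entries) took $elapsedMs ms"
    if (elapsedMs > warnThresholdMs) Console.err.println(s"WARN $msg")
    else println(s"INFO $msg")
    merged
  }
}
```

For example, `CompactTiming.compact(9L, Seq(Seq("a", "b"), Seq("b", "c")))` returns `Seq("a", "b", "c")` and logs one INFO line with the elapsed time. Logging on every compaction, rather than only on slow ones, makes it easier to see how the cost grows as the log accumulates entries over a long-running query.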