Jungtaek Lim created SPARK-30804:
------------------------------------

             Summary: Measure and log elapsed time for "compact" operation in 
CompactibleFileStreamLog
                 Key: SPARK-30804
                 URL: https://issues.apache.org/jira/browse/SPARK-30804
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.0.0
            Reporter: Jungtaek Lim


The "compact" operation in FileStreamSourceLog and FileStreamSinkLog was 
introduced to solve the "small files" problem, but it adds non-trivial latency, 
which becomes another headache in long-running queries.

There are a number of reports from the community about the same issue (see 
SPARK-24295, SPARK-29995, SPARK-30462). Before trying to solve the problem, it 
would be better to measure the latency (elapsed time) and log it, to help 
identify the issue when the additional latency becomes a concern.
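
As a rough illustration of the proposal (not Spark's actual code, which is 
written in Scala), the measure-and-log step could be sketched as follows. The 
names timed_compact, compact_fn and WARN_THRESHOLD_MS are hypothetical, and the 
threshold value is an arbitrary assumption, not something taken from this issue.

```python
import logging
import time

logger = logging.getLogger("compact_timing")

# Hypothetical threshold in milliseconds; an assumption for illustration only.
WARN_THRESHOLD_MS = 2000


def timed_compact(batch_id, compact_fn, warn_threshold_ms=WARN_THRESHOLD_MS):
    """Run compact_fn(batch_id) and log how long it took.

    compact_fn stands in for whatever performs the compaction.
    Returns a (result, elapsed_ms) tuple.
    """
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    result = compact_fn(batch_id)
    elapsed_ms = (time.monotonic() - start) * 1000.0

    if elapsed_ms >= warn_threshold_ms:
        # Slow compaction: surface it at a level operators will see.
        logger.warning("Compacting batch %d took %.0f ms", batch_id, elapsed_ms)
    else:
        logger.debug("Compacting batch %d took %.0f ms", batch_id, elapsed_ms)
    return result, elapsed_ms
```

Logging at a higher level only above a threshold keeps normal runs quiet while 
still making slow compactions visible in long-running queries.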



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org