maytasm edited a comment on pull request #10371:
URL: https://github.com/apache/druid/pull/10371#issuecomment-690787993


   @jihoonson 
   Thanks for taking a look. Please see answers below.
   
   > * We need documents for new metrics and APIs. Are you planning to add them 
as a follow-up?
   
   I plan to add the docs and integration tests in a follow-up PR. I'd like to get this change in with the implementation (the current state of the PR) and unit tests (coming very soon). We are also planning to add a UI for showing these metrics, so this PR will unblock the UI work.
   
   > * If you don't want to add docs in this PR, would you please add 
descriptions on each field in the API response and each metric?
   
   Sure. I'll add them to the PR description.
   
   > * Could you define "snapshot" more precisely? Is it the status of the 
latest auto-compaction run? What would it be if the latest run did nothing due 
to lack of task slots?
   
   Snapshot refers to the statistics from the latest auto-compaction run; it is the "snapshot" of the latest auto compaction. If there are no available slots, or not enough slots to get to a particular datasource, then CompactSegment (after using up all the available slots) will keep iterating the CompactionSegmentIterator (NewestSegmentFirstIterator) until it reaches the first segment that needs compaction for every datasource. This still allows us to get accurate statistics for all datasources. So, to answer your question:
   
   * "scheduleStatus" is always updated, regardless of slots.
   * "latestScheduledTaskId" can be a task ID from a previous coordinator run if the current run did not schedule a task for this datasource.
   * "byteCompacted"/"segmentCountCompacted"/"intervalCountCompacted" will be the same as in the previous run if there is no slot (meaning no compaction task was scheduled).
   * "byteAwaitingCompaction"/"segmentCountAwaitingCompaction"/"intervalCountAwaitingCompaction" will be the same as in the previous run if there is no slot (meaning no compaction task was scheduled), or may increase if new data was ingested for the datasource between the last run and this run.
   
   Basically, "byteCompacted"/"segmentCountCompacted"/"intervalCountCompacted"/"byteAwaitingCompaction"/"segmentCountAwaitingCompaction"/"intervalCountAwaitingCompaction" always reflect the correct statistics of the datasource as of the latest run, even if there is no slot or no task was scheduled.
   
   > * I'm not sure `snapshot` is a good API name since it's not intuitive to 
me what it means. Would `status` be better?
   
   I was thinking of "snapshot" as the snapshot taken at the point in time the API is called. I think "status" also works. Let me know if you still think "status" is better; I'm open to either.
   
   > * Why do you want to distinguish datasources for which auto compaction has never been configured from those for which auto compaction has been paused? Is it useful to return the same statistics for datasources for which auto compaction has never been configured as well?
   
   For datasources that have never had auto compaction enabled, all of these statistics are already available or can be calculated from existing APIs: "scheduleStatus" will be "NOT_ENABLED", "byteCompacted"/"segmentCountCompacted"/"intervalCountCompacted" will be 0, "byteAwaitingCompaction"/"segmentCountAwaitingCompaction"/"intervalCountAwaitingCompaction" will be the total size, number of segments, and number of intervals (available in sys.segments etc.), and "latestScheduledTaskId" will be null. So I'm not including those datasources, to avoid unnecessary computation in the API and to reduce the payload size.
   However, I can see one case where this does not hold: a datasource that has never had auto compaction enabled but has had a manual compaction task run. Do you think it is common and useful to cover datasources that have never had auto compaction enabled but have had manual compaction tasks? (Initially, this PR only aims to make auto compaction more user-friendly and doesn't really handle manual compaction.)
   
   > * Why does the response include only one task ID? What will happen if auto 
compaction issues multiple compaction tasks?
   
   Hmm... I am thinking of having the UI component show the status (failed or succeeded) of the latest task, which could indicate whether user action is required.
   Another idea may be to return a list of all tasks issued during this run (and an empty list if there is no slot). The UI can then select which task it wants to show: it can choose the last task ID, or it can calculate the success vs. failure rate of all tasks in the last run.
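   If the API returned the list-based alternative, the two UI options above reduce to a couple of small helpers. The Java sketch below is a hypothetical illustration only: `LastRunTaskSummary`, `TaskResult`, `Outcome`, `lastTaskId`, and `successRate` are invented names, and it assumes the list is ordered by issue time and that tasks still running are excluded from the rate.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class LastRunTaskSummary {

  /** Hypothetical task outcome as seen by the UI. */
  enum Outcome { SUCCESS, FAILED, RUNNING }

  /** Hypothetical entry for one task issued in the latest auto-compaction run. */
  record TaskResult(String taskId, Outcome outcome) {}

  /** Last task issued in the run; empty when no slot was available and nothing was issued. */
  static Optional<String> lastTaskId(List<TaskResult> tasksInLastRun) {
    return tasksInLastRun.isEmpty()
        ? Optional.empty()
        : Optional.of(tasksInLastRun.get(tasksInLastRun.size() - 1).taskId());
  }

  /** Fraction of finished tasks that succeeded; empty if no task has finished yet. */
  static Optional<Double> successRate(List<TaskResult> tasksInLastRun) {
    long finished = tasksInLastRun.stream().filter(t -> t.outcome() != Outcome.RUNNING).count();
    if (finished == 0) {
      return Optional.empty();
    }
    long succeeded = tasksInLastRun.stream().filter(t -> t.outcome() == Outcome.SUCCESS).count();
    return Optional.of((double) succeeded / finished);
  }

  public static void main(String[] args) {
    List<TaskResult> tasks = List.of(
        new TaskResult("task_a", Outcome.SUCCESS),
        new TaskResult("task_b", Outcome.FAILED),
        new TaskResult("task_c", Outcome.SUCCESS),
        new TaskResult("task_d", Outcome.RUNNING));
    System.out.println(lastTaskId(tasks).orElse("none")); // task_d
    System.out.println(successRate(tasks)
        .map(r -> String.format(Locale.ROOT, "%.2f", r)).orElse("n/a")); // 0.67
    System.out.println(lastTaskId(List.of()).orElse("none")); // none
  }
}
```

   Returning the full list keeps the choice of summary in the UI, at the cost of a larger payload than a single task ID.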
   
   > * Please list out new metrics in the PR description. It will help release 
manager.
   
   Sure. I'll add them to the PR description.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



