[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. IMPALA-8542. Add an access trace for the data cache This adds a relatively simple JSON-formatted access trace for the data cache feature. Each partition stores a trace file named 'trace.txt', with each line representing a hit, miss, or store into the cache. The trace is collected using the kudu::AsyncLogger class which handles buffering and deferring the actual IO to a background thread. By default, the full cache key info is written to the trace (including the file paths), but a flag can enable anonymization (128-bit city-hashing) of the file names in case any user would like to capture a trace to be shared publically without divulging their table names. Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Reviewed-on: http://gerrit.cloudera.org:8080/13425 Tested-by: Impala Public Jenkins Reviewed-by: Michael Ho --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 297 insertions(+), 4 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Ho: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 9 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 8 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 18 Jun 2019 06:05:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 8 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sat, 15 Jun 2019 03:57:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3632/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 23:08:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4480/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 8 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 22:26:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Hello Michael Ho, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13425 to look at the new patch set (#7). Change subject: IMPALA-8542. Add an access trace for the data cache .. IMPALA-8542. Add an access trace for the data cache This adds a relatively simple JSON-formatted access trace for the data cache feature. Each partition stores a trace file named 'trace.txt', with each line representing a hit, miss, or store into the cache. The trace is collected using the kudu::AsyncLogger class which handles buffering and deferring the actual IO to a background thread. By default, the full cache key info is written to the trace (including the file paths), but a flag can enable anonymization (128-bit city-hashing) of the file names in case any user would like to capture a trace to be shared publically without divulging their table names. Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 297 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/7 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 6: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4475/ -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 19:38:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4475/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 16:16:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 6 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 16:16:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 5: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98 PS3, Line 98: "impala-cache-file-"; > it currently creates the trace file with open() flags to truncate the exist Yes, makes sense. Just wanna make sure those files won't accidentally pile up over time. -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 04:59:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3624/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 01:24:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 5: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/3625/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 01:17:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Hello Michael Ho, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13425 to look at the new patch set (#4). Change subject: IMPALA-8542. Add an access trace for the data cache .. IMPALA-8542. Add an access trace for the data cache This adds a relatively simple JSON-formatted access trace for the data cache feature. Each partition stores a trace file named 'trace.txt', with each line representing a hit, miss, or store into the cache. The trace is collected using the kudu::AsyncLogger class which handles buffering and deferring the actual IO to a background thread. By default, the full cache key info is written to the trace (including the file paths), but a flag can enable anonymization (128-bit city-hashing) of the file names in case any user would like to capture a trace to be shared publically without divulging their table names. Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 296 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/4 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 5: r4 fixes the comments, r5 is a rebase (had a trivial conflict to deal with from the mtime fix adding some DCHECKs in the key encoding path) -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 00:39:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Hello Michael Ho, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13425 to look at the new patch set (#5). Change subject: IMPALA-8542. Add an access trace for the data cache .. IMPALA-8542. Add an access trace for the data cache This adds a relatively simple JSON-formatted access trace for the data cache feature. Each partition stores a trace file named 'trace.txt', with each line representing a hit, miss, or store into the cache. The trace is collected using the kudu::AsyncLogger class which handles buffering and deferring the actual IO to a background thread. By default, the full cache key info is written to the trace (including the file paths), but a flag can enable anonymization (128-bit city-hashing) of the file names in case any user would like to capture a trace to be shared publically without divulging their table names. Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 296 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/5 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@656 PS2, Line 656: // Limit the write concurrency to avoid blocking the caller (which could be calling > I think we usually go with "start the arguments on a new line indented by f Done http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@870 PS2, Line 870: "); > We usually use all-caps for constants Done http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@874 PS2, Line 874: case STORE: jw.String("S"); break; > see above, google style guide disagrees Done http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@88 PS3, Line 88: "(Advanced) Collect a trace of all lookups in the data cache."); > nit: the indent here and other places in this change seems to be different Done http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98 PS3, Line 98: impala-cache-trace.txt > The data cache by default removes all files with names containing prefix CA it currently creates the trace file with open() flags to truncate the existing file if there is one, so it's essentially the same behavior. If a user wants to archive them before restarting, I think it's up to them. Is that what you were asking? http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@116 PS3, Line 116: if (file_) Flush(); > Does it make sense to also call file_->Close() here or does the d'tor take dtor handles it http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@148 PS3, Line 148: > nit: unnecessary blank line. Done http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@186 PS3, Line 186: unique_ptr underlying_logger_; : unique_ptr logger_; > Would be nice to briefly document these variables. Done http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@498 PS3, Line 498: if (FLAGS_data_cache_enable_tracing) { : tracer_.reset(new Tracer(path_ + "/" + TRACE_FILE_NAME)); : RETURN_IF_ERROR(tracer_->Start()); : } > I wonder how this may affect the available storage space allocated for the yea, I didn't want to add the complexity of accounting for the trace file, since as you said, this is meant for experimentation. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@887 PS3, Line 887: char buf[BUF_LEN]; > nit: same name as 'buf' declared outside of the scope. While functionally c Done -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 14 Jun 2019 00:30:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 3: (8 comments) LGTM. Mostly nits. http://gerrit.cloudera.org:8080/#/c/13425/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13425/3//COMMIT_MSG@10 PS3, Line 10: trace.txt impala-cache-trace.txt ? http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@88 PS3, Line 88: "(Advanced) Collect a trace of all lookups in the data cache."); nit: the indent here and other places in this change seems to be different from the existing code. We usually indent 4 for line continuation. Seems more consistent to follow the indent 4 convention. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98 PS3, Line 98: impala-cache-trace.txt The data cache by default removes all files with names containing prefix CACHE_FILE_PREFIX at startup. I suppose we want to keep the trace files across restart to allow more flexibility for experiment, right ? In other words, it's up to users to remove the trace files. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@116 PS3, Line 116: if (file_) Flush(); Does it make sense to also call file_->Close() here or does the d'tor take care of it ? http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@148 PS3, Line 148: nit: unnecessary blank line. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@186 PS3, Line 186: unique_ptr underlying_logger_; : unique_ptr logger_; Would be nice to briefly document these variables. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@498 PS3, Line 498: if (FLAGS_data_cache_enable_tracing) { : tracer_.reset(new Tracer(path_ + "/" + TRACE_FILE_NAME)); : RETURN_IF_ERROR(tracer_->Start()); : } I wonder how this may affect the available storage space allocated for the data cache ? The traces are relatively small compared to the cached data but I wonder if a long running Impala process may accumulate a somewhat large trace file. On the other hand, tracing is mostly for experimentation so it's not a severe concern for production environment. http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@887 PS3, Line 887: char buf[BUF_LEN]; nit: same name as 'buf' declared outside of the scope. While functionally correct, it'd be less confusing to have no name collision. -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 03 Jun 2019 22:37:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 3: Code-Review+1 (1 comment) LGTM, I'd be curious how others feel about the indentation, but I don't intend to hold the change up. http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@656 PS2, Line 656: // Limit the write concurrency to avoid blocking the caller (which could be calling > Enclosed in {} but the style guide says to align the parameters on the seco I think we usually go with "start the arguments on a new line indented by four spaces and continue at that 4 space indent." throughout the codebase, though historically we've also started at the same line as the function call and then continued with 4 spaces, e.g. in L227, L510, and others. I'm personally used to our style and therefore am in favor of consistency over guidelines (https://google.github.io/styleguide/cppguide.html#Existing_Non-conformant_Code ). -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 03 Jun 2019 20:20:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3484/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 03 Jun 2019 19:18:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Hello Michael Ho, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13425 to look at the new patch set (#3). Change subject: IMPALA-8542. Add an access trace for the data cache .. IMPALA-8542. Add an access trace for the data cache This adds a relatively simple JSON-formatted access trace for the data cache feature. Each partition stores a trace file named 'trace.txt', with each line representing a hit, miss, or store into the cache. The trace is collected using the kudu::AsyncLogger class which handles buffering and deferring the actual IO to a background thread. By default, the full cache key info is written to the trace (including the file paths), but a flag can enable anonymization (128-bit city-hashing) of the file names in case any user would like to capture a trace to be shared publically without divulging their table names. Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 293 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/3 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 2: (13 comments) http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache-test.cc@504 PS2, Line 504: expect_misses); > nit: indent 4 spaces here and below see above http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.h@269 PS2, Line 269: /// The file name used for the access trace. > nit: newline before this one Done http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@88 PS2, Line 88: c > nit: Capitalize first word Done http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@97 PS2, Line 97: const char* DataCache::Partition::TRACE_FILE_NAME = "impala-cache-trace.txt"; > Should we include the partition name in the trace to make it easier to copy I think concatenation should be fine, or if we ask users to collect them, we would provide a script that can do the right thing (eg using tar to preserve the original paths). http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@159 PS2, Line 159: explicit Tracer(string path) > nit: single line Done http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@165 PS2, Line 165: logger_->Stop(); > nit: single line Done http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@413 PS2, Line 413: return UNALIGNED_LOAD64(key_.data() + key_.size() - sizeof(int64_t) * 2); > Can you add a comment to point out that these hard-code the key layout? Sho Rearranged the key and added some constants to make this clearer. I didn't end up adding the extra name_len field since it seemed a bit wasteful of memory and not much gain. http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@544 PS2, Line 544: if (tracer_) tracer_->Trace(Tracer::MISS, cache_key, bytes_to_read, /*entry_len=*/-1); > nit: the rest of the file uses explicit pointer comparison (see above), sho Done http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@656 PS2, Line 656: if (tracer_) tracer_->Trace(Tracer::STORE_FAILED_BUSY, cache_key, > style: Enclose multiple lines in {}, indent 4 spaces Enclosed in {} but the style guide says to align the parameters on the second line with the parameters on the first line: https://google.github.io/styleguide/cppguide.html#Function_Calls : bool result = DoSomething(averyveryveryverylongargument1, argument2, argument3); http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@860 PS2, Line 860: S! > Would it make automatic parsing easier if this was a single character, e.g. Done http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@871 PS2, Line 871: C > Can this be a static assert? I don't think CalcualteBase64EscapedLen is constexpr (it's defined in the .cc, not the header) http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@874 PS2, Line 874:sizeof(hash), buf, buf_len); > nit: indent 4 spaces see above, google style guide disagrees http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@903 PS2, Line 903: > nit: extra empty line Done -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 03 Jun 2019 18:33:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 2: (15 comments) Had a question about the naming of the trace file and otherwise just a bunch of nits. http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache-test.cc@492 PS2, Line 492: SCOPED_TRACE TIL about SCOPED_TRACE :) http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache-test.cc@504 PS2, Line 504: expect_misses); nit: indent 4 spaces here and below http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.h@269 PS2, Line 269: /// The file name used for the access trace. nit: newline before this one http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@88 PS2, Line 88: c nit: Capitalize first word http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@97 PS2, Line 97: const char* DataCache::Partition::TRACE_FILE_NAME = "impala-cache-trace.txt"; Should we include the partition name in the trace to make it easier to copy them to a common directory, i.e. when we ask users to collect them? Or do you envision that they concatenate them instead? http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@159 PS2, Line 159: explicit Tracer(string path) nit: single line http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@165 PS2, Line 165: logger_->Stop(); nit: single line http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@413 PS2, Line 413: return UNALIGNED_LOAD64(key_.data() + key_.size() - sizeof(int64_t) * 2); Can you add a comment to point out that these hard-code the key layout? Should we switch to an aligned struct (mtime, offset, name_len, filename) instead? http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@544 PS2, Line 544: if (tracer_) tracer_->Trace(Tracer::MISS, cache_key, bytes_to_read, /*entry_len=*/-1); nit: the rest of the file uses explicit pointer comparison (see above), should we keep it consistent? http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@656 PS2, Line 656: if (tracer_) tracer_->Trace(Tracer::STORE_FAILED_BUSY, cache_key, style: Enclose multiple lines in {}, indent 4 spaces http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@860 PS2, Line 860: S! Would it make automatic parsing easier if this was a single character, e.g. F? http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@870 PS2, Line 870: buf_len We usually use all-caps for constants http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@871 PS2, Line 871: C Can this be a static assert? nit: missing space http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@874 PS2, Line 874:sizeof(hash), buf, buf_len); nit: indent 4 spaces http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@903 PS2, Line 903: nit: extra empty line -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 31 May 2019 02:37:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13425 ) Change subject: IMPALA-8542. Add an access trace for the data cache .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/3414/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 28 May 2019 20:32:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache
Hello Michael Ho, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13425 to look at the new patch set (#2). Change subject: IMPALA-8542. Add an access trace for the data cache .. IMPALA-8542. Add an access trace for the data cache This adds a relatively simple JSON-formatted access trace for the data cache feature. Each partition stores a trace file named 'trace.txt', with each line representing a hit, miss, or store into the cache. The trace is collected using the kudu::AsyncLogger class which handles buffering and deferring the actual IO to a background thread. By default, the full cache key info is written to the trace (including the file paths), but a flag can enable anonymization (128-bit city-hashing) of the file names in case any user would like to capture a trace to be shared publically without divulging their table names. Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c --- M be/src/runtime/io/data-cache-test.cc M be/src/runtime/io/data-cache.cc M be/src/runtime/io/data-cache.h 3 files changed, 277 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/2 -- To view, visit http://gerrit.cloudera.org:8080/13425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c Gerrit-Change-Number: 13425 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho