[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-17 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..

IMPALA-8542. Add an access trace for the data cache

This adds a relatively simple JSON-formatted access trace for the data
cache feature. Each partition stores a trace file named 'trace.txt',
with each line representing a hit, miss, or store into the cache.

The trace is collected using the kudu::AsyncLogger class which handles
buffering and deferring the actual IO to a background thread.

By default, the full cache key info is written to the trace (including
the file paths), but a flag can enable anonymization (128-bit
city-hashing) of the file names in case any user would like to capture a
trace to be shared publically without divulging their table names.

Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Reviewed-on: http://gerrit.cloudera.org:8080/13425
Tested-by: Impala Public Jenkins 
Reviewed-by: Michael Ho 
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
3 files changed, 297 insertions(+), 4 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Ho: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 18 Jun 2019 06:05:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sat, 15 Jun 2019 03:57:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/3632/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 23:08:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4480/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 22:26:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-14 Thread Todd Lipcon (Code Review)
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13425

to look at the new patch set (#7).

Change subject: IMPALA-8542. Add an access trace for the data cache
..

IMPALA-8542. Add an access trace for the data cache

This adds a relatively simple JSON-formatted access trace for the data
cache feature. Each partition stores a trace file named 'trace.txt',
with each line representing a hit, miss, or store into the cache.

The trace is collected using the kudu::AsyncLogger class which handles
buffering and deferring the actual IO to a background thread.

By default, the full cache key info is written to the trace (including
the file paths), but a flag can enable anonymization (128-bit
city-hashing) of the file names in case any user would like to capture a
trace to be shared publically without divulging their table names.

Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
3 files changed, 297 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/7
--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4475/


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 19:38:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4475/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 16:16:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 16:16:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98
PS3, Line 98:  "impala-cache-file-";
> it currently creates the trace file with open() flags to truncate the exist
Yes, makes sense. Just wanna make sure those files won't accidentally pile up 
over time.



--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 04:59:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/3624/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 01:24:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/3625/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 01:17:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-13 Thread Todd Lipcon (Code Review)
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13425

to look at the new patch set (#4).

Change subject: IMPALA-8542. Add an access trace for the data cache
..

IMPALA-8542. Add an access trace for the data cache

This adds a relatively simple JSON-formatted access trace for the data
cache feature. Each partition stores a trace file named 'trace.txt',
with each line representing a hit, miss, or store into the cache.

The trace is collected using the kudu::AsyncLogger class which handles
buffering and deferring the actual IO to a background thread.

By default, the full cache key info is written to the trace (including
the file paths), but a flag can enable anonymization (128-bit
city-hashing) of the file names in case any user would like to capture a
trace to be shared publically without divulging their table names.

Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
3 files changed, 296 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/4
--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 5:

r4 fixes the comments, r5 is a rebase (had a trivial conflict to deal with from 
the mtime fix adding some DCHECKs in the key encoding path)


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 00:39:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-13 Thread Todd Lipcon (Code Review)
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13425

to look at the new patch set (#5).

Change subject: IMPALA-8542. Add an access trace for the data cache
..

IMPALA-8542. Add an access trace for the data cache

This adds a relatively simple JSON-formatted access trace for the data
cache feature. Each partition stores a trace file named 'trace.txt',
with each line representing a hit, miss, or store into the cache.

The trace is collected using the kudu::AsyncLogger class which handles
buffering and deferring the actual IO to a background thread.

By default, the full cache key info is written to the trace (including
the file paths), but a flag can enable anonymization (128-bit
city-hashing) of the file names in case any user would like to capture a
trace to be shared publically without divulging their table names.

Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
3 files changed, 296 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/5
--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@656
PS2, Line 656: // Limit the write concurrency to avoid blocking the caller 
(which could be calling
> I think we usually go with "start the arguments on a new line indented by f
Done


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@870
PS2, Line 870: ");
> We usually use all-caps for constants
Done


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@874
PS2, Line 874: case STORE: jw.String("S"); break;
> see above, google style guide disagrees
Done


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@88
PS3, Line 88: "(Advanced) Collect a trace of all lookups in the 
data cache.");
> nit: the indent here and other places in this change seems to be different
Done


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98
PS3, Line 98: impala-cache-trace.txt
> The data cache by default removes all files with names containing prefix CA
it currently creates the trace file with open() flags to truncate the existing 
file if there is one, so it's essentially the same behavior. If a user wants to 
archive them before restarting, I think it's up to them. Is that what you were 
asking?


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@116
PS3, Line 116: if (file_) Flush();
> Does it make sense to also call file_->Close() here or does the d'tor take
dtor handles it


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@148
PS3, Line 148:
> nit: unnecessary blank line.
Done


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@186
PS3, Line 186:   unique_ptr underlying_logger_;
 :   unique_ptr logger_;
> Would be nice to briefly document these variables.
Done


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@498
PS3, Line 498:   if (FLAGS_data_cache_enable_tracing) {
 : tracer_.reset(new Tracer(path_ + "/" + TRACE_FILE_NAME));
 : RETURN_IF_ERROR(tracer_->Start());
 :   }
> I wonder how this may affect the available storage space allocated for the
yea, I didn't want to add the complexity of accounting for the trace file, 
since as you said, this is meant for experimentation.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@887
PS3, Line 887: char buf[BUF_LEN];
> nit: same name as 'buf' declared outside of the scope. While functionally c
Done



--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 14 Jun 2019 00:30:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-03 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 3:

(8 comments)

LGTM. Mostly nits.

http://gerrit.cloudera.org:8080/#/c/13425/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13425/3//COMMIT_MSG@10
PS3, Line 10: trace.txt
impala-cache-trace.txt ?


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@88
PS3, Line 88: "(Advanced) Collect a trace of all lookups in the 
data cache.");
nit: the indent here and other places in this change seems to be different from 
the existing code. We usually indent 4 for line continuation. Seems more 
consistent to follow the indent 4 convention.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@98
PS3, Line 98: impala-cache-trace.txt
The data cache by default removes all files with names containing prefix 
CACHE_FILE_PREFIX at startup. I suppose we want to keep the trace files across 
restart to allow more flexibility for experiment, right ? In other words, it's 
up to users to remove the trace files.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@116
PS3, Line 116: if (file_) Flush();
Does it make sense to also call file_->Close() here or does the d'tor take care 
of it ?


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@148
PS3, Line 148:
nit: unnecessary blank line.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@186
PS3, Line 186:   unique_ptr underlying_logger_;
 :   unique_ptr logger_;
Would be nice to briefly document these variables.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@498
PS3, Line 498:   if (FLAGS_data_cache_enable_tracing) {
 : tracer_.reset(new Tracer(path_ + "/" + TRACE_FILE_NAME));
 : RETURN_IF_ERROR(tracer_->Start());
 :   }
I wonder how this may affect the available storage space allocated for the data 
cache ? The traces are relatively small compared to the cached data but I 
wonder if a long running Impala process may accumulate a somewhat large trace 
file.

On the other hand, tracing is mostly for experimentation so it's not a severe 
concern for production environment.


http://gerrit.cloudera.org:8080/#/c/13425/3/be/src/runtime/io/data-cache.cc@887
PS3, Line 887: char buf[BUF_LEN];
nit: same name as 'buf' declared outside of the scope. While functionally 
correct, it'd be less confusing to have no name collision.



--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 03 Jun 2019 22:37:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-03 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 3: Code-Review+1

(1 comment)

LGTM, I'd be curious how others feel about the indentation, but I don't intend 
to hold the change up.

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@656
PS2, Line 656: // Limit the write concurrency to avoid blocking the caller 
(which could be calling
> Enclosed in {} but the style guide says to align the parameters on the seco
I think we usually go with "start the arguments on a new line indented by four 
spaces and continue at that 4 space indent." throughout the codebase, though 
historically we've also started at the same line as the function call and then 
continued with 4 spaces, e.g. in L227, L510, and others.


I'm personally used to our style and therefore am in favor of consistency over 
guidelines 
(https://google.github.io/styleguide/cppguide.html#Existing_Non-conformant_Code
).



--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 03 Jun 2019 20:20:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/3484/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 03 Jun 2019 19:18:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-03 Thread Todd Lipcon (Code Review)
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13425

to look at the new patch set (#3).

Change subject: IMPALA-8542. Add an access trace for the data cache
..

IMPALA-8542. Add an access trace for the data cache

This adds a relatively simple JSON-formatted access trace for the data
cache feature. Each partition stores a trace file named 'trace.txt',
with each line representing a hit, miss, or store into the cache.

The trace is collected using the kudu::AsyncLogger class which handles
buffering and deferring the actual IO to a background thread.

By default, the full cache key info is written to the trace (including
the file paths), but a flag can enable anonymization (128-bit
city-hashing) of the file names in case any user would like to capture a
trace to be shared publically without divulging their table names.

Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
3 files changed, 293 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/3
--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-06-03 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache-test.cc
File be/src/runtime/io/data-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache-test.cc@504
PS2, Line 504:  expect_misses);
> nit: indent 4 spaces here and below
see above


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.h@269
PS2, Line 269: /// The file name used for the access trace.
> nit: newline before this one
Done


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@88
PS2, Line 88: c
> nit: Capitalize first word
Done


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@97
PS2, Line 97: const char* DataCache::Partition::TRACE_FILE_NAME = 
"impala-cache-trace.txt";
> Should we include the partition name in the trace to make it easier to copy
I think concatenation should be fine, or if we ask users to collect them, we 
would provide a script that can do the right thing (eg using tar to preserve 
the original paths).


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@159
PS2, Line 159:   explicit Tracer(string path)
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@165
PS2, Line 165:   logger_->Stop();
> nit: single line
Done


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@413
PS2, Line 413: return UNALIGNED_LOAD64(key_.data() + key_.size() - 
sizeof(int64_t) * 2);
> Can you add a comment to point out that these hard-code the key layout? Sho
Rearranged the key and added some constants to make this clearer. I didn't end 
up adding the extra name_len field since it seemed a bit wasteful of memory and 
not much gain.


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@544
PS2, Line 544: if (tracer_) tracer_->Trace(Tracer::MISS, cache_key, 
bytes_to_read, /*entry_len=*/-1);
> nit: the rest of the file uses explicit pointer comparison (see above), sho
Done


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@656
PS2, Line 656:   if (tracer_) tracer_->Trace(Tracer::STORE_FAILED_BUSY, 
cache_key,
> style: Enclose multiple lines in {}, indent 4 spaces
Enclosed in {} but the style guide says to align the parameters on the second 
line with the parameters on the first line:
https://google.github.io/styleguide/cppguide.html#Function_Calls :

  bool result = DoSomething(averyveryveryverylongargument1,
argument2, argument3);


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@860
PS2, Line 860: S!
> Would it make automatic parsing easier if this was a single character, e.g.
Done


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@871
PS2, Line 871: C
> Can this be a static assert?
I don't think CalcualteBase64EscapedLen is constexpr (it's defined in the .cc, 
not the header)


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@874
PS2, Line 874:sizeof(hash), buf, buf_len);
> nit: indent 4 spaces
see above, google style guide disagrees


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@903
PS2, Line 903:
> nit: extra empty line
Done



--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 03 Jun 2019 18:33:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-05-30 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 2:

(15 comments)

Had a question about the naming of the trace file and otherwise just a bunch of 
nits.

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache-test.cc
File be/src/runtime/io/data-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache-test.cc@492
PS2, Line 492: SCOPED_TRACE
TIL about SCOPED_TRACE :)


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache-test.cc@504
PS2, Line 504:  expect_misses);
nit: indent 4 spaces here and below


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.h@269
PS2, Line 269: /// The file name used for the access trace.
nit: newline before this one


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@88
PS2, Line 88: c
nit: Capitalize first word


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@97
PS2, Line 97: const char* DataCache::Partition::TRACE_FILE_NAME = 
"impala-cache-trace.txt";
Should we include the partition name in the trace to make it easier to copy 
them to a common directory, i.e. when we ask users to collect them? Or do you 
envision that they concatenate them instead?


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@159
PS2, Line 159:   explicit Tracer(string path)
nit: single line


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@165
PS2, Line 165:   logger_->Stop();
nit: single line


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@413
PS2, Line 413: return UNALIGNED_LOAD64(key_.data() + key_.size() - 
sizeof(int64_t) * 2);
Can you add a comment to point out that these hard-code the key layout? Should 
we switch to an aligned struct (mtime, offset, name_len, filename) instead?


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@544
PS2, Line 544: if (tracer_) tracer_->Trace(Tracer::MISS, cache_key, 
bytes_to_read, /*entry_len=*/-1);
nit: the rest of the file uses explicit pointer comparison (see above), should 
we keep it consistent?


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@656
PS2, Line 656:   if (tracer_) tracer_->Trace(Tracer::STORE_FAILED_BUSY, 
cache_key,
style: Enclose multiple lines in {}, indent 4 spaces


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@860
PS2, Line 860: S!
Would it make automatic parsing easier if this was a single character, e.g. F?


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@870
PS2, Line 870: buf_len
We usually use all-caps for constants


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@871
PS2, Line 871: C
Can this be a static assert?

nit: missing space


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@874
PS2, Line 874:sizeof(hash), buf, buf_len);
nit: indent 4 spaces


http://gerrit.cloudera.org:8080/#/c/13425/2/be/src/runtime/io/data-cache.cc@903
PS2, Line 903:
nit: extra empty line



--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 31 May 2019 02:37:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-05-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13425 )

Change subject: IMPALA-8542. Add an access trace for the data cache
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/3414/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 28 May 2019 20:32:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8542. Add an access trace for the data cache

2019-05-28 Thread Todd Lipcon (Code Review)
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13425

to look at the new patch set (#2).

Change subject: IMPALA-8542. Add an access trace for the data cache
..

IMPALA-8542. Add an access trace for the data cache

This adds a relatively simple JSON-formatted access trace for the data
cache feature. Each partition stores a trace file named 'trace.txt',
with each line representing a hit, miss, or store into the cache.

The trace is collected using the kudu::AsyncLogger class which handles
buffering and deferring the actual IO to a background thread.

By default, the full cache key info is written to the trace (including
the file paths), but a flag can enable anonymization (128-bit
city-hashing) of the file names in case any user would like to capture a
trace to be shared publically without divulging their table names.

Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
3 files changed, 277 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/13425/2
--
To view, visit http://gerrit.cloudera.org:8080/13425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2302c19abb5db19f1d3d1cd727a82977a9e2ba9c
Gerrit-Change-Number: 13425
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho