[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-08 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 15:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6630/15/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 31: : 32-bit unsigned integer length delimiter
> specify whether this includes the checksum or just the following PB
Done


Line 33: : An optional Crc32 checksum of the entire header
> does this include the magic?
Done


Line 43: : An optional Crc32 checksum of the entire footer
> same - including the footer, magic, and pb len?
Done


Line 211: When checksums for the header, data, and footer are written in the 
CFile
> nit: add comma at end of line
Done


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 223: opts.storage_attributes.compression = compression;
> oh, that's interesting, we forgot to actually use the compression here?
I think so. I noticed it while doing my testing.


Line 336: uint8_t orig = data.mutable_data()[corrupt_offset];
> Nit: could do data.data()[corrupt_offset] here I think (though it may mean 
Done


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 141:   // Parse Footer first to find unsupported features.
> does this mean that an old server will have problems with checksummed heade
It shouldn't because the length doesn't include the checksum.


Line 146:   if (PREDICT_FALSE(footer_->incompatible_features() > 
IncompatibleFeatures::ALL)) {
> '>' strikes me as strange here vs doing & ~IncompatibleFeatures::ALL
I put a comment on a similar older review and wasn't sure if I should change 
it. Here was the response:

> Your check is technically more correct since if there are bits in the middle 
> that are not valid but are set, it will check that. I assumed incompatible 
> feature bits would be used sequentially. Especially since we output 
> "supported" as an integer in the log message below.


Line 150: IncompatibleFeatures::ALL));
> instead of ALL, maybe a better name would be 'SUPPORTED'?
Done


Line 239:   if (footer_size >= file_size_ - kMagicAndLengthSize) {
> do we want to have some kind of max size here? am afraid a flipped bit in a
ParseMagicAndLength has a max size check:

   if (len > kMaxHeaderFooterPBSize) {
  return Status::Corruption("invalid data size");
   }


Line 416:   uint32_t checksum_size = sizeof(uint32_t);
> you have this in a few places. Maybe better to just have a global constant 
Done


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 230: incompatible_features = IncompatibleFeatures::CHECKSUM;
> maybe |= here so it doesn't need to change when we add some other feature
Done


Line 500:   // Calculate and append a checksum data checksum.
> typo?
Done


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/util/crc-test.cc
File src/kudu/util/crc-test.cc:

Line 50:   const uint64_t expected_crc = 0xa9421b7; // Known value from crcutil 
usage test program.
> nit: kExpectedCrc
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-08 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 15:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6630/15/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 31: : 32-bit unsigned integer length delimiter
specify whether this includes the checksum or just the following PB


Line 33: : An optional Crc32 checksum of the entire header
does this include the magic?


Line 43: : An optional Crc32 checksum of the entire footer
same - including the footer, magic, and pb len?


Line 211: When checksums for the header, data, and footer are written in the 
CFile
nit: add comma at end of line


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 223: opts.storage_attributes.compression = compression;
oh, that's interesting, we forgot to actually use the compression here?


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 141:   // Parse Footer first to find unsupported features.
does this mean that an old server will have problems with checksummed headers, 
because it will fail while reading the header rather than seeing the 
incompatible feature flag in the footer first?


Line 146:   if (PREDICT_FALSE(footer_->incompatible_features() > 
IncompatibleFeatures::ALL)) {
'>' strikes me as strange here vs doing & ~IncompatibleFeatures::ALL


PS15, Line 150: ALL
instead of ALL, maybe a better name would be 'SUPPORTED'?


Line 239:   if (footer_size >= file_size_ - kMagicAndLengthSize) {
do we want to have some kind of max size here? am afraid a flipped bit in a 
large block would cause a huge stack alloc below.

alernatively we could use faststring for the buffer below, which is still 
stack-allocated for small sizes but avoids blowing out the stack in the case of 
a corruption


PS15, Line 416:   uint32_t checksum_size = sizeof(uint32_t);
you have this in a few places. Maybe better to just have a global constant 
kChecksumSize?


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

PS15, Line 230: =
maybe |= here so it doesn't need to change when we add some other feature


PS15, Line 500: checksum data checksum
typo?


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/util/crc-test.cc
File src/kudu/util/crc-test.cc:

Line 50:   const uint64_t expected_crc = 0xa9421b7; // Known value from crcutil 
usage test program.
nit: kExpectedCrc


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-08 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/15//COMMIT_MSG
Commit Message:

Line 7: WIP: KUDU-463. Add checksumming to cfile
Also, this is no longer a WIP, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-08 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6630/14/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 27: Header
> I will add that they are optional. I am not sure if the cflags should go in
That's fair.


http://gerrit.cloudera.org:8080/#/c/6630/15/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 336: data.mutable_data()[corrupt_offset] = corrupt_value;
Nit: could do data.data()[corrupt_offset] here I think (though it may mean 
making 'orig' const uint8_t), since the data isn't yet beint modified.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-07 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#15).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
9 files changed, 328 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/15
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-07 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6630/14/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 27: Header
> Can you update this to reflect that the checksums are optional (and maybe d
I will add that they are optional. I am not sure if the cflags should go in 
this document since it describes the format and not the details of the 
reader/writer implementations.


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 847: Status s = CorruptAndReadBlock(id, i, 254);
> I had something similar to this at one point but I worried I was getting to
Done


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 438: }
> I understand, but what exactly does it mean for a ReadBlock() call to provi
Since this is a public method I want to validate the input parameters. I don't 
think there is ever a valid scenario where a block would ever have 3 bytes or 
less. However, when hasChecksums is true, this is always the case given the 
checksum is 4 bytes and is included in the size.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 14:

(1 comment)

Looks good, just two other things:
1. What was your conclusion re: bit flipping in the test?
2. Let's continue the comment thread about ReadBlock() when ptr.size() < 4.

http://gerrit.cloudera.org:8080/#/c/6630/14/docs/design-docs/cfile.md
File docs/design-docs/cfile.md:

Line 27: Header
Can you update this to reflect that the checksums are optional (and maybe 
describe the cflags that control them a little)?


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-04 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#14).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
9 files changed, 324 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/14
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-04 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 67: TAG_FLAG(cfile_write_checksums, experimental);
> No, my reading skills aren't as good as they should be. Sorry about that.
Done


Line 186: RETURN_NOT_OK_PREPEND(WriteRawData(Slice(checksum_buf, 
sizeof(checksum_buf))),
> Rather than two separate calls to WriteRawData(), could we append the check
Done


Line 494: RETURN_NOT_OK(WriteRawData(data));
> As I was reading through the comments I started getting a similar feeling t
Done


Line 504: RETURN_NOT_OK(WriteRawData(Slice(crc_buf, sizeof(crc_buf;
> Let's also combine the writes here. It's a little trickier since we need to
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-03 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 56: TAG_FLAG(cfile_verify_checksums, experimental);
> I think the main difference is that experimental is for things we don't nec
Adar is right, also here's the reference: 
https://github.com/apache/kudu/blob/master/src/kudu/util/flag_tags.h#L21

The main thing about experimental is that you must unlock them first to be able 
to use them. So if you enable something by default... but then mark it 
experimental, it means that users have to go out of their way to disable the 
experimental feature.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-03 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 56: TAG_FLAG(cfile_verify_checksums, experimental);
> Done. I am not sure I am clear on the difference between experimental and e
I think the main difference is that experimental is for things we don't 
necessarily want to draw attention to, while evolving is for something that we 
want people to use, but retain the right to adjust how it's configured in the 
future.


Line 137: Status CFileReader::ReadMagicAndLength(uint64_t offset, Slice* slice) 
{
> I only kept it for the sake of the TRACE_EVENT1. Can remove it.
I think every caller already has a TRACE_EVENT() of some kind in it, so we 
don't need to retain this purely for that.


Line 438: }
> This is mainly to prevent data_size -= checksum_size from becoming negative
I understand, but what exactly does it mean for a ReadBlock() call to provide 
ptr.size() < 4? Is that impossible when hasChecksums() is true?


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 67: TAG_FLAG(cfile_write_checksums, experimental);
> I describe the defaults a bit in the patch summary. Think I should add a mo
No, my reading skills aren't as good as they should be. Sorry about that.

But, since we want people to try this feature out, I think evolving is the 
better tag choice. You could check with Todd/Mike for a second opinion.


Line 263:   // Prepend a footer checksum.
> I saw the same thing but don't have a great answer. I can change it if you 
If you can't prove to yourself that NOT using WriteRawData() is more correct, 
then change it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-02 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 13:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 244: if (FLAGS_cfile_write_checksums) {
> Can you make TestCFile a friend of CFileReader (see the FRIEND_TEST macro) 
Done


Line 330: source->Size(_size);
> RETURN_NOT_OK
Done


Line 333: source->Read(0, );
> RETURN_NOT_OK
Done


Line 843:   source->Size(_size);
> Should ASSERT_OK on this too.
Done


Line 847: Status s = CorruptAndReadBlock(id, i, 254);
> Just in case one of the file's bytes actually has the value 254, maybe you 
I had something similar to this at one point but I worried I was getting to 
clever in a test. I figured a simple static value that was known to corrupt all 
bytes in the given test would be a clear way to fail fast if something in the 
test changed. I will take a look again now that my corrupt code is more 
straightforward.


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 56: TAG_FLAG(cfile_verify_checksums, experimental);
> Should probably default to evolving.
Done. I am not sure I am clear on the difference between experimental and 
evolving.


Line 137: Status CFileReader::ReadMagicAndLength(uint64_t offset, Slice* slice) 
{
> Since this function is now just a block read, I think it's net more readabl
I only kept it for the sake of the TRACE_EVENT1. Can remove it.


Line 210: if (header.size() != header_size || checksum.size() != 
checksum_size) {
> Do we need this check now that there are no short reads?
Done


Line 215: if (FLAGS_cfile_verify_checksums) {
> If this is false, why bother reading the checksum out of the block at all?
Done


Line 226: RETURN_NOT_OK(block_->Read(off, ));
> Would be cleaner to call ReadV() regardless of whether checksums exist or n
Done


Line 227: if (header.size() != header_size) {
> Do we need this check now that there are no short reads?
Done


Line 273:   // This is done to avoid the need for a follow up read call.
> Just to be clear, if there are no checksums, the four extra bytes we're rea
That is exactly the choice being made. An earlier version of the patch did this 
in 2 reads. However because we know the header alone has enough bytes to 
support the over read, it should be safe to do this. 

The draw back here is that cfiles without checksums read 4 bytes more than 
required.


Line 277:   if (footer.size() != footer_size || checksum.size() != 
checksum_size) {
> No short reads; do we still need this?
Done


Line 283:   // incompatible_features flag tells us if a checksum exists at all.
> What's the point of the footer checksum then? It seems extraordinarily rare
Though rare, I am viewing checksumming as an all or nothing feature.

It's possible the contents of the protobuf could be parsable but incorrect. In 
the case parsing fails, we know its corrupt. In the case parsing succeeds the 
checksum ensures we parsed the right thing.

Given the importance of the fields in the footer I think it's worth being sure.


Line 438: }
> This effectively forbids zero-length block reads, right? Is that an invaria
This is mainly to prevent data_size -= checksum_size from becoming negative and 
resulting in a cryptic error.


Line 461: if (block.size() != data_size || checksum.size() != 
checksum_size) {
> No short reads...
Done


Line 474: }
> Maybe it would even be possible to incorporate the ReadV() too? Basically a
Good idea. The one place that is different is the footer since we 
"preemptively" read checksums data and the checksum comes before the data. This 
could be reused for the header and data blocks though.


Line 476: RETURN_NOT_OK(block_->Read(ptr.offset(), ));
> Let's use ReadV() for both cases, and vary whether we pass a one-slice vect
Done


Line 477: if (block.size() != data_size) {
> No short reads...
Done


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

Line 188:   // Returns true if the file has checksums on the header, footer, 
and data blocks
> Nit: terminate with a period.
Done


Line 189:   bool hasChecksums() const;
> Nit: should be either has_checksums() or HasChecksums()
Done


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

Line 67: TAG_FLAG(cfile_write_checksums, experimental);
> 1. Why default to false?
I describe the defaults a bit in the patch summary. Think I should add a more 
detailed comment here?

Here is a quote from the patch summary:
   
   cfile_write_checksums is defaulted to false to ensure upgrades don't
   immediately result in performance degredation and incompatible data on
   downgrade. It can and likely should be defaulted to true in a 

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-02 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS13, Line 466: if (FLAGS_cfile_verify_checksums) {
  :   uint32_t expected_checksum = 
DecodeFixed32(checksum.data());
  :   uint32_t checksum_value = crc::Crc32c(block.data(), 
block.size());
  :   if (PREDICT_FALSE(checksum_value != expected_checksum)) {
  : return Status::Corruption(
  : Substitute("Checksum does not match at offset $0 
size $1: $2 vs $3",
  :ptr.offset(), block.size(), 
checksum_value, expected_checksum));
  :   }
  : }
> I think I've seen this block of code three times; can you factor it out int
Maybe it would even be possible to incorporate the ReadV() too? Basically a 
function that takes as input the data Slice and:
1. Creates a slice vector to pass to ReadV().
2. Checks if checksums are to be verified.
3. If so, creates a checksum buffer+slice and modifies the slice vector.
4. Calls ReadV().
5. If checksums are to be verified, performs the verification.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-02 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 13:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 244: if (FLAGS_cfile_write_checksums) {
Can you make TestCFile a friend of CFileReader (see the FRIEND_TEST macro) and 
call HasChecksum() instead? Or make HasChecksum() a public method?


Line 330: source->Size(_size);
RETURN_NOT_OK


Line 333: source->Read(0, );
RETURN_NOT_OK


Line 843:   source->Size(_size);
Should ASSERT_OK on this too.


PS13, Line 847: 254
Just in case one of the file's bytes actually has the value 254, maybe you 
should do something more drastic like flip one of the bits? That way there's 
always a "change" to the byte's value.

Bonus points: if it's not too slow, add another loop which selects which of 8 
bits in the byte to flip.


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS13, Line 56: TAG_FLAG(cfile_verify_checksums, experimental);
Should probably default to evolving.


Line 137: Status CFileReader::ReadMagicAndLength(uint64_t offset, Slice* slice) 
{
Since this function is now just a block read, I think it's net more readable to 
remove it and do the block read inline wherever it was needed.


Line 210: if (header.size() != header_size || checksum.size() != 
checksum_size) {
Do we need this check now that there are no short reads?


Line 215: if (FLAGS_cfile_verify_checksums) {
If this is false, why bother reading the checksum out of the block at all?


Line 226: RETURN_NOT_OK(block_->Read(off, ));
Would be cleaner to call ReadV() regardless of whether checksums exist or not, 
and just pass either a single Slice or two Slices as needed.


Line 227: if (header.size() != header_size) {
Do we need this check now that there are no short reads?


PS13, Line 271:   // Read both the header and checksum in one call.
  :   // We read the checksum position in case one exists.
  :   // This is done to avoid the need for a follow up read call.
Just to be clear, if there are no checksums, the four extra bytes we're reading:
1. Are definitely not checksum data, but,
2. Are guaranteed to exist.

Is that right? And AFAICT, since the footer itself tells us whether the cfile 
has checksums, the only alternative is to do two separate reads, and we think 
reading an extra 4 bytes is the better trade-off, right?


Line 277:   if (footer.size() != footer_size || checksum.size() != 
checksum_size) {
No short reads; do we still need this?


PS13, Line 282:   // This needs to be done before validating the checksum since 
the
  :   // incompatible_features flag tells us if a checksum exists 
at all.
What's the point of the footer checksum then? It seems extraordinarily rare 
that we'll find a footer than can serialize properly but fails its checksum 
test.


PS13, Line 436: if (checksum_size > data_size) {
  :   return Status::Corruption("invalid data size");
  : }
This effectively forbids zero-length block reads, right? Is that an invariant 
that we already enforce?


Line 461: if (block.size() != data_size || checksum.size() != 
checksum_size) {
No short reads...


PS13, Line 466: if (FLAGS_cfile_verify_checksums) {
  :   uint32_t expected_checksum = 
DecodeFixed32(checksum.data());
  :   uint32_t checksum_value = crc::Crc32c(block.data(), 
block.size());
  :   if (PREDICT_FALSE(checksum_value != expected_checksum)) {
  : return Status::Corruption(
  : Substitute("Checksum does not match at offset $0 
size $1: $2 vs $3",
  :ptr.offset(), block.size(), 
checksum_value, expected_checksum));
  :   }
  : }
I think I've seen this block of code three times; can you factor it out into a 
helper function? Something like:

  Status VerifyChecksum(...):
if !FLAGS_cfile_verify_checksums:
  return Status::OK();



Line 476: RETURN_NOT_OK(block_->Read(ptr.offset(), ));
Let's use ReadV() for both cases, and vary whether we pass a one-slice vector 
or a two-slice vector.


Line 477: if (block.size() != data_size) {
No short reads...


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

Line 188:   // Returns true if the file has checksums on the header, footer, 
and data blocks
Nit: terminate with a period.


Line 189:   bool hasChecksums() const;
Nit: should be either has_checksums() or HasChecksums()


http://gerrit.cloudera.org:8080/#/c/6630/13/src/kudu/cfile/cfile_writer.cc
File src/kudu/cfile/cfile_writer.cc:

PS13, Line 65: DEFINE_bool(cfile_write_checksums, false,
 :   

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-05-02 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#13).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
9 files changed, 344 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-27 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#12).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/file_cache.cc
18 files changed, 724 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-27 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/11/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 375:   Slice*   result;  // Set to the data and length that was read (OUT)
Since there are no short reads, can we omit 'result' altogether? Without short 
reads, ReadV() either fails (in which case 'entries' is totally invalid) or it 
succeeds fully (in which case the results of the read is described fully by 
each 'length' and 'scratch' pair). If the caller needs slices, it can make its 
own.

Alternatively, perhaps we can replace vector with vector? 
Each slice describes both 'length' and 'scratch', and would be used as input 
when constructing the iov internally.

In either case, ReadV() won't modify the vector and so it should be passed by 
const ref to avoid needless copies. An alternative would to be pass it by value 
and move it, but I think the caller still needs access after ReadV(), and it 
would lose access if move were used.

Given the amount of discussion around this interface, I really think this part 
of the patch should be split out into its own gerrit change, partly to ease 
code review and partly so the new git commit can more fully document the design 
trade-offs taken.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-27 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#11).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc-test.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/file_cache.cc
18 files changed, 724 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-27 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 174: // Simulates Linux's preadv API on OS X.
> DoWriteV actually changes it entire implementation instead of simulating pw
Sure, works for me.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-27 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 174: // Simulates Linux's preadv API on OS X.
> BTW, I just noticed that we simulate pwritev() inline, in DoWriteV(). If yo
DoWriteV actually changes it entire implementation instead of simulating 
pwritev. I can definitely move this, but it might be best in a follow up patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-27 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 262:   gscoped_ptr scratch1(new uint8_t[size1]);
> I mainly just followed the "pattern" from a few lines above. Happy to chang
Yeah, we should be using unique_ptr in new sites. Not sure why I used heap 
allocation above, but that can change too.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 174: // Simulates Linux's preadv API on OS X.
BTW, I just noticed that we simulate pwritev() inline, in DoWriteV(). If you're 
feeling generous perhaps you can pull that out into this area too, so the two 
simulated functions can be defined side by side?


Line 334:   virtual Status ReadV(uint64_t offset, vector results) 
const OVERRIDE {
> It doesn't look like any read or write calls have this now. Since these are
Hmm you're probably right; I didn't realize there weren't any calls in Write 
either. Nevermind then.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-27 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 10:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 301:   RETURN_NOT_OK(block_->ReadV(off, results));
> I think it would be a little less obtrusive to have ReadResult construct an
My goal was to match the pattern and variables used in the Read function. 
Especially since this is essentially a version of Read for multiple Slices.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

Line 210:   bool has_checksums_;
> This will be padded to 8 bytes, which is actually quite a bit given how man
Done


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

Line 262:   gscoped_ptr scratch1(new uint8_t[size1]);
> Nit: use unique_ptr.
I mainly just followed the "pattern" from a few lines above. Happy to change it.


Line 267:   ReadResult {
> Surprised that this style of struct initialization works; isn't it C99 but 
Will change this.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

Line 162:   // Does not modify 'result' on error (but may modify 'scratch').
> You mean results?
I mean the result field in the results. I will make this more clear.


Line 163:   virtual Status ReadV(uint64_t offset, vector results) 
const = 0;
> nit: Consider making this vector* perhaps, based on my comments
Looking into this. Adar mentioned it too.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/fs-test-util.h
File src/kudu/fs/fs-test-util.h:

Line 72:   virtual Status ReadV(uint64_t offset, vector results) 
const OVERRIDE {
> This is a header so we retain namespace qualifications (i.e. this should be
Done


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.cc
File src/kudu/util/crc.cc:

Line 50:   uint64_t crc_tmp =  static_cast(crc32);
> Nit: extra space here.
Done


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.h
File src/kudu/util/crc.h:

Line 38: uint32_t Crc32c(const void* data, size_t length, uint32_t crc32);
> Maybe previous_crc32 to be more clear?
Done


Line 38: uint32_t Crc32c(const void* data, size_t length, uint32_t crc32);
> Maybe name it RollingCrc32()? Also, needs a unit test.
I though it was nice to have the same name as the original function since it 
works in tandem with it. 
ie:
uint32_t crc = Crc32c(data, length);
crc = Crc32c(data, length, crc);


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 371: struct ReadResult {
> Should provide a short high-level struct document too.
Done


Line 403:   virtual Status ReadV(uint64_t offset, std::vector 
results) const = 0;
> I'm finding this to be somewhat confusing.
I agree. I had a hard time naming the struct too. I am not sure ReadEntry is 
perfect either, but it's definitely better. 

Will change and re-organize ReadResult.


Line 536:   // TODO (GH): Document
> Yep. :)
I was waiting until it was decided if I should allow short reads here or not. I 
will move the readfully logic here and document accordingly.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 175: ssize_t preadv(int fd, const struct iovec *iovec, int count, off_t 
offset) {
> Should be const struct iovec*
Done


Line 180:   return read_bytes;
> Nit: if you're going to elide the braces, merge this line into the conditio
Done


Line 182:   break;
> Don't we need to update total_read_bytes with our partial read results befo
Done


Line 334:   virtual Status ReadV(uint64_t offset, vector results) 
const OVERRIDE {
> Could you add TRACE_EVENT1() calls to ReadV() and Read()? Not sure why they
It doesn't look like any read or write calls have this now. Since these are 
called frequently is there some overhead to worry about?


Line 341:   return iovec {
> Again, surprised this kind of initialization works in C++11.
Done


Line 353: // Adjust slice sizes based on bytes read for short reads
> As we discussed, let's make short reads an error, for Read() too. Do it as 
Will do.


Line 355: for (ReadResult& readResult : results) {
> Nit: auto is fine for short-scoped types like this one.
Done


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.cc
File src/kudu/util/env_util.cc:

Line 196:   size_t n = accumulate(results.begin(), results.end(), 
static_cast(0),
> nit: name this req_len ?
I can rename this. Often when in a new codebase I try to match the style that 
exists. Given this is a "mirror" of ReadFully above I used similar names.


Line 207: // Calculate the read amount of data
> Nit: Use punctuation like periods at the end of comment sentences per https

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-27 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS10, Line 277:   uint8_t footer_scratch[footer_size];
  :   Slice footer;
  : 
  :   uint32_t checksum_size = sizeof(uint32_t);
  :   uint8_t checksum_scratch[checksum_size];
  :   Slice checksum;
  : 
  :   // Read both the header and checksum in one call.
  :   // We read the checksum position in case one exists.
  :   // This is done to avoid the need for a follow up read call.
  :   // Read both the data and checksum in one call
  :   vector results = {
  :   ReadResult {
  :   .result = ,
  :   .scratch = checksum_scratch,
  :   .length = checksum_size,
  :   },
  :   ReadResult {
  :   .result = ,
  :   .scratch = footer_scratch,
  :   .length = footer_size,
  :   }
  :   };
  :   uint64_t off = file_size_ - kMagicAndLengthSize - footer_size 
- checksum_size;
  :   RETURN_NOT_OK(block_->ReadV(off, results));
I think it would be a little less obtrusive to have ReadResult construct and 
own the Slice, the scratch buffer (you could use faststring) and the length and 
use a constructor like

  explicit ReadResult(buffer_size)

So then instead of all this you could just say

  uint32_t checksum_size = sizeof(uint32_t);
  vector results = { ReadResult(checksum_size), 
ReadResult(footer_size) };
  uint64_t off = file_size_ - kMagicAndLengthSize - footer_size - checksum_size;
  RETURN_NOT_OK(block_->ReadV(off, ));

And avoid copying the vector.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

PS10, Line 163: vector
nit: Consider making this vector* perhaps, based on my comments 
elsewhere


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.h
File src/kudu/util/crc.h:

PS10, Line 38: Crc32c
Maybe name it RollingCrc32()? Also, needs a unit test.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.cc
File src/kudu/util/env_util.cc:

PS10, Line 196: n
nit: name this req_len ?


Line 207: // Calculate the read amount of data
Nit: Use punctuation like periods at the end of comment sentences per 
https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-26 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 10:

(21 comments)

I didn't review the checksumming stuff yet, just looked at the rest and the 
plumbing.

Since the ReadV() stuff turned out to be pretty substantial, could you move it 
into its own commit? Would be nice to finish reviewing it separately.

http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

Line 210:   bool has_checksums_;
This will be padded to 8 bytes, which is actually quite a bit given how many 
CFileReaders we instantiate. Perhaps we can calculate this on the fly instead, 
using a mask on footer_'s features?


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager-test.cc
File src/kudu/fs/block_manager-test.cc:

PS10, Line 262: gscoped_ptr
Nit: use unique_ptr.

Actually, these scratch spaces are tiny; can you just allocate them on the 
stack?


Line 267:   ReadResult {
Surprised that this style of struct initialization works; isn't it C99 but not 
C++11 compatible?


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/block_manager.h
File src/kudu/fs/block_manager.h:

Line 38: class Env;
No need for this anymore.

Though if you decide to pass the vector by pointer as I suggested in env.h, you 
can omit the env.h include, forward declare ReadResult, and continue to forward 
declare Env.


PS10, Line 162: result
You mean results?


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/fs/fs-test-util.h
File src/kudu/fs/fs-test-util.h:

PS10, Line 72: vector
This is a header so we retain namespace qualifications (i.e. this should be 
std::vector).


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.cc
File src/kudu/util/crc.cc:

PS10, Line 50:  
Nit: extra space here.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/crc.h
File src/kudu/util/crc.h:

PS10, Line 38: crc32
Maybe previous_crc32 to be more clear?


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 371: struct ReadResult {
Should provide a short high-level struct document too.


PS10, Line 403: results
I'm finding this to be somewhat confusing.

Part of the issue is with the name ReadResult: the majority of stuff in it 
aren't "results"; they're actually input to the function. Maybe ReadEntry or 
ReadVEntry would be better?

Part of the issue is that the Slice* means an extra level of indirection in the 
setup that callers have to do. Perhaps they can be regular Slices, that are 
default-initialized by the caller and set by the callee? That means 'results' 
should be passed by pointer, but I think that's a clearer model anyway, since 
we pass by pointer things that the function is expected to modify.

Also, could you reorganize ReadResult so that the pure IN parameters precede 
IN/OUT and OUT? Meaning, the order should probably be length, scratch, then 
result.


Line 536:   // TODO (GH): Document
Yep. :)


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

PS10, Line 175: const struct iovec *
Should be const struct iovec*


Line 180:   return read_bytes;
Nit: if you're going to elide the braces, merge this line into the conditional 
line. This makes it harder to accidentally mess up the conditional if another 
line is added.


Line 182:   break;
Don't we need to update total_read_bytes with our partial read results before 
breaking?


Line 334:   virtual Status ReadV(uint64_t offset, vector results) 
const OVERRIDE {
Could you add TRACE_EVENT1() calls to ReadV() and Read()? Not sure why they 
were missing in the first place.


PS10, Line 340: ReadResult
Can be cref?


Line 341:   return iovec {
Again, surprised this kind of initialization works in C++11.


PS10, Line 347: results.size()
arraylength(iov) is more idiomatic.


Line 353: // Adjust slice sizes based on bytes read for short reads
As we discussed, let's make short reads an error, for Read() too. Do it as a 
separate patch and layer this one on top of it.


PS10, Line 355: ReadResult
Nit: auto is fine for short-scoped types like this one.


http://gerrit.cloudera.org:8080/#/c/6630/10/src/kudu/util/env_util.h
File src/kudu/util/env_util.h:

Line 66: Status ReadVFully(RandomAccessFile* file, uint64_t offset, 
std::vector results);
If you end up solving the partial read stuff directly in RandomAccessFile, this 
won't be necessary.

But if it is, please also doc it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 

[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-26 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#10).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.cc
M src/kudu/util/env_util.h
M src/kudu/util/file_cache.cc
19 files changed, 792 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-25 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#9).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Finish testing
- Method documentation
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.cc
M src/kudu/util/env_util.h
M src/kudu/util/file_cache.cc
19 files changed, 828 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-25 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#8).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Finish testing
- Method documentation
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.cc
M src/kudu/util/env_util.h
M src/kudu/util/file_cache.cc
18 files changed, 804 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-25 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#7).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Finish testing
- Method documentation
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.cc
M src/kudu/util/env_util.h
M src/kudu/util/file_cache.cc
18 files changed, 804 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-20 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 446: data_size -= checksum_size;
> careful of underflow
Done


Line 472: if (block.size() != data_size && checksum.size() != 
checksum_size) {
> ||
Done


http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 21: #include 
> Would prefer not to leak this sort of platform-specific thing outside of en
I agree. This was just a rough patch to see if the approach and use of preadv 
made sense before polishing.  See my comment below as well.


Line 390:   virtual Status ReadV(uint64_t offset, Slice** results,
> Would std::vector* be more intuitive for 'results'? That is, ReadV()
I agree that a vector with all the combined parameters would be better in the 
end. Something that has slice, scratch, and length so we don't need to expose 
iovec. I planned to do this, I just posted this super rough version to see if 
the approach and use of preadv made sense before spending the time polishing, 
testing, etc.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-20 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/util/env.h
File src/kudu/util/env.h:

Line 21: #include 
Would prefer not to leak this sort of platform-specific thing outside of 
env_posix.cc. An iovec is a pretty simple structure; perhaps you can define a 
Kudu version and adapt it to/from an iovec inside env_posix.cc?


Line 390:   virtual Status ReadV(uint64_t offset, Slice** results,
Would std::vector* be more intuitive for 'results'? That is, ReadV() 
takes a caller-provided vector of Slices and replaces its contents with a 
couple Slices of its own creation?

Might be interesting to combine 'scratches' and 'results' into the same 
structure, since I imagine the length of the two must be the same.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 6:

(2 comments)

Just did a quick look at the approach using readv, it seems like a good 
direction to me. I didn't review the new header/footer checksum stuff yet.

It may be worth checking for Adar's opinion on the API in Env, he tends to have 
opinions on that stuff. The current one doesn't seem super idiomatic to me but 
also don't really have a better suggestion :)

http://gerrit.cloudera.org:8080/#/c/6630/6/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 446: data_size -= checksum_size;
careful of underflow


PS6, Line 472: &&
||


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-20 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#6).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Finish testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/fs/log_block_manager.cc
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.h
M src/kudu/util/file_cache.cc
17 files changed, 539 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-19 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 819:   unique_ptr corrupt_source(
> I will look into this. I think ideally I would like to corrupt each byte in
yea, you probably need to re-open the file and clear the block cache in between


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 363:   uint32_t data_size = ptr.size();
> ptr.size() returns uint32_t, so I kept it the same for consistency.
yea, though the style guide says to use signed quantities anywhere that you 
aren't doing bitwise ops (that way overflow will be more obvious since it will 
go to -1 instead of a very large number)


Line 392: RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), 
, checksum_scratch));
> I didn't want to cache the checksum since that seamed to complicate cache r
hm, possibly, but am worried that doubling the syscalls is expensive


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-19 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#5).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Finish testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M docs/design-docs/cfile.md
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
9 files changed, 346 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-18 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 3:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6630/3//COMMIT_MSG
Commit Message:

Line 19: immeditaly result in performance degredation and incompatible data on
> typo
Done


Line 25: are not writen the incompatible_features "checksum" bit is not set.
> typo
Done


Line 27: cfile_verify_checksums is used in the CFileReader to enabled 
validating the
> typo
Done


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

Line 790:   TestReadWriteRawBlocks(SNAPPY, 1000);
> maybe this could be collapsed into a for loop?
Done


Line 819:   unique_ptr corrupt_source(
> would it be simpler to just corrupt the data on disk instead of this code-i
I will look into this. I think ideally I would like to corrupt each byte in a 
sample file and show it results in a corrupt/io error. I was planning to use 
this to do that. However I ran into some issues with the caching of index 
blocks. So regardless a slightly different approach than what is here is needed.


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 157:   if (PREDICT_FALSE(footer_->incompatible_features() > 
IncompatibleFeatures::ALL)) {
> this strikes me as odd... don't you mean & (~IncompatibleFeatures::ALL)?
Your check is technically more correct since if there are bits in the middle 
that are not valid but are set, it will check that. I assumed incompatible 
feature bits would be used sequentially. Especially since we output "supported" 
as an integer in the log message below.


Line 363:   uint32_t data_size = ptr.size();
> best to use a signed int here
ptr.size() returns uint32_t, so I kept it the same for consistency.


Line 366: data_size = ptr.size() - sizeof(uint32_t);
> we should verify that ptr.size() >= sizeof(uint32_t) first and return a Cor
Done


Line 392: RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), 
, checksum_scratch));
> I think perf wise it may be better to just allocate the extra 4 bytes in th
I didn't want to cache the checksum since that seamed to complicate cache reads.


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-18 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#4).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block, header and footer

cfile_write_checksums is defaulted to false to ensure upgrades don't
immediately result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not written the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enable validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- Update cfile docs
- Finish testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/fs-test-util.h
M src/kudu/util/crc.cc
M src/kudu/util/crc.h
8 files changed, 321 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-17 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 3:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6630/3//COMMIT_MSG
Commit Message:

PS3, Line 19: immeditaly
typo


PS3, Line 25: writen
typo


PS3, Line 27: enabled
typo


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile-test.cc
File src/kudu/cfile/cfile-test.cc:

PS3, Line 772:   FLAGS_cfile_write_checksums = true;
 :   FLAGS_cfile_verify_checksums = true;
 :   TestReadWriteRawBlocks(NO_COMPRESSION, 1000);
 :   TestReadWriteRawBlocks(SNAPPY, 1000);
 : 
 :   FLAGS_cfile_write_checksums = true;
 :   FLAGS_cfile_verify_checksums = false;
 :   TestReadWriteRawBlocks(NO_COMPRESSION, 1000);
 :   TestReadWriteRawBlocks(SNAPPY, 1000);
 : 
 :   FLAGS_cfile_write_checksums = false;
 :   FLAGS_cfile_verify_checksums = true;
 :   TestReadWriteRawBlocks(NO_COMPRESSION, 1000);
 :   TestReadWriteRawBlocks(SNAPPY, 1000);
 : 
 :   FLAGS_cfile_write_checksums = false;
 :   FLAGS_cfile_verify_checksums = false;
 :   TestReadWriteRawBlocks(NO_COMPRESSION, 1000);
 :   TestReadWriteRawBlocks(SNAPPY, 1000);
maybe this could be collapsed into a for loop?

for (FLAGS_cfile_write_checksum : {false, true}) {
  for (FLAGS_cfile_verify_checksum : {false, true}) {
...
  }
}


PS3, Line 819: CorruptReadableBlock
would it be simpler to just corrupt the data on disk instead of this 
code-injection-based approach? A hook in the fs manager code to corrupt a block 
on disk is probably more generally reusable in higher-level integration tests 
as well (where injecting this CorruptReadableBlock class may be a bit tricky).

I think that would also enable making this code quite a bit simpler by reusing 
existing ReadFile/WriteFile functions


http://gerrit.cloudera.org:8080/#/c/6630/3/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

PS3, Line 157: > IncompatibleFeatures::ALL
this strikes me as odd... don't you mean & (~IncompatibleFeatures::ALL)?


PS3, Line 363: uint32_t
best to use a signed int here


Line 366: data_size = ptr.size() - sizeof(uint32_t);
we should verify that ptr.size() >= sizeof(uint32_t) first and return a 
Corruption otherwise.


Line 392: RETURN_NOT_OK(block_->Read(checksum_offset, sizeof(uint32_t), 
, checksum_scratch));
I think perf wise it may be better to just allocate the extra 4 bytes in the 
block cache so that we can read the checksum in the same IO vs issuing the 
extra syscall here. Plus the code is likely to be a little simpler?


-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-14 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 3:

I need to handle the header and footer yet, which may require a change in how 
to handle backwards compatibility.

-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-14 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#3).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block.

cfile_write_checksums is defaulted to false to ensure upgrades don't
immeditaly result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not writen the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enabled validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- More unit and perf testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/fs/fs-test-util.h
5 files changed, 195 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-13 Thread Grant Henke (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6630

to look at the new patch set (#2).

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block.

cfile_write_checksums is defaulted to false to ensure upgrades don't
immeditaly result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not writen the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enabled validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- More unit and perf testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
4 files changed, 102 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-13 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change.

Change subject: WIP: KUDU-463. Add checksumming to cfile
..


Patch Set 1:

This is just a quick push to sanity check my approach.

-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] WIP: KUDU-463. Add checksumming to cfile

2017-04-13 Thread Grant Henke (Code Review)
Grant Henke has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6630

Change subject: WIP: KUDU-463. Add checksumming to cfile
..

WIP: KUDU-463. Add checksumming to cfile

Adds optional checksumming and validation to cfile blocks.

Introduces 2 flags to control cfile checksumming:
- cfile_write_checksums (default false)
- cfile_verify_checksums (default true)

cfile_write_checksums is used in the CFileWriter to enable computing and
appending Crc32 checksums to the end of each cfile block.

cfile_write_checksums is defaulted to false to ensure upgrades don't
immeditaly result in performance degredation and incompatible data on
downgrade. It can and likely should be defaulted to true in a later release.

When cfile_write_checksums is set to true, the existing incompatible_features
bitset in the cfile footer is used. A "checksum" bit is set to ensure a clear
error message when older versions of Kudu try to read the file. If checksums
are not writen the incompatible_features "checksum" bit is not set.

cfile_verify_checksums is used in the CFileReader to enabled validating the
data against the written checksum. cfile_verify_checksums is defaulted to
true since validation only occurs if checksums are written. Any data that
was written before checksumming was an option or when cfile_write_checksums
was false will not be verified.

TODO:
- More unit and perf testing
- Change cfile block to page to avoid term overload?

Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
---
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_util.h
M src/kudu/cfile/cfile_writer.cc
4 files changed, 103 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/6630/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6756834cd7f27af258797a3654a95244abeb0976
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke