[ 
https://issues.apache.org/jira/browse/HADOOP-16836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HADOOP-16836:
---------------------------
    Description: 
{code:java}
org.apache.hadoop.io.file.tfile.TestTFileStreams#testOneEntryMixedLengths1
org.apache.hadoop.io.file.tfile.TestTFileStreams#testOneEntryUnknownLength
org.apache.hadoop.io.file.tfile.TestTFileLzoCodecsStreams#testOneEntryMixedLengths1
org.apache.hadoop.io.file.tfile.TestTFileLzoCodecsStreams#testOneEntryUnknownLength{code}
 

The 4 actively-used tests above call the helper function 
`TestTFileStreams#writeRecords()` to write key-value (kv) pairs, then call 
`TestTFileByteArrays#readRecords()` to assert that the key and the value part 
(v) of each kv pair match what was written. Every value written by these 
tests is a hardcoded string of length 6.
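
For context, here is a minimal sketch of the unknown-length write path these 
tests exercise (the `writer` variable and the literal strings are 
illustrative placeholders, not the exact code in `writeRecords()`):
{code:java}
// Appending a kv pair with unknown lengths (-1): because the value is
// streamed, the reader only knows its full length at read time if the whole
// value fits inside one tfile.io.chunk.size chunk.
DataOutputStream outKey = writer.prepareAppendKey(-1);
outKey.write("key0".getBytes());
outKey.close();
DataOutputStream outValue = writer.prepareAppendValue(-1);
outValue.write("value0".getBytes()); // 6-byte hardcoded value
outValue.close();
{code}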

 

`readRecords()` uses 
`org.apache.hadoop.io.file.tfile.TFile.Reader.Scanner.Entry#getValueLength()` 
to get the full length of the value of each kv pair. But `getValueLength()` 
can only report the full length when it is less than the value of the 
configuration parameter `tfile.io.chunk.size`; otherwise `readRecords()` 
throws an exception. So when `tfile.io.chunk.size` is set to a value less 
than 6, these 4 tests fail because of the exception from `readRecords()`, 
even though such a value is a semantically valid setting for 
`tfile.io.chunk.size`.

The definition of `tfile.io.chunk.size` is "Value chunk size in bytes. Default 
to 1MB. Values of the length less than the chunk size is guaranteed to have 
known value length in read time (See also 
TFile.Reader.Scanner.Entry.isValueLengthKnown())". 
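
For illustration, a configuration like the following is enough to make all 4 
tests fail (5 is just an example of any chunk size below the 6-byte hardcoded 
values):
{code:java}
// org.apache.hadoop.conf.Configuration
Configuration conf = new Configuration();
// A chunk size smaller than the 6-byte hardcoded test values triggers the
// failure, even though small chunk sizes are valid settings.
conf.setInt("tfile.io.chunk.size", 5);
{code}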

*Fixes*

`readRecords()` should call 
`org.apache.hadoop.io.file.tfile.TFile.Reader.Scanner.Entry#getValue(byte[])` 
instead, which reads the whole value and returns its correct full length 
regardless of whether the value is longer than `tfile.io.chunk.size`.
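
A rough sketch of the change inside `readRecords()` (the `scanner`, `vbuf`, 
`BUF_LEN`, and `expectedValue` names are illustrative placeholders):
{code:java}
byte[] vbuf = new byte[BUF_LEN];
// Before: getValueLength() throws when the value length is unknown, i.e. when
// the value does not fit within one tfile.io.chunk.size chunk.
// int vlen = scanner.entry().getValueLength();
// After: getValue(byte[]) reads the whole value and returns its true length,
// independent of tfile.io.chunk.size.
int vlen = scanner.entry().getValue(vbuf);
Assert.assertEquals(expectedValue, new String(vbuf, 0, vlen));
{code}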

  was:
The test helper function 
`org.apache.hadoop.io.file.tfile.TestTFileByteArrays#readRecords(org.apache.hadoop.fs.FileSystem,
 org.apache.hadoop.fs.Path, int, org.apache.hadoop.conf.Configuration)` 
(abbreviated as `readRecords()` below) is called in the 4 actively-used tests below:

 
{code:java}
org.apache.hadoop.io.file.tfile.TestTFileStreams#testOneEntryMixedLengths1
org.apache.hadoop.io.file.tfile.TestTFileStreams#testOneEntryUnknownLength
org.apache.hadoop.io.file.tfile.TestTFileLzoCodecsStreams#testOneEntryMixedLengths1
org.apache.hadoop.io.file.tfile.TestTFileLzoCodecsStreams#testOneEntryUnknownLength{code}
 

These tests first call 
`org.apache.hadoop.io.file.tfile.TestTFileStreams#writeRecords(int count, 
boolean knownKeyLength, boolean knownValueLength, boolean close)` to write 
`key-value` pair records into a `TFile` object, then call the helper function 
`readRecords()` to assert that the `key` part and the `value` part of the 
stored `key-value` records match what was written previously. The `value` 
parts of the `key-value` pairs in these tests are hardcoded strings with a 
length of 6.

Assertions in `readRecords()` are directly related to the value of the 
configuration parameter `tfile.io.chunk.size`. The formal definition of 
`tfile.io.chunk.size` is "Value chunk size in bytes. Default to 1MB. Values of 
the length less than the chunk size is guaranteed to have known value length in 
read time (See also TFile.Reader.Scanner.Entry.isValueLengthKnown())".

When `tfile.io.chunk.size` is configured to a value less than the length of 
the `value` part of the `key-value` pairs in these 4 tests, the tests fail, 
even though the configured value for `tfile.io.chunk.size` is semantically 
valid.

 

*Consequence*

At least 4 actively-used tests fail on a correctly configured parameter. Any 
test that uses `readRecords()` can fail if the length of the hardcoded 
`value` part it checks is larger than the configured value of 
`tfile.io.chunk.size`. This causes the Hadoop-Common build to fail if these 
tests are not skipped.

 

*Root Cause*

`readRecords()` uses 
`org.apache.hadoop.io.file.tfile.TFile.Reader.Scanner.Entry#getValueLength()` 
(abbreviated as `getValueLength()` below) to get the full length of the 
`value` part of each `key-value` pair. But `getValueLength()` can only report 
the full length of the `value` part when that length is less than 
`tfile.io.chunk.size`; otherwise `getValueLength()` throws an exception, 
causing `readRecords()` to fail and thus the aforementioned 4 tests to fail. 
This is because `getValueLength()` does not know the full length of the 
`value` part when the `value` part is larger than `tfile.io.chunk.size`.

 

*Fixes*

`readRecords()` should instead call 
`org.apache.hadoop.io.file.tfile.TFile.Reader.Scanner.Entry#getValue(byte[])` 
(abbreviated as `getValue()` below), which returns the correct full length of 
the `value` part regardless of whether the `value` length is larger than 
`tfile.io.chunk.size`.


> Bug in widely-used helper function caused valid configuration value to fail 
> on multiple tests, causing build failure
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16836
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16836
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 3.3.0, 3.2.1
>            Reporter: Ctest
>            Priority: Blocker
>              Labels: configuration, easyfix, patch, test
>         Attachments: HADOOP-16836-000.patch, HADOOP-16836-000.patch
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
