[ https://issues.apache.org/jira/browse/ORC-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated ORC-144:
------------------------------
    Affects Version/s: 1.4.0

> PATCHED BASE Documentation Issues
> ---------------------------------
>
>                 Key: ORC-144
>                 URL: https://issues.apache.org/jira/browse/ORC-144
>             Project: Orc
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 1.4.0
>            Reporter: Douglas Drinka
>            Assignee: Douglas Drinka
>            Priority: Minor
>
> The documentation for Patched Base encoding has two issues.
> First, "Data values (W * L bits padded to the byte)..." is repeated in the
> data field description.
> Second is in the example given. The sample data for each of the other
> encodings actually triggers that encoder under the logic in the Java code.
> However, this example sequence is too short to reach either the 90% cutoff
> for non-rebased data ((1.0 - .9) * 10 = 0.99999999999999978, which floors
> to 0) or the 95% cutoff for rebased data. At least 20 values are needed for
> a single patch to occur.
> I propose the following sequence:
> [2030, 2000, 2020, 1000000, 2040, 2050, 2060, 2070, 2080, 2090, 2100, 2110,
> 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190]
> which encodes to [0x8e, 0x13, 0x2b, 0x21, 0x07, 0xd0, 0x1e, 0x00, 0x14, 0x70,
> 0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x6e, 0x78, 0x82, 0x8c, 0x96, 0xa0,
> 0xaa, 0xb4, 0xbe, 0xfc, 0xe8]
> The description should then read "a length of 20 (19)".
> These samples were critical for verifying my code, and I appreciated having
> them, particularly since I didn't find any unit tests in the Java code that
> directly compare the byte output of the encoders.
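As a cross-check of the proposed example, the Python sketch below decodes the 28 bytes as a single PATCHED_BASE run and recovers the 20-value input sequence. It assumes the RLEv2 header layout from the ORC specification (2-bit encoding type, 5-bit encoded width W, 9-bit length stored as L - 1, 3-bit base width BW, 5-bit patch width PW, 3-bit patch gap width PGW, 5-bit patch list length PLL). It is not the ORC library's reader: the function names are hypothetical, and rounding the patch-entry width up to a supported width is an assumption that has no effect here, since PGW + PW is 14 bits. The `excluded_count` helper only reproduces the `(1.0 - p) * n` arithmetic quoted in the issue, showing why 10 values leave nothing outside the 90th percentile while 20 values leave exactly one value outside both the 90th and 95th percentiles.

```python
# Sketch: decode one ORC RLEv2 PATCHED_BASE run and reproduce the
# percentile cutoff arithmetic. All names here are hypothetical.

FIXED_WIDTHS = list(range(1, 25)) + [26, 28, 30, 32, 40, 48, 56, 64]


def decode_width(code):
    # 5-bit encoded width (0-31) -> actual bit width.
    return FIXED_WIDTHS[code]


def closest_fixed_bits(n):
    # Round a bit count up to the nearest supported width (assumption:
    # this is how the patch-list entry width is chosen).
    return next(w for w in FIXED_WIDTHS if w >= n)


def read_bits(buf, offset, count, width):
    # Read `count` MSB-first values of `width` bits starting at byte
    # `offset`; the whole group is padded to a byte boundary.
    total_bits = count * width
    nbytes = (total_bits + 7) // 8
    chunk = int.from_bytes(buf[offset:offset + nbytes], "big")
    chunk >>= nbytes * 8 - total_bits  # drop trailing pad bits
    mask = (1 << width) - 1
    values = [(chunk >> (width * (count - 1 - i))) & mask for i in range(count)]
    return values, offset + nbytes


def decode_patched_base(buf):
    h0, h1, h2, h3 = buf[0], buf[1], buf[2], buf[3]
    if h0 >> 6 != 2:
        raise ValueError("not a PATCHED_BASE header")
    w = decode_width((h0 >> 1) & 0x1F)    # data value width W
    length = (((h0 & 1) << 8) | h1) + 1   # L is stored as L - 1 in 9 bits
    bw = (h2 >> 5) + 1                    # base value width, in bytes
    pw = decode_width(h2 & 0x1F)          # patch width, in bits
    pgw = (h3 >> 5) + 1                   # patch gap width, in bits
    pll = h3 & 0x1F                       # patch list length

    # Base value: big-endian, MSB is a sign bit (sign-magnitude).
    base = int.from_bytes(buf[4:4 + bw], "big")
    sign = 1 << (bw * 8 - 1)
    if base & sign:
        base = -(base & ~sign)

    data, offset = read_bits(buf, 4 + bw, length, w)
    patches, _ = read_bits(buf, offset, pll, closest_fixed_bits(pgw + pw))

    pos = 0
    for entry in patches:
        gap, patch = entry >> pw, entry & ((1 << pw) - 1)
        pos += gap                        # gap is relative to the previous patch
        if gap == 255 and patch == 0:
            continue                      # filler entry for gaps over 255
        data[pos] |= patch << w           # patch bits sit above the low W bits
    return [base + v for v in data]


def excluded_count(length, p):
    # How many of the widest values fall outside the p-th percentile,
    # using the double arithmetic and truncation quoted in the issue.
    return int((1.0 - p) * length)


encoded = bytes([0x8e, 0x13, 0x2b, 0x21, 0x07, 0xd0, 0x1e, 0x00, 0x14, 0x70,
                 0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x6e, 0x78, 0x82,
                 0x8c, 0x96, 0xa0, 0xaa, 0xb4, 0xbe, 0xfc, 0xe8])
print(decode_patched_base(encoded))  # the proposed 20-value sequence
print(excluded_count(10, 0.9), excluded_count(20, 0.9), excluded_count(20, 0.95))  # prints: 0 1 1
```

Decoding by hand gives the same picture: W = 8, L = 20, base = 2000, and a single patch entry with gap 3 and patch 0xF3A, so the fourth value is 2000 + (0x70 | 0xF3A << 8) = 1000000.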