yordan-pavlov commented on issue #1111:
URL: https://github.com/apache/arrow-rs/issues/1111#issuecomment-1003446768


   I have been able to reproduce the issue where `RleDecoder` returns more keys than values (as explained by @tustvold above) by adding a test very similar to the existing `test_arrow_array_reader_string`, but using dictionary encoding instead of plain encoding. Next, I will be looking for a short-term fix.
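A minimal sketch of one possible guard, assuming the surplus keys sit past the end of the page's real values: cap the number of keys handed out per page at the number of non-null values the page actually stores, rather than at the page's total value count (which includes nulls). `ClampedKeys` and its methods are hypothetical names for illustration, not part of the `parquet` crate's API, and the decoded keys are modelled as a plain `u32` iterator instead of a real `RleDecoder`.

```rust
/// Hypothetical wrapper (not part of the parquet crate): limits how many
/// dictionary keys are returned for a page to the number of non-null
/// values actually stored in that page.
struct ClampedKeys<I: Iterator<Item = u32>> {
    keys: I,
    remaining: usize, // non-null values still unread in the current page
}

impl<I: Iterator<Item = u32>> ClampedKeys<I> {
    fn new(keys: I, non_null_values: usize) -> Self {
        Self { keys, remaining: non_null_values }
    }

    /// Appends up to `max` keys to `out`, but never more than `remaining`,
    /// so surplus keys decoded past the page's real values are ignored.
    fn read(&mut self, out: &mut Vec<u32>, max: usize) -> usize {
        let want = max.min(self.remaining);
        let before = out.len();
        out.extend(self.keys.by_ref().take(want));
        let read = out.len() - before;
        self.remaining -= read;
        read
    }
}

fn main() {
    // 33 real keys followed by 7 surplus keys (assumed here to be padding),
    // i.e. 40 keys come out of the decoder for a page with 33 non-null values.
    let mut decoded: Vec<u32> = (0..33).collect();
    decoded.extend([0; 7]);

    let mut reader = ClampedKeys::new(decoded.into_iter(), 33);
    let mut out = Vec::new();
    let first = reader.read(&mut out, 14); // first batch asks for 14 keys
    let second = reader.read(&mut out, 37); // second batch asks for 37 keys
    println!("{first} {second} {}", out.len()); // prints "14 19 33"
}
```

With the cap in place, the second request stops at the 19 keys left in the page instead of returning 26.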
   
   Here is some sample output from the test:
   
   running 1 test
   page num_values: 100, values.len(): 33
   page num_values: 100, values.len(): 38
   VariableLenPlainDecoder::new, num_values: 9
   
   ---------- reading a batch of 50 values ----------
   VariableLenDictionaryDecoder::new, num_values: 100
   VariableLenDictionaryDecoder::read_value_bytes - begin, self.num_values: 100, num_values: 14
   VariableLenDictionaryDecoder::read_value_bytes - end, values_read: 14, self.num_values: 86
   **// ok so far, 33 actual values - 14 values read = 19 values still left in first page**
   
   ---------- reading a batch of 100 values ----------
   VariableLenPlainDecoder::new, num_values: 10
   VariableLenDictionaryDecoder::new, num_values: 100
   VariableLenDictionaryDecoder::read_value_bytes - begin, self.num_values: 86, num_values: 37
   VariableLenDictionaryDecoder::read_value_bytes - end, values_read: 26, self.num_values: 0
   **// this is a problem - only 19 values were left in the first page, but 26 values have been read**
   
   VariableLenDictionaryDecoder::read_value_bytes - begin, self.num_values: 0, num_values: 11
   VariableLenDictionaryDecoder::read_value_bytes - end, values_read: 0, self.num_values: 0
   VariableLenDictionaryDecoder::read_value_bytes - begin, self.num_values: 100, num_values: 11
   VariableLenDictionaryDecoder::read_value_bytes - end, values_read: 11, self.num_values: 89
   thread 'arrow::arrow_array_reader::tests::test_arrow_array_reader_dict_string' panicked at 'assertion failed: `(left == right)`
     left: `"H"`,
    right: `"He"`', parquet\src\arrow\arrow_array_reader.rs:1745:17
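The figures in the log can be cross-checked with a little arithmetic. The first two batches read 14 + 26 = 40 keys from a page holding only 33 non-null values, so 7 keys are surplus. One possible explanation, offered as an assumption rather than a confirmed diagnosis, is padding: bit-packed runs in Parquet's RLE/bit-packing hybrid encoding are stored in groups of 8 values, and 33 rounded up to a multiple of 8 is exactly 40. The helper names below are hypothetical.

```rust
/// Rounds `n` up to the next multiple of 8, the group size of bit-packed
/// runs in Parquet's RLE/bit-packing hybrid encoding.
fn round_up_to_group(n: usize) -> usize {
    (n + 7) / 8 * 8
}

/// Hypothetical helper: keys read beyond the page's non-null values.
fn surplus_keys(non_null_in_page: usize, reads: &[usize]) -> usize {
    let total: usize = reads.iter().sum();
    total.saturating_sub(non_null_in_page)
}

fn main() {
    // Figures from the test output: "values.len(): 33" for the first page,
    // then values_read of 14 and 26 for the first two batches.
    let non_null_in_page: usize = 33;
    let reads: [usize; 2] = [14, 26];

    println!("surplus keys: {}", surplus_keys(non_null_in_page, &reads)); // 7
    println!(
        "33 rounded up to a group of 8: {}",
        round_up_to_group(non_null_in_page)
    ); // 40
}
```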
   


-- 
This is an automated message from the Apache Git Service.