[ https://issues.apache.org/jira/browse/HUDI-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550233#comment-17550233 ]
Hao commented on HUDI-4192:
---------------------------

The reproduction steps are as follows:

The attached HFile contains indexes of 3 columns, which are "dtm", "hh", and "dvce_id", in that order.
A simple reproduction is as follows:
{code:java}
HoodieHFileReader<GenericRecord> hfileReader =
    (HoodieHFileReader<GenericRecord>) createReader(new Configuration());
List<String> keyPrefixes = new ArrayList<>();
keyPrefixes.add("YAWXdbh2gWI="); // key prefix of "dvce_id"
keyPrefixes.add("Bkmxu5plBpg="); // key prefix of "dtm"
Iterator<GenericRecord> iterator = hfileReader.getRecordsByKeyPrefixIterator(keyPrefixes);
while (iterator.hasNext()) {
  GenericRecord record = iterator.next(); // throws NullPointerException
}
{code}

> HoodieHFileReader scan the cells of Header IndexColumn throw NullPointerException
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-4192
>                 URL: https://issues.apache.org/jira/browse/HUDI-4192
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Hao
>            Priority: Minor
>             Fix For: 0.12.0
>
>         Attachments: col-stats-0097_86-717-560846_20220605111639266001.hfile
>
>
> Assume we index N columns in the MetaTable, such as col_1, col_2 ... col_n.
> Consider executing the query "{*}select * from table where col_n = 'xx' and
> col_1 = 'xx'{*}".
> Scanning the HFiles of the MetaTable then actually takes 2 steps:
> First, the col_n cells are scanned in the HFile (mainly to obtain the
> min/max values). Once that scan completes, the scanner is already at the end
> of the file.
> Second, the col_1 cells are scanned. Because seekTo has not been called to
> move the scanner back to the start of the file, the call to scanner.getCell
> results in a NullPointerException.
>
> !https://issues.apache.org/vision-file-storage/api/file/download/upload-v2/2022/5/5/h00424960/f2c6fbea023242939c3e804a280cd642/image.png!
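The two-step scan described above can be sketched with a toy model. This is not Hudi or HBase code: ToyScanner, seekForward, rewind, and scan are hypothetical names invented for illustration, and the keys are made up. The sketch only assumes what the description states, namely that the cursor is left at the end of the file after the first prefix scan and that getCell then yields null. Here the null is guarded, so the second prefix silently finds nothing; dereferencing it unguarded would throw NullPointerException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixScanDemo {

    /** Toy forward-only cursor over sorted keys; NOT the HBase HFileScanner API. */
    static final class ToyScanner {
        private final List<String> sortedKeys;
        private int pos = 0;

        ToyScanner(List<String> sortedKeys) {
            this.sortedKeys = sortedKeys;
        }

        /** Moves the cursor back to the first key. */
        void rewind() {
            pos = 0;
        }

        /** Advances (never backwards) until the current key is >= prefix. */
        void seekForward(String prefix) {
            while (pos < sortedKeys.size() && sortedKeys.get(pos).compareTo(prefix) < 0) {
                pos++;
            }
        }

        /** Returns the current key, or null once the cursor is past the last key. */
        String getCell() {
            return pos < sortedKeys.size() ? sortedKeys.get(pos) : null;
        }

        void next() {
            pos++;
        }
    }

    /** Collects keys matching each prefix, in the order the prefixes are given. */
    static List<String> scan(ToyScanner scanner, List<String> prefixes, boolean rewindFirst) {
        List<String> hits = new ArrayList<>();
        for (String prefix : prefixes) {
            if (rewindFirst) {
                scanner.rewind(); // assumed remedy: return to the start before each prefix
            }
            scanner.seekForward(prefix);
            // getCell() is null once the cursor is exhausted; an unguarded
            // dereference here would throw NullPointerException.
            while (scanner.getCell() != null && scanner.getCell().startsWith(prefix)) {
                hits.add(scanner.getCell());
                scanner.next();
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a1", "a2", "b1", "c1", "c2");
        // Prefixes deliberately out of key order: the last column comes first.
        List<String> prefixes = Arrays.asList("c", "a");

        System.out.println(scan(new ToyScanner(keys), prefixes, false)); // [c1, c2]
        System.out.println(scan(new ToyScanner(keys), prefixes, true));  // [c1, c2, a1, a2]
    }
}
```

In this toy, sorting the prefixes into key order before scanning would also avoid the problem; which remedy suits the real reader is not something the sketch can settle.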
-- This message was sent by Atlassian Jira (v8.20.7#820007)