Hey,

I'm currently working on IMPALA-5412 <https://issues.apache.org/jira/browse/IMPALA-5412>, where I enhanced FileDescMap to use a (partition_id, filename) pair as its key instead of a single filename. With this change I had to extend HdfsScanNode::GetFileDesc to take the partition_id as well. One of the callers of GetFileDesc, HdfsScanNode::AllocateScanRange, already has a partition_id input parameter, so it seemed obvious to pass that through to GetFileDesc. Unfortunately, after debugging an Impala crash during the "Loading Kudu TPCH" phase of testing, I observed that AllocateScanRange is sometimes given '-1' as the partition_id (and on one occasion it is given a scalar_reader->col_idx() as the partition_id). As a result, GetFileDesc finds no matching file descriptor and a DCHECK fails.
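To make the lookup problem concrete, here is a minimal standalone sketch of the composite-key scheme I described. The names FileDesc, FileDescMap, and GetFileDesc mirror the Impala identifiers, but the types and signatures here are simplified assumptions, not the actual Impala code:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Simplified stand-in for Impala's file descriptor.
struct FileDesc {
  std::string filename;
  int64_t partition_id;
};

// The enhanced map: keyed by (partition_id, filename) instead of filename alone.
using FileDescMap = std::map<std::pair<int64_t, std::string>, FileDesc>;

// Analogous to the extended HdfsScanNode::GetFileDesc(partition_id, filename).
// Returns nullptr when no matching descriptor exists -- which is what happens
// when a caller passes a sentinel partition_id such as -1.
const FileDesc* GetFileDesc(const FileDescMap& map, int64_t partition_id,
                            const std::string& filename) {
  auto it = map.find({partition_id, filename});
  return it == map.end() ? nullptr : &it->second;
}
```

With a filename-only key the lookup still succeeds for any caller, but with the pair key a '-1' partition_id misses every real entry, which is exactly where the DCHECK fires.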
I naively modified the call sites of AllocateScanRange to pass valid partition IDs instead of '-1' and col_idx(). This fixes the issues I mentioned above, but I'm not sure the change doesn't break some other logic. (Core tests are being run, but even if they pass I'm not convinced the change is safe.) Was it intentional to give those invalid partition_ids to AllocateScanRange? Do you see any risk in passing valid IDs instead?

Cheers,
Gabor