Based on the comment on AllocateScanRange(), it seems like in some cases
the partition id is ignored, so a dummy value can safely be passed in. I'm
honestly not sure why it was optional in those cases - it doesn't seem like
it would be hard to plumb through the valid partition ID.
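
Just to illustrate what I mean, here is a toy sketch of the kind of
(partition_id, filename) lookup the JIRA describes - the names loosely
mirror HdfsScanNode/GetFileDesc, but none of this is the actual Impala
code:

  // Toy model of a file descriptor map keyed by (partition_id, filename),
  // roughly along the lines of IMPALA-5412. Not the real Impala types,
  // signatures or member names.
  #include <cassert>
  #include <cstdint>
  #include <map>
  #include <string>
  #include <utility>

  struct HdfsFileDesc { std::string filename; };

  using FileDescMap = std::map<std::pair<int64_t, std::string>, HdfsFileDesc>;

  HdfsFileDesc* GetFileDesc(FileDescMap& file_descs, int64_t partition_id,
                            const std::string& filename) {
    auto it = file_descs.find({partition_id, filename});
    // With a filename-only key, a dummy partition id is harmless. With the
    // composite key, a sentinel like -1 never matches, so a check like the
    // DCHECK Gabor hit would fire here.
    assert(it != file_descs.end());
    return &it->second;
  }

In that picture, every caller of AllocateScanRange that currently passes
-1 (or col_idx()) needs the real partition id plumbed through, which is
essentially the change you describe.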

So the change seems reasonable to me.


On Tue, Aug 8, 2017 at 9:10 AM, Gabor Kaszab <gaborkas...@cloudera.com>
wrote:

> Hey,
>
> I'm currently working on IMPALA-5412
> <https://issues.apache.org/jira/browse/IMPALA-5412>, where I enhanced
> FileDescMap to use a (partition_id, filename) pair as its key instead of a
> single filename.
> With this I have to extend HdfsScanNode::GetFileDesc to take the
> partition_id as well. One of the functions calling GetFileDesc,
> HdfsScanNode::AllocateScanRange, already has a partition_id as an input
> parameter, so it seemed obvious to pass that to GetFileDesc.
> Unfortunately, while debugging an Impala crash that occurred during
> the "Loading Kudu TPCH" phase of testing, I observed that
> AllocateScanRange is sometimes given '-1' as the partition_id. (In one
> case it is given scalar_reader->col_idx() as the partition_id.)
> As a result GetFileDesc finds no matching file descriptor and a
> DCHECK fails.
>
> I naively modified the call sites of AllocateScanRange to pass valid
> partition IDs instead of '-1' and col_idx(). This fixes the issues I
> mentioned above; however, I'm not sure that this change
> doesn't break some other logic. (Core tests are being run, but even if they
> pass I'm not sure this change is fine.)
>
> Was it intentional to pass those invalid partition_ids to
> AllocateScanRange? Do you see any risk in passing valid IDs instead?
>
> Cheers,
> Gabor
>
