Hi Raymond, HoodieROPathFilter is supposed to return true only for file matches belonging to latest version if the path refers to a Hudi partition or if the path refers to a non-hoodie partition or dataset. I looked at the test-case you referred. It only works because the path filter wrongly assumes it is a non-hoodie path. You can run this in debug mode to see the code path. From the usage perspective, this is used only from Spark (InMemoryFileIndex) where only the files are passed to this filter. So, I wouldn't classify this as a bug. But, it makes sense to make it consistent for both cases Balaji.V
Thanks,Balaji.V On Wednesday, September 9, 2020, 07:45:09 AM PDT, Raymond Xu <xu.shiyan.raym...@gmail.com> wrote: Hi Balaji, not sure if I fully get it. I'm attempting to refer to this test case https://github.com/apache/hudi/blob/9bcd3221fd440081dbae70e89d08539c3b484862/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieROTablePathFilter.java#L63-L65 where a partition path is supposed to be accepted. If I change L64 to Path partitionPath = new Path(Paths.get(basePath, "2017/01/01").toUri()); Then it resulted in not being accepted due to partitionPath ending with `/` (a directory path). To me, this seems to be a corner case not being covered. Could you kindly confirm the expectation please? Thanks. On Tue, Sep 8, 2020 at 8:58 PM Balaji Varadarajan <v.bal...@ymail.com.invalid> wrote: > Hi Raymond, > IIRC, we need to give a blob path to make HoodieROTablePathFilter to work > correctly (e.g: "base/partition/*"). The path-cache is at partition level > and not at table level so we need to extract the partition-path correctly > to be used as look-up key. To extract partition-path, the challenge here is > "Path" type does not have APIs to quickly figure if a path is a directory > or not and we should avoid making RPC calls here. > Thanks,Balaji.V > On Tuesday, September 8, 2020, 09:56:49 AM PDT, Raymond Xu < > xu.shiyan.raym...@gmail.com> wrote: > > > https://github.com/apache/hudi/blob/9bcd3221fd440081dbae70e89d08539c3b484862/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L120-L121 > > As shown in the 2 lines above, it does not seem to work with directory > Path. > It should work for both `new Path("base/partition")` and `new > Path("base/partition/")`, but it only works for the former case. In the > latter case, `folder` will be "base/partition" and `path` will be > "base/partition/", which will always result in returning false. > A potential bug? >