Understood. Thank you, Balaji, for the clarification. On Fri, Sep 11, 2020 at 2:17 PM Balaji Varadarajan <v.bal...@ymail.com.invalid> wrote:
> Hi Raymond, > > HoodieROPathFilter is supposed to return true only for file matches > belonging to latest version if the path refers to a Hudi partition or if > the path refers to a non-hoodie partition or dataset. > I looked at the test-case you referred. It only works because the path > filter wrongly assumes it is a non-hoodie path. You can run this in debug > mode to see the code path. From the usage perspective, this is used only > from Spark (InMemoryFileIndex) where only the files are passed to this > filter. So, I wouldn't classify this as a bug. But, it makes sense to make > it consistent for both cases > Balaji.V > > Thanks,Balaji.V On Wednesday, September 9, 2020, 07:45:09 AM PDT, > Raymond Xu <xu.shiyan.raym...@gmail.com> wrote: > > Hi Balaji, not sure if I fully get it. > I'm attempting to refer to this test case > > https://github.com/apache/hudi/blob/9bcd3221fd440081dbae70e89d08539c3b484862/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieROTablePathFilter.java#L63-L65 > > where a partition path is supposed to be accepted. > If I change L64 to > Path partitionPath = new Path(Paths.get(basePath, "2017/01/01").toUri()); > > Then it resulted in not being accepted due to partitionPath ending with `/` > (a directory path). To me, this seems to be a corner case not being > covered. Could you kindly confirm the expectation please? Thanks. > > On Tue, Sep 8, 2020 at 8:58 PM Balaji Varadarajan > <v.bal...@ymail.com.invalid> wrote: > > > Hi Raymond, > > IIRC, we need to give a blob path to make HoodieROTablePathFilter to > work > > correctly (e.g: "base/partition/*"). The path-cache is at partition level > > and not at table level so we need to extract the partition-path correctly > > to be used as look-up key. To extract partition-path, the challenge here > is > > "Path" type does not have APIs to quickly figure if a path is a directory > > or not and we should avoid making RPC calls here. > > Thanks,Balaji.V > > On Tuesday, September 8, 2020, 09:56:49 AM PDT, Raymond Xu < > > xu.shiyan.raym...@gmail.com> wrote: > > > > > > > https://github.com/apache/hudi/blob/9bcd3221fd440081dbae70e89d08539c3b484862/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L120-L121 > > > > As shown in the 2 lines above, it does not seem to work with directory > > Path. > > It should work for both `new Path("base/partition")` and `new > > Path("base/partition/")`, but it only works for the former case. In the > > latter case, `folder` will be "base/partition" and `path` will be > > "base/partition/", which will always result in returning false. > > A potential bug? > > >