On Tue, Jan 7, 2020, 14:20 OpenInx <[email protected]> wrote:

> Well, I understand your point. You mean all data files and delta logs
> can be stored on S3. Makes sense.
>
> But it seems the Hudi meta files still depend on HDFS, and almost all of
> them are small files, so how do we limit the small-file count (which may
> put pressure on the HDFS namenode)?

We are heavily using Hudi in production (Spark + S3). HDFS is just one
filesystem among others, like S3. If Spark supports a filesystem, Hudi will
work on it too; if you run into issues, please write back to the community.
Hudi doesn't have any hard dependency on HDFS.

Hope this helps.
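For reference, a minimal sketch of writing a Hudi table to an S3 path from
Spark (Scala). It assumes the hudi-spark bundle and the Hadoop s3a connector
are on the classpath; the bucket, credentials, table name, and field names
below are placeholders, not anything from your setup:

import org.apache.spark.sql.{SaveMode, SparkSession}

// Hudi requires Kryo serialization in Spark.
val spark = SparkSession.builder()
  .appName("hudi-s3-sketch")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

// Point the Hadoop s3a connector at your credentials (placeholders here).
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "<ACCESS_KEY>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "<SECRET_KEY>")

// A toy DataFrame with a record key and a precombine (ordering) field.
val df = spark.createDataFrame(Seq(
  ("id-1", "2020-01-07", 42.0),
  ("id-2", "2020-01-07", 7.5)
)).toDF("uuid", "ts", "value")

// Writing to an s3a:// base path works the same way as hdfs:// --
// Hudi goes through the Hadoop FileSystem abstraction.
df.write
  .format("org.apache.hudi")
  .option("hoodie.datasource.write.recordkey.field", "uuid")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.table.name", "example_table")
  .mode(SaveMode.Append)
  .save("s3a://my-bucket/hudi/example_table")  // hypothetical bucket/path

The only S3-specific parts are the fs.s3a.* settings and the s3a:// base
path; the Hudi write options themselves are the same as they would be on
HDFS.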
> Thanks.
>
> On Tue, Jan 7, 2020 at 4:25 PM Syed Abdul Kather <[email protected]>
> wrote:
>
> > Hi,
> > Hudi can run on Spark, and the storage layer doesn't matter. If you
> > configure your Spark job to use the S3 filesystem, that is enough.
> >
> > Thanks and Regards,
> > S SYED ABDUL KATHER
> >
> > On Tue, Jan 7, 2020 at 12:28 PM OpenInx <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I know that Hudi can only run on HDFS, while S3 is much cheaper than
> > > HDFS storage, so I just want to ask: is there any plan to support
> > > Hudi + S3, say by making the storage layer pluggable, with
> > > HDFS/S3/OSS implementations, etc.?
> > >
> > > By the way, does Hudi have a strong dependency on atomic rename
> > > semantics? If so, it may be a problem to move it to S3.
> > >
> > > Thanks.
