Hello Wei-Chiu,

> Could you shed more light on the ML use case? Is it because the ML 
> frameworks are not written in Java so they can't use HDFS client libraries?
> How about using WebHDFS? I'm sure there are Python bindings for WebHDFS. Or 
> NFS Gateway?
Our motivation is to avoid forcing data scientists to rewrite their code when 
they move from development to production.
I think NFS Gateway could also be a solution. We want something that makes 
access to the local file system and to HDFS look the same.

ML frameworks such as TensorFlow and Spark/MLlib support secure HDFS access; 
however, some others do not.
It is annoying to change the access protocol for each framework.
Also, at my company, several researchers write their algorithms from scratch.
It is not realistic to rewrite them to be compliant with HDFS.
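
As a rough illustration of the kind of code we mean (a hypothetical sketch, 
not our actual code; `DATA_ROOT`, `count_records`, `resolve`, and the file 
name are made-up names), a script like the following uses only ordinary file 
APIs. It should run unchanged on a laptop and on a cluster node, provided 
HDFS is exposed as a directory, for example through a FUSE mount or NFS 
Gateway:

```python
import os
import tempfile


def resolve(data_root, relative_path):
    """Build a path under the data root (a local dir or an HDFS mount point)."""
    return os.path.join(data_root, relative_path)


def count_records(path):
    """Count non-empty lines using only the standard file API."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


if __name__ == "__main__":
    # DATA_ROOT is a hypothetical environment variable. In development it
    # could point at a local directory; in production it could point at a
    # directory where HDFS is mounted. Here we fall back to a temp directory
    # so the sketch runs anywhere.
    data_root = os.environ.get("DATA_ROOT") or tempfile.mkdtemp()
    sample = resolve(data_root, "sample.txt")
    if not os.path.exists(sample):
        with open(sample, "w", encoding="utf-8") as f:
            f.write("first record\n\nsecond record\n")
    print(count_records(sample))
```

With WebHDFS or a native HDFS client, the `open()` call above would have to 
be replaced by a library-specific call, which is the rewrite we want to avoid.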

Honestly, I am not confident that fuse-dfs is the best solution for our use 
case.
If no one in the community has a use case for fuse-dfs,
there is no reason for us to insist on using and maintaining it.

Shingo

-----Original Message-----
From: Wei-Chiu Chuang <weic...@apache.org> 
Sent: Tuesday, October 2, 2018 11:22 PM
To: 古山 慎悟 <sfuru...@yahoo-corp.jp>
Cc: Hadoop Common <common-dev@hadoop.apache.org>
Subject: Re: [DISCUSS] Deprecate fuse-dfs

Thank you Shingo,

Could you shed more light on the ML use case? Is it because the ML frameworks 
are not written in Java so they can't use HDFS client libraries?
How about using WebHDFS? I'm sure there are Python bindings for WebHDFS. Or NFS 
Gateway?

The fact is, there are a few ways to access HDFS today, and I don't feel we can 
support all of them at high quality, or keep all of them up to date with new 
features added to HDFS in the future.

Thanks for the pointer to hdfs-mount. Sending protobuf messages instead of 
running an HDFS client Java VM is definitely a viable approach.
I think it'll work for basic clusters. However, getting it to support Kerberos 
and encryption can be tricky. (Unfortunately that's the majority of my 
customers)

On Mon, Oct 1, 2018 at 8:32 PM Shingo Furuyama <sfuru...@yahoo-corp.jp>
wrote:

> Hi folks,
>
> Recently we started investigating fuse-dfs for use with ML frameworks on k8s.
>
> I think that supporting access to HDFS via FUSE is a convenient approach 
> for data scientists.
> We also expect that we would not need to implement HDFS access 
> separately for each ML framework.
>
> I guess this is a common use case of HDFS for ML users.
> If there is no maintainer in the community, my staff are able to 
> take on the role.
>
> By the way, I wonder:
> - How do you access data in HDFS from k8s? Is there any solution 
> other than fuse-dfs?
> - What do you think about another FUSE implementation such as 
> https://github.com/Microsoft/hdfs-mount ? Should we ignore fuse-dfs 
> and use or implement another one?
>
> Regards,
> Shingo
>
> On 2018/10/01 20:10:07, Wei-Chiu Chuang <w...@apache.org> wrote:
> > Hi fellow Hadoop developers,
> >
> > I want to start this thread to raise the awareness of the quality of
> > fuse-dfs. It appears that this sub-component is not being developed and
> > maintained, and appears not many are using it.
> >
> > In the past two years, there has been only one bug fixed (HDFS-13322).
> >
> > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS)%20AND%20text%20~%20fuse%20ORDER%20BY%20created%20DESC%2C%20updated%20DESC
> >
> > It doesn't support keytab login, ACL permissions, rename, ... a number of
> > POSIX semantics. We also recently realized fuse-dfs doesn't work under
> > heavy weight workload (Think running SQL applications on it)
> >
> > So what's the status now? Is there any one who is still using fuse-dfs in
> > production? Should we start the deprecation process? Or at least document
> > that it is not meant for anything beyond simple data transfer? IIRC vim
> > would even complain if you try to edit a file in fuse_dfs directory.
> > --
> > A very happy Hadoop contributor
> >


--
A very happy Hadoop contributor
