Re: [DISCUSS] Support configure remote flink jar

2019-11-18 Thread Yang Wang
Hi tison,

Thanks for starting this discussion.
* For a user-customized flink-dist jar, this is a useful feature, since it
avoids uploading the flink-dist jar on every submission. In production
environments especially, it could speed up the submission process.
* For the standard flink-dist jar, FLINK-13938[1] could solve the problem:
upload an official Flink release binary to distributed storage (HDFS)
first, and then all submissions can benefit from it. Users could also
upload a customized flink-dist jar to speed up their submissions.

If the flink-dist jar can be specified as a remote path, the user jar could
perhaps be handled the same way.

[1]. https://issues.apache.org/jira/browse/FLINK-13938

tison wrote on Tue, Nov 19, 2019 at 11:17 AM:

> Hi folks,
>
> Recently, our customers asked for a feature to configure a remote Flink
> jar. I'd like to reach out to you to see whether this is a general need.
>
> ATM Flink only supports configuring a local file as the Flink jar via the
> `-yj` option. If we pass an HDFS file path, it fails with an
> IllegalArgumentException due to an implementation detail. With support for
> configuring a remote Flink jar, this limitation is eliminated. We can also
> make use of YARN locality to reduce the uploading overhead: instead of
> uploading the jar, we ask YARN to localize it when the AM container
> starts.
>
> Besides, it possibly overlaps with FLINK-13938, so I'd like to have the
> discussion on our mailing list first.
>
> Are you looking forward to such a feature?
>
> @Yang Wang: this feature is different from what we discussed offline; it
> only focuses on the Flink jar, not all ship files.
>
> Best,
> tison.
>
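
The `-yj` limitation described above can be illustrated with plain Java. This is a minimal sketch, not Flink's actual code: the class `FlinkJarPathSketch`, its methods, and the example paths are hypothetical. It assumes the local-only code path turns the configured path into a `java.io.File`, which is one way an IllegalArgumentException can arise for an `hdfs://` URI, and it shows how checking the URI scheme first could route a remote jar to YARN localization instead of a client-side upload.

```java
import java.io.File;
import java.net.URI;

// Hypothetical sketch; not taken from the Flink code base.
final class FlinkJarPathSketch {

    /** A path counts as remote when it has a scheme other than "file". */
    static boolean isRemote(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme != null && !scheme.equalsIgnoreCase("file");
    }

    /** Mimics a local-only code path: java.io.File accepts only "file" URIs. */
    static File toLocalFile(String path) {
        // Throws IllegalArgumentException for hdfs:// and other non-file schemes.
        return new File(URI.create(path));
    }

    /** Describes what a client could do with the configured Flink jar. */
    static String plan(String path) {
        return isRemote(path)
                ? "register as YARN local resource (no upload): " + path
                : "upload from client, then register: " + path;
    }

    public static void main(String[] args) {
        String remote = "hdfs://namenode:8020/flink/flink-dist.jar"; // assumed example path
        System.out.println(plan(remote));
        System.out.println(plan("/opt/flink/lib/flink-dist.jar"));
        try {
            toLocalFile(remote);
        } catch (IllegalArgumentException e) {
            System.out.println("local-only handling rejects it: " + e.getMessage());
        }
    }
}
```

The design choice here is to branch on the scheme before any local file handling, so a remote jar is never forced through an API that only understands local files.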


Re: [DISCUSS] Support configure remote flink jar

2019-11-18 Thread Thomas Weise
There is a related use case (not specific to HDFS) that I came across:

It would be nice if the jar upload endpoint could accept the URL of a jar
file as an alternative to the jar file itself. Such a URL could point to an
artifact repository or a distributed file system.

Thomas




Re: [DISCUSS] Support configure remote flink jar

2019-11-19 Thread ouywl
I have implemented this feature in our environment, using a Docker "init
container" to fetch the jar file from a URL. It seems like a good idea.

ouywl
ou...@139.com


Re: [DISCUSS] Support configure remote flink jar

2019-11-19 Thread Stephan Ewen
Would that be a feature specific to YARN (and maybe standalone sessions)?

For containerized setups, an init container seems like a nice way to solve
this. It is also more flexible when it comes to supporting authentication
mechanisms for the target storage system, etc.



Re: [DISCUSS] Support configure remote flink jar

2019-11-19 Thread tison
Thanks for your participation!

@Yang: Great to hear. I'd like to know whether a remote Flink jar path
conflicts with FLINK-13938. IIRC, FLINK-13938 automatically excludes the
local Flink jar from shipping, which possibly does not work for a remote
one.

@Thomas: The idea that a URL becomes the unified representation of a
resource is very inspiring. I'm thinking about how to provide a single
process for getting a resource from a URL that points to an artifact or a
distributed file system.

@ouywl & Stephan: Yes, this improvement can be carried over to environments
like k8s. IIRC, the k8s proposal already discussed improvements using "init
containers" and other technologies. However, so far I regard it as an
improvement that differs from one storage system to another, so we should
achieve them individually.


Best,
tison.
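
The idea of a URL as the unified representation of a resource, mentioned in the reply to Thomas above, could be sketched as a scheme-keyed registry. This is only an illustration under assumptions: `ResourceResolverSketch`, its methods, and the three registered schemes are hypothetical and do not exist in Flink. The handlers return a description of the action rather than performing any I/O.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these names exist in Flink.
final class ResourceResolverSketch {

    // Maps a URI scheme to a handler describing how the resource would be obtained.
    private final Map<String, Function<URI, String>> handlers = new HashMap<>();

    void register(String scheme, Function<URI, String> handler) {
        handlers.put(scheme.toLowerCase(Locale.ROOT), handler);
    }

    /** Looks up the handler for the location's scheme; scheme-less paths count as local files. */
    String resolve(String location) {
        URI uri = URI.create(location);
        String scheme = uri.getScheme() == null
                ? "file"
                : uri.getScheme().toLowerCase(Locale.ROOT);
        Function<URI, String> handler = handlers.get(scheme);
        if (handler == null) {
            throw new IllegalArgumentException("No handler for scheme: " + scheme);
        }
        return handler.apply(uri);
    }

    /** Example wiring; the set of schemes and the actions are assumptions. */
    static ResourceResolverSketch withDefaults() {
        ResourceResolverSketch resolver = new ResourceResolverSketch();
        resolver.register("file", uri -> "ship local file " + uri.getPath());
        resolver.register("hdfs", uri -> "let YARN localize " + uri);
        resolver.register("https", uri -> "download from artifact repository " + uri);
        return resolver;
    }

    public static void main(String[] args) {
        ResourceResolverSketch resolver = withDefaults();
        System.out.println(resolver.resolve("/opt/flink/lib/flink-dist.jar"));
        System.out.println(resolver.resolve("hdfs://namenode:8020/flink/flink-dist.jar"));
        System.out.println(resolver.resolve("https://repo.example.com/flink/flink-dist.jar"));
    }
}
```

Keeping one entry point and one handler per scheme matches the point made above that each storage system may need its own treatment, while callers still pass a single URL.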




Re: [DISCUSS] Support configure remote flink jar

2019-11-23 Thread Rong Rong
Thanks @Tison for starting the discussion, and sorry for joining so late.

Yes, I think this is a very good idea. We already tweak the flink-yarn
package internally to support something similar to what @Thomas mentioned:
registering a jar that has already been uploaded to some DFS (not
necessarily the YARN public cache discussed in FLINK-13938).
The reason is that we provide internally packaged extension libraries to
our customers, and we've seen a good performance improvement in our YARN
cluster during the container localization phase after our customers
switched to pre-uploaded JARs instead of uploading them on every
deployment.

Looking forward to this feature!

--
Rong

