[DISCUSS] Service Authorization (redux)

2017-07-23 Thread Eron Wright
Hello, now might be a good time to revisit an important enhancement to
Flink security, so-called service authorization.   This means the hardening
of a Flink cluster against unauthorized use with some sort of
authentication and authorization scheme.   Today, Flink relies entirely on
network isolation to protect itself from unauthorized job submission and
control, and to protect the secrets contained within a Flink cluster.
This is a problem in multi-user environments like YARN/Mesos/K8s.

Last fall, an effort was made to implement service authorization but the PR
was ultimately rejected.   The idea was to add a simple secret key to all
network communication between the client, JM, and TM.   Akka itself has
such a feature which formed the basis of the solution.  There are usability
challenges with this solution, including a dependency on SSL.

Since then, the situation has evolved somewhat, and the use of SSL mutual
authentication is more viable.   Mutual auth is supported in Akka 2.4.12+
(or could be backported to Flakka).  My proposal is:

1. Upgrade Akka or backport the functionality to Flakka (see commit
5d03902c5ec3212cd28f26c9b3ef7c3b628b9451).
2. Implement SSL on any endpoint that doesn't yet support it (e.g.
queryable state).
3. Enable mutual auth in Akka and implement it on non-Akka endpoints.
4. Implement a simple authorization layer that accepts any authenticated
connection.
5. (stretch) generate and store a certificate automatically in YARN mode.
6. (stretch) Develop an alternate authentication method for the Web UI.
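
As a rough illustration of steps 3 and 4, here is a minimal sketch that uses
only the plain JDK TLS classes. It is not Flink or Akka code: the class name
MutualAuthSketch and both method names are hypothetical, and a real
implementation would wire the same idea into the Akka remoting configuration
and the non-Akka endpoints.

```java
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLPeerUnverifiedException;
import javax.net.ssl.SSLSession;

// Hypothetical helper, for illustration only.
class MutualAuthSketch {

    // Step 3: server-side TLS parameters that require a client certificate,
    // so the handshake fails unless the client authenticates itself.
    static SSLParameters serverParameters() {
        SSLParameters params = new SSLParameters();
        params.setNeedClientAuth(true);
        return params;
    }

    // Step 4: the simplest authorization rule, accepting any connection whose
    // peer presented a certificate that was verified during the handshake.
    static boolean isAuthorized(SSLSession session) {
        try {
            return session.getPeerCertificates().length > 0;
        } catch (SSLPeerUnverifiedException e) {
            // The peer's identity was not verified: reject.
            return false;
        }
    }

    public static void main(String[] args) {
        SSLParameters params = serverParameters();
        System.out.println("needClientAuth = " + params.getNeedClientAuth());
    }
}
```

The parameters would typically be applied to an SSLEngine or SSLServerSocket
via setSSLParameters before the handshake starts.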

Are folks interested in this capability?  Thoughts on the use of SSL mutual
auth versus something else?  Thanks!

-Eron


Re: [VOTE] Release Apache Flink-shaded 1.0 (RC1)

2017-07-23 Thread Greg Hogan
Is there a pressing need to get the release out quickly? This being the first 
release, would it be better to change the versioning now to prevent future 
confusion? Even if Flink is the only intended consumer we’ll still be 
publishing the jars.


> On Jul 23, 2017, at 9:41 AM, Stephan Ewen  wrote:
> 
> The release is technically correct, so
> +1 for the release
> 
>  - LICENSE and NOTICE are good
>  - Shaded artifacts add their licenses to the artifact where needed
>  - no binaries in the release
> 
> 
> I will send another mail with suggestions for improving things for future
> releases
> 
> 
> On Fri, Jul 21, 2017 at 11:39 AM, Robert Metzger 
> wrote:
> 
>> Thanks a lot for preparing the release artifacts.
>> While checking the source repo / release commit, I realized that you are
>> not following the same versioning scheme as Flink:
>> the current master has a "x.y-SNAPSHOT" version, and release candidates
>> (and releases) get a x.y.z version. I wonder if it makes sense to use the
>> same model in the flink-shaded.git repo. I think this is the default
>> assumption in maven, and some modules behave differently based on the
>> version: for example "mvn deploy" sends "-SNAPSHOT" artifacts to a snapshot
>> server, and release artifacts to a staging repository.
>> 
>> I don't think we need to cancel the release because of this, I just wanted
>> to raise this point to see what others are thinking.
>> 
>> 
>> I've checked the following
>> - The netty shaded jar contains the MIT license from netty router:
>> https://repository.apache.org/content/repositories/
>> orgapacheflink-1130/org/apache/flink/flink-shaded-
>> netty-4/1.0-4.0.27.Final/flink-shaded-netty-4-1.0-4.0.27.Final.jar
>> - In the staging repo, I didn't see any dependencies exposed.
>> - I checked some of the md5 sums in the staging and they were correct / I
>> used a mvn plugin to check the signatures in the staging repo and they were
>> okay
>> - clean install in the source repo worked (this includes a license header
>> check)
>> - LICENSE and NOTICE file are there
>> 
>> ==> +1 to release.
>> 
>> On Fri, Jul 21, 2017 at 9:45 AM, Chesnay Schepler 
>> wrote:
>> 
>>> Here's a list of things we need to check:
>>> 
>>> * correct License/Notice files
>>> * licenses of shaded dependencies are included in the jar
>>> * the versions of shaded dependencies match those used in Flink 1.4
>>> * compilation with maven works
>>> * the assembled jars only contain the shaded dependency and no
>>>   non-shaded classes
>>> * no transitive dependencies should be exposed
>>> 
>>> 
>>> On 19.07.2017 15:59, Chesnay Schepler wrote:
>>> 
>>>> Dear Flink community,
>>>>
>>>> Please vote on releasing the following candidate as Apache Flink-shaded
>>>> version 1.0.
>>>>
>>>> The commit to be voted in:
>>>> https://gitbox.apache.org/repos/asf/flink-shaded/commit/fd30
>>>> 33ba9ead310478963bf43e09cd50d1e36d71
>>>>
>>>> Branch:
>>>> release-1.0-rc1
>>>>
>>>> The release artifacts to be voted on can be found at:
>>>> http://home.apache.org/~chesnay/flink-shaded-1.0-rc1/ <
>>>> http://home.apache.org/%7Echesnay/flink-shaded-1.0-rc1/>
>>>>
>>>> The release artifacts are signed with the key with fingerprint
>>>> 19F2195E1B4816D765A2C324C2EED7B111D464BA:
>>>> http://www.apache.org/dist/flink/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapacheflink-1130
>>>>
>>>> -
>>>>
>>>>
>>>> The vote ends on Monday (5pm CEST), July 24th, 2017.
>>>>
>>>> [ ] +1 Release this package as Apache Flink-shaded 1.0
>>>> [ ] -1 Do not release this package, because ...
>>>>
>>>> -
>>>>
>>>>
>>>> The flink-shaded project contains a number of shaded dependencies for
>>>> Apache Flink.
>>>>
>>>> This release includes asm-all:5.0.4, guava:18.0, netty-all:4.0.27-FINAL
>>>> and netty-router:1.10. Note that netty-all and netty-router are bundled
>>>> as a single dependency.
>>>>
>>>> The purpose of these dependencies is to provide a single instance of a
>>>> shaded dependency in the Apache Flink distribution, instead of each
>>>> individual module shading the dependency.
>>>>
>>>> For more information, see
>>>> https://issues.apache.org/jira/browse/FLINK-6529.


Re: [flink-shaded] Some suggestions for improvements

2017-07-23 Thread Stephan Ewen
For the SNAPSHOT part, I do not feel too strongly about that either, just a
tendency to keep it in sync with how core Flink works.

For the "tools" directory, we can keep it as it is. It seems too complicated
and really is not a big deal...

On Sun, Jul 23, 2017 at 8:31 PM, Chesnay Schepler 
wrote:

> I agree that the version scheme in the artifact isn't ideal.
>
> We can keep the tools out of the release, but not in a nice way.
> We either
>
> 1. remove it in a separate commit before each release
> 2. just omit it during the release process.
>
> 1) has the odd downside that the release branch cannot release itself, as
> the script is now missing
> 2) has the odd downside that no branch in the repository would actually
> match the release
>
> As for the SNAPSHOT suffix: we currently don't do any snapshot
> deployments for flink-shaded, and there is no reason to in the first
> place, since the only consumer of the dependencies (aka Apache Flink)
> would never rely on SNAPSHOT versions (I guess). So I don't see the
> need for it. But I don't feel strongly about this, and don't mind
> either way.
>
>
> On 23.07.2017 15:42, Stephan Ewen wrote:
>
>> A few comments on what we can improve in future releases:
>>
>>    - I agree with Robert's comment to change the versioning to the same
>> model as Flink where the master branch is on a SNAPSHOT version always and
>> the releases are branches/tags with stable versions.
>>
>>    - The version names of the artifacts read a bit
>> strange: flink-shaded-asm-5-1.0-5.0.4
>>    - I would suggest renaming them to something like
>> flink-shaded-asm_5.0.4-1.0
>>    - The version of the artifact goes after an underscore, to separate the
>> artifact version from the release version. Think of it as similar to the
>> Scala version specific release artifacts.
>>
>>    - I would suggest to also remove the "tools" directory from the source
>> release, if that is not too much work.
>>
>>
>


Re: [flink-shaded] Some suggestions for improvements

2017-07-23 Thread Chesnay Schepler

I agree that the version scheme in the artifact isn't ideal.

We can keep the tools out of the release, but not in a nice way.
We either

1. remove it in a separate commit before each release
2. just omit it during the release process.

1) has the odd downside that the release branch cannot release itself, 
as the script is now missing
2) has the odd downside that no branch in the repository would actually 
match the release


As for the SNAPSHOT suffix: we currently don't do any snapshot
deployments for flink-shaded, and there is no reason to in the first
place, since the only consumer of the dependencies (aka Apache Flink)
would never rely on SNAPSHOT versions (I guess). So I don't see the
need for it. But I don't feel strongly about this, and don't mind
either way.


On 23.07.2017 15:42, Stephan Ewen wrote:

> A few comments on what we can improve in future releases:
>
>    - I agree with Robert's comment to change the versioning to the same
> model as Flink where the master branch is on a SNAPSHOT version always and
> the releases are branches/tags with stable versions.
>
>    - The version names of the artifacts read a bit
> strange: flink-shaded-asm-5-1.0-5.0.4
>    - I would suggest renaming them to something like
> flink-shaded-asm_5.0.4-1.0
>    - The version of the artifact goes after an underscore, to separate the
> artifact version from the release version. Think of it as similar to the
> Scala version specific release artifacts.
>
>    - I would suggest to also remove the "tools" directory from the source
> release, if that is not too much work.





Re: is flink' states functionality futile?

2017-07-23 Thread ziv
OK, let me see if I understand you correctly.
You are saying that Flink's state functionality was introduced only to
handle recovery from failures.
Let's take the example given in the 1.3 documentation -
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html

It is a program that sinks messages only after enough items have accumulated
in the buffer.
Now, assume I'm not concerned with failure recovery and only want the
simplest way to implement a program that remembers data from the last run in
the stream. Then, according to you, I need not use any of the elements
associated with Flink's state -
ListState
snapshotState
initializeState
restoreState
and the program still functions correctly?





--
View this message in context: 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/is-flink-states-functionality-futile-tp18867p18879.html
Sent from the Apache Flink Mailing List archive at Nabble.com.


[jira] [Created] (FLINK-7246) Big latency shown on operator.latency

2017-07-23 Thread yinhua.dai (JIRA)
yinhua.dai created FLINK-7246:
-

 Summary: Big latency shown on operator.latency
 Key: FLINK-7246
 URL: https://issues.apache.org/jira/browse/FLINK-7246
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.1
 Environment: Local
Reporter: yinhua.dai


I was running Flink 1.2.1, and I set the metrics reporter to JMX to check
the latency of my job.

But the latency I observed is over 100ms even when there is no
processing in my job.

I then ran the example SocketWordCount streaming job, and again I saw a
latency of over 100ms. I am wondering if there is some misconfiguration or
another problem.

I was using start-local.bat and flink run to start up the job, all with default
configs.

Thank you in advance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] Planning Release 1.4

2017-07-23 Thread Stephan Ewen
I agree with these points.

My personal take is that we tried to be a bit too strict with the "date to
fork" and "date to release" and it got a bit in the way of development and
testing.

On Fri, Jun 2, 2017 at 11:15 AM, Robert Metzger  wrote:

> I agree that it was quite annoying to merge everything to two branches.
> But part of that problem was that many big features were merged last minute
> and then fixed after the feature freeze.
> In an ideal world, all features are stable, tested and documented when the
> feature freeze happens and most commits go into master only.
>
> I wonder if we can manage to merge queued minor features before the feature
> freeze to avoid the issue in the future?
>
> If we all agree that this doesn't work, we can also try to delay the
> feature freeze. I just fear that this will make it harder to meet the
> release deadline.
>
>
> Robert
>
> On Thu, Jun 1, 2017 at 6:05 PM, Greg Hogan  wrote:
>
> > I’d like to propose keeping the same schedule but moving branch forking
> > from the feature freeze to the code freeze. The early fork required
> > duplicate verification and commits for numerous bug fixes and minor
> > features which had been reviewed but were still queued. There did not look
> > to be much new development merged to master between the freezes.
> >
> > Greg
> >
> >
> > > On Jun 1, 2017, at 11:26 AM, Robert Metzger 
> wrote:
> > >
> > > Hi all,
> > >
> > > Flink 1.2 was released on February 2, Flink 1.3 on June 1, which means
> > > we've managed to release Flink 1.3 in almost exactly 4 months!
> > >
> > > For the 1.4 release, I've put the following deadlines into the wiki
> [1]:
> > >
> > > *Next scheduled major release*: 1.4.0
> > > *Feature freeze (branch forking)*:  4 September 2017
> > > *Code freeze (first voting RC)*:  18 September 2017
> > > *Release date*: 29 September 2017
> > >
> > > I'll try to send a message every month into this thread to have a
> > countdown
> > > to the next feature freeze.
> > >
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/
> > Flink+Release+and+Feature+Plan
> >
> >
>


[flink-shaded] Some suggestions for improvements

2017-07-23 Thread Stephan Ewen
A few comments on what we can improve in future releases:

  - I agree with Robert's comment to change the versioning to the same
model as Flink where the master branch is on a SNAPSHOT version always and
the releases are branches/tags with stable versions.

  - The version names of the artifacts read a bit
strange: flink-shaded-asm-5-1.0-5.0.4
  - I would suggest renaming them to something like
flink-shaded-asm_5.0.4-1.0
  - The version of the artifact goes after an underscore, to separate the
artifact version from the release version. Think of it as similar to the
Scala version specific release artifacts.

  - I would suggest to also remove the "tools" directory from the source
release, if that is not too much work.
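
To make the naming suggestion concrete, here is a small sketch that builds
both names from the same inputs. The class and method names are hypothetical,
and deriving the "-5" suffix from the major version of the shaded dependency
is an assumption based on the asm-5 and netty-4 artifacts in this release.

```java
// Hypothetical helper, for illustration only; not part of flink-shaded.
class ShadedArtifactName {

    // Naming used in RC1: "flink-shaded-<name>-<major>" followed by
    // "<release version>-<dependency version>".
    static String current(String name, String depVersion, String release) {
        // Assumption: the suffix after the name is the dependency's major version.
        String major = depVersion.substring(0, depVersion.indexOf('.'));
        return "flink-shaded-" + name + "-" + major + "-" + release + "-" + depVersion;
    }

    // Suggested naming: dependency version after an underscore, then the
    // flink-shaded release version.
    static String proposed(String name, String depVersion, String release) {
        return "flink-shaded-" + name + "_" + depVersion + "-" + release;
    }

    public static void main(String[] args) {
        System.out.println(current("asm", "5.0.4", "1.0"));   // flink-shaded-asm-5-1.0-5.0.4
        System.out.println(proposed("asm", "5.0.4", "1.0"));  // flink-shaded-asm_5.0.4-1.0
    }
}
```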


Re: [VOTE] Release Apache Flink-shaded 1.0 (RC1)

2017-07-23 Thread Stephan Ewen
The release is technically correct, so
+1 for the release

  - LICENSE and NOTICE are good
  - Shaded artifacts add their licenses to the artifact where needed
  - no binaries in the release


I will send another mail with suggestions for improving things for future
releases


On Fri, Jul 21, 2017 at 11:39 AM, Robert Metzger 
wrote:

> Thanks a lot for preparing the release artifacts.
> While checking the source repo / release commit, I realized that you are
> not following the same versioning scheme as Flink:
> the current master has a "x.y-SNAPSHOT" version, and release candidates
> (and releases) get a x.y.z version. I wonder if it makes sense to use the
> same model in the flink-shaded.git repo. I think this is the default
> assumption in maven, and some modules behave differently based on the
> version: for example "mvn deploy" sends "-SNAPSHOT" artifacts to a snapshot
> server, and release artifacts to a staging repository.
>
> I don't think we need to cancel the release because of this, I just wanted
> to raise this point to see what others are thinking.
>
>
> I've checked the following
> - The netty shaded jar contains the MIT license from netty router:
> https://repository.apache.org/content/repositories/
> orgapacheflink-1130/org/apache/flink/flink-shaded-
> netty-4/1.0-4.0.27.Final/flink-shaded-netty-4-1.0-4.0.27.Final.jar
> - In the staging repo, I didn't see any dependencies exposed.
> - I checked some of the md5 sums in the staging and they were correct / I
> used a mvn plugin to check the signatures in the staging repo and they were
> okay
> - clean install in the source repo worked (this includes a license header
> check)
> - LICENSE and NOTICE file are there
>
> ==> +1 to release.
>
> On Fri, Jul 21, 2017 at 9:45 AM, Chesnay Schepler 
> wrote:
>
> > Here's a list of things we need to check:
> >
> >  * correct License/Notice files
> >  * licenses of shaded dependencies are included in the jar
> >  * the versions of shaded dependencies match those used in Flink 1.4
> >  * compilation with maven works
> >  * the assembled jars only contain the shaded dependency and no
> >    non-shaded classes
> >  * no transitive dependencies should be exposed
> >
> >
> > On 19.07.2017 15:59, Chesnay Schepler wrote:
> >
> >> Dear Flink community,
> >>
> >> Please vote on releasing the following candidate as Apache Flink-shaded
> >> version 1.0.
> >>
> >> The commit to be voted in:
> >> https://gitbox.apache.org/repos/asf/flink-shaded/commit/fd30
> >> 33ba9ead310478963bf43e09cd50d1e36d71
> >>
> >> Branch:
> >> release-1.0-rc1
> >>
> >> The release artifacts to be voted on can be found at:
> >> http://home.apache.org/~chesnay/flink-shaded-1.0-rc1/ <
> >> http://home.apache.org/%7Echesnay/flink-shaded-1.0-rc1/>
> >>
> >> The release artifacts are signed with the key with fingerprint
> >> 19F2195E1B4816D765A2C324C2EED7B111D464BA:
> >> http://www.apache.org/dist/flink/KEYS
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapacheflink-1130
> >>
> >> -
> >>
> >>
> >> The vote ends on Monday (5pm CEST), July 24th, 2017.
> >>
> >> [ ] +1 Release this package as Apache Flink-shaded 1.0
> >> [ ] -1 Do not release this package, because ...
> >>
> >> -
> >>
> >>
> >> The flink-shaded project contains a number of shaded dependencies for
> >> Apache Flink.
> >>
> >> This release includes asm-all:5.0.4, guava:18.0, netty-all:4.0.27-FINAL
> >> and netty-router:1.10. Note that netty-all and netty-router are bundled
> >> as a single dependency.
> >>
> >> The purpose of these dependencies is to provide a single instance of a
> >> shaded dependency in the Apache Flink distribution, instead of each
> >> individual module shading the dependency.
> >>
> >> For more information, see
> >> https://issues.apache.org/jira/browse/FLINK-6529.
> >>
> >>
> >
>


[jira] [Created] (FLINK-7245) Enhance the operators to support holding back watermarks

2017-07-23 Thread Xingcan Cui (JIRA)
Xingcan Cui created FLINK-7245:
--

 Summary: Enhance the operators to support holding back watermarks
 Key: FLINK-7245
 URL: https://issues.apache.org/jira/browse/FLINK-7245
 Project: Flink
  Issue Type: New Feature
  Components: DataStream API
Reporter: Xingcan Cui
Assignee: Xingcan Cui


Currently the watermarks are applied and emitted by the 
{{AbstractStreamOperator}} instantly. 
{code:java}
public void processWatermark(Watermark mark) throws Exception {
    if (timeServiceManager != null) {
        timeServiceManager.advanceWatermark(mark);
    }
    output.emitWatermark(mark);
}
{code}

Some calculation results (with timestamp fields) triggered by these watermarks 
(e.g., join or aggregate results) may be regarded as delayed by the downstream 
operators since their timestamps must be less than or equal to the 
corresponding triggers. 

This issue aims to add another "working mode", which supports holding back 
watermarks, to current operators. These watermarks should be blocked and stored 
by the operators until all the corresponding newly generated results are emitted.
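
A minimal sketch of what such a "holding back" mode could look like, assuming
the operator knows how many results a watermark has triggered. This is not
Flink code: HoldBackWatermarkSketch, its methods and the string-based output
are hypothetical stand-ins, and it only tracks a single held watermark at a
time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator; not a Flink class.
class HoldBackWatermarkSketch {
    // Everything sent downstream, in emission order.
    final List<String> output = new ArrayList<>();

    private Long heldWatermark = null; // watermark currently being held back
    private int pendingResults = 0;    // triggered results not yet emitted

    // Store the watermark instead of forwarding it immediately.
    void processWatermark(long timestamp, int resultsTriggered) {
        heldWatermark = timestamp;
        pendingResults = resultsTriggered;
        releaseIfDone();
    }

    // Emit one result; forward the held watermark once none are pending.
    void emitResult(String result) {
        output.add(result);
        pendingResults--;
        releaseIfDone();
    }

    private void releaseIfDone() {
        if (heldWatermark != null && pendingResults <= 0) {
            output.add("WM(" + heldWatermark + ")");
            heldWatermark = null;
        }
    }

    public static void main(String[] args) {
        HoldBackWatermarkSketch op = new HoldBackWatermarkSketch();
        op.processWatermark(100L, 2);
        op.emitResult("join-result-a");
        op.emitResult("join-result-b");
        System.out.println(op.output); // [join-result-a, join-result-b, WM(100)]
    }
}
```

In this sketch the watermark reaches downstream operators only after both
results, which is the ordering the issue asks for.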





[jira] [Created] (FLINK-7244) Add ParquetTableSource Implementation based on ParquetInputFormat

2017-07-23 Thread godfrey he (JIRA)
godfrey he created FLINK-7244:
-

 Summary: Add ParquetTableSource Implementation based on ParquetInputFormat
 Key: FLINK-7244
 URL: https://issues.apache.org/jira/browse/FLINK-7244
 Project: Flink
  Issue Type: Sub-task
Reporter: godfrey he








[jira] [Created] (FLINK-7243) Add ParquetInputFormat

2017-07-23 Thread godfrey he (JIRA)
godfrey he created FLINK-7243:
-

 Summary: Add ParquetInputFormat
 Key: FLINK-7243
 URL: https://issues.apache.org/jira/browse/FLINK-7243
 Project: Flink
  Issue Type: Sub-task
Reporter: godfrey he
Assignee: godfrey he





