[jira] [Created] (FLINK-6620) Add KeyGroupCheckpointedOperator interface that works for checkpointing key-groups

2017-05-17 Thread Jingsong Lee (JIRA)
Jingsong Lee created FLINK-6620:
---

 Summary: Add KeyGroupCheckpointedOperator interface that works for 
checkpointing key-groups
 Key: FLINK-6620
 URL: https://issues.apache.org/jira/browse/FLINK-6620
 Project: Flink
  Issue Type: Improvement
  Components: DataStream API
Reporter: Jingsong Lee
Priority: Minor


[~aljoscha] We have discussed it on: 
https://issues.apache.org/jira/browse/BEAM-1393

{code}
/**
 * This interface is used to checkpoint key-group state.
 */
public interface KeyGroupCheckpointedOperator extends KeyGroupRestoringOperator {
  /**
   * Snapshots the state for a given {@code keyGroupIdx}.
   *
   * AbstractStreamOperator would call this hook in
   * AbstractStreamOperator.snapshotState() while iterating over the key groups.
   * @param keyGroupIndex the id of the key-group to be put in the snapshot.
   * @param out the stream to write to.
   */
  void snapshotKeyGroupState(int keyGroupIndex, DataOutputStream out) throws Exception;
}
{code}
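A minimal, self-contained sketch of how an operator might implement the proposed interface. The `KeyGroupRestoringOperator` counterpart and the `CountingOperator` class are hypothetical; the restore signature is inferred from the linked discussion:

```java
import java.io.*;
import java.util.*;

interface KeyGroupRestoringOperator {
    // Assumed counterpart of the proposed interface; signature inferred
    // from the discussion, not taken from Flink's actual API.
    void restoreKeyGroupState(int keyGroupIndex, DataInputStream in) throws Exception;
}

interface KeyGroupCheckpointedOperator extends KeyGroupRestoringOperator {
    void snapshotKeyGroupState(int keyGroupIndex, DataOutputStream out) throws Exception;
}

class CountingOperator implements KeyGroupCheckpointedOperator {
    // One counter per key-group; each key-group is snapshotted independently.
    final Map<Integer, Long> counts = new HashMap<>();

    @Override
    public void snapshotKeyGroupState(int keyGroupIndex, DataOutputStream out) throws Exception {
        // Absent key-groups snapshot as zero.
        out.writeLong(counts.getOrDefault(keyGroupIndex, 0L));
    }

    @Override
    public void restoreKeyGroupState(int keyGroupIndex, DataInputStream in) throws Exception {
        counts.put(keyGroupIndex, in.readLong());
    }
}

public class KeyGroupSnapshotDemo {
    public static void main(String[] args) throws Exception {
        CountingOperator op = new CountingOperator();
        op.counts.put(7, 42L);

        // Snapshot key-group 7 into an in-memory buffer ...
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        op.snapshotKeyGroupState(7, new DataOutputStream(buf));

        // ... and restore it into a fresh operator instance.
        CountingOperator restored = new CountingOperator();
        restored.restoreKeyGroupState(7,
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(restored.counts.get(7)); // prints 42
    }
}
```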



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6619) Check Table API & SQL support for 1.3.0 RC01 Release

2017-05-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-6619:
--

 Summary: Check Table API & SQL support for 1.3.0 RC01 Release
 Key: FLINK-6619
 URL: https://issues.apache.org/jira/browse/FLINK-6619
 Project: Flink
  Issue Type: Test
  Components: Table API & SQL
Affects Versions: 1.3.0
Reporter: sunjincheng
Assignee: sunjincheng


In this JIRA, I will do the following tasks for the Flink 1.3.0 RC01 release:
* Check that the Java and Scala logical plans are consistent.
* Check that the SQL and Table API logical plans are consistent.
* Check that UDF, UDTF, and UDAF work properly in group-windows and 
over-windows.
* Check that all built-in aggregations work properly on batch and stream.

As I work through the tasks above, I will create sub-tasks.





Re: Proposal about inner join in Flink

2017-05-17 Thread Hongyuhong
Hi Xingcan,
Thanks for the proposal.
I have glanced at the design document, but not in detail. The semantics of 
record-to-window join are already in progress in FLINK-5725.
It would be great if you could share your ideas about the implementation.

Thanks very much.
Yuhong

-Original Message-
From: Shaoxuan Wang [mailto:wshaox...@gmail.com] 
Sent: May 17, 2017 22:48
To: Dev
Subject: Re: Proposal about inner join in Flink

Hello Xingcan,
Thanks for the proposal. It seems (I may have missed something) that the 
proposed semantics for unbounded inner join are similar to the ones proposed in 
FLINK-5878.
I did not create the PR for FLINK-5878, as the implementation for inner join is 
closely associated with "Retract" (proposed in FLINK-6047).
I have not completely read through your doc, but the window joins that you 
mentioned are definitely topics that we are also interested in. I will read it 
carefully and leave comments on your doc. Thanks!

Regards,
Shaoxuan


On Wed, May 17, 2017 at 8:56 PM, Xingcan Cui  wrote:

> Hi everyone,
>
> Recently, I drafted a proposal about inner join in Flink ( 
> http://goo.gl/4AdR7h).
>
> This document reviews some related work on the Table/SQL topic and it 
> provides a relatively complete view about the inner join semantics and 
> implementation. Besides, I also share my (objective) thoughts about 
> unifying the batch/stream query processing.
>
> I know there are lots of developers who are interested in this subject.
> Please share your ideas and all suggestions are welcome.
>
> Thanks,
> Xingcan
>


[jira] [Created] (FLINK-6618) Fix `GroupWindow` JAVA logical plans not consistent with SCALA logical plans.

2017-05-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-6618:
--

 Summary: Fix `GroupWindow` JAVA logical plans not consistent with 
SCALA logical plans.
 Key: FLINK-6618
 URL: https://issues.apache.org/jira/browse/FLINK-6618
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.3.0
Reporter: sunjincheng
Assignee: sunjincheng


I found 2 bugs, as follows:
1. A `GroupWindowStringExpressionTest` test-case bug: 
   `Assert.assertEquals("Logical Plans do not match", resJava.logicalPlan, 
resJava.logicalPlan)` should be `Assert.assertEquals("Logical Plans do not 
match", resJava.logicalPlan, resScala.logicalPlan)` (the original compares the 
Java plan against itself).
2. After fixing the bug above, another bug appears:
{code}
java.lang.AssertionError: Logical Plans do not match 
Expected :Project(ListBuffer('string, 'TMP_4, 'TMP_5, 'TMP_6, ('TMP_7 * 2) as 
'_c4),WindowAggregate(List('string),SlidingGroupWindow('w, 'rowtime, 
1440.millis, 720.millis),List(),List(CountAggFunction(List('string)) as 
'TMP_4, sum('int) as 'TMP_5, WeightedAvg(List('long, 'int)) as 'TMP_6, 
WeightedAvg(List('int, 'int)) as 'TMP_7),Project(ArrayBuffer('string, 'int, 
'long, 
'rowtime),CatalogNode(WrappedArray(_DataStreamTable_0),RecordType(INTEGER int, 
BIGINT long, VARCHAR(2147483647) string, TIMESTAMP(3) rowtime)
Actual   :Project(ListBuffer('string, 'TMP_0, 'TMP_1, 'TMP_2, ('TMP_3 * 2) as 
'_c4),WindowAggregate(ArrayBuffer('string),SlidingGroupWindow('w, 'rowtime, 
1440.millis, 
720.millis),List(),List(CountAggFunction(ArrayBuffer('string)) as 'TMP_0, 
sum('int) as 'TMP_1, WeightedAvg(ArrayBuffer('long, 'int)) as 'TMP_2, 
WeightedAvg(ArrayBuffer('int, 'int)) as 'TMP_3),Project(ArrayBuffer('string, 
'int, 'long, 
'rowtime),CatalogNode(WrappedArray(_DataStreamTable_0),RecordType(INTEGER int, 
BIGINT long, VARCHAR(2147483647) string, TIMESTAMP(3) rowtime)
 {code}






[jira] [Created] (FLINK-6617) Improve JAVA and SCALA logical plans consistent test

2017-05-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-6617:
--

 Summary: Improve JAVA and SCALA logical plans consistent test
 Key: FLINK-6617
 URL: https://issues.apache.org/jira/browse/FLINK-6617
 Project: Flink
  Issue Type: Test
  Components: Table API & SQL
Affects Versions: 1.3.0
Reporter: sunjincheng
Assignee: sunjincheng


Currently, we need `StringExpression` tests for all Java and Scala APIs, 
such as `GroupAggregations`, `GroupWindowAggregaton` (Session, Tumble), `Calc`, etc.






RE: ListState to List

2017-05-17 Thread Radu Tudoran
Hi Aljoscha,

Thanks for the clarification. I understand that there might be advantages in 
some cases to not having a List-like interface, while in other scenarios (like 
the one I described) there aren't. Considering this, why not have two types of 
state, ListState and StreamInListState, so users can pick whichever is more 
appropriate? What do you think?

-Original Message-
From: Aljoscha Krettek [mailto:aljos...@apache.org] 
Sent: Thursday, May 18, 2017 12:15 AM
To: dev@flink.apache.org
Subject: Re: ListState to List

Hi,
The interface is restrictive on purpose because, depending on the state backend, 
it might not be possible to provide a List-like interface. There might be state 
backends that stream in the list from somewhere else, or that have other 
restrictions. If we now allowed a more general interface here, we would possibly 
prevent optimisations in the future or make certain implementations very hard 
to do efficiently.

Best,
Aljoscha

> On 16. May 2017, at 21:56, Radu Tudoran  wrote:
> 
> Hi,
> 
> I would like to work with ListState, more specifically I would need to access 
> the contents and sort them. For this I would need a collection type (e.g., 
> the List, Array...).
> However, I see that if I have a variable of type <> the 
> only interfaces I have are:
> state.get -> which returns an Iterable
> Or
> state.get.getIterator -> which returns an Iterator
> 
> Basically, if I use any of these, I now need to copy the contents into an actual 
> List or Array.  Is there any way to avoid this? ..perhaps there is an 
> implicit type that I can convert to...
> 
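For reference, the copy-and-sort workaround discussed above looks roughly like the following plain-Java sketch (`sortedCopy` is a hypothetical helper; the `Iterable` argument stands in for what `listState.get()` returns):

```java
import java.util.*;

public class SortIterableDemo {
    // Copies the Iterable returned by state.get() into a List so it can be
    // sorted. This extra copy is exactly the cost discussed in the thread;
    // with the current ListState interface it cannot be avoided.
    static <T extends Comparable<T>> List<T> sortedCopy(Iterable<T> values) {
        List<T> copy = new ArrayList<>();
        for (T v : values) {
            copy.add(v);
        }
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        Iterable<Integer> fromState = Arrays.asList(3, 1, 2); // stand-in for listState.get()
        System.out.println(sortedCopy(fromState)); // prints [1, 2, 3]
    }
}
```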



Re: ListState to List

2017-05-17 Thread Aljoscha Krettek
Hi,
The interface is restrictive on purpose because, depending on the state backend, 
it might not be possible to provide a List-like interface. There might be state 
backends that stream in the list from somewhere else, or that have other 
restrictions. If we now allowed a more general interface here, we would possibly 
prevent optimisations in the future or make certain implementations very hard 
to do efficiently.

Best,
Aljoscha

> On 16. May 2017, at 21:56, Radu Tudoran  wrote:
> 
> Hi,
> 
> I would like to work with ListState, more specifically I would need to access 
> the contents and sort them. For this I would need a collection type (e.g., 
> the List, Array...).
> However, I see that if I have a variable of type <> the 
> only interfaces I have are:
> state.get -> which returns an Iterable
> Or
> state.get.getIterator which returns an Iterator
> 
> Basically, if I use any of these, I now need to copy the contents into an actual 
> List or Array.  Is there any way to avoid this? ..perhaps there is an 
> implicit type that I can convert to...
> 



Re: [DISCUSS] Release 1.3.0 RC1 (Non voting, testing release candidate)

2017-05-17 Thread Eron Wright
Robert, please add FLINK-6606 to the list of JIRAs that you're tracking,
thanks.

On Tue, May 16, 2017 at 8:30 AM, Robert Metzger  wrote:

> I totally forgot to post a document with testing tasks in the RC0 thread,
> so I'll do it in the RC1 thread.
>
> Please use this document:
> https://docs.google.com/document/d/11WCfV15VwQNF-
> Rar4E0RtWiZw1ddEbg5WWf4RFSQ_2Q/edit#
>
> If I have the feeling that not enough people are seeing the document, I'll
> write a dedicated email to user@ and dev@ :)
>
>
> On Tue, May 16, 2017 at 9:26 AM, Robert Metzger 
> wrote:
>
> > Thanks for the pointer. I'll keep an eye on the JIRA.
> >
> > I've gone through the JIRAs tagged with 1.3.0 yesterday to create a list
> > of new features in 1.3. Feel free to add more / change it in the wiki:
> > https://cwiki.apache.org/confluence/display/FLINK/
> > Flink+Release+and+Feature+Plan#FlinkReleaseandFeaturePlan-Flink1.3
> >
> > On Mon, May 15, 2017 at 10:29 PM, Gyula Fóra 
> wrote:
> >
> >> Thanks Robert,
> >>
> >> Just for the record I think there are still some problems with
> incremental
> >> snapshots, I think Stefan is still working on it.
> >>
> >> I added some comments to https://issues.apache.org/
> jira/browse/FLINK-6537
> >>
> >> Gyula
> >>
> >> Robert Metzger  ezt írta (időpont: 2017. máj. 15.,
> >> H,
> >> 19:41):
> >>
> >> > Hi Devs,
> >> >
> >> > This is the second non-voting RC. The last RC had some big issues,
> >> making
> >> > it hard to start Flink locally. I hope this RC proves to be more
> stable.
> >> >
> >> > I hope to create the first voting RC by end of this week.
> >> >
> >> > -
> >> >
> >> > The release commit is 3659a82f553fedf8afe8b5fae75922075fe17e85
> >> >
> >> > The artifacts are located here:
> >> > http://people.apache.org/~rmetzger/flink-1.3.0-rc1/
> >> >
> >> > The maven staging repository is here:
> >> > https://repository.apache.org/content/repositories/
> orgapacheflink-1119
> >> >
> >> > -
> >> >
> >> > Happy testing!
> >> >
> >> > Regards,
> >> > Robert
> >> >
> >>
> >
> >
>


[jira] [Created] (FLINK-6616) Clarify provenance of official Docker images

2017-05-17 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-6616:
-

 Summary: Clarify provenance of official Docker images
 Key: FLINK-6616
 URL: https://issues.apache.org/jira/browse/FLINK-6616
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.3.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Critical
 Fix For: 1.3.0


Note that the official Docker images for Flink are community supported and not 
an official release of the Apache Flink PMC.





Re: Official Flink Docker images

2017-05-17 Thread Henry Saputra
We already have a Dockerfile in our source repo as part of a simple test:
  https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
but it was never automatically built by our build system, AFAIK.

- Henry

On Tue, May 16, 2017 at 8:58 AM, Robert Metzger  wrote:

> How hard would it be to integrate the docker images into the Flink release
> process?
>
> Ideally, we could provide something like a staging directory for the docker
> images, so that we can include them into the vote.
> Once the vote has passed, the images will be made public through docker hub
> and apache.
>
> On Tue, May 16, 2017 at 3:12 PM, Till Rohrmann 
> wrote:
>
> > Great to hear Ismaël. This will make running Flink even easier :-)
> >
> > Cheers,
> > Till
> >
> > On Tue, May 16, 2017 at 1:38 PM, Ismaël Mejía  wrote:
> >
> > > As a follow-up to this thread, the official Docker images have been out
> > > since last week, so please go ahead and try them. The blog post that
> > > presents them was published today.
> > >
> > > https://flink.apache.org/news/2017/05/16/official-docker-image.html
> > >
> > > On Sun, May 7, 2017 at 3:51 PM, Ismaël Mejía 
> wrote:
> > > > Sure we would do it. I will sync with Patrick to do this quickly.
> > > >
> > > > On May 5, 2017 3:45 PM, "Robert Metzger" 
> wrote:
> > > >>
> > > >> Thanks a lot for your efforts!
> > > >>
> > > >> Maybe we can even put a small post on the Flink blog once it's
> > > >> available on Docker Hub.
> > > >>
> > > >> On Fri, Apr 28, 2017 at 11:00 AM, Ismaël Mejía 
> > > wrote:
> > > >>
> > > >> > Hello,
> > > >> >
> > > >> > I am absolutely happy to see that this is finally happening!
> > > >> > We have a really neat image right now and it is great that it will
> > > >> > soon be so easy to use.
> > > >> >
> > > >> > One extra thing to mention is that Flink will now have two Docker
> > > >> > images, one based on Debian and the other based on Alpine, as most
> > > >> > official Java-based projects do.
> > > >> >
> > > >> > In the future we expect to improve the documentation on how to use
> > > >> > the image with Kubernetes and continue improving the actual
> > > >> > documentation on Docker. If anyone wants to join to document
> > > >> > something or add any improvement/feature you need, you are all welcome.
> > > >> >
> > > >> > Finally, I would also like to thank Maximilian Michels, who
> > > >> > contributed to and reviewed some of my early changes on the image.
> > > >> >
> > > >> > Regards,
> > > >> > Ismaël
> > > >> >
> > > >> > ps. We will 'announce' when the official images are available on
> > > >> > docker hub, so everyone can start to use them.
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Thu, Apr 27, 2017 at 1:38 PM, Patrick Lucas
> > > >> >  wrote:
> > > >> > > I've been informed that images don't make it through the list!
> > > >> > >
> > > >> > > You can see the aforementioned squirrel here
> > > >> > >  > > >> > uploads/8/7/6/3/9/ar12988558393678.JPG>
> > > >> > > .
> > > >> > >
> > > >> > > --
> > > >> > > Patrick Lucas
> > > >> > >
> > > >> > > On Thu, Apr 27, 2017 at 12:21 PM, Patrick Lucas <
> > > >> > patr...@data-artisans.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > >> As part of an ongoing effort to improve the experience of using
> > > Flink
> > > >> > >> on
> > > >> > >> Docker, some work has been done over the last two months to
> > publish
> > > >> > >> official Flink Docker images to Docker Hub. The goal in the
> short
> > > >> > >> term
> > > >> > is
> > > >> > >> to make running a simple Flink cluster (almost) as easy as
> > running
> > > >> > docker
> > > >> > >> run flink. In the long term, we would like these images to be
> > good
> > > >> > enough
> > > >> > >> to use in production directly, or as base images for use in an
> > > >> > >> existing
> > > >> > >> Docker workflow.
> > > >> > >>
> > > >> > >> Flink 1.2.1 has some fixes over the last few releases that make
> > > >> > >> running
> > > >> > it
> > > >> > >> on Docker nicer—and in some cases, possible. Notably,
> FLINK-2821
> > > >> > >>  allowed
> > > Dockerized
> > > >> > >> Flink to run across multiple hosts, and FLINK-4326
> > > >> > >>  added an
> > > option to
> > > >> > run
> > > >> > >> Flink in the foreground, which is greatly preferred when
> running
> > in
> > > >> > Docker.
> > > >> > >>
> > > >> > >> We (Ismaël Mejía and myself, with some discussion with Stephan
> > > Ewen)
> > > >> > >> decided it made sense to bring the actual Dockerfiles outside
> of
> > > the
> > > >> > Apache
> > > >> > >> Flink git repo, primarily to conform with every other Apache
> > > project
> > > >> > that
> > > >> > >> has official images, 

Re: Proposal about inner join in Flink

2017-05-17 Thread Shaoxuan Wang
Hello Xingcan,
Thanks for the proposal. It seems (I may miss something) the proposed
semantics for unbounded inner join is similar as the one proposed
in FLINK-5878.
I did not create the PR for FLINK-5878, as the implementation for inner
join is closely associated with "Retract" (proposed in FLINK-6047).
I have not completely read through your doc, but the window-joins that you
mentioned are definitely the topics that we are also interested. Will read
it carefully and left comments on your doc. Thanks!

Regards,
Shaoxuan


On Wed, May 17, 2017 at 8:56 PM, Xingcan Cui  wrote:

> Hi everyone,
>
> Recently, I drafted a proposal about inner join in Flink (
> http://goo.gl/4AdR7h).
>
> This document reviews some related work on the Table/SQL topic and it
> provides a relatively complete view about the inner join semantics and
> implementation. Besides, I also share my (objective) thoughts about
> unifying the batch/stream query processing.
>
> I know there are lots of developers who are interested in this subject.
> Please share your ideas and all suggestions are welcome.
>
> Thanks,
> Xingcan
>


[jira] [Created] (FLINK-6615) tmp directory not cleaned up on shutdown

2017-05-17 Thread Andrey (JIRA)
Andrey created FLINK-6615:
-

 Summary: tmp directory not cleaned up on shutdown
 Key: FLINK-6615
 URL: https://issues.apache.org/jira/browse/FLINK-6615
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Andrey


Steps to reproduce:
1) Stop task manager gracefully (kill -6 )
2) In the logs:
{code}
2017-05-17 13:35:50,147 INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down [main-EventThread]
2017-05-17 13:35:50,200 ERROR org.apache.flink.runtime.io.disk.iomanager.IOManager - IOManager failed to properly clean up temp file directory: /mnt/flink/tmp/flink-io-66f1d0ec-8976-41bf-9575-f80b181b0e47 [flink-akka.actor.default-dispatcher-2]
java.nio.file.DirectoryNotEmptyException: /mnt/flink/tmp/flink-io-66f1d0ec-8976-41bf-9575-f80b181b0e47
	at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
	at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
	at java.nio.file.Files.delete(Files.java:1126)
	at org.apache.flink.util.FileUtils.deleteDirectory(FileUtils.java:154)
	at org.apache.flink.runtime.io.disk.iomanager.IOManager.shutdown(IOManager.java:109)
	at org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync.shutdown(IOManagerAsync.java:185)
	at org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:241)
	at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
{code}

Expected:
* on shutdown, delete the non-empty directory anyway. 

Notes:
* after the process terminated, I checked the 
"/mnt/flink/tmp/flink-io-66f1d0ec-8976-41bf-9575-f80b181b0e47" directory and 
didn't find anything there, so it looks like a timing issue.
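The expected behavior (delete the non-empty directory anyway) can be sketched with a bottom-up recursive delete. This is an illustrative sketch, not Flink's actual `FileUtils`; note that a file written concurrently during the walk can still race unless writers are stopped first:

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class RecursiveDelete {
    // Deletes a directory tree bottom-up: files first, then each directory
    // once it is empty, so a leftover temp file no longer triggers
    // DirectoryNotEmptyException on shutdown.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
                Files.delete(d); // directory is empty by the time we get here
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("flink-io-demo");
        Files.createFile(tmp.resolve("spill.tmp"));
        deleteRecursively(tmp);
        System.out.println(Files.exists(tmp)); // prints false
    }
}
```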





Proposal about inner join in Flink

2017-05-17 Thread Xingcan Cui
Hi everyone,

Recently, I drafted a proposal about inner join in Flink (
http://goo.gl/4AdR7h).

This document reviews some related work on the Table/SQL topic and it
provides a relatively complete view about the inner join semantics and
implementation. Besides, I also share my (objective) thoughts about
unifying the batch/stream query processing.

I know there are lots of developers who are interested in this subject.
Please share your ideas and all suggestions are welcome.

Thanks,
Xingcan


[jira] [Created] (FLINK-6614) Applying function on window auxiliary function fails

2017-05-17 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-6614:


 Summary: Applying function on window auxiliary function fails
 Key: FLINK-6614
 URL: https://issues.apache.org/jira/browse/FLINK-6614
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.3.0, 1.4.0
Reporter: Fabian Hueske


SQL queries that apply a function or expression on a window auxiliary function 
({{TUMBLE_START}}, {{TUMBLE_END}}, {{HOP_START}}, etc). cannot be translated 
and fail with a {{CodeGenException}}:
 
{code}
Exception in thread "main" org.apache.flink.table.codegen.CodeGenException: 
Unsupported call: TUMBLE_END
{code}

Example query:

{code}
SELECT 
  a, 
  toLong(TUMBLE_END(rowtime, INTERVAL '10' MINUTE)) AS t, 
  COUNT(b) AS cntB
FROM myTable
GROUP BY a, TUMBLE(rowtime, INTERVAL '10' MINUTE)
{code}





[jira] [Created] (FLINK-6613) OOM during reading big messages from Kafka

2017-05-17 Thread Andrey (JIRA)
Andrey created FLINK-6613:
-

 Summary: OOM during reading big messages from Kafka
 Key: FLINK-6613
 URL: https://issues.apache.org/jira/browse/FLINK-6613
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Andrey


Steps to reproduce:
1) Set up a task manager with a 2 GB heap size.
2) Set up a job that reads messages from Kafka 0.10 (i.e., FlinkKafkaConsumer010).
3) Send 3300 messages of 635 KB each, so the total size is ~2 GB.
4) OOM in the task manager.

According to the heap dump:
1) KafkaConsumerThread read messages with a total size of ~1 GB.
2) It passed them to the next operator using 
org.apache.flink.streaming.connectors.kafka.internal.Handover.
3) It then began to read another batch of messages. 
4) The task manager was able to read another ~500 MB of messages before the OOM.

Expected:
1) Either have a constraint like "number of messages in-flight", OR
2) read the next batch of messages only when the previous batch has been processed, OR
3) any other option that avoids the OOM.
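The "number of messages in-flight" constraint can be sketched with a capacity-bounded queue between the reading thread and the processing thread. This is an illustrative stand-in, not the actual `Handover` class:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedHandover {
    // Moves n records through a capacity-bounded queue. The producer (the
    // thread reading from Kafka) blocks once 'capacity' records are in
    // flight, so heap usage stays proportional to capacity * recordSize
    // instead of growing with the whole unprocessed backlog.
    static int transfer(int n, int capacity) throws InterruptedException {
        BlockingQueue<byte[]> handover = new ArrayBlockingQueue<>(capacity);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    handover.put(new byte[1024]); // blocks when the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int consumed = 0;
        for (int i = 0; i < n; i++) {
            handover.take(); // frees a slot, unblocking the producer
            consumed++;
        }
        producer.join();
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(transfer(10, 2)); // prints 10
    }
}
```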





[jira] [Created] (FLINK-6612) ZooKeeperStateHandleStore does not guard against concurrent delete operations

2017-05-17 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-6612:


 Summary: ZooKeeperStateHandleStore does not guard against 
concurrent delete operations
 Key: FLINK-6612
 URL: https://issues.apache.org/jira/browse/FLINK-6612
 Project: Flink
  Issue Type: Bug
  Components: Distributed Coordination, State Backends, Checkpointing
Affects Versions: 1.3.0, 1.4.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Priority: Critical
 Fix For: 1.3.0, 1.4.0


The {{ZooKeeperStateHandleStore}} does not guard against concurrent delete 
operations which could happen in case of a lost leadership and a new leadership 
grant. The problem is that checkpoint nodes can get deleted even after they 
have been recovered by another {{ZooKeeperCompletedCheckpointStore}}. This 
corrupts the recovered checkpoint and thwarts future recoveries.

I propose to add reference counting to the {{ZooKeeperStateHandleStore}}. That 
way, we can monitor how many concurrent processes have a hold on a given 
checkpoint node. Only when the reference count reaches {{0}} are we allowed to 
delete the checkpoint node and dispose of the checkpoint data.

Stephan proposed to use ephemeral child nodes to track the reference count of a 
checkpoint node. That way, we can be sure that locks on a checkpoint node are 
released in case of {{JobManager}} failures.
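The proposed reference counting can be sketched as follows. This is an in-memory stand-in for illustration only; in the actual proposal the counts would live in ZooKeeper (e.g. as ephemeral child nodes), not in process memory:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class RefCountedHandles {
    // Hypothetical reference counter for checkpoint nodes: a node may only
    // be deleted once every process that holds it has released its lock.
    private final ConcurrentMap<String, Integer> refCounts = new ConcurrentHashMap<>();

    public void acquire(String node) {
        refCounts.merge(node, 1, Integer::sum);
    }

    /** Returns true if this release dropped the count to 0, i.e. the node may be deleted. */
    public boolean release(String node) {
        // compute() removes the mapping (returns null) when the count would hit 0.
        Integer remaining = refCounts.compute(node, (k, v) -> (v == null || v <= 1) ? null : v - 1);
        return remaining == null;
    }

    public static void main(String[] args) {
        RefCountedHandles handles = new RefCountedHandles();
        handles.acquire("checkpoint-17"); // old leader holds the node
        handles.acquire("checkpoint-17"); // new leader recovers the same node
        System.out.println(handles.release("checkpoint-17")); // prints false: still referenced
        System.out.println(handles.release("checkpoint-17")); // prints true: safe to delete
    }
}
```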





[jira] [Created] (FLINK-6611) When TaskManager or JobManager crash

2017-05-17 Thread Mauro Cortellazzi (JIRA)
Mauro Cortellazzi created FLINK-6611:


 Summary: When TaskManager or JobManager crash
 Key: FLINK-6611
 URL: https://issues.apache.org/jira/browse/FLINK-6611
 Project: Flink
  Issue Type: Bug
  Components: Startup Shell Scripts
Reporter: Mauro Cortellazzi
Priority: Trivial








[jira] [Created] (FLINK-6610) WebServer could not be created,when set the "jobmanager.web.submit.enable" to false

2017-05-17 Thread zhihao chen (JIRA)
zhihao chen created FLINK-6610:
--

 Summary: WebServer could not be created,when set the 
"jobmanager.web.submit.enable" to false
 Key: FLINK-6610
 URL: https://issues.apache.org/jira/browse/FLINK-6610
 Project: Flink
  Issue Type: Bug
  Components: Webfrontend
Affects Versions: 1.3.0
Reporter: zhihao chen
Assignee: zhihao chen


The WebServer cannot be created when "jobmanager.web.submit.enable" is set to 
false, because WebFrontendBootstrap requires uploadDir to be non-null:
this.uploadDir = Preconditions.checkNotNull(directory);
{code}
2017-05-17 15:15:46,938 ERROR org.apache.flink.runtime.webmonitor.WebMonitorUtils - WebServer could not be created
java.lang.NullPointerException
	at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:58)
	at org.apache.flink.runtime.webmonitor.utils.WebFrontendBootstrap.(WebFrontendBootstrap.java:73)
	at org.apache.flink.runtime.webmonitor.WebRuntimeMonitor.(WebRuntimeMonitor.java:359)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.flink.runtime.webmonitor.WebMonitorUtils.startWebRuntimeMonitor(WebMonitorUtils.java:135)
	at org.apache.flink.runtime.clusterframework.BootstrapTools.createWebMonitorIfConfigured(BootstrapTools.java:242)
	at org.apache.flink.yarn.YarnApplicationMasterRunner.runApplicationMaster(YarnApplicationMasterRunner.java:352)
	at org.apache.flink.yarn.YarnApplicationMasterRunner$1.call(YarnApplicationMasterRunner.java:195)
	at org.apache.flink.yarn.YarnApplicationMasterRunner$1.call(YarnApplicationMasterRunner.java:192)
	at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
	at org.apache.flink.yarn.YarnApplicationMasterRunner.run(YarnApplicationMasterRunner.java:192)
	at org.apache.flink.yarn.YarnApplicationMasterRunner.main(YarnApplicationMasterRunner.java:116)
{code}





[jira] [Created] (FLINK-6609) Wrong version assignment when multiple TAKEs transitions

2017-05-17 Thread Dawid Wysakowicz (JIRA)
Dawid Wysakowicz created FLINK-6609:
---

 Summary: Wrong version assignment when multiple TAKEs transitions
 Key: FLINK-6609
 URL: https://issues.apache.org/jira/browse/FLINK-6609
 Project: Flink
  Issue Type: Bug
  Components: CEP
Affects Versions: 1.3.0
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
Priority: Blocker
 Fix For: 1.3.0


This test fails due to wrong version assignment for TAKEs from the same state.

{code}
@Test
public void testMultipleTakesVersionCollision() {
List inputEvents = new ArrayList<>();

Event startEvent = new Event(40, "c", 1.0);
Event middleEvent1 = new Event(41, "a", 2.0);
Event middleEvent2 = new Event(41, "a", 3.0);
Event middleEvent3 = new Event(41, "a", 4.0);
Event middleEvent4 = new Event(41, "a", 5.0);
Event middleEvent5 = new Event(41, "a", 6.0);
Event end = new Event(44, "b", 5.0);

inputEvents.add(new StreamRecord<>(startEvent, 1));
inputEvents.add(new StreamRecord<>(middleEvent1, 3));
inputEvents.add(new StreamRecord<>(middleEvent2, 4));
inputEvents.add(new StreamRecord<>(middleEvent3, 5));
inputEvents.add(new StreamRecord<>(middleEvent4, 6));
inputEvents.add(new StreamRecord<>(middleEvent5, 7));
inputEvents.add(new StreamRecord<>(end, 10));

Pattern pattern = 
Pattern.begin("start").where(new SimpleCondition() {
private static final long serialVersionUID = 
5726188262756267490L;

@Override
public boolean filter(Event value) throws Exception {
return value.getName().equals("c");
}
}).followedBy("middle1").where(new SimpleCondition() {
private static final long serialVersionUID = 
5726188262756267490L;

@Override
public boolean filter(Event value) throws Exception {
return value.getName().equals("a");
}

}).oneOrMore().allowCombinations().followedBy("middle2").where(new 
SimpleCondition() {
private static final long serialVersionUID = 
5726188262756267490L;

@Override
public boolean filter(Event value) throws Exception {
return value.getName().equals("a");
}
}).oneOrMore().allowCombinations().followedBy("end").where(new 
SimpleCondition() {
private static final long serialVersionUID = 
5726188262756267490L;

@Override
public boolean filter(Event value) throws Exception {
return value.getName().equals("b");
}
});

NFA nfa = NFACompiler.compile(pattern, 
Event.createTypeSerializer(), false);

final List resultingPatterns = 
feedNFA(inputEvents, nfa);

compareMaps(resultingPatterns, Lists.newArrayList(

Lists.newArrayList(startEvent, middleEvent1, 
middleEvent2, middleEvent3, middleEvent4, middleEvent5, end),
Lists.newArrayList(startEvent, middleEvent1, 
middleEvent2, middleEvent3, middleEvent4, middleEvent5, end),
Lists.newArrayList(startEvent, middleEvent1, 
middleEvent2, middleEvent3, middleEvent4, middleEvent5, end),
Lists.newArrayList(startEvent, middleEvent1, 
middleEvent2, middleEvent3, middleEvent4, middleEvent5, end),

Lists.newArrayList(startEvent, middleEvent1, 
middleEvent2, middleEvent3, middleEvent4, end),
Lists.newArrayList(startEvent, middleEvent1, 
middleEvent2, middleEvent4, middleEvent5, end),
Lists.newArrayList(startEvent, middleEvent1, 
middleEvent2, middleEvent3, middleEvent4, end),
Lists.newArrayList(startEvent, middleEvent1, 
middleEvent2, middleEvent3, middleEvent5, end),
Lists.newArrayList(startEvent, middleEvent1, 
middleEvent3, middleEvent4, middleEvent5, end),
Lists.newArrayList(startEvent, middleEvent1, 
middleEvent3, middleEvent4, middleEvent5, end),
Lists.newArrayList(startEvent, middleEvent1, 
middleEvent2, middleEvent3, middleEvent4, end),
Lists.newArrayList(startEvent, middleEvent1, 
middleEvent2, middleEvent3, middleEvent5, end),
Lists.newArrayList(startEvent, 

[jira] [Created] (FLINK-6608) Relax Kerberos login contexts parsing by trimming whitespaces in contexts list

2017-05-17 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-6608:
--

 Summary: Relax Kerberos login contexts parsing by trimming 
whitespaces in contexts list
 Key: FLINK-6608
 URL: https://issues.apache.org/jira/browse/FLINK-6608
 Project: Flink
  Issue Type: Improvement
  Components: Configuration, Security
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai
Priority: Minor
 Fix For: 1.3.0, 1.4.0


The Kerberos login contexts list parsing right now isn't quite user-friendly.
The list must be provided as: {{security.kerberos.login.contexts: 
Client,KafkaClient}}, without any whitespace in between.

We can relax this to be more user-friendly by trimming any whitespaces in the 
list.

A user actually stumbled across this: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problems-with-Kerberos-Kafka-connection-in-version-1-2-0-td12580.html#a12589
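The relaxed parsing amounts to splitting on commas and trimming each entry, roughly as follows (hypothetical helper for illustration, not the actual Flink code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LoginContextsParser {
    // Splits the contexts list on commas, trims surrounding whitespace, and
    // drops empty entries, so "Client, KafkaClient" and "Client,KafkaClient"
    // both parse to the same list.
    static List<String> parseContexts(String value) {
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseContexts("Client, KafkaClient")); // prints [Client, KafkaClient]
    }
}
```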





Re: Flink issue

2017-05-17 Thread Till Rohrmann
Hi Francisco,

is there anything suspicious in the logs files of the Taskmanager and the
JobManager?

Cheers,
Till


On Tue, May 16, 2017 at 5:49 PM, Francisco Alves  wrote:

> Hi,
>
> I'm running Flink in a YARN session, and when I try to cancel a running
> program with the parameters "-yid" and "job id", the Flink job remains in
> the canceling state and its resources remain occupied. A similar issue is
> described at
> http://stackoverflow.com/questions/34227470/flink-cancel-command-does-not-
> cancel-a-running-task. Any idea why?
>
>
>
> Thanks,
>
> Francisco
>
>