contributor permission

2019-08-01 Thread sss
Hi,

I want to contribute to Apache Flink.
Would you please give me the contributor permission?
My JIRA ID is zhangyongchao

Flink issue

2019-08-01 Thread Karthick Thanigaimani
Hi Team,
We are facing frequent issues with the Flink job manager in one environment
when processing happens.

CHAIN Join(Remap EDGES id: TO) -> Map (Key Extractor) -> Combine (Deduplicate
edges including bi-directional edges) (57/80)
Timestamp: 2019-08-02, 4:13:25
Location: flink-taskmanager-c:6121

We have tried changing the EC2 instances to bigger sizes, increased the heap
size, etc., but the problem remains the same. Below is the error message that
we see. Could someone provide some guidance?

org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Lost connection to task manager 'flink-taskmanager-b/2:6121'. This indicates that the remote task manager was lost.
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:146)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
    at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
    at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
    at org.apache.flink.shaded.netty4.io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:835)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:87)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.flink.shaded.netty4.io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
    at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
    at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
    ... 6 more


Regards,
Karthick

Re: [DISCUSS] CPU flame graph for a job vertex in web UI.

2019-08-01 Thread Jark Wu
Hi David,

The demo looks charming! I think it will definitely help a lot with
performance tuning.
A big +1 for this.

I cc-ed Yadong who's one of the main contributors of the new Web UI.
Maybe he can give some help on the front end.

Regards,
Jark

On Fri, 2 Aug 2019 at 04:26, David Morávek  wrote:

> Hi Till, thanks for the feedback! These endpoints are only called when the
> vertex is selected in the UI, so there shouldn't be any heavy RPC load. For
> back-pressure, we only sample the top 3 calls of the stack (depth = 3). For the
> flame graph, we want to sample the whole stack trace, and we need a different
> sampling rate (longer period, more samples). Those are the main reasons to
> split these into two "trackers", but I may be missing something.
>
> I've prepared a little demo, so others can have a better idea of what I
> have in mind.
>
> https://youtu.be/GUNDehj9z9o
>
> Please note that this is a proof of concept and I'm not a frontend person, so
> it may look a little clumsy :)
>
> D.
>
> On Thu, Aug 1, 2019 at 11:40 AM Till Rohrmann 
> wrote:
>
> > Hi David,
> >
> > thanks for starting this discussion. I like the idea of improving
> insights
> > into Flink's execution and I believe that a flame graph could be helpful.
> >
> > I quickly glanced over your changes and I think they go in a good
> > direction. One idea could be to share the `StackTraceSample` produced by
> > the `StackTraceSampleCoordinator` between the different
> > `StackTraceOperatorTracker` so that we don't send multiple requests for
> the
> > same operators. That way we would decrease a bit the RPC load.
> >
> > Apart from that, I think the next steps would be to find a committer who
> > could shepherd this effort and help you with merging it.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jul 31, 2019 at 7:05 PM David Morávek  wrote:
> >
> > > Hello,
> > >
> > > While looking into Flink internals, I've noticed that there is already
> a
> > > mechanism for stack-trace sampling of a particular job vertex.
> > >
> > > I think it may be really useful to allow users to easily render a CPU
> > > flamegraph  in a new UI
> > for
> > > a
> > > selected vertex (new tab next to back pressure) of a running job. Back
> > > pressure tab already provides a good idea of which vertex causes
> trouble,
> > > but it's hard to say what's actually going on.
> > >
> > > I've tried to implement a basic REST endpoint
> > > <
> > >
> >
> https://github.com/dmvk/flink/commit/716231822d2fe99004895cdd0a365560479445b9
> > > >,
> > > that prepares data for the flame graph rendering and it seems to be
> > > providing good insight.
> > >
> > > It should be straightforward to render data from the endpoint in new UI
> > > using existing  javascript
> > > libraries.
> > >
> > > WDYT? Is this worth pushing forward?
> > >
> > > D.
> > >
> >
>


Re: [RESULT][VOTE] Migrate to sponsored Travis account

2019-08-01 Thread Jark Wu
Hi Chesnay,

Can we give Flink committers permissions on the flink-ci/flink repo?
Several times, when I pushed new commits, the old build jobs were still
pending and not canceled.
Until we fix that, we could manually cancel old jobs to save build
resources.

Best,
Jark


On Wed, 10 Jul 2019 at 16:17, Chesnay Schepler  wrote:

> Your best bet would be to check the first commit in the PR and check the
> parent commit.
>
> To re-run things, you will have to rebase the PR on the latest master.
>
> On 10/07/2019 03:32, Kurt Young wrote:
> > Thanks for all your efforts, Chesnay; it indeed improves our development
> > experience a lot. BTW, do you know how to find out which master branch
> > the CI runs against?
> >
> > For example, like this one:
> > https://travis-ci.com/flink-ci/flink/jobs/214542568
> > It shows a pass for the commits, which were rebased on master when the CI
> > was triggered. But the master branch the CI ran against could be either
> > the same as or different from the current master. If it's the same, I can
> > simply rely on the pass to push the commits, but if it's not, I think I
> > should find another way to re-trigger the tests based on the newest
> > master.
> >
> > Do you know where I can get such information?
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Jul 9, 2019 at 3:27 AM Chesnay Schepler 
> wrote:
> >
> >> The kinks have been worked out; the bot is running again and pr builds
> >> are yet again no longer running on ASF resources.
> >>
> >> PRs are mirrored to: https://github.com/flink-ci/flink
> >> Bot source: https://github.com/flink-ci/ci-bot
> >>
> >> On 08/07/2019 17:14, Chesnay Schepler wrote:
> >>> I have temporarily re-enabled running PR builds on the ASF account;
> >>> migrating to the Travis subscription caused some issues in the bot
> >>> that I have to fix first.
> >>>
> >>> On 07/07/2019 23:01, Chesnay Schepler wrote:
>  The vote has passed unanimously in favor of migrating to a separate
>  Travis account.
> 
>  I will now set things up such that PullRequests are no longer run on
>  the ASF servers.
>  This is a major step in reducing our usage of ASF resources.
>  For the time being we'll use the free Travis plan for flink-ci (i.e. 5
>  workers, which is the same as what the ASF gives us). Over the course of
>  the next week we'll set up the Ververica subscription to increase this
>  limit.
> 
>  From now on, a bot will mirror all new and updated PullRequests to a
>  mirror repository (https://github.com/flink-ci/flink-ci) and write an
>  update into the PR once the build is complete.
>  I have run the bots for the past 3 days in parallel to our existing
>  Travis and they worked without major issues.
> 
>  The biggest change that contributors will see is that there's no
>  longer an icon next to each commit. We may revisit this in the future.
> 
>  I'll setup a repo with the source of the bot later.
> 
>  On 04/07/2019 10:46, Chesnay Schepler wrote:
> > I've raised a JIRA
> > with INFRA to
> > inquire whether it would be possible to switch to a different Travis
> > account, and if so what steps would need to be taken.
> > We need a proper confirmation from INFRA since we are not in full
> > control of the flink repository (for example, we cannot access the
> > settings page).
> >
> > If this is indeed possible, Ververica is willing to sponsor a Travis
> > account for the Flink project.
> > This would provide us with more than enough resources for our needs.
> >
> > Since this makes the project more reliant on resources provided by
> > external companies I would like to vote on this.
> >
> > Please vote on this proposal, as follows:
> > [ ] +1, Approve the migration to a Ververica-sponsored Travis
> > account, provided that INFRA approves
> > [ ] -1, Do not approve the migration to a Ververica-sponsored
> > Travis account
> >
> > The vote will be open for at least 24h, and until we have
> > confirmation from INFRA. The voting period may be shorter than the
> > usual 3 days since our current setup is effectively not working.
> >
> > On 04/07/2019 06:51, Bowen Li wrote:
> >> Re: > Are they using their own Travis CI pool, or did they switch to
> >> an entirely different CI service?
> >>
> >> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> >> currently moving away from ASF's Travis to their own in-house metal
> >> machines at [1] with custom CI application at [2]. They've seen
> >> significant improvements w.r.t. both much higher performance and
> >> basically no resource waiting time, a "night-and-day" difference,
> >> quoting Wes.
> >>
> >> Re: > If we can just switch to our own Travis pool, just for our
> >> project, then this might be something we can 

[Question] What is the difference between Embedded and SingleLeaderElectionService?

2019-08-01 Thread Zili Chen
Hi devs,

I found that these two classes are quite similar, except that
SingleLeaderElectionService has a pre-configured leader id.

However, I don't see any usage of that leader id. Also,
a random UUID would work as a DEFAULT_LEADER_ID(0).
I wonder whether we could replace SingleLeaderElectionService
with EmbeddedLeaderElectionService, or merge the two.

Best,
tison.


Re: instable checkpointing after migration to flink 1.8

2019-08-01 Thread Congxian Qiu
cc Bekir

Best,
Congxian


Congxian Qiu  于2019年8月2日周五 下午12:23写道:

> Hi Bekir
> I'll first summarize the problem here (please correct me if I'm wrong):
> 1. The same program running on 1.6 never encountered such problems
> 2. Some checkpoints take too long to complete (15+ min), while other, normal
> checkpoints complete in less than 1 min
> 3. Some bad checkpoints have a large sync time; the async time seems ok
> 4. For some bad checkpoints, the e2e duration is much bigger than (sync_time
> + async_time)
> First, to answer the last question: the e2e duration is ack_time -
> trigger_time, so it is always bigger than (sync_time + async_time), but we
> have a big gap here, and this may be problematic.
> According to all the information, maybe the problem is that some task starts
> its checkpoint too late and the sync part of the checkpoint takes too long.
> Could you please share some more information, such as:
> - A screenshot of the summary for one bad checkpoint (we'll call it A here)
> - The detailed information of checkpoint A (including all the problematic
> subtasks)
> - The jobmanager.log, and the taskmanager.log for the problematic task and a
> healthy task
> - A screenshot of the subtasks for the problematic task (including the
> `Bytes received`, `Records received`, `Bytes sent`, `Records sent` columns);
> here we want to compare the problematic parallelism with a good parallelism,
> so please also share whether there is data skew among the parallel instances
> - Some jstacks of the problematic parallelism; here we want to check whether
> the task is too busy to handle the barrier (a flame graph or anything else
> is always welcome here)
>
> Best,
> Congxian
>
>
> Congxian Qiu  于2019年8月1日周四 下午8:26写道:
>
>> Hi Bekir
>>
>> I'll first comb through all the information here and try to find out the
>> reason with you; I may need you to share some more information :)
>>
>> Best,
>> Congxian
>>
>>
>> Bekir Oguz  于2019年8月1日周四 下午5:00写道:
>>
>>> Hi Fabian,
>>> Thanks for sharing this with us, but we’re already on version 1.8.1.
>>>
>>> What I don't understand is which mechanism in Flink occasionally adds 15
>>> minutes to the checkpoint duration. Can you maybe give us some hints on
>>> where to look? Is there a default timeout of 15 minutes defined
>>> somewhere in Flink? I couldn't find one.
>>>
>>> In our pipeline, most of the checkpoints complete in less than a minute,
>>> and some of them complete in 15 minutes plus (less than a minute).
>>> There's definitely something which adds 15 minutes. This is happening in
>>> one or more subtasks during checkpointing.
>>>
>>> Please see the screenshot below:
>>>
>>> Regards,
>>> Bekir
>>>
>>>
>>>
>>> Op 23 jul. 2019, om 16:37 heeft Fabian Hueske  het
>>> volgende geschreven:
>>>
>>> Hi Bekir,
>>>
>>> Another user reported checkpointing issues with Flink 1.8.0 [1].
>>> These seem to be resolved with Flink 1.8.1.
>>>
>>> Hope this helps,
>>> Fabian
>>>
>>> [1]
>>>
>>> https://lists.apache.org/thread.html/991fe3b09fd6a052ff52e5f7d9cdd9418545e68b02e23493097d9bc4@%3Cuser.flink.apache.org%3E
>>>
>>> Am Mi., 17. Juli 2019 um 09:16 Uhr schrieb Congxian Qiu <
>>> qcx978132...@gmail.com>:
>>>
>>> Hi Bekir
>>>
>>> First of all, I think there is something wrong: the state size is almost
>>> the same, but the durations differ so much.
>>>
>>> A checkpoint with the RocksDB state backend dumps sst files, then copies
>>> the needed sst files (if you enable incremental checkpoints, the sst files
>>> already on remote storage will not be uploaded again), then completes the
>>> checkpoint. Can you check
>>> the network bandwidth usage during the checkpoint?
>>>
>>> Best,
>>> Congxian
>>>
>>>
>>> Bekir Oguz  于2019年7月16日周二 下午10:45写道:
>>>
>>> Hi all,
>>> We have a flink job with user state, checkpointing to RocksDBBackend
>>> which is externally stored in AWS S3.
>>> After we migrated our cluster from 1.6 to 1.8, we occasionally see
>>> that some slots do not acknowledge the checkpoints quickly enough. As an
>>> example: all slots acknowledge within 30-50 seconds, except one slot,
>>> which acknowledges in 15 mins. Checkpoint sizes are similar to each other,
>>> like 200-400 MB.
>>>
>>> We did not experience this weird behaviour in Flink 1.6. We have 5 min
>>> checkpoint interval and this happens sometimes once in an hour sometimes
>>> more but not in all the checkpoint requests. Please see the screenshot
>>> below.
>>>
>>> Another point: for the faulty slots, the duration is consistently 15
>>> minutes and some seconds; we couldn't find out where this 15-minute
>>> response time comes from. And each time it is a different task manager,
>>> not always the same one.
>>>
>>> Are you aware of any other users having similar issues with the new
>>> version, and is there a suggested bug fix or solution?
>>>
>>>
>>>
>>>
>>> Thanks in advance,
>>> Bekir Oguz
>>>
>>>
>>>
>>>


Re: instable checkpointing after migration to flink 1.8

2019-08-01 Thread Congxian Qiu
Hi Bekir
I'll first summarize the problem here (please correct me if I'm wrong):
1. The same program running on 1.6 never encountered such problems
2. Some checkpoints take too long to complete (15+ min), while other, normal
checkpoints complete in less than 1 min
3. Some bad checkpoints have a large sync time; the async time seems ok
4. For some bad checkpoints, the e2e duration is much bigger than (sync_time +
async_time)
First, to answer the last question: the e2e duration is ack_time -
trigger_time, so it is always bigger than (sync_time + async_time), but we
have a big gap here, and this may be problematic.
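
Spelled out as plain arithmetic (a rough decomposition, assuming the gap is
mostly the time before the barrier reaches the task):

    e2e_duration = ack_time - trigger_time
                 ≈ barrier_delay + sync_time + async_time
    gap          = e2e_duration - (sync_time + async_time) ≈ barrier_delay
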
According to all the information, maybe the problem is that some task starts
its checkpoint too late and the sync part of the checkpoint takes too long.
Could you please share some more information, such as:
- A screenshot of the summary for one bad checkpoint (we'll call it A here)
- The detailed information of checkpoint A (including all the problematic
subtasks)
- The jobmanager.log, and the taskmanager.log for the problematic task and a
healthy task
- A screenshot of the subtasks for the problematic task (including the
`Bytes received`, `Records received`, `Bytes sent`, `Records sent` columns);
here we want to compare the problematic parallelism with a good parallelism,
so please also share whether there is data skew among the parallel instances
- Some jstacks of the problematic parallelism; here we want to check whether
the task is too busy to handle the barrier (a flame graph or anything else
is always welcome here)

Best,
Congxian


Congxian Qiu  于2019年8月1日周四 下午8:26写道:

> Hi Bekir
>
> I'll first comb through all the information here and try to find out the
> reason with you; I may need you to share some more information :)
>
> Best,
> Congxian
>
>
> Bekir Oguz  于2019年8月1日周四 下午5:00写道:
>
>> Hi Fabian,
>> Thanks for sharing this with us, but we’re already on version 1.8.1.
>>
>> What I don't understand is which mechanism in Flink occasionally adds 15
>> minutes to the checkpoint duration. Can you maybe give us some hints on
>> where to look? Is there a default timeout of 15 minutes defined
>> somewhere in Flink? I couldn't find one.
>>
>> In our pipeline, most of the checkpoints complete in less than a minute,
>> and some of them complete in 15 minutes plus (less than a minute).
>> There's definitely something which adds 15 minutes. This is happening in
>> one or more subtasks during checkpointing.
>>
>> Please see the screenshot below:
>>
>> Regards,
>> Bekir
>>
>>
>>
>> Op 23 jul. 2019, om 16:37 heeft Fabian Hueske  het
>> volgende geschreven:
>>
>> Hi Bekir,
>>
>> Another user reported checkpointing issues with Flink 1.8.0 [1].
>> These seem to be resolved with Flink 1.8.1.
>>
>> Hope this helps,
>> Fabian
>>
>> [1]
>>
>> https://lists.apache.org/thread.html/991fe3b09fd6a052ff52e5f7d9cdd9418545e68b02e23493097d9bc4@%3Cuser.flink.apache.org%3E
>>
>> Am Mi., 17. Juli 2019 um 09:16 Uhr schrieb Congxian Qiu <
>> qcx978132...@gmail.com>:
>>
>> Hi Bekir
>>
>> First of all, I think there is something wrong: the state size is almost
>> the same, but the durations differ so much.
>>
>> A checkpoint with the RocksDB state backend dumps sst files, then copies
>> the needed sst files (if you enable incremental checkpoints, the sst files
>> already on remote storage will not be uploaded again), then completes the
>> checkpoint. Can you check
>> the network bandwidth usage during the checkpoint?
>>
>> Best,
>> Congxian
>>
>>
>> Bekir Oguz  于2019年7月16日周二 下午10:45写道:
>>
>> Hi all,
>> We have a flink job with user state, checkpointing to RocksDBBackend
>> which is externally stored in AWS S3.
>> After we migrated our cluster from 1.6 to 1.8, we occasionally see
>> that some slots do not acknowledge the checkpoints quickly enough. As an
>> example: all slots acknowledge within 30-50 seconds, except one slot,
>> which acknowledges in 15 mins. Checkpoint sizes are similar to each other,
>> like 200-400 MB.
>>
>> We did not experience this weird behaviour in Flink 1.6. We have 5 min
>> checkpoint interval and this happens sometimes once in an hour sometimes
>> more but not in all the checkpoint requests. Please see the screenshot
>> below.
>>
>> Another point: for the faulty slots, the duration is consistently 15
>> minutes and some seconds; we couldn't find out where this 15-minute
>> response time comes from. And each time it is a different task manager,
>> not always the same one.
>>
>> Are you aware of any other users having similar issues with the new
>> version, and is there a suggested bug fix or solution?
>>
>>
>>
>>
>> Thanks in advance,
>> Bekir Oguz
>>
>>
>>
>>


[jira] [Created] (FLINK-13546) Run TPC-H E2E test on travis cron job

2019-08-01 Thread Jark Wu (JIRA)
Jark Wu created FLINK-13546:
---

 Summary: Run TPC-H E2E test on travis cron job 
 Key: FLINK-13546
 URL: https://issues.apache.org/jira/browse/FLINK-13546
 Project: Flink
  Issue Type: Task
  Components: Travis
Reporter: Jark Wu
Assignee: Caizhi Weng
 Fix For: 1.9.0, 1.10.0


FLINK-13436 added a TPC-H e2e test but didn't include it in Travis. We should
add it to the Travis cron job. One option is {{split_misc.sh}}; another is
{{split_heavy.sh}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13545) JoinToMultiJoinRule should not match SEMI/ANTI LogicalJoin

2019-08-01 Thread godfrey he (JIRA)
godfrey he created FLINK-13545:
--

 Summary: JoinToMultiJoinRule should not match SEMI/ANTI LogicalJoin
 Key: FLINK-13545
 URL: https://issues.apache.org/jira/browse/FLINK-13545
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.9.0, 1.10.0
Reporter: godfrey he
 Fix For: 1.9.0, 1.10.0


Running TPC-DS query 14a on the Blink planner throws an exception:

java.lang.ArrayIndexOutOfBoundsException: 84
    at org.apache.calcite.rel.rules.JoinToMultiJoinRule$InputReferenceCounter.visitInputRef(JoinToMultiJoinRule.java:564)
    at org.apache.calcite.rel.rules.JoinToMultiJoinRule$InputReferenceCounter.visitInputRef(JoinToMultiJoinRule.java:555)
    at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112)
    at org.apache.calcite.rex.RexVisitorImpl.visitCall(RexVisitorImpl.java:80)
    at org.apache.calcite.rex.RexCall.accept(RexCall.java:191)
    at org.apache.calcite.rel.rules.JoinToMultiJoinRule.addOnJoinFieldRefCounts(JoinToMultiJoinRule.java:481)
    at org.apache.calcite.rel.rules.JoinToMultiJoinRule.onMatch(JoinToMultiJoinRule.java:166)
    at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319)
    at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560)
    at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419)
    at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:284)
    at org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
    at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215)
    at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202)


The reason is that {{JoinToMultiJoinRule}} should not match SEMI/ANTI
LogicalJoin. Before Calcite 1.20, a SEMI join was represented by {{SemiJoin}},
which is not matched by {{JoinToMultiJoinRule}}.
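
A sketch of the kind of guard the fix implies, using Calcite 1.20 types (this
is an illustration, not the actual patch):

import org.apache.calcite.rel.core.Join;
import org.apache.calcite.rel.core.JoinRelType;

final class RuleGuards {

    // Skip SEMI/ANTI joins so a rule like JoinToMultiJoinRule does not fire
    // on join types it cannot handle.
    static boolean isPlainJoin(Join join) {
        JoinRelType type = join.getJoinType();
        return type != JoinRelType.SEMI && type != JoinRelType.ANTI;
    }
}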



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS][CODE STYLE] Usage of Java Optional

2019-08-01 Thread qi luo
Agreed that using Optional will improve code robustness. However, we're
hesitant to use Optional in data-intensive operations.

For example, SingleInputGate already creates an Optional for every
BufferOrEvent in getNextBufferOrEvent(). How much performance would we gain
if it were replaced by a null check? (A small illustration of the trade-off
follows below.)
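
A tiny illustration of the two styles under discussion (hypothetical code,
not the actual SingleInputGate; @Nullable is the JSR-305 annotation, and the
point is only the per-call allocation):

import java.util.Optional;
import javax.annotation.Nullable;

class HotPathExample {

    static final class BufferOrEvent {} // stand-in for the real Flink class

    // Style A: self-documenting, but allocates one Optional wrapper per
    // call, i.e. per record on a hot path.
    Optional<BufferOrEvent> getNextBufferOrEvent() {
        return Optional.ofNullable(pollInternal());
    }

    // Style B: no allocation; callers must remember the null check.
    @Nullable
    BufferOrEvent pollNextBufferOrEvent() {
        return pollInternal();
    }

    @Nullable
    private BufferOrEvent pollInternal() {
        return null; // stand-in for the real dequeue logic
    }
}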

Regards,
Qi

> On Aug 1, 2019, at 11:00 PM, Andrey Zagrebin  wrote:
> 
> Hi all,
> 
> This is the next follow-up discussion about suggestions for the recent
> thread about code style guide in Flink [1].
> 
> In general, one could argue that any variable, which is nullable, can be
> replaced by wrapping it with Optional to explicitly show that it can be
> null. Examples are:
> 
>   - returned values to force user to check not null
>   - optional function arguments, e.g. with implicit default values
>   - even class fields as e.g. optional config options with implicit
>   default values
> 
> 
> At the same time, we also have @Nullable annotation to express this
> intention.
> 
> Also, when the class Optional was introduced, Oracle posted a guideline
> about its usage [2]. Basically, it suggests using it mostly in APIs for
> returned values, to inform and force users to check the returned value
> instead of returning null, and to avoid NullPointerException.
> 
> Wrapping with Optional also comes with the performance overhead.
> 
> Following the Oracle's guide in general, the suggestion is:
> 
>   - Avoid using Optional in any performance critical code
>   - Use Optional only to return nullable values in the API/public methods
>   unless it is performance critical then rather use @Nullable
>   - Passing an Optional argument to a method can be allowed if it is
>   within a private helper method and simplifies the code, example is in [3]
>   - Optional should not be used for class fields
> 
> 
> Please, feel free to share your thoughts.
> 
> Best,
> Andrey
> 
> [1]
> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
> [2]
> https://www.oracle.com/technetwork/articles/java/java8-optional-2175753.html
> [3]
> https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95



[jira] [Created] (FLINK-13544) Set parallelism of table sink operator to input transformation parallelism

2019-08-01 Thread Jark Wu (JIRA)
Jark Wu created FLINK-13544:
---

 Summary: Set parallelism of table sink operator to input 
transformation parallelism
 Key: FLINK-13544
 URL: https://issues.apache.org/jira/browse/FLINK-13544
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Common, Table SQL / Planner
Reporter: Jark Wu


Currently, a lot of {{TableSink}} connectors use {{dataStream.addSink()}}
without calling {{setParallelism}} explicitly. This uses the default
parallelism of the environment. However, the parallelism of the input
transformation might not be env.parallelism; for example, a global
aggregation has parallelism 1. In this case, it can lead to data reordering
and produce incorrect results. (A small sketch of the idea follows below.)
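
A minimal sketch of the idea using DataStream API methods from this Flink
version; the helper class itself is hypothetical, not actual planner code:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSink;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

final class SinkParallelismUtil {

    // Attaches the sink with the input transformation's parallelism instead
    // of the environment default.
    static <T> DataStreamSink<T> addSinkWithInputParallelism(
            DataStream<T> input,
            SinkFunction<T> sinkFunction) {
        DataStreamSink<T> sink = input.addSink(sinkFunction);
        // If the input is e.g. a global aggregation with parallelism 1, the
        // sink also runs with parallelism 1, so records are not redistributed
        // (and possibly reordered) between the aggregation and the sink.
        return sink.setParallelism(input.getParallelism());
    }
}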



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13543) Enable reuse forks for integration tests in blink planner

2019-08-01 Thread Jark Wu (JIRA)
Jark Wu created FLINK-13543:
---

 Summary: Enable reuse forks for integration tests in blink planner
 Key: FLINK-13543
 URL: https://issues.apache.org/jira/browse/FLINK-13543
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.9.0, 1.10.0


As discussed in https://github.com/apache/flink/pull/9180, we found that
enabling fork reuse saves ~20 min (50 min -> 30 min) for the Blink planner
tests.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: Jira issue assignment privs

2019-08-01 Thread Zili Chen
Hi Mans,

There has been an announcement by the community [1] which
applies a more restrictive JIRA workflow. Briefly, it now only
allows committers and PMC members to assign somebody to
a JIRA ticket. This is what you ran into.

You can send a request on the dev list to be assigned to the
ticket, and hopefully a committer will do it for you.

Best,
tison.

[1]
https://lists.apache.org/thread.html/4ed570c7110b7b55b5c3bd52bb61ff35d5bda88f47939d8e7f1844c4@%3Cdev.flink.apache.org%3E


M Singh  于2019年8月2日周五 上午6:29写道:

> Hi:
> I've created a JIRA issue
> https://issues.apache.org/jira/browse/FLINK-13542?filter=-2 and have
> submitted a pull request, but I am not able to assign it to myself.  I
> believe I had the privs earlier.  Can you please let me know how I can
> assign the issue to myself?
> Thanks
> Mans


Re: [DISCUSS] ARM support for Flink

2019-08-01 Thread Yikun Jiang
@Chesnay @Stephan Thanks for the suggestions and help; I opened a JIRA ticket
in [1].

For any other questions, feel free to ping us.

[1]  https://issues.apache.org/jira/browse/INFRA-18822

Regards,
Yikun

Jiang Yikun(Kero)
Mail: yikunk...@gmail.com


Stephan Ewen  于2019年8月1日周四 下午4:41写道:

> Asking INFRA to add support means filing a JIRA ticket.
>
> That works the same way as filing a FLINK Jira ticket, but selecting INFRA
> as the project to file the ticket for.
>
> On Thu, Aug 1, 2019 at 4:17 AM Xiyuan Wang 
> wrote:
>
> > Thanks for your reply.
> >
> > We are continuing to investigate and debug Flink on ARM. It's hard for
> > us to say how many kinds of tests are enough for ARM support at this
> > moment, but `core` and `test` are necessary, of course, I think. What we
> > do now follows travis-ci: we added all the modules that travis-ci
> > contains.
> >
> > During our local tests, only a few tests failed [1]. We have
> > solutions for some of them; others are still being debugged. The Flink
> > team's ideas are welcome. And many thanks for your jira issue [2]; we
> > will keep updating it.
> >
> > It'll be great if the Infra team could add the OpenLab app [3] (or
> > another CI if Flink so chooses) to the Flink repo. I'm not clear on how
> > to talk with the Infra team: should the Flink team start the discussion,
> > or should I send a mail to Infra? We need your help.
> >
> > Then, once the app is added, perhaps we can add `core` and `test` jobs
> > as the first step, make them run stably and successfully, and then add
> > more modules if needed.
> >
> > [1]: https://etherpad.net/p/flink_arm64_support
> > [2]: https://issues.apache.org/jira/browse/FLINK-13448
> > [3]: https://github.com/apps/theopenlab-ci
> >
> > Regards
> > wangxiyuan
> >
> > Stephan Ewen  于2019年7月31日周三 下午9:46写道:
> >
> > > Wow, that is pretty nice work, thanks a lot!
> > >
> > > We need some support from Apache Infra to see if we can connect the
> Flink
> > > Github Repo with the OpenLab CI.
> > > We would also need a discussion on the developer mailing list, to get
> > > community agreement.
> > >
> > > Have you looked at whether we need to run all tests with ARM, or
> whether
> > > maybe only the "core" and "tests" profile would be enough to get
> > confidence
> > > that Flink runs on ARM?
> > > Just asking because Flink has a lot of long running tests by now that
> can
> > > easily eat up a lot of CI capacity.
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > >
> > > On Tue, Jul 30, 2019 at 3:45 AM Xiyuan Wang 
> > > wrote:
> > >
> > > > Hi Stephan,
> > > >   Maybe I misled you in the previous email. We don't need to migrate
> > > > CI completely; travis-ci is still there, working for the x86 arch.
> > > > What we need to do is add another CI tool for the ARM arch.
> > > >
> > > >   There are some ways to do it. As I wrote on
> > > > https://issues.apache.org/jira/browse/FLINK-13199 to @Chesnay:
> > > >
> > > > 1. Add the OpenLab CI system for ARM arch testing. OpenLab is very
> > > > similar to travis-ci. What Flink needs to do is add the OpenLab
> > > > GitHub app to the repo, then add the job definition files inside the
> > > > Flink repo. Here is a POC by me:
> > > > https://github.com/theopenlab/flink/pull/1
> > > > 2. OpenLab will donate ARM resources to the Apache Infra team as
> > > > well. Then Flink can use the official Apache Jenkins system for Flink
> > > > ARM tests in the future. https://builds.apache.org/
> > > > 3. Use Drone CI, which supports the ARM arch as well.
> > > > https://drone.io/
> > > >
> > > > Since I'm from the OpenLab community, if Flink chooses OpenLab CI, my
> > > > OpenLab colleagues and I can keep helping and maintaining the ARM CI
> > > > job. If we choose the 2nd way, the CI maintenance work may be handled
> > > > by the apache-infra team, I guess. If we choose the 3rd option, Drone
> > > > CI, the help we can offer is very limited.
> > > > AFAIK, Drone uses containers for CI tests, which may not satisfy some
> > > > requirements, while OpenLab uses VMs for tests.
> > > >
> > > > Need Flink core team's decision and reply.
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > Stephan Ewen  于2019年7月29日周一 下午6:05写道:
> > > >
> > > > > I don't think it is feasible for Flink to migrate CI completely.
> > > > >
> > > > > Is there a way to add ARM tests on an external CI in addition?
> > > > > @Chesnay what do you think?
> > > > >
> > > > >
> > > > > On Fri, Jul 12, 2019 at 4:45 AM Xiyuan Wang <
> > wangxiyuan1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Stephan
> > > > > >   yeah, we should add an ARM CI first. But Travis CI doesn't
> > > > > > support the ARM arch itself; the OpenLab community supports it.
> > > > > > As I mentioned before, OpenLab is an open-source CI system like
> > > > > > travis-ci [1]; it uses the open-source CI project `zuul` [2] for
> > > > > > its deployment. Some open-source projects have integrated with it
> > > > > > already, for example the `containerd` project from CNCF

Re: [DISCUSS][CODE STYLE] Breaking long function argument lists and chained method calls

2019-08-01 Thread SHI Xiaogang
Hi Andrey,

Thanks for bringing this up. Personally, I prefer the following style, which
(1) puts the right parenthesis on the next line, and
(2) uses a new line for each exception if the exceptions cannot be put on the
same line.

That way, parentheses are aligned in a similar way to braces, and exceptions
can be well aligned.

public void func(
        int arg1,
        int arg2,
        ...
) throws E1, E2, E3 {
    ...
}

or

public void func(
        int arg1,
        int arg2,
        ...
) throws
        E1,
        E2,
        E3 {
    ...
}

Regards,
Xiaogang

Andrey Zagrebin  于2019年8月1日周四 下午11:19写道:

> Hi all,
>
> This is one more small suggestion for the recent thread about code style
> guide in Flink [1].
>
> We already have a note about using a new line for each chained call in
> Scala, e.g. either:
>
> values.stream().map(...).collect(...);
>
> or
>
> values
>     .stream()
>     .map(...)
>     .collect(...)
>
> if it would result in a too long line by keeping all chained calls in one
> line.
>
> The suggestion is to have it for Java as well and add the same rule for a
> long list of function arguments. So it is either:
>
> public void func(int arg1, int arg2, ...) throws E1, E2, E3 {
>     ...
> }
>
> or
>
> public void func(
>         int arg1,
>         int arg2,
>         ...) throws E1, E2, E3 {
>     ...
> }
>
> but thrown exceptions stay on the same last line.
>
> Please, feel free to share your thoughts.
>
> Best,
> Andrey
>
> [1]
>
> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
>


Jira issue assignment privs

2019-08-01 Thread M Singh
Hi:
I've created a JIRA issue 
https://issues.apache.org/jira/browse/FLINK-13542?filter=-2 and have submitted 
a pull request, but I am not able to assign it to myself.  I believe I had the 
privs earlier. Can you please let me know how I can assign the issue to myself?
Thanks
Mans

[jira] [Created] (FLINK-13542) Flink Datadog metrics reporter sends empty series if there is no metrics

2019-08-01 Thread Mans Singh (JIRA)
Mans Singh created FLINK-13542:
--

 Summary: Flink Datadog metrics reporter sends empty series if 
there is no metrics
 Key: FLINK-13542
 URL: https://issues.apache.org/jira/browse/FLINK-13542
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Affects Versions: 1.8.1
Reporter: Mans Singh
 Fix For: 1.9.0


If there are no metrics, the Datadog reporter still sends an empty series
array to Datadog. The reporter could check the size of the series and only
send the request if metrics have been collected.
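
A sketch of the proposed guard, with hypothetical method and type names (the
actual reporter classes in Flink may differ):

import java.util.List;
import java.util.function.Consumer;

final class SeriesGuard {

    // Hypothetical shape of the report step; the emptiness check is the
    // proposed change, everything else stands in for the real reporter.
    static <T> void report(List<T> series, Consumer<List<T>> send) {
        if (series.isEmpty()) {
            return; // nothing collected: skip the HTTP request entirely
        }
        send.accept(series);
    }
}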



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS] CPU flame graph for a job vertex in web UI.

2019-08-01 Thread David Morávek
Hi Till, thanks for the feedback! These endpoints are only called when the
vertex is selected in the UI, so there shouldn't be any heavy RPC load. For
back-pressure, we only sample the top 3 calls of the stack (depth = 3). For the
flame graph, we want to sample the whole stack trace, and we need a different
sampling rate (longer period, more samples). Those are the main reasons to
split these into two "trackers", but I may be missing something.
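
For reference, a minimal sketch (with assumed names, not Flink's actual
sampling API) of how the sampled stack traces can be folded into the
"semicolon-joined frames plus a sample count" format that common flame graph
renderers consume:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FoldedStacks {

    // Aggregates raw samples into counts keyed by the root-to-leaf stack.
    static Map<String, Integer> fold(List<StackTraceElement[]> samples) {
        Map<String, Integer> counts = new HashMap<>();
        for (StackTraceElement[] stack : samples) {
            StringBuilder key = new StringBuilder();
            // JVM stack traces are leaf-first; reverse them to root-first.
            for (int i = stack.length - 1; i >= 0; i--) {
                if (key.length() > 0) {
                    key.append(';');
                }
                key.append(stack[i].getClassName())
                   .append('.')
                   .append(stack[i].getMethodName());
            }
            counts.merge(key.toString(), 1, Integer::sum);
        }
        return counts;
    }
}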

I've prepared a little demo, so others can have a better idea of what I
have in mind.

https://youtu.be/GUNDehj9z9o

Please note that this is a proof of concept and I'm not a frontend person, so
it may look a little clumsy :)

D.

On Thu, Aug 1, 2019 at 11:40 AM Till Rohrmann  wrote:

> Hi David,
>
> thanks for starting this discussion. I like the idea of improving insights
> into Flink's execution and I believe that a flame graph could be helpful.
>
> I quickly glanced over your changes and I think they go in a good
> direction. One idea could be to share the `StackTraceSample` produced by
> the `StackTraceSampleCoordinator` between the different
> `StackTraceOperatorTracker` so that we don't send multiple requests for the
> same operators. That way we would decrease a bit the RPC load.
>
> Apart from that, I think the next steps would be to find a committer who
> could shepherd this effort and help you with merging it.
>
> Cheers,
> Till
>
> On Wed, Jul 31, 2019 at 7:05 PM David Morávek  wrote:
>
> > Hello,
> >
> > While looking into Flink internals, I've noticed that there is already a
> > mechanism for stack-trace sampling of a particular job vertex.
> >
> > I think it may be really useful to allow users to easily render a CPU
> > flamegraph  in a new UI
> for
> > a
> > selected vertex (new tab next to back pressure) of a running job. Back
> > pressure tab already provides a good idea of which vertex causes trouble,
> > but it's hard to say what's actually going on.
> >
> > I've tried to implement a basic REST endpoint
> > <
> >
> https://github.com/dmvk/flink/commit/716231822d2fe99004895cdd0a365560479445b9
> > >,
> > that prepares data for the flame graph rendering and it seems to be
> > providing good insight.
> >
> > It should be straightforward to render data from the endpoint in new UI
> > using existing  javascript
> > libraries.
> >
> > WDYT? Is this worth pushing forward?
> >
> > D.
> >
>


[jira] [Created] (FLINK-13541) State Processor Api sets the wrong key selector when writing savepoints

2019-08-01 Thread Seth Wiesman (JIRA)
Seth Wiesman created FLINK-13541:


 Summary: State Processor Api sets the wrong key selector when 
writing savepoints
 Key: FLINK-13541
 URL: https://issues.apache.org/jira/browse/FLINK-13541
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream, Runtime / State Backends
Reporter: Seth Wiesman
 Fix For: 1.9.0, 1.10.0


The State Processor API sets the wrong key selector on its StreamConfig when
writing savepoints. It uses two key selectors internally that happen to output
the same value for integer keys, but not in general.


{noformat}
Caused by: java.lang.RuntimeException: Exception occurred while setting the current key context.
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.setCurrentKey(AbstractStreamOperator.java:641)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.setKeyContextElement(AbstractStreamOperator.java:627)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.setKeyContextElement1(AbstractStreamOperator.java:615)
    at org.apache.flink.state.api.output.BoundedStreamTask.performDefaultAction(BoundedStreamTask.java:83)
    at org.apache.flink.streaming.runtime.tasks.mailbox.execution.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:140)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:378)
    at org.apache.flink.state.api.output.BoundedOneInputStreamTaskRunner.mapPartition(BoundedOneInputStreamTaskRunner.java:76)
    at org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
    at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
    at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:688)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:518)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
    at org.apache.flink.api.common.typeutils.base.StringSerializer.serialize(StringSerializer.java:33)
    at org.apache.flink.contrib.streaming.state.RocksDBSerializedCompositeKeyBuilder.serializeKeyGroupAndKey(RocksDBSerializedCompositeKeyBuilder.java:159)
    at org.apache.flink.contrib.streaming.state.RocksDBSerializedCompositeKeyBuilder.setKeyAndKeyGroup(RocksDBSerializedCompositeKeyBuilder.java:96)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.setCurrentKey(RocksDBKeyedStateBackend.java:303)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.setCurrentKey(AbstractStreamOperator.java:639)
    ... 12 more

{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS][CODE STYLE] Usage of Java Optional

2019-08-01 Thread Piotr Nowojski
Hi Andrey,

I think that's a good compromise that is easy to follow, so +1 from my side.

Piotrek

> On 1 Aug 2019, at 18:00, Andrey Zagrebin  wrote:
> 
> EDIT: for Optional in public API vs performance concerns
> 
> Hi all,
> 
> This is the next follow-up discussion about suggestions for the recent
> thread about code style guide in Flink [1].
> 
> In general, one could argue that any variable, which is nullable, can be
> replaced by wrapping it with Optional to explicitly show that it can be
> null. Examples are:
> 
>   - returned values to force user to check not null
>   - optional function arguments, e.g. with implicit default values
>   - even class fields as e.g. optional config options with implicit
>   default values
> 
> 
> At the same time, we also have @Nullable annotation to express this
> intention.
> 
> Also, when the class Optional was introduced, Oracle posted a guideline
> about its usage [2]. Basically, it suggests using it mostly in APIs for
> returned values, to inform and force users to check the returned value
> instead of returning null, and to avoid NullPointerException.
> 
> Wrapping with Optional also comes with the performance overhead.
> 
> Following the Oracle's guide in general, the suggestion is:
> 
>   - Always use Optional only to return nullable values in the API/public
>   methods
>  - Only if you can prove that Optional usage would lead to a
>  performance degradation in critical code then return nullable value
>  directly and annotate it with @Nullable
>   - Passing an Optional argument to a method can be allowed if it is
>   within a private helper method and simplifies the code, example is in [3]
>   - Optional should not be used for class fields
> 
> 
> Please, feel free to share your thoughts.
> 
> Best,
> Andrey
> 
> [1]
> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
> [2]
> https://www.oracle.com/technetwork/articles/java/java8-optional-2175753.html
> [3]
> https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95
> 
> On Thu, Aug 1, 2019 at 6:00 PM Andrey Zagrebin  wrote:
> 
>> Hi all,
>> 
>> This is the next follow-up discussion about suggestions for the recent
>> thread about code style guide in Flink [1].
>> 
>> In general, one could argue that any variable, which is nullable, can be
>> replaced by wrapping it with Optional to explicitly show that it can be
>> null. Examples are:
>> 
>>   - returned values to force user to check not null
>>   - optional function arguments, e.g. with implicit default values
>>   - even class fields as e.g. optional config options with implicit
>>   default values
>> 
>> 
>> At the same time, we also have @Nullable annotation to express this
>> intention.
>> 
>> Also, when the class Optional was introduced, Oracle posted a guideline
>> about its usage [2]. Basically, it suggests using it mostly in APIs for
>> returned values, to inform and force users to check the returned value
>> instead of returning null, and to avoid NullPointerException.
>> 
>> Wrapping with Optional also comes with the performance overhead.
>> 
>> Following the Oracle's guide in general, the suggestion is:
>> 
>>   - Avoid using Optional in any performance critical code
>>   - Use Optional only to return nullable values in the API/public
>>   methods unless it is performance critical then rather use @Nullable
>>   - Passing an Optional argument to a method can be allowed if it is
>>   within a private helper method and simplifies the code, example is in [3]
>>   - Optional should not be used for class fields
>> 
>> 
>> Please, feel free to share your thoughts.
>> 
>> Best,
>> Andrey
>> 
>> [1]
>> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
>> [2]
>> https://www.oracle.com/technetwork/articles/java/java8-optional-2175753.html
>> [3]
>> https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95
>> 



Re: [DISCUSS][CODE STYLE] Usage of Java Optional

2019-08-01 Thread Andrey Zagrebin
EDIT: for Optional in public API vs performance concerns

Hi all,

This is the next follow-up discussion about suggestions for the recent
thread about code style guide in Flink [1].

In general, one could argue that any variable, which is nullable, can be
replaced by wrapping it with Optional to explicitly show that it can be
null. Examples are:

   - returned values to force user to check not null
   - optional function arguments, e.g. with implicit default values
   - even class fields as e.g. optional config options with implicit
   default values


At the same time, we also have @Nullable annotation to express this
intention.

Also, when the class Optional was introduced, Oracle posted a guideline
about its usage [2]. Basically, it suggests using it mostly in APIs for
returned values, to inform and force users to check the returned value
instead of returning null, and to avoid NullPointerException.

Wrapping with Optional also comes with the performance overhead.

Following the Oracle's guide in general, the suggestion is:

   - Always use Optional only to return nullable values in API/public
   methods
      - Only if you can prove that Optional usage would lead to a
      performance degradation in critical code should you return the
      nullable value directly and annotate it with @Nullable
   - Passing an Optional argument to a method can be allowed if it is
   within a private helper method and simplifies the code; an example is in [3]
   - Optional should not be used for class fields (a short sketch of these
   rules follows below)
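
A minimal sketch of what these rules could look like in practice (the class
below is made up for illustration; it is not a Flink API):

import java.util.Optional;
import javax.annotation.Nullable;

public final class UserRegistry {

    // Rule: no Optional for class fields; use @Nullable instead.
    @Nullable
    private String defaultRegion;

    // Rule: public/API methods return Optional to force callers to handle
    // the missing case explicitly.
    public Optional<String> findRegion(String userId) {
        return Optional.ofNullable(lookupRegion(userId));
    }

    // Rule: in provably performance-critical code, return the nullable value
    // directly and annotate it with @Nullable.
    @Nullable
    private String lookupRegion(String userId) {
        return userId.isEmpty() ? null : defaultRegion;
    }
}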


Please, feel free to share your thoughts.

Best,
Andrey

[1]
http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
[2]
https://www.oracle.com/technetwork/articles/java/java8-optional-2175753.html
[3]
https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95

On Thu, Aug 1, 2019 at 6:00 PM Andrey Zagrebin  wrote:

> Hi all,
>
> This is the next follow-up discussion about suggestions for the recent
> thread about code style guide in Flink [1].
>
> In general, one could argue that any variable, which is nullable, can be
> replaced by wrapping it with Optional to explicitly show that it can be
> null. Examples are:
>
>- returned values to force user to check not null
>- optional function arguments, e.g. with implicit default values
>- even class fields as e.g. optional config options with implicit
>default values
>
>
> At the same time, we also have @Nullable annotation to express this
> intention.
>
> Also, when the class Optional was introduced, Oracle posted a guideline
> about its usage [2]. Basically, it suggests using it mostly in APIs for
> returned values, to inform and force users to check the returned value
> instead of returning null, and to avoid NullPointerException.
>
> Wrapping with Optional also comes with the performance overhead.
>
> Following the Oracle's guide in general, the suggestion is:
>
>- Avoid using Optional in any performance critical code
>- Use Optional only to return nullable values in the API/public
>methods unless it is performance critical then rather use @Nullable
>- Passing an Optional argument to a method can be allowed if it is
>within a private helper method and simplifies the code, example is in [3]
>- Optional should not be used for class fields
>
>
> Please, feel free to share your thoughts.
>
> Best,
> Andrey
>
> [1]
> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
> [2]
> https://www.oracle.com/technetwork/articles/java/java8-optional-2175753.html
> [3]
> https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95
>


[jira] [Created] (FLINK-13540) DDL do not support key of properties contains number or "-"

2019-08-01 Thread Jingsong Lee (JIRA)
Jingsong Lee created FLINK-13540:


 Summary: DDL do not support key of properties contains number or 
"-"
 Key: FLINK-13540
 URL: https://issues.apache.org/jira/browse/FLINK-13540
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Jingsong Lee
 Fix For: 1.9.0, 1.10.0


But many connectors have property keys that contain numbers or "-", like
Kafka.

So as long as we don't solve this problem, it's hard for users to use these
connectors.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13539) DDL do not support CSV tableFactory because CSV require format.fields

2019-08-01 Thread Jingsong Lee (JIRA)
Jingsong Lee created FLINK-13539:


 Summary: DDL do not support CSV tableFactory because CSV require 
format.fields
 Key: FLINK-13539
 URL: https://issues.apache.org/jira/browse/FLINK-13539
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Jingsong Lee
 Fix For: 1.9.0, 1.10.0


(Currently, DDL does not support property keys that contain numbers or "-".)

And the old CSV validator requires "format.fields.#.type", so there is a
validation exception now.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread Hequn Cheng
+1 (non-binding)

Thanks a lot for driving this! @jincheng sun 

Best, Hequn

On Thu, Aug 1, 2019 at 11:00 PM Biao Liu  wrote:

> Thanks Jincheng for working on this.
>
> +1 (non-binding)
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Thu, Aug 1, 2019 at 8:55 PM Jark Wu  wrote:
>
> > +1 (non-binding)
> >
> > Cheers,
> > Jark
> >
> > On Thu, 1 Aug 2019 at 17:45, Yu Li  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Thanks for driving this!
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Thu, 1 Aug 2019 at 11:41, Till Rohrmann 
> wrote:
> > >
> > > > +1
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Aug 1, 2019 at 10:39 AM vino yang 
> > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Jeff Zhang  于2019年8月1日周四 下午4:33写道:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Stephan Ewen  于2019年8月1日周四 下午4:29写道:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > On Thu, Aug 1, 2019 at 9:52 AM Dian Fu 
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Jincheng,
> > > > > > > >
> > > > > > > > Thanks a lot for driving this.
> > > > > > > > +1 (non-binding).
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Dian
> > > > > > > >
> > > > > > > > > 在 2019年8月1日,下午3:24,jincheng sun 
> > 写道:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > Publishing PyFlink to PyPI is very important for our users.
> > > > > > > > > Please vote on the following proposal:
> > > > > > > > >
> > > > > > > > > 1. Create a PyPI project for the Apache Flink Python API,
> > > > > > > > > named "apache-flink".
> > > > > > > > > 2. Release one binary with the default Scala version, the
> > > > > > > > > same as the Flink default config.
> > > > > > > > > 3. Create an account named "pyflink" as owner (only the PMC
> > > > > > > > > can manage it). The PMC can add accounts for the Release
> > > > > > > > > Manager, but the Release Manager cannot delete releases.
> > > > > > > > >
> > > > > > > > > [ ] +1, Approve the proposal.
> > > > > > > > > [ ] -1, Disapprove the proposal, because ...
> > > > > > > > >
> > > > > > > > > The vote will be open for at least 72 hours. It is adopted
> > > > > > > > > by a simple majority with a minimum of three positive votes.
> > > > > > > > >
> > > > > > > > > See discussion threads for more details [1].
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jincheng
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Jeff Zhang
> > > > > >
> > > > >
> > > >
> > >
> >
>


[DISCUSS][CODE STYLE] Breaking long function argument lists and chained method calls

2019-08-01 Thread Andrey Zagrebin
Hi all,

This is one more small suggestion for the recent thread about code style
guide in Flink [1].

We already have a note about using a new line for each chained call in
Scala, e.g. either:

values.stream().map(...).collect(...);

or

values
    .stream()
    .map(...)
    .collect(...)

if it would result in a too long line by keeping all chained calls in one
line.

The suggestion is to have it for Java as well and add the same rule for a
long list of function arguments. So it is either:

public void func(int arg1, int arg2, ...) throws E1, E2, E3 {
    ...
}

or

public void func(
        int arg1,
        int arg2,
        ...) throws E1, E2, E3 {
    ...
}

but thrown exceptions stay on the same last line.

Please, feel free to share your thoughts.

Best,
Andrey

[1]
http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E


Re: [DISCUSS][CODE STYLE] Create collections always with initial capacity

2019-08-01 Thread Piotr Nowojski
Hi,

> - a bit more code, increases maintenance burden.

I think there is even more to that. It's almost like code duplication, albeit
expressed in a very different way, with all of the drawbacks of duplicated
code: the initial capacity can drift out of sync, causing confusion. Also,
it's not just "a bit more code"; it can take non-trivial
reasoning/calculation to set the initial value. Whenever we change or
refactor the code, the "maintenance burden" will mostly come from that.

Also, I think this usually just falls under the premature-optimisation rule.

Besides:

> The conclusion is the following at the moment:
> Only set the initial capacity if you have a good idea about the expected size.

I would add a clause to set the initial capacity “only for good, proven 
reasons”. It’s not about whether we can set it, but whether it makes sense to 
do so (to avoid the aforementioned “maintenance burden”).

Piotrek

> On 1 Aug 2019, at 14:41, Xintong Song  wrote:
> 
> +1 on setting the initial capacity only when we have a good expectation of
> the collection size.
> 
> Thank you~
> 
> Xintong Song
> 
> 
> 
> On Thu, Aug 1, 2019 at 2:32 PM Andrey Zagrebin  wrote:
> 
>> Hi all,
>> 
>> As you probably already noticed, Stephan has triggered a discussion thread
>> about code style guide for Flink [1]. Recently we were discussing
>> internally some smaller concerns and I would like to start separate threads
>> for them.
>> 
>> This thread is about creating collections always with initial capacity. As
>> you might have seen, some parts of our code base always initialise
>> collections with some non-default capacity. You can even activate a check
>> in IntelliJ Idea that can monitor and highlight creation of collection
>> without initial capacity.
>> 
>> Pros:
>> - performance gain if there is a good reasoning about initial capacity
>> - the capacity is always deterministic and does not depend on any changes
>> of its default value in Java
>> - easy to follow: always initialise, has IDE support for detection
>> 
>> Cons (for initialising w/o good reasoning):
>> - We are trying to outsmart JVM. When there is no good reasoning about
>> initial capacity, we can rely on JVM default value.
>> - It is even confusing e.g. for hash maps as the real size depends on the
>> load factor.
>> - It would only add minor performance gain.
>> - a bit more code, increases maintenance burden.
>> 
>> The conclusion is the following at the moment:
>> Only set the initial capacity if you have a good idea about the expected
>> size.
>> 
>> Please, feel free to share your thoughts.
>> 
>> Best,
>> Andrey
>> 
>> [1]
>> 
>> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
>> 



[DISCUSS][CODE STYLE] Usage of Java Optional

2019-08-01 Thread Andrey Zagrebin
Hi all,

This is the next follow-up discussion with suggestions for the recent
thread about the code style guide in Flink [1].

In general, one could argue that any nullable variable can be
wrapped with Optional to explicitly show that it can be
null. Examples are:

   - returned values, to force the user to check for null
   - optional function arguments, e.g. with implicit default values
   - even class fields as e.g. optional config options with implicit
   default values


At the same time, we also have @Nullable annotation to express this
intention.

Also, when the class Optional was introduced, Oracle posted a guideline
about its usage [2]. Basically, it suggests using it mostly in APIs for
returned values, to inform and force users to check the returned value
instead of returning null, and to avoid NullPointerExceptions.

Wrapping with Optional also comes with a performance overhead.

Following Oracle's guide in general, the suggestion is as follows (a sketch
illustrating the rules comes after the list):

   - Avoid using Optional in any performance-critical code
   - Use Optional only to return nullable values in the API/public methods,
   unless it is performance critical, in which case rather use @Nullable
   - Passing an Optional argument to a method can be allowed if it is
   within a private helper method and simplifies the code; an example is in [3]
   - Optional should not be used for class fields

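A minimal sketch of these rules (the class and field names are invented for
the example, and it assumes the JSR-305 @Nullable annotation is available on
the classpath):

```
import java.util.Optional;
import javax.annotation.Nullable;

class EndpointConfig {
    // Class field: a plain nullable reference, not an Optional.
    @Nullable
    private String endpoint;

    // Public API: return Optional to force callers to handle absence.
    public Optional<String> getEndpoint() {
        return Optional.ofNullable(endpoint);
    }

    // Performance-critical/internal code: @Nullable instead of Optional.
    @Nullable
    String getEndpointOrNull() {
        return endpoint;
    }
}
```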

Please, feel free to share your thoughts.

Best,
Andrey

[1]
http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
[2]
https://www.oracle.com/technetwork/articles/java/java8-optional-2175753.html
[3]
https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95


Re: [VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread Biao Liu
Thanks Jincheng for working on this.

+1 (non-binding)

Thanks,
Biao /'bɪ.aʊ/



On Thu, Aug 1, 2019 at 8:55 PM Jark Wu  wrote:

> +1 (non-binding)
>
> Cheers,
> Jark
>
> On Thu, 1 Aug 2019 at 17:45, Yu Li  wrote:
>
> > +1 (non-binding)
> >
> > Thanks for driving this!
> >
> > Best Regards,
> > Yu
> >
> >
> > On Thu, 1 Aug 2019 at 11:41, Till Rohrmann  wrote:
> >
> > > +1
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Aug 1, 2019 at 10:39 AM vino yang 
> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > > Jeff Zhang  wrote on Thu, Aug 1, 2019 at 4:33 PM:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > > Stephan Ewen  wrote on Thu, Aug 1, 2019 at 4:29 PM:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > On Thu, Aug 1, 2019 at 9:52 AM Dian Fu 
> > > wrote:
> > > > > >
> > > > > > > Hi Jincheng,
> > > > > > >
> > > > > > > Thanks a lot for driving this.
> > > > > > > +1 (non-binding).
> > > > > > >
> > > > > > > Regards,
> > > > > > > Dian
> > > > > > >
> > > > > > > > On Aug 1, 2019, at 3:24 PM, jincheng sun  wrote:
> > > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > Publishing PyFlink to PyPI is very important for our users.
> > > > Please
> > > > > > vote
> > > > > > > > on the following proposal:
> > > > > > > >
> > > > > > > > 1. Create PyPI Project for Apache Flink Python API, named:
> > > > > > "apache-flink"
> > > > > > > > 2. Release one binary with the default Scala version same
> with
> > > > flink
> > > > > > > > default config.
> > > > > > > > 3. Create an account, named "pyflink" as owner(only PMC can
> > > manage
> > > > > it).
> > > > > > > PMC
> > > > > > > > can add account for the Release Manager, but Release Manager
> > can
> > > > not
> > > > > > > delete
> > > > > > > > the release.
> > > > > > > >
> > > > > > > > [ ] +1, Approve the proposal.
> > > > > > > > [ ] -1, Disapprove the proposal, because ...
> > > > > > > >
> > > > > > > > The vote will be open for at least 72 hours. It is adopted
> by a
> > > > > simple
> > > > > > > > majority with a minimum of three positive votes.
> > > > > > > >
> > > > > > > > See discussion threads for more details [1].
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jincheng
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Jeff Zhang
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-13538) Give field names in deserializers thrown exceptions

2019-08-01 Thread JIRA
François Lacombe created FLINK-13538:


 Summary: Give field names in deserializers thrown exceptions
 Key: FLINK-13538
 URL: https://issues.apache.org/jira/browse/FLINK-13538
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.8.0
Reporter: François Lacombe


Deserializers like JsonRowDeserializerSchema parse JSON strings according to a 
TypeInformation object.

Type mistakes can occur, and they usually raise an IOException caused by an 
IllegalStateException. Here I try to parse "field":"blabla" described with 
Type.INT:

 
{code:java}
java.io.IOException: Failed to deserialize JSON object.
    at 
org.apache.flink.formats.json.JsonRowDeserializationSchema.deserialize(JsonRowDeserializationSchema.java:97)
    at 
com.dcbrain.etl.inputformat.JsonInputFormat.nextRecord(JsonInputFormat.java:96)
    at 
com.dcbrain.etl.inputformat.JsonInputFormat.nextRecord(JsonInputFormat.java:1)
    at 
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:192)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Unsupported type information 
'Integer' for node: "blabla"
    at 
org.apache.flink.formats.json.JsonRowDeserializationSchema.convert(JsonRowDeserializationSchema.java:191)
    at 
org.apache.flink.formats.json.JsonRowDeserializationSchema.convertRow(JsonRowDeserializationSchema.java:212)
    at 
org.apache.flink.formats.json.JsonRowDeserializationSchema.deserialize(JsonRowDeserializationSchema.java:95)
    ... 5 common frames omitted{code}
 

Neither the message nor the exception object contains a reference to the field 
causing this error, which requires time inspecting complex input data to find 
where the error really is.

Would it be possible to improve the messages, or even the exception objects thrown 
by serializers/deserializers, to indicate which field is responsible for the error?

JsonRowDeserializerSchema isn't the only one affected by such issues.

 

This would allow producing more useful logs for users and 
administrators.
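
For illustration, one possible shape of such an improvement (a hypothetical 
sketch, not the actual Flink code):

{code:java}
class FieldNameSketch {
    // Hypothetical sketch: wrap the per-field conversion so that the failing
    // field's name travels with the exception. Names are illustrative only.
    static Object convertField(String fieldName, Object node) {
        try {
            return convert(node); // stand-in for the existing conversion logic
        } catch (IllegalStateException e) {
            throw new IllegalStateException(
                "Failed to deserialize field '" + fieldName + "'", e);
        }
    }

    // Placeholder standing in for JsonRowDeserializationSchema#convert.
    static Object convert(Object node) {
        throw new IllegalStateException("Unsupported type information for node: " + node);
    }
}
{code}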

 

All the best



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13537) Changing Kafka producer pool size and scaling out may create overlapping transaction IDs

2019-08-01 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13537:
---

 Summary: Changing Kafka producer pool size and scaling out may 
create overlapping transaction IDs
 Key: FLINK-13537
 URL: https://issues.apache.org/jira/browse/FLINK-13537
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.8.1, 1.9.0
Reporter: Nico Kruber


The Kafka producer's transaction IDs are only generated once, when there is no 
previous state for that operator. In the case where we restore and increase 
parallelism (scale-out), some operators may not have previous state and will 
create new IDs. Now, if we also reduce the poolSize, these new IDs may overlap 
with the old ones, which should never happen!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13536) Improve nullability handling in Types

2019-08-01 Thread JIRA
François Lacombe created FLINK-13536:


 Summary: Improve nullability handling in Types
 Key: FLINK-13536
 URL: https://issues.apache.org/jira/browse/FLINK-13536
 Project: Flink
  Issue Type: Improvement
  Components: API / Type Serialization System, Formats (JSON, Avro, 
Parquet, ORC, SequenceFile)
Affects Versions: 1.8.0
Reporter: François Lacombe


Currently, Avro-to-Flink type matching doesn't propagate the nullability definition.

In Avro :
{code:java}
"type":["null","string"]{code}
allows Java's String myField = null;

while
{code:java}
"type":"string"{code}
doesn't.

It may be good to have a corresponding property in Flink types too, so as to check 
for nullability in JsonRowDeserializationSchema, for instance (a null or absent 
field in parsed JSON should only be possible for nullable fields).

 

Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [ANNOUNCE] Progress updates for Apache Flink 1.9.0 release

2019-08-01 Thread Kurt Young
Update: RC1 for 1.9.0 has been created. Please see [1] for the preview
source / binary releases and Maven artifacts.

Best,
Kurt

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/PREVIEW-Apache-Flink-1-9-0-release-candidate-1-td31233.html


On Tue, Jul 30, 2019 at 2:36 PM Tzu-Li (Gordon) Tai 
wrote:

> Hi Biao,
>
> Thanks for working on FLINK-9900. The ticket is already assigned to you
> now.
>
> Cheers,
> Gordon
>
> On Tue, Jul 30, 2019 at 2:31 PM Biao Liu  wrote:
>
> > Hi Gordon,
> >
> > Thanks for updating progress.
> >
> > Currently I'm working on FLINK-9900. I need a committer to assign the
> > ticket to me.
> >
> > Tzu-Li (Gordon) Tai wrote on Tue, Jul 30, 2019 at 13:01:
> >
> > > Hi all,
> > >
> > > There are quite a few instabilities in our builds right now (master +
> > > release-1.9), some of which are directly or suspiciously related to the
> > 1.9
> > > release.
> > >
> > > I'll categorize the instabilities into ones which we were already
> > tracking
> > > in the 1.9 Burndown Kanban board [1] prior to this email, and which
> ones
> > > seem to be new or were not monitored, so that we draw additional
> > attention
> > > to them:
> > >
> > > *Instabilities that were already being tracked*
> > >
> > > - FLINK-13242: StandaloneResourceManagerTest.testStartupPeriod fails on
> > > Travis [2]
> > > A fix for this is coming with FLINK-13408 (Schedule
> > > StandaloneResourceManager.setFailUnfulfillableRequest whenever the
> > > leadership is acquired) [3]
> > >
> > > *New discovered instabilities that we should also start monitoring*
> > >
> > > - FLINK-13484: ConnectedComponents E2E fails with
> > > ResourceNotAvailableException [4]
> > > - FLINK-13487:
> > > TaskExecutorPartitionLifecycleTest.testPartitionReleaseAfterReleaseCall
> > > failed on Travis [5]. FLINK-13476 (Partitions not being properly
> released
> > > on cancel) could be the cause [6].
> > > - FLINK-13488: flink-python fails to build on Travis due to Python 3.3
> > > install failure [7]
> > > - FLINK-13489: Heavy deployment E2E fails quite consistently on Travis
> > with
> > > TM heartbeat timeout [8]
> > > - FLINK-9900:
> > >
> >
> ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles
> > > deadlocks [9]
> > > - FLINK-13377: Streaming SQL E2E fails on Travis with mismatching
> outputs
> > > (could just be that the SQL query tested on Travis is nondeterministic)
> > [10]
> > >
> > > Cheers,
> > > Gordon
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/RapidBoard.jspa?projectKey=FLINK=328
> > >
> > > [2]  https://issues.apache.org/jira/browse/FLINK-13242
> > > [3]  https://issues.apache.org/jira/browse/FLINK-13408
> > > [4]  https://issues.apache.org/jira/browse/FLINK-13484
> > > [5]  https://issues.apache.org/jira/browse/FLINK-13487
> > > [6]  https://issues.apache.org/jira/browse/FLINK-13476
> > > [7]  https://issues.apache.org/jira/browse/FLINK-13488
> > > [8]  https://issues.apache.org/jira/browse/FLINK-13489
> > > [9]  https://issues.apache.org/jira/browse/FLINK-9900
> > > [10] https://issues.apache.org/jira/browse/FLINK-13377
> > >
> > > On Sun, Jul 28, 2019 at 6:14 AM zhijiang  > > .invalid>
> > > wrote:
> > >
> > > > Hi Gordon,
> > > >
> > > > Thanks for the updates on the current progress.
> > > > In addition, it might be better to also cover the fix of the network
> > resource
> > > > leak in JIRA ticket [1], which I think will be merged soon.
> > > >
> > > > [1] FLINK-13245: This fixes the leak of releasing reader/view with
> > > > partition in network stack.
> > > >
> > > > Best,
> > > > Zhijiang
> > > > --
> > > > From: Tzu-Li (Gordon) Tai 
> > > > Send Time: Sat, Jul 27, 2019, 10:41
> > > > To: dev 
> > > > Subject:Re: [ANNOUNCE] Progress updates for Apache Flink 1.9.0
> release
> > > >
> > > > Hi all,
> > > >
> > > > It's been a while since our last update for the release testing of
> > 1.9.0,
> > > > so I want to bring attention to the current status of the release.
> > > >
> > > > We are approaching RC1 soon, waiting on the following specific last
> > > ongoing
> > > > threads to be closed:
> > > > - FLINK-13241: This fixes a problem where when using YARN, slot
> > > allocation
> > > > requests may be ignored [1]
> > > > - FLINK-13371: Potential partitions resource leak in case of producer
> > > > restarts [2]
> > > > - FLINK-13350: Distinguish between temporary tables and persisted
> > tables
> > > > [3]. Strictly speaking this would be a new feature, but there was a
> > > > discussion here [4] to include a workaround for now in 1.9.0, and a
> > > proper
> > > > solution later on in 1.10.x.
> > > > - FLINK-12858: Potential distributed deadlock in case of synchronous
> > > > savepoint failure [5]
> > > >
> > > > The above is the critical path for moving forward with an RC1 for
> > > official
> > > > voting.
> > > > All of them have PRs already, and are currently being reviewed or
> close
> > > to
> > > > being 

[PREVIEW] Apache Flink 1.9.0, release candidate #1

2019-08-01 Thread Kurt Young
Hi Flink devs,

RC1 for Apache Flink 1.9.0 has been created. Just as RC0, this is still
a preview-only RC to drive the current testing efforts. This has all the
artifacts that we would typically have for a release, except for a source
code tag and a PR for the release announcement.

RC1 contains the following notable fixes:

* fix various unstable integration tests
* flink-planner and blink-planner now each have their own uber jar and can
co-exist in the same classpath
* fix various issues when using connectors with blink planner
* the BucketingSink is now deprecated and the RollingSink is removed
* the MapR artifact repository dependency has been removed
* fix leaking files issue in network stack
* fix YarnResourceManager does not handle slot allocations in certain cases
* fix potential distributed deadlock in case of synchronous savepoint
failure

This RC contains the following contents:

* the preview source release and binary convenience releases [1], which are
signed with the key with fingerprint
CAF4118AFAD5821A45BFC30FD51C02C7F7059BA4 [2],
* all artifacts that would normally be deployed to the Maven Central
Repository [3],

To test with these artifacts, you can create a settings.xml file with the
content shown below [4].
This settings file can be referenced in your maven commands via `--settings
/path/to/settings.xml`.
This is useful for creating a quickstart project based on the staged
release and also for building against the staged jars.

Stay tuned!

Best,
Kurt

[1] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.0-rc1/
[2] https://dist.apache.org/repos/dist/release/flink/KEYS
[3] https://repository.apache.org/content/repositories/orgapacheflink-1232
[4]
```
<settings>
  <activeProfiles>
    <activeProfile>flink-1.9.0</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>flink-1.9.0</id>
      <repositories>
        <repository>
          <id>flink-1.9.0</id>
          <url>
https://repository.apache.org/content/repositories/orgapacheflink-1232
          </url>
        </repository>
        <repository>
          <id>archetype</id>
          <url>
https://repository.apache.org/content/repositories/orgapacheflink-1232
          </url>
        </repository>
      </repositories>
    </profile>
  </profiles>
</settings>
```


Re: [VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread Jark Wu
+1 (non-binding)

Cheers,
Jark

On Thu, 1 Aug 2019 at 17:45, Yu Li  wrote:

> +1 (non-binding)
>
> Thanks for driving this!
>
> Best Regards,
> Yu
>
>
> On Thu, 1 Aug 2019 at 11:41, Till Rohrmann  wrote:
>
> > +1
> >
> > Cheers,
> > Till
> >
> > On Thu, Aug 1, 2019 at 10:39 AM vino yang  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Jeff Zhang  wrote on Thu, Aug 1, 2019 at 4:33 PM:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Stephan Ewen  wrote on Thu, Aug 1, 2019 at 4:29 PM:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Thu, Aug 1, 2019 at 9:52 AM Dian Fu 
> > wrote:
> > > > >
> > > > > > Hi Jincheng,
> > > > > >
> > > > > > Thanks a lot for driving this.
> > > > > > +1 (non-binding).
> > > > > >
> > > > > > Regards,
> > > > > > Dian
> > > > > >
> > > > > > > On Aug 1, 2019, at 3:24 PM, jincheng sun  wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Publishing PyFlink to PyPI is very important for our users.
> > > Please
> > > > > vote
> > > > > > > on the following proposal:
> > > > > > >
> > > > > > > 1. Create PyPI Project for Apache Flink Python API, named:
> > > > > "apache-flink"
> > > > > > > 2. Release one binary with the default Scala version same with
> > > flink
> > > > > > > default config.
> > > > > > > 3. Create an account, named "pyflink" as owner(only PMC can
> > manage
> > > > it).
> > > > > > PMC
> > > > > > > can add account for the Release Manager, but Release Manager
> can
> > > not
> > > > > > delete
> > > > > > > the release.
> > > > > > >
> > > > > > > [ ] +1, Approve the proposal.
> > > > > > > [ ] -1, Disapprove the proposal, because ...
> > > > > > >
> > > > > > > The vote will be open for at least 72 hours. It is adopted by a
> > > > simple
> > > > > > > majority with a minimum of three positive votes.
> > > > > > >
> > > > > > > See discussion threads for more details [1].
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jincheng
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
> > > >
> > >
> >
>


Re: [DISCUSS][CODE STYLE] Create collections always with initial capacity

2019-08-01 Thread Xintong Song
+1 on setting the initial capacity only when we have a good expectation of the
collection size.

Thank you~

Xintong Song



On Thu, Aug 1, 2019 at 2:32 PM Andrey Zagrebin  wrote:

> Hi all,
>
> As you probably already noticed, Stephan has triggered a discussion thread
> about code style guide for Flink [1]. Recently we were discussing
> internally some smaller concerns and I would like to start separate threads
> for them.
>
> This thread is about creating collections always with initial capacity. As
> you might have seen, some parts of our code base always initialise
> collections with some non-default capacity. You can even activate a check
> in IntelliJ Idea that can monitor and highlight creation of collection
> without initial capacity.
>
> Pros:
> - performance gain if there is a good reasoning about initial capacity
> - the capacity is always deterministic and does not depend on any changes
> of its default value in Java
> - easy to follow: always initialise, has IDE support for detection
>
> Cons (for initialising w/o good reasoning):
> - We are trying to outsmart JVM. When there is no good reasoning about
> initial capacity, we can rely on JVM default value.
> - It is even confusing e.g. for hash maps as the real size depends on the
> load factor.
> - It would only add minor performance gain.
> - a bit more code, increases maintenance burden.
>
> The conclusion is the following at the moment:
> Only set the initial capacity if you have a good idea about the expected
> size.
>
> Please, feel free to share your thoughts.
>
> Best,
> Andrey
>
> [1]
>
> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
>


[jira] [Created] (FLINK-13535) Do not abort transactions twice during KafkaProducer startup

2019-08-01 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-13535:
---

 Summary: Do not abort transactions twice during KafkaProducer 
startup
 Key: FLINK-13535
 URL: https://issues.apache.org/jira/browse/FLINK-13535
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Affects Versions: 1.8.1, 1.9.0
Reporter: Nico Kruber
Assignee: Nico Kruber


During startup of a transactional Kafka producer from previous state, we 
recover in two steps:
# in {{TwoPhaseCommitSinkFunction}}, we commit pending commit-transactions and 
abort pending transactions and then call into {{finishRecoveringContext()}}
# in {{FlinkKafkaProducer#finishRecoveringContext()}} we iterate over all 
recovered transaction IDs and abort them
This may lead to some transactions being worked on twice. Since this is quite 
an expensive operation, we unnecessarily slow down the job startup; instead, we 
could easily give {{finishRecoveringContext()}} the set of transactions that 
{{TwoPhaseCommitSinkFunction}} already covered, as sketched below.
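
A minimal sketch of the idea (hypothetical names and helpers; the actual 
{{FlinkKafkaProducer}} code differs):

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the actual Flink code: the base class hands over
// the transactions it has already aborted so they are not aborted again.
class RecoverySketch {
    List<String> recoveredTransactionalIds = Arrays.asList("txn-1", "txn-2", "txn-3");

    void finishRecoveringContext(Set<String> alreadyAborted) {
        for (String txnId : recoveredTransactionalIds) {
            if (!alreadyAborted.contains(txnId)) {
                abortTransaction(txnId); // the expensive call happens once per ID
            }
        }
    }

    void abortTransaction(String txnId) {
        System.out.println("aborting " + txnId); // stand-in for the real abort
    }
}
{code}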



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[DISCUSS][CODE STYLE] Create collections always with initial capacity

2019-08-01 Thread Andrey Zagrebin
Hi all,

As you probably already noticed, Stephan has triggered a discussion thread
about code style guide for Flink [1]. Recently we were discussing
internally some smaller concerns and I would like to start separate threads
for them.

This thread is about creating collections always with initial capacity. As
you might have seen, some parts of our code base always initialise
collections with some non-default capacity. You can even activate a check
in IntelliJ Idea that can monitor and highlight creation of collection
without initial capacity.

Pros:
- performance gain if there is a good reasoning about initial capacity
- the capacity is always deterministic and does not depend on any changes
of its default value in Java
- easy to follow: always initialise, has IDE support for detection

Cons (for initialising w/o good reasoning):
- We are trying to outsmart JVM. When there is no good reasoning about
initial capacity, we can rely on JVM default value.
- It is even confusing e.g. for hash maps as the real size depends on the
load factor.
- It would only add minor performance gain.
- a bit more code, increases maintenance burden.

The conclusion is the following at the moment:
Only set the initial capacity if you have a good idea about the expected
size.
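
As a minimal illustration of this guideline (names invented for the example):

```
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InitialCapacityExamples {
    // Good reasoning: the final size is known exactly, so pre-sizing
    // avoids intermediate array growth.
    static List<Integer> copyIds(int[] ids) {
        List<Integer> result = new ArrayList<>(ids.length);
        for (int id : ids) {
            result.add(id);
        }
        return result;
    }

    // No good reasoning about the expected size: rely on the JVM default.
    // Note: new HashMap<>(16) does NOT hold 16 entries before resizing;
    // with the default load factor of 0.75 it resizes after 12 entries.
    static Map<String, String> emptyCache() {
        return new HashMap<>();
    }
}
```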

Please, feel free to share your thoughts.

Best,
Andrey

[1]
http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E


[jira] [Created] (FLINK-13534) Unable to query Hive table with decimal column

2019-08-01 Thread Rui Li (JIRA)
Rui Li created FLINK-13534:
--

 Summary: Unable to query Hive table with decimal column
 Key: FLINK-13534
 URL: https://issues.apache.org/jira/browse/FLINK-13534
 Project: Flink
  Issue Type: Bug
Reporter: Rui Li


Hit the following exception when accessing a Hive table with a decimal column:

{noformat}

Caused by: org.apache.flink.table.api.TableException: TableSource of type 
org.apache.flink.batch.connectors.hive.HiveTableSource returned a DataSet of 
data type ROW<`x` LEGACY(BigDecimal)> that does not match with the data type 
ROW<`x` DECIMAL(10, 0)> declared by the TableSource.getProducedDataType() 
method. Please validate the implementation of the TableSource.
 at 
org.apache.flink.table.plan.nodes.dataset.BatchTableSourceScan.translateToPlan(BatchTableSourceScan.scala:118)
 at 
org.apache.flink.table.api.internal.BatchTableEnvImpl.translate(BatchTableEnvImpl.scala:303)
 at 
org.apache.flink.table.api.internal.BatchTableEnvImpl.translate(BatchTableEnvImpl.scala:281)
 at 
org.apache.flink.table.api.internal.BatchTableEnvImpl.writeToSink(BatchTableEnvImpl.scala:117)
 at 
org.apache.flink.table.api.internal.TableEnvImpl.insertInto(TableEnvImpl.scala:564)
 at 
org.apache.flink.table.api.internal.TableEnvImpl.insertInto(TableEnvImpl.scala:516)
 at 
org.apache.flink.table.api.internal.BatchTableEnvImpl.insertInto(BatchTableEnvImpl.scala:59)
 at org.apache.flink.table.api.internal.TableImpl.insertInto(TableImpl.java:428)

{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: instable checkpointing after migration to flink 1.8

2019-08-01 Thread Congxian Qiu
Hi Bekir

I'll first comb through all the information here and try to find out the
reason with you; I may need you to share some more information :)

Best,
Congxian


Bekir Oguz  wrote on Thu, Aug 1, 2019 at 5:00 PM:

> Hi Fabian,
> Thanks for sharing this with us, but we’re already on version 1.8.1.
>
> What I don’t understand is which mechanism in Flink adds 15 minutes to the
> checkpoint duration occasionally. Can you maybe give us some hints on where
> to look? Is there a default timeout of 15 minutes defined somewhere in
> Flink? I couldn’t find one.
>
> In our pipeline, most of the checkpoints complete in less than a minute
> and some of them complete in 15 minutes + (less than a minute).
> There’s definitely something which adds 15 minutes. This is happening in
> one or more subtasks during checkpointing.
>
> Please see the screenshot below:
>
> Regards,
> Bekir
>
>
>
> Op 23 jul. 2019, om 16:37 heeft Fabian Hueske  het
> volgende geschreven:
>
> Hi Bekir,
>
> Another user reported checkpointing issues with Flink 1.8.0 [1].
> These seem to be resolved with Flink 1.8.1.
>
> Hope this helps,
> Fabian
>
> [1]
>
> https://lists.apache.org/thread.html/991fe3b09fd6a052ff52e5f7d9cdd9418545e68b02e23493097d9bc4@%3Cuser.flink.apache.org%3E
>
> Am Mi., 17. Juli 2019 um 09:16 Uhr schrieb Congxian Qiu <
> qcx978132...@gmail.com>:
>
> Hi Bekir
>
> First of all, I think there is something wrong: the state size is almost
> the same, but the durations differ so much.
>
> A checkpoint for the RocksDB state backend dumps sst files, then copies the
> needed sst files (if you enable incremental checkpoints, sst files already
> on remote storage will not be uploaded again), then completes the
> checkpoint. Can you check the network bandwidth usage during the checkpoint?
>
> Best,
> Congxian
>
>
> Bekir Oguz  wrote on Tue, Jul 16, 2019 at 10:45 PM:
>
> Hi all,
> We have a flink job with user state, checkpointing to RocksDBBackend
> which is externally stored in AWS S3.
> After we migrated our cluster from 1.6 to 1.8, we occasionally see
> that some slots do not acknowledge the checkpoints quickly enough. As an
> example: all slots acknowledge within 30-50 seconds, except one slot that
> acknowledges in 15 mins. Checkpoint sizes are similar to each other, like
> 200-400 MB.
>
> We did not experience this weird behaviour in Flink 1.6. We have a 5 min
> checkpoint interval, and this happens sometimes once an hour, sometimes
> more, but not in all the checkpoint requests. Please see the screenshot
> below.
>
> Also another point: for the faulty slots, the duration is consistently 15
> mins and some seconds; we couldn't find out where this 15 mins response
> time comes from. And each time it is a different task manager, not always
> the same one.
>
> Are you guys aware of any other users having similar issues with the new
> version, and is there a suggested bug fix or solution?
>
>
>
>
> Thanks in advance,
> Bekir Oguz
>
>
>
>


[jira] [Created] (FLINK-13533) CassandraConnectorITCase fails on Java 11

2019-08-01 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-13533:


 Summary: CassandraConnectorITCase fails on Java 11
 Key: FLINK-13533
 URL: https://issues.apache.org/jira/browse/FLINK-13533
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Cassandra, Tests
Reporter: Chesnay Schepler
 Fix For: 1.10.0


The {{CassandraConnectorITCase}} fails on Java 11 with a timeout. We may have 
to disable the test since Cassandra only supports Java 11 as of 4.x, which isn't 
released yet.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13532) Broken links in documentation

2019-08-01 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-13532:


 Summary: Broken links in documentation
 Key: FLINK-13532
 URL: https://issues.apache.org/jira/browse/FLINK-13532
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Chesnay Schepler


{code:java}
[2019-07-31 15:58:08] ERROR `/zh/dev/table/hive_integration_example.html' not 
found.
[2019-07-31 15:58:10] ERROR `/zh/dev/table/types.html' not found.
[2019-07-31 15:58:10] ERROR `/zh/dev/table/hive_integration.html' not found.
[2019-07-31 15:58:14] ERROR `/zh/dev/restart_strategies.html' not found.
http://localhost:4000/zh/dev/table/hive_integration_example.html:
Remote file does not exist -- broken link!!!
--
http://localhost:4000/zh/dev/table/types.html:
Remote file does not exist -- broken link!!!
http://localhost:4000/zh/dev/table/hive_integration.html:
Remote file does not exist -- broken link!!!
--
http://localhost:4000/zh/dev/restart_strategies.html:
Remote file does not exist -- broken link!!!{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13531) Do not print log and call release if no requests should be evicted in slot sharing

2019-08-01 Thread Yun Gao (JIRA)
Yun Gao created FLINK-13531:
---

 Summary: Do not print log and call release if no requests should 
be evicted in slot sharing
 Key: FLINK-13531
 URL: https://issues.apache.org/jira/browse/FLINK-13531
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Yun Gao


After adding the logic to bookkeep the resources used in shared slots, 
the resource requests will be recorded inside the MultiTaskSlot, and when the 
underlying slot is allocated, all the resource requests will be checked for 
over-subscription; if there is any, some requests will be failed.

In the current implementation, the code does not check the number of requests to 
fail before printing the over-allocation debug log and trying to fail them. This 
should not cause actual errors, but it will 
 # Print a debug log saying some requests will be failed even if there are none to fail.
 # If the total number of requests is 0 (this is possible if there is already an 
AllocatedSlot before the first request), the _release_ method will be called. 
Although it will do nothing with the current implementation (the slot is still 
being created and not added to any other data structure), it may cause errors if 
the release logic changes in the future.

To fix this issue, we should add an explicit check on the number of requests to 
fail, as sketched below.
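
For illustration, a minimal sketch of the intended guard (the names are 
hypothetical, not the actual slot-sharing code):

{code:java}
import java.util.List;

// Hypothetical sketch: only log and fail when there is actually something
// to evict, so an empty request list triggers neither the debug log nor
// the release path.
class OverAllocationGuard {
    static void failOverAllocated(List<Runnable> requestsToFail) {
        if (requestsToFail.isEmpty()) {
            return; // nothing over-allocated: no log, no release
        }
        System.out.println("Failing " + requestsToFail.size() + " over-allocated requests");
        requestsToFail.forEach(Runnable::run); // each Runnable fails one request
    }
}
{code}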

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS] Backport FLINK-13326 to 1.9 release

2019-08-01 Thread Stephan Ewen
For clarification: It does not fix the loop fault tolerance in the existing
API. There, the concepts are a bit messy and this needs a bigger overhaul.

It is a building block for building fault tolerant loops in downstream
projects.

On Thu, Aug 1, 2019 at 11:45 AM Becket Qin  wrote:

> +1 as well. If this affects the fault tolerance of streaming iteration, I'd
> consider this as a bug fix.
>
> On Thu, Aug 1, 2019 at 11:44 AM Till Rohrmann 
> wrote:
>
> > I've quickly glanced over the changes and I would be ok with backporting
> it
> > if it helps fixing fault tolerance of streaming iterations. Hence +1 from
> > my side.
> >
> > Cheers,
> > Till
> >
> > On Thu, Aug 1, 2019 at 11:20 AM Ufuk Celebi  wrote:
> >
> > > Thanks for checking. No concerns on my side. +1 to back port. Fixing
> > fault
> > > tolerance of streaming iterations sounds like a very valuable thing to
> > > unblock with this release.
> > >
> > > – Ufuk
> > >
> > >
> > > On Thu, Aug 1, 2019 at 11:02 AM Stephan Ewen  wrote:
> > >
> > > > Hi all!
> > > >
> > > > I would like to backport a minor change from 'master' to
> 'release-1.9'.
> > > > It is a very minor change.
> > > >
> > > > I am checking here because this is not technically a bug fix, but a
> way
> > > of
> > > > exposing the raw keyed state stream in tasks a bit differently. It
> would
> > > > unblock some work in a project that tries to fix the fault tolerance
> of
> > > > streaming iterations.
> > > >
> > > > Any concerns?
> > > >
> > > > Best,
> > > > Stephan
> > > >
> > >
> >
>


Re: [DISCUSS] Backport FLINK-13326 to 1.9 release

2019-08-01 Thread Becket Qin
+1 as well. If this affects the fault tolerance of streaming iteration, I'd
consider this as a bug fix.

On Thu, Aug 1, 2019 at 11:44 AM Till Rohrmann  wrote:

> I've quickly glanced over the changes and I would be ok with backporting it
> if it helps fixing fault tolerance of streaming iterations. Hence +1 from
> my side.
>
> Cheers,
> Till
>
> On Thu, Aug 1, 2019 at 11:20 AM Ufuk Celebi  wrote:
>
> > Thanks for checking. No concerns on my side. +1 to back port. Fixing
> fault
> > tolerance of streaming iterations sounds like a very valuable thing to
> > unblock with this release.
> >
> > – Ufuk
> >
> >
> > On Thu, Aug 1, 2019 at 11:02 AM Stephan Ewen  wrote:
> >
> > > Hi all!
> > >
> > > I would like to backport a minor change from 'master' to 'release-1.9'.
> > > It is a very minor change.
> > >
> > > I am checking here because this is not technically a bug fix, but a way
> > of
> > > exposing the raw keyed state stream in tasks a bit differently. It would
> > > unblock some work in a project that tries to fix the fault tolerance of
> > > streaming iterations.
> > >
> > > Any concerns?
> > >
> > > Best,
> > > Stephan
> > >
> >
>


Re: [VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread Yu Li
+1 (non-binding)

Thanks for driving this!

Best Regards,
Yu


On Thu, 1 Aug 2019 at 11:41, Till Rohrmann  wrote:

> +1
>
> Cheers,
> Till
>
> On Thu, Aug 1, 2019 at 10:39 AM vino yang  wrote:
>
> > +1 (non-binding)
> >
> > Jeff Zhang  wrote on Thu, Aug 1, 2019 at 4:33 PM:
> >
> > > +1 (non-binding)
> > >
> > > Stephan Ewen  wrote on Thu, Aug 1, 2019 at 4:29 PM:
> > >
> > > > +1 (binding)
> > > >
> > > > On Thu, Aug 1, 2019 at 9:52 AM Dian Fu 
> wrote:
> > > >
> > > > > Hi Jincheng,
> > > > >
> > > > > Thanks a lot for driving this.
> > > > > +1 (non-binding).
> > > > >
> > > > > Regards,
> > > > > Dian
> > > > >
> > > > > > On Aug 1, 2019, at 3:24 PM, jincheng sun  wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Publishing PyFlink to PyPI is very important for our users.
> > Please
> > > > vote
> > > > > > on the following proposal:
> > > > > >
> > > > > > 1. Create PyPI Project for Apache Flink Python API, named:
> > > > "apache-flink"
> > > > > > 2. Release one binary with the default Scala version same with
> > flink
> > > > > > default config.
> > > > > > 3. Create an account, named "pyflink" as owner(only PMC can
> manage
> > > it).
> > > > > PMC
> > > > > > can add account for the Release Manager, but Release Manager can
> > not
> > > > > delete
> > > > > > the release.
> > > > > >
> > > > > > [ ] +1, Approve the proposal.
> > > > > > [ ] -1, Disapprove the proposal, because ...
> > > > > >
> > > > > > The vote will be open for at least 72 hours. It is adopted by a
> > > simple
> > > > > > majority with a minimum of three positive votes.
> > > > > >
> > > > > > See discussion threads for more details [1].
> > > > > >
> > > > > > Thanks,
> > > > > > Jincheng
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
> > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> > >
> >
>


Re: [DISCUSS] Backport FLINK-13326 to 1.9 release

2019-08-01 Thread Till Rohrmann
I've quickly glanced over the changes and I would be ok with backporting it
if it helps fixing fault tolerance of streaming iterations. Hence +1 from
my side.

Cheers,
Till

On Thu, Aug 1, 2019 at 11:20 AM Ufuk Celebi  wrote:

> Thanks for checking. No concerns on my side. +1 to back port. Fixing fault
> tolerance of streaming iterations sounds like a very valuable thing to
> unblock with this release.
>
> – Ufuk
>
>
> On Thu, Aug 1, 2019 at 11:02 AM Stephan Ewen  wrote:
>
> > Hi all!
> >
> > I would like to backport a minor change from 'master' to 'release-1.9'.
> > It is a very minor change.
> >
> > I am checking here because this is not technically a bug fix, but a way
> of
> > exposing the raw keyed state stream in tasks a bit differently. It would
> > unblock some work in a project that tries to fix the fault tolerance of
> > streaming iterations.
> >
> > Any concerns?
> >
> > Best,
> > Stephan
> >
>


[jira] [Created] (FLINK-13530) AbstractServerTest failed on Travis

2019-08-01 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-13530:


 Summary: AbstractServerTest failed on Travis
 Key: FLINK-13530
 URL: https://issues.apache.org/jira/browse/FLINK-13530
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Queryable State, Tests
Affects Versions: 1.9.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.9.0


Likely just a port conflict (the range used in the test only covers 3 ports)
{code:java}
09:21:38.371 [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 0.091 s <<< FAILURE! - in 
org.apache.flink.queryablestate.network.AbstractServerTest
09:21:38.371 [ERROR] 
testPortRangeSuccess(org.apache.flink.queryablestate.network.AbstractServerTest)
  Time elapsed: 0.062 s  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<1>
at 
org.apache.flink.queryablestate.network.AbstractServerTest.testPortRangeSuccess(AbstractServerTest.java:125){code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread Till Rohrmann
+1

Cheers,
Till

On Thu, Aug 1, 2019 at 10:39 AM vino yang  wrote:

> +1 (non-binding)
>
> Jeff Zhang  wrote on Thu, Aug 1, 2019 at 4:33 PM:
>
> > +1 (non-binding)
> >
> > Stephan Ewen  wrote on Thu, Aug 1, 2019 at 4:29 PM:
> >
> > > +1 (binding)
> > >
> > > On Thu, Aug 1, 2019 at 9:52 AM Dian Fu  wrote:
> > >
> > > > Hi Jincheng,
> > > >
> > > > Thanks a lot for driving this.
> > > > +1 (non-binding).
> > > >
> > > > Regards,
> > > > Dian
> > > >
> > > > > On Aug 1, 2019, at 3:24 PM, jincheng sun  wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Publishing PyFlink to PyPI is very important for our users.
> Please
> > > vote
> > > > > on the following proposal:
> > > > >
> > > > > 1. Create PyPI Project for Apache Flink Python API, named:
> > > "apache-flink"
> > > > > 2. Release one binary with the default Scala version same with
> flink
> > > > > default config.
> > > > > 3. Create an account, named "pyflink" as owner(only PMC can manage
> > it).
> > > > PMC
> > > > > can add account for the Release Manager, but Release Manager can
> not
> > > > delete
> > > > > the release.
> > > > >
> > > > > [ ] +1, Approve the proposal.
> > > > > [ ] -1, Disapprove the proposal, because ...
> > > > >
> > > > > The vote will be open for at least 72 hours. It is adopted by a
> > simple
> > > > > majority with a minimum of three positive votes.
> > > > >
> > > > > See discussion threads for more details [1].
> > > > >
> > > > > Thanks,
> > > > > Jincheng
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
> > > >
> > > >
> > >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>


Re: [DISCUSS] CPU flame graph for a job vertex in web UI.

2019-08-01 Thread Till Rohrmann
Hi David,

thanks for starting this discussion. I like the idea of improving insights
into Flink's execution and I believe that a flame graph could be helpful.

I quickly glanced over your changes and I think they go in a good
direction. One idea could be to share the `StackTraceSample` produced by
the `StackTraceSampleCoordinator` between the different
`StackTraceOperatorTracker` so that we don't send multiple requests for the
same operators. That way we would decrease the RPC load a bit.

Apart from that, I think the next steps would be to find a committer who
could shepherd this effort and help you with merging it.

Cheers,
Till

On Wed, Jul 31, 2019 at 7:05 PM David Morávek  wrote:

> Hello,
>
> While looking into Flink internals, I've noticed that there is already a
> mechanism for stack-trace sampling of a particular job vertex.
>
> I think it may be really useful to allow the user to easily render a CPU
> flame graph in a new UI for a
> selected vertex (new tab next to back pressure) of a running job. Back
> pressure tab already provides a good idea of which vertex causes trouble,
> but it's hard to say what's actually going on.
>
> I've tried to implement a basic REST endpoint
> <
> https://github.com/dmvk/flink/commit/716231822d2fe99004895cdd0a365560479445b9
> >,
> that prepares data for the flame graph rendering and it seems to be
> providing good insight.
>
> It should be straightforward to render data from the endpoint in the new UI
> using existing JavaScript
> libraries.
>
> WDYT? Is this worth pushing forward?
>
> D.
>


Re: [DISCUSS] Backport FLINK-13326 to 1.9 release

2019-08-01 Thread Ufuk Celebi
Thanks for checking. No concerns on my side. +1 to back port. Fixing fault
tolerance of streaming iterations sounds like a very valuable thing to
unblock with this release.

– Ufuk


On Thu, Aug 1, 2019 at 11:02 AM Stephan Ewen  wrote:

> Hi all!
>
> I would like to backport a minor change from 'master' to 'release-1.9'.
> It is a very minor change.
>
> I am checking here because this is not technically a bug fix, but a way of
> exposing the raw keyed state stream in tasks a bit differently. It would
> unblock some work in a project that tries to fix the fault tolerance of
> streaming iterations.
>
> Any concerns?
>
> Best,
> Stephan
>


[DISCUSS] Backport FLINK-13326 to 1.9 release

2019-08-01 Thread Stephan Ewen
Hi all!

I would like to backport a minor change from 'master' to 'release-1.9'.
It is a very minor change.

I am checking here because this is not technically a bug fix, but a way of
exposing the raw keyed state stream in tasks a bit differently. It would
unblock some work in a project that tries to fix the fault tolerance of
streaming iterations.

Any concerns?

Best,
Stephan


[jira] [Created] (FLINK-13529) Verify and correct agg function's semantic for Blink planner

2019-08-01 Thread Jing Zhang (JIRA)
Jing Zhang created FLINK-13529:
--

 Summary: Verify and correct agg function's semantic for Blink 
planner
 Key: FLINK-13529
 URL: https://issues.apache.org/jira/browse/FLINK-13529
 Project: Flink
  Issue Type: Task
  Components: Table SQL / Planner
Affects Versions: 1.9.0, 1.10.0
Reporter: Jing Zhang
 Fix For: 1.9.0






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS] ARM support for Flink

2019-08-01 Thread Stephan Ewen
Asking INFRA to add support means filing a JIRA ticket.

That works the same way as filing a FLINK Jira ticket, but selecting INFRA
as the project to file the ticket for.

On Thu, Aug 1, 2019 at 4:17 AM Xiyuan Wang  wrote:

> Thanks for your reply.
>
> We are now continuing to investigate and debug Flink on ARM. It's hard for
> us to say how many kinds of tests are enough for ARM support at this moment,
> but `core` and `test` are necessary of course, I think. What we do now
> follows travis-ci: we added all the modules that travis-ci contains.
>
> During our local tests, only a few tests failed [1]. We have
> solutions for some of them; others are still under debugging. The Flink
> team's input is welcome. And many thanks for your JIRA issue [2]; we will
> keep updating it.
>
> It'll be great if the Infra Team could add the OpenLab App [3] (or another
> CI if Flink chooses) to the Flink repo. I'm not clear on how to talk with
> the Infra Team; should the Flink team start the discussion, or should I
> send a mail to Infra? I need your help.
>
> Then once the app is added, perhaps we can add `core` and `test` jobs as the
> first step, making them run stably and successfully, and then add more
> modules if needed.
>
> [1]: https://etherpad.net/p/flink_arm64_support
> [2]: https://issues.apache.org/jira/browse/FLINK-13448
> [3]: https://github.com/apps/theopenlab-ci
>
> Regards
> wangxiyuan
>
> Stephan Ewen  wrote on Wed, Jul 31, 2019 at 9:46 PM:
>
> > Wow, that is pretty nice work, thanks a lot!
> >
> > We need some support from Apache Infra to see if we can connect the Flink
> > Github Repo with the OpenLab CI.
> > We would also need a discussion on the developer mailing list, to get
> > community agreement.
> >
> > Have you looked at whether we need to run all tests with ARM, or whether
> > maybe only the "core" and "tests" profile would be enough to get
> confidence
> > that Flink runs on ARM?
> > Just asking because Flink has a lot of long running tests by now that can
> > easily eat up a lot of CI capacity.
> >
> > Best,
> > Stephan
> >
> >
> >
> > On Tue, Jul 30, 2019 at 3:45 AM Xiyuan Wang 
> > wrote:
> >
> > > Hi Stephan,
> > >   Maybe I misled you in the previous email. We don't need to migrate CI
> > > completely; travis-ci is still there, working for the X86 arch. What we
> > > need to do is add another CI tool for the ARM arch.
> > >
> > >   There are some ways to do it. As I wrote on
> > > https://issues.apache.org/jira/browse/FLINK-13199 to @Chesnay:
> > >
> > > 1. Add the OpenLab CI system for ARM arch tests. OpenLab is very similar to
> > > travis-ci. What Flink needs to do is add the OpenLab GitHub app to the
> > > repo, then add the job definition files inside the Flink repo. Here is a
> > > POC by me:
> > > https://github.com/theopenlab/flink/pull/1
> > > 2. OpenLab will donate ARM resources to the Apache Infra team as well. Then
> > > Flink can use the official Apache Jenkins system for Flink ARM tests in the
> > > future. https://builds.apache.org/
> > > 3. Use Drone CI, which supports the ARM arch as well. https://drone.io/
> > >
> > > Since I'm from the OpenLab community, if Flink chooses OpenLab CI, my
> > > OpenLab colleague and I can keep helping and maintaining the ARM CI job.
> > > If we choose the 2nd way, the CI maintenance work may be handled by the
> > > apache-infra team, I guess. If we choose the 3rd option, Drone CI, what
> > > we can help with is very limited. AFAIK, Drone uses containers for CI
> > > tests, which may not satisfy some requirements, while OpenLab uses VMs
> > > for tests.
> > >
> > > Need Flink core team's decision and reply.
> > >
> > > Thanks.
> > >
> > >
> > > Stephan Ewen  wrote on Mon, Jul 29, 2019 at 6:05 PM:
> > >
> > > > I don't think it is feasible for Flink to migrate CI completely.
> > > >
> > > > Is there a way to add ARM tests on an external CI in addition?
> > > > @Chesnay what do you think?
> > > >
> > > >
> > > > On Fri, Jul 12, 2019 at 4:45 AM Xiyuan Wang <
> wangxiyuan1...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Stephan
> > > > >   yeah, we should add an ARM CI first. But Travis CI doesn't support
> > > > > the ARM arch itself; the OpenLab community supports it. As I
> > > > > mentioned before, OpenLab is an opensource CI system like travis-ci
> > > > > [1]; it uses the opensource CI project `zuul` [2] for its deployment.
> > > > > Some opensource projects have integrated with it already, for example
> > > > > the `containerd` project from the CNCF community [3]. And I have a
> > > > > POC for a Flink ARM build and test using OpenLab. The build now
> > > > > passes [4], and I'm working on debugging the `test` part [5]. Is it
> > > > > fine for Flink to use?
> > > > >
> > > > > [1]: https://openlabtesting.org
> > > > > [2]: https://zuul-ci.org/docs/zuul/
> > > > > [3]: https://status.openlabtesting.org/projects
> > > > > [4]:
> > > > >
> > >
> https://status.openlabtesting.org/build/2aa33f1a87854679b70f36bd6f75a890
> > > > > [5]: https://github.com/theopenlab/flink/pull/1
> > > > >
> > > > >
> > > > > Stephan Ewen  wrote on Thu, Jul 11, 2019 

Re: [VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread vino yang
+1 (non-binding)

Jeff Zhang  wrote on Thu, Aug 1, 2019 at 4:33 PM:

> +1 (non-binding)
>
> Stephan Ewen  wrote on Thu, Aug 1, 2019 at 4:29 PM:
>
> > +1 (binding)
> >
> > On Thu, Aug 1, 2019 at 9:52 AM Dian Fu  wrote:
> >
> > > Hi Jincheng,
> > >
> > > Thanks a lot for driving this.
> > > +1 (non-binding).
> > >
> > > Regards,
> > > Dian
> > >
> > > > On Aug 1, 2019, at 3:24 PM, jincheng sun  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Publishing PyFlink to PyPI is very important for our users. Please
> > vote
> > > > on the following proposal:
> > > >
> > > > 1. Create PyPI Project for Apache Flink Python API, named:
> > "apache-flink"
> > > > 2. Release one binary with the default Scala version same with flink
> > > > default config.
> > > > 3. Create an account, named "pyflink" as owner(only PMC can manage
> it).
> > > PMC
> > > > can add account for the Release Manager, but Release Manager can not
> > > delete
> > > > the release.
> > > >
> > > > [ ] +1, Approve the proposal.
> > > > [ ] -1, Disapprove the proposal, because ...
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by a
> simple
> > > > majority with a minimum of three positive votes.
> > > >
> > > > See discussion threads for more details [1].
> > > >
> > > > Thanks,
> > > > Jincheng
> > > >
> > > > [1]
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
> > >
> > >
> >
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: [DISCUSS] ARM support for Flink

2019-08-01 Thread Chesnay Schepler
Please open a JIRA with INFRA and ask whether OpenLab/Drone are 
supported by INFRA.


On 01/08/2019 04:16, Xiyuan Wang wrote:

Thanks for your reply.

We are now continuing to investigate and debug Flink on ARM. It's hard for
us to say how many kinds of tests are enough for ARM support at this moment,
but `core` and `test` are necessary of course, I think. What we do now
follows travis-ci: we added all the modules that travis-ci contains.

During our local tests, only a few tests failed [1]. We have
solutions for some of them; others are still under debugging. The Flink
team's input is welcome. And many thanks for your JIRA issue [2]; we will
keep updating it.

It'll be great if the Infra Team could add the OpenLab App [3] (or another
CI if Flink chooses) to the Flink repo. I'm not clear on how to talk with
the Infra Team; should the Flink team start the discussion, or should I
send a mail to Infra? I need your help.

Then once the app is added, perhaps we can add `core` and `test` jobs as the
first step, making them run stably and successfully, and then add more
modules if needed.

[1]: https://etherpad.net/p/flink_arm64_support
[2]: https://issues.apache.org/jira/browse/FLINK-13448
[3]: https://github.com/apps/theopenlab-ci

Regards
wangxiyuan

Stephan Ewen  wrote on Wed, Jul 31, 2019 at 9:46 PM:


Wow, that is pretty nice work, thanks a lot!

We need some support from Apache Infra to see if we can connect the Flink
Github Repo with the OpenLab CI.
We would also need a discussion on the developer mailing list, to get
community agreement.

Have you looked at whether we need to run all tests with ARM, or whether
maybe only the "core" and "tests" profile would be enough to get confidence
that Flink runs on ARM?
Just asking because Flink has a lot of long running tests by now that can
easily eat up a lot of CI capacity.

Best,
Stephan



On Tue, Jul 30, 2019 at 3:45 AM Xiyuan Wang 
wrote:


Hi Stephan,
   Maybe I misled you in the previous email. We don't need to migrate CI
completely; travis-ci is still there, working for the X86 arch. What we need
to do is add another CI tool for the ARM arch.

   There are some ways to do it. As I wrote on
https://issues.apache.org/jira/browse/FLINK-13199 to @Chesnay:

1. Add the OpenLab CI system for ARM arch tests. OpenLab is very similar to
travis-ci. What Flink needs to do is add the OpenLab GitHub app to the
repo, then add the job definition files inside the Flink repo. Here is a POC
by me:
https://github.com/theopenlab/flink/pull/1
2. OpenLab will donate ARM resources to the Apache Infra team as well. Then
Flink can use the official Apache Jenkins system for Flink ARM tests in the
future. https://builds.apache.org/
3. Use Drone CI, which supports the ARM arch as well. https://drone.io/

Since I'm from the OpenLab community, if Flink chooses OpenLab CI, my OpenLab
colleague and I can keep helping and maintaining the ARM CI job. If we choose
the 2nd way, the CI maintenance work may be handled by the apache-infra team,
I guess. If we choose the 3rd option, Drone CI, what we can help with is very
limited. AFAIK, Drone uses containers for CI tests, which may not satisfy
some requirements, while OpenLab uses VMs for tests.

Need Flink core team's decision and reply.

Thanks.


Stephan Ewen  wrote on Mon, Jul 29, 2019 at 6:05 PM:


I don't think it is feasible for Flink to migrate CI completely.

Is there a way to add ARM tests on an external CI in addition?
@Chesnay what do you think?


On Fri, Jul 12, 2019 at 4:45 AM Xiyuan Wang 
wrote:


Hi Stephan
   yeah, we should add an ARM CI first. But Travis CI doesn't support the ARM
arch itself; the OpenLab community supports it. As I mentioned before, OpenLab
is an opensource CI system like travis-ci [1]; it uses the opensource CI
project `zuul` [2] for its deployment. Some opensource projects have
integrated with it already, for example the `containerd` project from the
CNCF community [3]. And I have a POC for a Flink ARM build and test using
OpenLab. The build now passes [4], and I'm working on debugging the
`test` part [5]. Is it fine for Flink to use?

[1]: https://openlabtesting.org
[2]: https://zuul-ci.org/docs/zuul/
[3]: https://status.openlabtesting.org/projects
[4]: https://status.openlabtesting.org/build/2aa33f1a87854679b70f36bd6f75a890

[5]: https://github.com/theopenlab/flink/pull/1


Stephan Ewen  wrote on Thu, Jul 11, 2019 at 9:56 PM:


I think an ARM release would be cool.

To actually support that properly, we would need something like an ARM
profile for the CI builds (at least in the nightly tests), otherwise ARM
support would probably be broken frequently.
Maybe that could be a way to start? Create a Travis CI ARM build (if
possible) and see what tests pass and which parts of the system would need
to be adjusted?

On Thu, Jul 11, 2019 at 9:24 AM Xiyuan Wang <wangxiyuan1...@gmail.com>
wrote:


Hi yun:
   I didn't try to build rocksdb with vagrant, but just `make -j8
rocksdbjava` directly on an ARM machine. We hit some issues as well. My
colleague has created an issue in rocksdb [1]. RocksDB doesn't contain an ARM
.so file 

Re: [VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread Jeff Zhang
+1 (non-binding)

Stephan Ewen  wrote on Thu, Aug 1, 2019 at 4:29 PM:

> +1 (binding)
>
> On Thu, Aug 1, 2019 at 9:52 AM Dian Fu  wrote:
>
> > Hi Jincheng,
> >
> > Thanks a lot for driving this.
> > +1 (non-binding).
> >
> > Regards,
> > Dian
> >
> > > On Aug 1, 2019, at 3:24 PM, jincheng sun  wrote:
> > >
> > > Hi all,
> > >
> > > Publishing PyFlink to PyPI is very important for our users. Please
> > > vote on the following proposal:
> > >
> > > 1. Create a PyPI project for the Apache Flink Python API, named
> > > "apache-flink".
> > > 2. Release one binary with the default Scala version, the same as Flink's
> > > default config.
> > > 3. Create an account named "pyflink" as the owner (only the PMC can manage
> > > it). The PMC can add accounts for the Release Manager, but the Release
> > > Manager cannot delete releases.
> > >
> > > [ ] +1, Approve the proposal.
> > > [ ] -1, Disapprove the proposal, because ...
> > >
> > > The vote will be open for at least 72 hours. It is adopted by a simple
> > > majority with a minimum of three positive votes.
> > >
> > > See discussion threads for more details [1].
> > >
> > > Thanks,
> > > Jincheng
> > >
> > > [1]
> > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
> >
> >
>


-- 
Best Regards

Jeff Zhang


Re: [VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread Stephan Ewen
+1 (binding)

On Thu, Aug 1, 2019 at 9:52 AM Dian Fu  wrote:

> Hi Jincheng,
>
> Thanks a lot for driving this.
> +1 (non-binding).
>
> Regards,
> Dian
>
> > On Aug 1, 2019, at 3:24 PM, jincheng sun wrote:
> >
> > Hi all,
> >
> > Publishing PyFlink to PyPI is very important for our users. Please vote
> > on the following proposal:
> >
> > 1. Create a PyPI project for the Apache Flink Python API, named
> > "apache-flink".
> > 2. Release one binary built with the default Scala version, matching
> > Flink's default config.
> > 3. Create an account named "pyflink" as the owner (only the PMC can
> > manage it). The PMC can add accounts for the Release Manager, but the
> > Release Manager cannot delete releases.
> >
> > [ ] +1, Approve the proposal.
> > [ ] -1, Disapprove the proposal, because ...
> >
> > The vote will be open for at least 72 hours. It is adopted by a simple
> > majority with a minimum of three positive votes.
> >
> > See discussion threads for more details [1].
> >
> > Thanks,
> > Jincheng
> >
> > [1]
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
>
>


[jira] [Created] (FLINK-13528) Kafka 0.10/0.11 E2E test fail on Java 11

2019-08-01 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-13528:


 Summary: Kafka 0.10/0.11 E2E test fail on Java 11
 Key: FLINK-13528
 URL: https://issues.apache.org/jira/browse/FLINK-13528
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Kafka, Tests
Reporter: Chesnay Schepler
 Fix For: 1.10.0


The Kafka 0.10/0.11 E2E tests fail on Java 11 with a timeout. Since Kafka only 
added support for Java 11 in 2.1.0, we may have to just disable them.
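
If a per-test guard is possible, here is a minimal sketch (an assumption on my 
part; the E2E tests may well be script-driven rather than JUnit-based, and the 
class name is a stand-in) of skipping on anything newer than Java 8:
{code:scala}
import org.junit.Assume.assumeTrue
import org.junit.Test

class KafkaLegacyClientGuardSketch {

  // "java.specification.version" is "1.8" on Java 8 and "9", "10", "11", ...
  // afterwards, so a "1." prefix identifies Java 8 and older.
  private def isJava8: Boolean =
    System.getProperty("java.specification.version").startsWith("1.")

  @Test
  def testKafka010EndToEnd(): Unit = {
    // Skip rather than fail: the old Kafka clients only gained Java 11
    // support in Kafka 2.1.0.
    assumeTrue("Kafka 0.10/0.11 clients do not support Java 11", isJava8)
    // ... actual test body would run here ...
  }
}
{code}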



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13527) Instable KafkaProducerExactlyOnceITCase due to CheckpointFailureManager

2019-08-01 Thread Yun Tang (JIRA)
Yun Tang created FLINK-13527:


 Summary: Instable KafkaProducerExactlyOnceITCase due to 
CheckpointFailureManager
 Key: FLINK-13527
 URL: https://issues.apache.org/jira/browse/FLINK-13527
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Yun Tang
 Fix For: 1.9.0


[~banmoy] and I ran into this unstable test below:

[https://api.travis-ci.org/v3/job/565270958/log.txt]
 [https://api.travis-ci.com/v3/job/221237628/log.txt]

The root cause is that task {{Source: Custom Source -> Map -> Sink: Unnamed (1/1)}} 
failed due to an expected artificial test failure and then freed its task 
resources, including closing the registry. However, the async checkpoint thread 
in {{SourceStreamTask}} then failed and sent a decline-checkpoint message to the 
JM.
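
For illustration, here is a minimal sketch (my simplification, not the test's 
code; it assumes only that {{CloseableRegistry}} rejects registrations after 
being closed, as the stack trace below shows) of that race:
{code:scala}
import java.io.Closeable

import org.apache.flink.core.fs.CloseableRegistry

// Simplified reproduction of the race: the failing task closes its
// CloseableRegistry while freeing resources, and the async checkpoint
// thread afterwards tries to register its snapshot resources on it.
object RegistryRaceSketch {
  def main(args: Array[String]): Unit = {
    val registry = new CloseableRegistry

    // Task failure path: resources are freed and the registry is closed.
    registry.close()

    // Async checkpoint path: this now throws
    // java.io.IOException: Cannot register Closeable, registry is already closed.
    registry.registerCloseable(new Closeable {
      override def close(): Unit = ()
    })
  }
}
{code}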
 The key logs from the failed run look like:
{code:java}
03:36:46,639 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph   
 - Source: Custom Source -> Map -> Sink: Unnamed (1/1) 
(f45ff068d2c80da22c2a958739ec0c87) switched from RUNNING to FAILED.
java.lang.Exception: Artificial Test Failure
at 
org.apache.flink.streaming.connectors.kafka.testutils.FailingIdentityMapper.map(FailingIdentityMapper.java:79)
at 
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:637)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at 
org.apache.flink.streaming.connectors.kafka.testutils.IntegerSource.run(IntegerSource.java:75)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:172)
03:36:46,637 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator
 - Decline checkpoint 12 by task f45ff068d2c80da22c2a958739ec0c87 of job 
d5b629623731c66f1bac89dec3e87b89 at 03cbfd77-0727-4366-83c4-9aa4923fc817 @ 
localhost (dataPort=-1).
03:36:46,640 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator
 - Discarding checkpoint 12 of job d5b629623731c66f1bac89dec3e87b89.
org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete 
snapshot 12 for operator Source: Custom Source -> Map -> Sink: Unnamed (1/1). 
Failure reason: Checkpoint was declined.
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:431)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1248)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1182)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:853)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:758)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:667)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.triggerCheckpoint(SourceStreamTask.java:147)
at org.apache.flink.runtime.taskmanager.Task$1.run(Task.java:1138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot register Closeable, registry is already 
closed. Closing argument.
at 
org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
at 
org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.(AsyncSnapshotCallable.java:122)
at 
org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.(AsyncSnapshotCallable.java:110)
at 

Re: [VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread Dian Fu
Hi Jincheng,

Thanks a lot for driving this.
+1 (non-binding).

Regards,
Dian

> On Aug 1, 2019, at 3:24 PM, jincheng sun wrote:
> 
> Hi all,
> 
> Publishing PyFlink to PyPI is very important for our users. Please vote
> on the following proposal:
> 
> 1. Create a PyPI project for the Apache Flink Python API, named
> "apache-flink".
> 2. Release one binary built with the default Scala version, matching
> Flink's default config.
> 3. Create an account named "pyflink" as the owner (only the PMC can
> manage it). The PMC can add accounts for the Release Manager, but the
> Release Manager cannot delete releases.
> 
> [ ] +1, Approve the proposal.
> [ ] -1, Disapprove the proposal, because ...
> 
> The vote will be open for at least 72 hours. It is adopted by a simple
> majority with a minimum of three positive votes.
> 
> See discussion threads for more details [1].
> 
> Thanks,
> Jincheng
> 
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html



Re: [DISCUSS] Publish the PyFlink into PyPI

2019-08-01 Thread jincheng sun
Thanks for your confirmation, Till!
Publishing PyFlink to PyPI is very important for our users, so I have
initiated a voting thread.

Best,
Jincheng

On Mon, Jul 29, 2019 at 3:01 PM Till Rohrmann wrote:

> Sounds good to me. Thanks for driving this discussion.
>
> Cheers,
> Till
>
> On Mon, Jul 29, 2019 at 9:24 AM jincheng sun wrote:
>
> > Yes Till, I think you are correct that we should make sure that the
> > published Flink Python API cannot be arbitrarily deleted.
> >
> > So, it seems that our current consensus is:
> >
> > 1. Should we re-publish PyFlink to PyPI --> YES
> > 2. PyPI project name ---> apache-flink
> > 3. How to handle Scala_2.11 and Scala_2.12 ---> We only release one
> > binary with the default Scala version, matching Flink's default config.
> > 4. PyPI account for release --> Create an account such as 'pyflink' as
> > the owner (only the PMC can manage it) and add the release managers'
> > accounts as maintainers of the project. Release managers publish the
> > package to PyPI using their own accounts but cannot delete releases.
> >
> > So, if there are no other comments, I think we should initiate a voting
> > thread.
> >
> > What do you think?
> >
> > Best, Jincheng
> >
> >
> > On Wed, Jul 24, 2019 at 1:17 PM Till Rohrmann wrote:
> >
> > > Sorry for chiming in so late. I would be in favor of option #2.
> > >
> > > I guess that the PMC would need to give the credentials to the release
> > > manager for option #1. Hence, the PMC could also add the release
> > > manager as a maintainer, which makes sure that only the PMC can delete
> > > artifacts.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jul 24, 2019 at 12:33 PM jincheng sun <sunjincheng...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thanks for all of your replies!
> > > >
> > > > Hi Stephan, thanks for the reply and for spelling out the details we
> > > > need to pay attention to, such as the README and trademark compliance.
> > > > Regarding the PyPI account for release, #1 carries the risk that our
> > > > release package can be deleted by anyone who knows the password of the
> > > > account, and in that case the PMC would have no means to correct
> > > > problems. So, I think #2 is pretty safe for the Flink community.
> > > >
> > > > Hi Jeff, thanks for sharing your thoughts. The Python API is just a
> > > > language entry point. I think the choice of which binaries the release
> > > > contains should be consistent with the Java release policy. So,
> > > > currently we do not add the Hadoop or connector JARs to the release
> > > > package.
> > > >
> > > > Hi Chesnay, agreed that we should ship the usual binary in the future
> > > > if the Java side has already made that decision.
> > > >
> > > > So, our current consensus is:
> > > > 1. Should we re-publish PyFlink to PyPI --> YES
> > > > 2. PyPI project name ---> apache-flink
> > > > 3. How to handle Scala_2.11 and Scala_2.12 ---> We only release one
> > > > binary with the default Scala version, matching Flink's default config.
> > > >
> > > > We still need to discuss how to manage the PyPI account for releases:
> > > >
> > > > 1) Create an account such as 'pyflink' as the owner, share it with all
> > > > the release managers, and then release managers can publish the package
> > > > to PyPI using this account.
> > > > 2) Create an account such as 'pyflink' as the owner (only the PMC can
> > > > manage it) and add the release managers' accounts as maintainers of the
> > > > project. Release managers publish the package to PyPI using their own
> > > > accounts.
> > > >
> > > > Stephan likes #1 but wants the PMC to be able to correct problems
> > > > (which sounds like #2). Can you confirm that, @Stephan?
> > > > Chesnay and I prefer #2.
> > > >
> > > > Best, Jincheng
> > > >
> > > > On Wed, Jul 24, 2019 at 3:57 PM Chesnay Schepler wrote:
> > > >
> > > > > If we ship a binary, we should ship the binary we usually ship,
> > > > > not some highly customized version.
> > > > >
> > > > > On 24/07/2019 05:19, Dian Fu wrote:
> > > > > > Hi Stephan & Jeff,
> > > > > >
> > > > > > Thanks a lot for sharing your thoughts!
> > > > > >
> > > > > > Regarding the bundled jars, currently only the jars in the Flink
> > > > > > binary distribution are packaged in the pyflink package. It may be
> > > > > > a good idea to also bundle other jars such as
> > > > > > flink-hadoop-compatibility. We may also need to consider whether to
> > > > > > bundle the format jars such as flink-avro, flink-json, and
> > > > > > flink-csv, and the connector jars such as flink-connector-kafka,
> > > > > > etc.
> > > > > >
> > > > > > If FLINK_HOME is set, the binary distribution specified by
> > > > > > FLINK_HOME will be used instead.
> > > > > >
> > > > > > Regards,
> > > > > > Dian
> > > > > >
> > > > > >> On Jul 24, 2019, at 9:47 AM, Jeff Zhang wrote:
> > > > > >>
> > > > > >> +1 for publishing pyflink to pypi.
> > > > > >>
> > > > > >> Regarding including jar, I just want to make sure which 

[jira] [Created] (FLINK-13526) Switching to a non existing database crashes sql-client

2019-08-01 Thread Rui Li (JIRA)
Rui Li created FLINK-13526:
--

 Summary: Switching to a non existing database crashes sql-client
 Key: FLINK-13526
 URL: https://issues.apache.org/jira/browse/FLINK-13526
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Reporter: Rui Li


sql-client crashes if the user tries to switch to a non-existing DB:
{noformat}
Exception in thread "main" org.apache.flink.table.client.SqlClientException: 
Unexpected exception. This is a bug. Please consider filing an issue.
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:206)
Caused by: org.apache.flink.table.catalog.exceptions.CatalogException: A 
database with name [foo] does not exist in the catalog: [myhive].
at 
org.apache.flink.table.catalog.CatalogManager.setCurrentDatabase(CatalogManager.java:286)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.useDatabase(TableEnvironmentImpl.java:398)
at 
org.apache.flink.table.client.gateway.local.LocalExecutor.lambda$useDatabase$5(LocalExecutor.java:258)
at 
org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:216)
at 
org.apache.flink.table.client.gateway.local.LocalExecutor.useDatabase(LocalExecutor.java:256)
at 
org.apache.flink.table.client.cli.CliClient.callUseDatabase(CliClient.java:434)
at 
org.apache.flink.table.client.cli.CliClient.callCommand(CliClient.java:282)
at java.util.Optional.ifPresent(Optional.java:159)
at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:200)
at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:123)
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:105)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
{noformat}
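
A minimal sketch of one possible handling (hypothetical; the object name, the 
stand-in {{useDatabase}}, and the error-reporting style are my stand-ins, not 
the actual fix) that catches the exception at the command layer so the client 
reports an error and keeps running:
{code:scala}
import org.apache.flink.table.catalog.exceptions.CatalogException

// Self-contained stand-in for the CLI's command dispatch: catch the
// CatalogException from the "USE <db>" path and report it as a CLI error
// instead of letting it escape through main().
object UseDatabaseSketch {

  // Stand-in for the useDatabase call, which throws CatalogException for an
  // unknown database (see the stack trace above).
  private def useDatabase(name: String): Unit =
    throw new CatalogException(
      s"A database with name [$name] does not exist in the catalog.")

  def callUseDatabase(name: String): Unit =
    try useDatabase(name)
    catch {
      case e: CatalogException =>
        Console.err.println(s"[ERROR] ${e.getMessage}")
    }

  def main(args: Array[String]): Unit =
    callUseDatabase("foo") // prints an error; the client keeps running
}
{code}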



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread jincheng sun
Hi all,

Publishing PyFlink to PyPI is very important for our users. Please vote
on the following proposal:

1. Create a PyPI project for the Apache Flink Python API, named
"apache-flink".
2. Release one binary built with the default Scala version, matching
Flink's default config.
3. Create an account named "pyflink" as the owner (only the PMC can
manage it). The PMC can add accounts for the Release Manager, but the
Release Manager cannot delete releases.

[ ] +1, Approve the proposal.
[ ] -1, Disapprove the proposal, because ...

The vote will be open for at least 72 hours. It is adopted by a simple
majority with a minimum of three positive votes.

See discussion threads for more details [1].

Thanks,
Jincheng

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html


[jira] [Created] (FLINK-13525) Improve Expression validation test

2019-08-01 Thread Jark Wu (JIRA)
Jark Wu created FLINK-13525:
---

 Summary: Improve Expression validation test
 Key: FLINK-13525
 URL: https://issues.apache.org/jira/browse/FLINK-13525
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: Jark Wu
 Fix For: 1.10.0


Currently, the expression validation tests under 
{{org.apache.flink.table.expressions.validation}} use the following pattern 
to verify that an exception is thrown:


{code:scala}
@Test
def testWrongKeyType(): Unit = {
  testAllApis('f2.at(12), "f2.at(12)", "f2[12]", "FAIL")
}
{code}


However, this only verifies that the first expression {{'f2.at(12)}} throws 
the expected exception; all the other expressions are ignored. We should 
improve this, and it would also be nice to verify the expected exception 
message.
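
For example, a minimal sketch (a hypothetical helper, independent of the 
existing test base; the thrown {{RuntimeException}} is a stand-in for actually 
evaluating an expression) of a pattern that asserts each variant separately 
and also checks the exception message:
{code:scala}
import org.junit.Assert.{assertTrue, fail}
import org.junit.Test

class ExpressionValidationSketch {

  // Hypothetical helper: evaluates one expression variant and asserts that
  // it fails with an exception whose message contains the expected fragment.
  private def assertValidationFails(expectedMsg: String)(variant: => Unit): Unit =
    try {
      variant
      fail(s"Expected a failure containing: $expectedMsg")
    } catch {
      case e: Exception =>
        assertTrue(s"Unexpected message: ${e.getMessage}",
          e.getMessage != null && e.getMessage.contains(expectedMsg))
    }

  @Test
  def testWrongKeyType(): Unit = {
    // Each API variant (Table API, Java string, SQL) would get its own
    // assertion instead of sharing a single "FAIL" marker. The throw below
    // is a stand-in for evaluating one variant, e.g. 'f2.at(12).
    assertValidationFails("key 12")(throw new RuntimeException("Invalid key 12"))
  }
}
{code}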



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13524) Typo in Builder method name from Elasticsearch example

2019-08-01 Thread Alberto Romero (JIRA)
Alberto Romero created FLINK-13524:
--

 Summary: Typo in Builder method name from Elasticsearch example
 Key: FLINK-13524
 URL: https://issues.apache.org/jira/browse/FLINK-13524
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.8.1
Reporter: Alberto Romero


The Builder method name from class ElasticsearchSink has a typo (a missing 
'd') in the Elasticsearch connector documentation for Scala (ES version 6.x).
{code:java}
new ElasticsearchSink.Builer[String]({code}
should be:
{code:java}
new ElasticsearchSink.Builder[String]({code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: Looking for reviewer: FLINK-13127

2019-08-01 Thread David Morávek
Hi Till, the PR should be ready ;) Can you please do the final check?

Best,
D.

On Tue, Jul 23, 2019 at 3:01 PM Till Rohrmann  wrote:

> I've assigned the issue to you, David. I think this feature makes sense. The
> only question I have is why we need to sort the values. But let's discuss
> the issue on the Github PR.
>
> @Tison and @Xintong, let me know once the PR is in mergeable state.
>
> Cheers,
> Till
>
> On Tue, Jul 23, 2019 at 4:14 AM Xintong Song wrote:
>
> > David,
> >
> > Thank you for opening this PR. I also left a few comments.
> >
> > And I think we need a committer to assign this jira ticket to David.
> Maybe
> > Till or any other committer could look into this?
> >
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Mon, Jul 22, 2019 at 8:37 PM Zili Chen  wrote:
> >
> > > Hi David,
> > >
> > > Just reviewed and left several comments. Thanks for your contribution!
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > On Mon, Jul 22, 2019 at 5:59 PM David Morávek wrote:
> > >
> > > > Hi, I've prepared a small patch related to Yarn deployment. Would
> > > > anyone please have time to take a look at it? Thanks
> > > >
> > > > https://github.com/apache/flink/pull/9022
> > > >
> > > > Regards,
> > > > D.
> > > >
> > >
> >
>