[jira] [Created] (FLINK-22735) HiveTableSourceITCase.testStreamPartitionReadByCreateTime failed because of timeout

2021-05-20 Thread Guowei Ma (Jira)
Guowei Ma created FLINK-22735:
-

 Summary: HiveTableSourceITCase.testStreamPartitionReadByCreateTime 
failed because of timeout 
 Key: FLINK-22735
 URL: https://issues.apache.org/jira/browse/FLINK-22735
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.14.0
Reporter: Guowei Ma


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18205&view=logs&j=245e1f2e-ba5b-5570-d689-25ae21e5302f&t=e7f339b2-a7c3-57d9-00af-3712d4b15354&l=23726


{code:java}
May 20 22:22:26 [ERROR] Tests run: 19, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 225.004 s <<< FAILURE! - in 
org.apache.flink.connectors.hive.HiveTableSourceITCase
May 20 22:22:26 [ERROR] 
testStreamPartitionReadByCreateTime(org.apache.flink.connectors.hive.HiveTableSourceITCase)
  Time elapsed: 120.182 s  <<< ERROR!
May 20 22:22:26 org.junit.runners.model.TestTimedOutException: test timed out 
after 12 milliseconds
May 20 22:22:26 at java.lang.Thread.sleep(Native Method)
May 20 22:22:26 at 
org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sleepBeforeRetry(CollectResultFetcher.java:237)
May 20 22:22:26 at 
org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:113)
May 20 22:22:26 at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:106)
May 20 22:22:26 at 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80)
May 20 22:22:26 at 
org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:370)
May 20 22:22:26 at 
org.apache.flink.connectors.hive.HiveTableSourceITCase.fetchRows(HiveTableSourceITCase.java:712)
May 20 22:22:26 at 
org.apache.flink.connectors.hive.HiveTableSourceITCase.testStreamPartitionReadByCreateTime(HiveTableSourceITCase.java:652)
May 20 22:22:26 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
May 20 22:22:26 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
May 20 22:22:26 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
May 20 22:22:26 at java.lang.reflect.Method.invoke(Method.java:498)
May 20 22:22:26 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
May 20 22:22:26 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
May 20 22:22:26 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
May 20 22:22:26 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
May 20 22:22:26 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
May 20 22:22:26 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
May 20 22:22:26 at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
May 20 22:22:26 at java.lang.Thread.run(Thread.java:748)

{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22734) ExtractionUtils#getClassReader causes "too many open files" error

2021-05-20 Thread zhiping lin (Jira)
zhiping lin created FLINK-22734:
---

 Summary: ExtractionUtils#getClassReader causes "too many open 
files" error
 Key: FLINK-22734
 URL: https://issues.apache.org/jira/browse/FLINK-22734
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.12.2
Reporter: zhiping lin


When we obtain a ClassReader in ExtractionUtils:

{code:java}
private static ClassReader getClassReader(Class cls) {
    final String className = cls.getName().replaceFirst("^.*\\.", "") + ".class";
    try {
        return new ClassReader(cls.getResourceAsStream(className));
    } catch (IOException e) {
        throw new IllegalStateException("Could not instantiate ClassReader.", e);
    }
}
{code}
we open an InputStream via "cls.getResourceAsStream(className)" and pass it as a 
constructor argument to ClassReader without ever closing it. ClassReader's 
constructor only reads from the stream it is given; it does not close the 
stream when it has finished reading the file. This leads to a file-descriptor 
leak, surfacing as a "too many open files" error, when we parse a lot of 
independent SQL statements with different environments in one JVM (our case is 
that we want to extract the table/data lineage of our Flink jobs).
{code:java}
lsof -p 8011 | wc -l 
10534
{code}
After checking all the open files, I found they are opened when the UDF jars 
are loaded in each loop that builds the environment. I then tested this method 
with try-with-resources and it works well. So I wonder if it is worth changing 
it a little bit, like this:

 
{code:java}
private static ClassReader getClassReader(Class cls) {
    final String className = cls.getName().replaceFirst("^.*\\.", "") + ".class";
    try (InputStream inputStream = cls.getResourceAsStream(className)) {
        return new ClassReader(inputStream);
    } catch (IOException e) {
        throw new IllegalStateException("Could not instantiate ClassReader.", e);
    }
}
{code}
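The effect of closing (or not closing) the stream can be illustrated with a small, self-contained sketch; the TrackingStream helper and consume method below are hypothetical stand-ins for the resource and for ClassReader's read-only constructor, not Flink code:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamCloseDemo {
    // Records whether close() was called on the stream.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;
        TrackingStream(byte[] buf) { super(buf); }
        @Override public void close() throws IOException { closed = true; super.close(); }
    }

    // Stand-in for ClassReader: reads the stream fully but never closes it.
    static void consume(InputStream in) throws IOException {
        while (in.read() != -1) { /* read to EOF */ }
    }

    public static void main(String[] args) throws IOException {
        // Leaky pattern, as in the current getClassReader: the stream is never closed.
        TrackingStream leaky = new TrackingStream(new byte[] {1, 2, 3});
        consume(leaky);
        System.out.println("leaky closed: " + leaky.closed);

        // try-with-resources closes the stream even though consume() does not.
        TrackingStream safe = new TrackingStream(new byte[] {1, 2, 3});
        try (InputStream in = safe) {
            consume(in);
        }
        System.out.println("safe closed: " + safe.closed);
    }
}
```

On Linux, the leak itself can be observed by watching `ls /proc/<pid>/fd | wc -l` grow while the loop runs, matching the lsof output above.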
 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-05-20 Thread Thomas Weise
Hi Nicholas,

Thanks for taking a look at the PR!

1. Regarding switching mechanism:

There has been previous discussion in this thread regarding the pros
and cons of how the switching can be exposed to the user.

With fixed start positions, no special switching interface to transfer
information between enumerators is required. Sources are configured as
they would be when used standalone and just plugged into HybridSource.
I expect that to be a common use case. You can find an example for
this in the ITCase:

https://github.com/apache/flink/pull/15924/files#diff-fe1407d135d7b7b3a72aeb4471ab53ccd2665a58ff0129a83db1ec19cea06f1bR101

For dynamic start position, the checkpoint state is used to transfer
information from old to new enumerator. An example for that can be
found here:

https://github.com/apache/flink/pull/15924/files#diff-fe1407d135d7b7b3a72aeb4471ab53ccd2665a58ff0129a83db1ec19cea06f1bR112-R136

That may look verbose, but the code to convert from one state to
another can be factored out into a utility and the function becomes a
one-liner.

For common sources like files and Kafka we can potentially (later)
implement the conversion logic as part of the respective connector's
checkpoint and split classes.

I hope that with the PR up for review, we can soon reach a conclusion
on how we want to expose this to the user.

Following is an example for Files -> Files -> Kafka that I'm using for
e2e testing. It exercises both ways of setting the start position.

https://gist.github.com/tweise/3139d66461e87986f6eddc70ff06ef9a


2. Regarding the events used to implement the actual switch between
enumerator and readers: I updated the PR with javadoc to clarify the
intent. Please let me know if that helps or let's continue to discuss
those details on the PR?


Thanks,
Thomas


On Mon, May 17, 2021 at 1:03 AM Nicholas Jiang  wrote:
>
> Hi Thomas,
>
> Sorry for the late reply about your POC. I have reviewed the base abstract
> implementation of your pull request:
> https://github.com/apache/flink/pull/15924. IMO, for the switching
> mechanism, this level of abstraction is not concise enough, which doesn't
> make connector contribution easier. In theory, it is necessary to introduce
> a set of interfaces to support the switching mechanism. The SwitchableSource
> and SwitchableSplitEnumerator interfaces are needed for connector
> expansibility.
> In other words, the whole switching process in the above-mentioned PR is
> different from the one described in FLIP-150. In that implementation, the
> switch of source reading is executed after receiving the SwitchSourceEvent,
> which could happen before the SourceReaderFinishEvent is sent. The timing of
> this switch could be discussed here.
>@Stephan @Becket, if you are available, please help to review the
> abstract implementation, and compare with the interfaces mentioned in
> FLIP-150.
>
> Thanks,
> Nicholas Jiang
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/


[jira] [Created] (FLINK-22733) Type mismatch thrown in KeyedStream.union in Python DataStream API

2021-05-20 Thread Dian Fu (Jira)
Dian Fu created FLINK-22733:
---

 Summary: Type mismatch thrown in KeyedStream.union in Python 
DataStream API
 Key: FLINK-22733
 URL: https://issues.apache.org/jira/browse/FLINK-22733
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.13.0, 1.12.0
Reporter: Dian Fu
 Fix For: 1.13.1, 1.12.5


See 
[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/PyFlink-DataStream-union-type-mismatch-td43855.html]
 for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22732) Restrict ALTER TABLE from setting empty table options

2021-05-20 Thread Jane Chan (Jira)
Jane Chan created FLINK-22732:
-

 Summary: Restrict ALTER TABLE from setting empty table options
 Key: FLINK-22732
 URL: https://issues.apache.org/jira/browse/FLINK-22732
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Jane Chan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Watermark propagation with Sink API

2021-05-20 Thread Eron Wright
Filed FLIP-167: Watermarks for Sink API:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-167%3A+Watermarks+for+Sink+API

I'd like to call a vote next week, is that reasonable?


On Wed, May 19, 2021 at 6:28 PM Zhou, Brian  wrote:

> Hi Arvid and Eron,
>
> Thanks for the discussion and I read through Eron's pull request and I
> think this can benefit Pravega Flink connector as well.
>
> Here is some background. Pravega has had a watermark concept in its event
> streams for about two years; here is a blog introduction[1] to Pravega
> watermarks.
> The Pravega Flink connector also added a watermark integration last year: we
> wanted to propagate the Flink watermark to Pravega in the SinkFunction, and
> at that time we just used the existing Flink API, keeping the last watermark
> in memory and checking whether it changed for each event[2], which is not
> efficient. With such a new interface, we can manage watermark propagation
> much more easily.
>
> [1] https://pravega.io/blog/2019/11/08/pravega-watermarking-support/
> [2]
> https://github.com/pravega/flink-connectors/blob/master/src/main/java/io/pravega/connectors/flink/FlinkPravegaWriter.java#L465
>
> -Original Message-
> From: Arvid Heise 
> Sent: Wednesday, May 19, 2021 16:06
> To: dev
> Subject: Re: [DISCUSS] Watermark propagation with Sink API
>
>
> [EXTERNAL EMAIL]
>
> Hi Eron,
>
> Thanks for pushing that topic. I can now see that the benefit is even
> bigger than I initially thought. So it's worthwhile anyways to include that.
>
> I also briefly thought about exposing watermarks to all UDFs, but here I
> really have an issue to see specific use cases. Could you maybe take a few
> minutes to think about it as well? I could only see someone misusing Async
> IO as a sink where a real sink would be more appropriate. In general, if
> there is not a clear use case, we shouldn't add the functionality as it's
> just increased maintenance for no value.
>
> If we stick to the plan, I think your PR is already in a good shape. We
> need to create a FLIP for it though, since it changes Public interfaces
> [1]. I was initially not convinced that we should also change the old
> SinkFunction interface, but seeing how little the change is, I wouldn't
> mind at all to increase consistency. Only when we wrote the FLIP and
> approved it (which should be minimal and fast), we should actually look at
> the PR ;).
>
> The only thing which I would improve is the name of the function.
> processWatermark sounds as if the sink implementer really needs to
> implement it (as you would need to do it on a custom operator). I would
> make them symmetric to the record writing/invoking method (e.g.
> writeWatermark and invokeWatermark).
>
> As a follow-up PR, we should then migrate KafkaShuffle to the new API. But
> that's something I can do.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>
> On Wed, May 19, 2021 at 3:34 AM Eron Wright wrote:
>
> > Update: opened an issue and a PR.
> >
> > https://issues.apache.org/jira/browse/FLINK-22700
> > https://github.com/apache/flink/pull/15950
> >
> >
> > On Tue, May 18, 2021 at 10:03 AM Eron Wright 
> > wrote:
> >
> > > Thanks Arvid and David for sharing your ideas on this subject.  I'm
> > > glad to hear that you're seeing use cases for watermark propagation
> > > via an enhanced sink interface.
> > >
> > > As you've guessed, my interest is in Pulsar and am exploring some
> > > options for brokering watermarks across stream processing pipelines.
> > > I think
> > Arvid
> > > is speaking to a high-fidelity solution where the difference between
> > intra-
> > > and inter-pipeline flow is eliminated.  My goal is more limited; I
> > > want
> > to
> > > write the watermark that arrives at the sink to Pulsar.  Simply
> > > imagine that Pulsar has native support for watermarking in its
> > > producer/consumer API, and we'll leave the details to another forum.
> > >
> > > David, I like your invariant.  I see lateness as stemming from the
> > problem
> > > domain and from system dynamics (e.g. scheduling, batching, lag).
> > > When
> > one
> > > depends on order-of-observation to generate watermarks, the app may
> > become
> > > unduly sensitive to dynamics which bear on order-of-observation.  My
> > > goal is to factor out the system dynamics from lateness determination.
> > >
> > > Arvid, to be most valuable (at least for my purposes) the
> > > enhancement is needed on SinkFunction.  This will allow us to easily
> > > evolve the existing Pulsar connector.
> > >
> > > Next step, I will 

[jira] [Created] (FLINK-22731) Environment variables are not passed to TM (YARN)

2021-05-20 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-22731:
-

 Summary: Environment variables are not passed to TM (YARN)
 Key: FLINK-22731
 URL: https://issues.apache.org/jira/browse/FLINK-22731
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Reporter: Roman Khachatryan


Reported in 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Issues-with-forwarding-environment-variables-td43859.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Component labels in PR/commit messages

2021-05-20 Thread Roman Khachatryan
Thanks for raising this issue.

I agree with the above points.
One simple argument against labels is that they consume space in the
commit messages.

+1 to make labels optional

Regards,
Roman

On Thu, May 20, 2021 at 9:31 AM Robert Metzger  wrote:
>
> +1 to Till's proposal to update the wording.
>
> Regarding c) The guide [1] actually mentions a good heuristic for coming up
> with a label that is also suitable for newcomers: The maven module name
> where most of the changes are.
>
> [1]
> https://flink.apache.org/contributing/code-style-and-quality-pull-requests.html#commit-naming-conventions
>
> On Wed, May 19, 2021 at 11:14 AM Till Rohrmann  wrote:
>
> > I think a big problem with the component labels is that there is
> >
> > a) no defined set of labels
> > b) no way to enforce the usage of them
> > c) no easy way to figure out which label to use
> >
> > Due to these problems they are used very inconsistently in the project.
> >
> > I do agree with Arvid's observation that they are less and less often used
> > in new commits. Given this, we could think about adjusting our guidelines
> > to better reflect reality and make them "optional"/"nice-to-have" for
> > example.
> >
> > Cheers,
> > Till
> >
> > On Wed, May 19, 2021 at 10:52 AM Chesnay Schepler 
> > wrote:
> >
> > > For commit messages the labels are useful mostly when scanning the
> > > commit history, like searching for some commit that could've caused
> > > something /without knowing where that change was made/, because it
> > > enables you to quickly filter out commits by their label instead of
> > > having to read the entire title.
> > >
> > > I think in particular there is value in labeling documentation/build
> > > system changes; it allows me to worry less about the phrasing because I
> > > can assume the reader to have some context. For example,
> > >
> > > "[FLINK-X] Remove deprecated methods" vs "[FLINK-X][docs] Remove
> > > deprecated methods".
> > >
> > > You could of course argue to use "[FLINK-X] Remove deprecated methods
> > > from docs", but that's just a worse version of labeling.
> > >
> > >
> > > On 5/19/2021 10:31 AM, Arvid Heise wrote:
> > > > Dear devs,
> > > >
> > > > In the last couple of weeks, I have noticed that we are slacking a bit
> > on
> > > > the components in PR/commit messages. I'd like to gather some feedback
> > if
> > > > we still want to include them and if so, how we can improve the process
> > > of
> > > > finding the correct label.
> > > >
> > > > My personal opinion: So far, I have usually added the component because
> > > > it's in the coding guidelines. I have not really understood the
> > benefit.
> > > It
> > > > might be coming from a time where git support in IDE was lacking and it
> > > was
> > > > necessary to maintain an overview. I also have a hard time to find the
> > > > correct component at times; I often just repeat the component that I
> > find
> > > > in a blame. Nevertheless, I value consistency over personal taste and
> > > would
> > > > stick to the plan (and guide contributions towards it) if other devs
> > > > (especially committers) do it as well. But this has been causing some
> > > > friction in a couple of reviews for me.
> > > >
> > > > Could you please give your opinion on this matter? I think it's
> > important
> > > > to note that if long-term committers are not following it, it's really
> > > hard
> > > > for newer devs to follow that (git blame not helping in choosing the
> > > > component). Then we should remove it from the guidelines to make
> > > > contributions easier.
> > > >
> > > > Thanks
> > > >
> > > > Arvid
> > > >
> > >
> > >
> >


[jira] [Created] (FLINK-22730) Lookup join condition with CURRENT_DATE fails to filter records

2021-05-20 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-22730:
---

 Summary: Lookup join condition with CURRENT_DATE fails to filter 
records
 Key: FLINK-22730
 URL: https://issues.apache.org/jira/browse/FLINK-22730
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.13.0, 1.12.0
Reporter: Caizhi Weng


Add the following test case to 
org.apache.flink.table.api.TableEnvironmentITCase to reproduce this bug.

{code:scala}
@Test
def myTest(): Unit = {
  val id1 = TestValuesTableFactory.registerData(
    Seq(Row.of("abc", LocalDateTime.of(2000, 1, 1, 0, 0))))
  val ddl1 =
s"""
   |CREATE TABLE Ta (
   |  id VARCHAR,
   |  ts TIMESTAMP,
   |  proc AS PROCTIME()
   |) WITH (
   |  'connector' = 'values',
   |  'data-id' = '$id1',
   |  'bounded' = 'true'
   |)
   |""".stripMargin
  tEnv.executeSql(ddl1)

  val id2 = TestValuesTableFactory.registerData(
    Seq(Row.of("abc", LocalDateTime.of(2000, 1, 2, 0, 0))))
  val ddl2 =
s"""
   |CREATE TABLE Tb (
   |  id VARCHAR,
   |  ts TIMESTAMP
   |) WITH (
   |  'connector' = 'values',
   |  'data-id' = '$id2',
   |  'bounded' = 'true'
   |)
   |""".stripMargin
  tEnv.executeSql(ddl2)

  val it = tEnv.executeSql(
"""
  |SELECT * FROM Ta AS t1
  |INNER JOIN Tb FOR SYSTEM_TIME AS OF t1.proc AS t2
  |ON t1.id = t2.id
  |WHERE CAST(coalesce(t1.ts, t2.ts) AS VARCHAR) >= 
CONCAT(CAST(CURRENT_DATE AS VARCHAR), ' 00:00:00')
  |""".stripMargin).collect()

  while (it.hasNext) {
System.out.println(it.next())
  }
}
{code}

The result is
{code}
+I[abc, 2000-01-01T00:00, 2021-05-20T14:30:47.735Z, abc, 2000-01-02T00:00]
{code}

which is obviously incorrect.

The generated operator is as follows

{code:java}
public class JoinTableFuncCollector$22 extends 
org.apache.flink.table.runtime.collector.TableFunctionCollector {

org.apache.flink.table.data.GenericRowData out = new 
org.apache.flink.table.data.GenericRowData(2);
org.apache.flink.table.data.utils.JoinedRowData joinedRow$9 = new 
org.apache.flink.table.data.utils.JoinedRowData();
private static final java.util.TimeZone timeZone =
java.util.TimeZone.getTimeZone("Asia/Shanghai");
private org.apache.flink.table.data.TimestampData timestamp;
private org.apache.flink.table.data.TimestampData localTimestamp;
private int date;

private final org.apache.flink.table.data.binary.BinaryStringData str$17 = 
org.apache.flink.table.data.binary.BinaryStringData.fromString(" 00:00:00");


public JoinTableFuncCollector$22(Object[] references) throws Exception {

}

@Override
public void open(org.apache.flink.configuration.Configuration parameters) 
throws Exception {

}

@Override
public void collect(Object record) throws Exception {
org.apache.flink.table.data.RowData in1 = 
(org.apache.flink.table.data.RowData) getInput();
org.apache.flink.table.data.RowData in2 = 
(org.apache.flink.table.data.RowData) record;
org.apache.flink.table.data.binary.BinaryStringData field$7;
boolean isNull$7;
org.apache.flink.table.data.TimestampData field$8;
boolean isNull$8;
org.apache.flink.table.data.TimestampData field$10;
boolean isNull$10;
boolean isNull$13;
org.apache.flink.table.data.binary.BinaryStringData result$14;
boolean isNull$15;
org.apache.flink.table.data.binary.BinaryStringData result$16;
boolean isNull$18;
org.apache.flink.table.data.binary.BinaryStringData result$19;
boolean isNull$20;
boolean result$21;
isNull$8 = in2.isNullAt(1);
field$8 = null;
if (!isNull$8) {
field$8 = in2.getTimestamp(1, 6);
}
isNull$7 = in2.isNullAt(0);
field$7 = 
org.apache.flink.table.data.binary.BinaryStringData.EMPTY_UTF8;
if (!isNull$7) {
field$7 = ((org.apache.flink.table.data.binary.BinaryStringData) 
in2.getString(0));
}
isNull$10 = in1.isNullAt(1);
field$10 = null;
if (!isNull$10) {
field$10 = in1.getTimestamp(1, 6);
}



boolean result$11 = !isNull$10;
org.apache.flink.table.data.TimestampData result$12 = null;
boolean isNull$12;
if (result$11) {

isNull$12 = isNull$10;
if (!isNull$12) {
result$12 = field$10;
}
}
else {

isNull$12 = isNull$8;
if (!isNull$12) {
result$12 = field$8;
}
}
isNull$13 = isNull$12;
result$14 = 
org.apache.flink.table.data.binary.BinaryStringData.EMPTY_UTF8;
if (!isNull$13) {

result$14 = 
org.apache.flink.table.data.binary.BinaryStringData.fromString(org.apache.flink.table.runtime.funct

Re: [Statefun] Truncated Messages in Python workers

2021-05-20 Thread Stephan Ewen
Thanks for reporting this, it looks indeed like a potential bug.

I filed this Jira for it: https://issues.apache.org/jira/browse/FLINK-22729

Could you share (here or in Jira) what the stack on the Python worker side
is (for example, which HTTP server)? Do you know if the message truncation
happens reliably at a certain message size?


On Wed, May 19, 2021 at 2:12 PM Jan Brusch 
wrote:

> Hi,
>
> recently we started seeing the following faulty behaviour in the Flink
> Stateful Functions HTTP communication towards external Python workers.
> This is only occurring when the system is under heavy load.
>
> The Java Application will send HTTP Messages to an external Python
> Function but the external Function fails to parse the message with a
> "Truncated Message Error". Printouts show that the truncated message
> looks as follows:
>
> --
>
> 
>
> my.protobuf.MyClass: 
>
> my.protobuf.MyClass: 
>
> my.protobuf.MyClass: 
>
> my.protobuf.MyClass: 
> --
>
> Which leads to the following Error in the Python worker:
>
> --
>
> Error Parsing Message: Truncated Message
>
> --
>
> Either the sender or the receiver (or something in between) seems to be
> truncating some (not all) messages at some random point in the payload.
> The source code in both Flink SDKs looks to be correct. We temporarily
> solved this by setting the "maxNumBatchRequests" parameter in the
> external function definition really low. But this is not an ideal
> solution as we believe this adds considerable communication overhead
> between the Java and the Python Functions.
>
> The Stateful Function version is 2.2.2, java8. The Java App as well as
> the external Python workers are deployed in the same kubernetes cluster.
>
>
> Has anyone ever seen this problem before?
>
> Best regards
>
> Jan
>
> --
> neuland  – Büro für Informatik GmbH
> Konsul-Smidt-Str. 8g, 28217 Bremen
>
> Telefon (0421) 380107 57
> Fax (0421) 380107 99
> https://www.neuland-bfi.de
>
> https://twitter.com/neuland
> https://facebook.com/neulandbfi
> https://xing.com/company/neulandbfi
>
>
> Geschäftsführer: Thomas Gebauer, Jan Zander
> Registergericht: Amtsgericht Bremen, HRB 23395 HB
> USt-ID. DE 246585501
>
>


[jira] [Created] (FLINK-22729) Truncated Messages in Python workers

2021-05-20 Thread Stephan Ewen (Jira)
Stephan Ewen created FLINK-22729:


 Summary: Truncated Messages in Python workers
 Key: FLINK-22729
 URL: https://issues.apache.org/jira/browse/FLINK-22729
 Project: Flink
  Issue Type: Bug
  Components: Stateful Functions
Affects Versions: statefun-2.2.2
 Environment: The Stateful Function version is 2.2.2, java8. The Java 
App as well as
the external Python workers are deployed in the same kubernetes cluster.
Reporter: Stephan Ewen
 Fix For: statefun-3.1.0


Recently we started seeing the following faulty behavior in the Flink
Stateful Functions HTTP communication towards external Python workers.
This is only occurring when the system is under heavy load.

The Java Application will send HTTP Messages to an external Python
Function but the external Function fails to parse the message with a
"Truncated Message Error". Printouts show that the truncated message
looks as follows:

{code}
my.protobuf.MyClass: 
my.protobuf.MyClass: 
my.protobuf.MyClass: 
my.protobuf.MyClass: 
{code}

Original report:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Statefun-Truncated-Messages-in-Python-workers-td43831.html





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Issue with using siddhi extension function with flink

2021-05-20 Thread Dipanjan Mazumder
Hi,

   I am trying to integrate Siddhi with Flink. When I deploy a job that uses 
Siddhi extension functions to the Flink cluster, it is not able to find those 
libraries at runtime, so I had to explicitly put them into the /opt/flink/lib 
folder for the jobmanager and taskmanager. The fat jar of the Flink job 
application contains those libraries, but the job still cannot resolve the 
extension functions at runtime, and putting them into the lib folder is not a 
feasible choice. Can you give some pointers on this problem? Thanks in advance.




I have tried multiple ways to load the classes, e.g. Class.forName, but 
nothing works even though the fat jar for the Flink job application contains 
the Siddhi extensions. I don't want to add those libraries to the jobmanager 
and taskmanager lib folders every time.




Any help will be appreciated.




Regards

Dipanjan


[jira] [Created] (FLINK-22728) a problem of loading udf

2021-05-20 Thread JYXL (Jira)
JYXL created FLINK-22728:


 Summary: a problem of loading udf
 Key: FLINK-22728
 URL: https://issues.apache.org/jira/browse/FLINK-22728
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.13.0
 Environment: python3.7
centos 8
pyflink1.13.0
java1.11
Reporter: JYXL


hi:
 I'm using a stream udf in Python.
 The udf is as below:
 
 class MyKeySelector(KeySelector):
     def __init__(self, partitions: int = 6):
         self.partitions = partitions

     def get_key(self, value):
         return random.randint(0, self.partitions)
 
 When I put it in the same script as the main task, it works, 
 but when I move it into a single separate script, it does not work.
 The project layout is as below:
 
 project:
 | __init__.py
 | key_function.py
 | main_task.py
 
 I'm also confused that when I use the env.add_python_file method, it does not 
 work either, no matter whether the parameter `file_path` is '~/project' or 
 '~/project/key_function.py'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22727) SerializableAvroSchema cannot handle large schema

2021-05-20 Thread Serge Travin (Jira)
Serge Travin created FLINK-22727:


 Summary: SerializableAvroSchema cannot handle large schema
 Key: FLINK-22727
 URL: https://issues.apache.org/jira/browse/FLINK-22727
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.13.0
Reporter: Serge Travin


The issue is very similar to https://issues.apache.org/jira/browse/FLINK-19491.

Here is the stack trace for the problem.
{noformat}
Caused by: java.lang.RuntimeException: Could not serialize serializer into the configuration.
    at org.apache.flink.api.java.typeutils.runtime.RuntimeSerializerFactory.writeParametersToConfig(RuntimeSerializerFactory.java:61) ~[flink-core-1.13.0.jar:1.13.0]
    at org.apache.flink.runtime.operators.util.TaskConfig.setTypeSerializerFactory(TaskConfig.java:1211) ~[flink-runtime_2.12-1.13.0.jar:1.13.0]
    at org.apache.flink.runtime.operators.util.TaskConfig.setOutputSerializer(TaskConfig.java:594) ~[flink-runtime_2.12-1.13.0.jar:1.13.0]
    at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.connectJobVertices(JobGraphGenerator.java:1336) ~[flink-optimizer_2.12-1.13.0.jar:1.13.0]
    at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.translateChannel(JobGraphGenerator.java:820) ~[flink-optimizer_2.12-1.13.0.jar:1.13.0]
    at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.postVisit(JobGraphGenerator.java:664) ~[flink-optimizer_2.12-1.13.0.jar:1.13.0]
    ... 15 more
Caused by: java.io.UTFDataFormatException
    at java.io.ObjectOutputStream$BlockDataOutputStream.writeUTF(ObjectOutputStream.java:2164) ~[?:1.8.0_202]
    at java.io.ObjectOutputStream$BlockDataOutputStream.writeUTF(ObjectOutputStream.java:2007) ~[?:1.8.0_202]
    at java.io.ObjectOutputStream.writeUTF(ObjectOutputStream.java:869) ~[?:1.8.0_202]
    at org.apache.flink.formats.avro.typeutils.SerializableAvroSchema.writeObject(SerializableAvroSchema.java:55) ~[flink-avro-1.13.0.jar:1.13.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_202]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_202]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_202]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_202]
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1140) ~[?:1.8.0_202]
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) ~[?:1.8.0_202]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) ~[?:1.8.0_202]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) ~[?:1.8.0_202]
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) ~[?:1.8.0_202]
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) ~[?:1.8.0_202]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) ~[?:1.8.0_202]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) ~[?:1.8.0_202]
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) ~[?:1.8.0_202]
    at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:624) ~[flink-core-1.13.0.jar:1.13.0]
    at org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:546) ~[flink-core-1.13.0.jar:1.13.0]
    at org.apache.flink.api.java.typeutils.runtime.RuntimeSerializerFactory.writeParametersToConfig(RuntimeSerializerFactory.java:59) ~[flink-core-1.13.0.jar:1.13.0]
    at org.apache.flink.runtime.operators.util.TaskConfig.setTypeSerializerFactory(TaskConfig.java:1211) ~[flink-runtime_2.12-1.13.0.jar:1.13.0]
    at org.apache.flink.runtime.operators.util.TaskConfig.setOutputSerializer(TaskConfig.java:594) ~[flink-runtime_2.12-1.13.0.jar:1.13.0]
    at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.connectJobVertices(JobGraphGenerator.java:1336) ~[flink-optimizer_2.12-1.13.0.jar:1.13.0]
    at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.translateChannel(JobGraphGenerator.java:820) ~[flink-optimizer_2.12-1.13.0.jar:1.13.0]
    at org.apache.flink.optimizer.plantranslate.JobGraphGenerator.postVisit(JobGraphGenerator.java:664) ~[flink-optimizer_2.12-1.13.0.jar:1.13.0]
    ... 15 more
{noformat}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22726) Hive GROUPING__ID returns different value in older versions

2021-05-20 Thread Rui Li (Jira)
Rui Li created FLINK-22726:
--

 Summary: Hive GROUPING__ID returns different value in older 
versions
 Key: FLINK-22726
 URL: https://issues.apache.org/jira/browse/FLINK-22726
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (FLINK-22725) SlotManagers should unregister metrics at the start of suspend()

2021-05-20 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-22725:


 Summary: SlotManagers should unregister metrics at the start of 
suspend()
 Key: FLINK-22725
 URL: https://issues.apache.org/jira/browse/FLINK-22725
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination, Runtime / Metrics
Affects Versions: 1.13.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.14.0, 1.13.1


SlotManagers register metrics in start(), but only unregister them in close().

This has two issues:

a) If the SM is restarted, it cannot re-register the metrics because the old 
ones are still present; this isn't a critical problem (because the old ones 
still work) but it produces unnecessary logging noise.

b) The metric may produce an NPE when accessed by a reporter during suspend().
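The lifecycle fix for a) and b) can be sketched as a minimal model. The class and metric names below are illustrative only, not the actual SlotManager API:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: unregistering metrics at the start of suspend() lets a
// restarted instance re-register cleanly and prevents reporters from reading
// half-torn-down state. Not the real SlotManager implementation.
public class SlotManagerMetricsSketch {
    private final Set<String> registeredMetrics = new HashSet<>();

    public void start() {
        if (!registeredMetrics.add("taskSlotsAvailable")) {
            // This is the collision the ticket describes: the old metric was
            // never unregistered, so re-registration fails and logs noise.
            throw new IllegalStateException("metric already registered");
        }
    }

    public void suspend() {
        // Unregister first, before any other teardown, so reporters can no
        // longer observe the metric mid-suspension.
        registeredMetrics.remove("taskSlotsAvailable");
    }

    public boolean canRestart() {
        return !registeredMetrics.contains("taskSlotsAvailable");
    }

    public static void main(String[] args) {
        SlotManagerMetricsSketch sm = new SlotManagerMetricsSketch();
        sm.start();
        sm.suspend();
        sm.start(); // succeeds because suspend() unregistered the metric
        System.out.println("restart ok");
    }
}
```

With the current close()-only behavior, the second start() would hit the IllegalStateException branch instead.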





[jira] [Created] (FLINK-22724) NoSuchMethodError PackagedProgramUtils.isPython (Application mode in native kubernetes)

2021-05-20 Thread HYUNHOO KWON (Jira)
HYUNHOO KWON created FLINK-22724:


 Summary: NoSuchMethodError PackagedProgramUtils.isPython 
(Application mode in native kubernetes)
 Key: FLINK-22724
 URL: https://issues.apache.org/jira/browse/FLINK-22724
 Project: Flink
  Issue Type: Bug
Reporter: HYUNHOO KWON








[jira] [Created] (FLINK-22723) multi-field-dynamic index is not supported by elasticsearch connector

2021-05-20 Thread Xiang Yang (Jira)
Xiang Yang created FLINK-22723:
--

 Summary: multi-field-dynamic index  is not supported by  
elasticsearch connector
 Key: FLINK-22723
 URL: https://issues.apache.org/jira/browse/FLINK-22723
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.12.3
Reporter: Xiang Yang


The current ES connector can only use one field to generate a dynamic index. I 
think it would be useful to support multiple fields, and we have already done 
this work at my company.
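Multi-field dynamic index resolution could, for instance, expand several placeholders per row. The pattern syntax and method names below are purely illustrative, not the connector's actual API (which today supports only a single field):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: resolve an index pattern containing multiple {field}
// placeholders against one row's values.
public class MultiFieldIndexSketch {
    private static final Pattern FIELD = Pattern.compile("\\{(\\w+)\\}");

    // Expands every {field} placeholder from the row,
    // e.g. "logs-{region}-{day}" -> "logs-eu-2021-05-20".
    public static String resolveIndex(String pattern, Map<String, String> row) {
        Matcher m = FIELD.matcher(pattern);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(row.get(m.group(1))));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("region", "eu", "day", "2021-05-20");
        System.out.println(resolveIndex("logs-{region}-{day}", row));
    }
}
```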





[jira] [Created] (FLINK-22722) Add Documentation for Kafka New Source

2021-05-20 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-22722:
-

 Summary: Add Documentation for Kafka New Source
 Key: FLINK-22722
 URL: https://issues.apache.org/jira/browse/FLINK-22722
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Qingsheng Ren
 Fix For: 1.14.0


Documentation describing the usage of the new FLIP-27 Kafka source needs to be 
added to the Flink documentation.





Re: [DISCUSS] Releasing Flink 1.13.1

2021-05-20 Thread Robert Metzger
Thanks Dawid for helping with the release & Thanks Konstantin for the
summary!


> * https://issues.apache.org/jira/browse/FLINK-22266: Assigned to Robert.
No PR yet.

There's actually a PR (Jira is not showing all links by default, you need
to click "show more"), and it is somewhat close to completion.


On Thu, May 20, 2021 at 10:15 AM Dawid Wysakowicz 
wrote:

> * https://issues.apache.org/jira/browse/FLINK-22686: This is marked as
> Blocker, but there is no one assigned yet. Piotr/Arvid: I wouldn't make
> this a blocker for this bug fix as it only affects unaligned checkpoints in
> combination with broadcast state. What do you think? Can you already give
> an estimated time to resolution?
>
> I forgot to assign myself. I am actually working on it and I am positive we 
> can fix it this week.
>
> If anyone would like to help prepare the RC, I am happy to assist with the 
> steps that require PMC permissions. If no one volunteers by Monday, I can 
> also do it myself. However, I'd appreciate it if someone else could take care of 
> tracking the progress of the issues we want to include in the release.
>
> Best,
> Dawid
>
> On 20/05/2021 09:59, Konstantin Knauf wrote:
>
> Hi everyone,
>
> Let's see where we stand towards releasing Flink 1.13.1. I am aware of one
> additional license-related blocker that popped 
> up: https://issues.apache.org/jira/browse/FLINK-22706. Overall:
>
> Done or close to done:
>
> * https://issues.apache.org/jira/browse/FLINK-22666: Done
> * https://issues.apache.org/jira/browse/FLINK-22494: Done
> * https://issues.apache.org/jira/browse/FLINK-22688: Matthias/Chesnay: PR
> available.
> * https://issues.apache.org/jira/browse/FLINK-22706
>
> Unclear:
>
> * https://issues.apache.org/jira/browse/FLINK-22686: This is marked as
> Blocker, but there is no one assigned yet. Piotr/Arvid: I wouldn't make
> this a blocker for this bug fix as it only affects unaligned checkpoints in
> combination with broadcast state. What do you think? Can you already give
> an estimated time to resolution?
> * https://issues.apache.org/jira/browse/FLINK-22266: Assigned to Robert. No
> PR yet.
> * https://issues.apache.org/jira/browse/FLINK-22646: Chesnay/David: Closed
> as "Won't fix", but the PR discussion is not fully concluded.
>
> Is there anything else?
>
> Overall, I would propose to wait until Friday for fixes to come in and
> create a release candidate early on Monday next week. I think we've already
> accumulated quite a few fixes that justify a patch release. Of course, it'd
> be nice if more fixes make it in, in particular FLINK-22686 would be nice
> to have so that we don't need a quick follow up patch release soon.
>
> Does any PMC member have time to manage the actual release early next week?
>
> Cheers,
>
> Konstantin
>
>
> On Tue, May 18, 2021 at 5:17 PM David Morávek  
> 
> wrote:
>
>
> Hi Konstantin,
>
> Would it be possible to add FLINK-22646 [1] into the release? This is a
> regression that we need to work around in order to support 1.13.x in Apache
> Beam [2].
>
> Best,
> D.
>
> [1] https://issues.apache.org/jira/browse/FLINK-22646
> [2] https://github.com/apache/beam/pull/14719
>
> On Tue, May 18, 2021 at 1:16 PM Matthias Pohl  
> 
> wrote:
>
>
> Thanks for initiating this discussion, Konstantin. FLINK-22494 [1] (Avoid
> discarding checkpoints in case of failure) is reviewed and backported. I
> have FLINK-22688 [2] (exception history not working properly for
>
> unassigned
>
> tasks) that came up recently. I provided a fix for it [3] already and
>
> hope
>
> to get it reviewed soon.
>
> Matthias
>
> [1] https://issues.apache.org/jira/browse/FLINK-22494
> [2] https://issues.apache.org/jira/browse/FLINK-22688
> [3] https://github.com/apache/flink/pull/15945
>
> On Mon, May 17, 2021 at 2:13 PM Arvid Heise  
>  wrote:
>
>
> I'd also like to have a fix 
> for https://issues.apache.org/jira/browse/FLINK-22686 that was reported
>
> today.
>
> On Mon, May 17, 2021 at 11:52 AM Timo Walther  
> 
>
> wrote:
>
> Hi Konstantin,
>
> thanks for starting the discussion. From the Table API side, we also
> fixed a couple of critical issues already that justify releasing a
> 1.13.1 asap.
>
> Personally, I would like to 
> include https://issues.apache.org/jira/browse/FLINK-22666 that fixes some
>
> last
>
> issues with the Scala Table API to DataStream conversion. It should
>
> be
>
> fixed today or tomorrow.
>
> Otherwise +1.
>
> Regards,
> Timo
>
>
> On 17.05.21 10:11, Robert Metzger wrote:
>
> Thanks a lot for starting the discussion about the release.
> I'd like to include
>
> https://issues.apache.org/jira/browse/FLINK-22266
>
> as
>
> well (I forgot to set the fixVersion accordingly). It's an
>
> important
>
> fix
>
> for the stop with savepoint operation of the adaptive scheduler.
>
> On Mon, May 17, 2021 at 10:03 AM Konstantin Knauf <
>
> kna...@apache.org
>
> wrote:
>
> Hi everyone,
>
> I would like to start discussing Flink 1.13.1. There are already
>
> quite a
>
> few critical fixes merged, speci

[jira] [Created] (FLINK-22721) Breaking HighAvailabilityServices interface by adding new method

2021-05-20 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-22721:
-

 Summary: Breaking HighAvailabilityServices interface by adding new 
method
 Key: FLINK-22721
 URL: https://issues.apache.org/jira/browse/FLINK-22721
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.14.0, 1.13.1, 1.12.5
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.14.0, 1.13.1, 1.12.5


As part of FLINK-20695 we introduced a new method, 
{{HighAvailabilityServices.cleanupJobData}}, to the interface. Since this method 
has no default implementation, it is currently a breaking change. Because Flink 
allows implementing custom HA services through this interface, I suggest adding a 
default implementation for this method.
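A default method keeps existing implementations source compatible. The sketch below is illustrative only; the signature is simplified (the real method takes a JobID, not a String):

```java
// Hypothetical, simplified sketch of the interface change: adding a no-op
// default body means existing custom HA service implementations keep
// compiling against the new interface.
interface HighAvailabilityServicesSketch {
    // New method from FLINK-20695; without a default body, every existing
    // implementation would break.
    default void cleanupJobData(String jobId) {
        // no-op by default; custom HA services may override
    }
}

// Stands in for a user's pre-existing custom HA services implementation.
class LegacyHaServices implements HighAvailabilityServicesSketch {
    // Compiles unchanged because cleanupJobData has a default implementation.
}

public class DefaultMethodDemo {
    public static boolean compilesUnchanged() {
        HighAvailabilityServicesSketch ha = new LegacyHaServices();
        ha.cleanupJobData("job-1"); // resolves to the default no-op
        return true;
    }

    public static void main(String[] args) {
        System.out.println(compilesUnchanged());
    }
}
```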





Re: [DISCUSS] Releasing Flink 1.13.1

2021-05-20 Thread Dawid Wysakowicz
* https://issues.apache.org/jira/browse/FLINK-22686: This is marked as
Blocker, but there is no one assigned yet. Piotr/Arvid: I wouldn't make
this a blocker for this bug fix as it only affects unaligned checkpoints
in combination with broadcast state. What do you think? Can you already
give an estimated time to resolution?

I forgot to assign myself. I am actually working on it and I am positive we can 
fix it this week.

If anyone would like to help prepare the RC, I am happy to assist with the steps 
that require PMC permissions. If no one volunteers by Monday, I can also do 
it myself. However, I'd appreciate it if someone else could take care of tracking 
the progress of the issues we want to include in the release.

Best,
Dawid 

On 20/05/2021 09:59, Konstantin Knauf wrote:
> Hi everyone,
>
> Let's see where we stand towards releasing Flink 1.13.1. I am aware of one
> additional license-related blocker that popped up:
> https://issues.apache.org/jira/browse/FLINK-22706.  Overall:
>
> Done or close to done:
>
> * https://issues.apache.org/jira/browse/FLINK-22666: Done
> * https://issues.apache.org/jira/browse/FLINK-22494: Done
> * https://issues.apache.org/jira/browse/FLINK-22688: Matthias/Chesnay: PR
> available.
> * https://issues.apache.org/jira/browse/FLINK-22706
>
> Unclear:
>
> * https://issues.apache.org/jira/browse/FLINK-22686: This is marked as
> Blocker, but there is no one assigned yet. Piotr/Arvid: I wouldn't make
> this a blocker for this bug fix as it only affects unaligned checkpoints in
> combination with broadcast state. What do you think? Can you already give
> an estimated time to resolution?
> * https://issues.apache.org/jira/browse/FLINK-22266: Assigned to Robert. No
> PR yet.
> * https://issues.apache.org/jira/browse/FLINK-22646: Chesnay/David: Closed
> as "Won't fix", but the PR discussion is not fully concluded.
>
> Is there anything else?
>
> Overall, I would propose to wait until Friday for fixes to come in and
> create a release candidate early on Monday next week. I think we've already
> accumulated quite a few fixes that justify a patch release. Of course, it'd
> be nice if more fixes make it in, in particular FLINK-22686 would be nice
> to have so that we don't need a quick follow up patch release soon.
>
> Does any PMC member have time to manage the actual release early next week?
>
> Cheers,
>
> Konstantin
>
>
> On Tue, May 18, 2021 at 5:17 PM David Morávek 
> wrote:
>
>> Hi Konstantin,
>>
>> Would it be possible to add FLINK-22646 [1] into the release? This is a
>> regression that we need to work around in order to support 1.13.x in Apache
>> Beam [2].
>>
>> Best,
>> D.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-22646
>> [2] https://github.com/apache/beam/pull/14719
>>
>> On Tue, May 18, 2021 at 1:16 PM Matthias Pohl 
>> wrote:
>>
>>> Thanks for initiating this discussion, Konstantin. FLINK-22494 [1] (Avoid
>>> discarding checkpoints in case of failure) is reviewed and backported. I
>>> have FLINK-22688 [2] (exception history not working properly for
>> unassigned
>>> tasks) that came up recently. I provided a fix for it [3] already and
>> hope
>>> to get it reviewed soon.
>>>
>>> Matthias
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-22494
>>> [2] https://issues.apache.org/jira/browse/FLINK-22688
>>> [3] https://github.com/apache/flink/pull/15945
>>>
>>> On Mon, May 17, 2021 at 2:13 PM Arvid Heise  wrote:
>>>
 I'd also like to have a fix for
 https://issues.apache.org/jira/browse/FLINK-22686 that was reported
>>> today.
 On Mon, May 17, 2021 at 11:52 AM Timo Walther 
>>> wrote:
> Hi Konstantin,
>
> thanks for starting the discussion. From the Table API side, we also
> fixed a couple of critical issues already that justify releasing a
> 1.13.1 asap.
>
> Personally, I would like to include
> https://issues.apache.org/jira/browse/FLINK-22666 that fixes some
>> last
> issues with the Scala Table API to DataStream conversion. It should
>> be
> fixed today or tomorrow.
>
> Otherwise +1.
>
> Regards,
> Timo
>
>
> On 17.05.21 10:11, Robert Metzger wrote:
>> Thanks a lot for starting the discussion about the release.
>> I'd like to include
>>> https://issues.apache.org/jira/browse/FLINK-22266
 as
>> well (I forgot to set the fixVersion accordingly). It's an
>> important
 fix
>> for the stop with savepoint operation of the adaptive scheduler.
>>
>> On Mon, May 17, 2021 at 10:03 AM Konstantin Knauf <
>> kna...@apache.org
> wrote:
>>> Hi everyone,
>>>
>>> I would like to start discussing Flink 1.13.1. There are already
 quite a
>>> few critical fixes merged, specifically:
>>>
>>> * https://issues.apache.org/jira/browse/FLINK-22555 (LGPL-2.1
>> files
 in
>>> flink-python jars)
>>> * https://issues.apache.org/jira/browse/FLINK-17170 (Cannot stop
> streaming
>>> job with savepoint which uses kinesis 

[RESULT][VOTE] Release 1.12.4, release candidate #1

2021-05-20 Thread Arvid Heise
Dear devs,

I'm happy to announce that we have unanimously approved this release.

There are 7 approving votes, 4 of which are binding:
* Roman Khachatryan
* Dawid Wysakowicz (binding)
* Robert Metzger (binding)
* Xingbo Huang
* Chesnay Schepler (binding)
* Leonard Xu
* Thomas Weise (binding)

There are no disapproving votes.

Thanks everyone!

Your friendly release manager Arvid

On Thu, May 20, 2021 at 7:51 AM Thomas Weise  wrote:

> +1 (binding)
>
> - built from source
> - run internal tests
> - run job on provided convenience binaries
>
> Thanks for managing the release, Arvid!
>
>
> On Wed, May 19, 2021 at 9:36 PM Leonard Xu  wrote:
>
> > +1 (non-binding)
> >
> > - verified signatures and hashes
> > - started a cluster, WebUI was accessible
> > - ran some queries in SQL Client, no suspicious log output
> > - the web PR looks good
> >
> > Best,
> > Leonard Xu
> >
> > > On 19 May 2021, at 18:42, Chesnay Schepler wrote:
> > >
> > > +1 (binding)
> > >
> > > - checked repository contents
> > > - reviewed flink-web PR
> > >
> > > On 5/17/2021 11:40 AM, Xingbo Huang wrote:
> > >> +1 (non-binding)
> > >>
> > >> - Verified checksums and signatures
> > >> - Check python package contents
> > >> - Pip install python wheel package
> > >> - Run python udf job in python shell
> > >>
> > >> Best,
> > >> Xingbo
> > >>
> > >> Robert Metzger wrote on Mon, 17 May 2021 at 4:07 PM:
> > >>
> > >>> +1 (binding)
> > >>>
> > >>> - Checked diff to 1.12.3:
> > >>>
> > >>>
> >
> https://github.com/apache/flink/compare/release-1.12.3-rc1...release-1.12.4-rc1
> > >>> - checked some jars in the staging repo
> > >>> - checked contents of flink-dist jar of 2.11 release
> > >>> - Submitted demo job
> > >>>
> > >>>
> > >>> On Sun, May 16, 2021 at 8:40 PM Dawid Wysakowicz <
> > dwysakow...@apache.org>
> > >>> wrote:
> > >>>
> >  +1 (binding)
> > 
> > - Verified checksums and signatures
> > - Checked no significant version changes compared to 1.12.3 (one
> > new
> > test scope dependency)
> > - Checked no changes to the NOTICE files
> > - Built from sources
> > - Run example using binary 2.12 distribution
> > - verified a random class in flink-scala_2.11 and _2.12 if it was
> > compiled with the correct scala version
> > 
> >  Best,
> > 
> >  Dawid
> >  On 10/05/2021 23:34, Arvid Heise wrote:
> > 
> >  Hi everyone,
> > 
> >  Please review and vote on the release candidate #1 for the version
> > >>> 1.12.4,
> >  as follows:
> >  [ ] +1, Approve the release
> >  [ ] -1, Do not approve the release (please provide specific
> comments)
> > 
> >  The complete staging area is available for your review, which
> > includes:
> >  * JIRA release notes [1],
> >  * the official Apache source release and binary convenience releases
> > to
> > >>> be
> >  deployed to dist.apache.org [2], which are signed with the key with
> >  fingerprint 476DAA5D1FF08189 [3],
> >  * all artifacts to be deployed to the Maven Central Repository [4],
> >  * source code tag "release-1.12.4-rc1" [5],
> >  * website pull request listing the new release and adding
> announcement
> > >>> blog
> >  post [6].
> > 
> >  The vote will be open for at least 72 hours. It is adopted by
> majority
> >  approval, with at least 3 PMC affirmative votes.
> > 
> >  Thanks,
> >  Your friendly release manager Arvid
> > 
> >  [1]
> > >>>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350110
> >  [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.4-rc1/
> >  [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >  [4]
> > >>>
> https://repository.apache.org/content/repositories/orgapacheflink-1421
> >  [5] https://github.com/apache/flink/releases/tag/release-1.12.4-rc1
> >  [6] https://github.com/apache/flink-web/pull/446
> > 
> > 
> > >
> >
> >
>


Re: [DISCUSS] Releasing Flink 1.13.1

2021-05-20 Thread Konstantin Knauf
Hi everyone,

Let's see where we stand towards releasing Flink 1.13.1. I am aware of one
additional license-related blocker that popped up:
https://issues.apache.org/jira/browse/FLINK-22706.  Overall:

Done or close to done:

* https://issues.apache.org/jira/browse/FLINK-22666: Done
* https://issues.apache.org/jira/browse/FLINK-22494: Done
* https://issues.apache.org/jira/browse/FLINK-22688: Matthias/Chesnay: PR
available.
* https://issues.apache.org/jira/browse/FLINK-22706

Unclear:

* https://issues.apache.org/jira/browse/FLINK-22686: This is marked as
Blocker, but there is no one assigned yet. Piotr/Arvid: I wouldn't make
this a blocker for this bug fix as it only affects unaligned checkpoints in
combination with broadcast state. What do you think? Can you already give
an estimated time to resolution?
* https://issues.apache.org/jira/browse/FLINK-22266: Assigned to Robert. No
PR yet.
* https://issues.apache.org/jira/browse/FLINK-22646: Chesnay/David: Closed
as "Won't fix", but the PR discussion is not fully concluded.

Is there anything else?

Overall, I would propose to wait until Friday for fixes to come in and
create a release candidate early on Monday next week. I think we've already
accumulated quite a few fixes that justify a patch release. Of course, it'd
be nice if more fixes make it in, in particular FLINK-22686 would be nice
to have so that we don't need a quick follow up patch release soon.

Does any PMC member have time to manage the actual release early next week?

Cheers,

Konstantin


On Tue, May 18, 2021 at 5:17 PM David Morávek 
wrote:

> Hi Konstantin,
>
> Would it be possible to add FLINK-22646 [1] into the release? This is a
> regression that we need to work around in order to support 1.13.x in Apache
> Beam [2].
>
> Best,
> D.
>
> [1] https://issues.apache.org/jira/browse/FLINK-22646
> [2] https://github.com/apache/beam/pull/14719
>
> On Tue, May 18, 2021 at 1:16 PM Matthias Pohl 
> wrote:
>
> > Thanks for initiating this discussion, Konstantin. FLINK-22494 [1] (Avoid
> > discarding checkpoints in case of failure) is reviewed and backported. I
> > have FLINK-22688 [2] (exception history not working properly for
> unassigned
> > tasks) that came up recently. I provided a fix for it [3] already and
> hope
> > to get it reviewed soon.
> >
> > Matthias
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-22494
> > [2] https://issues.apache.org/jira/browse/FLINK-22688
> > [3] https://github.com/apache/flink/pull/15945
> >
> > On Mon, May 17, 2021 at 2:13 PM Arvid Heise  wrote:
> >
> > > I'd also like to have a fix for
> > > https://issues.apache.org/jira/browse/FLINK-22686 that was reported
> > today.
> > >
> > > On Mon, May 17, 2021 at 11:52 AM Timo Walther 
> > wrote:
> > >
> > > > Hi Konstantin,
> > > >
> > > > thanks for starting the discussion. From the Table API side, we also
> > > > fixed a couple of critical issues already that justify releasing a
> > > > 1.13.1 asap.
> > > >
> > > > Personally, I would like to include
> > > > https://issues.apache.org/jira/browse/FLINK-22666 that fixes some
> last
> > > > issues with the Scala Table API to DataStream conversion. It should
> be
> > > > fixed today or tomorrow.
> > > >
> > > > Otherwise +1.
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > >
> > > > On 17.05.21 10:11, Robert Metzger wrote:
> > > > > Thanks a lot for starting the discussion about the release.
> > > > > I'd like to include
> > https://issues.apache.org/jira/browse/FLINK-22266
> > > as
> > > > > well (I forgot to set the fixVersion accordingly). It's an
> important
> > > fix
> > > > > for the stop with savepoint operation of the adaptive scheduler.
> > > > >
> > > > > On Mon, May 17, 2021 at 10:03 AM Konstantin Knauf <
> kna...@apache.org
> > >
> > > > wrote:
> > > > >
> > > > >> Hi everyone,
> > > > >>
> > > > >> I would like to start discussing Flink 1.13.1. There are already
> > > quite a
> > > > >> few critical fixes merged, specifically:
> > > > >>
> > > > >> * https://issues.apache.org/jira/browse/FLINK-22555 (LGPL-2.1
> files
> > > in
> > > > >> flink-python jars)
> > > > >> * https://issues.apache.org/jira/browse/FLINK-17170 (Cannot stop
> > > > streaming
> > > > >> job with savepoint which uses kinesis consumer)
> > > > >> * https://issues.apache.org/jira/browse/FLINK-22574 (Adaptive
> > > > Scheduler:
> > > > >> Can not cancel restarting job)
> > > > >>
> > > > >> The following two tickets are in progress:
> > > > >>
> > > > >> * https://issues.apache.org/jira/browse/FLINK-22502
> > > > >> (DefaultCompletedCheckpointStore
> > > > >> drops unrecoverable checkpoints silently)
> > > > >> * https://issues.apache.org/jira/browse/FLINK-22494 (Avoid
> > discarding
> > > > >> checkpoints in case of failure)
> > > > >>
> > > > >> Matthias, Roman: could you give an update on those?
> > > > >>
> > > > >> Are there any other open tickets that we should wait for? Is
> there a
> > > PMC
> > > > >> member who would like to manage the release?
> > > > >>
> > > > >> Be

Re: The timeline of Flink support Reactive mode in Yarn

2021-05-20 Thread Robert Metzger
Hey Bing,

The idea of reactive mode is that it reacts to changing resource
availability, where an outside service is adding or removing machines to a
Flink cluster.
Reactive Mode doesn't work with an active resource manager such as YARN,
because you cannot tell YARN from the outside to remove resources from a
Flink application. The Flink application has to do this by itself.

For reactive mode, we introduced Adaptive Scheduler (FLIP-160) and
declarative resource management (FLIP-138). Both of these changes are very
important building blocks for adding autoscaling to active resource
managers -- I would say we've completed most of the heavy lifting for
autoscaling already.
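For reference, Reactive Mode itself is enabled on a standalone application cluster with a single configuration switch (per the Flink 1.13 documentation; option names may differ in later versions):

```yaml
# flink-conf.yaml — enables Reactive Mode (FLIP-159, Flink 1.13+).
# Only supported for standalone application clusters, which is exactly
# the limitation discussed in this thread.
scheduler-mode: reactive
```

The cluster is then started with `./bin/standalone-job.sh start ...`, and adding or removing TaskManagers rescales the job automatically.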

The main blocker for autoscaling in my opinion is figuring out a good API
for determining the scale of a running job: Do we expose a REST API where
users can adjust the parallelism of individual operators? Do we expose an
API where some code runs somewhere adjusting the parallelism of operators?
Do we provide a set of configuration parameters defining the "min",
"target", "max" parallelism, or some scaling based on some metrics (Kafka
consumer lag, latency, throughput, backpressure, cpu load, ...).
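As one illustration of the last option — min/target/max bounds combined with an observed metric such as Kafka consumer lag — a policy could clamp a lag-derived parallelism into the configured range. Everything here is hypothetical; none of these names exist in Flink:

```java
// Purely hypothetical autoscaling policy sketch: derive a desired
// parallelism from consumer lag and clamp it into [min, max].
public class AutoscalePolicySketch {

    // lagPerSlot: how much lag one parallel instance is expected to drain.
    public static int targetParallelism(int min, int max, long lag, long lagPerSlot) {
        int desired = (int) Math.ceil(lag / (double) lagPerSlot);
        return Math.max(min, Math.min(max, desired));
    }

    public static void main(String[] args) {
        // 50k records of lag, each slot drains ~10k -> 5 slots, within [2, 8]
        System.out.println(targetParallelism(2, 8, 50_000, 10_000));
    }
}
```

The open API question in the mail is exactly where such a function would live: behind a REST endpoint, in user code, or driven purely by configuration.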

I've personally started playing a bit with a small prototype for
autoscaling, but it's not ready to share yet. From the topics the main
contributors are currently working on, I'm not aware of anybody working on
autoscaling in the Flink 1.14 release cycle, hence I believe it is unlikely
to be part of 1.14.

Let me know what you think, I'm in particular interested in how you would
like to use autoscaling!

Best,
Robert


On Thu, May 20, 2021 at 9:11 AM Bing Jiang  wrote:

> Hi, community.
>
> For streaming applications, auto scaling is critical to right-size our
> resource provisioning, neither too big nor too small; i.e., it is a
> cost-efficiency requirement for public cloud (AWS) users.
>
> I read the document of FLIP-159
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-159%3A+Reactive+Mode>,
> and it is claimed that it aims at the requirements of elastic scale.
> Unfortunately, it only supports standalone clusters. So I'd like to figure
> out whether the community has already tried out this feature on Yarn, and
> which kinds of issues will be the blocker to support this feature from Yarn
> perspective?
>
> I appreciate your insights!
>
> Thanks
> Bing
> --
>


Re: [DISCUSS] Feedback Collection Jira Bot

2021-05-20 Thread Robert Metzger
+1
This would also cover test instabilities, which I personally believe should
not be auto-deprioritized until they've been analyzed.

On Wed, May 19, 2021 at 1:46 PM Till Rohrmann  wrote:

> I like this idea. +1 for your proposal Konstantin.
>
> Cheers,
> Till
>
> On Wed, May 19, 2021 at 1:30 PM Konstantin Knauf  >
> wrote:
>
> > Hi everyone,
> >
> > Till and I recently discussed whether we should disable the
> > "stale-blocker", "stale-critical", "stale-major" and "stale-minor" rules
> > for tickets that have a fixVersion set. This would allow people to plan
> the
> > upcoming release without tickets being deprioritized by the bot during
> the
> > release cycle.
> >
> > From my point of view, this is a good idea as long as we can agree to use
> > the "fixVersion" a bit more conservatively. What do I mean by that? If
> you
> > would categorize tickets planned for an upcoming release into:
> >
> > * Must Have
> > * Should Have
> > * Nice-To-Have
> >
> > only "Must Have" and "Should Have" tickets should get a fixVersion. From
> my
> > observation, we currently often set the fixVersion if we just wished a
> > feature was included in an upcoming release. Similarly, I often see bulk
> > changes of fixVersion that "roll over" many tickets to the next release
> if
> > they have not made into the previous release although there is no
> concrete
> > plan to fix them or they have even become obsolete by then. Excluding
> those
> > from the bot would be counterproductive.
> >
> > What do you think?
> >
> > Cheers,
> >
> > Konstantin
> >
> >
> > On Fri, Apr 23, 2021 at 2:25 PM Konstantin Knauf 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > After some offline conversations, I think, it makes sense to already
> open
> > > this thread now in order to collect feedback and suggestions around the
> > > Jira Bot.
> > >
> > > The following two changes I will do right away:
> > >
> > > * increase "stale-assigned.stale-days" to 14 days (Marta, Stephan, Nico
> > > have provided feedback that this is too aggressive).
> > >
> > > * exclude Sub-Tasks from all rules except the "stale-assigned" rule (I
> > > think, this was just an oversight in the original discussion.)
> > >
> > > Keep it coming.
> > >
> > > Cheers,
> > >
> > > Konstantin
> > >
> > > --
> > >
> > > Konstantin Knauf
> > >
> > > https://twitter.com/snntrable
> > >
> > > https://github.com/knaufk
> > >
> >
> >
> > --
> >
> > Konstantin Knauf | Head of Product
> >
> > +49 160 91394525
> >
> >
> > Follow us @VervericaData Ververica 
> >
> >
> > --
> >
> > Join Flink Forward  - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
> >
> > --
> >
> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >
> > --
> > Ververica GmbH
> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton
> > Wehner
> >
>


Re: [DISCUSS] Component labels in PR/commit messages

2021-05-20 Thread Robert Metzger
+1 to Till's proposal to update the wording.

Regarding c) The guide [1] actually mentions a good heuristic for coming up
with a label that is also suitable for newcomers: The maven module name
where most of the changes are.

[1]
https://flink.apache.org/contributing/code-style-and-quality-pull-requests.html#commit-naming-conventions

On Wed, May 19, 2021 at 11:14 AM Till Rohrmann  wrote:

> I think a big problem with the component labels is that there is
>
> a) no defined set of labels
> b) no way to enforce the usage of them
> c) no easy way to figure out which label to use
>
> Due to these problems they are used very inconsistently in the project.
>
> I do agree with Arvid's observation that they are less and less often used
> in new commits. Given this, we could think about adjusting our guidelines
> to better reflect reality and make them "optional"/"nice-to-have" for
> example.
>
> Cheers,
> Till
>
> On Wed, May 19, 2021 at 10:52 AM Chesnay Schepler 
> wrote:
>
> > For commit messages the labels are useful mostly when scanning the
> > commit history, like searching for some commit that could've caused
> > something /without knowing where that change was made/, because it
> > enables you to quickly filter out commits by their label instead of
> > having to read the entire title.
> >
> > I think in particular there is value in labeling documentation/build
> > system changes; it allows me to worry less about the phrasing because I
> > can assume the reader to have some context. For example,
> >
> > "[FLINK-X] Remove deprecated methods" vs "[FLINK-X][docs] Remove
> > deprecated methods".
> >
> > You could of course argue to use "[FLINK-X] Remove deprecated methods
> > from docs", but that's just a worse version of labeling.
> >
> >
> > On 5/19/2021 10:31 AM, Arvid Heise wrote:
> > > Dear devs,
> > >
> > > In the last couple of weeks, I have noticed that we are slacking a bit
> on
> > > the components in PR/commit messages. I'd like to gather some feedback
> if
> > > we still want to include them and if so, how we can improve the process
> > of
> > > finding the correct label.
> > >
> > > My personal opinion: So far, I have usually added the component because
> > > it's in the coding guidelines. I have not really understood the
> benefit.
> > It
> > > might be coming from a time where git support in IDE was lacking and it
> > was
> > > necessary to maintain an overview. I also have a hard time to find the
> > > correct component at times; I often just repeat the component that I
> find
> > > in a blame. Nevertheless, I value consistency over personal taste and
> > would
> > > stick to the plan (and guide contributions towards it) if other devs
> > > (especially committers) do it as well. But this has been causing some
> > > friction in a couple of reviews for me.
> > >
> > > Could you please give your opinion on this matter? I think it's
> important
> > > to note that if long-term committers are not following it, it's really
> > hard
> > > for newer devs to follow that (git blame not helping in choosing the
> > > component). Then we should remove it from the guidelines to make
> > > contributions easier.
> > >
> > > Thanks
> > >
> > > Arvid
> > >
> >
> >
>


The timeline of Flink support Reactive mode in Yarn

2021-05-20 Thread Bing Jiang
Hi, community.

For streaming applications, auto scaling is critical to right-size our
resource provisioning, neither too big nor too small; i.e., it is a
cost-efficiency requirement for public cloud (AWS) users.

I read the document of FLIP-159
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-159%3A+Reactive+Mode>,
and it claims to address the requirements of elastic scaling.
Unfortunately, it only supports standalone clusters. So I'd like to figure
out whether the community has already tried out this feature on Yarn, and
which kinds of issues will be the blocker to support this feature from Yarn
perspective?

I appreciate your insights!

Thanks
Bing
--