[jira] [Created] (FLINK-28603) Flink runtime.rpc.RpcServiceUtils code style

2022-07-18 Thread wuqingzhi (Jira)
wuqingzhi created FLINK-28603:
-

 Summary: Flink runtime.rpc.RpcServiceUtils code style
 Key: FLINK-28603
 URL: https://issues.apache.org/jira/browse/FLINK-28603
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / RPC
Reporter: wuqingzhi


Hello, I found a code style problem located in RpcServiceUtils: the field
nextNameOffset is a static final constant, so by convention its name should be
capitalized, with words separated by underscores.

e.g., the current declaration:

private static final AtomicLong nextNameOffset = new AtomicLong(0L);
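
A minimal sketch of the suggested rename (assuming the standard
UPPER_SNAKE_CASE convention for constants; nothing else changes):
{code:java}
// constant naming per the Java convention for static final fields
private static final AtomicLong NEXT_NAME_OFFSET = new AtomicLong(0L);
{code}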



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28602) Changelog cannot close stream normally while enabling compression

2022-07-18 Thread Hangxiang Yu (Jira)
Hangxiang Yu created FLINK-28602:


 Summary: Changelog cannot close stream normally while enabling 
compression
 Key: FLINK-28602
 URL: https://issues.apache.org/jira/browse/FLINK-28602
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.15.1, 1.16.0
Reporter: Hangxiang Yu
 Fix For: 1.16.0, 1.15.2


While compression is enabled, the changelog part wraps the output stream using

StreamCompressionDecorator#decorateWithCompression.

As the comment says: "IMPORTANT: For streams returned by this method, \{@link
OutputStream#close()} is not propagated to the inner stream. The inner stream
must be closed separately."

But StateChangeFsUploader does not close the inner stream after the wrapped
stream has been closed.

So the upload will not complete when compression is enabled, even though it
returns success.
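
A minimal sketch of the close order the decorator's contract requires
(createUploadStream, compressionDecorator, and changes are illustrative
stand-ins, not the uploader's actual fields):
{code:java}
OutputStream inner = createUploadStream(); // hypothetical helper
OutputStream wrapped = compressionDecorator.decorateWithCompression(inner);
try {
    wrapped.write(changes);
} finally {
    wrapped.close(); // finishes the compression frame only
    inner.close();   // close() is NOT propagated; close the inner stream separately
}
{code}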



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28601) Support FeatureHasher in FlinkML

2022-07-18 Thread weibo zhao (Jira)
weibo zhao created FLINK-28601:
--

 Summary: Support FeatureHasher in FlinkML
 Key: FLINK-28601
 URL: https://issues.apache.org/jira/browse/FLINK-28601
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Reporter: weibo zhao


Support FeatureHasher in FlinkML.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28600) Support FilterPushDown in flink-connector-jdbc

2022-07-18 Thread Hailin Wang (Jira)
Hailin Wang created FLINK-28600:
---

 Summary: Support FilterPushDown in flink-connector-jdbc
 Key: FLINK-28600
 URL: https://issues.apache.org/jira/browse/FLINK-28600
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Reporter: Hailin Wang


Support FilterPushDown in flink-connector-jdbc
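
One plausible shape for this (a sketch under assumptions, not the eventual
implementation; canTranslateToSqlPredicate is a hypothetical helper): the JDBC
table source could implement the planner's existing SupportsFilterPushDown
ability, accept the predicates it can render into the generated WHERE clause,
and return the rest for Flink to evaluate:
{code:java}
// imports: java.util.ArrayList, java.util.List,
//   org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown,
//   org.apache.flink.table.expressions.ResolvedExpression

// inside JdbcDynamicTableSource (other ScanTableSource methods omitted)
@Override
public SupportsFilterPushDown.Result applyFilters(List<ResolvedExpression> filters) {
    List<ResolvedExpression> accepted = new ArrayList<>();
    List<ResolvedExpression> remaining = new ArrayList<>();
    for (ResolvedExpression filter : filters) {
        if (canTranslateToSqlPredicate(filter)) { // hypothetical helper
            accepted.add(filter);                 // rendered into WHERE ...
        } else {
            remaining.add(filter);                // Flink still applies it
        }
    }
    return SupportsFilterPushDown.Result.of(accepted, remaining);
}
{code}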



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28599) Adding FlinkJoinToMultiJoinRule to support left/right outer join can be translated to multi join

2022-07-18 Thread Yunhong Zheng (Jira)
Yunhong Zheng created FLINK-28599:
-

 Summary: Adding FlinkJoinToMultiJoinRule to support left/right 
outer join can be translated to multi join
 Key: FLINK-28599
 URL: https://issues.apache.org/jira/browse/FLINK-28599
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.16.0
Reporter: Yunhong Zheng
 Fix For: 1.16.0


Now, Flink uses Calcite's rule
{code:java}
JOIN_TO_MULTI_JOIN{code}
to convert multiple joins into a join set, which can then be used by join
reorder. However, Calcite's rule cannot adapt to all outer joins. A left or
right outer join that meets certain conditions can also be converted to a
multi join.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28598) ClusterEntryPoint can't get the real exit reason when shutting down

2022-07-18 Thread zlzhang0122 (Jira)
zlzhang0122 created FLINK-28598:
---

 Summary: ClusterEntryPoint can't get the real exit reason when 
shutting down
 Key: FLINK-28598
 URL: https://issues.apache.org/jira/browse/FLINK-28598
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN, Runtime / Task
Affects Versions: 1.15.1, 1.14.2
Reporter: zlzhang0122


When the cluster is starting and some error occurs, the ClusterEntryPoint will
shut down the cluster asynchronously, but if it can't get a Throwable, the
shutdown reason will be null; this can actually happen when the failure comes
from user code.

I think we can get the real exit reason caused by user code and pass it to the
diagnostics parameter; this may help users a lot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-248: Introduce dynamic partition pruning

2022-07-18 Thread godfrey he
Hi Jingsong, Jark, Jing,

Thanks for the important inputs.
Lake storage is a very important scenario, and considering the more
generic and extended cases, I would also like to use the "dynamic
filtering" concept instead of "dynamic partition pruning".

>maybe the FLIP should also demonstrate the EXPLAIN result, which
is also an API.
I will add a section to describe the EXPLAIN result.

>Does DPP also support streaming queries?
Yes, but only for bounded sources.

>it requires that the SplitEnumerator implement the newly introduced
`SupportsHandleExecutionAttemptSourceEvent` interface,
+1

I will update the document and the poc code.

Best,
Godfrey

Jing Zhang wrote on Wed, Jul 13, 2022, at 20:22:
>
> Hi Godfrey,
> Thanks for driving this discussion.
> This is an important improvement for batch sql jobs.
> I agree with Jingsong to expand the capability to more than just partitions.
> Besides, I have two points:
> 1. Based on FLIP-248[1],
>
> > Dynamic partition pruning mechanism can improve performance by avoiding
> > reading large amounts of irrelevant data, and it works for both batch and
> > streaming queries.
>
> Does DPP also support streaming queries?
> It seems the proposed changes in FLIP-248 do not work for streaming
> queries,
> because the dimension table might be an unbounded input.
> Or do all dimension tables have to be bounded inputs for streaming
> jobs that want to enable DPP?
>
> 2. I notice there are changes on SplitEnumerator for Hive source and File
> source.
> And they now depend on SourceEvent to pass PartitionData.
> In FLIP-245, if speculative execution is enabled for sources based on FLIP-27
> which use SourceEvent,
> it requires that the SplitEnumerator implement the newly introduced
> `SupportsHandleExecutionAttemptSourceEvent` interface,
> otherwise an exception would be thrown.
> Since Hive and File sources are commonly used for batch jobs, it's better
> to take this point into consideration.
>
> Best,
> Jing Zhang
>
> [1] FLIP-248:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-248%3A+Introduce+dynamic+partition+pruning
> [2] FLIP-245:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-245%3A+Source+Supports+Speculative+Execution+For+Batch+Job
>
>
> Jark Wu wrote on Tue, Jul 12, 2022, at 13:16:
>
> > I agree with Jingsong. DPP is a particular case of Dynamic Filter Pushdown
> > in which the join key contains partition fields.  Extending this FLIP to
> > general filter
> > pushdown can benefit more optimizations, and they can share the same
> > interface.
> >
> > For example, Trino Hive Connector leverages dynamic filtering to support:
> > - dynamic partition pruning for partitioned tables
> > - and dynamic bucket pruning for bucket tables
> > - and dynamic filter pushed into the ORC and Parquet readers to perform
> > stripe
> >   or row-group pruning and save on disk I/O.
> >
> > Therefore, +1 to extend this FLIP to Dynamic Filter Pushdown (or Dynamic
> > Filtering),
> > just like Trino [1].  The interfaces should also be adapted for that.
> >
> > Besides, maybe the FLIP should also demonstrate the EXPLAIN result, which
> > is also an API.
> >
> > Best,
> > Jark
> >
> > [1]: https://trino.io/docs/current/admin/dynamic-filtering.html
> >
> > On Tue, 12 Jul 2022 at 09:59, Jingsong Li  wrote:
> >
> > > Thanks Godfrey for driving.
> > >
> > > I like this FLIP.
> > >
> > > We can extend this capability to more than just partitions.
> > > Here are some inputs from Lake Storage.
> > >
> > > The format of the splits generated by Lake Storage is roughly as follows:
> > > Split {
> > >Path filePath;
> > >Statistics[] fieldStats;
> > > }
> > >
> > > Stats contain the min and max of each column.
> > >
> > > If the storage is sorted by a column, split filtering on that column
> > > will be very effective, so not only the partition fields but also this
> > > column is worth pushing down for runtime filtering.
> > > This information is only known to the source, so I suggest that the
> > > source return which fields are worth being pushed down.
> > >
> > > My overall point is:
> > > This FLIP can be extended to support Source Runtime Filter push-down
> > > for all fields, not just dynamic partition pruning.
> > >
> > > What do you think?
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Fri, Jul 8, 2022 at 10:12 PM godfrey he  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I would like to open a discussion on FLIP-248: Introduce dynamic
> > > > partition pruning.
> > > >
> > > >  Currently, Flink supports static partition pruning: the conditions in
> > > > the WHERE clause are analyzed
> > > > to determine in advance which partitions can be safely skipped in the
> > > > optimization phase.
> > > > Another common scenario: the partition information is not available
> > > > in the optimization phase but in the execution phase.
> > > > That's the problem this FLIP is trying to solve: dynamic partition
> > > > pruning, which could reduce the partition 

Re: [DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-07-18 Thread Alexander Fedulov
Hi all,

I updated the FLIP [1] to make it more extensible with the
introduction of *SourceReaderFactory*.
It gives users the ability to further customize the data generation and
emission process if needed. I also incorporated the suggestion from
Qingsheng and moved to the generator function design with an initializer
method to support more sophisticated functions with non-serializable
fields. I am personally pretty happy with the current prototype [2], [3].
Let me know if you have any other feedback, otherwise, I am going to start
the vote.

[1] https://cwiki.apache.org/confluence/x/9Av1D
[2]
https://github.com/afedulov/flink/blob/e5f8e43a67b9983c42b5db2a7617361fa459/flink-core/src/main/java/org/apache/flink/api/connector/source/lib/DataGeneratorSourceV4.java#L52
[3]
https://github.com/afedulov/flink/blob/e5f8e43a67b9983c42b5db2a7617361fa459/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/GeneratorSourcePOC.java#L92

Best,
Alexander Fedulov




On Thu, Jul 7, 2022 at 12:08 AM Alexander Fedulov 
wrote:

> Hi Becket,
>
> interesting points about the discrepancies in the *RuntimeContext*
> "wrapping" throughout the framework, but I agree - this is something that
> needs to be tackled separately.
> For now, I adjusted the FLIP and the PoC implementation to only expose the
> parallelism.
>
> Best,
> Alexander Fedulov
>
> On Wed, Jul 6, 2022 at 2:42 AM Becket Qin  wrote:
>
>> Hi Alex,
>>
>> Personally I prefer the latter option, i.e. just add the
>> currentParallelism() method. It is easy to add more stuff to the
>> SourceReaderContext in the future, and it is likely that most of the stuff
>> in the RuntimeContext is not required by the SourceReader implementations.
>> For the purpose of this FLIP, adding the method is probably good enough.
>>
>> That said, I don't see a consistent pattern adopted in the project to
>> handle similar cases. The FunctionContext wraps the RuntimeContext and
>> only
>> exposes necessary stuff. CEPRuntimeContext extends the RuntimeContext and
>> overrides some methods that it does not want to expose with exception
>> throwing logic. Some internal context classes simply expose the entire
>> RuntimeContext with some additional methods. If we want to make things
>> clean, I'd imagine all these variations of context can become some
>> specific
>> combination of a ReadOnlyRuntimeContext and some "write" methods. But this
>> may require a closer look at all these cases to make sure the
>> ReadOnlyRuntimeContext is generally suitable. I feel that it will take
>> some
>> time and could be a bigger discussion than the data generator source
>> itself. So maybe we can just go with adding a method at the moment. And
>> evolving the SourceReaderContext to use the ReadOnlyRuntimeContext in the
>> future.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Tue, Jul 5, 2022 at 8:31 PM Alexander Fedulov > >
>> wrote:
>>
>> > Hi Becket,
>> >
>> > I agree with you. We could introduce a *ReadOnlyRuntimeContext* that
>> would
>> > act as a holder for the *RuntimeContext* data. This would also require
>> > read-only wrappers for the exposed fields, such as *ExecutionConfig*.
>> > Alternatively, we just add the *currentParallelism()* method for now and
>> > see if anything else might actually be needed later on. What do you
>> think?
>> >
>> > Best,
>> > Alexander Fedulov
>> >
>> > On Tue, Jul 5, 2022 at 2:30 AM Becket Qin  wrote:
>> >
>> > > Hi Alex,
>> > >
>> > > While it is true that the RuntimeContext gives access to all the stuff
>> > > the framework can provide, it seems a little overkill for the
>> > > SourceReader.
>> > > It is probably OK to expose all the read-only information in the
>> > > RuntimeContext to the SourceReader, but we may want to hide the "write"
>> > > methods, such as creating states, writing stuff to distributed cache, etc.,
>> > > because these methods may not work well with the SourceReader design and
>> > > cause confusion. For example, users may wonder why the snapshotState()
>> > > method exists while they can use the state directly.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > >
>> > >
>> > > On Tue, Jul 5, 2022 at 7:37 AM Alexander Fedulov <
>> > alexan...@ververica.com>
>> > > wrote:
>> > >
>> > > > Hi Becket,
>> > > >
>> > > > I updated and extended FLIP-238 accordingly.
>> > > >
>> > > > Here is also my POC branch [1].
>> > > > DataGeneratorSourceV3 is the class that I currently converged on
>> [2].
>> > It
>> > > is
>> > > > based on the expanded SourceReaderContext.
>> > > > A couple more relevant classes [3] [4]
>> > > >
>> > > > Would appreciate it if you could take a quick look.
>> > > >
>> > > > [1]
>> > https://github.com/afedulov/flink/tree/FLINK-27919-generator-source
>> > > > [2]
>> > > >
>> > > >
>> > >
>> >
>> 

[jira] [Created] (FLINK-28597) Empty checkpoint folders not deleted on job cancellation if their shared state is still in use

2022-07-18 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-28597:
-

 Summary: Empty checkpoint folders not deleted on job cancellation 
if their shared state is still in use
 Key: FLINK-28597
 URL: https://issues.apache.org/jira/browse/FLINK-28597
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.16.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.16.0


After FLINK-25872, SharedStateRegistry registers all state handles, including 
private ones.
Once the state isn't used AND the checkpoint is subsumed, it will actually be
discarded.
This is done to prevent premature deletion when recovering in CLAIM mode:
1. RocksDB native savepoint folders (shared state is stored in the chk-xx
folder, so its deletion might fail)
2. The initial non-changelog checkpoint when switching to changelog-based
checkpoints (private state of the initial checkpoint might be included in
later checkpoints, and its deletion would invalidate them)

Additionally, checkpoint folders are not deleted for a longer time, which
might be confusing.
In case of a crash, more folders will remain.

cc: [~Yanfei Lei], [~ym]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28596) Support writing arrays to postgres array columns in Flink SQL JDBC connector

2022-07-18 Thread Bobby Richard (Jira)
Bobby Richard created FLINK-28596:
-

 Summary: Support writing arrays to postgres array columns in Flink 
SQL JDBC connector
 Key: FLINK-28596
 URL: https://issues.apache.org/jira/browse/FLINK-28596
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Affects Versions: 1.15.0
Reporter: Bobby Richard






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28595) KafkaSource should not read metadata of unmatched regex topics

2022-07-18 Thread Afzal Mazhar (Jira)
Afzal Mazhar created FLINK-28595:


 Summary: KafkaSource should not read metadata of unmatched regex 
topics
 Key: FLINK-28595
 URL: https://issues.apache.org/jira/browse/FLINK-28595
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Reporter: Afzal Mazhar


When we use a regex to subscribe to topics, the current connector gets a list
of all topics, then runs describe against all of them, and only then filters by
the regex pattern. This is not performant and could also trigger audit alarms
on sensitive topics that do not match the regex.

Proposed fix: move the regex filtering from the TopicPatternSubscriber's set()
down into KafkaSubscriberUtils#getAllTopicMetadata(): get the list of topic
names, filter by the pattern (if any), then get metadata only for the matches.
Create appropriate tests.
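
A rough sketch of the proposed order of operations (the admin client calls
are standard kafka-clients APIs; the surrounding helper is illustrative):
{code:java}
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

// illustrative helper: filter topic names BEFORE describing them
static Map<String, TopicDescription> getMatchingTopicMetadata(
        AdminClient adminClient, Pattern topicPattern) throws Exception {
    // 1. fetch only the topic names, no describe calls yet
    Set<String> matched =
            adminClient.listTopics().names().get().stream()
                    .filter(name -> topicPattern.matcher(name).matches())
                    .collect(Collectors.toSet());
    // 2. describe only the topics that actually match the regex
    return adminClient.describeTopics(matched).all().get();
}
{code}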



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28594) Add metrics for FlinkService

2022-07-18 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28594:
-

 Summary: Add metrics for FlinkService
 Key: FLINK-28594
 URL: https://issues.apache.org/jira/browse/FLINK-28594
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0


We would need some metrics for the `FlinkService` to be able to tell how long
it takes to perform most of the blocking operations we have in this service.
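
A minimal sketch of what this could look like (the wrapper and the histogram
wiring are my assumptions, not the operator's actual code):
{code:java}
import java.util.function.Supplier;
import org.apache.flink.metrics.Histogram;

/** Illustrative helper: time a blocking FlinkService call and record it. */
static <T> T timed(Histogram durationMsHistogram, Supplier<T> blockingCall) {
    long start = System.nanoTime();
    try {
        return blockingCall.get();
    } finally {
        // record the wall-clock duration of the blocking call in milliseconds
        durationMsHistogram.update((System.nanoTime() - start) / 1_000_000);
    }
}
{code}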



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28593) Introduce default ingress templates at operator level

2022-07-18 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28593:
-

 Summary: Introduce default ingress templates at operator level
 Key: FLINK-28593
 URL: https://issues.apache.org/jira/browse/FLINK-28593
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0


Ingress templates are currently [defined at CR 
level|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/ingress/],
 but these rules can be enabled globally at operator level too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28592) Implement custom resource counters as counters not gauges

2022-07-18 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28592:
-

 Summary: Implement custom resource counters as counters not gauges
 Key: FLINK-28592
 URL: https://issues.apache.org/jira/browse/FLINK-28592
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0


* change the current implementation to counters
 * add counters at the global level (see the sketch below)
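
For context, a minimal sketch of the difference in the Flink metrics API (the
metric names are illustrative, not the operator's actual ones):
{code:java}
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MetricGroup;

static Counter registerCreatedCounter(MetricGroup group) {
    // current style: a gauge samples a momentary value, e.g.
    //   group.gauge("FlinkDeployment.Count", () -> currentCount);
    // proposed style: a counter tracks monotonically increasing events
    Counter created = group.counter("FlinkDeployment.Created");
    created.inc(); // incremented whenever a custom resource is created
    return created;
}
{code}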



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28591) Array<Row> is not serialized correctly

2022-07-18 Thread Andrzej Swatowski (Jira)
Andrzej Swatowski created FLINK-28591:
-

 Summary: Array<Row> is not serialized correctly
 Key: FLINK-28591
 URL: https://issues.apache.org/jira/browse/FLINK-28591
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API, Table SQL / Planner
Affects Versions: 1.15.0
Reporter: Andrzej Swatowski


When using the Table API to insert data into an array of rows, the data is
apparently serialized incorrectly internally, which leads to incorrect
serialization at the connectors.

E.g., the following table:
{code:java}
CREATE TABLE wrongArray (
    foo BIGINT,
    bar ARRAY<ROW<foo1 STRING, foo2 STRING>>,
    strings ARRAY<STRING>,
    intRows ARRAY<ROW<a INT, b INT>>
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///home/jovyan/issue.json',
  'format' = 'json'
) {code}
along with the following insert:
{code:java}
insert into wrongArray (
    SELECT
        1,
        array[
            ('Field1', 'Value1'),
            ('Field2', 'Value2')
        ],
        array['foo', 'bar', 'foobar'],
        array[ROW(1, 1), ROW(2, 2)]
    FROM (VALUES(1))
) {code}
gets serialized into: 
{code:java}
{
  "foo":1,
  "bar":[
    {
      "foo1":"Field2",
      "foo2":"Value2"
    },
    {
      "foo1":"Field2",
      "foo2":"Value2"
    }
  ],
  "strings":[
    "foo",
    "bar",
    "foobar"
  ],
  "intRows":[
    {
      "a":2,
      "b":2
    },
    {
      "a":2,
      "b":2
    }
  ]
}{code}
It is easy to spot that `strings` (an array of strings) yields the correct
values. However, both `bar` (an array of rows with two string fields) and
`intRows` (an array of rows with two integer fields) consist of duplicates of
the last row in the array.

It is not an error connected to a specific connector or format. I did a bit of
debugging while trying to implement my own format, and it seems that the
`BinaryArrayData` holding the row values has wrong data saved in its
`MemorySegment`, i.e. calling:
{code:java}
// arrayDataElementGetter is created via ArrayData.createElementGetter(elementType)
for (int i = 0; i < array.size(); i++) {
  Object element = arrayDataElementGetter.getElementOrNull(array, i);
}{code}
correctly calculates the offsets but yields the same element each time, because
the data is malformed in the array's `MemorySegment`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28590) flink on yarn checkpoint exception

2022-07-18 Thread wangMu (Jira)
wangMu created FLINK-28590:
--

 Summary: flink on yarn checkpoint exception
 Key: FLINK-28590
 URL: https://issues.apache.org/jira/browse/FLINK-28590
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.15.1, 1.15.0
Reporter: wangMu
 Attachments: image-2022-07-18-17-52-59-892.png

When I submit with Flink on YARN on CDH 6.2.1, the JobManager log prints the following exception:

!image-2022-07-18-17-52-59-892.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28589) Enhance Web UI for Speculative Execution

2022-07-18 Thread Gen Luo (Jira)
Gen Luo created FLINK-28589:
---

 Summary: Enhance Web UI for Speculative Execution
 Key: FLINK-28589
 URL: https://issues.apache.org/jira/browse/FLINK-28589
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Affects Versions: 1.16.0
Reporter: Gen Luo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28588) Enhance REST API for Speculative Execution

2022-07-18 Thread Gen Luo (Jira)
Gen Luo created FLINK-28588:
---

 Summary: Enhance REST API for Speculative Execution
 Key: FLINK-28588
 URL: https://issues.apache.org/jira/browse/FLINK-28588
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / REST
Affects Versions: 1.16.0
Reporter: Gen Luo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28587) FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-18 Thread Gen Luo (Jira)
Gen Luo created FLINK-28587:
---

 Summary: FLIP-249: Flink Web UI Enhancement for Speculative 
Execution
 Key: FLINK-28587
 URL: https://issues.apache.org/jira/browse/FLINK-28587
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / REST, Runtime / Web Frontend
Affects Versions: 1.16.0
Reporter: Gen Luo


As a follow-up step of FLIP-168 and FLIP-224, the Flink Web UI needs to be 
enhanced to display the related information if the speculative execution 
mechanism is enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28586) Speculative execution for new sources

2022-07-18 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-28586:
---

 Summary: Speculative execution for new sources
 Key: FLINK-28586
 URL: https://issues.apache.org/jira/browse/FLINK-28586
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Common, Runtime / Coordination
Reporter: Zhu Zhu
 Fix For: 1.16.0


This task enables new sources (FLIP-27) for speculative execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28585) Speculative execution for InputFormat sources

2022-07-18 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-28585:
---

 Summary: Speculative execution for InputFormat sources
 Key: FLINK-28585
 URL: https://issues.apache.org/jira/browse/FLINK-28585
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Zhu Zhu
 Fix For: 1.16.0


This task enables InputFormat sources for speculative execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28584) Add an option to ConfigMap that can be set to immutable

2022-07-18 Thread liuzhuo (Jira)
liuzhuo created FLINK-28584:
---

 Summary: Add an option to ConfigMap that can be set to immutable
 Key: FLINK-28584
 URL: https://issues.apache.org/jira/browse/FLINK-28584
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Reporter: liuzhuo


When a job is started in a Kubernetes environment, multiple ConfigMaps are
usually created to mount data (e.g. flink-config, hadoop-config, etc.). If a
cluster runs too many jobs, the ConfigMaps become a performance bottleneck, and
occasionally a file mount failure exception occurs, resulting in slower pod
startup times.


According to Kubernetes' description of ConfigMaps, enabling the immutable
parameter greatly reduces the pressure on kube-apiserver and improves cluster
performance.

[configmap-immutable|https://kubernetes.io/zh-cn/docs/concepts/configuration/configmap/#configmap-immutable]


In my understanding, parameter information such as flink-config and
hadoop-config is loaded at startup; even if it is modified later, it cannot
affect the running job. Should we provide a switch to choose whether to set
these ConfigMaps to immutable?
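
A sketch of what the proposed switch could do when the operator builds a
ConfigMap (fabric8 builder usage; the helper and flag are my illustration):
{code:java}
import java.util.Map;
import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.api.model.ConfigMapBuilder;

static ConfigMap buildConfigMap(String name, Map<String, String> data, boolean immutable) {
    return new ConfigMapBuilder()
            .withNewMetadata().withName(name).endMetadata()
            .withData(data)
            // when true, the kubelet stops watching this ConfigMap,
            // which reduces load on kube-apiserver
            .withImmutable(immutable)
            .build();
}
{code}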



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28583) make flink dist log4j dependency simple and clear

2022-07-18 Thread jackylau (Jira)
jackylau created FLINK-28583:


 Summary: make flink dist log4j dependency simple and clear
 Key: FLINK-28583
 URL: https://issues.apache.org/jira/browse/FLINK-28583
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Scripts
Affects Versions: 1.16.0
Reporter: jackylau
 Fix For: 1.16.0


Flink doesn't shade log4j; it is put into flink/lib using a shade exclusion
like this:
{code:xml}
<excludes>
   <exclude>org.apache.logging.log4j:*</exclude>
</excludes>{code}
and the following is added to put the log4j jars into flink/lib:
{code:xml}
<includes>
   <include>org.apache.logging.log4j:log4j-api</include>
   <include>org.apache.logging.log4j:log4j-core</include>
   <include>org.apache.logging.log4j:log4j-slf4j-impl</include>
   <include>org.apache.logging.log4j:log4j-1.2-api</include>
</includes>{code}
I suggest making the log4j dependencies provided scope instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28582) LSM tree structure may be incorrect when multiple jobs are committing into the same bucket

2022-07-18 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-28582:
---

 Summary: LSM tree structure may be incorrect when multiple jobs 
are committing into the same bucket
 Key: FLINK-28582
 URL: https://issues.apache.org/jira/browse/FLINK-28582
 Project: Flink
  Issue Type: Bug
  Components: Table Store
Affects Versions: table-store-0.2.0
Reporter: Caizhi Weng
 Fix For: table-store-0.2.0


Currently `FileStoreCommitImpl` only checks for conflicts by verifying that
the files we're going to delete (due to compaction) are still there.

However, consider two jobs committing into the same LSM tree at the same time.
For their first compaction, no conflict is detected because each job only
deletes its own level 0 files. But both will produce level 1 files, and the key
ranges of these files may overlap. This is not correct for our LSM tree
structure.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)