Re: [DISCUSS] Drop Zookeeper 3.4

2021-12-07 Thread Dongwon Kim
When should I prepare for upgrading ZK to 3.5 or newer?
We're operating a Hadoop cluster with ZK 3.4.6 that runs only Flink jobs.
I just hope the rolling update is not that painful - any advice on this?
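
For reference, our HA settings in flink-conf.yaml look roughly like the following
(hostnames and paths are only illustrative); I assume only the ZooKeeper ensemble
itself gets upgraded and these values stay untouched:

  high-availability: zookeeper
  high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
  high-availability.zookeeper.path.root: /flink
  high-availability.storageDir: hdfs:///flink/ha/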

Best,

Dongwon

On Tue, Dec 7, 2021 at 3:22 AM Chesnay Schepler  wrote:

> Current users of ZK 3.4 and below would need to upgrade their Zookeeper
> installation that is used by Flink to 3.5+.
>
> Whether K8s users are affected depends on whether they use ZK or not. If
> they do, see above, otherwise they are not affected at all.
>
> On 06/12/2021 18:49, Arvid Heise wrote:
>
> Could someone please help me understand the implications of the upgrade?
>
> As far as I understand, this upgrade would only affect users that have a
> Zookeeper shared across multiple services, some of which require ZK 3.4-? A
> workaround for those users would be to run two ZKs with different versions,
> eventually deprecating the old one, correct?
>
> If that is the only limitation, I'm +1 for the proposal since ZK 3.4 is
> already EOL.
>
> How are K8s users affected?
>
> Best,
>
> Arvid
>
> On Mon, Dec 6, 2021 at 2:00 PM Chesnay Schepler 
> wrote:
>
>> ping @users; any input on how this would affect you is highly appreciated.
>>
>> On 25/11/2021 22:39, Chesnay Schepler wrote:
>> > I included the user ML in the thread.
>> >
>> > @users Are you still using Zookeeper 3.4? If so, were you planning to
>> > upgrade Zookeeper in the near future?
>> >
>> > I'm not sure about ZK compatibility, but we'd also upgrade Curator to
>> > 5.x, which doesn't support Zookeeper 3.4 anymore.
>> >
>> > On 25/11/2021 21:56, Till Rohrmann wrote:
>> >> Should we ask on the user mailing list whether anybody is still using
>> >> ZooKeeper 3.4 and thus needs support for this version or can a
>> ZooKeeper
>> >> 3.5/3.6 client talk to a ZooKeeper 3.4 cluster? I would expect that
>> >> not a
>> >> lot of users depend on it but just to make sure that we aren't
>> >> annoying a
>> >> lot of our users with this change. Apart from that +1 for removing it
>> if
>> >> not a lot of users depend on it.
>> >>
>> >> Cheers,
>> >> Till
>> >>
>> >> On Wed, Nov 24, 2021 at 11:03 AM Matthias Pohl
>> >> wrote:
>> >>
>> >>> Thanks for starting this discussion, Chesnay. +1 from my side. It's
>> >>> time to
>> >>> move forward with the ZK support considering the EOL of 3.4 you
>> already
>> >>> mentioned. The benefits we gain from upgrading Curator to 5.x as a
>> >>> consequence are another plus point. Just for reference on the
>> >>> inconsistent
>> >>> state issue you mentioned: FLINK-24543 [1].
>> >>>
>> >>> Matthias
>> >>>
>> >>> [1] https://issues.apache.org/jira/browse/FLINK-24543
>> >>>
>> >>> On Wed, Nov 24, 2021 at 10:19 AM Chesnay Schepler
>> >>> wrote:
>> >>>
>>  Hello,
>> 
>>  I'd like to drop support for Zookeeper 3.4 in 1.15, upgrading the
>>  default to 3.5 with an opt-in for 3.6.
>> 
>>  Supporting Zookeeper 3.4 (which is already EOL) prevents us from
>>  upgrading Curator to 5.x, which would allow us to properly fix an
>>  issue
>>  with inconsistent state. The Curator upgrade is also required to eventually
>>  support ZK 3.6.
>> >
>> >
>>
>>
>


[jira] [Created] (FLINK-21218) "state.checkpoints.dir" should be required only when "execution.checkpointing.interval" is specified

2021-01-30 Thread Dongwon Kim (Jira)
Dongwon Kim created FLINK-21218:
---

 Summary: "state.checkpoints.dir" should be required only when 
"execution.checkpointing.interval" is specified
 Key: FLINK-21218
 URL: https://issues.apache.org/jira/browse/FLINK-21218
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
        Reporter: Dongwon Kim


Users currently have to specify "state.checkpoints.dir" even when they only want to
use a state backend as unreliable per-key state storage, i.e., without enabling
checkpointing.

Thread in user ML : 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Setting-quot-unreliable-quot-RocksDB-state-backend-w-o-quot-execution-checkpointing-interval-quot-ans-td41003.html
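
For illustration, the expectation is that a configuration like the first sketch below
(state backend only, no checkpointing) should work without "state.checkpoints.dir",
while the second one should keep requiring it; values are only examples:

  # desired: usable without state.checkpoints.dir
  state.backend: rocksdb

  # checkpointing enabled: state.checkpoints.dir is genuinely needed
  state.backend: rocksdb
  execution.checkpointing.interval: 60s
  state.checkpoints.dir: hdfs:///flink/checkpoints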



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-08-11 Thread Dongwon Kim
 Big +1 for this FLIP.

Recently I've been working with some Kafka topics that carry timestamps as
metadata, not in the message body. I want to declare a table on top of these
topics with DDL, but "rowtime_column_name" in the watermark definition below
seems to accept only existing columns.

> :
>   WATERMARK FOR rowtime_column_name AS watermark_strategy_expression
>
>
I raised the issue on the user@ list, but committers advised alternative
approaches that call for detailed knowledge of Flink, such as a custom decoding
format or a conversion between the DataStream API and the TableEnvironment.
That goes against the main advantage of Flink SQL: simplicity and ease of use.
IMHO this FLIP should be implemented so that users can derive tables freely
from any Kafka topic without having to involve the DataStream API.
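
To make the use case concrete, below is a rough sketch of the kind of DDL I hope
becomes possible with this FLIP; the exact keyword for exposing the Kafka record
timestamp is of course up to the FLIP discussion, so treat the METADATA clause and
the connector options as illustrative only:

  CREATE TABLE events (
    user_id STRING,
    action  STRING,
    ts TIMESTAMP(3) METADATA FROM 'timestamp',  -- Kafka record timestamp, not in the body
    WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
  ) WITH (
    'connector' = 'kafka',
    'topic' = 'events',
    'format' = 'json'
  );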

Best,

Dongwon

On 2020/03/01 14:30:31, Dawid Wysakowicz  wrote:
> Hi,
>
> I would like to propose an improvement that would enable reading table
> columns from different parts of source records. Besides the main payload,
> the majority (if not all) of the sources expose additional information. It
> can be simply read-only metadata such as offset or ingestion time, or
> readable and writable parts of the record that contain data but additionally
> serve different purposes (partitioning, compaction, etc.), e.g. key or
> timestamp in Kafka.
>
> We should make it possible to read and write data from all of those
> locations. In this proposal I discuss reading partitioning data; for
> completeness, it also discusses the partitioning when writing data out.
>
> I am looking forward to your comments.
>
> You can access the FLIP here:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Reading+table+columns+from+different+parts+of+source+records?src=contextnavpagetreemode
>
> Best,
>
> Dawid
>


[jira] [Created] (FLINK-10917) Bump up the version of codahale's metrics-core to 3.2.3 or higher

2018-11-18 Thread Dongwon Kim (JIRA)
Dongwon Kim created FLINK-10917:
---

 Summary: Bump up the version of codahale's metrics-core to 3.2.3 
or higher
 Key: FLINK-10917
 URL: https://issues.apache.org/jira/browse/FLINK-10917
 Project: Flink
  Issue Type: Improvement
  Components: Metrics
Reporter: Dongwon Kim


I've experienced back pressure once in a while from a streaming pipeline of 
mine [1].
I strongly suspect SlidingTimeWindowReservoir from 
io.dropwizard.metrics:metrics-core:3.1.5.
It is known to cause long GCs [2], so a new implementation called
SlidingTimeWindowArrayReservoir was introduced in v3.2.3.

So I suggest bumping up codahale's metrics-core to v3.2.3 or higher so that the new
implementation can be used, preventing back pressure that actually has nothing to do
with Flink itself.

I tested compatibility in a very simple way by importing
io.dropwizard.metrics:metrics-core:4.0.3 in my own project in order to shadow
v3.1.5, which is pulled in by flink-metrics-dropwizard.
It works without any incompatibility issues for me; there was no NoSuchMethodError.

However, I'm not sure whether bumping up to 3.2.x or 4.x is okay for other 
users.

[1] 
https://www.slideshare.net/ssuser6bb12d/realtime-driving-score-service-using-flink/30
[2] https://github.com/dropwizard/metrics/pull/1139
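
For illustration, a minimal sketch of how the array-based reservoir would be used when
registering a dropwizard histogram through Flink's wrapper; the operator and metric
names below are made up:

import java.util.concurrent.TimeUnit;

import com.codahale.metrics.SlidingTimeWindowArrayReservoir;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.dropwizard.metrics.DropwizardHistogramWrapper;
import org.apache.flink.metrics.Histogram;

public class LatencyTrackingMapper extends RichMapFunction<Long, Long> {

    private transient Histogram latencyMs;

    @Override
    public void open(Configuration parameters) {
        // metrics-core 3.1.5 only offers SlidingTimeWindowReservoir, which is known
        // to cause long GCs [2]; the array-based variant from 3.2.3+ avoids the
        // per-sample object churn.
        com.codahale.metrics.Histogram dropwizardHistogram =
            new com.codahale.metrics.Histogram(
                new SlidingTimeWindowArrayReservoir(30, TimeUnit.SECONDS));

        latencyMs = getRuntimeContext().getMetricGroup()
            .histogram("mapLatencyMs", new DropwizardHistogramWrapper(dropwizardHistogram));
    }

    @Override
    public Long map(Long value) {
        long start = System.currentTimeMillis();
        // ... the actual per-record work would go here ...
        latencyMs.update(System.currentTimeMillis() - start);
        return value;
    }
}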



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8431) Allow to specify # GPUs for TaskManager in Mesos

2018-01-13 Thread Dongwon Kim (JIRA)
Dongwon Kim created FLINK-8431:
--

 Summary: Allow to specify # GPUs for TaskManager in Mesos
 Key: FLINK-8431
 URL: https://issues.apache.org/jira/browse/FLINK-8431
 Project: Flink
  Issue Type: Improvement
  Components: Cluster Management, Mesos
Reporter: Dongwon Kim
Priority: Minor


Mesos provides first-class support for Nvidia GPUs [1], but Flink does not 
exploit it when scheduling TaskManagers. If Mesos agents are configured to 
isolate GPUs as shown in [2], TaskManagers that do not explicitly request GPUs
cannot see any GPUs at all.

We therefore need to introduce a new configuration property named
"mesos.resourcemanager.tasks.gpus" to allow users to specify the number of GPUs for
each TaskManager process in Mesos, as sketched below.

[1] http://mesos.apache.org/documentation/latest/gpu-support/
[2] http://mesos.apache.org/documentation/latest/gpu-support/#agent-flags
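
For illustration, the proposed property would sit next to the existing per-task Mesos
resource settings in flink-conf.yaml; the values below are only examples:

  mesos.resourcemanager.tasks.cpus: 4
  mesos.resourcemanager.tasks.mem: 4096
  # proposed in this issue: number of GPUs to request per TaskManager
  mesos.resourcemanager.tasks.gpus: 1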



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Want Flink startup issues :-)

2016-02-06 Thread Dongwon Kim
Hi Chiwan!

That's what I wanted to know!
Thanks!

Dongwon Kim

2016-02-06 22:00 GMT+09:00 Chiwan Park :
> Hi Dongwon,
>
> Yes, the things to do are picking an issue (by assigning the issue to yourself or 
> commenting on the issue), making changes, and sending a pull request for it.
>
> Welcome! :)
>
> Regards,
> Chiwan Park
>
>> On Feb 6, 2016, at 3:31 PM, Dongwon Kim  wrote:
>>
>> Hi Fabian, Matthias, Robert!
>>
>> Thank you for welcoming me to the community :-)
>> I'm taking a look at JIRA and "How to contribute" as you guys suggested.
>> One trivial question: do I just need to make a pull request after
>> figuring out an issue?
>> Then I'll pick an issue, figure it out, and then make a pull
>> request by myself ;-)
>>
>> Meanwhile, I also read the roadmap and found a few plans that capture my 
>> interest.
>> - Making YARN resource dynamic
>> - DataSet API Enhancements
>> - Expose more runtime metrics
>> Would any of you inform me of new or existing issues regarding the above?
>>
>> Thanks!
>>
>> Dongwon
>>
>> 2016-02-06 4:55 GMT+09:00 Fabian Hueske :
>>> Hi Dongwon,
>>>
>>> welcome to the Flink mailing list!
>>> What kind of issues are you interested in?
>>>
>>> - API / library features: DataSet API, DataStream API, SQL, StreamSQL,
>>> Graphs (Gelly)
>>> - Processing runtime: Batch, Streaming
>>> - Connectors to other systems: Stream sources/sinks
>>> - Web dashboard
>>> - Compatibility: Storm, Hadoop
>>>
>>> You can also have a look into Flink's issue tracker JIRA [1]. Right now, we
>>> have about 600 open issues of all levels of difficulty and effort.
>>> If you find an issue that sounds interesting, just drop a note and we can
>>> give you some details about it if you want to learn more.
>>>
>>> Best, Fabian
>>>
>>> [1]
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved
>>>
>>> 2016-02-05 17:14 GMT+01:00 Dongwon Kim :
>>>
>>>> Hello,
>>>>
>>>> I'm Dongwon Kim and I want to get involved in Flink community.
>>>> Can anyone guide me through contributing to Flink with some startup issues?
>>>> Although my research interests lie in big data systems including Flink,
>>>> Spark, MapReduce, and Tez, I've never participated in open source
>>>> communities.
>>>>
>>>> FYI, I've done the following things over the past few years:
>>>> - I've studied Apache Hadoop (MRv1, MRv2, and YARN), Apache Tez, and
>>>> Apache Spark through the source code.
>>>> - My doctoral thesis is about improving the performance of MRv1 by
>>>> making network pipelines between mappers and reducers like what Flink
>>>> does.
>>>> - I've used Ganglia to monitor the cluster performance and I've been
>>>> interested in metrics and counters in big data systems.
>>>> - I gave a talk named "a comparative performance evaluation of Flink"
>>>> at last Flink Forward.
>>>>
>>>> I would really appreciate it if someone could help me get involved in the
>>>> most promising ASF project :-)
>>>>
>>>> Greetings,
>>>> Dongwon Kim
>>>>
>


Re: Want Flink startup issues :-)

2016-02-05 Thread Dongwon Kim
Hi Fabian, Matthias, Robert!

Thank you for welcoming me to the community :-)
I'm taking a look at JIRA and "How to contribute" as you guys suggested.
One trivial question: do I just need to make a pull request after
figuring out an issue?
Then I'll pick an issue, figure it out, and then make a pull
request by myself ;-)

Meanwhile, I also read the roadmap and found a few plans that capture my interest.
- Making YARN resource dynamic
- DataSet API Enhancements
- Expose more runtime metrics
Would any of you inform me of new or existing issues regarding the above?

Thanks!

Dongwon

2016-02-06 4:55 GMT+09:00 Fabian Hueske :
> Hi Dongwon,
>
> welcome to the Flink mailing list!
> What kind of issues are you interested in?
>
> - API / library features: DataSet API, DataStream API, SQL, StreamSQL,
> Graphs (Gelly)
> - Processing runtime: Batch, Streaming
> - Connectors to other systems: Stream sources/sinks
> - Web dashboard
> - Compatibility: Storm, Hadoop
>
> You can also have a look into Flink's issue tracker JIRA [1]. Right now, we
> have about 600 open issues of all levels of difficulty and effort.
> If you find an issue that sounds interesting, just drop a note and we can
> give you some details about it if you want to learn more.
>
> Best, Fabian
>
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved
>
> 2016-02-05 17:14 GMT+01:00 Dongwon Kim :
>
>> Hello,
>>
>> I'm Dongwon Kim and I want to get involved in Flink community.
>> Can anyone guide me through contributing to Flink with some startup issues?
>> Although my research interests lie in big data systems including Flink,
>> Spark, MapReduce, and Tez, I've never participated in open source
>> communities.
>>
>> FYI, I've done the following things over the past few years:
>> - I've studied Apache Hadoop (MRv1, MRv2, and YARN), Apache Tez, and
>> Apache Spark through the source code.
>> - My doctoral thesis is about improving the performance of MRv1 by
>> making network pipelines between mappers and reducers like what Flink
>> does.
>> - I've used Ganglia to monitor the cluster performance and I've been
>> interested in metrics and counters in big data systems.
>> - I gave a talk named "a comparative performance evaluation of Flink"
>> at last Flink Forward.
>>
>> I would really appreciate it if someone could help me get involved in the
>> most promising ASF project :-)
>>
>> Greetings,
>> Dongwon Kim
>>


Want Flink startup issues :-)

2016-02-05 Thread Dongwon Kim
Hello,

I'm Dongwon Kim and I want to get involved in Flink community.
Can anyone guide me through contributing to Flink with some startup issues?
Although my research interests lie in big data systems including Flink,
Spark, MapReduce, and Tez, I've never participated in open source
communities.

FYI, I've done the following things over the past few years:
- I've studied Apache Hadoop (MRv1, MRv2, and YARN), Apache Tez, and
Apache Spark through the source code.
- My doctoral thesis is about improving the performance of MRv1 by
making network pipelines between mappers and reducers like what Flink
does.
- I've used Ganglia to monitor the cluster performance and I've been
interested in metrics and counters in big data systems.
- I gave a talk named "a comparative performance evaluation of Flink"
at last Flink Forward.

I would really appreciate it if someone could help me get involved in the
most promising ASF project :-)

Greetings,
Dongwon Kim