Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Wesley

Nice news. Congrats!


Leonard Xu wrote:

Congratulations!

Thanks Zhijiang and Piotr for the great work, and thanks everyone involved!

Best,
Leonard Xu



Compress DataSink Output

2016-08-17 Thread Wesley Kerr
Hello -

Forgive me if this has been asked before, but I'm trying to determine the
best way to add compression to DataSink Outputs (starting with
TextOutputFormat).  Realistically I would like each partition file (based
on parallelism) to be compressed independently with gzip, but am open to
other solutions.

My first thought was to extend TextOutputFormat with a new class that
compresses after closing and before returning, but I'm not sure that would
work across all possible file systems (S3, Local, and HDFS).

Any thoughts?

Thanks!

Wes


Re: Compress DataSink Output

2016-08-19 Thread Wesley Kerr
That looks good.  Thanks!

On Fri, Aug 19, 2016 at 6:15 AM Robert Metzger wrote:

> Hi Wes,
>
> Flink's own OutputFormats don't support compression, but we have some
> tools to use Hadoop's OutputFormats with Flink [1], and those support
> compression:
> https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html
>
> Let me know if you need more information.
>
> Regards,
> Robert
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/hadoop_compatibility.html
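
For later readers, here is a minimal sketch of the approach Robert
describes, using Flink's Hadoop compatibility wrapper around Hadoop's
TextOutputFormat. The output path and the (word, count) sample data are
placeholders, and the Hadoop compatibility dependency is assumed to be
on the classpath:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class GzipOutputSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input: (word, count) pairs standing in for the real data.
        DataSet<Tuple2<String, Integer>> counts =
                env.fromElements(Tuple2.of("apache", 10), Tuple2.of("flink", 7));

        // Compression is configured on Hadoop's FileOutputFormat; each
        // parallel subtask then writes its own independently gzipped part file.
        Job job = Job.getInstance();
        FileOutputFormat.setOutputPath(job, new Path("hdfs:///tmp/compressed-output"));
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        // Hadoop output formats work on Writable key/value pairs, so wrap
        // each record before handing it to the wrapped TextOutputFormat.
        counts.map(new MapFunction<Tuple2<String, Integer>, Tuple2<Text, IntWritable>>() {
                    @Override
                    public Tuple2<Text, IntWritable> map(Tuple2<String, Integer> t) {
                        return Tuple2.of(new Text(t.f0), new IntWritable(t.f1));
                    }
                })
              .output(new HadoopOutputFormat<>(
                      new TextOutputFormat<Text, IntWritable>(), job));

        env.execute("gzip output example");
    }
}

Because compression is applied per part file, each parallel subtask
produces its own independently readable .gz file, which matches the
per-partition gzip requirement from the original question.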


Re: flink sql row_number() over () OOM

2019-09-04 Thread Wesley Peng

Hi

on 2019/9/4 19:30, liu ze wrote:
I use the row_number() over() function to do Top-N. The total amount of
data is only 60,000 records, but the state has grown to 12 GB.

Finally it hits an OOM. Is there any way to optimize it?
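
For context, the Top-N pattern being described presumably looks
something like the following sketch (Flink 1.9 with the Blink planner;
the table shop_sales and its columns are hypothetical). The planner
keeps ranking state per partition key, which is where state of this
size can accumulate:

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TopNSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner().inStreamingMode().build();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);

        // Hypothetical input; in the real job this would come from a source.
        DataStream<Tuple3<String, String, Integer>> sales = env.fromElements(
                Tuple3.of("books", "flink-guide", 100),
                Tuple3.of("books", "kafka-guide", 80));
        tEnv.registerDataStream("shop_sales", sales, "category, item, sales");

        // Documented Top-N pattern: ROW_NUMBER() plus an outer rownum filter.
        Table top10 = tEnv.sqlQuery(
                "SELECT category, item, sales FROM (" +
                "  SELECT *, ROW_NUMBER() OVER (" +
                "    PARTITION BY category ORDER BY sales DESC) AS rownum" +
                "  FROM shop_sales" +
                ") WHERE rownum <= 10");

        tEnv.toRetractStream(top10, Row.class).print();
        env.execute("top-n sketch");
    }
}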


ref: 
https://stackoverflow.com/questions/50812837/flink-taskmanager-out-of-memory-and-memory-configuration


The total amount of required physical and heap memory is quite difficult 
to compute since it strongly depends on your user code, your job's 
topology and which state backend you use.


As a rule of thumb, if you experience OOM and are still using the 
FileSystemStateBackend or the MemoryStateBackend, then you should switch 
to RocksDBStateBackend, because it can gracefully spill to disk if the 
state grows too big.
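
A minimal sketch of that switch; the checkpoint URI is a placeholder
and flink-statebackend-rocksdb must be on the classpath:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep working state in RocksDB on local disk instead of the JVM
        // heap; 'true' enables incremental checkpoints, which helps when
        // the state is large.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
    }
}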


If you are still experiencing OOM exceptions as you have described, then
you should check whether your user code keeps references to state
objects or generates large objects in some other way that cannot be
garbage collected. If this is the case, then you should try to refactor
your code to rely on Flink's state abstraction, because with RocksDB it
can go out of core.


RocksDB itself needs native memory, which adds to Flink's memory
footprint. This depends on the block cache size, indexes, bloom filters
and memtables. You can find out more about these components and how to
configure them in the RocksDB state backend documentation.


Last but not least, you should not activate
taskmanager.memory.preallocate when running streaming jobs, because
streaming jobs currently don't use managed memory. Thus, by activating
preallocation, you would allocate memory for Flink's managed memory,
which reduces the available heap space.


Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-06 Thread Wesley Peng

On 2019/9/6 8:55 PM, Fabian Hueske wrote:

I'm very happy to announce that Kostas Kloudas is joining the Flink PMC.
Kostas has been contributing to Flink for many years and puts a lot of
effort into helping our users and growing the Flink community.


Please join me in congratulating Kostas!


Congratulations, Kostas!

regards.


Re: [flink-1.9] how to read local json file through Flink SQL

2019-09-08 Thread Wesley Peng

On 2019/9/8 5:40 PM, Anyang Hu wrote:
In Flink 1.9, is there a way to read a local JSON file in Flink SQL,
similar to how CSV files can be read?


hi,

might this thread help you?
http://mail-archives.apache.org/mod_mbox/flink-dev/201604.mbox/%3cCAK+0a_o5=c1_p3sylrhtznqbhplexpb7jg_oq-sptre2neo...@mail.gmail.com%3e

regards.
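
As far as I know there is no built-in local JSON file source for Flink
SQL in 1.9, so one workaround is to read the file as plain text, parse
each line with Jackson, and register the result as a table. A sketch,
assuming newline-delimited JSON and hypothetical fields name and age:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class LocalJsonSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

        // Read the file as text and parse each JSON line into a typed tuple.
        DataSet<Tuple2<String, Integer>> people = env
                .readTextFile("file:///tmp/people.json")
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    private transient ObjectMapper mapper;

                    @Override
                    public Tuple2<String, Integer> map(String line) throws Exception {
                        if (mapper == null) {
                            mapper = new ObjectMapper(); // created lazily on the worker
                        }
                        JsonNode node = mapper.readTree(line);
                        return Tuple2.of(node.get("name").asText(), node.get("age").asInt());
                    }
                });

        // Register the parsed data so it can be queried with Flink SQL.
        tEnv.registerDataSet("people", people, "name, age");
        Table adults = tEnv.sqlQuery("SELECT name, age FROM people WHERE age > 30");
        tEnv.toDataSet(adults, Row.class).print();
    }
}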


Re: [DISCUSS] Drop older versions of Kafka Connectors (0.8, 0.9) for Flink 1.10

2019-09-11 Thread Wesley Peng

on 2019/9/11 16:17, Stephan Ewen wrote:

We still maintain connectors for Kafka 0.8 and 0.9 in Flink.
I would suggest dropping those with Flink 1.10 and supporting only
Kafka 0.10 onwards.


Are there any concerns about this, or is there still a significant
number of users of these versions?


But we still have a large-scale deployment of Kafka 0.9 in production.
Do you mean that upcoming Flink releases won't support Kafka 0.9?

Though I understand the reasoning, it still sounds unfortunate to us. :)

regards.


Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Wesley Peng

Hi

on 2019/9/11 17:22, Till Rohrmann wrote:
I'm very happy to announce that Zili Chen (some of you might also know 
him as Tison Kun) accepted the offer of the Flink PMC to become a 
committer of the Flink project.


Congratulations Zili Chen.

regards.