Re: [ANNOUNCE] Apache Flink 1.11.0 released
Nice news. Congrats!

Leonard Xu wrote:
> Congratulations! Thanks Zhijiang and Piotr for the great work, and thanks everyone involved!
> Best, Leonard Xu
Compress DataSink Output
Hello - Forgive me if this has been asked before, but I'm trying to determine the best way to add compression to DataSink Outputs (starting with TextOutputFormat). Realistically I would like each partition file (based on parallelism) to be compressed independently with gzip, but am open to other solutions. My first thought was to extend TextOutputFormat with a new class that compresses after closing and before returning, but I'm not sure that would work across all possible file systems (S3, Local, and HDFS). Any thoughts? Thanks! Wes
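For what it's worth, here is a minimal, untested sketch of the extension idea (the class name GzipTextOutputFormat is mine, assuming the DataSet API's FileOutputFormat). Instead of compressing after close, it wraps the already-opened output stream in a GZIPOutputStream, which should behave the same on S3, local, and HDFS since it never touches the file system directly:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;
import org.apache.flink.api.common.io.FileOutputFormat;
import org.apache.flink.core.fs.Path;

// Sketch: gzip each parallel partition file independently by wrapping
// the stream that FileOutputFormat opens for this subtask.
public class GzipTextOutputFormat extends FileOutputFormat<String> {

    private transient GZIPOutputStream gzipStream;

    public GzipTextOutputFormat(Path outputPath) {
        super(outputPath);
    }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        super.open(taskNumber, numTasks);   // opens this.stream on whatever FS backs the path
        gzipStream = new GZIPOutputStream(this.stream);
    }

    @Override
    public void writeRecord(String record) throws IOException {
        gzipStream.write(record.getBytes(StandardCharsets.UTF_8));
        gzipStream.write('\n');
    }

    @Override
    public void close() throws IOException {
        if (gzipStream != null) {
            gzipStream.finish();            // write the gzip trailer before the file is closed
        }
        super.close();
    }
}
```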
Re: Compress DataSink Output
That looks good. Thanks!

On Fri, Aug 19, 2016 at 6:15 AM Robert Metzger wrote:
> Hi Wes,
>
> Flink's own OutputFormats don't support compression, but we have some
> tools to use Hadoop's OutputFormats with Flink [1], and those support
> compression:
> https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html
>
> Let me know if you need more information.
>
> Regards,
> Robert
>
> [1]: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/hadoop_compatibility.html
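To make Robert's pointer concrete, here is a hedged sketch of wiring Hadoop's TextOutputFormat with gzip compression into a Flink batch job via the Hadoop compatibility layer (the output path and the Tuple2 conversion are illustrative; it needs the flink-hadoop-compatibility dependency on the classpath):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class HadoopGzipSink {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> lines = env.fromElements("first line", "second line");

        // Hadoop OutputFormats expect key/value pairs, so wrap each line.
        DataSet<Tuple2<NullWritable, Text>> kv = lines.map(
            new MapFunction<String, Tuple2<NullWritable, Text>>() {
                @Override
                public Tuple2<NullWritable, Text> map(String line) {
                    return new Tuple2<>(NullWritable.get(), new Text(line));
                }
            });

        // Configure gzip on the Hadoop side; each parallel partition
        // file is then compressed independently.
        Job job = Job.getInstance();
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        FileOutputFormat.setOutputPath(job, new Path("hdfs:///tmp/output")); // illustrative path

        kv.output(new HadoopOutputFormat<>(new TextOutputFormat<NullWritable, Text>(), job));
        env.execute("gzip sink example");
    }
}
```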
Re: flink sql row_number() over () OOM
Hi,

On 2019/9/4 19:30, liu ze wrote:
> I use the row_number() over() function to do topN. The total amount of data is 60,000, and the state is 12 GB. Finally, it OOMs. Is there any way to optimize it?

ref: https://stackoverflow.com/questions/50812837/flink-taskmanager-out-of-memory-and-memory-configuration

The total amount of required physical and heap memory is quite difficult to compute, since it strongly depends on your user code, your job's topology, and which state backend you use. As a rule of thumb, if you experience OOM and are still using the FileSystemStateBackend or the MemoryStateBackend, you should switch to the RocksDBStateBackend, because it can gracefully spill to disk if the state grows too big.

If you are still experiencing OOM exceptions as described, then you should check whether your user code keeps references to state objects or generates large objects in some other way that cannot be garbage collected. If so, try to refactor your code to rely on Flink's state abstraction, because with RocksDB the state can go out of core.

RocksDB itself needs native memory, which adds to Flink's memory footprint. How much depends on the block cache size, indexes, bloom filters, and memtables; the RocksDB documentation covers these settings and how to configure them.

Last but not least, you should not activate taskmanager.memory.preallocate when running streaming jobs, because streaming jobs currently don't use managed memory. By activating preallocation, you would allocate memory for Flink's managed memory, which reduces the available heap space.
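As a concrete illustration of the state backend switch suggested above — a minimal sketch, assuming a Flink 1.9-era streaming job, the flink-statebackend-rocksdb dependency, and a hypothetical HDFS checkpoint path:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep working state in RocksDB so it can spill to local disk instead
        // of filling the heap; checkpoints go to the given URI (path is illustrative).
        // The second argument enables incremental checkpoints, which helps with large state.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

        // ... define the topN job here, then call env.execute() ...
    }
}
```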
Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC
On 2019/9/6 8:55 PM, Fabian Hueske wrote:
> I'm very happy to announce that Kostas Kloudas is joining the Flink PMC. Kostas has been contributing to Flink for many years and puts lots of effort into helping our users and growing the Flink community. Please join me in congratulating Kostas!

Congratulations, Kostas!

regards.
Re: [flink-1.9] how to read local json file through Flink SQL
On 2019/9/8 5:40 PM, Anyang Hu wrote:
> In Flink 1.9, is there a way to read a local JSON file in Flink SQL, like the reading of a CSV file?

Hi, this thread might help you:
http://mail-archives.apache.org/mod_mbox/flink-dev/201604.mbox/%3cCAK+0a_o5=c1_p3sylrhtznqbhplexpb7jg_oq-sptre2neo...@mail.gmail.com%3e

regards.
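In case it helps, one workaround sketch (my own, not from the linked thread): Flink 1.9 ships no built-in JSON filesystem table source, but you can read the file as text, parse each line with Jackson, and register the result as a table for SQL. The input path, field names, and schema below are hypothetical:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class LocalJsonTable {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Assumes one JSON object per line, e.g. {"name":"bob","age":30}
        DataStream<Tuple2<String, Integer>> rows = env
            .readTextFile("file:///tmp/input.json") // hypothetical path
            .map(new MapFunction<String, Tuple2<String, Integer>>() {
                private transient ObjectMapper mapper;
                @Override
                public Tuple2<String, Integer> map(String line) throws Exception {
                    if (mapper == null) {
                        mapper = new ObjectMapper();
                    }
                    JsonNode node = mapper.readTree(line);
                    return Tuple2.of(node.get("name").asText(), node.get("age").asInt());
                }
            });

        // Register the parsed stream as a table and query it with SQL.
        tEnv.registerDataStream("people", rows, "name, age");
        Table result = tEnv.sqlQuery("SELECT name, age FROM people WHERE age > 20");
        tEnv.toAppendStream(result, Row.class).print();

        env.execute("local json via sql");
    }
}
```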
Re: [DISCUSS] Drop older versions of Kafka Connectors (0.9, 0.10) for Flink 1.10
On 2019/9/11 16:17, Stephan Ewen wrote:
> We still maintain connectors for Kafka 0.8 and 0.9 in Flink. I would suggest dropping those with Flink 1.10 and supporting only Kafka 0.10 onwards. Are there any concerns about this, or still a significant number of users of these versions?

But we still have a large-scale Kafka 0.9 deployment in production. Do you mean that upcoming Flink releases won't support Kafka 0.9? I understand the reasoning, but it's still a pity for us. :)

regards.
Re: [ANNOUNCE] Zili Chen becomes a Flink committer
Hi,

On 2019/9/11 17:22, Till Rohrmann wrote:
> I'm very happy to announce that Zili Chen (some of you might also know him as Tison Kun) accepted the offer of the Flink PMC to become a committer of the Flink project.

Congratulations, Zili Chen!

regards.