Re: Broadcast state

2019-10-19 Thread Congxian Qiu
By using Redis, you can store all data in one job in one single Redis, no
need one slot one Redis, what do you think?

Best,
Congxian


Navneeth Krishnan  于2019年10月18日周五 上午4:47写道:

> Ya, there will not be a problem of duplicates. But what I'm trying to
> achieve is if there a large static state which needs to be present just one
> per node rather than storing it per slot that would be ideal. The reason
> being is that the state is quite large around 100GB of mostly static data
> and it is not needed at per slot level. It can be at per instance level
> where each slot can read from this shared memory.
>
> Thanks
>
> On Wed, Oct 9, 2019 at 12:13 AM Congxian Qiu 
> wrote:
>
>> Hi,
>>
>> After using Redis, why there need to care about eliminate duplicated
>> data, if you specify the same key, then Redis will do the deduplicate
>> things.
>>
>> Best,
>> Congxian
>>
>>
>> Fabian Hueske  于2019年10月2日周三 下午5:30写道:
>>
>>> Hi,
>>>
>>> State is always associated with a single task in Flink.
>>> The state of a task cannot be accessed by other tasks of the same
>>> operator or tasks of other operators.
>>> This is true for every type of state, including broadcast state.
>>>
>>> Best, Fabian
>>>
>>>
>>> Am Di., 1. Okt. 2019 um 08:22 Uhr schrieb Navneeth Krishnan <
>>> reachnavnee...@gmail.com>:
>>>
 Hi,

 I can use redis but I’m still having hard time figuring out how I can
 eliminate duplicate data. Today without broadcast state in 1.4 I’m using
 cache to lazy load the data. I thought the broadcast state will be similar
 to that of kafka streams where I have read access to the state across the
 pipeline. That will indeed solve a lot of problems. Is there some way I can
 do the same with flink?

 Thanks!

 On Mon, Sep 30, 2019 at 10:36 PM Congxian Qiu 
 wrote:

> Hi,
>
> Could you use some cache system such as HBase or Reids to storage this
> data, and query from the cache if needed?
>
> Best,
> Congxian
>
>
> Navneeth Krishnan  于2019年10月1日周二 上午10:15写道:
>
>> Thanks Oytun. The problem with doing that is the same data will be
>> have to be stored multiple times wasting memory. In my case there will
>> around million entries which needs to be used by at least two operators 
>> for
>> now.
>>
>> Thanks
>>
>> On Mon, Sep 30, 2019 at 5:42 PM Oytun Tez  wrote:
>>
>>> This is how we currently use broadcast state. Our states are
>>> re-usable (code-wise), every operator that wants to consume basically 
>>> keeps
>>> the same descriptor state locally by processBroadcastElement'ing into a
>>> local state.
>>>
>>> I am open to suggestions. I see this as a hard drawback of dataflow
>>> programming or Flink framework?
>>>
>>>
>>>
>>> ---
>>> Oytun Tez
>>>
>>> *M O T A W O R D*
>>> The World's Fastest Human Translation Platform.
>>> oy...@motaword.com — www.motaword.com
>>>
>>>
>>> On Mon, Sep 30, 2019 at 8:40 PM Oytun Tez 
>>> wrote:
>>>
 You can re-use the broadcasted state (along with its descriptor)
 that comes into your KeyedBroadcastProcessFunction, in another operator
 downstream. that's basically duplicating the broadcasted state 
 whichever
 operator you want to use, every time.



 ---
 Oytun Tez

 *M O T A W O R D*
 The World's Fastest Human Translation Platform.
 oy...@motaword.com — www.motaword.com


 On Mon, Sep 30, 2019 at 8:29 PM Navneeth Krishnan <
 reachnavnee...@gmail.com> wrote:

> Hi All,
>
> Is it possible to access a broadcast state across the pipeline?
> For example, say I have a KeyedBroadcastProcessFunction which adds the
> incoming data to state and I have downstream operator where I need 
> the same
> state as well, would I be able to just read the broadcast state with a
> readonly view. I know this is possible in kafka streams.
>
> Thanks
>



Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Hequn Cheng
Thanks a lot to Jark, Jincheng, and everyone that make this release
possible.

Best, Hequn

On Sat, Oct 19, 2019 at 10:29 PM Zili Chen  wrote:

> Thanks a lot for being release manager Jark. Great work!
>
> Best,
> tison.
>
>
> Till Rohrmann  于2019年10月19日周六 下午10:15写道:
>
>> Thanks a lot for being our release manager Jark and thanks to everyone
>> who has helped to make this release possible.
>>
>> Cheers,
>> Till
>>
>> On Sat, Oct 19, 2019 at 3:26 PM Jark Wu  wrote:
>>
>>>  Hi,
>>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.9.1, which is the first bugfix release for the Apache Flink
>>> 1.9 series.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data streaming
>>> applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements for this bugfix release:
>>> https://flink.apache.org/news/2019/10/18/release-1.9.1.html
>>>
>>> The full release notes are available in Jira:
>>> https://issues.apache.org/jira/projects/FLINK/versions/12346003
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who helped to verify this release and made this release possible!
>>> Great thanks to @Jincheng for helping finalize this release.
>>>
>>> Regards,
>>> Jark Wu
>>>
>>


Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Hequn Cheng
Thanks a lot to Jark, Jincheng, and everyone that make this release
possible.

Best, Hequn

On Sat, Oct 19, 2019 at 10:29 PM Zili Chen  wrote:

> Thanks a lot for being release manager Jark. Great work!
>
> Best,
> tison.
>
>
> Till Rohrmann  于2019年10月19日周六 下午10:15写道:
>
>> Thanks a lot for being our release manager Jark and thanks to everyone
>> who has helped to make this release possible.
>>
>> Cheers,
>> Till
>>
>> On Sat, Oct 19, 2019 at 3:26 PM Jark Wu  wrote:
>>
>>>  Hi,
>>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.9.1, which is the first bugfix release for the Apache Flink
>>> 1.9 series.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data streaming
>>> applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements for this bugfix release:
>>> https://flink.apache.org/news/2019/10/18/release-1.9.1.html
>>>
>>> The full release notes are available in Jira:
>>> https://issues.apache.org/jira/projects/FLINK/versions/12346003
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who helped to verify this release and made this release possible!
>>> Great thanks to @Jincheng for helping finalize this release.
>>>
>>> Regards,
>>> Jark Wu
>>>
>>


Submitting jobs via REST

2019-10-19 Thread Timothy Victor
I have a flink docker image with my job's JAR already contained within.  I
would like to run a job with this jar via the REST api.  Is that possible?

I know I can run a job via REST using JarID (ID assigned by flink when a
jar is uploaded).   However I don't have such an ID since this jar is
already part of the image.

Via CLI I can start a job using classpath.  But can I do the same via the
REST api.   Any other ways to achieve this?

Thanks

Tim


Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Zili Chen
Thanks a lot for being release manager Jark. Great work!

Best,
tison.


Till Rohrmann  于2019年10月19日周六 下午10:15写道:

> Thanks a lot for being our release manager Jark and thanks to everyone who
> has helped to make this release possible.
>
> Cheers,
> Till
>
> On Sat, Oct 19, 2019 at 3:26 PM Jark Wu  wrote:
>
>>  Hi,
>>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.9.1, which is the first bugfix release for the Apache Flink
>> 1.9 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this bugfix release:
>> https://flink.apache.org/news/2019/10/18/release-1.9.1.html
>>
>> The full release notes are available in Jira:
>> https://issues.apache.org/jira/projects/FLINK/versions/12346003
>>
>> We would like to thank all contributors of the Apache Flink community who
>> helped to verify this release and made this release possible!
>> Great thanks to @Jincheng for helping finalize this release.
>>
>> Regards,
>> Jark Wu
>>
>


Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Zili Chen
Thanks a lot for being release manager Jark. Great work!

Best,
tison.


Till Rohrmann  于2019年10月19日周六 下午10:15写道:

> Thanks a lot for being our release manager Jark and thanks to everyone who
> has helped to make this release possible.
>
> Cheers,
> Till
>
> On Sat, Oct 19, 2019 at 3:26 PM Jark Wu  wrote:
>
>>  Hi,
>>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.9.1, which is the first bugfix release for the Apache Flink
>> 1.9 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this bugfix release:
>> https://flink.apache.org/news/2019/10/18/release-1.9.1.html
>>
>> The full release notes are available in Jira:
>> https://issues.apache.org/jira/projects/FLINK/versions/12346003
>>
>> We would like to thank all contributors of the Apache Flink community who
>> helped to verify this release and made this release possible!
>> Great thanks to @Jincheng for helping finalize this release.
>>
>> Regards,
>> Jark Wu
>>
>


Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Till Rohrmann
Thanks a lot for being our release manager Jark and thanks to everyone who
has helped to make this release possible.

Cheers,
Till

On Sat, Oct 19, 2019 at 3:26 PM Jark Wu  wrote:

>  Hi,
>
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.9.1, which is the first bugfix release for the Apache Flink 1.9
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2019/10/18/release-1.9.1.html
>
> The full release notes are available in Jira:
> https://issues.apache.org/jira/projects/FLINK/versions/12346003
>
> We would like to thank all contributors of the Apache Flink community who
> helped to verify this release and made this release possible!
> Great thanks to @Jincheng for helping finalize this release.
>
> Regards,
> Jark Wu
>


Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Till Rohrmann
Thanks a lot for being our release manager Jark and thanks to everyone who
has helped to make this release possible.

Cheers,
Till

On Sat, Oct 19, 2019 at 3:26 PM Jark Wu  wrote:

>  Hi,
>
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.9.1, which is the first bugfix release for the Apache Flink 1.9
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2019/10/18/release-1.9.1.html
>
> The full release notes are available in Jira:
> https://issues.apache.org/jira/projects/FLINK/versions/12346003
>
> We would like to thank all contributors of the Apache Flink community who
> helped to verify this release and made this release possible!
> Great thanks to @Jincheng for helping finalize this release.
>
> Regards,
> Jark Wu
>


Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Yong Hua

On 2019/10/19 9:07 下午, Jark Wu wrote:
The Apache Flink community is very happy to announce the release of 
Apache Flink 1.9.1, which is the first bugfix release for the Apache 
Flink 1.9 series.


Apache Flink® is an open-source stream processing framework for 
distributed, high-performing, always-available, and accurate data 
streaming applications.


that's awesome. thanks for your work.

regards.


[ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Jark Wu
 Hi,

The Apache Flink community is very happy to announce the release of Apache
Flink 1.9.1, which is the first bugfix release for the Apache Flink 1.9
series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements
for this bugfix release:
https://flink.apache.org/news/2019/10/18/release-1.9.1.html

The full release notes are available in Jira:
https://issues.apache.org/jira/projects/FLINK/versions/12346003

We would like to thank all contributors of the Apache Flink community who
helped to verify this release and made this release possible!
Great thanks to @Jincheng for helping finalize this release.

Regards,
Jark Wu


[ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Jark Wu
 Hi,

The Apache Flink community is very happy to announce the release of Apache
Flink 1.9.1, which is the first bugfix release for the Apache Flink 1.9
series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements
for this bugfix release:
https://flink.apache.org/news/2019/10/18/release-1.9.1.html

The full release notes are available in Jira:
https://issues.apache.org/jira/projects/FLINK/versions/12346003

We would like to thank all contributors of the Apache Flink community who
helped to verify this release and made this release possible!
Great thanks to @Jincheng for helping finalize this release.

Regards,
Jark Wu


Re: Customize Part file naming (Flink 1.9.0)

2019-10-19 Thread Taher Koitawala
Beware when using Bucketing sink as it does not follow exactly once
semantics. Also it has issues with s3 consistency.



On Sat, Oct 19, 2019, 1:42 PM Ravi Bhushan Ratnakar <
ravibhushanratna...@gmail.com> wrote:

> Hi,
>
> As an alternative, you may use BucketingSink which provides you the
> provision to customize suffix/prefix.
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html
>
> Regards,
> Ravi
>
> On Sat, Oct 19, 2019 at 3:54 AM amran dean  wrote:
>
>> Hello,
>> StreamingFileSink's part file naming convention is not adjustable. It has
>> form: *part--. *
>>
>> My use case for StreamingFileSink is a Kafka -> S3 pipeline, and files
>> are read and processed from S3 using spark. In almost all cases, I want to
>> compress raw data before writing to S3 using the BulkFormat.
>>
>> Spark relies on filename extensions to do compression inference, so the
>> current naming scheme results in gibberish. I see that 1.10 currently
>> provides the ability to customize the suffix/prefix, but I really need an
>> alternative solution to this as soon as possible. Can this be backported to
>> 1.9, or are there alternatives?
>>
>>
>>


Re: Customize Part file naming (Flink 1.9.0)

2019-10-19 Thread Ravi Bhushan Ratnakar
Hi,

As an alternative, you may use BucketingSink which provides you the
provision to customize suffix/prefix.
https://ci.apache.org/projects/flink/flink-docs-release-1.9/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html

Regards,
Ravi

On Sat, Oct 19, 2019 at 3:54 AM amran dean  wrote:

> Hello,
> StreamingFileSink's part file naming convention is not adjustable. It has
> form: *part--. *
>
> My use case for StreamingFileSink is a Kafka -> S3 pipeline, and files are
> read and processed from S3 using spark. In almost all cases, I want to
> compress raw data before writing to S3 using the BulkFormat.
>
> Spark relies on filename extensions to do compression inference, so the
> current naming scheme results in gibberish. I see that 1.10 currently
> provides the ability to customize the suffix/prefix, but I really need an
> alternative solution to this as soon as possible. Can this be backported to
> 1.9, or are there alternatives?
>
>
>