Document an example pattern that makes sources and sinks pluggable in the production code for testing

2019-11-11 Thread Hung
Hi guys,

I found the testing documentation mentions:

make sources and sinks pluggable in your production code and inject special
test sources and test sinks in your tests.
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html#testing-flink-jobs

I think it would be useful to have a documented example, as the section
*testing stateful operators* does, which demonstrates this with
WindowOperatorTest:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html#unit-testing-stateful-or-timely-udfs--custom-operators

Or is there perhaps already a test that plugs in sources and sinks?
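
To make the idea concrete, here is a minimal sketch of the pattern I have in
mind; all class and method names (MyJob, build, CollectSink) are mine, not
from the docs:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class MyJob {

    // The pipeline takes its source and sink as parameters, so a test can
    // inject replacements for the real (e.g. Kafka) source and sink.
    public static void build(
            StreamExecutionEnvironment env,
            SourceFunction<String> source,
            SinkFunction<String> sink) {
        env.addSource(source)
           .map(s -> s.toUpperCase())  // stand-in for the real business logic
           .addSink(sink);
    }
}

// A test sink that collects results into a static list.
class CollectSink implements SinkFunction<String> {
    static final List<String> values =
        Collections.synchronizedList(new ArrayList<>());

    @Override
    public void invoke(String value, Context context) {
        values.add(value);
    }
}

A test would then call MyJob.build(env, testSource, new CollectSink()), run
env.execute(), and assert on CollectSink.values.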







Re: Flink Standalone cluster - production settings

2019-02-21 Thread Hung
/ Each job has 3 asynch operators 
with Executors with thread counts of 20,20,100/

Flink handles parallelism for you. If you want a higher parallelism for an
operator, you can call setParallelism().
for example,

flatMap(new Mapper1()).setParallelism(20)
flatMap(new Mapper2()).setParallelism(20)
flatMap(new Mapper3()).setParallelism(100)

You can check the official document here
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/parallel.html#setting-the-parallelism

/Currently we are using parallelism = 1/
I guess you set the job-level parallelism.

I would suggest replacing the Executors with Flink's parallelism. It would
be more efficient, because you wouldn't create a second thread pool on top
of the one Flink already provides (I may not be describing this concept
exactly right).
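
For example, a minimal sketch reusing the Mapper3 name from above: keep the
function a plain synchronous FlatMapFunction and let Flink provide the
concurrency:

// Instead of an ExecutorService with 100 threads inside one operator
// instance, run 100 parallel subtasks of a plain, synchronous function.
stream.flatMap(new Mapper3())
      .setParallelism(100);

If the operators really do asynchronous I/O (e.g. external lookups), Flink's
AsyncDataStream / AsyncFunction API may also be worth a look.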

Cheers,

Sendoh







Re: Apache Flink Examples

2018-04-27 Thread Hung
In my case I usually check the tests written for each function I want to
use.

Take CountTrigger as an example: if I want to customize my own way of
counting, I will have a look at the test they wrote:

https://github.com/apache/flink/blob/8dfb9d00653271ea4adbeb752da8f62d7647b6d8/flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/CountTriggerTest.java

Then I understand how the function is expected to work, and then I write my
own test with my expected result.

Tests are the best documentation, I would say.
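
For example, a minimal sketch of a custom count trigger modeled on
CountTrigger; the class name and threshold handling are mine, and I haven't
run this exact code:

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;

public class MyCountTrigger<W extends Window> extends Trigger<Object, W> {

    private final long maxCount;

    // Per-key, per-window count kept in Flink state, as CountTrigger does.
    private final ReducingStateDescriptor<Long> countDesc =
        new ReducingStateDescriptor<>("count",
            (ReduceFunction<Long>) (a, b) -> a + b, LongSerializer.INSTANCE);

    public MyCountTrigger(long maxCount) {
        this.maxCount = maxCount;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, W window,
                                   TriggerContext ctx) throws Exception {
        ReducingState<Long> count = ctx.getPartitionedState(countDesc);
        count.add(1L);
        if (count.get() >= maxCount) {
            count.clear();
            return TriggerResult.FIRE;  // customize the firing condition here
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window,
                                          TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(W window, TriggerContext ctx) throws Exception {
        ctx.getPartitionedState(countDesc).clear();
    }
}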

Also, there is an examples folder on GitHub:
https://github.com/apache/flink/tree/master/flink-examples

Best,

Sendoh





Re: End-to-end exactly once from kafka source to S3 sink

2018-01-31 Thread Hung
"Flink will only commit the kafka offsets when the data has been saved to S3"
-> No. You can check the BucketingSink code; that behavior would mean
BucketingSink depends on Kafka, which is not reasonable.

Flink stores checkpoints on the disk of each worker, not in Kafka.
(Kafka Streams, the other streaming API provided by Kafka, stores its
checkpoints back in Kafka.)

So, bucket size doesn't affect the commit frequency.
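
For reference, a minimal sketch of the checkpoint configuration that drives
the offset commits; the interval and checkpoint path below are assumptions:

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();

// Kafka offsets are committed as part of Flink's own checkpoints, so the
// checkpoint interval (not the bucket size) controls the commit frequency.
env.enableCheckpointing(60_000);  // checkpoint every 60 seconds

// Checkpoints go to the configured state backend, e.g. a filesystem path.
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));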

Best,

Sendoh





Re: Advice or best practices on adding metadata to stream events

2018-01-31 Thread Hung
So there are three ways:
1. make your model a stream source
2. let the master read the model once, distribute it via the constructor,
and update it periodically
3. let each worker read the model and update it periodically (the one you
mentioned)

option 3 would be problematic if you scale a lot and use a high parallelism,
because there would be too many connections.

option 2 is the best if you don't have to update your model; otherwise you
have to restart your Flink job to get the new model, or implement the update
logic on your own (see the sketch below).

option 1, for me, is the best if you need to update the model, because you
can control how often you read it.
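
A minimal sketch of option 2, assuming the model is small and Serializable;
all names here (Model, Event, EnrichFn) are hypothetical:

import org.apache.flink.api.common.functions.RichMapFunction;

// The client reads the model once; it is then serialized into the function
// object and shipped to every parallel worker.
public class EnrichFn extends RichMapFunction<Event, Event> {

    private final Model model;  // must be Serializable

    public EnrichFn(Model model) {
        this.model = model;
    }

    @Override
    public Event map(Event e) {
        return model.enrich(e);  // hypothetical lookup/scoring call
    }
}

// In main():
// Model model = Model.load("path/to/model");  // read once on the client
// stream.map(new EnrichFn(model));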

Best,

Sendoh



Re: [Window] Per key window

2018-01-31 Thread Hung
After you keyBy(), each key gets its own windows containing only that key's
group of events.

Or is what you want a global window?
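
For example (the field and method names are hypothetical):

// Windows are scoped per key after keyBy(): each userId gets its own
// one-minute tumbling windows over only that key's events.
events
    .keyBy(e -> e.userId)
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .reduce((a, b) -> a.merge(b));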

Best,

Sendoh





Re: How does BucketingSink generate a SUCCESS file when a directory is finished

2018-01-31 Thread Hung
It depends on how you partition your files. In my case I write one file per
hour, so I'm sure that the file is ready after that hour period, in
processing time. Here, ready to read means the file contains all the data
for that hour period.

If the downstream runs in a batch way, you may want to ensure the file is
ready. In this case, ready to read can mean that all the data before the
watermark has arrived. You could take the BucketingSink and implement this
logic there, maybe waiting until the watermark reaches the end of the
bucket's time period.
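
A minimal sketch of the hourly partitioning I describe; the base path is an
assumption:

// One bucket (directory) per processing-time hour; once the hour has
// passed, that bucket stops receiving new records.
BucketingSink<String> sink = new BucketingSink<>("hdfs:///data/events");
sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH"));
stream.addSink(sink);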

Best,

Sendoh





Re: Scaling Flink

2018-01-24 Thread Hung
What about scaling up based on the number of task slots left?
You can obtain this information from Flink's REST endpoint.
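
For example, a rough sketch of reading it from the JobManager's /overview
endpoint (host and port are assumptions):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Poll the REST API from inside a method that may throw IOException; the
// response JSON contains "slots-total" and "slots-available".
URL url = new URL("http://jobmanager:8081/overview");
try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream()))) {
    System.out.println(in.readLine());  // parse with your JSON library
}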

Cheers,

Sendoh





Is it possible to restart only the function that fails instead of entire job?

2016-07-01 Thread Chia-Hung Lin
After reading the documentation and configuring the failure strategy to test
it, it seems to me that Flink restarts the whole job once any failure (e.g.
a thrown exception) occurs.

https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html

My question:

Is it possible to configure Flink so that only the failing function
recovers, instead of restarting the entire job (like Erlang's One For One
supervision)? For instance, within a job the parallelism is configured
to 100, so at runtime 100 map instances are executed. Now one of the map
functions fails, and we want to recover only that failed map function,
because the other map functions are working normally. Is it possible to
achieve such an effect?

Thanks