Set Parallelism and keyBy

2016-12-26 Thread Dominik Bruhn

Hey,
I have a Flink job whose default parallelism is set to 2. I want to key the 
stream and then apply a flatMap on the keyed stream. The flatMap operation is 
quite costly, so I want a much higher parallelism here (let's say 16). 
Additionally, it is important that the flatMap operation for a given key is 
always executed in the same process or the same task.


I have the following code:


env.setParallelism(2)
val input: DataStream[Event] = /* from somewhere */
input.keyBy(_.eventType).flatMap(new ExpensiveOperation()).print()


This works fine, and the "ExpensiveOperation" is always executed on the same 
task for the same key.
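
(For context, "ExpensiveOperation" is roughly of the following shape; the body 
below is just a placeholder sketch, not my real implementation.)

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.util.Collector

// Hypothetical stand-in for the costly per-key work; output type and logic are placeholders.
class ExpensiveOperation extends RichFlatMapFunction[Event, String] {
  override def flatMap(event: Event, out: Collector[String]): Unit = {
    // placeholder for the expensive computation performed per event of a key
    out.collect(s"processed ${event.eventType}")
  }
}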


Now I tried two things:

1.

env.setParallelism(2)
val input: DataStream[Event] = /* from somewhere */
input.keyBy(_.eventType).setParallelism(16).flatMap(new ExpensiveOperation()).print()


This fails with an exception because I can't set the parallelism on the 
keyBy operator.


2.
env.setParallelism(2)
val input: DataStream[Event] = /* from somewhere */
input.keyBy(_.eventType).flatMap(new ExpensiveOperation()).setParallelism(16).print()

While this executes, it breaks the assignment of keys to tasks: the 
"ExpensiveOperation" is no longer always executed on the same node for the 
same key (visible from the prefixes in the print() output).


What am I doing wrong? Is my only option to set the parallelism of the whole 
Flink job to 16?


Thanks, have nice holidays,
Dominik


RE: [E] writeBufferLowWaterMark cannot be greater than writeBufferHighWaterMark error

2016-12-26 Thread kanagaraj.vengidasamy

Hi,

I am getting the "writeBufferLowWaterMark cannot be greater than 
writeBufferHighWaterMark" error frequently, and those task managers stop 
processing messages after that error occurs.
What could be wrong in my configuration? What do I need to do to avoid this 
error?

We have 8x32 VMs - 8 machines (running 35 task managers, each with 8 slots):
taskmanager.heap.mb: 4096
taskmanager.numberOfTaskSlots: 8
taskmanager.network.numberOfBuffers: 22000
taskmanager.memory.segment-size: 131072


2016-12-26 14:26:06,548 WARN  io.netty.bootstrap.ServerBootstrap  - Failed to set a channel option: [id: 0xf1ef59e6, /138.83.31.4:60812 => /138.83.31.9:41304]
java.lang.IllegalArgumentException: writeBufferLowWaterMark cannot be greater than writeBufferHighWaterMark (65536): 131073
    at io.netty.channel.DefaultChannelConfig.setWriteBufferLowWaterMark(DefaultChannelConfig.java:334)
    at io.netty.channel.socket.DefaultSocketChannelConfig.setWriteBufferLowWaterMark(DefaultSocketChannelConfig.java:332)
    at io.netty.channel.socket.DefaultSocketChannelConfig.setWriteBufferLowWaterMark(DefaultSocketChannelConfig.java:35)
    at io.netty.channel.DefaultChannelConfig.setOption(DefaultChannelConfig.java:183)
    at io.netty.channel.socket.DefaultSocketChannelConfig.setOption(DefaultSocketChannelConfig.java:121)
    at io.netty.bootstrap.ServerBootstrap$ServerBootstrapAcceptor.channelRead(ServerBootstrap.java:238)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
    at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:93)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
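
For reference, here is a minimal standalone sketch (my own assumption about 
what triggers the check, not actual Flink code): the exception shows a 
write-buffer high watermark of 65536 bytes, which is Netty's default, while 
the rejected low watermark of 131073 bytes is exactly our configured 
taskmanager.memory.segment-size (131072) plus one.

import io.netty.channel.socket.nio.NioSocketChannel

// Hypothetical standalone reproduction of the check seen in the stack trace above.
object WatermarkCheckSketch extends App {
  val channel = new NioSocketChannel()
  // Netty's default high watermark is 65536 bytes, so setting the low watermark
  // to 131072 + 1 throws: "writeBufferLowWaterMark cannot be greater than
  // writeBufferHighWaterMark (65536): 131073"
  channel.config().setWriteBufferLowWaterMark(131072 + 1)
}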


Thanks
Kanagaraj Vengidasamy
RTCI

7701 E Telecom PKWY
Temple Terrace, FL 33637

O 813.978.4372 | M 813.455.9757





Is there some way to use broadcastSet in streaming ?

2016-12-26 Thread 刘喆
Hi everyone,
Is there some way to use broadcastSet in streaming? It seems that broadcastSet 
is only usable in batch.
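
For context, this is the kind of batch-only usage I mean (DataSet API; the 
names below such as "lookup" are just placeholders):

import org.apache.flink.api.scala._
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import scala.collection.JavaConverters._

object BroadcastSetBatchSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val lookup: DataSet[String] = env.fromElements("a", "b", "c")
    val data: DataSet[Int]      = env.fromElements(1, 2, 3)

    data
      .map(new RichMapFunction[Int, String] {
        private var entries: Seq[String] = _

        override def open(parameters: Configuration): Unit = {
          // broadcast variables are read through the runtime context in the DataSet API
          entries = getRuntimeContext.getBroadcastVariable[String]("lookup").asScala.toSeq
        }

        override def map(value: Int): String = s"$value -> ${entries.mkString(",")}"
      })
      .withBroadcastSet(lookup, "lookup")
      .print()
  }
}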


Re: Dynamic Scaling

2016-12-26 Thread Govindarajan Srinivasaraghavan
Hi All,

It would be great if someone could help me with my questions. I appreciate all 
the help.

Thanks.

> On Dec 23, 2016, at 12:11 PM, Govindarajan Srinivasaraghavan wrote:
> 
> Hi,
> 
> We have a computation-heavy streaming Flink job which will be processing 
> around 100 million messages at peak time and around 1 million messages in 
> non-peak time. We need the capability to dynamically scale so that the 
> computation operator can scale up and down during high or low workloads 
> respectively, without restarting the job, in order to lower machine costs.
> 
> Is there an ETA on when the feature for rescaling a single operator without 
> a restart will be released?
> 
> Is it possible to auto-scale one of the operators with Docker Swarm or Amazon 
> ECS auto scaling based on Kafka consumer lag or CPU consumption? If so, can I 
> get some documentation or steps to achieve this behaviour?
> 
> Also, is there any document on what the tasks of a JobManager are, apart from 
> scheduling and reporting status?
> 
> Since there is just one JobManager, we wanted to check whether there would be 
> any potential scaling limitations as processing capacity increases.
> 
> Thanks
> Govind
>