Re: Keep Model in Operator instance up to date

2015-08-19 Thread Welly Tambunan
Hi Gyula,

I have another question. So if I cache something on the operator, to keep
it up to date, will I always need to add and connect another stream of
changes to the operator?

Is this right for every case?

Cheers

On Wed, Aug 19, 2015 at 3:21 PM, Welly Tambunan  wrote:

> Hi Gyula,
>
> That's really helpful. The docs have improved so much since the last time
> (0.9).
>
> Thanks a lot !
>
> Cheers
>
> On Wed, Aug 19, 2015 at 3:07 PM, Gyula Fóra  wrote:
>
>> Hey,
>>
>> If it is always better to check the events against a more up-to-date
>> model (even if the events we are checking arrived before the update) then
>> it is fine to keep the model outside of the system.
>>
>> In this case we need to make sure that we can push the updates to the
>> external system consistently. If you are using the PersistentKafkaSource
>> for instance it can happen that some messages are replayed in case of
>> failure. In this case you need to make sure that you remove duplicate
>> updates or have idempotent updates.
>>
>> You can read about the checkpoint mechanism in the Flink website:
>> https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
>>
>> Cheers,
>> Gyula
>>
>> On Wed, Aug 19, 2015 at 9:56 AM Welly Tambunan  wrote:
>>
>>> Thanks Gyula,
>>>
>>> Another question I have:
>>>
>>> > ... while external model updates would be *tricky* to keep consistent.
>>> Is that still the case if the operator treats the external model as
>>> read-only? We create another stream that will update the external model
>>> separately.
>>>
>>> Could you please elaborate more on this one?
>>>
>>> Cheers
>>>
>>> On Wed, Aug 19, 2015 at 2:52 PM, Gyula Fóra 
>>> wrote:
>>>
 In that case I would apply a map to wrap them in some common type, like
 an Either, before the union.

 And then in the coflatmap you can unwrap it.
 On Wed, Aug 19, 2015 at 9:50 AM Welly Tambunan 
 wrote:

> Hi Gyula,
>
> Thanks.
>
> However, updates1 and updates2 have different types. Based on my
> understanding, I don't think we can use union. How can we handle this one?
>
> We like to create our events strongly typed to get the domain language
> captured.
>
>
> Cheers
>
> On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra 
> wrote:
>
>> Hey,
>>
>> One input of your co-flatmap would be model updates and the other
>> input would be events to check against the model if I understand 
>> correctly.
>>
>> This means that if your model updates come from more than one stream
>> you need to union them into a single stream before connecting them with the
>> event stream and applying the coflatmap.
>>
>> DataStream updates1 = 
>> DataStream updates2 = 
>> DataStream events = 
>>
>> events.connect(updates1.union(updates2).broadcast()).flatMap(...)
>>
>> Does this answer your question?
>>
>> Gyula
>>
>>
>> On Wednesday, August 19, 2015, Welly Tambunan 
>> wrote:
>>
>>> Hi Gyula,
>>>
>>> Thanks for your response.
>>>
>>> However, the model can receive multiple events for updates. How can we
>>> do that with a co-flatmap, as I can see the connect API only receives a
>>> single datastream?
>>>
>>>
>>> > ... while external model updates would be tricky to keep
>>> consistent.
>>> Is that still the case if the operator treats the external model as
>>> read-only? We create another stream that will update the external model
>>> separately.
>>>
>>> Cheers
>>>
>>> On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra 
>>> wrote:
>>>
 Hey!

 I think it is safe to say that the best approach in this case is
 creating a co-flatmap that will receive updates on one input. The 
 events
 should probably be broadcasted in this case so you can check in 
 parallel.

 This approach can be used effectively with Flink's checkpoint
 mechanism, while external model updates would be tricky to keep 
 consistent.

 Cheers,
 Gyula




 On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan 
 wrote:

> Hi All,
>
> We have a streaming computation that is required to validate the data
> stream against the model provided by the user.
>
> Right now what I have done is to load the model into a Flink operator
> and then validate against it. However the model can be updated and
> changed frequently. Fortunately we always publish this event to RabbitMQ.
>
> I think we can
>
>
>1. Create a RabbitMQ listener for model-changed events from
>inside the operator, then update the model if an event arrives.
>
>  

Re: when use broadcast variable and run on bigdata display this error please help

2015-08-19 Thread hagersaleh
please help





when use broadcast variable and run on bigdata display this error please help

2015-08-19 Thread hagersaleh
When I run this program on big data it displays this error, but when I run
it on small data there is no error. Why?


ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Customer> customer = getCustomerDataSet(env, mask, l, map);
DataSet<Orders> order = getOrdersDataSet(env, maskorder, l1, maporder);

customer.filter(new RichFilterFunction<Customer>() {
    private Collection<Orders> order1;

    @Override
    public void open(Configuration parameters) throws Exception {
        order1 = getRuntimeContext().getBroadcastVariable("order");
    }

    @Override
    public boolean filter(Customer c) throws Exception {
        for (Orders o : order1) {
            // System.out.println("c.f0=" + c.f0 + " o.f0=" + o.f0 + "   " + c.f0.equals(o.f0));
            if (((c.f0.equals(o.f1)) && (c.f1.equals("AUTOMOBILE"))) &&
                    ((o.f2.equals("O")) || (o.f0 == 7)))
                return true;
        }
        return false;
    }
}).withBroadcastSet(order, "order")
  .writeAsCsv("/home/hadoop/Desktop/Dataset/complex_query_optimization.csv",
      "\n", "|", WriteMode.OVERWRITE);

env.execute();
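
The broadcast variable materializes the entire "order" DataSet on the JVM heap of every task, which is what fails on the larger input (see the OutOfMemoryError below). A sketch of a join-based alternative that Flink can execute out-of-core — the Customer/Orders types and the field positions are assumed from the filter logic above:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.JoinFunction;

DataSet<Orders> relevantOrders = order.filter(new FilterFunction<Orders>() {
    @Override
    public boolean filter(Orders o) {
        // same order-side predicate as in the broadcast version
        return o.f2.equals("O") || o.f0 == 7;
    }
});

DataSet<Customer> result = customer
    .filter(new FilterFunction<Customer>() {
        @Override
        public boolean filter(Customer c) {
            return c.f1.equals("AUTOMOBILE");
        }
    })
    // join customer key c.f0 against order field o.f1; Flink joins can
    // spill to disk instead of holding the whole order set in memory
    .join(relevantOrders).where(0).equalTo(1)
    .with(new JoinFunction<Customer, Orders, Customer>() {
        @Override
        public Customer join(Customer c, Orders o) {
            return c;
        }
    })
    // a customer may match several orders; keep each customer once
    .distinct(0);

Alternatively, increasing taskmanager.heap.mb in flink-conf.yaml gives the broadcast variable more room, but the join scales to inputs that do not fit in memory at all.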

Error
08/19/2015 07:49:23Job execution switched to status RUNNING.
08/19/2015 07:49:23DataSource (at getOrdersDataSet(TPCHQuery3.java:319)
(org.apache.flink.api.java.io.CsvInputFormat))(1/1) switched to SCHEDULED
08/19/2015 07:49:23DataSource (at getOrdersDataSet(TPCHQuery3.java:319)
(org.apache.flink.api.java.io.CsvInputFormat))(1/1) switched to DEPLOYING
08/19/2015 07:49:23DataSource (at
getCustomerDataSet(TPCHQuery3.java:282)
(org.apache.flink.api.java.io.CsvInputFormat))(1/1) switched to SCHEDULED
08/19/2015 07:49:23DataSource (at
getCustomerDataSet(TPCHQuery3.java:282)
(org.apache.flink.api.java.io.CsvInputFormat))(1/1) switched to DEPLOYING
08/19/2015 07:49:23DataSource (at
getCustomerDataSet(TPCHQuery3.java:282)
(org.apache.flink.api.java.io.CsvInputFormat))(1/1) switched to RUNNING
08/19/2015 07:49:23DataSource (at getOrdersDataSet(TPCHQuery3.java:319)
(org.apache.flink.api.java.io.CsvInputFormat))(1/1) switched to RUNNING
08/19/2015 07:49:23Filter (Filter at main(TPCHQuery3.java:240))(1/1)
switched to SCHEDULED
08/19/2015 07:49:23Filter (Filter at main(TPCHQuery3.java:240))(1/1)
switched to DEPLOYING
08/19/2015 07:49:23Filter (Filter at main(TPCHQuery3.java:240))(1/1)
switched to RUNNING
08/19/2015 07:50:04Filter (Filter at main(TPCHQuery3.java:240))(1/1)
switched to FAILED
java.io.IOException: Materialization of the broadcast variable failed.
at
org.apache.flink.runtime.broadcast.BroadcastVariableMaterialization.materializeVariable(BroadcastVariableMaterialization.java:154)
at
org.apache.flink.runtime.broadcast.BroadcastVariableManager.materializeBroadcastVariable(BroadcastVariableManager.java:50)
at
org.apache.flink.runtime.operators.RegularPactTask.readAndSetBroadcastInput(RegularPactTask.java:439)
at
org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:358)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Long.valueOf(Long.java:577)
at
org.apache.flink.api.common.typeutils.base.LongSerializer.deserialize(LongSerializer.java:68)
at
org.apache.flink.api.common.typeutils.base.LongSerializer.deserialize(LongSerializer.java:27)
at
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:127)
at
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
at
org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
at
org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:110)
at
org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:64)
at
org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
at
org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
at
org.apache.flink.runtime.broadcast.BroadcastVariableMaterialization.materializeVariable(BroadcastVariableMaterialization.java:115)
... 5 more

08/19/2015 07:50:04Job execution switched to status FAILING.
08/19/2015 07:50:04DataSource (at
getCustomerDataSet(TPCHQuery3.java:282)
(org.apache.flink.api.java.io.CsvInputFormat))(1/1) switched to CANCELING
08/19/2015 07:50:04DataSource (at getOrdersDataSet(TPCHQuery3.java:319)
(org.apache.flink.api.java.io.CsvInputFormat))(1/1) switched to CANCELING
08/19/2015 07:50:04DataSink (CsvOutputFormat (path:
/home/hadoop/Desktop/Dataset/complex_query_optimization.csv, delimiter:
|))(1/1) switched to CANCELED
08/19/2015 07:50:04DataSource (at getOrdersDataSet(TPCHQuery3.java:319)
(org.apache.flink.api.java.io.CsvInputFormat))(1/1) switched to CANCELED
08/19/2015 07:50:04DataSource (at
getCustomerDataSet(TPCHQuery3.java:

what max jobmanger heab size and taskmanger heap size

2015-08-19 Thread hagersaleh
What is the maximum JobManager heap size and TaskManager heap size?
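
There is no fixed maximum built into Flink; both values are bounded by the memory of the machine (or of the YARN container). They are set in conf/flink-conf.yaml — a sketch with example values only, using this generation's configuration keys:

jobmanager.heap.mb: 1024     # JVM heap of the JobManager, in megabytes
taskmanager.heap.mb: 4096    # JVM heap of each TaskManager, in megabytes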





Re: Flink test environment

2015-08-19 Thread Hermann Azong

Hey Andreas,

thank you very much!

Cheers,
Hermann

On 19.08.2015 at 15:19, Andreas Fritzler wrote:

Hi Hermann,

there is a docker-compose setup for Flink:
https://github.com/apache/flink/tree/master/flink-contrib/docker-flink

Regards,
Andreas

On Wed, Aug 19, 2015 at 3:11 PM, Hermann Azong wrote:


Hey Flinkers,

for testing purposes on a cluster, I would like to know if there is
a virtual machine where Flink already works standalone or on YARN.
Thank you in advance for your answers!

Cheers,
Hermann






Re: Flink test environment

2015-08-19 Thread Hermann Azong

Hey Chiwan Park,

thank you, I will take a look!

Cheers,
Hermann

On 19.08.2015 at 15:23, Chiwan Park wrote:

Hi Hermann,

On page 16 of Slim’s slides [1], there is a pre-installed virtual machine based
on VMware. I haven’t run Flink on that machine, but maybe it works.

Regards,
Chiwan Park

[1] 
http://www.slideshare.net/sbaltagi/apache-flinkcrashcoursebyslimbaltagiandsrinipalthepu


On Aug 19, 2015, at 10:11 PM, Hermann Azong  wrote:

Hey Flinkers,

for testing purposes on a cluster, I would like to know if there is a virtual
machine where Flink already works standalone or on YARN.
Thank you in advance for your answers!

Cheers,
Hermann







Re: Flink test environment

2015-08-19 Thread Chiwan Park
Hi Hermann,

On page 16 of Slim’s slides [1], there is a pre-installed virtual machine based
on VMware. I haven’t run Flink on that machine, but maybe it works.

Regards,
Chiwan Park

[1] 
http://www.slideshare.net/sbaltagi/apache-flinkcrashcoursebyslimbaltagiandsrinipalthepu

> On Aug 19, 2015, at 10:11 PM, Hermann Azong  wrote:
> 
> Hey Flinkers,
> 
> for testing purposes on a cluster, I would like to know if there is a virtual
> machine where Flink already works standalone or on YARN.
> Thank you in advance for your answers!
> 
> Cheers,
> Hermann





Re: Flink test environment

2015-08-19 Thread Andreas Fritzler
Hi Hermann,

there is a docker-compose setup for Flink:
https://github.com/apache/flink/tree/master/flink-contrib/docker-flink

Regards,
Andreas

On Wed, Aug 19, 2015 at 3:11 PM, Hermann Azong 
wrote:

> Hey Flinkers,
>
> for testing purposes on a cluster, I would like to know if there is a
> virtual machine where Flink already works standalone or on YARN.
> Thank you in advance for your answers!
>
> Cheers,
> Hermann
>


Flink test environment

2015-08-19 Thread Hermann Azong

Hey Flinkers,

for testing purposes on a cluster, I would like to know if there is a
virtual machine where Flink already works standalone or on YARN.

Thank you in advance for your answers!

Cheers,
Hermann


Re: Keep Model in Operator instance up to date

2015-08-19 Thread Welly Tambunan
Hi Gyula,

That's really helpful. The docs have improved so much since the last time
(0.9).

Thanks a lot !

Cheers

On Wed, Aug 19, 2015 at 3:07 PM, Gyula Fóra  wrote:

> Hey,
>
> If it is always better to check the events against a more up-to-date model
> (even if the events we are checking arrived before the update) then it is
> fine to keep the model outside of the system.
>
> In this case we need to make sure that we can push the updates to the
> external system consistently. If you are using the PersistentKafkaSource
> for instance it can happen that some messages are replayed in case of
> failure. In this case you need to make sure that you remove duplicate
> updates or have idempotent updates.
>
> You can read about the checkpoint mechanism in the Flink website:
> https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
>
> Cheers,
> Gyula
>
> On Wed, Aug 19, 2015 at 9:56 AM Welly Tambunan  wrote:
>
>> Thanks Gyula,
>>
>> Another question I have:
>>
>> > ... while external model updates would be *tricky* to keep consistent.
>> Is that still the case if the operator treats the external model as
>> read-only? We create another stream that will update the external model
>> separately.
>>
>> Could you please elaborate more on this one?
>>
>> Cheers
>>
>> On Wed, Aug 19, 2015 at 2:52 PM, Gyula Fóra  wrote:
>>
>>> In that case I would apply a map to wrap them in some common type, like
>>> an Either, before the union.
>>>
>>> And then in the coflatmap you can unwrap it.
>>> On Wed, Aug 19, 2015 at 9:50 AM Welly Tambunan 
>>> wrote:
>>>
 Hi Gyula,

 Thanks.

 However, updates1 and updates2 have different types. Based on my
 understanding, I don't think we can use union. How can we handle this one?

 We like to create our events strongly typed to get the domain language
 captured.


 Cheers

 On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra 
 wrote:

> Hey,
>
> One input of your co-flatmap would be model updates and the other
> input would be events to check against the model if I understand 
> correctly.
>
> This means that if your model updates come from more than one stream
> you need to union them into a single stream before connecting them with the
> event stream and applying the coflatmap.
>
> DataStream updates1 = 
> DataStream updates2 = 
> DataStream events = 
>
> events.connect(updates1.union(updates2).broadcast()).flatMap(...)
>
> Does this answer your question?
>
> Gyula
>
>
> On Wednesday, August 19, 2015, Welly Tambunan 
> wrote:
>
>> Hi Gyula,
>>
>> Thanks for your response.
>>
>> However, the model can receive multiple events for updates. How can we
>> do that with a co-flatmap, as I can see the connect API only receives a
>> single datastream?
>>
>>
>> > ... while external model updates would be tricky to keep
>> consistent.
>> Is that still the case if the operator treats the external model as
>> read-only? We create another stream that will update the external model
>> separately.
>>
>> Cheers
>>
>> On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra 
>> wrote:
>>
>>> Hey!
>>>
>>> I think it is safe to say that the best approach in this case is
>>> creating a co-flatmap that will receive updates on one input. The events
>>> should probably be broadcasted in this case so you can check in 
>>> parallel.
>>>
>>> This approach can be used effectively with Flink's checkpoint
>>> mechanism, while external model updates would be tricky to keep 
>>> consistent.
>>>
>>> Cheers,
>>> Gyula
>>>
>>>
>>>
>>>
>>> On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan 
>>> wrote:
>>>
 Hi All,

 We have a streaming computation that is required to validate the data
 stream against the model provided by the user.

 Right now what I have done is to load the model into a Flink operator
 and then validate against it. However the model can be updated and
 changed frequently. Fortunately we always publish this event to RabbitMQ.

 I think we can


    1. Create a RabbitMQ listener for model-changed events from inside
    the operator, then update the model if an event arrives.

    But I think this will create a race condition if not handled
    correctly, and it seems odd to keep this

    2. We can move the model into an external in-memory cache and keep
    the model up to date using Flink. So the operator will retrieve it
    from the cache

    3. Create two streams and use a co-operator to manage the shared
    state.


 What is your s

Re: Custom Class for state checkpointing

2015-08-19 Thread Rico Bergmann
Hi. 

Thanks for the tip. It seems to work...

Greets. 



> On 18.08.2015 at 13:56, Stephan Ewen wrote:
> 
> Yep, that is a valid bug!
> State is apparently not resolved with the correct classloader.
> 
> As a workaround, you can checkpoint byte arrays and serialize/deserialize the
> state into byte arrays yourself. You can use the Apache Commons
> SerializationUtils class, or Flink's InstantiationUtil class for that.
> 
> You can get the ClassLoader for the user code (needed for deserialization) 
> via "getRuntimeContext().getUserCodeClassLoader()".
> 
> Let us know if that workaround works. We'll try to get a fix for that out 
> very soon!
> 
> Greetings,
> Stephan
> 
> 
> 
>> On Tue, Aug 18, 2015 at 12:23 PM, Robert Metzger  wrote:
>> Java's HashMap is serializable.
>> If it is only the map, you can just use the HashMap<> as the state.
>> 
>> If you have more data, you can use TupleX, for example:
>> 
>> Tuple2<HashMap<K, V>, Long>(myMap, myLong);
>> 
>> 
>>> On Tue, Aug 18, 2015 at 12:21 PM, Rico Bergmann  
>>> wrote:
>>> Hi!
>>> 
>>> Using TupleX is not possible since the state is very big (a Hashtable). 
>>> 
>>> How would I have to do serialization into a byte array?
>>> 
>>> Greets. Rico. 
>>> 
>>> 
>>> 
 On 18.08.2015 at 11:44, Robert Metzger wrote:
 
 Hi Rico,
 
 I'm pretty sure that this is a valid bug you've found, since this case is 
 not yet tested (afaik).
 We'll fix the issue asap, until then, are you able to encapsulate your 
 state in something that is available in Flink, for example a TupleX or 
 just serialize it yourself into a byte[] ?
 
> On Tue, Aug 18, 2015 at 11:32 AM, Rico Bergmann  
> wrote:
> Hi!
> Is it possible to use your own class?
> I'm using the file state handler at the Jobmanager and implemented the 
> Checkpointed interface. 
> 
> I tried this and got an exception:
> 
> Error: java.lang.RuntimeException: Failed to deserialize state handle and 
> setup initial operator state.
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> com.ottogroup.bi.searchlab.searchsessionizer.OperatorState
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:348)
>> at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:626)
>> at 
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>> at 
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>> at 
>> org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:63)
>> at 
>> org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:33)
>> at 
>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreInitialState(AbstractUdfStreamOperator.java:83)
>> at 
>> org.apache.flink.streaming.runtime.tasks.StreamTask.setInitialState(StreamTask.java:276)
>> at 
>> org.apache.flink.runtime.state.StateUtils.setOperatorState(StateUtils.java:51)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:541)
> 
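
A minimal sketch of the byte-array workaround Stephan describes, assuming a user state class MyOperatorState (a placeholder name) inside a streaming operator; the InstantiationUtil calls are the Flink utilities mentioned above:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.util.InstantiationUtil;

public class MyOperator extends RichMapFunction<String, String>
        implements Checkpointed<byte[]> {

    // the real state stays a custom class, but is checkpointed as byte[]
    private transient MyOperatorState state;

    @Override
    public void open(Configuration parameters) {
        if (state == null) {
            state = new MyOperatorState();
        }
    }

    @Override
    public byte[] snapshotState(long checkpointId, long checkpointTimestamp)
            throws Exception {
        // a byte[] can be deserialized without the user-code classloader
        return InstantiationUtil.serializeObject(state);
    }

    @Override
    public void restoreState(byte[] snapshot) {
        try {
            // resolve the user class with the user-code classloader
            state = (MyOperatorState) InstantiationUtil.deserializeObject(
                    snapshot, getRuntimeContext().getUserCodeClassLoader());
        } catch (Exception e) {
            throw new RuntimeException("Could not restore operator state", e);
        }
    }

    @Override
    public String map(String value) {
        // update and use `state` here (business logic elided)
        return value;
    }
}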


Re: Keep Model in Operator instance up to date

2015-08-19 Thread Gyula Fóra
Hey,

If it is always better to check the events against a more up-to-date model
(even if the events we are checking arrived before the update) then it is
fine to keep the model outside of the system.

In this case we need to make sure that we can push the updates to the
external system consistently. If you are using the PersistentKafkaSource
for instance it can happen that some messages are replayed in case of
failure. In this case you need to make sure that you remove duplicate
updates or have idempotent updates.

You can read about the checkpoint mechanism in the Flink website:
https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html

Cheers,
Gyula
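
To make the deduplication point concrete, a sketch of dropping replayed updates before they are pushed to the external store — ModelUpdate and its monotonically increasing getId() are placeholders for the user's update event:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

public class DropReplayedUpdates
        implements FlatMapFunction<ModelUpdate, ModelUpdate> {

    // highest update id forwarded so far by this parallel instance
    private long lastAppliedId = Long.MIN_VALUE;

    @Override
    public void flatMap(ModelUpdate update, Collector<ModelUpdate> out) {
        // replayed Kafka messages re-deliver ids we have already seen
        if (update.getId() > lastAppliedId) {
            lastAppliedId = update.getId();
            out.collect(update);
        }
    }
}

If the updates cannot carry an id, making the external writes idempotent (e.g. upserts keyed by model entry) achieves the same effect.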
On Wed, Aug 19, 2015 at 9:56 AM Welly Tambunan  wrote:

> Thanks Gyula,
>
> Another question I have:
>
> > ... while external model updates would be *tricky* to keep consistent.
> Is that still the case if the operator treats the external model as
> read-only? We create another stream that will update the external model
> separately.
>
> Could you please elaborate more on this one?
>
> Cheers
>
> On Wed, Aug 19, 2015 at 2:52 PM, Gyula Fóra  wrote:
>
>> In that case I would apply a map to wrap them in some common type, like
>> an Either, before the union.
>>
>> And then in the coflatmap you can unwrap it.
>> On Wed, Aug 19, 2015 at 9:50 AM Welly Tambunan  wrote:
>>
>>> Hi Gyula,
>>>
>>> Thanks.
>>>
>>> However, updates1 and updates2 have different types. Based on my
>>> understanding, I don't think we can use union. How can we handle this one?
>>>
>>> We like to create our events strongly typed to get the domain language
>>> captured.
>>>
>>>
>>> Cheers
>>>
>>> On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra 
>>> wrote:
>>>
 Hey,

 One input of your co-flatmap would be model updates and the other input
 would be events to check against the model if I understand correctly.

 This means that if your model updates come from more than one stream
 you need to union them into a single stream before connecting them with the
 event stream and applying the coflatmap.

 DataStream updates1 = 
 DataStream updates2 = 
 DataStream events = 

 events.connect(updates1.union(updates2).broadcast()).flatMap(...)

 Does this answer your question?

 Gyula


 On Wednesday, August 19, 2015, Welly Tambunan 
 wrote:

> Hi Gyula,
>
> Thanks for your response.
>
> However, the model can receive multiple events for updates. How can we
> do that with a co-flatmap, as I can see the connect API only receives a
> single datastream?
>
>
> > ... while external model updates would be tricky to keep consistent.
>
> Is that still the case if the operator treats the external model as
> read-only? We create another stream that will update the external model
> separately.
>
> Cheers
>
> On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra  wrote:
>
>> Hey!
>>
>> I think it is safe to say that the best approach in this case is
>> creating a co-flatmap that will receive updates on one input. The events
>> should probably be broadcasted in this case so you can check in parallel.
>>
>> This approach can be used effectively with Flink's checkpoint
>> mechanism, while external model updates would be tricky to keep 
>> consistent.
>>
>> Cheers,
>> Gyula
>>
>>
>>
>>
>> On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan 
>> wrote:
>>
>>> Hi All,
>>>
>>> We have a streaming computation that is required to validate the data
>>> stream against the model provided by the user.
>>>
>>> Right now what I have done is to load the model into a Flink operator
>>> and then validate against it. However the model can be updated and
>>> changed frequently. Fortunately we always publish this event to RabbitMQ.
>>>
>>> I think we can
>>>
>>>
>>>1. Create a RabbitMQ listener for model-changed events from inside
>>>the operator, then update the model if an event arrives.
>>>
>>>But I think this will create a race condition if not handled
>>>correctly, and it seems odd to keep this
>>>
>>>2. We can move the model into an external in-memory cache and keep
>>>the model up to date using Flink. So the operator will retrieve it
>>>from the cache
>>>
>>>3. Create two streams and use a co-operator to manage the shared
>>>state.
>>>
>>>
>>> What is your suggestion on keeping the state up to date from
>>> external events? Is there some kind of best practice for maintaining
>>> an up-to-date model in a streaming operator?
>>>
>>> Thanks a lot
>>>
>>>
>>> Cheers
>>>
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> h

Re: Keep Model in Operator instance up to date

2015-08-19 Thread Welly Tambunan
Thanks Gyula,

Another question I have:

> ... while external model updates would be *tricky* to keep consistent.
Is that still the case if the operator treats the external model as
read-only? We create another stream that will update the external model
separately.

Could you please elaborate more on this one?

Cheers

On Wed, Aug 19, 2015 at 2:52 PM, Gyula Fóra  wrote:

> In that case I would apply a map to wrap them in some common type, like
> an Either, before the union.
>
> And then in the coflatmap you can unwrap it.
> On Wed, Aug 19, 2015 at 9:50 AM Welly Tambunan  wrote:
>
>> Hi Gyula,
>>
>> Thanks.
>>
>> However, updates1 and updates2 have different types. Based on my
>> understanding, I don't think we can use union. How can we handle this one?
>>
>> We like to create our events strongly typed to get the domain language
>> captured.
>>
>>
>> Cheers
>>
>> On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra  wrote:
>>
>>> Hey,
>>>
>>> One input of your co-flatmap would be model updates and the other input
>>> would be events to check against the model if I understand correctly.
>>>
>>> This means that if your model updates come from more than one stream you
>>> need to union them into a single stream before connecting them with the
>>> event stream and applying the coflatmap.
>>>
>>> DataStream updates1 = 
>>> DataStream updates2 = 
>>> DataStream events = 
>>>
>>> events.connect(updates1.union(updates2).broadcast()).flatMap(...)
>>>
>>> Does this answer your question?
>>>
>>> Gyula
>>>
>>>
>>> On Wednesday, August 19, 2015, Welly Tambunan  wrote:
>>>
 Hi Gyula,

 Thanks for your response.

 However, the model can receive multiple events for updates. How can we do
 that with a co-flatmap, as I can see the connect API only receives a single
 datastream?


 > ... while external model updates would be tricky to keep consistent.
 Is that still the case if the operator treats the external model as
 read-only? We create another stream that will update the external model
 separately.

 Cheers

 On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra  wrote:

> Hey!
>
> I think it is safe to say that the best approach in this case is
> creating a co-flatmap that will receive updates on one input. The events
> should probably be broadcasted in this case so you can check in parallel.
>
> This approach can be used effectively with Flink's checkpoint
> mechanism, while external model updates would be tricky to keep 
> consistent.
>
> Cheers,
> Gyula
>
>
>
>
> On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan 
> wrote:
>
>> Hi All,
>>
>> We have a streaming computation that is required to validate the data
>> stream against the model provided by the user.
>>
>> Right now what I have done is to load the model into a Flink operator
>> and then validate against it. However the model can be updated and
>> changed frequently. Fortunately we always publish this event to RabbitMQ.
>>
>> I think we can
>>
>>
>>1. Create a RabbitMQ listener for model-changed events from inside
>>the operator, then update the model if an event arrives.
>>
>>But I think this will create a race condition if not handled
>>correctly, and it seems odd to keep this
>>
>>2. We can move the model into an external in-memory cache and keep
>>the model up to date using Flink. So the operator will retrieve it
>>from the cache
>>
>>3. Create two streams and use a co-operator to manage the shared
>>state.
>>
>>
>> What is your suggestion on keeping the state up to date from external
>> events? Is there some kind of best practice for maintaining an up-to-date
>> model in a streaming operator?
>>
>> Thanks a lot
>>
>>
>> Cheers
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com 
>>
>


 --
 Welly Tambunan
 Triplelands

 http://weltam.wordpress.com
 http://www.triplelands.com 

>>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com 
>>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com 


Re: Keep Model in Operator instance up to date

2015-08-19 Thread Gyula Fóra
In that case I would apply a map to wrap them in some common type, like
an Either, before the union.

And then in the coflatmap you can unwrap it.
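
A sketch of that wrapping step — since the Java API has no built-in Either type here, a small hand-rolled wrapper is used, and UpdateA, UpdateB, and ModelUpdate are all placeholder names:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;

// hand-rolled "either" that holds exactly one of the two update types
public class ModelUpdate {
    public UpdateA a; // non-null if the update came from the first stream
    public UpdateB b; // non-null if it came from the second stream
}

DataStream<ModelUpdate> wrapped1 = updates1
    .map(new MapFunction<UpdateA, ModelUpdate>() {
        @Override
        public ModelUpdate map(UpdateA value) {
            ModelUpdate u = new ModelUpdate();
            u.a = value;
            return u;
        }
    });

DataStream<ModelUpdate> wrapped2 = updates2
    .map(new MapFunction<UpdateB, ModelUpdate>() {
        @Override
        public ModelUpdate map(UpdateB value) {
            ModelUpdate u = new ModelUpdate();
            u.b = value;
            return u;
        }
    });

// both streams now share one type, so they can be unioned and broadcast
events.connect(wrapped1.union(wrapped2).broadcast()).flatMap(...);

In the coflatmap, flatMap2 then checks which of the two fields is non-null to unwrap the update.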
On Wed, Aug 19, 2015 at 9:50 AM Welly Tambunan  wrote:

> Hi Gyula,
>
> Thanks.
>
> However, updates1 and updates2 have different types. Based on my
> understanding, I don't think we can use union. How can we handle this one?
>
> We like to create our events strongly typed to get the domain language
> captured.
>
>
> Cheers
>
> On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra  wrote:
>
>> Hey,
>>
>> One input of your co-flatmap would be model updates and the other input
>> would be events to check against the model if I understand correctly.
>>
>> This means that if your model updates come from more than one stream you
>> need to union them into a single stream before connecting them with the
>> event stream and applying the coflatmap.
>>
>> DataStream updates1 = 
>> DataStream updates2 = 
>> DataStream events = 
>>
>> events.connect(updates1.union(updates2).broadcast()).flatMap(...)
>>
>> Does this answer your question?
>>
>> Gyula
>>
>>
>> On Wednesday, August 19, 2015, Welly Tambunan  wrote:
>>
>>> Hi Gyula,
>>>
>>> Thanks for your response.
>>>
>>> However, the model can receive multiple events for updates. How can we do
>>> that with a co-flatmap, as I can see the connect API only receives a single
>>> datastream?
>>>
>>>
>>> > ... while external model updates would be tricky to keep consistent.
>>> Is that still the case if the operator treats the external model as
>>> read-only? We create another stream that will update the external model
>>> separately.
>>>
>>> Cheers
>>>
>>> On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra  wrote:
>>>
 Hey!

 I think it is safe to say that the best approach in this case is
 creating a co-flatmap that will receive updates on one input. The events
 should probably be broadcasted in this case so you can check in parallel.

 This approach can be used effectively with Flink's checkpoint
 mechanism, while external model updates would be tricky to keep consistent.

 Cheers,
 Gyula




 On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan 
 wrote:

> Hi All,
>
> We have a streaming computation that is required to validate the data
> stream against the model provided by the user.
>
> Right now what I have done is to load the model into a Flink operator
> and then validate against it. However the model can be updated and changed
> frequently. Fortunately we always publish this event to RabbitMQ.
>
> I think we can
>
>
>1. Create a RabbitMQ listener for model-changed events from inside
>the operator, then update the model if an event arrives.
>
>But I think this will create a race condition if not handled
>correctly, and it seems odd to keep this
>
>2. We can move the model into an external in-memory cache and keep
>the model up to date using Flink. So the operator will retrieve it
>from the cache
>
>3. Create two streams and use a co-operator to manage the shared
>state.
>
>
> What is your suggestion on keeping the state up to date from external
> events? Is there some kind of best practice for maintaining an up-to-date
> model in a streaming operator?
>
> Thanks a lot
>
>
> Cheers
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com 
>

>>>
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com 
>>>
>>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com 
>


Re: Keep Model in Operator instance up to date

2015-08-19 Thread Welly Tambunan
Hi Gyula,

Thanks.

However, updates1 and updates2 have different types. Based on my
understanding, I don't think we can use union. How can we handle this one?

We like to create our events strongly typed to get the domain language
captured.


Cheers

On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra  wrote:

> Hey,
>
> One input of your co-flatmap would be model updates and the other input
> would be events to check against the model if I understand correctly.
>
> This means that if your model updates come from more than one stream you
> need to union them into a single stream before connecting them with the
> event stream and applying the coflatmap.
>
> DataStream updates1 = 
> DataStream updates2 = 
> DataStream events = 
>
> events.connect(updates1.union(updates2).broadcast()).flatMap(...)
>
> Does this answer your question?
>
> Gyula
>
>
> On Wednesday, August 19, 2015, Welly Tambunan  wrote:
>
>> Hi Gyula,
>>
>> Thanks for your response.
>>
>> However, the model can receive multiple events for updates. How can we do
>> that with a co-flatmap, as I can see the connect API only receives a single
>> datastream?
>>
>>
>> > ... while external model updates would be tricky to keep consistent.
>> Is that still the case if the operator treats the external model as
>> read-only? We create another stream that will update the external model
>> separately.
>>
>> Cheers
>>
>> On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra  wrote:
>>
>>> Hey!
>>>
>>> I think it is safe to say that the best approach in this case is
>>> creating a co-flatmap that will receive updates on one input. The events
>>> should probably be broadcasted in this case so you can check in parallel.
>>>
>>> This approach can be used effectively with Flink's checkpoint mechanism,
>>> while external model updates would be tricky to keep consistent.
>>>
>>> Cheers,
>>> Gyula
>>>
>>>
>>>
>>>
>>> On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan 
>>> wrote:
>>>
 Hi All,

 We have a streaming computation that is required to validate the data
 stream against the model provided by the user.

 Right now what I have done is to load the model into a Flink operator and
 then validate against it. However the model can be updated and changed
 frequently. Fortunately we always publish this event to RabbitMQ.

 I think we can


 1. Create a RabbitMQ listener for model-changed events from inside the
 operator, then update the model if an event arrives.

 But I think this will create a race condition if not handled correctly,
 and it seems odd to keep this

 2. We can move the model into an external in-memory cache and keep the
 model up to date using Flink. So the operator will retrieve it from
 the cache

 3. Create two streams and use a co-operator to manage the shared
 state.


 What is your suggestion on keeping the state up to date from external
 events? Is there some kind of best practice for maintaining an up-to-date
 model in a streaming operator?

 Thanks a lot


 Cheers


 --
 Welly Tambunan
 Triplelands

 http://weltam.wordpress.com
 http://www.triplelands.com 

>>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com 
>>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com 


Re: Keep Model in Operator instance up to date

2015-08-19 Thread Gyula Fóra
Hey,

One input of your co-flatmap would be model updates and the other input
would be events to check against the model if I understand correctly.

This means that if your model updates come from more than one stream you
need to union them into a single stream before connecting them with the
event stream and applying the coflatmap.

DataStream updates1 = 
DataStream updates2 = 
DataStream events = 

events.connect(updates1.union(updates2).broadcast()).flatMap(...)

Does this answer your question?

Gyula
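
For reference, a sketch of what the connected coflatmap could look like — Event, ModelUpdate, Alert, and the Model class with its accepts/apply methods are placeholders for the user's own types:

import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class CheckAgainstModel
        implements CoFlatMapFunction<Event, ModelUpdate, Alert> {

    // local copy of the model; every parallel instance receives all
    // updates because the update stream is broadcast
    private Model model = new Model();

    @Override
    public void flatMap1(Event event, Collector<Alert> out) {
        // first input: events are checked against the current model
        if (!model.accepts(event)) {
            out.collect(new Alert(event));
        }
    }

    @Override
    public void flatMap2(ModelUpdate update, Collector<Alert> out) {
        // second input: model updates; Flink calls flatMap1/flatMap2 from
        // a single thread per instance, so no extra locking is needed
        model.apply(update);
    }
}

// wired up exactly as in the snippet above:
// events.connect(updates1.union(updates2).broadcast())
//       .flatMap(new CheckAgainstModel());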


On Wednesday, August 19, 2015, Welly Tambunan  wrote:

> Hi Gyula,
>
> Thanks for your response.
>
> However, the model can receive multiple events for updates. How can we do
> that with a co-flatmap, as I can see the connect API only receives a single
> datastream?
>
>
> > ... while external model updates would be tricky to keep consistent.
> Is that still the case if the operator treats the external model as
> read-only? We create another stream that will update the external model
> separately.
>
> Cheers
>
> On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra wrote:
>
>> Hey!
>>
>> I think it is safe to say that the best approach in this case is creating
>> a co-flatmap that will receive updates on one input. The events should
>> probably be broadcasted in this case so you can check in parallel.
>>
>> This approach can be used effectively with Flink's checkpoint mechanism,
>> while external model updates would be tricky to keep consistent.
>>
>> Cheers,
>> Gyula
>>
>>
>>
>>
>> On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan wrote:
>>
>>> Hi All,
>>>
>>> We have a streaming computation that is required to validate the data
>>> stream against the model provided by the user.
>>>
>>> Right now what I have done is to load the model into a Flink operator and
>>> then validate against it. However the model can be updated and changed
>>> frequently. Fortunately we always publish this event to RabbitMQ.
>>>
>>> I think we can
>>>
>>>
>>>1. Create a RabbitMQ listener for model-changed events from inside the
>>>operator, then update the model if an event arrives.
>>>
>>>But I think this will create a race condition if not handled correctly,
>>>and it seems odd to keep this
>>>
>>>2. We can move the model into an external in-memory cache and keep
>>>the model up to date using Flink. So the operator will retrieve it
>>>from the cache
>>>
>>>3. Create two streams and use a co-operator to manage the shared
>>>state.
>>>
>>>
>>> What is your suggestion on keeping the state up to date from external
>>> events? Is there some kind of best practice for maintaining an up-to-date
>>> model in a streaming operator?
>>>
>>> Thanks a lot
>>>
>>>
>>> Cheers
>>>
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com 
>>>
>>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com 
>


Re: Keep Model in Operator instance up to date

2015-08-19 Thread Welly Tambunan
Hi Gyula,

Thanks for your response.

However, the model can receive multiple events for updates. How can we do
that with a co-flatmap, as I can see the connect API only receives a single
datastream?


> ... while external model updates would be tricky to keep consistent.
Is that still the case if the operator treats the external model as
read-only? We create another stream that will update the external model
separately.

Cheers

On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra  wrote:

> Hey!
>
> I think it is safe to say that the best approach in this case is creating
> a co-flatmap that will receive updates on one input. The events should
> probably be broadcasted in this case so you can check in parallel.
>
> This approach can be used effectively with Flink's checkpoint mechanism,
> while external model updates would be tricky to keep consistent.
>
> Cheers,
> Gyula
>
>
>
>
> On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan  wrote:
>
>> Hi All,
>>
>> We have a streaming computation that is required to validate the data stream
>> against the model provided by the user.
>>
>> Right now what I have done is to load the model into a Flink operator and
>> then validate against it. However the model can be updated and changed
>> frequently. Fortunately we always publish this event to RabbitMQ.
>>
>> I think we can
>>
>>
>>1. Create a RabbitMQ listener for model-changed events from inside the
>>operator, then update the model if an event arrives.
>>
>>But I think this will create a race condition if not handled correctly,
>>and it seems odd to keep this
>>
>>2. We can move the model into an external in-memory cache and keep
>>the model up to date using Flink. So the operator will retrieve it
>>from the cache
>>
>>3. Create two streams and use a co-operator to manage the shared
>>state.
>>
>>
>> What is your suggestion on keeping the state up to date from external
>> events? Is there some kind of best practice for maintaining an up-to-date
>> model in a streaming operator?
>>
>> Thanks a lot
>>
>>
>> Cheers
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com 
>>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com 


Re: Keep Model in Operator instance up to date

2015-08-19 Thread Gyula Fóra
Hey!

I think it is safe to say that the best approach in this case is creating a
co-flatmap that will receive updates on one input. The events should
probably be broadcasted in this case so you can check in parallel.

This approach can be used effectively with Flink's checkpoint mechanism,
while external model updates would be tricky to keep consistent.

Cheers,
Gyula
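
To tie the operator's model copy into the checkpoint mechanism mentioned here, the coflatmap can additionally implement the Checkpointed interface — a sketch only; Model must be Serializable, and all names other than the Flink interfaces are placeholders:

import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class CheckAgainstModel
        implements CoFlatMapFunction<Event, ModelUpdate, Alert>,
                   Checkpointed<Model> {

    private Model model = new Model();

    @Override
    public void flatMap1(Event event, Collector<Alert> out) {
        // check the event against the current model (logic elided)
    }

    @Override
    public void flatMap2(ModelUpdate update, Collector<Alert> out) {
        model.apply(update);
    }

    @Override
    public Model snapshotState(long checkpointId, long checkpointTimestamp) {
        return model;  // the model copy becomes part of the checkpoint
    }

    @Override
    public void restoreState(Model state) {
        model = state; // and is handed back after a failure
    }
}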




On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan  wrote:

> Hi All,
>
> We have a streaming computation that is required to validate the data stream
> against the model provided by the user.
>
> Right now what I have done is to load the model into a Flink operator and
> then validate against it. However the model can be updated and changed
> frequently. Fortunately we always publish this event to RabbitMQ.
>
> I think we can
>
>
>1. Create a RabbitMQ listener for model-changed events from inside the
>operator, then update the model if an event arrives.
>
>But I think this will create a race condition if not handled correctly,
>and it seems odd to keep this
>
>2. We can move the model into an external in-memory cache and keep
>the model up to date using Flink. So the operator will retrieve it
>from the cache
>
>3. Create two streams and use a co-operator to manage the shared
>state.
>
>
> What is your suggestion on keeping the state up to date from external
> events? Is there some kind of best practice for maintaining an up-to-date
> model in a streaming operator?
>
> Thanks a lot
>
>
> Cheers
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com 
>