Re: Serializers and Schemas

2016-12-08 Thread Matt
Hi people,

This is what I was talking about regarding a generic de/serializer for POJO
classes [1].

The Serde class in [2] can be used in both Kafka [3] and Flink [4], and it
works out of the box for any POJO class.

Do you see anything wrong in this approach? Any way to improve it?

Cheers,
Matt

[1] https://github.com/Dromit/StreamTest/
[2]
https://github.com/Dromit/StreamTest/blob/master/src/main/java/com/stream/Serde.java
[3]
https://github.com/Dromit/StreamTest/blob/master/src/main/java/com/stream/MainProducer.java#L19
[4]
https://github.com/Dromit/StreamTest/blob/master/src/main/java/com/stream/MainConsumer.java#L19
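For readers without the repo at hand, here is a minimal sketch of what a reflection-based Serde for arbitrary POJOs might look like. This is an illustration using only the JDK, not the actual code in [2]; it assumes the POJO has a public no-arg constructor and Serializable field values, and that both sides run the same class file (field order from `getDeclaredFields()` is not guaranteed by the spec):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.Field;

// Sketch of a generic reflection-based Serde: walk the declared fields,
// write each value with Java serialization, read them back in the same order.
public class PojoSerde<T> {
    private final Class<T> clazz;

    public PojoSerde(Class<T> clazz) {
        this.clazz = clazz;
    }

    public byte[] serialize(T obj) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bos)) {
            for (Field f : clazz.getDeclaredFields()) {
                f.setAccessible(true);
                out.writeObject(f.get(obj)); // primitives are boxed automatically
            }
            out.flush();
            return bos.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException("serialization failed for " + clazz, e);
        }
    }

    public T deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            T obj = clazz.getDeclaredConstructor().newInstance();
            for (Field f : clazz.getDeclaredFields()) {
                f.setAccessible(true);
                f.set(obj, in.readObject()); // boxed values unbox into primitive fields
            }
            return obj;
        } catch (Exception e) {
            throw new RuntimeException("deserialization failed for " + clazz, e);
        }
    }
}

// The Product class from later in the thread, used here for illustration.
class Product {
    public String code;
    public double price;
    public String description;
    public long created;
}
```

The same pair of methods could back a Kafka Serializer/Deserializer and a Flink De/SerializationSchema, which is the point of having one Serde for both.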



On Thu, Dec 8, 2016 at 4:15 AM, Tzu-Li (Gordon) Tai wrote:

> Hi Matt,
>
> 1. There’s some in-progress work on wrapper util classes for Kafka
> de/serializers here [1] that allows
> Kafka de/serializers to be used with the Flink Kafka Consumers/Producers
> with minimal user overhead.
> The PR also proposes some additions to the documentation for the wrappers.
>
> 2. I feel it would be good to have more documentation on Flink’s
> de/serializers, since they’ve been
> asked about frequently on the mailing lists. At the same time, the
> fastest / most efficient de/serialization
> approach is usually tailored to each use case, so we’d need to think more
> about the presentation and the purpose
> of the documentation.
>
> Cheers,
> Gordon
>
> [1] https://github.com/apache/flink/pull/2705
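The shape of the adapter that such a wrapper provides can be sketched with simplified stand-ins for the two interfaces (the real Kafka `Deserializer` and Flink `DeserializationSchema` carry extra methods such as configure/close and isEndOfStream/getProducedType that a complete wrapper must also handle; the PR in [1] is the authoritative version):

```java
// Simplified stand-ins for the two library interfaces, to show the adapter
// pattern only; these are NOT the real Kafka/Flink types.
interface KafkaStyleDeserializer<T> {
    T deserialize(String topic, byte[] data);
}

interface FlinkStyleDeserializationSchema<T> {
    T deserialize(byte[] message);
}

// The wrapper itself: any Kafka-style deserializer becomes a
// Flink-style schema by plain delegation.
public class KafkaDeserializerWrapper<T> implements FlinkStyleDeserializationSchema<T> {
    private final KafkaStyleDeserializer<T> inner;

    public KafkaDeserializerWrapper(KafkaStyleDeserializer<T> inner) {
        this.inner = inner;
    }

    @Override
    public T deserialize(byte[] message) {
        // The topic is not part of Flink's call path here, so the wrapper
        // passes null (or a topic name fixed at construction time).
        return inner.deserialize(null, message);
    }
}
```

This is why "minimal user overhead" is achievable: the byte[]-to-object core of both interfaces is the same, and only the surrounding lifecycle methods differ.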
>
> On December 8, 2016 at 5:00:19 AM, milind parikh (milindspar...@gmail.com)
> wrote:
>
> Why not use a self-describing format (JSON), stream it as a String, and
> read it through a JSON reader, avoiding top-level reflection?
>
> Github.com/milindparikh/streamingsi
>
> https://github.com/milindparikh/streamingsi/tree/master/epic-poc/sprint-2-
> simulated-data-no-cdc-advanced-eventing/2-dataprocessing
>
> ?
>
> Apologies if I misunderstood the question, but I can quite see how to
> model your Product class (or indeed any POJO) in a fairly generic way
> (assuming JSON).
>
> The real issue you face when you have different versions of the same POJO
> class is that it requires storing enough information to dynamically
> instantiate the actual version of the class, which I believe is beyond the
> simple use case.
>
> Milind
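The self-describing approach suggested above can be sketched as follows. This hand-rolled writer (my illustration, not code from the linked repo) only handles String, numeric, and boolean fields and does no string escaping; a real pipeline would use a JSON library such as Jackson or Gson for both directions:

```java
import java.lang.reflect.Field;

// Render a POJO as a JSON string via reflection, so the payload carries
// its own field names and can be streamed as a plain String.
public class JsonWriter {

    public static String toJson(Object obj) {
        StringBuilder sb = new StringBuilder("{");
        Field[] fields = obj.getClass().getDeclaredFields();
        try {
            for (int i = 0; i < fields.length; i++) {
                fields[i].setAccessible(true);
                Object v = fields[i].get(obj);
                if (i > 0) sb.append(',');
                sb.append('"').append(fields[i].getName()).append("\":");
                if (v instanceof String) {
                    sb.append('"').append(v).append('"'); // note: no escaping of quotes
                } else {
                    sb.append(v); // numbers and booleans are valid JSON literals as-is
                }
            }
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
        return sb.append('}').toString();
    }
}

// The Product class from the thread, used for illustration.
class Product {
    public String code;
    public double price;
    public String description;
    public long created;
}
```

With this, the stream is plain Strings end to end and the reader needs no knowledge of the producing class, at the cost of larger payloads and parse-time overhead.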
> On Dec 7, 2016 2:42 PM, "Matt"  wrote:
>
>> I've read your example, but I've found the same problem. You're
>> serializing your POJO as a string, where all fields are separated by "\t".
>> This may work for you, but not in general.
>>
>> https://github.com/luigiselmi/pilot-sc4-fcd-producer/blob/ma
>> ster/src/main/java/eu/bde/sc4pilot/fcd/FcdTaxiEvent.java#L60
>>
>> I would like to see a more "generic" approach for the class Product in my
>> last message. I believe a more general purpose de/serializer for POJOs
>> should be possible to achieve using reflection.
>>
>> On Wed, Dec 7, 2016 at 1:16 PM, Luigi Selmi  wrote:
>>
>>> Hi Matt,
>>>
>>> I had the same problem: trying to read some records in event time using
>>> a POJO, doing some transformation, and saving the result into Kafka for
>>> further processing. I am not done yet, but maybe the code I wrote starting
>>> from the Flink Forward 2016 training docs could be useful.
>>>
>>> https://github.com/luigiselmi/pilot-sc4-fcd-producer
>>>
>>>
>>> Best,
>>>
>>> Luigi
>>>
>>> --
>>> Luigi Selmi, M.Sc.
>>> Fraunhofer IAIS, Schloss Birlinghoven
>>> 53757 Sankt Augustin, Germany
>>> Phone: +49 2241 14-2440
>>>
>>> On 7 December 2016 at 16:35, Matt  wrote:
>>>
 Hello,

 I don't quite understand how to integrate Kafka and Flink; after a lot
 of thought and hours of reading I feel I'm still missing something
 important.

 So far I haven't found a non-trivial but simple example of a stream of
 a custom class (POJO). It would be good to have such an example in the
 Flink docs; I can think of many scenarios in which using SimpleStringSchema
 is not an option, yet all Kafka+Flink guides insist on it.

 Maybe we can add a simple example to the documentation [1]; it would be
 really helpful for many of us. Also, explaining how to create a Flink
 De/SerializationSchema from a Kafka De/Serializer would be really useful
 and would save a lot of people a lot of time; it's not clear why, or even
 whether, you need both of them.

 As far as I know Avro is a common choice for serialization, but I've
 read that Kryo's performance is much better (true?). I'd guess, though,
 that the fastest serialization approach is writing your own de/serializer.

 1. What do you think about adding some thoughts on this to the
 documentation?
 2. Can anyone provide an example for the following class?

 ---
 public class Product {
     public String code;
     public double price;
     public String description;
     public long created;
 }
 ---
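 One possible answer, as a sketch: a hand-written, fixed-layout de/serializer for Product using only java.io. This is the per-class "write your own" path mentioned above; the two methods are what you would call from a Kafka Serializer/Deserializer or a Flink De/SerializationSchema (layout assumed here: UTF code, double price, UTF description, long created):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hand-written binary schema for Product: fields are written and read
// in a fixed, agreed-upon order, with no per-record metadata.
public class ProductSchema {

    public static byte[] serialize(Product p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(p.code);
            out.writeDouble(p.price);
            out.writeUTF(p.description);
            out.writeLong(p.created);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static Product deserialize(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            Product p = new Product();
            p.code = in.readUTF();
            p.price = in.readDouble();
            p.description = in.readUTF();
            p.created = in.readLong();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class Product {
    public String code;
    public double price;
    public String description;
    public long created;
}
```

 The fixed layout is what makes this fast, and also what makes it brittle: any change to Product requires updating both methods in lockstep on producer and consumer.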

 Regards,
 Matt

 [1] http://data-artisans.com/kafka-flink-a-practical-how-to/
