Re: Serialization in Apex

Bhupesh Chawda Wed, 18 May 2016 10:16:11 -0700

I agree that the system behaviour must be predictable. However, this can
still be an optional configuration which users can use if they need to get
going even if it is suboptimal.


Even Storm does not enforce it and has a config parameter:
Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION
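
A minimal sketch of what such an opt-in could look like, using only the
JDK. The flag is modeled on the Storm parameter above; FallbackSketch,
Codec, REJECTING_PRIMARY and Point are hypothetical names for
illustration and are not Apex, Storm or Kryo APIs. The primary codec is
a stand-in that always rejects the object, the way a Kryo-style
serializer might reject a class without a default constructor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: none of these names are Apex, Storm or Kryo APIs.
public class FallbackSketch {

    /** Stand-in for whatever the primary serializer is. */
    public interface Codec {
        byte[] toBytes(Object o) throws IOException;
    }

    /** Simulates a primary codec that cannot handle the given class. */
    public static final Codec REJECTING_PRIMARY = o -> {
        throw new IOException("primary codec cannot handle " + o.getClass().getName());
    };

    /** Plain Java serialization, used only when the user opted in. */
    public static final Codec JAVA_SERIALIZATION = o -> {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    };

    /**
     * Tries the primary codec first. Falls back to Java serialization only
     * when the flag is set; otherwise the original failure is rethrown so
     * the behaviour stays predictable by default.
     */
    public static byte[] serialize(Object o, Codec primary, boolean fallBackOnJavaSerialization)
            throws IOException {
        try {
            return primary.toBytes(o);
        } catch (IOException e) {
            if (!fallBackOnJavaSerialization) {
                throw e;
            }
            return JAVA_SERIALIZATION.toBytes(o);
        }
    }

    /** Example class without a default constructor. */
    public static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    public static void main(String[] args) throws IOException {
        Point p = new Point(1, 2);

        // Opt-in: the fallback quietly takes over.
        byte[] bytes = serialize(p, REJECTING_PRIMARY, true);
        System.out.println("fallback produced " + bytes.length + " bytes");

        // Default: the failure is surfaced to the application designer.
        try {
            serialize(p, REJECTING_PRIMARY, false);
        } catch (IOException e) {
            System.out.println("strict mode failed: " + e.getMessage());
        }
    }
}
```

With the flag off, the caller sees the original exception, which keeps
the default strict; the slower path is taken only when it was asked for.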

~Bhupesh

On Tue, May 17, 2016 at 1:17 PM, Vlad Rozov <[email protected]> wrote:

> +1. Java serialization is much slower than Kryo serialization and using
> Java serialization must be an explicit application designer choice.
>
> Thank you,
> Vlad
>
>
> On 5/17/16 11:50, Thomas Weise wrote:
>
>> IMO automatically picking a serializer conflicts with predictable system
>> behavior. If the serialization does not work I would want to know that
>> instead of the system doing some trick and arriving at suboptimal or
>> faulty behavior.
>>
>> That does not mean we cannot have optimizations though, as long as there
>> is explicit user control.
>>
>> Thomas
>>
>>
>> On Tue, May 17, 2016 at 11:34 AM, Bhupesh Chawda <[email protected]>
>> wrote:
>>
>>> As Ram and Sandesh pointed out, we do have @Bind and @DefaultSerializer
>>> annotations. However, these are tightly coupled with the field in
>>> question and do require modifying external code. Additionally, it may
>>> also break other systems if we are binding it to a JavaSerializer and
>>> perhaps there are systems which have other means of serializing the
>>> field.
>>>
>>> My point was more to do with the user having to worry about what
>>> serializer to use and how to serialize objects.
>>> For example, I liked the approach that Storm takes by falling back to
>>> Java serialization automatically in case the target class does not have
>>> a default constructor.
>>>
>>> Of course, we can explore type based serialization. But this email was
>>> more about the usability aspect; to handle classes not having default
>>> constructors in general, not just POJO tuples.
>>>
>>> ~Bhupesh
>>>
>>>
>>>
>>> On Tue, May 17, 2016 at 9:53 AM, Pramod Immaneni <[email protected]>
>>> wrote:
>>>
>>>> Can we do a test where we hard code a codec for a POJO and compare
>>>> performance against Kryo? Thereafter we can dynamically compose a
>>>> codec via pojoutils and inject it.
>>>>
>>>> Thanks
>>>>
>>>> On May 17, 2016, at 8:16 AM, Vlad Rozov <[email protected]> wrote:
>>>>
>>>>> +1 for type based serialization. Tuples in most cases are flat
>>>>> records/pojo and it should be possible to programmatically construct a
>>>>> codec that will significantly outperform Kryo. It should also reduce
>>>>> the amount of data passed over the wire. I started to look in that
>>>>> direction as well, as Kryo serialization is one of the bottlenecks that
>>>>> limit Apex throughput when operators are deployed into different
>>>>> containers, including the NODE_LOCAL case.
>>>>>
>>>>> Thank you,
>>>>> Vlad
>>>>>
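
To make the hard-coded codec idea concrete, here is a minimal sketch of
a hand-written codec for a flat record, using only the JDK. Trade,
FlatCodecSketch and the field layout are hypothetical and not Apex
classes. Because Kryo is not part of the JDK, the size comparison is
against plain Java serialization and says nothing about Kryo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: Trade and FlatCodecSketch are illustrative names only.
public class FlatCodecSketch {

    /** A flat record: primitives only, no default constructor. */
    public static final class Trade implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final int quantity;
        final double price;

        public Trade(long id, int quantity, double price) {
            this.id = id;
            this.quantity = quantity;
            this.price = price;
        }
    }

    /** Writes the fields in a fixed order: 8 + 4 + 8 = 20 bytes, no class metadata. */
    public static byte[] encode(Trade t) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(20);
        DataOutputStream out = new DataOutputStream(bos);
        out.writeLong(t.id);
        out.writeInt(t.quantity);
        out.writeDouble(t.price);
        out.flush();
        return bos.toByteArray();
    }

    /** Reads the fields back in the same order and calls the constructor directly. */
    public static Trade decode(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        long id = in.readLong();
        int quantity = in.readInt();
        double price = in.readDouble();
        return new Trade(id, quantity, price);
    }

    /** Baseline for comparison: standard Java serialization. */
    public static byte[] javaSerialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Trade t = new Trade(42L, 7, 99.5);
        byte[] compact = encode(t);
        Trade back = decode(compact);
        System.out.println("hand-written codec: " + compact.length + " bytes");
        System.out.println("Java serialization: " + javaSerialize(t).length + " bytes");
        System.out.println("round trip ok: "
                + (back.id == t.id && back.quantity == t.quantity && back.price == t.price));
    }
}
```

A codec like this needs no default constructor, since decode calls the
real constructor; a generated codec would have to derive the same field
order from the type information.
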
>>>>> On 5/17/16 07:13, Sandesh Hegde wrote:
>>>>>
>>>>>> If it is possible to serialize, the platform should do it
>>>>>> automatically; it reduces the tribal knowledge required to use the
>>>>>> platform. A couple of months back, I also sent out a similar email.
>>>>>>
>>>>>> Type based serialization may improve the performance.
>>>>>>
>>>>>> On Tue, May 17, 2016, 6:06 AM Munagala Ramanath <[email protected]>
>>>>>> wrote:
>>>>
>>>>>>> Traditionally, we've recommended using
>>>>>>> "@DefaultSerializer(JavaSerializer.class)" or
>>>>>>> "@FieldSerializer.Bind(CustomSerializer.class)" as outlined at
>>>>>>>
>>>>>>> http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception
>>>>>>>
>>>>>>> Can you describe why those approaches are not adequate?
>>>>>>>
>>>>>>> Ram
>>>>>>>
>>>>>>> On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> While working on the integration of Apex with Apache Samoa, I am
>>>>>>>> coming across some scenarios where I have to add default
>>>>>>>> constructors in some external classes to make them Kryo
>>>>>>>> serializable. Although this should be okay, we would like to avoid
>>>>>>>> modifying external classes as far as possible.
>>>>>>>>
>>>>>>>> Some other streaming engines have taken different approaches towards
>>>>>>>> serialization.
>>>>>>>>
>>>>>>>> I looked at Flink and Storm serialization mechanisms.
>>>>>>>>
>>>>>>>> Storm has a fallback mechanism on Java serialization. It does use
>>>>>>>> Kryo for serialization due to performance. But, if the class is not
>>>>>>>> serializable using Kryo, then it will try to serialize it using Java
>>>>>>>> serialization. If even then it cannot serialize, then it throws an
>>>>>>>> error. [1]
>>>>>>>>
>>>>>>>> Flink has its own serialization stack where it uses a serializer
>>>>>>>> based on the type information known about the data. [2]
>>>>>>>>
>>>>>>>> What does the community think about the current state of
>>>>>>>> serialization in Apex? Is there a need to explore some approaches
>>>>>>>> which could avoid serialization issues such as the one described
>>>>>>>> above? Are there any other approaches one could use?
>>>>>>>>
>>>>>>>> 1. http://storm.apache.org/releases/current/Serialization.html#java-serialization
>>>>>>>> 2. https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
>>>>>>>>
>>>>>>>> ~Bhupesh
>>>>>>>>
>>>>>>>
>
