Re: Serialization in Apex

Pramod Immaneni Tue, 17 May 2016 09:54:01 -0700

Can we do a test where we hard code a codec for a POJO and compare
performance against kryo. Thereafter we can dynamically compose a
codec via pojoutils and inject it.


Thanks

> On May 17, 2016, at 8:16 AM, Vlad Rozov <[email protected]> wrote:
>
> +1 for type based serialization. Tuples in most cases are flat records/pojo 
> and it should be possible programmatically construct a codec that will 
> significantly outperform Kryo. It should also reduce amount of data passed 
> over the wire. I started to look in that direction as well as Kryo 
> serialization is one of bottlenecks that limits Apex throughput when 
> operators are deployed into different containers including NODE_LOCAL case.
>
> Thank you,
> Vlad
>
>> On 5/17/16 07:13, Sandesh Hegde wrote:
>> If it is possible to serialize, platform should do it automatically, it
>> reduces the tribal knowledge requirement to use the platform. Couples of
>> month back, I also sent out the similar email.
>>
>> Type based serialization may improve the performance.
>>
>>> On Tue, May 17, 2016, 6:06 AM Munagala Ramanath <[email protected]> 
>>> wrote:
>>>
>>> Traditionally, we've recommended using
>>> "@DefaultSerializer(JavaSerializer.class)" or
>>> "@FieldSerializer.Bind(CustomSerializer.class)" as outlined at
>>>
>>> http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception
>>>
>>> Can you describe why those approaches are not adequate ?
>>>
>>> Ram
>>>
>>> On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda <[email protected]>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> While working on the integration of Apex with Apache Samoa, I am coming
>>>> across some scenarios where I have to add default constructors in some
>>>> external classes to make them Kryo serializable. Although this should be
>>>> okay, we would like to avoid modifying external classes as far as
>>> possible.
>>>> Some other streaming engines have taken different approaches towards
>>>> serialization.
>>>>
>>>> I looked at Flink and Storm serialization mechanisms.
>>>>
>>>> Storm has a fall back mechanism on Java serialization. It does use Kryo
>>> for
>>>> serialization due to performance. But, if the class is not serializable
>>>> using Kryo, then it will try to serialize it using Java serialization. If
>>>> even then it cannot serialize, then it throws an error. [1]
>>>>
>>>> Flink has its own serialization stack where it uses a serializer based on
>>>> the type information known about the data. [2]
>>>>
>>>> What does the community think about the current state of serialization in
>>>> Apex. Is there a need to explore some approaches which could avoid
>>>> serialization issues such as the one described above? Are there any other
>>>> approaches one could use?
>>>>
>>>> 1.
>>> http://storm.apache.org/releases/current/Serialization.html#java-serialization
>>>> 2.
>>> https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
>>>>
>>>> ~Bhupesh
>

Re: Serialization in Apex

Reply via email to