[ 
https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440000#comment-16440000
 ] 

Jonathan Eagles commented on TEZ-3914:
--------------------------------------

Essentially, the underlying serializer/deserializer for protobuf messages is 
the CodedOutputStream/CodedInputStream. When messages are written either with 
the writeDelimitedTo (read parallel is parseDelimitedFrom) or writeMessageNoTag 
(parallel is readMessage), first the serialized size is written and then the 
serialized message is written. In addition to this, Tez prepends the message 
type to handle message dispatching. To protect itself from remote messages, 
CodedInputStream has a default serialized size limit of 64 MB. When using the 
parseDelimitedFrom java.io.InputStream method, new underlying deserializer 
CodedInputStream is created using the default message size limit. This prevents 
larger than 64MB Tez recovery messages from being properly parsed and prevents 
further messages after the large message from being parsed as well since any 
Exception is treated the same as EOF.

There are a few possible solutions to this.
1) Ensure all recovery messages are bounded in size less than 64 MB
2) Ensure messages larger than 64 MB can be successfully read
3) Ensure large messages are non-fatal and skip them.

This jira takes aproach 2 since recovery data is not lost. In addition, this 
jira replaces the higher level writeDelimitedTo/parseDelimitedFrom with the 
writeMessageNoTag/readMessage. This allows control over the maximum message 
size as required. In addition, recover writes and reads are now more efficient 
as serializing/deserializing streams (CodedInputStream/CodedOutputStream) are 
now reused.

> Recovering a large DAG fails to size limit exceeded
> ---------------------------------------------------
>
>                 Key: TEZ-3914
>                 URL: https://issues.apache.org/jira/browse/TEZ-3914
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>            Priority: Major
>         Attachments: TEZ-3914.001.patch, TEZ-3914.002.patch, 
> TEZ-3914.003.patch
>
>
> A large message will be failed to parse and will be treated as recovery file 
> EOF.
> {noformat}
> 2018-04-16 15:33:59,807 WARN  [Thread-2] app.RecoveryParser 
> (RecoveryParser.java:parseRecoveryData(771)) - Corrupt data found when trying 
> to read next event
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
> large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase 
> the size limit.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to