I think I found where the problem comes from.
I am writing LZO-compressed Thrift records using elephant-bird. My guess is
that one side is computing the checksum on the uncompressed data and the
other on the compressed data, hence the mismatch.
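
To illustrate the suspected mismatch, here is a minimal sketch, not the
actual elephant-bird or S3 client code. It assumes the Content-MD5 value is
the base64-encoded MD5 digest of the request body, uses zlib as a stand-in
for LZO (which is not in the Python standard library), and the record
payload and the content_md5 helper are hypothetical:

```python
import base64
import hashlib
import zlib


def content_md5(body: bytes) -> str:
    """Return the base64-encoded MD5 digest of body (Content-MD5 form)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


# Hypothetical payload standing in for serialized Thrift records.
record = b"example thrift record payload " * 100

# zlib is only a stand-in for LZO here; any compressor changes the bytes.
compressed = zlib.compress(record)

md5_uncompressed = content_md5(record)      # what one side might compute
md5_compressed = content_md5(compressed)    # what the other side would see

checksums_match = md5_uncompressed == md5_compressed
print("uncompressed:", md5_uncompressed)
print("compressed:  ", md5_compressed)
print("match:", checksums_match)
```

If the checksum were taken before compression and the compressed bytes were
then sent, the receiver would compute a different digest, which would be
consistent with the error quoted below.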
When writing the data as strings
Hi,
I am not sure my problem is relevant to Spark, but perhaps someone else has
had the same error. When I try to write files that need multipart upload to
S3 from a job on EMR, I always get this error:
com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you
specified did not match