I think I found where the problem comes from.

I am writing LZO-compressed Thrift records using elephant-bird. My guess is
that one side computes the checksum on the uncompressed data and the other
on the compressed data, hence the mismatch.
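
To make that guess concrete, here is a minimal sketch in Python of how the
two checksums would diverge. It does not reproduce what EMR or elephant-bird
actually do. The payload is made up, and zlib stands in for LZO only because
it ships with Python. The helper name content_md5 is hypothetical. The one
thing taken as given is that a Content-MD5 value is the base64-encoded MD5
digest of the bytes being sent.

```python
import base64
import hashlib
import zlib


def content_md5(payload: bytes) -> str:
    """Base64-encoded MD5 digest, the format used by the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")


# Stand-in for one batch of serialized records (hypothetical data).
raw = b"thrift-record " * 1000

# Assumption: zlib replaces LZO here purely for illustration.
compressed = zlib.compress(raw)

md5_of_raw = content_md5(raw)
md5_of_compressed = content_md5(compressed)

# If the sender hashes `raw` while the bytes on the wire are `compressed`,
# the declared checksum and the one the receiver computes cannot agree.
print("raw:       ", md5_of_raw)
print("compressed:", md5_of_compressed)
print("match:     ", md5_of_raw == md5_of_compressed)
```

If this is what happens, the fix would be to compute the digest over exactly
the bytes that are uploaded for each part, whatever compression was applied
upstream.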

When I write the data as strings using a plain TextOutputFormat, the
multipart upload works. This suggests that the LZO compression is the
problem... but it is not a solution :(

2015-04-13 18:46 GMT+02:00 Eugen Cepoi <cepoi.eu...@gmail.com>:

> Hi,
>
> I am not sure my problem is relevant to Spark, but perhaps someone else
> has had the same error. When I try to write files that need multipart
> upload to S3 from a job on EMR, I always get this error:
>
> com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you
> specified did not match what we received.
>
> If I disable multipart upload via fs.s3n.multipart.uploads.enabled (or
> output smaller files that don't require multipart upload), then everything
> works fine.
>
> I've seen an old thread on the ML where someone had the same error, but in
> my case I don't have any other errors on the worker nodes.
>
> I am using Spark 1.2.1 and Hadoop 2.4.0.
>
> Thanks,
> Eugen
>
