Hi all,

We'd like to upgrade one of our Spark jobs from 1.4.1 to 1.5.2 (we run
Spark on Amazon EMR).

The job consumes and pushes lz4 compressed data from/to Kafka.

After upgrading to 1.5.2 everything works fine, except that we now get
the following exception when producing to Kafka:

java.lang.NoSuchMethodError: net.jpountz.util.Utils.checkRange([BII)V
at org.apache.kafka.common.message.KafkaLZ4BlockOutputStream.write(KafkaLZ4BlockOutputStream.java:179)
at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
at org.apache.kafka.common.record.Compressor.putLong(Compressor.java:132)
at org.apache.kafka.common.record.MemoryRecords.append(MemoryRecords.java:85)
at org.apache.kafka.clients.producer.internals.RecordBatch.tryAppend(RecordBatch.java:63)
at org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:171)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:338)


Some digging yields:

- The net.jpountz.lz4/lz4 dependency was upgraded from 1.2.0 to 1.3.0 in
Spark 1.5+ to fix some issues with the IBM JDK:
https://issues.apache.org/jira/browse/SPARK-7063

- lz4-java 1.3.0 refactored net.jpountz.util.Utils into
net.jpountz.util.SafeUtils (checkRange moved there), which yields the
NoSuchMethodError above (a quick probe for this is sketched after this
list):
https://github.com/jpountz/lz4-java/blob/1.3.0/src/java/net/jpountz/util/SafeUtils.java

- Kafka on GitHub up to 0.9.0.0 uses jpountz 1.2.0
(https://github.com/apache/kafka/blob/0.9.0.0/build.gradle); the latest
trunk, however, upgrades to 1.3.0.
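
For completeness, here's a quick way to check which flavor of lz4-java
a given classpath provides. This is just a minimal sketch (the object
name is made up); it probes for the exact signature from the stack
trace above:

import scala.util.Try

// Probe (sketch): does the lz4-java on the classpath still expose the
// static method Kafka 0.8 links against? In 1.2.0 it lives on
// net.jpountz.util.Utils; in 1.3.0 it moved to SafeUtils.
object Lz4Probe extends App {
  val kafkaCompatible = Try {
    Class.forName("net.jpountz.util.Utils")
      .getMethod("checkRange", classOf[Array[Byte]], classOf[Int], classOf[Int])
  }.isSuccess
  println(s"Utils.checkRange([BII)V present (Kafka 0.8 compatible): $kafkaCompatible")
}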


Patching Kafka to use SafeUtils is easy, and so is creating a jar for
our projects that includes the correct dependencies (see the sbt sketch
below); compiling Spark 1.5.x against jpountz 1.2.0 should also work,
but I haven't tried that yet.
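
For the custom-jar route, shading the net.jpountz classes that Kafka
sees should keep the two versions from clashing. A minimal build.sbt
sketch, assuming sbt-assembly 0.14+ (coordinates and the rename prefix
are illustrative, not our actual build):

// build.sbt fragment (sketch): ship lz4-java 1.2.0 inside the app jar
// and rename it together with Kafka's references to it, so that Spark's
// bundled lz4-java 1.3.0 can coexist on the executor classpath.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"    % "1.5.2" % "provided",
  "org.apache.kafka"  % "kafka-clients" % "0.8.2.2",
  "net.jpountz.lz4"   % "lz4"           % "1.2.0"
)

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("net.jpountz.**" -> "shaded.jpountz.@1")
    .inLibrary("org.apache.kafka" % "kafka-clients" % "0.8.2.2",
               "net.jpountz.lz4"  % "lz4"           % "1.2.0")
    .inProject
)

Since spark-core stays provided, Spark's own lz4 usage is untouched;
only the copies bundled into the assembly get renamed.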

The main problem is that Spark 1.5.x+ and all Kafka 0.8 releases are
incompatible with regard to lz4 compression, and we would like to avoid
provisioning EMR with a custom Spark build through bootstrapping due to
the operational overhead.

One could also try to play with the classpath (e.g. Spark's
experimental userClassPathFirst settings) and a jar with compatible
dependencies (a small diagnostic sketch follows below). But I was
wondering: has nobody else used Kafka with lz4 and Spark and run into
the same issue?
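
For anyone experimenting along those lines, here is a small diagnostic
job (again just a sketch; the object and app names are made up) that
reports which jar actually serves the lz4 classes on the driver and on
the executors, e.g. before and after flipping the experimental
spark.driver.userClassPathFirst / spark.executor.userClassPathFirst
settings:

import org.apache.spark.{SparkConf, SparkContext}

// Diagnostic sketch: print which jar provides the lz4 entry point on
// the driver and on each executor, to verify classpath ordering.
object Lz4ClasspathReport {
  private def locate(className: String): String =
    Class.forName(className)
      .getProtectionDomain.getCodeSource.getLocation.toString

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lz4-classpath-report"))
    println(s"driver:   ${locate("net.jpountz.lz4.LZ4Factory")}")
    sc.parallelize(1 to sc.defaultParallelism)
      .map(_ => locate("net.jpountz.lz4.LZ4Factory"))
      .distinct()
      .collect()
      .foreach(loc => println(s"executor: $loc"))
    sc.stop()
  }
}

If the executors report Spark's bundled jar rather than the application
jar, the producer will keep hitting the NoSuchMethodError above.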

Maybe there's also an easier way to reconcile the situation?

BTW: there's a similar issue regarding Druid as well, but no resolution
beyond patching Kafka was discussed:
https://groups.google.com/forum/#!topic/druid-user/ZW_Clovf42k

Any input would be highly appreciated.


Best regards,
Stefan


-- 

*Dr. Stefan Schadwinkel*
Senior Big Data Developer
stefan.schadwin...@smaato.com

Smaato Inc.
San Francisco - New York - Hamburg - Singapore
www.smaato.com

Germany:
Valentinskamp 70, Emporio, 19th Floor
20355 Hamburg

T +49 (40) 3480 949 0
F +49 (40) 492 19 055



