[ https://issues.apache.org/jira/browse/SPARK-29088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930005#comment-16930005 ]
JP Bordenave commented on SPARK-29088:
--------------------------------------

Maybe this can be closed: it seems to be a question of jar loading order. Adding lz4-java first on the classpath (lz4-java-1.4.0.jar first, spark/conf second, spark/jars/* after) works:

{noformat}
Spark Executor Command: "/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java" "-cp" "/opt/spark/jars/lz4-java-1.4.0.jar:/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=33107" "-Dspark.hadoop.hbase.master.port=16000" "-Dspark.hadoop.hbase.regionserver.port=16020" "-Dspark.hadoop.hbase.rest.port=8080" "-Dspark.hadoop.hbase.status.multicast.address.port=16100" "-Dspark.rpc.askTimeout=10s" "-Dspark.hadoop.hbase.regionserver.info.port=16030" "-Dspark.hadoop.hbase.master.info.port=16010" "-Dhive.spark.log.dir=/opt/spark/logs/" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@sparkjp:33107" "--executor-id" "7" "--hostname" "192.168.0.30" "--cores" "1" "--app-id" "app-20190915182117-0000" "--worker-url" "spark://Worker@192.168.0.30:35885"
{noformat}

> Hive 2.3.6 / HDP 2.7.7 / Spark 2.4.4 lz4-java.jar: insert fails with the MR > Spark engine mode, works fine with Hadoop mode
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-29088
>                 URL: https://issues.apache.org/jira/browse/SPARK-29088
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 2.4.4
>        Environment: Linux Ubuntu 18.04, standalone
> Hive 2.3.6
> MySQL 5.7.27
> Hadoop 2.7.7
> Spark 2.4.4
> lz4-java.jar dependency added in hive/lib and spark/jars
> spark/jars added on HDFS spark-jars/
>            Reporter: JP Bordenave
>            Priority: Critical
>
> Hello,
>
> I installed Hadoop 2.7.7; it works fine.
> I installed Hive 2.3.6; it works fine with Hadoop 2.7.7. The lz4-1.3.0.jar was replaced by lz4-java-1.4.0.jar from spark/jars because of a risk of class-loader conflict; version 1.4.0 looks compatible with the old methods and is not disturbed by the new features.
> Hive is configured with MySQL 5.7.27.
> I installed Spark 2.4.4.
> I configured hive-site.xml in hive/conf for the Spark engine and copied it to spark/conf:
>
> <property>
>     <name>hive.execution.engine</name>
>     <value>spark</value>
>     <description>Use Map Reduce as default execution engine</description>
> </property>
> <property>
>     <name>spark.master</name>
>     <value>spark://192.168.0.30:7077</value>
> </property>
> <property>
>     <name>spark.eventLog.enabled</name>
>     <value>true</value>
> </property>
> <property>
>     <name>spark.eventLog.dir</name>
>     <value>/tmp</value>
> </property>
> <property>
>     <name>spark.serializer</name>
>     <value>org.apache.spark.serializer.KryoSerializer</value>
> </property>
> <property>
>     <name>spark.yarn.jars</name>
>     <value>hdfs://192.168.0.30:54310/spark-jars/*</value>
> </property>
> <property>
>     <name>system:java.io.tmpdir</name>
>     <value>/tmp/hive/java</value>
> </property>
> <property>
>     <name>system:user.name</name>
>     <value>${user.name}</value>
> </property>
> </configuration>
>
> When I start Hive with the Spark engine (Hive works fine in the Hadoop context), I can use "show tables" and the query "select * from employee;" works fine. But when I run an insert, it fails:
>
> Job failed with java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
> FAILED:
>
> I have lz4-java-1.4.0.jar in spark/jars and I replaced lz4-1.3.0.jar in hive/lib. There is no lz4-1.3.0.jar left, yet the Spark worker cannot find the new method of lz4-java-1.4.0 ((Ljava/io/InputStream;Z)).
> I removed all 1.2.1 jars and replaced them with all the 2.3.6 jars from Hive in spark/jars.
> I added all spark-2.4.4/jars/* to Hadoop 2.7.7 HDFS /spark-jars/.
> The worker driver log uses the jar hive-exec-2.3.6.jar.
> Did I forget something to do? I don't see where the problem is.
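A quick way to check whether the constructor named in the NoSuchMethodError is actually visible to a given JVM (and which jar the class was loaded from) is a small reflection probe. This is only a sketch: the real check on a worker would use `net.jpountz.lz4.LZ4BlockInputStream` with parameters `(InputStream, boolean)`, taken from the error message; the JDK class used below is a stand-in so the snippet runs without the lz4 jar on the classpath.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.security.CodeSource;

public class CheckCtor {

    // True if cls exposes a public constructor with exactly these parameter
    // types; this is the same lookup the JVM does before raising
    // NoSuchMethodError at call time.
    static boolean hasCtor(Class<?> cls, Class<?>... params) {
        try {
            cls.getConstructor(params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // On a worker, substitute:
        //   Class<?> c = Class.forName("net.jpountz.lz4.LZ4BlockInputStream");
        //   hasCtor(c, InputStream.class, boolean.class)
        Class<?> c = BufferedInputStream.class;                        // stand-in
        System.out.println(hasCtor(c, InputStream.class, int.class));  // this one exists

        // Where was the class loaded from? A stale 1.3.0 jar earlier on the
        // classpath would show up here (JDK bootstrap classes print null).
        CodeSource src = c.getProtectionDomain().getCodeSource();
        System.out.println(src == null ? "bootstrap" : src.getLocation().toString());
    }
}
```

Running this with the same classpath as the executor command shows whether an older lz4 jar is shadowing lz4-java-1.4.0 on that node.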
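If the jar ordering in the comment above is indeed the fix, it can be made permanent instead of hand-editing the executor command line. A hypothetical spark-defaults.conf fragment (the path is an assumption; point it at wherever lz4-java-1.4.0.jar actually lives on each node) that prepends the jar on both driver and executors:

```properties
# Put lz4-java-1.4.0 ahead of everything else on the JVM classpath.
# Example path; adjust to the real location on each node.
spark.driver.extraClassPath   /opt/spark/jars/lz4-java-1.4.0.jar
spark.executor.extraClassPath /opt/spark/jars/lz4-java-1.4.0.jar
```

Both keys are standard Spark configuration properties; entries listed there are prepended to the classpath of the respective JVMs.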
> The lz4-java-1.4.0 jar is present and the called method exists in lz4-java-1.4.0. There is no lz4-1.3.0.jar left, and I have no conflict in the Hadoop+Hive configuration when using the lz4-java-1.4.0 dependency.
> Thanks for your remarks, because I have no more ideas about where to find a solution.
> It seems to fail in the map worker of the Spark engine; must I add the lz4-java jar to some extra classpath somewhere?
> Some stack traces, also present in the logs:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/lib/hive/apache-hive-2.3.6-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Logging initialized using configuration in file:/usr/lib/hive/apache-hive-2.3.6-bin/conf/hive-log4j2.properties Async: true
> hive> select * from employee;
> OK
> 1	Allen	IT
> 2	Mag	Sales
> 3	Rob	Sales
> 4	Dana	IT
> 6	Jean-Pierre	Bordenave
> 7	Pierre	xXx
> 11	Pierre	xXx
> Time taken: 2.99 seconds, Fetched: 7 row(s)
> hive> insert into employee values("10","Pierre","xXx");
> Query ID = spark_20190915110359_e62a4e1a-fd69-4f17-a0f1-20513f291ddc
> Total jobs = 1
> Launching Job 1 out of 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Starting Spark Job = 6b9db937-53d2-4d45-84b2-8e5c6427d9d3
> Query Hive on Spark job[0] stages: [0]
> Status: Running (Hive on Spark job[0])
> --------------------------------------------------------------------------------------
>           STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
> --------------------------------------------------------------------------------------
> Stage-0                0         RUNNING      1          0        0        1       1
> --------------------------------------------------------------------------------------
> STAGES: 00/01    [>>--------------------------] 0%  ELAPSED TIME: 3,02 s
> --------------------------------------------------------------------------------------
> Job failed with java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
> FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.util.concurrent.ExecutionException: Exception thrown by job
> 	at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:337)
> 	at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:342)
> 	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
> 	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.0.30, executor 2): java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
> 	at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
> 	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
> 	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
> 	at scala.Option.map(Option.scala:146)
> 	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
> 	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
> 	at scala.Option.getOrElse(Option.scala:121)
> 	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
> 	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
> 	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
> 	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
> 	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
> 	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
> 	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:123)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> 	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
> 	at scala.Option.foreach(Option.scala:257)
> 	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
> 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
> Caused by: java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
> 	at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
> 	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
> 	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
> 	at scala.Option.map(Option.scala:146)
> 	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
> 	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
> 	at scala.Option.getOrElse(Option.scala:121)
> 	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
> 	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
> 	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
> 	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
> 	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
> 	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
> 	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:123)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}
>
> Do you have a sample configuration of Spark 2.4.4 with Hive 2.3.6 somewhere? A lot of tutorials are no longer up to date. Thanks a lot.
>
> JP

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org