[jira] [Commented] (SPARK-1911) Warn users if their assembly jars are not built with Java 6
[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199666#comment-16199666 ]

Swaapnika Guntaka commented on SPARK-1911:
-------------------------------------------

Does this issue still exist with Spark 2.2?

> Warn users if their assembly jars are not built with Java 6
> -----------------------------------------------------------
>
>                 Key: SPARK-1911
>                 URL: https://issues.apache.org/jira/browse/SPARK-1911
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 1.1.0
>            Reporter: Andrew Or
>            Assignee: Sean Owen
>             Fix For: 1.2.2, 1.3.0
>
> The root cause of the problem is detailed in:
> https://issues.apache.org/jira/browse/SPARK-1520.
> In short, an assembly jar built with Java 7+ is not always accessible by Python or other versions of Java (especially Java 6). If the assembly jar is not built on the cluster itself, this problem may manifest itself in strange exceptions that are not trivial to debug. This is an issue especially for PySpark on YARN, which relies on the python files included within the assembly jar.
> Currently we warn users only in make-distribution.sh, but most users build the jars directly. At the very least we need to emphasize this in the docs (currently missing entirely). The next step is to add a warning prompt in the mvn scripts whenever Java 7+ is detected.
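For context on the root cause quoted above: the zip format's central-directory counters are 16-bit, so an archive with more than 65535 entries (which the Spark assembly jar can exceed) must use the Zip64 extensions, and a Java 7+ {{jar}} tool emits them automatically, while Java 6 and Python's zipimport at the time could not read them. A minimal diagnostic sketch, not taken from the Spark codebase; the jar path is a placeholder:

{code:python}
# Hedged diagnostic sketch (not from Spark): count the entries in an
# assembly jar. More than 65535 entries forces the Zip64 format, which
# Java 6 and Python's zipimport cannot read.
import sys
import zipfile

def check_assembly(jar_path):
    with zipfile.ZipFile(jar_path) as jar:
        n = len(jar.namelist())
    if n > 65535:
        print("WARNING: %d entries: this jar requires Zip64 and will not "
              "be readable by Java 6 or by zipimport" % n)
    else:
        print("OK: %d entries, the plain zip format suffices" % n)

if __name__ == "__main__":
    check_assembly(sys.argv[1])  # e.g. a spark-assembly-*.jar (placeholder)
{code}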
[jira] [Commented] (SPARK-1911) Warn users if their assembly jars are not built with Java 6
[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502589#comment-14502589 ]

Steve Loughran commented on SPARK-1911:
----------------------------------------

This doesn't fix the problem, merely documents it. It should be doable by using Ant's zip task, which doesn't use the JDK zip routines. The assembly would be unzipped first, then re-zipped with the zip64Mode attribute set to "never"; see [https://ant.apache.org/manual/Tasks/zip.html]
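A hedged sketch of that unzip-and-repack idea, written in Python rather than Ant since the thread includes no build file: passing {{allowZip64=False}} to {{zipfile.ZipFile}} plays the same role as the zip task's zip64Mode="never", raising an error instead of silently writing Zip64 records. All paths are placeholders, and empty directories in the original jar are not preserved.

{code:python}
# Sketch of the unzip-then-rezip approach suggested above, using Python's
# zipfile in place of Ant's zip task. allowZip64=False raises LargeZipFile
# rather than emitting Zip64 records, so if the script succeeds the output
# is readable by Java 6. Paths are placeholders.
import os
import zipfile

SRC_JAR = "spark-assembly.jar"        # placeholder: Java 7-built assembly
DST_JAR = "spark-assembly-java6.jar"  # placeholder: repacked output
WORK_DIR = "assembly-unpacked"        # placeholder scratch directory

# Unpack the original assembly.
with zipfile.ZipFile(SRC_JAR) as src:
    src.extractall(WORK_DIR)

# Repack without Zip64; fails loudly if the archive would need it
# (more than 65535 entries, or a member over 4 GB).
with zipfile.ZipFile(DST_JAR, "w", zipfile.ZIP_DEFLATED, allowZip64=False) as dst:
    for root, _dirs, files in os.walk(WORK_DIR):
        for name in files:
            path = os.path.join(root, name)
            dst.write(path, os.path.relpath(path, WORK_DIR))
{code}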
[jira] [Commented] (SPARK-1911) Warn users if their assembly jars are not built with Java 6
[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502590#comment-14502590 ]

Sean Owen commented on SPARK-1911:
-----------------------------------

[~ste...@apache.org] Yeah, this is mostly duplicating https://issues.apache.org/jira/browse/SPARK-1703, which has an actual check and warning. I think this JIRA/PR ended up just being about the follow-on doc change.
[jira] [Commented] (SPARK-1911) Warn users if their assembly jars are not built with Java 6
[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346749#comment-14346749 ]

Apache Spark commented on SPARK-1911:
--------------------------------------

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4888
[jira] [Commented] (SPARK-1911) Warn users if their assembly jars are not built with Java 6
[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345149#comment-14345149 ]

Sean Owen commented on SPARK-1911:
-----------------------------------

[~andrewor14] {{compute-classpath.sh}} will now show a warning in this situation (cf. SPARK-1703). I will send a PR for the doc change. Is this the same issue as SPARK-1753?
[jira] [Commented] (SPARK-1911) Warn users if their assembly jars are not built with Java 6
[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345151#comment-14345151 ]

Apache Spark commented on SPARK-1911:
--------------------------------------

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4874
[jira] [Commented] (SPARK-1911) Warn users if their assembly jars are not built with Java 6
[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262552#comment-14262552 ]

naveen kumar commented on SPARK-1911:
--------------------------------------

Hi,
I have built the Spark assembly jar with Java 6 using make-distribution.sh and started a Spark cluster on 2 nodes (which are on Unix boxes). I am able to execute Java programs on the cluster, and I am able to connect to the cluster from my Windows machine using the pyspark interactive shell:

bin/pyspark --master spark://master:7078

I then try to execute the following commands at the interactive shell:

lines = sc.textFile("hdfs://master/data/spark/SINGLE.TXT")
lineLengths = lines.map(lambda s: len(s))
totalLength = lineLengths.reduce(lambda a, b: a + b)

It throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\npokala\Downloads\spark-java\spark-master\python\pyspark\rdd.py", line 715, in reduce
    vals = self.mapPartitions(func).collect()
  File "C:\Users\npokala\Downloads\spark-java\spark-master\python\pyspark\rdd.py", line 676, in collect
    bytesInJava = self._jrdd.collect().iterator()
  File "C:\Users\npokala\Downloads\spark-java\spark-master\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py", line 538, in __call__
15/01/01 18:29:07 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on master:34586 (size: 3.9 KB, free: 1060.0 MB)
  File "C:\Users\npokala\Downloads\spark-java\spark-master\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o24.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, master): org.apache.spark.SparkException:
Error from python worker:
  python: module pyspark.daemon not found
PYTHONPATH was:
  /home/npokala/data/spark-install/spark-java-1.6/spark-master/python:/home/npokala/data/spark-install/spark-java-1.6/spark-master/python/lib/py4j-0.8.2.1-src.zip:/home/npokala/data/spark-install/spark-java-1.6/spark-master/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.4.0.jar:/home/npokala/data/spark-install/spark-java-1.6/spark-master/sbin/../python/lib/py4j-0.8.2.1-src.zip:/home/npokala/data/spark-install/spark-java-1.6/spark-master/sbin/../python:
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:102)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:265)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at