Re: correct/best way to install custom spark1.2 on cdh5.3.0?
I installed the custom Spark in standalone mode as usual. The master and slaves started successfully. However, I got an error when I ran a job. It seems to me from the error message that some library was compiled against hadoop1, while my Spark was compiled against hadoop2.

15/01/08 23:27:36 INFO ClientCnxn: Opening socket connection to server master/10.191.41.253:2181. Will not attempt to authenticate using SASL (unknown error)
15/01/08 23:27:36 INFO ClientCnxn: Socket connection established to master/10.191.41.253:2181, initiating session
15/01/08 23:27:36 INFO ClientCnxn: Session establishment complete on server master/10.191.41.253:2181, sessionid = 0x14acbdae7e60022, negotiated timeout = 6

Traceback (most recent call last):
  File "/root/workspace/test/sparkhbase.py", line 23, in <module>
    conf=conf2)
  File "/root/spark/python/pyspark/context.py", line 530, in newAPIHadoopRDD
    jconf, batchSize)
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
  at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:157)
  at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
  at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
  at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
  at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
  at org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:202)
  at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:500)
  at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
  at py4j.Gateway.invoke(Gateway.java:259)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:207)
  at java.lang.Thread.run(Thread.java:745)

If I understand correctly, org.apache.hadoop.mapreduce.JobContext is a class in hadoop1 but an interface in hadoop2. My question is: which library could be causing this problem? Thanks.
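A quick way to confirm a diagnosis like this, as a rough sketch (the jar paths below are assumptions based on paths elsewhere in this thread; adjust them to your actual layout):

  # hadoop1 ships JobContext as a class, hadoop2 as an interface;
  # javap prints the declaration it finds on the given classpath.
  # (Assembly jar name is hypothetical; point at whichever jar you suspect.)
  javap -classpath /root/spark/lib/spark-assembly-1.2.0-hadoop2.5.0.jar \
      org.apache.hadoop.mapreduce.JobContext | head -n 2

  # Find every jar that bundles the HBase class from the stack trace;
  # more than one copy usually means a classpath conflict.
  for j in /root/spark/lib/*.jar /opt/cloudera/parcels/CDH/jars/*.jar; do
    unzip -l "$j" 2>/dev/null \
      | grep -q 'hbase/mapreduce/TableInputFormatBase.class' && echo "$j"
  done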
Re: correct/best way to install custom spark1.2 on cdh5.3.0?
I ran this with CDH 5.2 without a problem (sorry, don't have 5.3 readily available at the moment):

  $ HBASE='/opt/cloudera/parcels/CDH/lib/hbase/*'
  $ spark-submit --driver-class-path "$HBASE" \
      --conf spark.executor.extraClassPath="$HBASE" \
      --master yarn --class org.apache.spark.examples.HBaseTest \
      /opt/cloudera/parcels/CDH/lib/spark/examples/lib/spark-examples-1.1.0-cdh5.2.2-SNAPSHOT-hadoop2.5.0-cdh5.2.2-SNAPSHOT.jar \
      t1

Seems to me like you still have some wrong Spark build somewhere in your environment getting in the way.

On Thu, Jan 8, 2015 at 4:15 PM, freedafeng <freedaf...@yahoo.com> wrote:
> I ran the release Spark in CDH 5.3.0 but got the same error. Has anyone
> tried to run Spark in CDH 5.3.0 using its newAPIHadoopRDD?
>
> command:
>   spark-submit --master spark://master:7077 \
>     --jars /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar \
>     ./sparkhbase.py
>
> [ZooKeeper log and stack trace snipped; see the full message below.]

-- 
Marcelo
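For CDH 5.3.0, the same approach would presumably look like the following — an untested sketch that combines the classpath trick above with the 5.3.0 examples jar path quoted in this thread (parcel paths may differ on your install):

  HBASE='/opt/cloudera/parcels/CDH/lib/hbase/*'
  spark-submit --driver-class-path "$HBASE" \
      --conf spark.executor.extraClassPath="$HBASE" \
      --master yarn --class org.apache.spark.examples.HBaseTest \
      /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar \
      t1   # t1 = the name of an existing HBase table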
Re: correct/best way to install custom spark1.2 on cdh5.3.0?
I ran the release Spark in CDH 5.3.0 but got the same error. Has anyone tried to run Spark in CDH 5.3.0 using its newAPIHadoopRDD?

command:

  spark-submit --master spark://master:7077 \
      --jars /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar \
      ./sparkhbase.py

Error:

2015-01-09 00:02:03,344 INFO [Thread-2-SendThread(master:2181)] zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to master/10.191.41.253:2181, initiating session
2015-01-09 00:02:03,358 INFO [Thread-2-SendThread(master:2181)] zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1235)) - Session establishment complete on server master/10.191.41.253:2181, sessionid = 0x14acbdae7e60066, negotiated timeout = 6

Traceback (most recent call last):
  File "/root/workspace/test/./sparkhbase.py", line 23, in <module>
    conf=conf2)
  File "/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/python/pyspark/context.py", line 530, in newAPIHadoopRDD
    jconf, batchSize)
  File "/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
  at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:158)
  at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
  at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
  at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
  at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
  at org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:202)
  at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:500)
  at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
  at py4j.Gateway.invoke(Gateway.java:259)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:207)
  at java.lang.Thread.run(Thread.java:745)
Re: correct/best way to install custom spark1.2 on cdh5.3.0?
On Thu, Jan 8, 2015 at 3:33 PM, freedafeng <freedaf...@yahoo.com> wrote:
> I installed the custom Spark in standalone mode as usual. The master and
> slaves started successfully. However, I got an error when I ran a job. It
> seems to me from the error message that some library was compiled against
> hadoop1, while my Spark was compiled against hadoop2.

Is that using your build or the CDH build? It seems you have the wrong HBase dependency. I'd be surprised if the CDH build had that problem, since IIRC we don't even build HBase for hadoop 1 anymore.

Take a look at examples/pom.xml in Spark for an example; it has different profiles for dealing with the HBase builds for hadoop1 and hadoop2.

-- 
Marcelo
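For reference, the Spark 1.x "Building Spark" docs suggest flags along these lines for building against a CDH 5.3 Hadoop; a hedged sketch, since the exact profiles needed depend on how your HBase is set up:

  # Build Spark 1.2 against the CDH 5.3.0 Hadoop artifacts.
  # (-Phadoop-2.4 covers Hadoop 2.4 and later per the build docs.)
  mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.3.0 \
      -DskipTests clean package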
correct/best way to install custom spark1.2 on cdh5.3.0?
Could anyone share your experience on how to do this? I have created a cluster and installed CDH 5.3.0 on it with basically Core + HBase, but Cloudera installed and configured Spark in its parcels anyway. I'd like to install our custom Spark on this cluster to use the Hadoop and HBase services there. There could potentially be conflicts if this is not done correctly; library conflicts are what I worry about most.

I understand this is a special case, but if you know how to do it, please let me know. Thanks.
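For concreteness, one hedged way to lay this out — the paths and tarball name below are hypothetical, and nothing here touches the parcel-managed Spark:

  # Unpack the custom build outside the parcel tree.
  tar xzf spark-1.2.0-bin-custom.tgz -C /opt    # hypothetical tarball name

  # Point it at the cluster's existing client config so jobs reuse the
  # CDH-managed HDFS/HBase services.
  export HADOOP_CONF_DIR=/etc/hadoop/conf

  # Start the standalone master/workers from the custom directory only.
  /opt/spark-1.2.0-bin-custom/sbin/start-master.sh
  /opt/spark-1.2.0-bin-custom/sbin/start-slaves.sh   # reads conf/slaves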
Re: correct/best way to install custom spark1.2 on cdh5.3.0?
Disclaimer: CDH questions are better handled at cdh-us...@cloudera.org.

But the question I'd like to ask is: why do you need your own Spark build? What's wrong with CDH's Spark that it doesn't work for you?

On Thu, Jan 8, 2015 at 3:01 PM, freedafeng <freedaf...@yahoo.com> wrote:
> Could anyone share your experience on how to do this? I have created a
> cluster and installed CDH 5.3.0 on it with basically Core + HBase, but
> Cloudera installed and configured Spark in its parcels anyway. I'd like to
> install our custom Spark on this cluster to use the Hadoop and HBase
> services there. There could potentially be conflicts if this is not done
> correctly; library conflicts are what I worry about most.
>
> I understand this is a special case, but if you know how to do it, please
> let me know. Thanks.

-- 
Marcelo