Re: correct/best way to install custom spark1.2 on cdh5.3.0?

2015-01-08 Thread freedafeng
I installed the custom Spark in standalone mode as normal. The master and
slaves started successfully.
However, I got an error when I ran a job. It seems to me from the error
message that some library was compiled against hadoop1, but my Spark was
compiled against hadoop2.

15/01/08 23:27:36 INFO ClientCnxn: Opening socket connection to server
master/10.191.41.253:2181. Will not attempt to authenticate using SASL
(unknown error)
15/01/08 23:27:36 INFO ClientCnxn: Socket connection established to
master/10.191.41.253:2181, initiating session
15/01/08 23:27:36 INFO ClientCnxn: Session establishment complete on server
master/10.191.41.253:2181, sessionid = 0x14acbdae7e60022, negotiated timeout
= 6
Traceback (most recent call last):
  File "/root/workspace/test/sparkhbase.py", line 23, in <module>
    conf=conf2)
  File "/root/spark/python/pyspark/context.py", line 530, in newAPIHadoopRDD
    jconf, batchSize)
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.JobContext, but class was expected
at
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:157)
at 
org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
at
org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:202)
at
org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:500)
at 
org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)

If I understand correctly, org.apache.hadoop.mapreduce.JobContext is a class
in hadoop1 but an interface in hadoop2. My question is: which library could be
causing this problem?
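
For reference, the failing call (line 23 of sparkhbase.py) is a PySpark
newAPIHadoopRDD call against the HBase TableInputFormat. Below is a minimal
sketch of what such a call typically looks like; the table name, ZooKeeper
quorum, and converter classes are placeholders, not necessarily what the
actual script uses. The getSplits() failure is triggered as soon as the first
action computes the RDD's partitions.

from pyspark import SparkContext

# Minimal sketch only -- "test_table" and the ZooKeeper quorum are placeholders.
# The converter classes below ship with the spark-examples jar.
sc = SparkContext(appName="sparkhbase")

conf2 = {
    "hbase.zookeeper.quorum": "master",
    "hbase.mapreduce.inputtable": "test_table",
}

rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
    conf=conf2)

print(rdd.first())  # this is where TableInputFormatBase.getSplits() fails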

Thanks.







Re: correct/best way to install custom spark1.2 on cdh5.3.0?

2015-01-08 Thread Marcelo Vanzin
I ran this with CDH 5.2 without a problem (sorry, I don't have 5.3
readily available at the moment):

$ HBASE='/opt/cloudera/parcels/CDH/lib/hbase/\*'
$ spark-submit --driver-class-path $HBASE --conf
spark.executor.extraClassPath=$HBASE --master yarn --class
org.apache.spark.examples.HBaseTest
/opt/cloudera/parcels/CDH/lib/spark/examples/lib/spark-examples-1.1.0-cdh5.2.2-SNAPSHOT-hadoop2.5.0-cdh5.2.2-SNAPSHOT.jar
t1

Seems to me like you still have a wrong Spark build somewhere in
your environment getting in the way.
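
The same classpath idea can be carried over to the PySpark job. Here is a
hedged sketch, assuming the CDH parcel path from the command above (it may
differ per install): the executor classpath can generally be set from
SparkConf, but the driver classpath has to be passed on the spark-submit
command line because the driver JVM is already running by the time the Python
code executes.

from pyspark import SparkConf, SparkContext

# Sketch only: put the CDH HBase jars on the executor classpath. The parcel
# path is taken from the spark-submit example above and may differ per install.
hbase_cp = "/opt/cloudera/parcels/CDH/lib/hbase/*"

conf = (SparkConf()
        .setAppName("sparkhbase")
        .set("spark.executor.extraClassPath", hbase_cp))

# spark.driver.extraClassPath cannot usefully be set here; pass it instead as
#   spark-submit --driver-class-path '/opt/cloudera/parcels/CDH/lib/hbase/*' sparkhbase.py
sc = SparkContext(conf=conf)

Whether the error then disappears depends on which HBase jars actually win on
the classpath; with a custom build, make sure no hadoop1-compiled HBase jars
are bundled into the Spark assembly or shipped via --jars.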


On Thu, Jan 8, 2015 at 4:15 PM, freedafeng freedaf...@yahoo.com wrote:
 I ran the release Spark in cdh5.3.0 but got the same error. Has anyone tried
 to run Spark in cdh5.3.0 using its newAPIHadoopRDD?

 command:
 spark-submit --master spark://master:7077 --jars
 /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar
 ./sparkhbase.py

 Error.

 2015-01-09 00:02:03,344 INFO  [Thread-2-SendThread(master:2181)]
 zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket
 connection established to master/10.191.41.253:2181, initiating session
 2015-01-09 00:02:03,358 INFO  [Thread-2-SendThread(master:2181)]
 zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1235)) - Session
 establishment complete on server master/10.191.41.253:2181, sessionid =
 0x14acbdae7e60066, negotiated timeout = 6
 Traceback (most recent call last):
   File "/root/workspace/test/./sparkhbase.py", line 23, in <module>
     conf=conf2)
   File "/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/python/pyspark/context.py", line 530, in newAPIHadoopRDD
     jconf, batchSize)
   File "/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
   File "/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
 py4j.protocol.Py4JJavaError: An error occurred while calling
 z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
 : java.lang.IncompatibleClassChangeError: Found interface
 org.apache.hadoop.mapreduce.JobContext, but class was expected
 at
 org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:158)
 at 
 org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
 at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
 at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
 at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
 at
 org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:202)
 at
 org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:500)
 at 
 org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
 at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
 at py4j.Gateway.invoke(Gateway.java:259)
 at 
 py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
 at py4j.commands.CallCommand.execute(CallCommand.java:79)
 at py4j.GatewayConnection.run(GatewayConnection.java:207)
 at java.lang.Thread.run(Thread.java:745)








-- 
Marcelo




Re: correct/best way to install custom spark1.2 on cdh5.3.0?

2015-01-08 Thread freedafeng
I ran the release Spark in cdh5.3.0 but got the same error. Has anyone tried
to run Spark in cdh5.3.0 using its newAPIHadoopRDD?

command: 
spark-submit --master spark://master:7077 --jars
/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar
./sparkhbase.py

Error.

2015-01-09 00:02:03,344 INFO  [Thread-2-SendThread(master:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket
connection established to master/10.191.41.253:2181, initiating session
2015-01-09 00:02:03,358 INFO  [Thread-2-SendThread(master:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1235)) - Session
establishment complete on server master/10.191.41.253:2181, sessionid =
0x14acbdae7e60066, negotiated timeout = 6
Traceback (most recent call last):
  File "/root/workspace/test/./sparkhbase.py", line 23, in <module>
    conf=conf2)
  File "/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/python/pyspark/context.py", line 530, in newAPIHadoopRDD
    jconf, batchSize)
  File "/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.JobContext, but class was expected
at
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:158)
at 
org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
at
org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:202)
at
org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:500)
at 
org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)







Re: correct/best way to install custom spark1.2 on cdh5.3.0?

2015-01-08 Thread Marcelo Vanzin
On Thu, Jan 8, 2015 at 3:33 PM, freedafeng freedaf...@yahoo.com wrote:
 I installed the custom Spark in standalone mode as normal. The master and
 slaves started successfully.
 However, I got an error when I ran a job. It seems to me from the error
 message that some library was compiled against hadoop1, but my Spark was
 compiled against hadoop2.

Is that using your build or the CDH build?

It seems you have the wrong HBase dependency. I'd be surprised if the
CDH build had that problem, since IIRC we don't even build HBase for
hadoop 1 anymore.

Take a look at examples/pom.xml in Spark for an example, which has
different profiles for dealing with the HBase builds for hadoop1 and
hadoop2.

-- 
Marcelo
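
One way to check which jar is actually at fault: judging from the traceback,
TableInputFormatBase.getSplits() fails in the driver JVM (partitions are
computed before any task runs), so the hadoop1-built HBase jar must be on the
driver's classpath. A small diagnostic sketch, not from this thread, which
leans on the private sc._jvm py4j handle, so treat it as a debugging hack:

from pyspark import SparkContext

# Print which jar the driver JVM loaded the HBase input format from.
# A ClassNotFoundException here would mean the driver cannot see HBase at all.
sc = SparkContext(appName="which-hbase-jar")

cls = sc._jvm.java.lang.Class.forName(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormatBase")
src = cls.getProtectionDomain().getCodeSource()
print(src.getLocation().toString() if src is not None else "bootstrap classpath")

If the printed path points at a jar built against hadoop1 (for example an old
hbase artifact with a hadoop1 classifier), that is the library to replace.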




correct/best way to install custom spark1.2 on cdh5.3.0?

2015-01-08 Thread freedafeng
Could anyone share your experience on how to do this?

I have created a cluster and installed cdh5.3.0 on it with basically core +
HBase, but Cloudera installed and configured Spark in its parcels anyway. I'd
like to install our custom Spark on this cluster to use the Hadoop and HBase
services there. There could potentially be conflicts if this is not done
correctly; library conflicts are what I worry about most.

I understand this is a special case, but if you know how to do it, please
let me know. Thanks.






Re: correct/best way to install custom spark1.2 on cdh5.3.0?

2015-01-08 Thread Marcelo Vanzin
Disclaimer: CDH questions are better handled at cdh-us...@cloudera.org.

But the question I'd like to ask is: why do you need your own Spark
build? What's wrong with CDH's Spark that it doesn't work for you?

On Thu, Jan 8, 2015 at 3:01 PM, freedafeng freedaf...@yahoo.com wrote:
 Could anyone share your experience on how to do this?

 I have created a cluster and installed cdh5.3.0 on it with basically core +
 HBase, but Cloudera installed and configured Spark in its parcels anyway. I'd
 like to install our custom Spark on this cluster to use the Hadoop and HBase
 services there. There could potentially be conflicts if this is not done
 correctly; library conflicts are what I worry about most.

 I understand this is a special case, but if you know how to do it, please
 let me know. Thanks.







-- 
Marcelo
