Re: pyspark cassandra examples
Thanks, that worked! I downloaded the version pre-built against hadoop1 and the examples worked.

- David

On Tue, Sep 30, 2014 at 5:08 PM, Kan Zhang wrote:
> Most likely it is the Hadoop 1 vs Hadoop 2 issue. The example was given
> for Hadoop 1 (default Hadoop version for Spark). You may try to set the
> output format class in conf for Hadoop 2, or recompile your Spark with
> Hadoop 1.
Re: pyspark cassandra examples
> java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected

Most likely it is the Hadoop 1 vs Hadoop 2 issue. The example was given for Hadoop 1 (default Hadoop version for Spark). You may try to set the output format class in conf for Hadoop 2, or recompile your Spark with Hadoop 1.

On Tue, Sep 30, 2014 at 11:37 AM, David Vincelli <david.vince...@vantageanalytics.com> wrote:
> I've been trying to get the cassandra_inputformat.py and
> cassandra_outputformat.py examples running for the past half day. [...]
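The first suggestion, setting the output format class in conf for Hadoop 2, can be sketched as follows. The keys and class names below are patterned on the conf dict built in cassandra_outputformat.py, while the host, port, and keyspace values are placeholders, so treat the details as assumptions rather than a drop-in fix:

```python
# Sketch of a conf dict for saveAsNewAPIHadoopDataset against Hadoop 2.
# Keys and class names mirror those used in the Spark example
# cassandra_outputformat.py; the address, port, and keyspace are placeholders.
conf = {
    # Hadoop 2 spells this key mapreduce.job.outputformat.class; the
    # deprecated Hadoop 1 spelling (seen in the log) is
    # mapreduce.outputformat.class.
    "mapreduce.job.outputformat.class":
        "org.apache.cassandra.hadoop.cql3.CqlOutputFormat",
    "cassandra.output.thrift.address": "localhost",
    "cassandra.output.thrift.port": "9160",
    "cassandra.output.keyspace": "test_ks",
}

# With a live SparkContext and Cassandra node, the write would then be:
# rdd.saveAsNewAPIHadoopDataset(
#     conf=conf,
#     keyConverter="org.apache.spark.examples.pythonconverters.ToCassandraCQLKeyConverter",
#     valueConverter="org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter")
```

Note that conf tweaks alone may not be enough here: the IncompatibleClassChangeError is a binary incompatibility between the Cassandra Hadoop classes (compiled against Hadoop 1, where JobContext was a class) and the Hadoop 2 jars on the classpath (where it is an interface), which is why matching the Spark build to Hadoop 1 is the more reliable route.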
pyspark cassandra examples
I've been trying to get the cassandra_inputformat.py and cassandra_outputformat.py examples running for the past half day. I am running cassandra21 community from datastax on a single node (in my dev environment) with spark-1.1.0-bin-hadoop2.4.

I can connect and use cassandra via cqlsh and I can run the pyspark computation of pi job.

Unfortunately, I cannot run the cassandra_inputformat and cassandra_outputformat examples successfully.

This is the output I am getting now:

14/09/30 18:15:41 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@dev:40208/user/HeartbeatReceiver
14/09/30 18:15:42 INFO deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/09/30 18:15:43 INFO Converter: Loaded converter: org.apache.spark.examples.pythonconverters.ToCassandraCQLKeyConverter
14/09/30 18:15:43 INFO Converter: Loaded converter: org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter
Traceback (most recent call last):
  File "/opt/spark-1.1.0-bin-hadoop2.4/examples/src/main/python/cassandra_outputformat.py", line 83, in <module>
    valueConverter="org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter")
  File "/opt/spark-1.1.0-bin-hadoop2.4/python/pyspark/rdd.py", line 1184, in saveAsNewAPIHadoopDataset
    keyConverter, valueConverter, True)
  File "/opt/spark-1.1.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/spark-1.1.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
  at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.checkOutputSpecs(AbstractColumnFamilyOutputFormat.java:75)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900)
  at org.apache.spark.api.python.PythonRDD$.saveAsHadoopDataset(PythonRDD.scala:687)
  at org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset(PythonRDD.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
  at py4j.Gateway.invoke(Gateway.java:259)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:207)
  at java.lang.Thread.run(Thread.java:745)

Should I have built a custom spark assembly? Am I missing a cassandra driver? I have browsed through the documentation and found nothing specifically relevant to cassandra. Is there such a piece of documentation?

Thank you,

- David
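For context, the call that fails (line 83 of the example) is the saveAsNewAPIHadoopDataset at the end of a write path shaped roughly like the sketch below. The keyspace, column, and sample values here are illustrative stand-ins, not the example's exact data:

```python
# Rough shape of the write path in cassandra_outputformat.py (assumption:
# names and values are illustrative). The key is a dict of partition-key
# columns and the value is a list of CQL parameters; these are what the
# ToCassandraCQLKeyConverter / ToCassandraCQLValueConverter classes named
# in the traceback convert on the JVM side.
key = {"user_id": "david"}       # maps to the CQL partition key
value = ["David", "Vincelli"]    # bound to the ?s of the configured CQL UPDATE
rows = [(key, value)]

# The Spark call itself needs a running SparkContext, a conf dict, and a
# reachable Cassandra node, so it is shown commented out:
# sc.parallelize(rows).saveAsNewAPIHadoopDataset(
#     conf=conf,
#     keyConverter="org.apache.spark.examples.pythonconverters.ToCassandraCQLKeyConverter",
#     valueConverter="org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter")
```

The error surfaces inside checkOutputSpecs before any rows are written, which is consistent with a classpath-level Hadoop 1 vs Hadoop 2 mismatch rather than a problem with the data itself.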