spark 1.2 defaults to MR1 class when calling newAPIHadoopRDD

2015-01-07 Thread Antony Mayi

Hi,
I am using newAPIHadoopRDD to load an RDD from HBase (using pyspark running as
yarn-client), pretty much the standard case demonstrated in
hbase_inputformat.py from the examples. The thing is that when trying the very
same code on Spark 1.2 I am getting the error below, which, based on similar
cases on other forums, suggests an incompatibility between MR1 and MR2.
Why would this now start happening? Is it due to some change in resolving
the classpath, which now picks up MR2 jars first while before it was MR1?
Is there any workaround for this?
thanks,
Antony.
the error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:158)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
    at org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:202)
    at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:500)
    at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

Re: spark 1.2 defaults to MR1 class when calling newAPIHadoopRDD

2015-01-07 Thread Sean Owen

Problems like this are always due to having code compiled for Hadoop 1.x
run against Hadoop 2.x, or vice versa. Here, you compiled for 1.x but at
runtime Hadoop 2.x is used.

A common cause is actually bundling Spark / Hadoop classes with your app,
when the app should just use the Spark / Hadoop provided by the cluster. It
could also be that you're pairing Spark compiled for Hadoop 1.x with a 2.x
cluster.
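
One way to check for the bundling problem described above is to scan the
jars on the classpath and list every jar that ships its own copy of a
suspect class. The sketch below is only an illustration: the helper name
jars_containing is hypothetical, and it assumes the classpath entries are
plain jar files that can be read from the machine running the script.

```python
import sys
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_paths):
    """Return the jars (in the order given) that bundle class_name.

    class_name is a fully qualified Java class name, for example
    "org.apache.hadoop.hbase.mapreduce.TableInputFormatBase".
    Paths that are missing or are not zip archives are skipped.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in jar_paths:
        path = Path(jar)
        if not path.is_file() or not zipfile.is_zipfile(path):
            continue
        with zipfile.ZipFile(path) as zf:
            if entry in zf.namelist():
                hits.append(str(path))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_class.py <class> <jar> [<jar> ...]
    if len(sys.argv) >= 3:
        for jar in jars_containing(sys.argv[1], sys.argv[2:]):
            print(jar)
```

If more than one jar is reported for the same Hadoop or HBase class, the
copies may have been compiled against different Hadoop versions, which is
the kind of mismatch that can surface as IncompatibleClassChangeError.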


Re: spark 1.2 defaults to MR1 class when calling newAPIHadoopRDD

2015-01-07 Thread Shixiong Zhu

I have not used CDH 5.3.0, but it looks like
spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar contains some
hadoop1 jars (coming from a wrong hbase version).

I don't know the recommended way to build the spark-examples jar because the
official Spark docs do not mention how to build it. For
me, I would add -Dhbase.profile=hadoop2 to the build command so that
the examples project uses a hadoop2-compatible hbase.

Best Regards,
Shixiong Zhu


Re: spark 1.2 defaults to MR1 class when calling newAPIHadoopRDD

2015-01-07 Thread Sean Owen

Yes, the distribution is certainly fine and built for Hadoop 2. It sounds
like you are inadvertently including Spark code compiled for Hadoop 1 when
you run your app. The general idea is to use the cluster's copy at runtime.
Those with more pyspark experience might be able to give more useful
directions about how to fix that.


Re: spark 1.2 defaults to MR1 class when calling newAPIHadoopRDD

2015-01-07 Thread Antony Mayi

thanks, I found the issue: I was including
/usr/lib/spark/lib/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar in
the classpath, and this was breaking it. Now I am using a custom jar with just
the python converters and all works like a charm.
thanks,
antony.
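
The thread does not say how the custom jar was built. As one possible
approach, the sketch below copies only the converter classes out of the
examples jar into a new, smaller jar. The helper name extract_package is
hypothetical, and the package path
org/apache/spark/examples/pythonconverters/ is an assumption based on the
converter class names used by hbase_inputformat.py; check it against the
actual jar contents before relying on it.

```python
import zipfile

# Assumed location of the Python converter classes inside the examples jar.
CONVERTER_PREFIX = "org/apache/spark/examples/pythonconverters/"


def extract_package(src_jar, dst_jar, prefix=CONVERTER_PREFIX):
    """Write a new jar at dst_jar holding only src_jar entries under prefix.

    Returns the list of entry names that were copied. Everything else in
    src_jar (including any bundled Hadoop or HBase classes) is left out.
    """
    copied = []
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Skip directory entries; only class and resource files are copied.
            if info.filename.startswith(prefix) and not info.is_dir():
                dst.writestr(info.filename, src.read(info.filename))
                copied.append(info.filename)
    return copied
```

The resulting jar would then be put on the classpath in place of the full
examples jar, so that the HBase and Hadoop classes come from the cluster's
own installation.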


Re: spark 1.2 defaults to MR1 class when calling newAPIHadoopRDD

2015-01-07 Thread Antony Mayi

This is the official Cloudera-compiled stack, CDH 5.3.0. Nothing has been done
by me, and I presume they are pretty good at building it, so I still suspect it
now gets the classpath resolved in a different way?
thx,
Antony.

