Re: A basic question
What is this spring-boot-starter-batch? Why do we need it?

Regards,
Shyam

On Mon, Jun 17, 2019 at 12:39 PM Deepak Sharma wrote:

> You can follow this example:
>
> https://docs.spring.io/spring-hadoop/docs/current/reference/html/springandhadoop-spark.html
>
> On Mon, Jun 17, 2019 at 12:27 PM Shyam P wrote:
>
>> I am developing a Spark job using Java 1.8.
>>
>> Is it possible to write a Spark app using Spring Boot technology?
>> Has anyone tried it? If so, how should it be done?
>>
>> Regards,
>> Shyam
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
Re: A basic question
Thank you so much, Deepak. Let me implement it and update you. Hope it works. Are there any shortcomings I need to consider or take care of?

Regards,
Shyam

On Mon, Jun 17, 2019 at 12:39 PM Deepak Sharma wrote:

> You can follow this example:
>
> https://docs.spring.io/spring-hadoop/docs/current/reference/html/springandhadoop-spark.html
>
> On Mon, Jun 17, 2019 at 12:27 PM Shyam P wrote:
>
>> I am developing a Spark job using Java 1.8.
>>
>> Is it possible to write a Spark app using Spring Boot technology?
>> Has anyone tried it? If so, how should it be done?
>>
>> Regards,
>> Shyam
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
Re: A basic question
You can follow this example:

https://docs.spring.io/spring-hadoop/docs/current/reference/html/springandhadoop-spark.html

On Mon, Jun 17, 2019 at 12:27 PM Shyam P wrote:

> I am developing a Spark job using Java 1.8.
>
> Is it possible to write a Spark app using Spring Boot technology?
> Has anyone tried it? If so, how should it be done?
>
> Regards,
> Shyam

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
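[Editor's illustration] The spring-hadoop page linked above drives Spark through Spring Batch tasklets, which appears to be why spring-boot-starter-batch comes up in this thread. For the simpler case Shyam asks about, here is a minimal sketch, assuming Spark 2.x and the Spring Boot starter are on the classpath; the class name, app name, input path, and master URL are illustrative assumptions, not taken from the linked example:

```java
// A minimal sketch, assuming Spark 2.x and spring-boot-starter on the classpath.
// All names here (SparkJobApplication, the app name, local[*], input.txt) are
// illustrative assumptions.
import org.apache.spark.sql.SparkSession;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class SparkJobApplication implements CommandLineRunner {

    public static void main(String[] args) {
        SpringApplication.run(SparkJobApplication.class, args);
    }

    @Override
    public void run(String... args) {
        // Spring Boot supplies configuration and wiring; Spark does the data work.
        SparkSession spark = SparkSession.builder()
                .appName("spring-boot-spark-sketch")
                .master("local[*]") // replace with your cluster master
                .getOrCreate();

        long count = spark.read().textFile("input.txt").count(); // hypothetical input
        System.out.println("lines: " + count);

        spark.stop();
    }
}
```

One caveat worth noting on the "shortcomings" question: Spring Boot's repackaged executable jar nests its dependency jars, which does not always interact well with spark-submit's classloading, so projects like this are often built as a flat shaded jar instead.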
RE: a basic question on first use of PySpark shell and example, which is failing
I guess I should also point out that I do an export CLASSPATH in my .bash_profile file, so the CLASSPATH info should be usable by the PySpark shell that I invoke.

Ron

Ronald C. Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568, email: ronald.tay...@pnnl.gov
web page: http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048

From: Taylor, Ronald C
Sent: Monday, February 29, 2016 2:57 PM
To: 'Yin Yang'; user@spark.apache.org
Cc: Jules Damji; ronald.taylo...@gmail.com; Taylor, Ronald C
Subject: RE: a basic question on first use of PySpark shell and example, which is failing

Hi Yin,

My CLASSPATH is set to:

CLASSPATH=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/*:/people/rtaylor/SparkWork/DataAlgUtils:.

And there is indeed a spark-core jar in the ../jars subdirectory, though it is not named precisely “spark-core.jar”; it has a version number in its name, as you can see:

[rtaylor@bigdatann ~]$ find /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars -name "spark-core*.jar"
/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/spark-core_2.10-1.5.0-cdh5.5.1.jar

I extracted the class names into a text file:

[rtaylor@bigdatann jars]$ jar tf spark-core_2.10-1.5.0-cdh5.5.1.jar > /people/rtaylor/SparkWork/jar_file_listing_of_spark-core_jar.txt

And then searched for RDDOperationScope. I found these classes:

[rtaylor@bigdatann SparkWork]$ grep RDDOperationScope jar_file_listing_of_spark-core_jar.txt
org/apache/spark/rdd/RDDOperationScope$$anonfun$5.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$3.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$4$$anonfun$apply$1.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$4.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$1.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$getAllScopes$2.class
org/apache/spark/rdd/RDDOperationScope$.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$getAllScopes$1.class
org/apache/spark/rdd/RDDOperationScope.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$2.class
[rtaylor@bigdatann SparkWork]$

It looks like the RDDOperationScope class is present. Shouldn’t that work?

Ron

Ronald C. Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568, email: ronald.tay...@pnnl.gov
web page: http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048

From: Yin Yang [mailto:yy201...@gmail.com]
Sent: Monday, February 29, 2016 2:27 PM
To: Taylor, Ronald C
Cc: Jules Damji; user@spark.apache.org; ronald.taylo...@gmail.com
Subject: Re: a basic question on first use of PySpark shell and example, which is failing

RDDOperationScope is in the spark-core_2.1x jar file:

7148 Mon Feb 29 09:21:32 PST 2016 org/apache/spark/rdd/RDDOperationScope.class

Can you check whether the spark-core jar is in the classpath? FYI

On Mon, Feb 29, 2016 at 1:40 PM, Taylor, Ronald C <ronald.tay...@pnnl.gov> wrote:

Hi Jules, folks,

I have tried “hdfs://<filepath>” as well as “file://<filepath>”, and several variants. Every time, I get the same msg, NoClassDefFoundError. See below. Why do I get such a msg if the problem is simply that Spark cannot find the text file? Doesn’t the error msg indicate some other source of the problem?

I may be missing something in the error report; I am a Java person, not a Python programmer. But doesn’t it look like a call to a Java class, something associated with “o9.textFile”, is failing? If so, how do I fix this?

Ron

File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
    return RDD(self._jsc.textFile(name, minPartitions), self,
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
    return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$

Ronald C. Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568, email: ronald.tay...@pnnl.gov
web page: http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048
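[Editor's illustration] A note on the trace quoted above: on the JVM, "NoClassDefFoundError: Could not initialize class X" means the class was found but its static initializer failed the first time it ran; it does not mean the class file is absent, which is consistent with Ron finding RDDOperationScope inside the jar. A frequent trigger is a version conflict among the many jars swept in by a wildcard CLASSPATH. Below is a hedged Java sketch for probing this; the class name ClasspathCheck is hypothetical, and it should be run with the same CLASSPATH the PySpark shell sees:

```java
// Hypothetical probe, not from the thread. It forces the Scala companion
// object class named in the error to initialize, and reports which jar
// supplied it, so duplicate or conflicting copies on a wildcard classpath
// become visible.
public class ClasspathCheck {
    public static void main(String[] args) throws Exception {
        // The "$" suffix is the Scala companion object class from the error message.
        String name = "org.apache.spark.rdd.RDDOperationScope$";
        // initialize = true runs the static initializer, reproducing a
        // "Could not initialize class" failure outside PySpark if one exists.
        Class<?> cls = Class.forName(name, true, ClasspathCheck.class.getClassLoader());
        System.out.println(name + " loaded from: "
                + cls.getProtectionDomain().getCodeSource().getLocation());
    }
}
```

If this small program throws the same error, the problem is purely on the Java side, and the file paths passed to sc.textFile() are a red herring.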
RE: a basic question on first use of PySpark shell and example, which is failing
Hi Yin,

My CLASSPATH is set to:

CLASSPATH=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/*:/people/rtaylor/SparkWork/DataAlgUtils:.

And there is indeed a spark-core jar in the ../jars subdirectory, though it is not named precisely “spark-core.jar”; it has a version number in its name, as you can see:

[rtaylor@bigdatann ~]$ find /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars -name "spark-core*.jar"
/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/spark-core_2.10-1.5.0-cdh5.5.1.jar

I extracted the class names into a text file:

[rtaylor@bigdatann jars]$ jar tf spark-core_2.10-1.5.0-cdh5.5.1.jar > /people/rtaylor/SparkWork/jar_file_listing_of_spark-core_jar.txt

And then searched for RDDOperationScope. I found these classes:

[rtaylor@bigdatann SparkWork]$ grep RDDOperationScope jar_file_listing_of_spark-core_jar.txt
org/apache/spark/rdd/RDDOperationScope$$anonfun$5.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$3.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$4$$anonfun$apply$1.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$4.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$1.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$getAllScopes$2.class
org/apache/spark/rdd/RDDOperationScope$.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$getAllScopes$1.class
org/apache/spark/rdd/RDDOperationScope.class
org/apache/spark/rdd/RDDOperationScope$$anonfun$2.class
[rtaylor@bigdatann SparkWork]$

It looks like the RDDOperationScope class is present. Shouldn’t that work?

Ron

Ronald C. Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568, email: ronald.tay...@pnnl.gov
web page: http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048

From: Yin Yang [mailto:yy201...@gmail.com]
Sent: Monday, February 29, 2016 2:27 PM
To: Taylor, Ronald C
Cc: Jules Damji; user@spark.apache.org; ronald.taylo...@gmail.com
Subject: Re: a basic question on first use of PySpark shell and example, which is failing

RDDOperationScope is in the spark-core_2.1x jar file:

7148 Mon Feb 29 09:21:32 PST 2016 org/apache/spark/rdd/RDDOperationScope.class

Can you check whether the spark-core jar is in the classpath? FYI

On Mon, Feb 29, 2016 at 1:40 PM, Taylor, Ronald C <ronald.tay...@pnnl.gov> wrote:

Hi Jules, folks,

I have tried “hdfs://<filepath>” as well as “file://<filepath>”, and several variants. Every time, I get the same msg, NoClassDefFoundError. See below. Why do I get such a msg if the problem is simply that Spark cannot find the text file? Doesn’t the error msg indicate some other source of the problem?

I may be missing something in the error report; I am a Java person, not a Python programmer. But doesn’t it look like a call to a Java class, something associated with “o9.textFile”, is failing? If so, how do I fix this?

Ron

File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
    return RDD(self._jsc.textFile(name, minPartitions), self,
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
    return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$

Ronald C. Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568, email: ronald.tay...@pnnl.gov
web page: http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048

From: Jules Damji [mailto:dmat...@comcast.net]
Sent: Sunday, February 28, 2016 10:07 PM
To: Taylor, Ronald C
Cc: user@spark.apache.org; ronald.taylo...@gmail.com
Subject: Re: a basic question on first use of PySpark shell and example, which is failing

Hello Ronald,

Since you have placed the file under HDFS, you might want to change the path name to:

val lines = sc.textFile("hdfs://user/taylor/Spark/Warehouse.java")

Sent from my iPhone
Pardon the dumb thumb typos :)

On Feb 28, 2016, at 9:36 PM, Taylor, Ronald C <ronald.tay...@pnnl.gov> wrote:

Hello folks,

I am a newbie, and am running Spark on a small Cloudera CDH 5.5.1 cluster at our lab. I am trying to use the PySpark shell for the first time, and am attempting to duplicate the documentation example of creating an RDD, which I called "lines", using a text file.
Re: a basic question on first use of PySpark shell and example, which is failing
RDDOperationScope is in the spark-core_2.1x jar file:

7148 Mon Feb 29 09:21:32 PST 2016 org/apache/spark/rdd/RDDOperationScope.class

Can you check whether the spark-core jar is in the classpath? FYI

On Mon, Feb 29, 2016 at 1:40 PM, Taylor, Ronald C wrote:

> Hi Jules, folks,
>
> I have tried “hdfs://<filepath>” as well as “file://<filepath>”, and
> several variants. Every time, I get the same msg, NoClassDefFoundError.
> See below. Why do I get such a msg if the problem is simply that Spark
> cannot find the text file? Doesn’t the error msg indicate some other
> source of the problem?
>
> I may be missing something in the error report; I am a Java person, not a
> Python programmer. But doesn’t it look like a call to a Java class,
> something associated with “o9.textFile”, is failing? If so, how do I fix
> this?
>
> Ron
>
> File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
>     return RDD(self._jsc.textFile(name, minPartitions), self,
> File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
> File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
>     return f(*a, **kw)
> File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
> : java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
>
> Ronald C. Taylor, Ph.D.
> Computational Biology & Bioinformatics Group
> Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
> Richland, WA 99352
> phone: (509) 372-6568, email: ronald.tay...@pnnl.gov
> web page: http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048
>
> From: Jules Damji [mailto:dmat...@comcast.net]
> Sent: Sunday, February 28, 2016 10:07 PM
> To: Taylor, Ronald C
> Cc: user@spark.apache.org; ronald.taylo...@gmail.com
> Subject: Re: a basic question on first use of PySpark shell and example, which is failing
>
> Hello Ronald,
>
> Since you have placed the file under HDFS, you might want to change the
> path name to:
>
> val lines = sc.textFile("hdfs://user/taylor/Spark/Warehouse.java")
>
> Sent from my iPhone
> Pardon the dumb thumb typos :)
>
> On Feb 28, 2016, at 9:36 PM, Taylor, Ronald C wrote:
>
> Hello folks,
>
> I am a newbie, and am running Spark on a small Cloudera CDH 5.5.1 cluster
> at our lab. I am trying to use the PySpark shell for the first time, and am
> attempting to duplicate the documentation example of creating an RDD,
> which I called "lines", using a text file.
>
> I placed a text file called Warehouse.java in this HDFS location:
>
> [rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
> -rw-r--r--   3 rtaylor supergroup    1155355 2016-02-28 18:09 /user/rtaylor/Spark/Warehouse.java
> [rtaylor@bigdatann ~]$
>
> I then invoked sc.textFile() in the PySpark shell. That did not work. See
> below. Apparently a class is not found? I don't know why that would be the
> case. Any guidance would be very much appreciated.
>
> The Cloudera Manager for the cluster says that Spark is operating in the
> "green", for whatever that is worth.
>
> - Ron Taylor
>
> >>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
>     return RDD(self._jsc.textFile(name, minPartitions), self,
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
>     return f(*a, **kw)
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
> : java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
RE: a basic question on first use of PySpark shell and example, which is failing
Hi Jules, folks,

I have tried “hdfs://<filepath>” as well as “file://<filepath>”, and several variants. Every time, I get the same msg, NoClassDefFoundError. See below. Why do I get such a msg if the problem is simply that Spark cannot find the text file? Doesn’t the error msg indicate some other source of the problem?

I may be missing something in the error report; I am a Java person, not a Python programmer. But doesn’t it look like a call to a Java class, something associated with “o9.textFile”, is failing? If so, how do I fix this?

Ron

File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
    return RDD(self._jsc.textFile(name, minPartitions), self,
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
    return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$

Ronald C. Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568, email: ronald.tay...@pnnl.gov
web page: http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048

From: Jules Damji [mailto:dmat...@comcast.net]
Sent: Sunday, February 28, 2016 10:07 PM
To: Taylor, Ronald C
Cc: user@spark.apache.org; ronald.taylo...@gmail.com
Subject: Re: a basic question on first use of PySpark shell and example, which is failing

Hello Ronald,

Since you have placed the file under HDFS, you might want to change the path name to:

val lines = sc.textFile("hdfs://user/taylor/Spark/Warehouse.java")

Sent from my iPhone
Pardon the dumb thumb typos :)

On Feb 28, 2016, at 9:36 PM, Taylor, Ronald C <ronald.tay...@pnnl.gov> wrote:

Hello folks,

I am a newbie, and am running Spark on a small Cloudera CDH 5.5.1 cluster at our lab. I am trying to use the PySpark shell for the first time, and am attempting to duplicate the documentation example of creating an RDD, which I called "lines", using a text file.

I placed a text file called Warehouse.java in this HDFS location:

[rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
-rw-r--r--   3 rtaylor supergroup    1155355 2016-02-28 18:09 /user/rtaylor/Spark/Warehouse.java
[rtaylor@bigdatann ~]$

I then invoked sc.textFile() in the PySpark shell. That did not work. See below. Apparently a class is not found? I don't know why that would be the case. Any guidance would be very much appreciated.

The Cloudera Manager for the cluster says that Spark is operating in the "green", for whatever that is worth.

- Ron Taylor

>>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
    return RDD(self._jsc.textFile(name, minPartitions), self,
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
    at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

>>>
Re: a basic question on first use of PySpark shell and example, which is failing
Hello Ronald,

Since you have placed the file under HDFS, you might want to change the path name to:

val lines = sc.textFile("hdfs://user/taylor/Spark/Warehouse.java")

Sent from my iPhone
Pardon the dumb thumb typos :)

> On Feb 28, 2016, at 9:36 PM, Taylor, Ronald C wrote:
>
> Hello folks,
>
> I am a newbie, and am running Spark on a small Cloudera CDH 5.5.1 cluster
> at our lab. I am trying to use the PySpark shell for the first time, and am
> attempting to duplicate the documentation example of creating an RDD,
> which I called "lines", using a text file.
>
> I placed a text file called Warehouse.java in this HDFS location:
>
> [rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
> -rw-r--r--   3 rtaylor supergroup    1155355 2016-02-28 18:09 /user/rtaylor/Spark/Warehouse.java
> [rtaylor@bigdatann ~]$
>
> I then invoked sc.textFile() in the PySpark shell. That did not work. See
> below. Apparently a class is not found? I don't know why that would be the
> case. Any guidance would be very much appreciated.
>
> The Cloudera Manager for the cluster says that Spark is operating in the
> "green", for whatever that is worth.
>
> - Ron Taylor
>
> >>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
>     return RDD(self._jsc.textFile(name, minPartitions), self,
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
>     return f(*a, **kw)
>   File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
> : java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
>     at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
>     at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
>     at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>     at py4j.Gateway.invoke(Gateway.java:259)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>     at java.lang.Thread.run(Thread.java:745)
>
> >>>
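[Editor's illustration] One detail worth flagging in the suggestion above: in a Hadoop URI, the component immediately after "//" is the authority (the namenode host and port), so "hdfs://user/taylor/..." would be parsed as a namenode named "user" rather than a path starting with /user. An absolute HDFS path needs three slashes, or an explicit host. Below is a hedged Java sketch of the URI forms; the namenode host, port, and local path are assumptions, not details from this thread:

```java
// Sketch of Hadoop path URI forms (host, port, and paths are assumptions).
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TextFileUriDemo {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("uri-demo").setMaster("local[*]"));

        // hdfs:///path -- empty authority; the default namenode from
        // core-site.xml (fs.defaultFS) is used.
        JavaRDD<String> viaDefault =
                sc.textFile("hdfs:///user/rtaylor/Spark/Warehouse.java");

        // hdfs://host:port/path -- explicit namenode ("namenode:8020" is hypothetical).
        JavaRDD<String> viaExplicit =
                sc.textFile("hdfs://namenode:8020/user/rtaylor/Spark/Warehouse.java");

        // file:///path -- the LOCAL filesystem of each node, not HDFS, which is
        // why the original "file:///user/taylor/..." call could not have reached
        // a file that was uploaded to HDFS.
        JavaRDD<String> viaLocal = sc.textFile("file:///tmp/Warehouse.java");

        // textFile() is lazy; only this count() actually reads viaDefault.
        System.out.println(viaDefault.count());
        sc.stop();
    }
}
```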