Hi,

You should package your Spark application (written in Java) as a jar with all
the necessary dependencies. Running a plain jar with java -jar usually fails
like this because anything not bundled into the jar, including Spark SQL's
built-in data sources such as 'text', is missing from the runtime classpath.
You can create a Maven project in Eclipse and declare the dependencies in the
pom.xml file.

To build the jar with its dependencies and a main-class manifest entry (since
you are getting the ClassNotFoundException), your pom.xml should contain the
following in the build tag (the example main class is given in the mainClass
element):

<build>
        <plugins>
            <!-- download source code in Eclipse, best practice -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-eclipse-plugin</artifactId>
                <version>2.9</version>
                <configuration>
                    <downloadSources>true</downloadSources>
                    <downloadJavadocs>false</downloadJavadocs>
                </configuration>
            </plugin>
            <!-- Set a compiler level -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>${jdk.version}</source>
                    <target>${jdk.version}</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <configuration>
                    <shadeTestJar>true</shadeTestJar>
                </configuration>
            </plugin>
            <!-- Maven Assembly Plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <!-- get all project dependencies -->
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <!-- MainClass in manifest makes an executable jar -->
                    <archive>
                        <manifest>
                            <mainClass>com.example.RandomForest.SongPrediction</mainClass>
                        </manifest>
                    </archive>

                    <property>
                        <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
                        <value>true</value>
                    </property>

                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <!-- bind to the packaging phase -->
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
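
With this in place, you can build the fat jar from the project root. A minimal
sketch (the jar name follows Maven's <artifactId>-<version>-jar-with-dependencies
convention; here I am using the values from the attached pom.xml):

    mvn clean package
    # produces target/MillionSongsDatabase-0.0.1-SNAPSHOT-jar-with-dependencies.jar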


An example pom.xml is attached below for your reference. Feel free to
reuse it.
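
Once the fat jar is built, the usual way to launch a Spark application is
through spark-submit rather than plain java -jar. A sketch, assuming the main
class and artifact coordinates from the attached pom.xml and the input path
from your original command:

    spark-submit --class com.example.RandomForest.SongPredictionusingLinear --master local[*] target/MillionSongsDatabase-0.0.1-SNAPSHOT-jar-with-dependencies.jar Data/testfile.txt

Because the assembly also writes the Main-Class manifest entry and bundles
every dependency, running the same jar with java -jar should work as well in
local[*] mode.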


Regards,
_________________________________
*Md. Rezaul Karim,* BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html

On 7 December 2016 at 23:18, im281 <iman.mohtash...@gmail.com> wrote:

> Hello,
> I have a simple word count example in Java and I can run this in Eclipse
> (code at the bottom)
>
> I then create a jar file from it and try to run it from the cmd
>
>
> java -jar C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt
>
> But I get this error:
>
> I think the main error is:
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: text
>
> Any advice on how to run this jar file in Spark would be appreciated.
>
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 16/12/07 15:16:41 INFO SparkContext: Running Spark version 2.0.2
> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls to: Owner
> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls to: Owner
> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls groups to:
> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls groups to:
> 16/12/07 15:16:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Owner); groups with view permissions: Set(); users with modify permissions: Set(Owner); groups with modify permissions: Set()
> 16/12/07 15:16:44 INFO Utils: Successfully started service 'sparkDriver' on port 10211.
> 16/12/07 15:16:44 INFO SparkEnv: Registering MapOutputTracker
> 16/12/07 15:16:44 INFO SparkEnv: Registering BlockManagerMaster
> 16/12/07 15:16:44 INFO DiskBlockManager: Created local directory at C:\Users\Owner\AppData\Local\Temp\blockmgr-b4b1960b-08fc-44fd-a75e-1a0450556873
> 16/12/07 15:16:44 INFO MemoryStore: MemoryStore started with capacity 1984.5 MB
> 16/12/07 15:16:45 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/12/07 15:16:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 16/12/07 15:16:45 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.19.2:4040
> 16/12/07 15:16:45 INFO Executor: Starting executor ID driver on host localhost
> 16/12/07 15:16:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 10252.
> 16/12/07 15:16:45 INFO NettyBlockTransferService: Server created on 192.168.19.2:10252
> 16/12/07 15:16:45 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:45 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.19.2:10252 with 1984.5 MB RAM, BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:45 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:46 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
> 16/12/07 15:16:46 INFO SharedState: Warehouse path is 'file:/C:/Users/Owner/spark-warehouse'.
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: text. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
>         at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
>         at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
>         at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
>         at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>         at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:504)
>         at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:540)
>         at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:513)
>         at JavaWordCount.main(JavaWordCount.java:57)
> Caused by: java.lang.ClassNotFoundException: text.DefaultSource
>         at java.net.URLClassLoader.findClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>         at scala.util.Try$.apply(Try.scala:192)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>         at scala.util.Try.orElse(Try.scala:84)
>         at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
>         ... 8 more
> 16/12/07 15:16:46 INFO SparkContext: Invoking stop() from shutdown hook
> 16/12/07 15:16:46 INFO SparkUI: Stopped Spark web UI at http://192.168.19.2:4040
> 16/12/07 15:16:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 16/12/07 15:16:46 INFO MemoryStore: MemoryStore cleared
> 16/12/07 15:16:46 INFO BlockManager: BlockManager stopped
> 16/12/07 15:16:46 INFO BlockManagerMaster: BlockManagerMaster stopped
> 16/12/07 15:16:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 16/12/07 15:16:46 INFO SparkContext: Successfully stopped SparkContext
> 16/12/07 15:16:46 INFO ShutdownHookManager: Shutdown hook called
> 16/12/07 15:16:46 INFO ShutdownHookManager: Deleting directory C:\Users\Owner\AppData\Local\Temp\spark-dab2587b-a794-4947-ac13-d40056cf71d8
>
> C:\Users\Owner>
>
> public final class JavaWordCount {
>   private static final Pattern SPACE = Pattern.compile(" ");
>
>   public static void main(String[] args) throws Exception {
>
>     if (args.length < 1) {
>       System.err.println("Usage: JavaWordCount <file>");
>       System.exit(1);
>     }
>
>     // boilerplate needed to run locally
>     SparkConf conf = new SparkConf().setAppName("Word Count Application").setMaster("local[*]");
>     JavaSparkContext sc = new JavaSparkContext(conf);
>
>     SparkSession spark = SparkSession
>                         .builder()
>                         .appName("Word Count")
>                         .getOrCreate()
>                         .newSession();
>
>
>     JavaRDD<String> lines = spark.read().textFile(args[0]).javaRDD();
>
>
>     JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
>       @Override
>       public Iterator<String> call(String s) {
>         return Arrays.asList(SPACE.split(s)).iterator();
>       }
>     });
>
>     JavaPairRDD<String, Integer> ones = words.mapToPair(
>       new PairFunction<String, String, Integer>() {
>         @Override
>         public Tuple2<String, Integer> call(String s) {
>           return new Tuple2<>(s, 1);
>         }
>       });
>
>     JavaPairRDD<String, Integer> counts = ones.reduceByKey(
>       new Function2<Integer, Integer, Integer>() {
>         @Override
>         public Integer call(Integer i1, Integer i2) {
>           return i1 + i2;
>         }
>       });
>
>     List<Tuple2<String, Integer>> output = counts.collect();
>     for (Tuple2<?,?> tuple : output) {
>       System.out.println(tuple._1() + ": " + tuple._2());
>     }
>     spark.stop();
>   }
> }
>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.examples</groupId>
	<artifactId>MillionSongsDatabase</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>MillionSongsDatabase</name>
	<url>http://maven.apache.org</url>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<jdk.version>1.8</jdk.version>
		<spark.version>2.0.0</spark.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-sql_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-streaming_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.bahir</groupId>
			<artifactId>spark-streaming-twitter_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-mllib_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-hive_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-graphx_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-yarn_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-network-shuffle_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<!-- use the Scala 2.11 Kafka artifact so every module shares one Scala version -->
			<artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-streaming-flume_2.11</artifactId>
			<version>${spark.version}</version>
		</dependency>
		<dependency>
			<groupId>com.databricks</groupId>
			<artifactId>spark-csv_2.11</artifactId>
			<version>1.3.0</version>
		</dependency>
		<dependency>
			<groupId>mysql</groupId>
			<artifactId>mysql-connector-java</artifactId>
			<version>5.1.38</version>
		</dependency>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>3.8.1</version>
			<scope>test</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<!-- download source code in Eclipse, best practice -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-eclipse-plugin</artifactId>
				<version>2.9</version>
				<configuration>
					<downloadSources>true</downloadSources>
					<downloadJavadocs>false</downloadJavadocs>
				</configuration>
			</plugin>
			<!-- Set a compiler level -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.5.1</version>
				<configuration>
					<source>${jdk.version}</source>
					<target>${jdk.version}</target>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-shade-plugin</artifactId>
				<version>2.4.3</version>
				<configuration>
					<shadeTestJar>true</shadeTestJar>
				</configuration>
			</plugin>
			<!-- Maven Assembly Plugin -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-assembly-plugin</artifactId>
				<version>2.4.1</version>
				<configuration>
					<!-- get all project dependencies -->
					<descriptorRefs>
						<descriptorRef>jar-with-dependencies</descriptorRef>
					</descriptorRefs>
					<!-- MainClass in manifest makes an executable jar -->
					<archive>
						<manifest>
							<mainClass>com.example.RandomForest.SongPredictionusingLinear</mainClass>
						</manifest>
					</archive>

					<property>
						<name>oozie.launcher.mapreduce.job.user.classpath.first</name>
						<value>true</value>
					</property>

				</configuration>
				<executions>
					<execution>
						<id>make-assembly</id>
						<!-- bind to the packaging phase -->
						<phase>package</phase>
						<goals>
							<goal>single</goal>
						</goals>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>

</project>
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
