Hi,

You should package your Spark application (written in Java) as a jar that contains all of its dependencies. You can create a Maven project in Eclipse and declare the dependencies in the pom.xml file.
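As a side note, the specific error "Failed to find data source: text" often appears when several jars are merged into one and the META-INF/services registry files (which Spark SQL uses to discover data sources) overwrite each other. If you build the uber jar with the maven-shade-plugin, adding the ServicesResourceTransformer is a common fix for that; the snippet below is a sketch (the plugin version matches the one used elsewhere in this thread):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <configuration>
    <transformers>
      <!-- Merge META-INF/services entries instead of letting them overwrite each other -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    </transformers>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>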
For building the jar with its dependencies and a main class (you need one, since you are getting the ClassNotFoundException), your pom.xml should contain the following in the <build> tag (the example main class here is com.example.RandomForest.SongPrediction):

<build>
  <plugins>
    <!-- Download source code in Eclipse; best practice -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-eclipse-plugin</artifactId>
      <version>2.9</version>
      <configuration>
        <downloadSources>true</downloadSources>
        <downloadJavadocs>false</downloadJavadocs>
      </configuration>
    </plugin>
    <!-- Set a compiler level -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.5.1</version>
      <configuration>
        <source>${jdk.version}</source>
        <target>${jdk.version}</target>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.3</version>
      <configuration>
        <shadeTestJar>true</shadeTestJar>
      </configuration>
    </plugin>
    <!-- Maven Assembly Plugin -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.4.1</version>
      <configuration>
        <!-- get all project dependencies -->
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <!-- mainClass in the manifest makes an executable jar -->
        <archive>
          <manifest>
            <mainClass>com.example.RandomForest.SongPrediction</mainClass>
          </manifest>
        </archive>
        <property>
          <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
          <value>true</value>
        </property>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <!-- bind to the packaging phase -->
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

An example pom.xml file is attached for your reference. Feel free to reuse it.

Regards,
_________________________________
Md.
Rezaul Karim, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html

On 7 December 2016 at 23:18, im281 <iman.mohtash...@gmail.com> wrote:
> Hello,
> I have a simple word count example in Java and I can run it in Eclipse
> (code at the bottom).
>
> I then create a jar file from it and try to run it from the cmd:
>
> java -jar C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt
>
> But I get this error. I think the main error is:
>
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: text
>
> Any advice on how to run this jar file in Spark would be appreciated.
>
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 16/12/07 15:16:41 INFO SparkContext: Running Spark version 2.0.2
> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls to: Owner
> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls to: Owner
> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls groups to:
> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls groups to:
> 16/12/07 15:16:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Owner); groups with view permissions: Set(); users with modify permissions: Set(Owner); groups with modify permissions: Set()
> 16/12/07 15:16:44 INFO Utils: Successfully started service 'sparkDriver' on port 10211.
> 16/12/07 15:16:44 INFO SparkEnv: Registering MapOutputTracker
> 16/12/07 15:16:44 INFO SparkEnv: Registering BlockManagerMaster
> 16/12/07 15:16:44 INFO DiskBlockManager: Created local directory at C:\Users\Owner\AppData\Local\Temp\blockmgr-b4b1960b-08fc-44fd-a75e-1a0450556873
> 16/12/07 15:16:44 INFO MemoryStore: MemoryStore started with capacity 1984.5 MB
> 16/12/07 15:16:45 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/12/07 15:16:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 16/12/07 15:16:45 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.19.2:4040
> 16/12/07 15:16:45 INFO Executor: Starting executor ID driver on host localhost
> 16/12/07 15:16:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 10252.
> 16/12/07 15:16:45 INFO NettyBlockTransferService: Server created on 192.168.19.2:10252
> 16/12/07 15:16:45 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:45 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.19.2:10252 with 1984.5 MB RAM, BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:45 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:46 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
> 16/12/07 15:16:46 INFO SharedState: Warehouse path is 'file:/C:/Users/Owner/spark-warehouse'.
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: text. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
>         at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
>         at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
>         at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
>         at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>         at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:504)
>         at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:540)
>         at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:513)
>         at JavaWordCount.main(JavaWordCount.java:57)
> Caused by: java.lang.ClassNotFoundException: text.DefaultSource
>         at java.net.URLClassLoader.findClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>         at scala.util.Try$.apply(Try.scala:192)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>         at scala.util.Try.orElse(Try.scala:84)
>         at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
>         ... 8 more
> 16/12/07 15:16:46 INFO SparkContext: Invoking stop() from shutdown hook
> 16/12/07 15:16:46 INFO SparkUI: Stopped Spark web UI at http://192.168.19.2:4040
> 16/12/07 15:16:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 16/12/07 15:16:46 INFO MemoryStore: MemoryStore cleared
> 16/12/07 15:16:46 INFO BlockManager: BlockManager stopped
> 16/12/07 15:16:46 INFO BlockManagerMaster: BlockManagerMaster stopped
> 16/12/07 15:16:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 16/12/07 15:16:46 INFO SparkContext: Successfully stopped SparkContext
> 16/12/07 15:16:46 INFO ShutdownHookManager: Shutdown hook called
> 16/12/07 15:16:46 INFO ShutdownHookManager: Deleting directory C:\Users\Owner\AppData\Local\Temp\spark-dab2587b-a794-4947-ac13-d40056cf71d8
>
> C:\Users\Owner>
>
> public final class JavaWordCount {
>   private static final Pattern SPACE = Pattern.compile(" ");
>
>   public static void main(String[] args) throws Exception {
>
>     if (args.length < 1) {
>       System.err.println("Usage: JavaWordCount <file>");
>       System.exit(1);
>     }
>
>     // boilerplate needed to run locally
>     SparkConf conf = new SparkConf().setAppName("Word Count Application").setMaster("local[*]");
>     JavaSparkContext sc = new JavaSparkContext(conf);
>
>     SparkSession spark = SparkSession
>         .builder()
>         .appName("Word Count")
>         .getOrCreate()
>         .newSession();
>
>     JavaRDD<String> lines = spark.read().textFile(args[0]).javaRDD();
>
>     JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
>       @Override
>       public Iterator<String> call(String s) {
>         return Arrays.asList(SPACE.split(s)).iterator();
>       }
>     });
>
>     JavaPairRDD<String, Integer> ones = words.mapToPair(
>         new PairFunction<String, String, Integer>() {
>           @Override
>           public Tuple2<String, Integer> call(String s) {
>             return new Tuple2<>(s, 1);
>           }
>         });
>
>     JavaPairRDD<String, Integer> counts = ones.reduceByKey(
>         new Function2<Integer, Integer, Integer>() {
>           @Override
>           public Integer call(Integer i1, Integer i2) {
>             return i1 + i2;
>           }
>         });
>
>     List<Tuple2<String, Integer>> output = counts.collect();
>     for (Tuple2<?, ?> tuple : output) {
>       System.out.println(tuple._1() + ": " + tuple._2());
>     }
>     spark.stop();
>   }
> }
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-from-Eclipse-and-then-Jar-tp28182.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.examples</groupId>
  <artifactId>MillionSongsDatabase</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>MillionSongsDatabase</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <jdk.version>1.8</jdk.version>
    <spark.version>2.0.0</spark.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.bahir</groupId>
      <artifactId>spark-streaming-twitter_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-graphx_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-yarn_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-network-shuffle_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.10</artifactId>
      <version>1.6.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-flume_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>spark-csv_2.11</artifactId>
      <version>1.3.0</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.38</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Download source code in Eclipse; best practice -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <version>2.9</version>
        <configuration>
          <downloadSources>true</downloadSources>
          <downloadJavadocs>false</downloadJavadocs>
        </configuration>
      </plugin>
      <!-- Set a compiler level -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.5.1</version>
        <configuration>
          <source>${jdk.version}</source>
          <target>${jdk.version}</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.3</version>
        <configuration>
          <shadeTestJar>true</shadeTestJar>
        </configuration>
      </plugin>
      <!-- Maven Assembly Plugin -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.4.1</version>
        <configuration>
          <!-- get all project dependencies -->
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <!-- mainClass in the manifest makes an executable jar -->
          <archive>
            <manifest>
              <mainClass>com.example.RandomForest.SongPredictionusingLinear</mainClass>
            </manifest>
          </archive>
          <property>
            <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
            <value>true</value>
          </property>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <!-- bind to the packaging phase -->
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
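For completeness, with a pom like the one above, a typical build-and-run sequence looks like this (the jar name and main class are illustrative; adjust them to your project):

mvn clean package
spark-submit --class com.example.RandomForest.SongPrediction --master local[*] target/MillionSongsDatabase-0.0.1-SNAPSHOT-jar-with-dependencies.jar Data/testfile.txt

spark-submit is the usual way to launch a Spark application jar; it sets up the Spark classpath for you and is generally preferable to plain "java -jar".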