Yes, exactly. Mine runs fine in Eclipse, but when I run it from the corresponding jar I get the same error!
On Wed, Dec 7, 2016 at 5:04 PM Md. Rezaul Karim <rezaul.ka...@insight-centre.org> wrote:

> I believe it's not about the location (i.e., local machine or HDFS);
> it's all about the format of the input file. For example, I am getting the
> following error while trying to read an input file in libsvm format:
>
> *Exception in thread "main" java.lang.ClassNotFoundException: Failed to
> find data source: libsvm.*
>
> The application works fine in Eclipse. However, after packaging the
> corresponding jar file, I am getting the above error, which is really weird!
>
> Regards,
> _________________________________
> *Md. Rezaul Karim* BSc, MSc
> PhD Researcher, INSIGHT Centre for Data Analytics
> National University of Ireland, Galway
> IDA Business Park, Dangan, Galway, Ireland
> Web: http://www.reza-analytics.eu/index.html
>
> On 7 December 2016 at 23:39, Iman Mohtashemi <iman.mohtash...@gmail.com> wrote:
>
> No, but I tried that too and it still didn't work. Where are the files being
> read from: the local machine or HDFS? Do I need to get the files to
> HDFS first? In Eclipse I just point to the location of the directory.
>
> On Wed, Dec 7, 2016 at 3:34 PM Md. Rezaul Karim <rezaul.ka...@insight-centre.org> wrote:
>
> Hi,
>
> You should prepare your jar file (from your Spark application written in
> Java) with all the necessary dependencies. You can create a Maven project
> in Eclipse by specifying the dependencies in a Maven-friendly pom.xml file.
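[Editor's aside: one detail worth checking before the pom discussion below. A Spark jar launched with plain `java -jar` sees only the classes packaged inside it; Spark applications are normally launched through `spark-submit`, which puts the Spark distribution's jars on the classpath and handles the master URL. A sketch only, using the jar path and class name from this thread as placeholders (not verified against this project):]

```
# Launch the word-count jar through spark-submit instead of `java -jar`.
# --class names the main class inside the jar; local[*] uses all local cores.
# On Windows the launcher is spark-submit.cmd under %SPARK_HOME%\bin.
spark-submit \
  --class JavaWordCount \
  --master "local[*]" \
  C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt
```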
> For building the jar with the dependencies and the *main class* (since you are
> getting the *ClassNotFoundException*), your pom.xml should contain the
> following in the *build* tag (the example main class here is
> com.example.RandomForest.SongPrediction):
>
> <build>
>   <plugins>
>     <!-- download source code in Eclipse, best practice -->
>     <plugin>
>       <groupId>org.apache.maven.plugins</groupId>
>       <artifactId>maven-eclipse-plugin</artifactId>
>       <version>2.9</version>
>       <configuration>
>         <downloadSources>true</downloadSources>
>         <downloadJavadocs>false</downloadJavadocs>
>       </configuration>
>     </plugin>
>     <!-- Set a compiler level -->
>     <plugin>
>       <groupId>org.apache.maven.plugins</groupId>
>       <artifactId>maven-compiler-plugin</artifactId>
>       <version>3.5.1</version>
>       <configuration>
>         <source>${jdk.version}</source>
>         <target>${jdk.version}</target>
>       </configuration>
>     </plugin>
>     <plugin>
>       <groupId>org.apache.maven.plugins</groupId>
>       <artifactId>maven-shade-plugin</artifactId>
>       <version>2.4.3</version>
>       <configuration>
>         <shadeTestJar>true</shadeTestJar>
>       </configuration>
>     </plugin>
>     <!-- Maven Assembly Plugin -->
>     <plugin>
>       <groupId>org.apache.maven.plugins</groupId>
>       <artifactId>maven-assembly-plugin</artifactId>
>       <version>2.4.1</version>
>       <configuration>
>         <!-- get all project dependencies -->
>         <descriptorRefs>
>           <descriptorRef>jar-with-dependencies</descriptorRef>
>         </descriptorRefs>
>         <!-- MainClass in manifest makes an executable jar -->
>         <archive>
>           <manifest>
>             <mainClass>com.example.RandomForest.SongPrediction</mainClass>
>           </manifest>
>         </archive>
>         <property>
>           <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
>           <value>true</value>
>         </property>
>       </configuration>
>       <executions>
>         <execution>
>           <id>make-assembly</id>
>           <!-- bind to the packaging phase -->
>           <phase>package</phase>
>           <goals>
>             <goal>single</goal>
>           </goals>
>         </execution>
>       </executions>
>     </plugin>
>   </plugins>
> </build>
>
> An example pom.xml file has been attached for your reference. Feel free to
> reuse it.
>
> Regards,
> _________________________________
> *Md. Rezaul Karim,* BSc, MSc
> PhD Researcher, INSIGHT Centre for Data Analytics
> National University of Ireland, Galway
> IDA Business Park, Dangan, Galway, Ireland
> Web: http://www.reza-analytics.eu/index.html
>
> On 7 December 2016 at 23:18, im281 <iman.mohtash...@gmail.com> wrote:
>
> Hello,
> I have a simple word count example in Java and I can run it in Eclipse
> (code at the bottom).
>
> I then create a jar file from it and try to run it from the cmd:
>
> java -jar C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt
>
> But I get the error below. I think the main error is:
>
> *Exception in thread "main" java.lang.ClassNotFoundException: Failed to
> find data source: text*
>
> Any advice on how to run this jar file in Spark would be appreciated.
>
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 16/12/07 15:16:41 INFO SparkContext: Running Spark version 2.0.2
> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls to: Owner
> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls to: Owner
> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls groups to:
> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls groups to:
> 16/12/07 15:16:42 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(Owner); groups
> with view permissions: Set(); users with modify permissions: Set(Owner);
> groups with modify permissions: Set()
> 16/12/07 15:16:44 INFO Utils: Successfully started service 'sparkDriver' on
> port 10211.
> 16/12/07 15:16:44 INFO SparkEnv: Registering MapOutputTracker
> 16/12/07 15:16:44 INFO SparkEnv: Registering BlockManagerMaster
> 16/12/07 15:16:44 INFO DiskBlockManager: Created local directory at
> C:\Users\Owner\AppData\Local\Temp\blockmgr-b4b1960b-08fc-44fd-a75e-1a0450556873
> 16/12/07 15:16:44 INFO MemoryStore: MemoryStore started with capacity 1984.5 MB
> 16/12/07 15:16:45 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/12/07 15:16:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 16/12/07 15:16:45 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
> http://192.168.19.2:4040
> 16/12/07 15:16:45 INFO Executor: Starting executor ID driver on host localhost
> 16/12/07 15:16:45 INFO Utils: Successfully started service
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 10252.
> 16/12/07 15:16:45 INFO NettyBlockTransferService: Server created on
> 192.168.19.2:10252
> 16/12/07 15:16:45 INFO BlockManagerMaster: Registering BlockManager
> BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:45 INFO BlockManagerMasterEndpoint: Registering block manager
> 192.168.19.2:10252 with 1984.5 MB RAM, BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:45 INFO BlockManagerMaster: Registered BlockManager
> BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:46 WARN SparkContext: Use an existing SparkContext, some
> configuration may not take effect.
> 16/12/07 15:16:46 INFO SharedState: Warehouse path is
> 'file:/C:/Users/Owner/spark-warehouse'.
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to find
> data source: text.
> Please find packages at
> https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
>   at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
>   at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
>   at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>   at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:504)
>   at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:540)
>   at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:513)
>   at JavaWordCount.main(JavaWordCount.java:57)
> Caused by: java.lang.ClassNotFoundException: text.DefaultSource
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>   at scala.util.Try$.apply(Try.scala:192)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>   at scala.util.Try.orElse(Try.scala:84)
>   at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
>   ...
> 8 more
> 16/12/07 15:16:46 INFO SparkContext: Invoking stop() from shutdown hook
> 16/12/07 15:16:46 INFO SparkUI: Stopped Spark web UI at http://192.168.19.2:4040
> 16/12/07 15:16:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 16/12/07 15:16:46 INFO MemoryStore: MemoryStore cleared
> 16/12/07 15:16:46 INFO BlockManager: BlockManager stopped
> 16/12/07 15:16:46 INFO BlockManagerMaster: BlockManagerMaster stopped
> 16/12/07 15:16:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 16/12/07 15:16:46 INFO SparkContext: Successfully stopped SparkContext
> 16/12/07 15:16:46 INFO ShutdownHookManager: Shutdown hook called
> 16/12/07 15:16:46 INFO ShutdownHookManager: Deleting directory
> C:\Users\Owner\AppData\Local\Temp\spark-dab2587b-a794-4947-ac13-d40056cf71d8
>
> C:\Users\Owner>
>
> public final class JavaWordCount {
>   private static final Pattern SPACE = Pattern.compile(" ");
>
>   public static void main(String[] args) throws Exception {
>
>     if (args.length < 1) {
>       System.err.println("Usage: JavaWordCount <file>");
>       System.exit(1);
>     }
>
>     // boilerplate needed to run locally
>     SparkConf conf = new SparkConf().setAppName("Word Count Application").setMaster("local[*]");
>     JavaSparkContext sc = new JavaSparkContext(conf);
>
>     SparkSession spark = SparkSession
>         .builder()
>         .appName("Word Count")
>         .getOrCreate()
>         .newSession();
>
>     JavaRDD<String> lines = spark.read().textFile(args[0]).javaRDD();
>
>     JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
>       @Override
>       public Iterator<String> call(String s) {
>         return Arrays.asList(SPACE.split(s)).iterator();
>       }
>     });
>
>     JavaPairRDD<String, Integer> ones = words.mapToPair(
>         new PairFunction<String, String, Integer>() {
>           @Override
>           public Tuple2<String, Integer> call(String s) {
>             return new Tuple2<>(s, 1);
>           }
>         });
>
>     JavaPairRDD<String, Integer> counts = ones.reduceByKey(
>         new Function2<Integer, Integer, Integer>() {
>           @Override
>           public Integer call(Integer i1, Integer i2) {
>             return i1 + i2;
>           }
>         });
>
>     List<Tuple2<String, Integer>> output = counts.collect();
>     for (Tuple2<?, ?> tuple : output) {
>       System.out.println(tuple._1() + ": " + tuple._2());
>     }
>     spark.stop();
>   }
> }
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-from-Eclipse-and-then-Jar-tp28182.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
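[Editor's note on the likely root cause, offered as a hypothesis rather than a verified diagnosis: Spark resolves short format names such as `text` and `libsvm` through `java.util.ServiceLoader`, reading `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` from the classpath. When a fat jar is built with the assembly plugin's `jar-with-dependencies`, service files with the same path from different jars can overwrite one another, so the registrations are lost even though the classes themselves are packaged. A commonly suggested remedy is to build the fat jar with maven-shade-plugin and its `ServicesResourceTransformer`, which concatenates service files instead of overwriting them. A sketch, reusing the shade-plugin version from the pom quoted above and the `JavaWordCount` main class; not tested against this project:]

```
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Merge META-INF/services entries instead of overwriting them,
               so DataSourceRegister keeps every provider (text, csv, libsvm, ...) -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <!-- Set Main-Class so the shaded jar stays runnable -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>JavaWordCount</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```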