No, but I tried that too and it still didn't work. Where are the files being
read from: the local machine or HDFS? Do I need to get the files onto HDFS
first? In Eclipse I just point to the location of the directory?

On Wed, Dec 7, 2016 at 3:34 PM Md. Rezaul Karim <rezaul.ka...@insight-centre.org> wrote:

> Hi,
>
> You should prepare your jar file (from your Spark application written in
> Java) with all the necessary dependencies. You can create a Maven project
> in Eclipse and specify the dependencies in a Maven-friendly pom.xml file.
>
> To build the jar with its dependencies and a main class (since you are
> getting the *ClassNotFoundException*), your pom.xml should contain the
> following in the *build* tag (the example main class here is
> com.example.RandomForest.SongPrediction):
>
> <build>
>     <plugins>
>         <!-- download source code in Eclipse, best practice -->
>         <plugin>
>             <groupId>org.apache.maven.plugins</groupId>
>             <artifactId>maven-eclipse-plugin</artifactId>
>             <version>2.9</version>
>             <configuration>
>                 <downloadSources>true</downloadSources>
>                 <downloadJavadocs>false</downloadJavadocs>
>             </configuration>
>         </plugin>
>         <!-- Set a compiler level -->
>         <plugin>
>             <groupId>org.apache.maven.plugins</groupId>
>             <artifactId>maven-compiler-plugin</artifactId>
>             <version>3.5.1</version>
>             <configuration>
>                 <source>${jdk.version}</source>
>                 <target>${jdk.version}</target>
>             </configuration>
>         </plugin>
>         <plugin>
>             <groupId>org.apache.maven.plugins</groupId>
>             <artifactId>maven-shade-plugin</artifactId>
>             <version>2.4.3</version>
>             <configuration>
>                 <shadeTestJar>true</shadeTestJar>
>             </configuration>
>         </plugin>
>         <!-- Maven Assembly Plugin -->
>         <plugin>
>             <groupId>org.apache.maven.plugins</groupId>
>             <artifactId>maven-assembly-plugin</artifactId>
>             <version>2.4.1</version>
>             <configuration>
>                 <!-- get all project dependencies -->
>                 <descriptorRefs>
>                     <descriptorRef>jar-with-dependencies</descriptorRef>
>                 </descriptorRefs>
>                 <!-- MainClass in manifest makes an executable jar -->
>                 <archive>
>                     <manifest>
>                         <mainClass>com.example.RandomForest.SongPrediction</mainClass>
>                     </manifest>
>                 </archive>
>                 <property>
>                     <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
>                     <value>true</value>
>                 </property>
>             </configuration>
>             <executions>
>                 <execution>
>                     <id>make-assembly</id>
>                     <!-- bind to the packaging phase -->
>                     <phase>package</phase>
>                     <goals>
>                         <goal>single</goal>
>                     </goals>
>                 </execution>
>             </executions>
>         </plugin>
>     </plugins>
> </build>
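>
> One more thing worth checking: a "Failed to find data source" error from an
> assembled jar is often caused by Spark's META-INF/services registration
> files getting overwritten when the dependencies are merged, since several
> Spark modules ship a file with the same name. Building with the shade
> plugin and its ServicesResourceTransformer merges those files instead. A
> rough sketch (the main class is just the example name from above):
>
>             <plugin>
>                 <groupId>org.apache.maven.plugins</groupId>
>                 <artifactId>maven-shade-plugin</artifactId>
>                 <version>2.4.3</version>
>                 <executions>
>                     <execution>
>                         <phase>package</phase>
>                         <goals>
>                             <goal>shade</goal>
>                         </goals>
>                         <configuration>
>                             <transformers>
>                                 <!-- merge META-INF/services entries instead of overwriting them -->
>                                 <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
>                                 <!-- set Main-Class so the jar runs with java -jar -->
>                                 <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
>                                     <mainClass>com.example.RandomForest.SongPrediction</mainClass>
>                                 </transformer>
>                             </transformers>
>                         </configuration>
>                     </execution>
>                 </executions>
>             </plugin>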
>
>
> An example pom.xml file has been attached for your reference. Feel free to
> reuse it.
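>
> Once the pom is in place, building and running would look roughly like
> this (the exact jar name depends on your artifactId and version):
>
> mvn clean package
> java -jar target/<your-artifact>-<version>-jar-with-dependencies.jar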
>
>
> Regards,
> _________________________________
> *Md. Rezaul Karim,* BSc, MSc
> PhD Researcher, INSIGHT Centre for Data Analytics
> National University of Ireland, Galway
> IDA Business Park, Dangan, Galway, Ireland
> Web: http://www.reza-analytics.eu/index.html
>
> On 7 December 2016 at 23:18, im281 <iman.mohtash...@gmail.com> wrote:
>
> Hello,
> I have a simple word count example in Java and I can run it in Eclipse
> (code at the bottom).
>
> I then create a jar file from it and try to run it from the cmd:
>
>
> java -jar C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt
>
> But I get this error:
>
> I think the main error is:
> *Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: text*
>
> Any advice on how to run this jar file in Spark would be appreciated.
>
>
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 16/12/07 15:16:41 INFO SparkContext: Running Spark version 2.0.2
> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls to: Owner
> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls to: Owner
> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls groups to:
> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls groups to:
> 16/12/07 15:16:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Owner); groups with view permissions: Set(); users with modify permissions: Set(Owner); groups with modify permissions: Set()
> 16/12/07 15:16:44 INFO Utils: Successfully started service 'sparkDriver' on port 10211.
> 16/12/07 15:16:44 INFO SparkEnv: Registering MapOutputTracker
> 16/12/07 15:16:44 INFO SparkEnv: Registering BlockManagerMaster
> 16/12/07 15:16:44 INFO DiskBlockManager: Created local directory at C:\Users\Owner\AppData\Local\Temp\blockmgr-b4b1960b-08fc-44fd-a75e-1a0450556873
> 16/12/07 15:16:44 INFO MemoryStore: MemoryStore started with capacity 1984.5 MB
> 16/12/07 15:16:45 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/12/07 15:16:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 16/12/07 15:16:45 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.19.2:4040
> 16/12/07 15:16:45 INFO Executor: Starting executor ID driver on host localhost
> 16/12/07 15:16:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 10252.
> 16/12/07 15:16:45 INFO NettyBlockTransferService: Server created on 192.168.19.2:10252
> 16/12/07 15:16:45 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:45 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.19.2:10252 with 1984.5 MB RAM, BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:45 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.19.2, 10252)
> 16/12/07 15:16:46 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
> 16/12/07 15:16:46 INFO SharedState: Warehouse path is 'file:/C:/Users/Owner/spark-warehouse'.
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: text. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
>         at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
>         at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
>         at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
>         at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>         at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:504)
>         at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:540)
>         at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:513)
>         at JavaWordCount.main(JavaWordCount.java:57)
> Caused by: java.lang.ClassNotFoundException: text.DefaultSource
>         at java.net.URLClassLoader.findClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>         at java.lang.ClassLoader.loadClass(Unknown Source)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>         at scala.util.Try$.apply(Try.scala:192)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>         at scala.util.Try.orElse(Try.scala:84)
>         at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
>         ... 8 more
> 16/12/07 15:16:46 INFO SparkContext: Invoking stop() from shutdown hook
> 16/12/07 15:16:46 INFO SparkUI: Stopped Spark web UI at http://192.168.19.2:4040
> 16/12/07 15:16:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 16/12/07 15:16:46 INFO MemoryStore: MemoryStore cleared
> 16/12/07 15:16:46 INFO BlockManager: BlockManager stopped
> 16/12/07 15:16:46 INFO BlockManagerMaster: BlockManagerMaster stopped
> 16/12/07 15:16:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 16/12/07 15:16:46 INFO SparkContext: Successfully stopped SparkContext
> 16/12/07 15:16:46 INFO ShutdownHookManager: Shutdown hook called
> 16/12/07 15:16:46 INFO ShutdownHookManager: Deleting directory C:\Users\Owner\AppData\Local\Temp\spark-dab2587b-a794-4947-ac13-d40056cf71d8
>
> C:\Users\Owner>
>
> import java.util.Arrays;
> import java.util.Iterator;
> import java.util.List;
> import java.util.regex.Pattern;
>
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaPairRDD;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.api.java.function.FlatMapFunction;
> import org.apache.spark.api.java.function.Function2;
> import org.apache.spark.api.java.function.PairFunction;
> import org.apache.spark.sql.SparkSession;
>
> import scala.Tuple2;
>
> public final class JavaWordCount {
>   private static final Pattern SPACE = Pattern.compile(" ");
>
>   public static void main(String[] args) throws Exception {
>
>     if (args.length < 1) {
>       System.err.println("Usage: JavaWordCount <file>");
>       System.exit(1);
>     }
>
>     // boilerplate needed to run locally
>     SparkConf conf = new SparkConf().setAppName("Word Count Application").setMaster("local[*]");
>     JavaSparkContext sc = new JavaSparkContext(conf);
>
>     // getOrCreate() reuses the SparkContext created above
>     // (hence the WARN about an existing SparkContext in the log)
>     SparkSession spark = SparkSession
>                          .builder()
>                          .appName("Word Count")
>                          .getOrCreate()
>                          .newSession();
>
>     JavaRDD<String> lines = spark.read().textFile(args[0]).javaRDD();
>
>     // split each line into words
>     JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
>       @Override
>       public Iterator<String> call(String s) {
>         return Arrays.asList(SPACE.split(s)).iterator();
>       }
>     });
>
>     // pair each word with a count of 1
>     JavaPairRDD<String, Integer> ones = words.mapToPair(
>       new PairFunction<String, String, Integer>() {
>         @Override
>         public Tuple2<String, Integer> call(String s) {
>           return new Tuple2<>(s, 1);
>         }
>       });
>
>     // sum the counts per word
>     JavaPairRDD<String, Integer> counts = ones.reduceByKey(
>       new Function2<Integer, Integer, Integer>() {
>         @Override
>         public Integer call(Integer i1, Integer i2) {
>           return i1 + i2;
>         }
>       });
>
>     List<Tuple2<String, Integer>> output = counts.collect();
>     for (Tuple2<?, ?> tuple : output) {
>       System.out.println(tuple._1() + ": " + tuple._2());
>     }
>     spark.stop();
>   }
> }
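>
> (Should I be launching it through spark-submit instead of plain java -jar,
> so that Spark's own jars are on the classpath? Something like:
>
> spark-submit --class JavaWordCount --master local[*] C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt )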
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-from-Eclipse-and-then-Jar-tp28182.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
>
