I believe it's not about the location (i.e., local machine or HDFS) but
about the format of the input file. For example, I am getting the
following error while trying to read an input file in libsvm format:

*Exception in thread "main" java.lang.ClassNotFoundException: Failed to
find data source: libsvm.*

The application works fine in Eclipse. However, when I package it as a
jar file and run that jar, I get the above error, which is really strange!
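
A likely culprit in both cases (my assumption, since it matches the stack
trace quoted below): Spark discovers data sources such as libsvm and text
through service files under META-INF/services, and fat-jar plugins can
overwrite those files when several dependencies ship a file with the same
name. If you package with the maven-shade-plugin, merging the service files
usually cures it; a sketch of the relevant plugin section:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- Concatenate META-INF/services files (e.g. the one for
                         org.apache.spark.sql.sources.DataSourceRegister) so no
                         data-source registration is lost during merging. -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>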



Regards,
_________________________________
*Md. Rezaul Karim* BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html
<http://139.59.184.114/index.html>

On 7 December 2016 at 23:39, Iman Mohtashemi <iman.mohtash...@gmail.com>
wrote:

> No, but I tried that too and it still didn't work. Where are the files
> being read from: the local machine or HDFS? Do I need to get the files
> onto HDFS first? In Eclipse I just point to the location of the directory.
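
[Answering the location question inline] As far as I understand it, with
the master set to local[*] a bare path is resolved on the local machine; it
only means HDFS when you run on a cluster whose default filesystem is HDFS.
You can always make the intent explicit with a URI scheme; a minimal
sketch, with hypothetical paths:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class PathSchemes {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("Path Schemes").master("local[*]").getOrCreate();
    // Hypothetical input locations, only to show the URI schemes:
    Dataset<String> fromLocal =
        spark.read().textFile("file:///C:/Users/Owner/Data/testfile.txt");
    Dataset<String> fromHdfs =
        spark.read().textFile("hdfs://namenode:8020/user/owner/testfile.txt");
    System.out.println(fromLocal.count() + " local lines, "
        + fromHdfs.count() + " HDFS lines");
    spark.stop();
  }
}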
>
> On Wed, Dec 7, 2016 at 3:34 PM Md. Rezaul Karim <
> rezaul.ka...@insight-centre.org> wrote:
>
>> Hi,
>>
>> You should build your jar file (from your Spark application written in
>> Java) with all the necessary dependencies. You can create a Maven project
>> in Eclipse and declare the dependencies in a Maven pom.xml file.
>>
>> To build the jar with the dependencies and the *main class* (relevant,
>> since you are getting the *ClassNotFoundException*), your pom.xml should
>> contain the following in the *build* tag (the example main class is the
>> one inside the mainClass element below):
>>
>> <build>
>>         <plugins>
>>             <!-- download source code in Eclipse, best practice -->
>>             <plugin>
>>                 <groupId>org.apache.maven.plugins</groupId>
>>                 <artifactId>maven-eclipse-plugin</artifactId>
>>                 <version>2.9</version>
>>                 <configuration>
>>                     <downloadSources>true</downloadSources>
>>                     <downloadJavadocs>false</downloadJavadocs>
>>                 </configuration>
>>             </plugin>
>>             <!-- Set a compiler level -->
>>             <plugin>
>>                 <groupId>org.apache.maven.plugins</groupId>
>>                 <artifactId>maven-compiler-plugin</artifactId>
>>                 <version>3.5.1</version>
>>                 <configuration>
>>                     <source>${jdk.version}</source>
>>                     <target>${jdk.version}</target>
>>                 </configuration>
>>             </plugin>
>>             <plugin>
>>                 <groupId>org.apache.maven.plugins</groupId>
>>                 <artifactId>maven-shade-plugin</artifactId>
>>                 <version>2.4.3</version>
>>                 <configuration>
>>                     <shadeTestJar>true</shadeTestJar>
>>                 </configuration>
>>             </plugin>
>>             <!-- Maven Assembly Plugin -->
>>             <plugin>
>>                 <groupId>org.apache.maven.plugins</groupId>
>>                 <artifactId>maven-assembly-plugin</artifactId>
>>                 <version>2.4.1</version>
>>                 <configuration>
>>                     <!-- get all project dependencies -->
>>                     <descriptorRefs>
>>                         <descriptorRef>jar-with-dependencies</descriptorRef>
>>                     </descriptorRefs>
>>                     <!-- mainClass in the manifest makes an executable jar -->
>>                     <archive>
>>                         <manifest>
>>                             <mainClass>com.example.RandomForest.SongPrediction</mainClass>
>>                         </manifest>
>>                     </archive>
>>
>>                     <!-- If you launch the job through Oozie, set
>>                          oozie.launcher.mapreduce.job.user.classpath.first=true
>>                          in the workflow configuration; it is an Oozie job
>>                          property, not an assembly-plugin parameter. -->
>>
>>                 </configuration>
>>                 <executions>
>>                     <execution>
>>                         <id>make-assembly</id>
>>                         <!-- bind to the packaging phase -->
>>                         <phase>package</phase>
>>                         <goals>
>>                             <goal>single</goal>
>>                         </goals>
>>                     </execution>
>>                 </executions>
>>             </plugin>
>>         </plugins>
>>     </build>
>>
>>
>> An example pom.xml file has been attached for your reference. Feel free
>> to reuse it.
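>>
>> With the assembly plugin bound to the package phase as above, a typical
>> build-and-run sequence would be roughly the following (the jar name is an
>> illustration; yours is derived from your artifactId and version):
>>
>> mvn clean package
>> java -jar target/wordcount-1.0-jar-with-dependencies.jar Data/testfile.txt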
>>
>>
>> Regards,
>> _________________________________
>> *Md. Rezaul Karim,* BSc, MSc
>> PhD Researcher, INSIGHT Centre for Data Analytics
>> National University of Ireland, Galway
>> IDA Business Park, Dangan, Galway, Ireland
>> Web: http://www.reza-analytics.eu/index.html
>> <http://139.59.184.114/index.html>
>>
>> On 7 December 2016 at 23:18, im281 <iman.mohtash...@gmail.com> wrote:
>>
>> Hello,
>> I have a simple word count example in Java that I can run in Eclipse
>> (code at the bottom).
>>
>> I then create a jar file from it and try to run it from the command line:
>>
>>
>> java -jar C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt
>>
>> But I get the error below.
>>
>> I think the main error is:
>> *Exception in thread "main" java.lang.ClassNotFoundException: Failed to
>> find data source: text*
>>
>> Any advice on how to run this jar file in Spark would be appreciated.
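>>
>> Would something like the following be the right way to submit it, assuming
>> a local Spark installation, since spark-submit puts Spark's own jars and
>> data-source registrations on the classpath (class and path names as in
>> this thread)?
>>
>> spark-submit --class JavaWordCount --master local[*] ^
>>   C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt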
>>
>>
>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>> 16/12/07 15:16:41 INFO SparkContext: Running Spark version 2.0.2
>> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls to: Owner
>> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls to: Owner
>> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls groups to:
>> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls groups to:
>> 16/12/07 15:16:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(Owner); groups with view permissions: Set(); users  with modify permissions: Set(Owner); groups with modify permissions: Set()
>> 16/12/07 15:16:44 INFO Utils: Successfully started service 'sparkDriver' on port 10211.
>> 16/12/07 15:16:44 INFO SparkEnv: Registering MapOutputTracker
>> 16/12/07 15:16:44 INFO SparkEnv: Registering BlockManagerMaster
>> 16/12/07 15:16:44 INFO DiskBlockManager: Created local directory at C:\Users\Owner\AppData\Local\Temp\blockmgr-b4b1960b-08fc-44fd-a75e-1a0450556873
>> 16/12/07 15:16:44 INFO MemoryStore: MemoryStore started with capacity 1984.5 MB
>> 16/12/07 15:16:45 INFO SparkEnv: Registering OutputCommitCoordinator
>> 16/12/07 15:16:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>> 16/12/07 15:16:45 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.19.2:4040
>> 16/12/07 15:16:45 INFO Executor: Starting executor ID driver on host localhost
>> 16/12/07 15:16:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 10252.
>> 16/12/07 15:16:45 INFO NettyBlockTransferService: Server created on 192.168.19.2:10252
>> 16/12/07 15:16:45 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.19.2, 10252)
>> 16/12/07 15:16:45 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.19.2:10252 with 1984.5 MB RAM, BlockManagerId(driver, 192.168.19.2, 10252)
>> 16/12/07 15:16:45 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.19.2, 10252)
>> 16/12/07 15:16:46 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
>> 16/12/07 15:16:46 INFO SharedState: Warehouse path is 'file:/C:/Users/Owner/spark-warehouse'.
>> Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: text. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
>>         at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
>>         at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
>>         at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
>>         at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
>>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>>         at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:504)
>>         at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:540)
>>         at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:513)
>>         at JavaWordCount.main(JavaWordCount.java:57)
>> Caused by: java.lang.ClassNotFoundException: text.DefaultSource
>>         at java.net.URLClassLoader.findClass(Unknown Source)
>>         at java.lang.ClassLoader.loadClass(Unknown Source)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>>         at java.lang.ClassLoader.loadClass(Unknown Source)
>>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>>         at scala.util.Try$.apply(Try.scala:192)
>>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>>         at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>>         at scala.util.Try.orElse(Try.scala:84)
>>         at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
>>         ... 8 more
>> 16/12/07 15:16:46 INFO SparkContext: Invoking stop() from shutdown hook
>> 16/12/07 15:16:46 INFO SparkUI: Stopped Spark web UI at http://192.168.19.2:4040
>> 16/12/07 15:16:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
>> 16/12/07 15:16:46 INFO MemoryStore: MemoryStore cleared
>> 16/12/07 15:16:46 INFO BlockManager: BlockManager stopped
>> 16/12/07 15:16:46 INFO BlockManagerMaster: BlockManagerMaster stopped
>> 16/12/07 15:16:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
>> 16/12/07 15:16:46 INFO SparkContext: Successfully stopped SparkContext
>> 16/12/07 15:16:46 INFO ShutdownHookManager: Shutdown hook called
>> 16/12/07 15:16:46 INFO ShutdownHookManager: Deleting directory C:\Users\Owner\AppData\Local\Temp\spark-dab2587b-a794-4947-ac13-d40056cf71d8
>>
>> C:\Users\Owner>
>>
>> import java.util.Arrays;
>> import java.util.Iterator;
>> import java.util.List;
>> import java.util.regex.Pattern;
>>
>> import org.apache.spark.SparkConf;
>> import org.apache.spark.api.java.JavaPairRDD;
>> import org.apache.spark.api.java.JavaRDD;
>> import org.apache.spark.api.java.JavaSparkContext;
>> import org.apache.spark.api.java.function.FlatMapFunction;
>> import org.apache.spark.api.java.function.Function2;
>> import org.apache.spark.api.java.function.PairFunction;
>> import org.apache.spark.sql.SparkSession;
>>
>> import scala.Tuple2;
>>
>> public final class JavaWordCount {
>>   private static final Pattern SPACE = Pattern.compile(" ");
>>
>>   public static void main(String[] args) throws Exception {
>>
>>     if (args.length < 1) {
>>       System.err.println("Usage: JavaWordCount <file>");
>>       System.exit(1);
>>     }
>>
>>     // boilerplate needed to run locally
>>     SparkConf conf = new SparkConf()
>>         .setAppName("Word Count Application").setMaster("local[*]");
>>     JavaSparkContext sc = new JavaSparkContext(conf);
>>
>>     // reuses the JavaSparkContext created above (hence the
>>     // "Use an existing SparkContext" warning in the log)
>>     SparkSession spark = SparkSession
>>                         .builder()
>>                         .appName("Word Count")
>>                         .getOrCreate()
>>                         .newSession();
>>
>>     // read the input file, one RDD element per line
>>     JavaRDD<String> lines = spark.read().textFile(args[0]).javaRDD();
>>
>>     // split each line into words
>>     JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
>>       @Override
>>       public Iterator<String> call(String s) {
>>         return Arrays.asList(SPACE.split(s)).iterator();
>>       }
>>     });
>>
>>     // pair each word with a count of 1
>>     JavaPairRDD<String, Integer> ones = words.mapToPair(
>>       new PairFunction<String, String, Integer>() {
>>         @Override
>>         public Tuple2<String, Integer> call(String s) {
>>           return new Tuple2<>(s, 1);
>>         }
>>       });
>>
>>     // sum the counts per word
>>     JavaPairRDD<String, Integer> counts = ones.reduceByKey(
>>       new Function2<Integer, Integer, Integer>() {
>>         @Override
>>         public Integer call(Integer i1, Integer i2) {
>>           return i1 + i2;
>>         }
>>       });
>>
>>     List<Tuple2<String, Integer>> output = counts.collect();
>>     for (Tuple2<?, ?> tuple : output) {
>>       System.out.println(tuple._1() + ": " + tuple._2());
>>     }
>>     spark.stop();
>>   }
>> }
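>>
>> (I also wonder whether reading through the RDD API would sidestep the
>> failing data-source lookup, since JavaSparkContext.textFile does not go
>> through the DataSource machinery at all; the one-line change would be:)
>>
>>     // instead of spark.read().textFile(args[0]).javaRDD():
>>     JavaRDD<String> lines = sc.textFile(args[0]);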
>>
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-from-Eclipse-and-then-Jar-tp28182.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>>
