Sorry, accidentally sent the last email before finishing.

I had asked this question before, but wanted to ask again as I think
it is now related to my pom file or project setup. Really appreciate the help!

I have been trying on and off for the past month to run this MLlib
example:
https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala

I am able to build the project successfully. When I run it, it returns:

features in spam: 8
features in ham: 7

and then freezes. According to the UI, the description of the job is
"count at DataValidators.scala:38". This corresponds to this line in
the code:

val model = lrLearner.run(trainingData)
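
For context, that call sits at the end of a pipeline that looks roughly like the sketch below. It is condensed from the linked example rather than copied verbatim: the object name, the app name, the input paths, and the numFeatures value are placeholders, and the code that prints the "features in ..." lines is left out.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

object MLlibSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MLlibSketch"))

    // Placeholder input paths: one email per line.
    val spam = sc.textFile("files/spam.txt")
    val ham = sc.textFile("files/ham.txt")

    // Hash each email's words into a fixed-size term-frequency vector.
    val tf = new HashingTF(numFeatures = 100)
    val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
    val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

    // Label spam as 1 and ham as 0, then combine into one training set.
    val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
    val negativeExamples = hamFeatures.map(features => LabeledPoint(0, features))
    val trainingData = positiveExamples ++ negativeExamples
    trainingData.cache() // the learner is iterative, so cache its input

    val lrLearner = new LogisticRegressionWithSGD()
    val model = lrLearner.run(trainingData) // the job hangs during this call

    sc.stop()
  }
}
```

As far as I can tell, the count in DataValidators is the label check that run performs on the training data before optimization starts, so the hang happens before any gradient descent iterations begin.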

I've tried just about everything I can think of: changed numFeatures
from 1 to 10,000, set executor memory to 1g, and set up a new cluster. At
this point I think I might be missing dependencies, as that has
usually been the problem in other Spark apps I have tried to run. This
is my pom file, which I have used for other successful Spark apps.
Please let me know if you think I need any additional dependencies, if
there are incompatibility issues, or if there is a better pom.xml to use.
Thank you!

Cluster information:

Spark version: 1.2.0-SNAPSHOT (in my older cluster it is 1.2.0)
java version "1.7.0_25"
Scala version: 2.10.4
hadoop version: hadoop 2.5.0-cdh5.3.3 (older cluster was 5.3.0)



<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/maven-v4_0_0.xsd">
        <groupId>edu.berkely</groupId>
        <artifactId>simple-project</artifactId>
        <modelVersion>4.0.0</modelVersion>
        <name>Simple Project</name>
        <packaging>jar</packaging>
        <version>1.0</version>
<repositories>
        <repository>
        <id>cloudera</id>
        <url> http://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>

                <repository>
                <id>scala-tools.org</id>
                <name>Scala-tools Maven2 Repository</name>
                <url>http://scala-tools.org/repo-releases</url>
                </repository>

</repositories>

<pluginRepositories>
        <pluginRepository>
                <id>scala-tools.org</id>
                <name>Scala-tools Maven2 Repository</name>
                <url>http://scala-tools.org/repo-releases</url>
        </pluginRepository>
</pluginRepositories>

<build>
        <plugins>
                <plugin>
                        <groupId>org.scala-tools</groupId>
                        <artifactId>maven-scala-plugin</artifactId>
                        <executions>

                                <execution>
                                        <id>compile</id>
                                        <goals>
                                                <goal>compile</goal>
                                        </goals>
                                        <phase>compile</phase>
                                </execution>
                                <execution>
                                        <id>test-compile</id>
                                        <goals>
                                                <goal>testCompile</goal>
                                        </goals>
                                        <phase>test-compile</phase>
                                </execution>
                                <execution>
                                        <phase>process-resources</phase>
                                        <goals>
                                                <goal>compile</goal>
                                        </goals>
                                </execution>
                        </executions>
                </plugin>
                <plugin>
                        <artifactId>maven-compiler-plugin</artifactId>
                        <configuration>
                                <source>1.7</source>
                                <target>1.7</target>
                        </configuration>
                </plugin>
        </plugins>
</build>


<dependencies>
        <dependency> <!--Spark dependency -->
        <groupId> org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.2.0-cdh5.3.0</version>
        </dependency>

        <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.5.0-mr1-cdh5.3.0</version>
        </dependency>

        <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.10.4</version>
        </dependency>

        <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-compiler</artifactId>
        <version>2.10.4</version>
        </dependency>

        <dependency>
        <groupId>com.101tec</groupId>
        <artifactId>zkclient</artifactId>
        <version>0.3</version>
        </dependency>

         <dependency>
         <groupId>com.yammer.metrics</groupId>
         <artifactId>metrics-core</artifactId>
         <version>2.2.0</version>
         </dependency>


        <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-server-web-proxy</artifactId>
        <version>2.5.0</version>
        </dependency>

        <dependency>
        <groupId>org.apache.thrift</groupId>
        <artifactId>libthrift</artifactId>
        <version>0.9.2</version>
        </dependency>

        <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>18.0</version>
        </dependency>

         <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
        </dependency>

        <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.10</artifactId>
        <version>1.2.0</version>
        </dependency>

        <dependency>
        <groupId>org.scalanlp</groupId>
        <artifactId>breeze-math_2.10</artifactId>
        <version>0.4</version>
        </dependency>

        <dependency>
        <groupId>com.googlecode.netlib-java</groupId>
        <artifactId>netlib-java</artifactId>
        <version>1.0</version>
        </dependency>

        <dependency>
        <groupId>org.jblas</groupId>
        <artifactId>jblas</artifactId>
        <version>1.2.3</version>
        </dependency>

</dependencies>

</project>
