spark-ec2 script does not install necessary files to launch Spark
Hello,

I followed the instructions for launching Spark 1.5.1 on AWS EC2, but the script is not installing all the folders/files required to initialize Spark. Since the log message is long, I have created a gist here: https://gist.github.com/Emaasit/696145959bbbd989bfe1

Please help. I have been going at this for more than 6 hours now with no success.

- Daniel Emaasit, Ph.D. Research Assistant, Transportation Research Center (TRC), University of Nevada, Las Vegas, Las Vegas, NV 89154-4015. Cell: 615-649-2489. www.danielemaasit.com

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-ec2-script-doest-not-install-necessary-files-to-launch-spark-tp25311.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: includePackage() deprecated?
Got it. Ignore my similar question in the GitHub comments.

On Thu, Jun 4, 2015 at 11:48 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:

Yeah - we don't have support for running UDFs on DataFrames yet. There is an open issue to track this: https://issues.apache.org/jira/browse/SPARK-6817
Thanks
Shivaram

On Thu, Jun 4, 2015 at 3:10 AM, Daniel Emaasit daniel.emaa...@gmail.com wrote:

Hello Shivaram,
Was the includePackage() function deprecated in SparkR 1.4.0? I don't see it in the documentation. If it was, does that mean that we can use R packages on Spark DataFrames the usual way we do for local R data frames?
Daniel
Re: DataFrames coming in SparkR in Apache Spark 1.4.0
You can build Spark from the 1.4 release branch yourself: https://github.com/apache/spark/tree/branch-1.4

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DataFrames-coming-in-SparkR-in-Apache-Spark-1-4-0-tp23116p23131.html
Error: Building Spark 1.4.0 from the GitHub branch-1.4 release branch
cannot find the path specified) - [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

C:\Program Files\Apache Software Foundation\spark-branch-1.4

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-Building-Spark-1-4-0-from-Github-1-4-release-branch-tp23132.html
Re: Spark 1.4.0 build Error on Windows
-shared-archive-resources\META-INF\NOTICE (The system cannot find the path specified) - [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

C:\Program Files\Apache Software Foundation\spark-branch-1.4

On Tue, Jun 2, 2015 at 7:17 PM, Shivaram Venkataraman shivaram.venkatara...@gmail.com wrote:

No worries - also cc'ing user@spark.apache.org might get faster responses!
Shivaram

On Tue, Jun 2, 2015 at 6:05 PM, Daniel Emaasit daniel.emaa...@gmail.com wrote:

Oops, my bad. I was building from the wrong directory.

On Tue, Jun 2, 2015 at 5:57 PM, Daniel Emaasit daniel.emaa...@gmail.com wrote:

Hello Shivaram,
While I was able to build Spark 1.3.0, I am getting errors building Spark 1.4.0. I was trying to build from the 1.4 branch at https://github.com/apache/spark/tree/branch-1.4
Here is the log:

C:\Program Files\Apache Software Foundation\spark-branch-1.4> cd build
C:\Program Files\Apache Software Foundation\spark-branch-1.4\build> ls
mvn  sbt  sbt-launch-lib.bash
C:\Program Files\Apache Software Foundation\spark-branch-1.4\build> mvn -Psparkr -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
[INFO] Scanning for projects...
[INFO] BUILD FAILURE
[INFO] Total time: 0.469 s
[INFO] Finished at: 2015-06-02T17:47:28-07:00
[INFO] Final Memory: 4M/121M
[WARNING] The requested profile "sparkr" could not be activated because it does not exist.
[WARNING] The requested profile "yarn" could not be activated because it does not exist.
[WARNING] The requested profile "hadoop-2.4" could not be activated because it does not exist.
[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (C:\Program Files\Apache Software Foundation\spark-branch-1.4\build). Please verify you invoked Maven from the correct directory. - [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MissingProjectException
C:\Program Files\Apache Software Foundation\spark-branch-1.4\build
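The MissingProjectException above comes from invoking Maven inside build/, which only holds the mvn/sbt wrapper scripts; the pom.xml lives in the repository root, so that is where the build must be started. A minimal sketch of the situation (the /tmp/spark-demo path and toy layout are assumptions for illustration, not the real Spark tree):

```shell
# Recreate a toy layout like a Spark checkout: POM in the root, wrappers in build/.
rm -rf /tmp/spark-demo
mkdir -p /tmp/spark-demo/build
touch /tmp/spark-demo/pom.xml

# From inside build/ there is no pom.xml - this mirrors the failure above.
cd /tmp/spark-demo/build
[ -f pom.xml ] && echo "POM here" || echo "no POM in $(pwd)"

# From the repository root the POM is found; invoke the build from here, e.g.:
#   cd /tmp/spark-demo && build/mvn -Pyarn -Phadoop-2.4 -DskipTests clean package
cd /tmp/spark-demo
[ -f pom.xml ] && echo "POM in $(pwd)"
```

This also explains the "requested profile could not be activated" warnings: without the root POM, Maven has no project and therefore no profiles to activate.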
DataFrames coming in SparkR in Apache Spark 1.4.0
For the impatient R user, here is a link to get started working with DataFrames using SparkR: http://people.apache.org/~pwendell/spark-nightly/spark-1.4-docs/latest/sparkr.html

Happy coding,
Daniel

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DataFrames-coming-in-SparkR-in-Apache-Spark-1-4-0-tp23116.html
Re: IDE for SparkR
RStudio is the best IDE for running SparkR. Instructions for this can be found at this link: https://github.com/apache/spark/tree/branch-1.4/R . You will need to set some environment variables as described below.

*Using SparkR from RStudio*

If you wish to use SparkR from RStudio or other R frontends, you will need to set some environment variables which point SparkR to your Spark installation. For example:

# Set this to where Spark is installed
Sys.setenv(SPARK_HOME = "/Users/shivaram/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master = "local")

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/IDE-for-sparkR-tp4764p23115.html
Book: Data Analysis with SparkR
Is there a book on SparkR for the absolutely terrified beginner? I use R for my daily analysis and I am interested in a detailed guide to using SparkR for data analytics: a book or online tutorials. If there is one, please point me to it.

Thanks,
Daniel

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Book-Data-Analysis-with-SparkR-tp19529.html