Re: Beginner in Spark
You may also go through these posts:
https://docs.sigmoidanalytics.com/index.php/Spark_Installation

Thanks
Best Regards

On Fri, Feb 6, 2015 at 9:39 PM, King sami kgsam...@gmail.com wrote:

Hi,

I'm new to Spark and I'd like to install Spark with Scala. The aim is to build a data processing system for door events. The first step is to install Spark, Scala, HDFS and the other required tools. The second is to write the algorithm in Scala so that it can process a file of my data logs (events).

Could you please help me install the required tools (Spark, Scala, HDFS) and tell me how I can run my program on the input file?

Best regards,
Re: Beginner in Spark
Refer to this blog for a step-by-step installation of Spark on Ubuntu:
http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/

On 7 February 2015 at 03:12, Matei Zaharia matei.zaha...@gmail.com wrote:

You don't need HDFS or virtual machines to run Spark. You can just download it, unzip it and run it on your laptop. See http://spark.apache.org/docs/latest/index.html.

Matei
Re: Beginner in Spark
2015-02-06 17:28 GMT+00:00 King sami kgsam...@gmail.com:

The purpose is to build a data processing system for door events. An event describes a door being unlocked with a badge system, and distinguishes unlocking by somebody on the inside from unlocking by somebody on the outside.

*Producing the events:*
You will need a simulator capable of producing events at random intervals. Simulating 200 doors seems like a good number, but adapt it as you see fit to get relevant results. Make sure different doors have different patterns to make the analysis interesting.

*Processing the events:*
After having accumulated a certain amount of events (for example, a day's worth), you will calculate statistics. To do this, you will use Spark for your batch processing. You will extract:
• most used door, least used door, door with most exits, door with most entrances
• busiest and least busy moments (when people entered and exited a lot, or not at all)
• least busy moment of the day

*Hints:*
• Spark is required: http://spark.apache.org
• Coding in Scala is required.
• Using HDFS for file storage is a plus.

2015-02-06 17:00 GMT+00:00 Nagesh sarvepalli sarvepalli.nag...@gmail.com:

Hi,

Here is the sequence I suggest. Feel free to ask if you need further help.

1) Decide whether you want to go with a particular Hadoop distribution (Cloudera / Hortonworks / MapR) or with the Apache version. Downloading Hadoop from Apache and integrating it with the various projects is laborious compared to using a distribution. You also need to take care of maintenance, including version compatibility between the projects. Cloudera Manager is the best when it comes to cluster installation and maintenance, but it is memory intensive. Cloud offerings (e.g. from Microsoft) are even simpler and more hassle-free when it comes to installation and maintenance.
2) Depending on the server resources and the data size, decide on the HDFS cluster size (number of nodes). Ensure you have the right JDK version installed if you are installing Hadoop on your own.
3) Once Hadoop is installed, download Scala from scala-lang.org.
4) Download and install Spark from http://spark.apache.org/downloads.html

Hope this helps you get started.

Thanks
Regards
Nagesh
Cloudera Certified Hadoop Developer
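Editor's note: the "Producing the events" requirement quoted above can be sketched in plain Scala, without Spark. This is only a minimal illustration under stated assumptions. The thread does not define an event format, so the `DoorEvent` fields, the CSV layout (`timestampMs,doorId,direction`), the `IN`/`OUT` labels and the one-minute maximum gap are all hypothetical choices, not part of the original assignment.

```scala
import scala.util.Random

// Hypothetical event shape: the thread does not define a log format.
final case class DoorEvent(timestampMs: Long, doorId: Int, direction: String) {
  // One CSV line per event: timestampMs,doorId,direction
  def toCsv: String = s"$timestampMs,$doorId,$direction"
}

object DoorEventSimulator {
  // Generates `count` events separated by random gaps, starting at `startMs`.
  // Each door gets its own random weight so that doors show different usage patterns.
  def generate(count: Int, doors: Int, startMs: Long, seed: Long): Vector[DoorEvent] = {
    val rng = new Random(seed)
    // Positive per-door weight; doors with larger weights are picked more often.
    val weights = Vector.fill(doors)(rng.nextDouble() + 1e-3)
    val total = weights.sum
    // Cumulative distribution used to pick a door in proportion to its weight.
    val cumulative = weights.scanLeft(0.0)(_ + _).tail.map(_ / total)

    var now = startMs
    Vector.fill(count) {
      now += 1 + rng.nextInt(60000) // random gap of 1 ms up to one minute (assumed)
      val r = rng.nextDouble()
      val idx = cumulative.indexWhere(r <= _)
      val door = if (idx >= 0) idx else doors - 1 // guard against floating-point rounding
      val direction = if (rng.nextBoolean()) "IN" else "OUT"
      DoorEvent(now, door, direction)
    }
  }

  def main(args: Array[String]): Unit = {
    // Print a few sample events; redirect the output to a file to build an input log.
    generate(count = 10, doors = 200, startMs = 0L, seed = 42L).foreach(e => println(e.toCsv))
  }
}
```

A fixed seed keeps runs reproducible, which helps when comparing the statistics computed later. A more realistic simulator would probably also vary the entry/exit ratio per door and per time of day.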
Re: Beginner in Spark
You don't need HDFS or virtual machines to run Spark. You can just download it, unzip it and run it on your laptop. See http://spark.apache.org/docs/latest/index.html.

Matei

On Feb 6, 2015, at 2:58 PM, David Fallside falls...@us.ibm.com wrote:

King, consider trying the Spark Kernel (https://github.com/ibm-et/spark-kernel), which will install Spark etc. and provide you with a Spark/Scala notebook in which you can develop your algorithm. The Vagrant installation described in https://github.com/ibm-et/spark-kernel/wiki/Vagrant-Development-Environment will have you quickly up and running on a single machine without having to manage the details of the system installations. There is a Docker version, https://github.com/ibm-et/spark-kernel/wiki/Using-the-Docker-Container-for-the-Spark-Kernel, if you prefer Docker.

Regards,
David
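Editor's note: to illustrate the point above that Spark can run on a laptop without HDFS, here is a minimal sketch of a batch job in local mode that reads a plain local file. It assumes Spark is on the classpath, and it assumes a hypothetical input format of one event per line, `timestampMs,doorId,direction`, with `direction` being `IN` or `OUT`. Neither the format nor the `DoorStats` and `events.csv` names come from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on Spark versions before 1.3

object DoorStats {
  // Parses one line of the assumed format; malformed lines yield None.
  def parse(line: String): Option[(String, String)] = line.split(",") match {
    case Array(_, door, direction) => Some((door.trim, direction.trim))
    case _                         => None
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical input path; a local file is enough, no HDFS required.
    val input = if (args.nonEmpty) args(0) else "events.csv"

    // local[*] runs Spark inside this JVM using all available cores.
    val conf = new SparkConf().setAppName("DoorStats").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val events = sc.textFile(input).flatMap(line => parse(line).toList).cache()

      // Event counts per door, collected to the driver (small: one entry per door).
      val perDoor = events.map { case (door, _) => (door, 1L) }.reduceByKey(_ + _).collect()
      val exitsPerDoor = events
        .filter { case (_, direction) => direction == "OUT" }
        .map { case (door, _) => (door, 1L) }
        .reduceByKey(_ + _)
        .collect()

      // Doors that never appear in the log are not counted here.
      if (perDoor.nonEmpty) {
        println(s"Most used door:  ${perDoor.maxBy(_._2)}")
        println(s"Least used door: ${perDoor.minBy(_._2)}")
      }
      if (exitsPerDoor.nonEmpty) println(s"Door with most exits: ${exitsPerDoor.maxBy(_._2)}")
    } finally {
      sc.stop()
    }
  }
}
```

Once packaged into a jar, a job like this could be launched with something like `spark-submit --class DoorStats <path-to-jar> events.csv`, where the jar path is a placeholder. Because the master is hard-coded to `local[*]` in this sketch, it would need to be removed from the code before running on a cluster. Reading from HDFS instead should only require passing an `hdfs://` path as the input.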
Re: Beginner in Spark
King, consider trying the Spark Kernel (https://github.com/ibm-et/spark-kernel), which will install Spark etc. and provide you with a Spark/Scala notebook in which you can develop your algorithm. The Vagrant installation described in https://github.com/ibm-et/spark-kernel/wiki/Vagrant-Development-Environment will have you quickly up and running on a single machine without having to manage the details of the system installations. There is a Docker version, https://github.com/ibm-et/spark-kernel/wiki/Using-the-Docker-Container-for-the-Spark-Kernel, if you prefer Docker.

Regards,
David

King sami kgsam...@gmail.com wrote on 02/06/2015 08:09:39 AM:

From: King sami kgsam...@gmail.com
To: user@spark.apache.org
Date: 02/06/2015 08:11 AM
Subject: Beginner in Spark

Hi,

I'm new to Spark and I'd like to install Spark with Scala. The aim is to build a data processing system for door events. The first step is to install Spark, Scala, HDFS and the other required tools. The second is to write the algorithm in Scala so that it can process a file of my data logs (events).

Could you please help me install the required tools (Spark, Scala, HDFS) and tell me how I can run my program on the input file?

Best regards,
Beginner in Spark
Hi,

I'm new to Spark and I'd like to install Spark with Scala. The aim is to build a data processing system for door events. The first step is to install Spark, Scala, HDFS and the other required tools. The second is to write the algorithm in Scala so that it can process a file of my data logs (events).

Could you please help me install the required tools (Spark, Scala, HDFS) and tell me how I can run my program on the input file?

Best regards,