2015-02-06 17:28 GMT+00:00 King sami kgsam...@gmail.com:
The purpose is to build a data processing system for door events. An event
will describe a door unlocking
with a badge system. This event will differentiate unlocking by somebody
from the inside and by somebody
from the outside.
*Producing the events*:
You will need a simulator capable of producing events at random intervals.
Simulating 200 doors seems like
a good number, but adapt it as you see fit to get relevant results. Make
sure different doors have different
patterns to make the analysis interesting.
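A minimal sketch of such a simulator in plain Scala, under assumptions that are not in the assignment: the names DoorEvent, DoorSimulator, simulateDoor and simulateDay are hypothetical, timestamps are seconds since the start of the simulated day, and the CSV layout "doorId,timestamp,IN|OUT" is invented for illustration. Each door draws its own mean gap between events and its own entrance probability, which is one possible way to give doors different patterns.

```scala
import scala.util.Random

// Direction of an unlock (hypothetical names, not from the assignment).
sealed trait Direction
case object In extends Direction   // badge used from the outside: an entrance
case object Out extends Direction  // badge used from the inside: an exit

// One door event; timestamp is seconds since the start of the simulated day.
final case class DoorEvent(doorId: Int, timestamp: Long, direction: Direction) {
  // Assumed CSV layout: "doorId,timestamp,IN|OUT".
  def toCsv: String = s"$doorId,$timestamp,${if (direction == In) "IN" else "OUT"}"
}

object DoorSimulator {
  val SecondsPerDay = 86400L

  // Simulates one day for one door. The mean gap and the entrance
  // probability are drawn per door, so doors behave differently.
  def simulateDoor(doorId: Int, rng: Random): Vector[DoorEvent] = {
    val meanGapSeconds = 60.0 + rng.nextInt(3600)
    val entranceProbability = 0.2 + 0.6 * rng.nextDouble()
    val events = Vector.newBuilder[DoorEvent]
    var t = 0.0
    var running = true
    while (running) {
      // Exponentially distributed gap, i.e. a random interval around the door's mean.
      t += -math.log(1.0 - rng.nextDouble()) * meanGapSeconds
      if (t < SecondsPerDay) {
        val direction = if (rng.nextDouble() < entranceProbability) In else Out
        events += DoorEvent(doorId, t.toLong, direction)
      } else {
        running = false
      }
    }
    events.result()
  }

  // Simulates all doors for one day and returns the events in time order.
  def simulateDay(doorCount: Int, seed: Long): Vector[DoorEvent] = {
    val rng = new Random(seed)
    (1 to doorCount).toVector.flatMap(id => simulateDoor(id, rng)).sortBy(_.timestamp)
  }
}
```

Calling DoorSimulator.simulateDay(200, 42L).map(_.toCsv) would give the lines of one day's log file; the fixed seed only makes runs repeatable.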
*Processing the events:*
After having accumulated a certain number of events (for example: a day),
you will calculate statistics. To do
this, you will use Spark for your batch processing. You will extract:
• most used door, least used door, door with most exits, door with most
entrances
• most and least busy moments (when people entered and exited a lot, or not
at all)
• least busy moment of the day
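The statistics listed above could be sketched with Spark's RDD API roughly as follows. This is an illustration under assumptions, not a reference solution: the object name DoorStats, the helpers parseLine and hourOf, and the input layout "doorId,timestampSeconds,IN|OUT" (timestamp counted in seconds since midnight) are all hypothetical, and a "moment" is taken to mean an hour of the day.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on Spark versions before 1.3

object DoorStats {
  // Assumed line layout: "doorId,timestampSeconds,IN|OUT"; malformed lines are dropped.
  def parseLine(line: String): Option[(Int, Long, String)] =
    line.split(",").map(_.trim) match {
      case Array(door, ts, dir) if dir == "IN" || dir == "OUT" =>
        scala.util.Try((door.toInt, ts.toLong, dir)).toOption
      case _ => None
    }

  // Hour of day (0 to 23) for a timestamp in seconds since midnight.
  def hourOf(timestampSeconds: Long): Int = ((timestampSeconds % 86400L) / 3600L).toInt

  def main(args: Array[String]): Unit = {
    val inputPath = args(0) // a local path or an hdfs:// URI
    val sc = new SparkContext(new SparkConf().setAppName("DoorStats"))
    try {
      val events = sc.textFile(inputPath).flatMap(l => parseLine(l).toList).cache()

      // Events per door. Doors with no events at all never appear in the log,
      // so "least used" here means least used among doors seen at least once.
      val perDoor = events.map { case (door, _, _) => (door, 1L) }.reduceByKey(_ + _)
      val mostUsed = perDoor.reduce((a, b) => if (a._2 >= b._2) a else b)
      val leastUsed = perDoor.reduce((a, b) => if (a._2 <= b._2) a else b)

      // Exits and entrances per door. reduce throws on an empty RDD, so this
      // sketch assumes the log contains at least one event of each kind.
      def busiestDoor(dir: String): (Int, Long) =
        events.filter(_._3 == dir).map(e => (e._1, 1L)).reduceByKey(_ + _)
          .reduce((a, b) => if (a._2 >= b._2) a else b)
      val mostExits = busiestDoor("OUT")
      val mostEntrances = busiestDoor("IN")

      // Events per hour; at most 24 keys, so collecting to the driver is cheap.
      // Hours without any event are filled in with a count of 0.
      val perHour = events.map(e => (hourOf(e._2), 1L)).reduceByKey(_ + _).collectAsMap()
      val hourCounts = (0 to 23).map(h => h -> perHour.getOrElse(h, 0L))
      val busiestHour = hourCounts.maxBy(_._2)
      val quietestHour = hourCounts.minBy(_._2)

      println(s"most used door: $mostUsed, least used door: $leastUsed")
      println(s"most exits: $mostExits, most entrances: $mostEntrances")
      println(s"busiest hour: $busiestHour, quietest hour: $quietestHour")
    } finally {
      sc.stop()
    }
  }
}
```

Packaged as a jar, a job like this is normally launched with spark-submit, passing the log file path as the first argument.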
*Hints:*
• Spark is required: http://spark.apache.org
• Coding in Scala is required.
• Using HDFS for file storage is a plus.
2015-02-06 17:00 GMT+00:00 Nagesh sarvepalli sarvepalli.nag...@gmail.com:
Hi,
Here is the sequence I suggest. Feel free to ask if you need further help.
1) You need to decide whether to go with a particular distribution
of Hadoop (Cloudera / Hortonworks / MapR) or with the plain Apache version.
Downloading Hadoop from Apache and integrating it with various projects is
laborious (compared to distributions). Also, you need to take care of
maintenance, including version compatibility of the various projects. Cloudera
Manager is the best when it comes to cluster installation and maintenance,
but it is memory intensive. Cloud offerings (e.g. from Microsoft) are even
simpler and more hassle-free when it comes to installation and
maintenance.
2) Depending on the server resources and the data size, you need to
decide on the HDFS cluster size (number of nodes). Ensure you have the
right JDK version installed if you are installing Hadoop on your own.
3) Once Hadoop is installed, you need to download Scala from
scala-lang.org, and then
4) Download and install Spark from http://spark.apache.org/downloads.html
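Once the steps above are done, a small job such as the following sketch can confirm that Scala and Spark work together. The object name InstallCheck and the helper evenCount are hypothetical, and the master is set to local[2] so that the check runs inside a single JVM without a cluster or HDFS.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object InstallCheck {
  // Counts the even numbers in 1..1000 on the given context; the result should be 500.
  def evenCount(sc: SparkContext): Long =
    sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()

  def main(args: Array[String]): Unit = {
    // local[2] runs Spark in this JVM with two worker threads.
    val sc = new SparkContext(new SparkConf().setAppName("InstallCheck").setMaster("local[2]"))
    try {
      println(s"Scala ${scala.util.Properties.versionNumberString}, Spark ${sc.version}")
      println(s"Even numbers in 1..1000: ${evenCount(sc)}")
    } finally {
      sc.stop()
    }
  }
}
```

The body of evenCount can also be pasted into spark-shell, where a SparkContext named sc is already provided.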
Hope this helps you get started.
Thanks & Regards
Nagesh
Cloudera Certified Hadoop Developer
On Fri, Feb 6, 2015 at 4:09 PM, King sami kgsam...@gmail.com wrote:
Hi,
I'm new to Spark, and I'd like to install Spark with Scala. The aim is to
build a data processing system for door events.
The first step is to install Spark, Scala, HDFS and the other required tools.
The second is to build the algorithm program in Scala, which can process a
file of my data logs (events).
Could you please help me install the required tools (Spark, Scala,
HDFS) and tell me how I can execute my program on the input file?
Best regards,