Re: Jr. to Mid Level Big Data jobs in Bay Area

2015-05-17 Thread Juan Suero
He's a human asking for human advice.. it's ok methinks.
We should live in a more tolerant world.
Thanks.

On Sun, May 17, 2015 at 8:10 PM, Stephen Boesch java...@gmail.com wrote:

 Hi,  This is not a job board. Thanks.

 2015-05-17 16:00 GMT-07:00 Adam Pritchard apritchard...@gmail.com:

 Hi everyone,

 I was wondering if any of you know any openings looking to hire a big
 data dev in the Palo Alto area.

 Main thing I am looking for is to be on a team that will embrace having a
 Jr to Mid level big data developer, where I can grow my skill set and
 contribute.


 My skills are:

 3 years Java
 1.5 years Hadoop
 1.5 years Hbase
 1 year map reduce
 1 year Apache Storm
 1 year Apache Spark (did a Spark Streaming project in Scala)

 5 years PHP
 3 years iOS development
 4 years Amazon ec2 experience


 Currently I am working in San Francisco as a big data developer, but the
 team I'm on is content to leave me with work I already knew how to do when
 I joined (web services), and I want to work with big data technologies at
 least 70% of the time.


 I am not a senior big data dev, but I am motivated to be and am just
 looking for an opportunity where I can work all day or most of the day with
 big data technologies, and contribute and learn from the project at hand.


 Thanks if anyone can share any information,


 Adam






Re: Best practice for EC2 deployment

2013-10-26 Thread Juan Suero
Everything fails
On Saturday, October 26, 2013, Trev Smith wrote:

 Hi all,

 Could anyone direct me to a resource (or perhaps give me their own
 thoughts) on best practice for deploying a robust, resilient Hadoop
 (specifically CDH in this case) cluster to AWS? The data is important to us
 and we expect to store it for a long time. We want to make sure we are not
 impacted by outages in single availability zones and we want to implement a
 sensible backup/disaster recovery plan.

 We are using HBase in addition to HDFS but not Hive, at present.

 Your thoughts much appreciated.

 Regards,
 Trevor Smith



Re: Project ideas

2013-05-21 Thread Juan Suero
I'm a newbie, but maybe this will also add some value: my understanding is
that MapReduce is like a distributed GROUP BY statement.
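To make that concrete, here is a tiny single-machine sketch (in Python, purely for illustration; real Hadoop jobs would be written in Java or via Streaming) of how the map, shuffle, and reduce phases implement a GROUP BY with a count. The `country` field is a made-up example key:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (key, 1) pair per record, like the column in a GROUP BY
    for rec in records:
        yield rec["country"], 1

def shuffle(pairs):
    # Shuffle: group all values by key, so each reducer sees one key's values
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each group's values, like COUNT(*) per group
    return {key: sum(values) for key, values in groups.items()}

records = [{"country": "US"}, {"country": "SG"}, {"country": "US"}]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'US': 2, 'SG': 1}
```

Hadoop runs the map and reduce phases on many machines at once and does the shuffle over the network, but the logical shape is the same.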

When you run a statement like that against petabytes of data it can take a
long time, first and foremost because before you apply the GROUP BY logic
you have to read the data off disk.

If your disk reads at 100 MB/s you can do the math: the query will take at
least that long to complete.

If you need the result really fast, say within the next hour to support
personalization features on an ecommerce site, or for a month-end report
that has to be done in 2 hours, then it would be nice to put equal parts of
your data on hundreds of disks and run the same algorithm in parallel.
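The back-of-the-envelope math looks something like this (1 PB and 100 disks are just illustrative numbers):

```python
# Rough lower bound on scan time: data size / aggregate read throughput
PB = 10**15          # bytes in a petabyte (decimal)
MB = 10**6

data_size = 1 * PB
disk_throughput = 100 * MB           # 100 MB/s per disk

one_disk_seconds = data_size / disk_throughput
print(one_disk_seconds / 3600 / 24)  # ~115.7 days on a single disk

hundred_disks_seconds = one_disk_seconds / 100   # ideal parallel speedup
print(hundred_disks_seconds / 3600 / 24)         # ~1.16 days across 100 disks
```

Real jobs never hit the ideal speedup (shuffle, stragglers, replication overhead), but the order of magnitude is why you spread the data out.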

But that's just if your bottleneck is disk. What if your dataset is
relatively small, but the calculation done on each incoming element is
large? Then your bottleneck is CPU power.

There are a lot of bottlenecks you could run into:
number of threads
memory
latency of remote APIs or a remote database you hit as you analyze the data
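For the CPU-bound case, the same divide-and-conquer idea applies even on one machine with the Python standard library; `expensive_calc` here is just a stand-in for whatever heavy per-element work you have:

```python
from concurrent.futures import ProcessPoolExecutor

def expensive_calc(x):
    # Stand-in for a CPU-heavy per-element computation
    total = 0
    for i in range(10_000):
        total += (x * i) % 7
    return total

if __name__ == "__main__":
    data = list(range(8))
    # Spread per-element work across CPU cores, analogous to
    # spreading disk reads across many disks
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(expensive_calc, data))
    print(len(results))  # 8
```

A cluster framework does the same thing across machines instead of cores, and also has to worry about moving the data to the workers (or the work to the data).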

There's a book called Programming Collective Intelligence from O'Reilly
that should help you out too:
http://shop.oreilly.com/product/9780596529321.do



On Tue, May 21, 2013 at 11:02 PM, Sai Sai saigr...@yahoo.in wrote:

 Excellent Sanjay, really excellent input. Many Thanks for this input.
 I have been always thinking about some ideas but never knowing what to
 proceed with.
 Thanks again.
 Sai

   --
  *From:* Sanjay Subramanian sanjay.subraman...@wizecommerce.com
 *To:* user@hadoop.apache.org user@hadoop.apache.org
 *Sent:* Tuesday, 21 May 2013 11:51 PM
 *Subject:* Re: Project ideas

  +1

 My $0.02 is to look around and see problems u can solve…It's better to
 get a list of problems and see if u can model a solution using the
 map-reduce framework

  An example is as follows

 PROBLEM
 Build a car pricing model based on advertisements on Craigslist

 OBJECTIVE
 Recommend a price to the Craigslist car seller when the user gives info
 about make, model, color, miles

 DATA required
 Collect RSS feeds daily from Craigslist (don't pound their website, else
 they will lock u down)

  DESIGN COMPONENTS
 - Daily RSS Collector - pulls data and puts into HDFS
 - Data Loader - Structures the columns u need to analyze and puts into HDFS
 - Hive Aggregator and analyzer - studies and queries data and brings out
 recommendation models for car pricing
 - REST Web service to return query results in XML/JSON
 - iPhone App that talks to web service and gets info
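The collector step above might start as something like this sketch (Python just for illustration; the feed, URL, and field layout are fabricated, since the real Craigslist feed format would need to be checked):

```python
import xml.etree.ElementTree as ET

def parse_listings(rss_xml):
    """Pull (title, link) pairs out of an RSS feed. The car attributes
    (make, model, miles, price) would be extracted downstream before
    loading structured columns into HDFS for Hive."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        yield item.findtext("title"), item.findtext("link")

# A tiny fabricated feed, just to show the shape of the data
sample = """<rss><channel>
  <item><title>2009 Honda Civic - $7500</title>
    <link>http://example.org/ad/1</link></item>
</channel></rss>"""

for title, link in parse_listings(sample):
    print(title, link)
```

From there the Data Loader would normalize titles into columns (make, model, year, price) and the Hive layer would aggregate over them.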

  There u go…this should keep a couple of students busy for 3 months

 I find this kind of problem statement and solution simpler to understand
 because it's all there in the real world!

  An example of my way of thinking led to me founding this non profit
 called www.medicalsidefx.org that gives users valuable metrics regarding
 medical side fx.
 It uses Hadoop to aggregate and Lucene to search…This year I am
 redesigning the core to use Hive :-)

  Good luck

  Sanjay





   From: Michael Segel michael_se...@hotmail.com
 Reply-To: user@hadoop.apache.org user@hadoop.apache.org
 Date: Tuesday, May 21, 2013 6:46 AM
 To: user@hadoop.apache.org user@hadoop.apache.org
 Subject: Re: Project ideas

  Drink heavily?

  Sorry.

  Let me rephrase.

  Part of the exercise is for you, the student to come up with the idea.
 Not solicit someone else for a suggestion.  This is how you learn.

  The exercise is to get you to think about the following:

  1) What is Hadoop
 2) How does it work
 3) Why would you want to use it

  You need to understand #1 and #2 to be able to #3.

  But at the same time... you need to also incorporate your own view of
 the world.
 What are your hobbies? What do you like to do?
 What scares you the most?  What excites you the most?
 Why are you here?
 And most importantly, what do you think you can do within the time period.
 (What data can you easily capture and work with...)

  Have you ever seen 'Eden of the East' ? ;-)

  HTH


  On May 21, 2013, at 8:35 AM, Anshuman Mathur ans...@gmail.com wrote:

  Hello fellow users,
 We are a group of students studying in National University of Singapore.
 As part of our course curriculum we need to develop an application using
 Hadoop and  map-reduce. Can you please suggest some innovative ideas for
 our project?
 Thanks in advance.
 Anshuman


