Which versions of Hadoop does Spark support?

2014-10-04 Thread tomo cocoa
Hi,

I reported an issue (https://issues.apache.org/jira/browse/SPARK-3794)
about a compilation error in Spark core.
Whether the error occurs depends on the Hadoop version; the problematic
versions are 1.1.1 through 2.2.0.

First, we should decide which versions of Hadoop Spark supports.
If we decide to drop support for those versions, things are simple and
no modification is needed.
Otherwise, we should be careful to use only APIs available in commons-io 2.1.
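For illustration, here is a minimal Python sketch of how a build script might flag whether a given Hadoop version falls in that range. It assumes both bounds are inclusive and compares only the leading major.minor.patch numbers, ignoring vendor suffixes. `parse_hadoop_version` and `in_problematic_range` are hypothetical helpers, not part of Spark's build.

```python
import re

# Bounds reported in SPARK-3794; assumed here to be inclusive on both ends.
PROBLEMATIC_MIN = (1, 1, 1)
PROBLEMATIC_MAX = (2, 2, 0)


def parse_hadoop_version(version: str) -> tuple:
    """Return (major, minor, patch) from the leading numeric part of a version.

    Vendor suffixes such as "-cdh4.5.0" are ignored, and a missing patch
    number is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Hadoop version: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def in_problematic_range(version: str) -> bool:
    """Hypothetical helper: True if the version lies within 1.1.1 through 2.2.0."""
    return PROBLEMATIC_MIN <= parse_hadoop_version(version) <= PROBLEMATIC_MAX


if __name__ == "__main__":
    for v in ["1.0.4", "1.1.1", "2.0.0-cdh4.5.0", "2.2.0", "2.4.0"]:
        print(v, in_problematic_range(v))
```

Tuple comparison does the ordering here, so (2, 0, 0) sits between (1, 1, 1) and (2, 2, 0) without any special-casing of major versions.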


Regards,
cocoatomo

-- 
class Cocoatomo:
name = 'cocoatomo'
email_address = 'cocoatom...@gmail.com'
twitter_id = '@cocoatomo'


Re: EC2 clusters ready in launch time + 30 seconds

2014-10-04 Thread Nicholas Chammas
Thanks for posting that script, Patrick. It looks like a good place to
start.

Regarding Docker vs. Packer, as I understand it, you can use Packer to
create Docker containers at the same time as AMIs and other image types.
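As a rough illustration, here is a minimal Python sketch that emits a single Packer JSON template with both an `amazon-ebs` and a `docker` builder sharing one shell provisioner. It also follows the "base AMI id + packages + configuration changes" decomposition suggested further down this thread. The AMI id, region, instance type, base Docker image, package names, and commands are all placeholder assumptions, not what spark-ec2 actually uses.

```python
import json

# Placeholder inputs (hypothetical): substitute a real base AMI id, package
# list, and configuration commands before using this.
BASE_AMI = "ami-00000000"
PACKAGES = ["java-1.7.0-openjdk-devel", "git"]
CONFIG_COMMANDS = ["mkdir -p /root/spark-ec2"]


def build_packer_template(base_ami, packages, config_commands):
    """Return a Packer JSON template: two builders, one shared shell provisioner."""
    install = "yum install -y " + " ".join(packages)
    return {
        "builders": [
            {
                # Builds an EBS-backed AMI starting from base_ami.
                "type": "amazon-ebs",
                "region": "us-east-1",
                "source_ami": base_ami,
                "instance_type": "m3.large",
                "ssh_username": "ec2-user",
                "ami_name": "spark-base-{{timestamp}}",
            },
            {
                # Builds a Docker image; commit=True commits the container to an image.
                "type": "docker",
                "image": "centos:6",
                "commit": True,
            },
        ],
        # Provisioners run against every builder unless restricted with "only"/"except".
        "provisioners": [
            {
                "type": "shell",
                "inline": [install] + list(config_commands),
                # On EC2 we log in as ec2-user, so run the script through sudo
                # there; the Docker build already runs as root.
                "override": {
                    "amazon-ebs": {
                        "execute_command": "{{ .Vars }} sudo -E sh '{{ .Path }}'"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    template = build_packer_template(BASE_AMI, PACKAGES, CONFIG_COMMANDS)
    # Save this output as e.g. spark.json and run `packer build spark.json`.
    print(json.dumps(template, indent=2))
```

A single `packer build` on the saved template would build both images in one pass; `packer build -only=amazon-ebs spark.json` would restrict it to the AMI.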

Nick


On Sat, Oct 4, 2014 at 2:49 AM, Patrick Wendell wrote:

> Hey All,
>
> Just a couple of notes. I recently posted a shell script for creating the
> AMIs from a clean Amazon Linux AMI.
>
> https://github.com/mesos/spark-ec2/blob/v3/create_image.sh
>
> I think I will update the AMIs soon to get the most recent security
> updates. For spark-ec2's purpose this is probably sufficient (we'll
> only need to re-create them every few months).
>
> However, it would be cool if someone wanted to tackle providing a more
> general mechanism for defining Spark-friendly "images" that can be
> used more generally. I had thought that Docker might be a good way to
> go for something like this, but maybe this Packer thing is good too.
>
> For one thing, if we had a standard image we could use it to create
> containers for running Spark's unit tests, which would be really cool.
> This would help a lot with the random port and filesystem contention
> issues we have with unit tests.
>
> I'm not sure whether the long-term home for this would be inside the Spark
> codebase, a community library, or something else. But it would definitely
> be very valuable to have if someone wanted to take it on.
>
> - Patrick
>
> On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas wrote:
> > FYI: There is an existing issue -- SPARK-3314 -- about scripting the
> > creation of Spark AMIs.
> >
> > With Packer, it looks like we may be able to script the creation of
> > multiple image types (VMware, GCE, AMI, Docker, etc.) at once from a
> > single Packer template. That's very cool.
> >
> > I'll be looking into this.
> >
> > Nick
> >
> >
> > On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas <
> > nicholas.cham...@gmail.com> wrote:
> >
> >> Thanks for the update, Nate. I'm looking forward to seeing how these
> >> projects turn out.
> >>
> >> David, Packer looks very, very interesting. I'm gonna look into it more
> >> next week.
> >>
> >> Nick
> >>
> >>
> >> On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico wrote:
> >>
> >>> A bit of progress on our end, and a bit of lag as well. Our guy
> >>> leading the effort got a little bogged down on a client project to
> >>> update a Hive/SQL testbed to the latest Spark/Spark SQL, and we are
> >>> also launching a public service, so we have been a bit scattered
> >>> recently.
> >>>
> >>> We will probably have more updates after next week. We are planning on
> >>> taking our client work around Hive/Spark, plus taking over the Bigtop
> >>> automation work to modernize it and make it fit for human consumption
> >>> outside our org. All our work and Puppet modules will be open-sourced
> >>> and documented; hopefully that will rally some other folks who find it
> >>> useful around the effort.
> >>>
> >>> Side note: another effort we are looking into is Gradle tests/support.
> >>> We have been leveraging Serverspec for some basic infrastructure tests,
> >>> but with Bigtop switching over to a Gradle build/testing setup in 0.8,
> >>> we want to include support for that in our own efforts. There is
> >>> probably some stuff there that can be learned and leveraged in the
> >>> Spark world for repeatable, tested infrastructure.
> >>>
> >>> If anyone has automation questions specific to your environment, you
> >>> can drop me a line directly and I will try to help out as best I can.
> >>> Otherwise, I will post an update to the dev list once we get on top of
> >>> our own product release and the Bigtop work.
> >>>
> >>> Nate
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: David Rowe [mailto:davidr...@gmail.com]
> >>> Sent: Thursday, October 02, 2014 4:44 PM
> >>> To: Nicholas Chammas
> >>> Cc: dev; Shivaram Venkataraman
> >>> Subject: Re: EC2 clusters ready in launch time + 30 seconds
> >>>
> >>> I think this is exactly what Packer is for. See e.g.
> >>> http://www.packer.io/intro/getting-started/build-image.html
> >>>
> >>> On a related note, the current AMI for HVM systems (e.g. m3.*, r3.*)
> >>> has a bad package for httpd, which causes Ganglia not to start. For
> >>> some reason I can't get access to the raw AMI to fix it.
> >>>
> >>> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas <
> >>> nicholas.cham...@gmail.com> wrote:
> >>>
> >>> > Is there perhaps a way to define an AMI programmatically? Like, a
> >>> > collection of base AMI id + list of required stuff to be installed +
> >>> > list of required configuration changes. I'm guessing that's what
> >>> > people use things like Puppet, Ansible, or maybe also AWS
> >>> > CloudFormation for, right?
> >>> >
> >>> > If we could do something like that, then with every new release of
> >>> > Spark we could quickly and easily create new AMIs that have
> >>> > everything we need. spark-ec2 would only have to bring up the
> >>> > instances and do a minimal
> >>> >