Re: Mahout + BigDataR Linux

2012-06-02 Thread Ted Dunning
In my experience with ZooKeeper, dealing with the Debian personalities was
a significant issue.  On top of that, LOTS of problems come up with the
range of platforms that Debian runs on.

Just getting the dependencies in place is non-trivial.  Remember that we
depend on Hadoop.  Getting Hadoop into Debian is a non-starter.

On Sat, Jun 2, 2012 at 5:20 PM, Nicholas Kolegraff
wrote:

> I glanced over the links you sent, seems some of the overhead would be in
> earning 'merit badges' to get approved for uploading your packages to the
> debian archives.  Otherwise, we'd need to find someone from debian
> interested in sponsoring the packages.  Seems the overhead is more
> operations-y than technical.
>
> In this case, I think the repo would be a quick win.
>
> On Sat, Jun 2, 2012 at 7:35 AM, Isabel Drost  wrote:
>
> > On 01.06.2012 Ted Dunning wrote:
> > > Jumping through the hoops to get Debian to approve a Java project is a
> > > lot of work for very little gain that I see.
> >
> > I don't see that much overhead (other than the need to have all
> > dependencies in as well): when getting started building deb packages,
> > at least considering following the Debian documentation and getting
> > feedback from people who have done that before could prove beneficial.
> >
> > Another way to get started quickly would be to host the results in an
> > Ubuntu PPA.
> >
> >
> > Isabel
> >
>


Re: Mahout + BigDataR Linux

2012-06-02 Thread Nicholas Kolegraff
I glanced over the links you sent, seems some of the overhead would be in
earning 'merit badges' to get approved for uploading your packages to the
debian archives.  Otherwise, we'd need to find someone from debian
interested in sponsoring the packages.  Seems the overhead is more
operations-y than technical.

In this case, I think the repo would be a quick win.

On Sat, Jun 2, 2012 at 7:35 AM, Isabel Drost  wrote:

> On 01.06.2012 Ted Dunning wrote:
> > Jumping through the hoops to get Debian to approve a Java project is a
> > lot of work for very little gain that I see.
>
> I don't see that much overhead (other than the need to have all
> dependencies in as well): when getting started building deb packages,
> at least considering following the Debian documentation and getting
> feedback from people who have done that before could prove beneficial.
>
> Another way to get started quickly would be to host the results in an
> Ubuntu PPA.
>
>
> Isabel
>


Re: Mahout + BigDataR Linux

2012-06-02 Thread Isabel Drost
On 01.06.2012 Ted Dunning wrote:
> Jumping through the hoops to get Debian to approve a Java project is a lot of
> work for very little gain that I see.

I don't see that much overhead (other than the need to have all dependencies
in as well): when getting started building deb packages, at least considering
following the Debian documentation and getting feedback from people who have
done that before could prove beneficial.

Another way to get started quickly would be to host the results in an Ubuntu PPA.


Isabel


Re: Mahout + BigDataR Linux

2012-06-01 Thread Ted Dunning
I think that getting packages into Debian archives themselves is way more
effort than it is worth at this point.

Building real Debian packages that can be installed with an addition to
sources.list is a grand idea.  Jumping through the hoops to get Debian to
approve a Java project is a lot of work for very little gain that I see.
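
For illustration only, here is a minimal sketch of what such an "addition to
sources.list" could look like in the common one-line format
(deb URI SUITE COMPONENT...).  The helper name, repository URL, suite, and
component below are hypothetical placeholders and do not come from this
thread or from any existing repository.

```python
def sources_list_entry(uri: str, suite: str, components: list[str]) -> str:
    """Format a one-line APT source entry: 'deb URI SUITE COMPONENT...'."""
    if not components:
        raise ValueError("at least one component is required")
    return " ".join(["deb", uri, suite, *components])


# Hypothetical repository; none of these values come from the thread.
entry = sources_list_entry("http://repo.example.org/debian", "stable", ["main"])
print(entry)
```

A user would append the printed line to /etc/apt/sources.list (or a file
under /etc/apt/sources.list.d/) and refresh the package index before
installing.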

On Thu, May 31, 2012 at 3:00 PM, Isabel Drost  wrote:

> On 03.05.2012 Ted Dunning wrote:
> > As a point of strategy, wouldn't it have been better to just build a Debian
> > package repository and a script for installing packages?
>
> Or go even one step further and provide real Debian packages?
>
>
> Isabel
>


Re: Mahout + BigDataR Linux

2012-06-01 Thread Isabel Drost
On Fri, Jun 1, 2012 at 7:40 AM, Nicholas Kolegraff
 wrote:
> I'm on board with this.
> This has been a common suggestion from more advanced users (and makes
> sense). I am exploring how to incorporate packages into the build process. I
> don't want to commit to anything yet, but plan to take a much deeper dive
> in mid-July.

Some information that might help you:

The Debian new maintainers guide:
http://www.debian.org/doc/manuals/maint-guide/index.en.html

The Debian wiki on how to package Java projects, including information
on how to package Maven-built software:
http://wiki.debian.org/Java/Packaging

There is also a mailing list for more discussion of problems and
questions related to packaging Java projects for Debian:
http://lists.debian.org/debian-java/

One word of warning: You might run into one issue or another, as Java
projects usually aren't built in a way that's particularly amenable to
turning them into distribution packages right away. However, it should
help that Mahout is Maven-built and relies on standard libraries only.


Cheers,
Isabel


Re: Mahout + BigDataR Linux

2012-05-31 Thread Nicholas Kolegraff
I'm on board with this.
This has been a common suggestion from more advanced users (and makes
sense).
I am exploring how to incorporate packages into the build process. I don't
want to commit to anything yet, but plan to take a much deeper dive in
mid-July.


On Thu, May 31, 2012 at 6:00 AM, Isabel Drost  wrote:

> On 03.05.2012 Ted Dunning wrote:
> > As a point of strategy, wouldn't it have been better to just build a Debian
> > package repository and a script for installing packages?
>
> Or go even one step further and provide real Debian packages?
>
>
> Isabel
>


Re: Mahout + BigDataR Linux

2012-05-31 Thread Isabel Drost
On 03.05.2012 Ted Dunning wrote:
> As a point of strategy, wouldn't it have been better to just build a Debian
> package repository and a script for installing packages?

Or go even one step further and provide real Debian packages?


Isabel


Re: Mahout + BigDataR Linux

2012-05-03 Thread Nicholas Kolegraff
Oh, no worries, never got that impression -- this was good feedback.

On Thu, May 3, 2012 at 9:51 PM, Ted Dunning  wrote:

> Don't take any of our suggestions as discouragement.  At most treat them
> as an excuse to reexamine your decisions.
>
> Sent from my iPhone
>
> On May 3, 2012, at 6:58 PM, Nicholas Kolegraff 
> wrote:
>
> > Agree, this could prove insane.  If that is the case, it wouldn't be *too*
>


Re: Mahout + BigDataR Linux

2012-05-03 Thread Ted Dunning
Don't take any of our suggestions as discouragement.  At most treat them as an 
excuse to reexamine your decisions. 

Sent from my iPhone

On May 3, 2012, at 6:58 PM, Nicholas Kolegraff  wrote:

> Agree, this could prove insane.  If that is the case, it wouldn't be *too*


Re: Re: Mahout + BigDataR Linux

2012-05-03 Thread Nicholas Kolegraff
On Thu, May 3, 2012 at 10:15 AM, Ted Dunning  wrote:

> On Thu, May 3, 2012 at 10:06 AM, Nicholas Kolegraff wrote:
>
> > ... I have this crazy notion that nothing should ever be installed and
> > bootstrapping is really annoying.
> >


Disclaimer: I'm not one of these "install everything from source and you
should be the package manager of your system" people ... that is silly,
good luck building a model around that.
For the record: I actually like Yum and apt-get.


> This opinion is more and more in the minority.  Yum and apt have made this
> much less painful.  And saving an AMI after doing that is nearly painless
> (for EBS based boots).
>
> I don't know of *any* major place that works without packages.  Many places
> build images for fast installation, but the entire mindset is around
> packages.
>

Regarding these comments: Valid.
Fast installation and extreme stability are key, more so than versatility.
It seemed the image was the best route for these objectives.
For the short term I'm more concerned that the examples I build against the
system are *totally* stable and work well, although this doesn't further
justify a distro over a package repo.

However, I can still see the same comeback:
So, again, why not just use packages?
Meh, perhaps I provide both.

> > I felt it was easier to just launch an AMI.  Yet again, why not just
> > repackage another image?
> >
>
> Indeed.  Why not provide packages and an AMI.  Remember that if you want to
> provide an AMI, you pretty much have to make 12 of them.
>
>
>
> > The automated build nature of what I do requires me to repackage some
> > lower level libraries so they can link easily and stably (is that a
> > word?) across multiple packages.
> >
>
> So?
>
> If you have this need, then others will as well.  They will find a complete
> distro unusable.
>
>
> > I also have some longer term objectives that will require me to have
> > complete control over the kernel and packaging.
> > It just seemed easier to start my own thing for this.
> >
>
> I suppose it depends on your goals.  Announcing this publicly implies that
> you are interested in having others use it.


Not so much 'use' but rather 'try'.

> But building a distro for your
> private needs that conflict with other peoples' private needs says just the
> opposite.


I think this makes some assumptions around needs.  That being said, I owe
some usage examples.  Noted.
Although, I think I see where you are coming from here.  The distro
supports a longer term objective, but, that is rather irrelevant since the
short-term needs of others are probably focused on packages, thus, creating
this usability paradox [y/n]?

> > Agree, this could prove insane.  If that is the case, it wouldn't be *too*
> > hard for me to convert this to some package repo
> >
>
> Probably not insane.  Probably just isn't entirely consistent in action and
> intent.
>


RE: Re: Re: Mahout + BigDataR Linux

2012-05-03 Thread Darren Govoni

Exactly.

Start with something like an Ubuntu LTS release. Work your magic on it. Then release the AMI publicly for others to use. 


Mahout et al. were born to run in a cloud environment anyway.

--- Original Message ---
On 5/3/2012 12:06 PM Nicholas Kolegraff wrote:
Assumed this question was coming.
I had given this a lot of thought.
I have this crazy notion that nothing should ever be installed and
bootstrapping is really annoying.
I felt it was easier to just launch an AMI.  Yet again, why not just
repackage another image?

The automated build nature of what I do requires me to repackage some lower
level libraries so they can link easily and stably (is that a word?) across
multiple packages.
I also have some longer term objectives that will require me to have
complete control over the kernel and packaging.
It just seemed easier to start my own thing for this.

Agree, this could prove insane.  If that is the case, it wouldn't be *too*
hard for me to convert this to some package repo

On Thu, May 3, 2012 at 9:53 AM, Darren Govoni  wrote:

> A distro would be good and if it was made into an Amazon Machine Image so
> we can spin it up and use it _without_ having to install it, that's a good
> thing too.
>
> So the best approach is always both!
>
> --- Original Message ---
> On 5/3/2012 11:42 AM Dan Brickley wrote:
> On 3 May 2012 18:34, Ted Dunning wrote:
> > Thanks for including Mahout.
> >
> > As a point of strategy, wouldn't it have been better to just build a Debian
> > package repository and a script for installing packages?  That would allow
> > people to use their own Debian- or Ubuntu-based distros for their own
> > special needs such as hardware virtualization or special kernel modules
> > and still get the benefits that you are offering.
> >
> > Otherwise, you are sentencing yourself to a life of hard labor keeping up
> > with kernel updates and such.
> 
> I was about to ask the same question... why a whole distro? Unless the
> whole thing is a highly-tuned and unusual setup, some install scripts
> are often good enough.
> 
> Dan
> 
> 
>



Re: Re: Mahout + BigDataR Linux

2012-05-03 Thread Ted Dunning
On Thu, May 3, 2012 at 10:06 AM, Nicholas Kolegraff  wrote:

> ... I have this crazy notion that nothing should ever be installed and
> bootstrapping is really annoying.
>

This opinion is more and more in the minority.  Yum and apt have made this
much less painful.  And saving an AMI after doing that is nearly painless
(for EBS based boots).

I don't know of *any* major place that works without packages.  Many places
build images for fast installation, but the entire mindset is around
packages.


> I felt it was easier to just launch an AMI.  Yet again, why not just
> repackage another image?
>

Indeed.  Why not provide packages and an AMI.  Remember that if you want to
provide an AMI, you pretty much have to make 12 of them.



> The automated build nature of what I do requires me to repackage some lower
> level libraries so they can link easily and stably (is that a word?) across
> multiple packages.
>

So?

If you have this need, then others will as well.  They will find a complete
distro unusable.


> I also have some longer term objectives that will require me to have
> complete control over the kernel and packaging.
> It just seemed easier to start my own thing for this.
>

I suppose it depends on your goals.  Announcing this publicly implies that
you are interested in having others use it.  But building a distro for your
private needs that conflict with other peoples' private needs says just the
opposite.

> Agree, this could prove insane.  If that is the case, it wouldn't be *too*
> hard for me to convert this to some package repo
>

Probably not insane.  Probably just isn't entirely consistent in action and
intent.


Re: Re: Mahout + BigDataR Linux

2012-05-03 Thread Sean Owen
A machine image is not the only deployment model to be sure, but is the
kind of deployment model you need if you're offering something as a cloud
service. I'm a big fan of the AWS Marketplace which of course is based on
this kind of model. (I'm also about to make the stand-alone Myrrix server
available this way as a test of this model.)

On Thu, May 3, 2012 at 6:06 PM, Nicholas Kolegraff
wrote:

> Assumed this question was coming.
> I had given this a lot of thought.
> I have this crazy notion that nothing should ever be installed and
> bootstrapping is really annoying.
> I felt it was easier to just launch an AMI.  Yet again, why not just
> repackage another image?
>
> The automated build nature of what I do requires me to repackage some lower
> level libraries so they can link easily and stably (is that a word?) across
> multiple packages.
> I also have some longer term objectives that will require me to have
> complete control over the kernel and packaging.
> It just seemed easier to start my own thing for this.
>
> Agree, this could prove insane.  If that is the case, it wouldn't be *too*
> hard for me to convert this to some package repo
>
>


Re: Re: Mahout + BigDataR Linux

2012-05-03 Thread Nicholas Kolegraff
Assumed this question was coming.
I had given this a lot of thought.
I have this crazy notion that nothing should ever be installed and
bootstrapping is really annoying.
I felt it was easier to just launch an AMI.  Yet again, why not just
repackage another image?

The automated build nature of what I do requires me to repackage some lower
level libraries so they can link easily and stably (is that a word?) across
multiple packages.
I also have some longer term objectives that will require me to have
complete control over the kernel and packaging.
It just seemed easier to start my own thing for this.

Agree, this could prove insane.  If that is the case, it wouldn't be *too*
hard for me to convert this to some package repo

On Thu, May 3, 2012 at 9:53 AM, Darren Govoni  wrote:

> A distro would be good and if it was made into an Amazon Machine Image so
> we can spin it up and use it _without_ having to install it, that's a good
> thing too.
>
> So the best approach is always both!
>
> --- Original Message ---
> On 5/3/2012 11:42 AM Dan Brickley wrote:
> On 3 May 2012 18:34, Ted Dunning wrote:
> > Thanks for including Mahout.
> >
> > As a point of strategy, wouldn't it have been better to just build a Debian
> > package repository and a script for installing packages?  That would allow
> > people to use their own Debian- or Ubuntu-based distros for their own
> > special needs such as hardware virtualization or special kernel modules
> > and still get the benefits that you are offering.
> >
> > Otherwise, you are sentencing yourself to a life of hard labor keeping up
> > with kernel updates and such.
> 
> I was about to ask the same question... why a whole distro? Unless the
> whole thing is a highly-tuned and unusual setup, some install scripts
> are often good enough.
> 
> Dan
> 
> 
>


RE: Re: Mahout + BigDataR Linux

2012-05-03 Thread Darren Govoni

A distro would be good and if it was made into an Amazon Machine Image so we 
can spin it up and use it _without_ having to install it, that's a good thing 
too.

So the best approach is always both!

--- Original Message ---
On 5/3/2012 11:42 AM Dan Brickley wrote:
On 3 May 2012 18:34, Ted Dunning wrote:
> Thanks for including Mahout.
>
> As a point of strategy, wouldn't it have been better to just build a Debian
> package repository and a script for installing packages?  That would allow
> people to use their own Debian- or Ubuntu-based distros for their own
> special needs such as hardware virtualization or special kernel modules
> and still get the benefits that you are offering.
>
> Otherwise, you are sentencing yourself to a life of hard labor keeping up
> with kernel updates and such.

I was about to ask the same question... why a whole distro? Unless the
whole thing is a highly-tuned and unusual setup, some install scripts
are often good enough.

Dan




Re: Mahout + BigDataR Linux

2012-05-03 Thread Dan Brickley
On 3 May 2012 18:34, Ted Dunning  wrote:
> Thanks for including Mahout.
>
> As a point of strategy, wouldn't it have been better to just build a Debian
> package repository and a script for installing packages?  That would allow
> people to use their own Debian- or Ubuntu-based distros for their own special
> needs such as hardware virtualization or special kernel modules and still get
> the benefits that you are offering.
>
> Otherwise, you are sentencing yourself to a life of hard labor keeping up
> with kernel updates and such.

I was about to ask the same question... why a whole distro? Unless the
whole thing is a highly-tuned and unusual setup, some install scripts
are often good enough.

Dan


Re: Mahout + BigDataR Linux

2012-05-03 Thread Nicholas Kolegraff
:)
Thanks, where do you see this ... I'm blind to this kind of thing. (obv.)

On Thu, May 3, 2012 at 9:39 AM, Ted Dunning  wrote:

> Yes.  It is impossible for me to correctly spell when correcting somebody
> else's spelling.
>
> I think that this follows from the general karmic principle.
>
> On Thu, May 3, 2012 at 9:36 AM, Sean Owen  wrote:
>
> > *V*owpal Wabbit ? :)
> >
> > On Thu, May 3, 2012 at 5:32 PM, Ted Dunning 
> wrote:
> >
> > > Gently here:
> > >
> > > You misspelled woWpal wabbit.
> > >
> > >
> >
>


Re: Mahout + BigDataR Linux

2012-05-03 Thread Ted Dunning
Yes.  It is impossible for me to correctly spell when correcting somebody
else's spelling.

I think that this follows from the general karmic principle.

On Thu, May 3, 2012 at 9:36 AM, Sean Owen  wrote:

> *V*owpal Wabbit ? :)
>
> On Thu, May 3, 2012 at 5:32 PM, Ted Dunning  wrote:
>
> > Gently here:
> >
> > You misspelled woWpal wabbit.
> >
> >
>


Re: Mahout + BigDataR Linux

2012-05-03 Thread Sean Owen
*V*owpal Wabbit ? :)

On Thu, May 3, 2012 at 5:32 PM, Ted Dunning  wrote:

> Gently here:
>
> You misspelled woWpal wabbit.
>
>


Re: Mahout + BigDataR Linux

2012-05-03 Thread Ted Dunning
Thanks for including Mahout.

As a point of strategy, wouldn't it have been better to just build a Debian
package repository and a script for installing packages?  That would allow
people to use their own Debian- or Ubuntu-based distros for their own special
needs such as hardware virtualization or special kernel modules and still get
the benefits that you are offering.

Otherwise, you are sentencing yourself to a life of hard labor keeping up
with kernel updates and such.

On Thu, May 3, 2012 at 7:06 AM, Nicholas Kolegraff
wrote:

> I'm working on a Linux Distro with a focus around Machine Learning and have
> included Mahout!  www.bigdatarlinux.com
> I will be giving a demo of BigDataR Linux at this workshop
> http://graphlab.org/workshop2012/
>


Re: Mahout + BigDataR Linux

2012-05-03 Thread Ted Dunning
Gently here:

You misspelled woWpal wabbit.

I look forward to seeing you at the graphlab workshop and hearing more
about this.

On Thu, May 3, 2012 at 7:06 AM, Nicholas Kolegraff
wrote:

> Hi Everyone,
> I'm working on a Linux Distro with a focus around Machine Learning and have
> included Mahout!  www.bigdatarlinux.com
> I will be giving a demo of BigDataR Linux at this workshop
> http://graphlab.org/workshop2012/
>
> I've also started a project that surrounds BigDataR with some compelling
> examples. The idea here is to provide stable, consistent examples (or at
> least that is the thought)
> https://github.com/koooee/BigDataR_Examples
>
> If anyone is interested in building some compelling Mahout examples against
> BigDataR feel free to reach out, would love to chat.
> (I plan to reformat a select few from the website and make sure they are
> stable with BigDataR, but am open to other thoughts/ideas)
>
> Cheers,
> Nick
> n...@bigdatarlinux.com
>
> PS: This is very much dev/gamma at the moment so be gentle :-)
>