Jake, thank you for taking the time to document how stale the documentation
has become! We have known this was an issue for quite some time, but we have
been so focused on cleaning up the build itself that we have neglected it.

Maybe we can get you going with some bare-bones repro steps to verify
things are working; for now could you try the spark-shell commands I posted
yesterday?
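
A minimal sketch of those steps against the unpacked binary tarball follows.
It assumes the binary distribution has the same layout as the source build
output (bin/mahout and examples/bin/SparseSparseDrmTimer.mscala), and the
default MAHOUT_HOME path is only a hypothetical guess at where the tarball
unpacked; check_mahout_layout is a helper made up for this example, not part
of Mahout.

```shell
# Sketch only: the default MAHOUT_HOME is a hypothetical location; point it
# at the directory your tarball actually unpacked into.
export MAHOUT_HOME="${MAHOUT_HOME:-$HOME/mahout/mahout-14.1/apache-mahout-distribution-14.1}"
# Run locally (laptop) rather than against a cluster; the value "true" is an
# assumption based on the Downloads page instructions.
export MAHOUT_LOCAL=true

# Hypothetical helper: confirm the two files this smoke test relies on exist.
check_mahout_layout() {
  for f in "$1/bin/mahout" "$1/examples/bin/SparseSparseDrmTimer.mscala"; do
    [ -e "$f" ] || { echo "missing: $f" >&2; return 1; }
  done
}

# Only launch the shell if the layout looks right.
if check_mahout_layout "$MAHOUT_HOME"; then
  "$MAHOUT_HOME/bin/mahout" spark-shell
fi

# Then, at the scala> prompt (replace <MAHOUT_HOME> with the real path):
#   :load <MAHOUT_HOME>/examples/bin/SparseSparseDrmTimer.mscala
#   timeSparseDRMMMul(1000,1000,1000,1,.02,1234L)
```

If the layout check prints "missing: ...", the binary tarball is laid out
differently than assumed here, which would itself be worth reporting on this
thread.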

On Fri, Sep 25, 2020 at 11:19 PM Jake Mannix <jake.man...@gmail.com> wrote:

> Howdy all.  Andrew asked me to take a little time and verify the release so
>
> we could get another PMC +1, so I tried to dust off my Mahout skills and
>
> help out... but frankly, the "getting started" from a binary distribution
>
> docs are pretty hard for _me_ to follow.
>
>
>
> I start on the main page and navigate to:
>
> https://mahout.apache.org/general/downloads to read what to do with the
>
> 14.1 tarball y'all linked above.
>
>
>
> Ok, so I'm supposed to set env vars for MAHOUT_HOME and MAHOUT_LOCAL if I'm
>
> running on my laptop, sounds good:
>
>
>
> bash-3.2$ mkdir -p mahout/mahout-14.1 && \
>   mv Downloads/apache-mahout-distribution-14.1.tar.bz2 mahout/mahout-14.1 && \
>   cd mahout/mahout-14.1 && bunzip2 *.bz2 && tar xvf *.tar
>
>
>
>
>
> Not sure where to go from here, as the Downloads page tells me
>
> nothing about how to drop into the mahout shell.  So I follow the next top
>
> link to Overview, and read
>
>
>
> "You’ve probably already noticed Mahout has a lot of things going on at
>
> different levels, and it can be hard to know where to start."
>
>
>
> Indeed, not sure where to start, hopefully this page will tell me!  Hmm.
>
> The page talks a bit about abstractions, generic application code,
>
> something about the DSL, some DAG stuff and optimizations, engine bindings
>
> and... native solvers?   I thought I was getting started?
>
>
>
> Ok, maybe this page is named weird.  How about the Tutorial, that sounds
>
> promising!  So many tutorials to choose from: "Recommenders: CCO with
>
> Last.fm", ok I vaguely imagine "CCO" is probably some kind of co-occurrence
>
> thing, and Last.fm was a music site from before when some members of my
>
> current team were born.  Let's start with something more straightforward,
>
> oooh "Text Classification(shell)", that's better, onward to
>
>
> https://mahout.apache.org/docs/latest/tutorials/samsara/classify-a-doc-from-the-shell.html
>
> (interestingly, the <title> tag on this page is called "Perceptron and
>
> Winnow", ok... I'll just let that slide on by)
>
>
>
> Yay, a Prerequisites section!  It links to
>
> http://mahout.apache.org/users/sparkbindings/play-with-shell.html (why
>
> wasn't this already in my quickstart browsing, did I miss it?).  Right at
>
> the top, I'm warned:
>
>
>
> "This tutorial will show you how to play with Mahout’s scala DSL for linear
>
> algebra and its Spark shell. Please keep in mind that this code is still in
>
> a very early experimental stage.
>
> *(Edited for 0.10.2)*"
>
>
>
> I'm testing build 14.1.  Seems this may no longer be "in a very
>
> experimental stage" (hopefully)?
>
>
>
> Keeping that in mind, I forge onward, to learn about classifying cereals.
>
> A section on "Installing Mahout and Spark", excellent, I hope this is up to
>
> date! Step 1:
>
>
>
> "
>
>
>
>    1. Download Apache Spark 1.6.2
>
>    <http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz>
> and
>
>    unpack the archive file
>
>
>
> "
>
>
>
> Hmm... Spark 1.6.2, you say?  I'm gonna hope my new and shiny Spark 3.0.1
>
> download doesn't break anything?
>
>
>
> "
>
>
>
>    1. Change to the directory where you unpacked Spark and type sbt/sbt
>
>    assembly to build it
>
>
>
> "
>
>
>
> Type "sbt/sbt assembly" to build ... it?  To build Spark?  Both the
>
> prehistoric 1.6.2 link, and the 3.0.1 bits I've got are a *binary*
>
> distribution of spark.  I don't think I need to build it.  If I'm wrong
>
> here, happy to hear what I'm supposed to do, but there's certainly no
>
> build.sbt hanging around any of these binary distros.  I'll imagine I'm ok
>
> with this binary distro and see where things go from here.
>
>
>
> "
>
>
>
>    1. Create a directory for Mahout somewhere on your machine, change to
>
>    there and checkout the master branch of Apache Mahout from GitHub git
>
>    clone https://github.com/apache/mahout mahout
>
>
>
> "
>
>
>
> Hmm, I have to build from source?  Scanning down the page, no, apparently no
>
> instructions on what to do with my tarball here.
>
>
>
> So I suppose I could try to build from source and go from there, but... not
>
> sure how a "so old I'm new again" person can even get started with the
>
> binary distribution to try out a simple text classification in the mahout
>
> shell, and verify that said binary distro is worthy of release.
>
>
>
> :\
>
>
>
>   -jake
>
>
>
>
>
>
>
> On Fri, Sep 25, 2020 at 3:15 PM Andrew Musselman <
> andrew.mussel...@gmail.com>
>
> wrote:
>
>
>
> > I am +1 (binding) on this release; I have:
>
> > (1) Downloaded both binary and source releases
>
> > (2) Verified checksums and signatures
>
> > (3) Built from source
>
> > (4) Run the spark-shell, and loaded and run the sparse DRM multiplier
>
> > function for both distros:
>
> >
>
> > Note that in the source build the shell script is
>
> > `./distribution/target/apache-mahout-14.1/bin/mahout
>
> > spark-shell`.
>
> >
>
> > scala> :load /home/akm/a/src/test/repository.apache.org/content/repositories/orgapachemahout-1066/org/apache/mahout/mahout/14.1/apache-mahout-14.1/distribution/target/apache-mahout-14.1/examples/bin/SparseSparseDrmTimer.mscala
>
> > Loading /home/akm/a/src/test/repository.apache.org/content/repositories/orgapachemahout-1066/org/apache/mahout/mahout/14.1/apache-mahout-14.1/distribution/target/apache-mahout-14.1/examples/bin/SparseSparseDrmTimer.mscala
>
> > .
>
> > ..
>
> > timeSparseDRMMMul: (m: Int, n: Int, s: Int, para: Int, pctDense: Double,
>
> > seed: Long)Long
>
> >
>
> > scala> timeSparseDRMMMul(1000,1000,1000,1,.02,1234L)
>
> > res6: Long = 2205
>
> >
>
> > So far that is two binding +1 votes and two non-binding +1 votes, with no
>
> > -1 votes. Still looking for another PMC vote.
>
> >
>
> > On Thu, Sep 24, 2020 at 6:45 PM Trevor Grant <trevor.d.gr...@gmail.com>
>
> > wrote:
>
> >
>
> > > Sorry for the delay on my vote.
>
> > >
>
> > > Sigs checked out- source built without issues and passed all tests.
>
> > > Additional tests as described on RC6 candidate.
>
> > >
>
> > > I'm +1 binding.
>
> > >
>
> > > Thanks again Andrew!!
>
> > >
>
> > > tg
>
> > >
>
> > >
>
> > > On Wed, Sep 16, 2020 at 12:56 PM Trevor Grant <
> trevor.d.gr...@gmail.com>
>
> > > wrote:
>
> > >
>
> > > > Away from my computer on vacation- if it's still up next week I can
>
> > vote.
>
> > > >
>
> > > > On Wed, Sep 16, 2020, 11:27 AM Christofer Dutz <
>
> > > christofer.d...@c-ware.de>
>
> > > > wrote:
>
> > > >
>
> > > >> +1 (non-binding)
>
> > > >>
>
> > > >> Chris
>
> > > >>
>
> > > >> [OK] Download all staged artifacts under the url specified in the
>
> > > release
>
> > > >> vote email into a directory we’ll now call download-dir.
>
> > > >> [MINOR] Verify the signature is correct: Additional Apache tutorial
> on
>
> > > >> how to verify downloads can be found here.
>
> > > >> [OK] Check if the signature references an Apache email address.
>
> > > >> [MINOR] Verify the SHA512 hashes:
>
> > > >> [OK] Unzip the archive
>
> > > >> [OK] Verify the existence of LICENSE, NOTICE, README files in the
>
> > > >> extracted source bundle.
>
> > > >> [MINOR] Verify the content of LICENSE, NOTICE, README files in the
>
> > > >> extracted source bundle.
>
> > > >> [MINOR] Run RAT externally to ensure there are no surprises.
>
> > > >> [MINOR] Search for SNAPSHOT references
>
> > > >> [OK] Search for Copyright references, and if they are in headers,
> make
>
> > > >> sure these files containing them are mentioned in the LICENSE file.
>
> > > >> [OK] Build the project according to the information in the README.md
>
> > > file.
>
> > > >>
>
> > > >> Remarks:
>
> > > >> - The signature is correct, but no secure trust chain could be
>
> > > >> established (Possibly worth attending a Key-Signing-Party as soon as
>
> > > they
>
> > > >> are happening again)
>
> > > >> - There are no SHA512 hashes, I validated against the SHA1 hashes
>
> > > instead
>
> > > >> - Two CERN files weren't listed in the LICENSE
>
> > > (KeyTypeValueTypeProcedure
>
> > > >> , ValueTypeComparator)
>
> > > >> - One CERN file still has a double CERN/Apache Header
>
> > (NegativeBinomial)
>
> > > >> - The community modules all contain SNAPSHOT references as they
>
> > weren't
>
> > > >> enabled in the release
>
> > > >> - The distribution references SNAPSHOT versions of community modules
>
> > > >> (However not in an always-on profile)
>
> > > >>
>
> > > >>
>
> > > >>
>
> > > >> Am 11.09.20, 19:18 schrieb "Andrew Musselman" <
>
> > > >> andrew.mussel...@gmail.com>:
>
> > > >>
>
> > > >>     My bad, RC7 out now!
>
> > > >>
>
> > > >>     Binaries:
>
> > > >>
>
> > > >>
>
> > >
>
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1066/org/apache/mahout/apache-mahout-distribution/14.1/
>
> > > >>
>
> > > >>     Source:
>
> > > >>
>
> > > >>
>
> > >
>
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1066/org/apache/mahout/mahout/14.1/
>
> > > >>
>
> > > >>     On Fri, Sep 11, 2020 at 1:04 AM Christofer Dutz <
>
> > > >> christofer.d...@c-ware.de>
>
> > > >>     wrote:
>
> > > >>
>
> > > >>     > Hi,
>
> > > >>     >
>
> > > >>     > Sad to see you didn't merge my changes in "issue/MAHOUT-2117"
>
> > back
>
> > > >> to
>
> > > >>     > master before cutting the next RC :-(
>
> > > >>     >
>
> > > >>     > So I'm not going to vote this time as the result would be the
>
> > same
>
> > > >> as last
>
> > > >>     > time ...
>
> > > >>     >
>
> > > >>     > Chris
>
> > > >>     >
>
> > > >>     >
>
> > > >>     > Am 11.09.20, 03:17 schrieb "Trevor Grant" <
>
> > > trevor.d.gr...@gmail.com
>
> > > >> >:
>
> > > >>     >
>
> > > >>     >     Thank you so much for getting this out Andrew.
>
> > > >>     >
>
> > > >>     >     I verified all checksums/sigs.
>
> > > >>     >
>
> > > >>     >     I successfully built the source including all tests. (I
> did
>
> > > >> this in the
>
> > > >>     >     public docker container rawkintrevo/mahout-builder-base)
>
> > > >>     >
>
> > > >>     >     I also tested the binaries in the public docker container
>
> > > >>     >     rawkintrevo/mahoutgui , but bashing into the running
>
> > > container,
>
> > > >>     > unpacking
>
> > > >>     >     both the binary archives, and then aiming and running the
>
> > > mahout
>
> > > >>     > example
>
> > > >>     >     notebook in turn against each of the unpacked binaries
> from
>
> > > >> each of the
>
> > > >>     >     archives.  I did this in place of spark-shell, as I think
>
> > it's
>
> > > >> a more
>
> > > >>     >     elegant solution going forward, but would encourage others
>
> > to
>
> > > >> test
>
> > > >>     > against
>
> > > >>     >     mahout spark-shell.
>
> > > >>     >
>
> > > >>     >     So given all of that, I give an enthusiastic +1
>
> > > >>     >
>
> > > >>     >     On Thu, Sep 10, 2020 at 3:37 PM Andrew Musselman <
>
> > > >> a...@apache.org>
>
> > > >>     > wrote:
>
> > > >>     >
>
> > > >>     >     > Binaries:
>
> > > >>     >     >
>
> > > >>     >     >
>
> > > >>     >
>
> > > >>
>
> > >
>
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1065/org/apache/mahout/apache-mahout-distribution/14.1/
>
> > > >>     >     >
>
> > > >>     >     > Source:
>
> > > >>     >     >
>
> > > >>     >     >
>
> > > >>     >
>
> > > >>
>
> > >
>
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1065/org/apache/mahout/mahout/14.1/
>
> > > >>     >     >
>
> > > >>     >     > Please check checksums and signatures, run the shell, do
>
> > > some
>
> > > >>     > computation,
>
> > > >>     >     > run your favorite jobs, and let us know how it looks.
>
> > > >>     >     >
>
> > > >>     >     > Thanks!
>
> > > >>     >     >
>
> > > >>     >     > Best
>
> > > >>     >     > Andrew
>
> > > >>     >     >
>
> > > >>     >
>
> > > >>     >
>
> > > >>
>
> > > >>
>
> > >
>
> >
>
>
>
>
>
> --
>
>
>
>   -jake
>
>
