Howdy all. Andrew asked me to take a little time and verify the release so we could get another PMC +1, so I tried to dust off my Mahout skills and help out... but frankly, the "getting started from a binary distribution" docs are pretty hard for _me_ to follow.
I start on the main page and navigate to https://mahout.apache.org/general/downloads to read what to do with the 14.1 tarball y'all linked above. Ok, so I'm supposed to set env vars for MAHOUT_HOME and MAHOUT_LOCAL if I'm running on my laptop, sounds good:

    bash-3.2$ mkdir -p mahout/mahout-14.1 && mv Downloads/apache-mahout-distribution-14.1.tar.bz2 mahout/mahout-14.1 && cd mahout/mahout-14.1 && bunzip2 *.bz2 && tar xvf *.tar

Not sure where to go from here, as the Downloads page tells me nothing about how to drop into the mahout shell. So I follow the next top link to Overview, and read "You’ve probably already noticed Mahout has a lot of things going on at different levels, and it can be hard to know where to start." Indeed, not sure where to start, hopefully this page will tell me!

Hmm. The page talks a bit about abstractions, generic application code, something about the DSL, some DAG stuff and optimizations, engine bindings and... native solvers? I thought I was getting started? Ok, maybe this page is named weird.

How about the Tutorial, that sounds promising! So many tutorials to choose from: "Recommenders: CCO with Last.fm", ok I vaguely imagine "CCO" is probably some kind of co-occurrence thing, and Last.fm was a music site from before some members of my current team were born. Let's start with something more straightforward, oooh "Text Classification(shell)", that's better, onward to https://mahout.apache.org/docs/latest/tutorials/samsara/classify-a-doc-from-the-shell.html (interestingly, the <title> tag on this page is "Perceptron and Winnow", ok... I'll just let that slide on by).

Yay, a Prerequisites section! It links to http://mahout.apache.org/users/sparkbindings/play-with-shell.html (why wasn't this already in my quickstart browsing, did I miss it?). Right at the top, I'm warned:

"This tutorial will show you how to play with Mahout’s scala DSL for linear algebra and its Spark shell.
Please keep in mind that this code is still in a very early experimental stage. *(Edited for 0.10.2)*"

I'm testing build 14.1. Seems this may no longer be "in a very early experimental stage" (hopefully)? Keeping that in mind, I forge onward, to learn about classifying cereals.

A section on "Installing Mahout and Spark", excellent, I hope this is up to date! Step 1:

" 1. Download Apache Spark 1.6.2 <http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz> and unpack the archive file "

Hmm... Spark 1.6.2, you say? I'm gonna hope my new and shiny Spark 3.0.1 download doesn't break anything?

" 1. Change to the directory where you unpacked Spark and type sbt/sbt assembly to build it "

Type "sbt/sbt assembly" to build ... it? To build Spark? Both the prehistoric 1.6.2 link and the 3.0.1 bits I've got are *binary* distributions of Spark. I don't think I need to build it. If I'm wrong here, happy to hear what I'm supposed to do, but there's certainly no build.sbt hanging around any of these binary distros. I'll imagine I'm ok with this binary distro and see where things go from here.

" 1. Create a directory for Mahout somewhere on your machine, change to there and checkout the master branch of Apache Mahout from GitHub git clone https://github.com/apache/mahout mahout "

Hmm, I have to build from source? Scanning down the page, no, apparently no instructions on what to do with my tarball here. So I suppose I could try to build from source and go from there, but... I'm not sure how a "so old I'm new again" person can even get started with the binary distribution to try out a simple text classification in the mahout shell, and verify that said binary distro is worthy of release.
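
For reference, the binary-tarball steps walked through above can be sketched in shell. This is only a guess at what a quickstart might look like, not something taken from the Mahout docs: the helper names (unpack_mahout, find_mahout_home, mahout_env) are made up for illustration, the value `true` for MAHOUT_LOCAL is an assumption, and the only hint that the unpacked tree contains bin/mahout is the source-build path quoted later in this thread.

```shell
# Sketch only: these helper names are hypothetical, not part of Mahout.

# Unpack a .tar.bz2 into a target directory
# (same effect as the bunzip2 + tar xvf pair above, without changing directory).
unpack_mahout() {
  # $1 = path to the tarball, $2 = directory to unpack into
  mkdir -p "$2" || return 1
  bunzip2 -c "$1" | tar xf - -C "$2"
}

# Locate the unpacked distribution by searching for bin/mahout instead of
# guessing the tarball's top-level directory name (assumed layout).
find_mahout_home() {
  # $1 = directory to search
  launcher=$(find "$1" -type f -path '*/bin/mahout' | head -n 1)
  [ -n "$launcher" ] || return 1
  dirname "$(dirname "$launcher")"
}

# Print the two exports the Downloads page mentions.
# MAHOUT_LOCAL=true is an assumed value for "running on my laptop".
mahout_env() {
  # $1 = directory that contains bin/mahout
  printf 'export MAHOUT_HOME="%s"\n' "$1"
  printf 'export MAHOUT_LOCAL=true\n'
}

# Example, left commented out so sourcing this file has no side effects:
#   unpack_mahout ~/Downloads/apache-mahout-distribution-14.1.tar.bz2 ~/mahout/mahout-14.1
#   eval "$(mahout_env "$(find_mahout_home ~/mahout/mahout-14.1)")"
#   "$MAHOUT_HOME/bin/mahout" spark-shell
```

Whether the shell then actually comes up against a Spark 3.0.1 binary download is exactly the open question here.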
:\

  -jake

On Fri, Sep 25, 2020 at 3:15 PM Andrew Musselman <andrew.mussel...@gmail.com> wrote:

> I am +1 (binding) on this release; I have:
> (1) Downloaded both binary and source releases
> (2) Verified checksums and signatures
> (3) Built from source
> (4) Run the spark-shell, and loaded and run the sparse DRM multiplier
> function for both distros:
>
> Note that in the source build the shell script is
> `./distribution/target/apache-mahout-14.1/bin/mahout spark-shell`.
>
> scala> :load /home/akm/a/src/test/repository.apache.org/content/repositories/orgapachemahout-1066/org/apache/mahout/mahout/14.1/apache-mahout-14.1/distribution/target/apache-mahout-14.1/examples/bin/SparseSparseDrmTimer.mscala
> Loading /home/akm/a/src/test/repository.apache.org/content/repositories/orgapachemahout-1066/org/apache/mahout/mahout/14.1/apache-mahout-14.1/distribution/target/apache-mahout-14.1/examples/bin/SparseSparseDrmTimer.mscala
> .
> ..
> timeSparseDRMMMul: (m: Int, n: Int, s: Int, para: Int, pctDense: Double, seed: Long)Long
>
> scala> timeSparseDRMMMul(1000,1000,1000,1,.02,1234L)
> res6: Long = 2205
>
> So far that is two binding +1 votes and two non-binding +1 votes, with no
> -1 votes. Still looking for another PMC vote.
>
> On Thu, Sep 24, 2020 at 6:45 PM Trevor Grant <trevor.d.gr...@gmail.com> wrote:
>
> > Sorry for the delay on my vote.
> >
> > Sigs checked out- source built without issues and passed all tests.
> > Additional tests as described on RC6 candidate.
> >
> > I'm +1 binding.
> >
> > Thanks again Andrew!!
> >
> > tg
> >
> > On Wed, Sep 16, 2020 at 12:56 PM Trevor Grant <trevor.d.gr...@gmail.com> wrote:
> >
> > > Away from my computer on vacation- if it's still up next week I can vote.
> > >
> > > On Wed, Sep 16, 2020, 11:27 AM Christofer Dutz <christofer.d...@c-ware.de> wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> Chris
> > >>
> > >> [OK] Download all staged artifacts under the url specified in the
> > >> release vote email into a directory we’ll now call download-dir.
> > >> [MINOR] Verify the signature is correct: Additional Apache tutorial on
> > >> how to verify downloads can be found here.
> > >> [OK] Check if the signature references an Apache email address.
> > >> [MINOR] Verify the SHA512 hashes:
> > >> [OK] Unzip the archive
> > >> [OK] Verify the existence of LICENSE, NOTICE, README files in the
> > >> extracted source bundle.
> > >> [MINOR] Verify the content of LICENSE, NOTICE, README files in the
> > >> extracted source bundle.
> > >> [MINOR] Run RAT externally to ensure there are no surprises.
> > >> [MINOR] Search for SNAPSHOT references
> > >> [OK] Search for Copyright references, and if they are in headers, make
> > >> sure these files containing them are mentioned in the LICENSE file.
> > >> [OK] Build the project according to the information in the README.md
> > >> file.
> > >>
> > >> Remarks:
> > >> - The signature is correct, but no secure trust chain could be
> > >> established (Possibly worth attending a Key-Signing-Party as soon as
> > >> they are happening again)
> > >> - There are no SHA512 hashes, I validated against the SHA1 hashes instead
> > >> - Two CERN files weren't listed in the LICENSE
> > >> (KeyTypeValueTypeProcedure, ValueTypeComparator)
> > >> - One CERN file still has a double CERN/Apache Header (NegativeBinomial)
> > >> - The community modules all contain SNAPSHOT references as they weren't
> > >> enabled in the release
> > >> - The distribution references SNAPSHOT versions of community modules
> > >> (However not in an always-on profile)
> > >>
> > >> On 11.09.20, 19:18, "Andrew Musselman" <andrew.mussel...@gmail.com> wrote:
> > >>
> > >> My bad, RC7 out now!
> > >>
> > >> Binaries:
> > >> https://repository.apache.org/content/repositories/orgapachemahout-1066/org/apache/mahout/apache-mahout-distribution/14.1/
> > >>
> > >> Source:
> > >> https://repository.apache.org/content/repositories/orgapachemahout-1066/org/apache/mahout/mahout/14.1/
> > >>
> > >> On Fri, Sep 11, 2020 at 1:04 AM Christofer Dutz <christofer.d...@c-ware.de> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > Sad to see you didn't merge my changes in "issue/MAHOUT-2117" back to
> > >> > master before cutting the next RC :-(
> > >> >
> > >> > So I'm not going to vote this time as the result would be the same as
> > >> > last time ...
> > >> >
> > >> > Chris
> > >> >
> > >> > On 11.09.20, 03:17, "Trevor Grant" <trevor.d.gr...@gmail.com> wrote:
> > >> >
> > >> > Thank you so much for getting this out Andrew.
> > >> >
> > >> > I verified all checksums/sigs.
> > >> >
> > >> > I successfully built the source including all tests.
> > >> > (I did this in the public docker container rawkintrevo/mahout-builder-base)
> > >> >
> > >> > I also tested the binaries in the public docker container
> > >> > rawkintrevo/mahoutgui, by bashing into the running container,
> > >> > unpacking both the binary archives, and then aiming and running the
> > >> > mahout example notebook in turn against each of the unpacked binaries
> > >> > from each of the archives. I did this in place of spark-shell, as I
> > >> > think it's a more elegant solution going forward, but would encourage
> > >> > others to test against mahout spark-shell.
> > >> >
> > >> > So given all of that, I give an enthusiastic +1
> > >> >
> > >> > On Thu, Sep 10, 2020 at 3:37 PM Andrew Musselman <a...@apache.org> wrote:
> > >> >
> > >> > > Binaries:
> > >> > > https://repository.apache.org/content/repositories/orgapachemahout-1065/org/apache/mahout/apache-mahout-distribution/14.1/
> > >> > >
> > >> > > Source:
> > >> > > https://repository.apache.org/content/repositories/orgapachemahout-1065/org/apache/mahout/mahout/14.1/
> > >> > >
> > >> > > Please check checksums and signatures, run the shell, do some
> > >> > > computation, run your favorite jobs, and let us know how it looks.
> > >> > >
> > >> > > Thanks!
> > >> > >
> > >> > > Best
> > >> > > Andrew

--

  -jake