Chris, thank you so much for what you are doing. This is Apache at its best.
I've been down and out with a serious illness, injury and other issues, which
have seriously limited my machine time. I was pretty close to getting a good
build, but it was hacky, and the method that you use to name the modules for
both Scala versions looks great.
We've always relied on Stevo to fix the builds for us but, as he said, he is
unable to contribute right now. The main issues (currently solved by hacks)
are:
1. Dependencies and transitive dependencies are not being picked up and copied
to the `./lib` directory, where `./bin/mahout` and parts of the
MahoutSparkContext look for them to add to the classpath. So whether running
from the CLI or as a library, dependencies are not found.
* We used to use the mahout-experimental-xx.jar as a fat jar for this,
though it was bloated with now-deprecated MR stuff and is no longer packaged.
2. `./bin/mahout` (and `compute-classpath.sh`) need to be revamped to ensure
that they are picking up the correct classes.
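To illustrate the kind of classpath assembly described in item 1, here is a minimal Scala sketch that collects every jar found directly under a lib directory and joins them into a classpath string. It is an assumption-laden illustration, not Mahout's actual code: the `LibClasspath` object and its method names are hypothetical, and the real logic lives in `./bin/mahout` and `compute-classpath.sh`.

```scala
import java.io.File

// Hypothetical helper, not part of Mahout: builds a classpath string from
// every jar found directly under a lib directory.
object LibClasspath {
  def jarsIn(libDir: File): Seq[String] = {
    // listFiles() returns null when the directory is missing or unreadable.
    val entries = Option(libDir.listFiles()).getOrElse(Array.empty[File])
    entries
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .map(_.getAbsolutePath)
      .sorted
      .toSeq
  }

  // Joins the jar paths with the platform separator (":" or ";").
  def classpath(libDir: File): String =
    jarsIn(libDir).mkString(File.pathSeparator)
}
```

If the dependencies are never copied into the directory, `jarsIn` simply returns an empty sequence, which matches the symptom above: nothing fails at this step, but the classes are missing later.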
W.r.t. the Java 8/7 issues: we did mandate Java 8+, and this required a few
minor code changes to play nicely with Scala 2.11. Mainly, one class needed a
JVM "static" field, so I refactored that field out of the class and into a
companion object. I wonder if this is what is giving you issues with Java 7.
I'd thought that Java 8 was mandated now, but I may be thinking of Maven 3.3.x.
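For readers unfamiliar with the pattern, the refactoring mentioned above can be sketched as follows. Scala has no `static` keyword, so a value shared by all instances goes into the class's companion object. The `Registry` class and the `DefaultSize` field are hypothetical names chosen for illustration; the actual Mahout class is not named in this thread.

```scala
// Hypothetical example: a value shared by all instances lives in the
// companion object instead of being a per-instance field of the class.
class Registry(val size: Int) {
  // The class refers to the shared value through its companion.
  def isDefault: Boolean = size == Registry.DefaultSize
}

object Registry {
  // Single shared value, the Scala counterpart of a Java static field.
  val DefaultSize: Int = 16

  // Factory method that uses the shared value.
  def default(): Registry = new Registry(DefaultSize)
}
```

Callers then write `Registry.DefaultSize` or `Registry.default()` without needing an instance.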
Regardless, thank you very much for this. This board is really doing well
so far, and deserves accolades.
>
> <dependency>
> <groupId>org.apache.mahout</groupId>
> <artifactId>mahout-spark</artifactId>
> <version>14.1-SNAPSHOT</version>
> <classifier>2.11</classifier>
> </dependency>
This would be perfect IMO.
I can send you the commits that I am talking about.
As well, I saw that Trevor gave you a link to a filter. I have one here with a
bit more limited scope, which is open issues with fixversion == 14.1.
To answer one question: yes, this was recently building and releasing, with all
of the tests passing (for the few modules that we were focusing on). After that
I made some changes that broke it again.
The board with limited scope:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=348&view=detail&selectedIssue=MAHOUT-2093
Thanks again for helping out. We are really bad with POMs, not so much
building them from the ground up as fixing ones that are 10 years old and, as
Stevo mentioned, doing so very quickly while working on several other things.
Thank you again for this. It is a great help, and once we get a good build, we
can get back to doing work on the library itself.
I have some documents that I can provide if they will help explain the structure
of the project, which is still kind of in flux. E.g. I'd like to get the
ViennaCL-OMP branch out of experimental, but there is much to clean up first.
As well, I am on medical leave and don't have much time on the computer these
days, so I have to budget my time.
I'll send you some (closed) PRs with notes and changes, if it helps. LMK.
Thanks again, this is huge.
Andy
________________________________
From: Christofer Dutz <[email protected]>
Sent: Thursday, April 16, 2020 9:50 AM
To: [email protected] <[email protected]>
Subject: Re: Hi ... need some help?
Hi Trevor,
OK ... first of all ... the Mahout PMC is defining a "community maintained"
library which is not maintained by the Mahout PMC?!?!
I thought at Apache everything is about Community over code. So is a company
driving the non-community stuff?
But back to your build issues:
I had a look and I too encountered these comments and remarks and sometimes
patterns I recognized and could imagine why they were created.
Yes quite a bit of the build could be cleaned up and simplified a lot.
So how about I create a fork and try to do a cleanup of the build.
Usually I also leave comments about what I do as I hope I'll not be the only
one maintaining a build and documenting things helps people feel more confident.
However in some cases I will have questions ... so would someone be available
on Slack for quick questions?
Usually switching to another build system does solve some problems ... mostly
the reason to switch is that it solves the main problem that you are having
with the old one.
However you usually notice too late that you get yourself a lot of new
problems. I remember doing some contract work for an insurance company and they
were totally down Maven-road but then had to build something with SBT ... in
the end I compiled the thing on my laptop, copied it to a USB stick and told
the people what was on the stick and that I'll be having a coffee and will be
back in 30 minutes. When I came back the stick wasn't at the same place and the
build problem was "solved" ;-)
So I think it's quite good to stick to Maven ... it is very mature, you can
do almost everything you want with it and it integrates perfectly into the
Apache infrastructure.
But that's just my opinion.
So if you want me to help, I'll be happy to be of assistance.
Chris
On 16.04.20, 15:28, "Trevor Grant" <[email protected]> wrote:
Hey Christofer,
I would agree with what Stevo outlined but add some more context and a
couple related JIRA issues.
For 0.14.0 we did a big refactor and finally moved the MapReduce-based
Mahout all into what we called "community/", that is, community maintained,
which is to say, we're not maintaining it anymore (sunset began I think in
2015).
But all of our POMs were so huge and fat because they'd been layered up
over the years by people coming and going and dropping in code. I wouldn't
call these drive-bys; it's just been over 10 years and people come and go.
Such is the life of Apache projects. So we had a situation where a lot of
the old MapReduce stuff and the POMs were considered "old magic": no one
really knew how it was all tied together, but we didn't want to mess with
it for fear of breaking something in the "new" Mahout (aka Samsara), which is
the Scala/Spark-based library that it is now* (to others in the community:
I know it runs on other engines, but for simplicity, I'm just calling it
"runs-on-spark").
For 0.14.0 we decided to trim out as much of that as possible. We
did some major liposuction on POMs, reorganized things, etc. This was done
by commenting out a section, then seeing if it would still build. So the
current release _does_ build. And aside from some CLI driver issues, which are
outlined in [1], the project runs fairly smoothly. (An SBT build would probably
solve [1]; I believe Pat Ferrel has made his own SBT script to compile Mahout,
which solved that problem for them.)
The issue we ran into with the releases (and the reason I think you're
here) is that somewhere along the line we also commented out something
that was important to the release process. Hence 0.14.0 was released as source
only.
Since 2008, there has been a lot of great work on generating plugins for
doing Apache releases. Instead of the awkward hacks that made up the old
POMs (literally comments that said, "this is a hack, there's supposedly
something better coming from ..." dated like 2012), we would like to do it
the "right way" and incorporate the appropriate plugins.
Refactoring to SBT was _one_ proposed solution. We're also OK continuing to
use Maven, and I agree with what you said about the cross compiling. We
actually have a script that just changes the Scala version. We tried using
the classifiers but there were issues in SBT, but the way you're proposing
sounds a lot more pro than the route we were trying for.
That said, we'd be OK just releasing one Scala/Spark version at a time.
But getting the convenience binaries to release/publish would be a major
first step.
Also, we really appreciate the help,
tg
[1]
https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2093?filter=allopenissues
On Thu, Apr 16, 2020 at 4:50 AM Christofer Dutz <[email protected]>
wrote:
> Hi Stevo,
>
> so let me summarize what I understood:
>
> - There are some modules in Mahout that are built with Scala, some with
> Java and some with both (at least that's what I see when checking out the
> project)
> - The current build uses Scala 2.11 to build the Scala code.
> - The resulting libraries are only compatible with Scala 2.11
>
> Now you want to also publish versions compatible with Scala 2.12?
>
> If that's the case I think Maven could easily add multiple executions,
> each compiling to a different output directory:
> - Java --> target/classes
> - Scala 2.11 --> target/classes-2.11
> - Scala 2.12 --> target/classes-2.12
>
> Then the packaging would also need a second execution ... each of the
> executions bundling the classes and the corresponding Scala output.
> Ideally I would probably use Maven classifiers to distinguish the
> artifacts.
>
> <dependency>
> <groupId>org.apache.mahout</groupId>
> <artifactId>mahout-spark</artifactId>
> <version>14.1-SNAPSHOT</version>
> <classifier>2.11</classifier>
> </dependency>
>
> Then it should all work in a normal Maven build. In the distributions you
> could also filter the versions according to their classifiers.
>
> So if this is the case, I could help you with this.
>
> Chris
>
>
> On 16.04.20, 09:39, "Stevo Slavić" <[email protected]> wrote:
>
> Disclaimer: I haven't been an active Mahout maintainer for quite a while.
> I have some historical perspective, but take it with a grain of salt; it
> could be that I'm missing, by a wide margin, the whole point you were
> approached for.
>
> At some point Mahout, or some of its modules, turned into a Scala
> library, and there was a need to cross-publish those modules across
> different Scala versions. Back then the Maven Scala plugin didn't support
> cross-publishing; it doesn't fit well with Maven's build lifecycle concept
> (multiple compile phases, one for each Scala version, and whatnot would
> be needed). Switching to sbt could have solved the problem. The switch was
> deemed too big a task, even though ages have been spent on trying to
> apply Maven (profiles) + bash scripts and whatnot to solve the problem.
> Trying to apply the same approach over and over again and expecting
> different results is not smart; no expert can help there. Mahout maintainers
> and contributors should consider an alternative approach, one of them being
> switching to sbt: it's Scala native, supports Scala cross-publishing, and
> supports publishing Maven-compatible release metadata and binaries.
>
> Kind regards,
> Stevo Slavic.
>
> On Thu, Apr 16, 2020 at 9:15 AM Christofer Dutz <[email protected]> wrote:
>
> > Hi folks,
> >
> > my name is Chris and I’m involved in quite a lot of Apache projects.
> > Justin approached me this morning, asking me if I could perhaps help you.
> > He told me you were having trouble with doing Maven releases.
> >
> > As Maven releases are my specialty, could you please summarize the issues
> > you are having?
> >
> > Chris
> >
>
>