PS: Assuming we clean up mahout-math and the Scala modules, this should be fairly easy. Maybe there's some stuff in the Colt classes, but there shouldn't be a lot?
On Tue, May 19, 2015 at 11:16 AM, Dmitriy Lyubimov wrote:

> Can't we just declare its own Guava for mahout-mr? Or inherit it from
> wherever it is declared in the Hadoop we depend on there?
>
> On Tue, May 19, 2015 at 9:24 AM, Pat Ferrel wrote:
>
>> I was hoping someone knew the differences. Andrew and I are feeling our
>> way along since we haven't used either to any extent.
>>
>> On May 19, 2015, at 9:17 AM, Suneel Marthi wrote:
>>
>> OK, I see your point if it's only for mahout-math and mahout-hdfs. Not
>> sure it's just a straight replacement of Preconditions -> asserts, though.
>> Preconditions throw an exception if some condition is not satisfied; Java
>> asserts are never meant to be used in production code.
>>
>> So the right fix would be to replace all references to Preconditions with
>> some exception-handling boilerplate.
>>
>> On Tue, May 19, 2015 at 11:58 AM, Pat Ferrel wrote:
>>
>> > We only have to worry about mahout-math and mahout-hdfs.
>> >
>> > Yes, Andrew was working on those; they were replaced with plain Java
>> > asserts.
>> >
>> > There still remain the uses you mention in those two modules, but I see
>> > no good alternative to hacking them out. Maybe we can move some code out
>> > to mahout-mr if it's easier.
>> >
>> > On May 19, 2015, at 8:48 AM, Suneel Marthi wrote:
>> >
>> > I had tried minimizing the Guava dependency to a large extent in the
>> > run-up to 0.10.0. It's not as trivial as it seems: there are parts of the
>> > code (Collocations, lucene2seq, Lucene TokenStream processing and
>> > tokenization code) that are heavily reliant on AbstractIterator, and there
>> > are sections of the code that assign a HashSet to a List (again, one has
>> > to use Guava for that to avoid writing boilerplate to do the same).
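A note on Suneel's two points above (Preconditions versus asserts, and the reliance on AbstractIterator): both can be covered with a little plain JDK code. The Java sketch below is only an illustration under stated assumptions; `GuavaFreeHelpers`, `Checks`, and `SimpleAbstractIterator` are hypothetical names, not existing Mahout classes. `Checks.checkArgument` always throws `IllegalArgumentException`, whereas a Java `assert` is skipped entirely unless the JVM runs with `-ea`, which is why asserts are a weak substitute for Preconditions. `SimpleAbstractIterator` is a simplified take on the `computeNext()`/`endOfData()` pattern of Guava's `AbstractIterator`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class GuavaFreeHelpers {

  /** Hypothetical stand-in for Guava's Preconditions: always enforced, unlike `assert`. */
  public static final class Checks {
    private Checks() {}

    public static void checkArgument(boolean condition, String message) {
      if (!condition) {
        throw new IllegalArgumentException(message);
      }
    }
  }

  /**
   * Hypothetical, simplified stand-in for Guava's AbstractIterator. Subclasses
   * implement computeNext() and return endOfData() when no elements remain.
   */
  public abstract static class SimpleAbstractIterator<T> implements Iterator<T> {
    private enum State { READY, NOT_READY, DONE }

    private State state = State.NOT_READY;
    private T next;

    protected abstract T computeNext();

    /** Marks the iterator as exhausted; the returned null is never handed out. */
    protected final T endOfData() {
      state = State.DONE;
      return null;
    }

    @Override
    public final boolean hasNext() {
      if (state == State.READY) {
        return true;
      }
      if (state == State.DONE) {
        return false;
      }
      // Compute lazily, at most once per element; computeNext() may call endOfData().
      next = computeNext();
      if (state == State.DONE) {
        return false;
      }
      state = State.READY;
      return true;
    }

    @Override
    public final T next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      state = State.NOT_READY;
      T result = next;
      next = null;
      return result;
    }

    @Override
    public void remove() {
      throw new UnsupportedOperationException("remove");
    }
  }

  public static void main(String[] args) {
    // Yields 0, 1, 2 and then signals exhaustion via endOfData().
    Iterator<Integer> it = new SimpleAbstractIterator<Integer>() {
      private int i = 0;

      @Override
      protected Integer computeNext() {
        if (i >= 3) {
          return endOfData();
        }
        return i++;
      }
    };
    StringBuilder sb = new StringBuilder();
    while (it.hasNext()) {
      sb.append(it.next());
    }
    System.out.println(sb); // prints 012

    int numRows = -1;
    try {
      Checks.checkArgument(numRows >= 0, "numRows must be non-negative");
    } catch (IllegalArgumentException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

Guava's real AbstractIterator does extra bookkeeping (for example, it tracks a failed state if computeNext() throws); this sketch leaves that out for brevity.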
>> >
>> > Moreover, things that return something like Iterable<?> and need to be
>> > converted into a regular collection can easily be handled with Guava
>> > without writing our own boilerplate again.
>> >
>> > Are we replacing all Preconditions with straight asserts now??
>> >
>> > On Tue, May 19, 2015 at 11:21 AM, Pat Ferrel wrote:
>> >
>> >> We need to move to Spark 1.3 ASAP and set the stage for beyond 1.3. The
>> >> primary reason is that the big distros are there already or will be
>> >> very soon. Many people using Mahout will have the environment they must
>> >> use dictated by support orgs in their companies, so our current position
>> >> of running only on Spark 1.1.1 means many potential users are out of
>> >> luck.
>> >>
>> >> Here are the problems I know of in moving Mahout ahead on Spark:
>> >> 1) Guava in any backend code (executor closures) relies on being
>> >> serialized with JavaSerializer, which is broken and hasn't been fixed in
>> >> 1.2+. There is a workaround, which involves moving a Guava jar to all
>> >> Spark workers, which is unacceptable in many cases. Guava in the
>> >> Spark-1.2 PR has been removed from the Scala code and will be pushed to
>> >> master, probably this week. That leaves a bunch of uses of Guava in
>> >> Java math and hdfs. Andrew has (I think) removed the Preconditions and
>> >> replaced them with asserts, but there remain some uses of Map and
>> >> AbstractIterator from Guava. Not sure how many of these remain, but if
>> >> anyone can help, please check here:
>> >> https://issues.apache.org/jira/browse/MAHOUT-1708
>> >> 2) The Mahout Shell relies on APIs not available in Spark 1.3.
>> >> 3) The API for writing to sequence files now requires implicit values
>> >> that are not available in the current code.
>> >> I think Andy did a temp fix to write to object files, but this is
>> >> probably not what we want to release.
>> >>
>> >> I for one would dearly love to see Mahout 0.10.1 support Spark 1.3+, and
>> >> soon. This is a call for help in cleaning these things up. Even with no
>> >> new features, the above would make Mahout much more usable in current
>> >> environments.
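On Suneel's other point earlier in the thread, turning an `Iterable<?>` (or a `HashSet`) into a regular `List` without Guava needs only a few lines of JDK code. A minimal Java sketch follows; `IterableConversions` and `toList` are hypothetical names rather than Mahout APIs. It avoids lambdas and uses an anonymous `Iterable` so that it would also compile on Java 7, which is an assumption about the target JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class IterableConversions {

  /** Hypothetical helper: copies any Iterable into a new ArrayList using only the JDK. */
  public static <T> List<T> toList(Iterable<? extends T> iterable) {
    if (iterable instanceof Collection) {
      // Fast path: the ArrayList copy constructor accepts any Collection.
      return new ArrayList<T>((Collection<? extends T>) iterable);
    }
    List<T> result = new ArrayList<T>();
    for (T item : iterable) {
      result.add(item);
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> terms = new HashSet<String>(Arrays.asList("spark", "guava", "mahout"));
    // HashSet -> List needs no helper; element order follows the set's iteration order.
    List<String> fromSet = new ArrayList<String>(terms);

    // An Iterable that is not a Collection, so toList() takes the loop path.
    Iterable<Integer> numbers = new Iterable<Integer>() {
      @Override
      public Iterator<Integer> iterator() {
        return Arrays.asList(1, 2, 3).iterator();
      }
    };
    List<Integer> fromIterable = toList(numbers);

    System.out.println(fromSet.size()); // prints 3
    System.out.println(fromIterable);   // prints [1, 2, 3]
  }
}
```

The result is always a fresh, mutable copy, so callers can modify it without affecting the source collection.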
