Weekly Minutes

2024-04-24 Thread Trevor Grant
## Weekly community meeting
[Subscribe](mailto:user-subscr...@mahout.apache.org) to the Mahout User
list to ask for details on joining.

### Attendees
* Tommy Naugle
* Trevor Grant


All old and new business was pushed to next meeting due to lack of quorum.

Tommy and Trevor jointly reviewed [PR #442](
https://github.com/apache/mahout/pull/442).
The tl;dr is that it will be merged following some clean up on the git log.
(Trevor walked Tommy through a chatGPT he made about how to clean it up).

Don't forget the Community Happy Hour on Monday 4/29! (Ask for link on
user@m.a.o or the #mahout channel on the-asf.slack.com, all are welcome!)

That all got posted to the website. Since y'all are on these mailing lists
already, the story behind the story is that we don't have a link yet. However, if
you're interested in joining the community calls, the weekly Qumat one is:

Mahout-Q
Wednesday, April 24 · 3:00 – 3:30pm
Time zone: America/Chicago
Google Meet joining info
Video call link: https://meet.google.com/tpi-msur-qug


Re: Mar 20 minutes

2024-03-25 Thread Trevor Grant
All are welcome to come co-conspire!

On Mon, Mar 25, 2024 at 4:37 AM Peng Zhang  wrote:

> Wow, i am curious who are the conspirators.
> “Happy hour some week soon, invite collaborators and conspirators”
>
> Cheers,
> Peng
>
> On Sat, Mar 23, 2024 at 00:54 Andrew Musselman  wrote:
>
> > Community meeting minutes posted at
> > https://mahout.apache.org/minutes/2024/03/20/Meeting-Minutes.html
> >
> > Meeting Minutes
> >
> > 2024-03-20 08:00:00 +0000
> > Weekly community meeting
> >
> > Attendees
> >
> >- Trevor Grant
> >- Tommy Naugle
> >
> > Old Business
> >
> >1. Happy hour some week soon, invite collaborators and conspirators
> >2. Drop this meeting time from two hours to a half hour
> >3. Coordinate on JIRA
> >   - Web site cleanup (~210 broken links fixed out of ~220, tommy
> >   continuing)
> >   - Continued qumat data structure work (tommy in flight, akm to
> > review)
> >4. Ask INFRA to help us make sure PRs are defaulting to main instead
> of
> >trunk (akm) (done)
> >5. Kernel method research spike:
> >https://issues.apache.org/jira/browse/MAHOUT-2200
> >6. Make ticket to add notebooks to notebooks directory in source tree
> (
> >https://issues.apache.org/jira/browse/MAHOUT-2198)
> >7. Add execute method to qumat
> >https://issues.apache.org/jira/browse/MAHOUT-2201
> >8. Rebuild JIRA - now that we have wiped it clean, on the qumat side
> >anyway, lets start grooming tasks into the appropriate
> >components/releases/etc (todo)
> >   - Including adding filters to all boards so only those tickets show
> >   up (todo)
> >
> > New Business
> >
> >1. Tommy is working on making a docker container for previewing
> website
> >builds
> >2. Trevor is pivoting from kernel research into implementing POC for
> >cirq ie the 9 gates and circuit execute
> >
> > Other Business
> >
>


Re: Help regarding Mahout installation as a library

2022-04-11 Thread Trevor Grant
Hi Tanmay,

The mahout-math is contained in mahout core, can you try replacing the slug
in your pom about importing mahout-math with this:


<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>14.1</version>
</dependency>


Where did you read about importing mahout-math? We might need to update
some old docs.

Don't worry about asking questions- you're not the only person to have
them, just the only one brave enough to ask :)

tg

On Sun, Apr 10, 2022 at 8:26 AM Tanmay Chavan 
wrote:

> Hi Trevor,
>
> Thanks for your response! I was initially trying to set up mahout and test
> it on the command line. However, I hadn't configured Spark properly during
> the installation and thus it failed. I solved that issue now and can get
> mahout spark-shell to run :D
>
> However I'm facing problems using it as a library in eclipse. I created a
> maven project in eclipse, edited the pom.xml file to add the dependencies,
> and finally used maven install via eclipse. However, it wasn't able to find
> mahout-math. The error was:
>
> Could not resolve dependencies for project
> test4j:test4j:jar:0.0.1-SNAPSHOT: Failure to find
> org.apache.mahout:mahout-math:jar:14.1 in
> https://repo.maven.apache.org/maven2 was cached in the local repository,
> resolution will not be reattempted until the update interval of central has
> elapsed or updates are forced -> [Help 1]
>
> Running maven -U clean install in bash gave:
>
> Could not find artifact org.apache.mahout:mahout-math:jar:14.1 in central (
> https://repo.maven.apache.org/maven2)
>
> However, it seems to be able to find mahout-hdfs. Am I doing something
> wrong? I'm sorry for asking build questions, but I'm new to maven
> development and can't seem to figure out this.
>
> Sincerely,
> Tanmay
>
>
> On Sat, 9 Apr 2022 at 17:37, Trevor Grant 
> wrote:
>
> > Hi Tanmay-
> >
> > The maven install command will copy jars to your local maven cache. Then
> if
> > you try to compile a second program with dependency on mahout, it should
> > work.
> >
> > The likely reason it is telling you command not found is it's not on the
> > path. Can you reply with the full error it is giving you?
> >
> > tg
> >
> >
> > On Fri, Apr 8, 2022 at 9:19 AM Tanmay Chavan 
> > wrote:
> >
> > > Hi,
> > >
> > > > I am trying to install Apache Mahout on an Ubuntu Linux machine for a college
> > > project. I downloaded it using the link provided on the main page (v
> > 14.1)
> > > as well as the download page, and tried to build it using mvn
> -DskipTests
> > > clean install. The build seemed to conclude successfully. However, it
> > > still
> > > shows mahout: command not found on shell. How can I install the
> software
> > > without building it from source? Is there any binary for mahout?
> > >
> > > Sincerely,
> > > Tanmay
> > >
> >
>
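
The point made in this thread, that `mvn install` copies jars into the local Maven cache, can be checked by hand. What follows is only a minimal sketch: it assumes the default local repository under ~/.m2/repository and the standard Maven directory layout, and `local_repo_jar` is a hypothetical helper name, not part of Maven or Mahout.

```python
from pathlib import Path


def local_repo_jar(group_id, artifact_id, version, repo_root=None):
    """Return the path where Maven would cache the jar for these coordinates."""
    # Assumption: the default local repository location, ~/.m2/repository.
    root = Path(repo_root) if repo_root is not None else Path.home() / ".m2" / "repository"
    # Standard layout: dots in the groupId become directory separators.
    return root.joinpath(*group_id.split("."), artifact_id, version,
                         f"{artifact_id}-{version}.jar")


if __name__ == "__main__":
    # Compare the artifact suggested above with the one that failed to resolve.
    for artifact in ("mahout-core", "mahout-math"):
        jar = local_repo_jar("org.apache.mahout", artifact, "14.1")
        print(f"{artifact}: {jar} (present: {jar.exists()})")
```

If the jar for an artifact is missing here and also missing from Maven Central, the build fails with the "Could not find artifact" error quoted above.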


Re: Help regarding Mahout installation as a library

2022-04-09 Thread Trevor Grant
Hi Tanmay-

The maven install command will copy jars to your local maven cache. Then if
you try to compile a second program with dependency on mahout, it should
work.

The likely reason it is telling you command not found is it's not on the
path. Can you reply with the full error it is giving you?

tg


On Fri, Apr 8, 2022 at 9:19 AM Tanmay Chavan 
wrote:

> Hi,
>
> I am trying to install Apache Mahout on an Ubuntu Linux machine for a college
> project. I downloaded it using the link provided on the main page (v 14.1)
> as well as the download page, and tried to build it using mvn -DskipTests
> clean install. The build seemed to conclude successfully. However, it still
> shows mahout: command not found on shell. How can I install the software
> without building it from source? Is there any binary for mahout?
>
> Sincerely,
> Tanmay
>
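
The "command not found" diagnosis above comes down to whether the shell can find an executable named `mahout` in any directory listed in PATH. A minimal sketch of that lookup, using only the Python standard library; `find_command` is a hypothetical helper, and the sketch makes no assumption about where a given build places the launcher script.

```python
import os
import shutil


def find_command(name, search_path=None):
    """Return the full path of `name` if it is on the search path, else None."""
    # shutil.which reads the PATH environment variable when search_path is None.
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_command("mahout")
    if location is None:
        print("mahout is not on PATH; directories searched:")
        for directory in os.environ.get("PATH", "").split(os.pathsep):
            print("  ", directory)
    else:
        print("mahout found at", location)
```

If the lookup returns nothing, adding the directory that holds the launcher script to PATH, or calling the script by its full path, is the usual fix.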


Mahout Blog

2022-01-21 Thread Trevor Grant
Hey all,

I was just on the site and remembered about the Mahout blog and wanted to
note 2 things.

1) The outcome of the thread on block chain stuff and the upcoming meeting
should get recapped in some sort of post- I can do it, just someone please
remind me.

2) If anyone wants to get involved with the project but isn't sure what a
good first commit would be- we'd always love blog posts.

The posts are typical Jekyll style- here is an example[1]; reach out if you
need/want more help.

[1]
https://github.com/apache/mahout/blob/trunk/website/_posts/2021-06-01-Zeppelin%20Quickstart.md


Re: Project Idea - Blockchain

2022-01-18 Thread Trevor Grant
Thanks for taking point on this Andrew.

On Tue, Jan 18, 2022 at 12:00 PM Andrew Musselman  wrote:

> Hi all, with the holiday our schedule got pushed back some; only a few
> votes so far and most of them were for the holiday yesterday. I'm going to
> try again for next week.
>
> Please vote here:
> https://calendly.com/d/cmn-rh2-xr5/mahout-community-meeting
>
> On Thu, Jan 13, 2022 at 10:38 AM Andrew Musselman  wrote:
>
> > We're looking at good times for a community session next week. Please
> vote
> > on the times here, if none of the times works for you please let me know:
> > https://calendly.com/d/cg8-g4d-g5x/mahout-community-meeting
> >
> > On Fri, Jan 7, 2022 at 1:49 PM Andrew Musselman  wrote:
> >
> >> I've started an epic at
> https://issues.apache.org/jira/browse/MAHOUT-2142
> >>
> >> Please feel free to comment and add ideas and child stories.
> >>
> >> On Thu, Jan 6, 2022 at 6:29 PM Manoj Awasthi 
> >> wrote:
> >>
> >>> Interesting.
> >>>
> >>> Please keep this group posted so interested people can monitor and
> join.
> >>> I'll want to contribute if there is any way I can.
> >>>
> >>> On Fri, 7 Jan 2022 at 06:27, Andrew Musselman  wrote:
> >>>
> >>> > Hi Amanda, we will be putting an epic up on
> >>> > https://issues.apache.org/jira/projects/MAHOUT/issues in the next
> >>> couple
> >>> > days and posting here for comments and help.
> >>> >
> >>> > In the meantime feel free to go get a copy of the software from
> >>> > https://mahout.apache.org.
> >>> >
> >>> > We have had bug bash and planning meetings at times weekly or
> monthly;
> >>> may
> >>> > be time to put something back on the calendar..
> >>> >
> >>> > On Thu, Jan 6, 2022 at 2:02 PM Amanda Lunt 
> >>> > wrote:
> >>> >
> >>> > > Hi Everyone,
> >>> > >
> >>> > > Where do I find details about this project? General and specific.
> >>> > > I would like to participate, but unsure about getting started :-)
> >>> > >
> >>> > > Amanda
> >>> > >
> >>> > >
> >>> > > On 7/1/22 4:37 am, Andrew Musselman wrote:
> >>> > > Thanks for the input; I'll create an epic this week in jira so we
> can
> >>> > build
> >>> > > on these ideas.
> >>> > >
> >>> > > On Wed, Jan 5, 2022 at 11:30 PM Shaloo Shalini <sh...@hotmail.com> wrote:
> >>> > >
> >>> > > > Very interesting!
> >>> > > >
> >>> > > > How about a few more use cases:
> >>> > > >
> >>> > > > (4) Time-series analysis of transactions (overall # per
> >>> > > > week/month/year/customperiod, by user account etc.) for a list of
> >>> > > ledgers.
> >>> > > > (Comparative analysis of usage)
> >>> > > > (5) Max/Min range of transactions for different ledgers
> >>> > > >
> >>> > > > > On 06-Jan-2022, at 6:34 AM, Andrew Musselman <a...@apache.org> wrote:
> >>> > > > >
> >>> > > > > After some chats with Trevor about project direction we wanted
> to
> >>> > bring
> >>> > > > > some ideas back to the lists.
> >>> > > > >
> >>> > > > > I have professional interest in blockchain tech, including how
> to
> >>> > query
> >>> > > > > ledgers and submit new entries, how to index the actual
> >>> blockchain
> >>> > > files
> >>> > > > > for search. Specific interests are in tracking contents of
> smart
> >>> > > > contracts (
> >>> > > > > https://ethereum.org/en/developers/docs/smart-contracts/) and their
> >>> > > > > execution, doing analytics on public transactions, etc.
> >>> > > > >
> >>> > > > > Proposal is to provide a new data source, namely any number of
> >>> > > > > ethereum-compatible ledgers, and pick a few compelling use
> cases
> >>> to
> >>> > > build
> >>> > > > > out this year. Examples could be:
> >>> > > > > (1) Search-indexes of given ledgers
> >>> > > > > (2) Computed similarity to other accounts on the same ledger
> >>> based on
> >>> > > > > activity history
> >>> > > > > (3) Time-series analysis of gas (transaction) fees across
> >>> multiple
> >>> > > > ledgers
> >>> > > > >
> >>> > > > > If this sounds interesting or if anyone has things to add
> please
> >>> let
> >>> > me
> >>> > > > > know.
> >>> > > > >
> >>> > > > > Happy New Year!
> >>> > > > >
> >>> > > > > Best
> >>> > > > > Andrew
> >>> > > >
> >>> > > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Amanda Lunt |  Casual Academic Staff Member | School of Information
> >>> and
> >>> > > Communication Technology
> >>> > > Syndicate of Schools for Built, Digital and Natural Environments |
> >>> > College
> >>> > > of Sciences and Engineering
> >>> > > University of Tasmania | Locked Bag 1359 | Launceston 7250
> >>> > >
> >>> > > amanda.l...@utas.edu.au
> >>> > >
> >>> > > I am currently reading The Chimes by Anna Smaill<
> >>> > > http://www.annasmaill.com/the-chimes.html>
> >>> > >
> >>> > >
> >>> > > This email is confidential, and is for the intended recipient only.
> >>> > > Access, disclosure, copying, distribution, or reliance on any of it
> >>> by
> >>> > > anyone outside the intended recipient o

Re: Project Idea - Blockchain

2022-01-05 Thread Trevor Grant
Very interested.

On Wed, Jan 5, 2022, 7:04 PM Andrew Musselman  wrote:

> After some chats with Trevor about project direction we wanted to bring
> some ideas back to the lists.
>
> I have professional interest in blockchain tech, including how to query
> ledgers and submit new entries, how to index the actual blockchain files
> for search. Specific interests are in tracking contents of smart contracts
> (
> https://ethereum.org/en/developers/docs/smart-contracts/) and their
> execution, doing analytics on public transactions, etc.
>
> Proposal is to provide a new data source, namely any number of
> ethereum-compatible ledgers, and pick a few compelling use cases to build
> out this year. Examples could be:
> (1) Search-indexes of given ledgers
> (2) Computed similarity to other accounts on the same ledger based on
> activity history
> (3) Time-series analysis of gas (transaction) fees across multiple ledgers
>
> If this sounds interesting or if anyone has things to add please let me
> know.
>
> Happy New Year!
>
> Best
> Andrew
>


Re: Log4j, CVE-2021-44228, and Mahout

2021-12-28 Thread Trevor Grant
@Musselman, I sent invite directly to you.

@Anyone-else-interested, please don't be shy, join us:

Apache Mahout
Tuesday, December 28 · 5:00 – 6:00pm (CST, -0600)
Google Meet joining info
Video call link: https://meet.google.com/ajg-rxbo-jvw

On Thu, Dec 23, 2021 at 12:33 PM Trevor Grant 
wrote:

> Works for me- if anyone else wants to join and that time doesn't work
> (17:00 -6:00 UTC), speak up.
>
> On Thu, Dec 23, 2021 at 12:22 PM Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
>> Works for me; have a good holiday and see you Tuesday. Five p.m. Central
>> maybe?
>>
>> On Tue, Dec 21, 2021 at 12:56 PM Trevor Grant 
>> wrote:
>>
>> > I don't think we set a time / place to meet tonight-
>> >
>> > I propose punting to next week, I'll probably hack a bit tonight- just
>> send
>> > a proposed time / channel.
>> >
>> > tg
>> >
>> > On Wed, Dec 15, 2021 at 8:52 AM Andrew Musselman <
>> > andrew.mussel...@gmail.com>
>> > wrote:
>> >
>> > > Good for me
>> > >
>> > > On Tue, Dec 14, 2021 at 6:13 AM Trevor Grant <
>> trevor.d.gr...@gmail.com>
>> > > wrote:
>> > >
>> > > > Love this idea, how about Tuesday evenings, starting the 21st ( a
>> week
>> > > from
>> > > > tonight )
>> > > >
>> > > > On Mon, Dec 13, 2021 at 7:37 PM Andrew Musselman <
>> > > > andrew.mussel...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Thanks Trevor; may be a good time to revive our online meetings to
>> > talk
>> > > > > through this one..
>> > > > >
>> > > > > I could find time during the holiday break pretty much any day; if
>> > > anyone
>> > > > > else is interested let us know if there's a good time to chat.
>> > > > >
>> > > > > On Mon, Dec 13, 2021 at 4:26 PM Trevor Grant <
>> > trevor.d.gr...@gmail.com
>> > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Many of you have probably become aware of Log4j's vulnerability
>> to
>> > > > > > CVE-2021-44228 recently.
>> > > > > >
>> > > > > > Though Mahout is a sleepy project, we are vigilant and want you
>> to
>> > > know
>> > > > > we
>> > > > > > are aware of the issue and have been monitoring.
>> > > > > >
>> > > > > > First, let me assure you that since Mahout (like over 90% of
>> log4j
>> > > > users)
>> > > > > > > is on version 1.x it is not vulnerable to the JNDI remote
>> execution
>> > > > > attack
>> > > > > > [1]. That said, 1.x was set for EOL in 2015, so it's probably
>> time
>> > to
>> > > > > > update that. I've made a JIRA ticket (MAHOUT-2140)[2].
>> > > > > >
>> > > > > > The update isn't too complex, but it's also not trivial, and
>> most
>> > > > > > importantly it's not critical so you're not endangering anything
>> > > > running
>> > > > > > Mahout, and we'll hopefully get it in for the next release in a
>> > > couple
>> > > > of
>> > > > > > months.
>> > > > > >
>> > > > > > Hope this helps everyone feel secure going into their holiday
>> > season.
>> > > > > >
>> > > > > > ~Trevor
>> > > > > >
>> > > > > > [1] http://slf4j.org/log4shell.html
>> > > > > > [2]
>> > > https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2140
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>


Re: Log4j, CVE-2021-44228, and Mahout

2021-12-23 Thread Trevor Grant
Works for me- if anyone else wants to join and that time doesn't work
(17:00 -6:00 UTC), speak up.

On Thu, Dec 23, 2021 at 12:22 PM Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Works for me; have a good holiday and see you Tuesday. Five p.m. Central
> maybe?
>
> On Tue, Dec 21, 2021 at 12:56 PM Trevor Grant 
> wrote:
>
> > I don't think we set a time / place to meet tonight-
> >
> > I propose punting to next week, I'll probably hack a bit tonight- just
> send
> > a proposed time / channel.
> >
> > tg
> >
> > On Wed, Dec 15, 2021 at 8:52 AM Andrew Musselman <
> > andrew.mussel...@gmail.com>
> > wrote:
> >
> > > Good for me
> > >
> > > On Tue, Dec 14, 2021 at 6:13 AM Trevor Grant  >
> > > wrote:
> > >
> > > > Love this idea, how about Tuesday evenings, starting the 21st ( a
> week
> > > from
> > > > tonight )
> > > >
> > > > On Mon, Dec 13, 2021 at 7:37 PM Andrew Musselman <
> > > > andrew.mussel...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks Trevor; may be a good time to revive our online meetings to
> > talk
> > > > > through this one..
> > > > >
> > > > > I could find time during the holiday break pretty much any day; if
> > > anyone
> > > > > else is interested let us know if there's a good time to chat.
> > > > >
> > > > > On Mon, Dec 13, 2021 at 4:26 PM Trevor Grant <
> > trevor.d.gr...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Many of you have probably become aware of Log4j's vulnerability
> to
> > > > > > CVE-2021-44228 recently.
> > > > > >
> > > > > > Though Mahout is a sleepy project, we are vigilant and want you
> to
> > > know
> > > > > we
> > > > > > are aware of the issue and have been monitoring.
> > > > > >
> > > > > > First, let me assure you that since Mahout (like over 90% of
> log4j
> > > > users)
> > > > > > > is on version 1.x it is not vulnerable to the JNDI remote
> execution
> > > > > attack
> > > > > > [1]. That said, 1.x was set for EOL in 2015, so it's probably
> time
> > to
> > > > > > update that. I've made a JIRA ticket (MAHOUT-2140)[2].
> > > > > >
> > > > > > The update isn't too complex, but it's also not trivial, and most
> > > > > > importantly it's not critical so you're not endangering anything
> > > > running
> > > > > > Mahout, and we'll hopefully get it in for the next release in a
> > > couple
> > > > of
> > > > > > months.
> > > > > >
> > > > > > Hope this helps everyone feel secure going into their holiday
> > season.
> > > > > >
> > > > > > ~Trevor
> > > > > >
> > > > > > [1] http://slf4j.org/log4shell.html
> > > > > > [2]
> > > https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2140
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Log4j, CVE-2021-44228, and Mahout

2021-12-21 Thread Trevor Grant
I don't think we set a time / place to meet tonight-

I propose punting to next week, I'll probably hack a bit tonight- just send
a proposed time / channel.

tg

On Wed, Dec 15, 2021 at 8:52 AM Andrew Musselman 
wrote:

> Good for me
>
> On Tue, Dec 14, 2021 at 6:13 AM Trevor Grant 
> wrote:
>
> > Love this idea, how about Tuesday evenings, starting the 21st ( a week
> from
> > tonight )
> >
> > On Mon, Dec 13, 2021 at 7:37 PM Andrew Musselman <
> > andrew.mussel...@gmail.com>
> > wrote:
> >
> > > Thanks Trevor; may be a good time to revive our online meetings to talk
> > > through this one..
> > >
> > > I could find time during the holiday break pretty much any day; if
> anyone
> > > else is interested let us know if there's a good time to chat.
> > >
> > > On Mon, Dec 13, 2021 at 4:26 PM Trevor Grant  >
> > > wrote:
> > >
> > > > Many of you have probably become aware of Log4j's vulnerability to
> > > > CVE-2021-44228 recently.
> > > >
> > > > Though Mahout is a sleepy project, we are vigilant and want you to
> know
> > > we
> > > > are aware of the issue and have been monitoring.
> > > >
> > > > First, let me assure you that since Mahout (like over 90% of log4j
> > users)
> > > > > is on version 1.x it is not vulnerable to the JNDI remote execution
> > > attack
> > > > [1]. That said, 1.x was set for EOL in 2015, so it's probably time to
> > > > update that. I've made a JIRA ticket (MAHOUT-2140)[2].
> > > >
> > > > The update isn't too complex, but it's also not trivial, and most
> > > > importantly it's not critical so you're not endangering anything
> > running
> > > > Mahout, and we'll hopefully get it in for the next release in a
> couple
> > of
> > > > months.
> > > >
> > > > Hope this helps everyone feel secure going into their holiday season.
> > > >
> > > > ~Trevor
> > > >
> > > > [1] http://slf4j.org/log4shell.html
> > > > [2]
> https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2140
> > > >
> > >
> >
>


Re: Log4j, CVE-2021-44228, and Mahout

2021-12-14 Thread Trevor Grant
Love this idea, how about Tuesday evenings, starting the 21st ( a week from
tonight )

On Mon, Dec 13, 2021 at 7:37 PM Andrew Musselman 
wrote:

> Thanks Trevor; may be a good time to revive our online meetings to talk
> through this one..
>
> I could find time during the holiday break pretty much any day; if anyone
> else is interested let us know if there's a good time to chat.
>
> On Mon, Dec 13, 2021 at 4:26 PM Trevor Grant 
> wrote:
>
> > Many of you have probably become aware of Log4j's vulnerability to
> > CVE-2021-44228 recently.
> >
> > Though Mahout is a sleepy project, we are vigilant and want you to know
> we
> > are aware of the issue and have been monitoring.
> >
> > First, let me assure you that since Mahout (like over 90% of log4j users)
> > > is on version 1.x it is not vulnerable to the JNDI remote execution
> attack
> > [1]. That said, 1.x was set for EOL in 2015, so it's probably time to
> > update that. I've made a JIRA ticket (MAHOUT-2140)[2].
> >
> > The update isn't too complex, but it's also not trivial, and most
> > importantly it's not critical so you're not endangering anything running
> > Mahout, and we'll hopefully get it in for the next release in a couple of
> > months.
> >
> > Hope this helps everyone feel secure going into their holiday season.
> >
> > ~Trevor
> >
> > [1] http://slf4j.org/log4shell.html
> > [2] https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2140
> >
>


Log4j, CVE-2021-44228, and Mahout

2021-12-13 Thread Trevor Grant
Many of you have probably become aware of Log4j's vulnerability to
CVE-2021-44228 recently.

Though Mahout is a sleepy project, we are vigilant and want you to know we
are aware of the issue and have been monitoring.

First, let me assure you that since Mahout (like over 90% of log4j users)
is on version 1.x it is not vulnerable to the JNDI remote execution attack
[1]. That said, 1.x was set for EOL in 2015, so it's probably time to
update that. I've made a JIRA ticket (MAHOUT-2140)[2].

The update isn't too complex, but it's also not trivial, and most
importantly it's not critical so you're not endangering anything running
Mahout, and we'll hopefully get it in for the next release in a couple of
months.

Hope this helps everyone feel secure going into their holiday season.

~Trevor

[1] http://slf4j.org/log4shell.html
[2] https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2140
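
The reasoning in this message (a 1.x dependency is outside the affected range) can be written as a small version check. This is a sketch, not an official tool: the affected range (Log4j 2.0-beta9 through 2.14.1) and the patched backport releases (2.12.2 and 2.3.1) are taken from the Apache Log4j security advisory as best I recall and should be verified there. The helper names are hypothetical, pre-release suffixes are ignored, and "1.2.17" is only an example 1.x version. The check covers this one CVE only and says nothing about other issues in 1.x.

```python
import re

# Backport lines fixed below 2.15.0: (major, minor) -> first patched patch level.
# Assumption: taken from the Apache advisory; verify before relying on it.
PATCHED_BACKPORTS = {(2, 3): 1, (2, 12): 2}


def parse_version(version):
    """Turn '2.14.1' or '2.0-beta9' into a (major, minor, patch) tuple of ints."""
    numbers = [int(n) for n in re.findall(r"\d+", version.split("-")[0])]
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers[:3])


def affected_by_cve_2021_44228(version):
    """Return True if this Log4j version falls in the range affected by the CVE."""
    major, minor, patch = parse_version(version)
    if major != 2:
        return False  # 1.x is outside the affected range for this CVE
    if (major, minor, patch) > (2, 14, 1):
        return False  # 2.15.0 and later carry the fix
    first_fixed = PATCHED_BACKPORTS.get((major, minor))
    return first_fixed is None or patch < first_fixed


if __name__ == "__main__":
    for v in ("1.2.17", "2.14.1", "2.15.0"):
        print(v, affected_by_cve_2021_44228(v))
```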


PyMahout (incore) (alpha v0.1)

2021-01-06 Thread Trevor Grant
Hey all,

I made a branch for a thing I'm toying with. PyMahout.

See https://github.com/rawkintrevo/pymahout/tree/trunk

Right now, it's sort of dumb- it just makes a couple of random incore
matrices... but it _does_ make them.

Next I want to show I can do something with DRMs.

Once I know it's all possible- I'll make a batch of JIRA tickets and we can
start implementing a python like package so that in theory in a pyspark
workbook you could

```jupyter
!pip install pymahout


import pymahout

# do pymahout things here... in python.

```

So if you're interested in helping/playing, reach out on here or directly.
If there is a bunch of interest I can commit all of this to a branch as we
play with it.

Thanks!
tg


MahoutCon Tomorrow!

2020-09-30 Thread Trevor Grant
Hey all,

Tomorrow is the Mahout Track of ApacheCon@Home.

Registration is free (or you can donate, if you want) but free tickets are
limited.

Check the schedule here[1] for all of the great talks tomorrow.

We'll also be posting the videos if you can't make it.

Thanks!

tg

[1] https://www.apachecon.com/acah2020/tracks/mahout.html


Re: [VOTE] Release 14.1, RC6

2020-09-10 Thread Trevor Grant
Thank you so much for getting this out Andrew.

I verified all checksums/sigs.

I successfully built the source including all tests. (I did this in the
public docker container rawkintrevo/mahout-builder-base)

I also tested the binaries in the public docker container
rawkintrevo/mahoutgui, by bashing into the running container, unpacking
both the binary archives, and then aiming and running the mahout example
notebook in turn against each of the unpacked binaries from each of the
archives.  I did this in place of spark-shell, as I think it's a more
elegant solution going forward, but would encourage others to test against
mahout spark-shell.

So given all of that, I give an enthusiastic +1

On Thu, Sep 10, 2020 at 3:37 PM Andrew Musselman  wrote:

> Binaries:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1065/org/apache/mahout/apache-mahout-distribution/14.1/
>
> Source:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1065/org/apache/mahout/mahout/14.1/
>
> Please check checksums and signatures, run the shell, do some computation,
> run your favorite jobs, and let us know how it looks.
>
> Thanks!
>
> Best
> Andrew
>


[ANNOUNCE] Mahout Con 2020 (A sub-track of ApacheCon @ Home)

2020-08-12 Thread Trevor Grant
Hey all,

We got enough people to volunteer for talks that we are going to be putting
on our very own track at ApacheCon (@Home) this year!

Check out the schedule here:
https://www.apachecon.com/acna2020/tracks/mahout.html

To see the talks live / in real time, please register at:
https://hopin.to/events/apachecon-home

But if you can't make it- we plan on pushing all of the recorded sessions
to the website after.

Thanks so much everyone, and can't wait to 'see' you there!

tg


Re: How to do logical subsetting in Mathout

2020-07-21 Thread Trevor Grant
Very nice... How would you feel about writing some docs on this?

tg


On Tue, Jul 21, 2020 at 1:54 AM Baeriswyl Kuno SBB CFF FFS (Extern) <
kuno.baeris...@sbb.ch> wrote:

> Hello Andrew,
> thanks for your hint.
>
> Yes, that's the way I found too.
>
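> def createIndexMap(x: CheckpointedDrm[Int]): RDD[(Int, Int)] = {
>   // Keep the keys of rows whose first element is positive.
>   val xIndexFiltered = x.rdd
>     .filter(r => r._2.get(0) > 0)
>     .map(r => r._1)
>
>   // Pair each surviving old key with its new, dense index.
>   xIndexFiltered.zipWithIndex
>     .map(r => (r._1, r._2.toInt))
> }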
>
> First, I filter the DRM and create a map from old to new indexes, as you
> mentioned.
>
> By joining against this index map, I can reduce the rows in my DRM
> according to a certain condition, do some more calculations, and map the
> newly calculated values back to the original DRM.
>
> Like:
> def mergeDrm(drmOrig: CheckpointedDrm[Int], drmFiltriert: CheckpointedDrm[Int],
>     indexMapping: RDD[(Int, Int)]): CheckpointedDrm[Int] = {
>   drmWrap(
>     drmOrig.rdd
>       .map(r => Pair(r._1, r._2))
>       // Attach the new index (if any) to every original row.
>       .leftOuterJoin(indexMapping.map(r => Pair(r._1, r._2)))
>       // Re-key by the optional new index, keeping the old key and row.
>       .map(r => Pair(r._2._2, (r._1, r._2._1)))
>       .leftOuterJoin(drmFiltriert.rdd.map(r => Pair(Option(r._1), r._2)))
>       // Prefer the recalculated row; fall back to the original one.
>       .map(r => (r._2._1._1, r._2._2.getOrElse(r._2._1._2)))
>   )
> }
>
> Greets
>
> Kuno
>
>
>
> -Original Message-
> From: Andrew Musselman
> Sent: Tuesday, July 7, 2020 23:16
> To: user@mahout.apache.org
> Subject: Re: How to do logical subsetting in Mathout
>
> Kuno, thanks for your note. I don't know of an equivalent function out of
> the box, but if you want to get the indices where a condition is true you
> could try something in Scala like:
>
> myList.zipWithIndex.collect { case (item, index) if item > 1 => index }
>
> Hope this is helpful.
>
> On Wed, Jun 10, 2020 at 2:53 AM Baeriswyl Kuno SBB CFF FFS (Extern) <
> kuno.baeris...@sbb.ch> wrote:
>
> > Hi all,
> >
> > I've bumped into Mahout, because I need to migrate an R script
> > including matrix algebra to a Spark cluster.
> >
> > Mahout's Scala/Spark bindings provide all of the operations, except for
> > logical subsetting.
> >
> > Example:
> >
> > x1 = c(1.0,4.0,2.0,5.0)
> > x2 = c(0,0,0,0)
> > x2[x1 > 1] = 2
> >
> > Would set the value 2 in rows 2, 3 and 4.
> >
> > Is there an equivalent function in Mahout?
> >
> >
> > Thanks.
> >
> > Kuno
> >
> >
>


Re: Mahout Con (An ApacheCon@Home Track)

2020-07-09 Thread Trevor Grant
Hey all,

I've been out of pocket for the last week, but I saw today that the CFP ends
Monday.  If you're interested, please submit a talk with your name /
title.

There's no limit on space, so we're keen to accept all talks of value, but
having your name and a rough abstract in the system is the important thing.

I'm pretty sure we can edit the talk abstracts later, but jic check
spelling :)

tg


On Thu, Jul 2, 2020 at 2:01 PM Andrew Musselman 
wrote:

> Thanks for the heads up Trevor! Looking forward to it, I could do the end
> to end idea if that’s interesting.
>
> Will submit something soon, do you know how long it’s open?
>
> On Tue, Jun 30, 2020 at 14:29 Trevor Grant 
> wrote:
>
> > Hey Mahout users and devs!
> >
> > This year at ApacheCon(@Home), we're doing a Mahout track!
> >
> > We want to see lots of things about Mahout, and we'll (hopefully) get
> > recordings and put them on the website.
> >
> > Haven't you always wanted to give a talk on Mahout? Well, here's your
> > chance.
> >
> > We're taking anything Mahout related, but in case you need some ideas to
> > get you started:
> >
> > * Getting Started with Mahout: From Installing to Basic Samsara Shell (to
> > Apache Zeppelin integration?!)
> >
> > * Deep Dive on Specific Features of Mahout : (e.g. Pat may do a CCO)
> >
> > * How our company used Apache Mahout to implement features (I'm looking
> at
> > you Cars.com :P )
> >
> > * Mahout on Kubeflow (I'm going to do this one, but don't let that stop
> you
> > from doing one yourself!)
> >
> > * Mahout + Docker/K8s (A bit of overlap, with prior, but no shame there).
> >
> > * And much much more.
> >
> > The CFP is available here https://www.apachecon.com/cfp.html please fill
> > it
> > out,
> > https://www.apachecon.com/acna2020/cfp.html
> >
> > And thanks all!
> >
> > tg
> >
>


Mahout Con (An ApacheCon@Home Track)

2020-06-30 Thread Trevor Grant
Hey Mahout users and devs!

This year at ApacheCon(@Home), we're doing a Mahout track!

We want to see lots of things about Mahout, and we'll (hopefully) get
recordings and put them on the website.

Haven't you always wanted to give a talk on Mahout? Well, here's your
chance.

We're taking anything Mahout related, but in case you need some ideas to
get you started:

* Getting Started with Mahout: From Installing to Basic Samsara Shell (to
Apache Zeppelin integration?!)

* Deep Dive on Specific Features of Mahout : (e.g. Pat may do a CCO)

* How our company used Apache Mahout to implement features (I'm looking at
you Cars.com :P )

* Mahout on Kubeflow (I'm going to do this one, but don't let that stop you
from doing one yourself!)

* Mahout + Docker/K8s (A bit of overlap, with prior, but no shame there).

* And much much more.

The CFP is available here https://www.apachecon.com/cfp.html please fill it
out,
https://www.apachecon.com/acna2020/cfp.html

And thanks all!

tg


ApacheCon 2020

2020-02-27 Thread Trevor Grant
Hey all,

ApacheCon 2020 is coming up.  If you have any cool papers or cool things
you're doing with Mahout (or any Apache project), make sure to put in a
CFP.  It closes in like 63 days (I'm guessing, don't wait 62 days to
submit, do it now).

Also- there's an open call for hackathons.  It might be a good opportunity
to refactor all the poms... my only question is do we want to (can we) wait
that long?

https://www.apachecon.com

tg


ASF Community Survey

2019-12-05 Thread Trevor Grant
Including the following on behalf of Apache D&I

Hello everyone,

If you have an apache.org email, you should have received an email with an
invitation to take the 2020 ASF Community Survey. Please take 15 minutes to
complete it.

If you do not have an apache.org email address or you didn’t receive a
link, please follow this link to the survey:
https://communitysurvey.limequery.org/454363

This survey is important because it will provide us with scientific
information about our community, and shed some light on how we can
collaborate better and become more diverse. Our last survey of this kind
was implemented in 2016, which means that our existing data about Apache
communities is outdated. The deadline to complete the survey is January
4th, 2020. You can find information about privacy on the survey’s
Confluence page [1].

Your participation is paramount to the success of this project! Please
consider filling out the survey, and share this news with your fellow
Apache contributors. As individuals form the Apache community, your opinion
matters: we want to hear your voice.

If you have any questions about the survey or otherwise, please reach out
to us!

Kindly,
ASF Diversity & Inclusion
https://diversity.apache.org/


Re: dssvd documentation

2019-02-28 Thread Trevor Grant
looks good now, thanks for the call out.
tg


On Thu, Feb 28, 2019 at 1:28 PM Trevor Grant 
wrote:

> I updated it - but Jenkins is what builds the site, and there are some
> issues with that right now, so not sure when it will update the website.
>
> tg
>
>
> On Thu, Feb 28, 2019 at 1:15 PM Trevor Grant 
> wrote:
>
>> It is...
>>
>> I'll go see if i can fix it.
>>
>>
>> On Thu, Feb 28, 2019 at 1:13 PM Alexander Lindsay <
>> alexlindsay...@gmail.com> wrote:
>>
>>> Hi, I'm very interested in your distributed stochastic singular value
>>> decomposition algorithm. I'm curious whether the math here
>>> <
>>> https://mahout.apache.org/docs/latest/algorithms/linear-algebra/d-ssvd.html
>>> >
>>> is supposed to be rendered?
>>>
>>> Alex
>>>
>>


Re: dssvd documentation

2019-02-28 Thread Trevor Grant
I updated it - but Jenkins is what builds the site, and there are some
issues with that right now, so not sure when it will update the website.

tg


On Thu, Feb 28, 2019 at 1:15 PM Trevor Grant 
wrote:

> It is...
>
> I'll go see if i can fix it.
>
>
> On Thu, Feb 28, 2019 at 1:13 PM Alexander Lindsay <
> alexlindsay...@gmail.com> wrote:
>
>> Hi, I'm very interested in your distributed stochastic singular value
>> decomposition algorithm. I'm curious whether the math here
>> <
>> https://mahout.apache.org/docs/latest/algorithms/linear-algebra/d-ssvd.html
>> >
>> is supposed to be rendered?
>>
>> Alex
>>
>


Re: dssvd documentation

2019-02-28 Thread Trevor Grant
It is...

I'll go see if i can fix it.


On Thu, Feb 28, 2019 at 1:13 PM Alexander Lindsay 
wrote:

> Hi, I'm very interested in your distributed stochastic singular value
> decomposition algorithm. I'm curious whether the math here
> <
> https://mahout.apache.org/docs/latest/algorithms/linear-algebra/d-ssvd.html
> >
> is supposed to be rendered?
>
> Alex
>


[ANNOUNCE] Apache Roadshow Chicago, Call for Presentations

2019-01-10 Thread Trevor Grant
Hello Mahouters,

I’m writing to let you know about an exciting event coming to the Chicago
area: The Apache Roadshow Chicago.  It will be held May 13th and 14th at
three bars in the Logan Square neighborhood (Revolution Brewing, The
Native, and the Radler).

There will be six tracks:

   -

   Apache in Adtech:  Tell us how Apache works in your advertising stack
   -

   Apache in Fintech: Tell us how Apache works in your finance/insurance
   business
   -

   Apache in Startups: Tell us how you’re using Apache in your startup
   -

   Diversity in Apache: How do we increase and encourage diversity in
   Apache and tech fields overall
   -

   Made in Chicago: Apache related things made by people in Chicago that
   don’t fall into other buckets
   -

   Project Shark Tank: Do you want more developers or users for your Apache
   project? Come here and pitch it!


This is an exciting chance to learn about how Apache Projects are in use in
production around Chicago, how business users make the decision to use
Apache projects, to learn about exciting new projects that want help from
developers like you, and how/why to increase diversity in tech and IT.

If you have any use cases of Apache products in Adtech, Fintech, or
Startups; if you represent a minority working in tech and have perspectives
to share, if you live in the Chicagoland area and want to highlight some
work you’ve done on an Apache project, or if you want to get other people
excited to come work on your project, then please submit a CFP before the
deadline on February 15th!

Tickets to the Apache Roadshow Chicago are $100; speakers will get a
complimentary ticket.

We’re looking forward to reading your submissions and seeing you there on
May 13-14!

Sincerely,

Trevor Grant

https://www.apachecon.com/chiroadshow19/cfp.html

https://www.apachecon.com/chiroadshow19/register.html


Re: Hangouts

2018-08-30 Thread Trevor Grant
I'm -6

On Thu, Aug 30, 2018 at 6:11 AM Andrew Musselman 
wrote:

> Sure, I'm currently -7
>
> On Thu, Aug 30, 2018 at 3:27 AM Ivan Serdyuk  >
> wrote:
>
> > OK, it is OK for any date.
> >
> > But please, don't use hours from 00:00 till 9:00 in my GMT +3.
> > Let's figure out suitable hours that fit the different GMT offsets.
> >
> > Perhaps everybody could share his (her?) GMT offset?
> >
> > On Wed, Aug 29, 2018 at 7:04 PM Andrew Musselman <
> > andrew.mussel...@gmail.com>
> > wrote:
> >
> > > Some people are traveling this week so I propose next Friday the
> > > seventh; I'll post an invite.
> > >
> > > On Wed, Aug 29, 2018 at 8:04 AM Ajay Sharma  wrote:
> > >
> > > > Works for me
> > > >
> > > > On Wed, 29 Aug 2018 at 15.29, Ivan Serdyuk <
> > local.tourist.k...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > What about the conf call? September 1 is Saturday - do you have any
> > > > > suggestions about the hour? Or there is another date?
> > > > >
> > > > > Please share a Google calendar event invitation. Thanks.
> > > > >
> > > > > Ivan
> > > > >
> > > > > On Mon, Aug 6, 2018 at 9:07 PM Andrew Musselman <
> > > > > andrew.mussel...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > > We've used Google Hangouts for audio/video/screenshare, works
> > > > > > > pretty well.
> > > > > > >
> > > > > > > Seems like the next couple weeks are booked for people, so August
> > > > > > > 24th and September 1st would be the next candidates.
> > > > > > >
> > > > > > > I'll put something on the calendar based on interest and put the
> > > > > > > link in a mail here.
> > > > > >
> > > > > > On Sat, Aug 4, 2018 at 12:19 AM, Ivan Serdyuk
> > > > > >  wrote:
> > > > > > > > Is that some sort of an offline meeting? Cause if that is a
> > > > > > > > conf call - do you have a date?
> > > > > > >
> > > > > > > Ivan
> > > > > > >
> > > > > > > On Wed, Aug 1, 2018 at 2:25 AM, Dmitriy Lyubimov <
> > > dlie...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >>  I am on vacation this week fyi
> > > > > > >>
> > > > > > >>  On Tue, Jul 31, 2018 at 11:36 AM, Andrew Musselman <
> > > > > > >>  andrew.mussel...@gmail.com> wrote:
> > > > > > >>
> > > > > > >>  > Cool, I'll shoot for something on Friday early Pacific time
> > and
> > > > > > >> put an
> > > > > > >>  > invite in here; looking forward to it!
> > > > > > >>  >
> > > > > > >>  > On Sat, Jul 28, 2018 at 9:26 AM Shannon Quinn <
> > > squ...@gatech.edu
> > > > >
> > > > > > >> wrote:
> > > > > > >>  >
> > > > > > >>  > > Weekdays are better for me, so my vote would be a Friday
> > > > > > >> morning.
> > > > > > >>  > >
> > > > > > >>  > > On 7/27/18 5:55 PM, Ivan Serdyuk wrote:
> > > > > > >>  > > > Works for me.
> > > > > > >>  > > >
> > > > > > >>  > > > On Sat, Jul 28, 2018 at 12:14 AM, Andrew Musselman <
> > > > > > >>  > > > andrew.mussel...@gmail.com> wrote:
> > > > > > >>  > > >
> > > > > > >>  > > >> Weekends okay for people? Or Friday, morning Pacific
> > Time?
> > > > > > >>  > > >>
> > > > > > >>  > > >> On Tue, Jul 24, 2018 at 3:20 PM Trevor Grant <
> > > > > > >>  > trevor.d.gr...@gmail.com>
> > > > > > >>  > > >> wrote:
> > > > > > >>  > > >>
> > > > > > >>  > > >>> yea sounds good.
> > > > > > >>  > > >>>
> > > > > > >>  > > >>> On Mon, Jul 23, 2018 at 12:32 PM, Andrew Musselman <
> > > > > > >>  > > >>> andrew.mussel...@gmail.com> wrote:
> > > > > > >>  > > >>>
> > > > > > >>  > > >>>> Hi all, any interest in a hangout meeting next month
> > to
> > > > > > >> catch up
> > > > > > >>  on
> > > > > > >>  > a
> > > > > > >>  > > >>>> release/blockers?
> > > > > > >>  > > >>>>
> > > > > > >>  > >
> > > > > > >>  > >
> > > > > > >>  >
> > > > > > >>
> > > > > >
> > > > >
> > > > --
> > > >
> > > >
> > > > Best Regards
> > > > Ajay Sharma
> > > >
> > >
> >
>


Re: Hangouts

2018-08-29 Thread Trevor Grant
+1 to September 7th.

On Wed, Aug 29, 2018 at 9:04 AM Andrew Musselman 
wrote:

> Some people are traveling this week so I propose next Friday the seventh;
> I'll post an invite.
>
> On Wed, Aug 29, 2018 at 8:04 AM Ajay Sharma  wrote:
>
> > Works for me
> >
> > On Wed, 29 Aug 2018 at 15.29, Ivan Serdyuk  >
> > wrote:
> >
> > > What about the conf call? September 1 is Saturday - do you have any
> > > suggestions about the hour? Or there is another date?
> > >
> > > Please share a Google calendar event invitation. Thanks.
> > >
> > > Ivan
> > >
> > > On Mon, Aug 6, 2018 at 9:07 PM Andrew Musselman <
> > > andrew.mussel...@gmail.com>
> > > wrote:
> > >
> > > > We've used Google Hangouts for audio/video/screenshare, works pretty
> > > > well.
> > > >
> > > > Seems like the next couple weeks are booked for people, so August
> 24th
> > > > and September 1st would be the next candidates.
> > > >
> > > > I'll put something on the calendar based on interest and put the link
> > > > in a mail here.
> > > >
> > > > On Sat, Aug 4, 2018 at 12:19 AM, Ivan Serdyuk
> > > >  wrote:
> > > > > Is that some sort of an offline meeting? Cause if that is a conf
> call
> > > > > - do
> > > > > you have a date?
> > > > >
> > > > > Ivan
> > > > >
> > > > > On Wed, Aug 1, 2018 at 2:25 AM, Dmitriy Lyubimov <
> dlie...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >>  I am on vacation this week fyi
> > > > >>
> > > > >>  On Tue, Jul 31, 2018 at 11:36 AM, Andrew Musselman <
> > > > >>  andrew.mussel...@gmail.com> wrote:
> > > > >>
> > > > >>  > Cool, I'll shoot for something on Friday early Pacific time and
> > > > >> put an
> > > > >>  > invite in here; looking forward to it!
> > > > >>  >
> > > > >>  > On Sat, Jul 28, 2018 at 9:26 AM Shannon Quinn <
> squ...@gatech.edu
> > >
> > > > >> wrote:
> > > > >>  >
> > > > >>  > > Weekdays are better for me, so my vote would be a Friday
> > > > >> morning.
> > > > >>  > >
> > > > >>  > > On 7/27/18 5:55 PM, Ivan Serdyuk wrote:
> > > > >>  > > > Works for me.
> > > > >>  > > >
> > > > >>  > > > On Sat, Jul 28, 2018 at 12:14 AM, Andrew Musselman <
> > > > >>  > > > andrew.mussel...@gmail.com> wrote:
> > > > >>  > > >
> > > > >>  > > >> Weekends okay for people? Or Friday, morning Pacific Time?
> > > > >>  > > >>
> > > > >>  > > >> On Tue, Jul 24, 2018 at 3:20 PM Trevor Grant <
> > > > >>  > trevor.d.gr...@gmail.com>
> > > > >>  > > >> wrote:
> > > > >>  > > >>
> > > > >>  > > >>> yea sounds good.
> > > > >>  > > >>>
> > > > >>  > > >>> On Mon, Jul 23, 2018 at 12:32 PM, Andrew Musselman <
> > > > >>  > > >>> andrew.mussel...@gmail.com> wrote:
> > > > >>  > > >>>
> > > > >>  > > >>>> Hi all, any interest in a hangout meeting next month to
> > > > >> catch up
> > > > >>  on
> > > > >>  > a
> > > > >>  > > >>>> release/blockers?
> > > > >>  > > >>>>
> > > > >>  > >
> > > > >>  > >
> > > > >>  >
> > > > >>
> > > >
> > >
> > --
> >
> >
> > Best Regards
> > Ajay Sharma
> >
>


Re: Hangouts

2018-07-24 Thread Trevor Grant
yea sounds good.

On Mon, Jul 23, 2018 at 12:32 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Hi all, any interest in a hangout meeting next month to catch up on a
> release/blockers?
>


Re: Speaking about Mahout implementation, use cases. Workshops

2018-05-15 Thread Trevor Grant
Hey Ivan,

Would love to give a talk- not sure I can be on site unless you can fund my
travel :)

Please reach out directly to me and we can discuss further.

Thanks

Tg


On Sun, May 13, 2018, 12:45 PM Ivan Serdyuk 
wrote:

> Greetings, people.
>
>  I am a co-organizer of Ukrainian Scala/Java user group:
>
> https://www.facebook.com/Kyiv-Scala-Group-223492434893596/
> https://www.facebook.com/groups/KarazinScalaUsersGroup/about/
> https://www.meetup.com/meetup-group-kyiv-scala-group/
>
> We are curious if there are any people interested in speaking remotely or
> on-site for us about the project, its implementation
> (architecture, algorithms), and commercial use cases. Perhaps you would be
> interested in recruiting/involving new contributors as well.
>
> Ivan
>


Congrats Palumbo and Holden

2018-05-02 Thread Trevor Grant
Both were just elected new ASF members!!

https://s.apache.org/D6iz


Re: distributed cholesky on mahout

2018-04-19 Thread Trevor Grant
Hey Qifan,

I think you can do a distributed QR if the matrix is thin.

http://mahout.apache.org/docs/latest/algorithms/linear-algebra/d-qr.html

I think dqrThin(drmA) is what you want.



On Thu, Apr 19, 2018 at 10:09 AM, Ted Dunning  wrote:

> There was a variant of cholesky decomposition in Mahout at one time not so
> long ago. I would guess that it is still there.
>
> It is difficult to make a truly distributed version of QR decomposition,
> but for the purposes of the randomized SVD in Mahout, it wasn't actually
> necessary to have a true QR.
>
> I don't have a pointer handy and I am not even sure that this code is still
> in Mahout.
>
> Sorry about that.
>
>
>
> On Wed, Apr 18, 2018 at 7:08 PM, QIFAN PU  wrote:
>
> > Hi,
> >
> > I'm wondering if distributed cholesky decomposition on mahout is
> supported
> > now.
> > From this doc:
> > https://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
> > It seems that the implementation is single-node?
> >
> > Thanks,
> > Qifan
> >
>


Updating Wikipedia

2018-02-18 Thread Trevor Grant
Is anyone good at Wikipedia?

We're still listed as being primarily running on Hadoop there.

https://en.wikipedia.org/wiki/Apache_Mahout

If anyone has some skills/time- an update would be cool...


Re: Apache Mahout Slack Channel

2018-02-08 Thread Trevor Grant
It is- but if you email me (most have been without user/dev CC'd) I can
send you an invite so you don't need it.

I'll send you the invite now Khatwani,

When you log in- go to the #mahout channel.


On Wed, Feb 7, 2018 at 9:30 PM, KHATWANI PARTH BHARAT <
h2016...@pilani.bits-pilani.ac.in> wrote:

> Is email with @apache.org domain necessary to sign up for the slack?
>
> Thanks & Regards
> Parth Khatwani
>
> On 08-Feb-2018 7:48 am, "Trevor Grant"  wrote:
>
> > For those who've been invited- when you get into the slack, look for the
> > channel #mahout
> >
> > Thanks
> >
> > On Wed, Feb 7, 2018 at 9:18 AM, Aditya  wrote:
> >
> > > Great! Can't wait to join the channel!
> > >
> > > On Wed, Feb 7, 2018 at 8:12 PM, Trevor Grant  >
> > > wrote:
> > >
> > > > Hello everyone!
> > > >
> > > > I wanted to make you all aware that we are using a Slack Channel
> > > (#mahout)
> > > > on the-asf.slack.com
> > > >
> > > > If anyone is interested in joining- I'm pretty sure you can just go
> > there
> > > > and sign up.  Email me privately if you need an invite.
> > > >
> > > > Thanks!
> > > >
> > > > tg
> > > >
> > > >
> > > > PS>
> > > > https://www.youtube.com/watch?v=PWjd9xyFrzk&index=9&list=
> > > > PLqxhJj6bcnY8Mb5qSKiQ_tpYR-gF-YhvG
> > > > <https://www.google.com/url?q=https://www.youtube.com/watch?
> > > > v%3DPWjd9xyFrzk%26index%3D9%26list%3DPLqxhJj6bcnY8Mb5qSKiQ_tpYR-
> > > > gF-YhvG&sa=D&ust=1518100899049000&usg=AFQjCNENRj8zcRHSmJnWKRIEARUCnq
> > > appg>
> > > >
> > >
> >
>


Re: Apache Mahout Slack Channel

2018-02-07 Thread Trevor Grant
For those who've been invited- when you get into the slack, look for the
channel #mahout

Thanks

On Wed, Feb 7, 2018 at 9:18 AM, Aditya  wrote:

> Great! Can't wait to join the channel!
>
> On Wed, Feb 7, 2018 at 8:12 PM, Trevor Grant 
> wrote:
>
> > Hello everyone!
> >
> > I wanted to make you all aware that we are using a Slack Channel
> (#mahout)
> > on the-asf.slack.com
> >
> > If anyone is interested in joining- I'm pretty sure you can just go there
> > and sign up.  Email me privately if you need an invite.
> >
> > Thanks!
> >
> > tg
> >
> >
> > PS>
> > https://www.youtube.com/watch?v=PWjd9xyFrzk&index=9&list=
> > PLqxhJj6bcnY8Mb5qSKiQ_tpYR-gF-YhvG
> > <https://www.google.com/url?q=https://www.youtube.com/watch?
> > v%3DPWjd9xyFrzk%26index%3D9%26list%3DPLqxhJj6bcnY8Mb5qSKiQ_tpYR-
> > gF-YhvG&sa=D&ust=1518100899049000&usg=AFQjCNENRj8zcRHSmJnWKRIEARUCnq
> appg>
> >
>


Apache Mahout Slack Channel

2018-02-07 Thread Trevor Grant
Hello everyone!

I wanted to make you all aware that we are using a Slack Channel  (#mahout)
on the-asf.slack.com

If anyone is interested in joining- I'm pretty sure you can just go there
and sign up.  Email me privately if you need an invite.

Thanks!

tg


PS>
https://www.youtube.com/watch?v=PWjd9xyFrzk&index=9&list=PLqxhJj6bcnY8Mb5qSKiQ_tpYR-gF-YhvG



Re: compile mahout for spark2.2 and scala2.11

2017-12-13 Thread Trevor Grant
Hi Pere,

Do you need ViennaCL support? ViennaCL is basically prototype GPU support.

If not, change viennacl as described on that webpage:
cd buildtools/
./change-scala-version.sh 2.11

cd ..
mvn clean package -Pscala-2.11 -Dspark.version=2.2.0 -Dspark.compat.version=2.2 -DskipTests

Let me know if that helps or if you need more assistance.
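
If you end up rebuilding for more than one Spark version, it can help to generate the command rather than retype it. A minimal sketch, not an official build script: `mahout_build_cmd` is a made-up helper name, and it assumes `spark.compat.version` is always the major.minor part of `spark.version`, as in the command above. It only prints the command; it does not run Maven.

```shell
#!/bin/sh
# Sketch: assemble (but do not run) the Maven build command for a given
# Scala/Spark pair. mahout_build_cmd is a hypothetical helper name.
mahout_build_cmd() {
  scala_version="$1"   # e.g. 2.11
  spark_version="$2"   # e.g. 2.2.0
  # Assumption: spark.compat.version is the major.minor part of spark.version,
  # so strip the last ".patch" component.
  spark_compat="${spark_version%.*}"
  printf 'mvn clean package -Pscala-%s -Dspark.version=%s -Dspark.compat.version=%s -DskipTests\n' \
    "$scala_version" "$spark_version" "$spark_compat"
}

# Print the command for Scala 2.11 / Spark 2.2.0.
mahout_build_cmd 2.11 2.2.0
```

Run it from the top of the source tree after the change-scala-version.sh step, and paste the printed command once it looks right.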

tg


On Wed, Dec 13, 2017 at 2:11 PM, Pere Urbón Bayes 
wrote:

> Hi,
>   my name is Pere. For a project I am working on right now,
> I need to compile Mahout with Spark 2.2 and Scala 2.11. Sorry if I ask
> stupid questions; this is the first time I have dived into the Mahout project.
>
> For this task I am mostly following the work and comments that started with
> this thread.
>
> http://mail-archives.apache.org/mod_mbox/mahout-user/201712.mbox/%
> 3CVI1PR08MB2638653EEFA810EEED1739A8FE3C0@VI1PR08MB2638.
> eurprd08.prod.outlook.com%3E
>
> and procedure as in https://github.com/apache/mahout/pull/335
>
> However, when I compile and run the package command in Maven, I find
> that the viennacl package breaks.
>
> This is the last part of the error:
>
> [ERROR] Failed to execute goal
> org.codehaus.mojo:exec-maven-plugin:1.1.1:exec (viennacl-2.11) on project
> apache-mahout-distribution: Result of /bin/sh -c cd
> /home/purbon/apache/mahout/viennacl && mvn package -Dscala.version=2.11.8
> -Dscala.compat.version=2.11 -DskipTests execution is: '1'. -> [Help 1]
>
>
> I have followed setup to install it as recommended from
> http://mahout.apache.org/developers/buildingmahout.html
>
> As I understood from the source of your previous comments on the thread,
> looks like I should be able to compile the version, however I can not, can
> you help me with that?
>
> Does it make sense to skip the viennacl package build?
>
>
> Looking forward to getting this understood, and to helping where I can.
>
>
> - purbon
>


Re: Mahout and Spark 2.2 compatibility

2017-12-04 Thread Trevor Grant
Hi Marc,

Actually, it's not THAT hard to get Spark 2.2 compatibility for Mahout.

The code base is more or less compatible.  The issues were with respect to
Spark dependencies, specifically.

1. You have to build with Java 1.8
2. You have to have Hadoop 2.6+ (I think) support.
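
A quick pre-flight check for point 1 might look like the sketch below. `is_java_8` is a made-up helper name, and the parsing assumes the usual `java -version` banner in which the version appears in double quotes (for Java 8 it looks like "1.8.0_NNN"); other JVMs may print something different.

```shell
#!/bin/sh
# Sketch: check that a version string looks like Java 1.8.
# is_java_8 is a hypothetical helper, not part of any Mahout tooling.
is_java_8() {
  case "$1" in
    1.8|1.8.*) return 0 ;;
    *) return 1 ;;
  esac
}

# java -version writes its banner to stderr; take the quoted version string.
if command -v java >/dev/null 2>&1; then
  found=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  if is_java_8 "$found"; then
    echo "Java $found: OK for this build"
  else
    echo "Java $found: this build wants 1.8"
  fi
else
  echo "java not found on PATH"
fi
```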

PR #335 outlines this pretty well.

https://github.com/apache/mahout/pull/335

This can be done from source, I'd have to double check the specific mvn
command line you would need.

As for supporting with a binary, maybe 0.13.2 (we're hoping to get spark
2.0/2.1 support in upcoming 0.13.1)

The hangup with releasing a Spark 2.2 binary is that we will have to
version bump Java to 1.8 and Hadoop to 2.6 or 2.7, and I'm not sure we were
ready to do that ATM.  Mainly bc it might alienate some users, and
secondarily Java 1.8 for a release is a real stickler on JavaDocs, and
there is a bit of work required to go back and clean them all up.

Let me know if building from source is an option at your company.  I'll
attempt to figure out how to build for Spark 2.2, and try to update the
website. I'll also post back here.

tg


On Mon, Dec 4, 2017 at 11:18 AM, Marc Cardus Garcia  wrote:

> Hello all,
>
>
> First time I write into this mailing list, so if there is something wrong
> with my message please let me know.
>
>
> I work for a company using Mahout and Spark. We have recently started a
> project using Spark 2.2 and we would like to use Mahout but if I am not
> wrong, according to this issue, MAHOUT-2000 (apache.org/jira/browse/MAHOUT-2000), there is still no compatibility
> between Spark 2.x and Mahout.  So regarding this issue I have two questions:
>
>
>   *   Is it planned to release this compatibility in the near future?
>   *   If I want to add/help add Spark 2.2.0 as a supported binary release, how
>       could I do that?
>
> Thank you,
> Marc.
>
>
>
>
> Marc Cardús Garcia
> Data Engineer | Data Science and Big Data Analytics
>  +34 932 381 400   |  marc.car...@eurecat.org
>   Carrer Camí Antic de València 54-56, Edifici A - 08005 - Barcelona
> www.eurecat.org
> @Eurecat_news
>
>
>
> 
> DISCLAIMER: Privileged/Confidential Information may be contained in this
> message. If you are not the addressee indicated in this message you should
> destroy this message, and notify us immediately to the following address:
> le...@eurecat.org. If the addressee of this message does not consent to
> the use of Internet e-mail and message recording, please notify us
> immediately.
> 
>
>
>


Re: Running Mahout on a Spark cluster

2017-10-03 Thread Trevor Grant
The spark is included via maven classifier-

the sbt line should be

libraryDependencies += "org.apache.mahout" % "mahout-spark_2.11" % "0.13.1-SNAPSHOT" classifier "spark_2.1"

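To see which file a given set of coordinates points at, you can spell out the standard Maven repository layout (group as directories, then artifact, version, and artifact-version[-classifier].jar). A small sketch; `maven_jar_path` is a made-up helper name, and the coordinates in the example call are simply the ones from the sbt line in this thread.

```shell
#!/bin/sh
# Sketch: print the repository-relative path of a jar under the standard
# Maven layout. maven_jar_path is a hypothetical helper name.
maven_jar_path() {
  # Dots in the group id become directory separators.
  group_dir=$(printf '%s' "$1" | tr '.' '/')
  artifact="$2"
  version="$3"
  classifier="$4"      # optional
  file="$artifact-$version"
  if [ -n "$classifier" ]; then
    # The classifier is appended after the version in the file name only.
    file="$file-$classifier"
  fi
  printf '%s/%s/%s/%s.jar\n' "$group_dir" "$artifact" "$version" "$file"
}

# Coordinates from the sbt line in this thread.
maven_jar_path org.apache.mahout mahout-spark_2.11 0.13.1-SNAPSHOT spark_2.1
```

Prefix the printed path with ~/.m2/repository/ and compare it with what `mvn clean install` actually wrote; if the two differ, the dependency line and the installed artifact disagree on the artifact name or classifier.
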

On Tue, Oct 3, 2017 at 2:55 PM, Pat Ferrel  wrote:

> I’m the aforementioned pferrel
>
> @Hoa, thanks for that reference, I forgot I had that example. First don’t
> use the Hadoop part of Mahout, it is not supported and will be deprecated.
> The Spark version of cooccurrence will be supported. You find it in the
> SimilarityAnalysis object.
>
> If you go back to the last release you should be able to make that
> https://github.com/pferrel/3-input-cooc work with version updates to
> Mahout-0.13.0 and dependencies.
> To use the latest master of Mahout, there are the problems listed below.
>
>
> I’m having a hard time building with sbt using the mahout-spark module
> when I build that latest mahout master with `mvn clean install`. This puts
> the mahout-spark module in the local ~/.m2 maven cache. The structure
> doesn’t match what SBT expects the path and filenames to be.
>
> The build.sbt  `libraryDependencies` line *should* IMO be:
> `"org.apache.mahout" %% "mahout-spark-2.1" % "0.13.1-SNAPSHOT"`
>
> This is parsed by sbt to yield the path of:
> org/apache/mahout/mahout-spark-2.1/0.13.1-SNAPSHOT/mahout-spark-2.1_2.11-0.13.1-SNAPSHOT.jar
>
> unfortunately the outcome of `mvn clean install` currently is (I think):
> org/apache/mahout/mahout-spark/0.13.1-SNAPSHOT/mahout-spark-0.13.1-SNAPSHOT-spark_2.1.jar
>
> I can’t find a way to make SBT parse that structure and name.
>
>
> On Oct 2, 2017, at 11:02 PM, Trevor Grant 
> wrote:
>
> Code pointer:
> https://github.com/rawkintrevo/cylons/tree/master/eigenfaces
>
> However, I build Mahout (0.13.1-SNAPSHOT) locally with
>
> mvn clean install -Pscala-2.11,spark-2.1,viennacl-omp -DskipTests
>
> That's how maven was able to pick those up.
>
>
> On Fri, Sep 22, 2017 at 10:06 PM, Hoa Nguyen 
> wrote:
>
> > Hey all,
> >
> > Thanks for the offers of help. I've been able to narrow down some of the
> > problems to version incompatibility and I just wanted to give an update.
> > Just to back track a bit, my initial goal was to run Mahout on a
> > distributed cluster whether that was running Hadoop Map Reduce or Spark.
> >
> > I started out trying to get it to run on Spark, which I have some
> > familiarity, but that didn't seem to work. While the error messages seem to
> > indicate there weren't enough resources on the workers ("WARN
> > scheduler.TaskSchedulerImpl: Initial job has not accepted any resources;
> > check your cluster UI to ensure that workers are registered and have
> > sufficient memory"), I'm pretty sure that wasn't the case, not only because
> > it's a 4 node cluster of m4.xlarges, I was able to run another, simpler
> > Spark batch job on that same distributed cluster.
> >
> > After a bit of wrangling, I was able to narrow down some of the issues. It
> > turns out I was kind of blindly using this repo
> > https://github.com/pferrel/3-input-cooc as a guide without fully realizing
> > that it was from several years ago and based on Mahout 0.10.0, Scala 2.10
> > and Spark 1.1.1. That is significantly different from my environment, which
> > has Mahout 0.13.0 and Spark 2.1.1 installed, which also means I have to use
> > Scala 2.11. After modifying the build.sbt file to account for those
> > versions, I now have compile type mismatch issues that I'm just not that
> > savvy to fix (see attached screenshot if you're interested).
> >
> > Anyway, the good news is that I was finally able to get Mahout code running
> > on Hadoop map-reduce, but also after a bit of wrangling. It turned out my
> > instances were running Ubuntu 14 and apparently that doesn't play well with
> > Hadoop 2.7.4, which prevented me from running any sample Mahout code (from
> > here: https://github.com/apache/mahout/tree/master/examples/bin) that
> > relied on map-reduce. Those problems went away after I installed Hadoop
> > 2.8.1 instead. Now I'm able to get the shell scripts running on a
> > distributed Hadoop cluster (yay!).
> >
> > Anyway, if anyone has more recent and working Spark Scala code that uses
> > Mahout that they can point me to, I'd appreciate it.
> >
> > Many thanks!
> > Hoa
> >
> > On Fri, Sep 22, 2017 at 1:09 AM, Trevor Grant 
> > wrote:
> >
> >> Hi Hoa,
> >>
> >> A few 

Re: Running Mahout on a Spark cluster

2017-10-02 Thread Trevor Grant
Code pointer:
https://github.com/rawkintrevo/cylons/tree/master/eigenfaces

However, I build Mahout (0.13.1-SNAPSHOT) locally with

mvn clean install -Pscala-2.11,spark-2.1,viennacl-omp -DskipTests

That's how maven was able to pick those up.
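
If you want to double check what the install step actually left in your local repository, something like this sketch works. `list_mahout_jars` is a made-up helper name, and the default location ~/.m2/repository is the usual Maven default, which your settings.xml may override.

```shell
#!/bin/sh
# Sketch: list Mahout jars found in a local Maven repository, sorted by path.
# list_mahout_jars is a hypothetical helper name.
list_mahout_jars() {
  # Assumption: the repository lives at ~/.m2/repository unless a path is given.
  repo="${1:-$HOME/.m2/repository}"
  dir="$repo/org/apache/mahout"
  if [ ! -d "$dir" ]; then
    echo "no Mahout artifacts under $repo" >&2
    return 1
  fi
  find "$dir" -type f -name '*.jar' | sort
}

# Show what is installed; do not fail if nothing has been built yet.
list_mahout_jars || true
```

The file names it prints (including any classifier suffix) are what your build tool has to resolve.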


On Fri, Sep 22, 2017 at 10:06 PM, Hoa Nguyen 
wrote:

> Hey all,
>
> Thanks for the offers of help. I've been able to narrow down some of the
> problems to version incompatibility and I just wanted to give an update.
> Just to back track a bit, my initial goal was to run Mahout on a
> distributed cluster whether that was running Hadoop Map Reduce or Spark.
>
> I started out trying to get it to run on Spark, which I have some
> familiarity, but that didn't seem to work. While the error messages seem to
> indicate there weren't enough resources on the workers ("WARN
> scheduler.TaskSchedulerImpl: Initial job has not accepted any resources;
> check your cluster UI to ensure that workers are registered and have
> sufficient memory"), I'm pretty sure that wasn't the case, not only because
> it's a 4 node cluster of m4.xlarges, I was able to run another, simpler
> Spark batch job on that same distributed cluster.
>
> After a bit of wrangling, I was able to narrow down some of the issues. It
> turns out I was kind of blindly using this repo
> https://github.com/pferrel/3-input-cooc as a guide without fully realizing
> that it was from several years ago and based on Mahout 0.10.0, Scala 2.10
> and Spark 1.1.1. That is significantly different from my environment, which
> has Mahout 0.13.0 and Spark 2.1.1 installed, which also means I have to use
> Scala 2.11. After modifying the build.sbt file to account for those
> versions, I now have compile type mismatch issues that I'm just not that
> savvy to fix (see attached screenshot if you're interested).
>
> Anyway, the good news is that I was finally able to get Mahout code running
> on Hadoop map-reduce, but also after a bit of wrangling. It turned out my
> instances were running Ubuntu 14 and apparently that doesn't play well with
> Hadoop 2.7.4, which prevented me from running any sample Mahout code (from
> here: https://github.com/apache/mahout/tree/master/examples/bin) that
> relied on map-reduce. Those problems went away after I installed Hadoop
> 2.8.1 instead. Now I'm able to get the shell scripts running on a
> distributed Hadoop cluster (yay!).
>
> Anyway, if anyone has more recent and working Spark Scala code that uses
> Mahout that they can point me to, I'd appreciate it.
>
> Many thanks!
> Hoa
>
> On Fri, Sep 22, 2017 at 1:09 AM, Trevor Grant 
> wrote:
>
>> Hi Hoa,
>>
>> A few things could be happening here, I haven't run across that specific
>> error.
>>
>> 1) Spark 2.x - Mahout 0.13.0: Mahout 0.13.0 WILL run on Spark 2.x, however
>> you need to build from source (not the binaries).  You can do this by
>> downloading mahout source or cloning the repo and building with:
>> mvn clean install -Pspark-2.1,scala-2.11 -DskipTests
>>
>> 2) Have you setup spark with Kryo serialization? How you do this depends on
>> if you're in the shell/zeppelin or using spark submit.
>>
>> However, for both of these cases- it shouldn't have even run local afaik so
>> the fact it did tells me you probably have gotten this far?
>>
>> Assuming you've done 1 and 2, can you share some code? I'll see if I can
>> recreate on my end.
>>
>> Thanks!
>>
>> tg
>>
>> On Thu, Sep 21, 2017 at 9:37 PM, Hoa Nguyen 
>> wrote:
>>
>> > I apologize in advance if this is too much of a newbie question but I'm
>> > having a hard time running any Mahout example code in a distributed Spark
>> > cluster. The code runs as advertised when Spark is running locally on one
>> > machine but the minute I point Spark to a cluster and master url, I can't
>> > get it to work, drawing the error: "WARN scheduler.TaskSchedulerImpl:
>> > Initial job has not accepted any resources; check your cluster UI to ensure
>> > that workers are registered and have sufficient memory"
>> >
>> > I know my Spark cluster is configured and working correctly because I ran
>> > non-Mahout code and it runs on a distributed cluster fine. What am I doing
>> > wrong? The only thing I can think of is that my Spark version is too recent
>> > -- 2.1.1 -- for the Mahout version I'm using -- 0.13.0. Is that it or am I
>> > doing something else wrong?
>> >
>> > Thanks for any advice,
>> > Hoa
>> >
>>
>
>


Re: Running Mahout on a Spark cluster

2017-10-02 Thread Trevor Grant
Hey- sorry for long delay. I've been traveling.

Pat Ferrel was telling me he was having some similar issues with
Spark+Mahout+SBT recently, and that we need to re-examine our naming
conventions on JARs.

Fwiw- I have several projects that use Spark+Mahout in Spark 2.1/Scala-2.11,
and we even test this in our Travis CI tests, but the trick is- we use
Maven for the build. Any chance you could use maven?  If not, maybe Pat can
chime in here, I'm just not an SBT user, so I'm not 100% sure what to tell
you.



On Fri, Sep 22, 2017 at 10:06 PM, Hoa Nguyen 
wrote:

> Hey all,
>
> Thanks for the offers of help. I've been able to narrow down some of the
> problems to version incompatibility and I just wanted to give an update.
> Just to back track a bit, my initial goal was to run Mahout on a
> distributed cluster whether that was running Hadoop Map Reduce or Spark.
>
> I started out trying to get it to run on Spark, with which I have some
> familiarity, but that didn't seem to work. While the error messages seem to
> indicate there weren't enough resources on the workers ("WARN
> scheduler.TaskSchedulerImpl: Initial job has not accepted any resources;
> check your cluster UI to ensure that workers are registered and have
> sufficient memory"), I'm pretty sure that wasn't the case, not only because
> it's a 4 node cluster of m4.xlarges, but also because I was able to run
> another, simpler Spark batch job on that same distributed cluster.
>
> After a bit of wrangling, I was able to narrow down some of the issues. It
> turns out I was kind of blindly using this repo
> https://github.com/pferrel/3-input-cooc as a guide without fully realizing
> that it was from several years ago and based on Mahout 0.10.0, Scala 2.10 and
> Spark 1.1.1. That is significantly different from my environment, which has
> Mahout 0.13.0 and Spark 2.1.1 installed, which also means I have to use Scala
> 2.11. After modifying the build.sbt file to account for those versions, I
> now have compile type mismatch issues that I'm just not savvy enough to fix
> (see attached screenshot if you're interested).
>
> Anyway, the good news is that I was able to finally get Mahout code running
> on Hadoop map-reduce, but also after a bit of wrangling. It turned out my
> instances were running Ubuntu 14 and apparently that doesn't play well with
> Hadoop 2.7.4, which prevented me from running any sample Mahout code (from
> here: https://github.com/apache/mahout/tree/master/examples/bin) that
> relied on map-reduce. Those problems went away after I installed Hadoop
> 2.8.1 instead. Now I'm able to get the shell scripts running on a
> distributed Hadoop cluster (yay!).
>
> Anyway, if anyone has more recent and working Spark Scala code that uses
> Mahout that they can point me to, I'd appreciate it.
>
> Many thanks!
> Hoa
>
> On Fri, Sep 22, 2017 at 1:09 AM, Trevor Grant 
> wrote:
>
>> Hi Hoa,
>>
>> A few things could be happening here, I haven't run across that specific
>> error.
>>
>> 1) Spark 2.x - Mahout 0.13.0: Mahout 0.13.0 WILL run on Spark 2.x, however
>> you need to build from source (not the binaries).  You can do this by
>> downloading mahout source or cloning the repo and building with:
>> mvn clean install -Pspark-2.1,scala-2.11 -DskipTests
>>
>> 2) Have you setup spark with Kryo serialization? How you do this depends on
>> if you're in the shell/zeppelin or using spark submit.
>>
>> However, for both of these cases- it shouldn't have even run local afaik so
>> the fact it did tells me you probably have gotten this far?
>>
>> Assuming you've done 1 and 2, can you share some code? I'll see if I can
>> recreate on my end.
>>
>> Thanks!
>>
>> tg
>>
>> On Thu, Sep 21, 2017 at 9:37 PM, Hoa Nguyen 
>> wrote:
>>
>> > I apologize in advance if this is too much of a newbie question but I'm
>> > having a hard time running any Mahout example code in a distributed Spark
>> > cluster. The code runs as advertised when Spark is running locally on one
>> > machine but the minute I point Spark to a cluster and master url, I can't
>> > get it to work, drawing the error: "WARN scheduler.TaskSchedulerImpl:
>> > Initial job has not accepted any resources; check your cluster UI to ensure
>> > that workers are registered and have sufficient memory"
>> >
>> > I know my Spark cluster is configured and working correctly because I ran
>> > non-Mahout code and it runs on a distributed cluster fine. What am I doing
>> > wrong? The only thing I can think of is that my Spark version is too recent
>> > -- 2.1.1 -- for the Mahout version I'm using -- 0.13.0. Is that it or am I
>> > doing something else wrong?
>> >
>> > Thanks for any advice,
>> > Hoa
>> >
>>
>
>


Re: Running Mahout on a Spark cluster

2017-09-21 Thread Trevor Grant
Hi Hoa,

A few things could be happening here, I haven't run across that specific
error.

1) Spark 2.x - Mahout 0.13.0: Mahout 0.13.0 WILL run on Spark 2.x, however
you need to build from source (not the binaries).  You can do this by
downloading mahout source or cloning the repo and building with:
mvn clean install -Pspark-2.1,scala-2.11 -DskipTests

2) Have you setup spark with Kryo serialization? How you do this depends on
if you're in the shell/zeppelin or using spark submit.

However, for both of these cases- it shouldn't have even run local afaik so
the fact it did tells me you probably have gotten this far?

Assuming you've done 1 and 2, can you share some code? I'll see if I can
recreate on my end.
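
For point 2, a minimal sketch of what "set up Spark with Kryo" can look like when using spark-submit. It only assembles the command line; it does not run Spark. The two `spark.*` keys are standard Spark configuration properties, and the registrator class is the one I believe Mahout's Spark bindings ship (worth verifying against your build). The master URL, jar name, and main class are made-up placeholders.

```python
import shlex

# Kryo settings commonly used with Mahout's Spark bindings.
# Assumption: the registrator class name matches your Mahout build.
MAHOUT_KRYO_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator":
        "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
}


def spark_submit_command(master_url, app_jar, main_class, extra_conf=None):
    """Return the spark-submit argument list with the Kryo settings applied."""
    conf = dict(MAHOUT_KRYO_CONF)
    conf.update(extra_conf or {})
    cmd = ["spark-submit", "--master", master_url, "--class", main_class]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)
    return cmd


if __name__ == "__main__":
    # Hypothetical cluster URL, jar, and class, for illustration only.
    cmd = spark_submit_command(
        "spark://master-host:7077", "my-mahout-app.jar", "com.example.MyJob"
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

In the shell or Zeppelin the same two properties would go into the interpreter or SparkConf settings instead of `--conf` flags.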

Thanks!

tg

On Thu, Sep 21, 2017 at 9:37 PM, Hoa Nguyen 
wrote:

> I apologize in advance if this is too much of a newbie question but I'm
> having a hard time running any Mahout example code in a distributed Spark
> cluster. The code runs as advertised when Spark is running locally on one
> machine but the minute I point Spark to a cluster and master url, I can't
> get it to work, drawing the error: "WARN scheduler.TaskSchedulerImpl:
> Initial job has not accepted any resources; check your cluster UI to ensure
> that workers are registered and have sufficient memory"
>
> I know my Spark cluster is configured and working correctly because I ran
> non-Mahout code and it runs on a distributed cluster fine. What am I doing
> wrong? The only thing I can think of is that my Spark version is too recent
> -- 2.1.1 -- for the Mahout version I'm using -- 0.13.0. Is that it or am I
> doing something else wrong?
>
> Thanks for any advice,
> Hoa
>


New Committer: Holden Karau

2017-07-17 Thread Trevor Grant
The Project Management Committee (PMC) for Apache Mahout
has invited Holden Karau to become a committer and we are pleased
to announce that she has accepted.

Holden brings a great deal of expertise and knowledge around the
Apache Spark project, and is working to improve the integration
between the two projects.

Being a committer enables easier contribution to the
project since there is no need to go via the patch
submission process. This should enable better productivity.

Please join me in giving Holden a very warm welcome.


Re: [DISCUSS] How many binary combos do we want to release?

2017-07-10 Thread Trevor Grant
From the Spark website:

"Note: Starting version 2.0, Spark is built with Scala 2.11 by default.
Scala 2.10 users should download the Spark source package and build with
Scala 2.10 support."

Given that, the minimum set (imho) would be:

Spark-1.6, Scala-2.10, viennacl, viennacl-omp
Spark-2.0, Scala-2.11, viennacl, viennacl-omp
Spark-2.1, Scala-2.11, viennacl, viennacl-omp

It has been pointed out that our spark-2.0 may cover all of spark 2.x, but
I haven't tested that.
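
As a sanity check on the counts in this thread, here is a small sketch that enumerates the combinations. It assumes exactly the three Spark versions, two Scala versions, and four solver variants (none, viennacl, viennacl-omp, both) listed in the quoted message below; the function names are just for illustration.

```python
from itertools import product

SPARK_VERSIONS = ["1.6", "2.0", "2.1"]
SCALA_VERSIONS = ["2.10", "2.11"]
# Four native-solver variants: none, viennacl only, viennacl-omp only, both.
SOLVER_VARIANTS = [
    (),
    ("viennacl",),
    ("viennacl-omp",),
    ("viennacl", "viennacl-omp"),
]


def label(spark, scala, solvers):
    """Format one combination the way the thread lists them."""
    return ", ".join([f"Spark-{spark}", f"Scala-{scala}", *solvers])


def full_spread():
    """Every combination: solver variant varies slowest, Spark fastest."""
    return [
        label(spark, scala, solvers)
        for solvers, scala, spark in product(
            SOLVER_VARIANTS, SCALA_VERSIONS, SPARK_VERSIONS
        )
    ]


def minimum_set():
    """One tarball per Spark version, paired with that line's default Scala."""
    both = ("viennacl", "viennacl-omp")
    return [
        label(spark, "2.10" if spark.startswith("1.") else "2.11", both)
        for spark in SPARK_VERSIONS
    ]


if __name__ == "__main__":
    spread = full_spread()
    print(len(spread), "tarballs in the full spread")
    print(len(spread[:6] + spread[-6:]), "in the first-and-last-six compromise")
    for line in minimum_set():
        print(line)
```

The pairing rule in `minimum_set` simply encodes the Spark note quoted above (Scala 2.11 by default from Spark 2.0 on); it is not a statement about what Mahout can or cannot build.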




On Mon, Jul 10, 2017 at 5:51 PM, Andrew Palumbo  wrote:

> Awesome!
>
>
> One point:
>
>
> INFRA may have an issue here. And we may need to move some of the older
> releases to the archives...
>
>
> We have a waiver for > the standard 200Mb cap, which should still be in
> place.. But if you start to notice that you're having trouble Uploading
> artifacts to the staging ground, It may be that we've blown their caps.
> Please let me know if this happens, and I'll figure out what needs to be
> done.
>
>
> Thanks
>
>
> --andy
>
> 
> From: Trevor Grant 
> Sent: Monday, July 10, 2017 1:30:46 PM
> To: Mahout Dev List
> Subject: [DISCUSS] How many binary combos do we want to release?
>
> In 0.13.1 we had one binary tarball.
>
> A full spread would look something like this in 0.13.2-
>
> Spark-1.6, Scala-2.10
> Spark-2.0, Scala-2.10
> Spark-2.1, Scala-2.10
> Spark-1.6, Scala-2.11
> Spark-2.0, Scala-2.11
> Spark-2.1, Scala-2.11
>
> Spark-1.6, Scala-2.10, viennacl
> Spark-2.0, Scala-2.10, viennacl
> Spark-2.1, Scala-2.10, viennacl
> Spark-1.6, Scala-2.11, viennacl
> Spark-2.0, Scala-2.11, viennacl
> Spark-2.1, Scala-2.11, viennacl
>
> Spark-1.6, Scala-2.10, viennacl-omp
> Spark-2.0, Scala-2.10, viennacl-omp
> Spark-2.1, Scala-2.10, viennacl-omp
> Spark-1.6, Scala-2.11, viennacl-omp
> Spark-2.0, Scala-2.11, viennacl-omp
> Spark-2.1, Scala-2.11, viennacl-omp
>
> Spark-1.6, Scala-2.10, viennacl, viennacl-omp
> Spark-2.0, Scala-2.10, viennacl, viennacl-omp
> Spark-2.1, Scala-2.10, viennacl, viennacl-omp
> Spark-1.6, Scala-2.11, viennacl, viennacl-omp
> Spark-2.0, Scala-2.11, viennacl, viennacl-omp
> Spark-2.1, Scala-2.11, viennacl, viennacl-omp
>
> That's 24 tarballs of pre-compiled binaries.
>
> The main thing I'm concerned about is getting all combos of spark/scala,
> viennacl/scala, viennacl-omp/scala into Maven repositories.  This can be
> accomplished with 6 tarballs:
>
> Spark-1.6, Scala-2.10, viennacl, viennacl-omp
> Spark-2.0, Scala-2.10, viennacl, viennacl-omp
> Spark-2.1, Scala-2.10, viennacl, viennacl-omp
> Spark-1.6, Scala-2.11, viennacl, viennacl-omp
> Spark-2.0, Scala-2.11, viennacl, viennacl-omp
> Spark-2.1, Scala-2.11, viennacl, viennacl-omp
>
>
> Not all users want ViennaCL (I would imagine) - A compromise might be the
> first and last 6 combinations:
>
> Spark-1.6, Scala-2.10
> Spark-2.0, Scala-2.10
> Spark-2.1, Scala-2.10
> Spark-1.6, Scala-2.11
> Spark-2.0, Scala-2.11
> Spark-2.1, Scala-2.11
>
> Spark-1.6, Scala-2.10, viennacl, viennacl-omp
> Spark-2.0, Scala-2.10, viennacl, viennacl-omp
> Spark-2.1, Scala-2.10, viennacl, viennacl-omp
> Spark-1.6, Scala-2.11, viennacl, viennacl-omp
> Spark-2.0, Scala-2.11, viennacl, viennacl-omp
> Spark-2.1, Scala-2.11, viennacl, viennacl-omp
>
> Thoughts?
>


Re: [DISCUSS] Naming convention for multiple spark/scala combos

2017-07-07 Thread Trevor Grant
So to tie all of this together-

org.apache.mahout:mahout-spark_2.10:0.13.1_spark_1_6
org.apache.mahout:mahout-spark_2.10:0.13.1_spark_2_0
org.apache.mahout:mahout-spark_2.10:0.13.1_spark_2_1

org.apache.mahout:mahout-spark_2.11:0.13.1_spark_1_6
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_2_0
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_2_1

(will jars compiled with 2.1 dependencies run on 2.0? I assume not, but I
don't know) (afaik, mahout compiled for spark 1.6.x tends to work with
spark 1.6.y, anecdotal)
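
To make the scheme above concrete, a short sketch that generates those six coordinates. The pattern (Scala version as an artifact suffix, Spark version appended to the Mahout version) is taken from the list above; the helper names are hypothetical and nothing here touches Maven itself.

```python
GROUP_ID = "org.apache.mahout"
MAHOUT_VERSION = "0.13.1"


def coordinate(artifact, scala, spark):
    """Build group:artifact_<scala>:<version>_spark_<X>_<Y>."""
    spark_tag = spark.replace(".", "_")
    return f"{GROUP_ID}:{artifact}_{scala}:{MAHOUT_VERSION}_spark_{spark_tag}"


def all_coordinates(artifact="mahout-spark"):
    """All Scala/Spark combinations, grouped by Scala version."""
    return [
        coordinate(artifact, scala, spark)
        for scala in ("2.10", "2.11")
        for spark in ("1.6", "2.0", "2.1")
    ]


if __name__ == "__main__":
    for coord in all_coordinates():
        print(coord)
```

A tool like Zeppelin could then pick the right artifact from the running Scala and Spark versions by calling `coordinate` with those two values.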

A non-trivial motivation here, is we would like all of these available to
tighten up the Apache Zeppelin integration, where the user could have a
number of different spark/scala combos going on and we want it to 'just
work' out of the box (which means a wide array of binaries available, to
dmitriy's point).

I'm +1 on this, and as RM will begin cutting a provisional RC, just to try
to figure out how all of this will work (it's my first time as release
master, and this is a new thing we're doing).

72 hour lazy consensus. (will probably take me 72 hours to figure out
anyway ;) )

If no objections expect an RC on Monday evening.

tg

On Fri, Jul 7, 2017 at 3:24 PM, Holden Karau  wrote:

> Trevor looped me in on this since I hadn't had a chance to subscribe to
> the list yet (on now :)).
>
> Artifacts from cross spark-version building aren't super standardized (and
> there are two sort of very different types of cross-building).
>
> For folks who just need to build for the 1.X and 2.X and branches
> appending _spark1 & _spark2 to the version string is indeed pretty common
> and the DL4J folks do something pretty similar as Trevor pointed out.
>
> The folks over at hammerlab have made some sbt specific tooling to make
> this easier to do on the publishing side (see
> https://github.com/hammerlab/sbt-parent )
>
> It is true some people build Scala 2.10 artifacts for Spark 1.X series and
> 2.11 artifacts for Spark 2.X series only and use that to differentiate (I
> don't personally like this approach since it is super opaque and someone
> could upgrade their Scala version and then accidentally be using a
> different version of Spark which would likely not go very well).
>
> For folks who need to hook into internals and cross build against
> different minor versions there is much less of a consistent pattern,
> personally spark-testing-base is released as:
>
> [artifactname]_[scalaversion]:[sparkversion]_[artifact releaseversion]
>
> But this really only makes sense when you have to cross-build for lots of
> different Spark versions (which should be avoidable for Mahout).
>
> Since you are likely not depending on the internals of different point
> releases, I'd think the _spark1 / _spark2 is probably the right way (or
> _spark_1 / _spark_2 is fine too).
>
>
> On Fri, Jul 7, 2017 at 11:43 AM, Trevor Grant 
> wrote:
>
>>
>> -- Forwarded message --
>> From: Andrew Palumbo 
>> Date: Fri, Jul 7, 2017 at 12:28 PM
>> Subject: Re: [DISCUSS] Naming convention for multiple spark/scala combos
>> To: "d...@mahout.apache.org" 
>>
>>
>> another option for artifact names (using jars for example here):
>>
>>
>> mahout-spark-2.11_2.10-0.13.1.jar
>> mahout-spark-2.11_2.11-0.13.1.jar
>> mahout-math-scala-2.11_2.10-0.13.1.jar
>>
>>
>> i.e. ---.jar
>>
>>
>> not exactly pretty.. I somewhat prefer Trevor's idea of Dl4j convention.
>>
>> 
>> From: Trevor Grant 
>> Sent: Friday, July 7, 2017 11:57:53 AM
>> To: Mahout Dev List; user@mahout.apache.org
>> Subject: [DISCUSS] Naming convention for multiple spark/scala combos
>>
>> Hey all,
>>
>> Working on releasing 0.13.1 with multiple spark/scala combos.
>>
>> Afaik, there is no 'standard' for multiple spark versions (but I may be
>> wrong, I don't claim expertise here).
>>
>> One approach is simply only release binaries for:
>> Spark-1.6 + Scala 2.10
>> Spark-2.1 + Scala 2.11
>>
>> OR
>>
>> We could do like dl4j
>>
>> org.apache.mahout:mahout-spark_2.10:0.13.1_spark_1
>> org.apache.mahout:mahout-spark_2.11:0.13.1_spark_1
>>
>> org.apache.mahout:mahout-spark_2.10:0.13.1_spark_2
>> org.apache.mahout:mahout-spark_2.11:0.13.1_spark_2
>>
>> OR
>>
>> some other option I don't know of.
>>
>>
>
>
> --
> Cell : 425-233-8271
>


[DISCUSS] Naming convention for multiple spark/scala combos

2017-07-07 Thread Trevor Grant
Hey all,

Working on releasing 0.13.1 with multiple spark/scala combos.

Afaik, there is no 'standard' for multiple spark versions (but I may be
wrong, I don't claim expertise here).

One approach is simply only release binaries for:
Spark-1.6 + Scala 2.10
Spark-2.1 + Scala 2.11

OR

We could do like dl4j

org.apache.mahout:mahout-spark_2.10:0.13.1_spark_1
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_1

org.apache.mahout:mahout-spark_2.10:0.13.1_spark_2
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_2

OR

some other option I don't know of.


Re: Proposal for changing Mahout's Git branching rules

2017-06-19 Thread Trevor Grant
First issue, one does not simply just start using a develop branch.  CI
only triggers off the 'main' branch, which is master by default.  If we
move to the way you propose, then we need to file a ticket with INFRA I
believe.  That can be done, but it's not like we just start doing it one
day.

The current method is, when we cut a release- we make a new branch of that
release. Master is treated like dev. If you want the latest stable, you
would check out branch-0.13.0 .  This is the way most major projects
(citing Spark, Flink, Zeppelin), including Mahout up to version 0.10.x
worked.  To your point, there being a lack of a recent stable- that's fair,
but partly that's because no one created branches with the release for
0.10.? - 0.12.2.

For all intents and purposes, we are (now once again) following what you
propose, the only difference is we are treating master as dev, and
"branch-0.13.0" as master (e.g. last stable).  Larger features go on their
own branch until they are ready to merge- e.g. ATM there is just one
feature branch CUDA.  That was the big take away from this discussion last
time- there needed to be feature branches, as opposed to everyone running
around either working off WIP PRs or half baked merges, etc.  To that end-
"website" was a feature branch, and iirc there has been one other feature
branch that has merged in the last couple of months but I forget what it
was at the moment.






Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Mon, Jun 19, 2017 at 8:02 PM, Pat Ferrel  wrote:

> Perhaps there is a misunderstanding about where a release comes
> from—master. So any release tools we have should work fine. It’s just that
> until you are ready to pull the trigger, development is in develop or more
> strictly a “getting a release ready” branch called a release branch. This
> sounds like a lot of branches but in practice it’s trivial to merge and
> purge. Everything stays clean and rapid fire last minute fixes are isolated
> to the release branch before going into master.
>
> The original reason I brought this up is that our Git tools now allow
> committers to delete old cruft laden branches that are created and
> ephemeral with this method.
>
>
> On Jun 19, 2017, at 5:52 PM, Pat Ferrel  wrote:
>
> I just heard we are not using git flow (the process not the tool), we are
> checking unclean (untested in any significant way) changes to master? What
> is the develop branch used for?
>
> The master is unstable most all the time with the old method, in fact
> there is *no stable bundle of source ever* without git flow. With git flow
> you can peel off a bug fix and merge with master and users can pull it
> expecting that everything else is stable and like the last build. This has
> bit me with Mahout in the past as I’m sure it has for everyone. This
> doesn’t fix that but it does limit the pain to committers.
>
> If we aren’t going to use it, fine but let’s not agree to it then do
> something else. If it’s a matter of timing ok, I understood from Andrew’s
> mail below there was no timing issue but I expect there will be Jenkins or
> Travis issues to iron out.
>
> For reference: http://nvie.com/posts/a-successful-git-branching-model/
> I have never
> heard of someone who has tried it that didn’t like it but it takes a leap
> of faith unless you have git in your bones.
>
>
> On Apr 22, 2017, at 10:42 AM, Andrew Musselman 
> wrote:
>
> Okay develop it is; I'll cut a develop branch from master right now.
>
> As we go, if people forget and push to master, we can merge those changes
> into develop.
>
> In addition, I'm making a 'website' branch for all work on the new version
> of the site.
>
> On Sat, Apr 22, 2017 at 10:36 AM, Pat Ferrel 
> wrote:
>
> > There are tools to implement git-flow that I haven’t used and may have
> > some standardization built in but I think “develop” is typical and safe.
> >
> >
> > On Apr 22, 2017, at 10:33 AM, Andrew Musselman <
> andrew.mussel...@gmail.com>
> > wrote:
> >
> > Cool, I'll make a new dev branch now.
> >
> > Dev, develop, any preference?
> >
> > On Sat, Apr 22, 2017 at 10:30 AM, Pat Ferrel 
> > wrote:
> >
> >> It hasn't been often but I’ve been bit by it and had to ask users of a
> >> dependent project to checkout a specific commit, nasty.
> >>
> >> The main effect would be to automation efforts that are currently wip.
> >>
> >> On Apr 22, 2017, at 10:25 AM, An

Re: Samsara's learning curve

2017-06-05 Thread Trevor Grant
Fwiw-

I think I'm about 10 hours into multi layer perceptrons, maybe another 2 to
go for docs and last unit tests.  Could have been quicker but I already
have follow on things I want to do, and am building them so that it will be
easily extendable (to LSTMs, convolution nets, etc). If I had taken some
short cuts- could have been done probably in 5-7, and a large part of that
is remembering how back-propagation works, and getting lost in my own
indices.



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Wed, Mar 29, 2017 at 11:26 AM, Pat Ferrel  wrote:

> While I agree with D and T, I’ll add a few things to watch out for.
>
> One of the hardest things to learn is the new model of execution, it’s not
> quite Spark or any other compute engine. You need to create contexts that
> have virtualized the actual compute engine. But you will probably need to
> use the actual compute engine too. Switching back and forth is fairly
> simple but must be learned and could be documented better.
>
> The other missing bit is dataframes. R and Spark have them in different
> forms but Mahout largely ignores the issue of real world object ids. Again
> not very hard to work around and here’s hoping it's added in a future rev.
>
>
> On Mar 27, 2017, at 1:38 PM, Trevor Grant 
> wrote:
>
> I tend to agree with D.
>
> For example, I set out to do the 'Eigenfaces problem' last year, and wrote
> a blog on it.  It ended up being about 4 lines of Samsara code (+ imports),
> the "hardest" part was loading images into vectors, and then vectors back
> into images (wasn't awful, but I was new to Scala).  In addition to the
> modest marketing and a lack of introductory tutorials, is that to really
> use Mahout-Samsara in the first place you need to have a fairly good grasp
> of linear algebra, which gives it significantly less mass-appeal than say
> an mllib/sklearn/etc. Your
> I-just-got-my-data-science-certificate-from-coursera data scientists simply
> aren't equipped to use Mahout.  Your advanced-R-type data scientists can
> use it- but unless they have a problem that is too big for a single machine,
> have no motivation to use it (may change with native solvers, more
> algorithms, etc), and even given motivation the question then becomes learn
> Mahout OR come up with a clever trick for being able to stay in a single
> machine.
>
> But yea- a fairly easy and pleasant framework.  If you have the proper
> motivation, there is simply nothing else like it.
>
> tg
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Mon, Mar 27, 2017 at 12:32 PM, Dmitriy Lyubimov 
> wrote:
>
> > I believe writing in the DSL is simple enough, especially if you have some
> > familiarity with Scala on top of R (or, in my case, R on top of Scala
> > perhaps:). I've implemented about a couple dozen customized algorithms that
> > used distributed Samsara algebra at least to some degree, and I think I can
> > reliably attest none of them ever exceeded 100 lines or so, and that it
> > significantly reduced my time dedicated to writing algebra on top of Spark
> > and some other backends I use under proprietary settings. I am now mostly
> > doing non-algebraic improvements because writing algebra is easy.
> >
> > The most difficult part however, at least for me, and as you can see as you
> > go along with the book, was not the peculiarities of R-like bindings, but
> > the algorithm reformulations. Traditional "in-memory" algorithms do not
> > work on shared-nothing backends, even though you could program them, they
> > simply will not perform.
> >
> > The main reasons some of the traditional algorithms do not work at scale
> > are because they either require random memory access, or (more often) are
> > simply super-linear w.r.t. input size, so as one scales infrastructure at
> > linear cost, one would still incur less than expected increment in
> > performance (if any at all, at some point) per unit of input.
> >
> > Hence, usually some mathematically, or should i say, statistically
> > motivated tricks are still required. As the book describes, linearly or
> > sub-linearly scalable sketches, random projections, dimensionality
> > reductions etc. etc. are required to alleviate scalability issues of the
> > super-linear a

Re: How to use Mahout's model recommender in online experiments ?

2017-06-05 Thread Trevor Grant
Sorry for late response on this.

Might be worth checking out:
https://github.com/rawkintrevo/fsf17-twitter-recos

This is the corresponding talk. (relevant part starts at about 18:30)
https://youtu.be/h3j1JdtbhOI


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Thu, May 25, 2017 at 9:51 AM, Alessandro Dias 
wrote:

> Hi,
>
> I learned from the site below how to use the ALS factorization algorithm to
> make recommendations in the Mahout framework.
>
> https://mahout.apache.org/users/recommender/intro-als-hadoop.html
>
> From this:
> - we inform a file with the rating (user, item, rating), in my case I have
>  implicit ratings;
> - then get the files of the two latent matrices generated; and
> - finally we insert these files in a recommender engine that generates a
> file with the list of recommendations for each user.
>
>
> I think that this is done periodically by big e-commerce companies. (the
> model and recommendations are built periodically, offline)
>
>
>
> At my case, I'm going to do an online experiment of recommender. This model
> recommender will be the control group.
>
> I have a file with ratings of a set of old users and I will have a set of
> new users on this online experiment. The old users will not participate
> this experiment.
>
> These new users will use the recommender system for 2 weeks in the online
> experiment.
>
>
>
> >> How to use ALSWRFactorizer recommender (non-hadoop) from Mahout in
> online experiments ?
>
> I'd like to build a model once and use it to the new users...
>
> >> Will I have to run the algorithm (re-build the model) for each
> recommendation made during the online experiment ?
>
> Thanks and Regards,
>
> Alessandro Dias
>


Mahout BOF at ApacheCon

2017-05-16 Thread Trevor Grant
Anyone at ApacheCon tomorrow- we're doing a Birds of a Feather breakout at
6:30- please stop by!

tg


Re: New Website is Staged

2017-05-09 Thread Trevor Grant
Fire away-

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Tue, May 9, 2017 at 10:31 AM, Pat Ferrel  wrote:

> Are you guys ready for serious comments on the new design or is this just
> a first running version?
>
>
> On May 9, 2017, at 8:20 AM, Trevor Grant  wrote:
>
> In the interest of getting this thing up and running, use DFW Meetup video
> as a place holder for time being?
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Tue, May 9, 2017 at 10:17 AM, Andrew Palumbo 
> wrote:
>
> > I think its a great idea- I'm probably more camera shy than trevor 😊.
> > Maybe we can spread the fun around the PMC at lunch after GTC
> >
> > 
> > From: Trevor Grant 
> > Sent: Tuesday, May 9, 2017 12:02:39 AM
> > To: Mahout Dev List
> > Cc: user@mahout.apache.org
> > Subject: Re: New Website is Staged
> >
> > I agree it could be compelling. I'm somewhat video shy so I nominate our
> > PMC chair and moral compass @apalumbo
> >
> > :)
> >
> > On May 8, 2017 6:29 PM, "Kimberly Brown"  wrote:
> >
> >> Hey I was looking and had an idea.  What if someone recorded a youtube
> >> video whiteboarding the map-reduce mahout and how it’s evolved, so the
> >> “transformation” story for people discovering or rediscovering mahout.
> >> Benefits:  simple to create and embed, rich content (maybe 5-10 minutes)
> >> but still keeps the clean look. And everyone loves a walk-through video
> >> explanation over a paragraph ☺
> >>
> >> --
> >>
> >> Kim Brown
> >> Founder, CEO | Centrally Human LLC
> >> k...@centrallyhuman.com
> >> LinkedIn <https://www.linkedin.com/in/kim-weisensee-brown-33178011>
> >>
> >>
> >> On 5/8/17, 6:23 PM, "Trevor Grant"  wrote:
> >>
> >>Khurrum,
> >>
> >>Thanks for the feed back, anything more specific?
> >>
> >>
> >>
> >>
> >>
> >>Trevor Grant
> >>Data Scientist
> >>https://github.com/rawkintrevo
> >>http://stackexchange.com/users/3002022/rawkintrevo
> >>http://trevorgrant.org
> >>
> >>*"Fortunate is he, who is able to know the causes of things."  -Virgil*
> >>
> >>
> >>On Mon, May 8, 2017 at 4:57 PM, Andrew Palumbo 
> >> wrote:
> >>
> >>> I disagree with it being too bland- I find the open space and the
> >>> formatting much easier to navigate and read docs from.
> >>>
> >>>
> >>> 
> >>> From: Khurrum Nasim 
> >>> Sent: Monday, May 8, 2017 2:36:54 PM
> >>> To: Mahout Dev List; user@mahout.apache.org; d...@mahout.apache.org
> >>> Subject: Re: New Website is Staged
> >>>
> >>> Too bland looking
> >>>
> >>> Thanks,
> >>>
> >>> Khurrum.
> >>>
> >>> On May 8, 2017, 1:53 PM -0400, Trevor Grant <
> >> trevor.d.gr...@gmail.com>,
> >>> wrote:
> >>>> Hey all,
> >>>>
> >>>> The new website is staged. You can view it here
> >>>>
> >>>> http://mahout.staging.apache.org/
> >>>>
> >>>> Won't be publishing for a bit yet- there are still a few JIRAs left to do
> >>>> before it's ready, but you can check it out there anyway.
> >>>>
> >>>> A couple of admin things:
> >>>> 1- New developer and community pages are linked from the landing site and
> >>>> new navbar, the landing page isn't done yet btw (one of the last todos)
> >>>>
> >>>> 2- All linkbacks from the old site should continue to work, pages were
> >>>> maintained however, they have had new skin applied to them.
> >>>>
> >>>> 3- The current website is also available in
> >>>> http://mahout.staging.apache.org/docs/0.13.0/
> >>>> and will be preserved for posterity.
> >>>>
> >>>> 4- new style docs, which I recommend everyone check out are available in
> >>>> http://mahout.staging.apache.org/docs/0.13.1-SNAPSHOT/
> >>>>
> >>>>
> >>>> We have 6 high level talks coming up in the next 2 weeks and would like to
> >>>> have the shiny new website fielded if possible, working hard on getting
> >>>> it ready.
> >>>>
> >>>> If you have any updates, recommendations, etc, feel free to open a PR (all
> >>>> website code is contained in master now).
> >>>>
> >>>>
> >>>> Trevor Grant
> >>>> Data Scientist
> >>>> https://github.com/rawkintrevo
> >>>> http://stackexchange.com/users/3002022/rawkintrevo
> >>>> http://trevorgrant.org
> >>>>
> >>>> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> >>>
> >>
> >>
> >>
> >>
> >
>
>


Re: New Website is Staged

2017-05-09 Thread Trevor Grant
In the interest of getting this thing up and running, use DFW Meetup video
as a place holder for time being?

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Tue, May 9, 2017 at 10:17 AM, Andrew Palumbo  wrote:

> I think its a great idea- I'm probably more camera shy than trevor 😊.
> Maybe we can spread the fun around the PMC at lunch after GTC
>
> ________
> From: Trevor Grant 
> Sent: Tuesday, May 9, 2017 12:02:39 AM
> To: Mahout Dev List
> Cc: user@mahout.apache.org
> Subject: Re: New Website is Staged
>
> I agree it could be compelling. I'm somewhat video shy so I nominate our
> PMC chair and moral compass @apalumbo
>
> :)
>
> On May 8, 2017 6:29 PM, "Kimberly Brown"  wrote:
>
> > Hey I was looking and had an idea.  What if someone recorded a youtube
> > video whiteboarding the map-reduce mahout and how it’s evolved, so the
> > “transformation” story for people discovering or rediscovering mahout.
> > Benefits:  simple to create and embed, rich content (maybe 5-10 minutes)
> > but still keeps the clean look. And everyone loves a walk-through video
> > explanation over a paragraph ☺
> >
> > --
> >
> > Kim Brown
> > Founder, CEO | Centrally Human LLC
> > k...@centrallyhuman.com
> > LinkedIn <https://www.linkedin.com/in/kim-weisensee-brown-33178011>
> >
> >
> > On 5/8/17, 6:23 PM, "Trevor Grant"  wrote:
> >
> > Khurrum,
> >
> > Thanks for the feed back, anything more specific?
> >
> >
> >
> >
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things."
> -Virgil*
> >
> >
> > On Mon, May 8, 2017 at 4:57 PM, Andrew Palumbo 
> > wrote:
> >
> > > I disagree with it being too bland- I find the open space and the
> > > formatting much easier to navigate and read docs from.
> > >
> > >
> > > ________
> > > From: Khurrum Nasim 
> > > Sent: Monday, May 8, 2017 2:36:54 PM
> > > To: Mahout Dev List; user@mahout.apache.org; d...@mahout.apache.org
> > > Subject: Re: New Website is Staged
> > >
> > > Too bland looking
> > >
> > > Thanks,
> > >
> > > Khurrum.
> > >
> > > On May 8, 2017, 1:53 PM -0400, Trevor Grant <
> > trevor.d.gr...@gmail.com>,
> > > wrote:
> > > > Hey all,
> > > >
> > > > The new website is staged. You can view it here
> > > >
> > > > http://mahout.staging.apache.org/
> > > >
> > > > Won't be publishing for a bit yet- there are still a few JIRAs
> > left to do
> > > > before its ready, but you can check it out there anyway.
> > > >
> > > > A couple of admin things:
> > > > 1- New developer and community pages are linked from the landing
> > site and
> > > > new navbar, the landing page isn't done yet btw (one of the last
> > todos)
> > > >
> > > > 2- All linkbacks from the old site should continue to work, pages
> > were
> > > > maintained however, they have had new skin applied to them.
> > > >
> > > > 3- The current website is also available in
> > > > http://mahout.staging.apache.org/docs/0.13.0/
> > > > and will be persevered for posterity.
> > > >
> > > > 4- new style docs, which I recommend everyone check out are
> > available in
> > > > http://mahout.staging.apache.org/docs/0.13.1-SNAPSHOT/
> > > >
> > > >
> > > > We have 6 high level talks coming up in the next 2 weeks and
> would
> > like
> > > to
> > > > have the shiny new website fielded if possible, working on hard
> on
> > > getting
> > > > it ready.
> > > >
> > > > If you have any updates recommendations, etc, feel free to open a
> > PR (all
> > > > website code is contained in master now).
> > > >
> > > >
> > > > Trevor Grant
> > > > Data Scientist
> > > > https://github.com/rawkintrevo
> > > > http://stackexchange.com/users/3002022/rawkintrevo
> > > > http://trevorgrant.org
> > > >
> > > > *"Fortunate is he, who is able to know the causes of things."
> > -Virgil*
> > >
> >
> >
> >
> >
>


Re: New Website is Staged

2017-05-09 Thread Trevor Grant
Yea- the more I think about it, the more I like the idea too.

It'll be late May before I can realistically do anything with it.  Can
anyone else take charge?

Otherwise- still open for suggestions on that homepage. (even if they are
just temporary)

Per MAHOUT-1981, that is the last blocker on launching the new site (imho).



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Tue, May 9, 2017 at 9:15 AM, Khurrum Nasim 
wrote:

> I do like the idea of blending in  a video (might be extra work). The site
> needs some spizzaz.
>
> Thanks,
>
> Khurrum.
>
> On May 8, 2017, 7:23 PM -0400, Trevor Grant ,
> wrote:
> > Khurrum,
> >
> > Thanks for the feed back, anything more specific?
> >
> >
> >
> >
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> >
> >
> > On Mon, May 8, 2017 at 4:57 PM, Andrew Palumbo 
> wrote:
> >
> > > I disagree with it being too bland- I find the open space and the
> > > formatting much easier to navigate and read docs from.
> > >
> > >
> > > 
> > > From: Khurrum Nasim
> > > Sent: Monday, May 8, 2017 2:36:54 PM
> > > To: Mahout Dev List; user@mahout.apache.org; d...@mahout.apache.org
> > > Subject: Re: New Website is Staged
> > >
> > > Too bland looking
> > >
> > > Thanks,
> > >
> > > Khurrum.
> > >
> > > On May 8, 2017, 1:53 PM -0400, Trevor Grant,
> > > wrote:
> > > > Hey all,
> > > >
> > > > The new website is staged. You can view it here
> > > >
> > > > http://mahout.staging.apache.org/
> > > >
> > > > Won't be publishing for a bit yet- there are still a few JIRAs left
> to do
> > > > before its ready, but you can check it out there anyway.
> > > >
> > > > A couple of admin things:
> > > > 1- New developer and community pages are linked from the landing
> site and
> > > > new navbar, the landing page isn't done yet btw (one of the last
> todos)
> > > >
> > > > 2- All linkbacks from the old site should continue to work, pages
> were
> > > > maintained however, they have had new skin applied to them.
> > > >
> > > > 3- The current website is also available in
> > > > http://mahout.staging.apache.org/docs/0.13.0/
> > > > and will be persevered for posterity.
> > > >
> > > > 4- new style docs, which I recommend everyone check out are
> available in
> > > > http://mahout.staging.apache.org/docs/0.13.1-SNAPSHOT/
> > > >
> > > >
> > > > We have 6 high level talks coming up in the next 2 weeks and would
> like
> > > to
> > > > have the shiny new website fielded if possible, working on hard on
> > > getting
> > > > it ready.
> > > >
> > > > If you have any updates recommendations, etc, feel free to open a PR
> (all
> > > > website code is contained in master now).
> > > >
> > > >
> > > > Trevor Grant
> > > > Data Scientist
> > > > https://github.com/rawkintrevo
> > > > http://stackexchange.com/users/3002022/rawkintrevo
> > > > http://trevorgrant.org
> > > >
> > > > *"Fortunate is he, who is able to know the causes of things."
> -Virgil*
> > >
>


Re: New Website is Staged

2017-05-08 Thread Trevor Grant
I agree it could be compelling. I'm somewhat video shy so I nominate our
PMC chair and moral compass @apalumbo

:)

On May 8, 2017 6:29 PM, "Kimberly Brown"  wrote:

> Hey I was looking and had an idea.  What if someone recorded a youtube
> video whiteboarding the map-reduce mahout and how it’s evolved, so the
> “transformation” story for people discovering or rediscovering mahout.
> Benefits:  simple to create and embed, rich content (maybe 5-10 minutes)
> but still keeps the clean look. And everyone loves a walk-through video
> explanation over a paragraph ☺
>
> --
>
> Kim Brown
> Founder, CEO | Centrally Human LLC
> k...@centrallyhuman.com
> LinkedIn <https://www.linkedin.com/in/kim-weisensee-brown-33178011>
>
>
> On 5/8/17, 6:23 PM, "Trevor Grant"  wrote:
>
>     Khurrum,
>
> Thanks for the feed back, anything more specific?
>
>
>
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Mon, May 8, 2017 at 4:57 PM, Andrew Palumbo 
> wrote:
>
> > I disagree with it being too bland- I find the open space and the
> > formatting much easier to navigate and read docs from.
> >
> >
> > 
> > From: Khurrum Nasim 
> > Sent: Monday, May 8, 2017 2:36:54 PM
> > To: Mahout Dev List; user@mahout.apache.org; d...@mahout.apache.org
> > Subject: Re: New Website is Staged
> >
> > Too bland looking
> >
> > Thanks,
> >
> > Khurrum.
> >
> > On May 8, 2017, 1:53 PM -0400, Trevor Grant <
> trevor.d.gr...@gmail.com>,
> > wrote:
> > > Hey all,
> > >
> > > The new website is staged. You can view it here
> > >
> > > http://mahout.staging.apache.org/
> > >
> > > Won't be publishing for a bit yet- there are still a few JIRAs
> left to do
> > > before its ready, but you can check it out there anyway.
> > >
> > > A couple of admin things:
> > > 1- New developer and community pages are linked from the landing
> site and
> > > new navbar, the landing page isn't done yet btw (one of the last
> todos)
> > >
> > > 2- All linkbacks from the old site should continue to work, pages
> were
> > > maintained however, they have had new skin applied to them.
> > >
> > > 3- The current website is also available in
> > > http://mahout.staging.apache.org/docs/0.13.0/
> > > and will be persevered for posterity.
> > >
> > > 4- new style docs, which I recommend everyone check out are
> available in
> > > http://mahout.staging.apache.org/docs/0.13.1-SNAPSHOT/
> > >
> > >
> > > We have 6 high level talks coming up in the next 2 weeks and would
> like
> > to
> > > have the shiny new website fielded if possible, working on hard on
> > getting
> > > it ready.
> > >
> > > If you have any updates recommendations, etc, feel free to open a
> PR (all
> > > website code is contained in master now).
> > >
> > >
> > > Trevor Grant
> > > Data Scientist
> > > https://github.com/rawkintrevo
> > > http://stackexchange.com/users/3002022/rawkintrevo
> > > http://trevorgrant.org
> > >
> > > *"Fortunate is he, who is able to know the causes of things."
> -Virgil*
> >
>
>
>
>


Re: New Website is Staged

2017-05-08 Thread Trevor Grant
Khurrum,

Thanks for the feedback. Anything more specific?





Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Mon, May 8, 2017 at 4:57 PM, Andrew Palumbo  wrote:

> I disagree with it being too bland- I find the open space and the
> formatting much easier to navigate and read docs from.
>
>
> 
> From: Khurrum Nasim 
> Sent: Monday, May 8, 2017 2:36:54 PM
> To: Mahout Dev List; user@mahout.apache.org; d...@mahout.apache.org
> Subject: Re: New Website is Staged
>
> Too bland looking
>
> Thanks,
>
> Khurrum.
>
> On May 8, 2017, 1:53 PM -0400, Trevor Grant ,
> wrote:
> > Hey all,
> >
> > The new website is staged. You can view it here
> >
> > http://mahout.staging.apache.org/
> >
> > Won't be publishing for a bit yet- there are still a few JIRAs left to do
> > before its ready, but you can check it out there anyway.
> >
> > A couple of admin things:
> > 1- New developer and community pages are linked from the landing site and
> > new navbar, the landing page isn't done yet btw (one of the last todos)
> >
> > 2- All linkbacks from the old site should continue to work, pages were
> > maintained however, they have had new skin applied to them.
> >
> > 3- The current website is also available in
> > http://mahout.staging.apache.org/docs/0.13.0/
> > and will be persevered for posterity.
> >
> > 4- new style docs, which I recommend everyone check out are available in
> > http://mahout.staging.apache.org/docs/0.13.1-SNAPSHOT/
> >
> >
> > We have 6 high level talks coming up in the next 2 weeks and would like
> to
> > have the shiny new website fielded if possible, working on hard on
> getting
> > it ready.
> >
> > If you have any updates recommendations, etc, feel free to open a PR (all
> > website code is contained in master now).
> >
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things." -Virgil*
>


New Website is Staged

2017-05-08 Thread Trevor Grant
Hey all,

The new website is staged. You can view it here

http://mahout.staging.apache.org/

Won't be publishing for a bit yet- there are still a few JIRAs left to do
before it's ready, but you can check it out there anyway.

A couple of admin things:
1- New developer and community pages are linked from the landing site and
the new navbar. The landing page isn't done yet, btw (one of the last todos).

2- All linkbacks from the old site should continue to work. Pages were
maintained; however, they have had a new skin applied to them.

3- The current website is also available at
http://mahout.staging.apache.org/docs/0.13.0/
and will be preserved for posterity.

4- New-style docs, which I recommend everyone check out, are available at
http://mahout.staging.apache.org/docs/0.13.1-SNAPSHOT/


We have 6 high-level talks coming up in the next 2 weeks and would like to
have the shiny new website fielded if possible, so we're working hard on
getting it ready.

If you have any updates, recommendations, etc., feel free to open a PR (all
website code is contained in master now).


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


Welcome our GSoC Student Aditya Sarma

2017-05-04 Thread Trevor Grant
Hello all,

I want to extend a warm welcome to Aditya Sarma, who has been accepted to
the Mahout Project as part of the Google Summer of Code program.

Aditya will be working on "DBSCAN Clustering In Mahout"; if you go back in
the archives you can see his full proposal.

We're really excited to have him, and looking forward to a great summer.

Aditya, would you like to say a few words to introduce yourself?


Re: New logo

2017-05-01 Thread Trevor Grant
Thanks Scott,

You are correct- in fact we're going even further now, in that you can do
native optimization regardless of the architecture with native solvers.

Do you or anyone more familiar with the history of the website know
anything about the origins/uses of this:
https://mahout.apache.org/images/Mahout-logo-245x300.png
It seems to be a green mahout logo.

Also Scott, or anyone lurking who may be able to help: as part of the
website reboot I've included a "history" page and would really appreciate
some help capturing that from first-person sources if possible. I've put in
some headers but those are only directional:

https://github.com/rawkintrevo/mahout/blob/website/website/front/community/history.md



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Mon, May 1, 2017 at 11:18 AM, scott cote  wrote:

> Trevor et al:
>
> Some ideas to spur you on (and related points):
>
> Mahout is no longer a grab bag of algorithms and routines, but a math
> language right?  You don’t care about the under the cover implementation.
> Today its Spark with alternative implementations in Flink, etc ….
>
> Don’t know if that is the long term goal still  - haven’t kept up - but it
> seems like you are insulating yourself from the underlying technology.
>
> Math is a universal language.  Right?
>
> Tower of Babel is coming to mind ….
>
> SCott
>
> > On Apr 27, 2017, at 10:27 PM, Trevor Grant 
> wrote:
> >
> > It also bugs me when I can't suggest any alternatives, yet don't like the
> > ones in front of me...
> >
> > I became aware of a symbol a week or so ago, and it keeps coming back to
> > me.
> >
> > The Enso.
> > https://en.wikipedia.org/wiki/Ens%C5%8D
> >
> > Things I like about it:
> > (all from wikipedia, since the only thing I knew about this symbol prior
> is
> > that someone I met had a tattoo of it).
> > It represents (among a few other things) enlightenment.
> > ^^ This resonated with the 'alternate definition of mahout' from Hebrew-
> > which may be something akin to essence or truth.
> >
> > It is a circle- which plays to the Samsara theme.
> >
> > It is very expressive, a simple one or two brush stroke circle which
> > symbolizes several large concepts and things about the creator,
> expressive
> > like our DSL (I feel gross comparing such a symbol to a Scala DSL, but
> I'm
> > spit balling here, please forgive me- I am not so expressive).
> >
> > "Once the *ensō* is drawn, one does not change it. It evidences the
> > character of its creator and the context of its creation in a brief,
> > contiguous period of time." Which reminds me of the DRMs
> >
> > In closed form it represents something akin to Plato's perfection- which
> a
> > little more wiki surfing tells me is the idea that no one can create a
> > perfect circle because a circle is a collection of infinite points and
> how
> > could ever be sure that you have arranged each one properly, yet such
> > things must exist, or what blueprint would a creator of circles be
> striving
> > for.  This, by-the-by reminds me of stochastic approaches to solving
> > problems, and really statistics / "machine-learning" in general, in that
> we
> > can't find perfect solutions, yet we believe solutions exist and serve as
> > our blueprint.
> >
> > Finally, I like that it is simple.
> >
> > Things I don't like about it:
> > Lucent Technologies used it back in the 90s, however they used a very
> > specific red one, and this isn't a deal breaker for me.
> >
> > Other thoughts:
> > Based on the tattoo I saw- one could make an Enso using old mahout color
> > palatte if one were to dab their brush in the appropriate colors. This
> > could also be represented in any single color. (Not sure what that does
> to
> > our TM, is it ok if we just keep slapping TMs on the side of it? If that
> is
> > the case is there any reason we must have a single Enso?)
> >
> > So there is something to throw in the pot that is a little more grown up
> > than my runner up favorites (honey badger, blueman riding bomb waving
> > cowboy hat, blueman riding lighting bolt into a squirrel covered in
> water,
> > etc).
> >
> > Again, only know what wiki has told me, so if anyone is more familiar
> with
> > this symbol (like was it used as a logo by some horrible dictator which
> > carried out terrible attrocit

Re: New logo

2017-04-27 Thread Trevor Grant
It also bugs me when I can't suggest any alternatives, yet don't like the
ones in front of me...

I became aware of a symbol a week or so ago, and it keeps coming back to
me.

The Enso.
https://en.wikipedia.org/wiki/Ens%C5%8D

Things I like about it:
(all from wikipedia, since the only thing I knew about this symbol prior is
that someone I met had a tattoo of it).
It represents (among a few other things) enlightenment.
^^ This resonated with the 'alternate definition of mahout' from Hebrew-
which may be something akin to essence or truth.

It is a circle- which plays to the Samsara theme.

It is very expressive, a simple one or two brush stroke circle which
symbolizes several large concepts and things about the creator, expressive
like our DSL (I feel gross comparing such a symbol to a Scala DSL, but I'm
spit balling here, please forgive me- I am not so expressive).

"Once the *ensō* is drawn, one does not change it. It evidences the
character of its creator and the context of its creation in a brief,
contiguous period of time." Which reminds me of the DRMs.

In closed form it represents something akin to Plato's perfection- which a
little more wiki surfing tells me is the idea that no one can create a
perfect circle, because a circle is a collection of infinite points and how
could you ever be sure that you have arranged each one properly? Yet such
things must exist, or what blueprint would a creator of circles be striving
for?  This, by the by, reminds me of stochastic approaches to solving
problems, and really statistics / "machine-learning" in general, in that we
can't find perfect solutions, yet we believe solutions exist and serve as
our blueprint.

Finally, I like that it is simple.

Things I don't like about it:
Lucent Technologies used it back in the 90s, however they used a very
specific red one, and this isn't a deal breaker for me.

Other thoughts:
Based on the tattoo I saw- one could make an Enso using the old mahout color
palette if one were to dab their brush in the appropriate colors. This
could also be represented in any single color. (Not sure what that does to
our TM, is it ok if we just keep slapping TMs on the side of it? If that is
the case is there any reason we must have a single Enso?)

So there is something to throw in the pot that is a little more grown up
than my runner-up favorites (honey badger, blueman riding bomb waving
cowboy hat, blueman riding lightning bolt into a squirrel covered in water,
etc).

Again, I only know what wiki has told me, so if anyone is more familiar with
this symbol (like, was it used as a logo by some horrible dictator who
carried out terrible atrocities?) or just has general comments, speak up.
tg



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Thu, Apr 27, 2017 at 5:50 PM, Ted Dunning  wrote:

> I don't have any constructive input at all. None of the proposals showed
> any spark (to me).
>
> I hate it when I can't suggest a better path and I hate negative feedback.
> But there it is.
>
>
>
> On Thu, Apr 27, 2017 at 3:48 PM, Pat Ferrel  wrote:
>
> > Do you have constructive input (guidance or opinion is welcome input) or
> > would you like to discontinue the contest. If the later, -1 now.
> >
> >
> > On Apr 27, 2017, at 3:42 PM, Ted Dunning  wrote:
> >
> > I thought that none of the proposals were worth continuing with.
> >
> >
> >
> > On Thu, Apr 27, 2017 at 3:36 PM, Pat Ferrel 
> wrote:
> >
> > > Yes, -1 means you hate them all or think the designers  are not worth
> > > paying. We have to pay to continue, I’ll foot the bill (donations
> > > appreciated) but don’t want to unless people think it will lead to
> > > something. For me there are a couple I wouldn’t mind seeing on the web
> > site
> > > or swag and yes we do have time to try something completely different,
> > and
> > > the designers will be more willing since there is a guaranteed payout.
> > >
> > >
> > > On Apr 27, 2017, at 3:30 PM, Andrew Musselman <
> > andrew.mussel...@gmail.com>
> > > wrote:
> > >
> > > I thought we were just voting on continuing this process :)
> > >
> > > On Thu, Apr 27, 2017 at 3:22 PM, Trevor Grant <
> trevor.d.gr...@gmail.com>
> > > wrote:
> > >
> > >> Also Pat, thank you for organizing.
> > >>
> > >> +0
> > >>
> > >> I don't love any of them enough to +1, I don't hate them all enough to
> > -1
> > >>
> > >> Most of them remind me of some spin on Apache 

Re: New logo

2017-04-27 Thread Trevor Grant
I didn't mean to veto-

I thought we were on release rules. (at least three positives and more
positives than negatives).

If we're treating this as code modification (any -1 stops it dead) then I'm
somewhere in the ball park of -.75.


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Thu, Apr 27, 2017 at 6:11 PM, Pat Ferrel  wrote:

> ok, the contest is cancelled
>
>
> On Apr 27, 2017, at 4:10 PM, Trevor Grant 
> wrote:
>
> I'll revise to -1.  Given blue man v. all the others, I'd be indifferent.
> None of the new possibilities give me enough hope of something worth
> displacing blue man (dated though he may be, has good recognition).
>
> ^^ Assuming it take 3 votes to pass either way?
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Thu, Apr 27, 2017 at 5:54 PM, Pat Ferrel  wrote:
>
> > Fair enough, I think Trevor feels the same.
> >
> > The blue man can continue, all it takes is a -1
> >
> >
> > On Apr 27, 2017, at 3:50 PM, Ted Dunning  wrote:
> >
> > I don't have any constructive input at all. None of the proposals showed
> > any spark (to me).
> >
> > I hate it when I can't suggest a better path and I hate negative
> feedback.
> > But there it is.
> >
> >
> >
> > On Thu, Apr 27, 2017 at 3:48 PM, Pat Ferrel 
> wrote:
> >
> >> Do you have constructive input (guidance or opinion is welcome input) or
> >> would you like to discontinue the contest. If the later, -1 now.
> >>
> >>
> >> On Apr 27, 2017, at 3:42 PM, Ted Dunning  wrote:
> >>
> >> I thought that none of the proposals were worth continuing with.
> >>
> >>
> >>
> >> On Thu, Apr 27, 2017 at 3:36 PM, Pat Ferrel 
> > wrote:
> >>
> >>> Yes, -1 means you hate them all or think the designers  are not worth
> >>> paying. We have to pay to continue, I’ll foot the bill (donations
> >>> appreciated) but don’t want to unless people think it will lead to
> >>> something. For me there are a couple I wouldn’t mind seeing on the web
> >> site
> >>> or swag and yes we do have time to try something completely different,
> >> and
> >>> the designers will be more willing since there is a guaranteed payout.
> >>>
> >>>
> >>> On Apr 27, 2017, at 3:30 PM, Andrew Musselman <
> >> andrew.mussel...@gmail.com>
> >>> wrote:
> >>>
> >>> I thought we were just voting on continuing this process :)
> >>>
> >>> On Thu, Apr 27, 2017 at 3:22 PM, Trevor Grant <
> trevor.d.gr...@gmail.com
> >>
> >>> wrote:
> >>>
> >>>> Also Pat, thank you for organizing.
> >>>>
> >>>> +0
> >>>>
> >>>> I don't love any of them enough to +1, I don't hate them all enough to
> >> -1
> >>>>
> >>>> Most of them remind me of some spin on Apache Apex, Python, Numpy (a
> >>> Python
> >>>> Library), or IBM's DSX.  However, I realize a big part of that is the
> >>>> colors chosen.
> >>>>
> >>>> #143 is my favorite (possibly because it reminds me of none of the
> >>> above).
> >>>> But possibly if this goes to next round we can have them adjust hues /
> >>>> colors.
> >>>>
> >>>> Trevor Grant
> >>>> Data Scientist
> >>>> https://github.com/rawkintrevo
> >>>> http://stackexchange.com/users/3002022/rawkintrevo
> >>>> http://trevorgrant.org
> >>>>
> >>>> *"Fortunate is he, who is able to know the causes of things."
> -Virgil*
> >>>>
> >>>>
> >>>> On Thu, Apr 27, 2017 at 5:15 PM, Andrew Musselman <
> >>>> andrew.mussel...@gmail.com> wrote:
> >>>>
> >>>>> +1 to continue; thanks for organizing this Pat!
> >>>>>
> >>>>> My personal favorite is #38
> >>>>> https://images-platform.99static.com/I9quDzcBrtJXg_
> >>>> NMaIsH6ySQ7Ok=/filters:
> >>>>>

Re: New logo

2017-04-27 Thread Trevor Grant
I'll revise to -1.  Given blue man v. all the others, I'd be indifferent.
None of the new possibilities give me enough hope of something worth
displacing blue man (dated though he may be, has good recognition).

^^ Assuming it takes 3 votes to pass either way?

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Thu, Apr 27, 2017 at 5:54 PM, Pat Ferrel  wrote:

> Fair enough, I think Trevor feels the same.
>
> The blue man can continue, all it takes is a -1
>
>
> On Apr 27, 2017, at 3:50 PM, Ted Dunning  wrote:
>
> I don't have any constructive input at all. None of the proposals showed
> any spark (to me).
>
> I hate it when I can't suggest a better path and I hate negative feedback.
> But there it is.
>
>
>
> On Thu, Apr 27, 2017 at 3:48 PM, Pat Ferrel  wrote:
>
> > Do you have constructive input (guidance or opinion is welcome input) or
> > would you like to discontinue the contest. If the later, -1 now.
> >
> >
> > On Apr 27, 2017, at 3:42 PM, Ted Dunning  wrote:
> >
> > I thought that none of the proposals were worth continuing with.
> >
> >
> >
> > On Thu, Apr 27, 2017 at 3:36 PM, Pat Ferrel 
> wrote:
> >
> >> Yes, -1 means you hate them all or think the designers  are not worth
> >> paying. We have to pay to continue, I’ll foot the bill (donations
> >> appreciated) but don’t want to unless people think it will lead to
> >> something. For me there are a couple I wouldn’t mind seeing on the web
> > site
> >> or swag and yes we do have time to try something completely different,
> > and
> >> the designers will be more willing since there is a guaranteed payout.
> >>
> >>
> >> On Apr 27, 2017, at 3:30 PM, Andrew Musselman <
> > andrew.mussel...@gmail.com>
> >> wrote:
> >>
> >> I thought we were just voting on continuing this process :)
> >>
> >> On Thu, Apr 27, 2017 at 3:22 PM, Trevor Grant
> >> wrote:
> >>
> >>> Also Pat, thank you for organizing.
> >>>
> >>> +0
> >>>
> >>> I don't love any of them enough to +1, I don't hate them all enough to
> > -1
> >>>
> >>> Most of them remind me of some spin on Apache Apex, Python, Numpy (a
> >> Python
> >>> Library), or IBM's DSX.  However, I realize a big part of that is the
> >>> colors chosen.
> >>>
> >>> #143 is my favorite (possibly because it reminds me of none of the
> >> above).
> >>> But possibly if this goes to next round we can have them adjust hues /
> >>> colors.
> >>>
> >>> Trevor Grant
> >>> Data Scientist
> >>> https://github.com/rawkintrevo
> >>> http://stackexchange.com/users/3002022/rawkintrevo
> >>> http://trevorgrant.org
> >>>
> >>> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> >>>
> >>>
> >>> On Thu, Apr 27, 2017 at 5:15 PM, Andrew Musselman <
> >>> andrew.mussel...@gmail.com> wrote:
> >>>
> >>>> +1 to continue; thanks for organizing this Pat!
> >>>>
> >>>> My personal favorite is #38
> >>>> https://images-platform.99static.com/I9quDzcBrtJXg_NMaIsH6ySQ7Ok=/filters:quality(100)/99designs-contests-attachments/84/84017/attachment_84017937
> >>>>
> >>>> I like the stylized and simple "M" and it reminds me of diagrams
> > showing
> >>>> vector multiplication.
> >>>>
> >>>> On Thu, Apr 27, 2017 at 12:56 PM, Pat Ferrel 
> >>>> wrote:
> >>>>
> >>>>> We can treat this like a release vote, if anyone hates all these and
> >>>>> doesn’t want to continue with shortlisted designers for 3 more days
> >>> (the
> >>>>> next step) vote -1 and say if your vote is binding (your are a PMC
> >>>> member)
> >>>>>
> >>>>> Otherwise all are welcome to rate everything on the polls below.
> >>>>>
> >>>>> In this case you have 24 hours to vote
> >>>>>
> >>>>> Here’s my +1 to continue refining.
> >>>>>
> >>>>>
> >>>>> On Apr 27, 2017, at 11:41 AM, Pat Ferrel 
> >>> wrote:
> >>>>>
> >>>>> Here is a second group, hopefully picked to be unique.
> >>>>> https://99designs.com/contests/poll/vl7xed
> >>>>>
> >>>>> We got a lot of responses, these 2 polls contain the best afaict.
> >>>>>
> >>>>>
> >>>>> On Apr 27, 2017, at 11:25 AM, Pat Ferrel 
> >>> wrote:
> >>>>>
> >>>>> Vote: https://99designs.com/contests/poll/rqcgif
> >>>>>
> >>>>> We asked for something “mathy” and asked for no elephant and rider. We
> >>>>> have the rest of the week to tweak so leave comments about what you like or
> >>>>> would like to change.
> >>>>>
> >>>>> We don’t have to pick one of these, so if you hate them all, make that
> >>>>> known too.
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
> >
>
>


Re: New logo

2017-04-27 Thread Trevor Grant
Also Pat, thank you for organizing.

+0

I don't love any of them enough to +1, and I don't hate them all enough to -1.

Most of them remind me of some spin on Apache Apex, Python, NumPy (a Python
library), or IBM's DSX.  However, I realize a big part of that is the
colors chosen.

#143 is my favorite (possibly because it reminds me of none of the above).
If this goes to the next round, perhaps we can have them adjust hues /
colors.

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Thu, Apr 27, 2017 at 5:15 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> +1 to continue; thanks for organizing this Pat!
>
> My personal favorite is #38
> https://images-platform.99static.com/I9quDzcBrtJXg_NMaIsH6ySQ7Ok=/filters:quality(100)/99designs-contests-attachments/84/84017/attachment_84017937
>
> I like the stylized and simple "M" and it reminds me of diagrams showing
> vector multiplication.
>
> On Thu, Apr 27, 2017 at 12:56 PM, Pat Ferrel 
> wrote:
>
> > We can treat this like a release vote: if anyone hates all these and
> > doesn’t want to continue with shortlisted designers for 3 more days (the
> > next step), vote -1 and say if your vote is binding (you are a PMC member).
> >
> > Otherwise all are welcome to rate everything on the polls below.
> >
> > In this case you have 24 hours to vote
> >
> > Here’s my +1 to continue refining.
> >
> >
> > On Apr 27, 2017, at 11:41 AM, Pat Ferrel  wrote:
> >
> > Here is a second group, hopefully picked to be unique.
> > https://99designs.com/contests/poll/vl7xed
> >
> > We got a lot of responses, these 2 polls contain the best afaict.
> >
> >
> > On Apr 27, 2017, at 11:25 AM, Pat Ferrel  wrote:
> >
> > Vote: https://99designs.com/contests/poll/rqcgif
> >
> > We asked for something “mathy” and asked for no elephant and rider. We
> > have the rest of the week to tweak so leave comments about what you like or
> > would like to change.
> >
> > We don’t have to pick one of these, so if you hate them all, make that
> > known too.
> >
> >
> >
>


Re: [VOTE] Apache Mahout 0.13.0 Release Candidate

2017-04-16 Thread Trevor Grant
+1 (binding)

Judicial Opinion:

Verified Signatures

Built with `mvn clean package -Phadoop2`; `mvn clean package -Phadoop2
-Pviennacl-omp`; `mvn clean package -Phadoop2 -DskipTests` (see below).
For all builds, ran the `spark item similarity` example with no issue. The `wiki`
example fails with non-Mahout-related file-not-found errors, so we need a JIRA to
update the file path. Because the failing functionality in this example is not
related to Mahout, I find this non-blocking: it is likely that all of the
prior releases are now also broken (that is, if we had released this a
month ago, the release would still now be broken).

For viennacl, the build fails on the `sparse mmul microbench test`.  I have
two machines, and both fail.  Both have older graphics cards (specs below).  On
the older of the two, I am able to make something very similar to the test
pass in the shell if I set `s=625` (using `timeSparseDRMMMul`). This
implies that the functionality is sound, but my cards simply aren't new
enough to pass the test.  I recommend opening a JIRA to tune down the
aggressiveness of the test, e.g. making `s=500` (currently `s=1000`).

This recommendation is based on two things: 1) my cards aren't _that_ old and
2) I don't want to buy new graphics cards just to pass unit tests.
(Included CPU arch for reference wrt OMP.)

Dev Box:

NVidia GeForce GT 740
Driver Version: 352.63
Memory: 1021MiB

ViennaCL v 1.7.0

CPU: Intel Core i7-3770K @ 3.50 GHz (Ivy Bridge Micro Arch - 3rd Gen)

Ubuntu 14.04.3 LTS


Laptop:
NVidia GeForce GTX 960M
Driver Version: 367.57
Memory: 2002MiB

CPU: Intel Core i7-5500U @ 2.40 GHz (Broadwell-U Micro Arch - 5th Gen)

ViennaCL v 1.7.0

Ubuntu 16.04.1 LTS




On Sat, Apr 15, 2017 at 11:35 PM, Andrew Palumbo  wrote:

> +1 (binding)
>
>
> Built and tested source distribution with both profiles -Pviennacl and
> -Pviennacl-omp.
>
>
> Ran SparseSparseDrmTimer.mscala through the shell in both pseudo cluster
> and local[2] mode
>
> Tested with several iterations and combinations of arguments eg:
>
> timeSparseDRMMMul(1000,1000,1000,5,.2,1234L)
>
> Mostly without issue on a consumer grade card.
>
> Note: in the shell, after the %*% is called by a partition and the GPU is
> in use, we will get a native failure exception, which is caught and allows
> MMul of that partition to fall back to JVM MMul (single-threaded). This
> should be changed in 0.13.1 to fall back to OpenMP MMul.
>
> Note: Binary distribution is built for Tesla GPUs. My card was not
> compatible, though our target is higher end GPUs on AWS or PowerPC (PowerPC
> uses Teslas) so not a blocker IMO.
>
> We will target a wider range of cards in the next distributions.
>
> 
> From: Andrew Musselman 
> Sent: Saturday, April 15, 2017 2:48:17 AM
> To: user@mahout.apache.org; d...@mahout.apache.org
> Subject: Re: [VOTE] Apache Mahout 0.13.0 Release Candidate
>
> Hashes and sigs confirmed, bin and src (viennacl and viennacl-omp)
> artifacts run the spark shell and the sparse drm test fine, and kick off
> the GPU.
>
> +1 (binding)
>
> On Fri, Apr 14, 2017 at 10:25 PM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
> > This is the vote for release 0.13.0 of Apache Mahout.
> >
> > The vote will be going for at least 72 hours and will be closed on
> > Monday, April 17th, 2017 or once there are at least 3 PMC +1 binding votes
> > (whichever occurs earlier).  Please download, test and vote with
> >
> > [ ] +1, accept RC as the official 0.13.0 release of Apache Mahout
> > [ ] +0, I don't care either way,
> > [ ] -1, do not accept RC as the official 0.13.0 release of Apache Mahout,
> > because...
> >
> >
> > Maven staging repo:
> >
> > https://repository.apache.org/content/repositories/orgapachemahout-1044/org/apache/mahout/apache-mahout-distribution/0.13.0
> >
> > The git tag to be voted upon is mahout-0.13.0
> >
>


Re: Lambda and Kappa CCO

2017-04-09 Thread Trevor Grant
Specifically, I hacked together a Lambda Streaming CCO with Spark and Flink
for a demo for my upcoming FlinkForward talk.  Will post code once I finish
it / strip all my creds out. In short, the lack of serialization in Mahout
in-core vectors/matrices makes handing them off / dealing with them somewhat
tedious.



On Sun, Apr 9, 2017 at 5:39 PM, Andrew Palumbo  wrote:

> Pat-
>
> What can we do from the mahout side?  Would we need any new data
> structures?  Trevor and I were just discussing some of  the troubles of
> near real time matrix streaming.
> --
> *From:* Pat Ferrel 
> *Sent:* Monday, March 27, 2017 2:42:55 PM
> *To:* Ted Dunning; user@mahout.apache.org
> *Cc:* Trevor Grant; Ted Dunning; s...@apache.org
> *Subject:* Re: Lambda and Kappa CCO
>
> Agreed. Downsampling was ignored in several places and with it a great
> deal of input is a noop. Without downsampling too many things need to
> change.
>
> Also everything is dependent on this rather vague sentence. “- determine
> if the new interaction element cross-occurs with A and if so calculate the
> llr score”, which needs a lot more explanation. Whether to use Mahout
> in-memory objects or reimplement some in high speed data structures is a
> big question.
>
> The good thing I noticed in writing this is that model update and real
> time can be arbitrarily far apart, that the system degrades gracefully. So
> during high load it may fall behind but as long as user behavior is
> up-to-date and persisted (it will be) we are still in pretty good shape.
>
>
> On Mar 26, 2017, at 6:26 PM, Ted Dunning  wrote:
>
>
> I think that this analysis omits the fact that one user interaction causes
> many cooccurrences to change.
>
> This becomes feasible if you include the effect of down-sampling, but that
> has to be in the algorithm.
>
>
> From: Pat Ferrel 
> Sent: Saturday, March 25, 2017 12:01:00 PM
> To: Trevor Grant; user@mahout.apache.org
> Cc: Ted Dunning; s...@apache.org
> Subject: Lambda and Kappa CCO
>
> This is an overview and proposal for turning the multi-modal Correlated
> Cross-Occurrence (CCO) recommender from Lambda-style into an online
> streaming incrementally updated Kappa-style learner.
>
> # The CCO Recommender: Lambda-style
>
> We have largely solved the problems of calculating the multi-modal
> Correlated Cross-Occurrence models and serving recommendations in real time
> from real time user behavior. The model sits in Lucene (Elasticsearch or
> Solr) in a scalable way, and the typical query that produces personalized
> recommendations from real time user behavior completes with 25ms
> latency.
>
> # CCO Algorithm
>
> A = rows are users, columns are items they have “converted” on (purchase,
> read, watch). A represents the conversion event—the interaction that you
> want to recommend.
> B = rows are users, columns are items that the user has shown some
> preference for, but not necessarily the same items as A. B represents a
> different interaction than A. B might be a preference for some category,
> brand, genre, or just a detailed item page view, or all of these in B, C, D,
> etc.
> h_a = a particular user’s history of A type interactions, a vector of
> items that our user converted on.
> h_b = a particular user’s history of B type interactions, a vector of
> items that our user had B type interactions with.
>
> CCO says:
>
> [A’A]h_a + [A’B]h_b + [A’C]h_c = r; where r is the weighted items from A
> that represent personalized recommendations for our particular user.
>
> The innovation here is that A, B, C, … represent multi-modal data.
> Interactions of all types and on item-sets of arbitrary types. In other
> words we can look at virtually any action or possible indicator of user
> preference or taste. We strengthen the above raw cross-occurrence and
> cooccurrence formula by performing:
>
> [llr(A’A)]h_a + [llr(A’B)]h_b + … = r adding llr (log-likelihood ratio)
> correlation scoring to filter out coincidental cross-occurrences.
>
> The model becomes [llr(A’A)], [llr(A’B)], …; each has items from A in rows
> and items from A, B, … in columns. This sits in Lucene as one document per
> item in A, with a field for each of the A, B, C items whose user interactions
> most strongly correlate to the conversion event on the row item. Put
> another way, the model is the items from A, B, C… that have the most similar
> user interactions.
>
> To calculate r we need to find the most similar items in the model to the
> history or beh
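
The CCO formula quoted above, [llr(A’A)]h_a + [llr(A’B)]h_b = r, can be
sketched in a few lines of plain Python. This is only a minimal illustration,
not Mahout's implementation: the function names (`llr`, `cross_occurrence`,
`llr_model`, `recommend`), the toy matrices, and the threshold are all
assumptions made for the example, and it presumes binary interaction matrices
held as in-memory lists rather than DRMs. The score used is the standard
log-likelihood ratio (G^2) on a 2x2 contingency table; the extra check that
keeps only pairs co-occurring more often than expected is likewise an
assumption of this sketch.

```python
from math import log


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def cross_occurrence(A, B):
    """A'B: entry [i][j] counts users who have both A-item i and B-item j."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(A, B))
             for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def llr_model(A, B, threshold=0.0):
    """llr(A'B) for binary (0/1) user-by-item matrices over the same users."""
    n_users = len(A)
    a_tot = [sum(col) for col in zip(*A)]  # number of users per A-item
    b_tot = [sum(col) for col in zip(*B)]  # number of users per B-item
    model = []
    for i, row in enumerate(cross_occurrence(A, B)):
        scores = []
        for j, k11 in enumerate(row):
            k12 = a_tot[i] - k11             # A-item i without B-item j
            k21 = b_tot[j] - k11             # B-item j without A-item i
            k22 = n_users - k11 - k12 - k21  # neither item
            # Assumption of this sketch: keep only pairs that co-occur more
            # often than independence would predict, since the LLR statistic
            # is also large for pairs that co-occur less than expected.
            if k11 * n_users <= a_tot[i] * b_tot[j]:
                scores.append(0.0)
                continue
            s = llr(k11, k12, k21, k22)
            scores.append(s if s >= threshold else 0.0)
        model.append(scores)
    return model


def recommend(models_and_histories):
    """r = sum over interaction types of (model matrix) . (history vector)."""
    n_items = len(models_and_histories[0][0])
    r = [0.0] * n_items
    for model, h in models_and_histories:
        for i, row in enumerate(model):
            r[i] += sum(w * x for w, x in zip(row, h))
    return r


if __name__ == "__main__":
    # Hypothetical toy data: 4 users, 3 A-items (purchases), 2 B-items (views).
    A = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
    B = [[1, 0], [1, 0], [0, 1], [0, 1]]
    h_a, h_b = [1, 0, 0], [1, 0]  # one user's A-type and B-type history
    print(recommend([(llr_model(A, A), h_a), (llr_model(A, B), h_b)]))
```

The sketch keeps the diagonal of A’A (an item co-occurring with itself), so an
item already in the user's history also receives a score; a real recommender
would presumably exclude or filter those. Each extra interaction type (C, D,
...) is just another `(llr_model(A, C), h_c)` pair passed to `recommend`.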

Re: Marketing

2017-04-04 Thread Trevor Grant
Re name changes- definitely not in love with the idea.

In my experience there are two main segments of data scientists when
opening a conversation about Apache Mahout:
1) "We don't use map reduce"
2) "I've never heard of it"
2a) "I don't understand math, I just blindly fire 'machine learning'
algorithms at things"
2b) "That sounds amazing!"

For segment 2, a name change will do no good.  For 2a, that was me being a
bit hyperbolic, but to some extent 'pre-canned' algorithms and docs will
help.

Users in segment 2b, who then go searching and trying to learn Mahout,
end up reading all the old slide decks/Stack Overflow answers/books, and then
end up in segment 1.

In my mind, therein lies the crux of the problem.

So THAT is who we're really trying to intercept.  A name change would be a
lazy but effective way to do it. Calling it Mahout-Samsara sort of hit that
point.  The old website, with a mishmash of old and new side by side, is
certainly not helping.

Framing it like that, the 'user journey' in my mind is something like a
Google search / trying code and finding things on Stack Overflow, visiting
the website, and finding 'getting started' tutorials.

So the crux isn't the name so much as it is shepherding users on their
journey.  So if not a name change (which I'm in no way convinced is necessary
or prudent, but will leave it open for someone to counterpoint), I think
the following:

1. More blogs / docs / "soft content" on Mahout- per previous post I am
personally working on getting a batch edited and looking right.
2. Website reboot with a re-organization that emphasizes 'new' stuff and
pushes old stuff down below the fold.
3. 1 and 2 possibly both happen together (e.g. tutorials and soft content
in support of new page- though I'd rather get soft content published asap,
e.g. don't delay content for website overhaul)

I also agree that case studies and user stories are an awesome idea, but I
think that is how people hear about Mahout (are made aware), and then
proceed as I stated above. E.g. case studies and stories are an important
part of how we 'fill the funnel' with people who are interested in
Mahout, but we don't want to lose them on the path from 'awareness' to
'user'.

Actual Steps:

1. Again, I am locked and loaded on a batch of content.
Vacations/conferences have jammed me up; I will start trying to post soon
(can always spell check later ;) ). More is better, so if you have
something, please push it.
2. Website: we desperately need someone (or some people) with the expertise
and bandwidth to make this happen (design and implementation).
3. This ties into the Jekyll integration, which would allow contributors
(instead of only full-blown committers) to help write content for the website.
Really any git-based integration would do; however, a number of projects (Flink,
Zeppelin, etc.) have used Jekyll, so I suppose there is some logic to their
rationale.
4. Logo: I want to keep the discussion open on this.  I think we are leaning
towards 'keep the name Mahout' but are still open to the idea of a logo reboot.
It makes sense we wouldn't change the logo while we're still undecided on
the name, so if someone has a strong opinion on changing the name please
speak up; otherwise let's keep kicking this around.

Just some thoughts I've had after stepping away and coming back to the
issue with a fresh set of eyes, and my .02

tg




On Wed, Mar 29, 2017 at 11:08 PM, Andrew Evans  wrote:

> The issue could be competition better grounded in Spark like ND4J and the
> increased popularity of Python. Name changes are really difficult. If you
> think that you have improved over your recent iteration  and moved to a
> more competitive platform, then it would be a good idea. Otherwise, try to
> wait and build credibility. At that point, it may even be a good idea to
> keep the old platform and move people to the newly named 'better' platform
> without the risk of losing respect.
>
>
>
> On Wed, Mar 29, 2017 at 5:03 PM, Isabel Drost-Fromm 
> wrote:
>
> > That is an awesome second interpretation.
> >
> > Having voted on the original name I'm 100% biased so take my opinion with
> > a huge grain of salt: on the one hand I think name changes are over rated
> > (anyone remember ethereal?), on the other hand IMHO Mahout is a fairly
> > strong brand representing machine learning at scale.
> >
> > Maybe a combination of any of a new logo, design, documentation, release
> > that drops the zero in "0.x.y", a press release f

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-29 Thread Trevor Grant
This sounds awesome!  If you can get the algorithm working I would be more
than happy to help integrate it into the Algorithms Framework (so other
people could use it too).




On Wed, Mar 29, 2017 at 1:41 PM, Adi Haviv  wrote:

> I wish I could. i wasn't able to find any solution (with mahout or any
> other) that can do kmeans on over 10M sparse vectors.
>
> happy to connect and collaborate on a solution if you like. please contact
> me on a private email (or on linkedin -Adi Haviv).
>
> Thanks,
> Adi
>
> On Wed, Mar 29, 2017 at 11:41 AM, KHATWANI PARTH BHARAT <
> h2016...@pilani.bits-pilani.ac.in> wrote:
>
> > No, I am trying to write k-means from scratch using the Mahout DSL's
> > Distributed Row Matrix.
> > I am not sure how to proceed. Can you help me with that?
> >
> >
> > On Wed, Mar 29, 2017 at 9:04 PM, Adi Haviv  wrote:
> >
> > > Is it working? I never got any of the mahout clustering to work on eBay's
> > > data.
> > >
> > > On Mar 29, 2017 11:30 AM, "KHATWANI PARTH BHARAT" <
> > > h2016...@pilani.bits-pilani.ac.in> wrote:
> > >
> > > > Sir,
> > > > I am trying to write the k-means clustering algorithm using Mahout Samsara,
> > > > but I am a bit confused about how to leverage the Distributed Row Matrix
> > > > for it. Can anybody help me with this?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Thanks
> > > > Parth Khatwani
> > > >
> > >
> >
>
>
>
> --
> Adi Haviv.
>


Re: [VOTE] Apache Mahout 0.13.0 Release Candidate

2017-03-27 Thread Trevor Grant
All look good to me.


On Mon, Mar 27, 2017 at 2:42 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Hashes and sigs look good to me; please someone else confirm.
>
> On Mon, Mar 27, 2017 at 10:40 AM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
> > This is the vote for release 0.13.0 of Apache Mahout.
> >
> > The vote will be going for at least 72 hours and will be closed on
> > Thursday, March 30th, 2017 or once there are at least 3 PMC +1 binding
> > votes (whichever occurs earlier).  Please download, test and vote with
> >
> > [ ] +1, accept RC as the official 0.13.0 release of Apache Mahout
> > [ ] +0, I don't care either way,
> > [ ] -1, do not accept RC as the official 0.13.0 release of Apache Mahout,
> > because...
> >
> > Maven staging repo:
> >
> > https://repository.apache.org/content/repositories/orgapachemahout-1039/org/apache/mahout/apache-mahout-distribution/0.13.0
> >
> > The git tag to be voted upon is mahout-0.13.0.
> >
>


Re: Samsara's learning curve

2017-03-27 Thread Trevor Grant
I tend to agree with D.

For example, I set out to do the 'Eigenfaces problem' last year, and wrote
a blog on it.  It ended up being about 4 lines of Samsara code (+ imports);
the "hardest" part was loading images into vectors, and then vectors back
into images (wasn't awful, but I was new to Scala).  In addition to the
modest marketing and a lack of introductory tutorials, there is the fact that
to really use Mahout-Samsara in the first place you need to have a fairly good
grasp of linear algebra, which gives it significantly less mass appeal than, say,
mllib/sklearn/etc. Your
I-just-got-my-data-science-certificate-from-coursera data scientists simply
aren't equipped to use Mahout.  Your advanced-R-type data scientists can
use it, but unless they have a problem that is too big for a single machine,
they have no motivation to use it (this may change with native solvers, more
algorithms, etc.), and even given motivation the question then becomes: learn
Mahout OR come up with a clever trick for being able to stay on a single
machine.

But yeah, a fairly easy and pleasant framework.  If you have the proper
motivation, there is simply nothing else like it.

tg


On Mon, Mar 27, 2017 at 12:32 PM, Dmitriy Lyubimov 
wrote:

> I believe writing in the DSL is simple enough, especially if you have some
> familiarity with Scala on top of R (or, in my case, R on top of Scala
> perhaps:). I've implemented about a couple dozen customized algorithms that
> used distributed Samsara algebra at least to some degree, and I think I can
> reliably attest none of them ever exceeded 100 lines or so, and that it
> significantly reduced my time dedicated to writing algebra on top of Spark
> and some other backends I use under proprietary settings. I am now mostly
> doing non-algebraic improvements because writing algebra is easy.
>
> The most difficult part however, at least for me, and as you can see as you
> go along with the book, was not the peculiarities of R-like bindings, but
> the algorithm reformulations. Traditional "in-memory" algorithms do not
> work on shared-nothing backends; even though you could program them, they
> simply will not perform.
>
> The main reasons some of the traditional algorithms do not work at scale
> are because they either require random memory access, or (more often) are
> simply super-linear w.r.t. input size, so as one scales  infrastructure at
> linear cost, one would still incur less than expected increment in
> performance (if any at all, at some point) per unit of input.
>
> Hence, usually some mathematically, or should i say, statistically
> motivated tricks are still required. As the book describes, linearly or
> sub-linearly scalable sketches, random projections, dimensionality
> reductions etc. etc. are required to alleviate scalability issues of the
> super-linear algorithms.
>
> To your question, I have had a couple of people do some pieces on various
> projects with Samsara before, but they had me as a coworker. I am
> personally not aware of any outside developers beyond people already on the
> project @ Apache and my co-workers, although in all honesty I feel it has
> to do more with the maturity and modest marketing of the public version of
> Samsara than necessarily the difficulty of adoption.
>
> -d
>
>
>
> On Sun, Mar 26, 2017 at 9:15 AM, Gustavo Frederico <
> gustavo.freder...@thinkwrap.com> wrote:
>
> > I read Lyubimov's and Palumbo's book on Mahout Samsara up to chapter 4
> > ( Distributed Algebra ). I have some familiarity with R, I did study
> > linear algebra and calculus in undergrad. In my master's I studied
> > statistical pattern recognition and researched a number of ML
> > algorithms in my thesis - spending more time on SVMs. This is to ask:
> > what is the learning curve of Samsara? How complicated is to work with
> > distributed algebra to create an algorithm? Can someone share an
> > example of how long she/he took to go from algorithm conception to
> > implementation?
> >
> > Thanks
> >
> > Gustavo
> >
>


Re: Marketing

2017-03-24 Thread Trevor Grant
To date we have referred to the GPU/CPU/CUDA as 'pluggable native-solvers'.
'Pluggable backends' are Spark, Flink, H2O, whatever.

With the advent of both, I could see the confusion and we may want to
rethink the naming as part of this too.


On Fri, Mar 24, 2017 at 11:15 AM, Nikolai Sakharnykh  wrote:

> I guess we might have different interpretation of a backend. So just to
> avoid any confusion in my world (coming from accelerating applications on
> GPUs) the backends would be CUDA, OpenCL, OpenMP and JVM. I think it
> definitely makes sense to advertise GPU support on the front page, along
> with JVM and/or OpenMP for CPUs.
>
> -Original Message-
> From: Suneel Marthi [mailto:smar...@apache.org]
> Sent: Friday, March 24, 2017 11:13 AM
> To: mahout 
> Cc: user@mahout.apache.org
> Subject: Re: Marketing
>
> On Fri, Mar 24, 2017 at 12:09 PM, Dmitriy Lyubimov 
> wrote:
>
> > On Fri, Mar 24, 2017 at 8:27 AM, Pat Ferrel 
> wrote:
> >
> > > > The multiple backend support is such a waste of time IMO. The DSL
> > > > and GPU support is super important and should be made even more
> > > > distributed. The current (as I understand it) single threaded GPU
> > > > per VM is only the first step in what will make Mahout important for a
> > > > long time to come.
> > >
> >
> > > This seems a bit self-contradictory. Multiple backends is the only
> > > thing that remedies it for me. By that I mean both distributed (i/o)
> > > backends and the in-memory.
> > >
> > > Good CPU and GPU plugins will be important, as well as communication
> > > layer alternatives to Spark. Spark is not working out well for
> > > interconnected problems, and H2O and Flink, well, I'd just forget
> > > about them. I'd certainly drop H2O for now.
>
>
> FWIW, the H2O backend is more stable than the F%*&k backend, best to drop
> both.
>
>
>
> > But ability to plug in new communication backend primitives seems to
> > be critical in my experience, as well as variety of cpu/gpu chipset
> > support. (I do use both in-memory and i/o custom backends that IMO are
> > a must).
> >
> > In that sense, it is super-important that custom backends are easy to
> > plug (even if you are absolutely legitimately dissatisfied with the
> > existing ones).
> >
> >
> > > > Think of Mahout in 5 years what will be important? H2O? Hadoop Mapreduce?
> > > > Flink? I’ll stake my dollar on no. GPUs yes and up the stakes.
> > > > Streaming online learning (kappa style) yes but not sure Mahout is
> > > > made for this right now.
> > >
> > > Or if we are talking about web site revamp +1, I’d be happy to
> > > upgrade my section and have only held off waiting to see a redesign
> > > or moving to Jekyll.
> > >
> > > > As to a new mascot, ok, but the old one fits the name. We tried sub-naming
> > > > Mahout-Samsara to symbolize the changing nature and rebirth of the project,
> > > > maybe we should drop the name Mahout altogether. the name Mahout, like the
> > > > blue man, is not relevant to the project anymore and maybe renaming, is
> > > > good for marketing.
> > >
> > > > On Mar 24, 2017, at 7:37 AM, Nikolai Sakharnykh wrote:
> > > >
> > > > Agree that the website feels outdated. I would add Samsara code example on
> > > > the front page, list of key algorithms implemented, supported backends,
> > > > github & download links, and cut down the news part especially towards the
> > > > end with flat release numbers and dates. Also probably reorganize the tabs.
> > > >
> > > > If we go with honey badger as a mascot do we have any ideas on the logo
> > > > itself? Honey badger biting/eating a snake?)
> > >
> > > -Original Message-
> > > From: Trevor Grant [mailto:trevor.d.gr...@gmail.com]
> > > Sent: Thursday, March 23, 2017 8:53 PM
> > > To: d...@mahout.apache.org
> > > Cc: user@mahout.apache.org
> > > Subject: Re: Marketing
> > >
> > > A student once asked his teacher, "Master, what is enlightenment?"
> > >
> > > The master replied, "When hungry, eat. When tired, sleep."
> > >
> > > Sounds like the honey badger to me...

Re: Marketing

2017-03-24 Thread Trevor Grant
I don't think the backends we have now off the shelf are particularly
exciting, but the fact you CAN plug different ones back in is the value
prop (and a big one that we need to 'sell' more). The difference is subtle
but since this is the marketing thread also worth bringing up. Basically to
your point in paragraph two.   Flink, H2O, Spark, they come and go- with
Mahout your algorithms keep porting (more on this shortly).

Kappa arch- yes, we need to start thinking at least how that is going to
play into mahout.  I agree with you 100% on this.

It sounds like we're getting quorum at least on website revamp. Awesome.  I
think we should solve the other issues (name, logo, etc.) but at least
start looking for community members that have the time and skill / trying
to recruit people into the project who do.  In my mind, the website
revamp + Jekyll should happen simultaneously.  Let's just build the new site
in Jekyll and then when it's ready we'll switch techs + launch in one shot.
If you have some good info, it might be prudent to just post it now, bc I
don't know what the timeline for all this will look like.

Naming: can a project simply change its name? Is that even an option?  If
it is, it might be a way to go, but do a transition, e.g. introduce
-Mahout but slowly drop the Mahout part, until finally it's just
. Or do it quick like a bandaid on a major release (e.g. 0.14.0).  Or
do we just call it Apache Samsara, which, going back to the modular
backends, is kind of elegantly appropriate in that the backends have
lifecycles, but your "algorithmic soul" is persistently reincarnated.
There is a 'story' there of: look, we had the first big data ML package,
and when the backend it was built for began to fade, so did our project.
We were the first to have to stare down the barrel of the pains of backends
coming and going, so the whole point of this project was to prevent that
from happening again.

Re: Nikolai-
100% agree with the structure changes proposed.  Let's find someone who is
willing and able to take point on that project, then start brainstorming
layout, but this is a good start. Or even better, let's start a JIRA ticket
to discuss the structure of the new site (since again, I think we agree that DOES
need to happen, and we're really just waiting on some technicalities of
how/when/who/what).

I know I brought up that honey badgers eat snakes (Python). The one big
danger in this: if we ever do decide to implement Python bindings, then all
of a sudden things get awkward. (I'm imagining going to a Python meetup
to talk about new Mahout-Samsara- bindings, and someone asks,
"why is he eating a snake?", "oh well, because at the time we thought Python
was trash and we were very arrogant").






On Fri, Mar 24, 2017 at 10:27 AM, Pat Ferrel  wrote:

> The multiple backend support is such a waste of time IMO. The DSL and GPU
> support is super important and should be made even more distributed. The
> current (as I understand it) single threaded GPU per VM is only the first
> step in what will make Mahout important for a long time to come.
>
> Think of Mahout in 5 years what will be important? H2O? Hadoop Mapreduce?
> Flink? I’ll stake my dollar on no. GPUs yes and up the stakes. Streaming
> online learning (kappa style) yes but not sure Mahout is made for this
> right now.
>
> Or if we are talking about web site revamp +1, I’d be happy to upgrade my
> section and have only held off waiting to see a redesign or moving to
> Jekyll.
>
> As to a new mascot, ok, but the old one fits the name. We tried sub-naming
> Mahout-Samsara to symbolize the changing nature and rebirth of the project,
> maybe we should drop the name Mahout altogether. the name Mahout, like the
> blue man, is not relevant to the project anymore and maybe renaming, is
> good for marketing.
>
> On Mar 24, 2017, at 7:37 AM, Nikolai Sakharnykh 
> wrote:
>
> Agree that the website feels outdated. I would add Samsara code example on
> the front page, list of key algorithms implemented, supported backends,
> github & download links, and cut down the news part especially towards the
> end with flat release numbers and dates. Also probably reorganize the tabs.
>
> If we go with honey badger as a mascot do we have any ideas on the logo
> itself? Honey badger biting/eating a snake?)
>
> -Original Message-
> From: Trevor Grant [mailto:trevor.d.gr...@gmail.com]
> Sent: Thursday, March 23, 2017 8:53 PM
> To: d...@mahout.apache.org
> Cc: user@mahout.apache.org
> Subject: Re: Marketing
>
> A student once asked his teacher, "Master, what is en

Re: Marketing

2017-03-23 Thread Trevor Grant
A student once asked his teacher, "Master, what is enlightenment?"

The master replied, "When hungry, eat. When tired, sleep."

Sounds like the honey badger to me...



On Thu, Mar 23, 2017 at 5:43 PM, Pat Ferrel  wrote:

> The little blue man (the mahout) was reborn (samsara) as a honey-badger?
> He must be close indeed to reaching true enlightenment, or is that Buddhism?
>
>
> On Mar 23, 2017, at 12:42 PM, Andrew Palumbo  wrote:
>
> +1 on revamp.
>
>
>
> Sent from my Verizon Wireless 4G LTE smartphone
>
>
>  Original message 
> From: Trevor Grant 
> Date: 03/23/2017 12:36 PM (GMT-08:00)
> To: user@mahout.apache.org, d...@mahout.apache.org
> Subject: Marketing
>
> Hey user and dev,
>
> With 0.13.0 the Apache Mahout project has added some significant updates.
>
> The website is starting to feel 'dated' I think it could use a reboot.
>
> The blue person riding the elephant has less signifigance in
> Mahout-Samsara's modular backends.
>
> Would like to open the floor to discussion on website reboot (and who might
> be willing to take on such a project), as well as new mascot.
>
> To kick off- in an offline talk there was the idea of
> A honey badger (bc honey-badger don't care, just like mahout don't care
> what back end or native solvers you are using, and also bc a cobra bites a
> honey badger and he takes a little nap then wakes up and finishes eating
> the cobra. honey badger eats snakes, and does all the work while the other
> animals pick up the scraps.
> see this short documentary on the honey badger:
> https://www.youtube.com/watch?v=4r7wHMg5Yjg )
> ^^audio not safe for work
>
> Con: its almost tooo jokey.
>
> Other idea: are coy-wolfs.
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>


Marketing

2017-03-23 Thread Trevor Grant
Hey user and dev,

With 0.13.0 the Apache Mahout project has added some significant updates.

The website is starting to feel 'dated'; I think it could use a reboot.

The blue person riding the elephant has less significance given
Mahout-Samsara's modular backends.

Would like to open the floor to discussion on website reboot (and who might
be willing to take on such a project), as well as new mascot.

To kick off- in an offline talk there was the idea of
a honey badger (bc honey badger don't care, just like Mahout don't care
what back end or native solvers you are using, and also bc a cobra bites a
honey badger and he takes a little nap, then wakes up and finishes eating
the cobra. The honey badger eats snakes, and does all the work while the
other animals pick up the scraps.
See this short documentary on the honey badger:
https://www.youtube.com/watch?v=4r7wHMg5Yjg )
^^audio not safe for work

Con: it's almost too jokey.

Other idea: coywolves.



Re: [VOTE] Apache Mahout 0.13.0 Release Candidate

2017-03-03 Thread Trevor Grant
+1

Sigs are good




On Fri, Mar 3, 2017 at 12:17 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Confirmed hashes and sigs, tested operations in the shell in src and bin
> artifacts. Would like someone else to check sigs too.
>
> +1 (binding)
>
> On Wed, Mar 1, 2017 at 9:39 PM, Andrew Musselman <
> andrew.mussel...@gmail.com
> > wrote:
>
> > New RC for 0.13.0 release out; please try out the new artifacts at
> > https://repository.apache.org/content/repositories/orgapachemahout-1034
> >
> > The vote will be going for at least 72 hours and will be closed on
> Friday,
> > March 3rd, 2017 or once there are at least 3 PMC +1 binding votes
> (whichever
> > occurs earlier).  Please download, test and vote with
> >
> > [ ] +1, accept RC as the official 0.13.0 release of Apache Mahout
> > [ ] +0, I don't care either way,
> > [ ] -1, do not accept RC as the official 0.13.0 release of Apache Mahout,
> > because...
> >
> > The git tag to be voted upon is mahout-0.13.0
> >
> > On Wed, Mar 1, 2017 at 11:45 AM, Andrew Palumbo 
> > wrote:
> >
> >> I will verify keys tonight.
> >>
> >>
> >>
> >> Sent from my Verizon Wireless 4G LTE smartphone
> >>
> >>
> >>  Original message 
> >> From: Andrew Musselman 
> >> Date: 03/01/2017 10:20 AM (GMT-08:00)
> >> To: user@mahout.apache.org, d...@mahout.apache.org
> >> Subject: Re: [VOTE] Apache Mahout 0.13.0 Release Candidate
> >>
> >> Nevermind, that was before building the src distro.
> >>
> >> Shell works fine with src and binary distros.
> >>
> >> On Wed, Mar 1, 2017 at 9:39 AM, Andrew Musselman <
> >> andrew.mussel...@gmail.com
> >> > wrote:
> >>
> >> > I'm getting this when starting the spark-shell on a Mac:
> >> >
> >> > Loading /Users/andrew.musselman/Downloads/mahout-testing/
> >> > apache-mahout-distribution-0.13.0/bin/load-shell.scala...
> >> > :36: error: object mahout is not a member of package
> org.apache
> >> >import org.apache.mahout.math._
> >> >  ^
> >> > :19: error: object mahout is not a member of package
> org.apache
> >> >import org.apache.mahout.math.scalabindings._
> >> >  ^
> >> > :19: error: object mahout is not a member of package
> org.apache
> >> >import org.apache.mahout.math.drm._
> >> >  ^
> >> > :19: error: object mahout is not a member of package
> org.apache
> >> >import org.apache.mahout.math.scalabindings.RLikeOps._
> >> >  ^
> >> > :19: error: object mahout is not a member of package
> org.apache
> >> >import org.apache.mahout.math.drm.RLikeDrmOps._
> >> >  ^
> >> > :19: error: object mahout is not a member of package
> org.apache
> >> >import org.apache.mahout.sparkbindings._
> >> >  ^
> >> > :21: error: object mahout is not a member of package
> org.apache
> >> >implicit val sdc: org.apache.mahout.sparkbinding
> >> s.SparkDistributedContext
> >> > = sc2sdc(sc)
> >> > ^
> >> > :21: error: not found: value sc2sdc
> >> >implicit val sdc: org.apache.mahout.sparkbinding
> >> s.SparkDistributedContext
> >> > = sc2sdc(sc)
> >> >
> >> > On Wed, Mar 1, 2017 at 9:21 AM, Andrew Musselman 
> >> wrote:
> >> >
> >> >> I've confirmed hashes and sigs; if someone other than me could
> confirm
> >> >> all three sigs it'd be good, e.g.:
> >> >>
> >> >> `gpg --verify apache-mahout-distribution-0.13.0-src.tar.gz.asc`
> >> >> `gpg --verify apache-mahout-distribution-0.13.0.pom.asc`
> >> >> `gpg --verify apache-mahout-distribution-0.13.0.tar.gz.asc`
> >> >>
> >> >> I'll vote after running some tests.
> >> >>
> >> >> On Tue, Feb 28, 2017 at 10:58 PM, Andrew Musselman 
> >> >> wrote:
> >> >>
> >> >>> This is the vote for release 0.13.0 of Apache Mahout.
> >> >>>
> >> >>> The vote will be going for at least 72 hours and will be closed on
> >> >>> Friday,
> >> >>> March 3rd, 2017 or once there are at least 3 PMC +1 binding votes
> >> (whichever
> >> >>> occurs earlier).  Please download, test and vote with
> >> >>>
> >> >>> [ ] +1, accept RC as the official 0.13.0 release of Apache Mahout
> >> >>> [ ] +0, I don't care either way,
> >> >>> [ ] -1, do not accept RC as the official 0.13.0 release of Apache
> >> Mahout
> >> >>> ,
> >> >>> because...
> >> >>>
> >> >>>
> >> >>> Maven staging repo:
> >> >>>
> >> >>> https://repository.apache.org/content/repositories/orgapache
> >> >>> mahout-1033/org/apache/mahout/apache-mahout-distribution/0.13.0/
> >> >>>
> >> >>> The git tag to be voted upon is mahout-0.13.0
> >> >>>
> >> >>
> >> >>
> >> >
> >>
> >
> >
>


Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-07 Thread Trevor Grant
Answers inline below.




On Tue, Feb 7, 2017 at 2:31 PM, Saikat Kanjilal  wrote:

> @Trevor Grant
>
> The landscape in machine learning is getting more and more diluted with
> lots of tools, here's a question, given that some folks are taking R and
> connecting it to spark and map reduce to make the R algorithms work at
> scale (https://msdn.microsoft.com/en-us/microsoft-r/scaler/scaler) what
> would be the additional value added in porting the R code using the
> algorithms/samsara framework, to me the MRS efforts and the approach you
> are proposing are 2 parallel tracks,


Correct, one is a commercial product by Microsoft- the other is a
business-friendly open source Apache Software Foundation Project.


> as far as the barriers to entry to contributing I think its largely due to
> the complexity of the codebase and the lack of familiarity with Samsara,

This is what we hope to overcome with the Algorithms framework and perhaps
more documentation.

> I'd love to help create some good docs/tutorials on both the algorithms
> framework and samsara when and where it makes sense,

Would love the help- will be easier once we get migrated to Jekyll. (More
motivation to do this).


> however I feel like it'd be useful to really identify the use cases where
> using the algorithms/samsara approach has clear wins versus MRS

When you don't want to pay Microsoft to use your work in production.


> with spark or spark by itself or python/scikit-learn,

Out of scope for the Mahout project, but I do have a talk forthcoming that
will address this- stay tuned.


> I've found that in general people dont really need custom algorithms in
> datascience , they typically are answering some very basic classification
> or clustering question and can use linear/logistic regression or a variant
> of kmeans.

That has not been my experience.  In fact quite the opposite- most people
need more depth to their algorithms, and many other big data ML packages
imply they have more depth than basic linear/logistic regression + kmeans,
but in fact that is all there is.  Not to say one is right or wrong- the
data scientists who are happy with simple tools can find them in
SparkML/FlinkML; those who need more advanced tools may turn to Mahout.


> I'd also like to help dig into some use cases with Samsara and put those
> use cases maybe in the examples section.
>
Tutorials would be great- q.e.d.- more documentation would be helpful.


>
> Thoughts?
>
> ScaleR Functions - msdn.microsoft.com<https://msdn.microsoft.com/en-us/
> microsoft-r/scaler/scaler>
> msdn.microsoft.com
> The RevoScaleR package provides a set of over one hundred portable,
> scalable, and distributable data analysis functions. This topic presents a
> curated list ...
>
>
>
>
> 
> From: Trevor Grant 
> Sent: Tuesday, February 7, 2017 8:47 AM
> To: user@mahout.apache.org; isa...@apache.org
> Subject: Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation
>
> The idea that Andy briefly touched on, is that the Algorithm Framework
> (hopefully) paves the way for R/CRAN like user contribution.
>
> Increased contribution was a goal I had certainly hoped for.  I have begun
> promoting the idea at Meetups.  There hasn't been a concerted effort to
> push the idea, however it is a tagline / call to action I am planning on
> pushing at talks and conferences this spring. Thank you for raising the
> issue on the mailing list as well.
>
> Using the Samsara framework and "Algorithms" framework, it is hoped the the
> barrier to entry for new contributors will be very low, and that they can
> introduce new algorithms or port them from R. Other 'Big Data' Machine
> Learning frameworks suffer because they are not easily extensible.
>
> The algorithms framework makes it (more) clear where a new algorithm would
> go, and in general how it should behave. E.g. This is a Regressor, ok
> probably goes in the regressor package- it needs a fit method that takes a
> DrmX and a DrmY, and a predict method that takes DrmX and returns
> DrmY_hat).  The algorithms framework also provides a consistent interface
> across algorithms and puts up "guard rails" to ensure common things are
> done in an efficient manner (e.g. Serializing just the model, not the
> fitter and additional unneeded things, thank you Dmitriy). The Samsara
> framework makes it easy to 'read' what the person is doing. This makes it
> easier to review PRs, encourages community review, and if (hopefully not,
> but in case it do

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-07 Thread Trevor Grant
The idea that Andy briefly touched on, is that the Algorithm Framework
(hopefully) paves the way for R/CRAN like user contribution.

Increased contribution was a goal I had certainly hoped for.  I have begun
promoting the idea at Meetups.  There hasn't been a concerted effort to
push the idea, however it is a tagline / call to action I am planning on
pushing at talks and conferences this spring. Thank you for raising the
issue on the mailing list as well.

Using the Samsara framework and "Algorithms" framework, it is hoped that the
barrier to entry for new contributors will be very low, and that they can
introduce new algorithms or port them from R. Other 'Big Data' Machine
Learning frameworks suffer because they are not easily extensible.

The algorithms framework makes it (more) clear where a new algorithm would
go, and in general how it should behave (e.g. this is a Regressor, so it
probably goes in the regressor package; it needs a fit method that takes a
DrmX and a DrmY, and a predict method that takes a DrmX and returns a
DrmY_hat).  The algorithms framework also provides a consistent interface
across algorithms and puts up "guard rails" to ensure common things are
done in an efficient manner (e.g. serializing just the model, not the
fitter and additional unneeded things, thank you Dmitriy). The Samsara
framework makes it easy to 'read' what the person is doing. This makes it
easier to review PRs, encourages community review, and if (hopefully not,
but in case it does happen) someone makes a so-called 'drive-by commit',
that is, commits an algorithm and is never heard from again, others can
easily understand and maintain the algorithm in the person's absence.
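
As a rough illustration of that fit/predict contract (this is not Mahout's
actual API): the sketch below uses plain Python lists as stand-ins for DRMs,
and the names RegressorFitter, RegressorModel, SimpleLinearFitter, to_json and
from_json are all hypothetical. It shows a fitter that returns a separate
model object, so that only the model's parameters need to be serialized.

```python
import json
from abc import ABC, abstractmethod


class RegressorModel(ABC):
    """A fitted model: holds only what is needed to predict."""

    @abstractmethod
    def predict(self, x_rows):
        """Take rows of features (stand-in for DrmX), return predictions."""


class RegressorFitter(ABC):
    """Holds the fitting logic; it is not serialized with the model."""

    @abstractmethod
    def fit(self, x_rows, y_values):
        """Take features and targets (DrmX, DrmY), return a RegressorModel."""


class SimpleLinearModel(RegressorModel):
    def __init__(self, intercept, slope):
        self.intercept = intercept
        self.slope = slope

    def predict(self, x_rows):
        # One prediction per input row, using the single feature column.
        return [self.intercept + self.slope * row[0] for row in x_rows]

    def to_json(self):
        # Serialize just the model parameters, nothing from the fitter.
        return json.dumps({"intercept": self.intercept, "slope": self.slope})

    @classmethod
    def from_json(cls, text):
        params = json.loads(text)
        return cls(params["intercept"], params["slope"])


class SimpleLinearFitter(RegressorFitter):
    def fit(self, x_rows, y_values):
        # Ordinary least squares for a single feature column.
        xs = [row[0] for row in x_rows]
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(y_values) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, y_values))
        slope = sxy / sxx
        return SimpleLinearModel(mean_y - slope * mean_x, slope)
```

A caller would fit once, e.g. `model = SimpleLinearFitter().fit(x, y)`, then
ship `model.to_json()` around and rebuild it with
`SimpleLinearModel.from_json(...)` wherever predictions are needed.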

There are a number of issues labeled as beginner in JIRA now, especially
with respect to the Algorithms package.

It would probably be good to include a lot of this information in a web
page either here https://mahout.apache.org/developers/how-to-contribute.html
or on a page that is linked to by that.

Which leads me into the last 'piece of the puzzle' I would like to have in
place before aggressively advertising this as a "new-contributor friendly"
project: migrating the CMS to Jekyll
https://issues.apache.org/jira/browse/MAHOUT-1933

The rationale for that is so when new algorithms are submitted, the PR will
include relevant documentation (as a convention) and that documentation can
be corrected / expanded as needed in a more non-committer friendly manner.








On Tue, Feb 7, 2017 at 4:30 AM, Isabel Drost  wrote:

> On Wed, Feb 01, 2017 at 03:32:24PM -0800, Dmitriy Lyubimov wrote:
> > Isabel, if i understand it correctly, you are asking whether it makes
> sense
> > add end2end scenarios based on Samsara to current codebase?
>
> Sorry for being fuzzy. The meta question that I'm trying to find an answer
> for
> is if there's something can/ should be done to increase the number of
> people
> that potentially could be assimilated and turned into committers one day.
> One
> specific idea I had on my mind was to make the project easier to use for
> beginners, one idea to get that accomplished I had was to focus on end to
> end
> implementations of popular use cases. (Sorry, fairly meta...)
>
>
> > The answer is, absolutely. Yes it does for both rather isolated issues
> > (like computing clusters) and end-2-end scenarios.
> >
> > The only problem with end 2 end scenarious is they often difficult to
> > demonstrate with batch-oriented coputational system only. That's what
> > prediction.io kind of picked on with COO, they included all of data
> > ingestion, computation and real time scoring queries.
> >
> > But yes, there's, absolutely, tons of value in that. Not everything fits
> > quite nicely, and not everything fits end-2-end (just like with R), but
> > some fairly significant pieces do fit to be written on top.
>
> Makes sense.
>
>
> > > Where do we start? ;)
> > >
> >
> > I would start with figuring a problem I want to solve AND I have a budget
> > to do it AND i can legally contribute on behalf of the IP owner.
>
> I guess given the meta explanation above - if increase in contributions
> was a
> goal one could also think about making potential areas of contribution
> explicit
> and highlight the value the project brings compared to other systems with a
> specific focus on samsara. That's another angle of me asking weird
> questions
> here.
>
>
> > Then we can think of whether it is a good fit (Samsara is mostly limited
> to
> > tensor based data only, just like Mapreduce DRM was/is). Some things may
> > not have a convenient algebraic formulation.
>
> +1
>
> Isabel
>
> --
> Sorry for any typos: Mail was typed in vim, written in mutt, via ssh (most
> likely involving some kind of mobile connection only.)
>


Fwd: IoT at the ASF -- ApacheCon and Project DOAPs [was: Does your project play in the IoT space?]

2017-02-04 Thread Trevor Grant
The first Apache IoT mini-con is happening this year at ApacheCon, Miami!!

http://us.apacheiot.org/

The following snippet from Roman Shaposhnik on the dev@community
list perfectly describes the spirit of the mini-con:

"The whole premise of the track will be "Not your gramps IoT" which means
that unlike IoT events that grew out of the embedded industry we're talking
a very holistic, system view on IoT. Our hope is that Apache IoT will be a
meeting place for next generation IoT 2.0 built by developers, for
developers under the Apache Way governance model.

ASF's breadth starts making a lot of sense when you consider what kind of
technology is needed to build an end-to-end user experience in IoT 2.0: you
start at the edge, you consider the gateways, go to a data center and end
up on a client mobile device. All technology providers are now realizing
that the key to success is allowing developers unprecedented ease of
management and deployment of their business logic all throughout these
layers. Just look at what Amazon is doing with Lambda on the edge
(Amazon's Greengrass)!

The good news is that at ASF we've got all the building blocks available to
us in various communities. So regardless of whether you're an Apache
Mynewt (incubating) developer working on the far fringes of the edge, or
you are an Apache Brooklyn developer automating microservices provisioning
or you're plumbing data streams with Kafka, NiFi or Geode or
you're analyzing that data with Hadoop or you're a Tomcat or httpd
guru facilitating the end-user experience -- we all have pieces to
contribute to the IoT 2.0 puzzle."

I apologize for the mass email (and if you got this multiple times)
however, the Call for Submissions closes February 11th.  All of the
communities copied have something of significance to contribute to IoT, and
the list is not exhaustive- please feel free to forward to any who might be
interested.

Thanks and see you in Miami!




Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Trevor Grant
Hello Isabel and Florent,

I'm currently working on a side-by-side demo of R / Python / SparkML(Mllib)
/ Mahout, but in very broad strokes here is how I would compare them:

R- Most statistical functionality.  Most flexibility.  Implement your own
algorithms- mathematically expressive language.  Worst performance- handles
only "small" data sets.  Language is 'math centric'. Easy to extend /
create new algos

Python (sklearn/scikit) - Some mathematical / statistical functionality,
more focused on machine learning. Machine learning library very
sophisticated though.  Much better performance than R, still only single
node. "small to medium" data sets. Language is 'programmer centric'.
Somewhat difficult to extend / create new algos

SparkML / Mllib - Very limited mathematical functionality (usually collects
to driver to do anything of substance).  Machine learning rudimentary
compared to sklearn, but still non-trivial and one of the best available.
Excellent performance, well suited to "big" data sets. Language is
'programmer centric'. Very difficult to extend / create new algos.

(FlinkML - Fits in same spot as SparkML, but significantly less developed)

Mahout - Good mathematical functionality.  Good performance relative to
underlying engine (possibly superior with MAHOUT-1885).  Language is 'math
centric'.  Well suited to "medium and big" data sets. Fairly easy to extend
/ create new algos (MAHOUT-1856)

I hope that provides a high level comparison.

Re use cases- the tool to use depends on the job at hand.
Highly advanced mathematical model, small dataset or sampling from full
dataset OK -> Use R
Machine learning on small to medium data set or sampling from full dataset
OK -> Use Python / sklearn
Less sophisticated machine learning on Large dataset -> SparkML
Custom mathematical/statistical model on medium to large data -> Mahout

^^ All of this is just my opinion.
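
The rule of thumb above can be written down as a small lookup. This is only
a sketch of the opinion stated in this message; the function name and the
category labels are made up for illustration, and combinations the list does
not mention return None.

```python
def suggest_tool(need, data_size, sampling_ok=False):
    """Map a job description to a tool, following the list above.

    need:        "advanced_math", "machine_learning", or "custom_model"
    data_size:   "small", "medium", or "large"
    sampling_ok: True if working on a sample of the full dataset is fine
    """
    fits_small = data_size == "small" or sampling_ok
    fits_small_to_medium = data_size in ("small", "medium") or sampling_ok

    if need == "advanced_math" and fits_small:
        return "R"
    if need == "machine_learning" and fits_small_to_medium:
        return "Python / sklearn"
    if need == "machine_learning" and data_size == "large":
        return "SparkML"
    if need == "custom_model" and data_size in ("medium", "large"):
        return "Mahout"
    # The list above does not cover this combination.
    return None
```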

Re: integration-

We're working on that too.  Recently MAHOUT-1896 added convenience methods
for interacting with MLLib type RDDs, and DataFrames
https://issues.apache.org/jira/browse/MAHOUT-1896

(No support yet for SparkML type dataframes, or spitting DRMs back out into
RDDs/DataFrames).

Finally, docs: There has been some talk for some time of migrating the
website from CMS to Jekyll, and it's something I strongly support.  The CMS
makes it difficult to keep up with documentation, and Jekyll would open up
documentation/website maintenance to contributors.



On Tue, Jan 31, 2017 at 5:31 AM, Florent Empis 
wrote:

> Hi,
>
> I am in the same spot as Isabel.
> Used to use/understand most of the «old» standalone mahout, now doing some
> data transformation with spark, but I am not sure where Samsara fits in the
> ecosystem.
> We also do quite a bit of computation in R.
> Basically we are willing to learn and support the project by for instance
> buying the books Rob mentioned, but a short doc with the outline Isabel
> describes would be great!
>
> Many thanks,
>
> Florent
>
>
> Le 31 janv. 2017 12:01, "Isabel Drost-Fromm"  a écrit :
>
>
> Hi,
>
> On Fri, Sep 16, 2016 at 11:36:03PM -0700, Andrew Musselman wrote:
> > and we're thinking about just how many pre-built algorithms we
> > should include in the library versus working on performance behind the
> > scenes.
>
> To pick this question up: I've been watching Mahout from a distance for
> quite
> some time. So from what limited background I have of Samsara I really like
> it's
> approach to be able to run on more than one execution engine.
>
> To give some advise to downstream users in the field - what would be your
> advise
> for people tasked with concrete use cases (stuff like fraud detection,
> anomaly
> detection, learning search ranking functions, building a recommender
> system)? Is
> that something that can still be done with Mahout? What would it take to
> get
> from raw data to finished system? Is there something we can do to help
> users get
> that accomplished? Is there even interest from users in such a use case
> based
> perspective? If so, would there be interest among the Mahout committers to
> help
> users publicly create docs/examples/modules to support these use cases?
>
>
> Isabel
>


Location of JARs

2016-06-01 Thread Trevor Grant
I'm trying to refactor the Mahout dependency from the pom.xml of the Spark
interpreter (adding Mahout integration to Zeppelin)

Assuming MAHOUT_HOME is available, I see that the jars in source build live
in a different place than the jars in the binary distribution.

I'm at the point where I'm trying to come up with a good place to pick up
the required jars while allowing for:
1. flexibility in Mahout versions
2. not writing a huge block of code designed to scan several conceivable
places throughout the file system.

One thought was to put the onus on the user to move the desired jars to a
local repo within the Zeppelin directory.

Wanted to open up to input from users and dev as I consider this.

Is documentation specifying which JARs need to be moved to a specific
directory, and the places you are likely to find them, too much to ask of
users?

Other approaches?

For background, Zeppelin starts a Spark Shell and we need to make sure all
of the required Mahout jars get loaded in the class path when Spark starts.
The question is where all of these JARs live, relative to MAHOUT_HOME.
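
One possible shape for the lookup, as a minimal sketch: take MAHOUT_HOME plus
a short, explicit list of sub-directories and glob for jars there, rather than
walking the file system. The function names, the `mahout-*.jar` pattern, and
the sub-directory names in the usage note are assumptions for illustration,
not the actual layout of either distribution.

```python
import glob
import os


def find_mahout_jars(mahout_home, candidate_subdirs, pattern="mahout-*.jar"):
    """Return jar paths found under mahout_home.

    candidate_subdirs is a short, ordered list of directories relative to
    mahout_home ("" means mahout_home itself). If the same jar file name
    shows up in more than one place, the first location wins.
    """
    found = {}
    for subdir in candidate_subdirs:
        matches = sorted(glob.glob(os.path.join(mahout_home, subdir, pattern)))
        for path in matches:
            # Keep only the first location seen for a given jar file name.
            found.setdefault(os.path.basename(path), path)
    return list(found.values())


def build_classpath(jar_paths):
    """Join jar paths with the platform's class path separator."""
    return os.pathsep.join(jar_paths)
```

With a hypothetical layout, usage would look like
`build_classpath(find_mahout_jars(os.environ["MAHOUT_HOME"], ["", "lib"]))`.
Keeping the directory list as an argument means a source build and a binary
distribution can each pass their own list without changing the code.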

Thanks for any feedback,
tg





