[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111856#comment-14111856 ]

Ted Dunning commented on MAHOUT-1500:
-------------------------------------

Optimism seems warranted. Worst case is a revert.

> H2O integration
> ---------------
>
>                 Key: MAHOUT-1500
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1500
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Anand Avati
>            Assignee: Andrew Palumbo
>             Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111691#comment-14111691 ]

Anand Avati commented on MAHOUT-1500:
-------------------------------------

Or optimistically merge, and fix up if things break with a specific error?
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111687#comment-14111687 ]

Anand Avati commented on MAHOUT-1500:
-------------------------------------

[~Andrew_Palumbo], is it not possible to do a mock run to verify that definitively?
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111500#comment-14111500 ]

Andrew Palumbo commented on MAHOUT-1500:
----------------------------------------

Thanks [~tdunning]. It looks like it's assigned to me now and I can change the assignee. Any thoughts on the h2o/pom.xml? I only ask because I kind of remember the nightly build breaking around the time that the spark module was added, and having to run `mvn clean package install` for a few days while it was fixed. I'm not sure if this had anything to do with adding a new module or not; I just wanted to double check. Appreciate it!
[jira] [Updated] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-1500:
-------------------------------

    Assignee: Andrew Palumbo  (was: Ted Dunning)
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111468#comment-14111468 ]

Ted Dunning commented on MAHOUT-1500:
-------------------------------------

Andrew,

Go ahead and do the merge without the assignment. I can't assign this to you for some JIRA config reason. I successfully assigned this to me, though, so I will chase down the config problem.
[jira] [Assigned] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning reassigned MAHOUT-1500:
----------------------------------

    Assignee: Ted Dunning
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111390#comment-14111390 ]

Andrew Palumbo commented on MAHOUT-1500:
----------------------------------------

[~pferrel] barring any problems with the h2o/pom.xml, I think this is good to go. I'd like to merge it. I'm unable to assign JIRA issues. Could you assign this to me?
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111343#comment-14111343 ]

ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------

Github user andrewpalumbo commented on the pull request:

    https://github.com/apache/mahout/pull/21#issuecomment-53492639

    Tests pass in distributed mode for me. Could someone please double check the h2o/pom.xml for me? I'm not sure if there's anything that needs to be added to not break the nightly build.
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111312#comment-14111312 ]

ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------

Github user avati commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/21#discussion_r16741328

    --- Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java ---
    @@ -0,0 +1,68 @@
    +/*
    + *  Licensed to the Apache Software Foundation (ASF) under one or more
    + *  contributor license agreements.  See the NOTICE file distributed with
    + *  this work for additional information regarding copyright ownership.
    + *  The ASF licenses this file to You under the Apache License, Version 2.0
    + *  (the "License"); you may not use this file except in compliance with
    + *  the License.  You may obtain a copy of the License at
    + *
    + *      http://www.apache.org/licenses/LICENSE-2.0
    + *
    + *  Unless required by applicable law or agreed to in writing, software
    + *  distributed under the License is distributed on an "AS IS" BASIS,
    + *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + *  See the License for the specific language governing permissions and
    + *  limitations under the License.
    + */
    +
    +package org.apache.mahout.h2obindings.ops;
    +
    +import org.apache.mahout.h2obindings.H2OHelper;
    +import org.apache.mahout.h2obindings.drm.H2ODrm;
    +
    +import water.MRTask;
    +import water.fvec.Frame;
    +import water.fvec.Vec;
    +import water.fvec.Chunk;
    +import water.fvec.NewChunk;
    +
    +public class AewScalar {
    +  /* Element-wise DRM-DRM operations */
    +  public static H2ODrm AewScalar(H2ODrm DrmA, final double s, final String op) {
    --- End diff --

    Oops, pushed the camelcase styling as well. Looks like I had accidentally overwritten a couple of commits when switching between workstation and laptop.
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111257#comment-14111257 ]

ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/21#discussion_r16739580

    --- Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java ---
    @@ -0,0 +1,68 @@
    +/*
    + *  Licensed to the Apache Software Foundation (ASF) under one or more
    + *  contributor license agreements.  See the NOTICE file distributed with
    + *  this work for additional information regarding copyright ownership.
    + *  The ASF licenses this file to You under the Apache License, Version 2.0
    + *  (the "License"); you may not use this file except in compliance with
    + *  the License.  You may obtain a copy of the License at
    + *
    + *      http://www.apache.org/licenses/LICENSE-2.0
    + *
    + *  Unless required by applicable law or agreed to in writing, software
    + *  distributed under the License is distributed on an "AS IS" BASIS,
    + *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + *  See the License for the specific language governing permissions and
    + *  limitations under the License.
    + */
    +
    +package org.apache.mahout.h2obindings.ops;
    +
    +import org.apache.mahout.h2obindings.H2OHelper;
    +import org.apache.mahout.h2obindings.drm.H2ODrm;
    +
    +import water.MRTask;
    +import water.fvec.Frame;
    +import water.fvec.Vec;
    +import water.fvec.Chunk;
    +import water.fvec.NewChunk;
    +
    +public class AewScalar {
    +  /* Element-wise DRM-DRM operations */
    +  public static H2ODrm AewScalar(H2ODrm DrmA, final double s, final String op) {
    --- End diff --

    Possibly one more commit missing?
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1486#comment-1486 ]

ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------

Github user avati commented on the pull request:

    https://github.com/apache/mahout/pull/21#issuecomment-53476236

    @andrewpalumbo re-applied the commit. Not sure how it got missed! Thanks for pointing that out.
Re: Features by engine page
Meant to be normal, of course.

On Tue, Aug 26, 2014 at 10:39 AM, Dmitriy Lyubimov wrote:

> scala, if you want) to write something like `new
> MultivariateUniformDistribution(mu,sigma).sample()`, so i really just dsl-
>
Re: Features by engine page
On distributions, I did not find anything multivariate that is Mahout Matrix-based. Hopefully I just did not look well enough. Everything univariate seems to be pretty spotty. Aside from that, I need Scala traits, plus I find it extremely inelegant (un-Scala, if you want) to write something like `new MultivariateUniformDistribution(mu,sigma).sample()`, so I really just DSL-bridged for the most part. There are enough third-party choices not to bother with filling the gaps.

On step-recorded evolutionary search: after my literature search on the topic, it doesn't look like even a distant third-best choice, in particular under big data training settings.

First, I did not find head-to-head comparisons of it with any of the top choices. It is not included in the Amplab survey of top search choices. GP-EI is Netflix's choice, for example. So there is very little convincing data to go on to begin with. Given the lack of such comparisons, the next best thing is to copy what others do here.

Second, under big data settings, every data point (training) is precious. In Spark specifically, unlike MR, since we want to retain as much data in RAM as possible and avoid spills, best performance is usually achieved by sequentially semaphoring trainings rather than throwing a whole bunch of them out at once, especially where companies are extremely anemic in provisioning the needed hardware, for whatever reason. In that sense, exploration algorithms that can make better inferences after each new data point, and arrive at a reasonably performing model in ~20..30 sequential trainings, are far preferable to those that require a whole batch of trainings before they can begin to figure out the next centroid of trials. I am not even sure whether step-recorded search was ever tried outside SGD, where data points are abundant albeit incomplete.

On Tue, Aug 26, 2014 at 8:32 AM, Ted Dunning wrote:

> On Mon, Aug 25, 2014 at 2:40 PM, Dmitriy Lyubimov wrote:
>
> > This work is obviously also interesting in that it
> > establishes probabilistic framework in Mahout (distributions & gaussian
> > process).
> >
>
> We already have that.
>
> (distributions not GP)
>
> Note that we also have an implementation of recorded step evolutionary
> programming that works really well for hyper-parameter search. I don't
> like the way that the API turned out (too hard to understand).
>
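The "sequentially semaphored" search described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `SequentialSearchSketch`, its methods, and the toy loss function are hypothetical and are not part of Mahout or Spark, and a simple step-halving local search stands in for GP-EI, which is not implemented here. The point it shows is that a single-permit semaphore lets only one training run at a time, and each new candidate is chosen only after the previous result is known.

```java
import java.util.concurrent.Semaphore;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: one training at a time, each trial informed by the last. */
public class SequentialSearchSketch {
    // Single permit: at most one training job may hold cluster memory at a time.
    private static final Semaphore TRAINING_SLOT = new Semaphore(1);

    /** Runs one (stand-in) training job for the given hyper-parameter and returns its loss. */
    static double train(DoubleUnaryOperator lossOf, double param) {
        TRAINING_SLOT.acquireUninterruptibly();
        try {
            return lossOf.applyAsDouble(param);
        } finally {
            TRAINING_SLOT.release();
        }
    }

    /**
     * Spends at most `budget` trainings. Placeholder strategy (NOT GP-EI):
     * keep stepping while the loss improves, otherwise reverse and halve the step.
     */
    static double search(DoubleUnaryOperator lossOf, double start, double step, int budget) {
        double best = start;
        double bestLoss = train(lossOf, best);
        for (int used = 1; used < budget; used++) {
            double candidate = best + step;
            double loss = train(lossOf, candidate);
            if (loss < bestLoss) {
                best = candidate;
                bestLoss = loss;
            } else {
                step = -step / 2;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy loss with its minimum at 3.0; a real loss would come from a training run.
        double best = search(x -> (x - 3.0) * (x - 3.0), 0.0, 1.0, 30);
        System.out.println("best parameter found: " + best);
    }
}
```

In a real setting the callers of `train` would typically sit on several threads, and the semaphore is what serializes them; in this single-threaded sketch it only marks where that gate would be.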
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110966#comment-14110966 ]

ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------

Github user andrewpalumbo commented on the pull request:

    https://github.com/apache/mahout/pull/21#issuecomment-53455542

    @avati - it looks like some of your changes from Dmitriy's style reviews have made it back into this branch. Could you please update those?
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110929#comment-14110929 ]

Andrew Palumbo commented on MAHOUT-1500:
----------------------------------------

[~pferrel] I do have a couple of comments for Anand, and need to look it over again, but yes, I can merge it. I could use some guidance, though, as far as pushing a new module. If someone could look over the h2o/pom.xml https://github.com/avati/mahout/blob/MAHOUT-1500/h2o/pom.xml for me I'd appreciate it.
Re: Features by engine page
On Mon, Aug 25, 2014 at 2:40 PM, Dmitriy Lyubimov wrote:

> This work is obviously also interesting in that it
> establishes probabilistic framework in Mahout (distributions & gaussian
> process).
>

We already have that.

(distributions not GP)

Note that we also have an implementation of recorded step evolutionary programming that works really well for hyper-parameter search. I don't like the way that the API turned out (too hard to understand).
[jira] [Commented] (MAHOUT-1500) H2O integration
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110833#comment-14110833 ]

Pat Ferrel commented on MAHOUT-1500:
------------------------------------

[~Andrew_Palumbo] are you planning to assign this to yourself and do the merge?