[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111856#comment-14111856
 ] 

Ted Dunning commented on MAHOUT-1500:
-


Optimism seems warranted.  Worst case is a revert.

> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
>Assignee: Andrew Palumbo
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111691#comment-14111691
 ] 

Anand Avati commented on MAHOUT-1500:
-

Or optimistically merge, and fix up if things break with a specific error?

> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
>Assignee: Andrew Palumbo
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread Anand Avati (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111687#comment-14111687
 ] 

Anand Avati commented on MAHOUT-1500:
-

[~Andrew_Palumbo], is it not possible to do a mock run to verify that 
definitively?

> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
>Assignee: Andrew Palumbo
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread Andrew Palumbo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111500#comment-14111500
 ] 

Andrew Palumbo commented on MAHOUT-1500:


Thanks [~tdunning]. It looks like it's assigned to me now and I can change the 
assignee.   Any thoughts on the h2o/pom.xml?   I only ask because I kind of 
remember the nightly build breaking around the time that the spark module was 
added, and having to run `mvn clean package install` for a few days while it 
was fixed. I'm not sure if this had anything to do with adding a new module or 
not; I just wanted to double check.

Appreciate it!

> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
>Assignee: Andrew Palumbo
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Updated] (MAHOUT-1500) H2O integration

2014-08-26 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated MAHOUT-1500:


Assignee: Andrew Palumbo  (was: Ted Dunning)

> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
>Assignee: Andrew Palumbo
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111468#comment-14111468
 ] 

Ted Dunning commented on MAHOUT-1500:
-

Andrew,

Go ahead and do the merge without the assignment.  I can't assign this to you 
for some JIRA config reason.  I successfully assigned this to me, though, so I 
will chase down the config problem.

> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
>Assignee: Ted Dunning
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Assigned] (MAHOUT-1500) H2O integration

2014-08-26 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning reassigned MAHOUT-1500:
---

Assignee: Ted Dunning

> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
>Assignee: Ted Dunning
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread Andrew Palumbo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111390#comment-14111390
 ] 

Andrew Palumbo commented on MAHOUT-1500:


[~pferrel] Barring any problems with the h2o/pom.xml, I think this is good to 
go.  I'd like to merge it.   I'm unable to assign JIRA issues.  Could you 
assign this to me?  

> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111343#comment-14111343
 ] 

ASF GitHub Bot commented on MAHOUT-1500:


Github user andrewpalumbo commented on the pull request:

https://github.com/apache/mahout/pull/21#issuecomment-53492639
  
Tests pass in distributed mode for me.  Could someone please double check 
the h2o/pom.xml for me?  I'm not sure if there's anything that needs to be 
added to not break the nightly build.


> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111312#comment-14111312
 ] 

ASF GitHub Bot commented on MAHOUT-1500:


Github user avati commented on a diff in the pull request:

https://github.com/apache/mahout/pull/21#discussion_r16741328
  
--- Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java ---
@@ -0,0 +1,68 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one or more
+ *  contributor license agreements.  See the NOTICE file distributed with
+ *  this work for additional information regarding copyright ownership.
+ *  The ASF licenses this file to You under the Apache License, Version 2.0
+ *  (the "License"); you may not use this file except in compliance with
+ *  the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.mahout.h2obindings.ops;
+
+import org.apache.mahout.h2obindings.H2OHelper;
+import org.apache.mahout.h2obindings.drm.H2ODrm;
+
+import water.MRTask;
+import water.fvec.Frame;
+import water.fvec.Vec;
+import water.fvec.Chunk;
+import water.fvec.NewChunk;
+
+public class AewScalar {
+  /* Element-wise DRM-DRM operations */
+  public static H2ODrm AewScalar(H2ODrm DrmA, final double s, final String op) {
--- End diff --

Oops, pushed the camelcase styling as well. Looks like I had accidentally 
overwritten a couple of commits when switching between workstation and laptop.


> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111257#comment-14111257
 ] 

ASF GitHub Bot commented on MAHOUT-1500:


Github user andrewpalumbo commented on a diff in the pull request:

https://github.com/apache/mahout/pull/21#discussion_r16739580
  
--- Diff: h2o/src/main/java/org/apache/mahout/h2obindings/ops/AewScalar.java ---
@@ -0,0 +1,68 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one or more
+ *  contributor license agreements.  See the NOTICE file distributed with
+ *  this work for additional information regarding copyright ownership.
+ *  The ASF licenses this file to You under the Apache License, Version 2.0
+ *  (the "License"); you may not use this file except in compliance with
+ *  the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.mahout.h2obindings.ops;
+
+import org.apache.mahout.h2obindings.H2OHelper;
+import org.apache.mahout.h2obindings.drm.H2ODrm;
+
+import water.MRTask;
+import water.fvec.Frame;
+import water.fvec.Vec;
+import water.fvec.Chunk;
+import water.fvec.NewChunk;
+
+public class AewScalar {
+  /* Element-wise DRM-DRM operations */
+  public static H2ODrm AewScalar(H2ODrm DrmA, final double s, final String op) {
--- End diff --

Possibly one more commit missing?


> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1486#comment-1486
 ] 

ASF GitHub Bot commented on MAHOUT-1500:


Github user avati commented on the pull request:

https://github.com/apache/mahout/pull/21#issuecomment-53476236
  
@andrewpalumbo re-applied the commit. Not sure how it got missed! Thanks 
for pointing it out.


> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





Re: Features by engine page

2014-08-26 Thread Dmitriy Lyubimov
meant to be normal of course


On Tue, Aug 26, 2014 at 10:39 AM, Dmitriy Lyubimov 
wrote:

> scala, if you want) to write something like `new
> MultivariateUniformDistribution(mu,sigma).sample()`, so i really just dsl-
>


Re: Features by engine page

2014-08-26 Thread Dmitriy Lyubimov
On distributions, I did not find anything multivariate and Mahout Matrix-based.
Hopefully, I just did not look well enough. Everything univariate seems to be
pretty spotty. Aside from that, I need scala traits, plus I find it
extremely inelegant (un-scala, if you want) to write something like `new
MultivariateUniformDistribution(mu,sigma).sample()`, so I really just
dsl-bridged for the most part. There are enough third party choices not to
bother with filling the gaps.
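
To make the point above concrete, here is a minimal Java sketch of what a multivariate normal sampler involves (normal rather than uniform, per the correction elsewhere in this thread). It is not Mahout code: `MvnSketch`, `cholesky` and `sample` are hypothetical names, plain `double[][]` arrays stand in for Mahout matrices, and `sigma` is assumed to be a symmetric positive definite covariance matrix.

```java
import java.util.Random;

// Hypothetical illustration only; not part of Mahout or any third party library.
final class MvnSketch {
  private final double[] mu;      // mean vector
  private final double[][] l;     // lower-triangular Cholesky factor of sigma
  private final Random rng;

  MvnSketch(double[] mu, double[][] sigma, long seed) {
    this.mu = mu.clone();
    this.l = cholesky(sigma);
    this.rng = new Random(seed);
  }

  // Returns lower-triangular L such that L * L^T = a.
  // Assumes a is symmetric positive definite; throws otherwise.
  static double[][] cholesky(double[][] a) {
    int n = a.length;
    double[][] l = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j <= i; j++) {
        double sum = a[i][j];
        for (int k = 0; k < j; k++) {
          sum -= l[i][k] * l[j][k];
        }
        if (i == j) {
          if (sum <= 0) {
            throw new IllegalArgumentException("sigma must be positive definite");
          }
          l[i][i] = Math.sqrt(sum);
        } else {
          l[i][j] = sum / l[j][j];
        }
      }
    }
    return l;
  }

  // Draws x = mu + L * z, where z is a vector of independent standard normals.
  double[] sample() {
    int n = mu.length;
    double[] z = new double[n];
    for (int i = 0; i < n; i++) {
      z[i] = rng.nextGaussian();
    }
    double[] x = new double[n];
    for (int i = 0; i < n; i++) {
      double s = mu[i];
      for (int k = 0; k <= i; k++) {
        s += l[i][k] * z[k];
      }
      x[i] = s;
    }
    return x;
  }
}
```

Usage would look like `new MvnSketch(mu, sigma, seed).sample()`, which is exactly the constructor-plus-call style described above as un-scala; a DSL bridge would presumably hide it behind something shorter.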

On step-recorded evolutionary search, after my literature search on the
topic, it doesn't look like even a distant third best choice, in particular
under big data training settings.

First, I did not find head-to-head comparisons of it with any of the top
choices. It is not included in the Amplab survey of top search choices. GP-EI
is Netflix's choice, for example. So there's very little convincing data to
go on, to begin with. Given the lack of such comparisons, the next best
thing is to copy what others do here.

Second, under big data settings, every data point (training) is precious.
In spark specifically, unlike MR, since we want to retain as much data in
RAM as possible and avoid spills, best performance is usually achieved by
sequentially semaphoring trainings rather than throwing a whole bunch of
them out at once. Especially under circumstances where companies are
extremely anemic in provisioning the hardware needed, for whatever reason. In
that sense, exploration algorithms that are capable of making better
inference after each new data point, and arrive at a reasonably performing
model in ~20..30 sequential trains, are far preferable to those that
require a whole bunch of trainings to happen before they can begin to
figure out the next centroid of trials. I am not even sure if step-recorded
search was ever tried outside SGD, where datapoints are abundant albeit
incomplete.
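
As a rough illustration of "semaphoring" trainings, the Java sketch below gates trial runs through a `java.util.concurrent.Semaphore` so that at most `permits` of them train at once. It is an in-JVM analogy only, not Spark or Mahout code: `SequentialTrainings`, `train` and `runGated` are hypothetical names, and `train` is a toy stand-in for a real training job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only.
final class SequentialTrainings {
  // Toy stand-in for one training run; returns a fake validation loss.
  static double train(double hyperParam) {
    return (hyperParam - 0.3) * (hyperParam - 0.3);
  }

  // Submits every trial to a thread pool, but lets at most `permits` train at once.
  // Writes each trial's loss into lossesOut and returns the largest number of
  // trainings that were observed running at the same time.
  static int runGated(double[] trials, int permits, double[] lossesOut) {
    Semaphore gate = new Semaphore(permits);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger maxRunning = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < trials.length; i++) {
        final int idx = i;
        futures.add(pool.submit(() -> {
          gate.acquireUninterruptibly();   // wait for a free training slot
          try {
            int now = running.incrementAndGet();
            maxRunning.accumulateAndGet(now, Math::max);
            lossesOut[idx] = train(trials[idx]);
          } finally {
            running.decrementAndGet();
            gate.release();                // free the slot for the next trial
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get();                           // wait for every trial to finish
      }
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
    return maxRunning.get();
  }
}
```

With `permits = 1` the trainings run strictly one after another even though they were all submitted up front. A sequential search method would go one step further and pick each next trial from the losses seen so far, which this sketch does not attempt.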



On Tue, Aug 26, 2014 at 8:32 AM, Ted Dunning  wrote:

> On Mon, Aug 25, 2014 at 2:40 PM, Dmitriy Lyubimov 
> wrote:
>
> > This work is obviously also interesting in that it
> > establishes probabilistic framework in Mahout (distributions & gaussian
> > process).
> >
>
> We already have that.
>
> (distributions not GP)
>
> Note that we also have an implementation of recorded step evolutionary
> programming that works really well for hyper-parameter search.  I don't
> like the way that the API turned out (too hard to understand).
>


[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110966#comment-14110966
 ] 

ASF GitHub Bot commented on MAHOUT-1500:


Github user andrewpalumbo commented on the pull request:

https://github.com/apache/mahout/pull/21#issuecomment-53455542
  
@avati - it looks like some of your changes from Dmitriy's style reviews 
have made it back into this branch. Could you please update those? 


> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread Andrew Palumbo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110929#comment-14110929
 ] 

Andrew Palumbo commented on MAHOUT-1500:


[~pferrel] I do have a couple of comments for Anand, and need to look it over 
again, but yes, I can merge it.   I could use some guidance, though, as far as 
pushing a new module.  If someone could look over the h2o/pom.xml 

https://github.com/avati/mahout/blob/MAHOUT-1500/h2o/pom.xml

for me I'd appreciate it.  

> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL





Re: Features by engine page

2014-08-26 Thread Ted Dunning
On Mon, Aug 25, 2014 at 2:40 PM, Dmitriy Lyubimov  wrote:

> This work is obviously also interesting in that it
> establishes probabilistic framework in Mahout (distributions & gaussian
> process).
>

We already have that.

(distributions not GP)

Note that we also have an implementation of recorded step evolutionary
programming that works really well for hyper-parameter search.  I don't
like the way that the API turned out (too hard to understand).


[jira] [Commented] (MAHOUT-1500) H2O integration

2014-08-26 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110833#comment-14110833
 ] 

Pat Ferrel commented on MAHOUT-1500:


[~Andrew_Palumbo] are you planning to assign this to yourself and do the merge? 

> H2O integration
> ---
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
>  Issue Type: Improvement
>Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL


