[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2015-03-05 Thread Andrew Palumbo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349754#comment-14349754
 ] 

Andrew Palumbo commented on MAHOUT-1603:


this can be closed, right?

> Tweaks for Spark 1.0.x 
> ---
>
> Key: MAHOUT-1603
> URL: https://issues.apache.org/jira/browse/MAHOUT-1603
> Project: Mahout
>  Issue Type: Task
>Affects Versions: 0.9
>Reporter: Dmitriy Lyubimov
>Assignee: Dmitriy Lyubimov
>  Labels: DSL, scala, spark
> Fix For: 1.0
>
>
> Tweaks necessary to run the current codebase on top of Spark 1.0.x



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093397#comment-14093397
 ] 

Hudson commented on MAHOUT-1603:


FAILURE: Integrated in Mahout-Quality #2739 (See 
[https://builds.apache.org/job/Mahout-Quality/2739/])
MAHOUT-1603: Tweaks for Spark 1.0.x (dlyubimov & pferrel) (dlyubimov: rev 
ee6359f621b508ab7f21df0316941e68c75eb3e5)
* spark/src/test/scala/org/apache/mahout/sparkbindings/blas/ABtSuite.scala
* spark/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala
* spark/src/test/scala/org/apache/mahout/sparkbindings/test/DistributedSparkSuite.scala
* spark/src/test/scala/org/apache/mahout/sparkbindings/blas/BlasSuite.scala
* spark/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala
* CHANGELOG
* spark/src/main/scala/org/apache/mahout/drivers/MahoutDriver.scala
* math-scala/src/test/scala/org/apache/mahout/test/LoggerConfiguration.scala
* spark/src/test/scala/org/apache/mahout/sparkbindings/blas/AtASuite.scala
* spark/src/test/scala/org/apache/mahout/sparkbindings/blas/AewBSuite.scala
* pom.xml
* spark/src/main/scala/org/apache/mahout/sparkbindings/SparkEngine.scala
* spark/src/test/scala/org/apache/mahout/sparkbindings/test/LoggerConfiguration.scala
* spark/src/test/scala/org/apache/mahout/sparkbindings/blas/AtSuite.scala
* spark/src/test/scala/org/apache/mahout/drivers/ItemSimilarityDriverSuite.scala




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093233#comment-14093233
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user asfgit closed the pull request at:

https://github.com/apache/mahout/pull/40




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-08 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091158#comment-14091158
 ] 

Dmitriy Lyubimov commented on MAHOUT-1603:
--

merged to apache/spark-1.0.x branch (here: 
https://git-wip-us.apache.org/repos/asf?p=mahout.git;a=shortlog;h=refs/heads/spark-1.0.x)



[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091146#comment-14091146
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51642800
  
excellent. seems to be working for me.

let me squash it and merge to apache/mahout spark-1.0.x.




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089966#comment-14089966
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user pferrel commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51541649
  
OK, pushed it back to you. The tests pass. All drivers and tests now share a
single context, and man, are they fast now. Still using DistributedSparkSuite.

BTW, for convenience the tmp dir really needs to be deleted in beforeAll and
afterEach, not afterAll, so I changed that.
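A minimal sketch of the tmp-dir handling described above, using only `java.io.File`. `TmpDirReset`, `deleteRecursively` and `reset` are hypothetical names, not Mahout's actual test helpers. The idea is that calling `reset` from both `beforeAll` (clearing leftovers from an earlier or crashed run) and `afterEach` (so the next test starts empty) keeps output from one test from leaking into the next, which an `afterAll`-only cleanup cannot do.

```scala
import java.io.File

// Hypothetical helper; names are illustrative, not Mahout's test utilities.
object TmpDirReset {

  /** Recursively delete `f` (file or directory). Returns true if nothing remains. */
  def deleteRecursively(f: File): Boolean = {
    if (f.isDirectory) {
      val children = f.listFiles()
      if (children != null) children.foreach(deleteRecursively)
    }
    // delete() returns false for a path that never existed, so also accept "gone"
    f.delete() || !f.exists()
  }

  /** Wipe and recreate `dir`; intended to be called from beforeAll and afterEach. */
  def reset(dir: File): File = {
    deleteRecursively(dir)
    dir.mkdirs()
    dir
  }
}
```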




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089760#comment-14089760
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51526214
  
Btw, your handling of the temporary directory is quite to the point; you
could well make it part of MahoutSuite, since I have one other place that
may use it. Also see similar code in the MahoutTest Java class for JUnit; we
could just use that, I guess.




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089745#comment-14089745
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51525329
  
@pferrel: So one problem with those tests is that they are creating 2 Spark
sessions: 1 session is created by the tests and another session is created
by the driver.

Spark is very strict about this:

(1) Session creation in Spark is non-reentrant (not just thread-unsafe),
meaning you can only safely have at most 1 session at a time in a JVM.
(2) A Spark session itself is reentrant, meaning multiple threads may
invoke asynchronous computational actions on the same session.

This may not always manifest, but in the end it always will. Ask me how I
know :)

So the problem with those tests is that they probably should not be mixing in
DistributedSparkSuite but rather just MahoutSuite. Alternatively, you may
pass an already existing mahoutContext to the driver code for reuse. Either
way, you must ensure the above constraint. If you don't, the effects will
range dramatically (from mislabeled RDD partitions in the block manager to
lockups and internal race conditions).
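A plain-Scala model of constraint (1), with no Spark dependency. `Session` and `SessionRegistry` are hypothetical stand-ins, not Spark classes, and this guard is only a sketch of how test code might enforce "at most one live session per JVM": `create` fails fast if a live session already exists, while `getOrCreate` hands back the existing one, which is the kind of reuse both the tests and the driver need.

```scala
// Hypothetical stand-in for a Spark/Mahout context; not a real Spark class.
final class Session(val appName: String) {
  @volatile private var stopped = false
  def isStopped: Boolean = stopped
  def stop(): Unit = { stopped = true }
}

// Hypothetical guard: at most one live Session per JVM.
object SessionRegistry {
  private var active: Option[Session] = None

  /** Create a new session, failing fast if another live session exists. */
  def create(appName: String): Session = synchronized {
    active.filterNot(_.isStopped) match {
      case Some(live) =>
        throw new IllegalStateException(s"session '${live.appName}' is already active in this JVM")
      case None =>
        val s = new Session(appName)
        active = Some(s)
        s
    }
  }

  /** Reuse the live session if there is one; otherwise create it. */
  def getOrCreate(appName: String): Session = synchronized {
    active.filterNot(_.isStopped).getOrElse(create(appName))
  }
}
```

In this model a test suite would call `create` once, and the driver would call `getOrCreate` instead of building its own session.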




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089741#comment-14089741
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user pferrel commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51525161
  
ok, past the failure. Now I have to do some test cleanup. 




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089737#comment-14089737
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user pferrel commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51524833
  
Wait, I found the problem. I'm closing the context at the end of the 
driver. I'll fix it.
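A minimal sketch of the fix, assuming the driver can be handed an already existing context, as suggested elsewhere in the thread. `Ctx` and `Driver.run` are hypothetical names, not Mahout's driver API. The design choice is ownership: the driver stops only a context it created itself, so a context passed in by a test suite stays alive for the suite's later reads and assertions.

```scala
// Hypothetical stand-in for a Spark/Mahout context; not a real Spark class.
final class Ctx {
  @volatile private var stopped = false
  def isStopped: Boolean = stopped
  def stop(): Unit = { stopped = true }
}

object Driver {
  /**
   * Run the driver body. A caller-supplied context is reused and left running;
   * only a context the driver created itself is stopped when the body finishes.
   */
  def run[T](existing: Option[Ctx])(body: Ctx => T): T = {
    val ctx = existing.getOrElse(new Ctx)
    val owned = existing.isEmpty
    try body(ctx)
    finally if (owned) ctx.stop()
  }
}
```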




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089713#comment-14089713
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51523570
  
ok i guess like you said tests are still failing




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089649#comment-14089649
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51518705
  
oh ok, never mind





[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089645#comment-14089645
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51518645
  
assuming you are on your local branch named spark-1.0.x with your commit on
top of my current head, please execute
 git push g...@github.com:dlyubimov/mahout spark-1.0.x

this should go thru




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089638#comment-14089638
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user pferrel commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51517989
  
I can't push them back to you so they are here
https://github.com/pferrel/mahout/tree/spark-1.0.x




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089609#comment-14089609
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51515920
  
@pferrel where are the changes?




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089599#comment-14089599
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user pferrel commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51514939
  
made changes to use the test context in the driver, and the tests seem to
complete correctly up to the point where they try to read the output file,
which does contain the correct results.

```
val indicatorLines = mahoutCtx.textFile(OutPath + "/indicator-matrix/part-0")
```

The part file is created in the driver using `rdd.saveAsTextFile(dest)`. It
seems like shutting down the context was taking care of something before;
maybe I need to close the output file(s)? (Not sure how to do that, since the
file is created inside the saveAsTextFile call.)

```
java.lang.NullPointerException
    at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1215)
    at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1222)
    at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:456)
    at org.apache.mahout.drivers.ItemSimilarityDriverSuite$$anonfun$4.apply$mcV$sp(ItemSimilarityDriverSuite.scala:303)
```
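As an aside on reading the results back: `rdd.saveAsTextFile(dest)` normally writes a directory containing one `part-NNNNN` file per partition (plus marker files such as `_SUCCESS`), so a test that opens a single part file only sees one partition's output. Below is a hypothetical plain-JVM helper (`PartFiles.readAllLines` is not a Mahout or Spark API) that gathers every part file in name order. It does not address the `NullPointerException` above, which the thread traces to the driver closing the shared context.

```scala
import java.io.File
import scala.io.Source

// Hypothetical test helper: read every part file under an output directory
// instead of hard-coding a single part file name.
object PartFiles {
  def readAllLines(outputDir: File): Seq[String] = {
    val parts = Option(outputDir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.startsWith("part-")) // skip _SUCCESS, .crc, etc.
      .sortBy(_.getName)                                      // part-00000, part-00001, ...
    parts.toSeq.flatMap { f =>
      val src = Source.fromFile(f, "UTF-8")
      try src.getLines().toList
      finally src.close()
    }
  }
}
```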




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088538#comment-14088538
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user pferrel commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51417124
  
Ok, I just pushed the new tests; maybe they work. Don't laugh, it could
happen.

There are likely to be problems with my calling afterEach and beforeEach,
since their meaning has changed. I expect fixing this will require mods to
the driver too, and it'll probably be easier for me to do it.

If you are almost ready with this I'll upgrade to Spark 1.0.1 and grab your 
branch.




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088526#comment-14088526
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51416019
  
alternatively, you can also just give me a verbal hint about what I need to
fix, and I can try to patch it to the best of my ability.




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088520#comment-14088520
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user avati commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51415697
  
Only meant FYI (in case someone is planning anything). Of course we have to 
wait for release.




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088517#comment-14088517
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51415624
  
sure. There's tons of stuff in progress, but we can only use released
artifacts as dependencies.




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088513#comment-14088513
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user avati commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51415479
  
Scala 2.11 port of Spark is in progress 
[https://issues.apache.org/jira/browse/SPARK-1812]




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088506#comment-14088506
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51415419
  
On Wed, Aug 6, 2014 at 4:56 PM, Pat Ferrel  wrote:

> Do you want to push this with the "ignore"s and I'll fix them to use the
> new DistributedSparkSuite as it gets into the master?
>

No, I probably don't want to merge it with non-working tests. As usual, I
can add you as a collaborator in my account (if I have not yet done so) so
you could push directly to my source branch of this (so it is reflected in
the PR instantaneously), or you can PR against my spark-1.0.x branch, or you
can just send me a regular git patch by email, whichever works.

> BTW any reason we aren't doing Scala 2.11 since we are upping to Java 7
> and Spark 1?
>

Scala is pinned at its current version because it is paired with Spark's
version of Scala. Migration between major versions of Scala is a big deal,
for Spark and otherwise; stuff will not work. Minor versions of Scala should
generally be portable.
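To illustrate the pairing: Scala 2.x libraries are published per binary version (`epoch.major`, e.g. `2.10`), which is why Spark 1.0 artifacts carry a suffix like `spark-core_2.10`. The helper below is purely illustrative (its names are hypothetical) and only handles Scala 2.x version strings; it shows that 2.10.3 and 2.10.4 share a suffix while 2.11.x does not.

```scala
// Illustrative only: derive the Scala 2.x "binary version" that artifact
// suffixes such as `_2.10` are keyed on. Not meant for Scala 3 version strings.
object ScalaBinaryVersion {
  /** "2.10.4" -> "2.10" */
  def of(fullVersion: String): String = fullVersion.split('.').take(2).mkString(".")

  /** Same binary version: libraries can generally be mixed. Different: they cannot. */
  def compatible(a: String, b: String): Boolean = of(a) == of(b)

  /** e.g. artifactId("spark-core", "2.10.4") == "spark-core_2.10" */
  def artifactId(base: String, scalaVersion: String): String = s"${base}_${of(scalaVersion)}"
}
```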





[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088486#comment-14088486
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user pferrel commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51413783
  
Do you want to push this with the "ignore"s and I'll fix them to use the 
new DistributedSparkSuite as it gets into the master?

BTW any reason we aren't doing Scala 2.11 since we are upping to Java 7 and 
Spark 1? 




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088452#comment-14088452
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51410605
  
> OK so DistributedSparkSuite moved the create context into the beforeAll?

on this branch, yes.
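To make the difference concrete, here is a plain-Scala model (no ScalaTest or Spark dependency; `TestCtx` and `SuiteModel` are hypothetical) that counts context constructions. In ScalaTest the real hooks come from the `BeforeAndAfterAll` and `BeforeAndAfterEach` traits; this model only mirrors their ordering. Creating the context per test builds one context per test, while creating it in `beforeAll` builds one per suite, which is the direction this branch takes.

```scala
// Hypothetical stand-in for a context; only tracks whether it was stopped.
final class TestCtx {
  @volatile var stopped: Boolean = false
}

// Hypothetical model of a suite lifecycle: the context lives per test or per suite.
final class SuiteModel(perTest: Boolean) {
  private var ctx: TestCtx = null
  private var createdCount = 0

  def created: Int = createdCount

  private def startCtx(): Unit = { ctx = new TestCtx; createdCount += 1 }
  private def stopCtx(): Unit = if (ctx != null) ctx.stopped = true

  def beforeAll(): Unit = if (!perTest) startCtx()
  def afterAll(): Unit = if (!perTest) stopCtx()
  def beforeEach(): Unit = if (perTest) startCtx()
  def afterEach(): Unit = if (perTest) stopCtx()

  /** Run `tests` tests; each one only uses the already-live context. */
  def run(tests: Int): Int = {
    beforeAll()
    for (_ <- 1 to tests) {
      beforeEach()
      assert(ctx != null && !ctx.stopped, "each test must see a live context")
      afterEach()
    }
    afterAll()
    created
  }
}
```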




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088443#comment-14088443
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user pferrel commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51410370
  
OK so DistributedSparkSuite moved the create context into the beforeAll?




[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088433#comment-14088433
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51409540
  
On Wed, Aug 6, 2014 at 3:55 PM, Pat Ferrel  wrote:

> Sorry was off the internet during a move (curse you giant nameless cable
> company!)
>
> Anyway these tests are substantially changed in #36
>  but I haven't been able to get
> the new build until now, will check and push 36 first.
>
> As to building and tearing down contexts I'm not helping things. For each
> driver test DistributedSparkSuite in the beforeEach creates a context so I
> use that to start the test. Then the driver I am using needs to start a
> context so for every time I call a driver I precede it with the "afterEach"
> call to shut down the context. Then call the driver, then call "beforeEach"
> to restore the test context. I also had to tell the driver in a special
> invisible option not to load Mahout jars with a "--dontAddMahoutJars". So
> the context is being built 3 times for every test. but that hasn't changed,
> it's always been that way.
>
> We could reuse a single context per test but it would require disabling
> some stuff in the driver along the lines of what I had to do with
> "--dontAddMahoutJars". Since I've already had to do this I don't think it
> would be a big deal to disable a little more. I'll look at it once 36 is
> pushed.
>
> Is there any reason to build the context more than once per suite?
>
Usually there's not, and that's exactly what this branch is moving towards
(note: this PR is not against master but against a side branch called
`spark-1.0.x`). That also seems to be what they have done in Spark 1.0.

Sometimes (in my other projects) there is a need to create a custom
context, but not in the Mahout codebase.


> Seems like if I disable the context things in the driver we could run all
> tests in a single context, right?
>
Right. This branch has already switched to doing that. All algebra tests
seem to be fine, but these tests are failing now; not sure why. Seems
functional to me.



[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088427#comment-14088427
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user pferrel commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51408987
  
Sorry, I was off the internet during a move (curse you, giant nameless cable
company!)

Anyway, these tests are substantially changed in
https://github.com/apache/mahout/pull/36, but I haven't been able to get the new
build until now. I will check and push 36 first.

As to building and tearing down contexts, I'm not helping things. For each
driver test, DistributedSparkSuite creates a context in beforeEach, so I use
that to start the test. The driver I am using also needs to start its own
context, so every time I call a driver I precede it with an "afterEach" call to
shut down the context, then call the driver, then call "beforeEach" to restore
the test context. I also had to tell the driver, through a special invisible
option ("--dontAddMahoutJars"), not to load Mahout jars. So the context is being
built 3 times for every test, but that hasn't changed; it's always been that way.

We could reuse a single context per test, but it would require disabling
some stuff in the driver along the lines of what I had to do with
"--dontAddMahoutJars". Since I've already had to do this, I don't think it would
be a big deal to disable a little more. I'll look at it once 36 is pushed.

Is there any reason to build the context more than once per suite? Seems 
like if I disable the context things in the driver we could run all tests in a 
single context, right?
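To make the context arithmetic in this comment concrete, here is a minimal plain-Scala sketch. `ContextCounter`, `FakeContext` and `ContextLifecycle` are hypothetical names, not Mahout or Spark classes; `FakeContext` only stands in for a SparkContext and counts how often one is built. It assumes the sequence described above: beforeEach builds a test context, afterEach stops it so the driver can build its own (Spark expects only one active SparkContext per JVM), then beforeEach builds the test context again. In a real ScalaTest suite the once-per-suite variant would hang off `beforeAll`/`afterAll` from the `BeforeAndAfterAll` trait.

```scala
// Hypothetical stand-in for a SparkContext: it only records how often one is built.
final class ContextCounter { var built = 0 }

final class FakeContext(counter: ContextCounter) {
  counter.built += 1
  private var stopped = false
  def stop(): Unit = stopped = true
  def isStopped: Boolean = stopped
}

object ContextLifecycle {

  // Current pattern: per-test fixture, plus a driver that builds its own context.
  def perTestWithDriver(tests: Int): Int = {
    val counter = new ContextCounter
    for (_ <- 1 to tests) {
      val testCtx = new FakeContext(counter)   // beforeEach: test context
      testCtx.stop()                           // afterEach: make room for the driver
      val driverCtx = new FakeContext(counter) // the driver starts its own context
      driverCtx.stop()
      val restored = new FakeContext(counter)  // beforeEach: restore the test context
      restored.stop()                          // afterEach at the end of the test
    }
    counter.built
  }

  // Proposed pattern: one context per suite, reused by every test and the driver.
  def sharedPerSuite(tests: Int): Int = {
    val counter = new ContextCounter
    val ctx = new FakeContext(counter)         // beforeAll
    for (_ <- 1 to tests) {
      require(!ctx.isStopped)                  // each test reuses the same context
    }
    ctx.stop()                                 // afterAll
    counter.built
  }
}

object ContextLifecycleDemo {
  def main(args: Array[String]): Unit = {
    println(ContextLifecycle.perTestWithDriver(10)) // 30
    println(ContextLifecycle.sharedPerSuite(10))    // 1
  }
}
```

With 10 driver tests the current pattern builds 30 contexts against 1 for a suite-wide context, which is the saving on offer, assuming the driver can be told to reuse the context it is handed.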



[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088160#comment-14088160
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51389335
  
@pferrel, perhaps you could look at ItemSimilaritySuite? It doesn't work on
Spark 1.0 here. I disabled the tests for now since they are failing.



[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086989#comment-14086989
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51276205
  
Also, tests run much slower although the CPU remains unsaturated. Something
about setting up and tearing down the local Spark context?



[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086975#comment-14086975
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


Github user dlyubimov commented on the pull request:

https://github.com/apache/mahout/pull/40#issuecomment-51274951
  
The itemsimilarity driver tests are failing on this.

ItemSimilarityDriverSuite:
113754 [ScalaTest-main-running-ItemSimilarityDriverSuite] DEBUG org.apache.mahout.sparkbindings.blas.AtA$  - Applying slim A'A.
114171 [ScalaTest-main-running-ItemSimilarityDriverSuite] DEBUG org.apache.mahout.sparkbindings.blas.AtB$  - A and B for A'B are not identically partitioned, performing inner join.
- ItemSimilarityDriver, non-full-spec CSV *** FAILED ***
  Set(iphone   galaxy:1.7260924347106847,iphone:1.7260924347106847,ipad:0.6795961471815897,nexus:0.6795961471815897,
      surface   surface:4.498681156950466,
      nexus   iphone:1.7260924347106847,ipad:0.6795961471815897,surface:0.6795961471815897,nexus:0.6795961471815897,galaxy:1.7260924347106847,
      ipad   galaxy:1.7260924347106847,iphone:1.7260924347106847,ipad:0.6795961471815897,nexus:0.6795961471815897,
      galaxy   galaxy:1.7260924347106847,iphone:1.7260924347106847,ipad:0.6795961471815897,nexus:0.6795961471815897)
  did not equal
  Set(nexus   nexus:0.6795961471815897,iphone:1.7260924347106847,ipad:0.6795961471815897,surface:0.6795961471815897,galaxy:1.7260924347106847,
      ipad   nexus:0.6795961471815897,iphone:1.7260924347106847,ipad:0.6795961471815897,galaxy:1.7260924347106847,
      surface   surface:4.498681156950466,
      iphone   nexus:0.6795961471815897,iphone:1.7260924347106847,ipad:0.6795961471815897,galaxy:1.7260924347106847,
      galaxy   nexus:0.6795961471815897,iphone:1.7260924347106847,ipad:0.6795961471815897,galaxy:1.7260924347106847)
  (ItemSimilarityDriverSuite.scala:142)


The rest seems to pass.
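Judging by the failure log, both sets hold the same rows with the same item:score pairs; only the order of the pairs inside each row differs (the galaxy row starts with `galaxy:` in the actual output and with `nexus:` in the expected one). That suggests the assertion at ItemSimilarityDriverSuite.scala:142 compares raw text lines whose pair order is not guaranteed. A minimal plain-Scala sketch of an order-insensitive check is below; `RowCompare`, `parseRow` and `sameRows` are hypothetical helpers, not Mahout APIs, and the sketch assumes each line is a row key, whitespace (a tab is assumed in the demo), then comma-separated `item:score` pairs, as the log suggests.

```scala
// Hypothetical helper, not part of Mahout: compares item-similarity text output
// without depending on the order of the "item:score" pairs within a row.
object RowCompare {

  // Parse a line such as "ipad\tgalaxy:1.72,iphone:1.72" into
  // (rowKey, Map(otherItem -> score)). Assumes well-formed "item:score" pairs.
  def parseRow(line: String): (String, Map[String, Double]) = {
    val parts = line.trim.split("\\s+", 2)
    val pairs =
      if (parts.length < 2) Map.empty[String, Double]
      else parts(1).split(",").iterator
        .map(_.trim)
        .filter(_.nonEmpty)
        .map { kv =>
          val idx = kv.lastIndexOf(':')
          kv.substring(0, idx) -> kv.substring(idx + 1).toDouble
        }
        .toMap
    parts(0) -> pairs
  }

  def parseAll(lines: Iterable[String]): Map[String, Map[String, Double]] =
    lines.map(parseRow).toMap

  // Same row keys, same items per row, scores equal within eps; order is ignored.
  def sameRows(actual: Iterable[String], expected: Iterable[String],
               eps: Double = 1e-9): Boolean = {
    val a = parseAll(actual)
    val e = parseAll(expected)
    a.keySet == e.keySet && a.forall { case (key, row) =>
      val other = e(key)
      row.keySet == other.keySet &&
        row.forall { case (item, score) => math.abs(score - other(item)) <= eps }
    }
  }
}

object RowCompareDemo {
  def main(args: Array[String]): Unit = {
    // Two rows from the failing log, with the pairs in a different order.
    val actual = Seq(
      "surface\tsurface:4.498681156950466",
      "galaxy\tgalaxy:1.7260924347106847,iphone:1.7260924347106847,ipad:0.6795961471815897,nexus:0.6795961471815897")
    val expected = Seq(
      "galaxy\tnexus:0.6795961471815897,iphone:1.7260924347106847,ipad:0.6795961471815897,galaxy:1.7260924347106847",
      "surface\tsurface:4.498681156950466")
    println(RowCompare.sameRows(actual, expected)) // true
    println(actual.toSet == expected.toSet)        // false: raw strings differ in pair order
  }
}
```

Parsing into nested maps rather than sorting the pairs inside each string also absorbs small floating-point differences through `eps`, which may matter if scores are computed in a different order on another Spark version.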



[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085509#comment-14085509
 ] 

ASF GitHub Bot commented on MAHOUT-1603:


GitHub user dlyubimov opened a pull request:

https://github.com/apache/mahout/pull/40

MAHOUT-1603: Tweaks for Spark 1.0.x

For folks who (like me) got tired of waiting for Mahout data frames support 
and would like to run Spark SQL expressions directly in the Mahout Spark shell. 

(you can thank me later)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dlyubimov/mahout spark-1.0.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/40.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #40


commit 13e909b58eaa89e212415318655dbe82ef982323
Author: Dmitriy Lyubimov 
Date:   2014-08-04T22:00:59Z

Initial migration.



