Re: Upgrade to spark 1.0.x

2014-08-09 Thread Pat Ferrel
+1

Seems like we ought to keep up to the bleeding edge until the next Mahout
release; that’s when the pain of an upgrade gets spread much wider. In fact, if
Spark gets moved to Scala 2.11 before our release, we should probably consider
upgrading Scala too.

Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #910

2014-08-09 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/910/

--
[...truncated 196 lines...]
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:382)
... 31 more
Caused by: svn: E175002: OPTIONS request failed on '/repos/asf/mahout/trunk'
at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection._request(HTTPConnection.java:775)
... 32 more
Caused by: svn: E175002: timed out waiting for server
at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection._request(HTTPConnection.java:514)
... 32 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618)
at org.tmatesoft.svn.core.internal.util.SVNSocketConnection.run(SVNSocketConnection.java:57)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
... 4 more
java.io.IOException: remote file operation failed: https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/ws/ at hudson.remoting.Channel@4cbb2332:ubuntu-5
at hudson.FilePath.act(FilePath.java:910)
at hudson.FilePath.act(FilePath.java:887)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:936)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:871)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1414)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:671)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:580)
at hudson.model.Run.execute(Run.java:1676)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:231)
Caused by: java.io.IOException: Failed to check out https://svn.apache.org/repos/asf/mahout/trunk
at hudson.scm.subversion.CheckoutUpdater$1.perform(CheckoutUpdater.java:110)
at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:161)
at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:1030)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1011)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:987)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2462)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:328)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.tmatesoft.svn.core.SVNException: svn: E175002: OPTIONS /repos/asf/mahout/trunk failed
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:388)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:373)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:361)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest(DAVConnection.java:707)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:627)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:102)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1020)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:180)
at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.getRevisionNumber(SVNBasicDelegate.java:480)
at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.getLocations(SVNBasicDelegate.java:833)
at

Re: Upgrade to spark 1.0.x

2014-08-09 Thread Ted Dunning
+1

Until we release a version that uses Spark, we should stay with what helps
us. Once a release goes out, tracking whichever version of Spark the
big distros put out becomes more important.



On Sat, Aug 9, 2014 at 9:57 AM, Pat Ferrel pat.fer...@gmail.com wrote:

 +1

 Seems like we ought to keep up to the bleeding edge until the next Mahout
 release; that’s when the pain of an upgrade gets spread much wider. In fact, if
 Spark gets moved to Scala 2.11 before our release, we should probably
 consider upgrading Scala too.


Re: Upgrade to spark 1.0.x

2014-08-09 Thread Peng Cheng

+1

1.0.0 is recommended. Many releases after 1.0.1 had a short test cycle,
and 1.0.2 apparently reverted many fixes because they caused more serious problems.


On 14-08-09 04:51 PM, Ted Dunning wrote:

+1

Until we release a version that uses Spark, we should stay with what helps
us. Once a release goes out, tracking whichever version of Spark the
big distros put out becomes more important.



On Sat, Aug 9, 2014 at 9:57 AM, Pat Ferrel pat.fer...@gmail.com wrote:


+1

Seems like we ought to keep up to the bleeding edge until the next Mahout
release; that’s when the pain of an upgrade gets spread much wider. In fact, if
Spark gets moved to Scala 2.11 before our release, we should probably
consider upgrading Scala too.




bugs in NetflixDatasetConverter.java ?

2014-08-09 Thread Wei Zhang


Hi,

I was trying to use NetflixDatasetConverter.java to prep training/probing
data for ALSWR.

I have obtained the netflix data.

I got the following exception:

Exception in thread "main" java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:161)
at org.apache.mahout.cf.taste.hadoop.example.als.netflix.NetflixDatasetConverter.main(NetflixDatasetConverter.java:135)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:76)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:607)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:76)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:607)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
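
For reference, Guava's Preconditions.checkState throws an IllegalStateException with no detail message when its condition is false, which is why the exception above carries no explanation. A minimal stand-in (not Guava itself, just a sketch of the behavior) looks like this:

```java
public class CheckStateDemo {

    // Minimal stand-in for Guava's Preconditions.checkState: throws a
    // message-less IllegalStateException when the condition is false.
    public static void checkState(boolean expression) {
        if (!expression) {
            throw new IllegalStateException();
        }
    }

    public static void main(String[] args) {
        checkState(true);       // passes silently
        try {
            checkState(false);  // throws, like line 135 of the converter
        } catch (IllegalStateException e) {
            // No detail message, matching the bare IllegalStateException in the trace
            System.out.println("caught: " + e);
        }
    }
}
```

So the trace only tells us that the boolean at NetflixDatasetConverter.java:135 was false, not why.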


I have looked at the code, and I have some doubts:

I am not sure whether Netflix's data set uses different names for the same
file.

For the dataset I obtained, I got training_set (a directory containing 100M
data points), qualifying.txt (2.8M data points), and probe.txt (1.4M data
points).

According to the Netflix readme, training_set is a superset of probe.txt,
and qualifying.txt is used by contestants to submit their predictions (thus, no
ground truth is given for qualifying.txt).

There is no such file called judging.txt, as suggested by
NetflixDatasetConverter.java's help text. I gambled on probe.txt being the
judging.txt.

However, I got the above exception.

On a side note, the naming of the variable probes is a bit confusing to me,
as it is created by reading the file named qualifying.txt,
while there is an actual file named probe.txt (at least from Netflix).

But what really matters is that at line 133

 float rating = Float.parseFloat(SEPARATOR.split(line)[0]);

This implies judging.txt should contain the actual rating of each (user,
movie) pair, which is not true for probe.txt (it doesn't contain such rating
information).
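
A sketch of what that line expects, with the separator and file layout being my assumptions rather than details confirmed from the converter or the Netflix files:

```java
import java.util.regex.Pattern;

public class JudgingLineParse {

    // Assumed to split on commas; the converter defines its own SEPARATOR
    // Pattern, which may differ.
    private static final Pattern SEPARATOR = Pattern.compile(",");

    // Mirrors line 133: the first separated field of a judging line is taken
    // to be the rating, so every data line must lead with a number.
    public static float parseRating(String line) {
        return Float.parseFloat(SEPARATOR.split(line)[0]);
    }

    public static void main(String[] args) {
        // A hypothetical judging line of the form "rating,date"
        System.out.println(parseRating("4.0,2005-09-06"));
        // A probe.txt-style movie header such as "2343:" would instead throw
        // NumberFormatException, and a bare user ID line would silently parse
        // as a bogus rating.
    }
}
```

That mismatch between "first field is a rating" and what probe.txt actually contains is why substituting probe.txt seems doomed regardless of the later checkState failure.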

Also, lines 134-136:
Preference pref = probes.get(ratingsProcessed);
Preconditions.checkState(pref.getItemID() == currentMovieID);
ratingsProcessed++;

seem to imply that qualifying.txt and judging.txt (probe.txt) contain exactly
the same (user, movie) pairs; the difference is that judging.txt has the
rating and qualifying.txt doesn't.

This seems to go against what probe.txt contains and the fact that
probe.txt and qualifying.txt are not supposed to overlap.
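
To make the implied contract concrete, here is a hypothetical sketch (the method and variable names are mine, not the converter's) of the lockstep iteration that lines 134-136 appear to assume:

```java
import java.util.Arrays;
import java.util.List;

public class ParallelCheck {

    // Hypothetical sketch of the check at lines 134-136: the converter walks
    // the qualifying data and the judging data in lockstep, asserting that
    // the n-th movie ID in each stream is identical.
    public static int countMatched(List<Long> qualifyingMovieIds,
                                   List<Long> judgingMovieIds) {
        int ratingsProcessed = 0;
        for (long currentMovieID : qualifyingMovieIds) {
            // Mirrors Preconditions.checkState(pref.getItemID() == currentMovieID)
            if (judgingMovieIds.get(ratingsProcessed) != currentMovieID) {
                throw new IllegalStateException("mismatch at entry " + ratingsProcessed);
            }
            ratingsProcessed++;
        }
        return ratingsProcessed;
    }

    public static void main(String[] args) {
        // Aligned streams pass the check for every entry
        System.out.println(countMatched(Arrays.asList(8L, 8L, 12L),
                                        Arrays.asList(8L, 8L, 12L)));
    }
}
```

If probe.txt entries are fed in where judging data is expected, the first non-matching movie ID triggers exactly this kind of IllegalStateException, consistent with the trace above.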

So what is this judging.txt file that I am supposed to provide, and where
can I get it? Could anybody provide some pointers?

Thanks,

Wei