Re: Upgrade to spark 1.0.x
+1 Seems like we ought to keep up with the bleeding edge until the next Mahout release; that's when the pain of upgrading gets spread much wider. In fact, if Spark moves to Scala 2.11 before our release, we should probably consider upgrading Scala too.
Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #910
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/910/

[...truncated 196 lines...]
        at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:382)
        ... 31 more
Caused by: svn: E175002: OPTIONS request failed on '/repos/asf/mahout/trunk'
        at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
        at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection._request(HTTPConnection.java:775)
        ... 32 more
Caused by: svn: E175002: timed out waiting for server
        at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
        at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection._request(HTTPConnection.java:514)
        ... 32 more
Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618)
        at org.tmatesoft.svn.core.internal.util.SVNSocketConnection.run(SVNSocketConnection.java:57)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        ... 4 more

java.io.IOException: remote file operation failed: https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/ws/ at hudson.remoting.Channel@4cbb2332:ubuntu-5
        at hudson.FilePath.act(FilePath.java:910)
        at hudson.FilePath.act(FilePath.java:887)
        at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:936)
        at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:871)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1414)
        at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:671)
        at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:580)
        at hudson.model.Run.execute(Run.java:1676)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:231)
Caused by: java.io.IOException: Failed to check out https://svn.apache.org/repos/asf/mahout/trunk
        at hudson.scm.subversion.CheckoutUpdater$1.perform(CheckoutUpdater.java:110)
        at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:161)
        at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:1030)
        at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1011)
        at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:987)
        at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2462)
        at hudson.remoting.UserRequest.perform(UserRequest.java:118)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:328)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.tmatesoft.svn.core.SVNException: svn: E175002: OPTIONS /repos/asf/mahout/trunk failed
        at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:388)
        at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:373)
        at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:361)
        at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest(DAVConnection.java:707)
        at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:627)
        at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:102)
        at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1020)
        at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:180)
        at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.getRevisionNumber(SVNBasicDelegate.java:480)
        at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.getLocations(SVNBasicDelegate.java:833)
        at
Re: Upgrade to spark 1.0.x
+1 Until we release a version that uses Spark, we should stay with whatever helps us most. Once a release goes out, tracking whichever version of Spark the big distros ship becomes more important.

On Sat, Aug 9, 2014 at 9:57 AM, Pat Ferrel pat.fer...@gmail.com wrote:
> +1 Seems like we ought to keep up with the bleeding edge until the next Mahout release; that's when the pain of upgrading gets spread much wider. In fact, if Spark moves to Scala 2.11 before our release, we should probably consider upgrading Scala too.
Re: Upgrade to spark 1.0.x
+1 1.0.0 is recommended. The releases after it have had problems: 1.0.1 had a short test cycle, and 1.0.2 apparently reverted many fixes because they caused more serious problems.

On 14-08-09 04:51 PM, Ted Dunning wrote:
> +1 Until we release a version that uses Spark, we should stay with whatever helps us most. Once a release goes out, tracking whichever version of Spark the big distros ship becomes more important.
>
> On Sat, Aug 9, 2014 at 9:57 AM, Pat Ferrel pat.fer...@gmail.com wrote:
>> +1 Seems like we ought to keep up with the bleeding edge until the next Mahout release; that's when the pain of upgrading gets spread much wider. In fact, if Spark moves to Scala 2.11 before our release, we should probably consider upgrading Scala too.
bugs in NetflixDatasetConverter.java ?
Hi,

I was trying to use NetflixDatasetConverter.java to prep training/probing data for ALSWR. I have obtained the Netflix data, and I got the following exception:

Exception in thread "main" java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:161)
        at org.apache.mahout.cf.taste.hadoop.example.als.netflix.NetflixDatasetConverter.main(NetflixDatasetConverter.java:135)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:76)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:607)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:76)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:607)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Having looked at the code, I have some doubts. I am not sure whether Netflix's data set uses different names for the same files. The dataset I obtained contains training_set (a directory with 100M data points), qualifying.txt (2.8M data points), and probe.txt (1.4M data points). According to the Netflix readme, training_set is a superset of probe.txt, and qualifying.txt is what contestants used to submit their predictions (thus, no ground truth is given for qualifying.txt). There is no file called judging.txt, as suggested by NetflixDatasetConverter.java's help. I gambled on probe.txt being the judging.txt, but I got the above exception.
On a side note, the naming of the variable probes is a bit confusing to me, since it is populated by reading the file named qualifying.txt, while there is an actual file named probe.txt (at least from Netflix). But what really matters is line 133:

    float rating = Float.parseFloat(SEPARATOR.split(line)[0]);

This implies that judging.txt should contain the actual rating for each (user, movie) pair, which is not true of probe.txt (it contains no rating information). Also, lines 134-136:

    Preference pref = probes.get(ratingsProcessed);
    Preconditions.checkState(pref.getItemID() == currentMovieID);
    ratingsProcessed++;

seem to imply that qualifying.txt and judging.txt (probe.txt?) contain exactly the same (user, movie) pairs, the only difference being that judging.txt has the ratings while qualifying.txt doesn't. That contradicts what probe.txt contains, and the fact that probe.txt and qualifying.txt should not overlap.

So what is this judging.txt file that I am supposed to provide, and where can I get it? Could anybody provide some pointers?

Thanks,
Wei
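P.S. To make concrete what I think lines 133-136 assume, here is a minimal, self-contained sketch of that positional matching (the class, method, and field names below are my own invention, not Mahout's). It passes only when the judging file mirrors the qualifying file movie-for-movie and line-for-line, which would explain why feeding it probe.txt trips the checkState:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the alignment NetflixDatasetConverter appears to assume. */
public class NetflixAlignmentSketch {

    /** A (movieID, userID) pair parsed from qualifying.txt-style input. */
    static final class Probe {
        final long movieId;
        final long userId;
        Probe(long movieId, long userId) {
            this.movieId = movieId;
            this.userId = userId;
        }
    }

    /**
     * Parses Netflix-style lines: "123:" starts movie 123; each following
     * line is "userID,date" for that movie.
     */
    static List<Probe> parseQualifying(List<String> lines) {
        List<Probe> probes = new ArrayList<>();
        long currentMovieId = -1;
        for (String line : lines) {
            if (line.endsWith(":")) {
                currentMovieId = Long.parseLong(line.substring(0, line.length() - 1));
            } else {
                probes.add(new Probe(currentMovieId, Long.parseLong(line.split(",")[0])));
            }
        }
        return probes;
    }

    /**
     * Mimics the converter's loop: the judging file is read in order, and each
     * "rating,date" line is matched positionally against the probes list.
     * Returns false exactly where Preconditions.checkState() would throw.
     */
    static boolean alignsWithJudging(List<Probe> probes, List<String> judgingLines) {
        long currentMovieId = -1;
        int ratingsProcessed = 0;
        for (String line : judgingLines) {
            if (line.endsWith(":")) {
                currentMovieId = Long.parseLong(line.substring(0, line.length() - 1));
            } else {
                // float rating = Float.parseFloat(line.split(",")[0]);  // rating is the first field
                if (ratingsProcessed >= probes.size()
                        || probes.get(ratingsProcessed).movieId != currentMovieId) {
                    return false;  // the checkState(pref.getItemID() == currentMovieID) failure
                }
                ratingsProcessed++;
            }
        }
        return ratingsProcessed == probes.size();
    }
}
```

If this reading is right, judging.txt would have to be an answer key aligned line-for-line with qualifying.txt, not probe.txt.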