[ https://issues.apache.org/jira/browse/MAHOUT-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606414#comment-14606414 ]
ASF GitHub Bot commented on MAHOUT-1653:
----------------------------------------

Github user andrewpalumbo commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/136#discussion_r33514280

    --- Diff: spark/src/main/scala/org/apache/mahout/sparkbindings/drm/CheckpointedDrmSpark.scala ---
    @@ -165,7 +168,14 @@ class CheckpointedDrmSpark[K: ClassTag](
          else if (classOf[Writable].isAssignableFrom(ktag.runtimeClass)) (x: K) => x.asInstanceOf[Writable]
          else throw new IllegalArgumentException("Do not know how to convert class tag %s to Writable.".format(ktag))
    -    rdd.saveAsSequenceFile(path)
    --- End diff --

    That is actually using the non-deprecated `.saveAsSequenceFile(path)`. I'm just suggesting that we could skip all of the implicit conversions and explicitly map the RDD to Writables ourselves, then call `.saveAsSequenceFile(path)` on the resulting RDD of e.g. `[IntWritable, VectorWritable]`.

    This is actually what Spark does in `.saveAsSequenceFile(path)`:

    https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala#L97

    If either the key or the value is not a `Writable`, it converts one or the other (or both) to a `Writable` using e.g.:

    ```scala
    self.map(x => (anyToWritable(x._1), anyToWritable(x._2)))
    ```

    and then calls `.saveAsHadoopFile(...)` on the mapped RDD. If it detects that both are already Writables, though, as would be the case if we mapped them explicitly, it simply calls `.saveAsHadoopFile(...)`. So by mapping them ourselves in `.dfsWrite(...)` we shouldn't incur any additional overhead. Actually, we may just be able to call `.saveAsHadoopFile(...)` directly on a mapped-to-Writable RDD from `.dfsWrite(...)`.

> Spark 1.3
> ---------
>
>                 Key: MAHOUT-1653
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1653
>             Project: Mahout
>          Issue Type: Dependency upgrade
>            Reporter: Andrew Musselman
>            Assignee: Andrew Palumbo
>             Fix For: 0.11.0
>
>
> Support Spark 1.3

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
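The pattern the comment describes (select a `K => Writable` conversion from the key's runtime class via its `ClassTag`, then map the pairs explicitly before writing) can be sketched without Spark or Hadoop on the classpath. The `Writable`, `IntWritable`, `TextWritable`, `keyToWritable`, and `toWritablePairs` names below are simplified stand-ins invented for this sketch, not the actual Hadoop or Mahout types; they only illustrate the dispatch in `dfsWrite`:

```scala
import scala.reflect.ClassTag

// Toy stand-ins for org.apache.hadoop.io.Writable and friends,
// defined here only so the sketch is self-contained and runnable.
trait Writable
case class IntWritable(value: Int) extends Writable
case class TextWritable(value: String) extends Writable

object WritableSketch {
  // Pick a key-conversion function from the runtime class of K,
  // mirroring the ClassTag-based dispatch in CheckpointedDrmSpark.dfsWrite.
  def keyToWritable[K](implicit ktag: ClassTag[K]): K => Writable =
    if (ktag.runtimeClass == classOf[Int])
      (k: K) => IntWritable(k.asInstanceOf[Int])
    else if (ktag.runtimeClass == classOf[String])
      (k: K) => TextWritable(k.asInstanceOf[String])
    else if (classOf[Writable].isAssignableFrom(ktag.runtimeClass))
      // Key is already a Writable: pass it through unchanged.
      (k: K) => k.asInstanceOf[Writable]
    else
      throw new IllegalArgumentException(
        s"Do not know how to convert class tag $ktag to Writable.")

  // Explicitly map the pairs to Writables, as the comment suggests doing
  // before calling .saveAsSequenceFile / .saveAsHadoopFile on the RDD.
  def toWritablePairs[K: ClassTag, V <: Writable](
      pairs: Seq[(K, V)]): Seq[(Writable, V)] = {
    val convert = keyToWritable[K]
    pairs.map { case (k, v) => (convert(k), v) }
  }
}
```

With both key and value already `Writable` after such a map, Spark's `saveAsSequenceFile` would detect that and call `saveAsHadoopFile` directly without its own `anyToWritable` mapping, which is why the explicit conversion adds no extra pass over the data.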