[ https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337511#comment-14337511 ]
Sean Owen commented on SPARK-4320:
----------------------------------

So these methods all already take a {{Configuration}} or {{JobConf}}, not a {{Job}}. The Java API mirrors the Scala API.

It sounds like you're asking for a method that takes a {{Configuration}} in the first instance. It 'exists' in that you can set these values on the {{Configuration}} object, and I assume the idea was to bother with only one overload for this completely generic case. Internally it makes a {{Job}}.

What does being able to pass a {{Job}} get you that a {{Configuration}} doesn't? (There may be something; I forget.)

> JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4320
>                 URL: https://issues.apache.org/jira/browse/SPARK-4320
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Spark Core
>            Reporter: Corey J. Nolet
>
> I am outputting data to Accumulo using a custom OutputFormat. I have tried
> using saveAsNewHadoopFile() and that works, though passing an empty path is a
> bit weird. Since it isn't really a file I'm storing, but rather a
> generic Pair dataset, I'd be inclined to use the saveAsHadoopDataset()
> method, though I'm not at all interested in using the legacy mapred API.
> Perhaps we could supply a saveAsNewHadoopDataset method. Personally, I think
> there should be two ways of calling into this method. Instead of forcing the
> user to always set up the Job object explicitly, I'm in the camp of having
> the following method signature:
> saveAsNewHadoopDataset(keyClass : Class[K], valueClass : Class[V], ofclass :
> Class[? extends OutputFormat], conf : Configuration). This way, if I'm
> writing spark jobs that are going from Hadoop back into Hadoop, I can
> construct my Configuration once.
> Perhaps an overloaded method signature could be:
> saveAsNewHadoopDataset(job : Job)
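
For illustration, here is a minimal sketch of the point made in the comment above: the output settings can be placed on a {{Configuration}} and passed to the existing {{saveAsNewAPIHadoopDataset(conf)}} method, with a {{Job}} used only as a typed builder for that configuration. The object name, the helper {{outputConfiguration}}, and the output directory argument are hypothetical. {{TextOutputFormat}} stands in for the reporter's custom Accumulo output format, which is not shown in this thread. The sketch assumes Spark 1.3 or later and the Hadoop 2.x {{mapreduce}} API.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}
import org.apache.spark.{SparkConf, SparkContext}

object SaveViaJobConfiguration {

  // Hypothetical helper (not part of Spark): use a Job purely as a typed
  // builder, then hand back the Configuration it carries.
  def outputConfiguration(base: Configuration, outputDir: String): Configuration = {
    // Job.getInstance copies `base`, so the setters below do not modify it.
    val job = Job.getInstance(base)
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    // Stand-in for a custom OutputFormat such as the Accumulo one in the issue.
    job.setOutputFormatClass(classOf[TextOutputFormat[Text, IntWritable]])
    FileOutputFormat.setOutputPath(job, new Path(outputDir))
    // Return the Job's own copy, which is the one holding the settings above.
    job.getConfiguration
  }

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("save-via-job-conf").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    try {
      // args(0) is an output directory that must not exist yet.
      val conf = outputConfiguration(sc.hadoopConfiguration, args(0))
      sc.parallelize(Seq("a" -> 1, "b" -> 2))
        // Convert to Writables right before saving, so they are never shuffled.
        .map { case (k, v) => (new Text(k), new IntWritable(v)) }
        .saveAsNewAPIHadoopDataset(conf)
    } finally {
      sc.stop()
    }
  }
}
```

Note that the configuration passed to Spark is {{job.getConfiguration}}, not the one originally given to {{Job.getInstance}}: the {{Job}} works on its own copy, so values set through the {{Job}} setters do not appear on the original object. From Java, the same pattern should apply with {{JavaPairRDD.saveAsNewAPIHadoopDataset(job.getConfiguration())}}.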