[jira] [Commented] (SPARK-4320) JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object

Sean Owen (JIRA) Wed, 25 Feb 2015 15:54:20 -0800

    [ https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337511#comment-14337511 ]


Sean Owen commented on SPARK-4320:
----------------------------------

These methods all already take a {{Configuration}} or {{JobConf}}, not a 
{{Job}}, and the Java API mirrors the Scala API. It sounds like you're asking 
for a method that takes a {{Configuration}} to begin with. That already exists, 
in the sense that you can set these values on the {{Configuration}} object; I 
assume the idea was to provide only one overload for this completely generic 
case. Internally it creates a {{Job}}. What does being able to pass a {{Job}} 
get you that a {{Configuration}} doesn't? (There may be something; I don't 
recall.)
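
For illustration only, here is a minimal sketch of setting those values on a {{Configuration}} and handing it to the existing {{saveAsNewAPIHadoopDataset(Configuration)}} method on {{JavaPairRDD}}. The class name {{NewApiDatasetSave}} and its two helper methods are hypothetical, not part of Spark, and the sketch assumes Hadoop 2.x (for {{Job.getInstance}}) with Spark and Hadoop on the classpath. It uses a {{Job}} only as a convenient way to write the output classes into a copy of the configuration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.spark.api.java.JavaPairRDD;

// Hypothetical helper class; nothing here is provided by Spark itself.
public final class NewApiDatasetSave {
    private NewApiDatasetSave() {}

    // Records the key, value, and OutputFormat classes in a copy of `base`.
    // Job.getInstance copies the Configuration, so `base` is left unchanged.
    @SuppressWarnings("rawtypes")
    public static Configuration outputConf(
            Configuration base,
            Class<?> keyClass,
            Class<?> valueClass,
            Class<? extends OutputFormat> formatClass) throws IOException {
        Job job = Job.getInstance(base);
        job.setOutputKeyClass(keyClass);
        job.setOutputValueClass(valueClass);
        job.setOutputFormatClass(formatClass);
        return job.getConfiguration();
    }

    // Saves the pairs through the new (mapreduce) API without an output path.
    @SuppressWarnings("rawtypes")
    public static <K, V> void save(
            JavaPairRDD<K, V> pairs,
            Configuration base,
            Class<K> keyClass,
            Class<V> valueClass,
            Class<? extends OutputFormat> formatClass) throws IOException {
        pairs.saveAsNewAPIHadoopDataset(
                outputConf(base, keyClass, valueClass, formatClass));
    }
}
```

Any settings specific to the OutputFormat (connection details for a store such as Accumulo, for example) would need to be set on {{base}} before calling {{save}}; which keys those are depends on the OutputFormat in use.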

> JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object 
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4320
>                 URL: https://issues.apache.org/jira/browse/SPARK-4320
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Spark Core
>            Reporter: Corey J. Nolet
>
> I am outputting data to Accumulo using a custom OutputFormat. I have tried 
> using saveAsNewHadoopFile() and that works, though passing an empty path is a 
> bit weird. Since it isn't really a file I'm storing, but rather a generic 
> Pair dataset, I'd be inclined to use the saveAsHadoopDataset() method, though 
> I'm not at all interested in using the legacy mapred API.
> Perhaps we could supply a saveAsNewHadoopDataset method. Personally, I think 
> there should be two ways of calling into this method. Instead of forcing the 
> user to always set up the Job object explicitly, I'm in the camp of having 
> the following method signature:
> saveAsNewHadoopDataset(keyClass : Class[K], valueClass : Class[V], ofclass : 
> Class[? extends OutputFormat], conf : Configuration)
> This way, if I'm writing Spark jobs that are going from Hadoop back into 
> Hadoop, I can construct my Configuration once.
> Perhaps an overloaded method signature could be:
> saveAsNewHadoopDataset(job : Job)



