[jira] [Updated] (SPARK-4320) JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object

Josh Rosen (JIRA) Mon, 15 Dec 2014 16:15:25 -0800

     [ https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Josh Rosen updated SPARK-4320:
------------------------------
    Fix Version/s:     (was: 1.1.1)
                   1.1.2

> JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object 
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4320
>                 URL: https://issues.apache.org/jira/browse/SPARK-4320
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Spark Core
>            Reporter: Corey J. Nolet
>             Fix For: 1.2.0, 1.1.2
>
>
> I am writing data to Accumulo using a custom OutputFormat. I have tried
> using saveAsNewHadoopFile(), and that works, though passing an empty path is
> a bit weird. Since what I'm storing isn't really a file but rather a
> generic Pair dataset, I'd be inclined to use the saveAsHadoopDataset()
> method, though I'm not at all interested in using the legacy mapred API.
> Perhaps we could supply a saveAsNewHadoopDataset method. Personally, I think
> there should be two ways of calling into this method. Instead of forcing the 
> user to always set up the Job object explicitly, I'm in the camp of having 
> the following method signature:
> saveAsNewHadoopDataset(keyClass : Class[K], valueClass : Class[V], ofclass : 
> Class[? extends OutputFormat], conf : Configuration). This way, if I'm 
> writing spark jobs that are going from Hadoop back into Hadoop, I can 
> construct my Configuration once.
> Perhaps an overloaded method signature could be:
> saveAsNewHadoopDataset(job : Job)
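
The two call shapes proposed above can be sketched in plain Java as follows.
This is only an illustration of the overload design, not Spark's or Hadoop's
actual API: Configuration, OutputFormat, Job, TableOutputFormat and
PairDataset are hypothetical stand-ins defined inside the example, and the
"sketch.table" property name is made up. The assumption is that both
overloads funnel into one shared write path, so a caller can pass either a
Configuration built once or a fully prepared Job.

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained sketch: every type below is a HYPOTHETICAL stand-in, not the
// real Hadoop or Spark class of the same name.
class DatasetApiSketch {

    /** Stand-in for a Hadoop-style Configuration (a simple key/value bag). */
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    /** Stand-in for the new-API OutputFormat. */
    interface OutputFormat<K, V> {}

    /** Stand-in for a custom, non-file output format such as the reporter's. */
    static class TableOutputFormat implements OutputFormat<String, Long> {}

    /** Stand-in for a Job: a Configuration plus the output classes. */
    static class Job {
        private final Configuration conf = new Configuration();
        Class<?> outputKeyClass;
        Class<?> outputValueClass;
        Class<? extends OutputFormat<?, ?>> outputFormatClass;
        Configuration getConfiguration() { return conf; }
    }

    /** Stand-in for a pair dataset exposing both proposed overloads. */
    static class PairDataset<K, V> {

        // Proposed form 1: classes and Configuration passed directly.
        String saveAsNewHadoopDataset(Class<K> keyClass, Class<V> valueClass,
                Class<? extends OutputFormat<?, ?>> ofClass, Configuration conf) {
            return write(keyClass, valueClass, ofClass, conf);
        }

        // Proposed form 2: everything is read from a pre-built Job.
        String saveAsNewHadoopDataset(Job job) {
            return write(job.outputKeyClass, job.outputValueClass,
                    job.outputFormatClass, job.getConfiguration());
        }

        // Shared path. A real implementation would write records here;
        // this sketch only reports what it was asked to do.
        private String write(Class<?> keyClass, Class<?> valueClass,
                Class<? extends OutputFormat<?, ?>> ofClass, Configuration conf) {
            return "key=" + keyClass.getSimpleName()
                    + " value=" + valueClass.getSimpleName()
                    + " format=" + ofClass.getSimpleName()
                    + " table=" + conf.get("sketch.table");
        }
    }

    public static void main(String[] args) {
        PairDataset<String, Long> pairs = new PairDataset<>();

        // Form 1: build the Configuration once and pass it with the classes.
        Configuration conf = new Configuration();
        conf.set("sketch.table", "events"); // hypothetical property name
        System.out.println(pairs.saveAsNewHadoopDataset(
                String.class, Long.class, TableOutputFormat.class, conf));

        // Form 2: set everything on a Job and pass only the Job.
        Job job = new Job();
        job.getConfiguration().set("sketch.table", "events");
        job.outputKeyClass = String.class;
        job.outputValueClass = Long.class;
        job.outputFormatClass = TableOutputFormat.class;
        System.out.println(pairs.saveAsNewHadoopDataset(job));
    }
}
```

Both calls in main describe the same write, which is the point of the
proposal: the Job form is a convenience over the explicit form, and neither
requires a dummy output path.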



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
