[jira] [Commented] (SPARK-4320) JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object

2015-02-25 Thread Corey J. Nolet (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337540#comment-14337540 ]

Corey J. Nolet commented on SPARK-4320:
---

Sorry, this ticket should have been closed a while ago. I'll go ahead and close
it now.

 JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object 
 

 Key: SPARK-4320
 URL: https://issues.apache.org/jira/browse/SPARK-4320
 Project: Spark
  Issue Type: Improvement
  Components: Input/Output, Spark Core
Reporter: Corey J. Nolet

 I am outputting data to Accumulo using a custom OutputFormat. I have tried
 using saveAsNewHadoopFile(), and that works, though passing an empty path is
 a bit odd. Since what I'm storing isn't really a file but a generic Pair
 dataset, I'd be inclined to use the saveAsHadoopDataset() method, though I'm
 not at all interested in using the legacy mapred API.
 Perhaps we could supply a saveAsNewHadoopDataset method. Personally, I think
 there should be two ways of calling into this method. Instead of forcing the
 user to always set up the Job object explicitly, I'm in the camp of having
 the following method signature:
 saveAsNewHadoopDataset(keyClass: Class[K], valueClass: Class[V], ofclass:
 Class[_ <: OutputFormat], conf: Configuration). This way, if I'm writing
 Spark jobs that go from Hadoop back into Hadoop, I can construct my
 Configuration once.
 Perhaps an overloaded method signature could be:
 saveAsNewHadoopDataset(job: Job)
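
 The pair of overloads proposed above could be sketched as follows. This is a
 hypothetical illustration, not Spark's actual API: the Configuration, Job,
 and OutputFormat types below are local stand-ins for Hadoop's classes so the
 sketch is self-contained, and PairDataset stands in for the RDD class that
 would host the methods.

```java
// Local stand-ins for Hadoop's Configuration, Job, and OutputFormat.
class Configuration {}

class Job {
    final Configuration conf;
    Job(Configuration conf) { this.conf = conf; }
}

interface OutputFormat<K, V> {}

// Hypothetical host for the two proposed overloads.
class PairDataset<K, V> {
    // Overload 1: classes plus a Configuration; the Job is built internally,
    // so a caller reusing one Configuration never touches the Job API.
    void saveAsNewHadoopDataset(Class<K> keyClass,
                                Class<V> valueClass,
                                Class<? extends OutputFormat<K, V>> ofClass,
                                Configuration conf) {
        saveAsNewHadoopDataset(new Job(conf));
    }

    // Overload 2: a fully configured Job, for callers that already have one.
    void saveAsNewHadoopDataset(Job job) {
        // a real implementation would write the pairs via the OutputFormat
    }
}

public class ProposalSketch {
    public static void main(String[] args) {
        // Confirm via reflection that both overloads are declared.
        long overloads = java.util.Arrays
                .stream(PairDataset.class.getDeclaredMethods())
                .filter(m -> m.getName().equals("saveAsNewHadoopDataset"))
                .count();
        System.out.println("overloads: " + overloads);
    }
}
```

 The first overload delegating to the second keeps the two entry points
 consistent: the signature-based form is just convenience sugar over the
 Job-based one.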



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4320) JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object

2015-02-25 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337511#comment-14337511 ]

Sean Owen commented on SPARK-4320:
--

So these methods all already take a {{Configuration}} or {{JobConf}}, not 
{{Job}}. The Java API mirrors the Scala API. It sounds like you're asking for a 
method that takes {{Configuration}} in the first instance. It 'exists' in that 
you can set these values on the {{Configuration}} object, and I assume the idea 
was to only bother with one overload for this completely generic case. 
Internally it makes a {{Job}}. What does being able to pass a {{Job}} get you 
that a {{Configuration}} doesn't? (There may be something; I forget.)
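
The Configuration-driven route described above can be sketched like this. The
StubConfiguration class is a local stand-in for Hadoop's Configuration so the
snippet is self-contained, and the Accumulo format class name is a made-up
example; the mapreduce.* property names are the standard new-API keys a Job
reads when it is constructed from a Configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for org.apache.hadoop.conf.Configuration.
class StubConfiguration {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

public class ConfDrivenSave {
    public static void main(String[] args) {
        StubConfiguration conf = new StubConfiguration();
        // Everything the class-by-class overload would accept as parameters
        // can be set on the Configuration once, up front:
        conf.set("mapreduce.job.outputformat.class",
                 "com.example.AccumuloOutputFormat");                 // hypothetical format
        conf.set("mapreduce.job.output.key.class",
                 "org.apache.hadoop.io.Text");
        conf.set("mapreduce.job.output.value.class",
                 "org.apache.accumulo.core.data.Mutation");
        // rdd.saveAsNewAPIHadoopDataset(conf)  // existing call; builds a Job internally
        System.out.println(conf.get("mapreduce.job.outputformat.class"));
    }
}
```

With the classes carried in the Configuration, one generic overload covers the
fully configured case, which is the trade-off the comment above describes.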







[jira] [Commented] (SPARK-4320) JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object

2014-11-12 Thread Corey J. Nolet (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208084#comment-14208084 ]

Corey J. Nolet commented on SPARK-4320:
---

Since this is a simple change, I wanted to work on this myself to get more 
familiar with the code base. Could someone w/ the proper privileges give me 
access to be able to assign this ticket to myself?

 JavaPairRDD should supply a saveAsNewHadoopDataset which takes a Job object

 Key: SPARK-4320
 Fix For: 1.1.1, 1.2.0


