Answering my own question.

I filtered out the keys from the output file by overriding


  MultipleOutputFormat.generateActualKey


to return the empty string.

--

    Nick


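import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat<String, String> {

    // Name each output file after the record's key.
    @Override
    protected String generateFileNameForKeyValue(String key, String value, String name) {
        return key;
    }

    // Replace the key with an empty string so it is not written to the file.
    @Override
    protected String generateActualKey(String key, String value) {
        return "";
    }

}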
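
One caveat that may be worth checking: as far as I understand the default text record writer, it skips the key and the separator only when the key is null, so an empty-string key could still leave a leading tab on each line. The plain-Java sketch below models that rule with no Hadoop dependency. The class name KeyValueLineSketch and the method formatLine are hypothetical, and the tab separator and null handling are assumptions about the default behaviour, so verify against your Hadoop version.

```java
public class KeyValueLineSketch {

    // Assumed default separator between key and value in text output.
    static final String SEPARATOR = "\t";

    // Hypothetical model of how one text record is laid out:
    // a null key or value is skipped, and the separator is only
    // written when both key and value are present (non-null).
    static String formatLine(String key, String value) {
        boolean nullKey = (key == null);
        boolean nullValue = (value == null);
        if (nullKey && nullValue) {
            return ""; // nothing is written for an empty record
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            line.append(SEPARATOR);
        }
        if (!nullValue) {
            line.append(value);
        }
        line.append("\n");
        return line.toString();
    }

    public static void main(String[] args) {
        // Compare an empty-string key with a null key.
        System.out.print("empty key: [" + formatLine("", "Abcd").replace("\t", "\\t").trim() + "]\n");
        System.out.print("null key:  [" + formatLine(null, "Abcd").trim() + "]\n");
    }
}
```

If the leading tab does show up in the output files, returning null from generateActualKey instead of "" might be the thing to try.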

________________________________
From: Afshartous, Nick <nafshart...@turbine.com>
Sent: Thursday, May 5, 2016 3:35:17 PM
To: Nicholas Chammas; user@spark.apache.org
Subject: Re: Writing output of key-value Pair RDD



Thanks, I got the example below working, though it writes both the keys and 
values to the output file.

Is there any way to write just the values?

--

    Nick


String[] strings = { "Abcd", "Azlksd", "whhd", "wasc", "aDxa" };

sc.parallelize(Arrays.asList(strings))
        .mapToPair(pairFunction)
        .saveAsHadoopFile("s3://...", String.class, String.class, RDDMultipleTextOutputFormat.class);


________________________________
From: Nicholas Chammas <nicholas.cham...@gmail.com>
Sent: Wednesday, May 4, 2016 4:21:12 PM
To: Afshartous, Nick; user@spark.apache.org
Subject: Re: Writing output of key-value Pair RDD

You're looking for this discussion: http://stackoverflow.com/q/23995040/877069

Also, a simpler alternative with DataFrames: 
https://github.com/apache/spark/pull/8375#issuecomment-202458325

On Wed, May 4, 2016 at 4:09 PM Afshartous, Nick <nafshart...@turbine.com> wrote:

Hi,


Is there any way to write out to S3 the values of a key-value Pair RDD?


I'd like each value of a pair to be written to its own file where the file name 
corresponds to the key name.


Thanks,

--

    Nick
