Re: Writing output of key-value Pair RDD

2016-05-05 Thread Afshartous, Nick

Answering my own question.


I filtered the keys out of the output file by overriding
MultipleOutputFormat.generateActualKey to return the empty string.

--

Nick


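import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat<String, String> {

    // Use the key as the name of the output file.
    @Override
    protected String generateFileNameForKeyValue(String key, String value, String name) {
        return key;
    }

    // Replace the key with an empty string so the key text is not written.
    @Override
    protected String generateActualKey(String key, String value) {
        return "";
    }

}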
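
One caveat on the override above, offered as a sketch rather than a confirmed description of Hadoop internals: as far as I understand the line writer behind the old-API TextOutputFormat, it skips the key and the separator only when the key is null (or NullWritable), so an empty-string key may still leave a leading tab on each line. The stand-alone model below imitates that rule in plain Java. The class name, formatLine, and the tab separator are assumptions made for illustration; none of this is Hadoop code.

```java
class KeyValueLineModel {

    // Hypothetical stand-in for a text record writer's rule:
    // write the key if present, the separator only when both key and
    // value are present, then the value if present.
    static String formatLine(String key, String value, String separator) {
        boolean nullKey = (key == null);
        boolean nullValue = (value == null);
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            line.append(separator);
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Brackets make a leading separator visible in the printed output.
        System.out.println("[" + formatLine("a", "Abcd", "\t") + "]");  // key, tab, value
        System.out.println("[" + formatLine("", "Abcd", "\t") + "]");   // tab, value
        System.out.println("[" + formatLine(null, "Abcd", "\t") + "]"); // value only
    }
}
```

If the leading separator matters, returning null from generateActualKey is the variant I would try first. I have not verified that against this job.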




Re: Writing output of key-value Pair RDD

2016-05-05 Thread Afshartous, Nick

Thanks, I got the example below working, though it writes both the keys and
values to the output file.

Is there any way to write just the values?

--

Nick


String[] strings = { "Abcd", "Azlksd", "whhd", "wasc", "aDxa" };

sc.parallelize(Arrays.asList(strings))
    .mapToPair(pairFunction)
    .saveAsHadoopFile("s3://...", String.class, String.class,
        RDDMultipleTextOutputFormat.class);
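
The pairFunction in the snippet above is not shown anywhere in the thread. As a rough illustration of the idea only, the plain-Java sketch below assumes the key is the lower-cased first character of each string and groups the sample values by that key, which is how they would be split across output files named after the key. The class name, toPair, groupByFileName, and the key choice are all hypothetical, and no Spark or Hadoop API is used here.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class PairFunctionSketch {

    // Hypothetical key choice: the lower-cased first character of the string.
    static Map.Entry<String, String> toPair(String s) {
        String key = s.substring(0, 1).toLowerCase(Locale.ROOT);
        return new SimpleEntry<>(key, s);
    }

    // Groups values by key, mirroring which values would share an output
    // file when the file name is taken from the key.
    static Map<String, List<String>> groupByFileName(List<String> input) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String s : input) {
            Map.Entry<String, String> pair = toPair(s);
            files.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                 .add(pair.getValue());
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("Abcd", "Azlksd", "whhd", "wasc", "aDxa");
        // Prints {a=[Abcd, Azlksd, aDxa], w=[whhd, wasc]}
        System.out.println(groupByFileName(strings));
    }
}
```

In the real job the same mapping would be expressed as a Spark PairFunction returning a Tuple2, and the grouping would be done by the output format rather than by a map in memory.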





Re: Writing output of key-value Pair RDD

2016-05-04 Thread Nicholas Chammas
You're looking for this discussion:
http://stackoverflow.com/q/23995040/877069

Also, a simpler alternative with DataFrames:
https://github.com/apache/spark/pull/8375#issuecomment-202458325



Writing output of key-value Pair RDD

2016-05-04 Thread Afshartous, Nick
Hi,


Is there any way to write out to S3 the values of a key-value Pair RDD?


I'd like each value of a pair to be written to its own file where the file name 
corresponds to the key name.


Thanks,

--

Nick