Here is a MapReduce example implemented in Java.
It reads each line of text and, for each word in the line, determines
whether the word starts with an uppercase letter. If so, it emits a
(word, 1) key-value pair:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountUppercaseMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  protected void map(LongWritable lineNumber, Text line, Context context)
      throws IOException, InterruptedException {
    for (String word : line.toString().split(" ")) {
      // Skip empty tokens (e.g. from consecutive spaces); otherwise
      // charAt(0) would throw StringIndexOutOfBoundsException
      if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
        context.write(new Text(word), new IntWritable(1));
      }
    }
  }
}

What is the equivalent Spark implementation?
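For reference, here is a minimal sketch of what I think the equivalent
looks like, using flatMapToPair so that each line can emit zero or more
pairs. This assumes the Spark 2.x Java API (where the function returns an
Iterator); the input and output paths are placeholders:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class CountUppercaseSpark {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[*]", "CountUppercase");
    JavaRDD<String> lines = sc.textFile("input.txt"); // placeholder path

    // flatMapToPair plays the role of the mapper: each line produces
    // zero or more (word, 1) pairs
    JavaPairRDD<String, Integer> pairs = lines.flatMapToPair(line -> {
      List<Tuple2<String, Integer>> out = new ArrayList<>();
      for (String word : line.split(" ")) {
        if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
          out.add(new Tuple2<>(word, 1));
        }
      }
      return out.iterator();
    });

    // reduceByKey takes the place of the reduce side of the MapReduce job
    JavaPairRDD<String, Integer> counts = pairs.reduceByKey(Integer::sum);
    counts.saveAsTextFile("output"); // placeholder path
    sc.stop();
  }
}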

A more use-case-specific example with objects is below. In this case, the
mapper emits multiple key-value pairs per input line, each of type
(String, String).

What is the equivalent Spark implementation?

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IsotopeClusterMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        System.out.println("Inside Isotope Cluster Map!");
        String line = value.toString();

        // Detect isotope clusters in this line and write each one
        // out as a (key, value) pair of text
        Detector detector = new Detector();
        ArrayList<IsotopeCluster> clusters = detector.GetClusters(line);

        for (IsotopeCluster cluster : clusters) {
            String cKey = detector.WriteClusterKey(cluster);
            String cValue = detector.WriteClusterValue(cluster);
            context.write(new Text(cKey), new Text(cValue));
        }
    }
}
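By analogy with the first example, I would expect the Spark version to
look roughly like the sketch below. It reuses the lines RDD and the
imports from the first sketch; Detector is constructed inside the lambda,
so it is created on the executors and does not itself need to be
Serializable:

// Sketch only: assumes lines is a JavaRDD<String> as in the first example
JavaPairRDD<String, String> clusterPairs = lines.flatMapToPair(line -> {
    Detector detector = new Detector();
    List<Tuple2<String, String>> out = new ArrayList<>();
    for (IsotopeCluster cluster : detector.GetClusters(line)) {
        String cKey = detector.WriteClusterKey(cluster);
        String cValue = detector.WriteClusterValue(cluster);
        out.add(new Tuple2<>(cKey, cValue));
    }
    return out.iterator();
});

Is this the right way to emit multiple key-value pairs per input record,
or is there a more idiomatic approach?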
