Filtering records for all values of an array in Spark

2015-09-09 Thread prachicsa
I am very new to Spark. I have a very basic question. I have an array of values:

listofECtokens: Array[String] = Array(EC-17A5206955089011B, EC-17A5206955089011A)

I want to filter an RDD for all of these token values. I tried the following way:

val ECtokens = for (token <- listofECtokens)
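One common approach is a single `filter` whose predicate checks the record against every token. A minimal sketch (the original snippet is Scala; Java is used here to match the other threads, and the Spark call itself is left as a comment since it needs a live SparkContext):

```java
import java.util.Arrays;
import java.util.List;

public class TokenFilter {
    // Keep a record if it contains any of the EC tokens.
    public static boolean containsAnyToken(String line, List<String> tokens) {
        return tokens.stream().anyMatch(line::contains);
    }

    public static void main(String[] args) {
        List<String> listofECtokens =
                Arrays.asList("EC-17A5206955089011B", "EC-17A5206955089011A");

        // With Spark (assuming `lines` is a JavaRDD<String> from sc.textFile(...)):
        // JavaRDD<String> filtered =
        //         lines.filter(line -> containsAnyToken(line, listofECtokens));

        System.out.println(
                containsAnyToken("rec EC-17A5206955089011A end", listofECtokens)); // prints "true"
    }
}
```

The point is to do one pass with `tokens.exists`/`anyMatch` rather than one `filter` per token, which the `for (token <- listofECtokens)` loop in the question suggests.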

Loading json data into Pair RDD in Spark using java

2015-09-09 Thread prachicsa
I am very new to Spark. I have a very basic question. I read a file into a Spark RDD in which each line is a JSON object. I want to apply groupBy-like transformations, so I want to transform each JSON line into a pair in a PairRDD. Is there a straightforward way to do this in Java? My JSON is like this: {
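The usual route is `mapToPair`, extracting a key field from each JSON line. A sketch of the key-extraction step (the field name `token` is an assumption, and the regex stands in for a real JSON parser such as Jackson or Gson; the Spark calls are commented out since they need Spark on the classpath):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonToPair {
    // Pull out the "token" field to use as the pair key. In real code this
    // would be a proper JSON parse; a regex keeps the sketch dependency-free.
    private static final Pattern TOKEN =
            Pattern.compile("\"token\"\\s*:\\s*\"([^\"]+)\"");

    public static Map.Entry<String, String> toPair(String jsonLine) {
        Matcher m = TOKEN.matcher(jsonLine);
        String key = m.find() ? m.group(1) : "unknown";
        return new SimpleEntry<>(key, jsonLine);
    }

    public static void main(String[] args) {
        // With Spark's Java API (assuming `lines` is a JavaRDD<String>):
        // JavaPairRDD<String, String> pairs =
        //         lines.mapToPair(line -> new Tuple2<>(keyOf(line), line));
        // pairs.groupByKey() then gives one group per token.
        String line = "{\"token\":\"EC-17A5206955089011A\",\"v\":1}";
        System.out.println(toPair(line).getKey()); // prints "EC-17A5206955089011A"
    }
}
```

In the Spark version the function passed to `mapToPair` returns a `Tuple2<K, V>` rather than a `Map.Entry`, but the key-extraction logic is the same.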

Filtering an rdd depending upon a list of values in Spark

2015-09-09 Thread prachicsa
I want to apply a filter based on a list of values in Spark. This is how I get the list:

DataFrame df = sqlContext.read().json("../sample.json");
df.groupBy("token").count().show();
Tokens = df.select("token").collect();
for(int i = 0; i < Tokens.length; i++){
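After the `collect()`, a `HashSet` makes the per-record membership test O(1) instead of a loop over the array; better still, the filter can stay inside Spark with `Column.isin` (available in newer Spark versions), avoiding the collect entirely. A sketch of the post-collect approach (method and variable names here are illustrative, not from the original code; Spark calls are in comments):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TokenSetFilter {
    // Keep only the record tokens that appear in the wanted set
    // (exact-match membership test).
    public static List<String> keepMatching(List<String> recordTokens,
                                            Set<String> wanted) {
        return recordTokens.stream()
                .filter(wanted::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Build the lookup set from the collected rows, e.g.:
        // Row[] tokens = df.select("token").collect();
        // Set<String> wanted = new HashSet<>();
        // for (Row r : tokens) wanted.add(r.getString(0));
        Set<String> wanted = new HashSet<>(
                Arrays.asList("EC-17A5206955089011A", "EC-17A5206955089011B"));

        // Alternative that skips the collect (Spark >= 1.5):
        // DataFrame matched = df.filter(df.col("token").isin(wanted.toArray()));

        System.out.println(keepMatching(
                Arrays.asList("EC-17A5206955089011A", "EC-OTHER"), wanted));
    }
}
```

The `isin` route is generally preferable for large token lists, since it keeps the data and the filter on the executors instead of round-tripping through the driver.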