Filtering an rdd depending upon a list of values in Spark

2015-09-09 Thread prachicsa
I want to apply a filter based on a list of values in Spark. This is how I get the list:

    DataFrame df = sqlContext.read().json("../sample.json");
    df.groupBy("token").count().show();
    Tokens = df.select("token").collect();
    for(int i = 0; i < Tokens.length; i++){

Re: Filtering an rdd depending upon a list of values in Spark

2015-09-09 Thread Ted Yu
Take a look at the following methods:

    /**
     * Filters rows using the given condition.
     * {{{
     *   // The following are equivalent:
     *   peopleDf.filter($"age" > 15)
     *   peopleDf.where($"age" > 15)
     * }}}
     * @group dfops
     * @since 1.3.0
     */
    def filter(condition: Column): DataFrame
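
The membership test behind "filter by a list of values" can be sketched in plain Java, without a Spark dependency. This is only an illustration of the predicate, not code from the thread: the class `TokenFilter`, the method `keepAllowed`, and the sample values are hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spark: keeps only the values found in a lookup list.
class TokenFilter {

    static List<String> keepAllowed(List<String> values, List<String> allowed) {
        // Copy the list into a HashSet so each membership check is O(1) on average.
        Set<String> lookup = new HashSet<>(allowed);
        return values.stream()
                .filter(lookup::contains)   // keep a value only if it is in the lookup set
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        List<String> tokens = Arrays.asList("a", "b", "c", "a");
        List<String> allowed = Arrays.asList("a", "c");
        System.out.println(keepAllowed(tokens, allowed)); // prints [a, c, a]
    }
}
```

On a JavaRDD, the same `lookup.contains` check could be used inside the function passed to `filter`. For a DataFrame, `Column.isin` (available from Spark 1.5.0) expresses the membership test as a `Column` condition that the `filter` method quoted above accepts.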