I want to apply a filter based on a list of values in Spark. This is how I get
the list:

DataFrame df = sqlContext.read().json("../sample.json");

df.groupBy("token").count().show();

Row[] Tokens = df.select("token").collect();
for (int i = 0; i < Tokens.length; i++) {
    System.out.println(Tokens[i].get(0)); // need to apply a filter for Tokens[i].get(0)
}
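
Since what I actually need is to filter on these values rather than just print
them, I pull them out of the Row objects into a java.util.Set first (a small
sketch; tokenSet is just a placeholder name):

// imports: java.util.Set, java.util.HashSet, org.apache.spark.sql.Row
final Set<String> tokenSet = new HashSet<String>();
for (Row row : Tokens) {
    tokenSet.add(row.getString(0)); // "token" is the only selected column
}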

The RDD on which I want to apply the filter is this:

JavaRDD<String> file = context.textFile(args[0]);

I figured out a way to filter in Java:

// Keeps only the lines that contain the hard-coded string "Set"
private static final Function<String, Boolean> Filter =
        new Function<String, Boolean>() {
            @Override
            public Boolean call(String s) {
                return s.contains("Set");
            }
        };
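
I can apply it to the RDD like this:

JavaRDD<String> filtered = file.filter(Filter); // keeps only lines containing "Set"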

But that only matches the hard-coded string "Set". How do I go about filtering
on the collected token values instead?
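
For what it's worth, here is a rough sketch of what I am imagining, reusing the
tokenSet built above (it assumes the token list is small enough to collect to
the driver, and that this runs in a static method so the anonymous class stays
serializable):

// Keep a line if it contains any of the collected token values;
// the final tokenSet is captured by the closure and shipped to the executors
JavaRDD<String> filteredByTokens = file.filter(new Function<String, Boolean>() {
    @Override
    public Boolean call(String s) {
        for (String token : tokenSet) {
            if (s.contains(token)) {
                return true;
            }
        }
        return false;
    }
});

Would that work, or should I broadcast the set instead of capturing it in the
closure?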



