Re: Bug in Accumulators...

2014-11-22 Thread lordjoe
I posted several examples in Java at http://lordjoesoftware.blogspot.com/. Generally code like this works, and I show how to accumulate more complex values.
// Make two accumulators using Statistics
final Accumulator<Integer> totalLetters = ctx.accumulator(0, "ttl");

Re: ReduceByKey but with different functions depending on key

2014-11-18 Thread lordjoe
Map each key/value pair into a (key, Tuple2<key,value>) pair and process that. Also ask the Spark maintainers for a version of keyed operations where the key is passed in as an argument. I run into these cases all the time.
/** * map a tuple into a key tuple pair to ensure subsequent processing has
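The idea above can be sketched in plain Java without Spark: each value carries its own key, so a combining function that only ever sees two values can still choose its rule by key. The class name, the sample keys, and the rule "keys starting with max take the maximum, everything else is summed" are hypothetical and not from the original post; Map.Entry stands in for Tuple2.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyedReduceSketch {

    /** Combine two (key, value) pairs that share a key; the rule depends on the key. */
    public static Map.Entry<String, Integer> combine(Map.Entry<String, Integer> a,
                                                     Map.Entry<String, Integer> b) {
        String key = a.getKey(); // both arguments carry the same key
        int merged = key.startsWith("max")
                ? Math.max(a.getValue(), b.getValue()) // hypothetical rule for "max*" keys
                : a.getValue() + b.getValue();         // hypothetical default rule: sum
        return new SimpleImmutableEntry<>(key, merged);
    }

    /** Reduce per key; the stored value is the whole (key, value) pair. */
    public static Map<String, Map.Entry<String, Integer>> reduceByKey(
            List<Map.Entry<String, Integer>> pairs) {
        Map<String, Map.Entry<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            // key -> (key, value): the reducer never receives the key separately
            result.merge(pair.getKey(), pair, KeyedReduceSketch::combine);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
                new SimpleImmutableEntry<>("max_temp", 3),
                new SimpleImmutableEntry<>("count", 1),
                new SimpleImmutableEntry<>("max_temp", 7),
                new SimpleImmutableEntry<>("count", 2));
        for (Map.Entry<String, Integer> e : reduceByKey(pairs).values()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

In Spark the same shape would presumably be a map to (key, Tuple2<key,value>) followed by a keyed reduce whose function reads the key from either tuple.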

RE: RDD.aggregate versus accumulables...

2014-11-17 Thread lordjoe
I have been playing with accumulators (despite the possible error with multiple attempts). These provide a convenient way to get some numbers while still performing business logic. I posted some sample code at http://lordjoesoftware.blogspot.com/. Even if accumulators are not perfect today -

Re: disable log4j for spark-shell

2014-11-10 Thread lordjoe
// Logger and Level are from org.apache.log4j
public static void main(String[] args) throws Exception {
    System.out.println("Set Log to Warn");
    Logger rootLogger = Logger.getRootLogger();
    rootLogger.setLevel(Level.WARN);
    ...
works for me

Re: How to access objects declared and initialized outside the call() method of JavaRDD

2014-10-23 Thread lordjoe
What I have been doing is building a JavaSparkContext the first time it is needed and keeping it as a ThreadLocal. All my code uses SparkUtilities.getCurrentContext(). On a slave machine you build a new context and don't have to serialize it. The code is in a large project at
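A minimal sketch of the build-on-first-use ThreadLocal pattern described above, in plain Java. The Context class is a hypothetical stand-in for JavaSparkContext, and LazyContextSketch is an illustrative name; this is not the SparkUtilities code from the post, only the general shape such a holder might take.

```java
public class LazyContextSketch {

    /** Hypothetical stand-in for an expensive, non-serializable context object. */
    public static final class Context {
        public final String name;

        Context(String name) {
            this.name = name;
        }
    }

    // Each thread builds its own Context on first access and reuses it afterwards.
    private static final ThreadLocal<Context> CURRENT =
            ThreadLocal.withInitial(() -> new Context("ctx-" + Thread.currentThread().getName()));

    /** Return this thread's context, creating it the first time it is needed. */
    public static Context getCurrentContext() {
        return CURRENT.get();
    }

    public static void main(String[] args) {
        Context first = getCurrentContext();
        Context second = getCurrentContext();
        System.out.println(first == second); // same instance within one thread
    }
}
```

Because callers only ever ask the holder for the context, the context itself never has to be captured in a serialized closure.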

Re: How to calculate percentiles with Spark?

2014-10-21 Thread lordjoe
A rather more general question is: assume I have a JavaRDD<K> which is sorted. How can I convert this into a JavaPairRDD<Integer,K> where the Integer is the index, 0...N - 1? Easy to do on one machine:
JavaRDD<K> values = ... // create here
JavaPairRDD<Integer,K> positions =
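For reference, the single-machine case mentioned above might look like the following plain-Java sketch. The class and method names are illustrative, and Map.Entry stands in for a pair type. It does not answer the distributed question; it only pins down the expected result, each element paired with its position 0...N - 1.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class SortedIndexSketch {

    /** Pair each element of an already sorted list with its position 0..N-1. */
    public static <K> List<Map.Entry<Integer, K>> withPositions(List<K> sorted) {
        List<Map.Entry<Integer, K>> positions = new ArrayList<>(sorted.size());
        int index = 0;
        for (K value : sorted) {
            positions.add(new SimpleImmutableEntry<>(index++, value));
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c"); // assumed already sorted
        for (Map.Entry<Integer, String> e : withPositions(values)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

On the Spark side, JavaRDD has a zipWithIndex() method that pairs each element with a Long index (element first, index second), which may cover the distributed case.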