+1 to Sean. Is it possible to rewrite your code so that it does not use the SparkContext inside an RDD function? Put differently: why does javaFunctions() need the SparkContext at that point at all?
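Roughly what I have in mind (an untested sketch): fetch the previous aggregates as their own RDD on the driver, where the context lives, and join the two RDDs, instead of calling javaFunctions() inside foreach(). The keyspace/table/column names ("schema", "aggregates", "rowkey", "total") and the batchTotals input are placeholders for your schema, and the exact connector API differs a bit between versions:

    // Untested sketch against Spark 1.x + the spark-cassandra-connector Java API.
    // In connector 1.1+ the static import is (1.0 used
    // com.datastax.spark.connector.CassandraJavaUtil instead):
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

    import com.datastax.spark.connector.japi.CassandraRow;
    import com.google.common.base.Optional;   // Spark 1.x joins use Guava's Optional
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    public class AggregateMerge {

        // batchTotals = your step 2 output, keyed by rowkey (placeholder shape).
        public static JavaPairRDD<String, Long> merge(
                JavaSparkContext javaSparkContext,
                JavaPairRDD<String, Long> batchTotals) {

            // Step 3a: read the previously aggregated values as an RDD, built
            // on the driver where the context is available -- no context is
            // ever referenced inside a task.
            JavaPairRDD<String, Long> previousTotals =
                javaFunctions(javaSparkContext)
                    .cassandraTable("schema", "aggregates")  // placeholder keyspace/table
                    .select("rowkey", "total")               // placeholder columns
                    .mapToPair(new PairFunction<CassandraRow, String, Long>() {
                        @Override
                        public Tuple2<String, Long> call(CassandraRow row) {
                            return new Tuple2<String, Long>(
                                    row.getString("rowkey"), row.getLong("total"));
                        }
                    });

            // Step 3b: add this batch's values to the previous values; a
            // leftOuterJoin keeps rowkeys that have no previous aggregate yet.
            return batchTotals
                .leftOuterJoin(previousTotals)
                .mapValues(new Function<Tuple2<Long, Optional<Long>>, Long>() {
                    @Override
                    public Long call(Tuple2<Long, Optional<Long>> pair) {
                        return pair._1() + pair._2().or(0L);
                    }
                });
            // Step 4: save the result back with the connector's save/writer API.
        }
    }

The join moves the per-rowkey lookup into the cluster, so the tasks never touch the context. The trade-off, as you noted, is reading the whole aggregates table; whether that is cheaper than per-key fetches depends on your data size.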
On Thu, Oct 23, 2014 at 10:53 AM, Localhost shell
<universal.localh...@gmail.com> wrote:

> Bang on, Sean.
>
> Before sending the issue mail, I was able to remove the compilation error
> by making it final, but then got (as you mentioned):
>
>     Caused by: java.io.NotSerializableException:
>     org.apache.spark.api.java.JavaSparkContext
>
> Now, regarding your suggestion of changing the business logic:
>
> 1. *Is the current approach possible if I write the code in Scala?* I
> think probably not, but wanted to check with you.
>
> 2. Brief steps of what the code is doing:
>
>    1. Get the raw sessions data from the datastore (C*).
>    2. Process the raw sessions data.
>    3. Iterate over the processed data (derived from #2) and fetch the
>       previously aggregated data from the store for those rowkeys;
>       add the values from this batch to the previous batch's values.
>    4. Save back the updated values.
>
> *This github gist might explain it better; it does a similar thing in
> Scala: https://gist.github.com/rssvihla/6577359860858ccb0b33*
> I am trying to achieve a similar thing in Java, using Spark batch with
> C* as the datastore.
>
> I have attached the Java code file to provide you some code details (in
> case I was not able to explain the problem clearly, the code will be
> handy).
>
> The reason I am fetching only selective data (that I will update later)
> is that Cassandra doesn't provide range queries, so I thought fetching
> the complete data might be expensive.
>
> It will be great if you can share your thoughts.
>
> On Thu, Oct 23, 2014 at 1:48 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> In Java, javaSparkContext would have to be declared final in order for
>> it to be accessed inside an inner class like this. But this would
>> still not work, as the context is not serializable. You should rewrite
>> this so you are not attempting to use the Spark context inside an
>> RDD.
>>
>> On Thu, Oct 23, 2014 at 8:46 AM, Localhost shell
>> <universal.localh...@gmail.com> wrote:
>> > Hey All,
>> >
>> > I am unable to access objects declared and initialized outside the
>> > call() method of a JavaRDD function.
>> >
>> > In the code snippet below, call() makes a fetch call to C*, but since
>> > javaSparkContext is defined outside the scope of the call() method,
>> > the compiler gives a compilation error:
>> >
>> >     stringRdd.foreach(new VoidFunction<String>() {
>> >         @Override
>> >         public void call(String str) throws Exception {
>> >             JavaRDD<String> vals =
>> >                 javaFunctions(javaSparkContext)
>> >                     .cassandraTable("schema", "table", String.class)
>> >                     .select("val");
>> >         }
>> >     });
>> >
>> > In other languages I have used closures to do this, but I am not able
>> > to achieve the same here.
>> >
>> > Can someone suggest how to achieve this in the current code context?
>> >
>> > --Unilocal
>
>
> --
> --Unilocal
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org