Re: Indexing Support

2015-10-18 Thread Russ Weeks
Distributed R-Trees are not very common. Most "big data" spatial solutions collapse multi-dimensional data into a distributed one-dimensional index using a space-filling curve. Many implementations exist outside of Spark, e.g. for HBase or Accumulo. It's simple enough to write a map function that
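
As a self-contained illustration of the space-filling-curve idea (this is not code from the thread; the class and method names are illustrative), a Z-order/Morton encoding that collapses a 2-D point into a single sortable long, usable as a one-dimensional row key, might look like:

    public final class ZOrder {
        // Interleave the bits of x and y (treated as unsigned 32-bit values) into
        // one 64-bit Morton code: x occupies the even bits, y the odd bits.
        // Keys built this way keep spatially-close points near each other in sort order.
        public static long interleave(int x, int y) {
            long z = 0L;
            for (int i = 0; i < 32; i++) {
                z |= ((long) (x >>> i) & 1L) << (2 * i);
                z |= ((long) (y >>> i) & 1L) << (2 * i + 1);
            }
            return z;
        }
    }

A map function over the input records can then emit (interleave(x, y), record) pairs, producing a one-dimensional index that HBase or Accumulo can store natively.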

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-26 Thread Russ Weeks
Hi, David, This is the code that I use to create a JavaPairRDD from an Accumulo table: JavaSparkContext sc = new JavaSparkContext(conf); Job hadoopJob = Job.getInstance(conf, "TestSparkJob"); hadoopJob.setInputFormatClass(AccumuloInputFormat.class); AccumuloInputFormat.setZooKeeperInstance(hadoopJob,
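
The snippet above is cut off by the archive preview. A hedged sketch of how the AccumuloInputFormat configuration typically continues, using the Accumulo 1.5/1.6-era static setters (the instance name, ZooKeeper hosts, credentials, and table name below are placeholders):

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class AccumuloJobSetup {
        // Configure a Hadoop Job so AccumuloInputFormat knows which instance,
        // table and credentials to scan with.
        public static Job configureJob(Configuration conf) throws Exception {
            Job hadoopJob = Job.getInstance(conf, "TestSparkJob");
            hadoopJob.setInputFormatClass(AccumuloInputFormat.class);
            AccumuloInputFormat.setZooKeeperInstance(hadoopJob, "myInstance", "zk1:2181,zk2:2181");
            AccumuloInputFormat.setConnectorInfo(hadoopJob, "user", new PasswordToken("secret"));
            AccumuloInputFormat.setInputTableName(hadoopJob, "my_table");
            AccumuloInputFormat.setScanAuthorizations(hadoopJob, new Authorizations());
            return hadoopJob;
        }
    }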

Re: Reading from HBase is too slow

2014-09-29 Thread Russ Weeks
Hi, Tao, When I used newAPIHadoopRDD (Accumulo not HBase) I found that I had to specify executor-memory and num-executors explicitly on the command line or else I didn't get any parallelism across the cluster. I used --executor-memory 3G --num-executors 24 but obviously other parameters will be
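
For reference, a sketch of setting the equivalent properties programmatically on SparkConf instead of on the spark-submit command line. The values are simply the ones mentioned in the thread, and spark.executor.instances (the property behind --num-executors) is honoured when running on YARN:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ExecutorSizing {
        public static JavaSparkContext createContext() {
            // Equivalent of "--executor-memory 3G --num-executors 24"; tune for your cluster.
            SparkConf conf = new SparkConf()
                    .setAppName("hbase-or-accumulo-scan")
                    .set("spark.executor.memory", "3g")
                    .set("spark.executor.instances", "24");
            return new JavaSparkContext(conf);
        }
    }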

Re: Does anyone have experience with using Hadoop InputFormats?

2014-09-24 Thread Russ Weeks
I use newAPIHadoopRDD with AccumuloInputFormat. It produces a PairRDD using Accumulo's Key and Value classes, both of which extend Writable. Works like a charm. I use the same InputFormat for all my MR jobs. -Russ On Wed, Sep 24, 2014 at 9:33 AM, Steve Lewis lordjoe2...@gmail.com wrote: I
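
Since the same InputFormat serves both Spark and plain MapReduce, a minimal MR mapper over the Accumulo Key/Value pairs might look like the sketch below (the class name and output types are illustrative, not taken from the thread):

    import java.io.IOException;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // With AccumuloInputFormat set on the job, the mapper receives the same
    // Key/Value pairs that the Spark PairRDD does.
    public class RowCountMapper extends Mapper<Key, Value, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(Key key, Value value, Context context)
                throws IOException, InterruptedException {
            // Emit one count per Accumulo row.
            context.write(new Text(key.getRow()), ONE);
        }
    }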

Re: Does anyone have experience with using Hadoop InputFormats?

2014-09-24 Thread Russ Weeks
No, they do not implement Serializable. There are a couple of places where I've had to do a Text-to-String conversion, but generally it hasn't been a problem. -Russ On Wed, Sep 24, 2014 at 10:27 AM, Steve Lewis lordjoe2...@gmail.com wrote: Do your custom Writable classes implement Serializable - I
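
A sketch of the kind of conversion described above, turning the non-Serializable Key/Value pairs into plain Strings before anything shuffles or collects them (the field choices here are illustrative):

    import java.nio.charset.StandardCharsets;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    public class SerializableConversion {
        // Key and Value extend Writable but not Serializable, so convert them
        // to Strings before operations that need to serialize the data.
        public static JavaPairRDD<String, String> toStrings(JavaPairRDD<Key, Value> rdd) {
            return rdd.mapToPair(entry -> new Tuple2<>(
                    entry._1().getRow().toString(),                         // Text   -> String
                    new String(entry._2().get(), StandardCharsets.UTF_8))); // byte[] -> String
        }
    }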

Re: Accumulo and Spark

2014-09-10 Thread Russ Weeks
It's very straightforward to set up a Hadoop RDD to use AccumuloInputFormat. Something like this will do the trick: private JavaPairRDD<Key,Value> newAccumuloRDD(JavaSparkContext sc, AgileConf agileConf, String appName, Authorizations auths) throws IOException, AccumuloSecurityException {
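
A minimal sketch of a helper along the lines of the newAccumuloRDD signature shown in the snippet, assuming the Job has already been configured for AccumuloInputFormat as in the setup sketch further up (AgileConf is the poster's own class and is not reproduced here):

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class AccumuloRDDFactory {
        // Wrap a Job whose configuration already carries the AccumuloInputFormat
        // settings (instance, credentials, table, authorizations) into an RDD.
        public static JavaPairRDD<Key, Value> newAccumuloRDD(JavaSparkContext sc, Job job) {
            return sc.newAPIHadoopRDD(
                    job.getConfiguration(),
                    AccumuloInputFormat.class,
                    Key.class,
                    Value.class);
        }
    }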

Re: Spark + AccumuloInputFormat

2014-09-10 Thread Russ Weeks
down to 30s from 18 minutes and I'm seeing much better utilization of my accumulo tablet servers. -Russ On Tue, Sep 9, 2014 at 5:13 PM, Russ Weeks rwe...@newbrightidea.com wrote: Hi, I'm trying to execute Spark SQL queries on top of the AccumuloInputFormat. Not sure if I should be asking

Spark + AccumuloInputFormat

2014-09-09 Thread Russ Weeks
Hi, I'm trying to execute Spark SQL queries on top of the AccumuloInputFormat. Not sure if I should be asking on the Spark list or the Accumulo list, but I'll try here. The problem is that the workload to process SQL queries doesn't seem to be distributed across my cluster very well. My Spark
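
For context, a rough sketch of one way to put Spark SQL on top of such an RDD. It uses the later Spark 1.3+ DataFrame API rather than the JavaSQLContext/JavaSchemaRDD API that was current when this thread was written, and the Record bean, column names, and query are illustrative:

    import java.io.Serializable;
    import java.nio.charset.StandardCharsets;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class AccumuloSql {
        // A Serializable bean so Spark SQL can infer a schema by reflection.
        public static class Record implements Serializable {
            private String rowId;
            private String value;
            public String getRowId() { return rowId; }
            public void setRowId(String rowId) { this.rowId = rowId; }
            public String getValue() { return value; }
            public void setValue(String value) { this.value = value; }
        }

        public static void query(JavaSparkContext sc, JavaPairRDD<Key, Value> accumuloRdd) {
            // Convert Accumulo entries into beans, register them as a temp table,
            // and run a SQL query over them.
            JavaRDD<Record> records = accumuloRdd.map(entry -> {
                Record r = new Record();
                r.setRowId(entry._1().getRow().toString());
                r.setValue(new String(entry._2().get(), StandardCharsets.UTF_8));
                return r;
            });
            SQLContext sqlContext = new SQLContext(sc);
            DataFrame df = sqlContext.createDataFrame(records, Record.class);
            df.registerTempTable("records");
            sqlContext.sql("SELECT rowId, value FROM records LIMIT 10").show();
        }
    }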