NPE in Parquet

2015-01-20 Thread Alessandro Baretta
All, I strongly suspect this is caused by a glitch in the communication with Google Cloud Storage, which my job is writing to, as this NPE shows up fairly randomly. Any ideas? Exception in thread Thread-126 java.lang.NullPointerException at
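
For context, a hedged sketch of the kind of job described above, writing Parquet to GCS with the Spark 1.2-era API (bucket and paths hypothetical, not the reporter's actual code):

```scala
// Minimal sketch, not the reporter's actual job: a Parquet write to GCS of the
// kind described above, using the Spark 1.2-era SchemaRDD API. An NPE that
// surfaces only intermittently during the write points at the storage
// connector rather than the data itself.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ParquetToGcs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-to-gcs"))
    val sqlContext = new SQLContext(sc)
    val data = sqlContext.jsonFile("gs://my-bucket/input/")  // hypothetical path
    data.saveAsParquetFile("gs://my-bucket/out.parquet")     // hypothetical path
    sc.stop()
  }
}
```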

Re: Job priority

2015-01-11 Thread Alessandro Baretta
, 2015, Alessandro Baretta alexbare...@gmail.com wrote: Cody, Maybe I'm not getting this, but it doesn't look like this page is describing a priority queue scheduling policy. What this section discusses is how resources are shared between queues. A weight-1000 pool will get 1000 times more

Re: Job priority

2015-01-10 Thread Alessandro Baretta
, Mark Hamstra m...@clearstorydata.com wrote: -dev, +user http://spark.apache.org/docs/latest/job-scheduling.html On Sat, Jan 10, 2015 at 4:40 PM, Alessandro Baretta alexbare...@gmail.com wrote: Is it possible to specify a priority level for a job, such that the active jobs might

Re: Job priority

2015-01-10 Thread Alessandro Baretta
wrote: -dev, +user http://spark.apache.org/docs/latest/job-scheduling.html On Sat, Jan 10, 2015 at 4:40 PM, Alessandro Baretta alexbare...@gmail.com wrote: Is it possible to specify a priority level for a job, such that the active jobs might be scheduled in order of priority? Alex
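
As the thread above notes, Spark's fair scheduler offers weighted pools rather than a true priority queue. A minimal sketch of approximating priority with pools (pool name and allocation-file path hypothetical):

```scala
// Minimal sketch: the closest Spark gets to job priority is the fair
// scheduler's weighted pools, as discussed above. A weight-1000 pool receives
// roughly 1000x the resources of a weight-1 pool, but this is proportional
// sharing between queues, not strict priority-queue scheduling.
import org.apache.spark.{SparkConf, SparkContext}

object PriorityViaPools {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("priority-via-pools")
      .set("spark.scheduler.mode", "FAIR")
      // Assumes a fairscheduler.xml defining a high-weight "urgent" pool.
      .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
    val sc = new SparkContext(conf)

    // Jobs submitted from this thread go to the high-weight pool.
    sc.setLocalProperty("spark.scheduler.pool", "urgent")
    sc.parallelize(1 to 1000000).count()

    // Clearing the property returns subsequent jobs to the default pool.
    sc.setLocalProperty("spark.scheduler.pool", null)
    sc.stop()
  }
}
```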

/tmp directory fills up

2015-01-09 Thread Alessandro Baretta
Gents, I'm building Spark from the current master branch and deploying it to Google Compute Engine on top of Hadoop 2.4/YARN via bdutil, Google's Hadoop cluster provisioning tool. bdutil configures Spark with spark.local.dir=/hadoop/spark/tmp, but this option is ignored in combination with
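
A minimal sketch of the resolution order at play (a simplification of Spark's actual logic; the property and environment variable names are real):

```scala
// Minimal sketch of the behavior described above: on YARN, Spark takes its
// scratch directories from the LOCAL_DIRS environment variable set by the
// NodeManager, so a spark.local.dir such as /hadoop/spark/tmp is silently
// ignored and shuffle spill can land in the default /tmp instead.
import org.apache.spark.SparkConf

object EffectiveLocalDirs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().set("spark.local.dir", "/hadoop/spark/tmp")
    val effective = sys.env.get("LOCAL_DIRS")          // YARN-provided, wins on YARN
      .orElse(sys.env.get("SPARK_LOCAL_DIRS"))         // environment override
      .getOrElse(conf.get("spark.local.dir", "/tmp"))  // standalone fallback
    println(s"Scratch directories in use: $effective")
  }
}
```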

Spark Shell slowness on Google Cloud

2014-12-17 Thread Alessandro Baretta
All, I'm using the Spark shell to interact with a small test deployment of Spark, built from the current master branch. I'm processing a dataset comprising a few thousand objects on Google Cloud Storage, split into a half dozen directories. My code constructs an object--let me call it the Dataset
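
A hedged sketch of the shell session in question (bucket and directory layout hypothetical, modeled on paths that appear later in the thread):

```scala
// Minimal sketch for the spark-shell (sc is predefined there): loading a
// dataset split across a half dozen GCS directories via a glob. With a few
// thousand objects, much of the startup cost is the file listing performed
// through the Hadoop GCS connector before any data is read.
val dataset = sc.textFile("gs://my-bucket/20141205/csv/*/*/*")
dataset.count() // forces the listing plus a full scan
```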

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Alessandro Baretta
@gmail.com wrote: I'm curious if you're seeing the same thing when using bdutil against GCS? I'm wondering if this may be an issue concerning the transfer rate of Spark -> Hadoop -> GCS Connector -> GCS. On Wed Dec 17 2014 at 10:09:17 PM Alessandro Baretta alexbare...@gmail.com wrote: All

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Alessandro Baretta
result in just as fast scans? On Wed Dec 17 2014 at 10:44:45 PM Alessandro Baretta alexbare...@gmail.com wrote: Denny, No, gsutil scans through the listing of the bucket quickly. See the following. alex@hadoop-m:~/split$ time bash -c "gsutil ls gs://my-bucket/20141205/csv/*/*/* | wc -l"
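
For an apples-to-apples comparison with the gsutil timing above, one could time the Hadoop-side listing that Spark actually performs through the GCS connector (same hypothetical bucket, run from the spark-shell):

```scala
// Minimal sketch: time the Hadoop FileSystem glob that Spark issues through
// the GCS connector, to compare against the fast `gsutil ls` shown above.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

val base = new Path("gs://my-bucket/20141205/csv")
val fs = base.getFileSystem(new Configuration())
val t0 = System.nanoTime()
// globStatus returns null when nothing matches, hence the Option wrapper.
val entries = Option(fs.globStatus(new Path(base, "*/*/*")))
  .getOrElse(Array.empty[FileStatus])
println(s"Listed ${entries.length} entries in ${(System.nanoTime() - t0) / 1e9} s")
```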

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Alessandro Baretta
, Dec 17, 2014 at 11:24 PM, Alessandro Baretta alexbare...@gmail.com wrote: Well, what do you suggest I run to test this? But more importantly, what information would this give me? On Wed, Dec 17, 2014 at 10:46 PM, Denny Lee denny.g@gmail.com wrote: Oh, it makes sense if gsutil scans

Re: Still struggling with building documentation

2014-11-11 Thread Alessandro Baretta
, Nov 7, 2014 at 3:39 PM, Alessandro Baretta alexbare...@gmail.com wrote: I finally came to realize that there is a special Maven target to build the scaladocs, although arguably a very unintuitive one: mvn verify. So now I have scaladocs for each package, but not for the whole Spark project

Still struggling with building documentation

2014-11-07 Thread Alessandro Baretta
I finally came to realize that there is a special Maven target to build the scaladocs, although arguably a very unintuitive one: mvn verify. So now I have scaladocs for each package, but not for the whole Spark project. Specifically, build/docs/api/scala/index.html is missing. Indeed the whole

Scaladoc

2014-10-30 Thread Alessandro Baretta
How do I build the scaladoc HTML files from the Spark source distribution? Alex Baretta