I need some configuration / debugging recommendations to work around a "no
space left on device" error. I am completely new to Spark, but I have some
experience with Hadoop.
I have a task where I read images stored in sequence files from s3://,
process them with a map in Scala, and write the result back
Hi,
I'm trying to use a specific directory as Spark's working directory, since I
have limited space at /tmp. I tried:
1)
export SPARK_LOCAL_DIRS="/mnt/data/tmp"
or 2)
SPARK_LOCAL_DIRS="/mnt/data/tmp" in spark-env.sh
But neither worked; the output of Spark still says
ERROR
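For reference, a minimal spark-env.sh sketch of the scratch-directory setting (the path is an example; it must exist and be writable on every worker node, and the setting must reach the worker/executor JVMs, not just the shell that launches the driver):

```shell
# In conf/spark-env.sh on every worker node. Use plain straight quotes --
# curly "smart" quotes pasted from email clients break shell parsing and
# create a literally-named directory instead.
export SPARK_LOCAL_DIRS="/mnt/data/tmp"

# Equivalent property form, in conf/spark-defaults.conf or via SparkConf:
# spark.local.dir=/mnt/data/tmp
```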
Although nobody has answered the two questions, in my practice it seems the
answer to both is yes.
2014-08-04 19:50 GMT+08:00 Fengyun RAO raofeng...@gmail.com:
object LogParserWrapper {
  private val logParser = {
    val settings = new ...
    val builders = new
    new
Thank you for your help. After restructuring my code according to Sean's
input, it worked without changing the Spark context. I then took the same
file format, just a bigger file (2.7 GB), from S3 to my cluster with 4
c3.xlarge instances and Spark 1.0.2. Unluckily, my task freezes again after
a short time. I tried it
I have a related question. With Hadoop, I would do the same thing for
non-serializable objects and setup(). I also had a use case where it
was so expensive to initialize the non-serializable object that I
would make it a static member of the mapper, turn on JVM reuse across
tasks, and then
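The Hadoop trick above (making the expensive object a static member of the mapper and reusing the JVM across tasks) has a direct Spark analogue: each executor runs its tasks in a single JVM, so a Scala `object` holding a lazy val is initialized at most once per JVM and shared by every task there. A minimal sketch, with `LogParser` standing in for any expensive, non-serializable object:

```scala
// Stand-in for an expensive, non-serializable parser.
class LogParser {
  def parse(line: String): String = line.toUpperCase
}

object LogParserWrapper {
  // A lazy val inside an object is initialized on first use,
  // once per JVM, so all tasks on an executor share one instance.
  lazy val parser: LogParser = new LogParser

  def parse(line: String): String = parser.parse(line)
}

// In a Spark job, tasks call the wrapper instead of capturing a parser
// instance in the closure (which would have to be serializable):
//   rdd.map(LogParserWrapper.parse)
```

The key point is that only the method reference ships with the closure; the parser itself is constructed lazily on each executor.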
I've tried uploading a zip file that contains a CSV to HDFS and then
reading it into Spark using spark-shell, and the first line is all messed
up. However, when I upload a gzip to HDFS and then read it into Spark,
it does just fine. See output below:
Is there a way to read a zip file as is from HDFS in
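The likely reason gzip works but zip does not: Hadoop's text input path has a compression codec for .gz, but none for .zip, so the zip is read as raw bytes and the first "line" is the zip header. Zip entries have to be unwrapped manually. A self-contained sketch of that unwrapping with java.util.zip (in Spark you would apply the same logic to each file's byte stream, e.g. via a custom InputFormat; `ZipDemo` and the entry name are illustrative):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}
import scala.io.Source

object ZipDemo {
  def main(args: Array[String]): Unit = {
    // Build a small zip in memory containing one CSV entry.
    val buf = new ByteArrayOutputStream()
    val zos = new ZipOutputStream(buf)
    zos.putNextEntry(new ZipEntry("data.csv"))
    zos.write("a,b,c\n1,2,3\n".getBytes("UTF-8"))
    zos.closeEntry(); zos.close()

    // Unwrap it the way a per-file task could: open a ZipInputStream
    // over the raw bytes and read each entry as text.
    val zis = new ZipInputStream(new ByteArrayInputStream(buf.toByteArray))
    val entry = zis.getNextEntry
    val lines = Source.fromInputStream(zis, "UTF-8").getLines().toList
    println(entry.getName)
    println(lines.head)
  }
}
```

Reading the raw bytes without the ZipInputStream wrapper is exactly what produces the garbled first line you saw.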
Your map-only job should not be shuffling, but if you want to see what's
running, look at the web UI at http://driver:4040. In fact the job should not
even write stuff to disk except inasmuch as the Hadoop S3 library might build
up blocks locally before sending them on.
My guess is that it's
I am wondering if I can use Spark in order to search for interesting
features/attributes for modelling. In fact, I just came from some
introductory sites about Vowpal Wabbit. I somehow like the idea of
out-of-core modelling.
Well, I have transactional data where customers purchased
Root partitions on AWS instances tend to be small (for example, an m1.large
instance has two 420 GB drives, but only a 10 GB root partition). Matei's
probably right about this - you just need to be careful about where things
like the logs get stored.
From: Matei Zaharia
Hi,
I intend to use the same Spark Streaming program for both real-time and
batch processing of my time-stamped data. However, with batch processing,
all window-based operations would be meaningless because (I assume) the
window is defined by the arrival times of data, and it is not possible to
I have a simple JSON dataset as below. How do I query all parts.lock for
the id=1?
JSON: { "id": 1, "name": "A green door", "price": 12.50, "tags": ["home",
"green"], "parts": [ { "lock": "One lock", "key": "single key" }, {
"lock": "2 lock", "key": "2 key" } ] }
Query: select id, name, price, parts.lock from product where
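Since parts is an array of structs, each element has to be flattened into its own row before lock can be selected. A sketch, assuming Spark 1.1+ for sqlContext.jsonRDD and a HiveContext for LATERAL VIEW explode (the table name "product" and the JSON literal are illustrative):

```scala
// Assumes a running SparkContext `sc` and Spark built with Hive support.
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)

val json = sqlContext.jsonRDD(sc.parallelize(Seq(
  """{"id": 1, "name": "A green door", "price": 12.50,
      "tags": ["home", "green"],
      "parts": [{"lock": "One lock", "key": "single key"},
                {"lock": "2 lock", "key": "2 key"}]}""")))
json.registerTempTable("product")

// explode() turns each element of the parts array into a row `p`,
// so p.lock can be selected and filtered like an ordinary column.
sqlContext.sql(
  "SELECT id, name, price, p.lock " +
  "FROM product LATERAL VIEW explode(parts) t AS p WHERE id = 1"
).collect().foreach(println)
```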