Nobody?
If that's not supported already, can someone please at least give me a few hints
on how to implement it?
Thanks!
On Fri, Sep 19, 2014 at 7:43 PM, Adamantios Corais
adamantios.cor...@gmail.com wrote:
Hi,
I am working with the SVMWithSGD classification algorithm on Spark. It
works fine
Hi,
I'm confused with saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset.
What's the difference between the two?
What are the individual use cases of the two APIs?
Could you describe the internal flows of the two APIs briefly?
I've used Spark for several months, but I have no experience on
File takes a filename to write to, while Dataset takes only a JobConf. This
means that Dataset is more general (it can also save to storage systems that
are not file systems, such as key-value stores), but is more annoying to use if
you actually have a file.
Matei
On September 21, 2014 at
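To make the distinction concrete, here is a minimal sketch, assuming a SparkContext sc as in the shell, Hadoop 2's Job.getInstance, and made-up HDFS paths:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.spark.SparkContext._   // pair-RDD save methods (Spark 1.1)

val pairs = sc.parallelize(Seq(("a", "1"), ("b", "2")))
  .map { case (k, v) => (new Text(k), new Text(v)) }

// saveAsNewAPIHadoopFile: pass the output path and formats directly.
pairs.saveAsNewAPIHadoopFile(
  "hdfs:///tmp/out-file",                        // hypothetical path
  classOf[Text], classOf[Text],
  classOf[TextOutputFormat[Text, Text]])

// saveAsNewAPIHadoopDataset: pass a fully configured Hadoop job instead,
// so the target can be anything the OutputFormat writes to (e.g. a key-value store).
val job = Job.getInstance(sc.hadoopConfiguration)
job.setOutputKeyClass(classOf[Text])
job.setOutputValueClass(classOf[Text])
job.setOutputFormatClass(classOf[TextOutputFormat[Text, Text]])
TextOutputFormat.setOutputPath(job, new Path("hdfs:///tmp/out-dataset"))
pairs.saveAsNewAPIHadoopDataset(job.getConfiguration)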
If your map() sometimes does not emit an element, then you need to
call flatMap() instead, and emit Some(value) (or any collection of
values) if there is an element to return, or None otherwise.
On Mon, Sep 22, 2014 at 4:50 PM, Praveen Sripati
praveensrip...@gmail.com wrote:
During the map based
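A small sketch of what the flatMap suggestion looks like in practice; the input data here is made up for illustration:

val lines = sc.parallelize(Seq("1", "two", "3"))

// map must emit exactly one element per input, so it cannot "skip" a record.
// flatMap can emit zero or more, so wrap the result in an Option:
val parsed = lines.flatMap { s =>
  try Some(s.toInt)                                 // an element to return
  catch { case _: NumberFormatException => None }   // nothing to emit
}
// parsed contains 1 and 3; "two" is simply dropped.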
The only way I can find is to turn it into a list - in effect holding
everything in memory (see code below). Surely Spark has a better way.
Also, what about unterminated iterables like a Fibonacci series (useful
only if limited in some other way)?
/**
* make an RDD from an iterable
*
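For reference, a minimal version of the list-based approach being described, assuming the iterable fits in driver memory (parallelize materializes it, which is exactly the limitation being asked about):

import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Materializes the whole iterable on the driver, then distributes it.
def rddFromIterable[T: ClassTag](sc: SparkContext, it: Iterable[T], numSlices: Int = 2): RDD[T] =
  sc.parallelize(it.toSeq, numSlices)

// An unbounded iterable such as a Fibonacci stream has to be truncated first:
def fibs: Stream[Long] = {
  def loop(a: Long, b: Long): Stream[Long] = a #:: loop(b, a + b)
  loop(0, 1)
}
val first100 = rddFromIterable(sc, fibs.take(100))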
I am using Spark 1.1.0 and have seen a lot of Fetch Failures due to the
following exception.
java.io.IOException: sendMessageReliably failed because ack was not
received within 60 sec
at
org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:854)
Consider this snippet from the Spark scaladoc
(https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulator):

scala> val accum = sc.accumulator(0)
accum: spark.Accumulator[Int] = 0

scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
...
10/09/29 18:41:08 INFO
'accum' is a reference that can't point to another object because it's a
val. However, the object it points to can certainly change state. 'val'
has an effect mostly like 'final' in Java.
Although the accum += ... syntax might lead you to believe it's
executing accum = accum + ..., as it would in
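A short illustration of the point: `+=` here is a method call on the Accumulator object, not a reassignment of the val, so the object's internal state changes while the reference stays fixed (assuming a SparkContext sc as in the shell):

val accum = sc.accumulator(0)

// This compiles because Accumulator defines a `+=` method; it is shorthand
// for accum.+=(1), which mutates the accumulator's internal value.
accum += 1

// A real reassignment, on the other hand, would not compile, since accum is a val:
// accum = sc.accumulator(0)   // error: reassignment to val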
These are coming from the parquet library and as far as I know can be
safely ignored.
On Mon, Sep 22, 2014 at 3:27 AM, Andrew Ash and...@andrewash.com wrote:
Hi All,
I'm seeing the WARNINGs below in stdout when using Spark SQL in Spark 1.1.0 --
is this warning a known issue? I don't see any open
I'm in a situation where I'm running Spark Streaming on a single machine
right now. The plan is to ultimately run it on a cluster, but for the next
couple of months it will probably stay on one machine.
I tried to do some digging and I can't find any indication of whether it's
better to run Spark as
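For what it's worth, the difference usually comes down to the master URL the streaming app is started with; a rough sketch, with the app name and master values as placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Single machine: everything in one JVM. Streaming needs at least two threads
// (receiver + processing), hence local[2] or local[*].
val conf = new SparkConf().setAppName("my-stream").setMaster("local[*]")

// The same code can later point at a cluster instead:
// .setMaster("spark://master-host:7077")

val ssc = new StreamingContext(conf, Seconds(10))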
Thanks for the insight, I didn't realize there was internal object reuse
going on. Is this a mechanism of Scala/Java or is this a mechanism of Spark?
I actually just converted the code to use immutable case classes everywhere,
so it will be a little tricky to test foldByKey(). I'll try to get to
Any thoughts on this?
On Sat, Sep 20, 2014 at 12:16 PM, John Omernik j...@omernik.com wrote:
I am running the Thrift server in Spark SQL, and running it on the node I
compiled Spark on. When I run it, tasks only work if they land on that
node; other executors started on nodes I didn't
The Mesos install guide says this:
To use Mesos from Spark, you need a Spark binary package available in a
place accessible by Mesos, and a Spark driver program configured to connect
to Mesos.
For example, putting it in HDFS or copying it to each node in the same
location should do the trick.
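For reference, a hedged sketch of the second half of that requirement: once the package is somewhere reachable, the driver is pointed at it via spark.executor.uri (the master URL and HDFS path below are only examples):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("mesos://mesos-master:5050")   // example Mesos master URL
  .setAppName("mesos-example")
  // Where Mesos executors can fetch the Spark binary package from:
  .set("spark.executor.uri", "hdfs:///spark/spark-1.1.0-bin-hadoop2.4.tgz")

val sc = new SparkContext(conf)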
Does feature size 43839 equal the number of terms? Check the output
dimension of your feature vectorizer, and reduce the number of partitions
to match the number of physical cores. I saw you set
spark.storage.memoryFraction to 0.0. Maybe it is better to keep the
default. Also please confirm the
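A small sketch of the kind of check being suggested; the RDD and variable names are assumptions about the original code:

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def sanityCheck(trainingData: RDD[LabeledPoint], numPhysicalCores: Int): RDD[LabeledPoint] = {
  // Does the feature dimension really match the vectorizer's vocabulary size?
  val featureDim = trainingData.first().features.size
  println("feature dimension = " + featureDim)   // e.g. expecting 43839 terms

  // One partition per physical core, rather than many small partitions.
  trainingData.repartition(numPhysicalCores)
}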
Hi,
I have been using the Spark shell to execute all SQL. I am connecting to
Cassandra, converting the data to JSON, and then running queries on it. I
am using HiveContext (and not SQLContext) because of the explode
functionality in it.
I want to see how I can use the Spark SQL CLI for directly running
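For context, the explode usage referred to is Hive's LATERAL VIEW syntax, which HiveContext supports; a rough sketch, with the table and column names invented for illustration:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Assuming a registered table "events" with an array-typed column "tags":
val exploded = hiveContext.sql(
  "SELECT id, tag FROM events LATERAL VIEW explode(tags) t AS tag")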
I thought I had this all figured out, but I'm getting some weird errors now
that I'm attempting to deploy this on production-size servers. It's
complaining that I'm not allocating enough memory to the memoryOverhead values.
I tracked it down to this code:
Greg, if you look carefully, the code is enforcing that the memoryOverhead
be lower (and not higher) than spark.driver.memory.
Thanks,
Nishkam
On Mon, Sep 22, 2014 at 1:26 PM, Greg Hill greg.h...@rackspace.com wrote:
I thought I had this all figured out, but I'm getting some weird errors
now
Gah, ignore me again. I was reading the logic backwards. For some reason it
isn't picking up my SPARK_DRIVER_MEMORY environment variable and is using the
default of 512m. Probably an environmental issue.
Greg
From: Greg greg.h...@rackspace.com
Date: Monday,
I watched several presentations from AMP Camp 2013. Many of the Spark
examples are about extracting information from the TSV-format Wikipedia
extraction dataset (around 66 GB). It used to be provided as an open data
set on Amazon EBS, but it has since disappeared.
I really want to use these
Maybe try --driver-memory if you are using spark-submit?
Thanks,
Nishkam
On Mon, Sep 22, 2014 at 1:41 PM, Greg Hill greg.h...@rackspace.com wrote:
Ah, I see. It turns out that my problem is that that comparison is
ignoring SPARK_DRIVER_MEMORY and comparing to the default of 512m. Is that
Hi,
I tried running the HdfsWordCount program in the streaming examples in Spark
1.1.0. I provided a directory in the distributed filesystem as input. This
directory has one text file. However, the only thing that the program keeps
printing is the time - but not the word count. I have not used
Thanks for the info, Michael. I see this in a few other places in the
Impala+Parquet context, but a quick scan didn't reveal any leads on
this warning. I'll ignore it for now.
Andrew
On Mon, Sep 22, 2014 at 12:16 PM, Michael Armbrust mich...@databricks.com
wrote:
These are coming from the
Hi Gaurav,
Can you put hive-site.xml in conf/ and try again?
Thanks,
Yin
On Mon, Sep 22, 2014 at 4:02 PM, gtinside gtins...@gmail.com wrote:
Hi,
I have been using the Spark shell to execute all SQL. I am connecting to
Cassandra, converting the data to JSON, and then running queries on it,
This issue is resolved. The file needs to be created after the program has
started to execute.
thanks
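For anyone hitting the same thing: textFileStream only picks up files that appear in the monitored directory after the streaming context starts, so pre-existing files are ignored. A minimal sketch along the lines of the HdfsWordCount example, with an example input path:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("HdfsWordCount"), Seconds(2))

// Only files created in the directory *after* ssc.start() are picked up;
// files that were already there when the job started are ignored.
val lines = ssc.textFileStream("hdfs:///user/me/input")   // example path
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.print()

ssc.start()
ssc.awaitTermination()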
Really sorry to bother everybody. It is my mistake. The data set is still on
Amazon and can be downloaded. The reason for my failure is that I started
an instance outside the U.S., so I could not attach the EBS volume.
I have a Spark cluster where some nodes are high-performance and others have
commodity specs (lower configuration).
When I configure worker memory and instances in spark-env.sh, it applies to
all the nodes.
Can I change the SPARK_WORKER_MEMORY and SPARK_WORKER_INSTANCES properties per
We are now implementing a matrix multiplication algorithm on Spark that was
previously designed in the traditional MPI style. It assumes every core in
the grid computes in parallel.
In our development environment, each executor node has 16 cores, and I
assign 16 tasks to each executor node
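A hedged sketch of lining partitions up with cores in that setup; the numbers and the blocks RDD are placeholders, not the original algorithm:

// Placeholder numbers: 16 cores per executor, one task per core per stage.
val numExecutors = 4                     // stand-in for the actual cluster size
val coresPerExecutor = 16
val targetTasks = numExecutors * coresPerExecutor

// Stand-in for the RDD of matrix blocks produced upstream:
val blocks = sc.parallelize(0 until targetTasks)

// Make the number of partitions (and hence concurrent tasks per stage)
// line up with the number of physical cores available.
val balanced = blocks.repartition(targetTasks)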