Yes, both of these are derived from the same source, and this source
includes the driver. In other words, if you submit a job with 10 executors
you will get back 11 for both statuses.
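For example, in spark-shell (a minimal sketch of the behavior described above; with 10 executors both counts come back as 11):
val storageEntries = sc.getExecutorStorageStatus.length
val memoryEntries = sc.getExecutorMemoryStatus.size
println(s"storage entries: $storageEntries, memory entries: $memoryEntries")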
2014-07-28 15:40 GMT-07:00 Sung Hwan Chung coded...@cs.stanford.edu:
Do getExecutorStorageStatus and
Hi Jianshi,
Could you provide which HBase version you're using?
By the way, a quick sanity check: can the Workers access HBase?
Were you able to manually write one record to HBase with the serialize
function? Hardcode one record and test it?
From: jianshi.hu...@gmail.com
Date: Fri, 25 Jul 2014
you want to configure it?
Andrew
On Wed, Jul 23, 2014 at 6:10 AM, Martin Goodson mar...@skimlinks.com
wrote:
We are having difficulties configuring Spark, partly because we still
don't understand some key concepts. For instance, how many executors are
there per machine in standalone mode
driver.
Andrew
2014-07-23 10:40 GMT-07:00 didi did...@gmail.com:
Hi all
I guess the problem has something to do with the fact that I submit the job
to a remote location.
I submit from an OracleVM running Ubuntu and suspect there may be some NAT issues.
Akka TCP tries this address, as shown in the STDERR:
Hi Eric,
Have you checked the executor logs? It is possible they died because of
some exception, and the message you see is just a side effect.
Andrew
2014-07-23 8:27 GMT-07:00 Eric Friedman eric.d.fried...@gmail.com:
I'm using spark 1.0.1 on a quite large cluster, with gobs of memory, etc
in spark
should not be done through any config or environment variable that
references java opts.
Andrew
2014-07-23 1:04 GMT-07:00 MEETHU MATHEW meethu2...@yahoo.co.in:
Hi all,
Sorry for raising this topic again; I am still confused about it.
I set SPARK_DAEMON_JAVA_OPTS=-XX:+UseCompressedOops
-submit. The equivalent also holds for executor memory (i.e.
--executor-memory). That way you don't have to wrangle with the millions
of overlapping configs / environment variables for all the deploy modes.
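For example (class name, jar, and sizes are placeholders, just a sketch of the flags mentioned above):
./bin/spark-submit --class com.example.YourApp --driver-memory 4g --executor-memory 2g your-app.jar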
-Andrew
2014-07-23 4:18 GMT-07:00 mrm ma...@skimlinks.com:
Hi,
I figured out my problem
Hi Earthson,
Is your problem resolved? The way you submit your application looks alright
to me; spark-submit should be able to parse the combination of --master and
--deploy-mode correctly. I suspect you might have hard-coded yarn-cluster
or something in your application.
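A minimal sketch of what to check (app name is a placeholder): leave the master out of the code so the --master / --deploy-mode flags passed to spark-submit take effect.
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("MyApp") // no .setMaster("yarn-cluster") hard-coded here
val sc = new SparkContext(conf)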
Andrew
2014-07-22 1
/ephemeral-hdfs.
Andrew
2014-07-22 7:07 GMT-07:00 mrm ma...@skimlinks.com:
Hi,
Where can I find the version of Hadoop my cluster is using? I launched my
EC2 cluster using the spark-ec2 script with the --hadoop-major-version=2
option. However, the folder hadoop-native/lib on the master node only
for Hive-on-Spark now.
On Mon, Jul 21, 2014 at 6:27 PM, Andrew Lee alee...@hotmail.com wrote:
Hive and Hadoop are using an older version of the guava libraries (11.0.1),
whereas Spark's Hive support uses guava 14.0.1+.
The community isn't willing to downgrade to 11.0.1, which is the current
version
Hi All,
Currently, if you are running the Spark HiveContext API with Hive 0.12, it won't
work due to the following 2 libraries, which are not consistent with Hive 0.12
or Hadoop. (Hive libs align with Hadoop libs, and as a common
practice they should be consistent in order to interoperate.)
is deprecated)
- add --master yarn-cluster in your spark-submit command
Another worrying thing is the warning from your logs:
14/07/21 22:38:42 WARN spark.SparkConf: null jar passed to SparkContext
constructor
How are you creating your SparkContext?
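One hedged sketch of constructing the context with the application jar set explicitly (the path is a placeholder); alternatively, let spark-submit supply the jar for you. The point is that the jar list must not contain null entries:
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("MyApp").setJars(Seq("/path/to/my-app.jar"))
val sc = new SparkContext(conf)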
Andrew
2014-07-21 7:47 GMT-07:00 Sam Liu
workaround for this issue, but you might try to reduce the
number of concurrently running tasks (partitions) to avoid emitting too
many events. The root cause of the listener queue taking too much time to
process events is recorded in SPARK-2316, which we also intend to fix by
Spark 1.1.
Andrew
.
Andrew
2014-07-21 8:37 GMT-07:00 mrm ma...@skimlinks.com:
Hi,
I am using pyspark and have persisted a list of RDDs within a function, but
I don't have a reference to them anymore. The RDDs are listed in the UI,
under the Storage tab, and they have names associated with them (e.g. 4
line.
Andrew
2014-07-21 10:01 GMT-07:00 Nick R. Katsipoulakis kat...@cs.pitt.edu:
Thank you Abel,
It seems that your advice worked. Even though I receive a message that it
is a deprecated way of defining Spark Memory (the system prompts that I
should set spark.driver.memory), the memory
, it seems that you set your log level to WARN. The cause is most
probably that the cache is not big enough, but setting the log level to
INFO will provide you with more information on the exact sizes that are
being used by the storage and the blocks.
Andrew
2014-07-19 13:01 GMT-07:00 rindra
I'm not sure if you guys ever picked a preferred method for doing this, but
I just encountered it and came up with this method that's working
reasonably well on a small dataset. It should be quite easily
generalizable to non-String RDDs.
def addRowNumber(r: RDD[String]): RDD[Tuple2[Long,String]]
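A minimal sketch of the same idea using RDD.zipWithIndex (available in Spark 1.0+), which avoids computing partition offsets by hand:
import org.apache.spark.rdd.RDD
def addRowNumber(r: RDD[String]): RDD[(Long, String)] =
  r.zipWithIndex().map { case (line, idx) => (idx, line) }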
metrics are aggregated over the entire duration of the task (i.e. within
each task you can spill multiple times).
Andrew
2014-07-18 4:09 GMT-07:00 Sébastien Rainville sebastienrainvi...@gmail.com
:
Hi,
in the Spark UI, one of the metrics is shuffle spill (memory). What is
it exactly? Spilling
to be some inconsistency or missing pieces in the logs you
posted. After an executor says driver disassociated, what happens in the
driver logs? Is an exception thrown or something?
It would be useful if you could also post your conf/spark-env.sh.
Andrew
2014-07-17 14:11 GMT-07:00 Marcelo Vanzin
thing to
check is whether the node from which you launch spark-submit can access the
internal address of the master (and port 7077). One quick way to verify
that is to try telnetting to it.
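For example (the hostname is a placeholder for your master's internal address):
telnet spark-master-internal.example.com 7077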
Let me know if you find anything.
Andrew
2014-07-17 15:57 GMT-07:00 ranjanp piyush_ran...@hotmail.com:
Hi
Hi Chen,
spark.executor.extraJavaOptions was introduced in Spark 1.0, not in Spark
0.9. You need to
export SPARK_JAVA_OPTS="-Dspark.config1=value1 -Dspark.config2=value2"
in conf/spark-env.sh.
Let me know if that works.
Andrew
2014-07-17 18:15 GMT-07:00 Tathagata Das tathagata.das1
HDFS. Try removing
all old jars from your .sparkStaging directory and try again?
Let me know if that does the job,
Andrew
2014-07-16 23:42 GMT-07:00 cmti95035 cmti95...@gmail.com:
They're all the same version. Actually even without the --jars parameter
it
got the same error. Looks like
still work (I just tried this on my own EC2 cluster). By the way,
SPARK_MASTER is actually deprecated. Instead, please use bin/spark-submit
--master [your master].
Andrew
2014-07-16 23:46 GMT-07:00 Akhil Das ak...@sigmoidanalytics.com:
You can try the following in the spark-shell:
1. Run
SPARK_JAVA_OPTS is deprecated as of 1.0)
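For example, a sketch for Spark 1.0+ (the GC flag shown is illustrative): add this line to conf/spark-defaults.conf on the node you submit from, and the executors will pick it up.
spark.executor.extraJavaOptions -XX:+UseConcMarkSweepGC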
2014-07-17 21:08 GMT-07:00 Chen Song chen.song...@gmail.com:
Thanks Andrew.
Say that I want to turn on CMS GC for each worker.
All I need to do is add the following line to conf/spark-env.sh on the node
where I submit the application.
-XX
Hello community,
tried to run a Spark app on YARN, using the Cloudera Hadoop and Spark distros
(from http://archive.cloudera.com/cdh5/cdh/5)
hadoop version: hadoop-2.3.0-cdh5.0.3.tar.gz
spark version: spark-0.9.0-cdh5.0.3.tar.gz
DEFAULT_YARN_APPLICATION_CLASSPATH is part of hadoop-api-yarn jar ...
thanks Sandy, no CM-managed cluster, straight from the Cloudera tar (
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.3.tar.gz)
trying your suggestion immediately! Thanks so much for taking the time.
On Wed, Jul 16, 2014 at 1:10 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Andrew
...@cloudera.com wrote:
Andrew,
Are you running on a CM-managed cluster? I just checked, and there is a
bug here (fixed in 1.0), but it's avoided by having
yarn.application.classpath defined in your yarn-site.xml.
-Sandy
On Wed, Jul 16, 2014 at 10:02 AM, Sean Owen so...@cloudera.com wrote
/share/hadoop/hdfs/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
</property>
On Wed, Jul 16, 2014 at 1:47 PM, Andrew Milkowski amgm2...@gmail.com
wrote:
Sandy, perfect! You saved me tons of time! I added this to yarn-site.xml and
the job ran.
In general it would be nice to be able to configure replication on a
per-job basis. Is there a way to do that without changing the config
values in the Hadoop conf/ directory between jobs? Maybe by modifying
OutputFormats or the JobConf ?
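One possible per-job approach (a sketch, not something confirmed in this thread): override dfs.replication on the Hadoop configuration used by the SparkContext before calling a save action.
sc.hadoopConfiguration.set("dfs.replication", "2")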
On Mon, Jul 14, 2014 at 11:12 PM, Matei Zaharia
Hi Nan,
Great digging in -- that makes sense to me for when a job is producing some
output handled by Spark like a .count or .distinct or similar.
For the other part of the question, I'm also interested in side effects
like an HDFS disk write. If one task is writing to an HDFS path and
another
Yes, the documentation is actually a little outdated. We will get around to
fixing it shortly. Please use --driver-cores or --executor-cores instead.
2014-07-14 19:10 GMT-07:00 cjwang c...@cjwang.us:
They don't work in the new 1.0.1 either.
As mentioned, deprecated in Spark 1.0+.
Try using --driver-class-path:
./bin/spark-shell --driver-class-path yourlib.jar:abc.jar:xyz.jar
Don't use a glob (*); specify the JARs one by one, separated by colons.
Date: Wed, 9 Jul 2014 13:45:07 -0700
From: kat...@cs.pitt.edu
Subject: SPARK_CLASSPATH Warning
Ok, I found it on JIRA SPARK-2390:
https://issues.apache.org/jira/browse/SPARK-2390
So it looks like this is a known issue.
From: alee...@hotmail.com
To: user@spark.apache.org
Subject: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file
option?
Date: Tue, 8 Jul 2014 15:17:00
-submit (or
spark-shell, which calls spark-submit) with the --verbose flag.
Let me know if this fixes it. I will get to fixing the root problem soon.
Andrew
2014-07-10 18:43 GMT-07:00 cjwang c...@cjwang.us:
Andrew,
Thanks for replying. I did the following and the result was still the
same.
1
?
Andrew
2014-07-10 10:17 GMT-07:00 Aris Vlasakakis a...@vlasakakis.com:
Thank you very much Yana for replying!
So right now the setup is a single-node machine, which is my cluster,
and YES, you are right: my submitting laptop has a different path to the
spark-1.0.0 installation than the cluster
Yes, there are a few bugs in the UI in the event of a node failure.
The duplicated stages in both the active and completed tables should be
fixed by this PR: https://github.com/apache/spark/pull/1262
The fact that the progress bar on the stages page displays an overflow
(e.g. 5/4) is still an
.mbox/%3cCAMJOb8mYTzxrHWcaDOnVoOTw1TFrd9kJjOyj1=nkgmsk5vs...@mail.gmail.com%3e
Andrew
2014-07-10 1:57 GMT-07:00 cjwang c...@cjwang.us:
Not sure that was what I wanted. I tried to run the Spark shell on a machine
other
than the master and got the same error. The 192 was supposed to be a
simple
I don't see why using SparkSubmit.scala as your entry point would be any
different, because all that does is invoke the main class of Client.scala
(e.g. for Yarn) after setting up all the class paths and configuration
options. (Though I haven't tried this myself)
2014-07-09 9:40 GMT-07:00 Ron
Here's the most updated version of the same page:
http://spark.apache.org/docs/latest/job-scheduling
2014-07-08 12:44 GMT-07:00 Sujeet Varakhedi svarakh...@gopivotal.com:
This is a good start:
http://www.eecs.berkeley.edu/~tdas/spark_docs/job-scheduling.html
On Tue, Jul 8, 2014 at 9:11
Build: Spark 1.0.0 rc11 (git commit tag:
2f1dc868e5714882cf40d2633fb66772baf34789)
Hi All,
When I enabled the spark-defaults.conf for the eventLog, spark-shell broke
while spark-submit worked.
I'm trying to create a separate directory per user to keep track of their own
Spark job event
It seems that your driver (which I'm assuming you launched on the master
node) can now connect to the Master, but your executors cannot. Did you
make sure that all nodes have the same conf/spark-defaults.conf,
conf/spark-env.sh, and conf/slaves? It would be good if you can post the
stderr of the
or the --master parameter to
spark-submit.
We will update the documentation shortly. Thanks for letting us know.
Andrew
2014-07-08 16:29 GMT-07:00 Mikhail Strebkov streb...@gmail.com:
Hi! I've been using Spark compiled from the 1.0 branch at some point (~2 months
ago). The setup is a standalone cluster with 4
Others have also asked for this on the mailing list, and hence there's a
related JIRA: https://issues.apache.org/jira/browse/SPARK-1762. Ankur
brings up a good point in that any current implementation of in-memory
shuffles will compete with application RDD blocks. I think we should
definitely add
the red text is because it appears only on the driver
containers, not the executor containers. This is because SparkUI belongs to
the SparkContext, which only exists on the driver.
Andrew
2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com:
Hi guys,
Not sure if you have similar issues. Did
the redirect error has
little to do with Spark itself, but more to do with how you set up the
cluster. I have actually run into this myself, but I haven't found a
workaround. Let me know if you find anything.
2014-07-07 12:07 GMT-07:00 Chester Chen ches...@alpinenow.com:
As Andrew explained, the port
Hi Kudryavtsev,
Here's what I am doing as a common practice and reference. I don't want to say
it is best practice, since that requires a lot of customer experience and
feedback, but from a development and operating standpoint it is great to
separate the YARN container logs from the Spark
LZO-compressed data, so I know there's not a version issue.
Andrew
On Sun, Jul 6, 2014 at 12:02 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
I’ve been reading through several pages trying to figure out how to set up
my spark-ec2 cluster to read LZO-compressed files from S3
= ((k._1, k._2), k._3))
Note that when using .join though, that is an inner join so you only get
results from (id1, id2) pairs that have BOTH a score1 and a score2.
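A minimal sketch of that approach (rdd1, rdd2, and the field names are placeholders; run in spark-shell, where the pair-RDD implicits are already in scope):
val left = rdd1.map { case (id1, id2, score1) => ((id1, id2), score1) }
val right = rdd2.map { case (id1, id2, score2) => ((id1, id2), score2) }
val joined = left.join(right) // RDD[((id1, id2), (score1, score2))], inner join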
Andrew
On Wed, Jul 2, 2014 at 5:12 PM, Sameer Tilak ssti...@live.com wrote:
Hi everyone,
Is it possible to join RDDs using
Hi Christophe, another Andrew speaking.
Your configuration looks fine to me. From the stack trace it seems that we
are in fact closing the file system prematurely elsewhere in the system,
such that when it tries to write the APPLICATION_COMPLETE file it throws
the exception you see. This does
Hi Konstatin,
We use hadoop as a library in a few places in Spark. I wonder why the path
includes null though.
Could you provide the full stack trace?
Andrew
2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev
kudryavtsev.konstan...@gmail.com:
Hi all,
I'm trying to run some transformation
executor will also die of the same
problem.
Best,
Andrew
2014-07-02 6:22 GMT-07:00 Yana Kadiyska yana.kadiy...@gmail.com:
Can you elaborate on why you need to set spark.shuffle.spill to
true again in the config -- the default for spark.shuffle.spill is
already true according to the
doc
your null keys before
passing your key-value pairs to a combine operator (e.g. groupByKey,
reduceByKey). For instance, rdd.map { case (k, v) => if (k == null)
(SPECIAL_VALUE, v) else (k, v) }.
Best,
Andrew
2014-07-02 10:22 GMT-07:00 Konstantin Kudryavtsev
kudryavtsev.konstan...@gmail.com:
Hi all
Hi Christophe,
Make sure you have 3 slashes in the hdfs scheme.
e.g.
hdfs:///server_name:9000/user/user_name/spark-events
and in the spark-defaults.conf as well:
spark.eventLog.dir=hdfs:///server_name:9000/user/user_name/spark-events
Date: Thu, 19 Jun 2014 11:18:51 +0200
From:
RDDs they are most interested
in, so it makes sense to give them control over caching behavior.
Best,
Andrew
2014-06-26 5:36 GMT-07:00 tomsheep...@gmail.com tomsheep...@gmail.com:
Hi all,
I have a newbie question about StorageLevel of spark. I came up with
these sentences in spark documents
Hi Sophia, did you ever resolve this?
A common cause for not giving resources to the job is that the RM cannot
communicate with the workers.
This itself has many possible causes. Do you have a full stack trace from
the logs?
Andrew
2014-06-13 0:46 GMT-07:00 Sophia sln-1...@163.com
://issues.apache.org/jira/browse/SPARK-2260. Thanks for pointing this
out, and we will get to fixing these shortly.
Best,
Andrew
2014-06-20 6:06 GMT-07:00 Gino Bustelo lbust...@gmail.com:
I've found that the jar will be copied to the worker from hdfs fine, but
it is not added to the spark context
I checked the source code; it looks like it was re-added based on JIRA
SPARK-1588, but I don't know whether there's any test case associated with it.
SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
Sandy Ryza sa...@cloudera.com
2014-04-29 12:54:02 -0700
Ah, never mind. The 0.0.0.0 is for the UI, not for the Master, which uses
the output of the hostname command. But yes, long story short, go to the web
UI and use that URL.
2014-06-23 11:13 GMT-07:00 Andrew Or and...@databricks.com:
Hm, spark://localhost:7077 should work, because the standalone
Sounds good. Mingyu and I are waiting on 1.0.1 to get the fix for the
below issues without running a patched version of Spark:
https://issues.apache.org/jira/browse/SPARK-1935 -- commons-codec version
conflicts for client applications
https://issues.apache.org/jira/browse/SPARK-2043 --
to the SparkContext (see
http://spark.apache.org/docs/latest/configuration.html#spark-properties).
Andrew
2014-06-18 22:21 GMT-07:00 MEETHU MATHEW meethu2...@yahoo.co.in:
Hi all,
I have a question regarding the options in spark-env.sh. I set the following
values in the file on the master and 2 workers
will be done through spark-submit, so you may miss out on relevant new
features or bug fixes.
Andrew
2014-06-19 7:41 GMT-07:00 Koert Kuipers ko...@tresata.com:
still struggling with SPARK_JAVA_OPTS being deprecated. I am using Spark
standalone.
For example, if I have an Akka timeout setting that I
if that does the job.
Andrew
2014-06-19 6:04 GMT-07:00 Praveen Seluka psel...@qubole.com:
I am trying to run Spark on YARN. I have a hadoop 2.2 cluster (YARN +
HDFS) in EC2. Then, I compiled Spark using Maven with 2.2 hadoop profiles.
Now I am trying to run the example Spark job. (In yarn-cluster
(Also, an easier workaround is to simply submit the application from within
your
cluster, thus saving you all the manual labor of reconfiguring everything
to use
public hostnames. This may or may not be applicable to your use case.)
2014-06-19 14:04 GMT-07:00 Andrew Or and...@databricks.com
Hi All,
Has anyone run into the same problem? Looking at the source code in the
official release (rc11), this property is set to false by default; however,
I'm seeing that the .sparkStaging folder remains on HDFS and causes the disk
to fill up pretty fast since SparkContext deploys
Forgot to mention that I am using spark-submit to submit jobs, and a
verbose-mode printout looks like this with the SparkPi example. The
.sparkStaging folder won't be deleted. My thought is that this should be
part of the staging and should be cleaned up as well when sc is terminated.
What's the advantage of Apache maintaining the brew installer vs. users
maintaining it? Apache handling it means more work for this dev team, but
probably a better experience for brew users. Just wanted to weigh the pros and cons before
committing to support this installation method.
Andrew
On Wed, Jun 18, 2014 at 5
Wait, so the file only has four lines and the job is running out of heap
space? Can you share the code you're running that does the processing?
I'd guess that you're doing some intense processing on every line, since just
writing parsed case classes back to disk sounds very lightweight.
I
On Wed,
Gerard,
Strings in particular are very inefficient because they're stored in a
two-byte format by the JVM. If you use the Kryo serializer together with
StorageLevel.MEMORY_ONLY_SER, then Kryo stores Strings in UTF-8, which for
ASCII-like strings will take half the space.
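A minimal sketch of that combination (app name and path are placeholders):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
val conf = new SparkConf()
  .setAppName("StringCaching")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)
val cached = sc.textFile("hdfs:///path/to/strings").persist(StorageLevel.MEMORY_ONLY_SER)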
Andrew
On Tue, Jun 17
Standalone-client mode is not officially supported at the moment.
Standalone-cluster and yarn-client modes, however, should work.
For both modes, are you running spark-submit from within the cluster, or
outside of it? If the latter, could you try running it from within the
cluster and
How long does it get stuck for? This is a common sign of the OS thrashing
because it is running out of memory. If you keep it running longer, does it
throw an error?
Depending on how large your other RDD is (and your join operation), memory
pressure may or may not be the problem at all. It could be
In Spark you can use the normal globs supported by Hadoop's FileSystem,
which are documented here:
http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
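For example (the path is a placeholder), a glob can be passed straight to sc.textFile:
val logs = sc.textFile("hdfs:///data/2014-06-*/part-*")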
On Wed, Jun 18, 2014 at 12:09 AM, MEETHU MATHEW meethu2...@yahoo.co.in
wrote:
Are you referring to accessing a SparkUI for an application that has
finished? First you need to enable event logging while the application is
still running. In Spark 1.0, you set this by adding a line to
$SPARK_HOME/conf/spark-defaults.conf:
spark.eventLog.enabled true
Other than that, the
Hi Wang Hao,
This is not removed. We moved it here:
http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html
If you're building with SBT, and you don't specify the
SPARK_HADOOP_VERSION, then it defaults to 1.0.4.
Andrew
2014-06-12 6:24 GMT-07:00 Hao Wang wh.s...@gmail.com
Not sure if this is what you're looking for, but have you looked at Java's
ProcessBuilder? You can do something like
for (line <- lines) {
  val command = line.split(" ") // you may need to deal with quoted strings
  val process = new ProcessBuilder(command: _*).inheritIO().start() // redirect output of the process to this one
  process.waitFor()
}
/201406.mbox/%3ccamjob8mr1+ias-sldz_rfrke_na2uubnmhrac4nukqyqnun...@mail.gmail.com%3e
As described in the link, the last resort is to try building your assembly
jar with JAVA_HOME set to Java 6. This usually fixes the problem (more
details in the link provided).
Cheers,
Andrew
2014-06-10 6:35 GMT
Can you try file:/root/spark_log?
2014-06-10 19:22 GMT-07:00 zhen z...@latrobe.edu.au:
I checked the permissions on /root and they are the following:
drwxr-xr-x 20 root root 4096 Jun 11 01:05 root
So anyway, I changed to /tmp/spark_log instead, and this time I made
sure
that all
No, I meant pass the path to the history server start script.
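In Spark 1.0 that looks something like this (a sketch, using the directory from this thread):
./sbin/start-history-server.sh /tmp/spark_logs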
2014-06-10 19:33 GMT-07:00 zhen z...@latrobe.edu.au:
Sure here it is:
drwxrwxrwx 2 1000 root 4096 Jun 11 01:05 spark_logs
Zhen
Hi Jacob,
The port configuration docs that we worked on together are now available
at:
http://spark.apache.org/docs/latest/spark-standalone.html#configuring-ports-for-network-security
Thanks for the help!
Andrew
On Wed, May 28, 2014 at 3:21 PM, Jacob Eisinger jeis...@us.ibm.com wrote:
Howdy
.
Andrew
On Thu, Jun 5, 2014 at 2:15 PM, Oleg Proudnikov oleg.proudni...@gmail.com
wrote:
Hi All,
Please help me set Executor JVM memory size. I am using Spark shell and it
appears that the executors are started with a predefined JVM heap of 512m
as soon as Spark shell starts. How can I change
for why
SPARK_MEM was deprecated. See https://github.com/apache/spark/pull/99
On Thu, Jun 5, 2014 at 2:37 PM, Oleg Proudnikov oleg.proudni...@gmail.com
wrote:
Thank you, Andrew,
I am using Spark 0.9.1 and tried your approach like this:
bin/spark-shell --driver-java-options
think
some fixes in spilling landed.
Andrew
On Thu, Jun 5, 2014 at 3:05 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Hey Ajay, thanks for reporting this. There was indeed a bug, specifically
in the way join tasks spill to disk (which happened when you had more
concurrent tasks competing
as the work that Aaron mentioned is happening, I think he might be
referring to the discussion and code surrounding
https://issues.apache.org/jira/browse/SPARK-983
Cheers!
Andrew
On Thu, Jun 5, 2014 at 5:16 PM, Roger Hoover roger.hoo...@gmail.com wrote:
I think it would very handy to be able
Just curious, what do you want your custom RDD to do that the normal ones
don't?
On Wed, Jun 4, 2014 at 6:30 AM, bluejoe2008 bluejoe2...@gmail.com wrote:
hi, folks,
is there any easier way to define a custom RDD in Java?
I am wondering if I have to define a new java class which
can at least confirm that the setting is making it to the application
with that webui.
Cheers,
Andrew
On Wed, Jun 4, 2014 at 3:48 AM, nilmish nilmish@gmail.com wrote:
The error is resolved. I was using a comparator that was not serializable,
which is why it was throwing the error.
I have
You can change the storage level of an individual RDD with
.persist(StorageLevel.MEMORY_AND_DISK), but I don't think you can change
what the default persistence level is for RDDs.
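For example (the path is a placeholder; run in spark-shell):
import org.apache.spark.storage.StorageLevel
val persisted = sc.textFile("hdfs:///path/to/data").persist(StorageLevel.MEMORY_AND_DISK)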
Andrew
On Wed, Jun 4, 2014 at 1:52 AM, Salih Kardan karda...@gmail.com wrote:
Hi
I'm using Spark 0.9.1 and Shark
When you group by IP address in step 1 to this:
(ip1,(lat1,lon1),(lat2,lon2))
(ip2,(lat3,lon3),(lat4,lon4))
How many lat/lon locations do you expect for each IP address? avg and max
are interesting.
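A quick sketch for getting those numbers (assuming an RDD called pairs of (ip, (lat, lon)) records, which is a placeholder; run in spark-shell):
val counts = pairs.mapValues(_ => 1L).reduceByKey(_ + _).values.cache()
val n = counts.count()
val avg = counts.reduce(_ + _).toDouble / n
val max = counts.reduce((a, b) => math.max(a, b))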
Andrew
On Wed, Jun 4, 2014 at 5:29 AM, Oleg Proudnikov oleg.proudni
:
https://www.mail-archive.com/reviews@spark.apache.org/msg08223.html
I've tested that zipped modules can at least be imported via
zipimport.
Any ideas?
-Simon
On Mon, Jun 2, 2014 at 11:50 AM, Andrew Or and...@databricks.com
wrote:
Hi Simon,
You shouldn't have
Your applications are probably not connecting to your existing cluster and
instead running in local mode. Are you passing the master URL to the
SparkPi application?
Andrew
On Tue, Jun 3, 2014 at 12:30 AM, MrAsanjar . afsan...@gmail.com wrote:
- HI all,
- Application running
Hmm that sounds like it could be done in a custom OutputFormat, but I'm not
familiar enough with custom OutputFormats to say that's the right thing to
do.
On Tue, Jun 3, 2014 at 10:23 AM, Gerard Maas gerard.m...@gmail.com wrote:
Hi Andrew,
Thanks for your answer.
The reason of the question
Hi Mayur, is that closure cleaning a JVM issue or a Spark issue? I'm used
to thinking of closure cleaner as something Spark built. Do you have
somewhere I can read more about this?
On Tue, Jun 3, 2014 at 12:47 PM, Mayur Rustagi mayur.rust...@gmail.com
wrote:
So are you using Java 7 or 8?
7
, the steps outlined there
are quite useful.
Let me know if you get it working (or not).
Cheers,
Andrew
2014-06-02 17:24 GMT+02:00 Xu (Simon) Chen xche...@gmail.com:
Hi folks,
I have a weird problem when using pyspark with yarn. I started ipython as
follows:
IPYTHON=1 ./pyspark --master
/issues/171
Pull request that adds an AvroSerializer to Chill:
https://github.com/twitter/chill/pull/172
Issue on the old Spark tracker:
https://spark-project.atlassian.net/browse/SPARK-746
Matt, can you comment on whether this change helps you streamline that gist
even further?
Andrew
On Tue, May 27, 2014
get you started?
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
Cheers,
Andrew
On Tue, May 27, 2014 at 4:10 AM, Carter gyz...@hotmail.com wrote:
Any suggestion is very much appreciated.
Hi Roger,
This was due to a bug in the Spark shell code, and is fixed in the latest
master (and RC11). Here is the commit that fixed it:
https://github.com/apache/spark/commit/8edbee7d1b4afc192d97ba192a5526affc464205.
Try it now and it should work. :)
Andrew
2014-05-26 10:35 GMT+02:00 Perttu
Hi Martin,
Tim suggested that you pastebin the mesos logs -- can you share those for
the list?
Cheers,
Andrew
On Thu, May 15, 2014 at 5:02 PM, Martin Weindel martin.wein...@gmail.comwrote:
Andrew,
thanks for your response. When using the coarse mode, the jobs run fine.
My problem
/spark/pull/126
Alternatively, it sounds like your algorithm needs some additional state to
join against to produce each successive iteration of RDD. Have you
considered storing that data in an RDD rather than a broadcast variable?
Andrew
On Wed, May 7, 2014 at 10:02 PM, randylu randyl...@gmail.com
. Do you have a sense of how large the serialized items in
your RDD are?
Andrew
On Sat, May 10, 2014 at 6:32 AM, Andrea Esposito and1...@gmail.com wrote:
Bump -- does anyone know anything about it? ^^
2014-05-06 12:05 GMT+02:00 Andrea Esposito and1...@gmail.com:
Hi there,
sorry if i'm
-port-to-send-to
Thanks for taking a look through!
I also realized that I had a couple of mistakes with the 0.9-to-1.0
transition, so I have now documented those as well in the updated PR.
Cheers!
Andrew
On Fri, May 23, 2014 at 2:43 PM, Jacob Eisinger jeis...@us.ibm.com wrote:
Howdy Andrew
.
Cheers,
Andrew
On Wed, May 7, 2014 at 10:19 AM, Mark Baker dist...@acm.org wrote:
On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger jeis...@us.ibm.com wrote:
In a nutshell, Spark opens up a couple of well-known ports. And then
the workers and the shell open up dynamic ports for each job
for
future users to leverage!
Andrew
On Thu, May 22, 2014 at 10:49 AM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
I have a bunch of vectors like
[0.1234,-0.231,0.23131]
and so on,
and I want to compute cosine similarity and Pearson correlation using
pyspark.
How do I do this?
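For two plain vectors, cosine similarity is just dot(a, b) / (|a| * |b|); here is a minimal Scala sketch of the per-pair computation (Pearson correlation is the same formula after subtracting each vector's mean):
def cosine(a: Array[Double], b: Array[Double]): Double = {
  val dot = a.zip(b).map { case (x, y) => x * y }.sum // dot product
  val na = math.sqrt(a.map(x => x * x).sum) // norm of a
  val nb = math.sqrt(b.map(x => x * x).sum) // norm of b
  dot / (na * nb)
}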
Any