Totally agree; also there is a class 'SparkSubmit' you can call directly to
replace the shell script.
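For the record, a minimal sketch of what calling it programmatically might
look like; the class name, master URL and jar path below are placeholders, and
the arguments simply mirror whatever you would pass to the spark-submit script:

    import org.apache.spark.deploy.SparkSubmit

    // Invokes the same entry point the spark-submit script wraps.
    SparkSubmit.main(Array(
      "--class", "com.example.MyApp",             // placeholder main class
      "--master", "spark://ip-172-31-32-12:7077", // placeholder master URL
      "/path/to/my-app.jar"                       // placeholder jar
    ))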
--
Expanded to 4 nodes and changed the workers to listen on the public DNS, but
it still shows the same error (which is obviously wrong). I can't believe I'm
the first to encounter this issue.
--
I'm trying to link a Spark slave with an already-set-up master, using:
$SPARK_HOME/sbin/start-slave.sh spark://ip-172-31-32-12:7077
However, the result shows that it cannot open a log file it is supposed to
create:
failed to launch org.apache.spark.deploy.worker.Worker:
tail: cannot open
I haven't set up passwordless login from the slave to the master node yet (I
was under the impression that this is not necessary, since they communicate
over port 7077).
--
Make sure all queries are called through class methods, and wrap your query
info with a class having only simple properties (strings, collections, etc.).
If you can't find such a wrapper you can also use the SerializableWritable
wrapper out-of-the-box, but it's not recommended (developer API, and makes fat
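A minimal sketch of the kind of wrapper meant here (QueryInfo and its fields
are made up for illustration; Scala case classes are serializable by default):

    // Only simple, serializable fields; safe to capture in a Spark closure.
    case class QueryInfo(table: String, columns: Seq[String], limit: Int)

    class QueryRunner {
      // The query is built inside the method from the simple wrapper, so no
      // non-serializable client object gets pulled into the closure.
      def run(info: QueryInfo): String =
        s"SELECT ${info.columns.mkString(", ")} FROM ${info.table} LIMIT ${info.limit}"
    }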
I've read somewhere that in 1.0 there is a bash tool called 'spark-config.sh'
that allows you to propagate your config files to a number of master and
slave nodes. However, I haven't used it myself.
--
I got a 'NoSuchFieldError', which is of the same type. It's definitely a
dependency jar conflict: the Spark driver will load its own jars, which in
recent versions pull in many dependencies that are 1-2 years old. And if your
newer-version dependency is in the same package it will be shadowed (Java's
first-come
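One quick way to confirm such a conflict is to print where the JVM actually
loaded the disputed class from (org.apache.http.entity.ContentType here is
just an example; substitute whatever class triggers the error):

    // Prints the jar a class was loaded from; run it on the driver and inside
    // a task to see which copy of the library wins. (getCodeSource can be
    // null for bootstrap classes, but not for classes from ordinary jars.)
    val cls = Class.forName("org.apache.http.entity.ContentType")
    println(cls.getProtectionDomain.getCodeSource.getLocation)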
I'm afraid persisting a connection across two tasks is a dangerous act, as
they can't be guaranteed to be executed on the same machine. Your ES server
may think it's a man-in-the-middle attack!
I think it's possible to invoke a static method that gives you a connection
from a local 'pool', so nothing
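In Scala that 'static' pool is usually a singleton object, initialized at
most once per executor JVM; a rough sketch (EsClient and its index() method
are placeholders, not a real API):

    import org.apache.spark.rdd.RDD

    // The lazy val is initialized once per JVM, so every task running on the
    // same executor reuses the same connection.
    object ConnectionPool {
      lazy val client = new EsClient("es-host", 9200) // hypothetical client
    }

    def save(records: RDD[String]): Unit =
      records.foreachPartition { part =>
        part.foreach(r => ConnectionPool.client.index(r)) // hypothetical call
      }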
I'm deploying a cluster to Amazon EC2, trying to override its internal IP
addresses with the public DNS names.
I start a cluster with the environment parameter: SPARK_PUBLIC_DNS=[my EC2
public DNS]
But it doesn't change anything on the web UI; it still shows the internal IP
address:
Spark Master at
Right, problem solved in a most disgraceful manner: just add a package
relocation in the maven-shade config.
The downside is that it is not compatible with my IDE (IntelliJ IDEA); it
will cause:
Error: scala.reflect.internal.MissingRequirementError: object scala.runtime
in compiler mirror not found.
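For anyone else hitting this, a relocation stanza of this shape goes inside
the maven-shade-plugin <configuration> (the package names are examples, not
the exact ones from my build):

    <!-- Rename the conflicting package inside your fat jar so it no longer
         clashes with the copy Spark ships. -->
    <relocations>
      <relocation>
        <pattern>org.apache.http</pattern>
        <shadedPattern>shaded.org.apache.http</shadedPattern>
      </relocation>
    </relocations>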
Thanks a lot! Let me check my maven-shade plugin config and see if there is a
fix.
--
Indeed I see a lot of duplicate package warnings in the maven-shade assembly
package output, so I tried to eliminate them:
First I set the scope of the dependency on apache-spark to 'provided', as
suggested on this page:
http://spark.apache.org/docs/latest/submitting-applications.html
But spark master
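For reference, the 'provided' scope looks like this in the pom (the artifact
and version are just the Spark 1.0 example):

    <!-- The cluster supplies Spark at runtime, so keep it out of the fat jar. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.0.0</version>
      <scope>provided</scope>
    </dependency>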
Latest advancement:
I found the cause of the NoClassDef exception: I wasn't using spark-submit;
instead I tried to run the Spark application directly, with SparkConf set in
the code (this is handy in local debugging). However, the old problem
remains: even my maven-shade plugin doesn't give any warning
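For context, 'running directly' means something like this instead of going
through spark-submit (the app name and master URL are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // Handy for local debugging: the master and app name are hard-coded in
    // the conf instead of being supplied by the spark-submit script.
    val conf = new SparkConf()
      .setAppName("MyApp")
      .setMaster("local[*]") // or spark://host:7077 to hit a real cluster
    val sc = new SparkContext(conf)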
I also found that any buggy application submitted with --deploy-mode=cluster
will crash the worker (turn its status to 'DEAD'). This shouldn't really
happen, otherwise nobody will use this mode. It is yet unclear whether all
workers will crash or only the one running the driver will (as I only
Hi Sean,
OK, I'm about 90% sure about the cause of this problem: just another classic
dependency conflict:
My project -> Selenium -> org.apache.httpcomponents:httpcore 4.3.1 (has
ContentType)
Spark -> Spark SQL Hive -> Hive -> Thrift ->
org.apache.httpcomponents:httpcore 4.1.3 (has no ContentType)
Though I
I've tried enabling speculative jobs; this seems to have partially solved the
problem. However, I'm not sure it can handle large-scale situations, as it
only starts when 75% of the job is finished.
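For reference, speculation is controlled by a few conf keys; the 75% is the
default spark.speculation.quantile, which can be lowered (the values below
are illustrative, not recommendations):

    import org.apache.spark.SparkConf

    // Re-launch slow tasks speculatively, and start checking once half of a
    // stage's tasks are done instead of the default 75%.
    val conf = new SparkConf()
      .set("spark.speculation", "true")
      .set("spark.speculation.quantile", "0.5")
      .set("spark.speculation.multiplier", "1.5") // "slow" = 1.5x median runtime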
--
My transformations or actions have some external tool-set dependencies, and
sometimes they just get stuck somewhere and there is no way I can fix them.
If I don't want the job to run forever, do I need to implement several
monitor threads that throw an exception when they get stuck? Or the framework can
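One way to bound a stuck call from inside a task, without separate monitor
threads per se, is to run it under a timeout; a sketch using plain
java.util.concurrent (runExternalTool stands in for whatever hangs):

    import java.util.concurrent.{Callable, Executors, TimeUnit}

    // Runs a block with a hard timeout; the TimeoutException makes the task
    // fail, so Spark's normal retry machinery can take over.
    def withTimeout[T](seconds: Long)(body: => T): T = {
      val pool = Executors.newSingleThreadExecutor()
      try {
        val future = pool.submit(new Callable[T] { def call(): T = body })
        future.get(seconds, TimeUnit.SECONDS)
      } finally pool.shutdownNow()
    }

    // e.g. rdd.map(x => withTimeout(60) { runExternalTool(x) })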
I wasn't using Spark SQL before.
But by default Spark should retry a failed task 4 times.
I'm curious why it aborted after 1 failure.
--
I speculate that Spark will only retry on exceptions that are registered with
the TaskSetScheduler, so a definitely-will-fail task will fail quickly without
taking more resources. However, I haven't found any documentation or web page
on it.
--
I think these failed tasks must have been retried automatically if you can't
see any error in your results. Otherwise the entire application will throw a
SparkException and abort.
Unfortunately I don't know how to do this; my application always aborts.
--
parameters scattered in two different places (masterURL and
$spark.task.maxFailures).
I'm thinking of adding a new config parameter $spark.task.maxLocalFailures to
override 1, what do you think?
Thanks again, buddy.
Yours Peng
On Mon 09 Jun 2014 01:33:45 PM EDT, Aaron Davidson wrote:
Looks like
Oh, and to make things worse, they forgot '\*' in their regex.
Am I the first to encounter this problem?
On Mon 09 Jun 2014 02:24:43 PM EDT, Peng Cheng wrote:
Thanks a lot! That's very responsive. Somebody definitely has encountered the
same problem before, and added two hidden modes
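The 'hidden modes' being referred to are presumably the master URL variants
that take a failure count; a sketch of the 1.0-era form (at the time '*' was
not accepted in place of the thread count, hence the regex complaint above):

    import org.apache.spark.{SparkConf, SparkContext}

    // "local[4]" implies maxFailures = 1; the "local[N,F]" form lets local
    // mode retry a failed task up to F times, like a real cluster does.
    val sc = new SparkContext(
      new SparkConf()
        .setAppName("retry-test")
        .setMaster("local[4,4]") // 4 threads, up to 4 task failures
    )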
Hi Matei, yeah, you are right that this is very niche (my use case is a web
crawler), but I'm glad you also like an additional property. Let me open a
JIRA.
Yours Peng
On Mon 09 Jun 2014 03:08:29 PM EDT, Matei Zaharia wrote:
If this is a useful feature for local mode, we should open a JIRA