Re: Using Spark as web app backend

2014-06-25 Thread Peng Cheng
Totally agree; there is also a class 'SparkSubmit' that you can call directly to replace the shell script.
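A minimal sketch of that idea, assuming the standard spark-submit CLI flags; the master URL, main class, and jar path below are placeholders:

```scala
// Invoke SparkSubmit's main entry point in-process instead of shelling out
// to bin/spark-submit. The arguments mirror the CLI flags.
import org.apache.spark.deploy.SparkSubmit

object InProcessSubmit {
  def main(args: Array[String]): Unit = {
    SparkSubmit.main(Array(
      "--master", "spark://master-host:7077", // placeholder master URL
      "--class", "com.example.MyApp",         // placeholder application class
      "/path/to/my-app-assembly.jar"          // placeholder assembly jar
    ))
  }
}
```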

Re: TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

2014-06-25 Thread Peng Cheng
Expanded to 4 nodes and changed the workers to listen on the public DNS, but it still shows the same error (which is obviously wrong). I can't believe I'm the first to encounter this issue.
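For reference, this error usually means the application requests more memory or cores than any registered worker offers; that diagnosis is not from this thread, but the keys below are standard Spark configuration. A sketch with illustrative values:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master-public-dns:7077") // placeholder master URL
  .setAppName("resource-check")
  .set("spark.executor.memory", "512m") // must fit within a worker's advertised memory
  .set("spark.cores.max", "2")          // must not exceed the cluster's free cores
val sc = new SparkContext(conf)
```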

Spark slave fails to start with weird error information

2014-06-24 Thread Peng Cheng
I'm trying to link a Spark slave with an already-set-up master, using: $SPARK_HOME/sbin/start-slave.sh spark://ip-172-31-32-12:7077 However, the result shows that it cannot open a log file it is supposed to create: failed to launch org.apache.spark.deploy.worker.Worker: tail: cannot open

Re: Spark slave fails to start with weird error information

2014-06-24 Thread Peng Cheng
I haven't set up passwordless login from the slave to the master node yet (I was under the impression that this is not necessary, since they communicate over port 7077).

Re: ElasticSearch enrich

2014-06-24 Thread Peng Cheng
Make sure all queries are called through class methods, and wrap your query info in a class that has only simple properties (strings, collections, etc.). If you can't find such a wrapper, you can also use the SerializableWritable wrapper out of the box, but it's not recommended (it's a developer API and makes fat
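A sketch of the wrapper pattern described above: only simple, serializable fields are shipped with the closure, and the heavyweight client is rebuilt lazily on each executor. EsClient is a stub standing in for whatever your ES library actually provides:

```scala
// Stub for a real Elasticsearch client class.
class EsClient(host: String, port: Int) {
  def get(index: String, id: String): String = s"doc $id from $host:$port/$index"
}

// Wrapper holding only simple properties; the client is @transient so it is
// never serialized, and lazy so each executor creates its own on first use.
class EsQueryWrapper(val host: String, val port: Int, val index: String)
  extends Serializable {
  @transient lazy val client = new EsClient(host, port)
  def enrich(id: String): String = client.get(index, id)
}
```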

Re: How to Reload Spark Configuration Files

2014-06-24 Thread Peng Cheng
I've read somewhere that in 1.0 there is a bash tool called 'spark-config.sh' that allows you to propagate your config files to a number of master and slave nodes. However, I haven't used it myself.

Re: Upgrading to Spark 1.0.0 causes NoSuchMethodError

2014-06-24 Thread Peng Cheng
I got a 'NoSuchFieldError', which is of the same type. It's definitely a dependency jar conflict: the Spark driver loads its own jars, which in recent versions pull in many dependencies that are 1-2 years old. And if your newer-version dependency is in the same package, it will be shadowed (Java's first-come

Re: ElasticSearch enrich

2014-06-24 Thread Peng Cheng
I'm afraid persisting a connection across two tasks is a dangerous act, as they can't be guaranteed to be executed on the same machine. Your ES server may think it's a man-in-the-middle attack! I think it's possible to invoke a static method that gives you a connection from a local 'pool', so nothing
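A sketch of that local-pool idea: a Scala object is initialized once per JVM, so each executor builds its own connection on first use and nothing is ever serialized between machines. HttpClient is a stub, and the host and port are placeholders:

```scala
// Stub for a real ES/HTTP client class.
class HttpClient(host: String, port: Int) {
  def query(q: String): String = s"result of '$q' via $host:$port"
}

// One lazily created client per executor JVM; tasks call
// LocalConnectionPool.client.query(...) and never ship a live connection.
object LocalConnectionPool {
  lazy val client = new HttpClient("es-host.internal", 9200)
}
```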

Does the PUBLIC_DNS environment parameter really work?

2014-06-24 Thread Peng Cheng
I'm deploying a cluster to Amazon EC2, trying to override its internal IP addresses with the public DNS. I start a cluster with the environment parameter SPARK_PUBLIC_DNS=[my EC2 public DNS], but it doesn't change anything on the web UI; it still shows the internal IP address: Spark Master at

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-22 Thread Peng Cheng
Right, problem solved, in a most disgraceful manner: just add a package relocation in the maven-shade config. The downside is that it is not compatible with my IDE (IntelliJ IDEA), and will cause: Error:scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found.:
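A sketch of such a relocation in the maven-shade-plugin configuration; the relocated package here is illustrative, chosen to match the httpcore conflict diagnosed elsewhere in this thread:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrite the conflicting package into a private namespace so the
           application's copy cannot collide with Spark's older copy. -->
      <relocation>
        <pattern>org.apache.http</pattern>
        <shadedPattern>myproject.shaded.org.apache.http</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```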

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
Thanks a lot! Let me check my maven shade plugin config and see if there is a fix.

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
Indeed, I see a lot of duplicate-package warnings in the maven-shade assembly output, so I tried to eliminate them. First I set the scope of the dependency on apache-spark to 'provided', as suggested on this page: http://spark.apache.org/docs/latest/submitting-applications.html But the Spark master
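For reference, that scope change would look roughly like this in the POM; the artifact name and version are assumptions from the Spark 1.0 / Scala 2.10 era, not stated in the message:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.0.0</version>
  <!-- 'provided': compile against Spark but leave it out of the fat jar,
       since the cluster supplies its own copy at runtime. -->
  <scope>provided</scope>
</dependency>
```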

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
Latest advancement: I found the cause of the NoClassDefFoundError: I wasn't using spark-submit; instead I tried to run the Spark application directly, with SparkConf set in the code (this is handy for local debugging). However, the old problem remains: even my maven-shade plugin doesn't give any warning
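A sketch of that run-directly pattern with placeholder values; setJars does manually what spark-submit would otherwise do for you when targeting a real cluster:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("direct-run")
  .setMaster("local[2]") // handy for local debugging, as noted above
  .setJars(Seq("/path/to/my-app-assembly.jar")) // required when pointing at a real cluster
val sc = new SparkContext(conf)
```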

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
I also found that any buggy application submitted with --deploy-mode cluster will crash the worker (turning its status to 'DEAD'). This shouldn't really happen; otherwise nobody would use this mode. It is not yet clear whether all workers crash or only the one running the driver (as I only

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
Hi Sean, OK, I'm about 90% sure about the cause of this problem. Just another classic dependency conflict:
Myproject - Selenium - apache.httpcomponents:httpcore 4.3.1 (has ContentType)
Spark - Spark SQL Hive - Hive - Thrift - apache.httpcomponents:httpcore 4.1.3 (has no ContentType)
Though I

Re: What is the best way to handle transformations or actions that take forever?

2014-06-17 Thread Peng Cheng
I've tried enabling speculative execution; this seems to have partially solved the problem. However, I'm not sure it can handle large-scale situations, as it only starts once 75% of the job is finished.
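The 75% threshold mentioned above corresponds to spark.speculation.quantile, whose default is 0.75. A sketch of the standard speculation keys (the lowered quantile is illustrative):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.speculation", "true")
  .set("spark.speculation.quantile", "0.5")   // start speculating at 50% done instead of the 0.75 default
  .set("spark.speculation.multiplier", "1.5") // how much slower than the median counts as slow
```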

What is the best way to handle transformations or actions that take forever?

2014-06-16 Thread Peng Cheng
My transformations and actions have some external-toolset dependencies, and sometimes they just get stuck somewhere with no way for me to fix them. If I don't want the job to run forever, do I need to implement several monitor threads that throw an exception when they get stuck, or can the framework
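One way to build such a watchdog inside the task itself, sketched with a timed Future; callExternalTool is a placeholder, and note the stuck thread itself is not killed, only abandoned:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Runs body on another thread and throws a TimeoutException if it does not
// finish within the limit, letting the task fail fast instead of hanging.
def withTimeout[T](limit: Duration)(body: => T): T =
  Await.result(Future(body), limit)

// usage inside a transformation:
// rdd.map(x => withTimeout(60.seconds) { callExternalTool(x) })
```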

Re: spark1.0 spark sql saveAsParquetFile Error

2014-06-09 Thread Peng Cheng
I wasn't using Spark SQL before, but by default Spark should retry a failed task 4 times. I'm curious why it aborted after 1 failure.

Re: How to enable fault-tolerance?

2014-06-09 Thread Peng Cheng
I speculate that Spark will only retry on exceptions that are registered with the TaskSetScheduler, so a task that is certain to fail will fail quickly without consuming more resources. However, I haven't found any documentation or web page on this.

Re: Occasional failed tasks

2014-06-09 Thread Peng Cheng
I think these failed tasks must have been retried automatically if you can't see any error in your results; otherwise the entire application would throw a SparkException and abort. Unfortunately I don't know how to achieve this; my application always aborts.

Re: How to enable fault-tolerance?

2014-06-09 Thread Peng Cheng
parameters scattered in two different places (the master URL and spark.task.maxFailures). I'm thinking of adding a new config parameter, spark.task.maxLocalFailures, to override the first one; what do you think? Thanks again, buddy. Yours, Peng On Mon 09 Jun 2014 01:33:45 PM EDT, Aaron Davidson wrote: Looks like
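The two existing knobs referenced above, sketched; the second parameter of the local master URL is the 'hidden mode' discussed in this thread, and the values are illustrative:

```scala
import org.apache.spark.SparkConf

val localConf   = new SparkConf().setMaster("local[4, 3]")           // 4 threads, up to 3 failures per task
val clusterConf = new SparkConf().set("spark.task.maxFailures", "4") // cluster-side retry count (default 4)
```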

Re: How to enable fault-tolerance?

2014-06-09 Thread Peng Cheng
Oh, and to make things worse, they forgot '\*' in their regex. Am I the first to encounter this problem? On Mon 09 Jun 2014 02:24:43 PM EDT, Peng Cheng wrote: Thanks a lot! That's very responsive; somebody has definitely encountered the same problem before and added two hidden modes

Re: How to enable fault-tolerance?

2014-06-09 Thread Peng Cheng
Hi Matei, Yeah, you are right that this is very niche (my use case is a web crawler), but I'm glad you also like an additional property. Let me open a JIRA. Yours, Peng On Mon 09 Jun 2014 03:08:29 PM EDT, Matei Zaharia wrote: If this is a useful feature for local mode, we should open a JIRA
