Re: [PSA] Python 2, 3.4 and 3.5 are now dropped
cc'ing the user mailing list too.

On Tue, 14 Jul 2020 at 11:27, Hyukjin Kwon wrote:
> I am sending another email to make sure dev people know. Python 2, 3.4 and
> 3.5 are now dropped at https://github.com/apache/spark/pull/28957.
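Since PySpark on master now refuses the dropped interpreters, a job entry point can fail fast with a version guard. A minimal sketch, assuming a 3.6 floor per the PR above (the `require_python` helper is illustrative, not part of Spark):

```python
import sys

def require_python(minimum=(3, 6)):
    """Raise early if the interpreter predates what PySpark supports."""
    if sys.version_info[:2] < minimum:
        raise RuntimeError(
            "PySpark requires Python {}+; found {}".format(
                ".".join(map(str, minimum)), sys.version.split()[0]
            )
        )

require_python()  # no-op on a supported interpreter
```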
Re: Issue in parallelization of CNN model using spark
A link to a free book which may be useful: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, by Aurélien Géron. https://bit.ly/2zxueGt

On Mon, 13 Jul 2020, 15:18 Sean Owen wrote:
> There is a multilayer perceptron implementation in Spark ML, but that's not
> what you're looking for. To parallelize model training developed using
> standard libraries like Keras, use Horovod from Uber.
> https://horovod.readthedocs.io/en/stable/spark_include.html

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Issue in parallelization of CNN model using spark
This is very useful for me, leading on from week 4 of the Andrew Ng course.

On Mon, 13 Jul 2020, 15:18 Sean Owen wrote:
> To parallelize model training developed using standard libraries like
> Keras, use Horovod from Uber.
> https://horovod.readthedocs.io/en/stable/spark_include.html
Re: scala RDD[MyCaseClass] to Dataset[MyCaseClass] performance
What do you mean by "without conversion"?

  def flatten(rdd: RDD[NestedStructure]): Dataset[MyCaseClass] = {
    rdd.flatMap { nestedElement =>
      flatten(nestedElement) // helper overload returning List[MyCaseClass]
    }.toDS()
  }

Can it be better?

On Tue, 14 Jul 2020 at 01:13, Sean Owen wrote:
> Wouldn't toDS() do this without conversion?
Re: scala RDD[MyCaseClass] to Dataset[MyCaseClass] performance
Wouldn't toDS() do this without conversion?

On Mon, Jul 13, 2020 at 5:25 PM Ivan Petrov wrote:
> I'm trying to understand the cost of RDD to Dataset conversion.
> It takes me 60 minutes to create RDD[MyCaseClass] with 500,000,000,000
> records, and around 15 minutes to convert them to Dataset[MyCaseClass].
scala RDD[MyCaseClass] to Dataset[MyCaseClass] performance
Hi!

I'm trying to understand the cost of RDD to Dataset conversion. It takes me 60 minutes to create RDD[MyCaseClass] with 500,000,000,000 records, and around 15 minutes to convert them to Dataset[MyCaseClass].

The schema of MyCaseClass is:

  str01: String,
  str02: String,
  str03: String,
  str04: String,
  long01: Long,
  long02: Long,
  double01: Double,
  map: Map[String, Double]

What can I do in order to run it faster?
Using Spark UI with Running Spark on Hadoop Yarn
Is there any way to make the Spark process visible via the Spark UI when running Spark 3.0 on a Hadoop YARN cluster? The Spark documentation talks about replacing the Spark UI with the Spark history server, but doesn't give much detail. Therefore I would assume it is still possible to use the Spark UI when running Spark on a Hadoop YARN cluster. Is this correct? Does the Spark history server have the same user functions as the Spark UI?

But how could this be possible (the possibility of using the Spark UI) if the Spark master server isn't active when all the job scheduling and resource allocation tasks are handled by YARN servers?

Thanks!
-- ND
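For context, on YARN the live UI of a running application is normally reached through the YARN ResourceManager web UI (the ApplicationMaster proxy link), so no standalone Spark master is needed; the history server then serves the same UI pages for finished applications once event logging is enabled. A minimal sketch of the settings involved, with a placeholder log directory:

```
# spark-defaults.conf (hdfs:///spark-logs is a placeholder path)
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-logs
spark.history.fs.logDirectory    hdfs:///spark-logs
```

The history server itself is started with $SPARK_HOME/sbin/start-history-server.sh and listens on port 18080 by default.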
org.apache.spark.deploy.yarn.ExecutorLauncher not found when running Spark 3.0 on Hadoop
I've been trying to set up the latest stable version of Spark 3.0 on a Hadoop cluster using YARN. When running spark-submit in client mode, I always got an error that org.apache.spark.deploy.yarn.ExecutorLauncher was not found. This happened when I preloaded the Spark jar files onto HDFS and pointed the spark.yarn.jars property at the HDFS address (i.e. set spark.yarn.jars to hdfs:///spark-3/jars or hdfs://namenode:8020/spark-3/jars). I've checked the /spark-3/jars directory on HDFS and all the jar files are accessible. The exception messages are listed below.

This problem doesn't occur when I comment out the spark.yarn.jars line in the spark-defaults.conf file; spark-submit then finishes without any problems. Any ideas what I have done wrong?

Thanks!
-- ND

==
Exception in thread "main" org.apache.spark.SparkException: Application application_1594664166056_0005 failed 2 times due to AM Container for appattempt_1594664166056_0005_02 exited with exitCode: 1
Failing this attempt. Diagnostics: [2020-07-13 20:07:20.882] Exception from container-launch.
Container id: container_1594664166056_0005_02_01
Exit code: 1
[2020-07-13 20:07:20.886] Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
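One thing worth checking, assuming the configuration described above: spark.yarn.jars takes a list of jar paths and globs are allowed, so pointing it at a bare directory can leave the ApplicationMaster classpath empty, which is consistent with ExecutorLauncher not being found. A sketch of the two common forms (host and directory taken from the message above; the archive name is a placeholder):

```
# spark-defaults.conf
# Either enumerate the jars with a glob...
spark.yarn.jars      hdfs://namenode:8020/spark-3/jars/*

# ...or ship one zip of the jars directory and use spark.yarn.archive instead:
# spark.yarn.archive   hdfs://namenode:8020/spark-3/spark-jars.zip
```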
Re: Blog : Apache Spark Window Functions
Further to the feedback you requested, I forgot to mention another point: with the insight you will gain after three weeks spent on that course, you will be on par with the aforementioned minority of engineers who are helping their companies "make tons of money" (a quote from Professor Andrew Ng). You will no longer be part of the majority of engineers who spend six months on analytical projects when from day one YOU can see that it isn't going to work (another quote from Professor Andrew Ng). If you value the idea of joining the minority of engineers "making tons of money for companies", then the same three weeks spent on that course will yield greater value than time spent on writing Apache Spark examples of the type you are currently engaged in. I have gone past week 3, so I have the insight.

It is against my personal values to use a product which is offered on a trial-period basis, so I use the free Octave, a project started 32 years ago. You can profit by MATLAB's investment: you can watch MATLAB videos on how to use it and apply what you have learnt to Octave, because the syntax is exactly the same. Then you can parallelise your Octave app on Apache Spark. You can use Apache Spark on a standalone machine whilst you prototype, then, with one line of code, change the parallelism to a distributed parallelism across cluster(s) of PCs.

On Fri, 10 Jul 2020, 04:50 Anwar AliKhan wrote:
> My opinion would be: go here.
> https://www.coursera.org/courses?query=machine%20learning%20andrew%20ng
> Machine learning by Andrew Ng.
> After three weeks you will have more valuable skills than most engineers
> in Silicon Valley in the USA. I am past week 3.
> He does go 90 miles per hour.
> I wish somebody had pointed me there as the starting point.

On Thu, 25 Jun 2020, 18:58 neeraj bhadani wrote:
>> Hi Team,
>> I would like to share with the community that my blog on "Apache Spark
>> Window Functions" got published. PFB link if anyone is interested.
>>
>> Link:
>> https://medium.com/expedia-group-tech/deep-dive-into-apache-spark-window-functions-7b4e39ad3c86
>>
>> Please share your thoughts and feedback.
>>
>> Regards,
>> Neeraj
Re: Issue in parallelization of CNN model using spark
There is a multilayer perceptron implementation in Spark ML, but that's not what you're looking for. To parallelize model training developed using standard libraries like Keras, use Horovod from Uber.
https://horovod.readthedocs.io/en/stable/spark_include.html

On Mon, Jul 13, 2020 at 6:59 AM Mukhtaj Khan wrote:
> I am trying to parallelize the CNN (convolutional neural network) model
> using spark. I have developed the model using python and Keras library.
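For a sense of what the Horovod-on-Spark route looks like, here is a rough sketch based on the Estimator API described in the docs linked above; the model definition, store path, column names, and process count are all placeholders, and this is untested scaffolding rather than a working recipe:

```python
import tensorflow as tf
import horovod.spark.keras as hvd
from horovod.spark.common.store import Store
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("horovod-keras").getOrCreate()
train_df = spark.read.parquet("/path/to/train")  # placeholder dataset

# Any Keras model built locally; Horovod handles distributing the training.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

store = Store.create("/tmp/horovod-work")  # shared dir for checkpoints

estimator = hvd.KerasEstimator(
    num_proc=4,                      # number of parallel training processes
    store=store,
    model=model,
    optimizer=tf.keras.optimizers.Adam(),
    loss="mse",
    feature_cols=["features"],
    label_cols=["label"],
    batch_size=32,
    epochs=5,
)

keras_model = estimator.fit(train_df)      # trains in parallel on the cluster
predictions = keras_model.transform(train_df)
```

The key point versus running Keras directly: fit() shards the DataFrame across executors, which is what the single-machine setup in the original question was missing.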
Re: Issue in parallelization of CNN model using spark
Hi Mukhtaj,

Parallelization in Spark is abstracted over the DataFrame. You can run anything locally on the driver, but to make it run in parallel on the cluster you'll need to use the DataFrame abstraction. You may want to check maxpumperla/elephas (Distributed Deep Learning with Keras & Spark).

Regards,
Juan Martín.

On Monday, July 13, 2020 at 08:59:35 ART, Mukhtaj Khan wrote:
> I am trying to parallelize the CNN (convolutional neural network) model
> using spark. I have developed the model using python and Keras library.
Issue in parallelization of CNN model using spark
Dear Spark User,

I am trying to parallelize a CNN (convolutional neural network) model using Spark. I have developed the model using Python and the Keras library. The model works fine on a single machine, but when we try it on multiple machines, the execution time remains the same as the sequential run.

Could you please tell me whether there is any built-in library for parallelizing a CNN in the Spark framework? Moreover, MLlib does not have any support for CNNs.

Best regards,
Mukhtaj