Re: [PSA] Python 2, 3.4 and 3.5 are now dropped

2020-07-13 Thread Hyukjin Kwon
Cc'ing the user mailing list too. On Tue, Jul 14, 2020 at 11:27 AM, Hyukjin Kwon wrote: > I am sending another email to make sure dev people know. Python 2, 3.4 and > 3.5 are now dropped at https://github.com/apache/spark/pull/28957.

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Anwar AliKhan
A link to a free book which may be useful: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron https://bit.ly/2zxueGt On 13 Jul 2020, 15:18, Sean Owen wrote: > There is a multilayer perceptron

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Anwar AliKhan
This is very useful for me, leading on from week 4 of the Andrew Ng course. On Mon, 13 Jul 2020, 15:18, Sean Owen wrote: > There is a multilayer perceptron implementation in Spark ML, but > that's not what you're looking for. > To parallelize model training developed using standard libraries

Re: scala RDD[MyCaseClass] to Dataset[MyCaseClass] performance

2020-07-13 Thread Ivan Petrov
What do you mean "without conversion"?
def flatten(rdd: RDD[NestedStructure]): Dataset[MyCaseClass] = {
  rdd.flatMap { nestedElement => flatten(nestedElement) /** List[MyCaseClass] */ }
    .toDS()
}
Can it be better? On Tue, Jul 14, 2020 at 01:13, Sean Owen wrote: > Wouldn't toDS() do this

Re: scala RDD[MyCaseClass] to Dataset[MyCaseClass] performance

2020-07-13 Thread Sean Owen
Wouldn't toDS() do this without conversion? On Mon, Jul 13, 2020 at 5:25 PM Ivan Petrov wrote: > > Hi! > I'm trying to understand the cost of RDD to Dataset conversion > It takes me 60 minutes to create RDD [MyCaseClass] with 500.000.000.000 > records > It takes around 15 minutes to convert

scala RDD[MyCaseClass] to Dataset[MyCaseClass] performance

2020-07-13 Thread Ivan Petrov
Hi! I'm trying to understand the cost of RDD to Dataset conversion. It takes me 60 minutes to create an RDD[MyCaseClass] with 500.000.000.000 records. It takes around 15 minutes to convert them to Dataset[MyCaseClass]. The schema of MyCaseClass is str01: String, str02: String, str03: String, str04:

Using Spark UI with Running Spark on Hadoop Yarn

2020-07-13 Thread ArtemisDev
Is there any way to make the Spark process visible via the Spark UI when running Spark 3.0 on a Hadoop YARN cluster? The Spark documentation talked about replacing the Spark UI with the Spark history server, but didn't give many details. Therefore I would assume it is still possible to use Spark UI

org.apache.spark.deploy.yarn.ExecutorLauncher not found when running Spark 3.0 on Hadoop

2020-07-13 Thread ArtemisDev
I've been trying to set up the latest stable version of Spark 3.0 on a Hadoop cluster using YARN. When running spark-submit in client mode, I always got an error of org.apache.spark.deploy.yarn.ExecutorLauncher not found. This happened when I preloaded the Spark jar files onto HDFS and

Re: Blog : Apache Spark Window Functions

2020-07-13 Thread Anwar AliKhan
Further to the feedback you requested, I forgot to mention another point: with the insight you will gain after three weeks spent on that course, you will be on par with the aforementioned minority of engineers who are helping their companies "make tons of money", a quote from Professor

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Sean Owen
There is a multilayer perceptron implementation in Spark ML, but that's not what you're looking for. To parallelize model training developed using standard libraries like Keras, use Horovod from Uber. https://horovod.readthedocs.io/en/stable/spark_include.html On Mon, Jul 13, 2020 at 6:59 AM

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Juan Martín Guillén
Hi Mukhtaj, Parallelization on Spark is abstracted on the DataFrame. You can run anything locally on the driver but to make it run in parallel on the cluster you'll need to use the DataFrame abstraction. You may want to check maxpumperla/elephas.

Issue in parallelization of CNN model using spark

2020-07-13 Thread Mukhtaj Khan
Dear Spark Users, I am trying to parallelize a CNN (convolutional neural network) model using Spark. I have developed the model using Python and the Keras library. The model works fine on a single machine, but when we try it on multiple machines, the execution time remains the same as sequential execution. Could