Re: Kotlin Spark API

2020-07-14 Thread Anwar AliKhan
Is Kotlin another new language? GRADY BOOCH: The United States Department of Defense (DoD) is perhaps the largest user of computers in the world. By the mid-1970s, software development for its systems had reached crisis proportions: projects were often late, over budget, and they often failed to

Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Anwar AliKhan
, examples etc. so it is a smooth transition from that courses. On Tue, 14 Jul 2020, 15:52 Sean Owen, wrote: > It is still copyrighted material, no matter its state of editing. Yes, > you should not be sharing this on the internet. > > On Tue, Jul 14, 2020 at 9:46 AM Anwar AliK

Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Anwar AliKhan
ook is not freely available. > > I own it and it's wonderful, Mr. Géron deserves to benefit from it. > > On Mon, Jul 13, 2020 at 9:59 PM Anwar AliKhan > wrote: > >> link to a free book which may be useful. >> >> Hands-On Machine Learning with Scikit-Learn, K

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Anwar AliKhan
link to a free book which may be useful: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron https://bit.ly/2zxueGt On 13 Jul 2020, 15:18 Sean Owen, wrote: > There is a multilayer perceptron

Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Anwar AliKhan
This is very useful for me, leading on from week 4 of the Andrew Ng course. On Mon, 13 Jul 2020, 15:18 Sean Owen, wrote: > There is a multilayer perceptron implementation in Spark ML, but > that's not what you're looking for. > To parallelize model training developed using standard libraries
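
The multilayer perceptron mentioned above lives in Spark ML. A hedged sketch (requires a Spark installation; the data path points at the sample file bundled with the Spark source tree, and layer sizes match that file's 4 features and 3 classes):

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import MultilayerPerceptronClassifier

spark = SparkSession.builder.appName("mlp-sketch").master("local[*]").getOrCreate()

# Sample multiclass data shipped with Spark (path assumes you run from the Spark source root).
data = spark.read.format("libsvm").load("data/mllib/sample_multiclass_classification_data.txt")

# layers: 4 input features, two hidden layers of 5 and 4 units, 3 output classes
mlp = MultilayerPerceptronClassifier(layers=[4, 5, 4, 3], maxIter=100, seed=42)
model = mlp.fit(data)
model.transform(data).select("prediction").show(5)
spark.stop()
```

As the reply notes, this is Spark's own distributed model, not a way to parallelize a CNN built with an external library.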

Re: Blog : Apache Spark Window Functions

2020-07-13 Thread Anwar AliKhan
on Apache Spark. You can use Apache Spark on a standalone machine while you prototype, then with one line of code change the parallelism to distributed parallelism across a cluster of PCs. On Fri, 10 Jul 2020, 04:50 Anwar AliKhan, wrote: > My opinion would be go here. > > https://w
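
The "one line of code" referred to is the master URL. A minimal sketch (the cluster host/port is hypothetical):

```python
from pyspark.sql import SparkSession

# Prototype on a single machine:
spark = SparkSession.builder.appName("myapp").master("local[*]").getOrCreate()

# The same application code runs on a standalone cluster by changing only the master URL:
# spark = SparkSession.builder.appName("myapp").master("spark://master-host:7077").getOrCreate()
```

In practice the master is often left out of the code entirely and supplied via `spark-submit --master`, which keeps the application portable between environments.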

Re: Blog : Apache Spark Window Functions

2020-07-09 Thread Anwar AliKhan
My opinion would be to go here: https://www.coursera.org/courses?query=machine%20learning%20andrew%20ng Machine Learning by Andrew Ng. After three weeks you will have more valuable skills than most engineers in Silicon Valley in the USA. I am past week 3. He does go 90 miles per hour. I wish

Re: When is a Bigint a long and when is a long a long

2020-06-28 Thread Anwar AliKhan
.reduce(_+_) >>> >>> If you collect(), you still have an Array[java.lang.Long]. But Scala >>> implicits and conversions make .reduce(_+_) work fine on that; there >>> is no "Java-friendly" overload in the way. >>> >>> Normally all of

Re: When is a Bigint a long and when is a long a long

2020-06-27 Thread Anwar AliKhan
OK Thanks On Sat, 27 Jun 2020, 17:36 Sean Owen, wrote: > It does not return a DataFrame. It returns Dataset[Long]. > You do not need to collect(). See my email. > > On Sat, Jun 27, 2020, 11:33 AM Anwar AliKhan > wrote: > >> So the range function actually return

Re: When is a Bigint a long and when is a long a long

2020-06-27 Thread Anwar AliKhan
quot;Java-friendly" overload in the way. > > Normally all of this just works and you can ignore these differences. > This is a good example of a corner case in which it's inconvenient, > because of the old Java-friendly overloads. This is by design though. > > On Sat, Jun 27, 2020 at 8:29 AM

When is a Bigint a long and when is a long a long

2020-06-27 Thread Anwar AliKhan
As you know, I have been puzzling over this issue: how come spark.range(100).reduce(_+_) worked in earlier Spark versions but not with the most recent versions? Well, when you first create a dataset, by default the column "id" datatype is [BigInt]. It is a bit like a coin: Long on one
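
The replies in this thread pin the cause down: spark.range returns Dataset[java.lang.Long], and with recent Scala/Spark versions the lambda `_+_` (which returns scala.Long) no longer matches either reduce overload. A hedged Scala sketch of the failure and two common workarounds (assumes a spark-shell or a SparkSession as below):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder.appName("range-reduce").master("local[*]").getOrCreate()
import spark.implicits._

val ds = spark.range(100)   // Dataset[java.lang.Long]; the "id" column prints as bigint
// ds.reduce(_ + _)         // fails: the lambda returns scala.Long, matching neither the
//                          // (java.lang.Long, java.lang.Long) => java.lang.Long overload
//                          // nor the Java-friendly ReduceFunction[java.lang.Long] overload

val total = ds.map(_.toLong).reduce(_ + _)          // convert to scala.Long first -> 4950
val viaAgg = ds.agg(sum("id")).first().getLong(0)   // or stay in the DataFrame API -> 4950
```

The DataFrame-level aggregation also avoids pulling every element through a JVM lambda, so it is usually the better choice for a plain sum.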

Re: Where are all the jars gone ?

2020-06-25 Thread Anwar AliKhan
g) => java.lang.Long)java.lang.Long cannot be applied to ((java.lang.Long, java.lang.Long) => scala.Long) spark.range(1,101).reduce(_+_) <http://www.backbutton.co.uk/> On Wed, 24 Jun 2020, 19:54 Anwar AliKhan, wrote: > > I am using the method describe on this p

Suggested Amendment to ./dev/make-distribution.sh

2020-06-25 Thread Anwar AliKhan
 expectation, especially if a project has been going for 10 years. A message to say "these packages are needed but not installed; please wait while packages are being installed" would be helpful to the user experience. On Wed, 24 Jun 2020, 16:21 Anwar AliKhan, wrote: > THA

Re: Where are all the jars gone ?

2020-06-24 Thread Anwar AliKhan
ions, which is a > different > beast entirely > <https://maven.apache.org/guides/getting-started/index.html#What_is_a_SNAPSHOT_version> > . > > On Wed, Jun 24, 2020 at 10:39 AM Anwar AliKhan > wrote: > >> THANKS >> >> >> It appears the direc

Re: Where are all the jars gone ?

2020-06-24 Thread Anwar AliKhan
sually in > the .m2 directory. > > Hope this helps. > > -ND > On 6/23/20 3:21 PM, Anwar AliKhan wrote: > > Hi, > > I prefer to do most of my projects in Python and for that I use Jupyter. > I have been downloading the compiled version of spark. > > I do not

Re: Error: Vignette re-building failed. Execution halted

2020-06-24 Thread Anwar AliKhan
ks like you haven't installed the 'e1071' package. > > On Wed, 24 Jun 2020 at 18:49, Anwar AliKhan wrote: > >> ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr >> -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes >> <http://www.backbutton.co.uk/>

Error: Vignette re-building failed. Execution halted

2020-06-24 Thread Anwar AliKhan
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes Minor error: the SparkR test failed. I don't use R, so it doesn't affect me. ***installing help indices ** building package indices **
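
If R support is not needed, one hedged workaround is simply to leave it out of the build rather than chase the vignette failure:

```shell
# Drop the --r flag and -Psparkr profile to skip SparkR (and its vignette build) entirely:
./dev/make-distribution.sh --name custom-spark --pip --tgz \
  -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes

# Or, to keep SparkR, install the R package the vignette build complained about first:
# Rscript -e 'install.packages("e1071", repos = "https://cloud.r-project.org")'
```

Skipping the profile also shortens the build noticeably, since the R tests and packaging steps are not run.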

Found jars in /assembly/target/scala-2.12/jars

2020-06-23 Thread Anwar AliKhan

Where are all the jars gone ?

2020-06-23 Thread Anwar AliKhan
Hi, I prefer to do most of my projects in Python and for that I use Jupyter. I have been downloading the compiled version of Spark. I do not normally like the source code version because the build process makes me nervous, you know, with lines of stuff scrolling up the screen. What am I

Re: Hey good looking toPandas () error stack

2020-06-21 Thread Anwar AliKhan
le major > version 55' > > I see posts about the Java version being used. Are you sure your configs > are right? > > https://stackoverflow.com/questions/53583199/pyspark-error-unsupported-class-file-major-version > > On Sat, Jun 20, 2020 at 6:17 AM Anwar AliKhan > wrote:

Re: Hey good looking toPandas () error stack

2020-06-20 Thread Anwar AliKhan
ntException(s.split(': ', 1)[1], stackTrace) 80 raise 81 return deco IllegalArgumentException: 'Unsupported class file major version 55' On Fri, 19 Jun 2020, 08:06 Stephen Boesch, wrote: > afaik It has been there since Spark 2.0 in 2015. Not certain about >
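
The "Unsupported class file major version 55" message encodes a Java release directly: the class-file major version is the Java release plus 44, so 55 means Java 11 (and 52 means Java 8). A stdlib-only sketch of the mapping (the helper name is mine):

```python
def class_file_major_to_java_release(major: int) -> int:
    """Map a JVM class-file major version to its Java release (valid for Java 5+)."""
    return major - 44

print(class_file_major_to_java_release(55))  # 11
print(class_file_major_to_java_release(52))  # 8
```

Here it usually means a Spark 2.x build is running on a Java 11 JVM; Spark 2.x supports only Java 8, so the usual fix is pointing JAVA_HOME at a Java 8 JDK before launching pyspark.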

Re: Hey good looking toPandas ()

2020-06-19 Thread Anwar AliKhan
; On Thu, 18 Jun 2020 at 23:56, Anwar AliKhan > wrote: > >> I first ran the command >> df.show() >> >> For a sanity check of my dataFrame. >> >> I wasn't impressed with the display. >> >> I then ran >> df.toPandas() in Jupyter Notebook.

Hey good looking toPandas ()

2020-06-19 Thread Anwar AliKhan
I first ran the command df.show() for a sanity check of my DataFrame. I wasn't impressed with the display. I then ran df.toPandas() in Jupyter Notebook. Now the display is really good looking. Is toPandas() a new function which became available in Spark 3.0?
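
As the reply in this thread notes, toPandas() long predates Spark 3.0. One caveat worth showing (hedged sketch; `df` is an existing pyspark DataFrame): it collects the entire distributed DataFrame onto the driver, so it only suits results small enough to fit in driver memory.

```python
# Cap the rows before collecting to the driver for display purposes:
pdf = df.limit(1000).toPandas()
pdf.head()
```

The nice rendering comes from Jupyter's HTML display of pandas DataFrames, not from Spark itself.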

Add python library

2020-06-06 Thread Anwar AliKhan
" > Have you looked into this article? https://medium.com/@SSKahani/pyspark-applications-dependencies-99415e0df987 " This is weird! I was hanging out here: https://machinelearningmastery.com/start-here/ when I came across this post. The weird part is I was just wondering how I can take one
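
Beyond the linked article, a common way to ship pure-Python dependencies with a job is the `--py-files` flag (hedged sketch; package and script names are hypothetical):

```shell
# Bundle the local package, then hand the archive to spark-submit so every
# executor gets it on its Python path:
zip -r deps.zip mypkg/
spark-submit --py-files deps.zip my_app.py
```

For dependencies with native extensions (numpy, scikit-learn, etc.) this is not enough; those typically need to be installed on every worker or shipped as a packed conda/virtualenv environment.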

Re: Spark dataframe hdfs vs s3

2020-05-30 Thread Anwar AliKhan
Optimisation of Spark applications: Apache Spark is an in-memory data processing tool widely used in companies to deal with Big Data issues. Running a Spark application in production requires user-defined resources. This article presents several Spark

Re: [pyspark 2.3+] Dedupe records

2020-05-30 Thread Anwar AliKhan
What does it mean that DataFrames are RDDs under the cover? What does deduplication mean? Please send your biodata, history, and past commercial projects. The Wali Ahad agreed to release 300 million USD for a new machine learning research project to centralize government facilities to find a better way to
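
For the thread's actual question, two common pyspark 2.3+ dedupe patterns look like this (hedged sketch; `df` and the column names are hypothetical):

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Keep any one row per key:
deduped = df.dropDuplicates(["account_id"])

# Keep the most recent row per key, using a window ordered by timestamp:
w = Window.partitionBy("account_id").orderBy(F.col("updated_at").desc())
latest = (df.withColumn("rn", F.row_number().over(w))
            .filter("rn = 1")
            .drop("rn"))
```

dropDuplicates is cheaper when any surviving row will do; the window version is needed when a specific row (e.g. the latest) must win.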

Re: Spark Security

2020-05-29 Thread Anwar AliKhan
What is the size of your .tsv file, sir? What is the size of your local hard drive, sir? Regards, Wali Ahaad On Fri, 29 May 2020, 16:21, wrote: > Hello, > > I plan to load in a local .tsv file from my hard drive using sparklyr (an > R package). I have figured out how to do this