Apache Spark Machine Learning Unleashed Book Review author: Jillur Quddus

2020-05-30 Thread patrice molinchaeux
@ Jillur Qudus aka Scammer I know you are hiding on this mailing list, or at least your friends are. @Sean Owen Book/Theatre critic is a profession. When I first saw the following code on the introductory page https://spark.apache.org/examples.htm def inside(p): x, y = random.random(),

[bug] Scala reflection "assertion failed: class Byte" in Dataset.toJSON

2020-05-30 Thread Brandon Vincent
Hi all, I have a job that executes a query and collects the results as JSON using Dataset.toJSON. For the most part it is stable, but it sometimes fails at random with a Scala assertion error. Here is the stack trace: org.apache.spark.sql.Dataset.toJSON

Re: Dataframe to nested json document

2020-05-30 Thread neeraj bhadani
Hi, Apologies for the missing link in my previous mail. You can follow the link below to save your DataFrame as a JSON file. https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter.json Regards, Neeraj

Unsubscribe

2020-05-30 Thread Sunil Prabhakara

Re: Dataframe to nested json document

2020-05-30 Thread neeraj bhadani
Hi, You can follow this to save your DataFrame as a JSON file. Regards, Neeraj
On Sat, May 30, 2020 at 12:44 PM zakaria benzidalmal wrote:
> Hi
> Just save it as json
> On Sat, May 30, 2020 at 13:15, Chidananda Unchi wrote:
>> Hi All,
>> I want to convert dataframe to JSON

Re: Spark dataframe hdfs vs s3

2020-05-30 Thread Anwar AliKhan
Optimisation of Spark applications Apache Spark is an in-memory data processing tool widely used in companies to deal with Big Data issues. Running a Spark application in production requires user-defined resources. This article presents several Spark

Re: [pyspark 2.3+] Dedupe records

2020-05-30 Thread Anwar AliKhan
What does it mean that DataFrames are RDDs under the covers? What does deduplication mean? Please send your biodata and a history of your past commercial projects. The Wali Ahad agreed to release 300 million USD for a new machine learning research project to centralize government facilities to find a better way to

Re: Dataframe to nested json document

2020-05-30 Thread zakaria benzidalmal
Hi, Just save it as JSON.
On Sat, May 30, 2020 at 13:15, Chidananda Unchi wrote:
> Hi All,
> I want to convert a DataFrame to a JSON document using Spark Scala.
> Can someone share sample code or any suggestions?
> Regards,
> Chidananda

Dataframe to nested json document

2020-05-30 Thread Chidananda Unchi
Hi All, I want to convert a DataFrame to a JSON document using Spark Scala. Can someone share sample code or any suggestions? Regards, Chidananda
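To make the target shape of the question concrete, here is a minimal sketch in plain Python (not Spark Scala, and not taken from the thread) of turning flat rows into one nested JSON document per key. The column names `customer_id`, `item`, and `qty` and the `orders` field are hypothetical. In Spark itself, the same shape is usually built by grouping and collecting structs before writing JSON, but that is an assumption about the poster's data, not something the thread confirms.

```python
import json
from collections import defaultdict


def to_nested_documents(rows, key="customer_id", child_field="orders"):
    """Group flat rows by `key` and emit one nested JSON string per group.

    `key` and `child_field` are hypothetical names chosen for illustration.
    """
    grouped = defaultdict(list)
    for row in rows:
        # Everything except the grouping key becomes part of a child record.
        child = {name: value for name, value in row.items() if name != key}
        grouped[row[key]].append(child)
    return [
        json.dumps({key: group_key, child_field: children}, sort_keys=True)
        for group_key, children in grouped.items()
    ]


flat_rows = [
    {"customer_id": 1, "item": "pen", "qty": 2},
    {"customer_id": 1, "item": "ink", "qty": 1},
    {"customer_id": 2, "item": "cap", "qty": 5},
]
for document in to_nested_documents(flat_rows):
    print(document)
```

Each printed line is a self-contained JSON document, which matches the one-document-per-line layout that Spark's JSON writer produces.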

Re: [pyspark 2.3+] Dedupe records

2020-05-30 Thread Molotch
The performant way would be to partition your dataset into reasonably small chunks and use a bloom filter to see if the entity might be in your set before you make a lookup. Check the bloom filter, if the entity might be in the set, rely on partition pruning to read and backfill the relevant
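The check-then-lookup idea described in this message can be sketched in plain Python, with no Spark dependency. This is an illustration under stated assumptions, not the poster's implementation: the `BloomFilter` class, its sizing defaults, and the `dedupe` helper are all hypothetical, and an in-memory set stands in for the expensive partition read.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def dedupe(incoming_ids, existing_ids):
    """Return the incoming ids that are not already present, in arrival order."""
    seen = BloomFilter()
    for record_id in existing_ids:
        seen.add(record_id)
    existing = set(existing_ids)  # stand-in for the costly exact lookup
    fresh = []
    for record_id in incoming_ids:
        # Only pay for the exact lookup when the filter says "maybe present".
        if seen.might_contain(record_id) and record_id in existing:
            continue
        fresh.append(record_id)
        seen.add(record_id)
        existing.add(record_id)
    return fresh


print(dedupe(["a", "b", "c", "b"], ["a"]))
```

The point of the design is that a negative answer from the filter is always reliable, so most new records skip the exact lookup entirely; only "maybe present" answers need confirmation against the stored data.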

Re: Spark dataframe hdfs vs s3

2020-05-30 Thread Dark Crusader
Thanks all for the replies. I am switching to HDFS since it seems like an easier solution. To answer some of your questions, my HDFS storage is on the same nodes I use for Spark computation. From what I understand, this helps because of the data locality advantage. Which means that there is

Re: Using Spark Accumulators with Structured Streaming

2020-05-30 Thread Srinivas V
It’s in the constructor.
On Sat, May 30, 2020 at 4:15 AM Something Something <mailinglist...@gmail.com> wrote:
> I mean... I don't see any reference to 'accumulator' in your Class *definition*. How can you access it in the class if it's not in your definition of class:
>
> public class