Re: unsubscribe

2020-06-27 Thread Wesley Peng
Please send an empty email to: user-unsubscr...@spark.apache.org to unsubscribe yourself from the list.

Spark 3.0 almost 1000 times slower to read json than Spark 2.4

2020-06-27 Thread Sanjeev Mishra
I have a large number of JSON files that Spark 2.4 can read in 36 seconds, but Spark 3.0 takes almost 33 minutes to read the same data. On closer analysis, it looks like Spark 3.0 is choosing a different DAG than Spark 2.4. Does anyone have any idea what is going on? Is there any configuration problem with Spark
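An illustrative sketch, not part of the original message: one known cost in Spark 3.0.0's JSON reader is schema inference, which by default also attempts timestamp inference. The path below is hypothetical, and the inferTimestamp option is an assumption to verify against your version's docs:

    // Scala, spark-shell. Compare an inference-heavy read with one that
    // skips timestamp inference.
    val path = "/data/events/*.json"  // hypothetical path

    // Default read: infers the schema with a pass over the data,
    // including timestamp-pattern checks in 3.0.0.
    val inferred = spark.read.json(path)

    // Same read with timestamp inference disabled (option assumed to
    // exist in your Spark 3.x build).
    val noTsInference = spark.read.option("inferTimestamp", "false").json(path)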

Re: unsubscribe

2020-06-27 Thread Jeff Evans
That is not how you unsubscribe. See here for instructions: https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e On Sat, Jun 27, 2020, 6:08 PM Sri Kris wrote: > Sent from Mail for Windows 10

unsubscribe

2020-06-27 Thread Sri Kris
Sent from Mail for Windows 10

Spark 3.0.0 spark.read.json never completes

2020-06-27 Thread Sanjeev Mishra
Hi all, I have a huge number of JSON files that Spark 2.4 can easily finish reading, but Spark 3.0.0 never completes. I am running both Spark 2 and Spark 3 on Mac
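A hedged aside, not from the thread: supplying an explicit schema skips inference entirely, which sidesteps the slowdown whatever its cause. The field names below are hypothetical:

    // Scala. An explicit schema avoids the inference pass over the files.
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("id", LongType),       // hypothetical fields
      StructField("ts", StringType),     // read timestamps as strings, cast later
      StructField("payload", StringType)
    ))

    val df = spark.read.schema(schema).json("/data/events/*.json")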

Re: When is a Bigint a long and when is a long a long

2020-06-27 Thread Anwar AliKhan
OK, thanks. On Sat, 27 Jun 2020, 17:36 Sean Owen wrote: > It does not return a DataFrame. It returns Dataset[Long]. > You do not need to collect(). See my email. > > On Sat, Jun 27, 2020, 11:33 AM Anwar AliKhan > wrote: > >> So the range function actually returns BigInt (Spark SQL type) >> and

Re: When is a Bigint a long and when is a long a long

2020-06-27 Thread Sean Owen
It does not return a DataFrame. It returns Dataset[Long]. You do not need to collect(). See my email. On Sat, Jun 27, 2020, 11:33 AM Anwar AliKhan wrote: > So the range function actually returns BigInt (Spark SQL type) > and the fact Dataset[Long] and printSchema are displaying (toString()) >

Re: When is a Bigint a long and when is a long a long

2020-06-27 Thread Anwar AliKhan
So the range function actually returns BigInt (the Spark SQL type), and the fact that Dataset[Long] and printSchema are displaying (toString()) Long instead of BigInt needs looking into. Putting that to one side, my issue with using collect() to get around the casting of elements returned by range is, I

Re: When is a Bigint a long and when is a long a long

2020-06-27 Thread Sean Owen
There are several confusing things going on here. I think this is part of the explanation, not 100% sure: 'bigint' is the Spark SQL type of an 8-byte long. 'long' is the type of a JVM primitive. Both are the same, conceptually, but represented differently internally as they are logically somewhat
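A small demonstration of the naming Sean describes (assuming a Spark 3.x spark-shell, where typeof() is available as a SQL function):

    // spark.range yields a Dataset[java.lang.Long]; its column's Catalyst
    // type is LongType, which printSchema shows as "long" and SQL names "bigint".
    val ds = spark.range(3)
    ds.printSchema()
    // root
    //  |-- id: long (nullable = false)

    spark.sql("SELECT typeof(id) FROM range(1)").show()
    // +----------+
    // |typeof(id)|
    // +----------+
    // |    bigint|
    // +----------+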

Distributed Anomaly Detection using MIDAS

2020-06-27 Thread Shivin Srivastava
Hi All, I have recently been exploring MIDAS, an algorithm for streaming anomaly detection. A production-level parallel and distributed implementation of MIDAS should be quite useful to industry. I feel that Spark is well suited for this, as MIDAS deals with streaming data. If anyone

When is a Bigint a long and when is a long a long

2020-06-27 Thread Anwar AliKhan
As you know, I have been puzzling over this issue: how come spark.range(100).reduce(_+_) worked in earlier Spark versions but not with the most recent versions? Well, when you first create a dataset, by default the column "id" datatype is [BigInt]. It is a bit like a coin: Long on one
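A hedged sketch of possible workarounds, assuming the failure is the Scala 2.12 overload ambiguity on Dataset[java.lang.Long].reduce (the snippet above does not show the actual error, so this is an assumption):

    // Scala, spark-shell.
    import org.apache.spark.sql.functions.sum
    import spark.implicits._

    // Pin the element type to scala.Long so reduce(_ + _) matches the
    // Scala-function overload unambiguously.
    spark.range(100).as[Long].reduce(_ + _)

    // Or aggregate on the bigint column and avoid reduce altogether.
    spark.range(100).agg(sum("id")).first.getLong(0)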