Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Takeshi Yamamuro
Congrats, all! Bests, Takeshi On Fri, Jun 19, 2020 at 1:16 PM Felix Cheung wrote: > Congrats > > -- > *From:* Jungtaek Lim > *Sent:* Thursday, June 18, 2020 8:18:54 PM > *To:* Hyukjin Kwon > *Cc:* Mridul Muralidharan ; Reynold Xin < > r...@databricks.com>; dev ;

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Felix Cheung
Congrats From: Jungtaek Lim Sent: Thursday, June 18, 2020 8:18:54 PM To: Hyukjin Kwon Cc: Mridul Muralidharan ; Reynold Xin ; dev ; user Subject: Re: [ANNOUNCE] Apache Spark 3.0.0 Great, thanks all for your efforts on the huge step forward! On Fri, Jun 19,

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Jungtaek Lim
Great, thanks all for your efforts on the huge step forward! On Fri, Jun 19, 2020 at 12:13 PM Hyukjin Kwon wrote: > Yay! > > On Fri, Jun 19, 2020 at 4:46 AM, Mridul Muralidharan wrote: > >> Great job everyone ! Congratulations :-) >> >> Regards, >> Mridul >> >> On Thu, Jun 18, 2020 at 10:21 AM Reynold

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Hyukjin Kwon
Yay! On Fri, Jun 19, 2020 at 4:46 AM, Mridul Muralidharan wrote: > Great job everyone ! Congratulations :-) > > Regards, > Mridul > > On Thu, Jun 18, 2020 at 10:21 AM Reynold Xin wrote: > >> Hi all, >> >> Apache Spark 3.0.0 is the first release of the 3.x line. It builds on >> many of the innovations

Re: java.lang.ClassNotFoundException for s3a committer

2020-06-18 Thread Stephen Coy
Hi Murat Migdisoglu, Unfortunately you need the secret sauce to resolve this. It is necessary to check out the Apache Spark source code and build it with the right command line options. This is what I have been using: dev/make-distribution.sh --name my-spark --tgz -Pyarn -Phadoop-3.2 -Pyarn
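
For reference, a minimal sketch of the session configuration that usually goes with such a build, assuming the resulting distribution includes the spark-hadoop-cloud module that provides the committer binding classes below:

    import org.apache.spark.sql.SparkSession

    // A sketch, not a definitive recipe: these are the documented settings for
    // the S3A "directory" staging committer, and they only take effect if the
    // spark-hadoop-cloud classes are actually on the classpath.
    val spark = SparkSession.builder()
      .appName("s3a-committer-example") // hypothetical app name
      .config("spark.hadoop.fs.s3a.committer.name", "directory")
      .config("spark.sql.sources.commitProtocolClass",
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
      .config("spark.sql.parquet.output.committer.class",
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
      .getOrCreate()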

Re: java.lang.ClassNotFoundException for s3a committer

2020-06-18 Thread murat migdisoglu
Hi all, I've upgraded my test cluster to Spark 3 and changed my committer to directory, and I still get this error. The documentation is somewhat obscure on that. Do I need to add a third-party jar to support the new committers? java.lang.ClassNotFoundException:

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Mridul Muralidharan
Great job everyone ! Congratulations :-) Regards, Mridul On Thu, Jun 18, 2020 at 10:21 AM Reynold Xin wrote: > Hi all, > > Apache Spark 3.0.0 is the first release of the 3.x line. It builds on many > of the innovations from Spark 2.x, bringing new ideas as well as continuing > long-term

Custom Metrics

2020-06-18 Thread Bryan Jeffrey
Hello. We're using Spark 2.4.4. We have a custom metrics sink consuming the Spark-produced metrics (e.g. heap free, etc.). I am trying to determine a good mechanism to pass the Spark application name into the metrics sink. Currently the application ID is included, but not the application name. Is
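
One possible approach, sketched below under the assumption that renaming the metric keys is acceptable: Spark's spark.metrics.namespace setting, which defaults to the application ID, can be pointed at spark.app.name, so the application name becomes part of every metric key the sink receives.

    import org.apache.spark.sql.SparkSession

    // A hedged sketch: make the application name, rather than the application
    // ID, the metrics namespace. The sink can then recover the name from the
    // metric key itself.
    val spark = SparkSession.builder()
      .appName("my-streaming-job") // hypothetical name
      .config("spark.metrics.namespace", "${spark.app.name}")
      .getOrCreate()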

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Gaetano Fabiano
Congratulations 🎉 Celebrating 🎉 Sent from my iPhone > On 18 Jun 2020, at 20:38, Gourav Sengupta wrote: > > CELEBRATIONS!!! > >> On Thu, Jun 18, 2020 at 6:21 PM Reynold Xin wrote: >> Hi all, >> >> Apache Spark 3.0.0 is the first release of the 3.x line. It builds on many >> of

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Gourav Sengupta
CELEBRATIONS!!! On Thu, Jun 18, 2020 at 6:21 PM Reynold Xin wrote: > Hi all, > > Apache Spark 3.0.0 is the first release of the 3.x line. It builds on many > of the innovations from Spark 2.x, bringing new ideas as well as continuing > long-term projects that have been in development.

[ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Reynold Xin
Hi all, Apache Spark 3.0.0 is the first release of the 3.x line. It builds on many of the innovations from Spark 2.x, bringing new ideas as well as continuing long-term projects that have been in development. This release resolves more than 3400 tickets. We'd like to thank our contributors

Re: Reading TB of JSON file

2020-06-18 Thread Stephan Wehner
It's an interesting problem. What is the structure of the file? One big array? One hash with many key-value pairs? Stephan On Thu, Jun 18, 2020 at 6:12 AM Chetan Khatri wrote: > Hi Spark Users, > > I have a 50GB of JSON file, I would like to read and persist at HDFS so it > can be taken into

Re: Reading TB of JSON file

2020-06-18 Thread Gourav Sengupta
Hi, So you have a single JSON record spanning multiple lines? And all 50 GB is in one file? Regards, Gourav On Thu, 18 Jun 2020, 14:34 Chetan Khatri, wrote: > It is dynamically generated and written at s3 bucket not historical data > so I guess it doesn't have jsonlines format > > On Thu, Jun

Re: GPU Acceleration for spark-3.0.0

2020-06-18 Thread Bobby Evans
"So if I am going to use GPU in my job running on the spark , I still need to code the map and reduce function in cuda or in c++ and then invoke them throught jni or something like GPUEnabler , is that right ?" Sort of. You could go through all of that work yourself, or you could use the plugin

Re: Reading TB of JSON file

2020-06-18 Thread Chetan Khatri
It is dynamically generated and written to an S3 bucket, not historical data, so I guess it doesn't have the jsonlines format On Thu, Jun 18, 2020 at 9:16 AM Jörn Franke wrote: > Depends on the data types you use. > > Do you have in jsonlines format? Then the amount of memory plays much less > a role. >

Re: Reading TB of JSON file

2020-06-18 Thread Chetan Khatri
The file is available in an S3 bucket. On Thu, Jun 18, 2020 at 9:15 AM Patrick McCarthy wrote: > Assuming that the file can be easily split, I would divide it into a > number of pieces and move those pieces to HDFS before using spark at all, > using `hdfs dfs` or similar. At that point you can use

Re: Reading TB of JSON file

2020-06-18 Thread nihed mbarek
Hi, What is the size of one JSON document? There is also the scan of your JSON to infer the schema; the overhead can be huge. Two solutions: define a schema and use it directly during the load, or ask Spark to analyse a small part of the JSON file (I don't remember how to do it) Regards, On Thu,
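
Both options sketched in code, assuming an existing SparkSession named spark; the path and field names are hypothetical:

    import org.apache.spark.sql.types._

    // (1) Supply the schema up front so Spark skips the inference scan entirely.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("payload", StringType)
    ))
    val withSchema = spark.read.schema(schema).json("s3a://bucket/data.json")

    // (2) Or let Spark infer the schema from only a fraction of the records.
    val sampled = spark.read.option("samplingRatio", "0.1").json("s3a://bucket/data.json")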

Re: Reading TB of JSON file

2020-06-18 Thread Jörn Franke
It depends on the data types you use. Do you have it in jsonlines format? Then the amount of memory plays much less of a role. Otherwise, if it is one large object or array, I would not recommend it. > On 18.06.2020 at 15:12, Chetan Khatri wrote: > > Hi Spark Users, > > I have a 50GB of JSON
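
The distinction in code, assuming an existing SparkSession named spark and hypothetical paths:

    // jsonlines (one document per line) is splittable and is the default mode:
    val lines = spark.read.json("s3a://bucket/data.jsonl")

    // One large object or array requires multiLine mode, which forces each file
    // to be parsed as a whole by a single task; hence the memory pressure.
    val whole = spark.read.option("multiLine", "true").json("s3a://bucket/data.json")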

Re: Reading TB of JSON file

2020-06-18 Thread Patrick McCarthy
Assuming that the file can be easily split, I would divide it into a number of pieces and move those pieces to HDFS before using Spark at all, using `hdfs dfs` or similar. At that point you can use your executors to perform the reading instead of the driver. On Thu, Jun 18, 2020 at 9:12 AM Chetan

Reading TB of JSON file

2020-06-18 Thread Chetan Khatri
Hi Spark Users, I have a 50 GB JSON file that I would like to read and persist to HDFS so it can be taken into the next transformation. I am trying to read it with spark.read.json(path), but this is giving an out-of-memory error on the driver. Obviously, I can't afford having 50 GB of driver memory. In general,

Re: Is Spark Structured Streaming TOTALLY BROKEN (Spark Metadata Issues)

2020-06-18 Thread Jacek Laskowski
Hi Rachana, > Should I go backward and use Spark Streaming DStream based. No. Never. It's no longer supported (and should really be removed from the codebase once and for all - dreaming...). Spark focuses on Spark SQL and Spark Structured Streaming as user-facing modules for batch and streaming
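
For anyone weighing the move, a minimal Structured Streaming sketch; the socket source and console sink are toy choices for illustration only:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("structured-streaming-example")
      .getOrCreate()

    import spark.implicits._

    // Streaming queries use the same DataFrame/Dataset API as batch jobs.
    val words = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]
      .flatMap(_.split(" "))

    val counts = words.groupBy("value").count()

    // A running word count, recomputed for each micro-batch of input.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()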