Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Xiao Li
Try to clear your browsing data or use a different web browser. Enjoy it, Xiao

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Reynold Xin
Do you have a cached copy? I see it here: http://spark.apache.org/downloads.html

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Li Gao
This is wonderful! I noticed the official Spark download site does not have 2.4 download links yet.

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Swapnil Shinde
Great news, thank you very much!

Spark event logging with s3a

2018-11-08 Thread David Hesson
We are trying to use Spark event logging with s3a as the destination for event data. We added these settings to our spark-submit invocations:

spark.eventLog.dir s3a://ourbucket/sparkHistoryServer/eventLogs
spark.eventLog.enabled true

Everything works fine with smaller jobs, and we can see the history data
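For reference, the two settings from this message can also be collected in spark-defaults.conf; the bucket and path below are the poster's, and the history-server line is an assumption about how the reading side is typically wired up, not something stated in the message:

```properties
# spark-defaults.conf -- event logging to S3 via the s3a connector
spark.eventLog.enabled         true
spark.eventLog.dir             s3a://ourbucket/sparkHistoryServer/eventLogs
# Assumed: the history server reads from the same location
spark.history.fs.logDirectory  s3a://ourbucket/sparkHistoryServer/eventLogs
```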

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Stavros Kontopoulos
Awesome!

Is dataframe write blocking? what can be done for fair scheduler?

2018-11-08 Thread ramannan...@gmail.com
Hi, I have noticed that in a fair-scheduler setting, if I block on a DataFrame write to complete using Await.result, the API call returns before the write finishes. That is not what I intend, as it can cause inconsistencies later in the pipeline. Is there a way to make the DataFrame write call blocking?

Re: How to increase the parallelism of Spark Streaming application?

2018-11-08 Thread JF Chen
Yes, I have now allocated 100 cores and 8 Kafka partitions, and then repartitioned the stream to 100 to feed the 100 cores. The following stage has a map operation; will it also cause a slowdown? Regards, Junfeng Chen On Thu, Nov 8, 2018 at 12:34 AM Shahbaz wrote: > Hi, > - Do you have adequate CPU cores
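The setup described here (8 Kafka partitions widened to 100 Spark partitions) can be sketched as below. Broker address, topic, and group id are placeholders, not details from the thread; note that the repartition itself costs a full shuffle of every batch before the map stage can use all 100 cores:

```scala
// Sketch only: assumes spark-streaming-kafka-0-10 on the classpath
// and a StreamingContext with a 10-minute batch interval.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka010._

def buildStream(ssc: StreamingContext): Unit = {
  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "broker:9092",            // placeholder
    "key.deserializer"  -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id"          -> "demo-group"              // placeholder
  )
  val stream = KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
  )
  // 8 Kafka partitions -> 100 Spark partitions: the downstream map
  // then runs as 100 tasks, one per core, at the price of a shuffle.
  val widened = stream.map(_.value()).repartition(100)
  widened.foreachRDD(rdd => rdd.count()) // placeholder action
}
```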

Is Dataframe write blocking?

2018-11-08 Thread Ramandeep Singh Nanda
Hi, I have some futures set up to operate in stages, where I expect one stage to complete before another begins. I was hoping that the DataFrame write call is blocking, whereas the behavior I see is that the call returns before the data is persisted. This can cause unintended consequences. I am also using
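One way to get the strict stage ordering described here is to make sure the write itself runs inside the future and the driver waits on that future before starting the next stage. This is a sketch under assumptions: `spark` is a live SparkSession, `df1`/`df2` and the s3a paths are hypothetical stand-ins for the poster's data, and `DataFrameWriter.save`/`parquet` is an action that returns only after the write job finishes:

```scala
// Sketch: run each stage in a Future and force ordering with Await.result.
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val stage1: Future[Unit] = Future {
  // The write happens inside the future; this call returns only
  // after the job that materializes the files has finished.
  df1.write.mode("overwrite").parquet("s3a://bucket/stage1") // hypothetical path
}

// Block here: stage 2 is not submitted until stage 1 is fully persisted.
Await.result(stage1, Duration.Inf)

val stage2: Future[Unit] = Future {
  df2.write.mode("overwrite").parquet("s3a://bucket/stage2") // hypothetical path
}
Await.result(stage2, Duration.Inf)
```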

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Jules Damji
Indeed! Sent from my iPhone Pardon the dumb thumb typos :)

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Dongjoon Hyun
Finally, thank you all. Especially, thanks to the release manager, Wenchen! Bests, Dongjoon.

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Wenchen Fan
+ user list

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Marcelo Vanzin
+user@

Forwarded message from Wenchen Fan (Thu, Nov 8, 2018 at 10:55 PM), subject "[ANNOUNCE] Announcing Apache Spark 2.4.0", to the Spark dev list:

> Hi all,
> Apache Spark 2.4.0 is the fifth release in the 2.x line. This release adds Barrier

[no subject]

2018-11-08 Thread JF Chen
I am working on a Spark Streaming application, and I want it to read configuration from MongoDB every hour, where the batch interval is 10 minutes. Is that practicable? As I understand it, Spark Streaming batches are tied to the DStream, so how do I implement this function, which seems unrelated to the DStream data?
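One common pattern for this is to keep the config in a driver-side holder and refresh it from inside a `transform`, which runs its closure on the driver once per batch: the check fires every 10 minutes, but the MongoDB read only happens once per hour. This is a sketch; `loadConfigFromMongo` is a hypothetical helper (stubbed out here), not a real MongoDB API:

```scala
// Sketch: hourly config refresh inside a 10-minute-batch streaming job.
import org.apache.spark.streaming.dstream.DStream

object ConfigRefresher {
  @volatile private var config: Map[String, String] = Map.empty
  @volatile private var lastLoadMs: Long = 0L
  private val refreshIntervalMs = 60 * 60 * 1000L // one hour

  def current(): Map[String, String] = {
    val now = System.currentTimeMillis()
    if (now - lastLoadMs > refreshIntervalMs) synchronized {
      if (now - lastLoadMs > refreshIntervalMs) { // double-checked
        config = loadConfigFromMongo()            // hypothetical Mongo read
        lastLoadMs = now
      }
    }
    config
  }

  private def loadConfigFromMongo(): Map[String, String] = Map.empty // stub
}

// transform's closure runs on the driver once per batch, so the refresh
// check piggybacks on the batch cadence without touching the DStream data.
def withConfig(stream: DStream[String]): DStream[(String, String)] =
  stream.transform { rdd =>
    val cfg = ConfigRefresher.current() // driver-side, once per batch
    rdd.map(record => (record, cfg.getOrElse("mode", "default")))
  }
```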

StorageLevel: OffHeap

2018-11-08 Thread Jack Kolokasis
Hello everyone, I am running a simple word count in Spark and I persist my RDDs using StorageLevel.OFF_HEAP. While the application is running, I see through the Spark Web UI that they are persisted on disk. Why does this happen? Can anyone tell me how the off-heap storage level works? Thanks for
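A likely explanation, as far as I understand Spark 2.x: StorageLevel.OFF_HEAP also allows disk as a fallback, and the off-heap memory pool must be explicitly enabled and sized before anything can be cached into it. A minimal sketch of the relevant settings (the 2g size is an arbitrary example):

```properties
# Without these there is no off-heap pool to cache into, and blocks
# persisted with StorageLevel.OFF_HEAP can fall back to disk.
spark.memory.offHeap.enabled  true
spark.memory.offHeap.size     2g
```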

Re: How to increase the parallelism of Spark Streaming application?

2018-11-08 Thread JF Chen
Hi, I have tested it in my production environment, and I found a strange thing. After I set the Kafka partitions to 100, some tasks execute very fast, but some are slow; the slow ones take double the time of the fast ones (from the event timeline). However, I have checked the consumer offsets, the data

Re: How to increase the parallelism of Spark Streaming application?

2018-11-08 Thread JF Chen
Memory is not a big problem for me... So no other bad effects? Regards, Junfeng Chen On Wed, Nov 7, 2018 at 4:51 PM Michael Shtelma wrote: > If you configure too many Kafka partitions, you can run into memory issues. > This will increase the memory requirements for the Spark job a lot. > > Best, >