Re: Writing to HDFS and cluster utilization

2018-06-15 Thread Rohit Karlupia
Hi, The minimal solution is to enable dynamic allocation and set the idle timeout to a low value. This ensures that idle executors are killed and their resources become available for others to use. The relevant settings are spark.dynamicAllocation.enabled and spark.dynamicAllocation.executorIdleTimeout. If you would like to understand
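
As a rough illustration of the two settings named in the reply above, the following sketch assembles a spark-submit command line that enables dynamic allocation with a short idle timeout. The helper names, the 30s timeout, and the application file my_job.py are hypothetical and not from the thread. The extra spark.shuffle.service.enabled setting is an assumption: Spark 2.x on YARN normally needs the external shuffle service for dynamic allocation, but check this against your own cluster manager.

```python
import shlex


def dynamic_allocation_conf(idle_timeout="30s"):
    """Return Spark settings that release idle executors quickly.

    The 30s default here is an illustrative value, not a recommendation
    from the thread.
    """
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.executorIdleTimeout": idle_timeout,
        # Assumption: on YARN with Spark 2.x, dynamic allocation also
        # requires the external shuffle service to be enabled.
        "spark.shuffle.service.enabled": "true",
    }


def spark_submit_command(app, conf):
    """Build a shell-safe spark-submit command with one --conf per setting."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # my_job.py is a placeholder application name.
    print(spark_submit_command("my_job.py", dynamic_allocation_conf()))
```

The same key/value pairs could instead go into spark-defaults.conf or be set on the SparkConf inside the job; the command-line form is used here only because it keeps the sketch free of a Spark dependency.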

Re: time for Apache Spark 3.0?

2018-06-15 Thread Reynold Xin
Yes. At this rate I think it's better to do 2.4 next, followed by 3.0. On Fri, Jun 15, 2018 at 10:52 AM Mridul Muralidharan wrote: > I agree, I don't see a pressing need for a major version bump either. > > > Regards, > Mridul > On Fri, Jun 15, 2018 at 10:25 AM Mark Hamstra > wrote: > > > >

Re: Time for 2.1.3

2018-06-15 Thread Wenchen Fan
+1 On Fri, Jun 15, 2018 at 7:10 AM, Tom Graves wrote: > +1 for doing a 2.1.3 release. > > Tom > > On Wednesday, June 13, 2018, 7:28:26 AM CDT, Marco Gaido < > marcogaid...@gmail.com> wrote: > > > Yes, you're right Herman. Sorry, my bad. > > Thanks. > Marco > > 2018-06-13 14:01 GMT+02:00 Herman

Unsubscribe

2018-06-15 Thread Mikhail Dubkov
Unsubscribe On Thu, Jun 14, 2018 at 8:38 PM Kumar S, Sajive wrote: > Unsubscribe >

Re: time for Apache Spark 3.0?

2018-06-15 Thread Mridul Muralidharan
I agree, I don't see a pressing need for a major version bump either. Regards, Mridul On Fri, Jun 15, 2018 at 10:25 AM Mark Hamstra wrote: > > Changing major version numbers is not about new features or a vague notion > that it is time to do something that will be seen to be a significant >

Re: time for Apache Spark 3.0?

2018-06-15 Thread Mark Hamstra
Changing major version numbers is not about new features or a vague notion that it is time to do something that will be seen to be a significant release. It is about breaking stable public APIs. I still remain unconvinced that the next version can't be 2.4.0. On Fri, Jun 15, 2018 at 1:34 AM Andy

Re: Time for 2.1.3

2018-06-15 Thread Tom Graves
+1 for doing a 2.1.3 release. Tom On Wednesday, June 13, 2018, 7:28:26 AM CDT, Marco Gaido wrote: Yes, you're right Herman. Sorry, my bad. Thanks. Marco 2018-06-13 14:01 GMT+02:00 Herman van Hövell tot Westerflier : Isn't this only a problem with Spark 2.3.x? On Wed, Jun 13, 2018 at

Writing to HDFS and cluster utilization

2018-06-15 Thread Alessandro Liparoti
Hi, I would like to briefly present my use case and gather suggestions from the community. I am developing a Spark job which reads from and writes to Hive heavily. Usually, I use 200 executors with 12g memory each and a parallelism level of 600. The main run of the application

Re: Very slow complex type column reads from parquet

2018-06-15 Thread Jakub Wozniak
Hello, I’m sorry to bother you again but it is quite important for us to understand the problem better. One more finding in our problem is that the performance of queries on a timestamp-sorted file depends a lot on the predicate timestamp. If you are lucky enough to get some records from the start of

Re: time for Apache Spark 3.0?

2018-06-15 Thread Andy
*Dear all:* It has been 2 months since this topic was proposed. Any progress now? About half of 2018 has already passed. I agree that the new version should have some exciting new features. How about this one: *6. ML/DL framework to be integrated as core component and feature. (Such as Angel /

Re: Re: Support SqlStreaming in spark

2018-06-15 Thread Hadrien Chicault
Unsubscribe 2018-06-15 9:20 GMT+02:00 stc : > The repo you give may solve some of SqlStreaming problems, but not > friendly enough, user need to learn this new syntax. > > -- > Jacky Lee > Mail:qcsd2...@163.com > > At 2018-06-15 11:48:01, "Bowden, Chris" > wrote: > > Not sure if there is a

Re: Re: Support SqlStreaming in spark

2018-06-15 Thread stc
The repo you gave may solve some of the SqlStreaming problems, but it is not friendly enough; users need to learn this new syntax. -- Jacky Lee Mail:qcsd2...@163.com At 2018-06-15 11:48:01, "Bowden, Chris" wrote: Not sure if there is a question in here, but if you are hinting that structured