Re: Data Engineering Track at ApacheCon (October 3-6, New Orleans) - CFP ends 23/05

2022-05-17 Thread Ismaël Mejía
: > > Hi Ismaël, > > Thank you, it's interesting. Is this message relevant only to > maintainers/contributors of top-level Apache projects, or does it work for other > maintainers of Apache-licensed software too? > > Regards, > Pasha > > Wed, May 18, 2022,

Data Engineering Track at ApacheCon (October 3-6, New Orleans) - CFP ends 23/05

2022-05-17 Thread Ismaël Mejía
Hello, ApacheCon North America is back in person this year in October. https://apachecon.com/acna2022/ Together with Jarek Potiuk, we are organizing for the first time a Data Engineering Track as part of ApacheCon. You might be wondering why a different track if we already have the Big Data

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-25 Thread Ismaël Mejía
The ready-to-use Docker images are great news. I have been waiting for this for so long! Extra kudos for including ARM64 versions too! I am curious, what are the non-ASF artifacts included in them (or do you refer to the OS-specific elements with other licenses?), and what consequences might be for

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-14 Thread Ismaël Mejía
+1 (non-binding) Tested on downstream project without further issues. Ship it time! On Tue, May 11, 2021 at 9:37 PM Liang-Chi Hsieh wrote: > The staging repository for this release can be accessed now too: > https://repository.apache.org/content/repositories/orgapachespark-1383/ > > Thanks

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-29 Thread Ismaël Mejía
+1 (non-binding) On Mon, Mar 29, 2021 at 7:54 AM Wenchen Fan wrote: > > +1 > > On Mon, Mar 29, 2021 at 1:45 PM Holden Karau wrote: >> >> +1 >> >> On Sun, Mar 28, 2021 at 10:25 PM sarutak wrote: >>> >>> +1 (non-binding) >>> >>> - Kousuke >>> >>> > +1 (non-binding) >>> > >>> > On Sun, Mar 28,

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-15 Thread Ismaël Mejía
+1 Bringing a pandas API for PySpark to upstream Spark will only bring benefits for everyone (more eyes to use/see/fix/improve the API) as well as better alignment with core Spark improvements; the extra weight looks manageable. On Mon, Mar 15, 2021 at 4:45 PM Nicholas Chammas wrote: > > On

Re: Apache Spark Docker image repository

2021-03-03 Thread Ismaël Mejía
? Are there still some issues/blockers or reasons to not move forward? On Tue, Feb 18, 2020 at 2:29 PM Ismaël Mejía wrote: > > +1 to have Spark docker images for Dongjoon's arguments, having a container > based distribution is definitely something in the benefit of users and the >

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-25 Thread Ismaël Mejía
Since the TPC-DS performance tests are one of the main validation sources for regressions on Spark releases, maybe it is time to automate the query output validation to find correctness issues eagerly (it would also be nice to validate the performance regressions but correctness >>> performance).

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-15 Thread Ismaël Mejía
Any chance that "SPARK-29536 PySpark does not work with Python 3.8.0" can be backported to 2.4.7? This was not done for Spark 2.4.6 because it was too late in the vote process, but it makes perfect sense to have this in 2.4.7. On Wed, Jul 15, 2020 at 9:07 AM Wenchen Fan wrote: > > Yea I think

Re: [VOTE] Release Spark 2.4.6 (RC3)

2020-05-30 Thread Ismaël Mejía
I was wondering if there is any chance that "SPARK-29536 PySpark does not work with Python 3.8.0" could eventually get backported for a future RC. It seems important enough considering that Python 3.8 is now the default version for people working on the latest Ubuntu LTS. I understand however that

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-08 Thread Ismaël Mejía
+1 (non-binding) Michael's section on the trade-offs of maintaining / removing an API is one of the best reads I have seen on this mailing list. Enthusiastic +1 On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun wrote: > > This new policy has a good intention, but can we narrow down on the migration

Re: [DISCUSS] Shall we mark spark streaming component as deprecated.

2020-03-02 Thread Ismaël Mejía
Is it really ready to be deprecated? The fact that we cannot do multiple aggregations with Structured Streaming [1] is a serious runtime limitation that the DStream API does not have. Is it worth deprecating without having an equivalent set of features? [1]

Re: Apache Spark Docker image repository

2020-02-18 Thread Ismaël Mejía
+1 to have Spark Docker images for Dongjoon's arguments. Having a container-based distribution is definitely something to the benefit of users and the project too. Having this in the Apache Spark repo matters because of multiple eyes to fix/improve the images for the benefit of everyone. What

Re: SPIP: Spark on Kubernetes

2017-08-16 Thread Ismaël Mejía
+1 (non-binding) This is something really great to have. More schedulers and runtime environments are a HUGE win for the Spark ecosystem. Amazing work, big kudos to the guys who created and continue working on this. On Wed, Aug 16, 2017 at 2:07 AM, lucas.g...@gmail.com

Re: spark-packages with maven

2016-07-15 Thread Ismaël Mejía
haven't yet found out how to do local >> publishing. >> >> If such a guide existed for Maven I could use it for sbt easily too :-) >> >> Ping me Ismael if you don't hear back from the group so I feel invited >> for digging into the plugin's sources. >> >> Best

spark-packages with maven

2016-07-15 Thread Ismaël Mejía
Hello, I would like to know if there is an easy way to package a new spark-package with Maven. I just found this repo, but I am not an sbt user. https://github.com/databricks/sbt-spark-package One more question, is there a formal specification or documentation of what you need to include in a