Re: script running in jupyter 6-7x faster than spark submit

2019-09-10 Thread Dhrubajyoti Hati
Just checked where the script is submitted from, i.e. with respect to the driver: the Python environments are different. The Jupyter one runs within a virtual environment with Python 2.7.5, and the spark-submit one uses 2.6.6. But the executors have the same Python version, right? I tried doing a spark-submit from
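One way to confirm which interpreter each side actually uses is to report `sys.version_info` and `sys.executable` from the driver and from inside a task. A minimal sketch follows; the helper name `python_info` is made up for illustration, and the code sticks to syntax that also works on Python 2.6, since that is one of the versions mentioned:

```python
import sys

def python_info():
    """Return (version string, interpreter path) for the current process."""
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    return (version, sys.executable)

if __name__ == "__main__":
    # On the driver this reports the interpreter that launched the script.
    print("driver: %s at %s" % python_info())
```

Inside a Spark job the same function could be mapped over a small RDD, for example `sc.parallelize(range(256), 256).map(lambda _: python_info()).distinct().collect()`, assuming `sc` is an existing SparkContext. PySpark takes the worker interpreter from the `PYSPARK_PYTHON` environment variable, so it is worth printing that in both setups as well.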

Re: script running in jupyter 6-7x faster than spark submit

2019-09-10 Thread Abdeali Kothari
Maybe you can try running it in a python shell or jupyter-console/ipython instead of a spark-submit and check how much time it takes too. Compare the env variables to check that no additional env configuration is present in either environment. Also is the python environment for both the exact
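The environment comparison suggested here can be done mechanically. A small sketch, assuming each environment has been captured as a plain dict (for example `dict(os.environ)` saved from the notebook and from the submitted script); `diff_env` and the sample values are hypothetical:

```python
def diff_env(env_a, env_b):
    """Compare two environment mappings.

    Returns (only_in_a, only_in_b, changed), where changed maps a variable
    name to its (value_in_a, value_in_b) pair.
    """
    keys_a, keys_b = set(env_a), set(env_b)
    only_in_a = sorted(keys_a - keys_b)
    only_in_b = sorted(keys_b - keys_a)
    changed = dict(
        (k, (env_a[k], env_b[k]))
        for k in keys_a & keys_b
        if env_a[k] != env_b[k]
    )
    return only_in_a, only_in_b, changed

# Made-up snapshots for illustration; real ones would come from
# dict(os.environ) captured in each environment.
jupyter_env = {"PYSPARK_PYTHON": "/venv/bin/python", "SPARK_HOME": "/opt/spark"}
submit_env = {"SPARK_HOME": "/opt/spark", "LANG": "C"}

only_jupyter, only_submit, changed = diff_env(jupyter_env, submit_env)
print(only_jupyter)
print(only_submit)
print(changed)
```

Variables that appear on only one side, or whose values differ, are the first candidates to look at.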

Re: script running in jupyter 6-7x faster than spark submit

2019-09-10 Thread Stephen Boesch
Ok. Can't think of why that would happen. On Tue, 10 Sept 2019 at 20:26, Dhrubajyoti Hati <dhruba.w...@gmail.com> wrote: > As mentioned in the very first mail: > * same cluster it is submitted. > * from same machine they are submitted and also from same user > * each of them has 128

Re: script running in jupyter 6-7x faster than spark submit

2019-09-10 Thread Dhrubajyoti Hati
As mentioned in the very first mail:
* both are submitted to the same cluster
* both are submitted from the same machine and by the same user
* each of them has 128 executors and 2 cores per executor with 8 GB of memory each, and both of them are getting that while running

To clarify more, let me quote what

Re: script running in jupyter 6-7x faster than spark submit

2019-09-10 Thread Stephen Boesch
Sounds like you have done your homework to properly compare. I'm guessing the answer to the following is yes, but in any case: are they both running against the same Spark cluster with the same configuration parameters, especially executor memory and number of workers? On Tue, 10 Sept 2019

Re: script running in jupyter 6-7x faster than spark submit

2019-09-10 Thread Dhrubajyoti Hati
No, I checked for that, which is why I wrote "brand new" Jupyter notebook. Also, the times taken by the two are 30 mins and ~3 hrs, as I am reading 500 GB of compressed, base64-encoded text data from a Hive table and decompressing and decoding it in one of the UDFs. Also, the time compared is from the Spark UI, not how
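Since the work described here happens inside a Python UDF, it may help to time the decode-and-decompress step on its own under each interpreter. A rough sketch, assuming the payload is zlib-compressed and then base64-encoded (the thread does not say which compression is used); `decode_record` is a hypothetical stand-in for the UDF body:

```python
import base64
import time
import zlib

def decode_record(encoded):
    """Base64-decode, then zlib-decompress one record (assumed format)."""
    return zlib.decompress(base64.b64decode(encoded))

# Build a synthetic record so the sketch is self-contained.
raw = b"some repetitive text payload " * 1000
encoded = base64.b64encode(zlib.compress(raw))

start = time.time()
for _ in range(1000):
    decoded = decode_record(encoded)
elapsed = time.time() - start

# Sanity check: the round trip must reproduce the original bytes.
assert decoded == raw
print("1000 records decoded in %.3f s" % elapsed)
```

Running the same file with each Python binary gives a per-record baseline that does not depend on Spark at all.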

Re: Request for contributor permissions

2019-09-10 Thread Takeshi Yamamuro
Hi, Alaa. Thanks for contacting us! You can file a JIRA without any permission. By the way, have you checked the contribution guide? https://spark.apache.org/contributing.html You'd better check that before contributing. Best, Takeshi On Wed, Sep 11, 2019 at 4:37 AM Alaa Zbair wrote: > Hello

Request for contributor permissions

2019-09-10 Thread Alaa Zbair
Hello dev, I am interested in contributing to the Spark project; please add me to the contributors list. My Jira username is: Chilio. Thanks. Alaa Zbair.

[jira] Lantao Jin shared "SPARK-29038: SPIP: Support Spark Materialized View" with you

2019-09-10 Thread Lantao Jin (Jira)
Lantao Jin shared an issue with you: SPIP: Support Spark Materialized View > SPIP: Support Spark Materialized View > Key: SPARK-29038 > URL:

Re: Welcoming some new committers and PMC members

2019-09-10 Thread Stavros Kontopoulos
Congrats! Well deserved. On Tue, Sep 10, 2019 at 1:20 PM Driesprong, Fokko wrote: > Congrats all, well deserved! > > > Cheers, Fokko > > On Tue, 10 Sep 2019 at 10:21, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote: > >> Congrats Guys! >> >> G >> >> >> On Tue, Sep 10, 2019 at 2:32 AM Matei

Re: Welcoming some new committers and PMC members

2019-09-10 Thread Driesprong, Fokko
Congrats all, well deserved! Cheers, Fokko On Tue, 10 Sep 2019 at 10:21, Gabor Somogyi wrote: > Congrats Guys! > > G > > > On Tue, Sep 10, 2019 at 2:32 AM Matei Zaharia > wrote: > >> Hi all, >> >> The Spark PMC recently voted to add several new committers and one PMC >> member. Join me in

[DISCUSS][SPIP][SPARK-29031] Materialized columns

2019-09-10 Thread Jason Guo
Hi, I'd like to propose a feature named materialized columns. This feature will speed up queries on complex-type columns. https://docs.google.com/document/d/186bzUv4CRwoYY_KliNWTexkNCysQo3VUTLQVrVijyl4/edit?usp=sharing *Background* In the data warehouse domain, there is a common requirement to add new

Re: Welcoming some new committers and PMC members

2019-09-10 Thread Gabor Somogyi
Congrats Guys! G On Tue, Sep 10, 2019 at 2:32 AM Matei Zaharia wrote: > Hi all, > > The Spark PMC recently voted to add several new committers and one PMC > member. Join me in welcoming them to their new roles! > > New PMC member: Dongjoon Hyun > > New committers: Ryan Blue, Liang-Chi Hsieh,

RE: Welcoming some new committers and PMC members

2019-09-10 Thread Dilip Biswal
Congratulations!! Very well deserved. -- Dilip - Original message - From: "Kazuaki Ishizaki" To: Matei Zaharia Cc: dev Subject: [EXTERNAL] Re: Welcoming some new committers and PMC members Date: Mon, Sep 9, 2019 9:25 PM Congrats! Well deserved. Kazuaki Ishizaki, From: Matei Zaharia