I’m +1 on switching to Python by default, given what I see from the majority of 
users. I like the idea of investigating a way to save the language choice in a 
cookie and to switch all code examples on the page to a new language when you 
click one of the tabs. We used to have the switching behavior at least (e.g. 
see this archived page from 2016: 
https://web.archive.org/web/20160308055505/https://spark.apache.org/docs/latest/quick-start.html), 
so I’m not sure what happened to that. We might never have had the cookie, but 
that is worth investigating.
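
For illustration, here is a minimal sketch of how that could work (TypeScript; 
the data-lang markup, the .tab/.tab-button class names, and the spark-doc-lang 
cookie name are assumptions made for the sketch, not what the site actually uses):

// Sketch only: the tab markup (data-lang, .tab, .tab-button) and the cookie
// name are hypothetical.
function getPreferredLang(): string | null {
  const match = document.cookie.match(/(?:^|;\s*)spark-doc-lang=([^;]+)/);
  return match ? decodeURIComponent(match[1]) : null;
}

function setPreferredLang(lang: string): void {
  // Persist the choice for a year, across all doc pages.
  document.cookie =
    `spark-doc-lang=${encodeURIComponent(lang)}; path=/; max-age=31536000`;
}

function showLang(lang: string): void {
  // Show only the example blocks for the chosen language.
  document.querySelectorAll<HTMLElement>("div.tab[data-lang]").forEach((el) => {
    el.style.display = el.dataset.lang === lang ? "block" : "none";
  });
}

document.addEventListener("DOMContentLoaded", () => {
  // Clicking a tab saves the preference and applies it page-wide.
  document.querySelectorAll<HTMLElement>(".tab-button[data-lang]").forEach((btn) => {
    btn.addEventListener("click", () => {
      const lang = btn.dataset.lang!;
      setPreferredLang(lang);
      showLang(lang);
    });
  });
  // On load, fall back to Python if no preference has been saved yet.
  showLang(getPreferredLang() ?? "python");
});

The same approach would work with localStorage instead of a cookie if we only 
need the preference on the client side, which is what Santosh suggests below.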

Matei

> On Feb 23, 2023, at 11:31 PM, Santosh Pingale 
> <santosh.ping...@adyen.com.invalid> wrote:
> 
> Yes, I definitely agree and +1 to the proposal (FWIW). 
> 
> I was looking at Dongjoon's comments, which made a lot of sense to me, and 
> trying to come up with an approach that provides a smooth segue to Python as 
> the first tab later on. But this is mostly guesswork, as I do not personally 
> know the actual user behaviour on the docs site.
> 
> On Fri, Feb 24, 2023, 8:01 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
> That sounds good to have, especially given that it will allow more 
> flexibility for users.
> But I think that's slightly orthogonal to this proposal, since this proposal 
> is more about the default (before users take an action).
> 
> 
> On Fri, 24 Feb 2023 at 15:35, Santosh Pingale <santosh.ping...@adyen.com> wrote:
> Very interesting and user-focused discussion, thanks for the proposal.
> 
> Would it be better if we instead let users set a preference for the language 
> they want to see first in the code examples? This preference can easily be 
> stored on the browser side and used to decide the ordering. This is in line 
> with the freedom users have with Spark today.
> 
> 
> On Fri, Feb 24, 2023, 4:46 AM Allan Folting <afolting...@gmail.com> wrote:
> I think this needs to be done consistently on all relevant pages, and my 
> intent is to do that work in time for when it is first released.
> I started with the "Spark SQL, DataFrames and Datasets Guide" page in order 
> to break the work up into multiple, scoped PRs.
> I should have made that clear before.
> 
> I think it's a great idea to have an umbrella JIRA for this to outline the 
> full scope and track overall progress, and I'm happy to create it.
> 
> I can't speak on behalf of all Scala users, of course, but I don't think this 
> change makes Scala appear to be a second-class citizen, just as I don't think 
> of Python as a second-class citizen because it is not listed first currently. 
> It does, however, recognize that Python is more broadly popular today.
> 
> Thanks,
> Allan
> 
> On Thu, Feb 23, 2023 at 6:55 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> Thank you all.
> 
> Yes, attracting more Python users and being more friendly to Python users is 
> always good.
> 
> Basically, SPARK-42493 is proposing to introduce intentional inconsistency 
> into the Apache Spark documentation.
> 
> The inconsistency from SPARK-42493 might prompt the following questions from 
> Python users first.
> 
> - Why not the RDD pages, which are the heart of Apache Spark? Is Python not 
> good for RDDs?
> - Why not the ML and Structured Streaming pages, when the DATA+AI Summit 
> focuses heavily on ML?
> 
> Also, more questions for the Scala users.
> - Is Scala stepping down to become a second-class language?
> - What about Scala 3?
> 
> Of course, I understand SPARK-42493 has a specific scope 
> (SQL/Dataset/DataFrame) and didn't mean anything like the above at all. 
> However, if SPARK-42493 is emphasized as "the first step" toward introducing 
> that inconsistency, I'm wondering:
> - What direction are we heading in?
> - What is the next target scope?
> - When will it be achieved (or completed)?
> - Or is the goal to be permanently inconsistent in terms of the 
> documentation?
> 
> It's unclear even in the documentation-only scope. If we are expecting more 
> and more subtasks during the Apache Spark 3.5 timeframe, shall we have an 
> umbrella JIRA?
> 
> Bests,
> Dongjoon.
> 
> 
> On Thu, Feb 23, 2023 at 6:15 PM Allan Folting <afolting...@gmail.com> wrote:
> Thanks a lot for the questions and comments/feedback!
> 
> To address your questions, Dongjoon: I do not intend for these updates to the 
> documentation to be tied to the potential changes/suggestions you ask about.
> 
> In other words, this proposal is only about adjusting the documentation to 
> target the majority of people reading it - namely the large and growing 
> number of Python users - and new users in particular, as they are often 
> already familiar with and have a preference for Python when evaluating or 
> starting to use Spark.
> 
> While we may want to strengthen support for Python in other ways, I think 
> such efforts should be tracked separately from this.
> 
> Allan
> 
> On Thu, Feb 23, 2023 at 1:44 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> If this is not just flip-flopping the documentation pages and involves other 
> changes, then a proper impact analysis needs to be done to assess the effort 
> involved. Personally, I don't think it really matters.
> 
> HTH
> 
> 
> view my LinkedIn profile: 
> https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/
> 
> https://en.everybodywiki.com/Mich_Talebzadeh
>  
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> 
> On Thu, 23 Feb 2023 at 01:40, Hyukjin Kwon <gurwls...@gmail.com> wrote:
> > 1. Does this suggestion imply that the Python API implementation will be 
> > the new blocker in the future in terms of feature parity among languages? 
> > Until now, Python API feature parity was one of the audit items because 
> > it's not enforced. In other words, Scala and Java have been full-featured 
> > because they are the underlying main development languages, while the 
> > Python/R/SQL environments were nice-to-have.
> 
> I think it wouldn't be treated as a blocker, but I do believe we have added 
> all new features on the Python side for the last couple of releases. So I 
> wouldn't worry about this at the moment - we have been doing fine in terms of 
> feature parity.
> 
> > 2. Does this suggestion assume that the Python environment is always easier 
> > for users than Scala/Java? Given that we support Python 3.8 to 3.11, the 
> > support matrix for Python library dependencies is a problem the Apache 
> > Spark community has to solve in order to claim that. As we say in 
> > SPARK-41454, the Python language has also historically introduced breaking 
> > changes for us, and we have many `pinned` Python library issues.
> 
> Yes. In fact, regardless of this change, I do believe we should test more 
> versions, etc., at least in scheduled jobs, like we do for the JDK and Scala 
> versions.
> 
> 
> FWIW, my take on this change is: people use Python and PySpark more 
> (according to the chart and stats provided), so let's put those examples 
> first :-).
> 
> 
> On Thu, 23 Feb 2023 at 10:27, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> I have two questions to clarify the scope and boundaries.
> 
> 1. Does this suggestion imply that the Python API implementation will be the 
> new blocker in the future in terms of feature parity among languages? Until 
> now, Python API feature parity was one of the audit items because it's not 
> enforced. In other words, Scala and Java have been full-featured because they 
> are the underlying main development languages, while the Python/R/SQL 
> environments were nice-to-have.
> 
> 2. Does this suggestion assume that the Python environment is always easier 
> for users than Scala/Java? Given that we support Python 3.8 to 3.11, the 
> support matrix for Python library dependencies is a problem the Apache Spark 
> community has to solve in order to claim that. As we say in SPARK-41454, the 
> Python language has also historically introduced breaking changes for us, and 
> we have many `pinned` Python library issues.
> 
> Changing documentation is easy, but I hope we can provide clear communication 
> and direction for this effort, because this is one of the most user-facing 
> changes.
> 
> Dongjoon.
> 
> On Wed, Feb 22, 2023 at 5:26 PM 416161...@qq.com <ruife...@foxmail.com> wrote:
> +1 LGTM
> 
> Ruifeng Zheng
> ruife...@foxmail.com
> 
> 
> ------------------ Original ------------------
> From: "Xinrong Meng" <xinrong.apa...@gmail.com>;
> Date: Thu, Feb 23, 2023 09:17 AM
> To: "Allan Folting" <afolting...@gmail.com>;
> Cc: "dev" <dev@spark.apache.org>;
> Subject: Re: [DISCUSS] Show Python code examples first in Spark documentation
> 
> +1 Good idea!
> 
> On Thu, Feb 23, 2023 at 7:41 AM Jack Goodson <jackagood...@gmail.com> wrote:
> Good idea. At the company I work at, we discussed using Scala as our primary 
> language because technically it is slightly stronger than Python, but we 
> ultimately chose Python because it's easier for other devs to be onboarded to 
> our platform, and future hiring for the team etc. would be easier.
> 
> On Thu, 23 Feb 2023 at 12:20 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
> +1 I like this idea too.
> 
> On Thu, Feb 23, 2023 at 6:00 AM Allan Folting <afolting...@gmail.com> wrote:
> Hi all,
> 
> I would like to propose that we show Python code examples first in the Spark 
> documentation where we have multiple programming language examples.
> An example is on the Quick Start page:
> https://spark.apache.org/docs/latest/quick-start.html
> 
> I propose this change because Python has become more popular than the other 
> languages supported in Apache Spark. There are a lot more users of Spark in 
> Python than in Scala today, and Python attracts a broader set of new users.
> For Python usage data, see https://www.tiobe.com/tiobe-index/ and 
> https://insights.stackoverflow.com/trends?tags=r%2Cscala%2Cpython%2Cjava.
> 
> Also, this change aligns with Python already being the first tab on our home 
> page:
> https://spark.apache.org/
> 
> Anyone who wants to use another language can still just click on the other 
> tabs.
> 
> I created a draft PR for the Spark SQL, DataFrames and Datasets Guide page as 
> a first step:
> https://github.com/apache/spark/pull/40087
> 
> 
> I would appreciate it if you could share your thoughts on this proposal.
> 
> 
> Thanks a lot,
> Allan Folting
