PySpark documentation main page
Hi all, I am trying to write up the main page of the PySpark documentation at https://github.com/apache/spark/pull/29320. While I think the current proposal might be good enough, I would like to collect more feedback about the contents, structure, and image, since this is the entrance page of the PySpark documentation. Pointers to reference sites are also very welcome. Let me know if any of you have a good idea to share. I plan to leave it open for a few more days. PS: thanks @Liang-Chi Hsieh and @Sean Owen for taking a quick look at it.
Re: Spark-submit --files option help
You can use SparkFiles.get(path)

Example here:
https://github.com/datastax/spark-cassandra-connector/blob/master/connector/src/main/scala/com/datastax/spark/connector/cql/CassandraConnectionFactory.scala#L152

Also this is probably a better question for the user list than the dev one

On Sat, Aug 1, 2020, 8:34 AM rahul c wrote:

> Hi all,
>
> I am trying to pass some configuration files via spark-submit command in
> cluster mode.
> From logs I can see the files are transferred to each executors.
> But how to build the absolute path of the file in the code?
>
> Can anyone plz guide on it with some references.
>
> Appreciate your help on this.
>
> Thanks and regards
> Rahul
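For PySpark users, the suggestion above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name load_json_config, the resolve parameter, and the file name app.json are made up for the example, and it assumes the shipped file is JSON. The one real API call is pyspark.SparkFiles.get, which takes the file's base name (not the path given to --files) and returns its absolute local path.

```python
import json
from typing import Callable, Optional


def _spark_files_get(name: str) -> str:
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark import SparkFiles

    # SparkFiles.get maps a shipped file's base name to its absolute local path.
    return SparkFiles.get(name)


def load_json_config(name: str,
                     resolve: Optional[Callable[[str], str]] = None) -> dict:
    """Read a JSON file that was shipped with `spark-submit --files`.

    `name` is the base name of the file (hypothetical example: "app.json").
    `resolve` is a hypothetical hook that turns the name into an absolute
    path; it defaults to SparkFiles.get and exists mainly to ease testing.
    """
    resolve = resolve or _spark_files_get
    path = resolve(name)
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

With a submission along the lines of `spark-submit --deploy-mode cluster --files conf/app.json job.py`, code running in a task would call load_json_config("app.json"). Whether the same call behaves identically on the driver can depend on the cluster manager, so treat that part as something to verify in your own setup.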
Spark-submit --files option help
Hi all,

I am trying to pass some configuration files via the spark-submit command in cluster mode.
From the logs I can see the files are transferred to each executor.
But how do I build the absolute path of the file in the code?

Can anyone please guide me on this with some references?

Appreciate your help on this.

Thanks and regards
Rahul
Re: Contributing to JIRA Maintenance
Thank you!
Re: Contributing to JIRA Maintenance
Great work and thanks for your JIRA maintenance and this heads-up (sorry for my late reply...)
Yea, I noticed that I didn't take much time recently on the JIRA side.
So, I will take more care about it from now on for the community's help.

On Wed, Jul 29, 2020 at 10:52 AM Hyukjin Kwon wrote:

> Yeah, to contribute to JIRA maintenance, it does not need a lot of codes
> given my experience.
>
> Just to share my own story:
> 4 years ago when I was one of contributors, I have been looking for many
> other ways around to contribute to Spark. I noticed Sean was making
> exceptional efforts in the JIRA maintenance contribution - he monitored
> JIRAs basically 24/7. I started to make sustained efforts and contributions
> there when he asked some help in the dev mailing list. I also did some
> code work but my JIRA maintenance contribution is also one of the
> important community activities.
> This was appropriately considered and recognised by other PMCs.
>
> The commit bit. Probably the ideal case is to have contributions in
> balance across many aspects. But if somebody makes a lot of sustained
> efforts and contributions to one aspect, this can be also the case we
> take into account. Yeah, I think Shane is a good example.
>
> On Wed, Jul 29, 2020 at 2:57 AM, Rohit Mishra wrote:
>
>> Thanks Sean for your elaborate and valuable explanation. I will look into
>> it from tomorrow and will reach out if required.
>>
>> Have a good day.
>>
>> Regards,
>> Rohit Mishra
>>
>> On Tue, 28 Jul 2020 at 11:20 PM, Sean Owen wrote:
>>
>>> To help with JIRA, I don't think you need to know a lot about the code
>>> structure. I think we're talking about more basic triage, like, is it
>>> a question that should go to the mailing list instead? is there enough
>>> detail to understand it at all? is it tagged with a few appropriate
>>> components, does its affected version make sense? Finding duplicate
>>> issues is hard but quite valuable if you can identify related issues
>>> and mark them.
>>>
>>> I can also tell you about using the JIRA Client to search for issues
>>> that don't make much sense, like, open and targeting a released
>>> version.
>>>
>>> Actually I think anyone can modify issues in JIRA, so you don't need
>>> special permission. You could consult with me or Hyukjin or dev@ after
>>> making a few changes to check if they're on the right track.
>>>
>>> iss...@spark.apache.org (IIRC) gets a copy of all the JIRA emails
>>> about changes. I don't know if it's that useful to subscribe to.
>>>
>>> Documenting the code structure - might be kind of hard in any detail,
>>> but if you put together a doc that is useful and doesn't require a lot
>>> of maintenance, that gives a good overview, we could consider adding
>>> that to the developer docs.
>>>
>>> On Tue, Jul 28, 2020 at 12:16 PM Rohit Mishra wrote:
>>> >
>>> > Hello All,
>>> >
>>> > I have recently joined the Dev mailing list to help the community.
>>> > Since I am in my attempt to understand the code base before
>>> > contributing, I think looking into Jira maintenance will be a good
>>> > way to help. I will start looking into it. Do I need anyone’s approval?
>>> >
>>> > In case I need any help in the beginning can I mail here or there is a
>>> > separate mailing id related to Jira maintenance?
>>> >
>>> > Just a trivial question- Do we have any document to give an overview
>>> > of the code structure for newbie like me, I can create one if there
>>> > isn’t any.
>>> >
>>> > Thanks,
>>> > Rohit Mishra
>>> >
>>> > On Tue, 28 Jul 2020 at 6:46 PM, Sean Owen wrote:
>>> >>
>>> >> Thanks for doing this - and I will say this is a great way for anyone
>>> >> out there to contribute directly to the project. Issue trackers need
>>> >> maintenance too. It's not that hard to spot basic problems with JIRAs
>>> >> and request fixes, as a way to engage the reporter usefully.
>>> >>
>>> >> I triage PRs but rarely look at JIRAs anymore, just because the volume
>>> >> and noise level is larger. But it is important.
>>> >>
>>> >> On Mon, Jul 27, 2020 at 10:12 PM Hyukjin Kwon wrote:
>>> >> >
>>> >> > Hi all,
>>> >> >
>>> >> > I would like to ask for some help about JIRA maintenance
>>> >> > contributions in Apache Spark.
>>> >> > I tend to see less and less people active in JIRA maintenance
>>> >> > contributions.
>>> >> >
>>> >> > I have regularly checked all JIRAs and monitored them continuously
>>> >> > for the last 4 years.
>>> >> > For the last week, I didn't have time to take a look, and I felt
>>> >> > frustrated that there are many JIRAs that look clearly needing
>>> >> > action. Here are the examples only from the last week:
>>> >> >
>>> >> > Exact duplication:
>>> >> > Resolve one and link another one as a duplicate.
>>> >> > - https://issues.apache.org/jira/browse/SPARK-32370
>>> >> > - https://issues.apache.org/jira/browse/SPARK-32369
>>> >> >
>>> >> > Different languages:
>>> >> > Ask English translations which dev people use to communicate.
>>> >> > If the