Re: ASF board report for November 2019

2019-11-11 Thread Matei Zaharia
Good catch, thanks. > On Nov 11, 2019, at 6:46 PM, Jungtaek Lim > wrote: > > nit: - The latest committer was added on Sept 4th, 2019 (Dongjoon Hyun). <= > s/committer/PMC member > > Thanks, > Jungtaek Lim (HeartSaVioR) > > On Tue, Nov 12, 2019 at 11:38 AM Matei Zaharia

Re: Ask for ARM CI for spark

2019-11-11 Thread Tianhua huang
Hi all, Spark ARM jobs have been building for some time, and there are now two jobs[1], spark-master-test-maven-arm and spark-master-test-python-arm, we can

Re: Is RDD thread safe?

2019-11-11 Thread Weichen Xu
Hi Chang, RDDs and DataFrames are immutable and lazily computed. They are thread safe. Thanks! On Tue, Nov 12, 2019 at 12:31 PM Chang Chen wrote: > Hi all > > I have a case where I need to cache a source RDD, and then create different > DataFrames from it in different threads to accelerate queries. > > I

Re: Adding JIRA ID as the prefix for the test case name

2019-11-11 Thread Hyukjin Kwon
In a few days, I will write this into our guidelines, probably after rewording it a bit: 1. Add a prefix to a test name when a PR adds a couple of tests. 2. Use the "SPARK-: test name" format. Please let me know if you have a different opinion about what/when to write the JIRA ID as the
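The naming rule proposed above can be checked mechanically. The following is only a rough sketch, not part of the thread or of Spark's tooling: it assumes the prefix is the literal "SPARK-" followed by a numeric JIRA ID, a colon, a space, and a description. The names `JIRA_PREFIX`, `has_jira_prefix`, and `jira_id` are hypothetical, and the ID 12345 is a placeholder rather than a real ticket reference.

```python
import re

# Assumed shape of the prefix: "SPARK-<digits>: <description>".
# The exact format is not spelled out in the thread; this is a guess.
JIRA_PREFIX = re.compile(r"^SPARK-(\d+): \S")


def has_jira_prefix(test_name: str) -> bool:
    """Return True if the test name starts with a JIRA ID prefix."""
    return JIRA_PREFIX.match(test_name) is not None


def jira_id(test_name: str):
    """Return the JIRA ID (e.g. 'SPARK-12345') or None if there is no prefix."""
    match = JIRA_PREFIX.match(test_name)
    return f"SPARK-{match.group(1)}" if match else None


if __name__ == "__main__":
    # Placeholder test names, for illustration only.
    for name in ["SPARK-12345: join with null keys", "join with null keys"]:
        print(name, "->", has_jira_prefix(name), jira_id(name))
```

A check like this could run as a lint step over test names, so that the rule does not depend on reviewers noticing a missing prefix.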

Is RDD thread safe?

2019-11-11 Thread Chang Chen
Hi all, I have a case where I need to cache a source RDD, and then create different DataFrames from it in different threads to accelerate queries. I know that SparkSession is thread safe (https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure whether RDD is thread safe or not. Thanks

Re: ASF board report for November 2019

2019-11-11 Thread Jungtaek Lim
nit: - The latest committer was added on Sept 4th, 2019 (Dongjoon Hyun). <= s/committer/PMC member Thanks, Jungtaek Lim (HeartSaVioR) On Tue, Nov 12, 2019 at 11:38 AM Matei Zaharia wrote: > Hi all, > > It’s time to send our quarterly report to the ASF board. Here is my draft > — please feel

ASF board report for November 2019

2019-11-11 Thread Matei Zaharia
Hi all, It’s time to send our quarterly report to the ASF board. Here is my draft — please feel free to suggest any changes. Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python and R as well as a

Re: Adding JIRA ID as the prefix for the test case name

2019-11-11 Thread Gengliang
+1 for making it a guideline. This is helpful when the test cases are moved to a different file. On Mon, Nov 11, 2019 at 3:23 PM Takeshi Yamamuro wrote: > +1 for having that consistent rule in test names. > This is a trivial problem, though; I think documenting this rule in the > contribution

Re: Adding JIRA ID as the prefix for the test case name

2019-11-11 Thread Takeshi Yamamuro
+1 for having that consistent rule in test names. This is a trivial problem, though; I think documenting this rule in the contribution guide might reduce reviewer overhead a little. Bests, Takeshi On Tue, Nov 12, 2019 at 1:46 AM Hyukjin Kwon wrote: > Hi all, > > Maybe it's not

Adding JIRA ID as the prefix for the test case name

2019-11-11 Thread Hyukjin Kwon
Hi all, Maybe it's not a big deal, but it has caused some confusion from time to time in the Spark dev list and community. I think it's time to discuss when, and in which format, to add a JIRA ID as a prefix to the test case name in Scala test cases. Currently we have many test case names with prefixes as below:

Does StreamingSymmetricHashJoinExec work with watermark? I don't think so

2019-11-11 Thread Jacek Laskowski
Hi, I think the watermark does not work for StreamingSymmetricHashJoinExec because of the following: 1. leftKeys and rightKeys have no spark.watermarkDelayMs metadata entry at planning [1]. 2. Since the left and right keys have no watermark delay at planning, the code [2] won't find it at execution. Is