Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-16 Thread Sean Owen
the order in which Maven executes the test cases in > the `connect` module. > > > > I have submitted a backport PR > <https://github.com/apache/spark/pull/45141> to branch-3.5, and if > necessary, we can merge it to fix this test issue. > > > > Jie Yang > > >

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-15 Thread Sean Owen
Is anyone seeing this Spark Connect test failure? Then again, I have some weird issue with this env that always fails 1 or 2 tests that nobody else can replicate. - Test observe *** FAILED *** == FAIL: Plans do not match === !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS

Re: Removing Kinesis in Spark 4

2024-01-20 Thread Sean Owen
I'm not aware of much usage, but that doesn't mean a lot. FWIW, in the past month or so, the Kinesis docs page got about 700 views, compared to about 1400 for Kafka

Re: Regression? - UIUtils::formatBatchTime - [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter

2024-01-08 Thread Sean Owen
Agreed, that looks wrong. From the code, it seems that "timezone" is only used for testing, though apparently no test caught this. I'll submit a PR to patch it in any event: https://github.com/apache/spark/pull/44619 On Mon, Jan 8, 2024 at 1:33 AM Janda Martin wrote: > I think that >

Re: Should Spark 4.x use Java modules (those you define with module-info.java sources)?

2023-12-04 Thread Sean Owen
It already does. I think that's not the same idea? On Mon, Dec 4, 2023, 8:12 PM Almog Tavor wrote: > I think Spark should start shading it’s problematic deps similar to how > it’s done in Flink > > On Mon, 4 Dec 2023 at 2:57 Sean Owen wrote: > >> I am not sure we can con

Re: Should Spark 4.x use Java modules (those you define with module-info.java sources)?

2023-12-03 Thread Sean Owen
I am not sure we can control that - the Scala _x.y suffix has particular meaning in the Scala ecosystem for artifacts and thus the naming of .jar files. And we need to work with the Scala ecosystem. What can't handle these files, Spring Boot? Does it somehow assume the .jar file name relates to

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Sean Owen
I think it's the same, and always has been - yes you don't have a guaranteed ordering unless an operation produces a specific ordering. Could be the result of order by, yes; I believe you would be guaranteed that reading input files results in data in the order they appear in the file, etc. 1:1

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-31 Thread Sean Owen
I think you're talking past Hyukjin here. I think the response is: none of that is managed by Pyspark now, and this proposal does not change that. Your current interpreter and environment is used to execute the stored procedure, which is just Python code. It's on you to bring an environment that

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-30 Thread Sean Owen
to verify? > > > > Thanks, > > Jie Yang > > > > *From**: *Dipayan Dev > *Date**: *Wednesday, August 30, 2023 17:01 > *To**: *Sean Owen > *Cc**: *Yuanjian Li , Spark dev list < > dev@spark.apache.org> > *Subject**: *Re: [VOTE] Release Apache Spark 3.5.0 (RC3) > >

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-29 Thread Sean Owen
It looks good except that I'm getting errors running the Spark Connect tests at the end (Java 17, Scala 2.13) It looks like I missed something necessary to build; is anyone getting this? [ERROR] [Error]

Re: [VOTE] Release Apache Spark 3.5.0 (RC2)

2023-08-19 Thread Sean Owen
+1 this looks better to me. Works with Scala 2.13 / Java 17 for me. On Sat, Aug 19, 2023 at 3:23 AM Yuanjian Li wrote: > Please vote on releasing the following candidate(RC2) as Apache Spark > version 3.5.0. > > The vote is open until 11:59pm Pacific time Aug 23th and passes if a > majority +1

Re: Question about ARRAY_INSERT between Spark and Databricks

2023-08-13 Thread Sean Owen
There shouldn't be any difference here. In fact, I get the results you list for 'spark' from Databricks. It's possible the difference is a bug fix along the way that is in the Spark version you are using locally but not in the DBR you are using. But, yeah seems to work as you say. If you're

What else could be removed in Spark 4?

2023-08-07 Thread Sean Owen
While we're noodling on the topic, what else might be worth removing in Spark 4? For example, looks like we're finally hitting problems supporting Java 8 through 21 all at once, related to Scala 2.13.x updates. It would be reasonable to require Java 11, or even 17, as a baseline for the

Re: [VOTE] Release Apache Spark 3.5.0 (RC1)

2023-08-06 Thread Sean Owen
Aug 5, 2023 at 5:42 PM Sean Owen wrote: > I'm still testing other combinations, but it looks like tests fail on Java > 17 after building with Java 8, which should be a normal supported > configuration. > This is described at https://github.com/apache/spark/pull/41943 and looks > l

Re: [VOTE] Release Apache Spark 3.5.0 (RC1)

2023-08-05 Thread Sean Owen
I'm still testing other combinations, but it looks like tests fail on Java 17 after building with Java 8, which should be a normal supported configuration. This is described at https://github.com/apache/spark/pull/41943 and looks like it is resolved by moving back to Scala 2.13.8 for now. Unless

Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Sean Owen
+1 I think that porting the package 'as is' into Spark is probably worthwhile. That's relatively easy; the code is already pretty battle-tested and not that big and even originally came from Spark code, so is more or less similar already. One thing it never got was DSv2 support, which means XML

Re: Spark 3.0.0 EOL

2023-07-26 Thread Sean Owen
There aren't "LTS" releases, though you might expect the last 3.x release will see maintenance releases longer. See end of https://spark.apache.org/versioning-policy.html On Wed, Jul 26, 2023 at 3:56 AM Manu Zhang wrote: > Will Apache Spark 3.5 be a LTS version? > > Thanks, > Manu > > On Mon,

Re: [VOTE] Apache Spark PMC asks Databricks to differentiate its Spark version string

2023-06-16 Thread Sean Owen
On Fri, Jun 16, 2023 at 3:58 PM Dongjoon Hyun wrote: > I started the thread about already publicly visible version issues > according to the ASF PMC communication guideline. It's no confidential, > personal, or security-related stuff. Are you insisting this is confidential? > Discussion about a

Re: [VOTE] Apache Spark PMC asks Databricks to differentiate its Spark version string

2023-06-16 Thread Sean Owen
As we noted in the last thread, this discussion should have been on private@ to begin with, but, the ship has sailed. You are suggesting that non-PMC members vote on whether the PMC has to do something? No, that's not how anything works here. It's certainly the PMC that decides what to put in the

Re: [VOTE] Apache Spark PMC asks Databricks to differentiate its Spark version string

2023-06-16 Thread Sean Owen
What does a vote on dev@ mean? Did you mean this for the PMC list? Dongjoon - this offers no rationale about "why". The more relevant thread begins here: https://lists.apache.org/thread/k7gr65wt0fwtldc7hp7bd0vkg1k93rrb but it likewise never got to connecting a specific observation to policy.

Re: JDK version support policy?

2023-06-08 Thread Sean Owen
in Spark 4, just > thought I'd bring this issue to your attention. > > Best Regards, Martin > -- > *From:* Jungtaek Lim > *Sent:* Wednesday, June 7, 2023 23:19 > *To:* Sean Owen > *Cc:* Dongjoon Hyun ; Holden Karau < > hol...@pigscanfly.ca>

Re: JDK version support policy?

2023-06-07 Thread Sean Owen
2:42:19 yangjie01 wrote: >> > +1 on dropping Java 8 in Spark 4.0, and I even hope Spark 4.0 can only >> support Java 17 and the upcoming Java 21. >> > >> > From: Denny Lee >> > Date: Wednesday, June 7, 2023 07:10 >> > To: Sean Owen >> > Cc: Dav

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Sean Owen
Hi Dongjoon, I think this conversation is not advancing anymore. I personally consider the matter closed unless you can find other support or respond with more specifics. While this perhaps should be on private@, I think it's not wrong as an instructive discussion on dev@. I don't believe you've

Re: ASF policy violation and Scala version issues

2023-06-07 Thread Sean Owen
(With consent, shall we move this to the PMC list?) No, I don't think that's what this policy says. First, could you please be more specific here? Why do you think a certain release is at odds with this? Because so far you've mentioned, I think, not taking a Scala maintenance release update.

Re: JDK version support policy?

2023-06-06 Thread Sean Owen
I haven't followed this discussion closely, but I think we could/should drop Java 8 in Spark 4.0, which is up next after 3.5? On Tue, Jun 6, 2023 at 2:44 PM David Li wrote: > Hello Spark developers, > > I'm from the Apache Arrow project. We've discussed Java version support > [1], and

Re: ASF policy violation and Scala version issues

2023-06-05 Thread Sean Owen
I think the issue is whether a distribution of Spark is so materially different from OSS that it causes problems for the larger community of users. There's a legitimate question of whether such a thing can be called "Apache Spark + changes", as describing it that way becomes meaningfully

Re: ASF policy violation and Scala version issues

2023-06-05 Thread Sean Owen
On Mon, Jun 5, 2023 at 12:01 PM Dongjoon Hyun wrote: > 1. For the naming, yes, but the company should use different version > numbers instead of the exact "3.4.0". As I shared the screenshot in my > previous email, the company exposes "Apache Spark 3.4.0" exactly because > they build their

Re: ASF policy violation and Scala version issues

2023-06-05 Thread Sean Owen
1/ Regarding naming - I believe releasing "Apache Foo X.Y + patches" is acceptable, if it is substantially Apache Foo X.Y. This is common practice for downstream vendors. It's fair nominative use. The principle here is consumer confusion. Is anyone substantially misled? Here I don't think so. I

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-29 Thread Sean Owen
It does seem risky; there are still likely libs out there that don't cross compile for 2.13. I would make it the default at 4.0, myself. On Mon, May 29, 2023 at 7:16 PM Hyukjin Kwon wrote: > While I support going forward with a higher version, actually using Scala > 2.13 by default is a big

Re: Spark 3.4.0 with Hadoop2.7 cannot be downloaded

2023-04-20 Thread Sean Owen
We just removed it now, yes. On Thu, Apr 20, 2023 at 9:08 AM Emil Ejbyfeldt wrote: > Hi, > > I think this is expected as it was dropped from the release process in > https://issues.apache.org/jira/browse/SPARK-40651 > > Also I don't see a Hadoop2.7 option when selecting Spark 3.4.0 on >

Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-10 Thread Sean Owen
+1 from me On Sun, Apr 9, 2023 at 7:19 PM Dongjoon Hyun wrote: > I'll start with my +1. > > I verified the checksum, signatures of the artifacts, and documentations. > Also, ran the tests with YARN and K8s modules. > > Dongjoon. > > On 2023/04/09 23:46:10 Dongjoon Hyun wrote: > > Please vote on

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-08 Thread Sean Owen
+1 from me, same result as last time. On Fri, Apr 7, 2023 at 6:30 PM Xinrong Meng wrote: > Please vote on releasing the following candidate(RC7) as Apache Spark > version 3.4.0. > > The vote is open until 11:59pm Pacific time *April 12th* and passes if a > majority +1 PMC votes are cast, with a

Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-03-30 Thread Sean Owen
+1 same result from me as last time. On Thu, Mar 30, 2023 at 3:21 AM Xinrong Meng wrote: > Please vote on releasing the following candidate(RC5) as Apache Spark > version 3.4.0. > > The vote is open until 11:59pm Pacific time *April 4th* and passes if a > majority +1 PMC votes are cast, with a

Re: [VOTE] Release Apache Spark 3.4.0 (RC3)

2023-03-09 Thread Sean Owen
not in the AS-IS commit log status because it's screwed already > as Emil wrote. > Did you check the branch-3.2 commit log, Sean? > > Dongjoon. > > > On Thu, Mar 9, 2023 at 11:42 AM Sean Owen wrote: > >> We can just push the tags onto the branches as needed right? No need to >>

Re: [VOTE] Release Apache Spark 3.4.0 (RC3)

2023-03-09 Thread Sean Owen
We can just push the tags onto the branches as needed right? No need to roll a new release On Thu, Mar 9, 2023, 1:36 PM Dongjoon Hyun wrote: > Yes, I also confirmed that the v3.4.0-rc3 tag is invalid. > > I guess we need RC4. > > Dongjoon. > > On Thu, Mar 9, 2023 at 7:13 AM Emil Ejbyfeldt >

Re: [VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-03 Thread Sean Owen
path get set up differently when running via > SBT vs. Maven? > > On Thu, Mar 2, 2023 at 5:37 PM Sean Owen wrote: > >> Thanks, that's good to know. The workaround (deleting the thriftserver >> target dir) works for me. Who knows? >> >> But I'm als

Re: [VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-02 Thread Sean Owen
/sbt/issues/6183>. > > One thing that I did find to help was to > delete sql/hive-thriftserver/target between building Spark and running the > tests. This helps in my builds where the issue only occurs during the > testing phase and not during the initial build phase, but of cours

Re: [VOTE] Release Apache Spark 3.4.0 (RC2)

2023-03-02 Thread Sean Owen
Has anyone seen this behavior -- I've never seen it before. The Hive thriftserver module for me just goes into an infinite loop when running tests: ... [INFO] done compiling [INFO] compiling 22 Scala sources and 24 Java sources to

Re: [Question] LimitedInputStream license issue in Spark source.

2023-03-01 Thread Sean Owen
Right, it contains ALv2 licensed code attributed to two authors - some is from Guava, some is from Apache Spark contributors. I thought this is how we should handle this. It's not feasible to go line by line and say what came from where. On Wed, Mar 1, 2023 at 1:33 AM Dongjoon Hyun wrote: > May

Re: [DISCUSS] Show Python code examples first in Spark documentation

2023-02-22 Thread Sean Owen
FWIW I agree with this. On Wed, Feb 22, 2023 at 2:59 PM Allan Folting wrote: > Hi all, > > I would like to propose that we show Python code examples first in the > Spark documentation where we have multiple programming language examples. > An example is on the Quick Start page: >

Re: [DISCUSS] Make release cadence predictable

2023-02-15 Thread Sean Owen
wait for next releases more easily. > > In addition, I want to add the first RC1 date requirement because RC1 > always did a great job for us. > > I guess `branch-cut + 1M (no later than 1month)` could be the reasonable > deadline. > > Thanks, > Dongjoon. > > > O

Re: [DISCUSS] Make release cadence predictable

2023-02-14 Thread Sean Owen
I'm fine with shifting to a stricter cadence-based schedule. Sometimes, it'll mean some significant change misses a release rather than delays it. If people are OK with that discipline, sure. A hard 6-month cycle would mean the minor releases are more frequent and have less change in them. That's

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Sean Owen
Agree, just, if it's such a tiny change, and it actually fixes the issue, maybe worth getting that into 3.3.x. I don't feel strongly. On Mon, Feb 13, 2023 at 11:19 AM L. C. Hsieh wrote: > If it is not supported in Spark 3.3.x, it looks like an improvement at > Spark 3.4. > For such cases we

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Sean Owen
? When I use the latest >>> Python 3.11, I can reproduce similar test failures (43 tests of sql module >>> fail), but when I use python 3.10, they will succeed >>> >>> >>> >>> YangJie >>> >>> >>> >>> *From**: *

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-11 Thread Sean Owen
+1 The tests and all results were the same as ever for me (Java 11, Scala 2.13, Ubuntu 22.04) I also didn't see that issue ... maybe somehow locale related? which could still be a bug. On Sat, Feb 11, 2023 at 8:49 PM L. C. Hsieh wrote: > Thank you for testing it. > > I was going to run it again

Re: Building Spark to run PySpark Tests?

2023-01-19 Thread Sean Owen
enJDK 64-Bit Server VM Homebrew (build 11.0.17+0, mixed mode) > > > > OS > > Ventura 13.1 (22C65) > > > Best, > > > Adam Chhina > > On Jan 18, 2023, at 6:50 PM, Sean Owen wrote: > > Release _branches_ are tested as commits arrive to the branch, ye

Re: Can you create an apache jira account for me? Thanks very much!

2023-01-19 Thread Sean Owen
I can help offline. Send me your preferred JIRA user name. On Thu, Jan 19, 2023 at 7:12 AM Wei Yan wrote: > When I tried to sign up through this site: > https://issues.apache.org/jira/secure/Signup!default.jspa > I got an error message:"Sorry, you can't sign up to this Jira site at the > moment

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Sean Owen
java_server > self.socket.connect((self.java_address, self.java_port)) > ConnectionRefusedError: [Errno 61] Connection refused > > ------ > Ran 7 tests in 12.950s > > FAILED (errors=7) > sys:1: ResourceWarning: unclosed f

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Sean Owen
b spark-321 v3.2.1 > > with > git clone --branch branch-3.2 https://github.com/apache/spark.git > This will give you branch 3.2 as today, what I suppose you call upstream > > https://github.com/apache/spark/commits/branch-3.2 > and right now all tests in github action are passed

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Sean Owen
Never seen those, but it's probably a difference in pandas, numpy versions. You can see the current CICD test results in GitHub Actions. But, you want to use release versions, not an RC. 3.2.1 is not the latest version, and it's possible the tests were actually failing in the RC. On Wed, Jan 18,

Re: [VOTE] Release Spark 3.2.3 (RC1)

2022-11-15 Thread Sean Owen
+1 from me, at least from my testing. Java 8 + Scala 2.12 and Java 8 + Scala 2.13 worked for me, and I didn't see a test hang. I am testing with Python 3.10 FWIW. On Tue, Nov 15, 2022 at 6:37 AM Yang,Jie(INF) wrote: > Hi, all > > > > I test v3.2.3 with following command: > > > > ``` > >

Re: CVE-2022-42889

2022-10-27 Thread Sean Owen
ement about this from > Spark? > > We weren’t able to find references to 2022-42889 here: > https://spark.apache.org/security.html (likely because Spark determined > it is not affected?) > > > > *From:* Sean Owen > *Sent:* Thursday, October 27, 2022 10:27 AM

Re: CVE-2022-42889

2022-10-27 Thread Sean Owen
Probably a few months between maintenance releases. It does not appear to affect Spark, however. On Thu, Oct 27, 2022 at 9:24 AM Pastrana, Rodrigo (RIS-BCT) wrote: > Hello, > > This issue (SPARK-40801) > which addresses > CVE-2022-42889

Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread Sean Owen
OK by me, if someone is willing to drive it. On Tue, Oct 18, 2022 at 11:47 AM Chao Sun wrote: > Hi All, > > It's been more than 3 months since 3.2.2 (tagged at Jul 11) was > released There are now 66 patches accumulated in branch-3.2, including > 2 correctness issues. > > Is it a good time to

Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-17 Thread Sean Owen
+1 from me, same as last time On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.3.1. > > The vote is open until 11:59pm Pacific time October 21th and passes if a > majority +1 PMC votes are cast, with a minimum of

Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-10-11 Thread Sean Owen
ec;2.3.7!hive-exec.jar >>> >>> I worked around it by adding them locally explicitly - we should >>> probably add them as test dependency ? >>> Not sure if this changed in this release though (I had cleaned my local >>> .m2 recently) >>> >

Re: Dropping Apache Spark Hadoop2 Binary Distribution?

2022-10-05 Thread Sean Owen
I'm OK with this. It simplifies maintenance a bit, and specifically may allow us to finally move off of the ancient version of Guava (?) On Mon, Oct 3, 2022 at 10:16 PM Dongjoon Hyun wrote: > Hi, All. > > I'm wondering if the following Apache Spark Hadoop2 Binary Distribution > is still used by

Re: [VOTE] Release Spark 3.3.1 (RC2)

2022-09-28 Thread Sean Owen
+1 from me, same result as last RC. On Wed, Sep 28, 2022 at 12:21 AM Yuming Wang wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.3.1. > > The vote is open until 11:59pm Pacific time October 3th and passes if a > majority +1 PMC votes are cast, with a

Re: Why are hash functions seeded with 42?

2022-09-26 Thread Sean Owen
aps it’s a > nod to Douglas Adams (author of The Hitchhiker’s Guide to the Galaxy). > > > https://news.mit.edu/2019/answer-life-universe-and-everything-sum-three-cubes-mathematics-0910 > > On Sep 26, 2022, at 16:59, Sean Owen wrote: > > > OK, it came to my atten

Why are hash functions seeded with 42?

2022-09-26 Thread Sean Owen
OK, it came to my attention today that hash functions in spark, like xxhash64, actually always seed with 42: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L655 This is an issue if you want the hash of some value in
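
A minimal sketch of why a fixed seed matters for any seeded hash. It uses Python's zlib.crc32 as a stand-in, which is an assumption made only for illustration: it is not xxhash64 and will not reproduce Spark's values. The helper name seeded_hash and the sample input are hypothetical. The point is only that the same input hashed with a different starting value gives a different result, so another implementation would presumably need to start from 42 as well to agree with Spark.

```python
import zlib

SPARK_DEFAULT_SEED = 42  # the fixed seed mentioned in the message above


def seeded_hash(data: bytes, seed: int) -> int:
    """Stand-in seeded hash: CRC-32 started from `seed` (NOT xxhash64)."""
    return zlib.crc32(data, seed)


value = b"spark"  # hypothetical sample input
with_42 = seeded_hash(value, SPARK_DEFAULT_SEED)
with_0 = seeded_hash(value, 0)

# Same input, different seed: the hash values differ.
print(with_42 != with_0)
# Same input, same seed: the hash value is reproducible.
print(with_42 == seeded_hash(value, SPARK_DEFAULT_SEED))
```

Both comparisons print True; the actual numeric values are specific to CRC-32 and say nothing about what Spark would return.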

Re: [VOTE] Release Spark 3.3.1 (RC1)

2022-09-17 Thread Sean Owen
+1 LGTM. I tested Scala 2.13 + Java 11 on Ubuntu 22.04. I get the same results as usual. On Sat, Sep 17, 2022 at 2:42 AM Yuming Wang wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.3.1. > > The vote is open until 11:59pm Pacific time September 22th and

Re: Time for Spark 3.3.1 release?

2022-09-14 Thread Sean Owen
Yeah we're not going to make convenience binaries for all possible combinations. It's a pretty good assumption that anyone moving to later Scala versions is also off old Hadoop versions. You can of course build the combo you like. On Wed, Sep 14, 2022 at 11:26 AM Denis Bolshakov wrote: >

Re: Support for spark-packages.org

2022-09-13 Thread Sean Owen
I think that in practice it is unsupported at this point. I'd just release your packages on GitHub / Maven Central / PyPI. On Tue, Sep 13, 2022 at 3:36 AM Enrico Minack wrote: > Hi devs, > > I understand that spark-packages.org is not associated with Apache and > Apache Spark, but hosted by

Re: Java object serialization error, java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible

2022-08-25 Thread Sean Owen
This suggests you have mixed two versions of Spark libraries. You probably packaged Spark itself in your Spark app? On Thu, Aug 25, 2022 at 4:56 PM Elliot Metsger wrote: > Elliot Metsger > 9:48 AM (7 hours ago) > to dev > Howdy folks, > > Relative newbie to Spark, and super new to Beam. (I've

Re: Update Spark 3.4 Release Window?

2022-07-20 Thread Sean Owen
I don't know any better than others when it will actually happen, though historically, it's more like 7-8 months between minor releases. I might therefore expect a release more like February 2023, and work backwards from there. Doesn't really matter, this is just a public guess and can be changed.

CVE-2022-33891: Apache Spark shell command injection vulnerability via Spark UI

2022-07-17 Thread Sean Owen
Severity: important Description: The Apache Spark UI offers the possibility to enable ACLs via the configuration option spark.acls.enable. With an authentication filter, this checks whether a user has access permissions to view or modify the application. If ACLs are enabled, a code path in

Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-11 Thread Sean Owen
Is anyone seeing this error? I'm on OpenJDK 8 on a Mac: # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x000101ca8ace, pid=11962, tid=0x1603 # # JRE version: OpenJDK Runtime Environment (8.0_322) (build

Re: Javascript Based UDFs

2022-06-27 Thread Sean Owen
t would only be initialized once > per executor right? Or would it be once per task? > > I also was worried that this would mean you end up paying a lot in SerDe > cost if you send each row over to the VM one by one? > > On Mon, Jun 27, 2022 at 10:02 PM Sean Owen wrote: > >> Rather tha

Re: Javascript Based UDFs

2022-06-27 Thread Sean Owen
Rather than reimplement a new UDF, why not indeed just use an embedded interpreter? If something can turn JavaScript into something executable, you can wrap that in a normal Java/Scala UDF and go. On Mon, Jun 27, 2022 at 10:42 PM Matt Hawes wrote: > Hi all, I'm thinking about trying to implement

Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Sean Owen
+1 still looks good, same as last results. On Thu, Jun 9, 2022 at 11:27 PM Maxim Gekk wrote: > Please vote on releasing the following candidate as > Apache Spark version 3.3.0. > > The vote is open until 11:59pm Pacific time June 14th and passes if a > majority +1 PMC votes are cast, with a

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-04 Thread Sean Owen
+1 looks good now on Scala 2.13 On Sat, Jun 4, 2022 at 9:51 AM Maxim Gekk wrote: > Please vote on releasing the following candidate as > Apache Spark version 3.3.0. > > The vote is open until 11:59pm Pacific time June 8th and passes if a > majority +1 PMC votes are cast, with a minimum of 3 +1

Re: [VOTE] Release Spark 3.3.0 (RC4)

2022-06-03 Thread Sean Owen
Ah yeah, I think it's this change from 15 hrs ago. That needs to be .toSeq: https://github.com/apache/spark/commit/4a0f0ff6c22b85cb0fc1eef842da8dbe4c90543a#diff-01813c3e2e933ed573e4a93750107f004a86e587330cba5e91b5052fa6ade2a5R146 On Fri, Jun 3, 2022 at 4:13 PM Sean Owen wrote: > In Scala 2

Re: [VOTE] Release Spark 3.3.0 (RC4)

2022-06-03 Thread Sean Owen
In Scala 2.13, I'm getting errors like this: analyzer should replace current_timestamp with literals *** FAILED *** java.lang.ClassCastException: class scala.collection.mutable.ArrayBuffer cannot be cast to class scala.collection.immutable.Seq (scala.collection.mutable.ArrayBuffer and

Re: [VOTE] Release Spark 3.3.0 (RC3)

2022-05-25 Thread Sean Owen
+1 works for me as usual, with Java 8 + Scala 2.12, Java 11 + Scala 2.13. On Tue, May 24, 2022 at 12:14 PM Maxim Gekk wrote: > Please vote on releasing the following candidate as > Apache Spark version 3.3.0. > > The vote is open until 11:59pm Pacific time May 27th and passes if a > majority +1

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-16 Thread Sean Owen
I'm still seeing failures related to the function registry, like: ExpressionsSchemaSuite: - Check schemas for expression examples *** FAILED *** 396 did not equal 398 Expected 396 blocks in result file but got 398. Try regenerating the result files. (ExpressionsSchemaSuite.scala:161) -

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-10 Thread Sean Owen
, could you move them out from the 3.3.0 milestone? >>> Otherwise, we cannot distinguish new real blocker issues from those >>> obsolete JIRA issues. >>> >>> Thanks, >>> Dongjoon. >>> >>> >>> On Thu, May 5, 2022 at 11:46 AM Adam Bi

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-05 Thread Sean Owen
I'm seeing test failures; is anyone seeing ones like this? This is Java 8 / Scala 2.12 / Ubuntu 22.04: - SPARK-37618: Sub dirs are group writable when removing from shuffle service enabled *** FAILED *** [OWNER_WRITE, GROUP_READ, GROUP_WRITE, GROUP_EXECUTE, OTHERS_READ, OWNER_READ,

Re: CVE-2020-13936

2022-05-05 Thread Sean Owen
This is a Velocity issue. Spark doesn't use it, although it looks like Avro does. From reading the CVE, I do not believe it would impact Avro's usage - velocity templates it may use for codegen aren't exposed that I know of. Is there a known relationship to Spark here? That is the key question in

Re: CVE-2021-22569

2022-05-04 Thread Sean Owen
Sure, did you search the JIRA? https://issues.apache.org/jira/browse/SPARK-38340 Does this affect Spark's usage of protobuf? Looks like it can't be updated to 3.x -- this is really not a dependency of Spark but underlying dependencies. Feel free to re-attempt a change that might work, at least

Re: CVE -2020-28458, How to upgrade datatables dependency

2022-04-16 Thread Sean Owen
FWIW here's an update to 1.10.25: https://github.com/apache/spark/pull/36226 On Wed, Apr 13, 2022 at 8:28 AM Sean Owen wrote: > You can see the files in > core/src/main/resources/org/apache/spark/ui/static - you can try dropping > in the new minified versions and see if the UI is OK.

Re: CVE-2021-38296: Apache Spark Key Negotiation Vulnerability - 2.4 Backport?

2022-04-14 Thread Sean Owen
It does affect 2.4.x, yes. 2.4.x was EOL a while ago, so there wouldn't be a new release of 2.4.x in any event. It's recommended to update instead, at least to 3.1.3. On Thu, Apr 14, 2022 at 12:07 PM Chris Nauroth wrote: > A fix for CVE-2021-38296 was committed and released in Apache Spark

Re: CVE -2020-28458, How to upgrade datatables dependency

2022-04-13 Thread Sean Owen
You can see the files in core/src/main/resources/org/apache/spark/ui/static - you can try dropping in the new minified versions and see if the UI is OK. You can open a pull request if it works to update it, in case this affects Spark. It looks like the smaller upgrade to 1.10.22 is also

Re: Spark 3.0.1 and spark 3.2 compatibility

2022-04-07 Thread Sean Owen
(Don't cross post please) Generally you definitely want to compile and test vs what you're running on. There shouldn't be many binary or source incompatibilities -- these are avoided in a major release where possible. So it may need no code change. But I would certainly recompile just on

Re: Deluge of GitBox emails

2022-04-04 Thread Sean Owen
to > comments on Jira. > > Turning off these GitBox emails should not have an impact on the usual > GitHub emails we are all already familiar with. > > > On Apr 4, 2022, at 9:47 AM, Sean Owen wrote: > > I think this must be related to the Gitbox migration that just happened. >

Re: Deluge of GitBox emails

2022-04-04 Thread Sean Owen
I think this must be related to the Gitbox migration that just happened. It does seem like I'm getting more emails - some are on PRs I'm attached to, but some I don't recognize. The thing is, I'm not yet clear if they duplicate the normal Github emails - that is if we turn them off do we have

Re: Tools for regression testing

2022-03-24 Thread Sean Owen
Hm, then what are you looking for besides all the tests in Spark? On Thu, Mar 24, 2022, 2:34 PM Mich Talebzadeh wrote: > Thanks > > I know what unit testing is. The question was not about unit testing. it > was specific to regression testing >

Re: [DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

2022-03-23 Thread Sean Owen
someone” should refer to us, and then it is no longer a > matter of “help”. It is a matter of “responsibility”, as you said. > > On Fri, Mar 18, 2022 at 10:15 PM Sean Owen wrote: > >> I think we can assume that someone upgrading Kafka will be responsible >> for thinking through the breaking

Re: [DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

2022-03-18 Thread Sean Owen
I think we can assume that someone upgrading Kafka will be responsible for thinking through the breaking changes. We can help by listing anything we know could affect Spark-Kafka usage and calling those out in a release note, for sure. I don't think we need to get into items that would affect

Re: bazel and external/

2022-03-17 Thread Sean Owen
has been baked in bazel since the >>> beginning and there is no plan from bazel devs to attempt to fix this >>> <https://github.com/bazelbuild/bazel/issues/4508#issuecomment-724055371> >>> . >>> >>> On Thu, Mar 17, 2022 at 7:52 PM Sean Owen wrote:

Re: bazel and external/

2022-03-17 Thread Sean Owen
Just checking - there is no way to tell bazel to look somewhere else for whatever 'external' means to it? It's a kinda big ugly change but it's not a functional change. If anything it might break some downstream builds that rely on the current structure too. But such is life for developers? I

Re: Apache Spark 3.3 Release

2022-03-03 Thread Sean Owen
I think it's fine to pursue the existing plan - code freeze in two weeks and try to close off key remaining issues. Final release pending on how those go, and testing, but fine to get the ball rolling. On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk wrote: > Hello All, > > I would like to bring on

Re: Which manufacturers' GPUs support Spark?

2022-02-16 Thread Sean Owen
Spark itself does not use GPUs, and is agnostic to what GPUs exist on a cluster, scheduled by the resource manager, and used by an application. In practice, virtually all GPU-related use cases (for deep learning for example) use CUDA, and this is NVIDIA-specific. Certainly, RAPIDS is from NVIDIA.

Re: [VOTE] Spark 3.1.3 RC4

2022-02-14 Thread Sean Owen
Looks good to me, same results as last RC, +1 On Mon, Feb 14, 2022 at 2:55 PM Holden Karau wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.1.3. > > The vote is open until Feb. 18th at 1 PM pacific (9 PM GMT) and passes if > a majority > +1 PMC votes are

Re: Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Sean Owen
> On Thu, Feb 10, 2022 at 5:37 PM Sean Owen wrote: > >> Yes I've seen this; the JVM stack size needs to be increased. I'm not >> sure if it's env specific (though you and I at least have hit it, I think >> others), or whether we need to change our build script. >

Re: Help needed to locate the csv parser (for Spark bug reporting/fixing)

2022-02-10 Thread Sean Owen
It starts in org.apache.spark.sql.execution.datasources.csv.CSVDataSource. Yes univocity is used for much of the parsing. I am not sure of the cause of the bug but it does look like one indeed. In one case the parser is asked to read all fields, in the other, to skip one. The pushdown helps

Re: Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Sean Owen
Yes I've seen this; the JVM stack size needs to be increased. I'm not sure if it's env specific (though you and I at least have hit it, I think others), or whether we need to change our build script. In the pom.xml file, find "-Xss..." settings and make them something like "-Xss4m", see if that

Re: [VOTE] Spark 3.1.3 RC3

2022-02-02 Thread Sean Owen
+1 from me, same result as the last release on my end. I think releasing 3.1.3 is fine, it's 7 months since 3.1.2. On Tue, Feb 1, 2022 at 7:12 PM Holden Karau wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.1.3. > > The vote is open until Feb. 4th at 5 PM

Re: Log4j upgrade in spark binary from 1.2.17 to 2.17.1

2022-01-31 Thread Sean Owen
(BTW you are sending to the Spark incubator list, and Spark has not been in incubation for about 7 years. Use u...@spark.apache.org) What update are you looking for? This has been discussed extensively on the Spark mailing list. Spark is not evidently vulnerable to this. 3.3.0 will include log4j

Re: Log likelhood in GeneralizedLinearRegression

2022-01-22 Thread Sean Owen
This exists in the evaluator MulticlassClassificationEvaluator instead (which can be used for binary), does that work? On Sat, Jan 22, 2022 at 4:36 AM Phillip Henry wrote: > Hi, > > As far as I know, there is no function to generate the log likelihood from > a GeneralizedLinearRegression model.
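
For reference, a small pure-Python sketch of the quantity involved, assuming the evaluator metric meant here is log loss (the negative mean log-likelihood of the true labels). The function name log_loss, the eps clamp, and the sample data are illustrative assumptions, not Spark's implementation or API.

```python
import math


def log_loss(probabilities, labels, eps=1e-15):
    """Negative mean log-likelihood of the true labels.

    probabilities: one list of per-class probabilities per row.
    labels: the true class index for each row.
    """
    total = 0.0
    for row, label in zip(probabilities, labels):
        # Clamp to avoid log(0) when a model assigns zero probability.
        p = min(max(row[label], eps), 1.0 - eps)
        total += math.log(p)
    return -total / len(labels)


# Hypothetical binary predictions: [P(class 0), P(class 1)] per row.
probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
labels = [0, 1, 0]
print(log_loss(probs, labels))
```

Multiplying the result by the number of rows and negating it gives the total log-likelihood of the labels under the predicted probabilities.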

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
+1 with same result as last time. On Thu, Jan 20, 2022 at 9:59 PM huaxin gao wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.2.1. The vote is open until 8:00pm Pacific time January 25 and passes if > a majority +1 PMC votes are cast, with a minimum of 3 +1
