Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
Jørgensen wrote: > Ok, but deleting users' data without them knowing it is never a good idea. > That's why I give this RC -1. > > On Sat, Jan 22, 2022 at 12:16 AM Sean Owen wrote: > >> (Bjorn - unless this is a regression, it would not block a release, even >> if it's a bug)

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
/jira/browse/SPARK-37981 for this > bug. > > > > > On Fri, Jan 21, 2022 at 9:45 PM Sean Owen wrote: > >> (Are you suggesting this is a regression, or is it a general question? >> here we're trying to figure out whether there are critical bugs introduced >> in 3.

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
(Are you suggesting this is a regression, or is it a general question? here we're trying to figure out whether there are critical bugs introduced in 3.2.1 vs 3.2.0) On Fri, Jan 21, 2022 at 1:58 PM Bjørn Jørgensen wrote: > Hi, I am wondering if it's a bug or not. > > I do have a lot of json

Re: Spark on Oracle available as an Apache licensed open source repo

2022-01-13 Thread Sean Owen
-user Thank you for this, but just a small but important point about the use of the Spark name. Please take a look at https://spark.apache.org/trademarks.html Specifically, this should reference "Apache Spark" at least once prominently with a link to the project. It's also advisable to avoid using

Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-11 Thread Sean Owen
+1 looks good to me. I ran all tests with scala 2.12 and 2.13 and had the same results as 3.2.0 testing. On Mon, Jan 10, 2022 at 12:10 PM huaxin gao wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.2.1. > > The vote is open until Jan. 13th at 12 PM PST (8 PM

Re: About contribution

2022-01-05 Thread Sean Owen
(There is no project chat) See https://spark.apache.org/contributing.html On Tue, Jan 4, 2022 at 11:42 PM Dennis Jung wrote: > Hello, I hope this is not a silly question. > (I couldn't find any chat room on spark project, so asking on mail) > > It has been about a year since using spark in

Re: ivy unit test case filing for Spark

2021-12-21 Thread Sean Owen
You would have to make it available? This doesn't seem like a spark issue. On Tue, Dec 21, 2021, 10:48 AM Pralabh Kumar wrote: > Hi Spark Team > > I am building a spark in VPN . But the unit test case below is failing. > This is pointing to ivy location which cannot be reached within VPN . Any

Re: spark jdbc

2021-12-17 Thread Sean Owen
I'm not sure we want to do that. If you "SELECT foo AS bar", then the column name is foo but the column label is bar. We probably want to return the latter. On Fri, Dec 17, 2021 at 9:07 AM Gary Liu wrote: > In spark sql jdbc module, it's using getColumnLabel to get column names > from the

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-15 Thread Sean Owen
Parquet or ORC have the necessary stats to make this fast too already, but only helps if you want the median of sorted data as stored on disk, rather than the general case. Not sure you can do better than roughly what a sort entails if you want the exact median On Wed, Dec 15, 2021, 8:56 AM Pol

Re: [MISC] Should we add .github/FUNDING.yml

2021-12-15 Thread Sean Owen
It might imply that this is a way to fund Spark alone, and it isn't. Probably no big deal either way but maybe not worth it. It won't be a mystery how to find and fund the ASF for the few orgs that want to, as compared to a small project On Wed, Dec 15, 2021, 8:34 AM Maciej wrote: > Hi All, > >

Re: Log4j 1.2.17 spark CVE

2021-12-14 Thread Sean Owen
FWIW here is the Databricks statement on it. Not the same as Spark but includes Spark of course. https://databricks.com/blog/2021/12/13/log4j2-vulnerability-cve-2021-44228-research-and-assessment.html Yes the question is almost surely more whether user apps are affected, not Spark itself. On

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Sean Owen
at 6:33 PM James Yu wrote: > Question: Spark use log4j 1.2.17, if my application jar contains log4j 2.x > and gets submitted to the Spark cluster. Which version of log4j gets > actually used during the Spark session? > -- > *From:* Sean Owen > *Sent:

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Sean Owen
AM Jörn Franke wrote: > Is it in any case appropriate to use log4j 1.x which is not maintained > anymore and has other security vulnerabilities which won’t be fixed anymore > ? > > On Dec 13, 2021 at 6:06 AM Sean Owen wrote: > > > Check the CVE - the log4j vulnerability a

Re: Log4j 1.2.17 spark CVE

2021-12-12 Thread Sean Owen
Check the CVE - the log4j vulnerability appears to affect log4j 2, not 1.x. There was mention that it could affect 1.x when used with JNDI or JMS handlers, but Spark does neither. (unless anyone can think of something I'm missing, but never heard or seen that come up at all in 7 years in Spark)

Re: Time for Spark 3.2.1?

2021-12-06 Thread Sean Owen
Always fine by me if someone wants to roll a release. It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a new release of those wouldn't hurt either, if any of our release managers have the time or inclination. 3.0.x is reaching unofficial end-of-life around now anyway. On Mon,

Re: Scala 3 support approach

2021-12-03 Thread Sean Owen
is needed for Scala 3. >> >> Bests, >> Dongjoon. >> >> On Sun, Oct 18, 2020 at 1:33 PM Koert Kuipers wrote: >> >>> i think scala 3.0 will be able to use libraries built with Scala 2.13 >>> (as long as they dont use macros) >>> >&g

Re: Jira components cleanup

2021-11-15 Thread Sean Owen
Done. Now let's see if that generated 86 update emails! On Mon, Nov 15, 2021 at 11:03 AM Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > > https://issues.apache.org/jira/projects/SPARK?selectedItem=com.atlassian.jira.jira-projects-plugin:components-page > > I think the "docs" component

Re: Update Spark 3.3 release window?

2021-10-27 Thread Sean Owen
Seems fine to me - as good a placeholder as anything. Would that be about time to call 2.x end-of-life? On Wed, Oct 27, 2021 at 9:36 PM Hyukjin Kwon wrote: > Hi all, > > Spark 3.2. is out. Shall we update the release window > https://spark.apache.org/versioning-policy.html? > I am thinking of

Re: [VOTE][RESULT] Release Spark 3.2.0 (RC7)

2021-10-17 Thread Sean Owen
> arising from such loss, damage or destruction. >>> >>> >>> >>> >>> >>> >>> >>> On Tue, 12 Oct 2021 at 08:15, Gengliang Wang wrote: >>> >>> The vote passes with 28 +1s (10 binding +1s). >>> Thanks to

Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-07 Thread Sean Owen
+1 again. Looks good in Scala 2.12, 2.13, and in Java 11. I note that the mem requirements for Java 11 tests seem to need to be increased but we're handling that separately. It doesn't really affect users. On Wed, Oct 6, 2021 at 11:49 AM Gengliang Wang wrote: > Please vote on releasing the

Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-29 Thread Sean Owen
+1 looks good to me as before, now that a few recent issues are resolved. On Tue, Sep 28, 2021 at 10:45 AM Gengliang Wang wrote: > Please vote on releasing the following candidate as > Apache Spark version 3.2.0. > > The vote is open until 11:59pm Pacific time September 30 and passes if a >

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Sean Owen
, 2021 at 6:58 PM Chao Sun wrote: > Hmm it may be related to the commit. Sean: how do I reproduce this? > > On Mon, Sep 27, 2021 at 4:56 PM Sean Owen wrote: > >> Another "is anyone else seeing this"? in compiling common/network-yarn: >> >> [ERROR] [Error]

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Sean Owen
Another "is anyone else seeing this"? in compiling common/network-yarn: [ERROR] [Error] /mnt/data/testing/spark-3.2.0/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:32: package com.google.common.annotations does not exist [ERROR] [Error]

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Sean Owen
Has anyone seen a StackOverflowError when running tests? It happens in compilation. I heard from another user who hit this earlier, and I had not, until just today testing this: [ERROR] ## Exception when compiling 495 sources to /mnt/data/testing/spark-3.2.0/sql/catalyst/target/scala-2.12/classes

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Sean Owen
Hm... it does just affect Mac OS (?) and only if you don't have JAVA_HOME set (which people often do set) and only affects build/mvn, vs built-in maven (which people often have installed). Only affects those building. I'm on the fence about whether it blocks 3.2.0, as it doesn't affect downstream

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Sean Owen
Hm yeah I tend to agree. See https://github.com/apache/spark/pull/33912 This _is_ a test-only dependency which makes it less of an issue. I'm guessing it's not in Maven as it's a small one-off utility; we _could_ just inline the ~100 lines of code in test code instead? On Tue, Sep 21, 2021 at

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-20 Thread Sean Owen
+1 from me, same results as the last RC from my side. The Scala 2.13 POM issue was resolved and the 2.13 build appears to be OK. On Sat, Sep 18, 2021 at 10:19 PM Gengliang Wang wrote: > Please vote on releasing the following candidate as > Apache Spark version 3.2.0. > > The vote is open until

Re: Adding Spark 4 to JIRA for targetted versions

2021-09-13 Thread Sean Owen
Sure, doesn't hurt to have a placeholder. On Mon, Sep 13, 2021, 5:32 PM Holden Karau wrote: > Hi Folks, > > I'm going through the Spark 3.2 tickets just to make sure we're not missing > anything important and I was wondering what folks' thoughts are on adding > Spark 4 so we can target API

Re: [SQL][AQE] Advice needed: a trivial code change with a huge reading impact?

2021-09-08 Thread Sean Owen
That does seem pointless. The body could just be .flatten()-ed to achieve the same result. Maybe it was just written that way for symmetry with the block above. You could open a PR to change it. On Wed, Sep 8, 2021 at 4:31 AM Jacek Laskowski wrote: > Hi Spark Devs, > > I'm curious what your

Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-01 Thread Sean Owen
This RC looks OK to me too, understanding we may need to have RC3 for the outstanding issues though. The issue with the Scala 2.13 POM is still there; I wasn't able to figure it out (anyone?), though it may not affect 'normal' usage (and is work-around-able in other uses, it seems), so may be

Re: Question using multiple partition for Window cumulative functions when partition is not specified.

2021-08-30 Thread Sean Owen
You just have 1 partition here because the input is so small. You can always repartition this further for parallelism. Is the issue that you're not partitioning the window itself, maybe? On Mon, Aug 30, 2021 at 12:59 AM Haejoon Lee wrote: > Hi all, > > I noticed that Spark uses only one

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-27 Thread Sean Owen
ean, > > I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will > help you out here. > > Cheers, > > Steve C > > On 27 Aug 2021, at 12:29 pm, Sean Owen wrote: > > OK right, you would have seen a different error otherwise. > > Yes profiles are

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Sean Owen
; org.scala-lang.modules > > scala-parallel-collections_${scala.binary.version} > > > > > which means this dependency will be missing for unit tests that create > SparkSessions from library code only, a technique inspired by Spark’s own > unit tests. > >

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Sean Owen
Did you run ./dev/change-scala-version.sh 2.13 ? that's required first to update POMs. It works fine for me. On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy wrote: > Hi all, > > Being adventurous I have built the RC1 code with: > > -Pyarn -Phadoop-3.2 -Pyarn -Phadoop-cloud -Phive-thriftserver

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-24 Thread Sean Owen
I think we'll need this revert: https://github.com/apache/spark/pull/33819 Between that and a few other minor but important issues I think I'd say -1 myself and ask for another RC. On Tue, Aug 24, 2021 at 1:01 PM Jacek Laskowski wrote: > Hi Yi Wu, > > Looks like the issue has got resolution:

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Sean Owen
ext.scala:2686) > at > org.bdgenomics.adam.ds.ADAMContext.loadVariants(ADAMContext.scala:3608) > at > org.bdgenomics.adam.ds.variant.VariantDatasetSuite.$anonfun$new$1(VariantDatasetSuite.scala:128) > at > org.bdgenomics.utils.misc.SparkFunSuite.$anonfun$sparkTest$1(SparkFunSui

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Sean Owen
So far, I've tested Java 8 + Scala 2.12, Scala 2.13 and the results look good per usual. Good to see Scala 2.13 artifacts!! Unless I've forgotten something we're OK for Scala 2.13 now, and Java 11 (and, IIRC, Java 14 works fine minus some very minor corners of the project's deps) I think we're

Re: Nabble archive is down

2021-08-17 Thread Sean Owen
Oh duh, right, much better idea! On Tue, Aug 17, 2021 at 2:56 PM Micah Kornfield wrote: > https://lists.apache.org/list.html?u...@spark.apache.org should be > searchable (although the UI is a little clunky). > > On Tue, Aug 17, 2021 at 12:52 PM Sean Owen wrote: > >>

Re: Nabble archive is down

2021-08-17 Thread Sean Owen
If the links are down and not evidently coming back, yeah let's change any website links. Probably best to depend on ASF resources foremost, but, the ASF archive isn't searchable: https://mail-archives.apache.org/mod_mbox/spark-user/ What about things like

Re: Access to Apache GitHub

2021-08-15 Thread Sean Owen
No, we can't give write access to Apache repos of course, not to anyone but committers. People contribute by opening pull requests. On Sun, Aug 15, 2021 at 10:11 AM Mich Talebzadeh wrote: > > Hi, > > > With reference to recent threads/discussions on creating ready-made > docker images for

Re: TreeNode.exists?

2021-08-11 Thread Sean Owen
If this is repeated a bunch of places in the code, sure, a utility method could be good. I think .find(x).isDefined is not even optimal - .exists(x) is a little easier and may be slightly faster? If you find a chance for refactoring, sure open a minor PR. On Wed, Aug 11, 2021 at 9:42 AM Jacek

Re: Spark 3: Resource Discovery

2021-07-17 Thread Sean Owen
At the moment this is really about discovering GPUs, so that the scheduler can schedule tasks that need to allocate whole GPUs. On Sat, Jul 17, 2021 at 5:14 PM ayan guha wrote: > Hi > > As I was going through Spark 3 config params, I noticed following group of > params. I could not understand

Re: Removing references to Master

2021-07-09 Thread Sean Owen
We maybe don't need to litigate this one again. I do think this point of view is legitimate, as is the point of view that 'master' is inextricably linked to 'master/slave' as an unfortunate term of art; it did not originate in reference to mastery of a skill but of another entity. Even if one

Re: [DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread Sean Owen
The downside here is that it would break downstream builds that set hadoop-3.2 if it's now called hadoop-3. That's not a huge deal. We can retain dummy profiles under the old names that do nothing, but that would be a quieter 'break'. I suppose this naming is only of importance to developers, who

Re: [VOTE] Release Spark 3.0.3 (RC1)

2021-06-17 Thread Sean Owen
+1 same result as ever. Signatures are OK, tags look good, tests pass. On Thu, Jun 17, 2021 at 5:11 AM Yi Wu wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.0.3. > > The vote is open until Jun 21th 3AM (PST) and passes if a majority +1 PMC > votes are cast,

How to think about SparkPullRequestBuilder-K8s?

2021-06-11 Thread Sean Owen
I find that somewhat often, the K8S PR builders will fail on a PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/ ... when the PR seems totally unrelated to K8S. I've kind of learned to ignore them in that case but that seems wrong. Are they just kind of flaky? am I

Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Sean Owen
+1 same result as in previous tests On Mon, May 24, 2021 at 1:14 AM Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.1.2. > > The vote is open until May 27th 1AM (PST) and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1

Re: Resolves too old JIRAs as incomplete

2021-05-19 Thread Sean Owen
I agree. Such old JIRAs are 99% obsolete. If anyone objects to a particular issue being closed, they can comment and we can reopen. It's a very reversible thing. There is value in keeping JIRA up to date with reality. On Wed, May 19, 2021 at 8:47 PM Takeshi Yamamuro wrote: > Hi, dev, > > As you

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Sean Owen
ybe leave as it > is? > > > Sean Owen-2 wrote > > Hm, yes I see it at > > > http://pool.sks-keyservers.net/pks/lookup?search=0x653c2301fea493ee=on=index > > but not on keyserver.ubuntu.com for some reason. > > What happens if you try to close it again, perhaps even manua

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Sean Owen
Hm, yes I see it at http://pool.sks-keyservers.net/pks/lookup?search=0x653c2301fea493ee=on=index but not on keyserver.ubuntu.com for some reason. What happens if you try to close it again, perhaps even manually in the UI there? I don't want to click it unless it messes up the workflow On Tue, May

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-10 Thread Sean Owen
It looks like the repository is "open" - it doesn't publish until "closed" after all artifacts are uploaded. Is that it? Otherwise +1 from me. On Mon, May 10, 2021 at 1:10 AM Liang-Chi Hsieh wrote: > Yea, I don't know why it happens. > > I remember RC1 also has the same issue. But RC2 and RC3

Re: [apache/spark-website] Update contributing to include code of conduct section (#335)

2021-05-04 Thread Sean Owen
Just FYI - proposed update to the CoC for the project. Looks reasonable to simply adopt the ASF code of conduct, per the PR. On Tue, May 4, 2021 at 2:02 AM Jungtaek Lim wrote: > I think the rationalization is great, but why not going through dev@ > mailing list? Many contributors are

Re: Should we add built in support for bouncy castle EC w/Kube

2021-04-29 Thread Sean Owen
I recall that Bouncy Castle has some crypto export implications. If it's in the distro then I think we'd have to update https://www.apache.org/licenses/exports/ to reflect that Bouncy Castle is again included in the product. But that's doable. Just have to recall how one updates that. On Thu, Apr

Re: [VOTE] Release Spark 2.4.8 (RC3)

2021-04-28 Thread Sean Owen
+1 from me too, same result as last time. On Wed, Apr 28, 2021 at 11:33 AM Liang-Chi Hsieh wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.8. > > The vote is open until May 4th at 9AM PST and passes if a majority +1 PMC > votes are cast, with a minimum of

Re: mvn auto-downloading on fresh clone

2021-04-21 Thread Sean Owen
I agree, it looks like the automatic redirector has changed behavior. It still sends you to an HTML page for the mirror, but previously that link would cause it to redirect straight to the download. While the script can fallback to archive.apache.org, it doesn't because the HTML downloads

Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-12 Thread Sean Owen
+1 same result as last RC for me. On Mon, Apr 12, 2021, 12:53 AM Liang-Chi Hsieh wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.8. > > The vote is open until Apr 15th at 9AM PST and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1

Re: [VOTE] Release Spark 2.4.8 (RC1)

2021-04-07 Thread Sean Owen
Looks good to me testing on Java 8, Hadoop 2.7, Ubuntu, with about all profiles enabled. I still get an odd failure in the Hive versions suite, but I keep seeing that in my env and think it's something odd about my setup. +1

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-14 Thread Sean Owen
I like koalas a lot. Playing devil's advocate, why not just let it continue to live as an add on? Usually the argument is it'll be maintained better in Spark but it's well maintained. It adds some overhead to maintaining Spark conversely. On the upside it makes it a little more discoverable. Are

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Sean Owen
; > Ya, exactly, we can release 2.4.8 as a normal release first and use 2.4.9 > as the EOL release. > > Since 2.4.7 was released almost 6 months ago, 2.4.8 is a little late in > terms of the cadence. > > Bests, > Dongjoon. > > > On Wed, Mar 3, 2021 at 10:55 AM Sean

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Sean Owen
For reference, 2.3.x was maintained from February 2018 (2.3.0) to Sep 2019 (2.3.4), or about 19 months. The 2.4 branch should probably be maintained longer than that, as the final 2.x branch. 2.4.0 was released in Nov 2018. A final release in, say, April 2021 would be about 30 months. That feels
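The month counts in this message can be checked with simple calendar arithmetic. A minimal sketch in Python, assuming only the release months quoted above; the day of the month is ignored, and `months_between` is a hypothetical helper name, not anything from the thread:

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Release months as quoted in the message; the day is a placeholder (1st).
branch_2_3 = months_between(date(2018, 2, 1), date(2019, 9, 1))
branch_2_4 = months_between(date(2018, 11, 1), date(2021, 4, 1))

print(branch_2_3)  # 19
print(branch_2_4)  # 29, i.e. "about 30"
```

Counted this way, an April 2021 final release would put the 2.4 branch roughly ten months beyond the 2.3.x maintenance window.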

Re: Apache Spark 3.2 Expectation

2021-02-25 Thread Sean Owen
I'd roughly expect 3.2 in, say, July of this year, given the usual cadence. No reason it couldn't be a little sooner or later. There is already some good stuff in 3.2 and will be a good minor release in 5-6 months. On Thu, Feb 25, 2021 at 10:57 AM Dongjoon Hyun wrote: > Hi, All. > > Since we

Re: K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread Sean Owen
Shane would you know? May be a problem with a single worker. On Tue, Feb 23, 2021 at 8:46 AM Phillip Henry wrote: > > Hi, > > Silly question: the Jenkins build for my PR is failing but it seems > outside of my control. What must I do to remedy this? > > I've submitted > >

Re: Auto-closing PRs or How to get reviewers' attention

2021-02-23 Thread Sean Owen
, 2021 at 4:06 AM Enrico Minack wrote: > On Feb 18, 2021 at 4:34 PM Sean Owen wrote: > > One other aspect is that a committer is taking some degree of > > responsibility for merging a change, so the ask is more than just a > > few minutes of eyeballing. If it breaks someth

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-22 Thread Sean Owen
+1 LGTM, same results as last time. Does anyone see the error below? It is probably env-specific as the Jenkins jobs don't hit this. Just checking. SPARK-29604 external listeners should be initialized with Spark classloader *** FAILED *** java.lang.RuntimeException: [download failed:

Re: Java Code Style

2021-02-20 Thread Sean Owen
Do you just mean you want to adjust the code style rules? Yes you can do that in IJ, just a matter of finding the indent rule to adjust. The Spark style is pretty normal stuff, though not 100% consistent. I prefer the first style in this case. Sometimes it's a matter of judgment when to differ from

Re: [DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-18 Thread Sean Owen
cklog which > can be done in a couple of months or so with assigning to him/herself, > which effectively blocks others from working or proposing the same. I > consider this as preemptive which sounds bad and even unfair. > > On Fri, Feb 19, 2021 at 12:14 AM Sean Owen wrote: > >> I think

Re: Auto-closing PRs or How to get reviewers' attention

2021-02-18 Thread Sean Owen
Holden is absolutely correct - pinging relevant individuals is probably your best bet. I skim the 40-50 PRs that have activity each day and look into a few that look like I would know something about by the title, but, easy to miss something I could weigh in on. There is no way to force people to

Re: [DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-18 Thread Sean Owen
I think it's OK to raise particular instances. It's hard for me to evaluate further in the abstract. I don't think we use Assignee much at all, except to kinda give credit when something is done. No piece of code or work can be solely owned by one person; this is just ASF policy. I think we've

Re: [VOTE] Release Spark 3.0.2 (RC1)

2021-02-17 Thread Sean Owen
Hyun wrote: > I didn't see them. Could you describe your environment: OS, Java, > Maven/SBT, profiles? > > On Wed, Feb 17, 2021 at 6:26 PM Sean Owen wrote: > >> I think I'm +1 on this, in that I don't see any more test failures than I >> usually do, and I think

Re: [VOTE] Release Spark 3.0.2 (RC1)

2021-02-17 Thread Sean Owen
I think I'm +1 on this, in that I don't see any more test failures than I usually do, and I think they're due to my local env, but is anyone seeing these failures? - includes jars passed in through --jars *** FAILED *** Process returned with exit code 1. See the log4j logs for more detail.

Re: Apache Spark 3.0.2 Release ?

2021-02-12 Thread Sean Owen
Sounds like a fine time to me, sure. On Fri, Feb 12, 2021 at 1:39 PM Dongjoon Hyun wrote: > Hi, All. > > As of today, `branch-3.0` has 307 patches (including 25 correctness > patches) since v3.0.1 tag (released on September 8th, 2020). > > Since we stabilized branch-3.0 during 3.1.x preparation

Re: Using bundler for Jekyll?

2021-02-12 Thread Sean Owen
Seems fine to me. How about just regenerating the whole site once with the latest version and requiring that? On Fri, Feb 12, 2021 at 7:09 AM attilapiros wrote: > I run into the same problem today and tried to find the version where the > diff is minimal, so I wrote a script: > > ``` >

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-08 Thread Sean Owen
Same result as last time for me, +1. Tested with Java 11. I fixed the two issues without assignee; one was WontFix though. On Mon, Feb 8, 2021 at 7:43 PM Hyukjin Kwon wrote: > Let's set the assignees properly then. Shouldn't be a problem for the > release. > > On Tue, 9 Feb 2021, 10:40 Yuming

Re: Hyperparameter Optimization via Randomization

2021-02-08 Thread Sean Owen
t; Phillip > > > > > > On Sat, Jan 30, 2021 at 2:00 PM Sean Owen wrote: > >> I was thinking ParamGridBuilder would have to change to accommodate a >> continuous range of values, and that's not hard, though other code wouldn't >> understand that type of value,

Re: Public API access to UDTs

2021-02-03 Thread Sean Owen
I opened https://github.com/apache/spark/pull/31461 to track the discussion further. It narrowly proposes making a few types public. On Mon, Feb 1, 2021 at 8:52 AM Fitch, Simeon wrote: >  > > On Mon, Feb 1, 2021 at 9:38 AM Sean Owen wrote: > >> I'm not hearing any o

Re: Public API access to UDTs

2021-02-01 Thread Sean Owen
I'm not hearing any objection to making it public as a @DeveloperApi ? anyone object to a PR on that? On Fri, Jan 29, 2021 at 8:46 AM Sean Owen wrote: > I'm also interested: are there problems with opening up this API beyond > needing to freeze it and keep it stable? it's pretty

Re: Hyperparameter Optimization via Randomization

2021-01-30 Thread Sean Owen
eropt, then so be it. It might be useful for people not using Python but > they can just roll-their-own, I guess. > > Anyway, looking forward to hearing what you think. > > Regards, > > Phillip > > > > On Fri, Jan 29, 2021 at 4:18 PM Sean Owen wrote: > >>

Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Sean Owen
uldn't make sense (unless the grid > was warped, of course). > > Does that make sense? It might be better for me to just write the code as > I don't think it would be very complicated. > > Happy to hear your thoughts. > > Phillip > > > > On Fri, Jan 29, 2021 at 1:47 P

Re: Public API access to UDTs

2021-01-29 Thread Sean Owen
I'm also interested: are there problems with opening up this API beyond needing to freeze it and keep it stable? it's pretty stable. As @DeveloperApi at least? Are there implications for storing UDTs in particular engines or formats? Just making it public for developers, even with a 'use at your

Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Sean Owen
I don't know of anyone working on that. Yes I think it could be useful. I think it might be easiest to implement by simply having some parameter to the grid search process that says what fraction of all possible combinations you want to randomly test. On Fri, Jan 29, 2021 at 5:52 AM Phillip Henry
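The idea in this message, a parameter giving the fraction of all grid combinations to test at random, can be sketched in plain Python. This is a sketch under stated assumptions and not Spark's API: `random_grid_subset`, `fraction`, and `seed` are hypothetical names, and the parameter names in the usage lines are placeholders.

```python
import itertools
import random


def random_grid_subset(grid, fraction, seed=None):
    """Return a random subset of the full Cartesian parameter grid.

    grid:     dict mapping parameter name -> list of candidate values
    fraction: share of all combinations to keep, in (0, 1]
    seed:     optional seed so the selection is reproducible
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    names = sorted(grid)
    # Enumerate every combination, as an exhaustive grid search would.
    combos = [dict(zip(names, values))
              for values in itertools.product(*(grid[n] for n in names))]
    if not combos:
        return []
    # Keep at least one combination so the search is never empty.
    k = max(1, round(fraction * len(combos)))
    # Sample without replacement: no combination is tested twice.
    return random.Random(seed).sample(combos, k)


# Placeholder parameter names and values, for illustration only.
grid = {"regParam": [0.01, 0.1, 1.0], "maxIter": [10, 50, 100, 200]}
subset = random_grid_subset(grid, fraction=0.25, seed=42)
print(len(subset))  # 3 of the 12 combinations
```

Sampling from the enumerated grid keeps the existing grid definition unchanged; it does not cover drawing from a continuous range, which the thread discusses separately.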

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread Sean Owen
I think I can articulate the general idea here, though I expect it is not deployed consistently. Yes there's a general desire to make APIs consistent across languages. Python and Scala should track pretty closely, even if R isn't really that consistent. SQL is a somewhat different case. There

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-19 Thread Sean Owen
+1 from me. Same results as in 3.1.0 testing. On Mon, Jan 18, 2021 at 6:06 AM Hyukjin Kwon wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.1.1. > > The vote is open until January 22nd 4PM PST and passes if a majority +1 > PMC votes are cast, with a minimum

Re: Chnage Restful API's default quanlites same as WebUI

2021-01-12 Thread Sean Owen
The change makes sense of course; I think the real question for everyone is: would you change the semantics of this info from the API in a minor release, or would it have to wait? is it more bug fix than breaking change? On Sun, Jan 10, 2021 at 7:38 PM angers.zhu wrote: > Hi devs, > > These

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-07 Thread Sean Owen
While we can delete the tag, maybe just leave it. As a general rule we would not remove anything pushed to the main git repo. On Thu, Jan 7, 2021 at 8:31 AM Jacek Laskowski wrote: > Hi, > > BTW, wondering aloud. Since it was agreed to skip 3.1.0 and go ahead with > 3.1.1, what's gonna happen

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-06 Thread Sean Owen
above issues are not regression, those are enough for me to > give -1 for 3.1.0 RC1. > > On Wed, Jan 6, 2021 at 3:52 PM Sean Owen wrote: > >> I just don't see a reason to believe there's a rush? just test it as >> normal? I did, you can too, etc. >> Or specifically what blocks the current RC? >> >

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-06 Thread Sean Owen
I just don't see a reason to believe there's a rush? just test it as normal? I did, you can too, etc. Or specifically what blocks the current RC? On Wed, Jan 6, 2021 at 5:46 PM Jungtaek Lim wrote: > No worries about the accident. We're human beings, and everyone can make a > mistake. Let's wait

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-06 Thread Sean Owen
OK, we'll have to update the release page to clarify there was never a real 3.1.0 release then. But I'm not suggesting releasing 3.1.0 _because_ it was published accidentally. I'm suggesting we figure out normally whether we would have released it, and if so, great. If not, fine we must skip the

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-06 Thread Sean Owen
Er, yeah uh oh. Did the staging repo accidentally get closed/released? Maybe I'm also missing something. If so then one way or the other we can't undo that as the 3.1.0 release in Maven, as far as I know. We can make a 3.1.1, but then it's kind of weird there was never any 3.1.0 source release.

Re: Usage of JDK Vector API in ML/MLLib

2020-12-16 Thread Sean Owen
n my end. > > > > Thank you, > > Ludovic > > > > [1] https://github.com/luhenry/blas > > > > *From: *Erik Krogen > *Sent: *Tuesday, 15 December 2020 17:33 > *To: *Sean Owen > *Cc: *Ludovic Henry ; dev@spark.apache.org; Bernhard > Urban-Forster &

Re: Usage of JDK Vector API in ML/MLLib

2020-12-15 Thread Sean Owen
Yes it's intriguing, though as you say not readily available in the wild yet. I would also expect native BLAS to outperform f2j, so yeah that's the interesting question, whether this is a win over native code or not. I suppose the upside is eventually, we may expect this API to be available

Re: Seeking committers' help to review on SS PR

2020-11-27 Thread Sean Owen
I don't know the code well, but those look minor and straightforward. They have reviews from the two most knowledgeable people in this area. I don't think you need to block for 6 months after proactively seeking all likely reviewers - I'm saying that's the resolution to this type of situation

Re: Seeking committers' help to review on SS PR

2020-11-23 Thread Sean Owen
t PR. I'm sorry that you had to wait so > long. > > On Mon, Nov 23, 2020 at 7:11 AM Sean Owen wrote: > >> I don't see any objections on that thread. You're a committer and have >> reviews from other knowledgeable people in this area. Do you have any >> reason to believ

Re: Seeking committers' help to review on SS PR

2020-11-23 Thread Sean Owen
I don't see any objections on that thread. You're a committer and have reviews from other knowledgeable people in this area. Do you have any reason to believe it's controversial, like, changes semantics or APIs? Were there related discussions elsewhere that expressed any concern? From a glance,

Re: [DISCUSS] Review/merge phase, and post-review

2020-11-13 Thread Sean Owen
I am sure you are referring to some specific instances but I have not followed enough to know what they are. Can you point them out? I think that is most productive for everyone to understand. On Fri, Nov 13, 2020 at 10:16 PM Jungtaek Lim wrote: > Hi devs, > > I know this is a super sensitive

Spark on JDK 14

2020-10-28 Thread Sean Owen
For kicks, I tried Spark on JDK 14. 11 -> 14 doesn't change much, not as much as 8 -> 9 (-> 11), and indeed, virtually all tests pass. For the interested, these two seem to fail: - ZooKeeperPersistenceEngine *** FAILED *** org.apache.zookeeper.KeeperException$ConnectionLossException:

Re: Unpersist return type

2020-10-22 Thread Sean Owen
le but when you run it, it fails. > > > https://stackoverflow.com/questions/63642364/how-to-use-foreachbatch-batchdf-unpersist-appropriately-in-structured-streamin > > [image: Captura de pantalla 2020-10-22 a las 16.01.33.png] > > On Thu, 22 Oct 2020 at 15:53, Sean Owen wrote: > &

Re: Unpersist return type

2020-10-22 Thread Sean Owen
Probably for purposes of chaining, though won't be very useful here. Like df.unpersist().cache(... some other settings ...) foreachBatch wants a function that evaluates to Unit, but this qualifies - doesn't matter what the value of the block is, if it's ignored. This does seem to compile; are you

Re: Scala 2.13 actual class used for Seq

2020-10-19 Thread Sean Owen
change is in master (compared to 3.0.1) and it is not > limited to Scala 2.13. > this might impact user programs somewhat? List has different performance > characteristics than WrappedArray... for starters it is not an IndexedSeq. > > > On Mon, Oct 19, 2020 at 8:24 AM Sean Owen wrote:

Re: Scala 2.13 actual class used for Seq

2020-10-19 Thread Sean Owen
Scala 2.13 changed the typedef of Seq to an immutable.Seq, yes. So lots of things will now return an immutable Seq. Almost all code doesn't care what Seq it returns and we didn't change any of that in the code, so, this is just what we're getting as a 'default' from whatever operations produce the

Re: Scala 3 support approach

2020-10-18 Thread Sean Owen
Spark depends on a number of Scala libraries, so needs them all to support version X before Spark can. This only happened for 2.13 about 4-5 months ago. I wonder if even a fraction of the necessary libraries have 3.0 support yet? It can be difficult to test and support multiple Scala versions
