Re: [k8s] Spark operator (the Java one)

2019-10-10 Thread Yinan Li
+1. This and the GCP Spark Operator, although very useful for k8s users, are not something needed by all Spark users, not even by all Spark on k8s users. On Thu, Oct 10, 2019 at 6:34 PM Stavros Kontopoulos < stavros.kontopou...@lightbend.com> wrote: > Hi all, > > I also left a comment on

Re: [k8s] Spark operator (the Java one)

2019-10-10 Thread Stavros Kontopoulos
Hi all, I also left a comment on the PR with more details. I don't see why the Java operator should be maintained by the Spark project. This is an interesting project and could thrive on its own as an external operator project. Best, Stavros On Thu, Oct 10, 2019 at 7:51 PM Sean Owen wrote: >

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread Weichen Xu
Wait... I have some additions: *New API:* SPARK-25097 Support prediction on single instance in KMeans/BiKMeans/GMM SPARK-28045 add missing RankingEvaluator SPARK-29121 Support Dot Product for Vectors *Behavior change or new API with behavior change:* SPARK-23265 Update multi-column error

DataSourceV2 sync notes - 2 October 2019

2019-10-10 Thread Ryan Blue
Here are my notes from last week's DSv2 sync. *Attendees*: Ryan Blue Terry Kim Wenchen Fan *Topics*: - SchemaPruning only supports Parquet and ORC? - Out of order optimizer rules - 3.0 work - Rename session catalog to spark_catalog - Finish TableProvider update to avoid

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread Xingbo Jiang
Hi all, Here is the updated feature list: SPARK-11215 Multiple columns support added to various Transformers: StringIndexer SPARK-11150 Implement Dynamic Partition Pruning SPARK-13677

Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-10-10 Thread Dongjoon Hyun
+1 Bests, Dongjoon On Thu, Oct 10, 2019 at 10:14 Ryan Blue wrote: > +1 > > Thanks for fixing this! > > On Thu, Oct 10, 2019 at 6:30 AM Xiao Li wrote: > >> +1 >> >> On Thu, Oct 10, 2019 at 2:13 AM Hyukjin Kwon wrote: >> >>> +1 (binding) >>> >>> On Thu, Oct 10, 2019 at 5:11 PM Takeshi Yamamuro

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread Sean Owen
See the JIRA - this is too open-ended and not obviously just due to choices in data representation, what you're trying to do, etc. It's correctly closed IMHO. However, identifying the issue more narrowly, and something that looks ripe for optimization, would be useful. On Thu, Oct 10, 2019 at

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread antonkulaga
I think for sure SPARK-28547 At the moment there are some flaws in Spark architecture and it performs miserably or even freezes wherever the column count exceeds 10-15K (even a simple describe function takes ages while the

Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-10-10 Thread Ryan Blue
+1 Thanks for fixing this! On Thu, Oct 10, 2019 at 6:30 AM Xiao Li wrote: > +1 > > On Thu, Oct 10, 2019 at 2:13 AM Hyukjin Kwon wrote: > >> +1 (binding) >> >> On Thu, Oct 10, 2019 at 5:11 PM Takeshi Yamamuro wrote: >> >>> Thanks for the great work, Gengliang! >>> >>> +1 for that. >>> As I said

Re: [k8s] Spark operator (the Java one)

2019-10-10 Thread Sean Owen
I'd have the same question on the PR - why does this need to be in the Apache Spark project vs where it is now? Yes, it's not a Spark package per se, but it seems like this is a tool for K8S to use Spark rather than a core Spark tool. Yes of course all the packages, licenses, etc have to be

[k8s] Spark operator (the Java one)

2019-10-10 Thread Jiri Kremser
Hello, Spark Operator is a tool that can deploy/scale and help with monitoring of Spark clusters on Kubernetes. It follows the operator pattern [1] introduced by CoreOS so it watches for changes in custom resources representing the desired state of the clusters and does the steps to achieve this
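
The operator pattern this message refers to (watch custom resources that describe the desired state, then take steps toward it) can be sketched as a single reconcile function. This is a minimal illustration in Python under stated assumptions, not the Java operator's code: ClusterSpec, reconcile, and the action tuples are hypothetical names, and a real operator would watch the Kubernetes API rather than compare in-memory dicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    # Hypothetical stand-in for a custom resource describing one Spark cluster.
    name: str
    workers: int


def reconcile(desired, actual):
    """Return the actions that would move `actual` toward `desired`.

    Both arguments map cluster name -> ClusterSpec. Each action is a
    (verb, cluster_name, worker_count) tuple; nothing is executed here.
    """
    actions = []
    for name, spec in desired.items():
        current = actual.get(name)
        if current is None:
            # Declared but not running: create it.
            actions.append(("create", name, spec.workers))
        elif current.workers != spec.workers:
            # Running with the wrong size: scale it.
            actions.append(("scale", name, spec.workers))
    for name in actual:
        if name not in desired:
            # Running but no longer declared: remove it.
            actions.append(("delete", name, 0))
    return actions
```

Calling reconcile repeatedly is safe: once the actual state matches the desired state it returns an empty list, which is the property that lets an operator re-run the same logic on every change event.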

Re: Committing while Jenkins down?

2019-10-10 Thread Shane Knapp
for running k8s tests locally, i have a section dedicated to that here: https://spark.apache.org/developer-tools.html minikube and friends is pretty straightforward to set up, but we're running an older version of the former. i am planning on addressing that (and moving us to a recent release)

Re: Committing while Jenkins down?

2019-10-10 Thread Shane Knapp
yeah, as long as tests are run locally i'm ok w/merging. once california is elevated from 'developing country' status to 'holy crap, the magic of electricity has returned!' and we can automatically build again any errors that slipped through will be caught in jenkins. this means we'll need to

Re: Committing while Jenkins down?

2019-10-10 Thread Holden Karau
On Thu, Oct 10, 2019 at 9:13 AM Xiao Li wrote: > Thanks! Shane! > > AFAIK, it normally takes *more than 5/6 hours* to run all the tests. Any > major changes in Core/SQL require running all the tests. If any committer > did it before merging the code, I think it is fine to merge it. > Glad were

Re: Committing while Jenkins down?

2019-10-10 Thread Xiao Li
Thanks! Shane! AFAIK, it normally takes *more than 5/6 hours* to run all the tests. Any major changes in Core/SQL require running all the tests. If any committer did it before merging the code, I think it is fine to merge it. Xiao On Thu, Oct 10, 2019 at 9:11 AM Holden Karau wrote: > Awesome, thanks Shane

Re: Committing while Jenkins down?

2019-10-10 Thread Holden Karau
Awesome, thanks Shane :) In the meantime I think committers can just run tests locally and it’ll be a slower process but I don’t think we need to halt all merging. On Thu, Oct 10, 2019 at 9:07 AM Shane Knapp wrote: > if we do get power back before the weekend, i can have my sysadmin > head

Re: Committing while Jenkins down?

2019-10-10 Thread Shane Knapp
if we do get power back before the weekend, i can have my sysadmin head down to the colo friday afternoon and power up jenkins. he knows the drill. On Thu, Oct 10, 2019 at 8:50 AM Holden Karau wrote: > > I think a reasonable, albeit slow, option is to run the tests locally. Since > the outage

Re: [build system] IMPORTANT! northern california fire danger, potential power outage(s)

2019-10-10 Thread Shane Knapp
another quick update: campus lost power ~1130pm, and is closed for the entirety of today. no word on power restoration, campus status, etc etc. updates as they come. :\ On Wed, Oct 9, 2019 at 2:34 PM Shane Knapp wrote: > > quick update: > > campus is losing power @ 8pm. this is after we

Re: Committing while Jenkins down?

2019-10-10 Thread Holden Karau
I think a reasonable, albeit slow, option is to run the tests locally. Since the outage could be as long as five days I’d rather not just have PRs pile up for that entire period. On Thu, Oct 10, 2019 at 8:38 AM Xiao Li wrote: > I think we are unable to merge any major PR if we do not know

Re: Committing while Jenkins down?

2019-10-10 Thread Xiao Li
Please check the note from Shane. [build system] IMPORTANT! northern california fire danger, potential power outage(s) On Thu, Oct 10, 2019 at 8:35 AM Thomas graves wrote: > This is directed towards committers/PMC members. > > It looks like Jenkins will be down for a while, what are everyone's > thoughts on

Re: Committing while Jenkins down?

2019-10-10 Thread Xiao Li
I think we are unable to merge any major PR if we do not know whether the tests can pass. Xiao On Thu, Oct 10, 2019 at 8:36 AM Xiao Li wrote: > Please check the note from Shane. > > [build system] IMPORTANT! northern california fire danger, potential power > outage(s) > > On Thu, Oct 10, 2019 Thomas graves

Committing while Jenkins down?

2019-10-10 Thread Thomas graves
This is directed towards committers/PMC members. It looks like Jenkins will be down for a while. What are everyone's thoughts on committing PRs while it's down? Do we want to wait for Jenkins to come back up, or manually run things ourselves and commit? Tom

Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-10-10 Thread Xiao Li
+1 On Thu, Oct 10, 2019 at 2:13 AM Hyukjin Kwon wrote: > +1 (binding) > > On Thu, Oct 10, 2019 at 5:11 PM Takeshi Yamamuro wrote: > >> Thanks for the great work, Gengliang! >> >> +1 for that. >> As I said before, the behaviour is pretty common in DBMSs, so the change >> helps DBMS users. >> >>

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-10 Thread Jacek Laskowski
Hi, Thanks very much for such a thorough conversation. Enjoyed it very much. > Source/Sink traits are in org.apache.spark.sql.execution and thus they are private. That would explain why I couldn't find scaladocs. Regards, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark

Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-10-10 Thread Hyukjin Kwon
+1 (binding) On Thu, Oct 10, 2019 at 5:11 PM Takeshi Yamamuro wrote: > Thanks for the great work, Gengliang! > > +1 for that. > As I said before, the behaviour is pretty common in DBMSs, so the change > helps DBMS users. > > Bests, > Takeshi > > > On Mon, Oct 7, 2019 at 5:24 PM Gengliang Wang < >

Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-10-10 Thread Takeshi Yamamuro
Thanks for the great work, Gengliang! +1 for that. As I said before, the behaviour is pretty common in DBMSs, so the change helps DBMS users. Bests, Takeshi On Mon, Oct 7, 2019 at 5:24 PM Gengliang Wang wrote: > Hi everyone, > > I'd like to call for a new vote on SPARK-28885 >