Re: [VOTE] Decommissioning SPIP

2020-07-01 Thread Stephen Boesch
+1 Thx for seeing this through On Wed, 1 Jul 2020 at 20:03, Imran Rashid wrote: > +1 > > I think this is going to be a really important feature for Spark and I'm > glad to see Holden focusing on it. > > On Wed, Jul 1, 2020 at 8:38 PM Mridul Muralidharan > wrote: > >> +1 >> >> Thanks, >> Mridul

Re: Initial Decom PR for Spark 3?

2020-06-22 Thread Stephen Boesch
...draft for comment by the end of Spark summit. I'll be using the same design >> document for the design component, so if anyone has input on the design >> document feel free to start leaving comments there now. >> >> On Sat, Jun 20, 2020 at 4:23 PM Stephen Boesch wr

Re: Initial Decom PR for Spark 3?

2020-06-20 Thread Stephen Boesch
Hi - given there is a design doc (contrary to that comment), is this going to move forward? On Thu, 18 Jun 2020 at 18:05, Hyukjin Kwon wrote: > Looks it had to be with SPIP and a proper design doc to discuss. > > On Sun, Feb 9, 2020 at 1:23 AM, Erik Erlandson wrote: > >> I'd be willing to pull this in,

Re: Initial Decom PR for Spark 3?

2020-06-18 Thread Stephen Boesch
Second paragraph of the PR lists the design doc. > There is a design document at https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit?usp=sharing On Thu, 18 Jun 2020 at 18:05, Hyukjin Kwon wrote: > Looks it had to be with SPIP and a proper design doc to discuss.

Re: script running in jupyter 6-7x faster than spark submit

2019-09-10 Thread Stephen Boesch
same code. why running them two different ways vary so much in the > execution time. > > > > > *Regards,Dhrubajyoti Hati.Mob No: 9886428028/9652029028* > > > On Wed, Sep 11, 2019 at 8:42 AM Stephen Boesch wrote: > >> Sounds like you have done your homework to

Re: script running in jupyter 6-7x faster than spark submit

2019-09-10 Thread Stephen Boesch
Sounds like you have done your homework to properly compare. I'm guessing the answer to the following is yes, but in any case: are they both running against the same spark cluster with the same configuration parameters, especially executor memory and number of workers? On Tue., Sept. 10, 2019

Re: [MLlib] PCA Aggregator

2018-10-19 Thread Stephen Boesch
Erik - is there a current location for approved/recommended third-party additions? The spark-packages site has been stale for years, it seems. On Fri., Oct. 19, 2018 at 07:06, Erik Erlandson <eerla...@redhat.com> wrote: > Hi Matt! > > There are a couple ways to do this. If you want to submit it

Re: Spark.ml roadmap 2.3.0 and beyond

2018-03-20 Thread Stephen Boesch
...agree to shepherd them. (Committers, make sure to check what > you're currently listed as shepherding!) The links for searching can be > useful too. > > On Thu, Dec 7, 2017 at 3:55 PM, Stephen Boesch <java...@gmail.com> wrote: > >> Thanks Joseph. We can wait for post 2.3.0. >

Re: GenerateExec, CodegenSupport and supportCodegen flag off?!

2017-12-10 Thread Stephen Boesch
A relevant observation: there was a closed/executed jira last year to remove the option to disable the codegen flag (and unsafe flag as well): https://issues.apache.org/jira/browse/SPARK-11644 2017-12-10 13:16 GMT-08:00 Jacek Laskowski : > Hi, > > I'm wondering why a physical

Re: Spark.ml roadmap 2.3.0 and beyond

2017-12-07 Thread Stephen Boesch
> Joseph > > On Wed, Nov 29, 2017 at 6:52 AM, Stephen Boesch <java...@gmail.com> wrote: > >> There are several JIRAs and/or PRs that contain logic the Data Science >> teams that I work with use in their local models. We are trying to >> determine if/when these

Re: Spark.ml roadmap 2.3.0 and beyond

2017-11-29 Thread Stephen Boesch
...were headed? 2017-11-29 6:39 GMT-08:00 Stephen Boesch <java...@gmail.com>: > Any further information/thoughts? > > 2017-11-22 15:07 GMT-08:00 Stephen Boesch <java...@gmail.com>: > >> The roadmaps for prior releases e.g. 1.6, 2.0, 2.1, 2.2 were available:

Spark.ml roadmap 2.3.0 and beyond

2017-11-22 Thread Stephen Boesch
The roadmaps for prior releases, e.g. 1.6, 2.0, 2.1, 2.2, were available:
2.2.0 https://issues.apache.org/jira/browse/SPARK-18813
2.1.0 https://issues.apache.org/jira/browse/SPARK-15581
..
It seems those roadmaps were not available per se for 2.3.0 and later? Is there a different mechanism for that

Re: Add a machine learning algorithm to sparkml

2017-10-20 Thread Stephen Boesch
A couple of less obvious facets of getting over the (significant!) hurdle to have an algorithm accepted into mllib (/spark.ml):
- the review time can be *very* long - a few to many months is a typical case even for relatively fast-tracked algorithms
- you will likely be asked to

Re: MLlib mission and goals

2017-01-24 Thread Stephen Boesch
re: spark-packages.org and "Would these really be better in the core project?" That was not at all the intent of my input: instead, to ask "how and where to structure/place deployment-quality code that is *not* part of the distribution?" The spark packages site has no curation whatsoever: no

Re: MLlib mission and goals

2017-01-23 Thread Stephen Boesch
Along the lines of #1: the spark packages seemed to have had a good start about two years ago: but now there are not more than a handful in general use - e.g. databricks CSV. When the available packages are browsed the majority are incomplete, empty, unmaintained, or unclear. Any ideas on how to

Re: Organizing Spark ML example packages

2016-09-12 Thread Stephen Boesch
Yes: will you have cycles to do it? 2016-09-12 9:09 GMT-07:00 Nick Pentreath : > Never actually got around to doing this - do folks still think it > worthwhile? > > On Thu, 21 Apr 2016 at 00:10 Joseph Bradley wrote: > >> Sounds good to me. I'd

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Stephen Boesch
+1 for java8 only +1 for 2.11+ only. At this point scala libraries supporting only 2.10 are typically less active and/or poorly maintained. That trend will only continue when considering the lifespan of spark 2.X. 2016-03-24 11:32 GMT-07:00 Steve Loughran : > > On

Re: spark task scheduling delay

2016-01-20 Thread Stephen Boesch
Which Resource Manager are you using? 2016-01-20 21:38 GMT-08:00 Renu Yadav : > Any suggestions? > > On Wed, Jan 20, 2016 at 6:50 PM, Renu Yadav wrote: > >> Hi , >> >> I am facing spark task scheduling delay issue in spark 1.4. >> >> suppose I have 1600

Re: what is the best way to debug spark / mllib?

2015-12-27 Thread Stephen Boesch
1) you should run the zinc incremental compiler; 2) if you want breakpoints, that should likely be done in local mode; 3) adjust the log4j.properties settings and you can start to see the logInfo output. 2015-12-27 0:20 GMT-08:00 salexln : > Hi guys, > > I'm debugging my code in
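For the third point, a sketch of the kind of conf/log4j.properties tweak meant here, modeled on Spark's shipped log4j.properties.template; the specific logger name at the bottom is illustrative - point it at whatever package you are stepping through:

```properties
# Console appender, as in Spark's conf/log4j.properties.template
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Turn up logging only for the code under study (illustrative package name)
log4j.logger.org.apache.spark.mllib=DEBUG
```

With this in place the logInfo/logDebug calls in the chosen package show up on the driver console when running in local mode.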

Re: SQL language vs DataFrame API

2015-12-09 Thread Stephen Boesch
Is this a candidate for the version 1.X/2.0 split? 2015-12-09 16:29 GMT-08:00 Michael Armbrust : > Yeah, I would like to address any actual gaps in functionality that are > present. > > On Wed, Dec 9, 2015 at 4:24 PM, Cristian Opris > wrote:

Re: Fastest way to build Spark from scratch

2015-12-08 Thread Stephen Boesch
I will echo Steve L's comment about having zinc running (with --nailed). That provides at least a 2X speedup - sometimes without it spark simply does not build for me. 2015-12-08 9:33 GMT-08:00 Josh Rosen : > @Nick, on a fresh EC2 instance a significant chunk of the

Re: A proposal for Spark 2.0

2015-11-12 Thread Stephen Boesch
My understanding is that the RDD's presently have more support for complete control of partitioning which is a key consideration at scale. While partitioning control is still piecemeal in DF/DS it would seem premature to make RDD's a second-tier approach to spark dev. An example is the use of

Re: State of the Build

2015-11-05 Thread Stephen Boesch
Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build environment by updating the pom.xml in each of the subprojects. If you were able to come up with a structure that avoids that approach it would be an improvement. 2015-11-05 15:38 GMT-08:00 Jakob Odersky

Re: Using scala-2.11 when making changes to spark source

2015-09-28 Thread Stephen Boesch
> > On Sun, Sep 20, 2015 at 6:18 AM, Stephen Boesch <java...@gmail.com> wrote: > >> >> The dev/change-scala-version.sh [2.11] script modifies in-place the >> pom.xml files across all of the modules. This is a git-visible change. So >> if we wish to make

Using scala-2.11 when making changes to spark source

2015-09-20 Thread Stephen Boesch
The dev/change-scala-version.sh [2.11] script modifies in-place the pom.xml files across all of the modules. This is a git-visible change. So if we wish to make changes to spark source in our own forks - while developing with scala 2.11 - we would end up conflating those updates with our own.

Re: Enum parameter in ML

2015-09-16 Thread Stephen Boesch
There was a long thread about enums initiated by Xiangrui several months back in which the final consensus was to use Java enums. Is that discussion (/decision) applicable here? 2015-09-16 17:43 GMT-07:00 Ulanov, Alexander : > Hi Joseph, > > > > Strings sounds

Re: The latest master branch didn't compile with -Phive?

2015-07-09 Thread Stephen Boesch
Please do a *clean* package and reply back if you still encounter issues. 2015-07-09 7:24 GMT-07:00 Yijie Shen henry.yijies...@gmail.com: Hi, I use the clean version just clone from the master branch, build with: build/mvn -Phive -Phadoop-2.4 -DskipTests package And BUILD FAILURE at last,

Re: enum-like types in Spark

2015-07-01 Thread Stephen Boesch
I am reviving an old thread here. The link for the example code for the Java-enum-based solution is now dead: would someone please post an updated link showing the proper interop? Specifically: it is my understanding that Java enums may not be created within Scala. So is the proposed solution
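On the interop half of that question: while Scala 2 cannot declare a Java enum (a true enum has to live in a small .java file), it can consume one without friction. A minimal sketch, using a JDK enum (java.util.concurrent.TimeUnit) rather than any Spark type, since the original example link is dead:

```scala
import java.util.concurrent.TimeUnit // a real Java enum from the JDK

object EnumInteropDemo {
  def main(args: Array[String]): Unit = {
    // Java enum values behave as ordinary objects on the Scala side
    val u: TimeUnit = TimeUnit.SECONDS
    assert(TimeUnit.values().contains(u))    // reflective listing of members works
    assert(TimeUnit.valueOf("SECONDS") eq u) // name-based lookup returns the same instance
    println(u.toMillis(2))                   // prints 2000
  }
}
```

The pattern discussed in the earlier thread - putting the enum itself in a Java source file and calling it from Scala exactly as above - follows from this.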

Re: How to link code pull request with JIRA ID?

2015-05-13 Thread Stephen Boesch
following up from Nicholas, it is "[SPARK-12345] Your PR description", where 12345 is the JIRA number. One thing I tend to forget is when/where to include the subproject tag, e.g. [MLLIB] 2015-05-13 11:11 GMT-07:00 Nicholas Chammas nicholas.cham...@gmail.com: That happens automatically when

Re: Pickling error when attempting to add a method in pyspark

2015-04-30 Thread Stephen Boesch
Bumping this. Anyone of you having some familiarity with py4j interface in pyspark? thanks 2015-04-27 22:09 GMT-07:00 Stephen Boesch java...@gmail.com: My intention is to add pyspark support for certain mllib spark methods. I have been unable to resolve pickling errors of the form

Re: IntelliJ Runtime error

2015-04-04 Thread Stephen Boesch
Thanks Cheng. Yes, the problem is that the way to set up to run inside Intellij changes very frequently. It is unfortunately not simply a one-time investment to get IJ debugging working properly: the steps required are a moving target, shifting approximately monthly to bi-monthly. Doing remote debugging is

Re: enum-like types in Spark

2015-03-04 Thread Stephen Boesch
#4 but with MemoryOnly (more Scala-like): http://docs.scala-lang.org/style/naming-conventions.html - under "Constants, Values, Variable and Methods": Constant names should be in upper camel case. That is, if the member is final, immutable and it belongs to a package object or an object, it may be
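A small self-contained sketch of that convention - an enum-like sealed family whose constant-like members (e.g. MemoryOnly) follow the upper-camel-case rule. The type and member names here are illustrative, not Spark's actual API:

```scala
// Enum-like type in the Scala style: constants are upper camel case
// (MemoryOnly, not MEMORY_ONLY), defined as case objects under a sealed trait.
sealed trait StorageStrategy
object StorageStrategy {
  case object MemoryOnly extends StorageStrategy
  case object DiskOnly extends StorageStrategy

  val values: Seq[StorageStrategy] = Seq(MemoryOnly, DiskOnly)

  // sealed + match gives compiler-checked exhaustiveness, one of the
  // advantages weighed against Java enums in the original thread
  def describe(s: StorageStrategy): String = s match {
    case MemoryOnly => "keep partitions in memory"
    case DiskOnly   => "spill partitions to disk"
  }
}

object NamingDemo {
  def main(args: Array[String]): Unit =
    println(StorageStrategy.describe(StorageStrategy.MemoryOnly))
}
```

Unlike Java enums, case objects do not survive Java reflection-based enumeration, which is part of why the earlier thread's consensus landed on Java enums for public APIs.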

Re: Broken record a bit here: building spark on intellij with sbt

2015-02-05 Thread Stephen Boesch
...@sigmoidanalytics.com: Here's the sbt version https://docs.sigmoidanalytics.com/index.php/Step_by_Step_instructions_on_how_to_build_Spark_App_with_IntelliJ_IDEA Thanks Best Regards On Thu, Feb 5, 2015 at 8:55 AM, Stephen Boesch java...@gmail.com wrote: For building in intellij with sbt my mileage has

Broken record a bit here: building spark on intellij with sbt

2015-02-04 Thread Stephen Boesch
For building in intellij with sbt my mileage has varied widely: it had built as late as Monday (after the 1.3.0 release) - and with zero 'special' steps: just import as sbt project. However I can not presently repeat the process. The wiki page has the latest instructions on how to build with

Re: Building Spark with Pants

2015-02-02 Thread Stephen Boesch
There is a significant investment in sbt and maven - and they are not at all likely to be going away. A third build tool? Note that there is also the perspective of building within an IDE - which actually works presently for sbt and with a little bit of tweaking with maven as well. 2015-02-02

Adding third party jars to classpath used by pyspark

2014-12-29 Thread Stephen Boesch
What is the recommended way to do this? We have some native database client libraries for which we are adding pyspark bindings. The pyspark invokes spark-submit. Do we add our libraries to the SPARK_SUBMIT_LIBRARY_PATH ? This issue relates back to an error we have been seeing Py4jError:

Re: Required file not found in building

2014-12-02 Thread Stephen Boesch
Thanks Sean, I followed suit (brew install zinc) and that is working. 2014-12-01 22:39 GMT-08:00 Sean Owen so...@cloudera.com: I'm having no problems with the build or zinc on my Mac. I use zinc from brew install zinc. On Tue, Dec 2, 2014 at 3:02 AM, Stephen Boesch java...@gmail.com wrote

Required file not found in building

2014-12-01 Thread Stephen Boesch
It seems there were some additional settings required to build spark now. This should be a snap for most of you out there to spot what I am missing. Here is the command line I have traditionally used: mvn -Pyarn -Phadoop-2.3 -Phive install compile package -DskipTests That command line is

Re: Required file not found in building

2014-12-01 Thread Stephen Boesch
the same command on MacBook and didn't experience the same error. Which OS are you using ? Cheers On Mon, Dec 1, 2014 at 6:42 PM, Stephen Boesch java...@gmail.com wrote: It seems there were some additional settings required to build spark now . This should be a snap for most of you ot

Re: Required file not found in building

2014-12-01 Thread Stephen Boesch
such error. How did you install zinc-0.3.5.3 ? Cheers On Mon, Dec 1, 2014 at 8:00 PM, Stephen Boesch java...@gmail.com wrote: Anyone maybe can assist on how to run zinc with the latest maven build? I am starting zinc as follows: /shared/zinc-0.3.5.3/dist/target/zinc-0.3.5.3/bin/zinc

Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Stephen Boesch
Hi Michael, That insight is useful. Some thoughts: * I moved from sbt to maven in June specifically due to Andrew Or describing mvn as the default build tool. Developers should keep in mind that Jenkins uses mvn, so we need to run mvn before submitting PRs - even if sbt were used for day to

Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Stephen Boesch
Yes I have seen this same error - and for team members as well - repeatedly since June. As Patrick and Cheng mentioned, the next step is to do an sbt clean 2014-11-02 19:37 GMT-08:00 Cheng Lian lian.cs@gmail.com: I often see this when I first build the whole Spark project with SBT, then

Re: Spark consulting

2014-10-31 Thread Stephen Boesch
May we please refrain from using spark mailing list for job inquiries. Thanks. 2014-10-31 13:35 GMT-07:00 Alessandro Baretta alexbare...@gmail.com: Hello, Is anyone open to do some consulting work on Spark in San Mateo? Thanks. Alex

Re: best IDE for scala + spark development?

2014-10-30 Thread Stephen Boesch
Hi Nabeel, In what ways is the IJ version of the scala repl enhanced? thx! 2014-10-30 3:41 GMT-07:00 nm3...@gmail.com: IntelliJ idea scala plugin comes with an enhanced REPL. It's a pretty decent option too. Nabeel On Oct 28, 2014, at 5:34 AM, Cheng Lian lian.cs@gmail.com wrote: My

HiveShim not found when building in Intellij

2014-10-28 Thread Stephen Boesch
I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar errors: Error:(173, 38) not found: value HiveShim

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Stephen Boesch
, Stephen Boesch java...@gmail.com wrote: I have run on the command line via maven and it is fine: mvn -Dscalastyle.failOnViolation=false -DskipTests -Pyarn -Phadoop-2.3 compile package install But with the latest code Intellij builds do not work. Following is one of 26 similar

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Stephen Boesch
PM, Stephen Boesch java...@gmail.com wrote: Hi Matei, Until my latest pull from upstream/master it had not been necessary to add the hive profile: is it now?? I am not using sbt gen-idea. The way to open in intellij has been to Open the parent directory. IJ recognizes it as a maven

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Stephen Boesch
On Oct 28, 2014, at 9:09 PM, Stephen Boesch java...@gmail.com wrote: Thanks Patrick for the heads up. I have not been successful to discover a combination of profiles (i.e. enabling hive or hive-0.12.0 or hive-13.0) that works in Intellij with maven. Anyone who knows how to handle this - a quick

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Stephen Boesch
remove sub projects unrelated to your tasks to accelerate compilation and/or avoid other IDEA build issues (e.g. Avro related Spark streaming build failure in IDEA). On 10/29/14 12:42 PM, Stephen Boesch wrote: I am interested specifically in how to build (and hopefully run/debug..) under

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Stephen Boesch
issue. Also, you can remove sub projects unrelated to your tasks to accelerate compilation and/or avoid other IDEA build issues (e.g. Avro related Spark streaming build failure in IDEA). On 10/29/14 12:42 PM, Stephen Boesch wrote: I am interested specifically in how to build

Re: best IDE for scala + spark development?

2014-10-26 Thread Stephen Boesch
Many of the spark developers use Intellij. You will in any case probably want a full IDE (either IJ or eclipse) 2014-10-26 8:07 GMT-07:00 ll duy.huynh@gmail.com: i'm new to both scala and spark. what IDE / dev environment do you find most productive for writing code in scala with

Re: scalastyle annoys me a little bit

2014-10-24 Thread Stephen Boesch
Sean Owen beat me to (strongly) recommending running the zinc server. Using the -pl option is great too - but be careful to only use it when your work is restricted to the modules in the (comma-separated) list you provide to -pl. Also, before using -pl you should do a mvn compile package install on

Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Stephen Boesch
Within its compute.close method, the JdbcRDD class has this interesting logic for closing the jdbc connection:

try {
  if (null != conn && !stmt.isClosed()) conn.close()
  logInfo("closed connection")
} catch {
  case e: Exception => logWarning("Exception closing connection",

Re: Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Stephen Boesch
it is an oversight. Would you like to submit a pull request to fix that? On Tue, Aug 5, 2014 at 12:14 PM, Stephen Boesch java...@gmail.com wrote: Within its compute.close method, the JdbcRDD class has this interesting logic for closing jdbc connection: try { if (null

Re: Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Stephen Boesch
by context.addOnCompleteCallback { () => closeIfNeeded() } or am I misunderstanding? On Tue, Aug 5, 2014 at 3:15 PM, Reynold Xin r...@databricks.com wrote: Thanks. Those are definitely great problems to fix! On Tue, Aug 5, 2014 at 1:11 PM, Stephen Boesch java...@gmail.com wrote
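The register-cleanup-up-front idea behind that callback is easy to model outside Spark. A self-contained sketch follows; FakeTaskContext is a stand-in for the task context's callback mechanism, not Spark's API:

```scala
// Stand-in for a task context's completion-callback list (not Spark's API).
class FakeTaskContext {
  private var callbacks = List.empty[() => Unit]
  def addOnCompleteCallback(f: () => Unit): Unit = callbacks ::= f
  def markTaskCompleted(): Unit = callbacks.foreach(_())
}

object CleanupDemo {
  def main(args: Array[String]): Unit = {
    var connectionClosed = false
    val ctx = new FakeTaskContext
    // Register the cleanup before doing any work, mirroring closeIfNeeded():
    ctx.addOnCompleteCallback(() => connectionClosed = true)
    // ... the compute() body would iterate a JDBC ResultSet here ...
    ctx.markTaskCompleted() // runs even when the iterator is not fully consumed
    assert(connectionClosed)
    println("connection closed: " + connectionClosed)
  }
}
```

The point of registering up front is that the callback fires on task completion even when the consumer stops iterating early, which a close() placed at the end of compute() would miss.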

Re: Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Stephen Boesch
{ () => closeIfNeeded() } or am I misunderstanding? On Tue, Aug 5, 2014 at 3:15 PM, Reynold Xin r...@databricks.com wrote: Thanks. Those are definitely great problems to fix! On Tue, Aug 5, 2014 at 1:11 PM, Stephen Boesch java...@gmail.com wrote: Thanks Reynold, Ted Yu did mention offline

Re: 'Proper' Build Tool

2014-07-28 Thread Stephen Boesch
Hi Steve, I had the opportunity to ask this question at the Summit of Andrew Or. He mentioned that with 1.0 the recommended build tool is maven; sbt is however still supported. You will notice that the dependencies are now completely handled within the maven pom.xml: the SparkBuild.scala

No such file or directory errors running tests

2014-07-27 Thread Stephen Boesch
I have pulled latest from github this afternoon. There are many, many errors: source_home/assembly/target/scala-2.10: No such file or directory This causes many tests to fail. Here is the command line I am running: mvn -Pyarn -Phadoop-2.3 -Phive package test

Re: No such file or directory errors running tests

2014-07-27 Thread Stephen Boesch
. The ScalaTest plugin also supports running only a specific test suite as follows: mvn -Dhadoop.version=... -DwildcardSuites=org.apache.spark.repl.ReplSuite test On Sun, Jul 27, 2014 at 7:07 PM, Stephen Boesch java...@gmail.com wrote: I have pulled latest from github this afternoon

Re: No such file or directory errors running tests

2014-07-27 Thread Stephen Boesch
OK, I'll do it after confirming all the tests run 2014-07-27 19:36 GMT-07:00 Reynold Xin r...@databricks.com: Would you like to submit a pull request? All doc source code are in the docs folder. Cheers. On Sun, Jul 27, 2014 at 7:35 PM, Stephen Boesch java...@gmail.com wrote: i Reynold

Re: No such file or directory errors running tests

2014-07-27 Thread Stephen Boesch
is 0.12.0, and Parquet is only supported in Hive 0.13 (HDP is 0.13) Any idea on what it would take to bump the Hive version up to the latest? Regards, - SteveN On 7/27/14, 19:39, Stephen Boesch java...@gmail.com wrote: OK i'll do it after confirming all the tests run 2014-07-27

Re: SQLQuerySuite error

2014-07-24 Thread Stephen Boesch
are passing after having properly performed the mvn install before running with the mvn -pl sql/core. 2014-07-24 12:04 GMT-07:00 Stephen Boesch java...@gmail.com: Are other developers seeing the following error for the recently added substr() method? If not, any ideas why the following

Current way to include hive in a build

2014-07-17 Thread Stephen Boesch
Having looked at trunk make-distribution.sh, the --with-hive and --with-yarn options are now deprecated. Here is the way I have built it. Added to pom.xml:

<profile>
  <id>cdh5</id>
  <activation>
    <activeByDefault>false</activeByDefault>
  </activation>
  <properties>