Re: Running LocalClusterSparkContext

2015-04-03 Thread Marcelo Vanzin
When was the last time you pulled? That should have been fixed as part of SPARK-6473. Note that the latest master suffers from SPARK-6673 on Windows. On Fri, Apr 3, 2015 at 12:21 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi, I am trying to execute unit tests with LocalClusterSparkContext

Re: Unit test logs in Jenkins?

2015-04-02 Thread Marcelo Vanzin
On Thu, Apr 2, 2015 at 3:01 AM, Steve Loughran ste...@hortonworks.com wrote: That would be really helpful to debug build failures. The scalatest output isn't all that helpful. Potentially an issue with the test runner, rather than the tests themselves. Sorry, that was me over-generalizing.

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Marcelo Vanzin
+1 (non-binding, doc issues aside) Ran a batch of tests against yarn and standalone, including tests for rc2 blockers; all looks fine. On Thu, Mar 5, 2015 at 6:52 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.0! The

Re: [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-04 Thread Marcelo Vanzin
I haven't tested the rc2 bits yet, but I'd consider https://issues.apache.org/jira/browse/SPARK-6144 a serious regression from 1.2 (since it affects existing addFile() functionality if the URL is hdfs:...). Will test other parts separately. On Tue, Mar 3, 2015 at 8:19 PM, Patrick Wendell

Re: [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-04 Thread Marcelo Vanzin
-1 (non-binding) because of SPARK-6144. But aside from that I ran a set of tests on top of standalone and yarn and things look good. On Tue, Mar 3, 2015 at 8:19 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.0! The

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Marcelo Vanzin
Hey Patrick, Do you have a link to the bug related to Python and Yarn? I looked at the blockers in Jira but couldn't find it. On Mon, Feb 23, 2015 at 10:18 AM, Patrick Wendell pwend...@gmail.com wrote: So actually, the list of blockers on JIRA is a bit outdated. These days I won't cut RC1

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Marcelo Vanzin
Hi Tom, are you using an sbt-built assembly by any chance? If so, take a look at SPARK-5808. I haven't had any problems with the maven-built assembly. Setting SPARK_HOME on the executors is a workaround if you want to use the sbt assembly. On Fri, Feb 20, 2015 at 2:56 PM, Tom Graves

Re: SparkSubmit.scala and stderr

2015-02-03 Thread Marcelo Vanzin
Hi Jay, On Tue, Feb 3, 2015 at 6:28 AM, jayhutfles jayhutf...@gmail.com wrote: // Exposed for testing private[spark] var printStream: PrintStream = System.err But as the comment states that it's for testing, maybe I'm misunderstanding its intent... The comment is there to tell

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-30 Thread Marcelo Vanzin
+1 (non-binding) Ran spark-shell and Scala jobs on top of yarn (using the hadoop-2.4 tarball). There's a very slight behavioral change in the API. This code now throws an NPE: new SparkConf().setIfMissing("foo", null) It worked before. It's probably fine, though, since `SparkConf.set` would
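
A minimal repro of the behavioral change described above (a sketch, assuming the 1.2.x SparkConf API):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
    // Per the report above: a no-op in 1.2.0, but throws a
    // NullPointerException in the 1.2.1 RC.
    conf.setIfMissing("foo", null)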

Re: Setting JVM options to Spark executors in Standalone mode

2015-01-16 Thread Marcelo Vanzin
On Fri, Jan 16, 2015 at 10:07 AM, Michel Dufresne sparkhealthanalyt...@gmail.com wrote: Thanks for your reply. I should have mentioned that spark-env.sh is the only option I found because: - I'm creating the SparkConf/SparkContext from a Play Application (therefore I'm not using
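
Since the context is built programmatically in that scenario, one option (a sketch; master URL and JVM flag are placeholders, the conf key exists as of Spark 1.0) is to set the executor JVM flags directly on the SparkConf:

    import org.apache.spark.{SparkConf, SparkContext}

    // spark-env.sh never runs for an embedded application, so the flags
    // go on the conf instead.
    val conf = new SparkConf()
      .setMaster("spark://master:7077")
      .setAppName("embedded-app")
      .set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails")
    val sc = new SparkContext(conf)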

Re: Incorrect Maven Artifact Names

2015-01-14 Thread Marcelo Vanzin
Hi RJ, I think I remember noticing in the past that some Guava metadata ends up overwriting maven-generated metadata in the assembly's manifest. That's probably something we should fix if that still affects the build. That being said, this is probably happening because you're using install-file

Re: Incorrect Maven Artifact Names

2015-01-14 Thread Marcelo Vanzin
On Wed, Jan 14, 2015 at 1:40 PM, RJ Nowling rnowl...@gmail.com wrote: What is the difference between pom and jar packaging? If you do an install on a pom packaging module, it will only install the module's pom file in the target repository. -- Marcelo
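
For illustration, the difference in a module's pom (standard Maven semantics):

    <!-- pom packaging: mvn install publishes only this pom file, no jar -->
    <packaging>pom</packaging>

    <!-- jar packaging (the default): sources are compiled and a jar is
         built and installed alongside the pom -->
    <packaging>jar</packaging>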

Re: cleaning up cache files left by SPARK-2713

2014-12-22 Thread Marcelo Vanzin
https://github.com/apache/spark/pull/3705 On Mon, Dec 22, 2014 at 10:19 AM, Cody Koeninger c...@koeninger.org wrote: Is there a reason not to go ahead and move the _cache and _lock files created by Utils.fetchFiles into the work directory, so they can be cleaned up more easily? I saw comments

Re: Protobuf version in mvn vs sbt

2014-12-05 Thread Marcelo Vanzin
When building against Hadoop 2.x, you need to enable the appropriate profile, aside from just specifying the version. e.g. -Phadoop-2.3 for Hadoop 2.3. On Fri, Dec 5, 2014 at 12:51 PM, spark.dubovsky.ja...@seznam.cz wrote: Hi devs, I play with your amazing Spark here in Prague for some
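
For example (flags as documented for the 1.x Maven build):

    # Profile and version must both be set when building against Hadoop 2.3:
    mvn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package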

Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams ryan.blake.willi...@gmail.com wrote: Following on Mark's Maven examples, here is another related issue I'm having: I'd like to compile just the `core` module after a `mvn clean`, without building an assembly JAR first. Is this possible? Out of

Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 3:39 PM, Ryan Williams ryan.blake.willi...@gmail.com wrote: Marcelo: by my count, there are 19 maven modules in the codebase. I am typically only concerned with core (and therefore its two dependencies as well, `network/{shuffle,common}`). But you only need to compile

Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 4:40 PM, Ryan Williams ryan.blake.willi...@gmail.com wrote: But you only need to compile the others once. once... every time I rebase off master, or am obliged to `mvn clean` by some other build-correctness bug, as I said before. In my experience this works out to a few
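
A sketch of the workflow being suggested, using standard Maven flags:

    # After a rebase or mvn clean: install everything once so that core's
    # dependencies resolve from the local repository.
    mvn -DskipTests install
    # From then on, recompile only the module being worked on:
    mvn -pl core -DskipTests compile
    # Or let Maven also rebuild core's in-tree dependencies in one go:
    mvn -pl core -am -DskipTests compile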

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-13 Thread Marcelo Vanzin
Hello there, So I just took a quick look at the pom and I see two problems with it. - activeByDefault does not work like you think it does. It only activates by default if you do not explicitly activate other profiles. So if you do mvn package, scala-2.10 will be activated; but if you do mvn
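
A sketch of the two activation styles under discussion (standard Maven semantics; profile ids are illustrative):

    <profiles>
      <profile>
        <id>scala-2.10</id>
        <activation>
          <!-- Only applies when no other profile is explicitly activated.
               An alternative is activating on the absence of a property:
               <property><name>!scala-2.11</name></property> -->
          <activeByDefault>true</activeByDefault>
        </activation>
      </profile>
      <profile>
        <id>scala-2.11</id>
        <activation>
          <!-- Activated by passing -Dscala-2.11 on the command line -->
          <property><name>scala-2.11</name></property>
        </activation>
      </profile>
    </profiles>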

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-13 Thread Marcelo Vanzin
Hey Patrick, On Thu, Nov 13, 2014 at 10:49 AM, Patrick Wendell pwend...@gmail.com wrote: I'm not sure chaining activation works like that. At least in my experience activation based on properties only works for properties explicitly specified at the command line rather than declared elsewhere

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-13 Thread Marcelo Vanzin
On Thu, Nov 13, 2014 at 10:58 AM, Patrick Wendell pwend...@gmail.com wrote: That's true, but note the code I posted activates a profile based on the lack of a property being set, which is why it works. Granted, I did not test that if you activate the other profile, the one with the property

Re: src/main/resources/kv1.txt not found in example of HiveFromSpark

2014-11-05 Thread Marcelo Vanzin
Yeah, the code looks for the file in the source location, not in the packaged location. It's in the root of the examples jar; you can extract it to src/main/resources/kv1.txt in the local directory (creating the subdirs) and then you can run the example. Probably should be fixed though (bonus if
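
A possible way to pull the file out (paths are illustrative):

    # kv1.txt sits at the root of the examples jar:
    mkdir -p src/main/resources
    unzip -j lib/spark-examples-*.jar kv1.txt -d src/main/resources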

Re: Hadoop configuration for checkpointing

2014-11-04 Thread Marcelo Vanzin
On Tue, Nov 4, 2014 at 9:34 AM, Cody Koeninger c...@koeninger.org wrote: 2. Is there a reason StreamingContext.getOrCreate defaults to a blank hadoop configuration rather than org.apache.spark.deploy.SparkHadoopUtil.get.conf, which would pull values from spark config? This is probably
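
A sketch of the workaround implied above (signature as in the 1.x API; checkpointPath and createContext are assumed to be defined by the caller):

    import org.apache.spark.deploy.SparkHadoopUtil
    import org.apache.spark.streaming.StreamingContext

    // Pass a Configuration built from the Spark config instead of relying
    // on the blank-Configuration default parameter.
    val ssc = StreamingContext.getOrCreate(
      checkpointPath,
      createContext _,
      hadoopConf = SparkHadoopUtil.get.conf)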

Re: HiveContext bug?

2014-10-27 Thread Marcelo Vanzin
Well, looks like a huge coincidence, but this was just sent to github: https://github.com/apache/spark/pull/2967 On Mon, Oct 27, 2014 at 3:25 PM, Marcelo Vanzin van...@cloudera.com wrote: Hey guys, I've been using the HiveFromSpark example to test some changes and I ran into an issue

Re: scalastyle annoys me a little bit

2014-10-24 Thread Marcelo Vanzin
On Fri, Oct 24, 2014 at 12:59 PM, Koert Kuipers ko...@tresata.com wrote: mvn clean package -DskipTests takes about 30 mins for me. That's painful since it's needed for the tests. Does anyone know any tricks to speed it up? (besides getting a better laptop). Does zinc help? I noticed this too,

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Marcelo Vanzin
resource or 2) add dynamic resource management for Yarn mode is very much wanted. Jianshi On Thu, Oct 23, 2014 at 5:36 AM, Marcelo Vanzin van...@cloudera.com wrote: On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar ashwinshanka...@gmail.com wrote: That's not something you might want to do usually

Re: scalastyle annoys me a little bit

2014-10-23 Thread Marcelo Vanzin
I know this is all very subjective, but I find long lines difficult to read. I also like how 100 characters fit in my editor setup fine (split wide screen), while a longer line length would mean I can't have two buffers side-by-side without horizontal scrollbars. I think it's fine to add a

Re: Multitenancy in Spark - within/across spark context

2014-10-22 Thread Marcelo Vanzin
Hi Ashwin, Let me try to answer to the best of my knowledge. On Wed, Oct 22, 2014 at 11:47 AM, Ashwin Shankar ashwinshanka...@gmail.com wrote: Here are my questions : 1. Sharing spark context : How exactly multiple users can share the cluster using same spark context ? That's not
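
For the within-one-context case, the usual mechanism is the fair scheduler with per-user pools; a sketch (master and pool name are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("shared-server")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Per-thread setting: jobs submitted from this thread go to userA's pool.
    sc.setLocalProperty("spark.scheduler.pool", "userA")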

Re: Multitenancy in Spark - within/across spark context

2014-10-22 Thread Marcelo Vanzin
On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar ashwinshanka...@gmail.com wrote: That's not something you might want to do usually. In general, a SparkContext maps to a user application My question was basically this. In this page in the official doc, under Scheduling within an application

Re: Raise Java dependency from 6 to 7

2014-10-18 Thread Marcelo Vanzin
Hadoop, for better or worse, depends on an ancient version of Jetty (6), which is even in a different package. So Spark (or anyone trying to use a newer Jetty) is lucky on that front... IIRC Hadoop is planning to move to Java 7-only starting with 2.7. Java 7 is also supposed to be EOL some time

Re: Scalastyle improvements / large code reformatting

2014-10-13 Thread Marcelo Vanzin
Another option is to add new style rules that trigger too many errors as warnings, and slowly clean them up. This means that reviewers will be burdened with manually enforcing the rules for a while, and we need to remember to turn them to errors once some threshold is reached. (The Hadoop build

Re: guava version conflicts

2014-09-22 Thread Marcelo Vanzin
Hi Cody, I'm still writing a test to make sure I understood exactly what's going on here, but from looking at the stack trace, it seems like the newer Guava library is picking up the Optional class from the Spark assembly. Could you try one of the options that put the user's classpath before the
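
For reference, the setting referred to (an experimental 1.x option) is an ordinary conf key:

    import org.apache.spark.SparkConf

    // Experimental in 1.x: load classes from the user's jars before the
    // copies bundled in the Spark assembly.
    val conf = new SparkConf().set("spark.files.userClassPathFirst", "true")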

Re: guava version conflicts

2014-09-22 Thread Marcelo Vanzin
at 12:46 PM, Cody Koeninger c...@koeninger.org wrote: We're using Mesos, is there a reasonable expectation that spark.files.userClassPathFirst will actually work? On Mon, Sep 22, 2014 at 1:42 PM, Marcelo Vanzin van...@cloudera.com wrote: Hi Cody, I'm still writing a test to make sure I

Re: guava version conflicts

2014-09-22 Thread Marcelo Vanzin
the same version of guava 14 that spark is using. Obviously things break whenever a guava 15 / 16 feature is used at runtime, so a long term solution is needed. On Mon, Sep 22, 2014 at 3:13 PM, Marcelo Vanzin van...@cloudera.com wrote: Hmmm, a quick look at the code indicates this should work

Re: guava version conflicts

2014-09-20 Thread Marcelo Vanzin
Hmm, looks like the hack to maintain backwards compatibility in the Java API didn't work that well. I'll take a closer look at this when I get to work on Monday. On Fri, Sep 19, 2014 at 10:30 PM, Cody Koeninger c...@koeninger.org wrote: After the recent spark project changes to guava shading,

Re: How to kill a Spark job running in local mode programmatically ?

2014-09-05 Thread Marcelo Vanzin
I don't think that's possible at the moment, mainly because SparkSubmit expects it to be run from the command line, and not programmatically, so it doesn't return anything that can be used to control what's going on. You may try to interrupt the thread calling into SparkSubmit, but that might not

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Marcelo Vanzin
+1 (non-binding) - checked checksums of a few packages - ran a few jobs against yarn client/cluster using the hadoop2.3 package - played with spark-shell in yarn-client mode On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Marcelo Vanzin
In our internal projects we use this bit of code in the maven pom to create a properties file with build information (sorry for the messy indentation). Then we have code that reads this property file somewhere and provides that info. This should make it easier to not have to change version numbers
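
The reading side can be as simple as the sketch below (file name and keys are illustrative):

    import java.util.Properties

    // Load the build-info file that the Maven build placed in the jar.
    val props = new Properties()
    val in = getClass.getResourceAsStream("/build-info.properties")
    try props.load(in) finally in.close()
    val buildVersion = props.getProperty("version")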

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Marcelo Vanzin
need to do this. spark.driver.extraJavaOptions -Dfoo.bar.baz=23 Do those work for you? On Wed, Jul 30, 2014 at 1:32 PM, Marcelo Vanzin van...@cloudera.com wrote: Hi Cody, Could you file a bug for this if there isn't one already? For system properties SparkSubmit
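
Spelled out as a full invocation (property values taken from the message above; class and jar names are placeholders):

    spark-submit \
      --conf spark.driver.extraJavaOptions=-Dfoo.bar.baz=23 \
      --conf spark.executor.extraJavaOptions=-Dfoo.bar.baz=23 \
      --class com.example.MyApp myapp.jar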

SparkContext.hadoopConfiguration vs. SparkHadoopUtil.newConfiguration()

2014-08-01 Thread Marcelo Vanzin
Hi all, While working on some seemingly unrelated code, I ran into this issue where spark.hadoop.* configs were not making it to the Configuration objects in some parts of the code. I was trying to do that to avoid having to do dirty tricks with the classpath while running tests, but that's a
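
To make the inconsistency concrete (a sketch; master, app name and key are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.deploy.SparkHadoopUtil

    val sc = new SparkContext(new SparkConf()
      .setMaster("local").setAppName("demo")
      .set("spark.hadoop.fs.defaultFS", "hdfs://nn:8020"))

    // SparkContext.hadoopConfiguration applies spark.hadoop.* keys:
    sc.hadoopConfiguration.get("fs.defaultFS")  // "hdfs://nn:8020"

    // whereas SparkHadoopUtil.newConfiguration() at the time built a
    // Configuration without consulting the SparkConf, which is the gap
    // described above:
    SparkHadoopUtil.get.newConfiguration().get("fs.defaultFS")  // default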

Re: replacement for SPARK_JAVA_OPTS

2014-07-30 Thread Marcelo Vanzin
Hi Cody, Could you file a bug for this if there isn't one already? For system properties SparkSubmit should be able to read those settings and do the right thing, but that obviously won't work for other JVM options... the current code should work fine in cluster mode though, since the driver is

RFC: [SPARK-529] Create constants for known config variables.

2014-06-23 Thread Marcelo Vanzin
I started with some code to implement an idea I had for SPARK-529, and before going much further (since it's a large and kinda boring change) I'd like to get some feedback from people. Current code is at: https://github.com/vanzin/spark/tree/SPARK-529 There are still some parts I haven't fully
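
Illustrative only (not the code in the linked branch), the general shape of such constants might be:

    case class ConfigVar[T](key: String, default: T)

    object ConfigVars {
      val ExecutorMemory  = ConfigVar("spark.executor.memory", "1g")
      val ShuffleCompress = ConfigVar("spark.shuffle.compress", true)
    }

    // Callers reference ConfigVars.ExecutorMemory.key rather than
    // repeating raw strings across the codebase.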

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-06-02 Thread Marcelo Vanzin
Hi Patrick, Thanks for all the explanations, that makes sense. @DeveloperApi worries me a little bit especially because of the things Colin mentions - it's sort of hard to make people move off of APIs, or support different versions of the same API. But maybe if expectations (or lack thereof) are

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Marcelo Vanzin
On Fri, May 30, 2014 at 12:05 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: I don't know if Scala provides any mechanisms to do this beyond what Java provides. In fact it does. You can say something like private[foo] and the annotated element will be visible for all classes under foo (where
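
A small example of the mechanism (class and method names are illustrative):

    package org.apache.spark.util

    // Visible to all classes under org.apache.spark, hidden from user code:
    private[spark] class InternalHelper {
      // Narrower still: visible only within org.apache.spark.util
      private[util] def helperOnly(): Unit = ()
    }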

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Marcelo Vanzin
Hi Patrick, On Fri, May 30, 2014 at 2:11 PM, Patrick Wendell pwend...@gmail.com wrote: 2. private[spark] 3. @Experimental or @DeveloperApi I understand @Experimental, but when would you use @DeveloperApi instead of private[spark]? Seems to me that, for the API user, they both mean very

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Marcelo Vanzin
Hi Kevin, On Thu, May 22, 2014 at 9:49 AM, Kevin Markey kevin.mar...@oracle.com wrote: The FS closed exception only affects the cleanup of the staging directory, not the final success or failure. I've not yet tested the effect of changing my application's initialization, use, or closing of

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-20 Thread Marcelo Vanzin
+1 (non-binding) I have: - checked signatures and checksums of the files - built the code from the git repo using both sbt and mvn (against hadoop 2.3.0) - ran a few simple jobs in local, yarn-client and yarn-cluster mode Haven't explicitly tested any of the recent fixes, streaming nor sql. On

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Marcelo Vanzin
) res0: Option[String] = Some(a) scala sys.props.get(bar) res1: Option[String] = Some(b) - Patrick On Wed, Apr 30, 2014 at 11:29 AM, Marcelo Vanzin van...@cloudera.com wrote: Hello all, Maybe my brain is not evolved enough to be able to trace through what happens with command-line

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Marcelo Vanzin
On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell pwend...@gmail.com wrote: Yeah I think the problem is that the spark-submit script doesn't pass the argument array to spark-class in the right way, so any quoted strings get flattened. I think we'll need to figure out how to do this correctly

Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Marcelo Vanzin
a bit more and see if we can fix it. Pretty sure we aren't passing these argument arrays around correctly in bash. On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin van...@cloudera.com wrote: On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell pwend...@gmail.com wrote: Yeah I think the problem
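
The canonical bash fix for this class of bug is forwarding the argument array with "$@" (a sketch of the pattern, not the actual script):

    #!/usr/bin/env bash
    # "$@" preserves each original argument, including embedded spaces;
    # $* or an unquoted $@ would flatten them into one string.
    exec "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.SparkSubmit "$@"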

Re: Spark 1.0.0 rc3

2014-04-29 Thread Marcelo Vanzin
Hi Patrick, What are the expectations / guarantees on binary compatibility between 0.9 and 1.0? You mention some API changes, which kinda hint that binary compatibility has already been broken, but just wanted to point out there are other cases. e.g.: Exception in thread main

SparkListener questions

2014-04-18 Thread Marcelo Vanzin
Hello all, I'm currently taking a look at how to hook Spark with the Application Timeline Server (ATS) work going on in Yarn (YARN-1530). I've got a reasonable idea of how the Yarn part works, and the basic idea of what needs to be done in Spark, but I've run into a couple of issues with the
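
The Spark side of such a hook would plug in through the listener API, roughly (a sketch; the sink here is just stdout):

    import org.apache.spark.scheduler._

    // Forward job lifecycle events to an external system such as the ATS.
    class TimelineListener extends SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit =
        println(s"job ${jobStart.jobId} started")
      override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
        println(s"job ${jobEnd.jobId} finished")
    }

    // Registered on an existing context:
    // sc.addSparkListener(new TimelineListener)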

Re: RFC: varargs in Logging.scala?

2014-04-11 Thread Marcelo Vanzin
On Thu, Apr 10, 2014 at 5:46 PM, Michael Armbrust mich...@databricks.com wrote: ... all of them suffer from the fact that the log message needs to be built even though it might not be used. This is not true of the current implementation (and this is actually why Spark has a logging trait
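
That is, the trait's methods take the message by name, so the argument expression is only evaluated when the level is enabled (a sketch of the pattern, not the actual trait):

    trait Logging {
      protected def isInfoEnabled: Boolean = true  // stand-in for the real check

      // `msg: => String` defers evaluation until (and unless) it is used.
      protected def logInfo(msg: => String): Unit =
        if (isInfoEnabled) println(msg)
    }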

RFC: varargs in Logging.scala?

2014-04-10 Thread Marcelo Vanzin
Hey there, While going through the code to get the hang of things, I've noticed several different styles of logging. They all have some downside (readability being one of them in certain cases), but all of them suffer from the fact that the log message needs to be built even though it might not be
