Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

2016-05-13 Thread Marcelo Vanzin
On Fri, May 13, 2016 at 10:18 AM, Sean Busbey wrote: > I think LimitedPrivate gets a bad rap due to the way it is misused in > Hadoop. The use case here -- "we offer this to developers of > intermediate layers; those willing to update their software as we > update ours" I

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Marcelo Vanzin
On Mon, May 9, 2016 at 3:34 PM, Matt Cheah wrote: > @Marcelo: Interesting - why would this manifest on the YARN-client side > though (as Spark is the client to YARN in this case)? Spark as a client > shouldn’t care about what auxiliary services are on the YARN cluster. The

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Marcelo Vanzin
Hi Jesse, On Mon, May 9, 2016 at 2:52 PM, Jesse F Chen wrote: > Sean - thanks. definitely related to SPARK-12154. > Is there a way to continue use Jersey 1 for existing working environment? The error you're getting is because of a third-party extension that tries to talk to

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
y >>> package. >>> >>> I hope before long the hbase-spark module in HBase can arrive at a state >>> which we can advertise as mature - but we're not there yet. >>> >>> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <van...@cloudera.com&

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
il.com> wrote: > There is an Open JIRA for fixing the documentation: HBASE-15473 > > I would say the refguide link you provided should not be considered as > complete. > > Note it is marked as Blocker by Sean B. > > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <van...@c

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
bq. it's actually in use right now in spite of not being in any upstream > HBase release > > If it is not in upstream, then it is not relevant for discussion on Apache > mailing list. > > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <van...@cloudera.com> > wrote: >>

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
lease see HBASE-15333 'Enhance the filter to handle short, integer, long, > float and double' which is a bug fix. > > Please exclude presence of related of module in vendor distro from this > discussion. > > Thanks > > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin <van...@c

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
maybe create a separate tarball for them; removing HBaseTest only solves one of the dependency problems. Since we have examples for flume and kafka, for example, the Spark distribution ends up shipping flume and kafka jars (and a bunch of other things). > On Tue, Apr 19, 2016 at 10:15 AM, Marc

Re: YARN Shuffle service and its compatibility

2016-04-18 Thread Marcelo Vanzin
On Mon, Apr 18, 2016 at 3:09 PM, Reynold Xin wrote: > IIUC, the reason for that PR is that they found the string comparison to > increase the size in large shuffles. Maybe we should add the ability to > support the short name to Spark 1.6.2? Is that something that really

Re: YARN Shuffle service and its compatibility

2016-04-18 Thread Marcelo Vanzin
On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin wrote: > The bigger problem is that it is much easier to maintain backward > compatibility rather than dictating forward compatibility. For example, as > Marcin said, if we come up with a slightly different shuffle layout to >

Re: YARN Shuffle service and its compatibility

2016-04-18 Thread Marcelo Vanzin
On Mon, Apr 18, 2016 at 1:53 PM, Reynold Xin wrote: > That's not the only one. For example, the hash shuffle manager has been off > by default since Spark 1.2, and we'd like to remove it in 2.0: > https://github.com/apache/spark/pull/12423 If I understand things correctly,

Re: Build changes after SPARK-13579

2016-04-04 Thread Marcelo Vanzin
No, tests (except pyspark) should work without having to package anything first. On Mon, Apr 4, 2016 at 9:58 PM, Koert Kuipers <ko...@tresata.com> wrote: > do i need to run sbt package before doing tests? > > On Mon, Apr 4, 2016 at 11:00 PM, Marcelo Vanzin <van...@cloudera.com

Build changes after SPARK-13579

2016-04-04 Thread Marcelo Vanzin
Hey all, We merged SPARK-13579 today, and if you're like me and have your hands automatically type "sbt assembly" anytime you're building Spark, that won't work anymore. You should now use "sbt package"; you'll still need "sbt assembly" if you require one of the remaining assemblies (streaming
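For anyone retraining their fingers, a rough sketch of the new workflow described above (the streaming-assembly module name is illustrative, not confirmed against the actual SPARK-13579 patch):

```shell
# Before SPARK-13579, a full assembly jar was needed for most work.
# After the change, a plain package build is enough for development:
./build/sbt package

# "sbt assembly" is only needed for the remaining assembly modules,
# e.g. a streaming connector assembly (module name may vary by branch):
./build/sbt streaming-kafka-assembly/assembly
```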

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-28 Thread Marcelo Vanzin
Finally got some internal feedback on this, and we're ok with requiring people to deploy jdk8 for 2.0, so +1 too. On Mon, Mar 28, 2016 at 1:15 PM, Luciano Resende wrote: > +1, I also checked with few projects inside IBM that consume Spark and they > seem to be ok with the

Re: SPARK-13843 Next steps

2016-03-28 Thread Marcelo Vanzin
On Mon, Mar 28, 2016 at 8:33 AM, Cody Koeninger wrote: > There are compatibility problems with the java namespace changing > (e.g. access to private[spark]) I think it would be fine to keep the package names for backwards compatibility, but I think if these external projects

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Marcelo Vanzin
e of my earlier replies, it should be even possible to just do everything in a single job - compile for java 7 and still be able to test things in 1.8, including lambdas, which seems to be the main thing you were worried about. > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <van..

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Marcelo Vanzin
On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin wrote: > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11 are not > binary compatible, whereas JVM 7 and 8 are binary compatible except certain

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Marcelo Vanzin
On Thu, Mar 24, 2016 at 2:41 PM, Jakob Odersky wrote: > You can, but since it's going to be a maintainability issue I would > argue it is in fact a problem. Every thing you choose to support generates a maintenance burden. Support 3 versions of Scala would be a huge

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Marcelo Vanzin
Hi Jakob, On Thu, Mar 24, 2016 at 2:29 PM, Jakob Odersky wrote: > Reynold's 3rd point is particularly strong in my opinion. Supporting > Consider what would happen if Spark 2.0 doesn't require Java 8 and > hence not support Scala 2.12. Will it be stuck on an older version >

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Marcelo Vanzin
On Thu, Mar 24, 2016 at 10:13 AM, Reynold Xin wrote: > Yes So is it safe to say the only hard requirements for Java 8 in your list is (4)? (1) and (3) are infrastructure issues. Yes, it sucks to maintain more testing infrastructure and potentially more complicated build

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Marcelo Vanzin
On Thu, Mar 24, 2016 at 9:54 AM, Koert Kuipers wrote: > i guess what i am saying is that in a yarn world the only hard restrictions > left are the containers you run in, which means the hadoop version, java > version and python version (if you use python). It is

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Marcelo Vanzin
On Thu, Mar 24, 2016 at 1:04 AM, Reynold Xin wrote: > I actually talked quite a bit today with an engineer on the scala compiler > team tonight and the scala 2.10 + java 8 combo should be ok. The latest > Scala 2.10 release should have all the important fixes that are needed

Re: SPARK-13843 Next steps

2016-03-22 Thread Marcelo Vanzin
+1 for getting flume back. On Tue, Mar 22, 2016 at 12:27 AM, Kostas Sakellis wrote: > Hello all, > > I'd like to close out the discussion on SPARK-13843 by getting a poll from > the community on which components we should seriously reconsider re-adding > back to Apache

Re: SPARK-13843 and future of streaming backends

2016-03-20 Thread Marcelo Vanzin
Hi Reynold, thanks for the info. On Thu, Mar 17, 2016 at 2:18 PM, Reynold Xin wrote: > If one really feels strongly that we should go through all the overhead to > setup an ASF subproject for these modules that won't work with the new > structured streaming, and want to

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
On Thu, Mar 17, 2016 at 12:01 PM, Cody Koeninger wrote: > i. An ASF project can clearly decide that some of its code is no > longer worth maintaining and delete it. This isn't really any > different. It's still apache licensed so ultimately whoever wants the > code can get

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
On Fri, Mar 18, 2016 at 10:09 AM, Jean-Baptiste Onofré wrote: > a project can have multiple repos: it's what we have in ServiceMix, in > Karaf. > For the *-extra on github, if the code has been in the ASF, the PMC members > have to vote to move the code on *-extra. That's

SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
Hello all, Recently a lot of the streaming backends were moved to a separate project on github and removed from the main Spark repo. While I think the idea is great, I'm a little worried about the execution. Some concerns were already raised on the bug mentioned above, but I'd like to have a

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
aware of a discussion in Dev list about this - agree with most of >> the observations. >> In addition, I did not see PMC signoff on moving (sub-)modules out. >> >> Regards >> Mridul >> >> >> >> On Thursday, March 17, 2016, Marcelo Vanzin <van...@cloudera.com>

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
Also, just wanted to point out something: On Thu, Mar 17, 2016 at 2:18 PM, Reynold Xin wrote: > Thanks for initiating this discussion. I merged the pull request because it > was unblocking another major piece of work for Spark 2.0: not requiring > assembly jars While I do

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Marcelo Vanzin
Hi Steve, thanks for the write up. On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran wrote: > If you want a separate project, eg. SPARK-EXTRAS, then it *generally* needs > to go through incubation. While normally its the incubator PMC which > sponsors/oversees the

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Marcelo Vanzin
On Fri, Mar 18, 2016 at 2:12 PM, chrismattmann wrote: > So, my comment here is that any code *cannot* be removed from an Apache > project if there is a VETO issued which so far I haven't seen, though maybe > Marcelo can clarify that. No, my intention was not to veto the

Re: pull request template

2016-03-15 Thread Marcelo Vanzin
" with instructions to delete >> the text and replace with a description. This keeps the boilerplate >> titles out of the commit message. >> >> The special character and post processing just takes that a step further. >> >> On Sat, Mar 12, 2016 at 1:31 AM, Marce

Re: spark 2.0 logging binary incompatibility

2016-03-15 Thread Marcelo Vanzin
Logging is a "private[spark]" class so binary compatibility is not important at all, because code outside of Spark isn't supposed to use it. Mixing Spark library versions is also not recommended, not just because of this reason. There have been other binary changes in the Logging class in the

Re: SparkConf constructor now private

2016-03-15 Thread Marcelo Vanzin
Oh, my bad. I think I left that from a previous part of the patch and forgot to revert it. Will fix. On Tue, Mar 15, 2016 at 7:37 AM, Koert Kuipers <ko...@tresata.com> wrote: > in this commit > > 8301fadd8d269da11e72870b7a889596e3337839 > Author: Marcelo Vanzin <van...@cloude

Re: pull request template

2016-03-11 Thread Marcelo Vanzin
Hey all, Just wanted to ask: how do people like this new template? While I think it's great to have instructions for people to write proper commit messages, I think the current template has a few downsides. - I tend to write verbose commit messages already when I'm preparing a PR. Now when I

Re: Build fails

2016-02-24 Thread Marcelo Vanzin
The error is right there. Just read the output more carefully. On Wed, Feb 24, 2016 at 11:37 AM, Minudika Malshan wrote: > [INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-versions) @ > spark-parent_2.11 --- > [WARNING] Rule 0:

Re: Build fails

2016-02-24 Thread Marcelo Vanzin
Well, did you do what the message instructed you to do and looked above the message you copied for more specific messages for why the build failed? On Wed, Feb 24, 2016 at 11:28 AM, Minudika Malshan wrote: > Hi, > > I am trying to build from spark source code which was

Re: spark on yarn wastes one box (or 1 GB on each box) for am container

2016-02-09 Thread Marcelo Vanzin
On Tue, Feb 9, 2016 at 12:16 PM, Jonathan Kelly wrote: > And we do set yarn.app.mapreduce.am.labels=CORE That sounds very mapreduce-specific, so I doubt Spark (or anything non-MR) would honor it. -- Marcelo

Re: spark on yarn wastes one box (or 1 GB on each box) for am container

2016-02-09 Thread Marcelo Vanzin
You should be able to use spark.yarn.am.nodeLabelExpression if your version of YARN supports node labels (and you've added a label to the node where you want the AM to run). On Tue, Feb 9, 2016 at 9:51 AM, Alexander Pivovarov wrote: > Am container starts first and yarn
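A hedged sketch of what that setup might look like end to end; the label name, hostname, and application jar are placeholders, and the YARN commands assume a version with node-label support:

```shell
# On the YARN side: define a cluster node label and attach it to the
# node where you want the AM to run.
yarn rmadmin -addToClusterNodeLabels "am-node"
yarn rmadmin -replaceLabelsOnNode "host1.example.com=am-node"

# On the Spark side: point the AM at that label at submit time.
spark-submit \
  --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=am-node \
  myapp.jar
```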

Re: BUILD FAILURE at Spark Project Test Tags for 2.11.7?

2016-01-20 Thread Marcelo Vanzin
On Wed, Jan 20, 2016 at 11:46 AM, Jacek Laskowski wrote: > /Users/jacek/dev/oss/spark/tags/target/scala-2.11/classes... > [error] Cannot run program "javac": error=2, No such file or directory That doesn't exactly look like a Spark problem. -- Marcelo

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-18 Thread Marcelo Vanzin
+1 (non-binding) Tested the without-hadoop binaries (so didn't run Hive-related tests) with a test batch including standalone / client, yarn / client and cluster, including core, mllib and streaming (flume and kafka). On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Marcelo Vanzin
I was going to say that spark.executor.port is not used anymore in 1.6, but damn, there's still that akka backend hanging around there even when netty is being used... we should fix this, should be a simple one-liner. On Wed, Dec 16, 2015 at 2:35 PM, singinpirate

Re: Re: does spark really support label expr like && or || ?

2015-12-16 Thread Marcelo Vanzin
On Wed, Dec 16, 2015 at 6:31 PM, Allen Zhang wrote: > so , my question is does the spark.yarn.executor.nodeLabelExpression and > spark.yarn.am.nodeLabelExpression really support "EXPRESSION" like and &&, > or ||, or even ! and so on. Spark doesn't do anything with those

VerifyError running Spark SQL code?

2015-11-25 Thread Marcelo Vanzin
I've been running into this error when running Spark SQL recently; no matter what I try (completely clean build or anything else) doesn't seem to fix it. Anyone has some idea of what's wrong? [info] Exception encountered when attempting to run a suite with class name:

Re: VerifyError running Spark SQL code?

2015-11-25 Thread Marcelo Vanzin
mode) > > on OSX. > > On Wed, Nov 25, 2015 at 4:51 PM, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> I've been running into this error when running Spark SQL recently; no >> matter what I try (completely clean build or anything else) doesn't >> seem to f

Re: VerifyError running Spark SQL code?

2015-11-25 Thread Marcelo Vanzin
relativeSD match { +case Literal(d: Double, DoubleType) => d +case _ => + throw new AnalysisException("The second argument should be a double literal.") + } + } On Wed, Nov 25, 2015 at 5:29 PM, Marcelo Vanzin <van...@cloudera.com> wrote: > $ java -version > j

Re: A proposal for Spark 2.0

2015-11-10 Thread Marcelo Vanzin
On Tue, Nov 10, 2015 at 6:51 PM, Reynold Xin wrote: > I think we are in agreement, although I wouldn't go to the extreme and say > "a release with no new features might even be best." > > Can you elaborate "anticipatory changes"? A concrete example or so would be > helpful.

Re: Master build fails ?

2015-11-06 Thread Marcelo Vanzin
On Fri, Nov 6, 2015 at 2:21 AM, Steve Loughran wrote: > Maven's closest-first policy has a different flaw, namely that its not always > obvious why a guava 14.0 that is two hops of transitiveness should take > priority over a 16.0 version three hops away. Especially when

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Marcelo Vanzin
The way I read Tom's report, it just affects a long-deprecated command line option (--num-workers). I wouldn't block the release for it. On Fri, Nov 6, 2015 at 12:10 PM, Sean Owen wrote: > Hm, if I read that right, looks like --num-executors doesn't work at > all on YARN

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
Does anyone know how to get something similar to "mvn dependency:tree" from sbt? mvn dependency:tree with hadoop 2.6.0 does not show any instances of guava 16... On Thu, Nov 5, 2015 at 11:37 AM, Ted Yu wrote: > build/sbt -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver >

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
, 2015 at 11:55 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > Answering my own question: "dependency-graph" > > On Thu, Nov 5, 2015 at 11:44 AM, Marcelo Vanzin <van...@cloudera.com> wrote: >> Does anyone know how to get something similar to "mv

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
Seems like it's an sbt issue, not a maven one, so "dependency:tree" might not help. Still, the command line would be helpful. I use sbt and don't see this. On Thu, Nov 5, 2015 at 10:44 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > Hi Jeff, > > On Tue, Nov 3, 2015 at

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
Hi Jeff, On Tue, Nov 3, 2015 at 2:50 AM, Jeff Zhang wrote: > Looks like it's due to guava version conflicts, I see both guava 14.0.1 and > 16.0.1 under lib_managed/bundles. Anyone meet this issue too ? What command line are you using to build? Can you run "mvn dependency:tree"

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
Answering my own question: "dependency-graph" On Thu, Nov 5, 2015 at 11:44 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > Does anyone know how to get something similar to "mvn dependency:tree" from > sbt? > > mvn dependency:tree with hadoop 2.6.0 does
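For reference, the sbt counterpart of `mvn dependency:tree` at the time came from the sbt-dependency-graph plugin; a sketch of how it is typically wired up (plugin coordinates and version are illustrative for an sbt 0.13-era build):

```shell
# Register the plugin in project/plugins.sbt:
echo 'addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")' \
  >> project/plugins.sbt

# Then print the resolved dependency tree for a module:
./build/sbt "core/dependencyGraph"
```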

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
FYI I pushed a fix for this to github; so if you pull everything should work now. On Thu, Nov 5, 2015 at 12:07 PM, Marcelo Vanzin <van...@cloudera.com> wrote: > Man that command is slow. Anyway, it seems guava 16 is being brought > transitively by curator 2.6.0 which should have bee

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Marcelo Vanzin
What is broken? Looks fine to me. On Fri, Oct 2, 2015 at 10:49 AM, andy petrella wrote: > Yup folks, > > I've been reported by someone building the Spark-Notebook that repo1 is > apparently broken for scala 2.10 and spark 1.5.0. > > Check this >

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Marcelo Vanzin
Hmm, now I get that too (did not get it before). Maybe the servers are having issues. On Fri, Oct 2, 2015 at 11:05 AM, Ted Yu wrote: > I tried to access > https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom > on Chrome

Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-25 Thread Marcelo Vanzin
Ignoring my previous question, +1. Tested several different jobs on YARN and standalone with dynamic allocation on. On Fri, Sep 25, 2015 at 11:32 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > Mostly for my education (I hope), but I was testing > "spark-1.5.1-bin-without-had

Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-25 Thread Marcelo Vanzin
Mostly for my education (I hope), but I was testing "spark-1.5.1-bin-without-hadoop.tgz" assuming it would contain everything (including HiveContext support), just without the Hadoop common jars in the assembly. But HiveContext is not there. Is this expected? On Thu, Sep 24, 2015 at 12:27 AM,

Re: RFC: packaging Spark without assemblies

2015-09-25 Thread Marcelo Vanzin
On Wed, Sep 23, 2015 at 4:43 PM, Patrick Wendell wrote: > For me a key step in moving away would be to fully audit/understand > all compatibility implications of removing it. If other people are > supportive of this plan I can offer to help spend some time thinking > about any

RFC: packaging Spark without assemblies

2015-09-23 Thread Marcelo Vanzin
Hey all, This is something that we've discussed several times internally, but never really had much time to look into; but as time passes by, it's increasingly becoming an issue for us and I'd like to throw some ideas around about how to fix it. So, without further ado:

Re: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-15 Thread Marcelo Vanzin
That test explicitly sets the number of executor cores to 32. object TestHive extends TestHiveContext( new SparkContext( System.getProperty("spark.sql.test.master", "local[32]"), On Mon, Sep 14, 2015 at 11:22 PM, Reynold Xin wrote: > Yea I think this is where
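Given that the suite reads its master from a system property (as the snippet itself shows), one hedged workaround is to override it at test time; the suite selector syntax below assumes an sbt 0.13-style build:

```shell
# Lower the local executor core count for the Hive test context by
# overriding the spark.sql.test.master system property:
./build/sbt -Dspark.sql.test.master="local[4]" \
  "hive/test-only *HiveCompatibilitySuite"
```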

Re: Deserializing JSON into Scala objects in Java code

2015-09-08 Thread Marcelo Vanzin
Hi Kevin, How did you try to use the Scala module? Spark has this code when setting up the ObjectMapper used to generate the output: mapper.registerModule(com.fasterxml.jackson.module.scala.DefaultScalaModule) As for supporting direct serialization to Java objects, I don't think that was the

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-28 Thread Marcelo Vanzin
Hi Jonathan, Can you be more specific about what problem you're running into? SPARK-6869 fixed the issue of pyspark vs. assembly jar by shipping the pyspark archives separately to YARN. With that fix in place, pyspark doesn't need to get anything from the Spark assembly, so it has no problems

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread Marcelo Vanzin
Are you just submitting from Windows or are you also running YARN on Windows? If the former, I think the only fix that would be needed is this line (from that same patch): https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L434 I don't

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread Marcelo Vanzin
+1. I tested the without hadoop binary package and ran our internal tests on it with dynamic allocation both on and off. The Windows issue Sen raised could be considered a regression / blocker, though, and it's a one line fix. If we feel that's important, let me know and I'll put up a PR against

Re: Building with sbt impossible to get artifacts when data has not been loaded

2015-08-26 Thread Marcelo Vanzin
I ran into the same error (different dependency) earlier today. In my case, the maven pom files and the sbt dependencies had a conflict (different versions of the same artifact) and ivy got confused. Not sure whether that will help in your case or not... On Wed, Aug 26, 2015 at 2:23 PM, Holden

Re: Spark (1.2.0) submit fails with exception saying log directory already exists

2015-08-25 Thread Marcelo Vanzin
This probably means your app is failing and the second attempt is hitting that issue. You may fix the directory already exists error by setting spark.eventLog.overwrite=true in your conf, but most probably that will just expose the actual error in your app. On Tue, Aug 25, 2015 at 9:37 AM,
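The suggested workaround, as a config sketch; the property name is from the message, while the application jar is a placeholder:

```shell
# Either set it once in conf/spark-defaults.conf:
#   spark.eventLog.overwrite  true
# or pass it at submit time:
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.overwrite=true \
  myapp.jar
```

Note the caveat above: this only silences the "directory already exists" symptom from a retried attempt; the first attempt's real failure still needs diagnosing.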

Re: Spark builds: allow user override of project version at buildtime

2015-08-25 Thread Marcelo Vanzin
On Tue, Aug 25, 2015 at 2:17 AM, andrew.row...@thomsonreuters.com wrote: Then, if I wanted to do a build against a specific profile, I could also pass in a -Dspark.version=1.4.1-custom-string and have the output artifacts correctly named. The default behaviour should be the same. Child pom

Paring down / tagging tests (or some other way to avoid timeouts)?

2015-08-25 Thread Marcelo Vanzin
Hello y'all, So I've been getting kinda annoyed with how many PR tests have been timing out. I took one of the logs from one of my PRs and started to do some crunching on the data from the output, and here's a list of the 5 slowest suites: 307.14s HiveSparkSubmitSuite 382.641s VersionsSuite 398s

Re: Paring down / tagging tests (or some other way to avoid timeouts)?

2015-08-25 Thread Marcelo Vanzin
to improve that heuristic, it would be great. - Patrick On Tue, Aug 25, 2015 at 1:33 PM, Marcelo Vanzin van...@cloudera.com wrote: Hello y'all, So I've been getting kinda annoyed with how many PR tests have been timing out. I took one of the logs from one of my PRs and started to do some

Re: [VOTE] Release Apache Spark 1.5.0 (RC1)

2015-08-21 Thread Marcelo Vanzin
The pom files look correct, but this file is not: https://github.com/apache/spark/blob/4c56ad772637615cc1f4f88d619fac6c372c8552/core/src/main/scala/org/apache/spark/package.scala So, I guess, -1? On Fri, Aug 21, 2015 at 2:17 PM, mkhaitman mark.khait...@chango.com wrote: Just a heads up that

Re: Setting up Spark/flume/? to Ingest 10TB from FTP

2015-08-14 Thread Marcelo Vanzin
Why do you need to use Spark or Flume for this? You can just use curl and hdfs: curl ftp://blah | hdfs dfs -put - /blah On Fri, Aug 14, 2015 at 1:15 PM, Varadhan, Jawahar varad...@yahoo.com.invalid wrote: What is the best way to bring such a huge file from a FTP server into Hadoop to
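Fleshing out that one-liner slightly (FTP URL and HDFS target path are placeholders):

```shell
# Stream a large file from an FTP server straight into HDFS without
# staging it on local disk: curl writes the file to stdout, and
# "hdfs dfs -put -" reads from stdin.
curl -s "ftp://ftp.example.com/path/bigfile.dat" \
  | hdfs dfs -put - /data/incoming/bigfile.dat
```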

Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Marcelo Vanzin
Just note that if you have mvn in your path, you need to use build/mvn --force. On Mon, Aug 3, 2015 at 12:38 PM, Sean Owen so...@cloudera.com wrote: Using ./build/mvn should always be fine. Your local mvn is fine too if it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users on
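Concretely, the invocation would look something like this (build flags are illustrative):

```shell
# If a "mvn" older than 3.3.3 is on your PATH, build/mvn will pick it
# up; --force makes the wrapper download and use its bundled Maven:
./build/mvn --force -DskipTests clean package
```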

Re: Opinion on spark-class script simplification and posix compliance

2015-07-28 Thread Marcelo Vanzin
On Tue, Jul 28, 2015 at 12:13 PM, Félix-Antoine Fortin felix-antoine.for...@calculquebec.ca wrote: The while loop cannot be executed with sh, while the single line can be. Since on my system, sh is simply a link on bash, with some options activated, I guess this simply means that the while

Re: Slight API incompatibility caused by SPARK-4072

2015-07-15 Thread Marcelo Vanzin
Or, alternatively, the bus could catch that error and ignore / log it, instead of stopping the context... On Wed, Jul 15, 2015 at 12:20 PM, Marcelo Vanzin van...@cloudera.com wrote: Hmm, the Java listener was added in 1.3, so I think it will work for my needs. Might be worth it to make

Re: Slight API incompatibility caused by SPARK-4072

2015-07-15 Thread Marcelo Vanzin
/src/main/java/org/apache/spark/JavaSparkListener.java#L23 I think it might be reasonable that the Scala trait provides only source compatibitly and the Java class provides binary compatibility. - Patrick On Wed, Jul 15, 2015 at 11:47 AM, Marcelo Vanzin van...@cloudera.com wrote: Hey all

Re: Joining Apache Spark

2015-07-13 Thread Marcelo Vanzin
Hello, welcome, and please start by going through the web site ( http://spark.apache.org/), especially the Contributors section at the bottom. On Mon, Jul 13, 2015 at 3:58 PM, Animesh Tripathy a.tripathy...@gmail.com wrote: I would like to join the Apache Spark Development Team in order to

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Marcelo Vanzin
On Fri, Jun 5, 2015 at 5:19 AM, Sean Owen so...@cloudera.com wrote: - success sanity check *** FAILED *** java.lang.RuntimeException: [download failed: org.jboss.netty#netty;3.2.2.Final!netty.jar(bundle), download failed: commons-net#commons-net;3.1!commons-net.jar] at

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Marcelo Vanzin
+1 (non-binding) Ran some of our internal test suite (yarn + standalone) against the hadoop-2.6 and without-hadoop binaries. On Tue, Jun 2, 2015 at 8:53 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to

Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Marcelo Vanzin
cache, but I don't know more than this. On Thu, Jun 4, 2015 at 1:33 AM, Marcelo Vanzin van...@cloudera.com wrote: Hey all, I've been bit by something really weird lately and I'm starting to think it's related to the ivy support we have in Spark, and running unit tests that use

Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Marcelo Vanzin
in our build system, and when it has happened, i wasn't able to determine the cause. On Thu, Jun 4, 2015 at 10:16 AM, Marcelo Vanzin van...@cloudera.com wrote: On Thu, Jun 4, 2015 at 10:04 AM, shane knapp skn...@berkeley.edu wrote: this has occasionally happened on our jenkins as well (twice

Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Marcelo Vanzin
, Burak On Thu, Jun 4, 2015 at 4:29 AM, Sean Owen so...@cloudera.com wrote: I've definitely seen the dependency path must be relative problem, and fixed it by deleting the ivy cache, but I don't know more than this. On Thu, Jun 4, 2015 at 1:33 AM, Marcelo Vanzin van...@cloudera.com wrote

Ivy support in Spark vs. sbt

2015-06-03 Thread Marcelo Vanzin
Hey all, I've been bit by something really weird lately and I'm starting to think it's related to the ivy support we have in Spark, and running unit tests that use that code. The first thing that happens is that after running unit tests, sometimes my sbt builds start failing with error saying

Re: Change for submitting to yarn in 1.3.1

2015-05-22 Thread Marcelo Vanzin
Hi Kevin, One thing that might help you in the meantime, while we work on a better interface for all this... On Thu, May 21, 2015 at 5:21 PM, Kevin Markey kevin.mar...@oracle.com wrote: Making *yarn.Client* private has prevented us from moving from Spark 1.0.x to Spark 1.2 or 1.3 despite many

Re: Change for submitting to yarn in 1.3.1

2015-05-21 Thread Marcelo Vanzin
for you. It seems like adding a way to get back the appID would be a reasonable addition to the launcher. - Patrick On Tue, May 12, 2015 at 12:51 PM, Marcelo Vanzin van...@cloudera.com van...@cloudera.com wrote: On Tue, May 12, 2015 at 11:34 AM, Kevin Markey kevin.mar...@oracle.com

Re: Change for submitting to yarn in 1.3.1

2015-05-21 Thread Marcelo Vanzin
Hi Nathan, On Thu, May 21, 2015 at 7:30 PM, Nathan Kronenfeld nkronenfeld@uncharted.software wrote: In researching and discussing these issues with Cloudera and others, we've been told that only one mechanism is supported for starting Spark jobs: the *spark-submit* scripts. Is this new?

Re: userClassPathFirst and loader constraint violation

2015-05-20 Thread Marcelo Vanzin
Hmm... this seems to be particular to logging (KafkaRDD.scala:89 in my tree is a log statement). I'd expect KafkaRDD to be loaded from the system class loader - or are you repackaging it in your app? I'd have to investigate more to come with an accurate explanation here... but it seems that the

Re: Recent Spark test failures

2015-05-15 Thread Marcelo Vanzin
Funny thing, since I asked this question in a PR a few minutes ago... Ignoring the rotation suggestion for a second, can the PR builder at least cover hadoop 2.2? That's the actual version used to create the official Spark artifacts for maven, and the oldest version Spark supports for YARN..

Re: Change for submitting to yarn in 1.3.1

2015-05-15 Thread Marcelo Vanzin
Hi Chester, Writing a design / requirements doc sounds great. One comment though: On Thu, May 14, 2015 at 11:18 PM, Chester At Work ches...@alpinenow.com wrote: For #5 yes, it's about the command line args. These are args are the input for the spark jobs. Seems a bit too much to create

Re: Change for submitting to yarn in 1.3.1

2015-05-14 Thread Marcelo Vanzin
Hi Chester, Thanks for the feedback. A few of those are great candidates for improvements to the launcher library. On Wed, May 13, 2015 at 5:44 AM, Chester At Work ches...@alpinenow.com wrote: 1) client should not be private ( unless alternative is provided) so we can call it directly.

Re: Change for submitting to yarn in 1.3.1

2015-05-12 Thread Marcelo Vanzin
On Tue, May 12, 2015 at 11:34 AM, Kevin Markey kevin.mar...@oracle.com wrote: I understand that SparkLauncher was supposed to address these issues, but it really doesn't. Yarn already provides indirection and an arm's length transaction for starting Spark on a cluster. The launcher introduces

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Marcelo Vanzin
As for the idea, I'm +1. Spark is the only reason I still have jdk6 around - exactly because I don't want to cause the issue that started this discussion (inadvertently using JDK7 APIs). And as has been pointed out, even J7 is about to go EOL real soon. Even Hadoop is moving away (I think 2.7

Uninitialized session in HiveContext?

2015-04-30 Thread Marcelo Vanzin
Hey all, We ran into some test failures in our internal branch (which builds against Hive 1.1), and I narrowed it down to the fix below. I'm not super familiar with the Hive integration code, but does this look like a bug for other versions of Hive too? This caused an error where some internal

Re: Uninitialized session in HiveContext?

2015-04-30 Thread Marcelo Vanzin
of the session not lazy. It would be great to hear if this also works for your internal integration tests once the patch is up (hopefully this weekend). Michael On Thu, Apr 30, 2015 at 2:36 PM, Marcelo Vanzin van...@cloudera.com wrote: Hey all, We ran into some test failures in our internal

Re: Plans for upgrading Hive dependency?

2015-04-27 Thread Marcelo Vanzin
That's a lot more complicated than you might think. We've done some basic work to get HiveContext to compile against Hive 1.1.0. Here's the code: https://github.com/cloudera/spark/commit/00e2c7e35d4ac236bcfbcd3d2805b483060255ec We didn't sent that upstream because that only solves half of the

Re: python/run-tests fails at spark master branch

2015-04-21 Thread Marcelo Vanzin
On Tue, Apr 21, 2015 at 1:30 AM, Hrishikesh Subramonian hrishikesh.subramon...@flytxt.com wrote: Run streaming tests ... Failed to find Spark Streaming Kafka assembly jar in /home/xyz/spark/external/kafka-assembly You need to build Spark with 'build/sbt assembly/assembly

Re: [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-13 Thread Marcelo Vanzin
+1 (non-binding) Tested 2.6 build with standalone and yarn (no external shuffle service this time, although it does come up). On Fri, Apr 10, 2015 at 11:05 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.1! The tag

Re: Connect to remote YARN cluster

2015-04-09 Thread Marcelo Vanzin
If YARN is authenticating users it's probably running on kerberos, so you need to log in with your kerberos credentials (kinit) before submitting an application. On Thu, Apr 9, 2015 at 4:57 AM, Zoltán Zvara zoltan.zv...@gmail.com wrote: I'm trying to debug Spark in yarn-client mode. On my local,
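Concretely, with a kerberized YARN cluster the submission flow would look something like this; the principal, keytab path, and application jar are placeholders:

```shell
# Obtain a Kerberos ticket before talking to the cluster:
kinit user@EXAMPLE.COM                           # prompts for a password
# or non-interactively, using a keytab:
kinit -kt /path/to/user.keytab user@EXAMPLE.COM

# Verify the ticket, then submit as usual:
klist
spark-submit --master yarn-client myapp.jar
```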

Re: [VOTE] Release Apache Spark 1.3.1

2015-04-07 Thread Marcelo Vanzin
+1 (non-binding) Ran standalone and yarn tests on the hadoop-2.6 tarball, with and without the external shuffle service in yarn mode. On Sat, Apr 4, 2015 at 5:09 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.1! The
