Re: Hadoop 3 support

2018-04-02 Thread Marcelo Vanzin
Saisai filed SPARK-23534, but the main blocking issue is really SPARK-18673. On Mon, Apr 2, 2018 at 1:00 PM, Reynold Xin wrote: > Does anybody know what needs to be done in order for Spark to support Hadoop > 3? > -- Marcelo

Re: [Spark-core] why Executors send HeartBeat to driver but not App Master

2018-03-02 Thread Marcelo Vanzin
The app master doesn't have anything it needs to periodically tell the driver, so there was no need for a heartbeat. On Fri, Mar 2, 2018 at 1:44 AM, sandeep_katta wrote: > I want to attempt *SPARK-23545* bug, so I have some questions regarding the > design, > > I

Re: [Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-28 Thread Marcelo Vanzin
annel so job > will not be completed. > > So needed a mechanism to close only invalid connections . > > > On Wed, 28 Feb 2018 at 10:54 PM, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> Spark already has code to monitor idle connections and close

Re: [Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-28 Thread Marcelo Vanzin
Spark already has code to monitor idle connections and close them. That's in TransportChannelHandler.java. If there's anything to do here, it's to allow all users of the transport library to support the "close idle connections" feature of that class. On Wed, Feb 28, 2018 at 9:07 AM,

Re: Help needed in R documentation generation

2018-02-27 Thread Marcelo Vanzin
mise when it was proposed back in May 2017. > > I don’t think I can capture the long reviews and many discussed that went > in, for further discussion please start from JIRA SPARK-20889. > > > > ____________ > From: Marcelo Vanzin <van...@cloudera.com>

Re: Help needed in R documentation generation

2018-02-27 Thread Marcelo Vanzin
I followed Misi's instructions: - click on https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html - click on "s" at the top - find "sin" and click on it And that does not give me the documentation for the "sin" function. That leads you to a really ugly list of

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-23 Thread Marcelo Vanzin
+1 Checked the archives; ran a subset of our internal tests on the hadoop2.7 archive, looks good. On Thu, Feb 22, 2018 at 2:23 PM, Sameer Agarwal wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.0. The vote is open until Tuesday

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Marcelo Vanzin
Done, thanks! On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal <samee...@apache.org> wrote: > Sure, please feel free to backport. > > On 20 February 2018 at 18:02, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> Hey Sameer, >> >> Mind includi

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Marcelo Vanzin
Hey Sameer, Mind including https://github.com/apache/spark/pull/20643 (SPARK-23468) in the new RC? It's a minor bug since I've only hit it with older shuffle services, but it's pretty safe. On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal wrote: > This RC has failed due to

Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-15 Thread Marcelo Vanzin
Since it seems there are other issues to fix, I raised SPARK-23413 to blocker status to avoid having to change the disk format of history data in a minor release. On Wed, Feb 14, 2018 at 11:06 PM, Nick Pentreath wrote: > -1 for me as we elevated

Re: File JIRAs for all flaky test failures

2018-02-08 Thread Marcelo Vanzin
Hey all, I just wanted to bring up Kay's old e-mail about this. If you see a flaky test during a PR, don't just ask for a re-test. File a bug so that we know that test is flaky and someone will eventually take a look at it. A lot of them also make great newbie bugs. I've filed a bunch of these

Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Marcelo Vanzin
I think it would make sense to drop one of them, but not necessarily 2.6. It kinda depends on what wire compatibility guarantees the Hadoop libraries have; can a 2.6 client talk to 2.7 (pretty certain it can)? Is the opposite safe (not sure)? If the answer to the latter question is "no", then

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen wrote: > I am still seeing these tests fail or hang: > > - subscribing topic by name from earliest offsets (failOnDataLoss: false) > - subscribing topic by name from earliest offsets (failOnDataLoss: true) This is something that we

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
Sorry, have to change my vote again. Hive guys ran into SPARK-23209 and that's a regression we need to fix. I'll post a patch soon. So -1 (although others have already -1'ed). On Wed, Jan 24, 2018 at 11:42 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > Given that the bugs I was worr

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-24 Thread Marcelo Vanzin
Given that the bugs I was worried about have been dealt with, I'm upgrading to +1. On Mon, Jan 22, 2018 at 5:09 PM, Marcelo Vanzin <van...@cloudera.com> wrote: > +0 > > Signatures check out. Code compiles, although I see the errors in [1] > when untarring the source archive; per

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-23 Thread Marcelo Vanzin
On Tue, Jan 23, 2018 at 7:01 AM, Sean Owen wrote: > I'm not seeing that same problem on OS X and /usr/bin/tar. I tried unpacking > it with 'xvzf' and also unzipping it first, and it untarred without warnings > in either case. The warnings just show up if you unpack using GNU

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-22 Thread Marcelo Vanzin
+0 Signatures check out. Code compiles, although I see the errors in [1] when untarring the source archive; perhaps we should add "use GNU tar" to the RM checklist? Also ran our internal tests and they seem happy. My concern is the list of open bugs targeted at 2.3.0 (ignoring the documentation

Re: Kubernetes: why use init containers?

2018-01-12 Thread Marcelo Vanzin
On Fri, Jan 12, 2018 at 1:53 PM, Anirudh Ramanathan wrote: > As I understand, the bigger change discussed here are like the init > containers, which will be more on the implementation side than a user facing > change/behavioral change - which is why it seemed okay to

Re: Kubernetes: why use init containers?

2018-01-12 Thread Marcelo Vanzin
ds), > there should be enough time to make the change, test and release with > confidence. > > On Wed, Jan 10, 2018 at 3:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> On Wed, Jan 10, 2018 at 3:00 PM, Anirudh Ramanathan >> <ramanath...@google.

Re: Kubernetes: why use init containers?

2018-01-12 Thread Marcelo Vanzin
On Fri, Jan 12, 2018 at 4:13 AM, Eric Charles wrote: >> Again, I don't see what is all this hoopla about fine grained control >> of dependency downloads. Spark solved this years ago for Spark >> applications. Don't reinvent the wheel. > > Init-containers are used today to

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 3:00 PM, Anirudh Ramanathan wrote: > We can start by getting a PR going perhaps, and start augmenting the > integration testing to ensure that there are no surprises - with/without > credentials, accessing GCS, S3 etc as well. > When we get enough

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 2:51 PM, Matt Cheah wrote: > those sidecars may perform side effects that are undesirable if the main > Spark application failed because dependencies weren’t available If the contract is that the Spark driver pod does not have an init container, and

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 2:30 PM, Yinan Li wrote: > 1. Retries of init-containers are automatically supported by k8s through pod > restart policies. For this point, sorry I'm not sure how spark-submit > achieves this. Great, add that feature to spark-submit, everybody

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 2:16 PM, Yinan Li wrote: > but we can not rule out the benefits init-containers bring either. Sorry, but what are those again? So far all the benefits are already provided by spark-submit... > Again, I would suggest we look at this more thoroughly

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 2:00 PM, Yinan Li wrote: > I want to re-iterate on one point, that the init-container achieves a clear > separation between preparing an application and actually running the > application. It's a guarantee provided by the K8s admission control and >

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 1:47 PM, Matt Cheah wrote: >> With a config value set by the submission code, like what I'm doing to >> prevent client mode submission in my p.o.c.? > > The contract for what determines the appropriate scheduler backend to > instantiate is then going

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 1:33 PM, Matt Cheah wrote: > If we use spark-submit in client mode from the driver container, how do we > handle needing to switch between a cluster-mode scheduler backend and a > client-mode scheduler backend in the future? With a config value set

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 1:10 PM, Matt Cheah wrote: > I’d imagine this is a reason why YARN hasn’t gone with using spark-submit > from the application master... I wouldn't use YARN as a template to follow when writing a new backend. A lot of the reason why the YARN backend

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
his idea in detail, and understand the implications and then get > back to you. > > Thanks for the detailed responses here, and for spending time with the idea. > (Also, you're more than welcome to attend the meeting - there's a link here > if you're around.) > > Cheers, > Anir

Re: Kubernetes: why use init containers?

2018-01-09 Thread Marcelo Vanzin
One thing I forgot in my previous e-mail is that if a resource is remote I'm pretty sure (but haven't double checked the code) that executors will download it directly from the remote server, and not from the driver. So there, distributed download without an init container. On Tue, Jan 9, 2018 at

Re: Kubernetes: why use init containers?

2018-01-09 Thread Marcelo Vanzin
On Tue, Jan 9, 2018 at 6:25 PM, Nicholas Chammas wrote: > You can argue that executors downloading from > external servers would be faster than downloading from the driver, but > I’m not sure I’d agree - it can go both ways. > > On a tangentially related note, one of

Kubernetes: why use init containers?

2018-01-09 Thread Marcelo Vanzin
Hello, Me again. I was playing some more with the kubernetes backend and the whole init container thing seemed unnecessary to me. Currently it's used to download remote jars and files, mount the volume into the driver / executor, and place those jars in the classpath / move the files to the

Re: Kubernetes backend and docker images

2018-01-08 Thread Marcelo Vanzin
On Mon, Jan 8, 2018 at 1:39 PM, Matt Cheah wrote: > We would still want images to be able to be uniquely specified for the > driver vs. the executors. For example, not all of the libraries required on > the driver may be required on the executors, so the user would want to >

Kubernetes backend and docker images

2018-01-05 Thread Marcelo Vanzin
Hey all, especially those working on the k8s stuff. Currently we have 3 docker images that need to be built and provided by the user when starting a Spark app: driver, executor, and init container. When the initial review went by, I asked why do we need 3, and I was told that's because they have

Re: Rolling policy in Spark event logs for long living streaming applications

2017-12-01 Thread Marcelo Vanzin
There's really no current solution to this. There's a brief discussion about it on SPARK-12140. Here we recommend people disable event logs for streaming apps, as sub-optimal as that might be... On Fri, Dec 1, 2017 at 3:27 AM, kankalapti omkar naidu wrote: > Dear spark

Re: [VOTE] Spark 2.2.1 (RC1)

2017-11-17 Thread Marcelo Vanzin
This is https://issues.apache.org/jira/browse/SPARK-20201. On Fri, Nov 17, 2017 at 8:51 AM, Felix Cheung wrote: > I wasn’t able to test this out. > > Is anyone else seeing this error? I see a few JVM fixes and getting back > ported, are they related to this? > > This

Re: Set spark.*.retained* configs to 0 when the UI is disabled?

2017-10-13 Thread Marcelo Vanzin
On Fri, Oct 13, 2017 at 12:49 PM, Craig Ingram wrote: > Are you referring to SPARK-20421 > > and SPARK-18085 ? If I > can lend a hand in this, just let me know. Yes,

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-03 Thread Marcelo Vanzin
Maybe you're running as root (or the admin account on your OS)? On Tue, Oct 3, 2017 at 12:12 PM, Nick Pentreath wrote: > Hmm I'm consistently getting this error in core tests: > > - SPARK-3697: ignore directories that cannot be read. *** FAILED *** > 2 was not equal

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-21 Thread Marcelo Vanzin
While you're at it, one thing that needs to be done is create a 2.1.3 version on JIRA. Not sure if you have enough permissions to do that. Fixes after an RC should use the new version, and if you create a new RC, you'll need to go and backdate the patches that went into the new RC. On Mon, Sep

Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Marcelo Vanzin
+1 to this. There should be a script in the Spark repo that has all the logic needed for a release. That script should take the RM's key as a parameter. if there's a desire to keep the current Jenkins job to create the release, it should be based on that script. But from what I'm seeing there are

Re: Are there multiple processes out there running JIRA <-> Github maintenance tasks?

2017-08-30 Thread Marcelo Vanzin
s. > > On Mon, Aug 28, 2017 at 12:02 PM Marcelo Vanzin <van...@cloudera.com> wrote: >> >> It seems a little wonky, though. Feels like it's updating JIRA every >> time you comment on a PR. Or maybe it's still working through the >> backlog... >>

Re: Are there multiple processes out there running JIRA <-> Github maintenance tasks?

2017-08-28 Thread Marcelo Vanzin
It seems a little wonky, though. Feels like it's updating JIRA every time you comment on a PR. Or maybe it's still working through the backlog... On Mon, Aug 28, 2017 at 9:57 AM, Reynold Xin wrote: > The process for doing that was down before, and might've come back up and >

Re: SPIP: Spark on Kubernetes

2017-08-17 Thread Marcelo Vanzin
I have just some very high level knowledge of kubernetes, so I can't really comment on the details of the proposal that relate to it. But I have some comments about other areas of the linked documents: - It's good to know that there's a community behind this effort and mentions of lots of

Re: [VOTE] [SPIP] SPARK-18085: Better History Server scalability

2017-08-03 Thread Marcelo Vanzin
This vote passes with 3 binding +1 votes, 5 non-binding votes, and no -1 votes. Thanks all! +1 votes (binding): Tom Graves Sean Owen Marcelo Vanzin +1 votes (non-binding): Ryan Blue Denis Bolshakov Dong Joon Hyun Hyukjin Kwon Ashutosh Pathak On Mon, Jul 31, 2017 at 10:27 AM, Marcelo Vanzin

Re: [VOTE] [SPIP] SPARK-18085: Better History Server scalability

2017-08-01 Thread Marcelo Vanzin
Thanks all for the comments. Just a clarification: On Tue, Aug 1, 2017 at 2:18 AM, Sean Owen wrote: > Is 'spark-ui' too broad? doesn't sound like this module would actually house > all the UIs. spark-shs-ui or something? > Good that this can be implemented in parallel to the

Re: [VOTE] [SPIP] SPARK-18085: Better History Server scalability

2017-07-31 Thread Marcelo Vanzin
Adding my own +1 (binding). On Mon, Jul 31, 2017 at 10:27 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > Hey all, > > Following the SPIP process, I'm putting this SPIP up for a vote. It's > been open for comments as an SPIP for about 3 weeks now, and had been > open w

[VOTE] [SPIP] SPARK-18085: Better History Server scalability

2017-07-31 Thread Marcelo Vanzin
Hey all, Following the SPIP process, I'm putting this SPIP up for a vote. It's been open for comments as an SPIP for about 3 weeks now, and had been open without the SPIP label for about 9 months before that. There has been no new feedback since it was tagged as an SPIP, so I'm assuming all the

Re: Spark history server running on Mongo

2017-07-19 Thread Marcelo Vanzin
On Tue, Jul 18, 2017 at 7:21 PM, Ivan Sadikov wrote: > Repository that I linked to does not require rebuilding Spark and could be > used with current distribution, which is preferable in my case. Fair enough, although that means that you're re-implementing the Spark UI,

Re: Spark history server running on Mongo

2017-07-18 Thread Marcelo Vanzin
See SPARK-18085. That has much of the same goals re: SHS resource usage, and also provides a (currently non-public) API where you could just create a MongoDB implementation if you want. On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov wrote: > Hello everyone! > > I have

SPIP: SPARK-18085: History server enhancements

2017-07-12 Thread Marcelo Vanzin
Hi all, I've requested feedback for this a few times in the past, but since now it's been labeled as an SPIP, I'll do it again. Please take a look and provide any feedback you might have! Also, be aware that development is well under way for this and the current code diverged slightly from the

Re: PR permission to kick Jenkins?

2017-05-05 Thread Marcelo Vanzin
I noticed it on this one (before I said "ok to test"): https://github.com/apache/spark/pull/17658 My requests went through, Tom's seemed to be ignored. On Fri, May 5, 2017 at 1:06 PM, shane knapp wrote: > also, do we have any recent PRs where i can see this happening? it

Re: Why "Executor" have no NettyRpcEndpointRef?

2017-05-05 Thread Marcelo Vanzin
I don't understand what it is you're trying to reach at. Are you just trying to understand the RPC library? Or do you actually have a question? On Fri, May 5, 2017 at 3:33 AM, cane wrote: > Yes, that's the true process. > And I think client is initialized when NettyRpcRef

Re: Why "Executor" have no NettyRpcEndpointRef?

2017-05-04 Thread Marcelo Vanzin
There's no actor system or akka dependency anymore, since 2.0 IIRC. So I'm really not sure what you're referring to. On Thu, May 4, 2017 at 8:29 PM, cane wrote: > Executor will start an actor system for remoting after rpcenv has been > created.You can refer to

Re: Why "Executor" have no NettyRpcEndpointRef?

2017-05-04 Thread Marcelo Vanzin
Not sure I fully understand what you're asking, but the executor doesn't need to listen for incoming connections, so it doesn't need to start a server and advertise an address. On Thu, May 4, 2017 at 4:23 AM, cane wrote: > NettyRpcEndpointRef#toString if RpcAddress is

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Marcelo Vanzin
+1 (non-binding). Ran the hadoop-2.6 binary against our internal tests and things look good. On Tue, Apr 18, 2017 at 11:59 AM, Michael Armbrust wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.1. The vote is open until Fri, April

Re: Removing SSL from Spark's internal communications

2017-04-19 Thread Marcelo Vanzin
There's no file server anymore. And both the RPC endpoint (used to transfer files) and the block manager (broadcasts + other blocks) support encryption without SSL. On Wed, Apr 19, 2017 at 8:55 AM, Rostyslav Sotnychenko wrote: > Hi all, > > I am wondering what Community

Re: RFC: deprecate SparkStatusTracker, remove JobProgressListener

2017-03-24 Thread Marcelo Vanzin
On Fri, Mar 24, 2017 at 1:18 PM, Ryan Blue wrote: > For the status tracker, I'd like to see it replaced with something better > before deprecating it. I've been looking at it for implementing better > feedback in notebooks, and it looks sufficient for that at the moment. Is >

Re: RFC: deprecate SparkStatusTracker, remove JobProgressListener

2017-03-24 Thread Marcelo Vanzin
On Fri, Mar 24, 2017 at 1:13 PM, Josh Rosen wrote: > Let's not deprecate SparkStatusTracker (at least not until there's a > suitable replacement that's just as easy to use). Deprecating it now would > leave users with an unactionable warning and that's not great for

Re: RFC: deprecate SparkStatusTracker, remove JobProgressListener

2017-03-24 Thread Marcelo Vanzin
On Fri, Mar 24, 2017 at 12:07 PM, Josh Rosen wrote: > I think that it should be safe to remove JobProgressListener but I'd like to > keep the SparkStatusTracker API. Thanks Josh. I can work with that. My main concern would be keeping the listener around. Is it worth it

RFC: deprecate SparkStatusTracker, remove JobProgressListener

2017-03-23 Thread Marcelo Vanzin
Hello all, For those not following, I'm working on SPARK-18085, where my goal is to decouple the storage of UI data from the actual UI implementation. This is mostly targeted at the history server, so that it's possible to quickly load a "database" with UI information instead of the existing way

Re: spark-without-hive assembly for hive build/development purposes

2017-03-16 Thread Marcelo Vanzin
The solution to this is being tracked in https://issues.apache.org/jira/browse/HIVE-15302, although I haven't seen activity in a while. On Thu, Mar 16, 2017 at 3:42 PM, Zoltan Haindrich wrote: > Hello, > > Hive needs a spark assembly to execute the HoS tests. > Until now…this

Re: [Newbie] spark conf

2017-02-10 Thread Marcelo Vanzin
If you place core-site.xml in $SPARK_HOME/conf, I'm pretty sure Spark will pick it up. (Sounds like you're not running YARN, which would require HADOOP_CONF_DIR.) Also this is more of a user@ question. On Fri, Feb 10, 2017 at 1:35 PM, Sam Elamin wrote: > Hi All, > > >
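The two configuration approaches described above can be sketched as follows (all paths here are illustrative stand-ins, not from the thread; temp directories substitute for real install locations so the snippet is self-contained):

```shell
# Non-YARN case: Spark picks up Hadoop client configs placed under
# $SPARK_HOME/conf. Temp dirs below stand in for real install paths.
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/conf"
HADOOP_CONF=$(mktemp -d)                      # stand-in for /etc/hadoop/conf
echo '<configuration/>' > "$HADOOP_CONF/core-site.xml"
cp "$HADOOP_CONF/core-site.xml" "$SPARK_HOME/conf/"

# YARN case: point Spark at the Hadoop config directory instead.
export HADOOP_CONF_DIR="$HADOOP_CONF"
ls "$SPARK_HOME/conf"
```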

Re: clientMode in RpcEnv.create in Spark on YARN vs general case (driver vs executors)?

2017-01-18 Thread Marcelo Vanzin
On Wed, Jan 18, 2017 at 1:29 AM, Jacek Laskowski wrote: > I'm trying to get the gist of clientMode input parameter for > RpcEnv.create [1]. It is disabled (i.e. false) by default. "clientMode" means whether the RpcEnv only opens external connections (client) or also accepts

Re: can someone review my PR?

2017-01-18 Thread Marcelo Vanzin
On Wed, Jan 18, 2017 at 6:16 AM, Steve Loughran wrote: > it's failing on the dependency check as the dependencies have changed. > that's what it's meant to do. should I explicitly be changing the values so > that the build doesn't notice the change? Yes. There's no

Re: Both Spark AM and Client are trying to delete Staging Directory

2017-01-14 Thread Marcelo Vanzin
n your confusion? On Sat, Jan 14, 2017 at 11:37 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > Are you actually seeing a problem or just questioning the code? > > I have never seen a situation where there's a failure because of that > part of the current code. > > On

Re: Both Spark AM and Client are trying to delete Staging Directory

2017-01-14 Thread Marcelo Vanzin
Are you actually seeing a problem or just questioning the code? I have never seen a situation where there's a failure because of that part of the current code. On Fri, Jan 13, 2017 at 3:24 AM, Rostyslav Sotnychenko wrote: > Hi all! > > I am a bit confused why Spark AM

Re: Tests failing with GC limit exceeded

2017-01-05 Thread Marcelo Vanzin
On Thu, Jan 5, 2017 at 4:58 PM, Kay Ousterhout wrote: > But is there any non-memory-leak reason why the tests should need more > memory? In theory each test should be cleaning up its own Spark Context > etc. right? My memory is that OOM issues in the tests in the past

Re: Tests failing with GC limit exceeded

2017-01-05 Thread Marcelo Vanzin
Seems like the OOM is coming from tests, which most probably means it's not an infrastructure issue. Maybe tests just need more memory these days and we need to update maven / sbt scripts. On Thu, Jan 5, 2017 at 1:19 PM, shane knapp wrote: > as of first thing this morning,

Re: spark-core "compile"-scope transitive-dependency on scalatest

2016-12-15 Thread Marcelo Vanzin
on > I've proposed here (splitting spark-tags' test-bits into a "-tests" JAR and > having spark-core "test"-depend on that) is discussed there. > > thanks for re-opening the JIRA; I can't promise a PR for it atm but I will > think about it :) > > On Thu, Dec 15, 2

Re: spark-core "compile"-scope transitive-dependency on scalatest

2016-12-15 Thread Marcelo Vanzin
You're right; we had a discussion here recently about this. I'll re-open that bug, if you want to send a PR. (I think it's just a matter of making the scalatest dependency "provided" in spark-tags, if I remember the discussion.) On Thu, Dec 15, 2016 at 4:15 PM, Ryan Williams

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-12 Thread Marcelo Vanzin
Another failing test is "ReplSuite:should clone and clean line object in ClosureCleaner". It never passes for me, just keeps spinning until the JVM eventually starts throwing OOM errors. Anyone seeing that? On Thu, Dec 8, 2016 at 12:39 AM, Reynold Xin wrote: > Please vote on

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-12 Thread Marcelo Vanzin
at 2:03 PM, Marcelo Vanzin <van...@cloudera.com> wrote: > I'm running into this when building / testing on 1.7 (haven't tried 1.8): > > udf3Test(test.org.apache.spark.sql.JavaUDFSuite) Time elapsed: 0.079 > sec <<< ERROR! > java.lang.NoSuchMethodError: > org.

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-12 Thread Marcelo Vanzin
I'm running into this when building / testing on 1.7 (haven't tried 1.8): udf3Test(test.org.apache.spark.sql.JavaUDFSuite) Time elapsed: 0.079 sec <<< ERROR! java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(Lcom/google/common/reflect/TypeToken;)Lsc

Re: getting PRs into the spark hive dependency

2016-12-02 Thread Marcelo Vanzin
I believe the latest one is actually in Josh's repository. Which kinda raises a more interesting question: Should we create a repository managed by the Spark project, using the Apache infrastructure, to handle that fork? It seems not very optimal to have this lie in some random person's github

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-29 Thread Marcelo Vanzin
I'll send a -1 because of SPARK-18546. Haven't looked at anything else yet. On Mon, Nov 28, 2016 at 5:25 PM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.0. The vote is open until Thursday, December 1, 2016 at 18:00 UTC

Running lint-java during PR builds?

2016-11-15 Thread Marcelo Vanzin
Hey all, Is there a reason why lint-java is not run during PR builds? I see it seems to be maven-only, is it really expensive to run after an sbt build? I see a lot of PRs coming in to fix Java style issues, and those all seem a little unnecessary. Either we're enforcing style checks or we're

Re: Odp.: Spark Improvement Proposals

2016-10-31 Thread Marcelo Vanzin
The proposal looks OK to me. I assume, even though it's not explicitly called, that voting would happen by e-mail? A template for the proposal document (instead of just a bullet list) would also be nice, but that can be done at any time. BTW, shameless plug: I filed SPARK-18085 which I consider a

Re: Spark has a compile dependency on scalatest

2016-10-28 Thread Marcelo Vanzin
Hmm. Yes, that makes sense. Spark's root pom does not affect your application's pom, in which case it will pick compile over test if there are conflicting dependencies. Perhaps spark-tags should override it to provided instead of compile... On Fri, Oct 28, 2016 at 1:22 PM, Shixiong(Ryan) Zhu

Re: Spark has a compile dependency on scalatest

2016-10-28 Thread Marcelo Vanzin
The root pom declares scalatest explicitly with test scope. It's added by default to all sub-modules, so every one should get it in test scope unless the module explicitly overrides that, like the tags module does. If you look at the "blessed" dependency list in dev/deps, there's no scalatest.
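The scope mechanics described above look roughly like this in Maven terms (a sketch only; the artifact coordinates and placement are assumptions — see the actual Spark root and spark-tags poms for the real declarations):

```xml
<!-- Root pom: scalatest is test-scoped for every sub-module by default. -->
<dependency>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest_2.11</artifactId>
  <scope>test</scope>
</dependency>

<!-- A module that needs scalatest at compile time (as the tags module does)
     overrides the scope, which is what lets it leak into downstream poms. -->
<dependency>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest_2.11</artifactId>
  <scope>compile</scope>
</dependency>
```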

Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Marcelo Vanzin
+1 On Wed, Sep 28, 2016 at 7:14 PM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.0.1. The vote is open until Sat, Oct 1, 2016 at 20:00 PDT and passes if a > majority of at least 3 +1 PMC votes are cast. > > [ ] +1 Release

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-26 Thread Marcelo Vanzin
The part I don't understand is: why do you care so much about the mesos profile? The same code exists in branch-2.0, it just doesn't need a separate profile to be enabled (it's part of core). As Sean said, the change in master was purely organizational, there's no added or lost functionality. On

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-24 Thread Marcelo Vanzin
There is no "mesos" profile in 2.0.1. On Sat, Sep 24, 2016 at 2:19 PM, Jacek Laskowski wrote: > Hi, > > I keep asking myself why are you guys not including -Pmesos in your > builds? Is this on purpose or have you overlooked it? > > Pozdrawiam, > Jacek Laskowski > >

Re: Compatibility of 1.6 spark.eventLog with a 2.0 History Server

2016-09-15 Thread Marcelo Vanzin
It should work fine. 2.0 dropped support for really old event logs (pre-Spark 1.3 I think), but 1.6 should work, and if it doesn't it should be considered a bug. On Thu, Sep 15, 2016 at 10:21 AM, Mario Ds Briggs wrote: > Hi, > > I would like to use a Spark 2.0 History

Re: @scala.annotation.varargs or @_root_.scala.annotation.varargs?

2016-09-08 Thread Marcelo Vanzin
Not after SPARK-14642, right? On Thu, Sep 8, 2016 at 5:07 PM, Reynold Xin wrote: > There is a package called scala. > > > On Friday, September 9, 2016, Hyukjin Kwon wrote: >> >> I was also actually wondering why it is being written like this. >> >> I

Re: Mesos is now a maven module

2016-08-30 Thread Marcelo Vanzin
On Tue, Aug 30, 2016 at 11:32 AM, Sean Owen wrote: > Ah, I helped miss that. We don't enable -Pyarn for YARN because it's > already always set? I wonder if it makes sense to make that optional > in order to speed up builds, or, maybe I'm missing a reason it's > always

Re: Mesos is now a maven module

2016-08-30 Thread Marcelo Vanzin
A quick look shows that maybe dev/sparktestsupport/modules.py needs to be modified, and a "build_profile_flags" added to the mesos section (similar to hive / hive-thriftserver). Note not all PR builds will trigger mesos currently, since it's listed as an independent module in the above file. On

Re: Mesos is now a maven module

2016-08-30 Thread Marcelo Vanzin
Michael added the profile to the build scripts, but maybe some script or code path was missed... On Tue, Aug 30, 2016 at 9:56 AM, Dongjoon Hyun wrote: > Hi, Michael. > > It's a great news! > > BTW, I'm wondering if the Jenkins (SparkPullRequestBuilder) knows this new >

Re: Clarifying that spark-x.x.x-bin-hadoopx.x.tgz doesn't include Hadoop itself

2016-07-29 Thread Marcelo Vanzin
On Fri, Jul 29, 2016 at 1:13 PM, Nicholas Chammas wrote: > The Hadoop jars packaged with Spark just allow Spark to interact with Hadoop, > or allow it to use the Hadoop API for interacting with systems like S3, > right? If you want HDFS, MapReduce, etc. you're

Re: Clarifying that spark-x.x.x-bin-hadoopx.x.tgz doesn't include Hadoop itself

2016-07-29 Thread Marcelo Vanzin
Why do you say Hadoop is not included? The Hadoop jars are there in the tarball, and match the advertised version. There is (or at least there was in 1.x) a version called "without-hadoop" which did not include any Hadoop jars. On Fri, Jul 29, 2016 at 12:56 PM, Nicholas Chammas
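The "without-hadoop" variant mentioned above is typically wired up through conf/spark-env.sh; a sketch of that setup (the hadoop invocation is stubbed out purely so the snippet is self-contained — in a real install you would call the actual hadoop CLI, and the paths shown are assumptions):

```shell
# conf/spark-env.sh sketch for a hadoop-free ("without-hadoop") Spark build:
# hand Spark the Hadoop jars provided by your own installation.

# Stub standing in for the real hadoop CLI, for illustration only.
hadoop() { echo "/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/*"; }

export SPARK_DIST_CLASSPATH="$(hadoop classpath)"
echo "$SPARK_DIST_CLASSPATH"
```

With a real Hadoop install, `hadoop classpath` expands to the full set of Hadoop jars, so Spark runs against whatever Hadoop version the cluster provides.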

Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-19 Thread Marcelo Vanzin
+0 Our internal test suites seem mostly happy, except for SPARK-16632. Since there's a somewhat easy workaround, I don't think it's a blocker for 2.0.0. On Thu, Jul 14, 2016 at 11:59 AM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark

Re: Anyone knows the hive repo for spark-2.0?

2016-07-07 Thread Marcelo Vanzin
(Actually that's "spark" and not "spark2", so yeah, that doesn't really answer the question.) On Thu, Jul 7, 2016 at 11:38 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > My guess would be https://github.com/pwendell/hive/tree/release-1.2.1-spark > > On Thu,

Re: Anyone knows the hive repo for spark-2.0?

2016-07-07 Thread Marcelo Vanzin
My guess would be https://github.com/pwendell/hive/tree/release-1.2.1-spark On Thu, Jul 7, 2016 at 11:37 AM, Zhan Zhang wrote: > I saw the pom file having hive version as > 1.2.1.spark2. But I cannot find the branch in > https://github.com/pwendell/ > > Does anyone know where

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Marcelo Vanzin
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander wrote: > -1 > > Spark Unit tests fail on Windows. Still not resolved, though marked as > resolved. To be pedantic, it's marked as a duplicate (https://issues.apache.org/jira/browse/SPARK-15899), which doesn't mean

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-21 Thread Marcelo Vanzin
On Tue, Jun 21, 2016 at 10:49 AM, Sean Owen wrote: > I'm getting some errors building on Ubuntu 16 + Java 7. First is one > that may just be down to a Scala bug: > > [ERROR] bad symbolic reference. A signature in WebUI.class refers to > term eclipse > in package org which is

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

2016-06-20 Thread Marcelo Vanzin
It doesn't hurt to have a bug tracking it, in case anyone else has time to look at it before I do. On Mon, Jun 20, 2016 at 1:20 PM, Jonathan Kelly <jonathaka...@gmail.com> wrote: > Thanks for the confirmation! Shall I cut a JIRA issue? > > On Mon, Jun 20, 2016 at 10:42 AM Marc

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

2016-06-20 Thread Marcelo Vanzin
I just tried this locally and can see the wrong behavior you mention. I'm running a somewhat old build of 2.0, but I'll take a look. On Mon, Jun 20, 2016 at 7:04 AM, Jonathan Kelly wrote: > Does anybody have any thoughts on this? > > On Fri, Jun 17, 2016 at 6:36 PM

Re: [VOTE] Release Apache Spark 1.6.2 (RC1)

2016-06-17 Thread Marcelo Vanzin
-1 (non-binding) SPARK-16017 shows a severe perf regression in YARN compared to 1.6.1. On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.6.2! > > The vote is open until Sunday, June 19, 2016 at

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Marcelo Vanzin
On Wed, Jun 1, 2016 at 2:51 PM, Sean Owen wrote: > I'd think we want less effort, not more, to let people test it? for > example, right now I can't easily try my product build against > 2.0.0-preview. While I understand your point of view, I like the extra effort to get to

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Marcelo Vanzin
So are RCs, aren't they? Personally I'm fine with not releasing to maven central. Any extra effort needed by regular users to use a preview / RC is good with me. On Wed, Jun 1, 2016 at 1:50 PM, Reynold Xin wrote: > To play devil's advocate, previews are technically not RCs.

Re: SBT doesn't pick resource file after clean

2016-05-17 Thread Marcelo Vanzin
Perhaps you need to make the "compile" task of the appropriate module depend on the task that generates the resource file? Sorry but my knowledge of sbt doesn't really go too far. On Tue, May 17, 2016 at 11:58 AM, dhruve ashar wrote: > We are trying to pick the spark
