Builds are failing

2016-02-22 Thread Iulian Dragoș
Just in case you missed this:
https://issues.apache.org/jira/browse/SPARK-13431

Builds are failing with 'Method code too large' in the "shading" step with
Maven.

iulian

-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: pull request template

2016-02-19 Thread Iulian Dragoș
It's a good idea. I would add in there the spec for the PR title. I always
get wrong the order between Jira and component.
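(For reference, the order as I understand it is JIRA id first, then component,
e.g. [SPARK-13431][BUILD] Fix 'Method code too large' in the shade step.)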

Moreover, CONTRIBUTING.md is also lacking them. Any reason not to add it
there? I can open PRs for both, but maybe you want to keep that info on the
wiki instead.

iulian

On Thu, Feb 18, 2016 at 4:18 AM, Reynold Xin  wrote:

> Github introduced a new feature today that allows projects to define
> templates for pull requests. I pushed a very simple template to the
> repository:
>
> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
>
>
> Over time I think we can see how this works and perhaps add a small
> checklist to the pull request template so contributors are reminded every
> time they submit a pull request the important things to do in a pull
> request (e.g. having proper tests).
>
>
>
> ## What changes were proposed in this pull request?
>
> (Please fill in changes proposed in this fix)
>
>
> ## How was this patch tested?
>
> (Please explain how this patch was tested. E.g. unit tests, integration
> tests, manual tests)
>
>
> (If this patch involves UI changes, please attach a screenshot; otherwise,
> remove this)
>
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: build error: code too big: specialStateTransition(int, IntStream)

2016-01-28 Thread Iulian Dragoș
Thanks for the pointer. It really seems to be a pathological case, since
the file that's in error is part of the splinter file (the smaller one,
IdentifiersParser). I'll see if I can work around it by splitting it some more.

iulian

On Thu, Jan 28, 2016 at 4:43 PM, Ted Yu  wrote:

> After this change:
> [SPARK-12681] [SQL] split IdentifiersParser.g into two files
>
> the biggest file under
> sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser is 
> SparkSqlParser.g
>
> Maybe split SparkSqlParser.g up as well ?
>
> On Thu, Jan 28, 2016 at 5:21 AM, Iulian Dragoș  > wrote:
>
>> Hi,
>>
>> Has anyone seen this error?
>>
>> The code of method specialStateTransition(int, IntStream) is exceeding the
>> 65535 bytes limit (SparkSqlParser_IdentifiersParser.java:39907)
>>
>> The error is in ANTLR generated files and it’s (according to Stack
>> Overflow) due to state explosion in parser (or lexer). That seems
>> plausible, given that one file has >5 lines of code. Some suggest that
>> refactoring the grammar would help.
>>
>> I’m seeing this error only sometimes on the command line (building with
>> Sbt), but every time when building with Eclipse (which has its own Java
>> compiler, so it’s not surprising that it has a different behavior). Same
>> behavior with both Java 1.7 and 1.8.
>>
>> Any ideas?
>>
>> iulian
>> ​
>> --
>>
>> --
>> Iulian Dragos
>>
>> --
>> Reactive Apps on the JVM
>> www.typesafe.com
>>
>>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


build error: code too big: specialStateTransition(int, IntStream)

2016-01-28 Thread Iulian Dragoș
Hi,

Has anyone seen this error?

The code of method specialStateTransition(int, IntStream) is exceeding
the 65535 bytes limit (SparkSqlParser_IdentifiersParser.java:39907)

The error is in ANTLR generated files and it’s (according to Stack
Overflow) due to state explosion in parser (or lexer). That seems
plausible, given that one file has >5 lines of code. Some suggest that
refactoring the grammar would help.

I’m seeing this error only sometimes on the command line (building with
Sbt), but every time when building with Eclipse (which has its own Java
compiler, so it’s not surprising that it has a different behavior). Same
behavior with both Java 1.7 and 1.8.

Any ideas?

iulian
​
-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Unable to compile and test Spark in IntelliJ

2016-01-26 Thread Iulian Dragoș
On Tue, Jan 19, 2016 at 6:06 AM, Hyukjin Kwon  wrote:

> Hi all,
>
> I usually have been working with Spark in IntelliJ.
>
> Before this PR,
> https://github.com/apache/spark/commit/7cd7f2202547224593517b392f56e49e4c94cabc
>  for
> `[SPARK-12575][SQL] Grammar parity with existing SQL parser`. I was able to
> just open the project and then run some tests with IntelliJ Run button.
>
> However, it looks like that PR adds some ANTLR files for parsing and I cannot
> run the tests as I did. So, I ended up doing this by running mvn compile first
> and then running some tests with IntelliJ.
>
> I can still run some tests with sbt or maven on the command line but this is
> a bit inconvenient. I just want to run some tests as I did in IntelliJ.
>
> I followed this
> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
> several times but it still emits some exceptions such as
>
> Error:(779, 34) not found: value SparkSqlParser
> case ast if ast.tokenType == SparkSqlParser.TinyintLiteral =>
>  ^
>
> and I still should run mvn compile or mvn test first for them.
>
> Is there any good way to run some Spark tests within IntelliJ as I did
> before?
>

I'm using Eclipse, but all I had to do in order to build in the IDE was to
add `target/generated-sources/antlr3` to the project sources, after
building once in Sbt. You probably have the sources there already.

iulian


>
> Thanks!
>



-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Removing the Mesos fine-grained mode

2016-01-20 Thread Iulian Dragoș
That'd be great, thanks Adam!

On Tue, Jan 19, 2016 at 5:41 PM, Adam McElwee  wrote:

> Sorry, I never got a chance to circle back with the master logs for this.
> I definitely can't share the job code, since it's used to build a pretty
> core dataset for my company, but let me see if I can pull some logs
> together in the next couple days.
>
> On Tue, Jan 19, 2016 at 10:08 AM, Iulian Dragoș <
> iulian.dra...@typesafe.com> wrote:
>
>> It would be good to get to the bottom of this.
>>
>> Adam, could you share the Spark app that you're using to test this?
>>
>> iulian
>>
>> On Mon, Nov 30, 2015 at 10:10 PM, Timothy Chen  wrote:
>>
>>> Hi Adam,
>>>
>>> Thanks for the graphs and the tests, definitely interested to dig a
>>> bit deeper to find out what's could be the cause of this.
>>>
>>> Do you have the spark driver logs for both runs?
>>>
>>> Tim
>>>
>>> On Mon, Nov 30, 2015 at 9:06 AM, Adam McElwee  wrote:
>>> > To eliminate any skepticism around whether cpu is a good performance
>>> metric
>>> > for this workload, I did a couple comparison runs of an example job to
>>> > demonstrate a more universal change in performance metrics (stage/job
>>> time)
>>> > between coarse and fine-grained mode on mesos.
>>> >
>>> > The workload is identical here - pulling tgz archives from s3, parsing
>>> json
>>> > lines from the files and ultimately creating documents to index into
>>> solr.
>>> > The tasks are not inserting into solr (just to let you know that
>>> there's no
>>> > network side-effect of the map task). The runs are on the same exact
>>> > hardware in ec2 (m2.4xlarge, with 68GB of ram and 45G executor memory),
>>> > exact same jvm and it's not dependent on order of running the jobs,
>>> meaning
>>> > I get the same results whether I run the coarse first or whether I run
>>> the
>>> > fine-grained first. No other frameworks/tasks are running on the mesos
>>> > cluster during the test. I see the same results whether it's a 3-node
>>> > cluster, or whether it's a 200-node cluster.
>>> >
>>> > With the CMS collector in fine-grained mode, the map stage takes
>>> roughly
>>> > 2.9h, and coarse-grained mode takes 3.4h. Because both modes initially
>>> start
>>> > out performing similarly, the total execution time gap widens as the
>>> job
>>> > size grows. To put that another way, the difference is much smaller for
>>> > jobs/stages < 1 hour. When I submit this job for a much larger dataset
>>> that
>>> > takes 5+ hours, the difference in total stage time moves closer and
>>> closer
>>> > to roughly 20-30% longer execution time.
>>> >
>>> > With the G1 collector in fine-grained mode, the map stage takes roughly
>>> > 2.2h, and coarse-grained mode takes 2.7h. Again, the fine and
>>> coarse-grained
>>> > execution tests are on the exact same machines, exact same dataset,
>>> and only
>>> > changing spark.mesos.coarse to true/false.
>>> >
>>> > Let me know if there's anything else I can provide here.
>>> >
>>> > Thanks,
>>> > -Adam
>>> >
>>> >
>>> > On Mon, Nov 23, 2015 at 11:27 AM, Adam McElwee 
>>> wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Mon, Nov 23, 2015 at 7:36 AM, Iulian Dragoș
>>> >>  wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Sat, Nov 21, 2015 at 3:37 AM, Adam McElwee 
>>> wrote:
>>> >>>>
>>> >>>> I've used fine-grained mode on our mesos spark clusters until this
>>> week,
>>> >>>> mostly because it was the default. I started trying coarse-grained
>>> because
>>> >>>> of the recent chatter on the mailing list about wanting to move the
>>> mesos
>>> >>>> execution path to coarse-grained only. The odd things is,
>>> coarse-grained vs
>>> >>>> fine-grained seems to yield drastic cluster utilization metrics for
>>> any of
>>> >>>> our jobs that I've tried out this week.
>>> >>>>
>>> >>>> If this is best as a new thread, please let me know, and I

Re: Removing the Mesos fine-grained mode

2016-01-19 Thread Iulian Dragoș
It would be good to get to the bottom of this.

Adam, could you share the Spark app that you're using to test this?

iulian

On Mon, Nov 30, 2015 at 10:10 PM, Timothy Chen  wrote:

> Hi Adam,
>
> Thanks for the graphs and the tests, definitely interested to dig a
> bit deeper to find out what's could be the cause of this.
>
> Do you have the spark driver logs for both runs?
>
> Tim
>
> On Mon, Nov 30, 2015 at 9:06 AM, Adam McElwee  wrote:
> > To eliminate any skepticism around whether cpu is a good performance
> metric
> > for this workload, I did a couple comparison runs of an example job to
> > demonstrate a more universal change in performance metrics (stage/job
> time)
> > between coarse and fine-grained mode on mesos.
> >
> > The workload is identical here - pulling tgz archives from s3, parsing
> json
> > lines from the files and ultimately creating documents to index into
> solr.
> > The tasks are not inserting into solr (just to let you know that there's
> no
> > network side-effect of the map task). The runs are on the same exact
> > hardware in ec2 (m2.4xlarge, with 68GB of ram and 45G executor memory),
> > exact same jvm and it's not dependent on order of running the jobs,
> meaning
> > I get the same results whether I run the coarse first or whether I run
> the
> > fine-grained first. No other frameworks/tasks are running on the mesos
> > cluster during the test. I see the same results whether it's a 3-node
> > cluster, or whether it's a 200-node cluster.
> >
> > With the CMS collector in fine-grained mode, the map stage takes roughly
> > 2.9h, and coarse-grained mode takes 3.4h. Because both modes initially
> start
> > out performing similarly, the total execution time gap widens as the job
> > size grows. To put that another way, the difference is much smaller for
> > jobs/stages < 1 hour. When I submit this job for a much larger dataset
> that
> > takes 5+ hours, the difference in total stage time moves closer and
> closer
> > to roughly 20-30% longer execution time.
> >
> > With the G1 collector in fine-grained mode, the map stage takes roughly
> > 2.2h, and coarse-grained mode takes 2.7h. Again, the fine and
> coarse-grained
> > execution tests are on the exact same machines, exact same dataset, and
> only
> > changing spark.mesos.coarse to true/false.
> >
> > Let me know if there's anything else I can provide here.
> >
> > Thanks,
> > -Adam
> >
> >
> > On Mon, Nov 23, 2015 at 11:27 AM, Adam McElwee  wrote:
> >>
> >>
> >>
> >> On Mon, Nov 23, 2015 at 7:36 AM, Iulian Dragoș
> >>  wrote:
> >>>
> >>>
> >>>
> >>> On Sat, Nov 21, 2015 at 3:37 AM, Adam McElwee  wrote:
> >>>>
> >>>> I've used fine-grained mode on our mesos spark clusters until this
> week,
> >>>> mostly because it was the default. I started trying coarse-grained
> because
> >>>> of the recent chatter on the mailing list about wanting to move the
> mesos
> >>>> execution path to coarse-grained only. The odd things is,
> coarse-grained vs
> >>>> fine-grained seems to yield drastic cluster utilization metrics for
> any of
> >>>> our jobs that I've tried out this week.
> >>>>
> >>>> If this is best as a new thread, please let me know, and I'll try not
> to
> >>>> derail this conversation. Otherwise, details below:
> >>>
> >>>
> >>> I think it's ok to discuss it here.
> >>>
> >>>>
> >>>> We monitor our spark clusters with ganglia, and historically, we
> >>>> maintain at least 90% cpu utilization across the cluster. Making a
> single
> >>>> configuration change to use coarse-grained execution instead of
> fine-grained
> >>>> consistently yields a cpu utilization pattern that starts around 90%
> at the
> >>>> beginning of the job, and then it slowly decreases over the next
> 1-1.5 hours
> >>>> to level out around 65% cpu utilization on the cluster. Does anyone
> have a
> >>>> clue why I'd be seeing such a negative effect of switching to
> coarse-grained
> >>>> mode? GC activity is comparable in both cases. I've tried 1.5.2, as
> well as
> >>>> the 1.6.0 preview tag that's on github.
> >>>
> >>>
> >>> I'm not very familiar with Ganglia, and how it computes utilization.
> But
> >>> one thing comes to mind: did you enable dynamic allocation on
> coarse-grained
> >>> mode?
> >>
> >>
> >> Dynamic allocation is definitely not enabled. The only delta between
> runs
> >> is adding --conf "spark.mesos.coarse=true" the job submission. Ganglia
> is
> >> just pulling stats from the procfs, and I've never seen it report bad
> >> results. If I sample any of the 100-200 nodes in the cluster, dstat
> reflects
> >> the same average cpu that I'm seeing reflected in ganglia.
> >>>
> >>>
> >>> iulian
> >>
> >>
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-23 Thread Iulian Dragoș
+1 (non-binding)

Tested Mesos deployments (client and cluster-mode, fine-grained and
coarse-grained). Things look good.

iulian

On Wed, Dec 23, 2015 at 2:35 PM, Sean Owen  wrote:

> Docker integration tests still fail for Mark and I, and should
> probably be disabled:
> https://issues.apache.org/jira/browse/SPARK-12426
>
> ... but if anyone else successfully runs these (and I assume Jenkins
> does) then not a blocker.
>
> I'm having intermittent trouble with other tests passing, but nothing
> unusual.
> Sigs and hashes are OK.
>
> We have 30 issues fixed for 1.6.1. All but those resolved in the last
> 24 hours or so should be fixed for 1.6.0 right? I can touch that up.
>
>
>
>
>
> On Tue, Dec 22, 2015 at 8:10 PM, Michael Armbrust
>  wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 1.6.0!
> >
> > The vote is open until Friday, December 25, 2015 at 18:00 UTC and passes
> if
> > a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.6.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v1.6.0-rc4
> > (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1176/
> >
> > The test repository (versioned as v1.6.0-rc4) for this release can be
> found
> > at:
> > https://repository.apache.org/content/repositories/orgapachespark-1175/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
> >
> > ===
> > == How can I help test this release? ==
> > ===
> > If you are a Spark user, you can help us test this release by taking an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > 
> > == What justifies a -1 vote for this release? ==
> > 
> > This vote is happening towards the end of the 1.6 QA period, so -1 votes
> > should only occur for significant regressions from 1.5. Bugs already
> present
> > in 1.5, minor regressions, or bugs related to new features will not block
> > this release.
> >
> > ===
> > == What should happen to JIRA tickets still targeting 1.6.0? ==
> > ===
> > 1. It is OK for documentation patches to target 1.6.0 and still go into
> > branch-1.6, since documentations will be published separately from the
> > release.
> > 2. New features for non-alpha-modules should target 1.7+.
> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> > version.
> >
> >
> > ==
> > == Major changes to help you focus your testing ==
> > ==
> >
> > Notable changes since 1.6 RC3
> >
> >
> >   - SPARK-12404 - Fix serialization error for Datasets with
> > Timestamps/Arrays/Decimal
> >   - SPARK-12218 - Fix incorrect pushdown of filters to parquet
> >   - SPARK-12395 - Fix join columns of outer join for DataFrame using
> >   - SPARK-12413 - Fix mesos HA
> >
> >
> > Notable changes since 1.6 RC2
> >
> >
> > - SPARK_VERSION has been set correctly
> > - SPARK-12199 ML Docs are publishing correctly
> > - SPARK-12345 Mesos cluster mode has been fixed
> >
> > Notable changes since 1.6 RC1
> >
> > Spark Streaming
> >
> > SPARK-2629  trackStateByKey has been renamed to mapWithState
> >
> > Spark SQL
> >
> > SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
> execution.
> > SPARK-12258 correct passing null into ScalaUDF
> >
> > Notable Features Since 1.5
> >
> > Spark SQL
> >
> > SPARK-11787 Parquet Performance - Improve Parquet scan performance when
> > using flat schemas.
> > SPARK-10810 Session Management - Isolated default database (i.e. USE mydb)
> > even on shared clusters.
> > SPARK-  Dataset API - A type-safe API (similar to RDDs) that performs
> > many operations on serialized binary data and code generation (i.e.
> Project
> > Tungsten).
> > SPARK-1 Unified Memory Management - Shared memory for execution and
> > caching instead of exclusive division of the regions.
> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries
> > over files of any supported format without registeri

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-17 Thread Iulian Dragoș
-0 (non-binding)

Unfortunately the Mesos cluster regression is still there (see my comment for
explanations). I'm not voting to delay the release any longer though.

We tested (and passed) Mesos in:
 - client mode
 - fine/coarse-grained
 - with/without roles

iulian

On Thu, Dec 17, 2015 at 1:37 AM, Andrew Or  wrote:

> +1
>
> Mesos cluster mode regression in RC2 is now fixed (SPARK-12345
>  / PR10332
> ).
>
> Also tested on standalone client and cluster mode. No problems.
>
> 2015-12-16 15:16 GMT-08:00 Rad Gruchalski :
>
>> I also noticed that spark.replClassServer.host and
>> spark.replClassServer.port aren’t used anymore. The transport now happens
>> over the main RpcEnv.
>>
>> Kind regards,
>> Radek Gruchalski
>> ra...@gruchalski.com 
>> de.linkedin.com/in/radgruchalski/
>>
>>
>> *Confidentiality:*This communication is intended for the above-named
>> person and may be confidential and/or legally privileged.
>> If it has come to you in error you must take no action based on it, nor
>> must you copy or show it to anyone; please delete/destroy and inform the
>> sender immediately.
>>
>> On Wednesday, 16 December 2015 at 23:43, Marcelo Vanzin wrote:
>>
>> I was going to say that spark.executor.port is not used anymore in
>> 1.6, but damn, there's still that akka backend hanging around there
>> even when netty is being used... we should fix this, should be a
>> simple one-liner.
>>
>> On Wed, Dec 16, 2015 at 2:35 PM, singinpirate 
>> wrote:
>>
>> -0 (non-binding)
>>
>> I have observed that when we set spark.executor.port in 1.6, we get
>> thrown a
>> NPE in SparkEnv$.create(SparkEnv.scala:259). It used to work in 1.5.2. Is
>> anyone else seeing this?
>>
>>
>> --
>> Marcelo
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Update to Spark Mesos docs possibly? LIBPROCESS_IP needs to be set for client mode

2015-12-17 Thread Iulian Dragoș
On Wed, Dec 16, 2015 at 5:42 PM, Aaron  wrote:

> Wrt to PR, sure, let me update the documentation, i'll send it out
> shortly.  My Fork is on Github..is the PR from there ok?
>

Absolutely. Have a look at
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark if
you haven't done so already, it should answer most questions about starting
to contribute to Spark.

thanks,
iulian


>
> Cheers,
> Aaron
>
> On Wed, Dec 16, 2015 at 11:33 AM, Timothy Chen  wrote:
> > Yes if want to manually override what IP to use to be contacted by the
> > master you can set LIPROCESS_IP and LIBPROCESS_PORT.
> >
> > It is a Mesos specific settings. We can definitely update the docs.
> >
> > Note that in the future as we move to use the new Mesos Http API these
> > configurations won't be needed (also libmesos!).
> >
> > Tim
> >
> > On Dec 16, 2015, at 8:09 AM, Iulian Dragoș 
> > wrote:
> >
> > LIBPROCESS_IP has zero hits in the Spark code base. This seems to be a
> > Mesos-specific setting.
> >
> > Have you tried setting SPARK_LOCAL_IP?
> >
> > On Wed, Dec 16, 2015 at 5:07 PM, Aaron  wrote:
> >>
> >> Found this thread that talked about it to help understand it better:
> >>
> >>
> >>
> https://mail-archives.apache.org/mod_mbox/mesos-user/201507.mbox/%3ccajq68qf9pejgnwomasm2dqchyaxpcaovnfkfgggxxpzj2jo...@mail.gmail.com%3E
> >>
> >> >
> >> > When you run Spark on Mesos it needs to run
> >> >
> >> > spark driver
> >> > mesos scheduler
> >> >
> >> > and both need to be visible to outside world on public iface IP
> >> >
> >> > you need to tell Spark and Mesos on which interface to bind - by
> default
> >> > they resolve node hostname to ip - this is loopback address in your
> case
> >> >
> >> > Possible solutions - on slave node with public IP 192.168.56.50
> >> >
> >> > 1. Set
> >> >
> >> >    export LIBPROCESS_IP=192.168.56.50
> >> >export SPARK_LOCAL_IP=192.168.56.50
> >> >
> >> > 2. Ensure your hostname resolves to public iface IP - (for testing)
> edit
> >> > /etc/hosts to resolve your domain name to 192.168.56.50
> >> > 3. Set correct hostname/ip in mesos configuration - see Nikolaos
> answer
> >> >
> >>
> >> Cheers,
> >> Aaron
> >>
> >> On Wed, Dec 16, 2015 at 11:00 AM, Iulian Dragoș
> >>  wrote:
> >> > Hi Aaron,
> >> >
> >> > I never had to use that variable. What is it for?
> >> >
> >> > On Wed, Dec 16, 2015 at 2:00 PM, Aaron  wrote:
> >> >>
> >> >> In going through running various Spark jobs, both Spark 1.5.2 and the
> >> >> new Spark 1.6 SNAPSHOTs, on a Mesos cluster (currently 0.25), we
> >> >> noticed that is in order to run the Spark shells (both python and
> >> >> scala), we needed to set the LIBPROCESS_IP environment variable
> before
> >> >> running.
> >> >>
> >> >> Was curious if the Spark on Mesos docs should be updated, under the
> >> >> Client Mode section, to include setting this environment variable?
> >> >>
> >> >> Cheers
> >> >> Aaron
> >> >>
> >> >> -
> >> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > --
> >> > Iulian Dragos
> >> >
> >> > --
> >> > Reactive Apps on the JVM
> >> > www.typesafe.com
> >> >
> >
> >
> >
> >
> > --
> >
> > --
> > Iulian Dragos
> >
> > --
> > Reactive Apps on the JVM
> > www.typesafe.com
> >
>



-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Update to Spark Mesos docs possibly? LIBPROCESS_IP needs to be set for client mode

2015-12-16 Thread Iulian Dragoș
Sure, documenting this would be great, I just wanted to understand the
context. There is a related ticket: SPARK-5488
<https://issues.apache.org/jira/browse/SPARK-5488>.

Would you mind opening a PR?

On Wed, Dec 16, 2015 at 5:11 PM, Aaron  wrote:

> Basically, my hostname doesn't resolve to an "accessible" IP
> address...which isn't a big deal, I normally set SPARK_LOCAL_IP when I
> am doing things on a YARN cluster.  But, we've moved to a Mesos
> Cluster recently, and had to track down when it wasn't working...I
> assumed (badly obviously) that setting SPARK_LOCAL_IP was
> sufficient...need to tell the Mesos scheduler as well I guess.
>
> Not sure if it would be a good idea to put in actual code,
> something like:  "when SPARK_LOCAL_IP is set, and using mesos:// as
> the master, set LIBPROCESS_IP," but some kind of documentation about
> this possible issue would have saved me some time.
>
> Cheers,
> Aaron
>
> On Wed, Dec 16, 2015 at 11:07 AM, Aaron  wrote:
> > Found this thread that talked about it to help understand it better:
> >
> >
> https://mail-archives.apache.org/mod_mbox/mesos-user/201507.mbox/%3ccajq68qf9pejgnwomasm2dqchyaxpcaovnfkfgggxxpzj2jo...@mail.gmail.com%3E
> >
> >>
> >> When you run Spark on Mesos it needs to run
> >>
> >> spark driver
> >> mesos scheduler
> >>
> >> and both need to be visible to outside world on public iface IP
> >>
> >> you need to tell Spark and Mesos on which interface to bind - by default
> >> they resolve node hostname to ip - this is loopback address in your case
> >>
> >> Possible solutions - on slave node with public IP 192.168.56.50
> >>
> >> 1. Set
> >>
> >>export LIBPROCESS_IP=192.168.56.50
> >>export SPARK_LOCAL_IP=192.168.56.50
> >>
> >> 2. Ensure your hostname resolves to public iface IP - (for testing) edit
> >> /etc/hosts to resolve your domain name to 192.168.56.50
> >> 3. Set correct hostname/ip in mesos configuration - see Nikolaos answer
> >>
> >
> > Cheers,
> > Aaron
> >
> > On Wed, Dec 16, 2015 at 11:00 AM, Iulian Dragoș
> >  wrote:
> >> Hi Aaron,
> >>
> >> I never had to use that variable. What is it for?
> >>
> >> On Wed, Dec 16, 2015 at 2:00 PM, Aaron  wrote:
> >>>
> >>> In going through running various Spark jobs, both Spark 1.5.2 and the
> >>> new Spark 1.6 SNAPSHOTs, on a Mesos cluster (currently 0.25), we
> >>> noticed that is in order to run the Spark shells (both python and
> >>> scala), we needed to set the LIBPROCESS_IP environment variable before
> >>> running.
> >>>
> >>> Was curious if the Spark on Mesos docs should be updated, under the
> >>> Client Mode section, to include setting this environment variable?
> >>>
> >>> Cheers
> >>> Aaron
> >>>
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >>> For additional commands, e-mail: dev-h...@spark.apache.org
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> --
> >> Iulian Dragos
> >>
> >> --
> >> Reactive Apps on the JVM
> >> www.typesafe.com
> >>
>



-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Update to Spark Mesos docs possibly? LIBPROCESS_IP needs to be set for client mode

2015-12-16 Thread Iulian Dragoș
LIBPROCESS_IP has zero hits in the Spark code base. This seems to be a
Mesos-specific setting.

Have you tried setting SPARK_LOCAL_IP?

On Wed, Dec 16, 2015 at 5:07 PM, Aaron  wrote:

> Found this thread that talked about it to help understand it better:
>
>
> https://mail-archives.apache.org/mod_mbox/mesos-user/201507.mbox/%3ccajq68qf9pejgnwomasm2dqchyaxpcaovnfkfgggxxpzj2jo...@mail.gmail.com%3E
>
> >
> > When you run Spark on Mesos it needs to run
> >
> > spark driver
> > mesos scheduler
> >
> > and both need to be visible to outside world on public iface IP
> >
> > you need to tell Spark and Mesos on which interface to bind - by default
> > they resolve node hostname to ip - this is loopback address in your case
> >
> > Possible solutions - on slave node with public IP 192.168.56.50
> >
> > 1. Set
> >
> >export LIBPROCESS_IP=192.168.56.50
> >export SPARK_LOCAL_IP=192.168.56.50
> >
> > 2. Ensure your hostname resolves to public iface IP - (for testing) edit
> > /etc/hosts to resolve your domain name to 192.168.56.50
> > 3. Set correct hostname/ip in mesos configuration - see Nikolaos answer
> >
>
> Cheers,
> Aaron
>
> On Wed, Dec 16, 2015 at 11:00 AM, Iulian Dragoș
>  wrote:
> > Hi Aaron,
> >
> > I never had to use that variable. What is it for?
> >
> > On Wed, Dec 16, 2015 at 2:00 PM, Aaron  wrote:
> >>
> >> In going through running various Spark jobs, both Spark 1.5.2 and the
> >> new Spark 1.6 SNAPSHOTs, on a Mesos cluster (currently 0.25), we
> >> noticed that is in order to run the Spark shells (both python and
> >> scala), we needed to set the LIBPROCESS_IP environment variable before
> >> running.
> >>
> >> Was curious if the Spark on Mesos docs should be updated, under the
> >> Client Mode section, to include setting this environment variable?
> >>
> >> Cheers
> >> Aaron
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>
> >
> >
> >
> > --
> >
> > --
> > Iulian Dragos
> >
> > --
> > Reactive Apps on the JVM
> > www.typesafe.com
> >
>



-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Update to Spark Mesos docs possibly? LIBPROCESS_IP needs to be set for client mode

2015-12-16 Thread Iulian Dragoș
Hi Aaron,

I never had to use that variable. What is it for?

On Wed, Dec 16, 2015 at 2:00 PM, Aaron  wrote:

> In going through running various Spark jobs, both Spark 1.5.2 and the
> new Spark 1.6 SNAPSHOTs, on a Mesos cluster (currently 0.25), we
> noticed that is in order to run the Spark shells (both python and
> scala), we needed to set the LIBPROCESS_IP environment variable before
> running.
>
> Was curious if the Spark on Mesos docs should be updated, under the
> Client Mode section, to include setting this environment variable?
>
> Cheers
> Aaron
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-15 Thread Iulian Dragoș
Thanks for the heads up.

On Tue, Dec 15, 2015 at 11:40 PM, Michael Armbrust 
wrote:

> This vote is canceled due to the issue with the incorrect version.  This
> issue will be fixed by https://github.com/apache/spark/pull/10317
>
> We can wait a little bit for a fix to
> https://issues.apache.org/jira/browse/SPARK-12345.  However if it looks
> like there is not an easy fix coming soon, I'm planning to move forward
> with RC3.
>
> On Mon, Dec 14, 2015 at 9:31 PM, Mark Hamstra 
> wrote:
>
>> I'm afraid you're correct, Krishna:
>>
>> core/src/main/scala/org/apache/spark/package.scala:  val SPARK_VERSION =
>> "1.6.0-SNAPSHOT"
>> docs/_config.yml:SPARK_VERSION: 1.6.0-SNAPSHOT
>>
>> On Mon, Dec 14, 2015 at 6:51 PM, Krishna Sankar 
>> wrote:
>>
>>> Guys,
>>>The sc.version gives 1.6.0-SNAPSHOT. Need to change to 1.6.0. Can you
>>> pl verify ?
>>> Cheers
>>> 
>>>
>>> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 1.6.0!

 The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and
 passes if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.6.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is *v1.6.0-rc2
 (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
 *

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1169/

 The test repository (versioned as v1.6.0-rc2) for this release can be
 found at:
 https://repository.apache.org/content/repositories/orgapachespark-1168/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/

 ===
 == How can I help test this release? ==
 ===
 If you are a Spark user, you can help us test this release by taking an
 existing Spark workload and running on this release candidate, then
 reporting any regressions.

 
 == What justifies a -1 vote for this release? ==
 
 This vote is happening towards the end of the 1.6 QA period, so -1
 votes should only occur for significant regressions from 1.5. Bugs already
 present in 1.5, minor regressions, or bugs related to new features will not
 block this release.

 ===
 == What should happen to JIRA tickets still targeting 1.6.0? ==
 ===
 1. It is OK for documentation patches to target 1.6.0 and still go into
 branch-1.6, since documentations will be published separately from the
 release.
 2. New features for non-alpha-modules should target 1.7+.
 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
 target version.


 ==
 == Major changes to help you focus your testing ==
 ==

 Spark 1.6.0 Preview / Notable changes since 1.6 RC1 / Spark Streaming

- SPARK-2629  
trackStateByKey has been renamed to mapWithState

 Spark SQL

- SPARK-12165 
SPARK-12189  Fix
bugs in eviction of storage memory by execution.
- SPARK-12258  
 correct
passing null into ScalaUDF

 Notable Features Since 1.5 / Spark SQL

- SPARK-11787  
 Parquet
Performance - Improve Parquet scan performance when using flat
schemas.
- SPARK-10810 
Session Management - Isolated default database (i.e. USE mydb) even
on shared clusters.
- SPARK-   Dataset
API - A type-safe API (similar to RDDs) that performs many
operations on serialized binary data and code generation (i.e. Project
Tungsten).
- SPARK-1 

Re: [VOTE] Release Apache Spark 1.6.0 (RC2)

2015-12-15 Thread Iulian Dragoș
-1 (non-binding)

Cluster mode on Mesos is broken (regression compared to 1.5.2). It seems to
be related to the way SPARK_HOME is handled. In the driver logs I see:

I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave
130bdc39-44e7-4256-8c22-602040d337f1-S1
bin/spark-submit: line 27:
/Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class:
No such file or directory

The path is my local SPARK_HOME, but that’s of course not the one in the
Mesos slave.

iulian

On Tue, Dec 15, 2015 at 6:31 AM, Mark Hamstra 
wrote:

I'm afraid you're correct, Krishna:
>
> core/src/main/scala/org/apache/spark/package.scala:  val SPARK_VERSION =
> "1.6.0-SNAPSHOT"
> docs/_config.yml:SPARK_VERSION: 1.6.0-SNAPSHOT
>
> On Mon, Dec 14, 2015 at 6:51 PM, Krishna Sankar 
> wrote:
>
>> Guys,
>>The sc.version gives 1.6.0-SNAPSHOT. Need to change to 1.6.0. Can you
>> pl verify ?
>> Cheers
>> 
>>
>> On Sat, Dec 12, 2015 at 9:39 AM, Michael Armbrust > > wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 1.6.0!
>>>
>>> The vote is open until Tuesday, December 15, 2015 at 6:00 UTC and
>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is *v1.6.0-rc2
>>> (23f8dfd45187cb8f2216328ab907ddb5fbdffd0b)
>>> *
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1169/
>>>
>>> The test repository (versioned as v1.6.0-rc2) for this release can be
>>> found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1168/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc2-docs/
>>>
>>> ===
>>> == How can I help test this release? ==
>>> ===
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> 
>>> == What justifies a -1 vote for this release? ==
>>> 
>>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>>> should only occur for significant regressions from 1.5. Bugs already
>>> present in 1.5, minor regressions, or bugs related to new features will not
>>> block this release.
>>>
>>> ===
>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>> ===
>>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>>> branch-1.6, since documentations will be published separately from the
>>> release.
>>> 2. New features for non-alpha-modules should target 1.7+.
>>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>> target version.
>>>
>>>
>>> ==
>>> == Major changes to help you focus your testing ==
>>> ==
>>>
>>> Spark 1.6.0 Preview / Notable changes since 1.6 RC1 / Spark Streaming
>>>
>>>- SPARK-2629  
>>>trackStateByKey has been renamed to mapWithState
>>>
>>> Spark SQL
>>>
>>>- SPARK-12165 
>>>SPARK-12189  Fix
>>>bugs in eviction of storage memory by execution.
>>>- SPARK-12258  correct
>>>passing null into ScalaUDF
>>>
>>> Notable Features Since 1.5 / Spark SQL
>>>
>>>- SPARK-11787  Parquet
>>>Performance - Improve Parquet scan performance when using flat
>>>schemas.
>>>- SPARK-10810 
>>>Session Management - Isolated default database (i.e. USE mydb) even
>>>on shared clusters.
>>>- SPARK-   Dataset
>>>API - A type-safe API (similar to RDDs) that performs many
>>>operations on serialized binary data and code generation (i.e. Project
>>>Tungsten).
>>>- SPARK-1 

Re: How to debug Spark source using IntelliJ/ Eclipse

2015-12-07 Thread Iulian Dragoș
What errors do you see? I’m using Eclipse and things work pretty much as
described (I’m using Scala 2.11 so there’s a slight difference for that,
but if you’re fine using Scala 2.10 it should be good to go).

One little difference: the sbt command is no longer in the sbt directory,
instead run:

build/sbt eclipse

iulian
​

On Sun, Dec 6, 2015 at 3:57 AM, jatinganhotra 
wrote:

> Hi,
>
> I am trying to understand Spark internal code and wanted to debug Spark
> source, to add a new feature. I have tried the steps lined out here on the
> Spark Wiki page IDE setup
> <
> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup
> >
> , but they don't work.
>
> I also found other posts in the Dev mailing list such as -
>
> 1.  Spark-1-5-0-setting-up-debug-env
> <
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-1-5-0-setting-up-debug-env-td14056.html
> >
> , and
>
> 2.  using-IntelliJ-to-debug-SPARK-1-1-Apps-with-mvn-sbt-for-beginners
> <
> http://apache-spark-developers-list.1001551.n3.nabble.com/Intro-to-using-IntelliJ-to-debug-SPARK-1-1-Apps-with-mvn-sbt-for-beginners-td9429.html
> >
>
> But, I found many issues with both the links. I have tried both these
> articles many times, often re-starting the whole process from scratch after
> deleting everything and re-installing again, but I always face some
> dependency issues.
>
> It would be great if someone from the Spark developers group could point me
> to the steps for setting up Spark debug environment.
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-debug-Spark-source-using-IntelliJ-Eclipse-tp15477.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Removing the Mesos fine-grained mode

2015-11-23 Thread Iulian Dragoș
On Sat, Nov 21, 2015 at 3:37 AM, Adam McElwee  wrote:

> I've used fine-grained mode on our mesos spark clusters until this week,
> mostly because it was the default. I started trying coarse-grained because
> of the recent chatter on the mailing list about wanting to move the mesos
> execution path to coarse-grained only. The odd things is, coarse-grained vs
> fine-grained seems to yield drastic cluster utilization metrics for any of
> our jobs that I've tried out this week.
>
> If this is best as a new thread, please let me know, and I'll try not to
> derail this conversation. Otherwise, details below:
>

I think it's ok to discuss it here.


> We monitor our spark clusters with ganglia, and historically, we maintain
> at least 90% cpu utilization across the cluster. Making a single
> configuration change to use coarse-grained execution instead of
> fine-grained consistently yields a cpu utilization pattern that starts
> around 90% at the beginning of the job, and then it slowly decreases over
> the next 1-1.5 hours to level out around 65% cpu utilization on the
> cluster. Does anyone have a clue why I'd be seeing such a negative effect
> of switching to coarse-grained mode? GC activity is comparable in both
> cases. I've tried 1.5.2, as well as the 1.6.0 preview tag that's on github.
>

I'm not very familiar with Ganglia, and how it computes utilization. But
one thing comes to mind: did you enable dynamic allocation

on coarse-grained mode?

iulian


Re: Removing the Mesos fine-grained mode

2015-11-20 Thread Iulian Dragoș
This is a good point. We should probably document this better in the
migration notes. In the mean time:

http://spark.apache.org/docs/latest/running-on-mesos.html#dynamic-resource-allocation-with-mesos

Roughly, dynamic allocation lets Spark add and kill executors based on the
scheduling delay. The min and max number of executors can be configured.
Would this fit your use-case?
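
For illustration, a minimal sketch of the configuration involved (the property
names are the standard Spark ones; the app name, master URL and executor bounds
are placeholders, and the external Mesos shuffle service must be running on
each node, as the page above describes):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shared-cluster-job")          // placeholder
  .setMaster("mesos://zk://zk1:2181/mesos")  // placeholder master URL
  .set("spark.mesos.coarse", "true")                  // dynamic allocation needs coarse-grained mode
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")       // external shuffle service on each node
  .set("spark.dynamicAllocation.minExecutors", "2")   // placeholder bounds
  .set("spark.dynamicAllocation.maxExecutors", "20")
val sc = new SparkContext(conf)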

iulian


On Fri, Nov 20, 2015 at 1:55 AM, Jo Voordeckers 
wrote:

> As a recent fine-grained mode adopter I'm now confused after reading this
> and other resources from spark-summit, the docs, ...  so can someone please
> advise me for our use-case?
>
> We'll have 1 or 2 streaming jobs and we will run scheduled batch jobs
> which should take resources away from the streaming jobs and give 'em back
> upon completion.
>
> Can someone point me at the docs or a guide to set this up?
>
> Thanks!
>
> - Jo Voordeckers
>
>
> On Thu, Nov 19, 2015 at 5:52 AM, Heller, Chris  wrote:
>
>> I was one that argued for fine-grain mode, and there is something I still
>> appreciate about how fine-grain mode operates in terms of the way one would
>> define a Mesos framework. That said, with dyn-allocation and Mesos support
>> for both resource reservation, oversubscription and revocation, I think the
>> direction is clear that the coarse mode is the proper way forward, and
>> having the two code paths is just noise.
>>
>> -Chris
>>
>> From: Iulian Dragoș 
>> Date: Thursday, November 19, 2015 at 6:42 AM
>> To: "dev@spark.apache.org" 
>> Subject: Removing the Mesos fine-grained mode
>>
>> Hi all,
>>
>> Mesos is the only cluster manager that has a fine-grained mode, but it's
>> more often than not problematic, and it's a maintenance burden. I'd like to
>> suggest removing it in the 2.0 release.
>>
>> A few reasons:
>>
>> - code/maintenance complexity. The two modes duplicate a lot of
>> functionality (and sometimes code) that leads to subtle differences or
>> bugs. See SPARK-10444
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_SPARK-2D10444&d=CwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=ylcFa5bBSUyTQqbx1Aqz47ec5BJJc7uk0YQ4EQKh-DY&m=36NeiiniCnBgPZ3AKAvvSJYBLQNxvpOcLoAi-VwXbtc&s=4_2dJBDiLqTcfXfX1LZluOo1U6tRKR2wKGGzfwiKdVY&e=>
>>  and
>> also this thread
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__mail-2Darchives.apache.org_mod-5Fmbox_spark-2Duser_201510.mbox_-253CCALxMP-2DA-2BaygNwSiyTM8ff20-2DMGWHykbhct94a2hwZTh1jWHp-5Fg-40mail.gmail.com-253E&d=CwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=ylcFa5bBSUyTQqbx1Aqz47ec5BJJc7uk0YQ4EQKh-DY&m=36NeiiniCnBgPZ3AKAvvSJYBLQNxvpOcLoAi-VwXbtc&s=SNFPzodGw7sgp3km9NKYM46gZHLguvxVNzCIeUlJzOw&e=>
>>  and MESOS-3202
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_MESOS-2D3202&d=CwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=ylcFa5bBSUyTQqbx1Aqz47ec5BJJc7uk0YQ4EQKh-DY&m=36NeiiniCnBgPZ3AKAvvSJYBLQNxvpOcLoAi-VwXbtc&s=d-U4CohYsiZc0Zmj4KETn2dT_2ZFe5s3_IIbMm2tjJo&e=>
>> - it's not widely used (Reynold's previous thread
>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__apache-2Dspark-2Ddevelopers-2Dlist.1001551.n3.nabble.com_Please-2Dreply-2Dif-2Dyou-2Duse-2DMesos-2Dfine-2Dgrained-2Dmode-2Dtd14930.html&d=CwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=ylcFa5bBSUyTQqbx1Aqz47ec5BJJc7uk0YQ4EQKh-DY&m=36NeiiniCnBgPZ3AKAvvSJYBLQNxvpOcLoAi-VwXbtc&s=HGMiKyzxFDhpbomduKVIIRHWk9RDGDCk7tneJVQqTwo&e=>
>> got very few responses from people relying on it)
>> - similar functionality can be achieved with dynamic allocation +
>> coarse-grained mode
>>
>> I suggest that Spark 1.6 already issues a warning if it detects
>> fine-grained use, with removal in the 2.0 release.
>>
>> Thoughts?
>>
>> iulian
>>
>>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Removing the Mesos fine-grained mode

2015-11-19 Thread Iulian Dragoș
Hi all,

Mesos is the only cluster manager that has a fine-grained mode, but it's
more often than not problematic, and it's a maintenance burden. I'd like to
suggest removing it in the 2.0 release.

A few reasons:

- code/maintenance complexity. The two modes duplicate a lot of
functionality (and sometimes code) that leads to subtle differences or
bugs. See SPARK-10444 and also this thread and MESOS-3202
- it's not widely used (Reynold's previous thread got very few responses from
people relying on it)
- similar functionality can be achieved with dynamic allocation +
coarse-grained mode

I suggest that Spark 1.6 already issues a warning if it detects
fine-grained use, with removal in the 2.0 release.
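
For illustration only, a stand-alone sketch (not actual Spark code) of what
such a warning could look like; `conf` and `logWarning` stand in for SparkConf
and the Logging trait of the Mesos scheduler backend:

def warnIfFineGrained(conf: Map[String, String], logWarning: String => Unit): Unit = {
  // In 1.x fine-grained is the default, i.e. spark.mesos.coarse defaults to false.
  val coarse = conf.getOrElse("spark.mesos.coarse", "false").toBoolean
  if (!coarse) {
    logWarning("Mesos fine-grained mode is deprecated and is expected to be " +
      "removed in Spark 2.0; consider coarse-grained mode with dynamic allocation.")
  }
}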

Thoughts?

iulian


Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Iulian Dragoș
Hi Jo,

I agree that there's something fishy with the cluster dispatcher, I've seen
some issues like that.

I think it actually tries to send all properties as part of
`SPARK_EXECUTOR_OPTS`, which may not be everything that's needed:

https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L375-L377

Can you please open a Jira ticket and also describe the symptoms? This
might be related, or the same issue: SPARK-11280 and also SPARK-11327


thanks,
iulian




On Tue, Nov 17, 2015 at 2:46 AM, Jo Voordeckers 
wrote:

>
> Hi all,
>
> I'm running the mesos cluster dispatcher, however when I submit jobs with
> things like jvm args, classpath order and UI port aren't added to the
> commandline executed by the mesos scheduler. In fact it only cares about
> the class, jar and num cores/mem.
>
>
> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L412-L424
>
> I've made an attempt at adding a few of the args that I believe are useful
> to the MesosClusterScheduler class, which seems to solve my problem.
>
> Please have a look:
>
> https://github.com/apache/spark/pull/9752
>
> Thanks
>
> - Jo Voordeckers
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Please reply if you use Mesos fine grained mode

2015-11-04 Thread Iulian Dragoș
Probably because only coarse-grained mode respects `spark.cores.max` right
now. See (and maybe review ;-)) #9027 (sorry for the shameless plug).

iulian

On Wed, Nov 4, 2015 at 5:05 PM, Timothy Chen  wrote:

> Hi Chris,
>
> How does coarse grain mode give you less starvation in your overloaded
> cluster? Is it just because it allocates all resources at once (which I
> think in a overloaded cluster allows less things to run at once).
>
> Tim
>
>
> On Nov 4, 2015, at 4:21 AM, Heller, Chris  wrote:
>
> We’ve been making use of both. Fine-grain mode makes sense for more ad-hoc
> work loads, and coarse-grained for more job like loads on a common data
> set. My preference is the fine-grain mode in all cases, but the overhead
> associated with its startup and the possibility that an overloaded cluster
> would be starved for resources makes coarse grain mode a reality at the
> moment.
>
> On Wednesday, 4 November 2015 5:24 AM, Reynold Xin 
> wrote:
>
>
> If you are using Spark with Mesos fine grained mode, can you please
> respond to this email explaining why you use it over the coarse grained
> mode?
>
> Thanks.
>
>
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Building Spark w/ 1.8 and binary incompatibilities

2015-10-19 Thread Iulian Dragoș
Hey all,

tl;dr; I built Spark with Java 1.8 even though my JAVA_HOME pointed to 1.7.
Then it failed with binary incompatibilities.

I couldn’t find any mention of this in the docs, so it might be a known
thing, but it’s definitely too easy to do the wrong thing.

The problem is that Maven is using the Zinc incremental compiler, which is
a long-running server. If the first build (that spawns the zinc server) is
started with Java 8 on the path, Maven will compile against Java 8 even
after changing JAVA_HOME and rebuilding.

I filed scala-maven-plugin#173
 but so far no
comment.

Steps to reproduce:

   - make sure zinc is not running yet
   - build with JAVA_HOME pointing to 1.8
   - point JAVA_HOME to 1.7
   - clean build
   - run Spark, watch it fail with NoSuchMethodError in ConcurrentHashMap
   (see the sketch below for the likely cause). More details here
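
A minimal sketch of the kind of incompatibility involved, assuming the failure
is the well-known covariant return type added to ConcurrentHashMap.keySet() in
JDK 8 (a common cause of exactly this NoSuchMethodError; the actual Spark call
site may differ):

import java.util.concurrent.ConcurrentHashMap

object KeySetRepro {
  def main(args: Array[String]): Unit = {
    val cache = new ConcurrentHashMap[String, Int]()
    cache.put("a", 1)
    // Compiled against JDK 8, this call resolves to the descriptor
    // keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
    // which does not exist on a Java 7 runtime, hence NoSuchMethodError.
    val keys = cache.keySet()
    println(keys.size())
  }
}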
   

Workaround:

   - build/zinc/bin/zinc -shutdown
   - rebuild

iulian
​
-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-12 Thread Iulian Dragoș
On Fri, Oct 9, 2015 at 10:34 PM, Patrick Wendell  wrote:

> I would push back slightly. The reason we have the PR builds taking so
> long is death by a million small things that we add. Doing a full 2.11
> compile is order minutes... it's a nontrivial increase to the build times.
>

We can host the build if there's a way to post back a comment when the
build is broken.


>
> It doesn't seem that bad to me to go back post-hoc once in a while and fix
> 2.11 bugs when they come up. It's on the order of once or twice per release
> and the typesafe guys keep a close eye on it (thanks!). Compare that to
> literally thousands of PR runs and a few minutes every time, IMO it's not
> worth it.
>

Anything that can be done by a machine should be done by a machine. I am
not sure we have enough data to say it's only once or twice per release,
and even if we were to issue a PR for each breakage, it's additional load
on committers and reviewers, not to mention our own work. I personally
don't see how 2-3 minutes of compute time per PR can justify hours of work
plus reviews.

iulian


>
> On Fri, Oct 9, 2015 at 3:31 PM, Hari Shreedharan <
> hshreedha...@cloudera.com> wrote:
>
>> +1, much better than having a new PR each time to fix something for
>> scala-2.11 every time a patch breaks it.
>>
>> Thanks,
>> Hari Shreedharan
>>
>>
>>
>>
>> On Oct 9, 2015, at 11:47 AM, Michael Armbrust 
>> wrote:
>>
>> How about just fixing the warning? I get it; it doesn't stop this from
>>> happening again, but still seems less drastic than tossing out the
>>> whole mechanism.
>>>
>>
>> +1
>>
>> It also does not seem that expensive to test only compilation for Scala
>> 2.11 on PR builds.
>>
>>
>>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-09 Thread Iulian Dragoș
Sorry for not being clear, yes, that's about the Sbt build and treating
warnings as errors.

Warnings in 2.11 are useful, though; it'd be a pity to keep introducing
potential issues. As a stop-gap measure I can disable them in the Sbt
build. Is it hard to run the CI test with 2.11/sbt?
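
A sketch of what that stop-gap could look like in an sbt build definition
(standard sbt keys and the standard scalac flag, not the actual change to
SparkBuild.scala):

// Keep warnings fatal on 2.10, drop the flag on 2.11 until the new warnings are fixed.
scalacOptions ++= {
  if (scalaBinaryVersion.value == "2.11") Seq.empty[String]
  else Seq("-Xfatal-warnings")
}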

iulian


On Thu, Oct 8, 2015 at 7:24 PM, Reynold Xin  wrote:

> The problem only applies to the sbt build because it treats warnings as
> errors.
>
> @Iulian - how about we disable warnings -> errors for 2.11? That would
> seem better until we switch 2.11 to be the default build.
>
>
> On Thu, Oct 8, 2015 at 7:55 AM, Ted Yu  wrote:
>
>> I tried building with Scala 2.11 on Linux with latest master branch :
>>
>> [INFO] Spark Project External MQTT  SUCCESS [
>> 19.188 s]
>> [INFO] Spark Project External MQTT Assembly ... SUCCESS [
>>  7.081 s]
>> [INFO] Spark Project External ZeroMQ .. SUCCESS [
>>  8.790 s]
>> [INFO] Spark Project External Kafka ... SUCCESS [
>> 14.764 s]
>> [INFO] Spark Project Examples . SUCCESS
>> [02:22 min]
>> [INFO] Spark Project External Kafka Assembly .. SUCCESS [
>> 10.286 s]
>> [INFO]
>> 
>> [INFO] BUILD SUCCESS
>> [INFO]
>> 
>> [INFO] Total time: 17:49 min
>>
>> FYI
>>
>> On Thu, Oct 8, 2015 at 6:50 AM, Ted Yu  wrote:
>>
>>> Interesting
>>>
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/
>>> shows green builds.
>>>
>>>
>>> On Thu, Oct 8, 2015 at 6:40 AM, Iulian Dragoș <
>>> iulian.dra...@typesafe.com> wrote:
>>>
>>>> Since Oct. 4 the build fails on 2.11 with the dreaded
>>>>
>>>> [error] /home/ubuntu/workspace/Apache Spark (master) on 
>>>> 2.11/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:310: 
>>>> no valid targets for annotation on value conf - it is discarded unused. 
>>>> You may specify targets with meta-annotations, e.g. @(transient @param)
>>>> [error] private[netty] class NettyRpcEndpointRef(@transient conf: 
>>>> SparkConf)
>>>>
>>>> Can we have the pull request builder at least build with 2.11? This
>>>> makes #8433 <https://github.com/apache/spark/pull/8433> pretty much
>>>> useless, since people will continue to add useless @transient annotations.
>>>> ​
>>>> --
>>>>
>>>> --
>>>> Iulian Dragos
>>>>
>>>> --
>>>> Reactive Apps on the JVM
>>>> www.typesafe.com
>>>>
>>>>
>>>
>>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-08 Thread Iulian Dragoș
Since Oct. 4 the build fails on 2.11 with the dreaded

[error] /home/ubuntu/workspace/Apache Spark (master) on
2.11/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:310:
no valid targets for annotation on value conf - it is discarded
unused. You may specify targets with meta-annotations, e.g.
@(transient @param)
[error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)

Can we have the pull request builder at least build with 2.11? This makes
#8433  pretty much useless,
since people will continue to add useless @transient annotations.
​
-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-14 Thread Iulian Dragoș
On Fri, Aug 14, 2015 at 4:21 AM, Josh Rosen  wrote:

> Prototype is at https://github.com/databricks/spark-pr-dashboard/pull/59
>
> On Wed, Aug 12, 2015 at 7:51 PM, Josh Rosen  wrote:
>
>> *TL;DR*: would anyone object if I wrote a script to auto-delete pull
>> request comments from AmplabJenkins?
>>
>> Currently there are two bots which post Jenkins test result comments to
>> GitHub, AmplabJenkins and SparkQA.
>>
>> SparkQA is the account which post the detailed Jenkins start and finish
>> messages that contain information on which commit is being tested and which
>> tests have failed. This bot is controlled via the dev/run-tests-jenkins
>> script.
>>
>> AmplabJenkins is controlled by the Jenkins GitHub Pull Request Builder
>> plugin. This bot posts relatively uninformative comments ("Merge build
>> triggered", "Merge build started", "Merge build failed") that do not
>> contain any links or details specific to the tests being run.
>>
>
Some of these can be configured. For instance, make sure to disable "Use
comments to report intermediate phases: triggered et al", and if you add a
publicly accessible URL in "Published Jenkins URL", you will get a link to
the test result in the test result comment. I know these are global
settings, but the Jenkins URL is unique anyway, and intermediate phases are
probably equally annoying to everyone.

You can see the only comment posted for a successful PR build here:
https://github.com/scala-ide/scala-ide/pull/991#issuecomment-128016214

I'd avoid more custom code if possible.

my 2c,
iulian



>
>> It is technically non-trivial prevent these AmplabJenkins comments from
>> being posted in the first place (see
>> https://issues.apache.org/jira/browse/SPARK-4216).
>>
>> However, as a short-term hack I'd like to deploy a script which
>> automatically deletes these comments as soon as they're posted, with an
>> exemption carved out for the "Can an admin approve this patch for testing?"
>> messages. This will help to significantly de-clutter pull request
>> discussions in the GitHub UI.
>>
>> If nobody objects, I'd like to deploy this script sometime in the next
>> few days.
>>
>> (From a technical perspective, my script uses the GitHub REST API and
>> AmplabJenkins' own OAuth token to delete the comments.  The final
>> deployment environment will most likely be the backend of
>> http://spark-prs.appspot.com).
>>
>> - Josh
>>
>
>
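
For the curious, the deletion itself is a single REST call against GitHub's
"delete an issue comment" endpoint. A rough JVM-side sketch, using only JDK
classes and a hypothetical comment id and token (this is not Josh's actual
script):

import java.net.{HttpURLConnection, URL}

object DeleteComment {
  def main(args: Array[String]): Unit = {
    // Hypothetical values; a real script would look these up itself.
    val commentId = 123456789L
    val token = sys.env.getOrElse("GITHUB_OAUTH_TOKEN", "")

    // DELETE /repos/:owner/:repo/issues/comments/:id
    val url = new URL(
      s"https://api.github.com/repos/apache/spark/issues/comments/$commentId")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("DELETE")
    conn.setRequestProperty("Authorization", s"token $token")

    // 204 No Content means the comment was deleted.
    println(s"Response code: ${conn.getResponseCode}")
    conn.disconnect()
  }
}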


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-25 Thread Iulian Dragoș
On Fri, Jul 24, 2015 at 8:19 PM, Reynold Xin  wrote:

Jenkins only runs Scala 2.10. I'm actually not sure what the behavior is
> with 2.11 for that patch.
>
> iulian - can you take a look into it and see if it is working as expected?
>
It is, in the sense that warnings fail the build. Unfortunately there are
warnings in 2.11 that were not there in 2.10, and that fail the build. For
instance:

[error] 
/Users/dragos/workspace/git/spark/core/src/main/scala/org/apache/spark/rdd/BinaryFileRDD.scala:31:
no valid targets for annotation on value conf - it is discarded
unused. You may specify targets with meta-annotations, e.g.
@(transient @param)
[error] @transient conf: Configuration,
[error]

Currently the 2.11 build is broken. I don’t think fixing these is too hard,
but it requires these parameters to become vals. I haven’t looked at all
warnings, but I think this is the most common one (if not the only one).
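
For reference, the val-based fix looks roughly like the sketch below (made-up
names, not the actual Spark change):

// Stand-in for a non-serializable dependency such as Hadoop's Configuration.
class ExampleSettings {
  def get(key: String): String = sys.env.getOrElse(key, "")
}

// Before (warns on 2.11): class ExampleRDD(@transient conf: ExampleSettings)
// After: promoting the parameter to a private val gives @transient a real
// field to attach to, so the warning goes away; the field is simply skipped
// during serialization.
class ExampleRDD(@transient private val conf: ExampleSettings)
  extends Serializable {
  // conf is referenced after construction, so it has to be kept as a field.
  def lookup(key: String): String = conf.get(key)
}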

iulian


>
> On Fri, Jul 24, 2015 at 10:24 AM, Iulian Dragoș <
> iulian.dra...@typesafe.com> wrote:
>
>> On Thu, Jul 23, 2015 at 6:08 AM, Reynold Xin  wrote:
>>
>> Hi all,
>>>
>>> FYI, we just merged a patch that fails a build if there is a scala
>>> compiler warning (if it is not deprecation warning).
>>>
>> I’m a bit confused, since I see quite a lot of warnings in
>> semi-legitimate code.
>>
>> For instance, @transient (plenty of instances like this in
>> spark-streaming) might generate warnings like:
>>
>> abstract class ReceiverInputDStream[T: ClassTag](@transient ssc_ : 
>> StreamingContext)
>>   extends InputDStream[T](ssc_) {
>>
>> // and the warning is:
>> no valid targets for annotation on value ssc_ - it is discarded unused. You 
>> may specify targets with meta-annotations, e.g. @(transient @param)
>>
>> At least that’s what happens if I build with Scala 2.11, not sure if this
>> setting is only for 2.10, or something really weird is happening on my
>> machine that doesn’t happen on others.
>>
>> iulian
>>
>>
>>> In the past, many compiler warnings are actually caused by legitimate
>>> bugs that we need to address. However, if we don't fail the build with
>>> warnings, people don't pay attention at all to the warnings (it is also
>>> tough to pay attention since there are a lot of deprecated warnings due to
>>> unit tests testing deprecated APIs and reliance on Hadoop on deprecated
>>> APIs).
>>>
>>> Note that ideally we should be able to mark deprecation warnings as
>>> errors as well. However, due to the lack of ability to suppress individual
>>> warning messages in the Scala compiler, we cannot do that (since we do need
>>> to access deprecated APIs in Hadoop).
>>>
>>>
>>>  ​
>> --
>>
>> --
>> Iulian Dragos
>>
>> --
>> Reactive Apps on the JVM
>> www.typesafe.com
>>
>>
>  ​
-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-24 Thread Iulian Dragoș
On Thu, Jul 23, 2015 at 6:08 AM, Reynold Xin  wrote:

Hi all,
>
> FYI, we just merged a patch that fails a build if there is a scala
> compiler warning (if it is not deprecation warning).
>
I’m a bit confused, since I see quite a lot of warnings in semi-legitimate
code.

For instance, @transient (plenty of instances like this in spark-streaming)
might generate warnings like:

abstract class ReceiverInputDStream[T: ClassTag](@transient ssc_ : StreamingContext)
  extends InputDStream[T](ssc_) {

// and the warning is:
no valid targets for annotation on value ssc_ - it is discarded
unused. You may specify targets with meta-annotations, e.g.
@(transient @param)

At least that’s what happens if I build with Scala 2.11, not sure if this
setting is only for 2.10, or something really weird is happening on my
machine that doesn’t happen on others.

iulian


> In the past, many compiler warnings are actually caused by legitimate bugs
> that we need to address. However, if we don't fail the build with warnings,
> people don't pay attention at all to the warnings (it is also tough to pay
> attention since there are a lot of deprecated warnings due to unit tests
> testing deprecated APIs and reliance on Hadoop on deprecated APIs).
>
> Note that ideally we should be able to mark deprecation warnings as errors
> as well. However, due to the lack of ability to suppress individual warning
> messages in the Scala compiler, we cannot do that (since we do need to
> access deprecated APIs in Hadoop).
>
>
>  ​
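
For readers who want the build-level mechanics: in a plain sbt project the
switch is just a scalac flag, and it is all-or-nothing, which is exactly the
limitation described above. A rough sketch (illustrative only, not Spark's
actual build definition):

// build.sbt sketch: treat compiler warnings as errors, but back off for
// Scala 2.11 while it still produces warnings that 2.10 does not.
scalacOptions ++= {
  val base = Seq("-unchecked", "-feature")
  if (scalaBinaryVersion.value == "2.11") base
  else base :+ "-Xfatal-warnings"
}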
-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Spark 1.5.0-SNAPSHOT broken with Scala 2.11

2015-06-29 Thread Iulian Dragoș
On Mon, Jun 29, 2015 at 3:02 AM, Alessandro Baretta 
wrote:

> I am building the current master branch with Scala 2.11 following these
> instructions:
>
> Building for Scala 2.11
>
> To produce a Spark package compiled with Scala 2.11, use the -Dscala-2.11
>  property:
>
> dev/change-version-to-2.11.sh
> mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
>
>
> Here's what I'm seeing:
>
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.security.Groups).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> Using Spark's repl log4j profile:
> org/apache/spark/log4j-defaults-repl.properties
> To adjust logging level use sc.setLogLevel("INFO")
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 1.5.0-SNAPSHOT
>   /_/
>
> Using *Scala version 2.10.4* (OpenJDK 64-Bit Server VM, Java 1.7.0_79)
> Type in expressions to have them evaluated.
>

Something is deeply wrong with your build.

iulian



> Type :help for more information.
> 15/06/29 00:42:20 ERROR ActorSystemImpl: Uncaught fatal error from thread
> [sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down
> ActorSystem [sparkDriver]
> java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage
> overrides final method
> getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at
> akka.remote.transport.AkkaPduProtobufCodec$.constructControlMessagePdu(AkkaPduCodec.scala:231)
> at
> akka.remote.transport.AkkaPduProtobufCodec$.(AkkaPduCodec.scala:153)
> at akka.remote.transport.AkkaPduProtobufCodec$.(AkkaPduCodec.scala)
> at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:733)
> at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:703)
>
> What am I doing wrong?
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Various forks

2015-06-25 Thread Iulian Dragoș
Could someone point me to the source of the Spark fork used to build
genjavadoc-plugin? Even more important would be to know the reasoning
behind this fork.

Ironically, this hinders my attempts at removing another fork, the Spark
REPL fork (and the upgrade to Scala 2.11.7). See here
. Since genjavadoc is a compiler
plugin, it is cross-compiled with the full Scala version, meaning someone
needs to publish a new version for 2.11.7.
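
The "full Scala version" constraint shows up directly in how a compiler plugin
is declared in sbt. A rough sketch with placeholder coordinates (not the
actual org.spark-project artifact):

// build.sbt sketch: compiler plugins are resolved per exact compiler version
// (2.11.6, 2.11.7, ...) rather than per binary version (2.11), so a new
// artifact has to be published for every Scala patch release.
addCompilerPlugin(
  "com.example" % "genjavadoc-plugin" % "0.9" cross CrossVersion.full
)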

Ideally, we'd have a list of all forks maintained by the Spark project. I
know about:

- org.spark-project/akka
- org.spark-project/hive
- org.spark-project/genjavadoc-plugin

Are there more? Where are they hosted, and what's the release process
around them?

thanks,
iulian
​
-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-26 Thread Iulian Dragoș
I tried 1.4.0-rc2 binaries on a 3-node Mesos cluster, everything seemed to
work fine, both spark-shell and spark-submit. Cluster mode deployment also
worked.

+1 (non-binding)

iulian

On Tue, May 26, 2015 at 4:44 AM, jameszhouyi  wrote:

> Compiled:
> git clone https://github.com/apache/spark.git
> git checkout tags/v1.4.0-rc2
> ./make-distribution.sh --tgz --skip-java-test -Pyarn -Phadoop-2.4
> -Dhadoop.version=2.5.0 -Phive -Phive-0.13.1 -Phive-thriftserver -DskipTests
>
> Block issue in RC1/RC2:
> https://issues.apache.org/jira/browse/SPARK-7119
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-4-0-RC2-tp12420p12444.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Why use "lib_managed" for the Sbt build?

2015-05-21 Thread Iulian Dragoș
I’m trying to understand why Sbt is configured to pull all libs under
lib_managed.

   - it seems like unnecessary duplication (I will have those libraries
   under ~/.m2, via maven anyway)
   - every time I call make-distribution I lose lib_managed (via mvn clean
   install) and have to wait to download all jars again next time I use sbt
   - Eclipse does not handle relative paths very well (source attachments
   from lib_managed don’t always work)

So, what is the advantage of putting all dependencies in there, instead of
using the default `~/.ivy2`?
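
For context, the lib_managed behaviour comes from sbt's retrieveManaged
setting, which copies every resolved jar out of the Ivy cache after each
update. A minimal sketch of the setting (I haven't checked exactly how Spark's
build enables it):

// build.sbt sketch: with retrieveManaged enabled, sbt copies all managed
// dependencies from ~/.ivy2 into <project>/lib_managed on every update.
// Without it (the sbt default) the classpath points straight into the cache.
retrieveManaged := true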

cheers,
iulian
​
-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Problem building master on 2.11

2015-05-19 Thread Iulian Dragoș
There's an open PR to fix it. If you could try it and report back on the PR
it'd be great. More likely to get in fast.

https://github.com/apache/spark/pull/6260

On Mon, May 18, 2015 at 6:43 PM, Fernando O.  wrote:

> I just noticed I sent this to users instead of dev:
>
> -- Forwarded message --
> From: Fernando O. 
> Date: Sat, May 16, 2015 at 4:09 PM
> Subject: Problem building master on 2.11
> To: "u...@spark.apache.org" 
>
>
> Is anyone else having issues when building spark from git?
> I created a jira ticket with a Docker file that reproduces the issue.
>
> The error:
> /spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56:
> error: not found: type Type
>   protected Type type() { return Type.UPLOAD_BLOCK; }
>
>
> https://issues.apache.org/jira/browse/SPARK-7670
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Intellij Spark Source Compilation

2015-05-11 Thread Iulian Dragoș
Oh, I see. So then try to run one build on the command line first (or try sbt
avro:generate, though I'm not sure it's enough). I just noticed that I have
an additional source folder target/scala-2.10/src_managed/main/compiled_avro
for spark-streaming-flume-sink. I guess I built the project once and that's
why I never saw these errors.

Once we get it to work, let me know so I can update the wiki guide.

thanks,
iulian
​

On Mon, May 11, 2015 at 4:20 PM, rtimp  wrote:

> Hi,
>
> Thanks Iulian. Yeah, I was kind of anticipating I could just ignore
> old-deps
> ultimately. However, even after doing a clean and build all, I still get
> the following:
>
> Description LocationResourcePathType
> not found: type EventBatch  line 72 SparkAvroCallbackHandler.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: type EventBatch  line 87 SparkAvroCallbackHandler.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: type EventBatch  line 25 SparkSinkUtils.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: type EventBatch  line 48 TransactionProcessor.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: type EventBatch  line 48 TransactionProcessor.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: type EventBatch  line 80 TransactionProcessor.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: type EventBatch  line 146TransactionProcessor.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: type SparkFlumeProtocol  line 46
> SparkAvroCallbackHandler.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: type SparkFlumeProtocol  line 86 SparkSink.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: type SparkSinkEvent  line 115TransactionProcessor.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: value SparkFlumeProtocol line 185
> SparkSinkSuite.scala
>
> /spark-streaming-flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> not found: value SparkFlumeProtocol line 194
> SparkSinkSuite.scala
>
> /spark-streaming-flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> Project not built due to errors in dependent project(s)
> spark-streaming-flume-sink  Unknown spark-streaming-flume
>  Scala Problem
> value ack is not a member of Anyline 52 SparkSinkSuite.scala
>
> /spark-streaming-flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> value getEventBatch is not a member of Any  line 51
> SparkSinkSuite.scala
>
> /spark-streaming-flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> value getEventBatch is not a member of Any  line 71
> SparkSinkSuite.scala
>
> /spark-streaming-flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> value getEventBatch is not a member of Any  line 91
> SparkSinkSuite.scala
>
> /spark-streaming-flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> value nack is not a member of Any   line 73 SparkSinkSuite.scala
>
> /spark-streaming-flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
>
>
> I'd mention that the EventBatch and SparkFlumeProtocol errors were very
> similar to what initially occurred for me in IntelliJ, until I clicked the
> "Generate Sources and Update Folders For All Projects" button in the "Maven
> Projects" tool window that the wiki suggests doing.
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Intellij-Spark-Source-Compilation-tp12168p12195.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Intellij Spark Source Compilation

2015-05-11 Thread Iulian Dragoș
Hi,

`old-deps` is not really a project, so you can simply skip it (or close
it). The rest should work fine (clean and build all).

On Sat, May 9, 2015 at 10:27 PM, rtimp  wrote:

> Hi Iulian,
>
> Thanks for the reply!
>
> With respect to eclipse, I'm doing this all with a fresh download of the
> scala ide (Build id: 4.0.0-vfinal-20150305-1644-Typesafe) and with a recent
> pull (as of this morning) of the master branch.When I proceed through the
> instructions for eclipse (creating the project files in sbt, adding scala
> compiler 2.10.4 and setting it for all spark projects) I get the following
> errors:
>
> Description LocationResourcePathType
> not found: type EventBatch  line 72 SparkAvroCallbackHandler.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
>
> Description LocationResourcePathType
> not found: type SparkFlumeProtocol  line 46
> SparkAvroCallbackHandler.scala
>
> /spark-streaming-flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
>
> Description LocationResourcePathType
> Project 'old-deps' is missing required library:
> '/home/loki11/code/spark/spark/lib_managed/jars/spark-bagel_2.10-1.2.0.jar'
> Build path  old-depsBuild Path Problem
>
> Description LocationResourcePathType
> Project 'old-deps' is missing required library:
> '/home/loki11/code/spark/spark/lib_managed/jars/spark-bagel_2.10-1.2.0.jar'
> Build path  old-depsBuild Path Problem
> Project 'old-deps' is missing required library:
> '/home/loki11/code/spark/spark/lib_managed/jars/spark-core_2.10-1.2.0.jar'
> Build path  old-depsBuild Path Problem
>
>
> Description LocationResourcePathType
> value ack is not a member of Anyline 52 SparkSinkSuite.scala
>
> /spark-streaming-flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> value getEventBatch is not a member of Any  line 91
> SparkSinkSuite.scala
>
> /spark-streaming-flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
> value nack is not a member of Any   line 73 SparkSinkSuite.scala
>
> /spark-streaming-flume-sink/src/test/scala/org/apache/spark/streaming/flume/sink
> Scala Problem
>
>
> I tried to include just a representative sample of each type of error in
> the
> above (I had 30 in total).
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Intellij-Spark-Source-Compilation-tp12168p12182.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Intellij Spark Source Compilation

2015-05-09 Thread Iulian Dragoș
On Sat, May 9, 2015 at 12:29 AM, rtimp  wrote:

> Hello,
>
> I'm trying to compile the master branch of the spark source (25889d8) in
> intellij. I followed the instructions in the wiki
> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools,
> namely I downloaded IntelliJ 14.1.2 with jre 1.7.0_55, imported pom.xml,
> generated all sources in the maven toolbar, and compiled. I receive 3
> errors:
>
> Error:(133, 10) java:
> org.apache.spark.network.sasl.SaslEncryption.EncryptedMessage is not
> abstract and does not override abstract method touch(java.lang.Object) in
> io.netty.util.ReferenceCounted
>
> /home/loki11/code/spark/spark/network/common/src/main/java/org/apache/spark/network/buffer/LazyFileRegion.java
> Error:(39, 14) java: org.apache.spark.network.buffer.LazyFileRegion is not
> abstract and does not override abstract method touch(java.lang.Object) in
> io.netty.util.ReferenceCounted
>
> /home/loki11/code/spark/spark/network/common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java
> Error:(34, 1) java: org.apache.spark.network.protocol.MessageWithHeader is
> not abstract and does not override abstract method touch(java.lang.Object)
> in io.netty.util.ReferenceCounted
>
> On the command line, build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0
> -DskipTests clean package succeeds, as does build/sbt clean assembly, as
> does build/sbt compile.
>
> It seems to me like I'm missing some trivial intellij option (I'm normally
> an eclipse user, but was having even more trouble with that). Any advice?
>

I just updated the instructions for Eclipse users on the wiki page. I'd be
happy to update them if you let me know what went wrong.

iulian


>
> Thanks!
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Intellij-Spark-Source-Compilation-tp12168.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Speeding up Spark build during development

2015-05-05 Thread Iulian Dragoș
I'm probably the only Eclipse user here, but it seems I have the best
workflow :) At least for me things work as they should: once I imported
projects in the workspace I can build and run/debug tests from the IDE. I
only go to sbt when I need to re-create projects or I want to run the full
test suite.


iulian



On Tue, May 5, 2015 at 7:35 AM, Tathagata Das  wrote:

> In addition to Michael's suggestion, in my SBT workflow I also use "~" to
> automatically kick off builds and unit tests. For example,
>
> sbt/sbt "~streaming/test-only *BasicOperationsSuite*"
>
> It will automatically detect any file changes in the project and start
> the compilation and testing.
> So my full workflow involves changing code in IntelliJ and then
> continuously running unit tests in the background on the command line using
> this "~".
>
> TD
>
>
> On Mon, May 4, 2015 at 2:49 PM, Michael Armbrust 
> wrote:
>
> > FWIW... My Spark SQL development workflow is usually to run "build/sbt
> > sparkShell" or "build/sbt 'sql/test-only '".  These
> commands
> > start in as little as 30s on my laptop, automatically figure out which
> > subprojects need to be rebuilt, and don't require the expensive assembly
> > creation.
> >
> > On Mon, May 4, 2015 at 5:48 AM, Meethu Mathew 
> > wrote:
> >
> > > Hi,
> > >
> > >  Is it really necessary to run mvn --projects assembly/ -DskipTests
> > > install? Could you please explain why this is needed?
> > > I got the changes after running "mvn --projects streaming/ -DskipTests
> > > package".
> > >
> > > Regards,
> > > Meethu
> > >
> > >
> > > On Monday 04 May 2015 02:20 PM, Emre Sevinc wrote:
> > >
> > >> Just to give you an example:
> > >>
> > >> When I was trying to make a small change only to the Streaming
> component
> > >> of
> > >> Spark, first I built and installed the whole Spark project (this took
> > >> about
> > >> 15 minutes on my 4-core, 4 GB RAM laptop). Then, after having changed
> > >> files
> > >> only in Streaming, I ran something like (in the top-level directory):
> > >>
> > >> mvn --projects streaming/ -DskipTests package
> > >>
> > >> and then
> > >>
> > >> mvn --projects assembly/ -DskipTests install
> > >>
> > >>
> > >> This was much faster than trying to build the whole Spark from
> scratch,
> > >> because Maven was only building one component, in my case the
> Streaming
> > >> component, of Spark. I think you can use a very similar approach.
> > >>
> > >> --
> > >> Emre Sevinç
> > >>
> > >>
> > >>
> > >> On Mon, May 4, 2015 at 10:44 AM, Pramod Biligiri <
> > >> pramodbilig...@gmail.com>
> > >> wrote:
> > >>
> > >>  No, I just need to build one project at a time. Right now SparkSql.
> > >>>
> > >>> Pramod
> > >>>
> > >>> On Mon, May 4, 2015 at 12:09 AM, Emre Sevinc 
> > >>> wrote:
> > >>>
> > >>>  Hello Pramod,
> > 
> >  Do you need to build the whole project every time? Generally you
> > don't,
> >  e.g., when I was changing some files that belong only to Spark
> >  Streaming, I
> >  was building only the streaming (of course after having built and
> >  installed
> >  the whole project, but that was done only once), and then the
> > assembly.
> >  This was much faster than trying to build the whole Spark every
> time.
> > 
> >  --
> >  Emre Sevinç
> > 
> >  On Mon, May 4, 2015 at 9:01 AM, Pramod Biligiri <
> >  pramodbilig...@gmail.com
> > 
> > > wrote:
> > > Using the inbuilt maven and zinc it takes around 10 minutes for
> each
> > > build.
> > > Is that reasonable?
> > > My maven opts looks like this:
> > > $ echo $MAVEN_OPTS
> > > -Xmx12000m -XX:MaxPermSize=2048m
> > >
> > > I'm running it as build/mvn -DskipTests package
> > >
> > > Should I be tweaking my Zinc/Nailgun config?
> > >
> > > Pramod
> > >
> > > On Sun, May 3, 2015 at 3:40 PM, Mark Hamstra <
> > m...@clearstorydata.com>
> > > wrote:
> > >
> > >
> > >>
> > >
> >
> https://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn
> > >
> > >> On Sun, May 3, 2015 at 2:54 PM, Pramod Biligiri <
> > >>
> > > pramodbilig...@gmail.com>
> > >
> > >> wrote:
> > >>
> > >>  This is great. I didn't know about the mvn script in the build
> > >>>
> > >> directory.
> > >
> > >> Pramod
> > >>>
> > >>> On Fri, May 1, 2015 at 9:51 AM, York, Brennon <
> > >>> brennon.y...@capitalone.com>
> > >>> wrote:
> > >>>
> > >>>  Following what Ted said, if you leverage the `mvn` from within
> the
> > >>>  `build/` directory of Spark you'll get zinc for free which
> should
> > 
> > >>> help
> > >
> > >> speed up build times.
> > 
> >  On 5/1/15, 9:45 AM, "Ted Yu"  wrote:
> > 
> >   Pramod:
> > > Please remember to run Zinc so that the build is faster.
> > >
> > > Cheers
> > >
> > 

Re: Update Wiki Developer instructions

2015-05-04 Thread Iulian Dragoș
Ok, here’s how it should be:

   - Eclipse Luna
   - Scala IDE 4.0
   - Scala Test

The easiest way is to download the Scala IDE bundle from the Scala IDE
download page <http://scala-ide.org/download/sdk.html>. It comes
pre-installed with ScalaTest. Alternatively, use the provided update site
<http://scala-ide.org/download/current.html> or Eclipse Marketplace.

Remove: “Importing all Spark sub projects at once is not recommended.” ←
that works just fine.

Add:

If you want to develop on Scala 2.10, you need to configure a Scala
installation for the exact Scala version that’s used to compile Spark. At
the time of this writing that is Scala 2.10.4. You can do that in Eclipse
Preferences -> Scala -> Installations by pointing to the lib/ directory of
your Scala distribution. Once this is done, select all Spark projects and
right-click, choose Scala -> Set Scala Installation and point to the 2.10.4
installation. This should clear all errors about invalid cross-compiled
libraries. A clean build should succeed.

On Mon, May 4, 2015 at 3:40 PM, Sean Owen  wrote:

I think it's only committers that can edit it. I suppose you can open
> a JIRA with a suggested text change if it is significant enough to
> need discussion. If it's trivial, just post it here and someone can
> take care of it.
>
> On Mon, May 4, 2015 at 2:32 PM, Iulian Dragoș
>  wrote:
> > I'd like to update the information about using Eclipse to develop on the
> > Spark project found on this page:
> >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38572224
> >
> > I don't see any way to edit this page (I created an account). Since it's
> a
> > wiki, I assumed it's supposed to be editable, but unfortunately I can't
> > find a way. What's the proper way to update it?
> >
> > thanks,
> > iulian
> >
> > --
> >
> > --
> > Iulian Dragos
> >
> > --
> > Reactive Apps on the JVM
> > www.typesafe.com
>
​
-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Update Wiki Developer instructions

2015-05-04 Thread Iulian Dragoș
I'd like to update the information about using Eclipse to develop on the
Spark project found on this page:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38572224

I don't see any way to edit this page (I created an account). Since it's a
wiki, I assumed it's supposed to be editable, but unfortunately I can't
find a way. What's the proper way to update it?

thanks,
iulian

-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Unit tests

2015-02-10 Thread Iulian Dragoș
Thanks, Josh, I missed that PR.

On Mon, Feb 9, 2015 at 7:45 PM, Josh Rosen  wrote:

> Hi Iulian,
>
> I think the AkakUtilsSuite failure that you observed has been fixed in
> https://issues.apache.org/jira/browse/SPARK-5548 /
> https://github.com/apache/spark/pull/4343
>
> On February 9, 2015 at 5:47:59 AM, Iulian Dragoș (
> iulian.dra...@typesafe.com) wrote:
>
> Hi Patrick,
>
> Thanks for the heads up. I was trying to set up our own infrastructure for
> testing Spark (essentially, running `run-tests` every night) on EC2. I
> stumbled upon a number of flaky tests, but none of them look similar to
> anything in Jira with the flaky-test tag. I wonder if there's something
> wrong with our infrastructure, or I should simply open Jira tickets with
> the failures I find. For example, one that appears fairly often on our
> setup is in AkkaUtilsSuite "remote fetch ssl on - untrusted server"
> (exception `ActorNotFound`, instead of `TimeoutException`).
>
> thanks,
> iulian
>
>
> On Fri, Feb 6, 2015 at 9:55 PM, Patrick Wendell 
> wrote:
>
> > Hey All,
> >
> > The tests are in a not-amazing state right now due to a few compounding
> > factors:
> >
> > 1. We've merged a large volume of patches recently.
> > 2. The load on jenkins has been relatively high, exposing races and
> > other behavior not seen at lower load.
> >
> > For those not familiar, the main issue is flaky (non deterministic)
> > test failures. Right now I'm trying to prioritize keeping the
> > PullRequestBuilder in good shape since it will block development if it
> > is down.
> >
> > For other tests, let's try to keep filing JIRA's when we see issues
> > and use the flaky-test label (see http://bit.ly/1yRif9S):
> >
> > I may contact people regarding specific tests. This is a very high
> > priority to get in good shape. This kind of thing is no one's "fault"
> > but just the result of a lot of concurrent development, and everyone
> > needs to pitch in to get back in a good place.
> >
> > - Patrick
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
> >
> >
>
>
> --
>
> --
> Iulian Dragos
>
> --
> Reactive Apps on the JVM
> www.typesafe.com
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Unit tests

2015-02-09 Thread Iulian Dragoș
Hi Patrick,

Thanks for the heads up. I was trying to set up our own infrastructure for
testing Spark (essentially, running `run-tests` every night) on EC2. I
stumbled upon a number of flaky tests, but none of them look similar to
anything in Jira with the flaky-test tag. I wonder if there's something
wrong with our infrastructure, or I should simply open Jira tickets with
the failures I find. For example, one that appears fairly often on our
setup is in AkkaUtilsSuite "remote fetch ssl on - untrusted server"
(exception `ActorNotFound`, instead of `TimeoutException`).

thanks,
iulian


On Fri, Feb 6, 2015 at 9:55 PM, Patrick Wendell  wrote:

> Hey All,
>
> The tests are in a not-amazing state right now due to a few compounding
> factors:
>
> 1. We've merged a large volume of patches recently.
> 2. The load on jenkins has been relatively high, exposing races and
> other behavior not seen at lower load.
>
> For those not familiar, the main issue is flaky (non deterministic)
> test failures. Right now I'm trying to prioritize keeping the
> PullRequestBuilder in good shape since it will block development if it
> is down.
>
> For other tests, let's try to keep filing JIRA's when we see issues
> and use the flaky-test label (see http://bit.ly/1yRif9S):
>
> I may contact people regarding specific tests. This is a very high
> priority to get in good shape. This kind of thing is no one's "fault"
> but just the result of a lot of concurrent development, and everyone
> needs to pitch in to get back in a good place.
>
> - Patrick
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


UnknownHostException while running YarnTestSuite

2015-01-27 Thread Iulian Dragoș
Hi,

I’m trying to run the Spark test suite on an EC2 instance, but I can’t get
Yarn tests to pass. The hostname I get on that machine is not resolvable,
but adding a line in /etc/hosts makes the other tests pass, except for Yarn
tests.

Any help is greatly appreciated!

thanks,
iulian

ubuntu@ip-172-30-0-248:~/spark$ cat /etc/hosts
127.0.0.1 localhost
172.30.0.248 ip-172-30-0-248

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

and the exception:

[info] - run Spark in yarn-client mode *** FAILED *** (2 seconds, 249
milliseconds)
[info]   java.net.UnknownHostException: Invalid host name: local host
is: (unknown); destination host is: "ip-172-30-0-248":57041;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost
[info]   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
[info]   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
[info]   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[info]   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
[info]   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
[info]   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
[info]   at org.apache.hadoop.ipc.Client$Connection.(Client.java:400)
[info]   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1448)
[info]   at org.apache.hadoop.ipc.Client.call(Client.java:1377)
[info]   at org.apache.hadoop.ipc.Client.call(Client.java:1359)
[info]   at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
[info]   at com.sun.proxy.$Proxy69.getClusterMetrics(Unknown Source)
[info]   at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:152)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[info]   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.lang.reflect.Method.invoke(Method.java:606)
[info]   at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
[info]   at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
[info]   at com.sun.proxy.$Proxy70.getClusterMetrics(Unknown Source)
[info]   at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:294)
[info]   at 
org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:91)
[info]   at 
org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:91)
[info]   at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
[info]   at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:49)
[info]   at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:90)
[info]   at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
[info]   at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
[info]   at org.apache.spark.SparkContext.(SparkContext.scala:343)
[info]   at 
org.apache.spark.deploy.yarn.YarnClusterDriver$.main(YarnClusterSuite.scala:175)
[info]   at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$1.apply$mcV$sp(YarnClusterSuite.scala:118)
[info]   at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$1.apply(YarnClusterSuite.scala:116)
[info]   at 
org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$1.apply(YarnClusterSuite.scala:116)
[info]   at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
[info]   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
[info]   at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(