Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Patrick Wendell
Yeah the best bet is to use ./build/mvn --force (otherwise we'll still
use your system maven).
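
For example, a full invocation would look like this (a sketch; --force picks the
bundled Maven over the system one, and -DskipTests / clean package are the usual
options from the building-spark docs):

    ./build/mvn --force -DskipTests clean package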

- Patrick

On Mon, Aug 3, 2015 at 1:26 PM, Sean Owen so...@cloudera.com wrote:
 That statement is true for Spark 1.4.x. But you've reminded me that I
 failed to update this doc for 1.5, to say Maven 3.3.3 is required.
 Patch coming up.

 On Mon, Aug 3, 2015 at 9:12 PM, Guru Medasani gdm...@gmail.com wrote:
 Thanks Sean. Reason I asked this is, in Building Spark documentation of
 1.4.1, I still see this.

 https://spark.apache.org/docs/latest/building-spark.html

 Building Spark using Maven requires Maven 3.0.4 or newer and Java 6+.

 But I noticed the following warnings from the build of Spark version
 1.5.0-snapshot. So I was wondering if the changes you mentioned relate to
 newer versions of Spark or for 1.4.1 version as well.

 [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

 [WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion
 failed with message:
 Detected JDK Version: 1.6.0-36 is not in the allowed range 1.7.

 Guru Medasani
 gdm...@gmail.com

 On Aug 3, 2015, at 2:38 PM, Sean Owen so...@cloudera.com wrote:

 Using ./build/mvn should always be fine. Your local mvn is fine too if
 it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users
 on OS X out there will have, by the way.

 On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:

 Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT
 this morning.

 WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

 Should we be using maven 3.3.3 locally or build/mvn starting from Spark
 1.4.1 or Spark version 1.5?

 Guru Medasani
 gdm...@gmail.com



 On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:

 If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
 via brew on OS X), then this won't affect you, but I wanted to call
 attention to https://github.com/apache/spark/pull/7852 which makes
 Maven 3.3.3 the minimum required to build Spark. This heads off
 problems from some behavior differences that Patrick and I observed
 between 3.3 and 3.2 last week, on top of the dependency reduced POM
 glitch from the 1.4.1 release window.

 Again all you need to do is use build/mvn if you don't already have
 the latest Maven installed and all will be well.

 Sean










Make ML Developer APIs public (post-1.4)

2015-08-03 Thread Eron Wright

Hello,

In developing new third-party pipeline components for Spark ML 1.4 (see
dl4j-spark-ml), I encountered a few gaps in the earlier effort to make the ML
Developer APIs public (SPARK-5995). I plan to file issues after we discuss this
on the thread. Below is a list of types that are presently private but might
best be made public; a short sketch of how a third-party component would use
them follows the list.
- VectorUDT. To define a relation with a vector field, VectorUDT must be
  instantiated.
- SchemaUtils. Third-party pipeline components have a need for checking column
  types and appending columns.
- Identifiable trait. The trait generates a unique identifier for the
  associated pipeline component. Nice to have a consistent format by reusing
  the trait.
- ProbabilisticClassifier. Third-party components should leverage the complex
  logic around computing only selected columns.
- Shared Params (HasLabel, HasFeatures). This is covered in SPARK-7146 but
  reiterating it here.
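
Here is a minimal sketch of what such a third-party Transformer wants to write.
It is illustrative only: the class and column names are made up, and it does not
compile against 1.4 precisely because the types above are private today; the
calls shown mirror how Spark's own ml components use them internally.

    import org.apache.spark.ml.Transformer
    import org.apache.spark.ml.util.{Identifiable, SchemaUtils}  // private in 1.4
    import org.apache.spark.mllib.linalg.VectorUDT               // private in 1.4
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.types.StructType

    class MyVectorCleaner(override val uid: String) extends Transformer {

      def this() = this(Identifiable.randomUID("myVectorCleaner"))

      override def transformSchema(schema: StructType): StructType = {
        // Check that the input column is a vector, then declare the output column.
        SchemaUtils.checkColumnType(schema, "features", new VectorUDT)
        SchemaUtils.appendColumn(schema, "cleanedFeatures", new VectorUDT)
      }

      override def transform(dataset: DataFrame): DataFrame = {
        transformSchema(dataset.schema)
        dataset  // the actual cleaning logic is elided here
      }

      // Other required Transformer members (e.g. copy) are elided for brevity.
    }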
Thanks,
Eron Wright



Consistent recommendation for submitting spark apps to YARN, '--master yarn --deploy-mode x' vs '--master yarn-x'

2015-08-03 Thread Guru Medasani
Hi,

I was looking at the spark-submit and spark-shell --help output in both versions
(Spark 1.3.1 and Spark 1.5-snapshot) and at the Spark documentation for
submitting Spark applications to YARN. There seems to be some mismatch between
the preferred syntax and the documentation.

The Spark documentation at
http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
says that we need to specify either yarn-cluster or yarn-client to connect to a
YARN cluster.


yarn-client    Connect to a YARN cluster in client mode. The cluster location
               will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR
               variable.
yarn-cluster   Connect to a YARN cluster in cluster mode. The cluster location
               will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR
               variable.
(See http://spark.apache.org/docs/latest/running-on-yarn.html for details.)
The spark-submit --help output, on the other hand, describes this as --master
yarn combined with --deploy-mode cluster or client:

Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client")
                              or on one of the worker machines inside the cluster
                              ("cluster") (Default: client).
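
For example, the following two invocations are meant to be equivalent (a sketch;
SparkPi ships with Spark, but the exact examples-jar path is illustrative and
depends on your distribution):

    # master-URL style, as documented today
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn-cluster lib/spark-examples.jar 10

    # --deploy-mode style, as shown by spark-submit --help
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn --deploy-mode cluster lib/spark-examples.jar 10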

I want to bring this to your attention because it is a bit confusing for someone
running Spark on YARN. For example, they look at the spark-submit help output and
start using that syntax, but when they look at the online documentation or the
user mailing list, they see a different spark-submit syntax.

From a quick discussion with other engineers at Cloudera, it seems --deploy-mode
is preferred as it is more consistent with the way things are done with other
cluster managers, i.e. there are no standalone-cluster or standalone-client
masters. The same applies to Mesos.

Either syntax works, but I would like to propose using '--master yarn
--deploy-mode x' instead of '--master yarn-cluster' or '--master yarn-client',
as it is consistent with the other cluster managers. This would require updating
all Spark pages related to submitting Spark applications to YARN.

So far I’ve identified the following pages.

1) http://spark.apache.org/docs/latest/running-on-yarn.html
2) http://spark.apache.org/docs/latest/submitting-applications.html#master-urls

There is a JIRA to track the progress on this as well.

https://issues.apache.org/jira/browse/SPARK-9570
 
The option we choose dictates whether we update the documentation or the
spark-submit and spark-shell help pages.

Any thoughts on which direction we should go?

Guru Medasani
gdm...@gmail.com





Unsubscribe

2015-08-03 Thread Trevor Grant
Please drop me from this list

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo

*Fortunate is he, who is able to know the causes of things.  -Virgil*


Re: [ANNOUNCE] Spark branch-1.5

2015-08-03 Thread Sean Owen
Are these about the right rules of engagement for now until the
release candidate?

- Don't merge new features or improvements into 1.5 unless they're
Important and Have Been Discussed
- Docs and tests are OK to merge into 1.5
- Bug fixes can be merged into 1.5, with increasing conservativeness
as the release candidate approaches

FWIW there are now 331 JIRAs targeted at 1.5.0.

Would it be reasonable to start un-targeting non-bug, non-blocker
issues? Like, would anyone yell if I started doing that? That would
leave ~100 JIRAs, which still seems like more than can actually go
into the release. And anyone can re-target as desired.

I'm interested in using this to communicate about release planning
so we can actually see how things are moving along and decide whether 1.5
has to be pushed back; otherwise it seems pretty unpredictable
what's coming, what's going in, and when the process stops and outputs a
release.


On Mon, Aug 3, 2015 at 7:11 PM, Reynold Xin r...@databricks.com wrote:
 Hi Devs,

 Just an announcement that I've cut Spark's branch-1.5 to form the basis of
 the 1.5 release. Other than a few stragglers, this represents the end of
 active feature development for Spark 1.5. If committers are merging any
 features (outside of alpha modules), please shoot me an email so I can help
 coordinate. Any new commits will need to be explicitly merged into
 branch-1.5.

 In the next few days, we should come up with testing plans for the release
 and create umbrella JIRAs for testing various components and changes. I plan
 to cut a preview package for 1.5 towards the end of this week or early next
 week.


 - R






Re: [ANNOUNCE] Spark branch-1.5

2015-08-03 Thread Joseph Bradley
I agree that it's high time to start changing/removing target versions,
especially if component maintainers have a good idea of what is not needed
for 1.5.  I'll start doing that on ML.

On Mon, Aug 3, 2015 at 12:05 PM, Sean Owen so...@cloudera.com wrote:

 Are these about the right rules of engagement for now until the
 release candidate?

 - Don't merge new features or improvements into 1.5 unless they're
 Important and Have Been Discussed
 - Docs and tests are OK to merge into 1.5
 - Bug fixes can be merged into 1.5, with increasing conservativeness
 as the release candidate approaches

 FWIW there are now 331 JIRAs targeted at 1.5.0.

 Would it be reasonable to start un-targeting non-bug non-blocker
 issues? like, would anyone yell if I started doing that? that would
 leave ~100 JIRAs, which still seems like more than can actually go
 into the release. And anyone can re-target as desired.

 I'm interested with using this to communicate about release planning
 so we can actually see how things are moving along and decide if 1.5
 has to be pushed back or not; otherwise it seems pretty unpredictable
 what's coming, going in, and when the process stops and outputs a
 release.


 On Mon, Aug 3, 2015 at 7:11 PM, Reynold Xin r...@databricks.com wrote:
  Hi Devs,
 
  Just an announcement that I've cut Spark's branch-1.5 to form the basis
 of
  the 1.5 release. Other than a few stragglers, this represents the end of
  active feature development for Spark 1.5. If committers are merging any
  features (outside of alpha modules), please shoot me an email so I can
 help
  coordinate. Any new commits will need to be explicitly merged into
  branch-1.5.
 
  In the next few days, we should come up with testing plans for the
 release
  and create umbrella JIRAs for testing various components and changes. I
 plan
  to cut a preview package for 1.5 towards the end of this week or early
 next
  week.
 
 
  - R
 
 





Re: Should spark-ec2 get its own repo?

2015-08-03 Thread Shivaram Venkataraman
I sent a note to the Mesos developers and created
https://github.com/apache/spark/pull/7899 to change the repository
pointer. There are 3-4 open PRs right now in the mesos/spark-ec2
repository and I'll work on migrating them to amplab/spark-ec2 later
today.

My thinking on moving the Python script is that we should have a
wrapper shell script that just fetches the latest version of
spark_ec2.py for the corresponding Spark branch. We already have
separate branches in our spark-ec2 repository for different Spark
versions, so it can just be a call to `wget
https://github.com/amplab/spark-ec2/tree/spark-version/driver/spark_ec2.py`.
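
A minimal sketch of such a wrapper (the branch name is illustrative, and it uses
the raw-content URL form so that wget fetches the file itself rather than the
GitHub HTML page):

    #!/usr/bin/env bash
    # Fetch the spark_ec2.py that matches this Spark branch.
    SPARK_EC2_BRANCH=branch-1.5
    wget -O spark_ec2.py \
      "https://raw.githubusercontent.com/amplab/spark-ec2/${SPARK_EC2_BRANCH}/driver/spark_ec2.py"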

Thanks
Shivaram

On Sun, Aug 2, 2015 at 11:34 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 On Sat, Aug 1, 2015 at 1:09 PM Matt Goodman meawo...@gmail.com wrote:

 I am considering porting some of this to a more general spark-cloud
 launcher, including google/aliyun/rackspace.  It shouldn't be hard at all
 given the current approach for setup/install.


 FWIW, there are already some tools for launching Spark clusters on GCE and
 Azure:

 http://spark-packages.org/?q=tags%3A%22Deployment%22

 Nick





Re: Package Release Annoucement: Spark SQL on HBase Astro

2015-08-03 Thread Ted Yu
When I tried to compile against hbase 1.1.1, I got:

[ERROR]
/home/hbase/ssoh/src/main/scala/org/apache/spark/sql/hbase/SparkSqlRegionObserver.scala:124:
overloaded method next needs result type
[ERROR]   override def next(result: java.util.List[Cell], limit: Int) =
next(result)

Is there a plan to support HBase 1.x?

Thanks

On Wed, Jul 22, 2015 at 4:53 PM, Bing Xiao (Bing) bing.x...@huawei.com
wrote:

 We are happy to announce the availability of the Spark SQL on HBase 1.0.0
 release.
 http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase

 The main features in this package, dubbed “Astro”, include:

 · Systematic and powerful handling of data pruning and
 intelligent scan, based on partial evaluation technique

 · HBase pushdown capabilities like custom filters and coprocessor
 to support ultra low latency processing

 · SQL, Data Frame support

 · More SQL capabilities made possible (Secondary index, bloom
 filter, Primary Key, Bulk load, Update)

 · Joins with data from other sources

 · Python/Java/Scala support

 · Support latest Spark 1.4.0 release



 The tests by Huawei team and community contributors covered the areas:
 bulk load; projection pruning; partition pruning; partial evaluation; code
 generation; coprocessor; customer filtering; DML; complex filtering on keys
 and non-keys; Join/union with non-Hbase data; Data Frame; multi-column
 family test. We will post the test results, including performance tests, by the
 middle of August.

 You are very welcome to try out or deploy the package, and to help improve
 the integration tests with various combinations of the settings, extensive
 Data Frame tests, complex join/union tests and extensive performance tests.
 Please use the “Issues” and “Pull Requests” links on the package homepage if
 you want to report bugs or request improvements or features.

 Special thanks to project owner and technical leader Yan Zhou, Huawei
 global team, community contributors and Databricks.   Databricks has been
 providing great assistance from the design to the release.

 “Astro”, the Spark SQL on HBase package, will be useful for ultra low
 latency query and analytics of large-scale data sets in vertical
 enterprises. We will continue to work with the community to develop
 new features and improve the code base. Your comments and suggestions are
 greatly appreciated.



 Yan Zhou / Bing Xiao

 Huawei Big Data team





Re: Unsubscribe

2015-08-03 Thread Nicholas Chammas
The way to do that is to follow the Unsubscribe link here for dev@spark:

http://spark.apache.org/community.html

We can't drop you. You have to do it yourself.

Nick

On Mon, Aug 3, 2015 at 1:54 PM Trevor Grant trevor.d.gr...@gmail.com
wrote:

 Please drop me from this list

 Trevor Grant
 Data Scientist
 https://github.com/rawkintrevo
 http://stackexchange.com/users/3002022/rawkintrevo

 *Fortunate is he, who is able to know the causes of things.  -Virgil*




Moving spark-ec2 to amplab github organization

2015-08-03 Thread Shivaram Venkataraman
Hi Mesos developers

The Apache Spark project has been using
https://github.com/mesos/spark-ec2 as a supporting repository for some
of our EC2 scripts. This is a remnant from the days when the Spark
project itself was hosted at github.com/mesos/spark. Based on
discussions in the Spark Developer mailing list [1], we plan to move
the repository to github.com/amplab/spark-ec2 to enable a better
development workflow. As these scripts are not used by the Apache
Mesos project I don’t think any action is required from the Mesos
developers, but please let me know if you have any thoughts about
this.

Thanks
Shivaram

[1] 
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Should-spark-ec2-get-its-own-repo-td13151.html




RE: Package Release Annoucement: Spark SQL on HBase Astro

2015-08-03 Thread Yan Zhou.sc
HBase 1.0 should work fine, even though we have not completed full tests yet.
Support for 1.1 should be possible to add with minimal effort.

Thanks,

Yan

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Monday, August 03, 2015 10:33 AM
To: Bing Xiao (Bing)
Cc: dev@spark.apache.org; u...@spark.apache.org; Yan Zhou.sc
Subject: Re: Package Release Annoucement: Spark SQL on HBase Astro

When I tried to compile against hbase 1.1.1, I got:

[ERROR] 
/home/hbase/ssoh/src/main/scala/org/apache/spark/sql/hbase/SparkSqlRegionObserver.scala:124:
 overloaded method next needs result type
[ERROR]   override def next(result: java.util.List[Cell], limit: Int) = 
next(result)

Is there plan to support hbase 1.x ?

Thanks

On Wed, Jul 22, 2015 at 4:53 PM, Bing Xiao (Bing) bing.x...@huawei.com wrote:
We are happy to announce the availability of the Spark SQL on HBase 1.0.0 
release.  http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase
The main features in this package, dubbed “Astro”, include:

• Systematic and powerful handling of data pruning and intelligent 
scan, based on partial evaluation technique

• HBase pushdown capabilities like custom filters and coprocessor to 
support ultra low latency processing

• SQL, Data Frame support

• More SQL capabilities made possible (Secondary index, bloom filter, 
Primary Key, Bulk load, Update)

• Joins with data from other sources

• Python/Java/Scala support

• Support latest Spark 1.4.0 release


The tests by Huawei team and community contributors covered the areas: bulk 
load; projection pruning; partition pruning; partial evaluation; code 
generation; coprocessor; customer filtering; DML; complex filtering on keys and 
non-keys; Join/union with non-HBase data; Data Frame; multi-column family test.
We will post the test results, including performance tests, by the middle of August.
You are very welcome to try out or deploy the package, and to help improve the
integration tests with various combinations of the settings, extensive Data
Frame tests, complex join/union tests and extensive performance tests. Please
use the “Issues” and “Pull Requests” links on the package homepage if you want
to report bugs or request improvements or features.
Special thanks to project owner and technical leader Yan Zhou, Huawei global 
team, community contributors and Databricks.   Databricks has been providing 
great assistance from the design to the release.
“Astro”, the Spark SQL on HBase package, will be useful for ultra low latency
query and analytics of large-scale data sets in vertical enterprises. We will
continue to work with the community to develop new features and improve the
code base. Your comments and suggestions are greatly appreciated.

Yan Zhou / Bing Xiao
Huawei Big Data team




PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Sean Owen
If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
via brew on OS X), then this won't affect you, but I wanted to call
attention to https://github.com/apache/spark/pull/7852 which makes
Maven 3.3.3 the minimum required to build Spark. This heads off
problems from some behavior differences that Patrick and I observed
between 3.3 and 3.2 last week, on top of the dependency reduced POM
glitch from the 1.4.1 release window.

Again all you need to do is use build/mvn if you don't already have
the latest Maven installed and all will be well.

Sean




[ANNOUNCE] Spark branch-1.5

2015-08-03 Thread Reynold Xin
Hi Devs,

Just an announcement that I've cut Spark's branch-1.5 to form the basis of
the 1.5 release. Other than a few stragglers, this represents the end of
active feature development for Spark 1.5. *If committers are merging any
features (outside of alpha modules), please shoot me an email so I can help
coordinate. Any new commits will need to be explicitly merged into
branch-1.5*.

In the next few days, we should come up with testing plans for the release
and create umbrella JIRAs for testing various components and changes. I
plan to cut a preview package for 1.5 towards the end of this week or early
next week.


- R


Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-08-03 Thread james
Based on the latest Spark code (commit
608353c8e8e50461fafff91a2c885dca8af3aaa8), I used the same Spark SQL query to
test two groups of combined configurations. From the results below, it seems
that the query currently does not work with the tungsten-sort shuffle manager:

*Test 1# (PASSED)*
spark.shuffle.manager=sort
spark.sql.codegen=true
spark.sql.unsafe.enabled=true 

*Test 2#(FAILED)*
spark.shuffle.manager=tungsten-sort
spark.sql.codegen=true
spark.sql.unsafe.enabled=true 
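
For reference, a sketch of how such a combination can be passed on the command
line (the application class and jar below are placeholders, not the actual job):

    spark-submit --conf spark.shuffle.manager=tungsten-sort \
      --conf spark.sql.codegen=true \
      --conf spark.sql.unsafe.enabled=true \
      --class com.example.MyQuery my-query.jar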

15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode4:50313
15/08/03 16:46:02 INFO spark.MapOutputTrackerMaster: Size of output statuses
for shuffle 3 is 586 bytes
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode2:60490
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode2:56319
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode1:58179
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode1:32816
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode3:55840
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode3:46874
15/08/03 16:46:02 WARN scheduler.TaskSetManager: Lost task 42.0 in stage 158.0 (TID 1548, bignode4): java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:118)
    at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:107)
    at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
    at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:140)
    at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:120)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Came-across-Spark-SQL-hang-Error-issue-with-Spark-1-5-Tungsten-feature-tp13537p13563.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Guru Medasani
Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT 
this morning. 

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed
with message:
Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

Should we be using maven 3.3.3 locally or build/mvn starting from Spark 1.4.1 
or Spark version 1.5?

Guru Medasani
gdm...@gmail.com



 On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:
 
 If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
 via brew on OS X), then this won't affect you, but I wanted to call
 attention to https://github.com/apache/spark/pull/7852 which makes
 Maven 3.3.3 the minimum required to build Spark. This heads off
 problems from some behavior differences that Patrick and I observed
 between 3.3 and 3.2 last week, on top of the dependency reduced POM
 glitch from the 1.4.1 release window.
 
 Again all you need to do is use build/mvn if you don't already have
 the latest Maven installed and all will be well.
 
 Sean
 
 



Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Guru Medasani
Thanks Sean. The reason I asked is that, in the Building Spark documentation for
1.4.1, I still see this.

https://spark.apache.org/docs/latest/building-spark.html

Building Spark using Maven requires Maven 3.0.4 or newer and Java 6+.

But I noticed the following warnings from the build of Spark version
1.5.0-SNAPSHOT, so I was wondering whether the changes you mentioned apply to
newer versions of Spark or to 1.4.1 as well.

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed 
with message:
Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

[WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed 
with message:
Detected JDK Version: 1.6.0-36 is not in the allowed range 1.7.

Guru Medasani
gdm...@gmail.com

 On Aug 3, 2015, at 2:38 PM, Sean Owen so...@cloudera.com wrote:
 
 Using ./build/mvn should always be fine. Your local mvn is fine too if
 it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users
 on OS X out there will have, by the way.
 
 On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:
 Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT
 this morning.
 
 WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.
 
 Should we be using maven 3.3.3 locally or build/mvn starting from Spark
 1.4.1 or Spark version 1.5?
 
 Guru Medasani
 gdm...@gmail.com
 
 
 
 On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:
 
 If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
 via brew on OS X), then this won't affect you, but I wanted to call
 attention to https://github.com/apache/spark/pull/7852 which makes
 Maven 3.3.3 the minimum required to build Spark. This heads off
 problems from some behavior differences that Patrick and I observed
 between 3.3 and 3.2 last week, on top of the dependency reduced POM
 glitch from the 1.4.1 release window.
 
 Again all you need to do is use build/mvn if you don't already have
 the latest Maven installed and all will be well.
 
 Sean
 
 
 



Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Sean Owen
Using ./build/mvn should always be fine. Your local mvn is fine too if
it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users
on OS X out there will have, by the way.

On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:
 Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT
 this morning.

 WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

 Should we be using maven 3.3.3 locally or build/mvn starting from Spark
 1.4.1 or Spark version 1.5?

 Guru Medasani
 gdm...@gmail.com



 On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:

 If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
 via brew on OS X), then this won't affect you, but I wanted to call
 attention to https://github.com/apache/spark/pull/7852 which makes
 Maven 3.3.3 the minimum required to build Spark. This heads off
 problems from some behavior differences that Patrick and I observed
 between 3.3 and 3.2 last week, on top of the dependency reduced POM
 glitch from the 1.4.1 release window.

 Again all you need to do is use build/mvn if you don't already have
 the latest Maven installed and all will be well.

 Sean







Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Sean Owen
That statement is true for Spark 1.4.x. But you've reminded me that I
failed to update this doc for 1.5, to say Maven 3.3.3 is required.
Patch coming up.

On Mon, Aug 3, 2015 at 9:12 PM, Guru Medasani gdm...@gmail.com wrote:
 Thanks Sean. Reason I asked this is, in Building Spark documentation of
 1.4.1, I still see this.

 https://spark.apache.org/docs/latest/building-spark.html

 Building Spark using Maven requires Maven 3.0.4 or newer and Java 6+.

 But I noticed the following warnings from the build of Spark version
 1.5.0-snapshot. So I was wondering if the changes you mentioned relate to
 newer versions of Spark or for 1.4.1 version as well.

 [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

 [WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion
 failed with message:
 Detected JDK Version: 1.6.0-36 is not in the allowed range 1.7.

 Guru Medasani
 gdm...@gmail.com

 On Aug 3, 2015, at 2:38 PM, Sean Owen so...@cloudera.com wrote:

 Using ./build/mvn should always be fine. Your local mvn is fine too if
 it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users
 on OS X out there will have, by the way.

 On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:

 Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT
 this morning.

 WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

 Should we be using maven 3.3.3 locally or build/mvn starting from Spark
 1.4.1 or Spark version 1.5?

 Guru Medasani
 gdm...@gmail.com



 On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:

 If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
 via brew on OS X), then this won't affect you, but I wanted to call
 attention to https://github.com/apache/spark/pull/7852 which makes
 Maven 3.3.3 the minimum required to build Spark. This heads off
 problems from some behavior differences that Patrick and I observed
 between 3.3 and 3.2 last week, on top of the dependency reduced POM
 glitch from the 1.4.1 release window.

 Again all you need to do is use build/mvn if you don't already have
 the latest Maven installed and all will be well.

 Sean








Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Marcelo Vanzin
Just note that if you have mvn in your path, you need to use build/mvn
--force.

On Mon, Aug 3, 2015 at 12:38 PM, Sean Owen so...@cloudera.com wrote:

 Using ./build/mvn should always be fine. Your local mvn is fine too if
 it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users
 on OS X out there will have, by the way.

 On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:
  Thanks Sean. I noticed this one while building Spark version
 1.5.0-SNAPSHOT
  this morning.
 
  WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
  failed with message:
  Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.
 
  Should we be using maven 3.3.3 locally or build/mvn starting from Spark
  1.4.1 or Spark version 1.5?
 
  Guru Medasani
  gdm...@gmail.com
 
 
 
  On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:
 
  If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
  via brew on OS X), then this won't affect you, but I wanted to call
  attention to https://github.com/apache/spark/pull/7852 which makes
  Maven 3.3.3 the minimum required to build Spark. This heads off
  problems from some behavior differences that Patrick and I observed
  between 3.3 and 3.2 last week, on top of the dependency reduced POM
  glitch from the 1.4.1 release window.
 
  Again all you need to do is use build/mvn if you don't already have
  the latest Maven installed and all will be well.
 
  Sean
 
 
 





-- 
Marcelo


Re: [ANNOUNCE] Spark branch-1.5

2015-08-03 Thread Michael Armbrust

 Would it be reasonable to start un-targeting non-bug non-blocker
 issues? like, would anyone yell if I started doing that? that would
 leave ~100 JIRAs, which still seems like more than can actually go
 into the release. And anyone can re-target as desired.


I think the maintainers of the various components should take care of
this.  Reynold and I just did a pass over SQL and I think that by Friday
there should only be blocker bugs / documentation remaining.