[GitHub] spark issue #13644: [SPARK-15925][SQL][SPARKR] Replaces registerTempTable wi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13644 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #10953: [SPARK-12177] [STREAMING] Update KafkaDStreams to new Ka...
Github user markgrover commented on the issue: https://github.com/apache/spark/pull/10953 Yeah, I agree with @koeninger. This PR is pretty out of date, it makes sense to turn focus on Cody's PR #11863
[GitHub] spark pull request #13651: [SPARK-15776][SQL] Divide Expression inside Aggre...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13651#discussion_r66880282

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -525,7 +525,7 @@ object TypeCoercion {
   def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
     // Skip nodes who has not been resolved yet,
     // as this is an extra rule which should be applied at last.
-    case e if !e.resolved => e
+    case e if !e.childrenResolved => e
     // Decimal and Double remain the same
--- End diff --

We can simplify this:
```
case e if !e.childrenResolved => e
case d: Divide if d.dataType.isInstanceOf[IntegralType] => ...
```
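The shape of such a rule — walk the expression tree bottom-up, skip nodes whose children are not yet resolved, rewrite the rest — can be sketched in miniature. This is a hedged Python toy, not Catalyst: `Node`, `children_resolved`, and `rewrite` are simplified stand-ins for Catalyst's `resolveExpressions` machinery.

```python
# Miniature model of a bottom-up rewrite rule that skips nodes whose
# children are not yet resolved, in the spirit of the childrenResolved
# guard discussed above. Names are illustrative, not Spark's API.

class Node:
    def __init__(self, name, children=(), resolved=True):
        self.name = name
        self.children = list(children)
        self.resolved = resolved

    @property
    def children_resolved(self):
        # True when every direct child is resolved (trivially true for leaves).
        return all(c.resolved for c in self.children)

def rewrite(node, rule):
    # Transform children first (bottom-up), then apply the rule to the node.
    node.children = [rewrite(c, rule) for c in node.children]
    return rule(node)

def rule(node):
    # Guard: leave a node alone until its children are resolved.
    if not node.children_resolved:
        return node
    if node.name == "Divide":
        return Node("FractionalDivide", node.children)
    return node

ok = Node("Divide", [Node("a"), Node("b")])
pending = Node("Divide", [Node("a", resolved=False), Node("b")])

assert rewrite(ok, rule).name == "FractionalDivide"  # children resolved: rewritten
assert rewrite(pending, rule).name == "Divide"       # unresolved child: skipped
```

The guard matters because type-coercion rules run interleaved with resolution: firing a rewrite on a node whose children still lack types would coerce based on incomplete information.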
[GitHub] spark pull request #13638: [SPARK-15915][SQL] CacheManager should use canoni...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/13638#discussion_r66880127

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -155,8 +156,9 @@ private[sql] class CacheManager extends Logging {
    * function will over invalidate.
    */
   private[sql] def invalidateCache(plan: LogicalPlan): Unit = writeLock {
+    val canonicalized = plan.canonicalized
     cachedData.foreach {
-      case data if data.plan.collect { case p if p.sameResult(plan) => p }.nonEmpty =>
+      case data if data.plan.collect { case p if p.sameResult(canonicalized) => p }.nonEmpty =>
--- End diff --

I don't think so. For example, if the cached plan is `LocalRelation` (which is canonicalized) and the `plan` argument is `SubqueryAlias(LocalRelation)` (which is not canonicalized), it will fail to find the same-result plan.
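The mismatch under discussion — a cached plan stored in canonical form versus a lookup plan still wrapped in a cosmetic alias — can be illustrated with a toy model. This is a hedged Python sketch under assumed semantics; `LocalRelation`, `SubqueryAlias`, `canonicalize`, and `same_result` here are simplified stand-ins, not Spark's actual classes or methods.

```python
# Toy model of logical-plan canonicalization: canonicalize() strips
# cosmetic wrappers (like a subquery alias) so that two plans producing
# the same result compare equal. Illustrative only, not Spark's API.

class Plan:
    def canonicalize(self):
        return self

class LocalRelation(Plan):
    def __repr__(self):
        return "LocalRelation"

class SubqueryAlias(Plan):
    def __init__(self, child):
        self.child = child

    def canonicalize(self):
        # The alias is cosmetic: canonicalization drops it.
        return self.child.canonicalize()

    def __repr__(self):
        return f"SubqueryAlias({self.child!r})"

def naive_same_result(a, b):
    # Purely structural comparison: a cosmetic wrapper breaks the match.
    return repr(a) == repr(b)

def same_result(a, b):
    # Comparing canonical forms is wrapper-insensitive.
    return repr(a.canonicalize()) == repr(b.canonicalize())

cached = LocalRelation()                 # cached plans are stored canonicalized
lookup = SubqueryAlias(LocalRelation())  # incoming plan still carries the alias

assert not naive_same_result(cached, lookup)  # structural match fails
assert same_result(cached, lookup)            # canonical forms agree
```

In this toy, whether canonicalizing the lookup plan up front is redundant depends entirely on whether the comparison itself canonicalizes both sides — which is exactly the question the reviewers are debating.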
[GitHub] spark pull request #13651: [SPARK-15776][SQL] Divide Expression inside Aggre...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13651#discussion_r66879827

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
@@ -213,7 +213,7 @@ case class Multiply(left: Expression, right: Expression)
 case class Divide(left: Expression, right: Expression)
   extends BinaryArithmetic with NullIntolerant {
-  override def inputType: AbstractDataType = NumericType
--- End diff --

we should also clean up the `divide` expression to remove code for integral division.
[GitHub] spark pull request #13651: [SPARK-15776][SQL] Divide Expression inside Aggre...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13651#discussion_r66879738

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2847,4 +2847,15 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
   test("SPARK-15887: hive-site.xml should be loaded") {
     assert(spark.sessionState.newHadoopConf().get("hive.in.test") == "true")
   }
+
+  test("SPARK-15776 Divide expression inside an Aggregation function should not " +
--- End diff --

I think we need some low-level unit test instead of an end-to-end test
[GitHub] spark issue #13651: [SPARK-15776][SQL] Divide Expression inside Aggregation ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13651 **[Test build #60439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60439/consoleFull)** for PR 13651 at commit [`df08eea`](https://github.com/apache/spark/commit/df08eeacd85187ca5a71463fc5d25f63426ebe84).
[GitHub] spark pull request #13651: [SPARK-15776][SQL] Divide Expression inside Aggre...
GitHub user clockfly opened a pull request: https://github.com/apache/spark/pull/13651

[SPARK-15776][SQL] Divide Expression inside Aggregation function is casted to wrong type

## What changes were proposed in this pull request?

This PR fixes the problem that a Divide expression inside an aggregation function is cast to the wrong type. After the fix, the behavior is consistent with Hive.

**Before the change:**
```
scala> sql("select sum(1 / 2) as a").schema
res4: org.apache.spark.sql.types.StructType = StructType(StructField(a,LongType,true))

scala> sql("select sum(1 / 2) as a").show()
+---+
| a|
+---+
|0 |
+---+
```

**After the change:**
```
scala> sql("select sum(1 / 2) as a").schema
res4: org.apache.spark.sql.types.StructType = StructType(StructField(a,DoubleType,true))

scala> sql("select sum(1 / 2) as a").show()
+---+
| a|
+---+
|0.5|
+---+
```

## How was this patch tested?

Unit test.

This PR is based on https://github.com/apache/spark/pull/13524 by @Sephiroth-Lin

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/clockfly/spark SPARK-15776

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13651.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13651

commit df08eeacd85187ca5a71463fc5d25f63426ebe84
Author: Sean Zhong
Date: 2016-06-13T22:09:20Z

    SPARK-15776 Divide Expression inside an Aggregation function is casted to wrong type
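The coercion being fixed — a divide over two integral operands should produce a fractional result before the aggregate sees it — can be modeled outside Spark as a small rewrite rule. This is a hedged Python sketch; the class and function names are illustrative, and the real rule lives in Catalyst's `TypeCoercion`, not here.

```python
# Toy model of the coercion rule: a Divide whose operands are both
# integral is rewritten so both sides become doubles, making sum(1 / 2)
# yield 0.5 (double) rather than 0 (integral division).
# Illustrative only; not Spark's actual TypeCoercion implementation.

from dataclasses import dataclass

@dataclass
class Lit:
    value: object

    def dtype(self):
        return "int" if isinstance(self.value, int) else "double"

    def eval(self):
        return self.value

@dataclass
class Divide:
    left: object
    right: object

    def dtype(self):
        # Without coercion, an all-integral divide stays integral.
        types = (self.left.dtype(), self.right.dtype())
        return "double" if "double" in types else "int"

    def eval(self):
        if self.dtype() == "int":
            return self.left.eval() // self.right.eval()  # integral division
        return self.left.eval() / self.right.eval()

def coerce(expr):
    # The rule: both operands integral -> replace them with double literals.
    if isinstance(expr, Divide) and expr.left.dtype() == "int" and expr.right.dtype() == "int":
        return Divide(Lit(float(expr.left.eval())), Lit(float(expr.right.eval())))
    return expr

before = Divide(Lit(1), Lit(2))
after = coerce(before)

assert before.eval() == 0          # pre-fix behavior: integral, sum(1/2) == 0
assert after.dtype() == "double"   # post-fix schema: DoubleType
assert after.eval() == 0.5         # post-fix value: sum(1/2) == 0.5
```

The point the PR makes is that this rewrite must fire on the divide itself even when it sits inside an aggregate like `sum(...)`, so the aggregate's result type is derived from the already-coerced child.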
[GitHub] spark issue #13338: [SPARK-13723] [YARN] Change behavior of --num-executors ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13338 Merged build finished. Test FAILed.
[GitHub] spark issue #13338: [SPARK-13723] [YARN] Change behavior of --num-executors ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13338 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60427/ Test FAILed.
[GitHub] spark issue #13338: [SPARK-13723] [YARN] Change behavior of --num-executors ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13338 **[Test build #60427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60427/consoleFull)** for PR 13338 at commit [`bf22b5a`](https://github.com/apache/spark/commit/bf22b5ab0bc8369949ac33833b078e7e13c7ce35).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66878387

--- Diff: docs/sql-programming-guide.md ---
@@ -1650,14 +1646,15 @@
 ## Hive Tables

 Spark SQL also supports reading and writing data stored in [Apache Hive](http://hive.apache.org/).
-However, since Hive has a large number of dependencies, it is not included in the default Spark assembly.
-Hive support is enabled by adding the `-Phive` and `-Phive-thriftserver` flags to Spark's build.
-This command builds a new assembly directory that includes Hive. Note that this Hive assembly directory must also be present
-on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries
-(SerDes) in order to access data stored in Hive.
+However, since Hive has a large number of dependencies, these dependencies are not included in the
+default Spark distribution. If Hive dependencies can be found on the classpath, Spark will load them
+automatically. Note that these Hive dependencies must also be present on all of the worker nodes, as
+they will need access to the Hive serialization and deserialization libraries (SerDes) in order to
+access data stored in Hive.

-Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` (for security configuration),
-`hdfs-site.xml` (for HDFS configuration) file in `conf/`.
+Configuration of Hive is done by placing your `core-site.xml` (for security configuration),
+`hdfs-site.xml` (for HDFS configuration) file in `conf/`, and adding configurations in your
+`hive-site.xml` into `conf/spark-defaults.conf`.
--- End diff --

it will not be true soon, users only need to put `hive-site.xml` in classpath
[GitHub] spark issue #7963: [SPARK-6227] [MLlib] [PySpark] Implement PySpark wrappers...
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/7963 Bump?
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66877689

--- Diff: docs/sql-programming-guide.md ---
@@ -604,49 +607,47 @@
 JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").m
 });

 // Apply a schema to an RDD of JavaBeans and register it as a table.
-DataFrame schemaPeople = sqlContext.createDataFrame(people, Person.class);
+Dataset<Row> schemaPeople = spark.createDataFrame(people, Person.class);
 schemaPeople.createOrReplaceTempView("people");

 // SQL can be run over RDDs that have been registered as tables.
-DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
+Dataset<Row> teenagers = spark.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")

-// The results of SQL queries are DataFrames and support all the normal RDD operations.
 // The columns of a row in the result can be accessed by ordinal.
-List<String> teenagerNames = teenagers.javaRDD().map(new Function<Row, String>() {
+List<String> teenagerNames = teenagers.map(new MapFunction<Row, String>() {
   public String call(Row row) {
     return "Name: " + row.getString(0);
   }
-}).collect();
+}).collectAsList();
{% endhighlight %}
+
--- End diff --

looks like it's still valid in python
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66877316

--- Diff: docs/sql-programming-guide.md ---
@@ -587,7 +590,7 @@ for the JavaBean.
 {% highlight java %}

 // sc is an existing JavaSparkContext.
-SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
+SparkSession spark = new org.apache.spark.sql.SparkSession(sc);

 // Load a text file and convert each line to a JavaBean.
 JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").map(
--- End diff --

is this example still valid?
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66877088

--- Diff: docs/sql-programming-guide.md ---
@@ -517,24 +517,26 @@
 types such as Sequences or Arrays. This RDD can be implicitly converted to a DataFrame and then be
 registered as a table. Tables can be used in subsequent SQL statements.

 {% highlight scala %}
-// sc is an existing SparkContext.
-val sqlContext = new org.apache.spark.sql.SQLContext(sc)
+val spark: SparkSession // An existing SparkSession

 // this is used to implicitly convert an RDD to a DataFrame.
-import sqlContext.implicits._
+import spark.implicits._

 // Define the schema using a case class.
 // Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,
 // you can use custom classes that implement the Product interface.
 case class Person(name: String, age: Int)

-// Create an RDD of Person objects and register it as a table.
-val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
+// Create an RDD of Person objects and register it as a temporary view.
+val people = sc
+  .textFile("examples/src/main/resources/people.txt")
+  .map(_.split(","))
+  .map(p => Person(p(0), p(1).trim.toInt))
+  .toDF()
--- End diff --

There is no reflection anymore, now we always use the type `T` to create encoder and serialize the object.
[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/12938 What keeps causing this failure? Is it the change in conf.py?
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13649 **[Test build #60438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60438/consoleFull)** for PR 13649 at commit [`a466517`](https://github.com/apache/spark/commit/a46651794d701370d673b362019274fe76a2ff29).
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3095 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3095/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3093 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3093/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3094 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3094/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
[GitHub] spark pull request #13611: [SPARK-15887][SQL] Bring back the hive-site.xml s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13611
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3092 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3092/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3091 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3091/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/13649

Wow, weird test failure:

```
Running Spark unit tests
[info] Running Spark tests using SBT with these arguments: -Pyarn -Phadoop-2.3 -Phive-thriftserver -Phive -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest hive-thriftserver/test mllib/test hive/test examples/test sql/test
Using /usr/java/jdk1.8.0_60 as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
[info] Loading project definition from /home/jenkins/workspace/SparkPullRequestBuilder/project
[CodeBlob (0x7fe7b0214e90)]
Framesize: 2
Runtime Stub (0x7fe7b0214e90): handle_exception_from_callee Runtime1 stub
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
[CodeBlob (0x7fe7b0214e90)]
Framesize: 2
Runtime Stub (0x7fe7b0214e90): handle_exception_from_callee Runtime1 stub
[thread 140627161380608 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (sharedRuntime.cpp:834), pid=37507, tid=140631972611840
#  fatal error: exception happened outside interpreter, nmethods and vtable stubs at pc 0x7fe7b0214f71
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/jenkins/workspace/SparkPullRequestBuilder/hs_err_pid37507.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
/home/jenkins/workspace/SparkPullRequestBuilder/build/sbt-launch-lib.bash: line 72: 37507 Aborted (core dumped) "$@"
```
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/13649 Jenkins, retest this please.
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3090 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3090/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #3089 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3089/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
[GitHub] spark issue #13498: [SPARK-15011][SQL] Re-enable 'analyze MetastoreRelations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13498 **[Test build #60437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60437/consoleFull)** for PR 13498 at commit [`655b0c7`](https://github.com/apache/spark/commit/655b0c73a54cbad3ac3c611a3c869feffbe9a1b5).
[GitHub] spark issue #13611: [SPARK-15887][SQL] Bring back the hive-site.xml support ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13611 **[Test build #60422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60422/consoleFull)** for PR 13611 at commit [`8b53b22`](https://github.com/apache/spark/commit/8b53b226f0347c545bd13525d6d18bcf6f9a097e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13611: [SPARK-15887][SQL] Bring back the hive-site.xml support ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13611 Thanks. Merging to master and branch 2.0.
[GitHub] spark issue #13638: [SPARK-15915][SQL] CacheManager should use canonicalized...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/13638 Seems reasonable. Is this a regression from 1.6?
[GitHub] spark pull request #13638: [SPARK-15915][SQL] CacheManager should use canoni...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13638#discussion_r66876008 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -155,8 +156,9 @@ private[sql] class CacheManager extends Logging { * function will over invalidate. */ private[sql] def invalidateCache(plan: LogicalPlan): Unit = writeLock { +val canonicalized = plan.canonicalized cachedData.foreach { - case data if data.plan.collect { case p if p.sameResult(plan) => p }.nonEmpty => + case data if data.plan.collect { case p if p.sameResult(canonicalized) => p }.nonEmpty => --- End diff -- I think this is redundant, `sameResult` already compares the canonicalized plan.
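A toy model can make the review point above concrete. This is an illustrative sketch, not Spark's actual `LogicalPlan` classes: `sameResult` canonicalizes both plans internally, so canonicalizing the argument before calling it changes nothing.

```scala
// Toy model (not Spark's actual classes) of the review point above: sameResult
// already compares *canonicalized* plans, so pre-canonicalizing the argument
// passed to it is redundant.
case class Plan(table: String, alias: Option[String]) {
  // Canonicalization strips cosmetic differences such as aliases.
  def canonicalized: Plan = copy(alias = None)
  // sameResult canonicalizes both sides internally.
  def sameResult(other: Plan): Boolean = canonicalized == other.canonicalized
}
```

With this model, `Plan("t1", Some("x")).sameResult(Plan("t1", Some("y")))` holds whether or not the caller canonicalizes the argument first, which is exactly why the extra `plan.canonicalized` call in the diff is unnecessary.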
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13649 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60434/ Test FAILed.
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13649 **[Test build #60434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60434/consoleFull)** for PR 13649 at commit [`a466517`](https://github.com/apache/spark/commit/a46651794d701370d673b362019274fe76a2ff29).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13649 Merged build finished. Test FAILed.
[GitHub] spark issue #13611: [SPARK-15887][SQL] Bring back the hive-site.xml support ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13611 Merged build finished. Test PASSed.
[GitHub] spark pull request #13563: [SPARK-15826] [CORE] PipedRDD to allow configurab...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/13563#discussion_r66875458 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -734,12 +737,14 @@ abstract class RDD[T: ClassTag]( printPipeContext: (String => Unit) => Unit = null, printRDDElement: (T, String => Unit) => Unit = null, separateWorkingDir: Boolean = false, - bufferSize: Int = 8192): RDD[String] = withScope { + bufferSize: Int = 8192, + encoding: Charset = StandardCharsets.UTF_8): RDD[String] = withScope { --- End diff -- > I will use String instead. Is that fine? 👍
[GitHub] spark issue #13611: [SPARK-15887][SQL] Bring back the hive-site.xml support ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60422/ Test PASSed.
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66875336 --- Diff: docs/sql-programming-guide.md --- @@ -171,9 +171,9 @@ df.show() {% highlight r %} -sqlContext <- SQLContext(sc) +spark <- SparkSession(sc) --- End diff -- SparkR doesn't have SparkSession
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66875208 --- Diff: docs/sql-programming-guide.md --- @@ -145,10 +145,10 @@ df.show() {% highlight java %} -JavaSparkContext sc = ...; // An existing JavaSparkContext. -SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc); +SparkSession spark = ...; // An existing SparkSession. +SparkSession spark = new org.apache.spark.sql.SparkSession(sc); --- End diff -- hm?
[GitHub] spark issue #13648: [SQL][DOC][minor] document the contract of encoder seria...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13648 Please create a JIRA ticket.
[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60421/ Test PASSed.
[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13413 Merged build finished. Test PASSed.
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66874954 --- Diff: docs/sql-programming-guide.md --- @@ -12,130 +12,130 @@ title: Spark SQL and DataFrames Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to -interact with Spark SQL including SQL, the DataFrames API and the Datasets API. When computing a result +interact with Spark SQL including SQL and the Datasets API. When computing a result the same execution engine is used, independent of which API/language you are using to express the -computation. This unification means that developers can easily switch back and forth between the -various APIs based on which provides the most natural way to express a given transformation. +computation. This unification means that developers can easily switch back and forth between +different APIs based on which provides the most natural way to express a given transformation. All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell`, `pyspark` shell, or `sparkR` shell. ## SQL -One use of Spark SQL is to execute SQL queries written using either a basic SQL syntax or HiveQL. +One use of Spark SQL is to execute SQL queries. Spark SQL can also be used to read data from an existing Hive installation. For more on how to configure this feature, please refer to the [Hive Tables](#hive-tables) section. When running -SQL from within another programming language the results will be returned as a [DataFrame](#DataFrames). +SQL from within another programming language the results will be returned as a [Dataset\[Row\]](#datasets). 
You can also interact with the SQL interface using the [command-line](#running-the-spark-sql-cli) or over [JDBC/ODBC](#running-the-thrift-jdbcodbc-server). -## DataFrames +## Datasets and DataFrames -A DataFrame is a distributed collection of data organized into named columns. It is conceptually -equivalent to a table in a relational database or a data frame in R/Python, but with richer -optimizations under the hood. DataFrames can be constructed from a wide array of [sources](#data-sources) such -as: structured data files, tables in Hive, external databases, or existing RDDs. +A Dataset is a new interface added in Spark 1.6 that tries to provide the benefits of RDDs (strong +typing, ability to use powerful lambda functions) with the benefits of Spark SQL's optimized +execution engine. A Dataset can be [constructed](#creating-datasets) from JVM objects and then +manipulated using functional transformations (map, flatMap, filter, etc.). -The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark.sql.DataFrame), -[Java](api/java/index.html?org/apache/spark/sql/DataFrame.html), -[Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and [R](api/R/index.html). +The Dataset API is the successor of the DataFrame API, which was introduced in Spark 1.3. In Spark +2.0, Datasets and DataFrames are unified, and DataFrames are now equivalent to Datasets of `Row`s. +In fact, `DataFrame` is simply a type alias of `Dataset[Row]` in [the Scala API][scala-datasets]. +However, [Java API][java-datasets] users must use `Dataset` instead. -## Datasets +[scala-datasets]: api/scala/index.html#org.apache.spark.sql.Dataset +[java-datasets]: api/java/index.html?org/apache/spark/sql/Dataset.html -A Dataset is a new experimental interface added in Spark 1.6 that tries to provide the benefits of -RDDs (strong typing, ability to use powerful lambda functions) with the benefits of Spark SQL's -optimized execution engine. 
A Dataset can be [constructed](#creating-datasets) from JVM objects and then manipulated -using functional transformations (map, flatMap, filter, etc.). +Python does not have support for the Dataset API, but due to its dynamic nature many of the +benefits are already available (i.e. you can access the field of a row by name naturally +`row.columnName`). The case for R is similar. -The unified Dataset API can be used both in [Scala](api/scala/index.html#org.apache.spark.sql.Dataset) and -[Java](api/java/index.html?org/apache/spark/sql/Dataset.html). Python does not yet have support for -the Dataset API, but due to its dynamic nature many of the benefits are already available (i.e. you can -access the field of a row by name naturally `row.columnName`).
[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13413 **[Test build #60421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60421/consoleFull)** for PR 13413 at commit [`75665be`](https://github.com/apache/spark/commit/75665beb74f9a16979dad9161206b863573021b1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13650: [SPARK-9623] [ML] Provide variance for RandomForestRegre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13650 **[Test build #60436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60436/consoleFull)** for PR 13650 at commit [`0e4e82f`](https://github.com/apache/spark/commit/0e4e82fb778e94aa4641b63e09d848a0362e5939).
[GitHub] spark issue #13650: [SPARK-9623] [ML] Provide variance for RandomForestRegre...
Github user MechCoder commented on the issue: https://github.com/apache/spark/pull/13650 cc: @yanboliang @MLnick
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66873712 --- Diff: docs/sql-programming-guide.md --- @@ -12,130 +12,130 @@ title: Spark SQL and DataFrames Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to -interact with Spark SQL including SQL, the DataFrames API and the Datasets API. When computing a result +interact with Spark SQL including SQL and the Datasets API. When computing a result the same execution engine is used, independent of which API/language you are using to express the -computation. This unification means that developers can easily switch back and forth between the -various APIs based on which provides the most natural way to express a given transformation. +computation. This unification means that developers can easily switch back and forth between +different APIs based on which provides the most natural way to express a given transformation. All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell`, `pyspark` shell, or `sparkR` shell. ## SQL -One use of Spark SQL is to execute SQL queries written using either a basic SQL syntax or HiveQL. --- End diff -- why change this line?
[GitHub] spark pull request #13563: [SPARK-15826] [CORE] PipedRDD to allow configurab...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/13563#discussion_r66873430 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -734,12 +737,14 @@ abstract class RDD[T: ClassTag]( printPipeContext: (String => Unit) => Unit = null, printRDDElement: (T, String => Unit) => Unit = null, separateWorkingDir: Boolean = false, - bufferSize: Int = 8192): RDD[String] = withScope { + bufferSize: Int = 8192, + encoding: Charset = StandardCharsets.UTF_8): RDD[String] = withScope { --- End diff -- @zsxwing >> Use Codec for Scala API. `Codec` is Scala-specific. I intentionally did not use that because I wanted the Java and Scala APIs to accept the same param types. Anyway, based on the test case failures, neither Charset nor Codec would work because they need to be serializable (I see `org.apache.spark.SparkException: Task not serializable` while running this change). I will use `String` instead. Is that fine? >> I suggest using Codec.defaultCharsetCodec as the default value Thanks for catching that. It's unfortunate to not have UTF-8 as the default, but backward compatibility is far more important.
```
Caused by: java.io.NotSerializableException: sun.nio.cs.UTF_8
Serialization stack:
	- object not serializable (class: sun.nio.cs.UTF_8, value: UTF-8)
	- field (class: org.apache.spark.rdd.PipedRDD, name: org$apache$spark$rdd$PipedRDD$$encoding, type: class java.nio.charset.Charset)
	- object (class org.apache.spark.rdd.PipedRDD, PipedRDD[1] at pipe at :29)
	- field (class: org.apache.spark.rdd.RDD$$anonfun$collect$1, name: $outer, type: class org.apache.spark.rdd.RDD)
	- object (class org.apache.spark.rdd.RDD$$anonfun$collect$1, )
	- field (class: org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12, name: $outer, type: class org.apache.spark.rdd.RDD$$anonfun$collect$1)
	- object (class org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12, )
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
	... 56 more
```
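The workaround described above can be sketched outside Spark. This is an illustrative example (not Spark's actual `PipedRDD` code): since `Charset` instances such as `sun.nio.cs.UTF_8` are not `Serializable`, a task closure cannot capture one directly; shipping the charset *name* as a `String` and resolving the `Charset` lazily on the worker side avoids the `NotSerializableException`.

```scala
import java.nio.charset.Charset

// Sketch (illustrative, not Spark's PipedRDD): hold the encoding as a plain
// String, which is Serializable, and resolve the Charset only when decoding.
class LineDecoder(encodingName: String) extends Serializable {
  // Resolved lazily after deserialization; the Charset object never travels.
  @transient private lazy val charset: Charset = Charset.forName(encodingName)
  def decode(bytes: Array[Byte]): String = new String(bytes, charset)
}
```

The `@transient lazy val` pattern is the standard Scala idiom here: the field is excluded from serialization and rebuilt from the `String` on first use on the receiving side.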
[GitHub] spark issue #13650: [SPARK-9623] [ML] Provide variance for RandomForestRegre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13650 **[Test build #60435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60435/consoleFull)** for PR 13650 at commit [`75254c9`](https://github.com/apache/spark/commit/75254c91cf8d9c2f3638a3f9b1cfd5c029e10996).
[GitHub] spark issue #13482: [SPARK-15725][YARN] Ensure ApplicationMaster sleeps for ...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/13482 Ok, I'm fine with this as a workaround for now since you don't really know and this will ensure it, but please clean up the code so that it's clear which sleep is which, and add a nice comment stating why we are doing this. Then I think we should file another JIRA to investigate a more proper fix for this. We shouldn't have to wait for a reason to schedule.
[GitHub] spark pull request #13650: [SPARK-9623] [ML] Provide variance for RandomFore...
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/13650 [SPARK-9623] [ML] Provide variance for RandomForestRegressor predictions ## What changes were proposed in this pull request? It is useful to get the variance of predictions from the `RandomForestRegressor` to plot confidence intervals on the predictions. I verified the formula from page 17 of this paper (http://arxiv.org/pdf/1211.0906v2.pdf) ## How was this patch tested? I added a couple of tests to the RandomForestRegression test suite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MechCoder/spark random_forest_var Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13649.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13650 commit 75254c91cf8d9c2f3638a3f9b1cfd5c029e10996 Author: MechCoder Date: 2016-06-09T18:22:53Z [SPARK-9623] [ML] Provide variance for RandomForestRegressor predictions
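The idea behind the PR above can be sketched in a few lines. This is a hedged illustration, not Spark's actual API or the exact formula from the cited paper: a forest regressor predicts the mean of the individual tree predictions, and the spread of those same predictions gives a simple per-example uncertainty estimate usable for confidence intervals.

```scala
// Illustrative sketch (names are hypothetical, not Spark's ML API): given the
// predictions of each tree for one example, return the ensemble prediction
// (the mean) together with the variance of the tree predictions around it.
def predictWithVariance(treePredictions: Seq[Double]): (Double, Double) = {
  val n = treePredictions.size.toDouble
  val mean = treePredictions.sum / n
  // Population variance of the per-tree predictions around the ensemble mean.
  val variance = treePredictions.map(p => (p - mean) * (p - mean)).sum / n
  (mean, variance)
}
```

When all trees agree the variance is zero; the more the trees disagree on an example, the wider the resulting confidence interval would be.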
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66872913 --- Diff: docs/sql-programming-guide.md --- @@ -12,130 +12,130 @@ title: Spark SQL and DataFrames Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to -interact with Spark SQL including SQL, the DataFrames API and the Datasets API. When computing a result +interact with Spark SQL including SQL and the Datasets API. When computing a result --- End diff -- how about `, DataFrame API (Python/R) and Dataset API (Scala/Java)`
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/13649 /cc @liancheng for review (since you reviewed the original tests in #11775).
[GitHub] spark issue #13649: [SPARK-15929] Fix portability of DataFrameSuite path glo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13649 **[Test build #60434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60434/consoleFull)** for PR 13649 at commit [`a466517`](https://github.com/apache/spark/commit/a46651794d701370d673b362019274fe76a2ff29).
[GitHub] spark issue #13623: [SPARK-15895][SQL] Filters out metadata files while doin...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13623 @rxin Thanks. Consolidated all the underscore- and dot-files filtering logic.
[GitHub] spark pull request #13649: [SPARK-15929] Fix portability of DataFrameSuite p...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/13649 [SPARK-15929] Fix portability of DataFrameSuite path globbing tests The DataFrameSuite regression tests for SPARK-13774 fail in my environment because they attempt to glob over all of `/mnt`, and some of the subdirectories have restrictive permissions which cause the test to fail. This patch rewrites those tests to remove all environment-specific assumptions; the tests now create their own unique temporary paths for use in the tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark SPARK-15929 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13649.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13649 commit a46651794d701370d673b362019274fe76a2ff29 Author: Josh Rosen Date: 2016-06-08T19:43:37Z Clean up environment assumptions in test.
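The fix described above follows a standard testing pattern that can be sketched generically. This is an illustrative helper (not the actual DataFrameSuite code): instead of globbing over a fixed system path such as `/mnt`, whose contents and permissions vary across machines, each test creates its own unique temporary directory and operates only within it.

```scala
import java.nio.file.Files

// Sketch of the portability pattern: run a test body against a fresh, unique
// temp directory, then clean it up, so the test never depends on the contents
// or permissions of a shared system path.
def withUniqueTempDir[T](body: java.io.File => T): T = {
  val dir = Files.createTempDirectory("dataframe-suite-").toFile
  try body(dir)
  finally {
    // shallow cleanup is enough for this sketch (no nested directories)
    Option(dir.listFiles()).foreach(_.foreach(_.delete()))
    dir.delete()
  }
}
```

A test would then write known files into the directory it receives and glob over that directory alone, making its assumptions fully self-contained.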
[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools
Github user njwhite commented on the issue: https://github.com/apache/spark/pull/12951 @squito is this OK?
[GitHub] spark issue #13603: [SPARK-15865][CORE] Blacklist should not result in job h...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/13603 Ohh good point that makes sense re: lost executors. Given that, I agree that this approach seems like the right one.
[GitHub] spark pull request #13603: [SPARK-15865][CORE] Blacklist should not result i...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/13603#discussion_r66870624 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -280,10 +280,54 @@ private[spark] class TaskSchedulerImpl( } } } +if (!launchedTask && isTaskSetCompletelyBlacklisted(taskSet)) { + taskSet.abort(s"Aborting ${taskSet.taskSet} because it has a task which cannot be scheduled" + +s" on any executor due to blacklists.") +} return launchedTask } /** + * Check whether the given task set has been blacklisted to the point that it can't run anywhere. + * + * It is possible that this taskset has become impossible to schedule *anywhere* due to the + * blacklist. The most common scenario would be if there are fewer executors than + * spark.task.maxFailures. We need to detect this so we can fail the task set, otherwise the job + * will hang. + * + * The check here is a balance between being sure to catch the issue, but not wasting + * too much time inside the scheduling loop. Just check if the last task is schedulable + * on any of the available executors. So this is O(numExecutors) worst-case, but it'll + * really be fast unless you've got a bunch of things blacklisted. Its possible it won't detect + * the unschedulable task immediately, but if it returns false, there is at least *some* task + * that is schedulable, and after scheduling all of those, we'll eventually find the unschedulable + * task. + */ + private[scheduler] def isTaskSetCompletelyBlacklisted( --- End diff -- I think it would be cleaner to add this method to the TaskSetManager class (and then you don't need the pollPendingTask method) -- and then just pass in the executorsByHost map. That also makes things a little easier to change in the future, if there gets to be some easier way of checking if a particular task set is completely blacklisted. 
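The refactoring suggested in the review above might look roughly like the following. This is a hypothetical sketch only: `TaskSetManagerSketch`, `pendingTaskIndices`, and `isBlacklisted` are illustrative stand-ins, not Spark's actual `TaskSetManager` internals.

```scala
// Hypothetical sketch of hosting the blacklist check on the task set manager
// and passing in the executors-by-host map, as the reviewer suggests.
// All names here are illustrative, not Spark's real API.
class TaskSetManagerSketch(
    pendingTaskIndices: Seq[Int],
    isBlacklisted: (Int, String) => Boolean) {

  /** True when the last pending task cannot be scheduled on any known executor. */
  def isCompletelyBlacklisted(executorsByHost: Map[String, Set[String]]): Boolean =
    pendingTaskIndices.lastOption.exists { task =>
      executorsByHost.values.flatten.forall(exec => isBlacklisted(task, exec))
    }
}
```

As in the quoted doc comment, checking only the last pending task keeps the cost at O(numExecutors) per scheduling round while still guaranteeing the unschedulable task is eventually detected.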
[GitHub] spark issue #13637: [SPARK-15914][SQL] Add deprecated method back to SQLCont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13637 **[Test build #60433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60433/consoleFull)** for PR 13637 at commit [`04ef1b5`](https://github.com/apache/spark/commit/04ef1b557cf4267e85c98993c11e7f6a6a31b6c8).
[GitHub] spark issue #13648: [SQL][DOC][minor] document the contract of encoder seria...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13648 LGTM
[GitHub] spark issue #13593: [SPARK-15864] [SQL] Fix Inconsistent Behaviors when Unca...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13593 hi @gatorsmile , do you wanna update it? now both `tryUncacheQuery` and `uncacheQuery` won't unregister accumulator
[GitHub] spark issue #13636: [SPARK-15637][SPARKR] Remove R version check since maske...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/13636 For reference, I was using ``` R version 3.3.0 (2016-05-03) -- "Supposedly Educational" Copyright (C) 2016 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit) ```
[GitHub] spark issue #13646: [SPARK-15927] Eliminate redundant DAGScheduler code.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13646 **[Test build #60432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60432/consoleFull)** for PR 13646 at commit [`3e47166`](https://github.com/apache/spark/commit/3e471665505ba0b259fcd7b4a69d2c4ae1f5).
[GitHub] spark issue #13648: [SQL][DOC][minor] document the contract of encoder seria...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13648 **[Test build #60431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60431/consoleFull)** for PR 13648 at commit [`cdda303`](https://github.com/apache/spark/commit/cdda303ed624aaf3389fb190ff8c473f06afa681).
[GitHub] spark issue #13645: [HOTFIX] Revert "[MINOR][SQL] Standardize 'continuous qu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13645 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60416/ Test PASSed.
[GitHub] spark issue #13645: [HOTFIX] Revert "[MINOR][SQL] Standardize 'continuous qu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13645 Merged build finished. Test PASSed.
[GitHub] spark issue #13648: [SQL][DOC][minor] document the contract of encoder seria...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13648 cc @hvanhovell @liancheng @clockfly
[GitHub] spark pull request #13648: [SQL][DOC][minor] document the contract of encode...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/13648 [SQL][DOC][minor] document the contract of encoder serializer expressions ## What changes were proposed in this pull request? In our encoder framework, we imply that serializer expressions should use `BoundReference` to refer to the input object, and a lot of code depends on this contract (e.g. `ExpressionEncoder.tuple`). This PR adds some documentation and assertions in `ExpressionEncoder` to make it clearer. ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark comment Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13648.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13648 commit cdda303ed624aaf3389fb190ff8c473f06afa681 Author: Wenchen Fan Date: 2016-06-13T20:26:01Z document the contract of encoder serializer expressions
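The contract this PR documents can be illustrated with a toy model. The types below are deliberately simplified stand-ins for Catalyst's expression classes (`BoundReference`, field extraction), not the real API.

```scala
// Toy model of the contract: a serializer expression should reach the input
// object only through a bound reference. BoundRef/Field/Literal are
// simplified stand-ins for Catalyst expressions, not Spark's actual classes.
sealed trait Expr
case class BoundRef(ordinal: Int) extends Expr
case class Field(child: Expr, name: String) extends Expr
case class Literal(value: Any) extends Expr

// Walk the expression and check that the input is referenced only via BoundRef.
def referencesInputViaBoundRef(e: Expr): Boolean = e match {
  case BoundRef(_)     => true
  case Field(child, _) => referencesInputViaBoundRef(child)
  case Literal(_)      => false
}

// An encoder framework could assert this over each serializer expression,
// which is roughly the kind of check the PR adds to ExpressionEncoder:
assert(referencesInputViaBoundRef(Field(BoundRef(0), "name")))
```

Making the contract explicit with an assertion means a violation fails fast at encoder construction instead of surfacing later as a subtle analysis error.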
[GitHub] spark issue #13645: [HOTFIX] Revert "[MINOR][SQL] Standardize 'continuous qu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13645 **[Test build #60416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60416/consoleFull)** for PR 13645 at commit [`2199031`](https://github.com/apache/spark/commit/21990313db506ac13eb7a29f3dd9f2022712cafd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13636: [SPARK-15637][SPARKR] Remove R version check since maske...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/13636 I'm still seeing errors after this change in my environment: ``` Failed - 1. Failure: Check masked functions (@test_context.R#31) length(maskedBySparkR) not equal to length(namesOfMasked). 1/1 mismatches [1] 22 - 20 == 2 2. Failure: Check masked functions (@test_context.R#32) sort(maskedBySparkR) not equal to sort(namesOfMasked). Lengths differ: 22 vs 20 DONE === ```
[GitHub] spark issue #13524: [SPARK-15776][SQL] Type coercion incorrect
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/13524 @Sephiroth-Lin I think you can use a simpler case in the description of this PR. Such as: ``` select sum(4/3) ``` The expected result is: ``` 1.3.. ``` The actual result is: ``` 1 ```
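The root cause of the behaviour in that example can be seen with plain Scala arithmetic; this is only an illustration of integral versus coerced division, not Spark's type-coercion code itself.

```scala
// Plain Scala illustration of the bug's root cause: with integral operands
// the division truncates before aggregation, so summing 4/3 accumulates
// ones instead of 1.333...; coercing to Double first keeps the fraction.
val integralDivision = 4 / 3       // Int division truncates to 1
val coercedDivision  = 4.0 / 3.0   // Double division keeps 1.333...

println(integralDivision)  // 1
println(coercedDivision)
```

The TypeCoercion fix discussed in this thread effectively ensures the divide is rewritten to operate on a fractional type before the aggregate sees it.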
[GitHub] spark issue #13623: [SPARK-15895][SQL] Filters out metadata files while doin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13623 **[Test build #60430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60430/consoleFull)** for PR 13623 at commit [`ee438e4`](https://github.com/apache/spark/commit/ee438e466e9b5368f821e5cac580393ecf8921ef).
[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13539 Merged build finished. Test PASSed.
[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60412/ Test PASSed.
[GitHub] spark issue #13645: [HOTFIX] Revert "[MINOR][SQL] Standardize 'continuous qu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13645 **[Test build #3081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3081/consoleFull)** for PR 13645 at commit [`2199031`](https://github.com/apache/spark/commit/21990313db506ac13eb7a29f3dd9f2022712cafd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13539 **[Test build #60412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60412/consoleFull)** for PR 13539 at commit [`186283e`](https://github.com/apache/spark/commit/186283e9321120b9a8def7a3ba51ecf5c423e049). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13646: [SPARK-15927] Eliminate redundant DAGScheduler code.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13646 **[Test build #60419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60419/consoleFull)** for PR 13646 at commit [`42a8d16`](https://github.com/apache/spark/commit/42a8d16ed0b7e8175a58d1d6fa21685cc36c85c2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13646: [SPARK-15927] Eliminate redundant DAGScheduler code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13646 Merged build finished. Test FAILed.
[GitHub] spark issue #13646: [SPARK-15927] Eliminate redundant DAGScheduler code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13646 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60419/ Test FAILed.
[GitHub] spark issue #13646: [SPARK-15927] Eliminate redundant DAGScheduler code.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13646 LGTM
[GitHub] spark issue #13140: [SPARK-15230] [SQL] distinct() does not handle column na...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13140 **[Test build #60429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60429/consoleFull)** for PR 13140 at commit [`2f7ffbd`](https://github.com/apache/spark/commit/2f7ffbd58a3437898f32e7603ca6b603f5fd5088).
[GitHub] spark issue #13482: [SPARK-15725][YARN] Ensure ApplicationMaster sleeps for ...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/13482 @tgravescs, removing `notifyAll` doesn't solve the problem entirely, it just removes one path that's causing the `allocate` call to be run too many times. (Also, I haven't tested delaying loss reasons in our Spark jobs at scale, other than for the 200ms introduced here.) Ensuring that `allocate` is not called too often addresses the problem no matter what the immediate cause is. That's why I think it's a good idea to fix the two separately: first, ensure that `allocate` will not run too often and starve other operations on the `YarnAllocator`, and second, track down the cases that cause this. Even if we were to fix the `YarnAllocator` so we don't have resource contention, ensuring a minimum interval between calls to `allocate` is a good idea so that Spark doesn't make too many useless calls to the resource manager. And I don't want to track down this same bug in 3 months because a different path triggers it.
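The minimum-interval guard described in that comment could be sketched as follows. `RateLimitedAllocator`, `maybeAllocate`, and the parameter names are hypothetical, not the actual `YarnAllocator` API.

```scala
// Hypothetical sketch of enforcing a minimum interval between allocate calls,
// so a burst of wake-ups cannot starve other operations on the allocator or
// flood the resource manager. All names are illustrative, not Spark's API.
class RateLimitedAllocator(minIntervalMs: Long) {
  private var lastAllocateMs = 0L

  def maybeAllocate(doAllocate: () => Unit): Unit = synchronized {
    val now = System.currentTimeMillis()
    if (now - lastAllocateMs >= minIntervalMs) {
      lastAllocateMs = now
      doAllocate() // contact the resource manager at most once per interval
    }
    // Otherwise skip: callers that fire too soon are simply coalesced into
    // the next permitted allocation, regardless of which path woke them up.
  }
}
```

This matches the comment's reasoning: the guard is cause-agnostic, so any future code path that triggers excessive wake-ups is contained automatically.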
[GitHub] spark pull request #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13613
[GitHub] spark pull request #13444: [SPARK-15530][SQL] Set #parallelism for file list...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13444
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/13613 Thanks. Merging to master and 2.0.
[GitHub] spark issue #13644: [SPARK-15925][SQL][SPARKR] Replaces registerTempTable wi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13644 **[Test build #60428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60428/consoleFull)** for PR 13644 at commit [`56a3b9e`](https://github.com/apache/spark/commit/56a3b9e17659f0ea391e6627e4e2136397af4447).
[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/13513 @tdas @zsxwing , what is your comment about this PR? Thanks a lot.
[GitHub] spark pull request #13221: [SPARK-15443][SQL][Streaming] Properly explain co...
Github user jerryshao closed the pull request at: https://github.com/apache/spark/pull/13221
[GitHub] spark issue #13221: [SPARK-15443][SQL][Streaming] Properly explain continuou...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/13221 I'm going to close this until I have a thorough fix for this issue. Thanks a lot for your comments.
[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13444 Thanks! Merging to master and branch 2.0.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13613 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60413/ Test PASSed.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13613 Merged build finished. Test PASSed.
[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13137 @maropu Just had an offline discussion with @yhuai. So this case is a little bit different from #13444. In #13444, the number of leaf files is unknown before issuing the job, and each task may take one or more directories and further list them recursively, thus increasing parallelism is potentially useful. Plus, listing leaf files may suffer from data skew (one directory containing significantly more files than others). In the Parquet schema reading case, the file number is already known, and there's no data skew problem.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13613 **[Test build #60413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60413/consoleFull)** for PR 13613 at commit [`4971da3`](https://github.com/apache/spark/commit/4971da3598685ab5c9c0274dda95412bc01bedfe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13338: [SPARK-13723] [YARN] Change behavior of --num-exe...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/13338#discussion_r66862738 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---

```diff
@@ -2309,21 +2310,24 @@ private[spark] object Utils extends Logging {
   }

   /**
-   * Return whether dynamic allocation is enabled in the given conf
-   * Dynamic allocation and explicitly setting the number of executors are inherently
-   * incompatible. In environments where dynamic allocation is turned on by default,
-   * the latter should override the former (SPARK-9092).
+   * Return whether dynamic allocation is enabled in the given conf.
    */
   def isDynamicAllocationEnabled(conf: SparkConf): Boolean = {
-    val numExecutor = conf.getInt("spark.executor.instances", 0)
     val dynamicAllocationEnabled = conf.getBoolean("spark.dynamicAllocation.enabled", false)
-    if (numExecutor != 0 && dynamicAllocationEnabled) {
-      logWarning("Dynamic Allocation and num executors both set, thus dynamic allocation disabled.")
-    }
-    numExecutor == 0 && dynamicAllocationEnabled &&
+    dynamicAllocationEnabled &&
       (!isLocalMaster(conf) || conf.getBoolean("spark.dynamicAllocation.testing", false))
   }

+  /**
+   * Return the initial number of executors for dynamic allocation.
+   */
+  def getDynamicAllocationInitialExecutors(conf: SparkConf): Int = {
+    Seq(
+      conf.get(DYN_ALLOCATION_MIN_EXECUTORS),
+      conf.get(DYN_ALLOCATION_INITIAL_EXECUTORS),
+      conf.get(EXECUTOR_INSTANCES).getOrElse(0)).max
```

--- End diff -- Do we need to support the environment variable `SPARK_EXECUTOR_INSTANCES`? It is not officially deprecated.
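jerryshao's question can be illustrated with a sketch. The configuration keys named in the diff are real Spark settings, but the helper below is a simplified, hypothetical version of the patch, with the environment variable folded in as an extra candidate:

```scala
// Simplified sketch of the initial-executor calculation under discussion.
// Taking the max means an explicitly configured spark.executor.instances
// (or, if it were supported, the SPARK_EXECUTOR_INSTANCES environment
// variable) can raise the starting point above the dynamic-allocation
// minimum and configured initial value.
def computeInitialExecutors(
    minExecutors: Int,
    initialExecutors: Int,
    executorInstances: Option[Int],
    envInstances: Option[Int]): Int =
  Seq(
    minExecutors,
    initialExecutors,
    executorInstances.getOrElse(0),
    envInstances.getOrElse(0) // would cover SPARK_EXECUTOR_INSTANCES
  ).max
```

Whether the env var belongs in this `Seq` is exactly the open question in the review: the diff as written only consults the `EXECUTOR_INSTANCES` config entry.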
[GitHub] spark issue #13338: [SPARK-13723] [YARN] Change behavior of --num-executors ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13338 **[Test build #60427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60427/consoleFull)** for PR 13338 at commit [`bf22b5a`](https://github.com/apache/spark/commit/bf22b5ab0bc8369949ac33833b078e7e13c7ce35).
[GitHub] spark pull request #13137: [SPARK-15247][SQL] Set the default number of part...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13137#discussion_r66862256 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---

```diff
@@ -795,11 +795,15 @@ private[sql] object ParquetFileFormat extends Logging {
     // side, and resemble fake `FileStatus`es there.
     val partialFileStatusInfo = filesToTouch.map(f => (f.getPath.toString, f.getLen))

+    // Set the number of partitions to prevent following schema reads from generating many tasks
+    // in case of a small number of parquet files.
+    val numParallelism = Math.min(partialFileStatusInfo.size + 1, 1)
```

--- End diff -- `Math.min(partialFileStatusInfo.size + 1, parallelism)` is better. I think this case is different from https://github.com/apache/spark/pull/13444. Here, we already have a set of files and we apply the same operation to every file. However, for the issue that https://github.com/apache/spark/pull/13444 is trying to address, we do not really know the amount of work assigned to a task in advance (it depends on the number of actual files in a dir).
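The problem yhuai flags is easy to see by evaluating the expression: with a constant upper bound of 1, `Math.min(n + 1, 1)` is 1 for every non-negative `n`, so the computed parallelism is always a single task. A minimal sketch of the intended fix, where `parallelism` stands in for whatever default bound the session provides:

```scala
// Buggy form from the diff: always returns 1 for numFiles >= 0,
// so every schema-merging job would run with a single task.
def buggyParallelism(numFiles: Int): Int =
  math.min(numFiles + 1, 1)

// Intended form per the review: cap by the actual parallelism bound
// instead of the constant 1.
def fixedParallelism(numFiles: Int, parallelism: Int): Int =
  math.min(numFiles + 1, parallelism)
```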