[GitHub] spark issue #14815: [SPARK-17244] Catalyst should not pushdown non-determini...
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/14815 cc @hvanhovell @gatorsmile --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @srowen Could you please review this PR? Thanks.
[GitHub] spark pull request #14815: [SPARK-17244] Catalyst should not pushdown non-de...
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/14815

[SPARK-17244] Catalyst should not pushdown non-deterministic join conditions

## What changes were proposed in this pull request?

Given that non-deterministic expressions can be stateful, pushing them down the query plan during the optimization phase can cause incorrect behavior. This patch fixes that issue by explicitly disabling such pushdown.

## How was this patch tested?

A new test in `FilterPushdownSuite` that checks Catalyst's behavior for both deterministic and non-deterministic join conditions.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark constraint-inputfile

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14815.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14815

commit 95150970d7e5a71d9271a209a8ee453ce20f8097
Author: Sameer Agarwal
Date: 2016-08-24T19:37:33Z

    Joins should not pushdown non-deterministic conditions

commit 6728fc31bab1fd53e1005f892496ec61b6d22cd0
Author: Sameer Agarwal
Date: 2016-08-25T20:45:32Z

    unit test
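To see why pushing a stateful, non-deterministic predicate through a join can change results, here is a minimal standalone sketch. This is plain Python illustrating the idea, not Catalyst code; the counter-based predicate stands in for any stateful expression (e.g. one backed by a random-number generator).

```python
import itertools

def make_stateful_pred():
    # A non-deterministic, stateful predicate: keeps every other row it sees.
    counter = itertools.count()
    return lambda row: next(counter) % 2 == 0

left = [1, 2, 3, 4]
right = [2, 3, 4, 5]

# Predicate evaluated after the join (the semantically correct placement):
pred = make_stateful_pred()
joined = [x for x in left if x in right]      # equality join
after = [x for x in joined if pred(x)]

# Predicate pushed down below the join (the bug being fixed):
pred = make_stateful_pred()
filtered_left = [x for x in left if pred(x)]  # predicate now sees 4 rows, not 3
pushed = [x for x in filtered_left if x in right]

print(after, pushed)
```

Because the predicate's output depends on how many rows it has seen and in what order, moving it below the join feeds it a different row stream and yields a different answer, which is why the optimizer must not relocate such conditions.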
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14637 **[Test build #64435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64435/consoleFull)** for PR 14637 at commit [`09f3197`](https://github.com/apache/spark/commit/09f3197e7cac9a45315bf5bdaed57c97bcd0e46d).
[GitHub] spark issue #14814: [SPARK-17242][Document]Update links of external dstream ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14814 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64434/ Test PASSed.
[GitHub] spark issue #14814: [SPARK-17242][Document]Update links of external dstream ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14814 Merged build finished. Test PASSed.
[GitHub] spark issue #14814: [SPARK-17242][Document]Update links of external dstream ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14814 **[Test build #64434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64434/consoleFull)** for PR 14814 at commit [`17bb37e`](https://github.com/apache/spark/commit/17bb37e529b69823858d3e5edb0891a1ba6c9205).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14814: [SPARK-17242][Document]Update links of external dstream ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14814 /cc @rxin
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/14637 retest this please
[GitHub] spark pull request #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r76318724

--- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.security
+
+import java.io.{InputStream, OutputStream}
+import java.util.Properties
+import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}
+
+import org.apache.commons.crypto.random._
+import org.apache.commons.crypto.stream._
+import org.apache.hadoop.io.Text
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.config._
+
+/**
+ * A util class for manipulating IO encryption and decryption streams.
+ */
+private[spark] object CryptoStreamUtils {
+  /**
+   * Constants and variables for spark IO encryption
+   */
+  val SPARK_IO_TOKEN = new Text("SPARK_IO_TOKEN")
+
+  // The initialization vector length in bytes.
+  val IV_LENGTH_IN_BYTES = 16
+  // The prefix of IO encryption related configurations in Spark configuration.
+  val SPARK_IO_ENCRYPTION_COMMONS_CONFIG_PREFIX = "spark.io.encryption.commons.config."
+  // The prefix for the configurations passing to Apache Commons Crypto library.
+  val COMMONS_CRYPTO_CONF_PREFIX = "commons.crypto."
+
+  /**
+   * Helper method to wrap [[OutputStream]] with [[CryptoOutputStream]] for encryption.
+   */
+  def createCryptoOutputStream(
+      os: OutputStream,
+      sparkConf: SparkConf): OutputStream = {
+    val properties = toCryptoConf(sparkConf, SPARK_IO_ENCRYPTION_COMMONS_CONFIG_PREFIX,
+      COMMONS_CRYPTO_CONF_PREFIX)
+    val iv = createInitializationVector(properties)
+    os.write(iv)
+    val credentials = SparkHadoopUtil.get.getCurrentUserCredentials()
+    val key = credentials.getSecretKey(SPARK_IO_TOKEN)
+    val transformationStr = sparkConf.get(IO_CRYPTO_CIPHER_TRANSFORMATION)
+    new CryptoOutputStream(transformationStr, properties, os,
+      new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
+  }
+
+  /**
+   * Helper method to wrap [[InputStream]] with [[CryptoInputStream]] for decryption.
+   */
+  def createCryptoInputStream(
+      is: InputStream,
+      sparkConf: SparkConf): InputStream = {
+    val properties = toCryptoConf(sparkConf, SPARK_IO_ENCRYPTION_COMMONS_CONFIG_PREFIX,
+      COMMONS_CRYPTO_CONF_PREFIX)
+    val iv = new Array[Byte](IV_LENGTH_IN_BYTES)
+    is.read(iv, 0, iv.length)
+    val credentials = SparkHadoopUtil.get.getCurrentUserCredentials()
+    val key = credentials.getSecretKey(SPARK_IO_TOKEN)
+    val transformationStr = sparkConf.get(IO_CRYPTO_CIPHER_TRANSFORMATION)
+    new CryptoInputStream(transformationStr, properties, is,
+      new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
+  }
+
+  /**
+   * Get Commons-crypto configurations from Spark configurations identified by prefix.
+   */
+  def toCryptoConf(
+      conf: SparkConf,
+      sparkPrefix: String,
+      cryptoPrefix: String): Properties = {
--- End diff --

nit: you don't need `sparkPrefix` and `cryptoPrefix` any more.
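The prefix remapping that `toCryptoConf` performs (collect the Spark-side keys, re-key them under the Commons Crypto prefix) can be sketched in a few lines. This is an illustrative standalone Python version, not Spark's implementation; the function name and sample config key are assumptions for the example.

```python
def to_crypto_conf(conf,
                   spark_prefix="spark.io.encryption.commons.config.",
                   crypto_prefix="commons.crypto."):
    """Collect entries under spark_prefix and re-key them under crypto_prefix."""
    props = {}
    for key, value in conf.items():
        if key.startswith(spark_prefix):
            props[crypto_prefix + key[len(spark_prefix):]] = value
    return props

conf = {
    "spark.io.encryption.commons.config.secure.random.classes": "OS_RANDOM",
    "spark.master": "local[*]",  # unrelated keys are ignored
}
print(to_crypto_conf(conf))
```

The reviewer's nit follows from this shape: once both prefixes are fixed constants, they no longer need to be parameters.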
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14637 Merged build finished. Test FAILed.
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14637 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64430/ Test FAILed.
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14637 **[Test build #64430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64430/consoleFull)** for PR 14637 at commit [`09f3197`](https://github.com/apache/spark/commit/09f3197e7cac9a45315bf5bdaed57c97bcd0e46d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14814: [SPARK-17242][Document]Update links of external dstream ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14814 **[Test build #64434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64434/consoleFull)** for PR 14814 at commit [`17bb37e`](https://github.com/apache/spark/commit/17bb37e529b69823858d3e5edb0891a1ba6c9205).
[GitHub] spark pull request #14814: [SPARK-17242][Document]Update links of external d...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/14814

[SPARK-17242][Document]Update links of external dstream projects

## What changes were proposed in this pull request?

Updated links of external dstream projects.

## How was this patch tested?

Just document changes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark dstream-link

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14814.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14814

commit 17bb37e529b69823858d3e5edb0891a1ba6c9205
Author: Shixiong Zhu
Date: 2016-08-25T20:16:34Z

    Update links of external dstream projects
[GitHub] spark issue #14813: [SPARK-17240][core] Make SparkConf serializable again.
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/14813 thanks! LGTM
[GitHub] spark issue #14813: [SPARK-17240][core] Make SparkConf serializable again.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14813 **[Test build #64433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64433/consoleFull)** for PR 14813 at commit [`45cf302`](https://github.com/apache/spark/commit/45cf3028e778f9685224612829814a108932242c).
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14637 LGTM now.
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76313519

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala ---
@@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with
     case (v: Short, ShortType) => v + "S"
     case (v: Long, LongType) => v + "L"
     // Float type doesn't have a suffix
-    case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})"
-    case (v: Double, DoubleType) => v + "D"
+    case (v: Float, FloatType) =>
+      val castedValue = v match {
+        case _ if v.isNaN => "'NaN'"
+        case Float.PositiveInfinity => "'Infinity'"
+        case Float.NegativeInfinity => "'-Infinity'"
+        case _ => v
+      }
+      s"CAST($castedValue AS ${FloatType.sql})"
+    case (v: Double, DoubleType) =>
+      v match {
+        case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})"
+        case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})"
+        case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})"
+        case _ => v + "D"
+      }
     case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})"
--- End diff --

According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-FloatingPointTypes:

> Floating point literals are assumed to be DOUBLE. Scientific notation is not yet supported.

However, the professed lack of support for scientific notation seems to be contradicted by https://issues.apache.org/jira/browse/HIVE-2536 and manual tests. Here's a test query which demonstrates the precision issues in decimal literals:

```
SELECT
  CAST(-0.06688467811848818630 as DECIMAL(38, 36)),
  CAST(-6.688467811848818630E-18 AS DECIMAL(38, 36))
```

In Hive, these both behave equivalently: both forms of the number are interpreted as doubles, so we lose precision and both cases wind up as `0.06688467811848818` (with the final three digits lost).

In Spark 2.0, the first expanded form is parsed as a decimal literal, while the scientific-notation form is parsed as a double, so the expanded form correctly preserves the decimal while the scientific notation causes precision loss (as in Hive). I think there are two possible fixes here: we could either emit the fully expanded form or update Spark's parser to treat scientific-notation floating-point literals as decimals. From a consistency standpoint, I'm in favor of the latter approach because I don't think it makes sense for `1.1` and `1.1e0` to be treated differently.

Given all of this, I think that it would certainly be _safe_ to emit fully expanded forms of the decimal, but I'm not sure this is the optimal fix because it doesn't resolve the inconsistencies between Spark and Hive, and it results in really ugly, hard-to-read expressions.
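The precision loss described above is a property of IEEE-754 doubles, not of Hive specifically, and can be reproduced outside of SQL. A small standalone Python sketch (not Spark or Hive code) with a 20-significant-digit literal:

```python
from decimal import Decimal

literal = "-0.06688467811848818630"  # 20 significant digits

as_double = float(literal)     # IEEE-754 double: ~15-17 significant digits survive
as_decimal = Decimal(literal)  # arbitrary-precision: all digits survive

print(repr(as_double))   # trailing digits are lost
print(as_decimal)        # round-trips exactly
```

This is why parsing a literal as a double before casting to `DECIMAL(38, 36)` discards the final digits, whereas parsing it as a decimal literal preserves them.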
[GitHub] spark pull request #14813: [SPARK-17240][core] Make SparkConf serializable a...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/14813

[SPARK-17240][core] Make SparkConf serializable again.

Make the config reader transient, and initialize it lazily so that serialization works with both Java and Kryo (and hopefully any other custom serializer). Added a unit test to make sure SparkConf remains serializable and the reader works with both built-in serializers.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-17240

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14813.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14813

commit 45cf3028e778f9685224612829814a108932242c
Author: Marcelo Vanzin
Date: 2016-08-25T19:49:52Z

    [SPARK-17240][core] Make SparkConf serializable again. Make the config reader transient, and initialize it lazily so that serialization works with both java and kryo (and hopefully any other custom serializer). Added unit test to make sure SparkConf remains serializable and the reader works with both built-in serializers.
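The transient-plus-lazy pattern the patch describes (exclude the reader from serialization, rebuild it on first use after deserialization) can be sketched with Python's pickle. The class and field names here are illustrative stand-ins, not Spark's actual code:

```python
import pickle

class Conf:
    def __init__(self):
        self.settings = {"spark.app.name": "demo"}
        self._reader = None  # "transient": never serialized, rebuilt lazily

    @property
    def reader(self):
        # Lazy init: works after construction and after deserialization alike.
        if self._reader is None:
            self._reader = dict(self.settings)
        return self._reader

    def __getstate__(self):
        # Drop the transient field so every serializer sees a plain dict.
        state = self.__dict__.copy()
        state["_reader"] = None
        return state

conf = Conf()
_ = conf.reader  # force initialization before serializing
restored = pickle.loads(pickle.dumps(conf))
print(restored.reader["spark.app.name"])
```

The same idea applies regardless of serializer: because the transient field is always reset to a sentinel before writing, any round-trip (Java, Kryo, or here pickle) produces an object whose reader re-initializes itself on first access.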
[GitHub] spark issue #14812: [SPARK-17237][SQL] Remove unnecessary backticks in a piv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14812 **[Test build #64432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64432/consoleFull)** for PR 14812 at commit [`530d5c0`](https://github.com/apache/spark/commit/530d5c03b414d9743944d532ecb9e9bd1c0bf5a5).
[GitHub] spark pull request #14812: [SPARK-17237][SQL] Remove unnecessary backticks i...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/14812

[SPARK-17237][SQL] Remove unnecessary backticks in a pivot result schema

## What changes were proposed in this pull request?

A schema of pivot results has nested backticks (e.g. \`3_count(\`c\`)\`). Since `Dataset#resolve` cannot handle the nested backticks, these column references fail after pivoting. This PR removes the unnecessary backticks.

## How was this patch tested?

Added a test in `DataFrameAggregateSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-17237

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14812.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14812

commit 530d5c03b414d9743944d532ecb9e9bd1c0bf5a5
Author: Takeshi YAMAMURO
Date: 2016-08-25T19:17:50Z

    Fix a bug to handle missing data after pivoting
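The nested-backtick problem comes from quoting a column name that already contains a quoted inner identifier. A toy illustration in Python (the `quote` helper is hypothetical, mimicking backtick-escaping for identifiers, and is not the actual Spark code):

```python
def quote(name):
    # Quote an identifier, escaping any embedded backticks by doubling them.
    return "`" + name.replace("`", "``") + "`"

agg_column = "count(`c`)"        # already contains a quoted inner identifier
bad = quote("3_" + agg_column)   # re-quoting nests backticks: `3_count(``c``)`
good = "3_" + agg_column         # keep the composed name as-is

print(bad)
print(good)
```

A resolver that does not understand the doubled-backtick escape cannot match `bad` back to the column, which is the failure mode the PR describes; the fix is to stop re-quoting the already-composed name.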
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user ooq commented on the issue: https://github.com/apache/spark/pull/14176 @davies I guess there is still benefit to make it public? If the user knows that their workload would always run faster with single-level, e.g., many distinct keys. I thought about `spark.sql.codegen.aggregate.map.fast.enable` or `spark.sql.codegen.aggregate.map.codegen.enable`, but none of them captures the fact that the biggest distinction is the two-level design.
[GitHub] spark issue #14811: [SPARK-17231][CORE] Avoid building debug or trace log me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14811 **[Test build #64431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64431/consoleFull)** for PR 14811 at commit [`e44d943`](https://github.com/apache/spark/commit/e44d94316e1641ea7db34efab5f0d669090d2599).
[GitHub] spark issue #14798: [SPARK-17231][CORE] Avoid building debug or trace log me...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14798 @zsxwing PR #14811 is a backport of this PR to `branch-2.0`.
[GitHub] spark pull request #14811: [SPARK-17231][CORE] Avoid building debug or trace...
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/14811

[SPARK-17231][CORE] Avoid building debug or trace log messages unless

This is simply a backport of #14798 to `branch-2.0`. This backport omits the change to `ExternalShuffleBlockHandler.java`. In `branch-2.0`, that file does not contain the log message that was patched in `master`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/VideoAmp/spark-public spark-17231-logging_perf_improvements-2.0_backport

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14811.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14811

commit e44d94316e1641ea7db34efab5f0d669090d2599
Author: Michael Allman
Date: 2016-08-25T19:06:45Z

    [SPARK-17231][CORE] Avoid building debug or trace log messages unless the respective log level is enabled
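The optimization being backported is the classic guard against eagerly building log strings that a disabled level would discard anyway. A standalone Python sketch of the same idea (illustrative, not the Spark patch itself), counting how often the costly message computation actually runs:

```python
import logging

logging.basicConfig(level=logging.INFO)  # DEBUG is disabled
log = logging.getLogger("demo")

calls = {"n": 0}

def expensive_repr():
    # Stands in for a costly message computation (string building, traversal, ...).
    calls["n"] += 1
    return "big-state"

# Unguarded: the message is built even though DEBUG is disabled.
log.debug("state: " + expensive_repr())

# Guarded: the costly work is skipped entirely when DEBUG is off.
if log.isEnabledFor(logging.DEBUG):
    log.debug("state: " + expensive_repr())

print(calls["n"])  # only the unguarded call paid the cost
```

In Scala, `Logging.logDebug(msg: => String)` achieves the same effect with a by-name parameter, so the message expression is only evaluated when the level check passes.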
[GitHub] spark pull request #14774: [SparkR][BUILD]:ignore cran-check.out under R fol...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14774
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76305691

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala ---
@@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with
     case (v: Short, ShortType) => v + "S"
     case (v: Long, LongType) => v + "L"
     // Float type doesn't have a suffix
-    case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})"
-    case (v: Double, DoubleType) => v + "D"
+    case (v: Float, FloatType) =>
+      val castedValue = v match {
+        case _ if v.isNaN => "'NaN'"
+        case Float.PositiveInfinity => "'Infinity'"
+        case Float.NegativeInfinity => "'-Infinity'"
+        case _ => v
+      }
+      s"CAST($castedValue AS ${FloatType.sql})"
+    case (v: Double, DoubleType) =>
+      v match {
+        case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})"
+        case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})"
+        case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})"
+        case _ => v + "D"
+      }
     case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})"
--- End diff --

Actually, let me go ahead and quickly confirm whether Hive will support full expansion...
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r76305079

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -237,21 +237,27 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
       new Path(metastoreRelation.catalogTable.storage.locationUri.get),
       partitionSpec)

-    val inferredSchema = if (fileType.equals("parquet")) {
-      val inferredSchema =
-        defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
-      inferredSchema.map { inferred =>
-        ParquetFileFormat.mergeMetastoreParquetSchema(metastoreSchema, inferred)
-      }.getOrElse(metastoreSchema)
-    } else {
-      defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles()).get
+    val schema = fileType match {
+      case "parquet" =>
+        val inferredSchema =
+          defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
+
+        // For Parquet, get correct schema by merging Metastore schema data types
--- End diff --

I think we have a test. @liancheng should have more info. But, one clarification is that this merging is based on column name (we do take care of the case sensitivity issue, though). So, if you want to give a column a different name, I think that is not doable.
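The name-based merge yhuai describes can be sketched as follows (a simplified illustration, not Spark's actual `mergeMetastoreParquetSchema`; `Field` is a stand-in for `StructField`): the metastore supplies the column names and order, and a Parquet-inferred type is adopted only when a column with the same case-insensitive name exists on both sides, which is why a rename cannot survive the merge.

```scala
// Simplified name-based schema merge: metastore names/order win, inferred
// types are taken for matching (case-insensitive) names.
case class Field(name: String, dataType: String)

def mergeByName(metastore: Seq[Field], inferred: Seq[Field]): Seq[Field] = {
  val inferredByName = inferred.map(f => f.name.toLowerCase -> f.dataType).toMap
  metastore.map { m =>
    m.copy(dataType = inferredByName.getOrElse(m.name.toLowerCase, m.dataType))
  }
}

val merged = mergeByName(
  Seq(Field("ID", "int"), Field("name", "string")),
  Seq(Field("id", "bigint")))
println(merged)  // List(Field(ID,bigint), Field(name,string))
```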
[GitHub] spark pull request #14798: [SPARK-17231][CORE] Avoid building debug or trace...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14798
[GitHub] spark issue #14798: [SPARK-17231][CORE] Avoid building debug or trace log me...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14798 Will do
[GitHub] spark issue #14798: [SPARK-17231][CORE] Avoid building debug or trace log me...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14798 @mallman It has some conflicts with 2.0. Could you submit another PR for branch 2.0, please? Thanks!
[GitHub] spark issue #14798: [SPARK-17231][CORE] Avoid building debug or trace log me...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14798 LGTM. I was just thinking to work on this yesterday! Thanks, merging to master and 2.0.
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14637 **[Test build #64430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64430/consoleFull)** for PR 14637 at commit [`09f3197`](https://github.com/apache/spark/commit/09f3197e7cac9a45315bf5bdaed57c97bcd0e46d).
[GitHub] spark pull request #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r76298874

--- Diff: yarn/src/test/scala/org/apache/spark/security/IOEncryptionSuite.scala ---
@@ -0,0 +1,332 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.security
--- End diff --

This is still in the "yarn" module. Weren't you going to move it to "core"? (As in the physical location of the file, not the scala package name.)
[GitHub] spark pull request #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r76298505

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -413,6 +414,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
     }
     if (master == "yarn" && deployMode == "client") System.setProperty("SPARK_YARN_MODE", "true")

+    if (_conf.get(IO_ENCRYPTION_ENABLED) && !SparkHadoopUtil.get.isYarnMode()) {
+      throw new SparkException("IO encryption is only supported in YARN mode, please disable it " +
+        "by setting spark.io.encryption.enabled to false")
--- End diff --

nit: use `${IO_ENCRYPTION_ENABLED.key}` instead of the hardcoded key name.
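vanzin's nit can be illustrated with a minimal sketch (`ConfigEntry` here is a stand-in for Spark's internal config machinery, and `checkIoEncryption` a hypothetical helper): interpolating the entry's `key` field keeps the error message in sync if the key is ever renamed.

```scala
// Minimal sketch of the suggested fix: reference the config entry's key
// instead of hardcoding the string "spark.io.encryption.enabled".
case class ConfigEntry[T](key: String)
val IO_ENCRYPTION_ENABLED = ConfigEntry[Boolean]("spark.io.encryption.enabled")

def checkIoEncryption(enabled: Boolean, isYarnMode: Boolean): Unit = {
  if (enabled && !isYarnMode) {
    throw new IllegalArgumentException(
      s"IO encryption is only supported in YARN mode, please disable it " +
        s"by setting ${IO_ENCRYPTION_ENABLED.key} to false")
  }
}

checkIoEncryption(enabled = false, isYarnMode = false)  // no-op: encryption off
```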
[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14239 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64427/ Test PASSed.
[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14239 Merged build finished. Test PASSed.
[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14239 **[Test build #64427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64427/consoleFull)** for PR 14239 at commit [`190d7fa`](https://github.com/apache/spark/commit/190d7fa8e8e2b0795e12eebba568be7428647f68). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13852: [SPARK-16200][SQL] Rename AggregateFunction#suppo...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13852#discussion_r76292731

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala ---
@@ -45,7 +45,7 @@ abstract class Collect extends ImperativeAggregate {

   override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType)

-  override def supportsPartial: Boolean = false
+  override def forceSortAggregate: Boolean = true
--- End diff --

yea. Either way, it seems partial aggregation becomes meaningless in the future.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14710 Merged build finished. Test FAILed.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64429/ Test FAILed.
[GitHub] spark pull request #14637: [SPARK-16967] move mesos to module
Github user mgummelt commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r76292137

--- Diff: dev/create-release/release-build.sh ---
@@ -186,12 +186,13 @@ if [[ "$1" == "package" ]]; then
   # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds
   # share the same Zinc server.
-  make_binary_release "hadoop2.3" "-Psparkr -Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn" "3033" &
-  make_binary_release "hadoop2.4" "-Psparkr -Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn" "3034" &
-  make_binary_release "hadoop2.6" "-Psparkr -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn" "3035" &
-  make_binary_release "hadoop2.7" "-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn" "3036" &
-  make_binary_release "hadoop2.4-without-hive" "-Psparkr -Phadoop-2.4 -Pyarn" "3037" &
-  make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn" "3038" &
+  FLAGS="-Psparkr -Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn -Pmesos"
+  make_binary_release "hadoop2.3" "$FLAGS" "3033" &
+  make_binary_release "hadoop2.4" "$FLAGS" "3034" &
--- End diff --

ah, yea. fixing...
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14710 **[Test build #64429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64429/consoleFull)** for PR 14710 at commit [`380291b`](https://github.com/apache/spark/commit/380291b7122aaf1fab461a07d72f0c285696c967). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12004: [SPARK-7481][build] [WIP] Add Hadoop 2.6+ spark-cloud mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12004 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64428/ Test FAILed.
[GitHub] spark issue #12004: [SPARK-7481][build] [WIP] Add Hadoop 2.6+ spark-cloud mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12004 Merged build finished. Test FAILed.
[GitHub] spark issue #12004: [SPARK-7481][build] [WIP] Add Hadoop 2.6+ spark-cloud mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12004 **[Test build #64428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64428/consoleFull)** for PR 12004 at commit [`b25d497`](https://github.com/apache/spark/commit/b25d49701b4015b49efc6c89734301525d803524). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14637: [SPARK-16967] move mesos to module
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14637#discussion_r76291436

--- Diff: dev/create-release/release-build.sh ---
@@ -186,12 +186,13 @@ if [[ "$1" == "package" ]]; then
   # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds
   # share the same Zinc server.
-  make_binary_release "hadoop2.3" "-Psparkr -Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn" "3033" &
-  make_binary_release "hadoop2.4" "-Psparkr -Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn" "3034" &
-  make_binary_release "hadoop2.6" "-Psparkr -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn" "3035" &
-  make_binary_release "hadoop2.7" "-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn" "3036" &
-  make_binary_release "hadoop2.4-without-hive" "-Psparkr -Phadoop-2.4 -Pyarn" "3037" &
-  make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn" "3038" &
+  FLAGS="-Psparkr -Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn -Pmesos"
+  make_binary_release "hadoop2.3" "$FLAGS" "3033" &
+  make_binary_release "hadoop2.4" "$FLAGS" "3034" &
--- End diff --

This is wrong now; "FLAGS" enables "-Phadoop-2.3" when here it should be "-Phadoop-2.4" (and matching versions in the lines below).
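A sketch of the fix vanzin is asking for: keep the shared profiles in one variable but pass the Hadoop profile per release, instead of baking `-Phadoop-2.3` into every binary package (`BASE_FLAGS` is a name introduced here, and `make_binary_release` is stubbed out for illustration — the real one lives in release-build.sh).

```shell
#!/usr/bin/env bash
# Shared profiles declared once; the Hadoop profile varies per binary release.
BASE_FLAGS="-Psparkr -Phive -Phive-thriftserver -Pyarn -Pmesos"

# Stub standing in for the real make_binary_release from release-build.sh:
# it just prints what would be built.
make_binary_release() {
  echo "building $1 with: $2 (zinc port $3)"
}

make_binary_release "hadoop2.3" "$BASE_FLAGS -Phadoop-2.3" "3033"
make_binary_release "hadoop2.4" "$BASE_FLAGS -Phadoop-2.4" "3034"
make_binary_release "hadoop2.6" "$BASE_FLAGS -Phadoop-2.6" "3035"
make_binary_release "hadoop2.7" "$BASE_FLAGS -Phadoop-2.7" "3036"
```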
[GitHub] spark pull request #13852: [SPARK-16200][SQL] Rename AggregateFunction#suppo...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13852#discussion_r76291119

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala ---
@@ -45,7 +45,7 @@ abstract class Collect extends ImperativeAggregate {

   override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType)

-  override def supportsPartial: Boolean = false
+  override def forceSortAggregate: Boolean = true
--- End diff --

oh, after changing this name, it will no longer show that we do not do partial aggregation for this function.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14753 Merged build finished. Test PASSed.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14753 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64426/ Test PASSed.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14753 **[Test build #64426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64426/consoleFull)** for PR 14753 at commit [`ca574e1`](https://github.com/apache/spark/commit/ca574e145543c6fc555220fa8080bf7fbe152ba5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14710 **[Test build #64429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64429/consoleFull)** for PR 14710 at commit [`380291b`](https://github.com/apache/spark/commit/380291b7122aaf1fab461a07d72f0c285696c967).
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14710 retest this please
[GitHub] spark pull request #8880: [SPARK-5682][Core] Add encrypted shuffle in spark
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r76289131

--- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnIOEncryptionSuite.scala ---
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.yarn
+
+import java.io._
+import java.nio.ByteBuffer
+import java.security.PrivilegedExceptionAction
+import java.util.{ArrayList => JArrayList, LinkedList => JLinkedList, UUID}
+
+import scala.runtime.AbstractFunction1
+
+import com.google.common.collect.HashMultiset
+import com.google.common.io.ByteStreams
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.junit.Assert.assertEquals
+import org.mockito.Mock
+import org.mockito.MockitoAnnotations
+import org.mockito.invocation.InvocationOnMock
+import org.mockito.stubbing.Answer
+import org.mockito.Answers.RETURNS_SMART_NULLS
+import org.mockito.Matchers.{eq => meq, _}
+import org.mockito.Mockito._
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach, Matchers}
+
+import org.apache.spark._
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.executor.{ShuffleWriteMetrics, TaskMetrics}
+import org.apache.spark.internal.config._
+import org.apache.spark.io.CompressionCodec
+import org.apache.spark.memory.{TaskMemoryManager, TestMemoryManager}
+import org.apache.spark.network.buffer.NioManagedBuffer
+import org.apache.spark.network.util.LimitedInputStream
+import org.apache.spark.security.CryptoStreamUtils
+import org.apache.spark.serializer._
+import org.apache.spark.shuffle._
+import org.apache.spark.shuffle.sort.{SerializedShuffleHandle, UnsafeShuffleWriter}
+import org.apache.spark.storage._
+import org.apache.spark.util.Utils
+
+private[spark] class YarnIOEncryptionSuite extends SparkFunSuite with Matchers with
--- End diff --

> Do you mean we need not unset this ENV variable in the tear down block?

I mean that if you change the check in SparkContext to not throw an exception when "spark.testing" is set, you shouldn't need to set/unset "SPARK_YARN_MODE" in the test.
[GitHub] spark issue #14801: [SPARK-17234] [SQL] Table Existence Checking when Index ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14801 Sure, will revert it back and use the existing `AnalysisException`. Thanks!
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14809 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64425/ Test PASSed.
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14809 Merged build finished. Test PASSed.
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14809 **[Test build #64425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64425/consoleFull)** for PR 14809 at commit [`915d2b5`](https://github.com/apache/spark/commit/915d2b5a1dd8c26a37d0b99ba0503a0d95b6f3f3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76284138

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala ---
@@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with
     case (v: Short, ShortType) => v + "S"
     case (v: Long, LongType) => v + "L"
     // Float type doesn't have a suffix
-    case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})"
-    case (v: Double, DoubleType) => v + "D"
+    case (v: Float, FloatType) =>
+      val castedValue = v match {
+        case _ if v.isNaN => "'NaN'"
+        case Float.PositiveInfinity => "'Infinity'"
+        case Float.NegativeInfinity => "'-Infinity'"
+        case _ => v
+      }
+      s"CAST($castedValue AS ${FloatType.sql})"
+    case (v: Double, DoubleType) =>
+      v match {
+        case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})"
+        case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})"
+        case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})"
+        case _ => v + "D"
+      }
     case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})"
--- End diff --

Hmmm, as discussed, that's going to look very ugly but might be more compatible with Postgres and won't be lossy for very precise decimals. I say that we defer to a follow-up for now.
[GitHub] spark pull request #14794: [SPARK-15083][WEB UI] History Server can OOM due ...
Github user ajbozarth closed the pull request at: https://github.com/apache/spark/pull/14794
[GitHub] spark issue #14794: [SPARK-15083][WEB UI] History Server can OOM due to unli...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/14794 Thanks @tgravescs
[GitHub] spark issue #14774: [SparkR][BUILD]:ignore cran-check.out under R folder
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14774 LGTM. Thanks @wangmiao1981
[GitHub] spark issue #14794: [SPARK-15083][WEB UI] History Server can OOM due to unli...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/14794 +1
[GitHub] spark issue #14798: [SPARK-17231][CORE] Avoid building debug or trace log me...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14798 I focused mainly on trace and debug logging. I didn't do much with errors or warnings, especially where exceptions are logged. I'm assuming these are less frequent, and the cost of building those log messages is insignificant compared to the circumstances which called for them in the first place.
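The general technique behind the PR — only build the message when the level is actually enabled — can be sketched outside Spark; `log_debug` below is a hypothetical Python illustration (Spark's `Logging` trait achieves the same effect with Scala by-name parameters):

```python
import logging

def log_debug(logger: logging.Logger, make_msg) -> None:
    """Build the message only if DEBUG is enabled. `make_msg` is a
    zero-argument callable, analogous to a Scala by-name parameter."""
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug(make_msg())

calls = []
def expensive() -> str:
    calls.append(1)  # records whether the message string was ever built
    return "state: " + ",".join(str(i) for i in range(5))

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)   # DEBUG is disabled
log_debug(logger, expensive)    # expensive() is never invoked here
```

With the level at INFO, `expensive()` is never called, which is exactly the cost the PR avoids for trace/debug messages.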
[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14785 Please also add test cases for matrices. Thanks.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64422/ Test PASSed.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Merged build finished. Test PASSed.
[GitHub] spark issue #14810: Branch 1.6
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14810 Looks like an error -- close this please
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452

**[Test build #64422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64422/consoleFull)** for PR 14452 at commit [`6cb40f1`](https://github.com/apache/spark/commit/6cb40f12e074e0350aa01778c955b35631160858).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14750 Merged build finished. Test PASSed.
[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14750 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64421/ Test PASSed.
[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14750

**[Test build #64421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64421/consoleFull)** for PR 14750 at commit [`5b41a39`](https://github.com/apache/spark/commit/5b41a3973abbe25bccbeaa2718bb6ef209303bee).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14617 @jerryshao The UI changes look great. I have not had a chance to scrutinize the source changes. Hopefully we can get someone else to help review.
[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...
Github user f7753 commented on the issue: https://github.com/apache/spark/pull/14239 @tgravescs Thank you. Currently I'm not loading all the data into memory; I use the parameter `spark.shuffle.prepare.open` to switch this mechanism on/off and `spark.shuffle.prepare.count` to control the number of blocks to cache. This gives the user control over the memory used for pre-fetched blocks, based on their machine's capacity. The OS cache may not have much impact on this (if my understanding is wrong, please correct me, thanks), since a shuffle block produced by the map side will not be read more than once in a normal job. Once a shuffle block has been consumed by the reduce side it is of no further use, so it may still be sitting in the write buffer. If there is enough memory, this will not slow down reading; if not, we can use the limited memory to preload the data. When a transfer succeeds, the memory buffer is released to load the data the next `FetchRequest` contains, until all the data has been sent to the reduce side. I have implemented and tested this on branches 1.4 and 1.6 using Intel HiBench 4.0 TeraSort with a 1TB data size, and got about a 30% performance improvement on a 5-node cluster, where each node has 96GB memory, a Xeon E5 v3 CPU, and a 7200RPM disk. We could also consult prior work to refine this further, e.g. "HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment". Thanks for your feedback; it would be my pleasure to cooperate on any follow-up work. I love Spark so much.
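The bounded prefetching described above can be sketched as a generator that never holds more than a configured number of fetched-ahead blocks; `prefetch_blocks` below is a hypothetical Python illustration of the idea (keyed off the PR's proposed `spark.shuffle.prepare.count` knob), not the PR's actual implementation:

```python
from collections import deque

def prefetch_blocks(block_ids, fetch, prepare_count=2):
    """Yield shuffle blocks while keeping at most `prepare_count`
    fetched-ahead blocks buffered in memory at any time."""
    buffer = deque()
    ids = iter(block_ids)
    # Prime the buffer up to the configured limit.
    for _ in range(prepare_count):
        try:
            buffer.append(fetch(next(ids)))
        except StopIteration:
            break
    while buffer:
        yield buffer.popleft()                 # hand one block to the reducer...
        try:
            buffer.append(fetch(next(ids)))    # ...and refill behind it
        except StopIteration:
            pass
```

Each block is handed out exactly once, and memory use stays bounded by `prepare_count` regardless of how many blocks the map side produced.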
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user davies commented on the issue: https://github.com/apache/spark/pull/14176 Can we make this `spark.sql.codegen.aggregate.map.twolevel.enable` internal? Otherwise we should have a better name.
[GitHub] spark issue #14810: Branch 1.6
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14810 Can one of the admins verify this patch?
[GitHub] spark pull request #14810: Branch 1.6
GitHub user sujan121 opened a pull request: https://github.com/apache/spark/pull/14810

Branch 1.6

## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14810.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14810

commit 7482c7b5aba5b649510bbb8886bbf2b44f86f543
Author: Shixiong Zhu
Date: 2016-01-18T23:38:03Z

    [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc

    This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc.

    Author: Shixiong Zhu
    Closes #10746 from zsxwing/flume-doc.
    (cherry picked from commit a973f483f6b819ed4ecac27ff5c064ea13a8dd71)
    Signed-off-by: Tathagata Das

commit d43704d7fc6a5e9da4968b1dafa8d4b1c341ee8d
Author: Shixiong Zhu
Date: 2016-01-19T00:50:05Z

    [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc

    This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc.

    Author: Shixiong Zhu
    Closes #10822 from zsxwing/kinesis-doc.
    (cherry picked from commit 721845c1b64fd6e3b911bd77c94e01dc4e5fd102)
    Signed-off-by: Tathagata Das

commit 68265ac23e20305474daef14bbcf874308ca8f5a
Author: Wenchen Fan
Date: 2016-01-19T05:20:19Z

    [SPARK-12841][SQL][BRANCH-1.6] fix cast in filter

    In SPARK-10743 we wrap cast with `UnresolvedAlias` to give `Cast` a better alias if possible. However, for cases like filter, the `UnresolvedAlias` can't be resolved and actually we don't need a better alias for this case. This PR moves the cast wrapping logic to `Column.named` so that we will only do it when we need an alias name.

    backport https://github.com/apache/spark/pull/10781 to 1.6

    Author: Wenchen Fan
    Closes #10819 from cloud-fan/bug.

commit 30f55e5232d85fd070892444367d2bb386dfce13
Author: proflin
Date: 2016-01-19T08:15:43Z

    [SQL][MINOR] Fix one little mismatched comment according to the codes in interface.scala

    Author: proflin
    Closes #10824 from proflin/master.
    (cherry picked from commit c00744e60f77edb238aff1e30b450dca65451e91)
    Signed-off-by: Reynold Xin

commit 962e618ec159f8cd26543f42b2ce484fd5a5d8c5
Author: Wojciech Jurczyk
Date: 2016-01-19T09:36:45Z

    [MLLIB] Fix CholeskyDecomposition assertion's message

    Change assertion's message so it's consistent with the code. The old message says that the invoked method was lapack.dports, where in fact it was the lapack.dppsv method.

    Author: Wojciech Jurczyk
    Closes #10818 from wjur/wjur/rename_error_message.
    (cherry picked from commit ebd9ce0f1f55f7d2d3bd3b92c4b0a495c51ac6fd)
    Signed-off-by: Sean Owen

commit 40fa21856aded0e8b0852cdc2d8f8bc577891908
Author: Josh Rosen
Date: 2016-01-21T00:10:28Z

    [SPARK-12921] Use SparkHadoopUtil reflection in SpecificParquetRecordReaderBase

    It looks like there's one place left in the codebase, SpecificParquetRecordReaderBase, where we didn't use SparkHadoopUtil's reflective accesses of TaskAttemptContext methods, which could create problems when using a single Spark artifact with both Hadoop 1.x and 2.x.

    Author: Josh Rosen
    Closes #10843 from JoshRosen/SPARK-12921.

commit b5d7dbeb3110a11716f6642829f4ea14868ccc8a
Author: Liang-Chi Hsieh
Date: 2016-01-22T02:55:28Z

    [SPARK-12747][SQL] Use correct type name for Postgres JDBC's real array

    https://issues.apache.org/jira/browse/SPARK-12747
    Postgres JDBC driver uses "FLOAT4" or "FLOAT8" not "real".

    Author: Liang-Chi Hsieh
    Closes #10695 from viirya/fix-postgres-jdbc.
    (cherry picked from commit 55c7dd031b8a58976922e469626469aa4aff1391)
    Signed-off-by: Reynold Xin

commit
[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14785 Merged build finished. Test PASSed.
[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14785 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64424/ Test PASSed.
[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14785

**[Test build #64424 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64424/consoleFull)** for PR 14785 at commit [`1ec924c`](https://github.com/apache/spark/commit/1ec924cf8ca1bbe68fd5e700550dfa422e445b59).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #12004: [SPARK-7481][build] [WIP] Add Hadoop 2.6+ spark-cloud mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12004 **[Test build #64428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64428/consoleFull)** for PR 12004 at commit [`b25d497`](https://github.com/apache/spark/commit/b25d49701b4015b49efc6c89734301525d803524).
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Merged build finished. Test PASSed.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64420/ Test PASSed.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452

**[Test build #64420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64420/consoleFull)** for PR 14452 at commit [`6a8011b`](https://github.com/apache/spark/commit/6a8011bc9dfa3289e98a5efe65e92704b85bb4b5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14239 **[Test build #64427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64427/consoleFull)** for PR 14239 at commit [`190d7fa`](https://github.com/apache/spark/commit/190d7fa8e8e2b0795e12eebba568be7428647f68).
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14809 **[Test build #64425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64425/consoleFull)** for PR 14809 at commit [`915d2b5`](https://github.com/apache/spark/commit/915d2b5a1dd8c26a37d0b99ba0503a0d95b6f3f3).
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14753 @hvanhovell This is supposed to work with window functions.
[GitHub] spark issue #14753: [SPARK-17187][SQL] Supports using arbitrary Java object ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14753 **[Test build #64426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64426/consoleFull)** for PR 14753 at commit [`ca574e1`](https://github.com/apache/spark/commit/ca574e145543c6fc555220fa8080bf7fbe152ba5).
[GitHub] spark issue #14809: [SPARK-17238][SQL] simplify the logic for converting dat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14809 cc @yhuai @gatorsmile @liancheng @clockfly
[GitHub] spark pull request #14809: [SPARK-17238][SQL] simplify the logic for convert...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/14809

[SPARK-17238][SQL] simplify the logic for converting data source table into hive compatible format

## What changes were proposed in this pull request?

Previously we had 2 conditions to decide whether a data source table is hive-compatible:
1. the data source is file-based and has a corresponding Hive serde
2. there is a `path` entry in the data source options/storage properties

However, if condition 1 is true, condition 2 must be true too, as we will put the default table path into the data source options/storage properties for managed data source tables. There is also a potential issue: we will set the `locationUri` even for managed tables. This PR removes condition 2 and only sets the `locationUri` for external data source tables.

Note: this is also a first step to unify the `path` of data source tables and the `locationUri` of hive serde tables. For hive serde tables, `locationUri` is only set for external tables. For data source tables, `path` is always set. We can make them consistent after this PR.

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14809.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14809

commit 915d2b5a1dd8c26a37d0b99ba0503a0d95b6f3f3
Author: Wenchen Fan
Date: 2016-08-25T15:11:23Z

    simplify the logic for converting data source table into hive compatible format
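The simplified rule in the PR description can be sketched as a small decision function; `hive_storage` below is a hypothetical Python illustration of that description, not the PR's Scala code:

```python
def hive_storage(has_hive_serde: bool, is_external: bool, table_path: str):
    """Decide hive-compatible storage for a data source table, per the PR's
    simplified rule (illustrative sketch): serde availability alone decides
    compatibility, and locationUri is set only for external tables."""
    if not has_hive_serde:
        return None  # no corresponding Hive serde: not hive-compatible
    return {"locationUri": table_path if is_external else None}
```

Under this sketch, a managed table stays hive-compatible but leaves `locationUri` unset, matching the PR's stated fix.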
[GitHub] spark issue #14786: [SPARK-17212][SQL] TypeCoercion supports widening conver...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14786 Merged build finished. Test PASSed.
[GitHub] spark issue #14786: [SPARK-17212][SQL] TypeCoercion supports widening conver...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14786 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64419/ Test PASSed.
[GitHub] spark issue #14786: [SPARK-17212][SQL] TypeCoercion supports widening conver...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14786

**[Test build #64419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64419/consoleFull)** for PR 14786 at commit [`d035eb3`](https://github.com/apache/spark/commit/d035eb3ba725250f6238a5b8a189b6749065cf95).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14753#discussion_r76264528

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/TypedImperativeAggregateSuite.scala ---
@@ -0,0 +1,300 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
+
+import org.apache.spark.sql.TypedImperativeAggregateSuite.TypedMax
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{BoundReference, Expression, GenericMutableRow, SpecificMutableRow}
+import org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate
+import org.apache.spark.sql.execution.aggregate.SortAggregateExec
+import org.apache.spark.sql.expressions.Window
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType, IntegerType, LongType}
+
+class TypedImperativeAggregateSuite extends QueryTest with SharedSQLContext {
+
+  import testImplicits._
+
+  private val random = new java.util.Random()
+
+  private val data = (0 until 1000).map { _ =>
+    (random.nextInt(10), random.nextInt(100))
+  }
+
+  test("aggregate with object aggregate buffer") {
+    val agg = new TypedMax(BoundReference(0, IntegerType, nullable = false))
+
+    val group1 = (0 until data.length / 2)
+    val group1Buffer = agg.createAggregationBuffer()
+    group1.foreach { index =>
+      val input = InternalRow(data(index)._1, data(index)._2)
+      agg.update(group1Buffer, input)
+    }
+
+    val group2 = (data.length / 2 until data.length)
+    val group2Buffer = agg.createAggregationBuffer()
+    group2.foreach { index =>
+      val input = InternalRow(data(index)._1, data(index)._2)
+      agg.update(group2Buffer, input)
+    }
+
+    val mergeBuffer = agg.createAggregationBuffer()
+    agg.merge(mergeBuffer, group1Buffer)
+    agg.merge(mergeBuffer, group2Buffer)
+
+    assert(mergeBuffer.value == data.map(_._1).max)
+    assert(agg.eval(mergeBuffer) == data.map(_._1).max)
+
+    // Tests low level eval(row: InternalRow) API.
+    val row = new GenericMutableRow(Array(mergeBuffer): Array[Any])
+
+    // Evaluates directly on row consist of aggregation buffer object.
+    assert(agg.eval(row) == data.map(_._1).max)
+  }
+
+  test("supports SpecificMutableRow as mutable row") {
+    val aggregationBufferSchema = Seq(IntegerType, LongType, BinaryType, IntegerType)
+    val aggBufferOffset = 2
+    val buffer = new SpecificMutableRow(aggregationBufferSchema)
+    val agg = new TypedMax(BoundReference(ordinal = 1, dataType = IntegerType, nullable = false))
+      .withNewMutableAggBufferOffset(aggBufferOffset)
+
+    agg.initialize(buffer)
+    data.foreach { kv =>
+      val input = InternalRow(kv._1, kv._2)
+      agg.update(buffer, input)
+    }
+    assert(agg.eval(buffer) == data.map(_._2).max)
+  }
+
+  test("dataframe aggregate with object aggregate buffer, should not use HashAggregate") {
+    val df = data.toDF("a", "b")
+    val max = new TypedMax($"a".expr)
+
+    // Always uses SortAggregateExec
+    val sparkPlan = df.select(Column(max.toAggregateExpression())).queryExecution.sparkPlan
+    assert(sparkPlan.isInstanceOf[SortAggregateExec])
+  }
+
+  test("dataframe aggregate with object aggregate buffer, no group by") {
+    val df = data.toDF("key", "value").coalesce(2)
+    val query = df.select(typedMax($"key"), count($"key"), typedMax($"value"), count($"value"))
+    val maxKey = data.map(_._1).max
+    val countKey = data.size
+    val maxValue = data.map(_._2).max
+    val countValue = data.size
+    val expected = Seq(Row(maxKey,
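The (truncated) suite above exercises the `createAggregationBuffer`/`update`/`merge`/`eval` contract by partially aggregating two halves of the data and merging the buffers. That contract can be sketched without any Spark dependencies; the `MaxAgg` object and its `Buffer` type here are hypothetical stand-ins for the suite's `TypedMax`, not Spark code:

```scala
// Hypothetical stand-in for the TypedMax aggregate in the quoted test:
// a mutable one-slot buffer tracking the running maximum.
object MaxAgg {
  type Buffer = Array[Long]

  def createAggregationBuffer(): Buffer = Array(Long.MinValue)

  // Fold one input value into the buffer in place.
  def update(buffer: Buffer, input: Int): Unit =
    if (input > buffer(0)) buffer(0) = input.toLong

  // Combine two partial buffers, as the test does with group1/group2.
  def merge(buffer: Buffer, other: Buffer): Unit =
    if (other(0) > buffer(0)) buffer(0) = other(0)

  def eval(buffer: Buffer): Long = buffer(0)
}

val data = (0 until 1000).map(_ => scala.util.Random.nextInt(10))

// Partial-aggregate each half, then merge -- mirroring the test's structure.
val (half1, half2) = data.splitAt(data.length / 2)
val buf1 = MaxAgg.createAggregationBuffer()
half1.foreach(MaxAgg.update(buf1, _))
val buf2 = MaxAgg.createAggregationBuffer()
half2.foreach(MaxAgg.update(buf2, _))

val mergeBuffer = MaxAgg.createAggregationBuffer()
MaxAgg.merge(mergeBuffer, buf1)
MaxAgg.merge(mergeBuffer, buf2)
assert(MaxAgg.eval(mergeBuffer) == data.max)
```

Merging the two partial buffers must give the same answer as aggregating the whole dataset in one pass, which is exactly the property the quoted test checks against `data.map(_._1).max`.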
[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14753#discussion_r76263947

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala ---
@@ -389,3 +389,144 @@ abstract class DeclarativeAggregate
     def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a))
   }
 }
+
+/**
+ * Aggregation function which allows **arbitrary** user-defined java object to be used as internal
+ * aggregation buffer object.
+ *
+ * {{{
+ *                aggregation buffer for normal aggregation function `avg`
+ *                    |
+ *                    v
+ *  +--------------+---------------+-----------------------------------+
+ *  |  sum1 (Long) | count1 (Long) | generic user-defined java objects |
+ *  +--------------+---------------+-----------------------------------+
+ *                                                   ^
+ *                                                   |
+ *    Aggregation buffer object for `TypedImperativeAggregate` aggregation function
+ * }}}
+ *
+ * Work flow (Partial mode aggregate at Mapper side, and Final mode aggregate at Reducer side):
+ *
+ * Stage 1: Partial aggregate at Mapper side:
+ *
+ * 1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *    buffer object.
+ * 2. Upon each input row, the framework calls
+ *    `update(buffer: T, input: InternalRow): Unit` to update the aggregation buffer object T.
+ * 3. After processing all rows of current group (group by key), the framework will serialize
+ *    aggregation buffer object T to storage format (Array[Byte]) and persist the Array[Byte]
+ *    to disk if needed.
+ * 4. The framework moves on to next group, until all groups have been processed.
+ *
+ * Shuffling exchange data to Reducer tasks...
+ *
+ * Stage 2: Final mode aggregate at Reducer side:
+ *
+ * 1. The framework calls `createAggregationBuffer(): T` to create an empty internal aggregation
+ *    buffer object (type T) for merging.
+ * 2. For each aggregation output of Stage 1, The framework de-serializes the storage
+ *    format (Array[Byte]) and produces one input aggregation object (type T).
+ * 3. For each input aggregation object, the framework calls `merge(buffer: T, input: T): Unit`
+ *    to merge the input aggregation object into aggregation buffer object.
+ * 4. After processing all input aggregation objects of current group (group by key), the framework
+ *    calls method `eval(buffer: T)` to generate the final output for this group.
+ * 5. The framework moves on to next group, until all groups have been processed.
+ *
+ * NOTE: SQL with TypedImperativeAggregate functions is planned in sort based aggregation,
+ * instead of hash based aggregation, as TypedImperativeAggregate use BinaryType as aggregation
+ * buffer's storage format, which is not supported by hash based aggregation. Hash based
+ * aggregation only support aggregation buffer of mutable types (like LongType, IntType that have
+ * fixed length and can be mutated in place in UnsafeRow)
+ */
+abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {

--- End diff --

`ImperativeAggregate` only defines the interface. It does not specify what are accepted buffer types, right?
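The two-stage flow described in the quoted doc comment (partial aggregate, serialize to `Array[Byte]` at the shuffle boundary, deserialize and merge on the reducer) can be sketched in plain Scala. The `MaxBuffer` class and the `serialize`/`deserialize` helpers below are hypothetical illustrations of that lifecycle, not Spark's actual implementation:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical user-defined aggregation buffer object (the `T` in the doc comment).
final case class MaxBuffer(var max: Long)

object MaxWorkflow {
  def createAggregationBuffer(): MaxBuffer = MaxBuffer(Long.MinValue)

  def update(buffer: MaxBuffer, input: Long): Unit =
    if (input > buffer.max) buffer.max = input

  def merge(buffer: MaxBuffer, other: MaxBuffer): Unit =
    if (other.max > buffer.max) buffer.max = other.max

  def eval(buffer: MaxBuffer): Long = buffer.max

  // Storage format at the shuffle boundary: Array[Byte], as the NOTE explains.
  def serialize(buffer: MaxBuffer): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new DataOutputStream(bos)
    out.writeLong(buffer.max)
    out.flush()
    bos.toByteArray
  }

  def deserialize(bytes: Array[Byte]): MaxBuffer =
    MaxBuffer(new DataInputStream(new ByteArrayInputStream(bytes)).readLong())
}

// Stage 1 (mapper side): partial-aggregate each partition, then serialize.
val partitions = Seq(Seq(3L, 9L, 1L), Seq(7L, 2L, 8L))
val shuffled: Seq[Array[Byte]] = partitions.map { rows =>
  val buf = MaxWorkflow.createAggregationBuffer()
  rows.foreach(MaxWorkflow.update(buf, _))
  MaxWorkflow.serialize(buf)
}

// Stage 2 (reducer side): deserialize each partial result and merge.
val finalBuffer = MaxWorkflow.createAggregationBuffer()
shuffled.map(MaxWorkflow.deserialize).foreach(MaxWorkflow.merge(finalBuffer, _))
```

Because the buffer crosses the shuffle only in its serialized `Array[Byte]` form, the buffer object itself never needs to fit a fixed-width mutable column type, which is why the doc comment notes that such functions are planned as sort-based rather than hash-based aggregation.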
[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14785 **[Test build #64424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64424/consoleFull)** for PR 14785 at commit [`1ec924c`](https://github.com/apache/spark/commit/1ec924cf8ca1bbe68fd5e700550dfa422e445b59).
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14808 Merged build finished. Test PASSed.
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14808 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64423/ Test PASSed.
[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14808 **[Test build #64423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64423/consoleFull)** for PR 14808 at commit [`ba5a4e2`](https://github.com/apache/spark/commit/ba5a4e2cc14e253ea3465d887c3cf5a2a9d82a80). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class Params(`