[GitHub] spark issue #14256: [SPARK-16620][CORE] Add back the tokenization process in...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14256
Hi, @lw-lin. This seems to resolve SPARK-16613, too. Could you check that? If possible, please add SPARK-16613 to the title, too.
[GitHub] spark issue #14251: [SPARK-16602][SQL] `Nvl` function should support numeric...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14251
**[Test build #62511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62511/consoleFull)** for PR 14251 at commit [`90d6851`](https://github.com/apache/spark/commit/90d6851bead39875769011954f20a9ae2d333853).
[GitHub] spark pull request #14253: [Doc] improve python doc for rdd.histogram and da...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14253
[GitHub] spark issue #14251: [SPARK-16602][SQL] `Nvl` function should support numeric...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14251
Now `findTightestCommonTypeToString` is public, and the test case has been moved and reduced.
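As a hedged illustration of what the numeric/string support in SPARK-16602 means at the SQL level (the exact result types are whatever `findTightestCommonTypeToString` picks; `spark` is an assumed SparkSession, and this snippet is not part of the PR):

```scala
// Hypothetical sanity check: with the fix, nvl over mixed argument types
// should analyze successfully and widen to a common type instead of failing.
spark.sql("SELECT nvl(1, 2.5)").printSchema()   // widened to a common numeric type
spark.sql("SELECT nvl(1, '2')").printSchema()   // promoted all the way to string
```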
[GitHub] spark issue #14255: [MINOR] Fix Java Linter `LineLength` errors
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14255
For easy comparison, `lint-java` results are here.
- https://travis-ci.org/dongjoon-hyun/spark/jobs/145738728 (Current master: [SPARK-16303][DOCS][EXAMPLES] ...)
- https://travis-ci.org/dongjoon-hyun/spark/jobs/145738812 (This PR)
[GitHub] spark issue #14253: [Doc] improve python doc for rdd.histogram and dataframe...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14253
Merging in master/2.0. Thanks.
[GitHub] spark issue #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceGroups s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14222
Ok.
[GitHub] spark issue #14253: [Doc] improve python doc for rdd.histogram and dataframe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14253
**[Test build #3188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3188/consoleFull)** for PR 14253 at commit [`6d8c9aa`](https://github.com/apache/spark/commit/6d8c9aabfc9ce105156fc8eb96b9e35777b03477).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r71278812

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl._
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types._
+
+class SimplifyCastsSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
+  }
+
+  test("non-nullable to non-nullable array cast") {
+    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
+    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
+    val optimized = Optimize.execute(plan)
+    val expected = input.select('a.as("casted")).analyze
+    comparePlans(optimized, expected)
+  }
+
+  test("non-nullable to nullable array cast") {
+    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
+    val array_intPrimitive = 'a.array(ArrayType(IntegerType, false))
+    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
+    val optimized = Optimize.execute(plan)
+    val expected = input.select('a.as("casted")).analyze
+    comparePlans(optimized, expected)
+  }
+
+  test("nullable to non-nullable array cast") {
+    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
+    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
+    val optimized = Optimize.execute(plan)
+    comparePlans(optimized, plan)
+  }
+
+  test("nullable to nullable array cast") {
+    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
+    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
+    val optimized = Optimize.execute(plan)
+    val expected = input.select('a.as("casted")).analyze
+    comparePlans(optimized, expected)
+  }
+
+  def map(keyType: DataType, valueType: DataType, nullable: Boolean): AttributeReference =
--- End diff --

This is because the [current map()](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L246) cannot pass information on `nullable`.
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13382
**[Test build #62510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62510/consoleFull)** for PR 13382 at commit [`e19ec3d`](https://github.com/apache/spark/commit/e19ec3d2b145879e7ea73fa847761cfdeb7d5c95).
[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14014
Let's also update the description.
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71277693

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2965,4 +2965,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-16602 Nvl/Coalesce") {
--- End diff --

Oh, right, I see. It's the same approach as input type checking. This test is too heavy for its exact purpose.
[GitHub] spark pull request #14155: [SPARK-16498][SQL][WIP] move hive hack for data s...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71277607

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -146,6 +151,15 @@ case class CatalogTable(
   requireSubsetOfSchema(sortColumnNames, "sort")
   requireSubsetOfSchema(bucketColumnNames, "bucket")
+  lazy val userSpecifiedSchema: Option[StructType] = if (schema.nonEmpty) {
--- End diff --

I'm not quite sure it's safe to do so. Why do we have `CatalogColumn` in the first place?
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user dafrista commented on the issue: https://github.com/apache/spark/pull/13382
Thanks @ericl, I've added that information to the class comment.
[GitHub] spark issue #14255: [MINOR] Fix Java Linter `LineLength` errors
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14255
**[Test build #62509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62509/consoleFull)** for PR 14255 at commit [`c44a8a0`](https://github.com/apache/spark/commit/c44a8a0863f2a232370bd68999a335f328ebf8bc).
[GitHub] spark issue #14256: [SPARK-16620][CORE] Add back the tokenization process in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14256
**[Test build #62508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62508/consoleFull)** for PR 14256 at commit [`8517465`](https://github.com/apache/spark/commit/85174658b5392b5fd9773a89ee7b24a3db08c334).
[GitHub] spark issue #14255: [MINOR] Fix Java Linter `LineLength` errors
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14255
Rebased to resolve conflicts.
[GitHub] spark pull request #14200: [SPARK-16528][SQL] Fix NPE problem in HiveClientI...
Github user jacek-lewandowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14200#discussion_r71277387

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -320,7 +320,7 @@ private[hive] class HiveClientImpl(
       name = d.getName,
       description = d.getDescription,
       locationUri = d.getLocationUri,
-      properties = d.getParameters.asScala.toMap)
+      properties = Option(d.getParameters).map(_.asScala.toMap).orNull)
--- End diff --

Perhaps... however this would change the semantics, which is out of scope for this ticket.
[GitHub] spark issue #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceGroups s...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14222
@viirya I'm going to take over the PR and play with the API a little bit.
[GitHub] spark pull request #14256: [SPARK-16620][CORE] Add back tokenization process...
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14256

[SPARK-16620][CORE] Add back tokenization process in RDD.pipe(command: String)

## What changes were proposed in this pull request?

Currently `RDD.pipe(command: String)`:
- works only with a single command with no options specified, such as `RDD.pipe("wc")`
- does not work when the command is specified with some options, such as `RDD.pipe("wc -l")`

This is a regression from Spark 1.6. This patch adds back the tokenization process in `RDD.pipe(command: String)`.

## How was this patch tested?

Added a test which would pass in 1.6, would fail prior to this patch, and passes after this patch.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark rdd-pipe

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14256.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14256

commit 85174658b5392b5fd9773a89ee7b24a3db08c334
Author: Liwei Lin
Date: 2016-07-19T05:34:46Z

    Fix pipe(command) & pipe(command, env)
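To make the regression concrete, a hedged sketch of the behaviour described above and of a whitespace tokenizer like the one being added back; `rdd` and the helper name are illustrative, not the actual patch:

```scala
import java.util.StringTokenizer
import scala.collection.mutable.ArrayBuffer

// Before this patch, the whole string was handed to the external process as a
// single token, so "wc -l" was treated as one executable name and failed.
// rdd.pipe("wc")              // single command, worked
// rdd.pipe("wc -l")           // command plus option, broken without tokenization
// rdd.pipe(Seq("wc", "-l"))   // caller-tokenized overload, always worked

// Illustrative tokenization step (assumed shape): split the command string on
// whitespace before handing it to the process builder.
def tokenize(command: String): Seq[String] = {
  val buf = new ArrayBuffer[String]
  val tok = new StringTokenizer(command)
  while (tok.hasMoreElements) buf += tok.nextToken()
  buf
}

// tokenize("wc -l") == Seq("wc", "-l")
```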
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71277158

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2965,4 +2965,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-16602 Nvl/Coalesce") {
--- End diff --

We can just test it in an expression unit test by calling replaceForTypeCoercion, can't we?
[GitHub] spark pull request #14014: [SPARK-16344][SQL] Decoding Parquet array of stru...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14014#discussion_r71277147

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala ---
@@ -442,13 +445,23 @@ private[parquet] class ParquetRowConverter(
   private val elementConverter: Converter = {
     val repeatedType = parquetSchema.getType(0)
     val elementType = catalystSchema.elementType
-    val parentName = parquetSchema.getName
-    if (isElementType(repeatedType, elementType, parentName)) {
+    // At this stage, we're not sure whether the repeated field maps to the element type or is
+    // just the syntactic repeated group of the 3-level standard LIST layout. Here we try to
+    // convert the repeated field into a Catalyst type to see whether the converted type matches
+    // the Catalyst array element type.
+    val guessedElementType = schemaConverter.convertField(repeatedType)
+
+    if (DataType.equalsIgnoreCompatibleNullability(guessedElementType, elementType)) {
+      // If the repeated field corresponds to the element type, creates a new converter using the
+      // type of the repeated field.
       newConverter(repeatedType, elementType, new ParentContainerUpdater {
         override def set(value: Any): Unit = currentArray += value
       })
     } else {
+      // If the repeated field corresponds to the syntactic group in the standard 3-level Parquet
+      // LIST layout, creates a new converter using the only child field of the repeated field.
+      assert(!repeatedType.isPrimitive && repeatedType.asGroupType().getFieldCount == 1)
       new ElementConverter(repeatedType.asGroupType().getType(0), elementType)
--- End diff --

Can we add examples here?
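For context, the two Parquet layouts the new comments distinguish look roughly like this for an `array<int>` column (schema text follows the Parquet LIST convention; field names are illustrative).

Standard 3-level layout, where the repeated field is only a syntactic group and the real element is its single child:

```
optional group f (LIST) {
  repeated group list {
    optional int32 element;
  }
}
```

Legacy 2-level layout, where the repeated field itself is the element, so the converter must use it directly instead of descending into it:

```
optional group f (LIST) {
  repeated int32 element;
}
```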
[GitHub] spark pull request #14200: [SPARK-16528][SQL] Fix NPE problem in HiveClientI...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14200#discussion_r71277056

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -320,7 +320,7 @@ private[hive] class HiveClientImpl(
       name = d.getName,
       description = d.getDescription,
       locationUri = d.getLocationUri,
-      properties = d.getParameters.asScala.toMap)
+      properties = Option(d.getParameters).map(_.asScala.toMap).orNull)
--- End diff --

Is `Map.empty` a better default value? Or should we update the `properties` field in `CatalogDatabase` to indicate that it's nullable?
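A small, hedged comparison of the two defaults being discussed, using a plain Java map to stand in for `d.getParameters` (which Hive may return as null):

```scala
import scala.collection.JavaConverters._

val params: java.util.Map[String, String] = null  // what Hive may hand back

// current patch: preserves the null for callers to handle
val asNull = Option(params).map(_.asScala.toMap).orNull

// alternative raised above: callers never see null
val asEmpty = Option(params).map(_.asScala.toMap).getOrElse(Map.empty[String, String])
```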
[GitHub] spark pull request #14245: [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java ex...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14245
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71277066

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2965,4 +2965,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-16602 Nvl/Coalesce") {
--- End diff --

Anyway, I will try to move this. Thank you for the fast review!
[GitHub] spark pull request #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceG...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14222#discussion_r71277037

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/expressions/ReduceAggregatorSuite.scala ---
@@ -0,0 +1,62 @@
+/* (standard Apache License 2.0 header, identical to the one quoted above) */
+
+package org.apache.spark.sql.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.expressions.ReduceAggregator
+
+class ReduceAggregatorSuite extends SparkFunSuite {
--- End diff --

Just put this in DatasetAggregatorSuite.
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71276984

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2965,4 +2965,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-16602 Nvl/Coalesce") {
--- End diff --

I thought we should have it here because `Nvl` is RuntimeReplaceable. (Or did I misunderstand again?)
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71276925

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -100,7 +100,8 @@ object TypeCoercion {
   }

   /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  private def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
+  private[catalyst] def findTightestCommonTypeToString(left: DataType, right: DataType)
--- End diff --

Oh, sure. I will make this `public`.
[GitHub] spark issue #14255: [MINOR] Fix Java Linter `LineLength` errors
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14255
**[Test build #62507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62507/consoleFull)** for PR 14255 at commit [`8cf8c78`](https://github.com/apache/spark/commit/8cf8c7882d6fe201f653e5df7cd055df87af42ff).
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71276731

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2965,4 +2965,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 }
+
+  test("SPARK-16602 Nvl/Coalesce") {
--- End diff --

Maybe this should be a unit test for the analyzer rather than an end-to-end test?
[GitHub] spark pull request #14251: [SPARK-16602][SQL] `Nvl` function should support ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14251#discussion_r71276614

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -100,7 +100,8 @@ object TypeCoercion {
   }

   /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  private def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
+  private[catalyst] def findTightestCommonTypeToString(left: DataType, right: DataType)
--- End diff --

You can just make this public, I think.
[GitHub] spark pull request #14255: [MINOR] Fix Java Linter `LineLength` errors
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/14255

[MINOR] Fix Java Linter `LineLength` errors

## What changes were proposed in this pull request?

This PR fixes four Java linter `LineLength` errors. They are all `LineLength` errors, but we had better remove all Java linter errors before the release.

## How was this patch tested?

After passing Jenkins, run `./dev/lint-java`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark minor_java_linter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14255.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14255

commit 8cf8c7882d6fe201f653e5df7cd055df87af42ff
Author: Dongjoon Hyun
Date: 2016-07-19T05:45:15Z

    [MINOR] Fix Java Linter `LineLength` errors
[GitHub] spark pull request #14014: [SPARK-16344][SQL] Decoding Parquet array of stru...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14014#discussion_r71276489

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRecordMaterializer.scala ---
@@ -30,10 +30,11 @@ import org.apache.spark.sql.types.StructType
  * @param catalystSchema Catalyst schema of the rows to be constructed
  */
 private[parquet] class ParquetRecordMaterializer(
-    parquetSchema: MessageType, catalystSchema: StructType)
+    parquetSchema: MessageType, catalystSchema: StructType, schemaConverter: ParquetSchemaConverter)
--- End diff --

Add `schemaConverter` to the scaladoc?
[GitHub] spark pull request #14227: [SPARK-16582][SQL] Explicitly define isNull = fal...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14227#discussion_r71276500

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -377,6 +377,7 @@ abstract class UnaryExpression extends Expression {
       """)
     } else {
       ev.copy(code = s"""
+        boolean ${ev.isNull} = false;
--- End diff --

I don't quite understand this: we explicitly define `isNull = "false"` below, so how could `ev.isNull` be referenced later?
[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source supports custo...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13912#discussion_r71276368

--- Diff: python/pyspark/sql/readwriter.py ---
@@ -328,6 +328,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
         applies to both date type and timestamp type. By default, it is None which means
         trying to parse times and date by ``java.sql.Timestamp.valueOf()`` and
         ``java.sql.Date.valueOf()``.
+        :param timezone: defines the timezone to be used for both date type and timestamp type.
+            If a timezone is specified in the data, this will load them after
--- End diff --

I will clean up the PR description and all those soon with a better proposal. Thanks!
[GitHub] spark issue #14253: [Doc] improve python doc for rdd.histogram and dataframe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14253
**[Test build #3188 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3188/consoleFull)** for PR 14253 at commit [`6d8c9aa`](https://github.com/apache/spark/commit/6d8c9aabfc9ce105156fc8eb96b9e35777b03477).
[GitHub] spark issue #14253: [Doc] improve python doc for rdd.histogram and dataframe...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14253
Jenkins, test this please.
[GitHub] spark pull request #14247: [MINOR] Remove unused arg in als.py
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14247
[GitHub] spark issue #14245: [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example u...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14245
Thanks. Merging to master and branch 2.0.
[GitHub] spark issue #14254: [SPARK-16619] Add shuffle service metrics entry in monit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14254
Can one of the admins verify this patch?
[GitHub] spark issue #14247: [MINOR] Remove unused arg in als.py
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14247
Merging in master. Thanks.
[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source supports custo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13912#discussion_r71275973

--- Diff: python/pyspark/sql/readwriter.py ---
@@ -328,6 +328,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
         applies to both date type and timestamp type. By default, it is None which means
         trying to parse times and date by ``java.sql.Timestamp.valueOf()`` and
         ``java.sql.Date.valueOf()``.
+        :param timezone: defines the timezone to be used for both date type and timestamp type.
+            If a timezone is specified in the data, this will load them after
--- End diff --

Ah, I see - you want to control the zone this gets converted to.
[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source supports custo...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13912#discussion_r71275955

--- Diff: python/pyspark/sql/readwriter.py ---
@@ -328,6 +328,10 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
         applies to both date type and timestamp type. By default, it is None which means
         trying to parse times and date by ``java.sql.Timestamp.valueOf()`` and
         ``java.sql.Date.valueOf()``.
+        :param timezone: defines the timezone to be used for both date type and timestamp type.
+            If a timezone is specified in the data, this will load them after
--- End diff --

Why not just have the timezone as part of the dateFormat, so users can specify it?
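As a hedged illustration of the two designs in question, using plain Java time APIs and independent of whatever option names the final PR settles on:

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

// 1) timezone carried by the pattern itself: "XXX" parses an ISO-8601 offset
//    embedded in the data, so no separate option is needed.
val withOffset = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX")
withOffset.parse("2016-07-19T05:34:46+09:00")

// 2) timezone supplied separately and applied to a pattern without an offset,
//    which is what a dedicated timezone option would control.
val separateZone = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
separateZone.setTimeZone(TimeZone.getTimeZone("GMT+09:00"))
separateZone.parse("2016-07-19 05:34:46")
```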
[GitHub] spark pull request #14254: Add shuffle service metrics entry in monitoring d...
GitHub user lovexi opened a pull request: https://github.com/apache/spark/pull/14254

Add shuffle service metrics entry in monitoring docs

## What changes were proposed in this pull request?

Add a shuffle service metrics entry to the list of currently supported metrics in the monitoring docs.

## How was this patch tested?

Checked the docs for the changes.

JIRA link: https://issues.apache.org/jira/browse/SPARK-16619

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lovexi/spark yangyang-monitoring-doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14254.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14254

commit abacb11005b1fb81832a12558980814021cebae1
Author: Yangyang Liu
Date: 2016-07-19T05:47:48Z

    Add shuffle service metrics entry in docs
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13778
ping @cloud-fan Can you check if this is good for you now? It has been a while. Thanks.
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r71275696

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
(the same SimplifyCastsSuite hunk quoted in full earlier in this digest, ending at the line `def map(keyType: DataType, valueType: DataType, nullable: Boolean): AttributeReference =`)
--- End diff --

Why do we need this?
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r71275743

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1441,6 +1441,12 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
 object SimplifyCasts extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
     case Cast(e, dataType) if e.dataType == dataType => e
+    case c @ Cast(e, dataType) => (e.dataType, dataType) match {
--- End diff --

cc @yhuai @liancheng, is it always safe to do this?
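Based on the truncated hunk above and the expectations in the companion test suite, the new case presumably only drops casts that relax nullability; a hedged reconstruction (not the exact patch) might look like this:

```scala
// If the only difference between the source and target types is that the
// target allows nulls where the source does not, the cast cannot change any
// value and can be removed. The opposite direction is left untouched.
case c @ Cast(e, dataType) => (e.dataType, dataType) match {
  case (ArrayType(from, false), ArrayType(to, true)) if from == to => e
  case (MapType(fk, fv, false), MapType(tk, tv, true)) if fk == tk && fv == tv => e
  case _ => c
}
```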
[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14207

> when the data/files are changed by external system (e.g., appended by a streaming system), the stored schema can be inconsistent with the actual schema of the data.

I think this problem already exists, as we will use the cached schema instead of inferring it every time. The only difference is that after a reboot, this PR will still use the stored schema and require users to refresh the table manually.
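For reference, the manual refresh mentioned above is the existing API shown below; the table name is just an example:

```scala
// Invalidates and reloads the cached metadata (and cached data) for a table
// whose underlying files were changed by an external system.
spark.catalog.refreshTable("my_table")

// Equivalently, in SQL:
spark.sql("REFRESH TABLE my_table")
```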
[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14207
@gatorsmile Yea. I meant that since you use the stored schema for the table instead of inferring it, when the data/files are changed by an external system (e.g., appended by a streaming system), the stored schema can become inconsistent with the actual schema of the data.
[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable token manager for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14065
**[Test build #62506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62506/consoleFull)** for PR 14065 at commit [`b8eeb28`](https://github.com/apache/spark/commit/b8eeb28b141b678cf4ccace36564f24536758132).
[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14207
@viirya Schema inference is time-consuming, especially when the number of files is huge. Thus, we should avoid refreshing it every time. That is one of the major reasons why we have a metadata cache for data source tables.
[GitHub] spark issue #14253: [Doc] improve python doc for rdd.histogram
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14253
Can one of the admins verify this patch?
[GitHub] spark pull request #14253: [Doc] improve python doc for rdd.histogram
GitHub user mortada opened a pull request: https://github.com/apache/spark/pull/14253

[Doc] improve python doc for rdd.histogram

## What changes were proposed in this pull request?

doc change only

## How was this patch tested?

doc change only

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mortada/spark histogram_typos

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14253.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14253

commit 979c7f44690c5239f49621733de112ec623e
Author: Mortada Mehyar
Date: 2016-07-19T05:22:58Z

    [Doc] improve python doc for rdd.histogram
[GitHub] spark issue #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceGroups s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14222 ping @rxin Is this change OK with you? Thanks.
[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14207 @gatorsmile When the data/files are written by an external system and Spark is just used to process them in batch, does that mean the schema can become inconsistent? Or should refresh be called every time the table is queried?
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13704 Merged build finished. Test PASSed.
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62504/ Test PASSed.
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13704 **[Test build #62504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62504/consoleFull)** for PR 13704 at commit [`cbcfd56`](https://github.com/apache/spark/commit/cbcfd561d92c02395d685c46cb09cce802b22727). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14251: [SPARK-16602][SQL] `Nvl` function should support numeric...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14251 Hi, @rxin. Could you review this `Nvl` PR again? I can solve that by only replacing `findTightestCommonTypeOfTwo` with `findTightestCommonTypeToString`.
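The kind of coercion at stake can be sketched as follows; this is illustrative only, and whether these calls are accepted depends on the final patch.

```Scala
// Illustrative only; whether these coercions are accepted depends on the final patch.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("nvl-coercion").getOrCreate()

// With numeric widening, nvl should resolve to the tightest common type of its
// arguments, e.g. int and double widening to double.
spark.sql("SELECT nvl(1, 2.5)").printSchema()
```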
[GitHub] spark pull request #13990: [SPARK-16287][SQL] Implement str_to_map SQL funct...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13990#discussion_r71273378 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -393,3 +394,56 @@ case class CreateNamedStructUnsafe(children: Seq[Expression]) extends Expression override def prettyName: String = "named_struct_unsafe" } + +/** + * Creates a map after splitting the input text into key/value pairs using delimiters + */ +@ExpressionDescription( + usage = "_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text " +"into key/value pairs using delimiters. " +"Default delimiters are ',' for pairDelim and ':' for keyValueDelim.", + extended = """ > SELECT _FUNC_('a:1,b:2,c:3',',',':');\n map("a":"1","b":"2","c":"3") """) +case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: Expression) + extends TernaryExpression with CodegenFallback{ + + def this(child: Expression, pairDelim: Expression) = { +this(child, pairDelim, Literal(":")) + } + + def this(child: Expression) = { +this(child, Literal(","), Literal(":")) + } + + override def children: Seq[Expression] = Seq(text, pairDelim, keyValueDelim) + + override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false) + + override def checkInputDataTypes(): TypeCheckResult = { --- End diff -- looks like it's simpler to follow `XPathExtract` to do the type check, i.e. implement `ExpectsInputTypes` to check the type, and override `checkInputDataTypes` for the foldable check.
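A rough sketch of the suggested pattern is shown below, with simplified names and the evaluation logic elided; this is not the code that was actually merged for str_to_map.

```Scala
// Rough sketch of the suggested pattern; not the merged str_to_map code.
// ExpectsInputTypes supplies the type check, and checkInputDataTypes is
// overridden only to add the foldable check.
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, TernaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types._

case class StringToMapSketch(text: Expression, pairDelim: Expression, keyValueDelim: Expression)
  extends TernaryExpression with ExpectsInputTypes with CodegenFallback {

  override def children: Seq[Expression] = Seq(text, pairDelim, keyValueDelim)
  override def dataType: DataType = MapType(StringType, StringType)

  // Declared input types let the analyzer insert casts or report mismatches,
  // so no manual type checking is needed here.
  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType)

  // Only the extra constraint (the delimiters must be foldable) is checked by hand.
  override def checkInputDataTypes(): TypeCheckResult = {
    val defaultCheck = super.checkInputDataTypes()
    if (defaultCheck.isFailure) {
      defaultCheck
    } else if (!pairDelim.foldable || !keyValueDelim.foldable) {
      TypeCheckResult.TypeCheckFailure("the pair and key/value delimiters must be foldable")
    } else {
      TypeCheckResult.TypeCheckSuccess
    }
  }

  // Evaluation elided in this sketch.
  override protected def nullSafeEval(t: Any, pd: Any, kvd: Any): Any =
    throw new UnsupportedOperationException("sketch only")
}
```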
[GitHub] spark issue #14251: [SPARK-16602][SQL] `Nvl` function should support numeric...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14251 **[Test build #62505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62505/consoleFull)** for PR 14251 at commit [`53ae02f`](https://github.com/apache/spark/commit/53ae02f12d6d4113aa5cddaad2c7b80d902fe95e).
[GitHub] spark pull request #14155: [SPARK-16498][SQL][WIP] move hive hack for data s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71273081 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -146,6 +151,15 @@ case class CatalogTable( requireSubsetOfSchema(sortColumnNames, "sort") requireSubsetOfSchema(bucketColumnNames, "bucket") + lazy val userSpecifiedSchema: Option[StructType] = if (schema.nonEmpty) { --- End diff -- Oh, is this here because `CatalogColumn` uses a string for the type? Shouldn't we just use `StructType` as the schema and remove `CatalogColumn`?
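For context, converting string-typed catalog columns into a `StructType` looks roughly like the sketch below; the `ColumnLike` type is invented for illustration and this is not the code in this PR.

```Scala
// Simplified illustration with an invented ColumnLike type; not the code in this PR.
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.{StructField, StructType}

case class ColumnLike(name: String, dataType: String, nullable: Boolean = true)

// Because the catalog stores column types as strings, building a StructType
// requires parsing each type string back into a DataType.
def toStructType(cols: Seq[ColumnLike]): StructType =
  StructType(cols.map(c =>
    StructField(c.name, CatalystSqlParser.parseDataType(c.dataType), c.nullable)))
```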
[GitHub] spark pull request #14155: [SPARK-16498][SQL][WIP] move hive hack for data s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71272934 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -303,6 +303,7 @@ object CreateDataSourceTableUtils extends Logging { matcher.matches() } + // TODO: it's only used in tests, remove it. def createDataSourceTable( --- End diff -- If it is not used, how about we just remove it and update the test?
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13382 Merged build finished. Test PASSed.
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62503/ Test PASSed.
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13382 **[Test build #62503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62503/consoleFull)** for PR 13382 at commit [`0fe4bc8`](https://github.com/apache/spark/commit/0fe4bc8a0232f9e6a4dcb6df76fc3f256b784803). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14155: [SPARK-16498][SQL][WIP] move hive hack for data s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71272434 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -313,18 +313,48 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { * Create a [[CreateTableUsing]] or a [[CreateTableUsingAsSelect]] logical plan. */ override def visitCreateTableUsing(ctx: CreateTableUsingContext): LogicalPlan = withOrigin(ctx) { -val (table, temp, ifNotExists, external) = visitCreateTableHeader(ctx.createTableHeader) -if (external) { --- End diff -- How about we do not change this for now (until we decide which syntax to use for CREATE TABLE)? We may support it only in the syntax that is compatible with Hive tables.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62502/ Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Merged build finished. Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62502/consoleFull)** for PR 14132 at commit [`404a322`](https://github.com/apache/spark/commit/404a322686f0603c84e542c4ca8b5353dcc0f9d7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14155: [SPARK-16498][SQL][WIP] move hive hack for data s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r71272290 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -146,6 +151,15 @@ case class CatalogTable( requireSubsetOfSchema(sortColumnNames, "sort") requireSubsetOfSchema(bucketColumnNames, "bucket") + lazy val userSpecifiedSchema: Option[StructType] = if (schema.nonEmpty) { --- End diff -- What is this?
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14102 Merged build finished. Test PASSed.
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62501/ Test PASSed.
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14102 **[Test build #62501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62501/consoleFull)** for PR 14102 at commit [`cfe6bed`](https://github.com/apache/spark/commit/cfe6beda1a1db64aab5d2f84a68a5ee1e2bdd905). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13704 **[Test build #62504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62504/consoleFull)** for PR 13704 at commit [`cbcfd56`](https://github.com/apache/spark/commit/cbcfd561d92c02395d685c46cb09cce802b22727).
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Merged build finished. Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62500/ Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62500/consoleFull)** for PR 14132 at commit [`5ba2ad7`](https://github.com/apache/spark/commit/5ba2ad7aa6cab364e09a2c0dae529b8270aed153). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/13382 Cool, @JoshRosen I'll leave this for you to merge.
[GitHub] spark pull request #13382: [SPARK-5581][Core] When writing sorted map output...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/13382#discussion_r71266677 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala --- @@ -27,8 +27,8 @@ import org.apache.spark.util.Utils /** * A class for writing JVM objects directly to a file on disk. This class allows data to be appended - * to an existing block and can guarantee atomicity in the case of faults as it allows the caller to - * revert partial writes. + * to an existing block. Callers can write to the same file and commit these writes. + * In case of faults, callers should atomically revert the uncommitted partial writes. --- End diff -- Perhaps elaborate a bit more, e.g. "For efficiency, this class retains the underlying file channel across multiple commits to a file. The channel is kept open until close() is called on DiskBlockObjectWriter."
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Sche...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71266605 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala --- @@ -351,6 +353,44 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog { } /** + * Refresh the inferred schema stored in the external catalog for data source tables. + */ + private def refreshInferredSchema(tableIdent: TableIdentifier): Unit = { +val table = sessionCatalog.getTableMetadataOption(tableIdent) +table.foreach { tableDesc => + if (DDLUtils.isDatasourceTable(tableDesc) && DDLUtils.isSchemaInferred(tableDesc)) { +val partitionColumns = DDLUtils.getPartitionColumnsFromTableProperties(tableDesc) +val bucketSpec = DDLUtils.getBucketSpecFromTableProperties(tableDesc) +val dataSource = + DataSource( +sparkSession, +userSpecifiedSchema = None, +partitionColumns = partitionColumns, +bucketSpec = bucketSpec, +className = tableDesc.properties(CreateDataSourceTableUtils.DATASOURCE_PROVIDER), +options = tableDesc.storage.serdeProperties) +.resolveRelation().asInstanceOf[HadoopFsRelation] + +val schemaProperties = new mutable.HashMap[String, String] +CreateDataSourceTableUtils.saveSchema( + sparkSession, dataSource.schema, dataSource.partitionSchema.fieldNames, schemaProperties) + +val tablePropertiesWithoutSchema = tableDesc.properties.filterKeys { k => + // Keep the properties that are not for schema or partition columns + k != CreateDataSourceTableUtils.DATASOURCE_SCHEMA_NUMPARTS && --- End diff -- Will change it. Thanks!
[GitHub] spark pull request #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Sche...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14207#discussion_r71266596 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -487,6 +487,10 @@ object DDLUtils { isDatasourceTable(table.properties) } + def isSchemaInferred(table: CatalogTable): Boolean = { +table.properties.get(DATASOURCE_SCHEMA_TYPE) == Option(SchemaType.INFERRED.name) --- End diff -- Thanks! @rxin @jaceklaskowski I will not change it because using `contains` breaks the Scala 2.10 build.
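For readers wondering about the `== Option(...)` form: `Option.contains` only exists since Scala 2.11, so the equality comparison keeps the code compiling on the Scala 2.10 build, as sketched below.

```Scala
// Option.contains exists only since Scala 2.11, so the equality form is used instead.
val schemaType: Option[String] = Some("INFERRED")

schemaType == Option("INFERRED")    // compiles on both Scala 2.10 and 2.11; evaluates to true
// schemaType.contains("INFERRED")  // does not compile against Scala 2.10
```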
[GitHub] spark issue #10881: [SPARK-12967][Netty] Avoid NettyRpc error message during...
Github user JerryLead commented on the issue: https://github.com/apache/spark/pull/10881 This bug still exists in the latest Spark 1.6.2. How about merging it into branch-1.6? @nishkamravi2 @zsxwing
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13382 **[Test build #62503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62503/consoleFull)** for PR 13382 at commit [`0fe4bc8`](https://github.com/apache/spark/commit/0fe4bc8a0232f9e6a4dcb6df76fc3f256b784803).
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user dafrista commented on the issue: https://github.com/apache/spark/pull/13382 Jenkins, test this please.
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user dafrista commented on the issue: https://github.com/apache/spark/pull/13382 Thanks @ericl. I pushed a commit addressing your comments.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62502/consoleFull)** for PR 14132 at commit [`404a322`](https://github.com/apache/spark/commit/404a322686f0603c84e542c4ca8b5353dcc0f9d7).
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Right, there is a `table` API, too. Thank you, I'll add that as well. By the way, I'm still downtown and need to go home for dinner, so I'll take care of that tonight. Thank you again.
[GitHub] spark pull request #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14054#discussion_r71265164 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -158,25 +159,41 @@ object JdbcUtils extends Logging { rddSchema: StructType, nullTypes: Array[Int], batchSize: Int, - dialect: JdbcDialect): Iterator[Byte] = { + dialect: JdbcDialect, + isolationLevel: Int): Iterator[Byte] = { require(batchSize >= 1, s"Invalid value `${batchSize.toString}` for parameter " + s"`${JdbcUtils.JDBC_BATCH_INSERT_SIZE}`. The minimum value is 1.") val conn = getConnection() var committed = false -val supportsTransactions = try { - conn.getMetaData().supportsDataManipulationTransactionsOnly() || - conn.getMetaData().supportsDataDefinitionAndDataManipulationTransactions() -} catch { - case NonFatal(e) => -logWarning("Exception while detecting transaction support", e) -true + +var finalIsolationLevel = Connection.TRANSACTION_NONE +if (isolationLevel != Connection.TRANSACTION_NONE) { + try { +val metadata = conn.getMetaData +if (metadata.supportsTransactions()) { + if (metadata.supportsTransactionIsolationLevel(isolationLevel)) { +finalIsolationLevel = isolationLevel + } else { +val defaultIsolation = metadata.getDefaultTransactionIsolation +logWarning(s"Requested isolation level $isolationLevel is not supported; " + +s"falling back to isolation level $defaultIsolation") +finalIsolationLevel = defaultIsolation + } +} else { + logWarning(s"Requested isolation level $isolationLevel, but transactions are unsupported") +} + } catch { +case NonFatal(e) => logWarning("Exception while detecting transaction support", e) + } } +val supportsTransactions = finalIsolationLevel != Connection.TRANSACTION_NONE --- End diff -- Yeah, if possible, the default isolation needs to be consistent. Otherwise, it might be hard to debug if users hit the issue in the production environment. Sometimes the problem is hard to reproduce, especially for isolation-related issues.
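The fallback behaviour discussed here can be summarised as a small standalone helper; this is a sketch distilled from the diff above, not the exact merged code.

```Scala
// Sketch distilled from the diff above, not the exact merged code: request an
// isolation level and degrade gracefully based on what the JDBC driver reports.
import java.sql.Connection
import scala.util.control.NonFatal

def negotiateIsolationLevel(conn: Connection, requested: Int): Int = {
  if (requested == Connection.TRANSACTION_NONE) {
    Connection.TRANSACTION_NONE
  } else {
    try {
      val md = conn.getMetaData
      if (!md.supportsTransactions()) {
        Connection.TRANSACTION_NONE                 // driver has no transactions at all
      } else if (md.supportsTransactionIsolationLevel(requested)) {
        requested                                   // driver supports the requested level
      } else {
        md.getDefaultTransactionIsolation           // fall back to the driver's default
      }
    } catch {
      case NonFatal(_) => Connection.TRANSACTION_NONE
    }
  }
}
```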
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r71265020 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1774,6 +1775,51 @@ class Analyzer( } /** + * Substitute Hints. + * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters. + */ + object SubstituteHints extends Rule[LogicalPlan] { +def apply(plan: LogicalPlan): LogicalPlan = plan transform { + case logical: LogicalPlan => logical transformDown { +case h @ Hint(name, parameters, child) +if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) => + var resolvedChild = child + + for (param <- parameters) { +val names = param.split("\\.") +val tid = if (names.length > 1) { + TableIdentifier(names(1), Some(names(0))) +} else { + TableIdentifier(param, None) +} +try { + catalog.lookupRelation(tid) + + var stop = false + resolvedChild = resolvedChild.transformDown { +case r @ BroadcastHint(SubqueryAlias(t, _)) + if !stop && resolver(t, tid.identifier) => + stop = true + r +case r @ SubqueryAlias(t, _) if !stop && resolver(t, tid.identifier) => + stop = true + BroadcastHint(r) --- End diff -- I think we have to remove it; otherwise, the result will be wrong.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14132 For your reference, below is a simple example of how users can do it with the DataFrame API:
```Scala
sql("CREATE TABLE tab1(c1 int)")
val df = spark.read.table("tab1")
df.join(broadcast(df))
```
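For the snippet above to compile, the `broadcast` function needs to be imported (assuming a `SparkSession` named `spark` is already in scope):

```Scala
import org.apache.spark.sql.functions.broadcast
```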
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Yep. I made that case. Thanks!
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 What I mean is: how can we currently broadcast the Hive table `tab1`? I'm writing the test case.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14132 Is it related? This is the most basic test case, right?
```SQL
CREATE TABLE tab1(c1 int)
select * from tab1, tab1
```
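For comparison, a rough sketch of how the hint from this PR would attach to that self-join follows; the hint names come from the `SubstituteHints` rule in this patch, and the exact comment-style SQL syntax is illustrative rather than final.

```Scala
// Illustrative only: hint names are taken from the SubstituteHints rule in this
// patch; the exact SQL comment syntax may differ in the final change.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").enableHiveSupport().getOrCreate()
spark.sql("CREATE TABLE tab1(c1 int)")

// Mark tab1 for broadcast in a plain SQL query, then inspect the plan.
spark.sql("SELECT /*+ MAPJOIN(tab1) */ * FROM tab1 a, tab1 b").explain()
```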
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13990 @cloud-fan Comment addressed, tests passed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Does this work in the `DataFrame` API, too?
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14132 Not all joins have `SubqueryAlias` operators. For example, below is a self join against Hive tables:
```
== Analyzed Logical Plan ==
c1: int, c1: int
Project [c1#7, c1#8]
+- Join Inner
   :- MetastoreRelation default, tab1
   +- MetastoreRelation default, tab1
```
Thus, the current solution does not work, right?
[GitHub] spark issue #13382: [SPARK-5581][Core] When writing sorted map output file, ...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/13382 This LGTM with some minor comments.
[GitHub] spark pull request #13382: [SPARK-5581][Core] When writing sorted map output...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/13382#discussion_r71262947 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala --- @@ -46,102 +46,145 @@ private[spark] class DiskBlockObjectWriter( extends OutputStream with Logging { + /** + * Guards against close calls, e.g. from a wrapping stream. + * Call manualClose to close the stream that was extended by this trait. + */ + private trait ManualCloseOutputStream extends OutputStream { +abstract override def close(): Unit = { + flush() +} + +def manualClose(): Unit = { + super.close() +} + } + /** The file channel, used for repositioning / truncating the file. */ private var channel: FileChannel = null + private var mcs: ManualCloseOutputStream = null private var bs: OutputStream = null private var fos: FileOutputStream = null private var ts: TimeTrackingOutputStream = null private var objOut: SerializationStream = null private var initialized = false + private var streamOpen = false private var hasBeenClosed = false - private var commitAndCloseHasBeenCalled = false /** * Cursors used to represent positions in the file. * - * ||--- | - * ^^ ^ - * ||finalPosition - * | reportedPosition - * initialPosition + * ||---| + * ^ ^ + * |committedPosition + * reportedPosition * - * initialPosition: Offset in the file where we start writing. Immutable. * reportedPosition: Position at the time of the last update to the write metrics. - * finalPosition: Offset where we stopped writing. Set on closeAndCommit() then never changed. + * committedPosition: Offset after last committed write. * -: Current writes to the underlying file. * x: Existing contents of the file. */ - private val initialPosition = file.length() - private var finalPosition: Long = -1 - private var reportedPosition = initialPosition + private var committedPosition = file.length() + private var reportedPosition = committedPosition /** * Keep track of number of records written and also use this to periodically * output bytes written since the latter is expensive to do for each record. */ private var numRecordsWritten = 0 + private def initialize(): Unit = { +fos = new FileOutputStream(file, true) +channel = fos.getChannel() +ts = new TimeTrackingOutputStream(writeMetrics, fos) +class ManualCloseBufferedOutputStream + extends BufferedOutputStream(ts, bufferSize) with ManualCloseOutputStream +mcs = new ManualCloseBufferedOutputStream + } + def open(): DiskBlockObjectWriter = { if (hasBeenClosed) { throw new IllegalStateException("Writer already closed. Cannot be reopened.") } -fos = new FileOutputStream(file, true) -ts = new TimeTrackingOutputStream(writeMetrics, fos) -channel = fos.getChannel() -bs = compressStream(new BufferedOutputStream(ts, bufferSize)) +if (!initialized) { + initialize() + initialized = true +} +bs = compressStream(mcs) objOut = serializerInstance.serializeStream(bs) -initialized = true +streamOpen = true this } - override def close() { + /** + * Close and cleanup all resources. + * Should call after committing or reverting partial writes. 
+ */ + private def closeResources(): Unit = { if (initialized) { - Utils.tryWithSafeFinally { -if (syncWrites) { - // Force outstanding writes to disk and track how long it takes - objOut.flush() - val start = System.nanoTime() - fos.getFD.sync() - writeMetrics.incWriteTime(System.nanoTime() - start) -} - } { -objOut.close() - } - + mcs.manualClose() channel = null + mcs = null bs = null fos = null ts = null objOut = null initialized = false + streamOpen = false hasBeenClosed = true } } - def isOpen: Boolean = objOut != null + /** + * Commits any remaining partial writes and closes resources. + */ + override def close() { +if (initialized) { + Utils.tryWithSafeFinally { +commit() + } { +closeResources() + }
[GitHub] spark pull request #13382: [SPARK-5581][Core] When writing sorted map output...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/13382#discussion_r71262912 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala --- @@ -46,102 +46,145 @@ private[spark] class DiskBlockObjectWriter( extends OutputStream with Logging { + /** + * Guards against close calls, e.g. from a wrapping stream. + * Call manualClose to close the stream that was extended by this trait. --- End diff -- Could you also update the class-level comment to note the commit-and-resume behavior?