[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17059
  
@datumbox you mention there is GC & performance overhead, which makes some 
sense. Have you run into problems at very large scale (e.g. millions of users, 
items, and ratings)? I did regression tests 
[here](https://docs.google.com/spreadsheets/d/1iX5LisfXcZSTCHp8VPoo5z-eCO85A5VsZDtZ5e475ks/edit?usp=sharing)
 - admittedly against `1.6.1` at that time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17064: [SPARK-19736][SQL] refreshByPath should clear all...

2017-02-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17064#discussion_r103393968
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -168,15 +168,16 @@ class CacheManager extends Logging {
       (fs, path.makeQualified(fs.getUri, fs.getWorkingDirectory))
     }
 
-    cachedData.foreach {
-      case data if data.plan.find(lookupAndRefresh(_, fs, qualifiedPath)).isDefined =>
-        val dataIndex = cachedData.indexWhere(cd => data.plan.sameResult(cd.plan))
-        if (dataIndex >= 0) {
-          data.cachedRepresentation.cachedColumnBuffers.unpersist(blocking = true)
-          cachedData.remove(dataIndex)
-        }
-        sparkSession.sharedState.cacheManager.cacheQuery(Dataset.ofRows(sparkSession, data.plan))
-      case _ => // Do Nothing
+    cachedData.filter {
--- End diff --

The problem can be shown clearly with an example code snippet:

val t = new scala.collection.mutable.ArrayBuffer[Int]

t += 1
t += 2

t.foreach {
  case i if i > 0 =>
    println(s"i = $i")
    val index = t.indexWhere(_ == i)
    if (index >= 0) {
      t.remove(index)
    }
    println(s"t: $t")
    t += (i + 2)
    println(s"t: $t")
}

Output:

i = 1                  // In the first iteration, we get the first element "1"
t: ArrayBuffer(2)      // "1" has been removed from the array
t: ArrayBuffer(2, 3)   // The new element "3" has been appended
i = 3                  // In the next iteration, element "2" is wrongly skipped
t: ArrayBuffer(2)      // "3" has been removed from the array
t: ArrayBuffer(2, 5)









[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17015
  
LGTM except a few comments.





[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread datumbox
Github user datumbox commented on the issue:

https://github.com/apache/spark/pull/17059
  
@imatiach-msft This comparison is intentional and checks two things: that the 
number is within integer range, and that the id has no non-zero digits after 
the decimal point. If the number is outside integer range, `intValue` will 
overflow (or, for doubles, saturate at the Int bounds) and the result will not 
match the original value as a double. Likewise, if the number has a fractional 
part, it will not match the integer.

Effectively, we are making sure that the user id and item id have values 
within integer range.
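To make the two failure modes concrete, here is a small standalone sketch (plain JVM, no Spark) of the round-trip comparison described above. `IntRangeCheckDemo` and `survivesRoundTrip` are hypothetical names; the sketch only relies on `java.lang.Number.intValue` and `doubleValue` and assumes Scala 2.12 or 2.13.

```scala
object IntRangeCheckDemo {
  // The comparison described above: narrow to Int, then compare with the
  // original value as a Double. Any mismatch means the id is rejected.
  def survivesRoundTrip(v: java.lang.Number): Boolean = v.doubleValue == v.intValue

  // Out-of-range Long: intValue keeps only the low 32 bits, so it wraps.
  val wrappedLong: Int = java.lang.Long.valueOf(3000000000L).intValue   // -1294967296
  // Fractional Double: intValue truncates toward zero.
  val truncatedDouble: Int = java.lang.Double.valueOf(2.5).intValue     // 2
  // Out-of-range Double: intValue saturates at Int.MaxValue instead of wrapping.
  val saturatedDouble: Int = java.lang.Double.valueOf(3.0e9).intValue   // 2147483647
}
```

In every one of these cases the narrowed Int differs from the original value, so `survivesRoundTrip` returns `false`; a whole-valued double such as `7.0` or an in-range Long passes.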





[GitHub] spark pull request #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17015#discussion_r103393810
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala ---
@@ -1,179 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.hive
-
-import java.io.IOException
-
-import com.google.common.base.Objects
-import org.apache.hadoop.fs.FileSystem
-import org.apache.hadoop.hive.common.StatsSetupConst
-import org.apache.hadoop.hive.ql.metadata.{Partition, Table => HiveTable}
-import org.apache.hadoop.hive.ql.plan.TableDesc
-
-import org.apache.spark.sql.SparkSession
-import org.apache.spark.sql.catalyst.CatalystConf
-import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.catalog._
-import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference, Expression}
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
-import org.apache.spark.sql.execution.FileRelation
-import org.apache.spark.sql.hive.client.HiveClientImpl
-import org.apache.spark.sql.types.StructField
-
-
-private[hive] case class MetastoreRelation(
-    databaseName: String,
-    tableName: String)
-    (val catalogTable: CatalogTable,
-     @transient private val sparkSession: SparkSession)
-  extends LeafNode with MultiInstanceRelation with FileRelation with CatalogRelation {
-
-  override def equals(other: Any): Boolean = other match {
-    case relation: MetastoreRelation =>
-      databaseName == relation.databaseName &&
-        tableName == relation.tableName &&
-        output == relation.output
-    case _ => false
-  }
-
-  override def hashCode(): Int = {
-    Objects.hashCode(databaseName, tableName, output)
-  }
-
-  override protected def otherCopyArgs: Seq[AnyRef] = catalogTable :: sparkSession :: Nil
-
-  @transient val hiveQlTable: HiveTable = HiveClientImpl.toHiveTable(catalogTable)
-
-  @transient override def computeStats(conf: CatalystConf): Statistics = {
-    catalogTable.stats.map(_.toPlanStats(output)).getOrElse(Statistics(
-      sizeInBytes = {
-        val totalSize = hiveQlTable.getParameters.get(StatsSetupConst.TOTAL_SIZE)
-        val rawDataSize = hiveQlTable.getParameters.get(StatsSetupConst.RAW_DATA_SIZE)
-        // TODO: check if this estimate is valid for tables after partition pruning.
-        // NOTE: getting `totalSize` directly from params is kind of hacky, but this should be
-        // relatively cheap if parameters for the table are populated into the metastore.
-        // Besides `totalSize`, there are also `numFiles`, `numRows`, `rawDataSize` keys
-        // (see StatsSetupConst in Hive) that we can look at in the future.
-        BigInt(
-          // When table is external,`totalSize` is always zero, which will influence join strategy
-          // so when `totalSize` is zero, use `rawDataSize` instead
-          // when `rawDataSize` is also zero, use `HiveExternalCatalog.STATISTICS_TOTAL_SIZE`,
-          // which is generated by analyze command.
-          if (totalSize != null && totalSize.toLong > 0L) {
-            totalSize.toLong
-          } else if (rawDataSize != null && rawDataSize.toLong > 0) {
-            rawDataSize.toLong
-          } else if (sparkSession.sessionState.conf.fallBackToHdfsForStatsEnabled) {
-            try {
-              val hadoopConf = sparkSession.sessionState.newHadoopConf()
-              val fs: FileSystem = hiveQlTable.getPath.getFileSystem(hadoopConf)
-              fs.getContentSummary(hiveQlTable.getPath).getLength
-            } catch {
-              case e: IOException =>
-                logWarning("Failed to get table size from hdfs.", e)
-                sparkSession.sessionState.conf.defaultSizeInBytes
-            }
- 
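
The removed `computeStats` picks a table size through a fallback chain: a positive `totalSize`, else a positive `rawDataSize`, else (if the HDFS fallback is enabled) the file system content summary, with `defaultSizeInBytes` when that lookup throws an `IOException`. Below is a minimal plain-Scala sketch of that order, with the metastore parameters and the HDFS lookup passed in as hypothetical arguments. The quoted diff is truncated before the last branch, so returning `defaultSizeInBytes` when the fallback is disabled is an assumption.

```scala
import java.io.IOException

object SizeEstimateSketch {
  // Hypothetical helper mirroring the fallback order of the removed computeStats.
  def estimateSize(
      totalSize: Option[String],      // metastore "totalSize" parameter, if present
      rawDataSize: Option[String],    // metastore "rawDataSize" parameter, if present
      fallBackToHdfs: Boolean,        // stands in for fallBackToHdfsForStatsEnabled
      hdfsSize: () => Long,           // stands in for fs.getContentSummary(...).getLength
      defaultSizeInBytes: Long): BigInt = {
    // A parameter only counts if it parses to a strictly positive size.
    val positive = (s: Option[String]) => s.map(_.toLong).filter(_ > 0L)
    val size = positive(totalSize)
      .orElse(positive(rawDataSize))
      .getOrElse {
        if (fallBackToHdfs) {
          try hdfsSize()
          catch { case _: IOException => defaultSizeInBytes }
        } else {
          defaultSizeInBytes // assumption: this branch is cut off in the quoted diff
        }
      }
    BigInt(size)
  }
}
```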

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16867
  
**[Test build #73575 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73575/testReport)**
 for PR 16867 at commit 
[`9778b67`](https://github.com/apache/spark/commit/9778b679fee8e3ad9f27ab190f144c085b8d0de4).





[GitHub] spark issue #17093: [SPARK-19761][SQL]create InMemoryFileIndex with an empty...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17093
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73568/
Test PASSed.





[GitHub] spark issue #17093: [SPARK-19761][SQL]create InMemoryFileIndex with an empty...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17093
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17093: [SPARK-19761][SQL]create InMemoryFileIndex with an empty...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17093
  
**[Test build #73568 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73568/testReport)**
 for PR 17093 at commit 
[`a3ac29b`](https://github.com/apache/spark/commit/a3ac29bede0f54faa6707616fb94fc261cfaaf2b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17015
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73566/
Test PASSed.





[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17015
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17015
  
**[Test build #73566 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73566/testReport)**
 for PR 17015 at commit 
[`d10bfbc`](https://github.com/apache/spark/commit/d10bfbc71ac30b74222f6794edb5c62ad562f3e5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17059: [SPARK-19733][ML]Removed unnecessary castings and...

2017-02-27 Thread datumbox
Github user datumbox commented on a diff in the pull request:

https://github.com/apache/spark/pull/17059#discussion_r103392500
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
@@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends Params with HasPredictionCo
    * Attempts to safely cast a user/item id to an Int. Throws an exception if the value is
    * out of integer range.
    */
-  protected val checkedCast = udf { (n: Double) =>
-    if (n > Int.MaxValue || n < Int.MinValue) {
-      throw new IllegalArgumentException(s"ALS only supports values in Integer range for columns " +
-        s"${$(userCol)} and ${$(itemCol)}. Value $n was out of Integer range.")
-    } else {
-      n.toInt
+  protected val checkedCast = udf { (n: Any) =>
+    n match {
+      case v: Int => v // Avoid unnecessary casting
+      case v: Number =>
+        val intV = v.intValue()
+        // Checks if number within Int range and has no fractional part.
+        if (v.doubleValue == intV) {
--- End diff --

@imatiach-msft: In this snippet we deal with the user id and item id. Those 
values should not have a fractional part. What I do here is convert the number 
to an Int and compare it with the original value as a double. If the two are 
identical, one of the following is true:
- The value is a Byte or Short.
- The value is a Long but within Integer range.
- The value is a Double or Float without any fractional part.

I think this snippet is fine.
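A Spark-free sketch of the same pattern match, useful for checking the cases listed above in isolation. `CheckedCastSketch.toCheckedInt` is a hypothetical name, not the ALS API, and the quoted diff stops at the `if`, so returning `intV` on success and throwing `IllegalArgumentException` otherwise are assumptions.

```scala
object CheckedCastSketch {
  // Hypothetical stand-in for the body of the ALS checkedCast UDF, without Spark.
  def toCheckedInt(n: Any): Int = n match {
    case v: Int => v // already an Int: no conversion needed
    case v: Number =>
      val intV = v.intValue()
      // Equal only if v is within Int range and has no fractional part.
      if (v.doubleValue == intV) intV
      else throw new IllegalArgumentException(s"Value $v is not an integer within Int range.")
    case other =>
      throw new IllegalArgumentException(s"Unsupported id value: $other")
  }
}
```

Byte, Short, in-range Long, and whole-valued Float/Double inputs all come back as the same Int, while `2.5`, `3000000000L`, or a `String` id are rejected.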





[GitHub] spark pull request #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17015#discussion_r103391249
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala ---
@@ -138,27 +153,35 @@ case class HiveTableScanExec(
     }
   }
 
+  // exposed for tests
+  @transient lazy val rawPartitions = {
+    val prunedPartitions = if (sparkSession.sessionState.conf.metastorePartitionPruning) {
--- End diff --

Checked the history. It sounds like @liancheng can answer whether this is 
still needed or not. : )

https://github.com/apache/spark/pull/7421#issuecomment-122527391 





[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-02-27 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16867#discussion_r103391138
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -911,14 +916,14 @@ private[spark] class TaskSetManager(
     logDebug("Checking for speculative tasks: minFinished = " + minFinishedForSpeculation)
     if (tasksSuccessful >= minFinishedForSpeculation && tasksSuccessful > 0) {
       val time = clock.getTimeMillis()
-      val durations = taskInfos.values.filter(_.successful).map(_.duration).toArray
-      Arrays.sort(durations)
+      val durations = successfulTasksSet.toArray.map(taskInfos(_).duration)
       val medianDuration = durations(min((0.5 * tasksSuccessful).round.toInt, durations.length - 1))
       val threshold = max(SPECULATION_MULTIPLIER * medianDuration, minTimeToSpeculation)
       // TODO: Threshold should also look at standard deviation of task durations and have a lower
       // bound based on that.
       logDebug("Task length threshold for speculation: " + threshold)
-      for ((tid, info) <- taskInfos) {
+      for (tid <- runningTasksSet) {
+        val info = taskInfos(tid)
--- End diff --

@kayousterhout 
Thanks a lot for your comments :)
I will keep this simple change in this PR. For the time complexity improvement, 
I will make another PR and try to add some measurements.
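The speculation check in the diff reads the median at index `min((0.5 * tasksSuccessful).round.toInt, durations.length - 1)`, which is only a median if `durations` is sorted; the removed line did that explicitly with `Arrays.sort`. Below is a standalone sketch of the threshold logic with hypothetical names. Running tasks are modeled as a `Map` from task id to launch time in milliseconds, which is an assumption; the real code reads `TaskInfo` objects.

```scala
object SpeculationThresholdSketch {
  // Median lookup as written in the diff; assumes `sortedDurations` is ascending.
  def medianDuration(sortedDurations: Array[Long], tasksSuccessful: Int): Long =
    sortedDurations(math.min((0.5 * tasksSuccessful).round.toInt, sortedDurations.length - 1))

  // threshold = max(multiplier * median, minimum time before speculating)
  def threshold(median: Long, multiplier: Double, minTimeToSpeculation: Long): Double =
    math.max(multiplier * median, minTimeToSpeculation.toDouble)

  // Hypothetical: running tasks as taskId -> launchTime (ms). Returns the ids of
  // tasks that have been running longer than the threshold at time `now`.
  def speculatable(running: Map[Long, Long], now: Long, threshold: Double): Set[Long] =
    running.collect { case (tid, launchTime) if (now - launchTime) > threshold => tid }.toSet
}
```

Iterating only over running tasks (as the `runningTasksSet` change does) keeps the final loop proportional to the number of running tasks rather than to every task ever launched in the set.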





[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-02-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/16867#discussion_r103390199
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -911,14 +916,14 @@ private[spark] class TaskSetManager(
     logDebug("Checking for speculative tasks: minFinished = " + minFinishedForSpeculation)
     if (tasksSuccessful >= minFinishedForSpeculation && tasksSuccessful > 0) {
       val time = clock.getTimeMillis()
-      val durations = taskInfos.values.filter(_.successful).map(_.duration).toArray
-      Arrays.sort(durations)
+      val durations = successfulTasksSet.toArray.map(taskInfos(_).duration)
       val medianDuration = durations(min((0.5 * tasksSuccessful).round.toInt, durations.length - 1))
       val threshold = max(SPECULATION_MULTIPLIER * medianDuration, minTimeToSpeculation)
       // TODO: Threshold should also look at standard deviation of task durations and have a lower
       // bound based on that.
       logDebug("Task length threshold for speculation: " + threshold)
-      for ((tid, info) <- taskInfos) {
+      for (tid <- runningTasksSet) {
+        val info = taskInfos(tid)
--- End diff --

Echoing what Imran said -- I'm definitely +1 on merging this simple change. 
 The other changes in this PR add a bunch of complexity, so I'd need to see 
measurements demonstrating a significant improvement in performance to be 
convinced that we should merge them.





[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/17090
  
Is this the same as https://github.com/apache/spark/pull/12574?





[GitHub] spark pull request #17064: [SPARK-19736][SQL] refreshByPath should clear all...

2017-02-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17064#discussion_r103390013
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -168,15 +168,16 @@ class CacheManager extends Logging {
       (fs, path.makeQualified(fs.getUri, fs.getWorkingDirectory))
     }
 
-    cachedData.foreach {
-      case data if data.plan.find(lookupAndRefresh(_, fs, qualifiedPath)).isDefined =>
-        val dataIndex = cachedData.indexWhere(cd => data.plan.sameResult(cd.plan))
-        if (dataIndex >= 0) {
-          data.cachedRepresentation.cachedColumnBuffers.unpersist(blocking = true)
-          cachedData.remove(dataIndex)
-        }
-        sparkSession.sharedState.cacheManager.cacheQuery(Dataset.ofRows(sparkSession, data.plan))
-      case _ => // Do Nothing
+    cachedData.filter {
--- End diff --

After `filter`, we iterate on a different collection than `cachedData`, so 
it is no problem to add/delete elements to `cachedData`.
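A minimal sketch of the snapshot-then-mutate pattern, using the same `ArrayBuffer` setup as the earlier example in this thread: because `filter` materializes a new buffer, removing and appending on `buf` inside the loop no longer shifts the elements the loop visits, so both `1` and `2` are processed. `SnapshotThenMutate.refresh` is a hypothetical helper, not `CacheManager` code.

```scala
import scala.collection.mutable.ArrayBuffer

object SnapshotThenMutate {
  // Replaces every element matching `shouldRefresh` with `rebuild(element)`.
  def refresh(buf: ArrayBuffer[Int], shouldRefresh: Int => Boolean, rebuild: Int => Int): Unit = {
    // `filter` returns a new collection, so the removals and appends on `buf`
    // below cannot change which elements this loop visits.
    buf.filter(shouldRefresh).foreach { old =>
      val index = buf.indexWhere(_ == old)
      if (index >= 0) buf.remove(index)
      buf += rebuild(old)
    }
  }
}
```

Starting from `ArrayBuffer(1, 2)` with `_ > 0` and `_ + 2`, the buffer ends as `ArrayBuffer(3, 4)`: unlike the `foreach` version, `2` is not skipped.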





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16867
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73563/
Test PASSed.





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16867
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16867
  
**[Test build #73563 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73563/testReport)**
 for PR 16867 at commit 
[`cd16008`](https://github.com/apache/spark/commit/cd16008cc857ca38b4f22861625b6ebb9f82a066).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

2017-02-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13036
  
I am happy to do so. I assume it is already almost done except for 
https://github.com/apache/spark/pull/13036#discussion_r84476560?





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16867
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16867
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73562/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16867
  
**[Test build #73562 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73562/testReport)**
 for PR 16867 at commit 
[`f28d900`](https://github.com/apache/spark/commit/f28d90055f5a9db8c5de8a27b5c39a119cc5c670).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17015#discussion_r103387597
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -40,38 +38,24 @@ case class AnalyzeColumnCommand(
 val sessionState = sparkSession.sessionState
 val db = 
tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
 val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
-val relation =
-  
EliminateSubqueryAliases(sparkSession.table(tableIdentWithDB).queryExecution.analyzed)
-
-// Compute total size
-val (catalogTable: CatalogTable, sizeInBytes: Long) = relation match {
-  case catalogRel: CatalogRelation =>
-// This is a Hive serde format table
-(catalogRel.catalogTable,
-  AnalyzeTableCommand.calculateTotalSize(sessionState, 
catalogRel.catalogTable))
-
-  case logicalRel: LogicalRelation if 
logicalRel.catalogTable.isDefined =>
-// This is a data source format table
-(logicalRel.catalogTable.get,
-  AnalyzeTableCommand.calculateTotalSize(sessionState, 
logicalRel.catalogTable.get))
-
-  case otherRelation =>
-throw new AnalysisException("ANALYZE TABLE is not supported for " +
-  s"${otherRelation.nodeName}.")
+val tableMeta = sessionState.catalog.getTableMetadata(tableIdentWithDB)
+if (tableMeta.tableType == CatalogTableType.VIEW) {
+  throw new AnalysisException("ANALYZE TABLE is not supported on views.")
 }
+val sizeInBytes = AnalyzeTableCommand.calculateTotalSize(sessionState, tableMeta)
 
 // Compute stats for each column
 val (rowCount, newColStats) =
-  AnalyzeColumnCommand.computeColumnStats(sparkSession, tableIdent.table, relation, columnNames)
+  AnalyzeColumnCommand.computeColumnStats(sparkSession, tableIdentWithDB, columnNames)
--- End diff --

`object AnalyzeColumnCommand` is not needed. We can move `computeColumnStats` into the `case class AnalyzeColumnCommand`.
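A minimal sketch of the suggested shape, assuming hypothetical simplified signatures (the real command works on a `SparkSession` and a resolved plan, not on in-memory rows): once `computeColumnStats` is a member of the case class, the companion `object` can go away, and the helper can be `private` if no test calls it directly.

```scala
// Hypothetical, simplified stand-in for AnalyzeColumnCommand; not Spark's real API.
final case class AnalyzeColumnCommandSketch(table: String, columnNames: Seq[String]) {

  // Entry point: returns (row count, per-column statistic).
  def run(rows: Seq[Map[String, Any]]): (Long, Map[String, Long]) =
    computeColumnStats(rows)

  // Formerly on the companion object; now a private member of the case class.
  // The "statistic" here is just the number of non-null values per column.
  private def computeColumnStats(rows: Seq[Map[String, Any]]): (Long, Map[String, Long]) = {
    val rowCount = rows.size.toLong
    val nonNullCounts = columnNames.map { col =>
      col -> rows.count(row => row.get(col).exists(_ != null)).toLong
    }.toMap
    (rowCount, nonNullCounts)
  }
}
```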



[GitHub] spark pull request #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17015#discussion_r103387529
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -90,10 +74,10 @@ object AnalyzeColumnCommand extends Logging {
*/
   def computeColumnStats(
--- End diff --

Now that this is no longer used by tests, we can mark it as private.



[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16929
  
**[Test build #73574 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73574/testReport)**
 for PR 16929 at commit 
[`0c088bf`](https://github.com/apache/spark/commit/0c088bfc9469f5dc546f4d153ada609ad3b0b6ef).



[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

2017-02-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16929
  
@brkyvz, @marmbrus I think this is ready for another look. Could you check whether I understood your comments correctly?



[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...

2017-02-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16929#discussion_r103386481
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -480,36 +480,79 @@ case class JsonTuple(children: Seq[Expression])
 }
 
 /**
- * Converts an json input string to a [[StructType]] with the specified schema.
+ * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
  */
 case class JsonToStruct(
-schema: StructType,
+schema: DataType,
 options: Map[String, String],
 child: Expression,
 timeZoneId: Option[String] = None)
   extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {
   override def nullable: Boolean = true
 
-  def this(schema: StructType, options: Map[String, String], child: Expression) =
+  def this(schema: DataType, options: Map[String, String], child: Expression) =
 this(schema, options, child, None)
 
+  override def checkInputDataTypes(): TypeCheckResult = schema match {
--- End diff --

I tried several combinations with `TypeCollection`, but none of them seemed to work.



[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16929
  
**[Test build #73573 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73573/testReport)**
 for PR 16929 at commit 
[`9f1e966`](https://github.com/apache/spark/commit/9f1e96637cd6d67db0b5811daf2b33a9f49980a5).



[GitHub] spark issue #17095: [SPARK-19763][SQL]qualified external datasource table lo...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17095
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17095: [SPARK-19763][SQL]qualified external datasource table lo...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17095
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73565/
Test FAILed.



[GitHub] spark issue #17095: [SPARK-19763][SQL]qualified external datasource table lo...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17095
  
**[Test build #73565 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73565/testReport)**
 for PR 17095 at commit 
[`55c525e`](https://github.com/apache/spark/commit/55c525efe29002138369ade386da6ec7a268d54a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17089: [SPARK-19756][SQL] drop the table cache after inserting ...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17089
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...

2017-02-27 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17052#discussion_r103384682
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -989,7 +989,8 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
   withSQLConf(
 SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL.key -> "2",
 // Force deleting the old logs
-SQLConf.FILE_SOURCE_LOG_CLEANUP_DELAY.key -> "1"
+SQLConf.FILE_SOURCE_LOG_CLEANUP_DELAY.key -> "1",
+SQLConf.UNSUPPORTED_OPERATION_CHECK_ENABLED.key -> "false"
   ) {
--- End diff --

Disable `UNSUPPORTED_OPERATION_CHECK_ENABLED` here, because `Source.getBatch` returns a DataFrame whose `isStreaming` is true.



[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...

2017-02-27 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17052#discussion_r103385145
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -231,8 +231,9 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
   case EventTimeWatermark(columnName, delay, child) =>
 EventTimeWatermarkExec(columnName, delay, planLater(child)) :: Nil
 
-  case PhysicalAggregation(
-namedGroupingExpressions, aggregateExpressions, rewrittenResultExpressions, child) =>
+  case agg @ PhysicalAggregation(
+namedGroupingExpressions, aggregateExpressions, rewrittenResultExpressions, child)
+if agg.isStreaming =>
 
--- End diff --

Apply this strategy only if the logical plan is streaming.



[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16929
  
**[Test build #73572 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73572/testReport)**
 for PR 16929 at commit 
[`54e60bb`](https://github.com/apache/spark/commit/54e60bb149cd882c21856f19df0cf375c3ca3b20).



[GitHub] spark issue #17089: [SPARK-19756][SQL] drop the table cache after inserting ...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17089
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73564/
Test FAILed.



[GitHub] spark issue #17089: [SPARK-19756][SQL] drop the table cache after inserting ...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17089
  
**[Test build #73564 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73564/testReport)**
 for PR 17089 at commit 
[`8bca8d3`](https://github.com/apache/spark/commit/8bca8d35e04e582f73052411e42811a8c90329de).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...

2017-02-27 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17052#discussion_r103385054
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1119,11 +1119,16 @@ case class DecimalAggregates(conf: CatalystConf) extends Rule[LogicalPlan] {
  */
 object ConvertToLocalRelation extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-case Project(projectList, LocalRelation(output, data))
+case Project(projectList, lr @ LocalRelation(output, data))
 if !projectList.exists(hasUnevaluableExpr) =>
   val projection = new InterpretedProjection(projectList, output)
   projection.initialize(0)
-  LocalRelation(projectList.map(_.toAttribute), data.map(projection))
+  if (lr.isStreaming) {
+LocalRelation(projectList.map(_.toAttribute), data.map(projection))
+  .setIncremental()
+  } else {
+LocalRelation(projectList.map(_.toAttribute), data.map(projection))
+  }
   }
--- End diff --

In a streaming query, we transform the stream source into a batch `LocalRelation` whose `isStreaming` is true, so this rule should keep `isStreaming` true on the new `LocalRelation`.
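The point about preserving the flag can be illustrated with a toy model. The `Relation` class and `projectRule` function below are hypothetical and are not Catalyst's `LocalRelation` or `Rule` API; the sketch only shows that when a rule rebuilds a node, it has to carry the streaming flag over from the node it replaces, otherwise a streaming plan silently turns into a batch plan.

```scala
// Hypothetical toy model of a local relation; not Spark's LocalRelation.
final case class Relation(output: Seq[String], data: Seq[Seq[Int]], isStreaming: Boolean = false)

object ProjectRuleSketch {
  // Eagerly evaluates a projection (by column index) over the local data.
  // Using `copy` keeps every field not listed here, including isStreaming.
  def projectRule(columns: Seq[Int], input: Relation): Relation =
    input.copy(
      output = columns.map(input.output),
      data = input.data.map(row => columns.map(row))
    )
}
```

In this toy model, `copy` makes the flag survive without separate streaming and batch branches; whether the same approach fits Catalyst's `LocalRelation` is not something this sketch establishes.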



[GitHub] spark pull request #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-...

2017-02-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17056#discussion_r103384950
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -371,6 +370,48 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
 new StructType().add("array", arrayOfString).add("map", mapOfString))
   .add("structOfUDT", structOfUDT))
 
+  test("hive-hash for decimal") {
+def checkHiveHashForDecimal(
+input: String,
+precision: Int,
+scale: Int,
+expected: Long): Unit = {
+  val decimal = Decimal.apply(new java.math.BigDecimal(input))
+  decimal.changePrecision(precision, scale)
+  val decimalType = DataTypes.createDecimalType(precision, scale)
+  checkHiveHash(decimal, decimalType, expected)
+}
+
+checkHiveHashForDecimal("18", 38, 0, 558)
+checkHiveHashForDecimal("-18", 38, 0, -558)
+checkHiveHashForDecimal("-18", 38, 12, -558)
+checkHiveHashForDecimal("18446744073709001000", 38, 19, -17070057)
--- End diff --

The main reason not all of them match is the difference in how scale and precision are enforced in Hive vs. Spark.

Hive uses its own custom logic: 
https://github.com/apache/hive/blob/branch-1.2/common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java#L274

Spark has its own way: 
https://github.com/apache/spark/blob/0e2405490f2056728d1353abbac6f3ea177ae533/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala#L230

Now, when one does `CAST(-18446744073709001000BD AS DECIMAL(38,19))`, the value does NOT fit in Hive's range, so Hive converts it to `null`, and `HASH()` over `null` returns 0.

In Spark, `CAST(-18446744073709001000BD AS DECIMAL(38,19))` is valid, so running `HASH()` over it gives a non-zero result.
 
TL;DR: this difference arises before the hashing function comes into the picture. Keeping the two in sync would mean matching the semantics of Decimal in Spark to those in Hive. I don't think it's a good idea to embark on that, as it would be a breaking change, and this PR is not a strong enough reason to push for it. Hive-hash is best effort.
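The precision arithmetic behind the `DECIMAL(38,19)` example can be checked with `java.math.BigDecimal`: the value has 20 integer digits, and 20 + 19 fractional digits = 39, one more than the maximum precision of 38. The enforcement rule in this sketch is an assumption (a simplified paraphrase of "reject if the integer part needs more than `precision - scale` digits, returning NULL"), not a port of `HiveDecimal`'s code.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DecimalFitSketch {
  // Digits to the left of the decimal point (at least 1, e.g. for 0.5).
  // The sign does not count, so a value and its negation give the same result.
  def integerDigits(v: JBigDecimal): Int = math.max(v.precision - v.scale, 1)

  // Simplified Hive-style enforcement (an assumption, not HiveDecimal's code):
  // round to the target scale, then reject (None, i.e. SQL NULL) if the
  // integer part needs more than `precision - scale` digits.
  def enforceHiveStyle(v: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] = {
    val rounded = v.setScale(scale, RoundingMode.HALF_UP)
    if (integerDigits(rounded) > precision - scale) None else Some(rounded)
  }

  def main(args: Array[String]): Unit = {
    val big = new JBigDecimal("-18446744073709001000")
    println(s"integer digits: ${integerDigits(big)}")
    println(s"fits DECIMAL(38,19): ${enforceHiveStyle(big, 38, 19).isDefined}")
  }
}
```

Under this assumed rule the value maps to NULL on the Hive side before any hashing happens, which is consistent with the explanation above.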



[GitHub] spark pull request #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-...

2017-02-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17056#discussion_r103384029
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -371,6 +370,48 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
 new StructType().add("array", arrayOfString).add("map", mapOfString))
   .add("structOfUDT", structOfUDT))
 
+  test("hive-hash for decimal") {
+def checkHiveHashForDecimal(
+input: String,
+precision: Int,
+scale: Int,
+expected: Long): Unit = {
+  val decimal = Decimal.apply(new java.math.BigDecimal(input))
+  decimal.changePrecision(precision, scale)
+  val decimalType = DataTypes.createDecimalType(precision, scale)
+  checkHiveHash(decimal, decimalType, expected)
+}
+
+checkHiveHashForDecimal("18", 38, 0, 558)
--- End diff --

These were generated with Hive 1.2.1.



[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17052
  
**[Test build #73571 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73571/testReport)**
 for PR 17052 at commit 
[`9ffbad2`](https://github.com/apache/spark/commit/9ffbad20f45cc4c001059be0203ccab887316b18).



[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...

2017-02-27 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/16954#discussion_r103384302
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1398,42 +1399,46 @@ class Analyzer(
 }
   } while (!current.resolved && !current.fastEquals(previous))
 
-  // Step 2: Pull out the predicates if the plan is resolved.
+  // Step 2: pull the outer references and record them as children of SubqueryExpression
   if (current.resolved) {
 // Make sure the resolved query has the required number of output columns. This is only
 // needed for Scalar and IN subqueries.
 if (requiredColumns > 0 && requiredColumns != current.output.size) {
   failAnalysis(s"The number of columns in the subquery (${current.output.size}) " +
 s"does not match the required number of columns ($requiredColumns)")
 }
-// Pullout predicates and construct a new plan.
-f.tupled(rewriteSubQuery(current, plans))
+// Validate the outer reference and record the outer references as children of
+// subquery expression.
+f.tupled(current, checkAndGetOuterReferences(current))
--- End diff --

@hvanhovell Can we remove `tupled` and just call `f(current, ...)`?
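`Function2.tupled` turns a two-argument function into one that takes a single `(A, B)` tuple, so a call like `f.tupled(current, refs)` only compiles because Scala 2 auto-tuples the two arguments into one. A minimal sketch with a hypothetical string-based `f` (not the Analyzer's actual function) shows that calling `f` directly is equivalent:

```scala
object TupledSketch {
  // Hypothetical stand-in for the two-argument `f` used in the subquery resolution step.
  val f: (String, Seq[String]) => String =
    (plan, outerRefs) => s"$plan[outer=${outerRefs.mkString(",")}]"

  // f.tupled has type ((String, Seq[String])) => String: it takes ONE tuple argument.
  def viaTupled(plan: String, refs: Seq[String]): String = f.tupled((plan, refs))

  // Calling f directly with two arguments gives the same result without building a tuple.
  def direct(plan: String, refs: Seq[String]): String = f(plan, refs)

  def main(args: Array[String]): Unit = {
    println(viaTupled("Project", Seq("a", "b")))
    println(direct("Project", Seq("a", "b")))
  }
}
```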



[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16929
  
**[Test build #73570 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73570/testReport)**
 for PR 16929 at commit 
[`72d6410`](https://github.com/apache/spark/commit/72d641018635aae94cc89e216e30540233d461f4).



[GitHub] spark pull request #16938: [SPARK-19583][SQL]CTAS for data source table with...

2017-02-27 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16938#discussion_r103384037
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -140,8 +140,8 @@ case class CreateDataSourceTableAsSelectCommand(
 return Seq.empty
   }
 
-  saveDataIntoTable(
-sparkSession, table, table.storage.locationUri, query, mode, tableExists = true)
+  saveDataIntoTable(sparkSession, table, table.storage.locationUri, query, mode,
--- End diff --

ping @cloud-fan ~



[GitHub] spark issue #17055: [SPARK-19723][SQL]create datasource table with an non-ex...

2017-02-27 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/17055
  
ping @gatorsmile 



[GitHub] spark pull request #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17015#discussion_r103383919
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -91,18 +97,58 @@ class ResolveHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] {
 
   // Infers the schema, if empty, because the schema could be determined by Hive
   // serde.
-  val catalogTable = if (query.isEmpty) {
-val withSchema = HiveUtils.inferSchema(withStorage)
-if (withSchema.schema.length <= 0) {
+  val withSchema = if (query.isEmpty) {
+val inferred = HiveUtils.inferSchema(withStorage)
+if (inferred.schema.length <= 0) {
   throw new AnalysisException("Unable to infer the schema. " +
-s"The schema specification is required to create the table ${withSchema.identifier}.")
+s"The schema specification is required to create the table ${inferred.identifier}.")
 }
-withSchema
+inferred
   } else {
 withStorage
   }
 
-  c.copy(tableDesc = catalogTable)
+  c.copy(tableDesc = withSchema)
+  }
+}
+
+class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+case relation: CatalogRelation
+if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
+  val table = relation.tableMeta
+  // TODO: check if this estimate is valid for tables after partition pruning.
+  // NOTE: getting `totalSize` directly from params is kind of hacky, but this should be
+  // relatively cheap if parameters for the table are populated into the metastore.
+  // Besides `totalSize`, there are also `numFiles`, `numRows`, `rawDataSize` keys
+  // (see StatsSetupConst in Hive) that we can look at in the future.
+  // When table is external,`totalSize` is always zero, which will influence join strategy
+  // so when `totalSize` is zero, use `rawDataSize` instead
+  // when `rawDataSize` is also zero, use `HiveExternalCatalog.STATISTICS_TOTAL_SIZE`,
--- End diff --

I think this is outdated.



[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/10307
  
**[Test build #73569 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73569/testReport)**
 for PR 10307 at commit 
[`e425438`](https://github.com/apache/spark/commit/e4254389a46e297bf89a45a07d85cd565ba6343e).



[GitHub] spark pull request #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17015#discussion_r103383279
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -349,36 +350,41 @@ object CatalogTypes {
 
 
 /**
- * An interface that is implemented by logical plans to return the underlying catalog table.
- * If we can in the future consolidate SimpleCatalogRelation and MetastoreRelation, we should
- * probably remove this interface.
+ * A [[LogicalPlan]] that represents a table.
  */
-trait CatalogRelation {
-  def catalogTable: CatalogTable
-  def output: Seq[Attribute]
-}
+case class CatalogRelation(
+tableMeta: CatalogTable,
+dataCols: Seq[Attribute],
+partitionCols: Seq[Attribute]) extends LeafNode with MultiInstanceRelation {
+  assert(tableMeta.identifier.database.isDefined)
+  assert(tableMeta.partitionSchema.sameType(partitionCols.toStructType))
+  assert(tableMeta.dataSchema.sameType(dataCols.toStructType))
+
+  // The partition column should always appear after data columns.
+  override def output: Seq[Attribute] = dataCols ++ partitionCols
+
+  def isPartitioned: Boolean = partitionCols.nonEmpty
+
+  override def equals(relation: Any): Boolean = relation match {
+case other: CatalogRelation => tableMeta == other.tableMeta && output == other.output
+case _ => false
+  }
 
+  override def hashCode(): Int = {
+Objects.hashCode(tableMeta.identifier, output)
+  }
 
-/**
- * A [[LogicalPlan]] that wraps [[CatalogTable]].
- *
- * Note that in the future we should consolidate this and HiveCatalogRelation.
- */
-case class SimpleCatalogRelation(
-metadata: CatalogTable)
-  extends LeafNode with CatalogRelation {
-
-  override def catalogTable: CatalogTable = metadata
-
-  override lazy val resolved: Boolean = false
-
-  override val output: Seq[Attribute] = {
-val (partCols, dataCols) = metadata.schema.toAttributes
-  // Since data can be dumped in randomly with no validation, everything is nullable.
-  .map(_.withNullability(true).withQualifier(Some(metadata.identifier.table)))
-  .partition { a =>
-metadata.partitionColumnNames.contains(a.name)
-  }
-dataCols ++ partCols
+  /** Only compare table identifier. */
+  override lazy val cleanArgs: Seq[Any] = Seq(tableMeta.identifier)
+
+  override def computeStats(conf: CatalystConf): Statistics = {
+// For data source tables, we will create a `LogicalRelation` and 
won't call this method, for
+// hive serde tables, we will always generate a statistics.
+// TODO: unify the table stats generation.
+tableMeta.stats.map(_.toPlanStats(output)).get
--- End diff --

Yeah, the value should always be filled in by `DetermineTableStats`, but maybe we can still throw an exception when it is `None`?
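A minimal sketch of that suggestion, assuming simplified, hypothetical stand-ins (`TableMeta`, `PlanStats`, `StatsSketch`) for Catalyst's `CatalogTable` and `Statistics`: calling `.get` on `None` only throws `NoSuchElementException("None.get")`, whereas `getOrElse { throw ... }` fails with a message that names the rule expected to populate the stats.

```scala
// Hypothetical, simplified stand-ins for the Catalyst types in the diff above;
// these are NOT Spark's real CatalogTable / Statistics classes.
final case class PlanStats(sizeInBytes: BigInt)
final case class TableMeta(name: String, stats: Option[PlanStats])

object StatsSketch {
  // Instead of `meta.stats.get`, which fails with an uninformative
  // NoSuchElementException("None.get"), raise an explicit error that says
  // which invariant was violated.
  def computeStats(meta: TableMeta): PlanStats =
    meta.stats.getOrElse {
      throw new IllegalStateException(
        s"Statistics for table ${meta.name} should have been filled in by DetermineTableStats")
    }
}
```

Whether an `IllegalStateException` or an assertion fits better is a judgment call; the point is only that the failure explains why the value was expected to be present.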



[GitHub] spark issue #17079: [SPARK-19748][SQL]refresh function has a wrong order to ...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73560/
Test PASSed.



[GitHub] spark issue #17079: [SPARK-19748][SQL]refresh function has a wrong order to ...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17079
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17079: [SPARK-19748][SQL]refresh function has a wrong order to ...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17079
  
**[Test build #73560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73560/testReport)** for PR 17079 at commit [`94879a8`](https://github.com/apache/spark/commit/94879a8528a3d50cf67c8952b05ac9e408f5ecdd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17052
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17052
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73558/
Test PASSed.



[GitHub] spark issue #17093: [SPARK-19761][SQL]create InMemoryFileIndex with an empty...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17093
  
**[Test build #73568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73568/testReport)** for PR 17093 at commit [`a3ac29b`](https://github.com/apache/spark/commit/a3ac29bede0f54faa6707616fb94fc261cfaaf2b).



[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17052
  
**[Test build #73558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73558/testReport)** for PR 17052 at commit [`c87651a`](https://github.com/apache/spark/commit/c87651abaf4af9eea1b292495fb0708dd0265274).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/10307
  
**[Test build #73567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73567/testReport)** for PR 10307 at commit [`401f682`](https://github.com/apache/spark/commit/401f6829dcadb6d0f2ce51c99520cc55dbc28995).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/10307
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73567/
Test FAILed.



[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/10307
  
Merged build finished. Test FAILed.



[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/10307
  
**[Test build #73567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73567/testReport)** for PR 10307 at commit [`401f682`](https://github.com/apache/spark/commit/401f6829dcadb6d0f2ce51c99520cc55dbc28995).



[GitHub] spark issue #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified column...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17067
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17090
  
cc @MLnick



[GitHub] spark issue #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified column...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17067
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73561/
Test FAILed.



[GitHub] spark issue #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified column...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17067
  
**[Test build #73561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73561/testReport)** for PR 17067 at commit [`e4f347e`](https://github.com/apache/spark/commit/e4f347e648efba81c1be1ff679d7dd88967d408b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

2017-02-27 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/13036
  
OK, let's see if @zero323 or @HyukjinKwon are interested in taking this over. Otherwise, I'll add this to my backlog.



[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

2017-02-27 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/13036
  
@holdenk, please feel free to take this over. I can't find time to work on it.



[GitHub] spark pull request #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods sho...

2017-02-27 Thread sethah
Github user sethah closed the pull request at:

https://github.com/apache/spark/pull/13036



[GitHub] spark issue #17047: [SPARK-19720][SPARK SUBMIT] Redact sensitive information...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17047
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified...

2017-02-27 Thread skambha
Github user skambha commented on a diff in the pull request:

https://github.com/apache/spark/pull/17067#discussion_r103379909
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ColumnResolutionSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.spark.sql.{AnalysisException, QueryTest, Row}
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SQLTestUtils
+
+class ColumnResolutionSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
--- End diff --

The logic to resolve the column in the LogicalPlan is the same; there is no change there. I wanted to test the Hive table to make sure that the qualifier information is set correctly. We update the qualifier info in MetastoreRelation, so I wanted to have coverage for Hive tables.



[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17015
  
**[Test build #73566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73566/testReport)** for PR 17015 at commit [`d10bfbc`](https://github.com/apache/spark/commit/d10bfbc71ac30b74222f6794edb5c62ad562f3e5).



[GitHub] spark issue #17047: [SPARK-19720][SPARK SUBMIT] Redact sensitive information...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17047
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73554/
Test PASSed.



[GitHub] spark issue #17047: [SPARK-19720][SPARK SUBMIT] Redact sensitive information...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17047
  
**[Test build #73554 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73554/testReport)** for PR 17047 at commit [`7753998`](https://github.com/apache/spark/commit/7753998f0a21073a05897b8945c8e61a1fe4fc84).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17015: [SPARK-19678][SQL] remove MetastoreRelation

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17015
  
retest this please



[GitHub] spark pull request #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified...

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17067#discussion_r103378842
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala ---
@@ -52,6 +52,19 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
     }
   }
 
+  test("column resolution scenarios with local temp view") {
+    val df = Seq(2).toDF("i1")
+    df.createOrReplaceTempView("table1")
+    withTempView("table1") {
+      checkAnswer(spark.sql("SELECT table1.* FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT * FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT i1 FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT table1.i1 FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT a.i1 FROM table1 AS a"), Row(2))
+      checkAnswer(spark.sql("SELECT i1 FROM table1 AS a"), Row(2))
--- End diff --

I think this is also doable for global temporary views.



[GitHub] spark pull request #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified...

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17067#discussion_r103378807
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/columnresolution.sql ---
@@ -0,0 +1,82 @@
+-- Scenario: column resolution scenarios with datasource table
+CREATE DATABASE mydb1;
+use mydb1;
--- End diff --

Please use upper case for SQL keywords.



[GitHub] spark pull request #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified...

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17067#discussion_r103378740
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ColumnResolutionSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.spark.sql.{AnalysisException, QueryTest, Row}
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SQLTestUtils
+
+class ColumnResolutionSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
--- End diff --

For the test cases you want to keep here, you can move them to `sql/core`. Why do we need to test Hive serde tables? Compared with data source tables, do they touch different code paths to resolve columns?



[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...

2017-02-27 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/17059
  
@datumbox I like the changes. I just have a minor concern about the code where we call `v.intValue` and then compare it to `v.doubleValue`: due to precision issues, I'm not sure this is desirable, since the data could come from any source and may have been modified slightly so that it is no longer a valid Int, and the previous code does not do this check.
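To make the concern concrete, here is a minimal sketch contrasting two hypothetical conversions (`IntConversionSketch`, `strictToInt` and `lenientToInt` are illustrative names, not the PR's code). The strict one accepts a `Number` only if `doubleValue == intValue`; the lenient one is a range-only check that truncates fractions, assumed here to approximate the earlier behavior. A value such as `3.0000001` that has drifted slightly from a whole number is rejected by the first and silently truncated to 3 by the second.

```scala
import scala.util.{Failure, Success, Try}

object IntConversionSketch {
  // Strict: intValue truncates fractions and saturates (Double) or wraps (Long)
  // out-of-range values, so comparing it back to doubleValue rejects both
  // fractional and out-of-range inputs.
  def strictToInt(v: Number): Try[Int] = {
    val intV = v.intValue
    if (v.doubleValue == intV) Success(intV)
    else Failure(new IllegalArgumentException(s"$v is outside the Int range or has a fractional part"))
  }

  // Lenient: only reject values outside the Int range; fractions are truncated.
  def lenientToInt(v: Number): Try[Int] = {
    val d = v.doubleValue
    if (d > Int.MaxValue || d < Int.MinValue)
      Failure(new IllegalArgumentException(s"$v is outside the Int range"))
    else Success(d.toInt)
  }
}
```

Both variants reject out-of-range values; they differ only on inputs that are not exact whole numbers, which is the trade-off raised above.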



[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17052
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17052
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73559/
Test FAILed.



[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17052
  
**[Test build #73559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73559/testReport)** for PR 17052 at commit [`59f4272`](https://github.com/apache/spark/commit/59f4272ee97b77bc1aaedb7daf63acf1b417d58e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #17089: [SPARK-19756][SQL] drop the table cache after ins...

2017-02-27 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17089#discussion_r103378494
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
@@ -148,6 +148,9 @@ case class InsertIntoHadoopFsRelationCommand(
 options = options)
 
   fileIndex.foreach(_.refresh())
+  catalogTable.foreach { table =>
+    sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(table.identifier))
--- End diff --

Is it recached? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L154



[GitHub] spark issue #17095: [SPARK-19763][SQL]qualified external datasource table lo...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17095
  
**[Test build #73565 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73565/testReport)**
 for PR 17095 at commit 
[`55c525e`](https://github.com/apache/spark/commit/55c525efe29002138369ade386da6ec7a268d54a).





[GitHub] spark pull request #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified...

2017-02-27 Thread skambha
Github user skambha commented on a diff in the pull request:

https://github.com/apache/spark/pull/17067#discussion_r103378392
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala ---
@@ -52,6 +52,19 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
     }
   }
 
+  test("column resolution scenarios with local temp view") {
+    val df = Seq(2).toDF("i1")
+    df.createOrReplaceTempView("table1")
+    withTempView("table1") {
+      checkAnswer(spark.sql("SELECT table1.* FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT * FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT i1 FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT table1.i1 FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT a.i1 FROM table1 AS a"), Row(2))
+      checkAnswer(spark.sql("SELECT i1 FROM table1 AS a"), Row(2))
--- End diff --

Sure, let me look at converting these too. Thanks.
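To make the scenarios in this test concrete, here is a toy model of qualified column resolution in plain Scala. It is a hypothetical sketch, not Spark's analyzer: `Attr` and `ToyResolver` are invented names, and the only rule modeled is that a column can be referenced by its bare name or by its relation's qualifier, where the qualifier is the alias if one was given and the table name otherwise.

```scala
// Hypothetical toy model of qualified column resolution (not Spark's analyzer).
final case class Attr(qualifier: String, name: String)

object ToyResolver {
  // Columns exposed by `FROM <table> [AS <alias>]`: the alias, if present,
  // replaces the table name as the qualifier.
  def output(table: String, alias: Option[String], cols: Seq[String]): Seq[Attr] =
    cols.map(c => Attr(alias.getOrElse(table), c))

  // Resolves "col" or "qualifier.col" case-insensitively.
  def resolve(ref: String, attrs: Seq[Attr]): Either[String, Attr] = {
    val candidates = ref.split('.').toList match {
      case col :: Nil =>
        attrs.filter(_.name.equalsIgnoreCase(col))
      case q :: col :: Nil =>
        attrs.filter(a => a.qualifier.equalsIgnoreCase(q) && a.name.equalsIgnoreCase(col))
      case _ =>
        Nil
    }
    candidates match {
      case Seq(only) => Right(only)
      case Seq()     => Left(s"cannot resolve '$ref'")
      case _         => Left(s"reference '$ref' is ambiguous")
    }
  }
}
```

In this toy model `a.i1` and `i1` resolve against `table1 AS a`, while `table1.i1` does not; cases like that, plus ambiguous bare names across two relations, are the kind of negative scenarios worth keeping in a separate suite.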





[GitHub] spark pull request #17059: [SPARK-19733][ML]Removed unnecessary castings and...

2017-02-27 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17059#discussion_r103378267
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
@@ -82,12 +82,20 @@ private[recommendation] trait ALSModelParams extends Params with HasPredictionCo
* Attempts to safely cast a user/item id to an Int. Throws an exception if the value is
* out of integer range.
*/
-  protected val checkedCast = udf { (n: Double) =>
-    if (n > Int.MaxValue || n < Int.MinValue) {
-      throw new IllegalArgumentException(s"ALS only supports values in Integer range for columns " +
-        s"${$(userCol)} and ${$(itemCol)}. Value $n was out of Integer range.")
-    } else {
-      n.toInt
+  protected val checkedCast = udf { (n: Any) =>
+    n match {
+      case v: Int => v // Avoid unnecessary casting
+      case v: Number =>
+        val intV = v.intValue()
+        // Checks if number within Int range and has no fractional part.
+        if (v.doubleValue == intV) {
--- End diff --

Sorry, I'm not sure this is a good idea because of floating-point 
precision... The original code above doesn't do this check; it just calls toInt. 
However, if this is absolutely necessary, I would hope that we could give 
the user some way to specify the Int range or precision. Also, if we are going 
to go ahead with this change, we should add some tests verifying that the 
exception is thrown. But without some way to specify the precision, 
I'm not sure this is the correct thing to do (?).
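A standalone sketch of the stricter check may help weigh the precision concern. `CheckedCast.toIntId` is a hypothetical helper, not the ALS UDF: it passes an `Int` through unchanged and accepts any other `java.lang.Number` only if its `doubleValue` lies in Int range and has no fractional part. It makes the trade-off visible: a `Double` such as `3.0000000001` is rejected instead of being truncated to `3`.

```scala
// Hypothetical standalone version of the stricter check (not the ALS UDF).
object CheckedCast {
  def toIntId(n: Any): Int = n match {
    case v: Int => v // already an Int: no conversion needed
    case v: Number =>
      val d = v.doubleValue
      // Number.intValue() may truncate or wrap out-of-range values, so the
      // range and integrality checks are done on the double instead.
      if (d >= Int.MinValue && d <= Int.MaxValue && d == math.floor(d)) d.toInt
      else throw new IllegalArgumentException(
        s"Value $n is not an integer within Int range.")
    case other =>
      throw new IllegalArgumentException(s"Unsupported id value: $other")
  }
}
```

Tests for the exception path would exercise exactly the inputs a tolerance parameter would affect: fractional doubles, near-integers, and values just outside Int range.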





[GitHub] spark pull request #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified...

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17067#discussion_r103378199
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala ---
@@ -52,6 +52,19 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
     }
   }
 
+  test("column resolution scenarios with local temp view") {
+    val df = Seq(2).toDF("i1")
+    df.createOrReplaceTempView("table1")
+    withTempView("table1") {
+      checkAnswer(spark.sql("SELECT table1.* FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT * FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT i1 FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT table1.i1 FROM table1"), Row(2))
+      checkAnswer(spark.sql("SELECT a.i1 FROM table1 AS a"), Row(2))
+      checkAnswer(spark.sql("SELECT i1 FROM table1 AS a"), Row(2))
--- End diff --

How about these test cases for temporary views? 

```
-- Test data.
CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null)
AS testData(a, b);
```





[GitHub] spark issue #17089: [SPARK-19756][SQL] drop the table cache after inserting ...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17089
  
**[Test build #73564 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73564/testReport)**
 for PR 17089 at commit 
[`8bca8d3`](https://github.com/apache/spark/commit/8bca8d35e04e582f73052411e42811a8c90329de).





[GitHub] spark issue #17089: [SPARK-19756][SQL] drop the table cache after inserting ...

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17089
  
retest this please





[GitHub] spark pull request #17089: [SPARK-19756][SQL] drop the table cache after ins...

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17089#discussion_r103377788
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
 ---
@@ -148,6 +148,9 @@ case class InsertIntoHadoopFsRelationCommand(
 options = options)
 
   fileIndex.foreach(_.refresh())
+  catalogTable.foreach { table =>
+    sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(table.identifier))
--- End diff --

This PR makes the behavior consistent with what we already do for [insertion into 
Hive serde 
tables](//github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala#L378)
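The uncache-versus-recache question above can be illustrated with a small in-memory model. Everything here (`ToyCatalog`, its methods, integer rows) is hypothetical and far simpler than Spark's `CacheManager`; it only shows why an insert has to invalidate a cached snapshot, and the two ways of doing so: drop the entry so the next read scans fresh data, or rebuild it eagerly.

```scala
import scala.collection.mutable

// Hypothetical toy model, not Spark's CacheManager: tables are mutable row
// buffers and the cache holds an immutable snapshot per table name.
final class ToyCatalog {
  private val tables = mutable.Map.empty[String, mutable.ArrayBuffer[Int]]
  private val cache = mutable.Map.empty[String, Vector[Int]]

  def create(name: String, rows: Int*): Unit =
    tables(name) = mutable.ArrayBuffer(rows: _*)

  def cacheTable(name: String): Unit =
    cache(name) = tables(name).toVector

  def isCached(name: String): Boolean = cache.contains(name)

  // Reads are served from the snapshot when one exists, as a cached plan would be.
  def read(name: String): Vector[Int] =
    cache.getOrElse(name, tables(name).toVector)

  // Option 1: drop the snapshot; the next read scans the current rows.
  def uncache(name: String): Unit = cache -= name

  // Option 2: rebuild the snapshot eagerly, but only if the table was cached.
  def recache(name: String): Unit =
    if (cache.contains(name)) cache(name) = tables(name).toVector

  // Appends rows, then applies the given invalidation strategy.
  def insert(name: String, rows: Int*)(invalidate: String => Unit): Unit = {
    tables(name) ++= rows
    invalidate(name)
  }
}
```

With a no-op invalidation (`_ => ()`), a read after the insert still returns the old snapshot. Dropping the entry defers the cost to the next read; rebuilding keeps reads fast but pays the materialization cost on every write.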





[GitHub] spark issue #17067: [SPARK-19602][SQL][TESTS] Add tests for qualified column...

2017-02-27 Thread skambha
Github user skambha commented on the issue:

https://github.com/apache/spark/pull/17067
  
Thanks much, Xiao, for the review and comments. 

I have made the following changes: 
- Separated the negative cases from the positive cases. 
- Moved the positive tests, and also the cases that should be supported, into the 
SQLQueryTestSuite framework. A new test file, columnresolution.sql, and the 
corresponding golden output file are added. 
- Cleaned up ColumnResolutionSuite to remove cases that are covered in 
SQLQueryTestSuite.
- Kept the negative cases in ColumnResolutionSuite because the exprId 
shows up in the exception message.
- Also kept the tests against a Hive serde table in ColumnResolutionSuite, 
since I wanted to cover that case.

Please advise if we should move any others. Thanks. 





[GitHub] spark pull request #17093: [SPARK-19761][SQL]create InMemoryFileIndex with a...

2017-02-27 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17093#discussion_r103377566
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala
 ---
@@ -178,6 +179,12 @@ class FileIndexSuite extends SharedSQLContext {
   assert(catalog2.allFiles().nonEmpty)
 }
   }
+
+  test("InMemoryFileIndex with empty rootPaths when PARALLEL_PARTITION_DISCOVERY_THRESHOLD is 0") {
--- End diff --

I don't think users would deliberately set it to a negative number, but 
it would be better if we could cover these situations too.





[GitHub] spark issue #17095: [SPARK-19763][SQL]qualified external datasource table lo...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17095
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73556/
Test FAILed.





[GitHub] spark issue #17095: [SPARK-19763][SQL]qualified external datasource table lo...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17095
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17095: [SPARK-19763][SQL]qualified external datasource table lo...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17095
  
**[Test build #73556 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73556/testReport)**
 for PR 17095 at commit 
[`570ce24`](https://github.com/apache/spark/commit/570ce24bee80dad5b2e897db34d04f3752139555).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17093: [SPARK-19761][SQL]create InMemoryFileIndex with a...

2017-02-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17093#discussion_r103377147
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala
 ---
@@ -178,6 +179,12 @@ class FileIndexSuite extends SharedSQLContext {
   assert(catalog2.allFiles().nonEmpty)
 }
   }
+
+  test("InMemoryFileIndex with empty rootPaths when PARALLEL_PARTITION_DISCOVERY_THRESHOLD is 0") {
--- End diff --

After this fix, users who set it to `-1` still hit the same 
strange error. We need a complete fix.
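One way a complete fix could look, sketched under assumptions: validate the threshold once so negative values are rejected up front, and compare with `<=` so an empty path list always takes the serial branch. `ListingPlan`, `choose`, and `maxParallelism` are hypothetical names, and the sketch assumes the reported error comes from requesting a parallel listing job with zero partitions.

```scala
// Hypothetical sketch: choose between driver-side (serial) and distributed
// (parallel) file listing. Names are illustrative, not Spark's.
object ListingPlan {
  sealed trait Mode
  case object Serial extends Mode
  final case class Parallel(numSlices: Int) extends Mode

  // Reject misconfiguration up front instead of failing later with an obscure error.
  def validateThreshold(threshold: Int): Int = {
    require(threshold >= 0, s"listing threshold must be non-negative, got $threshold")
    threshold
  }

  def choose(numPaths: Int, threshold: Int, maxParallelism: Int): Mode = {
    require(numPaths >= 0, s"numPaths must be non-negative, got $numPaths")
    require(maxParallelism > 0, s"maxParallelism must be positive, got $maxParallelism")
    val t = validateThreshold(threshold)
    // Since t >= 0, numPaths == 0 always takes this branch, so a parallel
    // job is never requested with zero slices.
    if (numPaths <= t) Serial
    else Parallel(math.min(numPaths, maxParallelism)) // numPaths > t >= 0, so at least 1 slice
  }
}
```

Clamping negative values to 0 instead of rejecting them would also be reasonable; rejecting makes the misconfiguration visible to the user.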





[GitHub] spark issue #17094: [SPARK-19762][ML][WIP] Hierarchy for consolidating ML ag...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17094
  
**[Test build #73557 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73557/testReport)**
 for PR 17094 at commit 
[`9a04d0b`](https://github.com/apache/spark/commit/9a04d0bc51bed29bca28a5e34ebc5b614b6560d2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17094: [SPARK-19762][ML][WIP] Hierarchy for consolidating ML ag...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17094
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73557/
Test PASSed.





[GitHub] spark issue #17094: [SPARK-19762][ML][WIP] Hierarchy for consolidating ML ag...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17094
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16867
  
**[Test build #73563 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73563/testReport)**
 for PR 16867 at commit 
[`cd16008`](https://github.com/apache/spark/commit/cd16008cc857ca38b4f22861625b6ebb9f82a066).




