date:20160418

[GitHub] spark pull request: [SPARK-14676] Wrap and re-throw Await.result e...

2016-04-18 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12433#discussion_r60180864
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -802,7 +807,12 @@ private[spark] class BlockManager(
     logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     if (level.replication > 1) {
       // Wait for asynchronous replication to finish
-      Await.ready(replicationFuture, Duration.Inf)
+      try {
+        Await.ready(replicationFuture, Duration.Inf)
--- End diff --

@ScrapCodes, regarding your other comment, I think that timeouts in this case 
might already be covered by the network / RPC timeouts within the 
`replicationFuture`'s code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14676] Wrap and re-throw Await.result e...

2016-04-18 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12433#discussion_r60180666
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -260,7 +260,12 @@ private[spark] class BlockManager(
   def waitForAsyncReregister(): Unit = {
     val task = asyncReregisterTask
     if (task != null) {
-      Await.ready(task, Duration.Inf)
+      try {
+        Await.ready(task, Duration.Inf)
--- End diff --

According to the Scaladoc (and actual usages), it looks like this 
particular `waitForAsyncReregister` method is only used in test code and I'm 
guessing that it's probably called from within an interrupt-based timeout block.

As for the other usages, we'd have to consider them on a case-by-case basis.



[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...

2016-04-18 Thread viirya

Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/9565#issuecomment-211760253
  
When a member method is used as a UDF, for example `def createTransformFunc` 
in `org.apache.spark.ml.Transformer`, the Jenkins tests always hit an exception.

Otherwise, it works well.

BTW, I can't reproduce that exception locally. Maybe the Java version matters.



[GitHub] spark pull request: [SPARK-14127][SQL][WIP] Describe table

2016-04-18 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/12460#discussion_r60180497
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -254,6 +251,21 @@ class SparkSqlAstBuilder extends AstBuilder {
     }
   }
 
+   /**
+    * A column path can be specified as an parameter to describe command. It is a dot separated
+    * elements where the last element can be a String.
+    * TODO - check with Herman
--- End diff --

Yeah, Herman. Not supporting it would certainly simplify things. FYI, I 
checked that the unit test case describe_xpath.q, which exercises this syntax, 
is not being run in HiveCompatibleSuite.



[GitHub] spark pull request: [SPARK-14719] WriteAheadLogBasedBlockHandler s...

2016-04-18 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12484#discussion_r60180310
  
--- Diff: streaming/src/test/scala/org/apache/spark/streaming/ReceivedBlockHandlerSuite.scala ---
@@ -204,26 +222,26 @@ class ReceivedBlockHandlerSuite
     sparkConf.set("spark.storage.unrollFraction", "0.4")
     // Block Manager with 12000 * 0.4 = 4800 bytes of free space for unroll
     blockManager = createBlockManager(12000, sparkConf)
+    // This block is way too large to possibly be cached in memory:
+    def hugeBlock: IteratorBlock = IteratorBlock(List.fill(100)(new Array[Byte](1000)).iterator)
 
     // there is not enough space to store this block in MEMORY,
     // But BlockManager will be able to serialize this block to WAL
     // and hence count returns correct value.
--- End diff --

@dibbhatt, I'm confused because it seems like your comment says that we 
should fail a job if blocks cannot be persisted because without that 
persistence the job will not work correctly even if the WAL is enabled. 
However, that claim seems to be contradicted by the comment describing this 
test case, which seems to suggest that this job should succeed despite the 
block being far too large to be successfully stored. In the old test case, 
however, the block appeared to be too small and actually _was_ being stored in 
memory, meaning that this comment wasn't describing the actual behavior of the 
test.
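
For reference, the sizes involved in this test can be checked with a quick sketch. The numbers are taken from the diff above (12000 bytes for the block manager, an unroll fraction of 0.4, and 100 arrays of 1000 bytes each); the variable names are illustrative only, and per-object overhead is ignored, so the real footprint of the block would be somewhat larger.

```python
# Sizes taken from the diff; the names are illustrative, not Spark APIs.
max_memory_bytes = 12000
unroll_fraction = 0.4
unroll_bytes = round(max_memory_bytes * unroll_fraction)

# hugeBlock: 100 byte arrays of 1000 bytes each (payload only, no overhead).
huge_block_bytes = 100 * 1000

fits_in_unroll_space = huge_block_bytes <= unroll_bytes
fits_in_total_memory = huge_block_bytes <= max_memory_bytes

print(unroll_bytes, huge_block_bytes)
print(fits_in_unroll_space, fits_in_total_memory)
```

Under these assumptions the payload alone exceeds both the unroll space and the total memory given to the block manager, which is what the "way too large" comment in the new test relies on.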



[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...

2016-04-18 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12352



[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...

2016-04-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12352#issuecomment-211759198
  
Merging in master.




[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12352#discussion_r60180155
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ---
@@ -53,33 +55,77 @@ class FileScanRDD(
 
   override def compute(split: Partition, context: TaskContext): Iterator[InternalRow] = {
     val iterator = new Iterator[Object] with AutoCloseable {
+      private val inputMetrics = context.taskMetrics().inputMetrics
+      private val existingBytesRead = inputMetrics.bytesRead
+
+      // Find a function that will return the FileSystem bytes read by this thread. Do this before
+      // apply readFunction, because it might read some bytes.
+      private val getBytesReadCallback: Option[() => Long] =
+        SparkHadoopUtil.get.getFSBytesReadOnThreadCallback()
+
+      // For Hadoop 2.5+, we get our input bytes from thread-local Hadoop FileSystem statistics.
+      // If we do a coalesce, however, we are likely to compute multiple partitions in the same
+      // task and in the same thread, in which case we need to avoid override values written by
+      // previous partitions (SPARK-13071).
+      private def updateBytesRead(): Unit = {
+        getBytesReadCallback.foreach { getBytesRead =>
+          inputMetrics.setBytesRead(existingBytesRead + getBytesRead())
+        }
+      }
+
+      // If we can't get the bytes read from the FS stats, fall back to the file size,
+      // which may be inaccurate.
+      private def updateBytesReadWithFileSize(): Unit = {
+        if (getBytesReadCallback.isEmpty && currentFile != null) {
+          inputMetrics.incBytesRead(currentFile.length)
+        }
+      }
+
       private[this] val files = split.asInstanceOf[FilePartition].files.toIterator
+      private[this] var currentFile: PartitionedFile = null
       private[this] var currentIterator: Iterator[Object] = null
 
       def hasNext = (currentIterator != null && currentIterator.hasNext) || nextIterator()
-      def next() = currentIterator.next()
+      def next() = {
+        val nextElement = currentIterator.next()
+        // TODO: we should have a better separation of row based and batch based scan, so that we
--- End diff --

I think in the future maybe we should just make everything batch based, and 
then this problem goes away.
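
The byte accounting in the hunk above (`existingBytesRead + getBytesRead()`) can be modelled with a small sketch. This is a plain-Python stand-in, not Spark code: `ThreadFsStats`, `bytes_read_callback`, and `compute_partition` are hypothetical names, and the sketch assumes the callback reports bytes read by the thread since the callback was created.

```python
class ThreadFsStats:
    """Hypothetical stand-in for thread-local file system statistics."""

    def __init__(self):
        self.total = 0

    def read(self, n):
        self.total += n


def bytes_read_callback(stats):
    """Return a function reporting bytes read since this call (assumed baseline)."""
    baseline = stats.total
    return lambda: stats.total - baseline


def compute_partition(metrics, stats, chunk_sizes):
    # Bytes already recorded by earlier partitions in the same task.
    existing = metrics["bytes_read"]
    get_bytes_read = bytes_read_callback(stats)
    for n in chunk_sizes:
        stats.read(n)
        # Add to the earlier total instead of overwriting it.
        metrics["bytes_read"] = existing + get_bytes_read()


metrics = {"bytes_read": 0}
stats = ThreadFsStats()
compute_partition(metrics, stats, [100, 200])  # first partition in the task
compute_partition(metrics, stats, [50])        # second partition, same thread
print(metrics["bytes_read"])
```

Without the `existing` term, the second partition would reset the metric to its own 50 bytes, which is the coalesce problem the code comment attributes to SPARK-13071.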




[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12352#issuecomment-211759031
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56194/
Test PASSed.



[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12352#issuecomment-211759029
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: Support single argument version of sqlContext....

2016-04-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12488#issuecomment-211759087
  
Just a minor doc comment. LGTM otherwise.




[GitHub] spark pull request: Support single argument version of sqlContext....

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12488#discussion_r60180069
  
--- Diff: python/pyspark/sql/context.py ---
@@ -147,12 +147,24 @@ def setConf(self, key, value):
         self._ssql_ctx.setConf(key, value)
 
     @since(1.3)
-    def getConf(self, key, defaultValue):
+    def getConf(self, key, defaultValue=None):
         """Returns the value of Spark SQL configuration property for the given key.
 
-        If the key is not set, returns defaultValue.
+        If the key is not set, returns defaultValue, if set, otherwise, return the
--- End diff --

Maybe

```
If the key is not set and defaultValue is not None, return defaultValue.
If the key is not set and defaultValue is None, return the system default 
value.
```
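
The lookup order that this wording describes can be sketched as follows. This is a minimal stand-in written for illustration, not the pyspark implementation: `FakeSqlContext` and `SYSTEM_DEFAULTS` are hypothetical, and only the `getConf(key, defaultValue=None)` signature comes from the diff.

```python
# Hypothetical table of built-in defaults, used only for this sketch.
SYSTEM_DEFAULTS = {"spark.sql.shuffle.partitions": "200"}


class FakeSqlContext:
    """Stand-in that mimics the documented getConf lookup order."""

    def __init__(self):
        self._conf = {}

    def setConf(self, key, value):
        self._conf[key] = value

    def getConf(self, key, defaultValue=None):
        if key in self._conf:
            return self._conf[key]            # explicitly set value wins
        if defaultValue is not None:
            return defaultValue               # caller-supplied default
        return SYSTEM_DEFAULTS.get(key)       # fall back to the system default


ctx = FakeSqlContext()
key = "spark.sql.shuffle.partitions"
print(ctx.getConf(key))         # key not set, no default given
print(ctx.getConf(key, "10"))   # key not set, default given
ctx.setConf(key, "50")
print(ctx.getConf(key, "10"))   # key set
```

One consequence of using `None` as the sentinel is that a caller cannot ask for `None` itself as the fallback; that seems acceptable for string-valued configuration keys.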



[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12352#issuecomment-211758857
  
**[Test build #56194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56194/consoleFull)** for PR 12352 at commit [`c265546`](https://github.com/apache/spark/commit/c26554639f4a2615907d7b46af3005ff3f335d08).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12353#discussion_r60180011
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeCodegenSuite.scala ---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.SimpleCatalystConf
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.Literal._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+
+class OptimizeCodegenSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("OptimizeCodegen", Once, OptimizeCodegen(SimpleCatalystConf(true))) :: Nil
+  }
+
+  protected def assertEquivalent(e1: Expression, e2: Expression): Unit = {
+    val correctAnswer = Project(Alias(e2, "out")() :: Nil, OneRowRelation).analyze
+    val actual = Optimize.execute(Project(Alias(e1, "out")() :: Nil, OneRowRelation).analyze)
+    comparePlans(actual, correctAnswer)
+  }
+
+  test("Codegen only when the number of branches is small.") {
--- End diff --

Oh. Sure. I'll add those testcases, too.



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12353#discussion_r60179863
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -142,16 +139,54 @@ case class CaseWhen(branches: Seq[(Expression, Expression)], elseValue: Option[E
     }
   }
 
-  def shouldCodegen: Boolean = {
-    branches.length < CaseWhen.MAX_NUM_CASES_FOR_CODEGEN
+  override def toString: String = {
+    val cases = branches.map { case (c, v) => s" WHEN $c THEN $v" }.mkString
+    val elseCase = elseValue.map(" ELSE " + _).getOrElse("")
+    "CASE" + cases + elseCase + " END"
   }
 
+  override def sql: String = {
+    val cases = branches.map { case (c, v) => s" WHEN ${c.sql} THEN ${v.sql}" }.mkString
+    val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
+    "CASE" + cases + elseCase + " END"
+  }
+}
+
+
+/**
+ * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
+ * When a = true, returns b; when c = true, returns d; else returns e.
+ *
+ * @param branches seq of (branch condition, branch value)
+ * @param elseValue optional value for the else branch
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END - When a = true, returns b; when c = true, return d; else return e.")
+// scalastyle:on line.size.limit
+case class CaseWhen(
+    val branches: Seq[(Expression, Expression)],
+    val elseValue: Option[Expression] = None)
+  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
--- End diff --

That would be right. `CaseWhenCodegen` is always generated from `CaseWhen`.
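
As a side note on the `toString` and `sql` methods in this hunk, the string they build can be illustrated with a short sketch. It is a Python model of the same concatenation, with a hypothetical helper name (`case_when_to_string`) and plain strings standing in for `Expression` values.

```python
def case_when_to_string(branches, else_value=None):
    """Render (condition, value) pairs the way the Scala toString concatenates them."""
    cases = "".join(f" WHEN {c} THEN {v}" for c, v in branches)
    else_case = f" ELSE {else_value}" if else_value is not None else ""
    return "CASE" + cases + else_case + " END"


print(case_when_to_string([("a", "b"), ("c", "d")], "e"))
print(case_when_to_string([("a", "b")]))
```

Each branch contributes its own leading space, so no separator is needed when joining, and the optional `ELSE` part collapses to an empty string when there is no else value.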



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12353#discussion_r60179727
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystConf.scala ---
@@ -29,6 +29,7 @@ trait CatalystConf {
   def groupByOrdinal: Boolean
 
   def optimizerMaxIterations: Int
+  def maxCaseBranches: Int
--- End diff --

Thank you for the quick review. Sure. And also `maxCaseBranchesForCodegen` in 
SQLConf.scala.



[GitHub] spark pull request: [SPARK-14719] WriteAheadLogBasedBlockHandler s...

2016-04-18 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/12484#issuecomment-211758183
  
@dibbhatt,

Are you suggesting that this pull request introduces a bug? If so, are 
there any regression tests that will demonstrate it? I'm still unclear on 
precisely what the problem is from Spark Streaming's point of view, since your 
linked PR only adds unit tests for BlockManager functionality and doesn't have 
end-to-end application-level tests which exhibit how the old BlockManager 
behavior caused problems for streaming.

The PR discussion that you linked to is really long and has a somewhat 
unclear resolution. If there was a bug which motivated that PR, do you know 
whether it was previously resolved through another patch / other fixes?



[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...

2016-04-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9565#issuecomment-211758161
  
What's the problem with runtime mirror?




[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12490#issuecomment-211757342
  
**[Test build #56201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56201/consoleFull)** for PR 12490 at commit [`942e145`](https://github.com/apache/spark/commit/942e145b03f2d31a21c90736b80fd380ebf25940).



[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12490#issuecomment-211756345
  
**[Test build #56200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56200/consoleFull)** for PR 12490 at commit [`c214204`](https://github.com/apache/spark/commit/c2142049cf9f4e577d9a0d1f57a21c869ae8486a).



[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-18 Thread jyshen15

Github user jyshen15 commented on the pull request:

https://github.com/apache/spark/pull/11812#issuecomment-211756190
  
I will handle the Python style issue.



[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/12259#discussion_r60178619
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/MatrixUDT.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.udt
--- End diff --

Did you mean to move it to `org.apache.spark.ml.linalg.udt`?



[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...

2016-04-18 Thread viirya

Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/9565



[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...

2016-04-18 Thread viirya

Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/9565#issuecomment-211755597
  
Closing this for now. Maybe revisit it in the future.



[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...

2016-04-18 Thread viirya

Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/11926



[GitHub] spark pull request: [SPARK-14609][SQL] Native support for LOAD DAT...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12412#issuecomment-211755452
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12490#issuecomment-211755392
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56199/
Test FAILed.



[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12490#issuecomment-211755389
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12490#issuecomment-211755381
  
**[Test build #56199 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56199/consoleFull)**
 for PR 12490 at commit 
[`0630ea3`](https://github.com/apache/spark/commit/0630ea3e0a9c829760fe5cb470dec41c4c1bf677).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...

2016-04-18 Thread viirya

Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11926#issuecomment-211755298
  
Closing this to think of a better solution.



[GitHub] spark pull request: [SPARK-14609][SQL] Native support for LOAD DAT...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12412#issuecomment-211755456
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56192/
Test PASSed.



[GitHub] spark pull request: [SPARK-14609][SQL] Native support for LOAD DAT...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12412#issuecomment-211755041
  
**[Test build #56192 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56192/consoleFull)**
 for PR 12412 at commit 
[`08acf5c`](https://github.com/apache/spark/commit/08acf5c9a2638a94ce16df6fab124d3aeeea13d6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12490#issuecomment-211754828
  
**[Test build #56199 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56199/consoleFull)**
 for PR 12490 at commit 
[`0630ea3`](https://github.com/apache/spark/commit/0630ea3e0a9c829760fe5cb470dec41c4c1bf677).



[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12259#discussion_r60178343
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/linalg/udt/UDTSuite.scala ---
@@ -0,0 +1,99 @@
+/*
--- End diff --

Let's create `VectorUDTSuite.scala`, and `MatrixUDT.scala` for 
maintainability. 



[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12259#discussion_r60178107
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/MatrixUDT.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.udt
+
+import org.apache.spark.ml.linalg.{DenseMatrix, Matrix, SparseMatrix}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.catalyst.util.GenericArrayData
+import org.apache.spark.sql.types._
+
+private[spark] class MatrixUDT extends UserDefinedType[Matrix] {
--- End diff --

`private[ml]`



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12353#discussion_r60177562
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 ---
@@ -242,6 +261,12 @@ object CaseWhen {
   }
 }
 
+/** Factory methods for CaseWhenCodegen. */
+object CaseWhenCodegen {
--- End diff --

we can remove this given the above comment



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12353#issuecomment-211751566
  
cc @cloud-fan this change actually makes your other thing easier i think.




[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11812#issuecomment-211750751
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56198/
Test FAILed.



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12353#discussion_r60177491
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeCodegenSuite.scala
 ---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.SimpleCatalystConf
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.Literal._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+
+class OptimizeCodegenSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+val batches = Batch("OptimizeCodegen", Once, 
OptimizeCodegen(SimpleCatalystConf(true))) :: Nil
+  }
+
+  protected def assertEquivalent(e1: Expression, e2: Expression): Unit = {
+val correctAnswer = Project(Alias(e2, "out")() :: Nil, 
OneRowRelation).analyze
+val actual = Optimize.execute(Project(Alias(e1, "out")() :: Nil, 
OneRowRelation).analyze)
+comparePlans(actual, correctAnswer)
+  }
+
+  test("Codegen only when the number of branches is small.") {
--- End diff --

Can you construct a few more test cases?

One with a nested CaseWhen, one with multiple CaseWhen expressions in one 
operator, and one with multiple CaseWhen expressions in different operators.
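
A rough sketch of the shapes asked for here, using a toy expression tree rather than Catalyst. `Expr`, `Lit`, `Add`, `optimize` and `maxBranches` are hypothetical stand-ins, the `CaseWhen` and `CaseWhenCodegen` classes below are simplified models and not Spark's, and the inclusive `<=` threshold is an arbitrary choice for the sketch. The "different operators" case is not modeled, since the toy tree has no plan operators.

```scala
object CaseWhenRuleSketch {
  sealed trait Expr
  case class Lit(value: Int) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr
  // Interpreted form: (condition, value) branches plus an optional else value.
  case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr] = None) extends Expr
  // Toy stand-in for the code-generated variant.
  case class CaseWhenCodegen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr] = None) extends Expr

  // Bottom-up rewrite: children first, then the node itself if it is small enough.
  def optimize(e: Expr, maxBranches: Int): Expr = e match {
    case Add(l, r) => Add(optimize(l, maxBranches), optimize(r, maxBranches))
    case CaseWhen(bs, el) =>
      val newBs = bs.map { case (c, v) => (optimize(c, maxBranches), optimize(v, maxBranches)) }
      val newEl = el.map(optimize(_, maxBranches))
      if (newBs.length <= maxBranches) CaseWhenCodegen(newBs, newEl) else CaseWhen(newBs, newEl)
    case other => other
  }

  // Helper: a CaseWhen with n identical branches.
  private def cw(n: Int, value: Expr = Lit(0)): CaseWhen =
    CaseWhen(Seq.fill(n)((Lit(1), value)))

  // Nested: a large outer CaseWhen whose branch values are small inner ones.
  val nested: Expr = optimize(cw(3, cw(1)), maxBranches = 2)
  // Multiple CaseWhen in one expression: one small, one large.
  val sideBySide: Expr = optimize(Add(cw(1), cw(3)), maxBranches = 2)
}
```

In this toy model the nested case keeps the outer node interpreted while converting the inner ones, which is the kind of behavior the extra tests would pin down.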



[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11812#issuecomment-211750733
  
**[Test build #56198 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56198/consoleFull)**
 for PR 11812 at commit 
[`ecde52c`](https://github.com/apache/spark/commit/ecde52c3d0e73c5210940c743e135f68e8d1386a).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-18 Thread flyjy

Github user flyjy commented on the pull request:

https://github.com/apache/spark/pull/11812#issuecomment-211750648
  
Thanks. Have updated the PR.



[GitHub] spark pull request: [SPARK-14678][SQL]Add a file sink log to suppo...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12435#issuecomment-211750061
  
**[Test build #56197 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56197/consoleFull)**
 for PR 12435 at commit 
[`e8c14d6`](https://github.com/apache/spark/commit/e8c14d60deb1c068f770d7ff3fc9bef000aff899).



[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11812#issuecomment-211750023
  
**[Test build #56198 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56198/consoleFull)**
 for PR 11812 at commit 
[`ecde52c`](https://github.com/apache/spark/commit/ecde52c3d0e73c5210940c743e135f68e8d1386a).



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12353#discussion_r60177331
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 ---
@@ -142,16 +139,54 @@ case class CaseWhen(branches: Seq[(Expression, 
Expression)], elseValue: Option[E
 }
   }
 
-  def shouldCodegen: Boolean = {
-branches.length < CaseWhen.MAX_NUM_CASES_FOR_CODEGEN
+  override def toString: String = {
+val cases = branches.map { case (c, v) => s" WHEN $c THEN $v" 
}.mkString
+val elseCase = elseValue.map(" ELSE " + _).getOrElse("")
+"CASE" + cases + elseCase + " END"
   }
 
+  override def sql: String = {
+val cases = branches.map { case (c, v) => s" WHEN ${c.sql} THEN 
${v.sql}" }.mkString
+val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
+"CASE" + cases + elseCase + " END"
+  }
+}
+
+
+/**
+ * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE 
e] END".
+ * When a = true, returns b; when c = true, returns d; else returns e.
+ *
+ * @param branches seq of (branch condition, branch value)
+ * @param elseValue optional value for the else branch
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END - When a = 
true, returns b; when c = true, return d; else return e.")
+// scalastyle:on line.size.limit
+case class CaseWhen(
+val branches: Seq[(Expression, Expression)],
+val elseValue: Option[Expression] = None)
+  extends CaseWhenBase(branches, elseValue) with CodegenFallback with 
Serializable {
--- End diff --

maybe just have a toCodegen function that creates CaseWhenCodegen?

We can then remove `object CaseWhenCodegen`
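
One way to read this suggestion, as a toy sketch: the class names mirror the diff, but the `Expression` trait and `Literal` here are bare stand-ins, and the method body is a guess rather than the PR's code.

```scala
object ToCodegenSketch {
  // Minimal stand-ins so the sketch compiles on its own.
  trait Expression
  case class Literal(value: Any) extends Expression

  // Toy stand-in for the code-generated variant.
  case class CaseWhenCodegen(
      branches: Seq[(Expression, Expression)],
      elseValue: Option[Expression] = None) extends Expression

  case class CaseWhen(
      branches: Seq[(Expression, Expression)],
      elseValue: Option[Expression] = None) extends Expression {
    // The conversion lives on the instance, so no separate factory object is needed.
    def toCodegen: CaseWhenCodegen = CaseWhenCodegen(branches, elseValue)
  }
}
```

A caller such as an optimizer rule would then write `cw.toCodegen` instead of going through a companion-object factory.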



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12353#discussion_r60177186
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystConf.scala ---
@@ -29,6 +29,7 @@ trait CatalystConf {
   def groupByOrdinal: Boolean
 
   def optimizerMaxIterations: Int
+  def maxCaseBranches: Int
--- End diff --

maxCaseBranchesForCodegen?



[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12492#discussion_r60176644
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -48,8 +49,8 @@ case class Size(child: Expression) extends 
UnaryExpression with ExpectsInputType
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(array(obj1, obj2,...)) - Sorts the input array in 
ascending order according to the natural ordering of the array elements.",
-  extended = " > SELECT _FUNC_(array('b', 'd', 'c', 'a'));\n 'a', 'b', 
'c', 'd'")
+  usage = "_FUNC_(array(array, ascendingOrder)) - Sorts the input array in 
ascending order according to the natural ordering of the array elements.",
--- End diff --

this is wrong?




[GitHub] spark pull request: [SPARK-14398] [SQL] Audit non-reserved keyword...

2016-04-18 Thread hvanhovell

Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/12191#issuecomment-211741548
  
The compiler should emit a `tableswitch` instead of a `lookupswitch` when 
the nonReserved keywords are grouped together, which is a bit faster. I don't 
think the improvement is large enough to warrant another change and another 
PR, so let's merge this one and be done.
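
To illustrate the distinction, a minimal sketch that is not taken from the PR (the object name and the token codes are made up). With contiguous case labels a JVM compiler typically emits `tableswitch`, a jump table indexed by the value; with widely spaced labels it typically falls back to `lookupswitch`, a search over sorted keys. The choice is the compiler's, so `javap -c` on the compiled class is the way to confirm it.

```scala
import scala.annotation.switch

object SwitchSketch {
  // Contiguous labels: typically compiled to a `tableswitch`.
  def dense(token: Int): String = (token: @switch) match {
    case 10 => "ADD"
    case 11 => "ALTER"
    case 12 => "ANALYZE"
    case 13 => "ARRAY"
    case _  => "other"
  }

  // Widely spaced labels: typically compiled to a `lookupswitch`.
  def sparse(token: Int): String = (token: @switch) match {
    case 10    => "ADD"
    case 250   => "ALTER"
    case 4000  => "ANALYZE"
    case 90000 => "ARRAY"
    case _     => "other"
  }
}
```

Both methods behave identically from the caller's side; only the emitted bytecode differs.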

LGTM



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12353#issuecomment-211740746
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56191/
Test PASSed.



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12353#issuecomment-211740744
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12353#issuecomment-211740602
  
**[Test build #56191 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56191/consoleFull)**
 for PR 12353 at commit 
[`a9294bd`](https://github.com/apache/spark/commit/a9294bdd01c125dcc7a7b232a7b14b476678e731).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [WIP] Refactor MemoryManager internals to simp...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12381#issuecomment-211740371
  
**[Test build #56189 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56189/consoleFull)**
 for PR 12381 at commit 
[`5290476`](https://github.com/apache/spark/commit/5290476d7ca6af010fb539f3ae7c69b7fea0c852).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [WIP] Refactor MemoryManager internals to simp...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12381#issuecomment-211740457
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56189/
Test FAILed.



[GitHub] spark pull request: [WIP] Refactor MemoryManager internals to simp...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12381#issuecomment-211740456
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12492#issuecomment-211740393
  
**[Test build #56196 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56196/consoleFull)**
 for PR 12492 at commit 
[`9238c41`](https://github.com/apache/spark/commit/9238c4186f4ccde0b240ede692598d60bf6bbcfb).



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12472#issuecomment-211740321
  
We should be able to remove almost all the methods in 
InternalAccumulators.scala, shouldn't we? That includes create, createAll, 
createShuffleReadAccums, ...




[GitHub] spark pull request: [SPARK-14127][SQL][WIP] Describe table

2016-04-18 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12460#discussion_r60175987
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -254,6 +251,21 @@ class SparkSqlAstBuilder extends AstBuilder {
 }
   }
 
+   /**
+* A column path can be specified as an parameter to describe command. 
It is a dot separated
+* elements where the last element can be a String.
+* TODO - check with Herman
--- End diff --

It is a bit more complicated than I thought. We allow strings here because 
Hive allows us to use the `'$elem$'`, `'$key$'` and `'$value$'` 'keywords'. That 
is why I added strings to the rule. I am not sure if we should support this. 
What do you guys think?

This is what I found in the Hive manual:
```SQL
DESCRIBE [EXTENDED|FORMATTED]  [db_name.]table_name[ col_name ( 
[.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];
```

See also: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe
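
As a rough illustration of the grammar above, a small sketch that splits a column path into its elements and flags the quoted pseudo-fields. The names `parseColumnPath`, `PathElem`, `Field` and `Pseudo` are hypothetical, and the sketch assumes simple input with no escaping or dots inside quotes; it is not how the Spark parser handles this.

```scala
object DescribePathSketch {
  sealed trait PathElem
  case class Field(name: String) extends PathElem
  // One of the quoted pseudo-fields from the Hive grammar.
  case class Pseudo(name: String) extends PathElem

  private val pseudoFields = Set("$elem$", "$key$", "$value$")

  // Split on dots; a single-quoted element naming a pseudo-field becomes Pseudo.
  def parseColumnPath(path: String): Seq[PathElem] =
    path.split('.').toSeq.map { part =>
      val unquoted = part.stripPrefix("'").stripSuffix("'")
      if (part.startsWith("'") && pseudoFields(unquoted)) Pseudo(unquoted) else Field(part)
    }
}
```

For example, `parseColumnPath("col.'$elem$'.name")` separates the array-element accessor from the ordinary field names around it.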



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12472#discussion_r60175966
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala ---
@@ -36,7 +36,7 @@ class StageInfo(
 val rddInfos: Seq[RDDInfo],
 val parentIds: Seq[Int],
 val details: String,
-val internalAccumulators: Seq[Accumulator[_]] = Seq.empty,
+val taskMetrics: TaskMetrics = null,
--- End diff --

agree



[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12492#issuecomment-211739747
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12472#discussion_r60175851
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala ---
@@ -36,7 +36,7 @@ class StageInfo(
     val rddInfos: Seq[RDDInfo],
     val parentIds: Seq[Int],
     val details: String,
-    val internalAccumulators: Seq[Accumulator[_]] = Seq.empty,
+    val taskMetrics: TaskMetrics = null,
--- End diff --

is this only null in tests?



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12472#discussion_r60175871
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala ---
@@ -36,7 +36,7 @@ class StageInfo(
     val rddInfos: Seq[RDDInfo],
     val parentIds: Seq[Int],
     val details: String,
-    val internalAccumulators: Seq[Accumulator[_]] = Seq.empty,
+    val taskMetrics: TaskMetrics = null,
--- End diff --

if that's the case, it might be better to always create a taskmetric rather 
than leave it at null
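
A minimal sketch of that idea, using simplified stand-in classes rather than Spark's real `StageInfo` and `TaskMetrics` (the names `SketchMetricsHolder` and `SketchStageInfo` and the `recordsRead` field are hypothetical). Defaulting to an empty instance means callers never need a null check:

```scala
// Hypothetical, simplified stand-ins; these are not Spark's actual classes.
class SketchMetricsHolder {
  var recordsRead: Long = 0L
}

object SketchMetricsHolder {
  // Mirrors the idea of `TaskMetrics.empty`: a fresh, zeroed instance.
  def empty: SketchMetricsHolder = new SketchMetricsHolder
}

class SketchStageInfo(
    val stageId: Int,
    // Default to an empty instance rather than null, so callers (including
    // tests) that omit the argument still get a usable object.
    val taskMetrics: SketchMetricsHolder = SketchMetricsHolder.empty)

object StageInfoSketch {
  def main(args: Array[String]): Unit = {
    val info = new SketchStageInfo(stageId = 0)
    // No null check is needed before reading a metric.
    println(info.taskMetrics.recordsRead)
  }
}
```

The trade-off is one extra small allocation per `SketchStageInfo` in exchange for removing every `!= null` branch downstream.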



[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12492#issuecomment-211739754
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56195/
Test FAILed.



[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12492#issuecomment-211739740
  
**[Test build #56195 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56195/consoleFull)**
 for PR 12492 at commit 
[`67fb4f0`](https://github.com/apache/spark/commit/67fb4f022a1e12dec9d9f467c6fa26f38abbb040).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12472#discussion_r60175818
  
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
@@ -217,21 +170,45 @@ class TaskMetrics private[spark] (initialAccums: Seq[Accumulator[_]]) extends Se
    */
   private[spark] def mergeShuffleReadMetrics(): Unit = synchronized {
     if (tempShuffleReadMetrics.nonEmpty) {
-      _shuffleReadMetrics.setMergeValues(tempShuffleReadMetrics)
+      shuffleReadMetrics.setMergeValues(tempShuffleReadMetrics)
     }
   }
 
-  /**
-   * Metrics related to shuffle write, defined only in shuffle map stages.
-   */
-  def shuffleWriteMetrics: ShuffleWriteMetrics = _shuffleWriteMetrics
+  // Only used for test
+  private[spark] val testAccum =
+    sys.props.get("spark.testing").map(_ => TaskMetrics.createAccum[Long](TEST_ACCUM))
+
+  @transient private[spark] lazy val internalAccums: Seq[Accumulable[_, _]] = {
--- End diff --

we collect these internal accumulators together, so that it's easier to:

1. register all of them in the scheduler.
2. get the internal accumulator info out of the given accumulator updates in 
`TaskMetrics.fromAccumulatorUpdates`
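
Roughly, the two uses could look like the sketch below. It uses simplified stand-in types, not Spark's accumulator classes; `SketchAccum`, `SketchMetrics`, `registerAll`, and `applyUpdates` are hypothetical names chosen for illustration:

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in; not Spark's Accumulable/Accumulator.
final class SketchAccum(val name: String) {
  var value: Long = 0L
}

class SketchMetrics {
  val executorRunTime = new SketchAccum("executorRunTime")
  val resultSize = new SketchAccum("resultSize")

  // One place that lists every internal accumulator.
  lazy val internalAccums: Seq[SketchAccum] = Seq(executorRunTime, resultSize)

  // 1. Register all of them in a single pass. The mutable map is a stand-in
  //    for whatever registry the scheduler keeps.
  def registerAll(registry: mutable.Map[String, SketchAccum]): Unit =
    internalAccums.foreach(a => registry(a.name) = a)

  // 2. Pick the internal accumulators out of a batch of (name, value)
  //    updates; names that are not internal are ignored.
  def applyUpdates(updates: Seq[(String, Long)]): Unit = {
    val byName = internalAccums.map(a => a.name -> a).toMap
    updates.foreach { case (name, v) =>
      byName.get(name).foreach(a => a.value = v)
    }
  }
}
```

Both operations iterate over the same `internalAccums` sequence, so adding a new internal metric only requires appending it to that one list.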



[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12492#issuecomment-211739523
  
**[Test build #56195 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56195/consoleFull)**
 for PR 12492 at commit 
[`67fb4f0`](https://github.com/apache/spark/commit/67fb4f022a1e12dec9d9f467c6fa26f38abbb040).



[GitHub] spark pull request: SPARK-8398 hadoop input/output format advanced...

2016-04-18 Thread ScrapCodes

Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/6848#issuecomment-211739456
  
IMO, this is useful in that the Hadoop configuration need not be global state. 
We can keep one default configuration that is used everywhere, and then every 
Hadoop-related method gives the user a way to override that default. 
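
As a rough sketch of the pattern (not the API proposed in this PR): a plain `Map` stands in for Hadoop's `Configuration`, and `HadoopReaderSketch`, `describeRead`, and the example values are hypothetical.

```scala
// Hypothetical sketch: a plain Map stands in for Hadoop's Configuration,
// and the object/method names are illustrative only.
object HadoopReaderSketch {
  type Conf = Map[String, String]

  // Default settings shared by every call that does not override them.
  // The values here are made up for the example.
  val defaultConf: Conf = Map(
    "fs.defaultFS" -> "hdfs://namenode:8020",
    "io.file.buffer.size" -> "65536")

  // Per-call overrides are layered on top of the defaults; the defaults
  // themselves are immutable and never change.
  def describeRead(path: String, overrides: Conf = Map.empty): String = {
    val effective = defaultConf ++ overrides
    s"$path bufferSize=${effective("io.file.buffer.size")}"
  }
}
```

A caller that is happy with the defaults writes `HadoopReaderSketch.describeRead("/data")`, while one call site can pass `Map("io.file.buffer.size" -> "4096")` without affecting any other caller.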

Binary compatibility will definitely be broken, but source compatibility 
might not be affected, i.e. one might only need to recompile the project against 
the newer Spark version. As already asked, should this be okay for 2.0?

@andrewor14 ping !



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12472#discussion_r60175666
  
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
@@ -268,23 +243,11 @@ private[spark] class ListenerTaskMetrics(
 }
 
 private[spark] object TaskMetrics extends Logging {
+  import InternalAccumulator._
 
   def empty: TaskMetrics = new TaskMetrics
 
-  /**
-   * Get an accumulator from the given map by name, assuming it exists.
-   */
-  def getAccum[T](accumMap: Map[String, Accumulator[_]], name: String): Accumulator[T] = {
-    require(accumMap.contains(name), s"metric '$name' is missing")
-    val accum = accumMap(name)
-    try {
-      // Note: we can't do pattern matching here because types are erased by compile time
-      accum.asInstanceOf[Accumulator[T]]
-    } catch {
-      case e: ClassCastException =>
-        throw new SparkException(s"accumulator $name was of unexpected type", e)
-    }
-  }
+  def createAccum[T](name: String): Accumulator[T] = create(name).asInstanceOf[Accumulator[T]]
--- End diff --

I'd move the creation of accumulators in here, rather than delegating to 
InternalAccumulators.

Also maybe just have createLongAccumulator and createCollectionAccumulator; 
then it becomes obvious at the callsite what's going on, and we also don't need 
to have conditional branches in create.
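
Something along these lines, as a sketch only. `SketchAccumulator` is a simplified, hypothetical type rather than Spark's `Accumulator`, and `AccumulatorFactorySketch` is an illustrative name:

```scala
// Hypothetical, simplified accumulator; not Spark's Accumulator API.
final class SketchAccumulator[T](val name: String, initial: T, merge: (T, T) => T) {
  private var current: T = initial
  def add(v: T): Unit = { current = merge(current, v) }
  def value: T = current
}

object AccumulatorFactorySketch {
  // The value type is fixed by the method name, so the call site needs
  // neither a type-based branch nor an asInstanceOf cast.
  def createLongAccumulator(name: String): SketchAccumulator[Long] =
    new SketchAccumulator[Long](name, 0L, _ + _)

  def createCollectionAccumulator[A](name: String): SketchAccumulator[Seq[A]] =
    new SketchAccumulator[Seq[A]](name, Seq.empty[A], _ ++ _)
}

object AccumulatorFactoryDemo {
  def main(args: Array[String]): Unit = {
    val bytesRead = AccumulatorFactorySketch.createLongAccumulator("bytesRead")
    bytesRead.add(10L)
    bytesRead.add(5L)

    val updatedBlocks =
      AccumulatorFactorySketch.createCollectionAccumulator[String]("updatedBlocks")
    updatedBlocks.add(Seq("rdd_0_0"))

    println(s"${bytesRead.name}=${bytesRead.value}")
    println(s"${updatedBlocks.name}=${updatedBlocks.value}")
  }
}
```

Compared with a single generic `createAccum[T]`, a mismatch between the metric and its type becomes a compile error at the call site instead of a `ClassCastException` at runtime.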




[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12259#issuecomment-211739045
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56190/
Test PASSed.



[GitHub] spark pull request: [SPARK-12457] Fixed the Typos in Collection Fu...

2016-04-18 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/12492

[SPARK-12457] Fixed the Typos in Collection Functions

 What changes were proposed in this pull request?
https://github.com/apache/spark/pull/12185 contains the original PR I 
submitted in https://github.com/apache/spark/pull/10418

However, it missed one of the extended examples, a wrong description, and a 
few typos in the collection functions. This PR fixes all of these issues. 

 How was this patch tested?
The existing test cases already cover it. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark expressionUpdate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12492.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12492


commit 67fb4f022a1e12dec9d9f467c6fa26f38abbb040
Author: gatorsmile 
Date:   2016-04-19T05:36:49Z

fixed a few typos in collection functions.





[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12259#issuecomment-211739043
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12259#issuecomment-211738829
  
**[Test build #56190 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56190/consoleFull)**
 for PR 12259 at commit 
[`85d1df0`](https://github.com/apache/spark/commit/85d1df0acdca497cc63363783db07701eff93ba6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class VectorUDT extends UserDefinedType[Vector] `
  * `  s\"Can not load in UserDefinedType $`



[GitHub] spark pull request: [SPARK-14712][ML]spark.ml.LogisticRegressionMo...

2016-04-18 Thread hujy

Github user hujy commented on a diff in the pull request:

https://github.com/apache/spark/pull/12491#discussion_r60175227
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -159,7 +159,9 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas
 @Since("1.2.0")
 @Experimental
 class LogisticRegression @Since("1.2.0") (
-    @Since("1.4.0") override val uid: String)
+    @Since("1.4.0") override val uid: String,
+    @Since("2.0.0") val numFeatures: Int = 0,

I think the values are passed in when the user creates the object. When the 
user calls toString, the values are returned.



[GitHub] spark pull request: [SPARK-14712][ML]spark.ml.LogisticRegressionMo...

2016-04-18 Thread hujy

Github user hujy commented on a diff in the pull request:

https://github.com/apache/spark/pull/12491#discussion_r60175230
  
--- Diff: python/pyspark/mllib/classification.py ---
@@ -262,6 +262,8 @@ def load(cls, sc, path):
 model.setThreshold(threshold)
 return model
 
+def __repr__(self):
+return self._call_java("toString")
 
--- End diff --

ok :)



[GitHub] spark pull request: [SPARK-12457] [SQL] Add ExpressionDescription ...

2016-04-18 Thread gatorsmile

Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/10418



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12472#discussion_r60174901
  
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
@@ -217,21 +170,45 @@ class TaskMetrics private[spark] (initialAccums: Seq[Accumulator[_]]) extends Se
    */
   private[spark] def mergeShuffleReadMetrics(): Unit = synchronized {
     if (tempShuffleReadMetrics.nonEmpty) {
-      _shuffleReadMetrics.setMergeValues(tempShuffleReadMetrics)
+      shuffleReadMetrics.setMergeValues(tempShuffleReadMetrics)
     }
   }
 
-  /**
-   * Metrics related to shuffle write, defined only in shuffle map stages.
-   */
-  def shuffleWriteMetrics: ShuffleWriteMetrics = _shuffleWriteMetrics
+  // Only used for test
+  private[spark] val testAccum =
+    sys.props.get("spark.testing").map(_ => TaskMetrics.createAccum[Long](TEST_ACCUM))
+
+  @transient private[spark] lazy val internalAccums: Seq[Accumulable[_, _]] = {
--- End diff --

is this here for a reason?




[GitHub] spark pull request: [SPARK-14719] WriteAheadLogBasedBlockHandler s...

2016-04-18 Thread dibbhatt

Github user dibbhatt commented on the pull request:

https://github.com/apache/spark/pull/12484#issuecomment-211734038
  
Hi @JoshRosen, isn't this fix somehow related to the issue discussed 
here: https://github.com/apache/spark/pull/6990 ?

You can refer to the final comments from @andrewor14 

https://github.com/apache/spark/pull/6990#issuecomment-120515683

The issue here is that if a block fails to unroll, the ReceivedBlockHandler 
won't get the block id, so it will never know about the block and will not 
include it in any future computation. In other words, if you can't store a 
block locally, the receiver thinks the block has not been stored anywhere, 
even if it has been successfully written to the WAL. Isn't that right?

 



[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12352#issuecomment-211733351
  
**[Test build #56194 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56194/consoleFull)**
 for PR 12352 at commit 
[`c265546`](https://github.com/apache/spark/commit/c26554639f4a2615907d7b46af3005ff3f335d08).



[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests

2016-04-18 Thread lresende

Github user lresende commented on the pull request:

https://github.com/apache/spark/pull/12270#issuecomment-211732862
  
Ok, I will work with @JoshRosen on the trigger part.



[GitHub] spark pull request: [SPARK-14369][SQL] Locality support for FileSc...

2016-04-18 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12153#discussion_r60174480
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -621,20 +621,40 @@ class HDFSFileCatalog(
 
   def getStatus(path: Path): Array[FileStatus] = leafDirToChildrenFiles(path)
 
+  private implicit class LocatedFileStatusIterator(iterator: RemoteIterator[LocatedFileStatus])
+    extends Iterator[LocatedFileStatus] {
+
+    override def hasNext: Boolean = iterator.hasNext
+
+    override def next(): LocatedFileStatus = iterator.next()
+  }
+
   private def listLeafFiles(paths: Seq[Path]): mutable.LinkedHashSet[FileStatus] = {
     if (paths.length >= sqlContext.conf.parallelPartitionDiscoveryThreshold) {
       HadoopFsRelation.listLeafFilesInParallel(paths, hadoopConf, sqlContext.sparkContext)
--- End diff --

Let's also have a test.



[GitHub] spark pull request: [SPARK-14369][SQL] Locality support for FileSc...

2016-04-18 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12153#discussion_r60174434
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -621,20 +621,40 @@ class HDFSFileCatalog(
 
   def getStatus(path: Path): Array[FileStatus] = leafDirToChildrenFiles(path)
 
+  private implicit class LocatedFileStatusIterator(iterator: RemoteIterator[LocatedFileStatus])
+    extends Iterator[LocatedFileStatus] {
+
+    override def hasNext: Boolean = iterator.hasNext
+
+    override def next(): LocatedFileStatus = iterator.next()
+  }
+
   private def listLeafFiles(paths: Seq[Path]): mutable.LinkedHashSet[FileStatus] = {
     if (paths.length >= sqlContext.conf.parallelPartitionDiscoveryThreshold) {
       HadoopFsRelation.listLeafFilesInParallel(paths, hadoopConf, sqlContext.sparkContext)
--- End diff --

Seems we also need to update the `listLeafFiles` that is called by 
`listLeafFilesInParallel`.



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12472#discussion_r60174384
  
--- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
@@ -36,15 +36,10 @@ private[spark] class TaskContextImpl(
     override val taskMemoryManager: TaskMemoryManager,
     localProperties: Properties,
     @transient private val metricsSystem: MetricsSystem,
-    initialAccumulators: Seq[Accumulator[_]] = InternalAccumulator.createAll())
+    val taskMetrics: TaskMetrics)
--- End diff --

add override


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12472#issuecomment-211732260
  
**[Test build #56193 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56193/consoleFull)**
 for PR 12472 at commit 
[`6226058`](https://github.com/apache/spark/commit/622605830643014aac5d0a2f5f30dab567530faf).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13811][SPARK-13836] [SQL] Removed IsNot...

2016-04-18 Thread gatorsmile

Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/11649



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12472#discussion_r60174365
  
--- Diff: core/src/main/scala/org/apache/spark/TaskContext.scala ---
@@ -65,7 +65,7 @@ object TaskContext {
* An empty task context that does not represent an actual task.
--- End diff --

while you are at it, can you document that this is only used for testing?




[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12472#issuecomment-211732276
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56193/
Test FAILed.



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12472#issuecomment-211732273
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread cloud-fan

Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12472#issuecomment-211731808
  
ready for review :)



[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests

2016-04-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12270#issuecomment-211731818
  
We can run them via some trigger phrase though.



[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests

2016-04-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12270#issuecomment-211731762
  
They have been flaky and causing other pull requests to fail. That's why we 
shouldn't run them on Jenkins.




[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests

2016-04-18 Thread lresende

Github user lresende commented on the pull request:

https://github.com/apache/spark/pull/12270#issuecomment-211731480
  
@rxin Let me move them to a specific docker profile. But I would still run 
them on Jenkins, as the infrastructure is already set up there.



[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12472#issuecomment-211731220
  
**[Test build #56193 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56193/consoleFull)**
 for PR 12472 at commit 
[`6226058`](https://github.com/apache/spark/commit/622605830643014aac5d0a2f5f30dab567530faf).



[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests

2016-04-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12270#issuecomment-211730704
  
I don't even think they should run on pull requests. Tests that require 
extensive external setup (or downloading things) in general are flaky.




[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests

2016-04-18 Thread lresende

Github user lresende commented on the pull request:

https://github.com/apache/spark/pull/12270#issuecomment-211729901
  
@rxin, just trying to understand: is the Oracle test the only one failing? 
Or are you suggesting we move all the Docker-based tests to a separate 
profile that would only run on Jenkins?



[GitHub] spark pull request: [SPARK-14676] Wrap and re-throw Await.result e...

2016-04-18 Thread ScrapCodes

Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/12433#discussion_r60173470
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -260,7 +260,12 @@ private[spark] class BlockManager(
   def waitForAsyncReregister(): Unit = {
     val task = asyncReregisterTask
     if (task != null) {
-      Await.ready(task, Duration.Inf)
+      try {
+        Await.ready(task, Duration.Inf)
--- End diff --

Unrelated to this PR, but waiting indefinitely has a downside: if this 
(main) thread blocks, the running app will appear to hang, with no way to 
tell unless one checks a thread dump. With a finite duration, an exception 
is thrown on timeout. With `Duration.Inf`, no exception is ever thrown.

If I am correct about the above, I am not sure why it is used so widely. 
I am just asking in case there is some side to it that I do not 
understand.
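
A minimal sketch of the difference described above, not code from the PR. The object and method names (`AwaitTimeoutSketch`, `completedWithin`) are hypothetical, and a never-completed `Promise` stands in for a stuck task. With a finite timeout, `Await.ready` throws a `TimeoutException` that the caller can handle. With `Duration.Inf`, the same call would simply block.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object AwaitTimeoutSketch {
  // Hypothetical helper: returns true if `f` completed within `timeout`,
  // false if the wait timed out instead.
  def completedWithin[T](f: Future[T], timeout: FiniteDuration): Boolean =
    try {
      Await.ready(f, timeout)
      true
    } catch {
      // Only reachable with a finite timeout; Duration.Inf never times out.
      case _: TimeoutException => false
    }

  def main(args: Array[String]): Unit = {
    // A promise that is never completed stands in for a stuck task (assumption).
    val stuck = Promise[Unit]().future
    println(completedWithin(stuck, 100.millis))

    // An already-completed future returns immediately.
    println(completedWithin(Future.successful(1), 100.millis))
  }
}
```

Whether a finite timeout is appropriate at a given call site depends on whether the awaited work already enforces its own timeouts.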



[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests

2016-04-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12270#issuecomment-211729460
  
I just failed to build Spark locally once due to the docker oracle test.




[GitHub] spark pull request: [SPARK-14609][SQL] Native support for LOAD DAT...

2016-04-18 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12412#issuecomment-211729217
  
**[Test build #56192 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56192/consoleFull)**
 for PR 12412 at commit 
[`08acf5c`](https://github.com/apache/spark/commit/08acf5c9a2638a94ce16df6fab124d3aeeea13d6).



[GitHub] spark pull request: [SPARK-14712][ML]spark.ml.LogisticRegressionMo...

2016-04-18 Thread holdenk

Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/12491#issuecomment-211729265
  
Thanks for taking the initiative on this PR. At first glance it seems like 
this approach might not quite work, but it's easier to tell with some tests. 
Could you add a test case and run it locally? You can add your test in 
LogisticRegressionSuite.scala for the Scala test. You may also find the 
linter tools ./dev/lint-scala & ./dev/lint-python to be useful :)



[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests

2016-04-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12270#issuecomment-211728923
  
These tests are too flaky. I've already seen a few failures.

We should disable them from the normal tests and maybe run them 
occasionally (via some trigger, or just once before the release).



[GitHub] spark pull request: [SPARK-14712][ML]spark.ml.LogisticRegressionMo...

2016-04-18 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12491#discussion_r60173022
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -159,7 +159,9 @@ private[classification] trait LogisticRegressionParams 
extends ProbabilisticClas
 @Since("1.2.0")
 @Experimental
 class LogisticRegression @Since("1.2.0") (
-    @Since("1.4.0") override val uid: String)
+    @Since("1.4.0") override val uid: String,
+    @Since("2.0.0") val numFeatures: Int = 0,
--- End diff --

Why are we adding these vals here? Where do they get set from? Are they 
needed?



[GitHub] spark pull request: [SPARK-14712][ML]spark.ml.LogisticRegressionMo...

2016-04-18 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/12491#discussion_r60172784
  
--- Diff: python/pyspark/mllib/classification.py ---
@@ -262,6 +262,8 @@ def load(cls, sc, path):
         model.setThreshold(threshold)
         return model
 
+    def __repr__(self):
+        return self._call_java("toString")
 
--- End diff --

I think Python style asks for two new lines here (try running 
./dev/lint-python locally :))



[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...

2016-04-18 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12259#discussion_r60172565
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/linalg/udt/UDTSuite.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.udt
--- End diff --

ditto


