from:"jaceklaskowski"

[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390132
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
 ---
@@ -134,15 +135,14 @@ class GBTClassifierSuite extends SparkFunSuite with 
MLlibTestSparkContext
   */
 
   test("Fitting without numClasses in metadata") {
-val df: DataFrame = 
spark.createDataFrame(TreeTests.featureImportanceData(sc))
+val df: DataFrame = TreeTests.featureImportanceData(sc).toDF()
 val gbt = new GBTClassifier().setMaxDepth(1).setMaxIter(1)
 gbt.fit(df)
--- End diff --

Wonder why this line is separate not part of 139? Any reason?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-07-03 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14035#discussion_r69390117
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/ClassifierSuite.scala 
---
@@ -71,8 +71,7 @@ class ClassifierSuite extends SparkFunSuite with 
MLlibTestSparkContext {
 
   test("getNumClasses") {
 def getTestData(labels: Seq[Double]): DataFrame = {
--- End diff --

repeated. What about Moving it outside `test` methods?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14020: [SPARK-16349][sql] Fall back to isolated class lo...

2016-07-02 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14020#discussion_r69382827
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
 ---
@@ -264,7 +270,7 @@ private[hive] class IsolatedClientLoader(
   throw new ClassNotFoundException(
 s"$cnf when creating Hive client using classpath: 
${execJars.mkString(", ")}\n" +
 "Please make sure that jars for your version of hive and 
hadoop are included in the " +
--- End diff --

Just a nitpick...should 'hive' be Hive as the line above + Hadoop?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14026: [SPARK-13569][STREAMING][KAFKA] pattern based top...

2016-07-02 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14026#discussion_r69382788
  
--- Diff: 
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/ConsumerStrategy.scala
 ---
@@ -79,8 +81,71 @@ private case class Subscribe[K, V](
   def onStart(currentOffsets: ju.Map[TopicPartition, jl.Long]): 
Consumer[K, V] = {
 val consumer = new KafkaConsumer[K, V](kafkaParams)
 consumer.subscribe(topics)
-if (currentOffsets.isEmpty) {
-  offsets.asScala.foreach { case (topicPartition, offset) =>
+val toSeek = if (currentOffsets.isEmpty) {
+  offsets
+} else {
+  currentOffsets
+}
+if (!toSeek.isEmpty) {
+  // work around KAFKA-3370 when reset is none
+  val aor = kafkaParams.get(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG)
+  val shouldSuppress = aor != null && 
aor.asInstanceOf[String].toUpperCase == "NONE"
+  try {
+consumer.poll(0)
+  } catch {
+case x: NoOffsetForPartitionException if shouldSuppress =>
+  // silence exception
+  }
+  toSeek.asScala.foreach { case (topicPartition, offset) =>
+  consumer.seek(topicPartition, offset)
--- End diff --

4 chars for indent?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14031: [SPARK-16353][BUILD][DOC] Missing javadoc options...

2016-07-02 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14031#discussion_r69382719
  
--- Diff: project/SparkBuild.scala ---
@@ -723,8 +723,8 @@ object Unidoc {
 .map(_.filterNot(_.getCanonicalPath.contains("org/apache/hadoop")))
 },
 
-// Javadoc options: create a window title, and group key packages on 
index page
-javacOptions in doc := Seq(
+// Javadoc options: create a window title
--- End diff --

Do we really need that line? It's in the git history at the very least and 
JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14030: [SPARK-16350][SQL] Fix support for incremental pl...

2016-07-02 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14030#discussion_r69382676
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ForeachSinkSuite.scala
 ---
@@ -35,35 +35,109 @@ class ForeachSinkSuite extends StreamTest with 
SharedSQLContext with BeforeAndAf
 sqlContext.streams.active.foreach(_.stop())
   }
 
-  test("foreach") {
+  test("foreach() with `append` output mode") {
 withTempDir { checkpointDir =>
   val input = MemoryStream[Int]
   val query = input.toDS().repartition(2).writeStream
 .option("checkpointLocation", checkpointDir.getCanonicalPath)
+.outputMode("append")
 .foreach(new TestForeachWriter())
 .start()
+
+  // -- batch 0 ---
   input.addData(1, 2, 3, 4)
   query.processAllAvailable()
 
-  val expectedEventsForPartition0 = Seq(
+  var expectedEventsForPartition0 = Seq(
 ForeachSinkSuite.Open(partition = 0, version = 0),
 ForeachSinkSuite.Process(value = 1),
 ForeachSinkSuite.Process(value = 3),
 ForeachSinkSuite.Close(None)
   )
-  val expectedEventsForPartition1 = Seq(
+  var expectedEventsForPartition1 = Seq(
 ForeachSinkSuite.Open(partition = 1, version = 0),
 ForeachSinkSuite.Process(value = 2),
 ForeachSinkSuite.Process(value = 4),
 ForeachSinkSuite.Close(None)
   )
 
-  val allEvents = ForeachSinkSuite.allEvents()
+  var allEvents = ForeachSinkSuite.allEvents()
+  assert(allEvents.size === 2)
+  assert {
+allEvents === Seq(expectedEventsForPartition0, 
expectedEventsForPartition1) ||
+  allEvents === Seq(expectedEventsForPartition1, 
expectedEventsForPartition0)
+  }
+
+  ForeachSinkSuite.clear()
+
+  // -- batch 1 ---
+  input.addData(5, 6, 7, 8)
+  query.processAllAvailable()
+
+  expectedEventsForPartition0 = Seq(
+ForeachSinkSuite.Open(partition = 0, version = 1),
+ForeachSinkSuite.Process(value = 5),
+ForeachSinkSuite.Process(value = 7),
+ForeachSinkSuite.Close(None)
+  )
+  expectedEventsForPartition1 = Seq(
+ForeachSinkSuite.Open(partition = 1, version = 1),
+ForeachSinkSuite.Process(value = 6),
+ForeachSinkSuite.Process(value = 8),
+ForeachSinkSuite.Close(None)
+  )
+
+  allEvents = ForeachSinkSuite.allEvents()
   assert(allEvents.size === 2)
   assert {
 allEvents === Seq(expectedEventsForPartition0, 
expectedEventsForPartition1) ||
   allEvents === Seq(expectedEventsForPartition1, 
expectedEventsForPartition0)
   }
+
+  query.stop()
+}
+  }
+
+  test("foreach() with `complete` output mode") {
+withTempDir { checkpointDir =>
+  val input = MemoryStream[Int]
+
+  val query = input.toDS()
+.groupBy().count().as[Long].map(_.toInt)
+.writeStream
+.option("checkpointLocation", checkpointDir.getCanonicalPath)
+.outputMode("complete")
--- End diff --

Are really output modes strings? No enums or similar more type-safe values?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14030: [SPARK-16350][SQL] Fix support for incremental pl...

2016-07-02 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14030#discussion_r69382669
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ForeachSinkSuite.scala
 ---
@@ -35,35 +35,109 @@ class ForeachSinkSuite extends StreamTest with 
SharedSQLContext with BeforeAndAf
 sqlContext.streams.active.foreach(_.stop())
   }
 
-  test("foreach") {
+  test("foreach() with `append` output mode") {
 withTempDir { checkpointDir =>
   val input = MemoryStream[Int]
   val query = input.toDS().repartition(2).writeStream
 .option("checkpointLocation", checkpointDir.getCanonicalPath)
+.outputMode("append")
 .foreach(new TestForeachWriter())
 .start()
+
+  // -- batch 0 ---
   input.addData(1, 2, 3, 4)
   query.processAllAvailable()
 
-  val expectedEventsForPartition0 = Seq(
+  var expectedEventsForPartition0 = Seq(
 ForeachSinkSuite.Open(partition = 0, version = 0),
 ForeachSinkSuite.Process(value = 1),
 ForeachSinkSuite.Process(value = 3),
 ForeachSinkSuite.Close(None)
   )
-  val expectedEventsForPartition1 = Seq(
+  var expectedEventsForPartition1 = Seq(
 ForeachSinkSuite.Open(partition = 1, version = 0),
 ForeachSinkSuite.Process(value = 2),
 ForeachSinkSuite.Process(value = 4),
 ForeachSinkSuite.Close(None)
   )
 
-  val allEvents = ForeachSinkSuite.allEvents()
+  var allEvents = ForeachSinkSuite.allEvents()
+  assert(allEvents.size === 2)
+  assert {
+allEvents === Seq(expectedEventsForPartition0, 
expectedEventsForPartition1) ||
+  allEvents === Seq(expectedEventsForPartition1, 
expectedEventsForPartition0)
+  }
+
+  ForeachSinkSuite.clear()
+
+  // -- batch 1 ---
+  input.addData(5, 6, 7, 8)
+  query.processAllAvailable()
+
+  expectedEventsForPartition0 = Seq(
+ForeachSinkSuite.Open(partition = 0, version = 1),
+ForeachSinkSuite.Process(value = 5),
+ForeachSinkSuite.Process(value = 7),
+ForeachSinkSuite.Close(None)
+  )
+  expectedEventsForPartition1 = Seq(
+ForeachSinkSuite.Open(partition = 1, version = 1),
+ForeachSinkSuite.Process(value = 6),
+ForeachSinkSuite.Process(value = 8),
+ForeachSinkSuite.Close(None)
+  )
+
+  allEvents = ForeachSinkSuite.allEvents()
   assert(allEvents.size === 2)
   assert {
 allEvents === Seq(expectedEventsForPartition0, 
expectedEventsForPartition1) ||
   allEvents === Seq(expectedEventsForPartition1, 
expectedEventsForPartition0)
--- End diff --

Same as above


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14030: [SPARK-16350][SQL] Fix support for incremental pl...

2016-07-02 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/14030#discussion_r69382667
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ForeachSinkSuite.scala
 ---
@@ -35,35 +35,109 @@ class ForeachSinkSuite extends StreamTest with 
SharedSQLContext with BeforeAndAf
 sqlContext.streams.active.foreach(_.stop())
   }
 
-  test("foreach") {
+  test("foreach() with `append` output mode") {
 withTempDir { checkpointDir =>
   val input = MemoryStream[Int]
   val query = input.toDS().repartition(2).writeStream
 .option("checkpointLocation", checkpointDir.getCanonicalPath)
+.outputMode("append")
 .foreach(new TestForeachWriter())
 .start()
+
+  // -- batch 0 ---
   input.addData(1, 2, 3, 4)
   query.processAllAvailable()
 
-  val expectedEventsForPartition0 = Seq(
+  var expectedEventsForPartition0 = Seq(
 ForeachSinkSuite.Open(partition = 0, version = 0),
 ForeachSinkSuite.Process(value = 1),
 ForeachSinkSuite.Process(value = 3),
 ForeachSinkSuite.Close(None)
   )
-  val expectedEventsForPartition1 = Seq(
+  var expectedEventsForPartition1 = Seq(
 ForeachSinkSuite.Open(partition = 1, version = 0),
 ForeachSinkSuite.Process(value = 2),
 ForeachSinkSuite.Process(value = 4),
 ForeachSinkSuite.Close(None)
   )
 
-  val allEvents = ForeachSinkSuite.allEvents()
+  var allEvents = ForeachSinkSuite.allEvents()
+  assert(allEvents.size === 2)
+  assert {
+allEvents === Seq(expectedEventsForPartition0, 
expectedEventsForPartition1) ||
+  allEvents === Seq(expectedEventsForPartition1, 
expectedEventsForPartition0)
--- End diff --

`should contain theSameElementsAs`? See 
http://www.scalatest.org/user_guide/using_matchers#workingWithAggregations


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-06-26 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r68506637
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -26,6 +26,20 @@ import org.apache.spark.sql.test.SharedSQLContext
 class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
   import testImplicits._
 
+  test("primitive type on array") {
+val df = sparkContext.parallelize(Seq(1, 2), 1).toDF("v")
+val resDF = df.selectExpr("Array(v + 2, v + 3)")
+checkAnswer(resDF,
+  Seq(Row(Array(3, 4)), Row(Array(4, 5
+  }
+
+  test("primitive type and null on array") {
+val df = sparkContext.parallelize(Seq(1, 2), 1).toDF("v")
+val resDF = df.selectExpr("Array(v + 2, null, v + 3)")
+checkAnswer(resDF,
+  Seq(Row(Array(3, null, 4)), Row(Array(4, null, 5
+  }
+
   test("UDF on struct") {
 val f = udf((a: String) => a)
--- End diff --

`val f = udf[String, String](identity)` ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-06-26 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r68506607
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -26,6 +26,20 @@ import org.apache.spark.sql.test.SharedSQLContext
 class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
   import testImplicits._
 
+  test("primitive type on array") {
+val df = sparkContext.parallelize(Seq(1, 2), 1).toDF("v")
--- End diff --

`Seq(1, 2).toDF("v")` ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-06-26 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r68506564
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -51,27 +51,52 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val arrayClass = classOf[GenericArrayData].getName
 val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  final boolean ${ev.isNull} = false;
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
+val dt = dataType match {
+  case a @ ArrayType(et, _) => et
+}
+val isPrimitive = ctx.isPrimitiveType(dt)
+val evals = children.map(e => e.genCode(ctx))
+val allNonNull = evals.find(_.isNull != "false").isEmpty
--- End diff --

`evals.forall(_.isNull == "true")` ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-06-26 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r68506515
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -51,27 +51,52 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val arrayClass = classOf[GenericArrayData].getName
 val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  final boolean ${ev.isNull} = false;
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
+val dt = dataType match {
--- End diff --

This can be shorter (simpler?) with the pattern matching on assignment 
"trick", i.e.

```
val ArrayType(dt, _) = dataType
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source does not write...

2016-06-26 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13912#discussion_r68506371
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
 ---
@@ -195,18 +202,50 @@ private[sql] class CsvOutputWriter(
   private var records: Long = 0L
   private val csvWriter = new LineCsvWriter(params, 
dataSchema.fieldNames.toSeq)
 
-  private def rowToString(row: Seq[Any]): Seq[String] = row.map { field =>
-if (field != null) {
-  field.toString
-} else {
-  params.nullValue
+  private def rowToString(row: InternalRow): Seq[String] = {
+var i = 0
+val values = new Array[String](row.numFields)
+while (i < row.numFields) {
+  if (!row.isNullAt(i)) {
+values(i) = fieldsConverters(i).apply(row, i)
+  } else {
+values(i) = params.nullValue
+  }
+  i += 1
+}
+values
+  }
+
+  private def makeConverter(dataType: DataType): ValueConverter = {
+dataType match {
--- End diff --

Why do you place `dataType match` on a separate line? One less `{` if at 
line 219 ;-) 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13912: [SPARK-16216][SQL] CSV data source does not write...

2016-06-26 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13912#discussion_r68506352
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
 ---
@@ -195,18 +202,50 @@ private[sql] class CsvOutputWriter(
   private var records: Long = 0L
   private val csvWriter = new LineCsvWriter(params, 
dataSchema.fieldNames.toSeq)
 
-  private def rowToString(row: Seq[Any]): Seq[String] = row.map { field =>
-if (field != null) {
-  field.toString
-} else {
-  params.nullValue
+  private def rowToString(row: InternalRow): Seq[String] = {
+var i = 0
+val values = new Array[String](row.numFields)
+while (i < row.numFields) {
--- End diff --

Please use `values.indices` and then `map` or `foreach` to make it more 
functional (and hopefully readable). With the change, you'll link `values` to 
indices (not relying on `row.numFields` used twice).

BTW, do we need `values` to be initialized first before adding elements? 
I'd vote for `foldLeft` as a better alternative, e.g.

```
def appendRowValue(arr: Array[String], i: Int) = {
  // do the if here
  arr
}
(0 to row.numFields).foldLeft(Array[String]()) { case (arr, i) => arr }
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13654: [SPARK-15868] [Web UI] Executors table in Executo...

2016-06-14 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13654#discussion_r66961307
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala 
---
@@ -69,13 +73,13 @@ private[ui] class ExecutorsPage(
 }
 
 val execInfo = activeExecutorInfo ++ deadExecutorInfo
-val execInfoSorted = execInfo.sortBy(_.id)
+val execInfoSorted = execInfo.sortWith((a, b) => strToInt(a.id) > 
strToInt(b.id))
--- End diff --

I think I'd use `Ordering` trait instead as it's not very easy to spot what 
the ordering is.

```
case class Info(id: String)
val infos = Seq(Info("5"), Info("driver"), Info("1"), Info("10"), 
Info("500"))

implicit val onString = Ordering[Int].on((s: String) => 
util.Try(s.toInt).getOrElse(-1))
infos.sortBy(_.id)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13555#discussion_r66235235
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -625,6 +625,22 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-15804: write out the metadata to parquet file") {
+val data = (1, "abc") ::(2, "helloabcde") :: Nil
+val df = spark.createDataFrame(data).toDF("a", "b")
+val md = new MetadataBuilder().putString("key", "value").build()
+val dfWithmeta = df.select(Column("a"), Column("b").as("b", md))
--- End diff --

Use `df.select('a, 'b.as("b", md))` instead. It's more compact and 
"modern". Our tests should promote the latest API.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13555#discussion_r66235290
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -625,6 +625,22 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-15804: write out the metadata to parquet file") {
+val data = (1, "abc") ::(2, "helloabcde") :: Nil
+val df = spark.createDataFrame(data).toDF("a", "b")
+val md = new MetadataBuilder().putString("key", "value").build()
+val dfWithmeta = df.select(Column("a"), Column("b").as("b", md))
+
+withTempPath { dir =>
+  val path = s"${dir.getCanonicalPath}/data"
+  dfWithmeta.write.parquet(path)
+
+  readParquetFile(path) { dfwithmeta2 =>
--- End diff --

Replace `dfwithmeta2` to `df`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13555: [SPARK-15804][SQL]Include metadata in the toStruc...

2016-06-08 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13555#discussion_r66234892
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -625,6 +625,22 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-15804: write out the metadata to parquet file") {
+val data = (1, "abc") ::(2, "helloabcde") :: Nil
+val df = spark.createDataFrame(data).toDF("a", "b")
--- End diff --

Merge the lines 630 and 631 to `Seq((1,"abc"),(2,"hello")).toDF("a", "b")` 
instead


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-06-05 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13513#discussion_r65825524
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -529,7 +529,28 @@ object SQLConf {
   .internal()
   .doc("How long in milliseconds a file is guaranteed to be visible 
for all readers.")
   .timeConf(TimeUnit.MILLISECONDS)
-  .createWithDefault(60 * 1000L) // 10 minutes
+  .createWithDefault(60 * 10 * 1000L) // 10 minutes
+
+  val FILE_SOURCE_LOG_DELETION = 
SQLConfigBuilder("spark.sql.streaming.fileSource.log.deletion")
+.internal()
+.doc("Whether to delete the expired log files in file stream source.")
+.booleanConf
+.createWithDefault(true)
+
+  val FILE_SOURCE_LOG_COMPACT_INTERVAL =
+SQLConfigBuilder("spark.sql.streaming.fileSource.log.compactInterval")
+  .internal()
+  .doc("Number of log files after which all the previous files " +
+"are compacted into the next log file.")
+  .intConf
+  .createWithDefault(10)
+
+  val FILE_SOURCE_LOG_CLEANUP_DELAY =
+SQLConfigBuilder("spark.sql.streaming.fileSource.log.cleanupDelay")
+  .internal()
+  .doc("How long in milliseconds a file is guaranteed to be visible 
for all readers.")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .createWithDefault(60 * 10 * 1000L) // 10 minutes
--- End diff --

A nitpick but think it'd be easier to "decode" - `10 * 60 * 1000L`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-06-05 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13513#discussion_r65825474
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
 ---
@@ -129,3 +131,86 @@ class FileStreamSource(
 
   override def toString: String = s"FileStreamSource[$qualifiedBasePath]"
 }
+
+class FileStreamSourceLog(sparkSession: SparkSession, path: String)
+  extends HDFSMetadataLog[Seq[String]](sparkSession, path) {
+
+  // Configurations about metadata compaction
+  private val compactInterval = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL)
+  require(compactInterval > 0,
+s"Please set ${SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL.key} (was 
$compactInterval) to a " +
+  s"positive value.")
+
+  private val fileCleanupDelayMs = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_CLEANUP_DELAY)
+
+  private val isDeletingExpiredLog = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_DELETION)
+
+  private var compactBatchId: Long = -1L
+
+  private def isCompactionBatch(batchId: Long, compactInterval: Long): 
Boolean = {
+batchId % compactInterval == 0
+  }
+
+  override def add(batchId: Long, metadata: Seq[String]): Boolean = {
+if (isCompactionBatch(batchId, compactInterval)) {
+  compactMetadataLog(batchId - 1)
+}
+
+super.add(batchId, metadata)
+  }
+
+  private def compactMetadataLog(batchId: Long): Unit = {
+// read out compact metadata and merge with new metadata.
+val batches = super.get(Some(compactBatchId), Some(batchId))
+val totalMetadata = batches.flatMap(_._2)
+if (totalMetadata.isEmpty) {
+  return
+}
+
+// Remove old compact metadata file and rewrite.
+val renamedPath = new Path(path, 
s".${batchId.toString}-${UUID.randomUUID.toString}.tmp")
+fileManager.rename(batchIdToPath(batchId), renamedPath)
+
+var isSuccess = false
+try {
+  isSuccess = super.add(batchId, totalMetadata)
+} catch {
+  case NonFatal(e) => isSuccess = false
--- End diff --

Why are you setting `isSuccess` to `false` since it's `false` already?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-06-05 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13513#discussion_r65825480
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
 ---
@@ -129,3 +131,86 @@ class FileStreamSource(
 
   override def toString: String = s"FileStreamSource[$qualifiedBasePath]"
 }
+
+class FileStreamSourceLog(sparkSession: SparkSession, path: String)
+  extends HDFSMetadataLog[Seq[String]](sparkSession, path) {
+
+  // Configurations about metadata compaction
+  private val compactInterval = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL)
+  require(compactInterval > 0,
+s"Please set ${SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL.key} (was 
$compactInterval) to a " +
+  s"positive value.")
+
+  private val fileCleanupDelayMs = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_CLEANUP_DELAY)
+
+  private val isDeletingExpiredLog = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_DELETION)
+
+  private var compactBatchId: Long = -1L
+
+  private def isCompactionBatch(batchId: Long, compactInterval: Long): 
Boolean = {
+batchId % compactInterval == 0
+  }
+
+  override def add(batchId: Long, metadata: Seq[String]): Boolean = {
+if (isCompactionBatch(batchId, compactInterval)) {
+  compactMetadataLog(batchId - 1)
+}
+
+super.add(batchId, metadata)
+  }
+
+  private def compactMetadataLog(batchId: Long): Unit = {
+// read out compact metadata and merge with new metadata.
+val batches = super.get(Some(compactBatchId), Some(batchId))
+val totalMetadata = batches.flatMap(_._2)
+if (totalMetadata.isEmpty) {
+  return
+}
+
+// Remove old compact metadata file and rewrite.
+val renamedPath = new Path(path, 
s".${batchId.toString}-${UUID.randomUUID.toString}.tmp")
+fileManager.rename(batchIdToPath(batchId), renamedPath)
+
+var isSuccess = false
+try {
+  isSuccess = super.add(batchId, totalMetadata)
+} catch {
+  case NonFatal(e) => isSuccess = false
+} finally {
+  if (!isSuccess) {
+// Rollback to the previous status if compaction is failed.
--- End diff --

s/status/state ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-06-05 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13513#discussion_r65825440
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
 ---
@@ -129,3 +131,86 @@ class FileStreamSource(
 
   override def toString: String = s"FileStreamSource[$qualifiedBasePath]"
 }
+
+class FileStreamSourceLog(sparkSession: SparkSession, path: String)
+  extends HDFSMetadataLog[Seq[String]](sparkSession, path) {
+
+  // Configurations about metadata compaction
+  private val compactInterval = 
sparkSession.conf.get(SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL)
+  require(compactInterval > 0,
+s"Please set ${SQLConf.FILE_SOURCE_LOG_COMPACT_INTERVAL.key} (was 
$compactInterval) to a " +
--- End diff --

I'd move `(was $compactInterval)` at the end of the message.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [CORE][DOC][MINOR] Remove incorrect scaladoc

2016-06-01 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/13432

[CORE][DOC][MINOR] Remove incorrect scaladoc

## What changes were proposed in this pull request?

It removes a sentence about `SparkListener` being internal and may change 
in the future yet the class is `@DeveloperApi` that may or may not say the 
same. If it does not, it's clearly incorrect. If it says what `@DeveloperApi` 
is for, it's a duplication (and/or would require other `@DeveloperApi` to have 
it, too).

## How was this patch tested?

manual build




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark SparkListener

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13432


commit 306e60ebc135ccaceed24209e04d11a0929027f2
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-06-01T06:14:47Z

[CORE][DOC][MINOR] Remove incorrect scaladoc




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for cl...

2016-05-31 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65140246
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala ---
@@ -41,6 +49,16 @@ class Database(
 }
 
 
+/**
+ * A table in Spark, as returned by the `listTables` method in [[Catalog]].
--- End diff --

Use `[[Catalog#listTables]]`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for cl...

2016-05-31 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65140296
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala ---
@@ -83,9 +112,19 @@ class Column(
 }
 
 
-// TODO(andrew): should we include the database here?
+/**
+ * A user-defined function in Spark, as returned by `listFunctions` method 
in [[Catalog]].
--- End diff --

Use `[[Catalog#listFunctions]]`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for cl...

2016-05-31 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65140165
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala ---
@@ -25,6 +25,14 @@ import 
org.apache.spark.sql.catalyst.DefinedByConstructorParams
 // Note: all classes here are expected to be wrapped in Datasets and so 
must extend
 // DefinedByConstructorParams for the catalog to be able to create 
encoders for them.
 
+/**
+ * A database in Spark, as returned by the `listDatabases` method defined 
in [[Catalog]].
--- End diff --

Use `[[Catalog#listDatabases]]`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-31 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65136606
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -937,8 +937,12 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
  */
 case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions 
{
-case e @ CaseWhen(branches, _) if branches.size < 
conf.maxCaseBranchesForCodegen =>
-  e.toCodegen()
+case e: CaseWhen if canCodeGen(e) => e.toCodegen()
--- End diff --

Sorry for nitpicking, but could you use `canCodegen` instead (to follow the 
name of the method to call)? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-29 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13392#discussion_r65008333
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -937,7 +937,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
  */
 case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions 
{
-case e @ CaseWhen(branches, _) if branches.size < 
conf.maxCaseBranchesForCodegen =>
+case e @ CaseWhen(branches, elseBranch)
+if branches.size + elseBranch.size <= 
conf.maxCaseBranchesForCodegen =>
--- End diff --

Reading the case takes a while and  and I think it'd greatly benefit from 
introducing a local `def` - a predicate - for the condition (I can't figure out 
a name for this, sorry)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [CORE][MINOR][DOC] Removing incorrect scaladoc

2016-05-28 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/13384

[CORE][MINOR][DOC] Removing incorrect scaladoc

## What changes were proposed in this pull request?

I don't think the method will ever throw an exception so removing a false 
comment. Sorry @srowen and @rxin again -- I simply couldn't resist.

I wholeheartedly support merging the change with a bigger one (and trashing 
this PR).

## How was this patch tested?

Manual build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark blockinfomanager

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13384.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13384


commit 7c029d61b1927534c442d2892679d7c6e682cf7e
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-05-28T21:14:57Z

[CORE][MINOR][DOC] Removing incorrect scaladoc




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [CORE][DOC][MINOR] typos + links

2016-05-28 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/13383

[CORE][DOC][MINOR] typos + links

## What changes were proposed in this pull request?

A very tiny change to javadoc (which I don't mind if gets merged with a 
bigger change). I've just found it annoying and couldn't resist proposing a 
pull request. Sorry @srowen and @rxin.

## How was this patch tested?

Manual build



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark memory-consumer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13383.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13383


commit 93cc32ecda3c10c53e79872a4130913213f0ec67
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-05-28T21:08:14Z

[CORE][DOC][MINOR] typos + links




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [CORE][SQL][MINOR] Scaladoc fixes + string int...

2016-05-27 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/13329#issuecomment-70008
  
Thanks @rxin and @srowen for your help and patience! I'll close the pull 
request.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [CORE][SQL][MINOR] Scaladoc fixes + string int...

2016-05-27 Thread jaceklaskowski

Github user jaceklaskowski closed the pull request at:

https://github.com/apache/spark/pull/13329


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [CORE][SQL][MINOR] Scaladoc fixes + string int...

2016-05-27 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/13329#issuecomment-42718
  
How am I supposed to read this? Do you want me to...forget about the 
changes? All of them or just some? Which one would you accept since @srowen 
said: "Some of this is OK"? It _appears_ there's _some_ merit in the change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [CORE][SQL][MINOR] Scaladoc fixes + string int...

2016-05-27 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/13329#issuecomment-222140528
  
The issue with `nonEmpty` is that you could easily miss the negation (and 
that's why Scala offers `nonEmpty`). I don't think it's the final solution, but 
certainly believe it's far better for future readers to know what's going on in 
these lines. I believe the changes improve readability (but can happily revert 
_some_ if you point me at the places that need this).

It's just me to believe that by doing these small changes the code becomes 
more readable. I spent enough time with it to think it needs so (more often 
than I'm proposing).

Please guide me to learn your coding style. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [CORE][SQL][MINOR] Scaladoc fixes + string int...

2016-05-27 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13329#discussion_r64900321
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnScheduler.scala ---
@@ -31,9 +31,8 @@ private[spark] class YarnScheduler(sc: SparkContext) 
extends TaskSchedulerImpl(s
 Logger.getLogger(classOf[RackResolver]).setLevel(Level.WARN)
   }
 
-  // By default, rack is unknown
   override def getRackForHost(hostPort: String): Option[String] = {
-val host = Utils.parseHostPort(hostPort)._1
+val (host, _) = Utils.parseHostPort(hostPort)
--- End diff --

It is when you agree that you could easily (?) miss `_1` at the very end. I 
do agree and I did miss it few times while reviewing that piece of code. 
Opinions may vary and I can happily revert this change if requested (I need 
your advice to learn your coding style).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [CORE][SQL][MINOR] Scaladoc fixes + string int...

2016-05-26 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/13329

[CORE][SQL][MINOR] Scaladoc fixes + string interpolation

## What changes were proposed in this pull request?

Scaladoc fixes + string interpolation for logging

## How was this patch tested?

local build + manual testing


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark minor-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13329


commit 2e794dcbf69af5e6d399e3d07cec002bbd573bc9
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-05-26T17:45:26Z

[CORE][SQL][MINOR] Scaladoc fixes + string interpolation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15458][SQL][STREAMING] Disable schema i...

2016-05-20 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13238#discussion_r64127555
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
 ---
@@ -140,6 +140,18 @@ class FileStreamSourceSuite extends 
FileStreamSourceTest with SharedSQLContext {
 
   import testImplicits._
 
+  private def withSchemaInference(f: => Unit): Unit = {
+withSQLConf(("spark.sql.streaming.allowSchemaInference", "true")) { f }
--- End diff --

Why do you use `{ f }` not `(f)`? Also, I've seen 
`"spark.sql.streaming.allowSchemaInference" -> "true"` used in the past.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15458][SQL][STREAMING] Disable schema i...

2016-05-20 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13238#discussion_r64127527
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -186,6 +187,14 @@ case class DataSource(
 val path = caseInsensitiveOptions.getOrElse("path", {
   throw new IllegalArgumentException("'path' is not specified")
 })
+if (!sparkSession.conf.get(SQLConf.STREAMING_SCHEMA_INFERENCE) &&
--- End diff --

This and the other 2 lines: could you please define a local `val` and name 
it in a way that tells what the `if` is supposed to guard?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13209#discussion_r63987980
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -348,6 +348,11 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val CARTESIAN_PRODUCT_ENABLED = 
SQLConfigBuilder("spark.sql.join.cartesian.enabled")
+.doc("When false, we will throw an error if a query contains a 
cartesian product")
+.booleanConf
+.createWithDefault(false)
+
   val ORDER_BY_ORDINAL = SQLConfigBuilder("spark.sql.orderByOrdinal")
 .doc("When true, the ordinal numbers are treated as the position in 
the select list. " +
  "When false, the ordinal numbers in order/sort By clause are 
ignored.")
--- End diff --

Is there any reason for `By` uppercase?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15428][SQL] Disable multiple streaming ...

2016-05-19 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13210#discussion_r63987729
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
 ---
@@ -55,10 +55,19 @@ object UnsupportedOperationChecker {
 case _: InsertIntoTable =>
   throwError("InsertIntoTable is not supported with streaming 
DataFrames/Datasets")
 
-case Aggregate(_, _, child) if child.isStreaming && outputMode == 
Append =>
-  throwError(
-"Aggregations are not supported on streaming 
DataFrames/Datasets in " +
-  "Append output mode. Consider changing output mode to 
Update.")
+case Aggregate(_, _, child) if child.isStreaming =>
+  if (outputMode == Append) {
+throwError(
+  "Aggregations are not supported on streaming 
DataFrames/Datasets in " +
+"Append output mode. Consider changing output mode to 
Update.")
+  }
+  val moreStreamingAggregates = child.find {
+case Aggregate(_, _, grandchild) if grandchild.isStreaming => 
true
+case _ => false
+  }
+  if (moreStreamingAggregates.nonEmpty) {
--- End diff --

What is the type of `moreStreamingAggregates`? Option? What do you think 
about `foreach`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15428][SQL] Disable multiple streaming ...

2016-05-19 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13210#discussion_r63987696
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
 ---
@@ -55,10 +55,19 @@ object UnsupportedOperationChecker {
 case _: InsertIntoTable =>
   throwError("InsertIntoTable is not supported with streaming 
DataFrames/Datasets")
 
-case Aggregate(_, _, child) if child.isStreaming && outputMode == 
Append =>
-  throwError(
-"Aggregations are not supported on streaming 
DataFrames/Datasets in " +
-  "Append output mode. Consider changing output mode to 
Update.")
+case Aggregate(_, _, child) if child.isStreaming =>
+  if (outputMode == Append) {
+throwError(
+  "Aggregations are not supported on streaming 
DataFrames/Datasets in " +
+"Append output mode. Consider changing output mode to 
Update.")
--- End diff --

I'd suggest changing "Append" to using the Append value and Update too, so 
when you refactor the code it'd be easier to catch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15428][SQL] Disable multiple streaming ...

2016-05-19 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13210#discussion_r63987667
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
 ---
@@ -55,10 +55,19 @@ object UnsupportedOperationChecker {
 case _: InsertIntoTable =>
   throwError("InsertIntoTable is not supported with streaming 
DataFrames/Datasets")
 
-case Aggregate(_, _, child) if child.isStreaming && outputMode == 
Append =>
-  throwError(
-"Aggregations are not supported on streaming 
DataFrames/Datasets in " +
-  "Append output mode. Consider changing output mode to 
Update.")
+case Aggregate(_, _, child) if child.isStreaming =>
+  if (outputMode == Append) {
+throwError(
+  "Aggregations are not supported on streaming 
DataFrames/Datasets in " +
+"Append output mode. Consider changing output mode to 
Update.")
+  }
+  val moreStreamingAggregates = child.find {
+case Aggregate(_, _, grandchild) if grandchild.isStreaming => 
true
+case _ => false
+  }
+  if (moreStreamingAggregates.nonEmpty) {
+throwError("Multiple streaming aggregations are not supported 
DataFrames/Datasets")
--- End diff --

"are not supported DataFrames/Datasets"? It appears as "in" or "for" might 
be missing?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15067] [YARN] YARN executors are launch...

2016-05-08 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12985#issuecomment-217727591
  
What do you think about a unit test for this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15152][DOC][MINOR] Scaladoc and Code st...

2016-05-05 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12928#discussion_r62170912
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Pool.scala ---
@@ -98,13 +97,12 @@ private[spark] class Pool(
   }
 
   override def getSortedTaskSetQueue: ArrayBuffer[TaskSetManager] = {
-var sortedTaskSetQueue = new ArrayBuffer[TaskSetManager]
-val sortedSchedulableQueue =
-  
schedulableQueue.asScala.toSeq.sortWith(taskSetSchedulingAlgorithm.comparator)
-for (schedulable <- sortedSchedulableQueue) {
-  sortedTaskSetQueue ++= schedulable.getSortedTaskSetQueue
-}
-sortedTaskSetQueue
+schedulableQueue
--- End diff --

You're right! I have overlooked that. I'll revert the change. Sorry.

(Hmmm, how did it work then?! #confused)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15152][DOC][MINOR] Scaladoc and Code st...

2016-05-05 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12928#discussion_r62170253
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -168,7 +168,7 @@ private[spark] class Client(
   val appContext = createApplicationSubmissionContext(newApp, 
containerContext)
 
   // Finally, submit and monitor the application
-  logInfo(s"Submitting application ${appId.getId} to ResourceManager")
+  logInfo(s"Submitting application $appId to ResourceManager")
--- End diff --

It's inconsistent with the other log messages where `appId` is printed out 
instead. It then makes tracing the calls slightly harder - cf. 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala#L180-L182
 and 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala#L174-L175
 and few other places.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15152][DOC][MINOR] Scaladoc and Code st...

2016-05-05 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12928#discussion_r62169812
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -155,10 +155,10 @@ private[spark] class Client(
 
   // Get a new application from our RM
   val newApp = yarnClient.createApplication()
-  val newAppResponse = newApp.getNewApplicationResponse()
-  appId = newAppResponse.getApplicationId()
+  val newAppResponse = newApp.getNewApplicationResponse
--- End diff --

Yes, there are, but IDEA kept bugging me all the time about calling getters 
with `()` as if mutation/computation happened. I'll revert it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15152][DOC][MINOR] Scaladoc and Code st...

2016-05-05 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/12928

[SPARK-15152][DOC][MINOR] Scaladoc and Code style Improvements

## What changes were proposed in this pull request?

Minor doc and code style fixes

## How was this patch tested?

local build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark SPARK-15152

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12928.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12928


commit a3449278bc04b1e40cf782779066d58df899ea06
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-05-05T10:00:28Z

[DOC][MINOR] SPARK-15152 Scaladoc and Code style Improvements




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14938][ML] replace RDD.map with Dataset...

2016-04-28 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12718#issuecomment-215550190
  
Other than the few places where you could use symbols not string literals 
LGTM. Excellent job! Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14938][ML] replace RDD.map with Dataset...

2016-04-28 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12718#discussion_r61494448
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -79,11 +79,12 @@ final class ChiSqSelector(override val uid: String)
 
   @Since("2.0.0")
   override def fit(dataset: Dataset[_]): ChiSqSelectorModel = {
+val sqlContext = dataset.sqlContext
+import sqlContext.implicits._
+
 transformSchema(dataset.schema, logging = true)
-val input = dataset.select($(labelCol), $(featuresCol)).rdd.map {
-  case Row(label: Double, features: Vector) =>
-LabeledPoint(label, features)
-}
+val input = 
dataset.select(col($(labelCol)).cast(DoubleType).as("label"),
--- End diff --

Sorry, couldn't resist :) I'd change `"label"` to be a symbol `'label`. Not 
very widely used, but think it deserves its place in the code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][DOCS] Minor typo fixes

2016-04-26 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12469#issuecomment-214638031
  
@srowen Reverted the line removal and rebased with master. Mind merging it 
to the repo? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14889][Spark Core] scala.MatchError: NO...

2016-04-25 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12666#discussion_r60991107
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Pool.scala ---
@@ -55,6 +55,8 @@ private[spark] class Pool(
 new FairSchedulingAlgorithm()
   case SchedulingMode.FIFO =>
 new FIFOSchedulingAlgorithm()
+  case _ =>
+throw new RuntimeException(s"The scheduler mode $schedulingMode is 
not supported by Spark.")
--- End diff --

In 
[TaskSchedulerImpl](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L118)
 they use: "Unrecognized spark.scheduler.mode: $schedulingModeConf" so I'd 
change it to "Unsupported spark.scheduler.mode: $schedulingMode".

I'm also not sure about the other places like 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L133-L138.
 I know that the change would not let it be executed with `NONE` or unsupported 
scheduling modes, but when Pool changes...just thinking aloud.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [DOCS][MINOR] Screenshot + minor fixes to impr...

2016-04-23 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12569#issuecomment-213819194
  
Could this PR have a bit of @srowen's attention? :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][DOCS] Minor typo fixes

2016-04-22 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12469#discussion_r60727194
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -80,10 +80,8 @@ class HiveContext private[hive](
   protected[sql] override def sharedState: HiveSharedState = {
 sparkSession.sharedState.asInstanceOf[HiveSharedState]
   }
-
--- End diff --

You're correct @srowen the line was there, but the file had changed since 
your first review of my *minor* change so I had to rebase. While rebasing and 
fixing the merge conflict I noticed the two lines and remove one. Since I was 
fixing the merge conflict I introduced the additional change (as it should've 
been noticed and fixed before).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][DOCS] Minor typo fixes

2016-04-22 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12469#discussion_r60726392
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -80,10 +80,8 @@ class HiveContext private[hive](
   protected[sql] override def sharedState: HiveSharedState = {
 sparkSession.sharedState.asInstanceOf[HiveSharedState]
   }
-
--- End diff --

It was because I touched the file (not the line) and while I was rebasing I 
noticed the offending line. When I was at it I decided to remove it since I was 
fixing the merge conflict.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [DOCS][MINOR] Screenshot + minor fixes to impr...

2016-04-22 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12569#issuecomment-213387430
  
Title has changed to incorporate @HyukjinKwon's suggestion. Please review 
@srowen. I intentionally have not included "in the programming guide" due to 
"[DOCS]". Is this ok now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][DOCS] Minor typo fixes

2016-04-22 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12469#discussion_r60725372
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -80,10 +80,8 @@ class HiveContext private[hive](
   protected[sql] override def sharedState: HiveSharedState = {
 sparkSession.sharedState.asInstanceOf[HiveSharedState]
   }
-
--- End diff --

In this case, it caused the last merge conflict so when I was rebasing I 
noticed the line and removed. I can revert the change if that's what you want.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [DOCS][MINOR] Accumulators

2016-04-22 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12569#issuecomment-213275339
  
@HyukjinKwon In that case I'd ask for the alternative as I currently have 
no idea how to make it clearer (it wasn't me to say "the title is not clear" 
:)) What do you miss in the title?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][DOCS] Minor typo fixes

2016-04-21 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12469#issuecomment-213118039
  
I fixed the issue with the two failing tests due to the error message 
having been changed.

Jenkins retest this please

:)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][DOCS] Minor typo fixes

2016-04-21 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12469#issuecomment-212920619
  
Done @srowen. Thanks a lot for your patience.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [DOCS][MINOR] Accumulators

2016-04-21 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12569#discussion_r60577867
  
--- Diff: docs/programming-guide.md ---
@@ -1328,12 +1328,18 @@ value of the broadcast variable (e.g. if the 
variable is shipped to a new node l
 Accumulators are variables that are only "added" to through an associative 
and commutative operation and can
 therefore be efficiently supported in parallel. They can be used to 
implement counters (as in
 MapReduce) or sums. Spark natively supports accumulators of numeric types, 
and programmers
-can add support for new types. If accumulators are created with a name, 
they will be
+can add support for new types.
+
+If accumulators are created with a name, they will be
 displayed in Spark's UI. This can be useful for understanding the progress 
of
 running stages (NOTE: this is not yet supported in Python).
 
+
--- End diff --

I copied it from another file 
https://github.com/apache/spark/blob/master/docs/cluster-overview.md as I 
didn't know how to include images.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [DOCS][MINOR] Accumulators

2016-04-21 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/12569

[DOCS][MINOR] Accumulators

## What changes were proposed in this pull request?

Added screenshot + minor fixes to improve reading

## How was this patch tested?

Manual

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark docs-accumulators

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12569.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12569


commit 8cdc916256c792b212e0b52593fbfd5e0f1b7c8b
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-04-21T08:49:18Z

[DOCS][MINOR] Accumulators




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][DOCS] Minor typo fixes

2016-04-20 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12469#issuecomment-212316780
  
@srowen Mind reviewing and possibly merging afterwards? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][ML] Use Datasets (to improve internal ...

2016-04-18 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/11915#issuecomment-211605171
  
Sorry, no. Got distracted and had no time for it. Let me work on it later 
today (it's 00:21 already).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Minor typo fixes

2016-04-18 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12469#discussion_r60092825
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
 ---
@@ -46,8 +46,7 @@ import org.apache.spark.sql.SQLContext
  * files in a directory always shows the latest files.
  */
 class HDFSMetadataLog[T: ClassTag](sqlContext: SQLContext, path: String)
-  extends MetadataLog[T]
-  with Logging {
+  extends MetadataLog[T] with Logging {
--- End diff --

Against the code style, don't you think? How to proceed with similar cases? 
Please guide.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Minor typo fixes

2016-04-18 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12469#discussion_r60092677
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -453,8 +453,8 @@ class Dataset[T] private[sql](
* Returns true if this [[Dataset]] contains one or more sources that 
continuously
* return data as it arrives. A [[Dataset]] that reads data from a 
streaming source
* must be executed as a [[ContinuousQuery]] using the `startStream()` 
method in
-   * [[DataFrameWriter]].  Methods that return a single answer, (e.g., 
`count()` or
-   * `collect()`) will throw an [[AnalysisException]] when there is a 
streaming
+   * [[DataFrameWriter]]. Methods that return a single answer, e.g. 
`count()` or
--- End diff --

See http://dictionary.cambridge.org/dictionary/english/e-g. There should 
only be a comma before e.g. not after it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Minor typo fixes

2016-04-18 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12469#discussion_r60092525
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -38,8 +38,6 @@ import org.apache.spark.sql.types._
 
 /**
  * Represents a numeric vector, whose index type is Int and value type is 
Double.
- *
- * Note: Users should not implement this interface.
--- End diff --

That's my point with the change. If uses should not implement this 
interface, we should offer them alternatives. Do you know any? I'm happy to add 
it to the scaladoc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Minor typo fixes

2016-04-18 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12469#discussion_r60092387
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -438,7 +438,6 @@ class LinearRegressionModel private[ml] (
 }
   }
 
-
--- End diff --

How to fix them? I've seen few commits with such a change so I thought this 
one would be enough to include this too. Please guide as I've got few other 
similar cases and don't know how to proceed with small but irritating cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Minor typo fixes

2016-04-18 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/12469

Minor typo fixes

## What changes were proposed in this pull request?

Minor typo fixes (too minor to deserve separate a JIRA)

## How was this patch tested?

local build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark minor-typo-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12469.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12469


commit 13bcfbbe03f9c78512547d6b69d13b945a163e65
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-04-18T11:46:56Z

Minor typo fixes




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14288][SQL] Memory Sink for streaming

2016-04-11 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12119#discussion_r59296064
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -275,23 +277,64 @@ final class DataFrameWriter private[sql](df: 
DataFrame) {
* @since 2.0.0
*/
   def startStream(): ContinuousQuery = {
-val dataSource =
-  DataSource(
-df.sqlContext,
-className = source,
-options = extraOptions.toMap,
-partitionColumns = normalizedParCols.getOrElse(Nil))
-
-val queryName = extraOptions.getOrElse("queryName", 
StreamExecution.nextName)
-val checkpointLocation = extraOptions.getOrElse("checkpointLocation", {
-  new Path(df.sqlContext.conf.checkpointLocation, 
queryName).toUri.toString
-})
-df.sqlContext.sessionState.continuousQueryManager.startQuery(
-  queryName,
-  checkpointLocation,
-  df,
-  dataSource.createSink(),
-  trigger)
+if (source == "memory") {
+  val queryName =
+extraOptions.getOrElse(
+  "queryName", throw new AnalysisException("queryName must be 
specified for memory sink"))
+  val checkpointLocation = extraOptions.get("checkpointLocation").map 
{ userSpecified =>
+new Path(userSpecified).toUri.toString
+  }.orElse {
+val checkpointConfig: Option[String] =
+  df.sqlContext.conf.getConf(
+SQLConf.CHECKPOINT_LOCATION,
+None)
+
+checkpointConfig.map { location =>
+  new Path(location, queryName).toUri.toString
+}
+  }.getOrElse {
+Utils.createTempDir(namePrefix = "memory.stream").getCanonicalPath
+  }
+
+  // If offsets have already been created, we trying to resume a query.
+  val checkpointPath = new Path(checkpointLocation, "offsets")
+  val fs = 
checkpointPath.getFileSystem(df.sqlContext.sparkContext.hadoopConfiguration)
+  if (fs.exists(checkpointPath)) {
+throw new AnalysisException(
+  s"Unable to resume query written to memory sink. Delete 
$checkpointPath to start over.")
+  } else {
+checkpointPath.toUri.toString
--- End diff --

@marmbrus I think it's a dead code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14500] [ML] Accept Dataset[_] instead o...

2016-04-09 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12274#discussion_r59124622
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/model/DecisionTreeModel.scala 
---
@@ -32,7 +32,7 @@ import org.apache.spark.mllib.tree.configuration.{Algo, 
FeatureType}
 import org.apache.spark.mllib.tree.configuration.Algo._
 import org.apache.spark.mllib.util.{Loader, Saveable}
 import org.apache.spark.rdd.RDD
-import org.apache.spark.sql.{DataFrame, Row, SQLContext}
+import org.apache.spark.sql.{DataFrame, Dataset, Row, SQLContext}
--- End diff --

Again, no other change in the file and you need the import?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14500] [ML] Accept Dataset[_] instead o...

2016-04-09 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12274#discussion_r59124613
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/regression/impl/GLMRegressionModel.scala
 ---
@@ -23,7 +23,7 @@ import org.json4s.jackson.JsonMethods._
 import org.apache.spark.SparkContext
 import org.apache.spark.mllib.linalg.Vector
 import org.apache.spark.mllib.util.Loader
-import org.apache.spark.sql.{DataFrame, Row, SQLContext}
+import org.apache.spark.sql.{DataFrame, Dataset, Row, SQLContext}
--- End diff --

I'm surprised you need this change without any other changes in the file? 
How could that be?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14500] [ML] Accept Dataset[_] instead o...

2016-04-09 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12274#discussion_r59124592
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -171,18 +171,18 @@ abstract class PredictionModel[FeaturesType, M <: 
PredictionModel[FeaturesType,
* @param dataset input dataset
* @return transformed dataset with [[predictionCol]] of type [[Double]]
*/
-  override def transform(dataset: DataFrame): DataFrame = {
+  override def transform(dataset: Dataset[_]): DataFrame = {
--- End diff --

What about the return type? If I want to chain transformers by `andThen`, 
i.e. `(tok.transform _).andThen(hashTF.transform)`, won't `DataFrame` be an 
issue? Why do we keep `DataFrame` as the return type?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-14500] [ML] Accept Dataset[_] instead o...

2016-04-09 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/12274#discussion_r59124562
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala ---
@@ -124,7 +124,7 @@ class Pipeline @Since("1.4.0") (
* @return fitted pipeline
*/
   @Since("1.2.0")
--- End diff --

What about `@Since` then? It's a different method, isn't it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [HOTFIX][SPARK-14402] Fix ExpressionDescriptio...

2016-04-07 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12192#issuecomment-207175670
  
Please please merge it as soon as possible as I'm suffering from not having 
it in master every time I do the build :(


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [HOTFIX][SPARK-14402] Fix ExpressionDescriptio...

2016-04-05 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/12192#issuecomment-206095697
  
Is this really the patch to cause the issue or is this more an interim 
hiccup?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [HOTFIX][SPARK-14402] Fix ExpressionDescriptio...

2016-04-05 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/12192

[HOTFIX][SPARK-14402] Fix ExpressionDescription annotation

## What changes were proposed in this pull request?

Fix for the error introduced in 
https://github.com/apache/spark/commit/c59abad052b7beec4ef550049413e95578e545be:

```

/Users/jacek/dev/oss/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:626:
 error: annotation argument needs to be a constant; found: "_FUNC_(str) - 
".+("Returns str, with the first letter of each word in uppercase, all other 
letters in ").+("lowercase. Words are delimited by white space.")
"Returns str, with the first letter of each word in uppercase, all 
other letters in " +

  ^
```

## How was this patch tested?

Local build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark SPARK-14402-HOTFIX

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12192


commit 218bf51fb0132738daf0d544bfcde076af22f444
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-04-06T01:36:19Z

[HOTFIX][SPARK-14402] Fix ExpressionDescription annotation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR] Typo fixes

2016-03-25 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/11802#discussion_r57455819
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -246,9 +246,7 @@ class StreamingContext private[streaming] (
 checkpointDir != null
   }
 
-  private[streaming] def initialCheckpoint: Checkpoint = {
-if (isCheckpointPresent) _cp else null
-  }
+  private[streaming] def initialCheckpoint: Checkpoint = _cp
--- End diff --

Sorry @srowen to keep you waiting. As to the line, look at 
`isCheckpointPresent` and see that it does what the `if...else` block does, 
i.e. checks whether `_cp` is `null` and if it is, it returns `false` so `null`. 
It's a false check here and hence an easy fix (and that's why I proposed it 
with the other changes).

I can revert it and report another change if you want. Let me know.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][ML] Use Datasets (to improve internal ...

2016-03-23 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/11915

[MINOR][ML] Use Datasets (to improve internal implementation)

## What changes were proposed in this pull request?

Change the current implementation to use Datasets (not the "old-school" 
`map` over `Row`s)

NOTE: I'm sending the pull request to discuss whether my understanding of 
the change is correct or not and if approved (after discussions and further 
changes possibly) learn the right way.

## How was this patch tested?

Local build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark sparkml-predictor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11915.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11915


commit 74897af4b8b4ba126790377fb6d79bbe96364ec0
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-03-23T07:56:56Z

[MINOR][ML] Use Datasets (to improve internal implementation)




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR] Typo fixes

2016-03-19 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/11802

[MINOR] Typo fixes

## What changes were proposed in this pull request?

Typo fixes. No functional changes.

## How was this patch tested?

Built the sources and ran with samples.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark typo-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11802.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11802


commit c870a8cacbf0e62aab83bae56018f9a89f9e6fa0
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-03-17T23:10:53Z

[MINOR] Typo fixes




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR] Typo fixes

2016-03-18 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/11802#issuecomment-198593089
  
Thanks @srowen @thunterdb for review!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR] Typo fixes

2016-03-18 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/11802#discussion_r56738903
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -277,7 +277,7 @@ abstract class DStream[T: ClassTag] (
 logInfo(s"Slide time = $slideDuration")
 logInfo(s"Storage level = ${storageLevel.description}")
 logInfo(s"Checkpoint interval = $checkpointDuration")
-logInfo(s"Remember duration = $rememberDuration")
+logInfo(s"Remember interval = $rememberDuration")
--- End diff --

It's consistent with "Checkpoint interval" above and the general impression 
that it's described as "interval" while called "Duration" in the code (due to 
the type).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13825][CORE] Upgrade to Scala 2.11.8

2016-03-15 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/11681#issuecomment-196929219
  
@srowen Mind triggering a test? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13825][CORE] Upgrade to Scala 2.11.8

2016-03-12 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/11681

[SPARK-13825][CORE] Upgrade to Scala 2.11.8

## What changes were proposed in this pull request?

Upgrade to 2.11.8 (from the current 2.11.7)

## How was this patch tested?

A manual build




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark SPARK-13825-scala-2_11_8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11681.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11681


commit 5ae9816ce17ba2ce9b708a68e8a1fd2540024872
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-03-12T22:05:24Z

[SPARK-13825][CORE] Upgrade to Scala 2.11.8




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13776][WebUI]Limit the max number of ac...

2016-03-11 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/11615#discussion_r55883199
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -270,9 +270,25 @@ private[spark] object JettyUtils extends Logging {
 
   gzipHandlers.foreach(collection.addHandler)
   connectors.foreach(_.setHost(hostName))
+  // As each acceptor and each selector will use one thread, the 
number of threads should at
+  // least be the number of acceptors and selectors plus 1. (See 
SPARK-13776)
+  var minThreads = 1
+  connectors.foreach { c =>
+// Currently we only use "SelectChannelConnector"
+val connector = c.asInstanceOf[SelectChannelConnector]
+// Limit the max acceptor number to 8 so that we don't waste a lot 
of threads
+connector.setAcceptors(math.min(connector.getAcceptors, 8))
+// The number of selectors always equals to the number of acceptors
+minThreads += connector.getAcceptors * 2
+  }
   server.setConnectors(connectors.toArray)
 
   val pool = new QueuedThreadPool
+  if (serverName.nonEmpty) {
+pool.setName(serverName)
+  }
+  pool.setMaxThreads(math.max(pool.getMaxThreads, minThreads))
+  pool.setMinThreads(math.min(pool.getMinThreads, pool.getMaxThreads))
--- End diff --

So, could getMinThreads be greater than getMaxThreads ever? Why are you 
doing the line? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13776][WebUI]Limit the max number of ac...

2016-03-11 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/11615#discussion_r55869059
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -270,9 +270,25 @@ private[spark] object JettyUtils extends Logging {
 
   gzipHandlers.foreach(collection.addHandler)
   connectors.foreach(_.setHost(hostName))
+  // As each acceptor and each selector will use one thread, the 
number of threads should at
+  // least be the number of acceptors and selectors plus 1. (See 
SPARK-13776)
+  var minThreads = 1
+  connectors.foreach { c =>
+// Currently we only use "SelectChannelConnector"
+val connector = c.asInstanceOf[SelectChannelConnector]
+// Limit the max acceptor number to 8 so that we don't waste a lot 
of threads
+connector.setAcceptors(math.min(connector.getAcceptors, 8))
+// The number of selectors always equals to the number of acceptors
+minThreads += connector.getAcceptors * 2
+  }
   server.setConnectors(connectors.toArray)
 
   val pool = new QueuedThreadPool
+  if (serverName.nonEmpty) {
+pool.setName(serverName)
+  }
+  pool.setMaxThreads(math.max(pool.getMaxThreads, minThreads))
+  pool.setMinThreads(math.min(pool.getMinThreads, pool.getMaxThreads))
--- End diff --

When could `getMinThreads` be greater than `getMaxThreads`? Why don't you 
use `minThreads` here, too? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13776][WebUI]Limit the max number of ac...

2016-03-11 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/11615#discussion_r55868740
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -270,9 +270,25 @@ private[spark] object JettyUtils extends Logging {
 
   gzipHandlers.foreach(collection.addHandler)
   connectors.foreach(_.setHost(hostName))
+  // As each acceptor and each selector will use one thread, the 
number of threads should at
+  // least be the number of acceptors and selectors plus 1. (See 
SPARK-13776)
+  var minThreads = 1
+  connectors.foreach { c =>
--- End diff --

Use `collect` or "cast" using pattern matching on type. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13727] [SQL] SparkConf.contains does no...

2016-03-08 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/11568#discussion_r55408473
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -351,7 +351,16 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging {
   def getAppId: String = get("spark.app.id")
 
   /** Does the configuration contain a given parameter? */
-  def contains(key: String): Boolean = settings.containsKey(key)
+  def contains(key: String): Boolean = {
+if (settings.containsKey(key)) {
+  true
+} else {
+  // try to find the settings in the alternatives
+  configsWithAlternatives.get(key).flatMap { alts =>
+alts.collectFirst { case alt if contains(alt.key) => true }
+  }.isDefined
--- End diff --

It's far too complicated. Would 
`configsWithAlternatives.get("one").contains(...)` work here? What is this 
supposed to do?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13727] [SQL] SparkConf.contains does no...

2016-03-08 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/11568#discussion_r55406003
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -351,7 +351,16 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging {
   def getAppId: String = get("spark.app.id")
 
   /** Does the configuration contain a given parameter? */
-  def contains(key: String): Boolean = settings.containsKey(key)
+  def contains(key: String): Boolean = {
+if (settings.containsKey(key)) {
--- End diff --

It always bothers me when I see `if (true) true`. I think 
`settings.containsKey(key) || ...`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12458][SQL] Add ExpressionDescription t...

2016-03-08 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/10428#discussion_r55347052
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -126,6 +144,13 @@ case class Hour(child: Expression) extends 
UnaryExpression with ImplicitCastInpu
   }
 }
 
+/*
+ * Returns the minute componemnt of the string/timestamp/interval
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(timestamp_param) - Returns the minute  componemnt of the 
timestamp value.",
--- End diff --

s/the minute  componemnt/the minute component


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-08 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/11575#discussion_r55345145
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -178,6 +178,20 @@ class StringIndexerSuite
 }
   }
 
+  test("StringIndexer on column with empty string values") {
+val data = sc.parallelize(Seq((0, "a"), (1, ""), (2, "c"), (3, "a"), 
(4, "a"), (5, "c")), 2)
+val df = sqlContext.createDataFrame(data).toDF("id", "label")
--- End diff --

These two lines could be written as:

```
Seq((0, "a"), (1, ""), (2, "c"), (3, "a"), (4, "a"), (5, "c")).toDF("id", 
"label")
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [STREAMING][DOCS] Fixes and code improvements ...

2016-02-18 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/11201#issuecomment-186008514
  
@srowen Mind having a look? I'd appreciate. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [STREAMING][DOCS] Fixes and code improvements ...

2016-02-14 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/11201

[STREAMING][DOCS] Fixes and code improvements for checkpointing



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
docs-streaming-checkpointing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11201


commit 3eb999f04eadcf603ecb551f9bf78dc776b80dbe
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-02-14T14:12:11Z

[STREAMING][DOCS] Fixes and code improvements for checkpointing




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Fix for [SPARK-12854][SQL] Implement complex t...

2016-02-01 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/10946#issuecomment-177873394
  
@JoshRosen @rxin a friendly reminder to merge the change (or close it if 
irrelevant)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Fix for [SPARK-12854][SQL] Implement complex t...

2016-01-28 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/10946#issuecomment-176115262
  
I'd appreciate having it merged since the current master is broken without 
the fix - checked early morning today.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Fix for [SPARK-12854][SQL] Implement complex t...

2016-01-27 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/10946

Fix for [SPARK-12854][SQL] Implement complex types support in Columnaâ¦

â¦rBatch

Fixes build for Scala 2.11.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark SPARK-12854-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10946.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10946


commit c13743b3a9214f61df1eac6b16e216f2a9c0cc7c
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-01-27T09:12:49Z

Fix for [SPARK-12854][SQL] Implement complex types support in ColumnarBatch




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [STREAMING][MINOR] Scaladoc + logs

2016-01-22 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/10878

[STREAMING][MINOR] Scaladoc + logs

Found while doing code review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
streaming-scaladoc-logs-tiny-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10878.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10878


commit 6af94bee120c62fe5b50d2cd7345ed96f70d2ceb
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-01-22T10:41:31Z

[STREAMING][MINOR] Scaladoc + logs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [STREAMING][MINOR] Typo fixes

2016-01-11 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/10698

[STREAMING][MINOR] Typo fixes



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
streaming-kafka-typo-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10698.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10698


commit c9f7e085d92ca3d1b4d51b6fd29d57eb5987d153
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-01-11T12:28:49Z

[STREAMING][MINOR] Typo fixes




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12736][CORE][DEPLOY] Standalone Master ...

2016-01-09 Thread jaceklaskowski

Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/10674#issuecomment-170238716
  
Yes, it does. I'm using the latest revision + the change. It's a serious 
issue since standalone Master cannot be started as of today.

I do not know how it's supposed to have been fixed, but that's exactly what 
helped to resolve the issue. Any help appreciated to make it better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12736][CORE][DEPLOY] Standalone Master ...

2016-01-09 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/10674

[SPARK-12736][CORE][DEPLOY] Standalone Master cannot be started due tâ¦

â¦o NoClassDefFoundError: org/spark-project/guava/collect/Maps

/cc @srowen @rxin 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark SPARK-12736

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10674


commit 16c228d9227731e1228e75b419b06bcf29e8765c
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2016-01-09T11:34:07Z

[SPARK-12736][CORE][DEPLOY] Standalone Master cannot be started due to 
NoClassDefFoundError: org/spark-project/guava/collect/Maps




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-12736][CORE][DEPLOY] Standalone Master ...

2016-01-09 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/10674#discussion_r49266731
  
--- Diff: network/common/pom.xml ---
@@ -55,6 +55,7 @@
 
   com.google.guava
   guava
+  compile
--- End diff --

Thought so, but the line has fixed standalone Master to start (after the 
line got removed in 
https://github.com/apache/spark/commit/659fd9d04b988d48960eac4f352ca37066f43f5c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 5 6 >

401 - 500 of 589 matches

Mail list logo